
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9829.0. "HP performance" by NNTPD::"[email protected]" (Clemens Esser) Thu May 15 1997 08:57

We ran a benchmark with some finite element software.
The code is all F77 and we used DUNIX V4.0b.
On some small problems a 466 MHz Alpha was twice as fast
as an HP K460, but on large problems the HP is 1.5 times faster.

Both machines have enough memory, so paging isn't the issue. I/O is
not the bottleneck either.

In these problems very large arrays of data are held in memory and
accessed. On the HP they used a compiler option like "-O prefetch".
That option gets more data into the cache ahead of time, so the CPU does
not have to wait as much. On the HP this gave around a 30% performance
improvement.

Does anyone know
- what this HP option does?
- how we can achieve the same effect?
- whether this option could be implemented in our compiler?

Cross-posted in the Fortran notes file as
            well as in Digital UNIX.
[Posted by WWW Notes gateway]

9829.1. "Try -O5" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Thu May 15 1997 12:01
> Cross-posted in the Fortran notes file as
>             well as in Digital UNIX.

Well, I couldn't find a corresponding note in TURRIS::FORTRAN -- did you
post this somewhere else?

Different Alpha systems have different levels of memory performance.  What
system were you using?

Prefetching is the process of starting the memory reference before it is needed,
so that the time needed to access it (dozens of cycles) can be overlapped with
useful work.  Digital Fortran performs prefetching in some loops if you compile
with -O5.  Future versions will be able to prefetch in more cases.
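As a rough illustration of the overlap described above (a sketch, not a description of how Digital Fortran decides when to prefetch): if a memory reference takes about L cycles and one loop iteration does C cycles of useful work, the prefetch has to be issued roughly ceil(L/C) iterations ahead of the use. The latency and cycle counts below are made-up example numbers, and `prefetch_distance` and `prefetch_schedule` are hypothetical helpers.

```python
import math
from typing import List, Optional, Tuple


def prefetch_distance(latency_cycles: int, cycles_per_iteration: int) -> int:
    """Iterations ahead a prefetch must be issued so the memory latency
    is hidden behind the work of the intervening iterations."""
    if latency_cycles <= 0 or cycles_per_iteration <= 0:
        raise ValueError("latency and per-iteration cost must be positive")
    return math.ceil(latency_cycles / cycles_per_iteration)


def prefetch_schedule(n: int, distance: int) -> List[Tuple[int, Optional[int]]]:
    """For each iteration i of a loop over n elements, return
    (element used now, element prefetched now). The prefetch targets
    element i + distance; near the end there is nothing left to fetch."""
    schedule = []
    for i in range(n):
        ahead = i + distance
        schedule.append((i, ahead if ahead < n else None))
    return schedule


if __name__ == "__main__":
    # Hypothetical numbers: 60-cycle memory latency, 8 cycles of work per iteration.
    d = prefetch_distance(60, 8)
    print("prefetch distance:", d)  # 8
    for used, fetched in prefetch_schedule(12, d):
        print(f"iteration {used:2d}: use a[{used}], prefetch {fetched}")
```

In this simple schedule the first `distance` elements are never prefetched; a real implementation would normally issue those prefetches before the loop starts, so that the steady state begins with the data already on its way.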

If you want to discuss this further in TURRIS::FORTRAN, it would be helpful to
know:
	What's a "small" or "large" problem?  Can you tell us how much data
	the program needs to sweep over?
	What version of f77 are you using?
	Can you identify where in the code the time is going?  Is there a
	single time-consuming loop you could show us?

9829.2. "let me first do my homework" by IJSAPL::ESSER (Clemens Esser Dutch ARC) Thu May 15 1997 12:07
Hi,

Thanks for the quick answer. When I wanted to post it in the
fortran notes file I found some more info I first wanted to read.

I was not able to remove this note because of network problems.

I will first try what I have learned before bothering other
people with my problems.

Clemens

9829.3. "Homework suggestions" by PERFOM::HENNING Fri May 16 1997 06:24
    Sometimes the following works...
    
    How HP can win a benchmark: 
        - make the problem bigger, so it fits in their high-speed L1 cache 
          but not in Digital's high-speed D or S caches
    
    How HP can lose a benchmark:
        - make the problem even bigger, so it fits in neither one of the
          caches and instead runs at main memory speed
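To see which side of these cache boundaries a given problem falls on, it can help to estimate the working set and compare it with each cache level. A minimal sketch follows; the cache sizes are placeholders chosen only to illustrate the argument (not the actual K460 or Alpha configurations), so substitute the real values for your machines. The helper names are hypothetical.

```python
from typing import List, Tuple

# Placeholder cache hierarchies, smallest/fastest level first.
# These sizes are illustrative only; replace them with your systems' real values.
ALPHA_CACHES: List[Tuple[str, int]] = [("D-cache", 8 * 1024), ("S-cache", 96 * 1024)]
HP_CACHES: List[Tuple[str, int]] = [("L1", 1024 * 1024)]


def working_set_bytes(n_elements: int, bytes_per_element: int = 8, n_arrays: int = 1) -> int:
    """Bytes swept by a loop over n_arrays arrays of n_elements each
    (8 bytes per element for DOUBLE PRECISION / REAL*8)."""
    return n_elements * bytes_per_element * n_arrays


def smallest_fitting_level(working_set: int, caches: List[Tuple[str, int]]) -> str:
    """Name of the first (fastest) listed cache level that can hold the working set."""
    for name, size in caches:
        if working_set <= size:
            return name
    return "beyond listed caches"


if __name__ == "__main__":
    for n in (500, 50_000, 1_000_000):
        ws = working_set_bytes(n)
        print(f"{n:>9} doubles = {ws:>9} bytes: "
              f"Alpha -> {smallest_fitting_level(ws, ALPHA_CACHES)}, "
              f"HP -> {smallest_fitting_level(ws, HP_CACHES)}")
```

With these placeholder sizes, 50,000 doubles (400,000 bytes) fits the HP-like L1 but not the Alpha-like D or S cache, which is the case where HP can win; 1,000,000 doubles (8,000,000 bytes) fits in none of the listed caches. The model ignores associativity, conflict misses, and any cache levels not listed, so treat it only as a first estimate.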
    
    Depending on your platform, this may or may not work out for you.
    Since you say 466 MHz, I think you are running with a Rawhide-class
    memory system, which has about the same bandwidth as the K460 (for
    single-CPU problems).
    
    Be sure you have the latest (December 96) versions of both KAP and f77.
    
    Try compiling with each of:
    
        f77 -O5
        kf77 -O5
        kf77 -O5 -fkapargs='-ag=a'
        kf77 -O5 -fkapargs='-tune=ev4 -ag=a'