| Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) | 
| Notice: | Welcome to the Digital UNIX Conference | 
| Moderator: | SMURF::DENHAM | 
| Created: | Thu Mar 16 1995 | 
| Last Modified: | Fri Jun 06 1997 | 
| Last Successful Update: | Fri Jun 06 1997 | 
| Number of topics: | 10068 | 
| Total number of notes: | 35879 | 
We did a benchmark with some finite-element software.
The code is all F77, and we used DUNIX V4.0b.
On some small problems a 466 MHz Alpha was twice as fast
as an HP K460, but on large problems the HP is 1.5 times faster.
Both machines have enough memory, so paging isn't the issue, and
I/O is not the bottleneck.
In these problems, very large arrays of data are held in memory and
accessed. With the HP compiler they used an option like "-O prefetch".
That option gets more data into the cache so the CPU does not wait
too much. On the HP this gave around a 30% performance improvement.
Does anyone know:
- what this HP option does?
- how we can achieve the same effect?
- whether this is an option to implement in our compiler?
Cross-posted in the Fortran notes file as well as Digital UNIX.
[Posted by WWW Notes gateway]
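For illustration only, here is a minimal C sketch of what software prefetching means. It is not what the HP option or Digital Fortran generates. It assumes a compiler that provides the GCC/Clang builtin `__builtin_prefetch`, and both `PREFETCH_DIST` and `sum_with_prefetch` are hypothetical names. In each iteration the loop requests the element `PREFETCH_DIST` positions ahead, so that element's memory latency overlaps with the additions on the current one.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. The best value depends on
   memory latency and on how much work each iteration does. */
#define PREFETCH_DIST 64

/* Sum a large array while prefetching ahead. __builtin_prefetch is a
   GCC/Clang builtin (assumed available): the second argument 0 means
   "read", and the third argument 0 means "little temporal reuse".
   It is only a hint and never changes the result. */
double sum_with_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 0);
        s += a[i];
    }
    return s;
}
```

A compiler that inserts prefetches automatically would usually issue one per cache line instead of one per element, and would choose the distance based on the target machine's memory latency.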
| T.R | Title | User | Personal Name | Date | Lines | 
|---|---|---|---|---|---|
| 9829.1 | Try -O5 | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Thu May 15 1997 11:01 | 21 | 
| > Cross posted in The fortran notes file as > well as Digital UNIX. !!!! Well, I couldn't find a corresponding note in TURRIS::FORTRAN -- did you post this somewhere else? Different Alpha systems have different levels of memory performance. What system were you using? Prefetching is the process of starting the memory reference before it is needed, so that the time needed to access it (dozens of cycles) can be overlapped with useful work. Digital Fortran performs prefetching in some loops if you compile with -O5. Future versions will be able to prefetch in more cases. If you want to discuss this further in TURRIS::FORTRAN, it would be helpful to know: What's a "small" or "large" problem? Can you tell us how much data the program needs to sweep over? What version of f77 are you using? Can you identify where in the code the time is going? Is there a single time-consuming loop you could show us? | |||||
| 9829.2 | let me first do my homework | IJSAPL::ESSER | Clemens Esser Dutch ARC | Thu May 15 1997 11:07 | 12 | 
| Hi, Thanks for the quick answer. When I wanted to post it in the Fortran notes file, I found some more info I wanted to read first. I was not able to remove this note because of network problems. I will first try what I have learned before I bother other people with my problems. Clemens | |||||
| 9829.3 | Homework suggestions | PERFOM::HENNING | | Fri May 16 1997 05:24 | 24 |
|     Sometimes the following works...
    
    How HP can win a benchmark: 
        - make the problem bigger, so it fits in their high-speed L1 cache 
          but not in Digital's high-speed D or S caches
    
    How HP can lose a benchmark:
        - make the problem even bigger, so it fits in neither one of the
          caches and instead runs at main memory speed
    
    Depending on your platform, this may or may not work out for you. 
    Since you say 466 MHz, I think you are running with a rawhide-class
    memory system, which has about the same bandwidth as the K460 (for
    single-CPU problems).  
    
    Be sure you have the latest (December 96) versions of both KAP and f77.
    
    Try compiling with each of:
    
        f77 -O5
        kf77 -O5
        kf77 -O5 -fkapargs='-ag=a'
        kf77 -O5 -fkapargs='-tune=ev4 -ag=a'
    
 | |||||
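The cache argument in .3 amounts to comparing a loop's working set with the capacity of each cache level. Below is a minimal C sketch under stated assumptions: the loop sweeps `n_arrays` arrays of equal length, and the caller supplies the cache capacities, ordered from smallest to largest. No real cache sizes are assumed for either machine, and both function names are hypothetical.

```c
#include <stddef.h>

/* Bytes touched by one sweep over n_arrays arrays of n elements each. */
size_t working_set_bytes(size_t n_arrays, size_t n, size_t elem_size)
{
    return n_arrays * n * elem_size;
}

/* Return the 1-based index of the first cache (ordered smallest to
   largest) whose capacity holds the working set, or 0 if it fits in none.
   A result of 0 means the loop runs at main-memory speed. */
int fitting_cache_level(size_t working_set, const size_t *cache_bytes,
                        int n_levels)
{
    for (int i = 0; i < n_levels; i++)
        if (working_set <= cache_bytes[i])
            return i + 1;
    return 0;
}
```

Re-running this for the "small" and "large" problem sizes on each machine's actual cache sizes shows which case from .3 each benchmark falls into.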