Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9453.0. "More CPU Questions" by RHETT::HALETKY () Wed Apr 09 1997 13:08

    Hello,
    
    I have a customer who has questions about the performance of the
    different processors under Digital UNIX. Here are his questions and
    comments:
    
    A couple of other issues regarding the various versions of Alpha which
    I'd appreciate a confirmation of: my latest computer architecture book
    has, among other things, the following data regarding the Alpha CPU:
    
    1) It has (all versions of the CPU, that is) two FP units: one for
    add/subtract, one for *,/, and branch prediction;
    
    2)      - the 21064(A) can only start one FP op per cycle;  latency of
              add/mult is 6 cycles;  latency of load is 3 cycles;
            - the 21164(A) can start one add and one mult per cycle;
              latency is 4 cycles;  latency of load is 2 cycles.
    
    Q: Does this imply that a 21164 running at the same clock speed could
    be up to 33% faster than a 21064A?
    
    And, are the data for the 21264 similar to those for the 21164?
    
    
    Best regards,
    Ed Haletky
    Digital CSC

9453.1. "Closer to 2x in many cases" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Wed Apr 09 1997 16:07

>    Q: Does this imply that a 21164 running at the same clock speed could
>    be up to 33% faster than a 21064A?

Well, I don't know whether it implies it or not.  The fact is that for some
programs a 21164 is twice as fast as a 21064A at the same clock speed, since
it can issue twice as many floating-point operations per cycle.
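
The two regimes can be sketched with a toy model. This is only an illustration, assuming the issue rates and latencies quoted in the question (one FP op per cycle at 6 cycles latency for the 21064A, one add plus one multiply per cycle at 4 cycles for the 21164). It ignores caches, loads, and every other pipeline constraint, and the function names are made up for this sketch.

```python
def dependent_chain_cycles(n_ops, latency):
    """Each op needs the previous result, so the latencies simply add up."""
    return n_ops * latency


def independent_cycles(n_adds, n_muls, latency, dual_issue):
    """Fully pipelined, independent adds and multiplies."""
    if dual_issue:
        # One add and one multiply can start in the same cycle.
        issue_cycles = max(n_adds, n_muls)
    else:
        # Only one FP op of any kind can start per cycle.
        issue_cycles = n_adds + n_muls
    if issue_cycles == 0:
        return 0
    # The last op starts in cycle issue_cycles and completes latency - 1 later.
    return issue_cycles + latency - 1


N = 1000
# 21064A figures (latency 6, single issue) vs 21164 figures (latency 4, dual issue)
chain_ratio = dependent_chain_cycles(N, 6) / dependent_chain_cycles(N, 4)
stream_ratio = (independent_cycles(N, N, 6, dual_issue=False)
                / independent_cycles(N, N, 4, dual_issue=True))
print(f"dependent chain speedup:     {chain_ratio:.2f}x")
print(f"independent add/mul speedup: {stream_ratio:.2f}x")
```

In this model a purely latency-bound chain gains 1.5x from the shorter latency, while a balanced stream of independent adds and multiplies approaches 2x from the doubled issue rate. Real programs fall somewhere in between, and cache behavior can move the result either way.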

You've ignored another enormous difference between the processors: size of
on-chip caches.  The 21064A has 16KB I-cache & 16KB D-cache.  The 21164 has
8KB I-cache and 8KB D-cache, plus a 96KB on-chip level-2 cache.  For applications
that benefit from the level-2 cache, the 21164 can make a very big difference.

For SPECint, 21164-based machines tend to perform a bit less than twice
as fast as 21064A-based machines at the same clock rate.

For SPECfp, 21164-based machines tend to perform a bit more than twice as
fast as 21064A-based machines at the same clock rate.

Most of the time, for most applications, even a 21164 is not issuing many
instructions per cycle.  In fact, for many important applications, it averages
well under one per cycle.  This is mainly because it spends its time waiting
for memory.  (That's one reason the on-chip L2 cache helps, but it's not enough.)
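
A back-of-the-envelope way to see how memory stalls drag the average below one instruction per cycle is the usual stall-cycle model: effective CPI is the no-stall CPI plus the stall cycles per instruction. The sketch below assumes that model, and every number in it is hypothetical, chosen only for illustration; none are measurements of any Alpha system.

```python
def effective_cpi(base_cpi, refs_per_instr, miss_rate, miss_penalty):
    """Average cycles per instruction, including memory stall cycles."""
    stall_cpi = refs_per_instr * miss_rate * miss_penalty
    return base_cpi + stall_cpi


# Hypothetical workload: all four values are made up for illustration.
cpi = effective_cpi(
    base_cpi=0.5,        # would issue 2 instructions/cycle with no stalls
    refs_per_instr=0.3,  # memory references per instruction
    miss_rate=0.05,      # fraction of references that miss on-chip cache
    miss_penalty=50,     # cycles to service a miss
)
print(f"CPI = {cpi:.2f}, IPC = {1 / cpi:.2f}")
```

With these made-up inputs a core that could issue two instructions per cycle ends up averaging about 0.8, because the stall term dominates. Lowering either the miss rate (bigger on-chip cache) or the miss penalty (faster off-chip access) shrinks that term.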

The 21264 attacks this in two ways.  It provides much faster access to off-chip
cache and main memory, and it allows instructions to issue out-of-order, even
if earlier instructions are still waiting for their inputs to become available.
(The latter technique is the reason Pentium Pro is twice as fast as Pentium at
the same clock rate for many applications.)

As a rough rule of thumb, expect a 21264 to have twice the performance of a
21164 at the same clock rate.

9453.2. "" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Wed Apr 09 1997 16:17

For lots of details on 21264, see the presentation from last October's
Microprocessor Forum:

http://www.digital.com/semiconductor/a264up1/index.html

To answer the questions I think I saw,  the 21264 can
  - fetch 4 instructions per cycle
  - issue 4 integer ops (including two loads and/or stores)
    plus a floating add, sub, div, or sqrt, plus a floating mul
    (total of 6 instructions) per cycle
  - retire up to 11 instructions per cycle
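
One way to read these three widths together, assuming a simple model in which long-run throughput is limited by the narrowest stage: the wider issue and retire stages let the machine drain a backlog after a stall, but the sustained rate cannot exceed the fetch rate. The helper name below is made up for this sketch.

```python
def sustained_ipc_bound(fetch_width, issue_width, retire_width):
    """Upper bound on long-run instructions per cycle: the narrowest stage."""
    return min(fetch_width, issue_width, retire_width)


# Widths quoted above for the 21264.
bound = sustained_ipc_bound(fetch_width=4, issue_width=6, retire_width=11)
print(f"sustained upper bound: {bound} instructions per cycle")
```

This is only an upper bound under that assumption; as noted earlier in the thread, real applications usually run far below the peak.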

Floating latency is 4 cycles (like 21164), load latency for D-cache hits
is 3 cycles (like 21064).  On-chip caches are a 64KB I-cache and a 64KB
D-cache, each two-way set-associative.