
Conference turris::digital_unix

Title:	DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:	Welcome to the Digital UNIX Conference
Moderator:	SMURF::DENHAM

Created:	Thu Mar 16 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	10068
Total number of notes:	35879

9453.0. "More CPU Questions" by RHETT::HALETKY () Wed Apr 09 1997 12:08

    Hello,
    
    I have a customer who has questions about the performance of the
    different processors with regard to Digital UNIX. Here are his
    questions/comments:
    
    A couple of other issues regarding the various versions of Alpha which
    I'd appreciate a confirmation of: my latest computer architecture book
    has, among other things, the following data regarding the Alpha CPU:
    
    1) It has (all versions of the CPU, that is) two FP units: one for
    add/subtract, one for *,/, and branch prediction;
    
    2)      - the 21064(A) can only start one FP op per cycle;  latency of
              add/mult is 6 cycles;  latency of load is 3 cycles;
            - the 21164(A) can start one add and one mult per cycle;
              latency is 4 cycles;  latency of load is 2 cycles.
    
    Q: Does this imply that a 21164 running at the same clock speed could
    be up to 33% faster than a 21064A?
    
    And, are the data for the 21264 similar to those for the 21164?
    
    
    Best regards,
    Ed Haletky
    Digital CSC

T.R	Title	User	Personal Name	Date	Lines
9453.1	Closer to 2x in many cases	WIBBIN::NOYCE	Pulling weeds, pickin' stones	Wed Apr 09 1997 15:07	32
    > Q: Does this imply that a 21164 running at the same clock speed could
    > be up to 33% faster than a 21064A?

    Well, I don't know whether it implies it or not.  The fact is that for
    some programs a 21164 is twice as fast as a 21064A at the same clock
    speed, since it can issue twice as many floating-point operations per
    cycle.

    You've ignored another enormous difference between the processors:
    size of on-chip caches.  The 21064A has 16KB I-cache & 16KB D-cache.
    The 21164 has 8KB I-cache and 8KB D-cache, plus a 96KB on-chip level-2
    cache.  For applications that benefit from the level-2 cache, the
    21164 can make a very big difference.

    For SPECint, 21164-based machines tend to perform a bit less than
    twice as fast as 21064A-based machines at the same clock rate.
    For SPECfp, 21164-based machines tend to perform a bit more than
    twice as fast as 21064A-based machines at the same clock rate.

    Most of the time, for most applications, even a 21164 is not issuing
    many instructions per cycle.  In fact, for many important
    applications, it averages well under one per cycle.  This is mainly
    because it spends its time waiting for memory (that's one reason the
    on-chip L2 cache helps, but it's not enough.)

    The 21264 attacks this in two ways.  It provides much faster access to
    off-chip cache and main memory, and it allows instructions to issue
    out-of-order, even if earlier instructions are still waiting for their
    inputs to become available.  (The latter technique is the reason
    Pentium Pro is twice as fast as Pentium at the same clock rate for
    many applications.)  As a rough rule of thumb, expect a 21264 to have
    twice the performance of a 21164 at the same clock rate.
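
The figures quoted in this thread can be compared with a few lines of
arithmetic. The following is only a rough sketch: the function names and the
CPUS table layout are made up for illustration, and the numbers are the ones
given in the base note, not measurements. It suggests where the "33%" may come
from (the drop in FP latency from 6 to 4 cycles) and why the peak FP issue
rate points to 2x instead.

```python
from fractions import Fraction

# Per-CPU figures as quoted in the base note (not measured values).
CPUS = {
    "21064A": {"fp_issue_per_cycle": 1, "fp_latency": 6, "load_latency": 3},
    "21164": {"fp_issue_per_cycle": 2, "fp_latency": 4, "load_latency": 2},
}


def latency_reduction(old_cycles, new_cycles):
    """Fraction by which latency shrinks going from old to new."""
    return Fraction(old_cycles - new_cycles, old_cycles)


def dependent_chain_speedup(old_cycles, new_cycles):
    """Speedup for a chain of dependent ops, which is latency-bound."""
    return Fraction(old_cycles, new_cycles)


def peak_issue_speedup(old_per_cycle, new_per_cycle):
    """Speedup for independent ops, which is bound by the issue rate."""
    return Fraction(new_per_cycle, old_per_cycle)


if __name__ == "__main__":
    old, new = CPUS["21064A"], CPUS["21164"]
    print("FP latency reduction:",
          latency_reduction(old["fp_latency"], new["fp_latency"]))
    print("Dependent-chain speedup:",
          dependent_chain_speedup(old["fp_latency"], new["fp_latency"]))
    print("Peak FP issue speedup:",
          peak_issue_speedup(old["fp_issue_per_cycle"],
                             new["fp_issue_per_cycle"]))
```

None of this accounts for cache or memory behavior, which the reply above
identifies as the larger effect in practice.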
9453.2		WIBBIN::NOYCE	Pulling weeds, pickin' stones	Wed Apr 09 1997 15:17	15
    For lots of details on 21264, see the presentation from last October's
    Microprocessor Forum:
        http://www.digital.com/semiconductor/a264up1/index.html

    To answer the questions I think I saw, the 21264 can
      - fetch 4 instructions per cycle
      - issue 4 integer ops (including two loads and/or stores)
        plus a floating add,sub,div, or sqrt, plus a floating mul
        (total of 6 instructions) per cycle
      - retire up to 11 instructions per cycle

    Floating latency is 4 cycles (like 21164), load latency for D-cache
    hits is 3 cycles (like 21064).  On-chip caches are 64K I-cache,
    64K D-cache, two-way set-associative.
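
The per-cycle limits listed for the 21264 can be tallied as below. This is a
hedged sketch, not anything from the Microprocessor Forum presentation: the
names are hypothetical, and treating the sustained rate as the minimum of the
fetch, issue and retire widths is a simplifying assumption that ignores
stalls and dependencies.

```python
# Per-cycle limits for the 21264 as quoted in the reply above.
FETCH_PER_CYCLE = 4
RETIRE_PER_CYCLE = 11
ISSUE_SLOTS = {
    "integer (incl. up to two loads/stores)": 4,
    "floating add/sub/div/sqrt": 1,
    "floating mul": 1,
}


def peak_issue_per_cycle(slots):
    """Total instructions that can issue in a single cycle."""
    return sum(slots.values())


def sustained_bound(fetch, issue, retire):
    """Assumed upper bound on long-run instructions per cycle:
    the narrowest of the three stages limits the steady state."""
    return min(fetch, issue, retire)


if __name__ == "__main__":
    issue = peak_issue_per_cycle(ISSUE_SLOTS)
    print("Peak issue per cycle:", issue)
    print("Sustained upper bound:",
          sustained_bound(FETCH_PER_CYCLE, issue, RETIRE_PER_CYCLE))
```

Under that assumption, the issue and retire widths above 4 only help the
processor drain instructions that have queued up; over the long run it cannot
complete more than it fetches.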