Below is the final draft that I sent to Byte. Some editing
was done by Byte, but it should be close to the published
version.
Pete
Alpha 21164PC -- Leadership Performance for Windows NT Desktop
Systems
By Pete Bannon
Smaller, cheaper and faster are the watchwords for semiconductor
manufacturers today as they scramble to satisfy the
enormous appetites of multimedia, CAD, and data manipulation
applications on the desktop. Corporations and small businesses are
using desktop systems for video conferencing, voice synthesis, and
enterprise-wide data access. Home users are surfing the Web,
running sophisticated video games and creating home movies on
their PCs. As each breakthrough drives the imagination toward new
horizons, this trend is not likely to abate anytime soon.
Designed with these applications in mind, Digital Semiconductor's
new Alpha 21164PC microprocessor delivers more CPU cycles and greater data
bandwidth in a smaller package than any other microprocessor on the market.
For example, the 533-MHz 21164PC has a smaller die size than the 200-MHz
Pentium with MMX and provides significantly higher performance. The 21164PC
outperforms the Pentium chip by more than two times in SPECint95 performance
and by more than three times in SPECfp95 performance. The 21164PC, which was
co-designed by Digital Semiconductor and Mitsubishi, supports an astonishing
2.1 BIPS (2133 MIPS) and 1066 MFLOPS. The estimated performance of the 21164PC
configured with 2MB of external cache and 125ns main memory is 14 SPECint95 and
17 SPECfp95. These characteristics and the 21164PC's price points make an
Alpha processor solution the clear leader for Windows NT PCs.
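The peak-rate figures above can be reproduced with simple arithmetic. The sketch below is only an illustration: it assumes the nominal 533-MHz clock is exactly 1600/3 MHz, that the peak instruction rate is four instructions per cycle (quad issue), and that the peak floating point rate is one result per cycle from each of the two floating point pipelines described later in this article. The function names are hypothetical.

```python
from fractions import Fraction

# Assumption: "533 MHz" is the rounded form of 1600/3 MHz (533.33...).
CLOCK_MHZ = Fraction(1600, 3)
ISSUE_WIDTH = 4   # quad-issue implementation
FP_PIPES = 2      # floating point add pipeline and multiply pipeline


def peak_mips(clock_mhz, issue_width):
    """Peak millions of instructions per second: clock times issue width."""
    return clock_mhz * issue_width


def peak_mflops(clock_mhz, fp_pipes):
    """Peak millions of floating point operations per second."""
    return clock_mhz * fp_pipes


# Truncate to whole units, as the quoted figures appear to do.
print(int(peak_mips(CLOCK_MHZ, ISSUE_WIDTH)))   # 2133
print(int(peak_mflops(CLOCK_MHZ, FP_PIPES)))    # 1066
```

These are peak rates; the SPECint95 and SPECfp95 estimates above reflect sustained performance with a particular cache and memory configuration.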
Full Windows compatibility provides an additional 21164PC edge in the
desktop market. In addition to a large number of native Windows NT
applications, DIGITAL FX!32, a breakthrough software translation technology,
gives Alpha system users access to the full suite of 32-bit x86 Windows and
Windows NT applications, running with high performance.
Further, as the first Alpha processor to implement Motion Video Instructions
(MVI), the 21164PC dramatically increases the Alpha processor advantage over
competing products in motion video applications. For example, the
21164PC supports full-frame-rate DVD playback and high-quality video
conferencing in software, eliminating the need for dedicated multimedia
hardware and reducing overall system cost.
Innovation, Small Die Size Target PC Market
A full implementation of the Alpha architecture, the 21164PC
leverages the design of Digital Semiconductor's Alpha 21164, a
processor that has maintained performance leadership in the
industry since its introduction. The 21164PC, depicted in Figure 1, draws from
this technology and incorporates innovation and a smaller die size to
achieve its advanced design.
Implemented in Digital Semiconductor's 0.35-micron CMOS process,
the 21164PC features a die size of 8.5 mm by 16.2 mm and contains
3.5 million transistors. This small die size (a 30% reduction from
previous Alpha processor implementations) enables significant
manufacturing cost savings, which translate directly to more
affordable PCs for a broader market.
The Alpha Architecture
The Alpha architecture is a 64-bit load/store RISC architecture
designed with particular emphasis on speed, multiple
instruction issue, and multiple processors. It offers
uncompromised support for many operating systems, including Windows NT,
Digital UNIX, Linux, and OpenVMS. All registers are 64 bits long
and all operations are performed between these registers. Alpha
instructions are 32 bits long and memory operations are either load
or store operations of data that is 8, 16, 32, or 64 bits in length.
The 21164PC takes full advantage of the Alpha architecture in a
quad-issue implementation featuring a 7-stage integer pipeline and
9-stage floating point pipeline. The 21164PC has a large 16KB
instruction cache (Icache) and features a bandwidth of 8000MB per
second to the instruction issue unit. This exceptional data transfer
capacity plus an aggressive instruction pre-fetch scheme keep the
chip's pipelines full. The 21164PC's Icache pre-fetches 96 bytes
ahead of the current program counter, providing significant
performance improvements in long code sequences where instructions
can be fetched 300% faster than is possible without pre-fetching.
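As a rough check on the instruction fetch figures, the sketch below multiplies out the numbers given in this section. It assumes that the fetch bandwidth is simply issue width times instruction size times clock rate, which the article does not state explicitly; under that assumption the result comes out somewhat above the 8000MB per second quoted. The function names are hypothetical.

```python
INSTRUCTION_BYTES = 4   # Alpha instructions are 32 bits long
ISSUE_WIDTH = 4         # quad-issue implementation
CLOCK_MHZ = 533         # nominal clock of the faster 21164PC
PREFETCH_BYTES = 96     # Icache pre-fetch distance


def fetch_bandwidth_mb_per_s(clock_mhz, issue_width, instruction_bytes):
    """Bytes delivered to the issue unit per microsecond, i.e. MB per second."""
    return clock_mhz * issue_width * instruction_bytes


def prefetch_depth(prefetch_bytes, issue_width, instruction_bytes):
    """Return (instructions, quad-issue groups) covered by the pre-fetch."""
    instructions = prefetch_bytes // instruction_bytes
    groups = instructions // issue_width
    return instructions, groups


print(fetch_bandwidth_mb_per_s(CLOCK_MHZ, ISSUE_WIDTH, INSTRUCTION_BYTES))  # 8528
print(prefetch_depth(PREFETCH_BYTES, ISSUE_WIDTH, INSTRUCTION_BYTES))       # (24, 6)
```

In other words, 96 bytes of pre-fetch covers 24 instructions, or six full issue groups ahead of the current program counter.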
Streamlined Instruction Issue and Execution
The 21164PC's instruction unit consists of an instruction buffer,
slotter, and issue unit. The simple instruction issue design
maximizes the 21164PC's clock frequency with little impact on the number of
instructions that can be issued in a cycle. The microprocessor's instruction
buffer holds two sets of four instructions, facilitating the chip's quad-issue
operation. The buffer optimizes the flow of instructions into the slotter unit
by removing bubbles from the pipeline that are caused by taken branches. The
slotter attempts to assign four instructions to the pipelines each cycle and
refills when all the instructions have been assigned. The issue unit allows
the instructions to execute after assuring the availability of the required
system resources.
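The slotting behavior described above can be pictured with a toy model. The sketch below is not the 21164PC's actual issue logic: it assumes strictly in-order slotting, labels each instruction only as "int" or "fp", and uses a hypothetical per-cycle capacity of two slots for each class. What it does illustrate is the rule that the slotter takes a new group of four only after every instruction in the current group has been assigned.

```python
from collections import deque


def cycles_to_issue(instructions, group_size=4, slots=None):
    """Count cycles a toy slotter needs to issue a list of instructions.

    instructions: sequence of class names, "int" or "fp" (hypothetical labels).
    slots: per-cycle capacity for each class (assumed values, not from the chip).
    """
    if slots is None:
        slots = {"int": 2, "fp": 2}
    pending = deque(instructions)   # models the instruction buffer
    group = deque()                 # group currently held by the slotter
    cycles = 0
    while pending or group:
        if not group:
            # Refill only when the whole previous group has been assigned.
            for _ in range(min(group_size, len(pending))):
                group.append(pending.popleft())
        free = dict(slots)
        # In-order: stop at the first instruction with no free pipeline.
        while group and free[group[0]] > 0:
            free[group[0]] -= 1
            group.popleft()
        cycles += 1
    return cycles


print(cycles_to_issue(["int", "int", "fp", "fp"]))    # 1
print(cycles_to_issue(["int", "int", "int", "fp"]))   # 2
```

In this model a well-mixed group issues in a single cycle, while a group with three integer instructions needs a second cycle before the slotter can refill.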
The integer execution unit contains a register file and several
functional units in four stages of two parallel pipelines. The
pipelines contain differing sets of units with the 64-bit adder, logical and
load units being common to both pipelines. The 21164PC's instructions execute
in one cycle with the exception of loads and conditional move instructions, which
require two cycles. In addition, the 21164PC incorporates a special hardware
feature that allows the common code sequence, COMPARE followed by DEPENDENT
BRANCH, to execute in one cycle instead of two, thereby streamlining
application performance.
The new MVI instructions -- PERR, MAX/MIN and PACK/UNPACK -- are implemented in
the integer unit, saving die space and reducing cost. This efficient
implementation delivers an impressive 400% improvement in MPEG-2 compression
for the very low cost of 0.6% of the 21164PC's area. This design is possible
because the 21164PC's 31 64-bit integer registers provide sufficient storage
to support the chip's issue bandwidth of 533 million MVI instructions per
second concurrently with 533 million additional integer instructions per
second. In addition, the supporting instructions used by MVI, including
compares, adds, shifts, loads, and stores, already exist in the integer unit,
eliminating the need for additional instruction logic on the 21164PC chip.
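To give a feel for what these instructions compute, the sketch below emulates two of them in software on 64-bit values treated as eight unsigned byte lanes. It is an illustration only: the function names are hypothetical, and the semantics shown (a sum of absolute byte differences for the pixel error operation, and a lane-by-lane minimum) are a simplified reading of the MVI operations rather than a specification.

```python
def to_lanes(reg):
    """Split a 64-bit value into eight unsigned bytes, lowest byte first."""
    return [(reg >> (8 * i)) & 0xFF for i in range(8)]


def pixel_error(a, b):
    """Sum of absolute differences of the eight byte lanes (modeled on PERR)."""
    return sum(abs(x - y) for x, y in zip(to_lanes(a), to_lanes(b)))


def bytewise_min(a, b):
    """Lane-by-lane unsigned minimum (modeled on the byte form of MIN)."""
    lanes = [min(x, y) for x, y in zip(to_lanes(a), to_lanes(b))]
    return sum(value << (8 * i) for i, value in enumerate(lanes))


print(pixel_error(0x0102030405060708, 0x0807060504030201))  # 32
print(hex(bytewise_min(0x10FF, 0x2001)))                    # 0x1001
```

A sum of absolute differences over eight pixels is the inner loop of block-matching motion estimation, which is why performing it in one instruction helps MPEG-2 compression so much.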
The 21164PC's floating point unit allows for exceptional performance in
floating point-intensive applications such as 3-D graphics on the desktop.
The unit is made up of two 64-bit execution units -- an add pipeline that
executes all floating point instructions except multiply, and a multiply
pipeline. Both units are fully pipelined and have a latency of four cycles.
To maximize performance further, the floating point unit incorporates two
dedicated floating point load data pipelines that allow floating point load and
store instructions to be executed in parallel with floating point operate instructions.
Memory Unit Delivers High Throughput
The 21164PC's memory unit, which features very high data
bandwidth, maximizes operational efficiency and CPU utilization.
The 8KB data cache (Dcache) -- a dual-ported, fully pipelined, non-blocking
cache -- allows the 21164PC to move rapidly through
programs that process large amounts of data. Because the Dcache is
non-blocking (up to 21 loads can miss), the processor can continue
to operate uninterrupted when cache misses occur. In addition, the design
interleaves cache fills from memory with processor operations. These design
characteristics give the 21164PC a significant advantage compared with other
processor designs. For example, the peak data bandwidth of the 21164PC
operating at 533 MHz is ten times higher than the peak bandwidth of a Pentium
operating at 200 MHz.
Further, the 21164PC's robust write buffer allows more memory
operations to proceed concurrently. The write buffer has six 32-byte
entries, and each entry can collapse multiple writes to the same
address into a single transaction.
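The effect of write collapsing can be sketched with a small software model. This is not the 21164PC's actual design: the class name is hypothetical, the model assumes that writes falling in the same aligned 32-byte block merge into one entry, that no single write crosses a block boundary, and that the oldest entry is drained first when all six are in use.

```python
from collections import OrderedDict

ENTRY_BYTES = 32   # size of each write buffer entry
NUM_ENTRIES = 6    # number of entries in the write buffer


class WriteBuffer:
    """Toy model of a merging write buffer (assumptions noted above)."""

    def __init__(self):
        self.entries = OrderedDict()   # block address -> {offset: byte}
        self.transactions = 0          # entries drained to memory so far

    def write(self, address, data):
        """Buffer a write of `data` (bytes) that stays within one block."""
        offset = address % ENTRY_BYTES
        block = address - offset
        if block not in self.entries:
            if len(self.entries) == NUM_ENTRIES:
                self.drain_oldest()     # make room (assumed policy)
            self.entries[block] = {}
        for i, byte in enumerate(data):
            self.entries[block][offset + i] = byte

    def drain_oldest(self):
        """Send the oldest entry to memory as a single transaction."""
        self.entries.popitem(last=False)
        self.transactions += 1

    def flush(self):
        while self.entries:
            self.drain_oldest()


wb = WriteBuffer()
for addr in (0x1000, 0x1008, 0x1010, 0x1018):
    wb.write(addr, bytes(8))    # four 8-byte writes to one 32-byte block
wb.flush()
print(wb.transactions)          # 1
```

In this model, four separate 8-byte stores reach memory as one 32-byte transaction instead of four.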
The 21164PC's L2 cache controller helps maximize application
performance by streamlining L2 cache accesses. The cache
controller, which is also non-blocking, does this by ordering requests
to the L2 cache to achieve an optimal balance between bandwidth
utilization and access latency.
PC-Compatible Motherboard
Figure 2 depicts the block diagram of the AlphaPC 164SX motherboard, a sample
design incorporating the 21164PC. Featuring an ATX form factor and outstanding
price/performance, the motherboard is ideally suited for Windows NT desktop
systems. The design database is available, at no charge, from Digital
Semiconductor.
The AlphaPC 164SX motherboard satisfies all ATX requirements for hole
placement, component spacing and component height. The six-layer
AlphaPC 164SX module can be installed in any ATX enclosure and
uses standard ATX power supplies.
The AlphaPC 164SX also offers extensive flexibility, allowing PC
manufacturers to configure systems that cost-effectively satisfy a broad
range of applications. The AlphaPC 164SX's 413-pin ZIF
socket accepts 400-MHz and 533-MHz 21164PCs, giving PC
manufacturers two high-performance processor choices. The
motherboard also accepts L2 caches sized from 512KB to 4MB and
operating at speeds of 66MHz to 133MHz, offering a spectrum of
data handling capabilities.
The 21174 Core Logic Chipset that is configured on the motherboard
provides high-speed access to memory and PCI peripheral devices.
The 21174 features support for 16Mbit or 64Mbit SDRAM memory in
configurations from 32MB to 512MB. Using the chipset's PCI interface, the
motherboard accommodates a full range of I/O device configurations.
The AlphaPC 164SX contains two 64-bit and two 32-bit PCI slots. Operating at
33 MHz, the 64-bit PCI interfaces provide up to 260MB per second of I/O
bandwidth, satisfying the high-performance demands of I/O devices such as
graphics, ATM and RAID.
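The I/O bandwidth figure follows from the bus width and clock rate. The sketch below assumes one transfer per clock at a nominal 33 MHz, which gives a theoretical peak slightly above the rounded 260MB per second quoted here; the function name is hypothetical.

```python
def pci_peak_mb_per_s(bus_bits, clock_mhz):
    """Theoretical peak: bytes per transfer times millions of transfers per second."""
    return bus_bits // 8 * clock_mhz


print(pci_peak_mb_per_s(64, 33))   # 264
print(pci_peak_mb_per_s(32, 33))   # 132
```

By the same arithmetic, the 64-bit slots offer twice the peak bandwidth of the 32-bit slots.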
The 21164PC is also fully compatible with existing chipsets, such
as Digital Semiconductor's 21172 for the Alpha 21164, allowing
manufacturers to design-in these products.
To enable easy, cost-effective configuration of traditional ISA
devices, the AlphaPC 164SX motherboard incorporates a Cypress CY82C693 PCI to
ISA bridge and two ISA slots. This part provides on-board USB, IDE and
keyboard/mouse interfaces.
Availability
Alpha 21164PC microprocessors will be available from Digital Semiconductor
in Q2 of this year. Two versions of the 21164PC -- 400-MHz and 533-MHz -- will
be offered for approximately $1/MHz.
For more information regarding these and other Digital Semiconductor Alpha
products, contact your local semiconductor distributor or call the Digital
Semiconductor Information Line: United States and Canada 1-800-332-2717;
outside North America +1-508-628-4760. Or visit the Digital Semiconductor
Alpha Web site: www.alpha.digital.com.
********************************************************************
Biography for Peter Bannon
Peter J. Bannon
Consulting Engineer
Digital Semiconductor
Digital Equipment Corporation
Peter J. Bannon is a hardware consulting engineer with Digital
Equipment Corporation. He has participated in the design and/or
verification of several CISC and RISC microprocessor chips and was a
member of the Alpha 21164 architecture team. He joined Digital in
1984 after receiving a Bachelor of Science in computer system design
from the University of Massachusetts.
Bannon holds three patents for VAX CPU designs and has filed six
patent applications for the Alpha 21164 microprocessor.
###