stringtranslate.com

DEC Alpha

Alpha microprocessors

Alpha (original name Alpha AXP) is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set computers (CISC) and to be a highly competitive RISC processor for Unix workstations and similar markets.

Alpha is implemented in a series of microprocessors originally developed and fabricated by DEC. These microprocessors are most prominently used in a variety of DEC workstations and servers, which eventually formed the basis for almost all of their mid-to-upper-scale lineup. Several third-party vendors also produced Alpha systems, including PC form factor motherboards.

Operating systems that support Alpha included OpenVMS (formerly named OpenVMS AXP), Tru64 UNIX (formerly named DEC OSF/1 AXP and Digital UNIX), Windows NT (discontinued after NT 4.0; and prerelease Windows 2000 RC2),[2] Linux (Debian, SUSE,[3] Gentoo and Red Hat), BSD UNIX (NetBSD, OpenBSD and FreeBSD up to 6.x), Plan 9 from Bell Labs, and the L4Ka::Pistachio kernel. A port of Ultrix to Alpha was carried out during the initial development of the Alpha architecture, but was never released as a product.[4]

The Alpha architecture was sold, along with most parts of DEC, to Compaq in 1998.[5] Compaq, already an Intel x86 customer, announced that they would phase out Alpha in favor of the forthcoming Hewlett-Packard/Intel Itanium architecture, and sold all Alpha intellectual property to Intel, in 2001,[6] effectively killing the product. Hewlett-Packard purchased Compaq in 2002, continuing development of the existing product line until 2004, and selling Alpha-based systems, largely to the existing customer base, until April 2007.[7]

History

PRISM

Alpha emerged from an earlier RISC project named Parallel Reduced Instruction Set Machine (PRISM), itself the product of several earlier projects. PRISM was intended to be a flexible design, supporting Unix-like applications, and Digital's existing VAX/VMS software, after minor conversion. A new operating system named MICA would support both ULTRIX and VAX/VMS interfaces on a common kernel, allowing software for both platforms to be easily ported to the PRISM architecture.[8]

Started in 1985, the PRISM design was continually changed during its development in response to changes in the computer market, leading to lengthy delays in its introduction. It was not until the summer of 1987 that it was decided that it would be a 64-bit design, among the earliest such designs in a microprocessor format. In October 1987, Sun Microsystems introduced the Sun-4, their first workstation using their new SPARC processor. The Sun-4 runs about three to four times as fast as their latest Sun-3 designs using the Motorola 68020, and any Unix offering from DEC. The plans changed again; PRISM was realigned once again as a 32-bit part and aimed directly at the Unix market. This further delayed the design.[9]

Having watched the PRISM delivery date continue to slip, and facing the possibility of more delays, a team in the Palo Alto office decided to design their own workstation using another RISC processor. After due diligence, they selected the MIPS R2000 and built a working workstation running Ultrix in a period of 90 days.[10] This sparked off an acrimonious debate within the company, which came to a head in a July 1988 management meeting. PRISM appeared to be faster than the R2000, but the R2000 machines could be in the market by January 1989, a year earlier than PRISM. When this proposal was accepted, one of the two original roles for PRISM disappeared. The decision to make a VMS PRISM had already ended by this point, so there was no remaining role. PRISM was cancelled at the meeting.[11]

RISCy VAX

As the meeting broke up, Bob Supnik was approached by Ken Olsen, who stated that the RISC chips appeared to be a future threat to their VAX line. He asked Supnik to consider what might be done with VAX to keep it competitive with future RISC systems.[9]

This led to the formation of the "RISCy VAX" team. They initially considered three concepts. One was a cut-down version of the VAX instruction set architecture (ISA) that would run on a RISC-like system and leave more complex VAX instructions to system subroutines. Another concept was a pure RISC system that would translate existing VAX code into its own ISA on-the-fly and store it in a CPU cache. Finally, there was still the possibility of a much faster CISC processor running the complete VAX ISA. Unfortunately, all of these approaches introduced overhead and would not be competitive with a pure-RISC machine running native RISC code.[12]

The group then considered hybrid systems that combined one of their existing VAX one-chip solution and a RISC chip as a coprocessor used for high-performance needs. These studies suggested that the system would inevitably be hamstrung by the lower-performance part and would offer no compelling advantage. It was at this point that Nancy Kronenberg pointed out that people ran VMS, not VAX, and that VMS only had a few hardware dependencies based on its modelling of interrupts and memory paging. There appeared to be no compelling reason why VMS could not be ported to a RISC chip as long as these small bits of the model were preserved. Further work on this concept suggested this was a workable approach.[12]

Supnik took the resulting report to the Strategy Task Force in February 1989. Two questions were raised: could the resulting RISC design also be a performance leader in the Unix market, and should the machine be an open standard? And with that, the decision was made to adopt the PRISM architecture with the appropriate modifications. This became the "EVAX" concept, a follow-on to the successful CMOS CVAX implementation. When management accepted the findings, they decided to give the project a more neutral name, removing "VAX", eventually settling on Alpha.[13] The name was inspired by the use of "Omega" as the codename of an NVAX-based VAX 4000 model; "Alpha" was intended to signify the beginning of a new line (with reference to Alpha and Omega).[14] Soon after, work began on a port of VMS to the new architecture.[15]

Alpha

The new design uses most of the basic PRISM concepts, but was re-tuned to allow VMS and VMS programs to run at reasonable speed with no conversion at all. The primary Alpha instruction set architects were Richard L. Sites and Richard T. Witek.[16] The PRISM's Epicode was developed into the Alpha's PALcode, providing an abstracted interface to platform- and processor implementation-specific features.

The main contribution of Alpha to the microprocessor industry, and the main reason for its performance, is not so much the architecture but rather its implementation.[17] At that time (as it is now), the microchip industry was dominated by automated design and layout tools. The chip designers at Digital continued pursuing sophisticated manual circuit design in order to deal with the complex VAX architecture. The Alpha chips show that manual circuit design applied to a simpler, cleaner architecture allows for much higher operating frequencies than those that are possible with the more automated design systems. These chips caused a renaissance of custom circuit design within the microprocessor design community.

Originally, the Alpha processors were designated the DECchip 21x64 series,[18] with "DECchip" replaced in the mid-1990s with "Alpha". The first two digits, "21" signifies the 21st century, and the last two digits, "64" signifies 64 bits.[18] The Alpha was designed as 64-bit from the start and there is no 32-bit version. The middle digit corresponds to the generation of the Alpha architecture. Internally, Alpha processors were also identified by EV numbers, EV officially standing for "Extended VAX" but having an alternative humorous meaning of "Electric Vlasic", giving homage to the Electric Pickle experiment at Western Research Lab.[19]

In May 1997, DEC sued Intel for allegedly infringing on its Alpha patents in designing the original Pentium, Pentium Pro, and Pentium II chips.[20] As part of a settlement, much of DEC's chip design and fabrication business was sold to Intel. This included DEC's StrongARM implementation of the ARM computer architecture, which Intel marketed as the XScale processors commonly used in Pocket PCs. The core of Digital Semiconductor, the Alpha microprocessor group, remained with DEC, while the associated office buildings went to Intel as part of the Hudson fab.[21]

Improved models

The first few generations of the Alpha chips were some of the most innovative of their time.

A persistent report attributed to DEC insiders suggests the choice of the AXP tag for the processor was made by DEC's legal department, which was still smarting from the VAX trademark fiasco.[26] After a lengthy search the tag "AXP" was found to be entirely unencumbered. Within the computer industry, a joke got started that the acronym AXP meant "Almost eXactly PRISM".[27]

Design principles

The Alpha architecture was intended to be a high-performance design. Digital intended the architecture to support a one-thousandfold increase in performance over twenty-five years. To ensure this, any architectural feature that impeded multiple instruction issue, clock rate or multiprocessing was removed. As a result, the Alpha does not have:

Condition codes

The Alpha does not have condition codes for integer instructions[30] to remove a potential bottleneck at the condition status register. Instructions resulting in an overflow, such as adding two numbers whose result does not fit in 64 bits, write the 32 or 64 least significant bits to the destination register. The carry is generated by performing an unsigned compare on the result with either operand to see if the result is smaller than either operand. If the test was true, the value one is written to the least significant bit of the destination register to indicate the condition.

Registers

The architecture defines a set of 32 integer registers and a set of 32 floating-point registers in addition to a program counter, two lock registers and a floating-point control register (FPCR). It also defines registers that were optional, implemented only if the implementation required them. Lastly, registers for PALcode are defined.

The integer registers are denoted by R0 to R31 and floating-point registers are denoted by F0 to F31. The R31 and F31 registers are hardwired to zero and writes to those registers by instructions are ignored. Digital considered using a combined register file, but a split register file was determined to be better, as it enables two-chip implementations to have a register file located on each chip and integer-only implementations to omit the floating-point register file containing the floating-point registers. A split register file was also determined to be more suitable for multiple instruction issue due to the reduced number of read and write ports. The number of registers per register file was also considered, with 32 and 64 being contenders. Digital concluded that 32 registers was more suitable as it required less die space, which improves clock frequencies. This number of registers was deemed not to be a major issue in respect to performance and future growth, as thirty-two registers could support at least eight-way instruction issue.

The program counter is a 64-bit register which contains a longword-aligned virtual byte address, that is, the low two bits of the program counter are always zero. The PC is incremented by four to the address of the next instruction when an instruction is decoded. A lock flag and locked physical address register are used by the load-locked and store-conditional instructions for multiprocessor support. The floating-point control register (FPCR) is a 64-bit register defined by the architecture intended for use by Alpha implementations with IEEE 754-compliant floating-point hardware.

Data types

In the Alpha architecture, a byte is defined as an 8-bit datum (octet), a word as a 16-bit datum, a longword as a 32-bit datum, a quadword as a 64-bit datum, and an octaword as a 128-bit datum.

The Alpha architecture originally defined six data types:

To maintain a level of compatibility with the VAX, the 32-bit architecture that preceded the Alpha, two other floating-point data types are included:

VAX H-floating point (quad precision, 128-bit) was not supported,[31] but another 128-bit floating-point option, X-floating point, is available on Alpha, but not VAX.[32]
H and X have been described as similar, but not identical. Software emulation for H-floating is available from DEC, as is a source-code level converter named DECmigrate.

Memory

The Alpha has a 64-bit linear virtual address space with no memory segmentation. Implementations can implement a smaller virtual address space with a minimum size of 43 bits. Although the unused bits were not implemented in hardware such as TLBs, the architecture required implementations to check whether they are zero to ensure software compatibility with implementations with a larger (or full) virtual address space.

Instruction formats

The Alpha ISA has a fixed instruction length of 32 bits. It has six instruction formats.

The integer operate format is used by integer instructions. It contains a 6-bit opcode field, followed by the Ra field, which specifies the register containing the first operand and the Rb field, specifies the register containing the second operand. Next is a 3-bit field which is unused and reserved. A 1-bit field contains a "0", which distinguished this format from the integer literal format. A 7-bit function field follows, which is used in conjunction with the opcode to specify an operation. The last field is the Rc field, which specifies the register which the result of a computation should be written to. The register fields are all 5 bits long, required to address 32 unique locations, the 32 integer registers.

The integer literal format is used by integer instructions which use a literal as one of the operands. The format is the same as the integer operate format except for the replacement of the 5-bit Rb field and the 3 bits of unused space with an 8-bit literal field which is zero-extended to a 64-bit operand.

The floating-point operate format is used by floating-point instructions. It is similar to the integer operate format, but has an 11-bit function field made possible by using the literal and unused bits which are reserved in integer operate format.

The memory format is used mostly by load and store instructions. It has a 6-bit opcode field, a 5-bit Ra field, a 5-bit Rb field and a 16-bit displacement field.

Branch instructions have a 6-bit opcode field, a 5-bit Ra field and a 21-bit displacement field. The Ra field specifies a register to be tested by a conditional branch instruction, and if the condition is met, the program counter is updated by adding the contents of the displacement field with the program counter. The displacement field contains a signed integer and if the value of the integer is positive, if the branch is taken then the program counter is incremented. If the value of the integer is negative, then program counter is decremented if the branch is taken. The range of a branch thus is ±1 Mi instructions, or ±4 MiB. The Alpha Architecture was designed with a large range as part of the architecture's forward-looking goal.

The CALL_PAL format is used by the CALL_PAL instruction, which is used to call PALcode subroutines. The format retains the opcode field but replaces the others with a 26-bit function field, which contains an integer specifying a PAL subroutine.

Instruction set

Control instructions

The control instructions consist of conditional and unconditional branches, and jumps. The conditional and unconditional branch instructions use the branch instruction format, while the jump instructions use the memory instruction format.

Conditional branches test whether the least significant bit of a register is set or clear, or compare a register as a signed quadword to zero, and branch if the specified condition is true. The conditions available for comparing a register to zero are equality, inequality, less than, less than or equal to, greater than or equal to, and greater than. The new address is computed by longword aligning and sign extending the 21-bit displacement and adding it to the address of the instruction following the conditional branch.

Unconditional branches update the program counter with a new address computed in the same way as conditional branches. They also save the address of the instruction following the unconditional branch to a register. There are two such instructions, and they differ only in the hints provided for the branch prediction hardware.

There are four jump instructions. These all perform the same operation, saving the address of the instruction following the jump, and providing the program counter with a new address from a register. They differ in the hints provided to the branch prediction hardware. The unused displacement field is used for this purpose.

Integer arithmetic

The integer arithmetic instructions perform addition, multiplication, and subtraction on longwords and quadwords; and comparison on quadwords. There is no instruction(s) for division as the architects considered the implementation of division in hardware to be adverse to simplicity. In addition to the standard add and subtract instructions, there are scaled versions. These versions shift the second operand to the left by two or three bits before adding or subtracting. The Multiply Longword and Multiply Quadword instructions write the least significant 32 or 64 bits of a 64- or 128-bit result to the destination register, respectively. Since it is useful to obtain the most significant half, the Unsigned Multiply Quadword High (UMULH) instruction is provided. UMULH is used for implementing multi-precision arithmetic and division algorithms. The concept of a separate instruction for multiplication that returns the most significant half of a result was taken from PRISM.

The instructions that operate on longwords ignore the most significant half of the register and the 32-bit result is sign-extended before it is written to the destination register. By default, the add, multiply, and subtract instructions, with the exception of UMULH and scaled versions of add and subtract, do not trap on overflow. When such functionality is required, versions of these instructions that perform overflow detection and trap on overflow are provided.

The compare instructions compare two registers or a register and a literal and write '1' to the destination register if the specified condition is true or '0' if not. The conditions are equality, inequality, less than or equal to, and less than. With the exception of the instructions that specify the former two conditions, there are versions that perform signed and unsigned compares.

The integer arithmetic instructions use the integer operate instruction formats.

Logical and shift

The logical instructions consist of those for performing bitwise logical operations and conditional moves on the integer registers. The bitwise logical instructions perform