Science Fair Project Encyclopedia
In computing, IA-64 (Intel Architecture-64) is a 64-bit processor architecture developed jointly by Intel and Hewlett-Packard and implemented in processors such as the Itanium and Itanium 2. The goal of IA-64 is to provide a "post-RISC era" architecture based on a very long instruction word (VLIW) style of design, which Intel calls EPIC (explicitly parallel instruction computing). Unlike Intel's x86 processors, the Itanium is not geared toward high-performance execution of the IA-32 (x86) instruction set.
In a mainstream "out-of-order" design, a complex decoder system examines each instruction as it flows through the pipeline and determines which instructions can be dispatched in parallel across the available execution units. For example, the instructions A = B + C and D = F + G do not affect each other, so they can be fed into two different execution units and run in parallel. The ability to extract instruction-level parallelism (ILP) from the instruction stream is essential to good performance in a modern CPU.
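The independence test described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular CPU implements it: the `Instr` type and the `independent` function are hypothetical names, and each instruction is reduced to the one location it writes and the locations it reads.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical model of an instruction: one destination, several sources."""
    dest: str
    srcs: Tuple[str, ...]


def independent(earlier: Instr, later: Instr) -> bool:
    """Return True if the two instructions can safely run in parallel."""
    reads_result = earlier.dest in later.srcs     # later needs earlier's output
    clobbers_input = later.dest in earlier.srcs   # later overwrites earlier's input
    same_target = earlier.dest == later.dest      # both write the same location
    return not (reads_result or clobbers_input or same_target)


i1 = Instr("A", ("B", "C"))  # A = B + C
i2 = Instr("D", ("F", "G"))  # D = F + G
i3 = Instr("E", ("A", "G"))  # E = A + G (needs the result of i1)

print(independent(i1, i2))  # True
print(independent(i1, i3))  # False
```

An out-of-order processor has to make this kind of comparison in hardware, for many instructions at once, on every cycle.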
Predicting which code can and cannot be split up this way is a very complex task. In many cases the inputs to one instruction depend on the output of another, but only if some other condition is true. For instance, consider a slight modification of the earlier example: A = B + C; IF A==5 THEN D = F + G. Here the two calculations remain independent of each other, but the second requires the result of the first in order to know whether it should run at all.
In these cases the circuitry on the CPU typically "guesses" what the outcome of the condition will be. In something like 90% of all cases an IF will be taken, which suggests that in our example the second half of the command can safely be fed into another execution unit. However, a wrong guess can cause a significant performance hit: the speculative result has to be thrown out, and the CPU must wait for the results of the "right" command to be calculated. Much of the performance improvement in modern CPUs is due to better prediction logic, but lately the improvements have begun to slow.
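To make the idea of hardware "guessing" concrete, the sketch below models a two-bit saturating counter, a common textbook prediction scheme. It is used here purely as an illustration; the article does not say which scheme any particular CPU uses, and the names `TwoBitPredictor` and `count_mispredictions` are hypothetical.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state: int = 2):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes) -> int:
    """Count how often the predictor's guess differs from the real outcome."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A condition that holds nine times and then fails once.
history = [True] * 9 + [False]
print(count_mispredictions(history))  # 1
```

Each misprediction counted here corresponds to speculative work that a real processor would have to discard.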
IA-64 instead relies on the compiler for this task. Before the program is ever fed into the CPU, the compiler examines the code and makes the same sorts of decisions that would otherwise happen at run time on the chip itself. Once it has decided which paths to take, it gathers the instructions it knows can run in parallel, bundles them into one larger instruction, and stores them in that form in the program; hence the name VLIW, or "very long instruction word."
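The bundling step can be illustrated with a deliberately simple greedy pass. This is a toy sketch, not the algorithm used by any real IA-64 compiler: the `conflicts` and `bundle` functions are hypothetical, instructions are modelled as (destination, sources) pairs, and the default width of three is chosen to mirror the three instruction slots of an IA-64 bundle.

```python
def conflicts(earlier, later) -> bool:
    """True if `later` cannot run in parallel with `earlier`."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    return e_dest in l_srcs or l_dest in e_srcs or e_dest == l_dest


def bundle(program, width: int = 3):
    """Greedily pack consecutive independent instructions into bundles."""
    bundles, current = [], []
    for instr in program:
        full = len(current) == width
        dependent = any(conflicts(prev, instr) for prev in current)
        if full or dependent:
            # Close the current bundle and start a new one.
            bundles.append(current)
            current = []
        current.append(instr)
    if current:
        bundles.append(current)
    return bundles


program = [
    ("A", ("B", "C")),  # A = B + C
    ("D", ("F", "G")),  # D = F + G
    ("E", ("A", "D")),  # E = A + D (needs both results above)
    ("H", ("B", "F")),  # H = B + F
]

for group in bundle(program):
    print([dest for dest, _ in group])
# ['A', 'D']
# ['E', 'H']
```

A production compiler would also reorder instructions, weigh latencies, and schedule across branches; the sketch only shows the grouping idea.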
Moving this task from the CPU to the compiler has several advantages. Firstly, the compiler can spend considerably more time examining the code, a luxury the chip does not have because it must finish its analysis as quickly as possible. The compiler's decisions can therefore be considerably more accurate than those made by the chip's circuitry. Secondly, prediction circuitry is quite complex, and offloading prediction to the compiler reduces that complexity enormously. The CPU no longer has to examine anything; it simply breaks the long instruction apart again and feeds the pieces to the execution units. Thirdly, doing the prediction in the compiler is a one-off cost, rather than one incurred every time the program is run.
The downside is that a program's run-time behaviour is not always obvious from the code used to generate it, and may vary considerably depending on the actual data being processed. The out-of-order logic of a mainstream CPU can make decisions on the basis of actual run-time data, which the compiler can only guess at. It is therefore possible for the compiler to get its predictions wrong even more often than comparable logic placed on the CPU. The VLIW design thus relies heavily on the quality of its compilers; the trade-off is to decrease microprocessor hardware complexity by increasing compiler software complexity.
The IA-64 architecture includes a very generous complement of registers: 128 64-bit integer registers and 128 82-bit floating-point registers. In addition to the sheer number, IA-64 adds a register stack, managed by the Register Stack Engine, and a register rotation mechanism. Rather than relying on the typical spill/fill or window mechanisms used in other processors, the Itanium can bring in a set of new registers to accommodate function parameters or temporaries. Register rotation combined with predication is also very effective for executing automatically unrolled (software-pipelined) loops.
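A simplified model may help show why rotation is useful in loops. In the sketch below, which is an assumption-laden illustration rather than a description of the actual hardware, a hypothetical `RotatingRegisters` class maps logical register numbers onto physical slots through a rotating base. After each rotation, a value written to logical register 0 reappears under the next higher logical number, so successive loop iterations can use the same register names without overwriting each other's data.

```python
class RotatingRegisters:
    """Toy rotating register file; names and sizes are illustrative only."""

    def __init__(self, size: int):
        self.physical = [0] * size
        self.base = 0  # rotation offset applied to every logical number

    def _slot(self, logical: int) -> int:
        return (logical + self.base) % len(self.physical)

    def read(self, logical: int) -> int:
        return self.physical[self._slot(logical)]

    def write(self, logical: int, value: int) -> None:
        self.physical[self._slot(logical)] = value

    def rotate(self) -> None:
        # Decrementing the base makes each value appear one logical
        # register higher than before.
        self.base = (self.base - 1) % len(self.physical)


regs = RotatingRegisters(4)
regs.write(0, 10)   # iteration 1 produces a value in logical register 0
regs.rotate()
regs.write(0, 20)   # iteration 2 reuses the same logical name
regs.rotate()
print(regs.read(1), regs.read(2))  # 20 10
```

Both values survive even though each iteration wrote to "register 0", which is the property that lets overlapped loop iterations share one copy of the loop body.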
The architecture also provides a CISC-like complement of instructions, including explicit instructions for multimedia operations and for floating-point operations.
Where a typical VLIW design assigns each sub-instruction of a long instruction word to a particular fixed functional unit, the Itanium supports several bundle mappings. These allow more ways of mixing instructions and strike a balance between serial and parallel execution modes. Room was left in the initial bundle encodings to add more mappings in future versions of IA-64. In addition, the Itanium has individually settable predicate registers, which give each instruction a kind of "no output" mode that is determined at run time.
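The "no output" behaviour of predicated instructions can be sketched with a tiny interpreter. This is a hypothetical model, not IA-64 assembly: the tuple format, the operation names `add` and `cmp_eq`, and the `execute` function are all invented for illustration, and `p0` is assumed to be a predicate that is always true. The program encodes the earlier example, A = B + C; IF A==5 THEN D = F + G, without any branch.

```python
def execute(program, regs):
    """Run (predicate, op, dest, lhs, rhs) tuples against a register dict."""
    preds = {"p0": True}  # assumed always-true predicate
    for qp, op, dest, lhs, rhs in program:
        # Every instruction is evaluated, whatever its predicate says.
        if op == "add":
            result = regs[lhs] + regs[rhs]
        elif op == "cmp_eq":
            result = regs[lhs] == rhs  # rhs is an immediate value here
        else:
            raise ValueError(f"unknown operation: {op}")

        # A false qualifying predicate means the result is simply discarded.
        if not preds.get(qp, False):
            continue
        if op == "cmp_eq":
            preds[dest] = result
        else:
            regs[dest] = result
    return regs


program = [
    ("p0", "add", "A", "B", "C"),     # A = B + C
    ("p0", "cmp_eq", "p1", "A", 5),   # p1 = (A == 5)
    ("p1", "add", "D", "F", "G"),     # D = F + G, only if p1 is true
]

print(execute(program, {"B": 2, "C": 3, "D": 0, "F": 10, "G": 20})["D"])  # 30
print(execute(program, {"B": 1, "C": 1, "D": 0, "F": 10, "G": 20})["D"])  # 0
```

Because the conditional work is expressed as an ordinary instruction guarded by a predicate, there is no branch outcome for the hardware to guess.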
A raw Itanium, when first booted, is actually missing some of its instruction functionality. A boot-ROM-like program called an EFI program is loaded, which in turn loads additional code into on-chip memory to define these instructions and performs other boot-time configuration, such as choosing the execution mode of the processor (64-bit versus 32-bit). This design allows an Itanium system to be deployed with different capabilities depending on the contents of the EFI program.
In order to support IA-32, the Itanium can switch into 32-bit mode with special jump escape instructions. The IA-32 instructions have been mapped to the Itanium's functional units. However, since the Itanium is built primarily for speed on its EPIC-style instructions, and because it has no out-of-order execution capabilities, IA-32 code executes at a severe performance penalty compared to either IA-64 mode or Intel's Pentium line of processors. For example, the Itanium functional units do not automatically generate integer flags as a side effect of ordinary ALU computation, and do not intrinsically support multiple outstanding unaligned memory loads. There are also IA-32 software emulators, freely available for Windows and Linux, which typically outperform the hardware-based emulation by around 50%. The Windows emulator is available from Microsoft; the Linux emulator is available from some Linux vendors such as Novell. Given the superior performance of the software emulators, there has been some speculation that Intel will remove IA-32 emulation from future Itanium processors. However, the IA-32 hardware accounts for less than 1% of the transistors of an Itanium 2, so there is little to gain from doing so.
Although other 64-bit architectures have existed for a long time, most (MIPS, Alpha, PA-RISC) have faded from the marketplace. Itanium's remaining competition in the 64-bit server and workstation market appears to be the resurgent AMD with its AMD64 architecture, and the entrenched rivals: IBM's POWER architecture and Sun's UltraSPARC architecture. Apple may also challenge Intel with its Xserve product line based on the IBM PowerPC architecture.
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.