Science Fair Project Encyclopedia
Out of Order execution
In computer science, out-of-order execution is a paradigm used in most high-speed microprocessors to make use of cycles that would otherwise be wasted by a certain type of costly delay. Out-of-order (OoO) execution is an optimization; the same cycles could also be put to other uses, such as hyper-threading. Almost all modern CPU designs include support for out-of-order execution.
Out-of-order execution is a restricted form of data flow computation, which was a major research area in computer architecture in the 1980s. Important academic research in this subject was led by Yale Patt and his HPSm simulator. A paper by J.E. Smith and A.R. Pleszkun, published in 1985, completed the scheme by describing how the precise behavior of exceptions could be maintained in out-of-order machines.
The very first use of out-of-order execution was in the IBM 360/91 mainframe's floating point unit in the 1960s, which made use of the Tomasulo algorithm style of reservation stations. IBM also produced the first true out-of-order microprocessor, the POWER1. It was the release of the Intel Pentium Pro which really signified that the technology had become mainstream. By 1996, other OoO CPUs (some perhaps only announced rather than in production) included the HP PA-8000 and the MIPS R10000. Later OoO devices included the DEC Alpha 21264 and AMD's K6. The notable exception to this trend was the SPARC processor line from Sun Microsystems.
The logical complexity of out-of-order schemes was the reason such machines were not produced until the mid-1990s. Many low-end processors meant for cost-sensitive markets still do not use this paradigm because of the large silicon area required to build this class of machine.
In earlier processors, the processing of instructions is normally done in these steps:
- Instruction fetch.
- If input operands are available (in registers for instance), the instruction is dispatched to the appropriate functional unit, otherwise the processor stalls until they are available.
- The instruction is executed by the appropriate functional unit.
- The functional unit writes the results back to the register file.
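The cost of the stall in step (2) can be sketched in a few lines of Python. This is a toy model, not any real pipeline: the instruction names, operand-ready times, and one-cycle execution latency are all illustrative assumptions.

```python
# Hypothetical sketch of in-order issue: an instruction that is not ready
# stalls everything behind it, even independent instructions.

def run_in_order(program, ready_at):
    """Issue instructions strictly in program order.

    program:  list of (name, source_operands) tuples.
    ready_at: cycle at which each named value becomes available.
    Returns the cycle at which each instruction executes.
    """
    cycle = 0
    schedule = {}
    for name, srcs in program:
        # Step 2: stall until every input operand is available.
        while any(ready_at.get(s, 0) > cycle for s in srcs):
            cycle += 1            # wasted cycle (pipeline bubble)
        schedule[name] = cycle    # steps 3-4: execute, write back
        cycle += 1                # assume one-cycle execution
    return schedule

# The independent 'add' needs no operands, yet it waits behind the
# slow load and its dependent 'use'.
prog = [("load", ["mem"]), ("use", ["load"]), ("add", [])]
avail = {"mem": 3, "load": 5}
print(run_in_order(prog, avail))   # → {'load': 3, 'use': 5, 'add': 6}
```

In this model the `add` cannot execute until cycle 6 even though its operands were ready at cycle 0; those are exactly the wasted slots out-of-order execution recovers.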
This new paradigm breaks up the processing of instructions into these steps:
- Instruction fetch.
- Instruction dispatch to an instruction queue (also called instruction buffer or reservation stations).
- The instruction waits in the queue until its input operands are available. The instruction is then allowed to leave the queue before older instructions.
- The instruction is issued to the appropriate functional unit and executed by that unit.
- The results are queued.
- When all older results have been written back to the register file, then this result is written back to the register file. This is called the graduation or retire stage.
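The steps above can be sketched as a toy simulator. This is an illustrative model only: it assumes unlimited functional units, one-cycle execution, and made-up operand-ready times, and it collapses dispatch and issue into one loop.

```python
# Hypothetical sketch of the out-of-order steps above: issue any queued
# instruction whose operands are ready, but retire in program order.

def run_out_of_order(program, ready_at):
    """Returns {name: (execute_cycle, retire_cycle)} per instruction."""
    queue = list(range(len(program)))      # instruction queue, program order
    exec_cycle = {}
    cycle = 0
    while queue:
        ready = [i for i in queue
                 if all(ready_at.get(s, 0) <= cycle
                        for s in program[i][1])]
        for i in ready:                    # issue every ready instruction
            exec_cycle[i] = cycle          # (unlimited units assumed)
            queue.remove(i)
        cycle += 1
    # Graduation/retire: results drain to the register file in program
    # order, so nothing retires before everything older than it.
    retire, last = {}, -1
    for i, (name, _) in enumerate(program):
        last = max(last + 1, exec_cycle[i] + 1)
        retire[name] = last
    return {name: (exec_cycle[i], retire[name])
            for i, (name, _) in enumerate(program)}

prog = [("load", ["mem"]), ("use", ["load"]), ("add", [])]
avail = {"mem": 3, "load": 5}
print(run_out_of_order(prog, avail))
# → {'load': (3, 4), 'use': (5, 6), 'add': (0, 7)}
```

Here the independent `add` executes at cycle 0, in the slots the in-order machine would have wasted, yet it still graduates last, so the visible register-file state matches program order.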
The key concept of OoO processing is to allow the processor to avoid a class of stalls that occur when the data needed to perform an operation is not available. In the outline above, the OoO processor avoids the stall that occurs in step (2) of the in-order processor when the instruction is not completely ready to be processed due to missing data.
OoO processors fill these "slots" in time with other instructions that are ready, then re-order the results at the end to make it appear that the instructions were processed as normal. The way the instructions are ordered in the original computer code is known as program order; in the processor they are handled in data order, the order in which the data, or operands, become available in the processor's registers. Fairly complex circuitry is needed to convert from one ordering to the other and to maintain a logical ordering of the output; the processor itself runs the instructions in seemingly random order.
The effect of OoO processing grows as the instruction pipeline deepens and the speed difference between main memory (or cache memory) and the processor widens. For instance, on modern machines the processor runs many times faster than memory, so waiting for data to arrive over the bus is extremely costly.
Dispatch and Issue Decoupling allows Out-of-Order issue
One of the differences created by the new paradigm is the creation of queues that allow the dispatch step to be decoupled from the issue step, and the graduation stage to be decoupled from the execute stage. An early name for the paradigm was decoupled architecture. In the earlier in-order processors, these stages operated in a fairly lock-step, pipelined fashion.
To avoid false operand dependencies, which would reduce how often instructions could be issued out of order, a technique called register renaming is used. In this scheme, there are more physical registers than are defined by the architecture. The physical registers are tagged so that multiple versions of the same architectural register can exist at the same time.
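A minimal renaming sketch, under illustrative assumptions (register names, table sizes, and a free list consumed front-to-back are made up; real renamers also recycle physical registers when instructions retire):

```python
# Minimal register-renaming sketch: each write to an architectural
# register gets a fresh physical register, removing false dependencies.

class Renamer:
    def __init__(self, arch_regs, num_phys):
        self.free = list(range(num_phys))
        # Map each architectural register to an initial physical register.
        self.map = {r: self.free.pop(0) for r in arch_regs}

    def rename(self, dst, srcs):
        """Read sources under the current map, then give dst a new tag."""
        phys_srcs = [self.map[s] for s in srcs]
        phys_dst = self.free.pop(0)     # fresh version of dst
        self.map[dst] = phys_dst
        return phys_dst, phys_srcs

r = Renamer(["r1", "r2"], num_phys=8)
# Two back-to-back writes to r1 (a write-after-write hazard) receive
# distinct physical registers, so both can be in flight at once.
first, _ = r.rename("r1", ["r2"])
second, _ = r.rename("r1", ["r2"])
print(first != second)   # → True: two versions of r1 coexist
```

Because each version of `r1` lives in its own physical register, a younger instruction's write can no longer clobber a value an older, still-unissued instruction needs to read.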
Execute and Writeback Decoupling allows program restart
The queue for results is necessary to resolve issues such as branch mispredictions and exceptions/traps. The results queue allows programs to be restarted after an exception, which requires the instructions to be completed in program order. The queue allows results to be discarded because of mispredictions on older branch instructions and exceptions taken on older instructions.
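The discard behavior can be illustrated with a small sketch. The entry fields and instruction names here are assumptions for illustration; a real results queue also carries destination tags, values, and misprediction state.

```python
# Sketch of a results queue: results drain to the register file strictly
# in program order, so a fault on an older instruction can discard every
# younger result, even ones that already finished executing.

def retire(rob):
    """rob: entries in program order, each with 'done' and 'fault' flags.
    Returns (committed names, discarded names)."""
    committed, i = [], 0
    while i < len(rob) and rob[i]["done"]:
        if rob[i]["fault"]:
            break                       # precise exception point
        committed.append(rob[i]["name"])
        i += 1
    discarded = [e["name"] for e in rob[i:]]
    return committed, discarded

rob = [
    {"name": "i0", "done": True, "fault": False},
    {"name": "i1", "done": True, "fault": True},   # e.g. a page fault
    {"name": "i2", "done": True, "fault": False},  # finished out of order
]
print(retire(rob))   # → (['i0'], ['i1', 'i2'])
```

Everything before the faulting instruction has reached the register file and nothing after it has, so the program can be restarted at `i1` exactly as an in-order machine would allow.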
The ability to issue instructions past branches which have yet to resolve is known as speculative execution.
Micro-architectural choices
- Are the instructions dispatched to a centralized queue or to multiple distributed queues?
- IBM PowerPC processors use queues that are distributed among the different functional units, while most other out-of-order processors use a centralized queue. IBM uses the term reservation stations for their distributed queues.
- Is there an actual results queue or are the results written directly into a register file? For the latter, the queueing function is handled by register maps which hold the register renaming information for each instruction in flight.
- Early Intel out-of-order processors use a results queue called a re-order buffer, while most later out-of-order processors use register maps.
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.