instruction’s lifecycle—into a series of discrete pipeline stages that can be completed in sequence by specialized hardware. Recall the way that we broke down the SUV assembly process into five discrete steps, with one dedicated crew assigned to complete each step, and you’ll get the idea.

Because an instruction’s lifecycle consists of four fairly distinct phases, you can start by breaking down the single-cycle processor’s instruction execution process into a sequence of four discrete pipeline stages, where each pipeline stage corresponds to a phase in the standard instruction lifecycle:

Stage 1: Fetch the instruction from code storage.
Stage 2: Decode the instruction.
Stage 3: Execute the instruction.
Stage 4: Write the results of the instruction back to the register file.

Note that the number of pipeline stages is called the pipeline depth. So the four-stage pipeline has a pipeline depth of four.

For convenience’s sake, let’s say that each of these four pipeline stages takes exactly 1 ns to finish its work on an instruction, just like each crew in our assembly line analogy takes one hour to finish its portion of the work on an SUV. So the original single-cycle processor’s 4 ns execution process is now broken down into four discrete, sequential pipeline stages of 1 ns each in length.

Now let’s step through another diagram together to see how a pipelined CPU would execute the four instructions depicted in Figure 3-7.

[Figure 3-7: A four-stage pipeline. Diagram labels: Stored Instructions; CPU with Fetch, Decode, Execute, and Write stages; Completed Instructions; time axis from 1 ns to 9 ns.]

At the beginning of the first nanosecond, the blue instruction enters the fetch stage. After that nanosecond is complete, the second nanosecond begins and the blue instruction moves on to the decode stage, while the next instruction, the red one, starts to make its way from code storage to the processor (i.e., it enters the fetch stage).
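The stage-by-stage handoff in this walkthrough can also be traced mechanically. What follows is a minimal Python sketch, not anything taken from the figure itself. The function name `trace_pipeline` and the constant names are illustrative assumptions, and the model is the idealized one stated above: four stages, exactly 1 ns per stage, a new instruction entering the fetch stage every nanosecond, and no stalls. The four instructions are labeled by the colors used in Figure 3-7.

```python
# Idealized model of the four-stage pipeline described in the text.
# All names here are hypothetical, chosen only for this illustration.
STAGES = ["Fetch", "Decode", "Execute", "Write"]
STAGE_TIME_NS = 1  # assumption from the text: every stage takes exactly 1 ns


def trace_pipeline(instructions):
    """Return one dict per nanosecond mapping stage name -> instruction or None.

    A new instruction enters Fetch every nanosecond and nothing ever stalls.
    """
    depth = len(STAGES)
    # The last instruction enters Fetch in nanosecond len(instructions)
    # and needs depth - 1 more nanoseconds to get through Write.
    total_ns = len(instructions) + depth - 1
    timeline = []
    for ns in range(1, total_ns + 1):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            # Instruction i enters Fetch during nanosecond i + 1, so during
            # nanosecond `ns` the stage at `stage_index` holds instruction
            # number ns - 1 - stage_index (if such an instruction exists).
            i = ns - 1 - stage_index
            row[stage] = instructions[i] if 0 <= i < len(instructions) else None
        timeline.append(row)
    return timeline


if __name__ == "__main__":
    program = ["blue", "red", "green", "purple"]
    timeline = trace_pipeline(program)
    for ns, row in enumerate(timeline, start=1):
        cells = "  ".join(f"{stage}={row[stage] or '-':<7}" for stage in STAGES)
        print(f"{ns} ns: {cells}")

    # Total time for the whole program under each design.
    pipelined_ns = len(timeline) * STAGE_TIME_NS
    single_cycle_ns = len(program) * len(STAGES) * STAGE_TIME_NS
    print(f"pipelined: {pipelined_ns} ns, single-cycle: {single_cycle_ns} ns")
```

Each printed row corresponds to one nanosecond column of the diagram. Under these assumptions the four instructions finish in 7 ns on the pipelined design, versus 16 ns when each instruction occupies the single-cycle processor for its full 4 ns.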
At the start of the third nanosecond, the blue instruction advances to the execute stage, the red instruction advances to the decode stage, and the green instruction enters the fetch stage. At the fourth nanosecond, the blue instruction advances to the write stage, the red instruction advances to the execute stage, the green instruction advances to the decode stage, and the purple instruction enters the fetch stage. After the fourth nanosecond has fully elapsed and the fifth nanosecond starts, the blue instruction has passed from the pipeline and is now finished executing. Thus we can say that at the end of 4 ns (= four clock cycles), the pipelined processor depicted in Figure 3-7 has completed one instruction.

At the start of the fifth nanosecond, the pipeline is now full and the processor can begin completing instructions at a rate of one instruction per nanosecond. This one instruction/ns completion rate is a fourfold improvement over the single-cycle processor’s completion rate of 0.25 instructions/ns (or four instructions every 16 ns).

Shrinking the Clock

You can see from Figure 3-7 that the role of the CPU clock changes slightly in a pipelined processor, compared to the single-cycle processor shown in Figure 3-6. Because all of the pipeline stages must now work together simultaneously and be ready at the start of each new nanosecond to hand over the results of their work to the next pipeline stage, the clock is needed to coordinate the activity of the whole pipeline.

The way this is done is simple: Shrink the clock cycle time to match the time it takes each stage to complete its work so that at the start of each clock cycle, each pipeline stage hands off the instruction it was working on to the next stage in the pipeline. Because each pipeline stage in the example processor takes 1 ns to complete its work, you can set the clock cycle to be 1 ns in duration. This new method of clocking the processor means