that a new instruction
will not necessarily be completed at the close of each clock cycle, as was the
case with the single-cycle processor. Instead, a new instruction will be com-
pleted at the close of only those clock cycles in which the write stage has been working on an instruction. Any clock cycle with an empty write stage will add
no new instructions to the “Completed Instructions” box, and any clock cycle
with an active write stage will add one new instruction to the box. Of course,
this means that when the pipeline first starts to work on a program, there will
be a few clock cycles—three to be exact—during which no instructions are
completed. But once the fourth clock cycle starts, the first instruction enters
the write stage and the pipeline can then begin completing new instructions
on each clock cycle, which, because each clock cycle is 1 ns, translates into a
completion rate of one instruction per nanosecond.
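The fill-up arithmetic above can be sanity-checked with a minimal sketch (assuming the four-stage, 1 ns-per-cycle pipeline described here; the function name is ours, not from the text):

```python
def completed_instructions(cycle, stages=4):
    """Instructions completed by the end of a given clock cycle.

    The write stage sits empty for the first (stages - 1) cycles while
    the pipeline fills; after that, one instruction completes per cycle.
    """
    return max(0, cycle - (stages - 1))

# The first instruction finishes at the end of cycle 4 (4 ns in);
# from then on, one more instruction finishes every 1 ns cycle.
for cycle in range(1, 7):
    print(cycle, completed_instructions(cycle))
```

Running it shows zero completions through cycle 3, then a steady rate of one completion per cycle from cycle 4 onward.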
Shrinking Program Execution Time
Note that the total execution time for each individual instruction is not
changed by pipelining. It still takes an instruction 4 ns to make it all the way through the processor; that 4 ns can be split up into four clock cycles of 1 ns
each, or it can cover one longer clock cycle, but it’s still the same 4 ns. Thus pipelining doesn’t speed up instruction execution time, but it does speed up
program execution time (the number of nanoseconds that it takes to execute an entire program) by increasing the number of instructions finished per unit
of time. Just like pipelining our hypothetical SUV assembly line allowed us to
fill the Army’s orders in a shorter span of time, even though each individual
SUV still spent a total of five hours in the assembly line, so does pipelining
allow a processor to execute programs in a shorter amount of time, even
though each individual instruction still spends the same amount of time
traveling through the CPU. Pipelining makes more efficient use of the CPU’s
existing resources by putting all of its units to work simultaneously, thereby
allowing it to do more total work each nanosecond.
The Speedup from Pipelining
In general, the speedup in completion rate versus a single-cycle implementa-
tion that’s gained from pipelining is ideally equal to the number of pipeline
stages. A four-stage pipeline yields a fourfold speedup in the completion rate
versus a single-cycle processor, a five-stage pipeline yields a fivefold speedup, a twelve-stage pipeline yields a twelvefold speedup, and so on. This speedup is
possible because the more pipeline stages there are in a processor, the more
instructions the processor can work on simultaneously, and the more instruc-
tions it can complete in a given period of time. So the more finely you can slice those four phases of the instruction’s lifecycle, the more of the hardware that’s used to implement those phases you can put to work at any given moment.
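This ideal speedup can be illustrated with a short sketch (a simplified model, assuming the 4 ns instruction latency from this chapter, a stage count that divides that latency evenly, and no stalls; the function name is ours):

```python
def program_time_ns(n_instructions, stages, instr_latency_ns=4.0):
    """Total execution time for a program on an ideal pipeline.

    Each clock cycle is instr_latency_ns / stages long; the pipeline
    spends (stages - 1) cycles filling, then completes one instruction
    per cycle.  stages=1 models the single-cycle processor.
    """
    cycle_ns = instr_latency_ns / stages
    return (stages - 1 + n_instructions) * cycle_ns

single = program_time_ns(1000, stages=1)   # 4000.0 ns
piped = program_time_ns(1000, stages=4)    # 1003.0 ns
print(single / piped)                      # just under the fourfold ideal
```

For a 1,000-instruction program the four-stage pipeline gives a speedup of about 3.99; the three fill-up cycles keep it from hitting exactly 4, but the longer the program runs, the closer the speedup gets to the stage count.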
To return to our assembly line analogy, let’s say that each crew is made
up of six workers, and that each of the hour-long tasks that each crew per-
forms can be readily subdivided into two shorter, 30-minute tasks. So we can
double our factory’s throughput by splitting each crew into two smaller, more
specialized crews of three workers each, and then having each smaller crew
perform one of the shorter tasks on one SUV per 30 minutes.
Stage 1: Build the chassis.
- Crew 1a: Fit the parts of the chassis together and spot-weld the joints.
- Crew 1b: Fully weld all the parts of the chassis.
Stage 2: Drop the engine into the chassis.
- Crew 2a: Place the engine into the chassis and mount it in place.
- Crew 2b: Connect the engine to the moving parts of the car.
Stage 3: Put the doors, a hood, and coverings on the chassis.
- Crew 3a: Put the doors and hood on the chassis.
- Crew 3b: Put the other coverings on the chassis.
Stage 4: Attach the wheels.
- Crew 4a: Attach the