Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture by Jon Stokes


Authors: Jon Stokes

Tags: General, Computers, Systems Architecture, Microprocessors


instruction’s lifecycle—into a series of
    discrete pipeline stages that can be completed in sequence by specialized hardware. Recall the way that we broke down the SUV assembly process into five
    discrete steps—with one dedicated crew assigned to complete each step—
    and you’ll get the idea.
    Because an instruction’s lifecycle consists of four fairly distinct phases, you
    can start by breaking down the single-cycle processor’s instruction execution
    process into a sequence of four discrete pipeline stages, where each pipeline
    stage corresponds to a phase in the standard instruction lifecycle:
    Stage 1: Fetch the instruction from code storage.
    Stage 2: Decode the instruction.
    Stage 3: Execute the instruction.
    Stage 4: Write the results of the instruction back to the register file.
    Note that the number of pipeline stages is called the pipeline depth. So the four-stage pipeline has a pipeline depth of four.
    For convenience’s sake, let’s say that each of these four pipeline stages
    takes exactly 1 ns to finish its work on an instruction, just like each crew in
    our assembly line analogy takes one hour to finish its portion of the work on
    an SUV. So the original single-cycle processor’s 4 ns execution process is now
    broken down into four discrete, sequential pipeline stages of 1 ns each in
    length.
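The stage breakdown above can be written down as a small table. This is only an illustrative sketch: the names STAGES and STAGE_LATENCY_NS are hypothetical, and the uniform 1 ns latency is the text's simplifying assumption rather than a property of any real processor.

```python
# Hypothetical names for illustration; the 1 ns figure is the text's
# simplifying assumption that every stage takes the same time.
STAGES = ["fetch", "decode", "execute", "write"]
STAGE_LATENCY_NS = 1

# Pipeline depth is simply the number of stages.
pipeline_depth = len(STAGES)

# One instruction still needs to pass through every stage, so its
# total execution time is unchanged from the single-cycle design.
instruction_latency_ns = pipeline_depth * STAGE_LATENCY_NS

print(f"depth = {pipeline_depth}, latency per instruction = {instruction_latency_ns} ns")
```

Splitting the work into stages does not make any single instruction finish sooner; it still spends 4 ns in the processor.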
    Now let’s step through another diagram together to see how a pipelined
    CPU would execute the four instructions depicted in Figure 3-7.
    [Figure: timeline from 1 ns to 9 ns; stored instructions pass through the CPU's Fetch, Decode, Execute, and Write stages and emerge as completed instructions]
    Figure 3-7: A four-stage pipeline
    At the beginning of the first nanosecond, the blue instruction enters
    the fetch stage. After that nanosecond is complete, the second nanosecond
    begins and the blue instruction moves on to the decode stage, while the
    next instruction, the red one, starts to make its way from code storage to the
    processor (i.e., it enters the fetch stage). At the start of the third nanosecond, the blue instruction advances to the execute stage, the red instruction
    advances to the decode stage, and the green instruction enters the fetch
    stage. At the fourth nanosecond, the blue instruction advances to the write
    stage, the red instruction advances to the execute stage, the green instruction
    advances to the decode stage, and the purple instruction advances to
    the fetch stage. After the fourth nanosecond has fully elapsed and the fifth
    nanosecond starts, the blue instruction has passed from the pipeline and is
    now finished executing. Thus we can say that at the end of 4 ns (= four clock
    cycles), the pipelined processor depicted in Figure 3-7 has completed one
    instruction.
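The walkthrough above can be reproduced with a short simulation. This is a minimal sketch, not the book's own code: the function simulate and the color labels used as instruction names are assumptions made for illustration, and every stage is assumed to take exactly one 1 ns clock cycle with no stalls.

```python
STAGES = ["fetch", "decode", "execute", "write"]
INSTRUCTIONS = ["blue", "red", "green", "purple"]  # labels borrowed from the figure


def simulate(instructions, stages):
    """Return one (cycle, {stage: instruction}) snapshot per 1 ns cycle.

    Assumes an ideal pipeline: one new instruction enters per cycle and
    every instruction advances exactly one stage per cycle.
    """
    depth = len(stages)
    total_cycles = len(instructions) + depth - 1
    snapshots = []
    for cycle in range(1, total_cycles + 1):
        occupancy = {}
        for position, stage in enumerate(stages):
            # Instruction i occupies stage `position` during cycle i + position + 1.
            index = cycle - 1 - position
            if 0 <= index < len(instructions):
                occupancy[stage] = instructions[index]
        snapshots.append((cycle, occupancy))
    return snapshots


timeline = simulate(INSTRUCTIONS, STAGES)
for cycle, occupancy in timeline:
    print(f"ns {cycle}: {occupancy}")
```

In the fourth snapshot all four stages are occupied at once, which is the moment the text describes just before the blue instruction leaves the pipeline.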
    At the start of the fifth nanosecond, the pipeline is full and the processor
    can begin completing instructions at a rate of one instruction per nanosecond.
    This one instruction/ns completion rate is a fourfold improvement over
    the single-cycle processor’s completion rate of 0.25 instructions/ns (or four
    instructions every 16 ns).
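The arithmetic behind that comparison can be checked with a few lines. This is a sketch under the text's idealized assumptions (four equal 1 ns stages, no stalls); the function names are hypothetical.

```python
def single_cycle_time_ns(n_instructions, instruction_ns=4):
    """Total time when each instruction runs start to finish before the next begins."""
    return n_instructions * instruction_ns


def pipelined_time_ns(n_instructions, depth=4, stage_ns=1):
    """Total time for an ideal pipeline: fill it once, then finish one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return (depth + n_instructions - 1) * stage_ns


# Steady-state completion rates, in instructions per nanosecond.
single_cycle_rate = 1 / 4   # one instruction every 4 ns
pipelined_rate = 1 / 1      # one instruction every 1 ns once the pipeline is full
speedup = pipelined_rate / single_cycle_rate

print(single_cycle_time_ns(4), pipelined_time_ns(4), speedup)
```

Note that the fourfold figure is the steady-state rate. For a short run of four instructions the pipelined total is 7 ns against 16 ns, because the first 3 ns are spent filling the pipeline.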
    Shrinking the Clock
    You can see from Figure 3-7 that the role of the CPU clock changes slightly
    in a pipelined processor, compared to the single-cycle processor shown in
    Figure 3-6. Because all of the pipeline stages must now work together
    simultaneously and be ready at the start of each new nanosecond to hand
    over the results of their work to the next pipeline stage, the clock is needed
    to coordinate the activity of the whole pipeline. The way this is done is
    simple: Shrink the clock cycle time to match the time it takes each stage to
    complete its work so that at the start of each clock cycle, each pipeline stage
    hands off the instruction it was working on to the next stage in the pipeline.
    Because each pipeline stage in the example processor takes 1 ns to complete
    its work, you can set the clock cycle to be 1 ns in duration.
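The relationship between stage time and clock period can be sketched as follows. The helper names are hypothetical. The rule that the slowest stage sets the period when stages are unequal is an assumption added here for illustration; the example in the text uses equal 1 ns stages, so it does not arise there.

```python
# Per-stage latencies from the text's example, in nanoseconds.
stage_latencies_ns = {"fetch": 1.0, "decode": 1.0, "execute": 1.0, "write": 1.0}


def pipelined_clock_period_ns(latencies):
    # Assumption: every stage must finish within one cycle, so the
    # slowest stage determines the clock period.
    return max(latencies.values())


def single_cycle_clock_period_ns(latencies):
    # In the single-cycle design, one clock cycle covers the whole
    # instruction lifecycle.
    return sum(latencies.values())


def frequency_mhz(period_ns):
    # A period of 1 ns corresponds to 1000 MHz (1 GHz).
    return 1000.0 / period_ns


pipelined_period = pipelined_clock_period_ns(stage_latencies_ns)
single_cycle_period = single_cycle_clock_period_ns(stage_latencies_ns)
print(pipelined_period, frequency_mhz(pipelined_period))
print(single_cycle_period, frequency_mhz(single_cycle_period))
```

Under these numbers the pipelined design is clocked four times faster than the single-cycle one, even though each instruction still takes 4 ns to pass through all four stages.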
    This new method of clocking the processor means