
EC303

Pipelining
Do you know how a machine can execute operations at high speed?
Pipelining
RISC designers were concerned with how they would build a high-speed chip, and pipelining was adopted as one of the best methods to achieve this.
Pipelining is a technique that enables the processor to process more than one instruction at a time; it is a key way of making the processor fast.
Anyone who has done a lot of laundry has intuitively used pipelining. The non-pipelined approach to laundry would be:
1. Place one dirty load of clothes in the washer.
2. When the washer is finished, place the wet load in the dryer.
3. When the dryer is finished, place the dry load on the table and fold it.
4. When the folding is finished, ask your roommate to put the clothes away.
When your roommate is done, start over with the next dirty load.
Overview
Figure 1: Unpipelined operation
Figure 2: Pipelined operation
The pipelined approach takes much less time, as shown in Figure 2. As soon as the washer is finished with the first load and it is placed in the dryer, you load the washer with the second dirty load.
When the first load is dry, you place it on the table to start folding, move the wet load to the dryer, and put the next dirty load in the washer.
Next you have your roommate put the first load away, you start folding the second load, the dryer has the third load, and you put the fourth load in the washer.
At this point all the steps, called stages in pipelining, are operating concurrently. As long as we have a separate resource for each stage, we can pipeline the tasks.
The pipelining paradox is that the time from placing a single dirty sock in the washer until it is dried, folded, and put away is no shorter with pipelining; the reason pipelining is faster for many loads is that everything is working in parallel, so more loads are finished per hour.
Pipelining improves the throughput of our laundry system without improving the time to complete a single load.
Hence, pipelining does not decrease the time to complete one load of laundry, but when we have many loads to do, the improvement in throughput decreases the total time to complete the work.
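To make the throughput point concrete, here is a small Python sketch (my own illustration, assuming each of the four laundry steps takes 30 minutes) comparing the total time for several loads with and without pipelining.

STEP_MINUTES = 30   # assumed time for each of the 4 steps (wash, dry, fold, put away)
STEPS = 4

def unpipelined_time(loads):
    # each load finishes all 4 steps before the next load starts
    return loads * STEPS * STEP_MINUTES

def pipelined_time(loads):
    # the first load takes all 4 steps; every later load finishes one step-time later
    return (STEPS + loads - 1) * STEP_MINUTES

for n in (1, 4, 20):
    print(n, unpipelined_time(n), pipelined_time(n))

For a single load both approaches take 120 minutes, but for 20 loads the pipelined version finishes in 690 minutes instead of 2400: throughput improves, not single-load latency.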
Steps of Pipelining
The same principles apply to processors, where we pipeline instruction execution. MIPS instructions classically take five steps:
1. Fetch the instruction from memory.
2. Read registers while decoding the instruction. The format of MIPS instructions allows reading and decoding to occur simultaneously.
3. Execute the operation or calculate an address.
4. Access an operand in data memory.
5. Write the result into a register.
To implement pipelining, designers divide a processor's datapath into sections and place pipeline latches between the sections.
Pipelining
At the start of each cycle, the pipeline latches read their inputs and copy them to their outputs, which then remain constant throughout the rest of the cycle.
This breaks the datapath into several sections, each of which has a latency of one clock cycle, since an instruction cannot pass through a pipeline latch until the start of the next cycle.
The amount of the datapath that a signal travels through in one cycle is called a stage of the pipeline, and designers often describe a pipeline that takes n cycles as an n-stage pipeline.
Stage 1 is the instruction fetch block and its associated pipeline latch, stage 2 is the instruction decode block and its pipeline latch, and stages 3, 4 and 5 are the subsequent blocks of the pipeline.
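A minimal Python sketch of the latch behaviour just described: at each clock edge every latch copies its input to its output, so a value advances exactly one stage per cycle. The class and function names are illustrative, not from the slides.

class PipelineLatch:
    def __init__(self):
        self.input = None     # driven by the previous stage's logic during the cycle
        self.output = None    # held constant for the next stage for the whole cycle

    def clock_edge(self):
        self.output = self.input   # at the start of each cycle, copy input to output

# four latches split a datapath into five one-cycle stages
latches = [PipelineLatch() for _ in range(4)]

def one_cycle(new_instruction):
    for latch in latches:                 # all latches clock at the cycle boundary
        latch.clock_edge()
    for i in range(len(latches) - 1, 0, -1):
        latches[i].input = latches[i - 1].output   # stage logic passes data forward
    latches[0].input = new_instruction             # stage 1 receives the next instruction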
Cycle Time Of Pipelined Processors
The cycle time of a pipelined processor depends on four factors:
1. The cycle time of the unpipelined version of the processor.
2. The number of pipeline stages.
3. How evenly the datapath logic is divided among the stages.
4. The latency of the pipeline latches.
As the number of pipeline stages increases, the pipeline latch latency becomes a greater and greater fraction of the cycle time, limiting the benefit of dividing a processor into a very large number of pipeline stages.
Cycle time (pipelined) = (cycle time (unpipelined) / number of pipeline stages) + pipeline latch latency
Example 1
An unpipelined processor has a cycle time of 25 ns. What is the cycle time of a pipelined version of the processor with 5 evenly divided pipeline stages, if each pipeline latch has a latency of 1 ns? What if the processor is divided into 50 pipeline stages?
Solution:
Cycle time (pipelined) = (cycle time (unpipelined) / number of pipeline stages) + pipeline latch latency
= (25 ns / 5) + 1 ns = 6 ns.
For the 50-stage pipeline:
Cycle time (pipelined) = (25 ns / 50) + 1 ns = 1.5 ns.
In the 5-stage pipeline, the pipeline latch latency is only 1/6 of the overall cycle time, while it is 2/3 of the total cycle time in the 50-stage pipeline.
Another way of looking at this is that the 50-stage pipeline has a cycle time one-quarter that of the 5-stage pipeline, at a cost of 10 times as many pipeline latches.
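The cycle-time formula can be checked with a few lines of Python; the numbers below are those of Example 1, and the function name is my own.

def pipelined_cycle_time(unpipelined_ns, stages, latch_ns):
    # cycle time = (unpipelined cycle time / number of stages) + latch latency
    return unpipelined_ns / stages + latch_ns

print(pipelined_cycle_time(25, 5, 1))    # 6.0 ns
print(pipelined_cycle_time(25, 50, 1))   # 1.5 ns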
Often, the datapath logic cannot easily be divided into equal-latency pipeline stages.
For example, accessing the register file in a processor might take 3 ns, while decoding an instruction might take 4 ns.
When deciding how to divide a datapath into pipeline stages, designers must balance the desire to have each stage have the same latency against the difficulty of dividing the datapath at different points and the amount of data that has to be stored in the pipeline latch, which determines the amount of space that the latch takes up on the chip.
Some parts of the datapath, such as the instruction decode logic, are irregular, making it hard to split them into stages.
When a processor cannot be divided into equal-latency pipeline stages, the clock cycle time of the processor is equal to the latency of the longest pipeline stage plus the pipeline latch delay, since the cycle time has to be long enough for the longest pipeline stage to complete and store its result in the pipeline latch between it and the next stage. [1]
Clock cycle time = latency of the longest pipeline stage + pipeline latch delay
Example 2
Suppose an unpipelined processor with a 25 ns cycle time is divided into 5 pipeline stages with latencies of 5, 7, 3, 6 and 4 ns. If the pipeline latch latency is 1 ns, what is the cycle time of the resulting processor?
Solution:
The longest pipeline stage is 7 ns. Adding the 1 ns pipeline latch delay to this stage gives a total latency of 8 ns, which is the cycle time.
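The same kind of check for unevenly divided stages: the cycle time is set by the slowest stage plus the latch delay, as in Example 2. A sketch, not part of the original notes.

def cycle_time_uneven(stage_latencies_ns, latch_ns):
    # the clock must accommodate the longest stage plus the pipeline latch
    return max(stage_latencies_ns) + latch_ns

print(cycle_time_uneven([5, 7, 3, 6, 4], 1))   # 8 ns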
Pipeline Latency
The latency of the pipeline is the amount of time that a single instruction takes to pass through the pipeline, which is the product of the number of pipeline stages and the clock cycle time.
Pipeline latency = number of pipeline stages x clock cycle time
Example 3
If an unpipelined processor with a cycle time of 25 ns is evenly divided into 5 pipeline stages using pipeline latches with 1 ns latency, what is the total latency of the pipeline? How about if the processor is divided into 50 pipeline stages?
Solution:
For 5 pipeline stages, the cycle time is (25/5) + 1 = 6 ns, so the latency = 5 x 6 = 30 ns.
For 50 pipeline stages, the cycle time is (25/50) + 1 = 1.5 ns, so the latency = 50 x 1.5 = 75 ns.
Example 4
Suppose an unpipelined processor with a 25 ns cycle time is divided into 5 pipeline stages with latencies of 5, 7, 3, 6 and 4 ns. If the pipeline latch latency is 1 ns, what is the latency of the resulting pipeline?
Solution:
Cycle time = 7 + 1 = 8 ns.
Since there are 5 stages in the pipeline, the total latency = 5 x 8 = 40 ns.
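A short sketch verifying the latency formula against Examples 3 and 4 (the helper name is mine):

def pipeline_latency(stages, cycle_time_ns):
    # a single instruction occupies one stage per cycle, so latency = stages * cycle time
    return stages * cycle_time_ns

# Example 3: evenly divided stages
print(pipeline_latency(5, 25 / 5 + 1))     # 30.0 ns
print(pipeline_latency(50, 25 / 50 + 1))   # 75.0 ns

# Example 4: uneven stages, cycle time = 7 + 1 = 8 ns
print(pipeline_latency(5, 8))              # 40 ns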
Pipeline Hazards
There are situations in pipelining when the next instruction cannot execute in the following clock cycle. These events are called hazards, and there are three different types:
1. Structural hazards
2. Control hazards
3. Data hazards
Structural Hazards
Structural hazards occur when the hardware cannot support the combination of instructions that we want to execute in the same clock cycle.
A structural hazard in the laundry room would occur if we used a washer-dryer combination instead of a separate washer and dryer, or if our roommate was busy doing something else and wouldn't put clothes away. Our carefully scheduled pipeline plan would be foiled.
For example, suppose we had a single memory instead of two memories. If the pipeline held four instructions, then in the same clock cycle the first instruction would be accessing data while the fourth instruction was fetching an instruction from the same memory.
Without two memories, our pipeline could have a structural hazard.
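A toy illustration (my own, assuming one shared memory port and the classic five stages) of how a structural hazard shows up: two in-flight instructions need the single memory in the same cycle.

# whether each pipeline stage needs the single shared memory
NEEDS_MEMORY = {"IF": True, "ID": False, "EX": False, "MEM": True, "WB": False}

def structural_hazard(stages_this_cycle):
    # a hazard occurs if more than one in-flight instruction needs the memory
    requests = [s for s in stages_this_cycle if NEEDS_MEMORY[s]]
    return len(requests) > 1

# instruction 1 is in MEM (data access) while instruction 4 is in IF (instruction fetch)
print(structural_hazard(["MEM", "EX", "ID", "IF"]))   # True -> stall needed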
Data Hazards
Data hazards occur when the pipeline must be stalled because one step must wait for another to complete.
Suppose you found a sock at the folding station for which no match existed. One possible strategy is to run down to your room and search through your clothes bureau to see if you can find the match. Obviously, while you are doing the search, loads that have completed drying and are ready to fold, and those that have finished washing and are ready to dry, must wait.
In a computer pipeline, data hazards arise from the dependence of one instruction on an earlier one that is still in the pipeline.
Data hazards are also called pipeline data hazards: an occurrence in which a planned instruction cannot execute in the proper clock cycle because the data needed to execute the instruction is not yet available.
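A minimal sketch of detecting the read-after-write dependence described above; the register names and instruction tuples are illustrative, not real MIPS encodings.

def data_hazard(earlier, later):
    # earlier = (dest_reg, src_regs), later = (dest_reg, src_regs)
    # a data hazard exists if the later instruction reads a register that
    # the earlier (still in-flight) instruction has not yet written back
    dest, _ = earlier
    _, sources = later
    return dest in sources

add = ("r1", ["r2", "r3"])    # add r1, r2, r3
sub = ("r4", ["r1", "r5"])    # sub r4, r1, r5  -- reads r1 before add writes it back
print(data_hazard(add, sub))  # True -> pipeline must stall (or forward the value)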
Control Hazards
Control hazards are also called branch hazards: an occurrence in which the proper instruction cannot execute in the proper clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.
Datapath
The figure shows the single-cycle datapath. The division of an instruction into five stages means a five-stage pipeline, which in turn means that up to five instructions will be in execution during any single clock cycle.
The datapath is separated into five pieces:
1. IF: Instruction fetch
2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back
These five components correspond roughly to the way the datapath is drawn; instructions and data generally move from left to right through the five stages as they complete execution.
Going back to our laundry analogy, clothes get cleaner, drier, and more organized as they move through the line, and they never move backwards.
However, there are two exceptions to this left-to-right flow of instructions:
1. The write-back stage, which places the result back into the register file in the middle of the datapath.
2. The selection of the next value of the PC, choosing between the incremented PC and the branch address from the MEM stage.
1. Instruction Fetch
The instruction is read from memory using the address in the PC and then placed in the IF/ID pipeline register.
The PC is incremented by 4 and then written back into the PC to be ready for the next clock cycle. This incremented address is also saved in the IF/ID pipeline register in case it is needed later by an instruction.
The computer cannot know which type of instruction is being fetched, so it must prepare for any instruction, passing potentially needed information down the pipeline.
2. Instruction decode and
register file read
The instruction portion of the IF/ID pipeline register supplies the 16-bit immediate field, which is sign-extended to 32 bits, and the register numbers to read the two registers.
All three values are stored in the ID/EX pipeline register, along with the incremented PC address.
We again transfer everything that might be needed by any instruction during a later clock cycle.
3. Execution or address calculation
The figure shows that the load instruction reads the contents of register 1 and the sign-extended immediate from the ID/EX pipeline register and adds them using the ALU. The sum is placed in the EX/MEM pipeline register.
4. Memory access
The figure shows the load instruction reading
the data memory using the address from the
EX/MEM pipeline register and loading the
data into the MEM/WB pipeline register.
5. Write back
The figure shows the final step: reading the
data from the MEM/WB pipeline register and
writing it into the register file in the middle of
the figure.
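Putting the five steps together, here is a rough sketch of what each pipeline register holds for one load instruction; the field names and values are simplified illustrations, not the exact MIPS register contents.

# simplified contents of each pipeline register for one lw (load word) instruction
IF_ID  = {"instruction": "lw $t0, 8($s1)", "pc_plus_4": 0x4004}
ID_EX  = {"reg1_value": 0x1000, "sign_ext_imm": 8, "pc_plus_4": 0x4004}
EX_MEM = {"alu_result": 0x1000 + 8}              # effective address computed by the ALU
MEM_WB = {"mem_data": "word read from 0x1008"}   # data loaded from memory
# WB stage: MEM_WB["mem_data"] is written back into register $t0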
Summary of the 5 pipe stages of the
store instruction
1. Instruction fetch: The instruction is read from memory using the address in the PC and then placed in the IF/ID pipeline register. This stage occurs before the instruction is identified, so the instruction fetch figure (slide 30) works for store as well as load.
2. Instruction decode and register file read: The instruction in the IF/ID pipeline register supplies the register numbers for reading two registers and extends the sign of the 16-bit immediate. These three 32-bit values are all stored in the ID/EX pipeline register. The figure in slide 32 for load instructions also shows the operation of the second stage for stores. These first two stages are executed by all instructions, since it is too early to know the type of instruction.
3. Execute and address calculation: The figure in slide 34 shows the third step; the effective address is placed in the EX/MEM pipeline register.
4. Memory access: The figure in slide 36 shows the data being written to memory. Note that the register containing the data to be stored was read in an earlier stage and stored in ID/EX. The only way to make the data available during the MEM stage is to place the data into the EX/MEM pipeline register in the EX stage, just as we stored the effective address into EX/MEM.
5. Write back: The figure in slide 38 shows the final step of the store. For this instruction, nothing happens in the write-back stage. Since every instruction behind the store is already in progress, we have no way to accelerate those instructions. Hence, an instruction passes through a stage even if there is nothing to do, because later instructions are already progressing at the maximum rate.
Multiple clock cycle pipeline diagram
Instruction | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8 | CC9
A           | IF  | ID  | EX  | MEM | WB  |     |     |     |
B           |     | IF  | ID  | EX  | MEM | WB  |     |     |
C           |     |     | IF  | ID  | EX  | MEM | WB  |     |
D           |     |     |     | IF  | ID  | EX  | MEM | WB  |
E           |     |     |     |     | IF  | ID  | EX  | MEM | WB
(IF = Instruction Fetch, ID = Instruction Decode, EX = Execution, MEM = Data Access, WB = Write Back)
Time runs from left to right in clock cycles.
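The diagram above can be reproduced with a few lines of Python that shift each instruction's five stages one cycle to the right; stage names are abbreviated (IF, ID, EX, MEM, WB).

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
instructions = ["A", "B", "C", "D", "E"]
cycles = len(instructions) + len(STAGES) - 1   # 9 clock cycles in total

header = " " * 5 + " ".join(f"CC{c + 1}".ljust(5) for c in range(cycles))
print(header)
for i, name in enumerate(instructions):
    row = [""] * cycles
    for s, stage in enumerate(STAGES):
        row[i + s] = stage                     # instruction i enters stage s in cycle i+s+1
    print(f"{name}:".ljust(5) + " ".join(cell.ljust(5) for cell in row))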
Summary
Pipelining increases the number of
simultaneously executing instructions and the
rate at which instructions are started and
completed.
Pipelining does not reduce the time it takes to complete an individual instruction, also called latency. For example, the five-stage pipeline still takes 5 clock cycles for an instruction to complete. The pipeline improves instruction throughput rather than individual instruction time or latency.
References
1. David A. Patterson and John L. Hennessy, Computer Organization and Design, Morgan Kaufmann, 2005.
2. Nicholas Carter, Computer Architecture (Schaum's Outlines), McGraw-Hill, 2002.
