You are on page 1of 10

Instruction Hazards Session 3 Un-conditional Branching

Powerpoint Templates

Page 1

Instruction hazards
Instruction fetch units fetch instructions and supply the execution units with a steady stream of instructions. If the stream is interrupted then the pipeline stalls. Stream of instructions may be interrupted because of a cache miss or a branch instruction.

Powerpoint Templates

Page 2

Instruction hazards (contd..)


Consider a two-stage pipeline, first stage is the instruction fetch stage and the second stage is the instruction execute stage. Instructions I1, I2 and I3 are stored at successive memory locations. I2 is a branch instruction with branch target as instruction Ik. I2 is an unconditional branch instruction. Clock cycle 3: - Fetch unit is fetching instruction I3. - Execute unit is decoding I2 and computing the branch target address. Clock cycle 4: - Processor must discard I3 which has been incorrectly fetched and fetch Ik. - Execution unit is idle, and the pipeline stalls for one clock cycle.

Powerpoint Templates

Page 3

Instruction hazards (contd..)


Time Clock cycle Instruction I1 I2 (Branch) F1 E1 F2 E2 Execution unit idle 1 2 3 4 5 6

I3

F3

Ik

Fk

Ek

Ik+1

Fk+1

Ek+1

Pipeline stalls for one clock cycle. Time lost as a result of a branch instruction is called as branch penalty. Branch penalty is one clock cycle. Powerpoint Templates

Page 4

Instruction hazards (contd..)


Branch penalty depends on the length of the pipeline, may be higher for a longer pipeline. For a four-stage pipeline: - Branch target address is computed in stage E2. - Instructions I3 and I4 have to be discarded. - Execution unit is idle for 2 clock cycles. - Branch penalty is 2 clock cycles.
Time Clock cycle I1 I2 (Branch) I3 I4 Ik Ik+1 1 F1 2 D1 F2 3 E1 D2 F3 4 W1 E2 D3 F4 X X Fk Dk Fk+1 Ek Dk+1 Wk E k+1 5 6 7 8

Powerpoint Templates

Page 5

Instruction hazards (contd..)


Branch penalty can be reduced by computing the branch target address earlier in the pipeline. Instruction fetch unit has special hardware to identify a branch instruction after the instruction is fetched. Branch target address can be computed in the Decode stage (D2), rather than in the Execute stage (E2). Branch penalty is only one clock cycle.
Time Clock cycle I1 I2 (Branch) I3 Ik Ik+1 1 F1 2 D1 F2 3 E1 D2 F3 X Fk Dk Fk+1 Ek Wk 4 W1 5 6 7

Dk+1 E k+1

Powerpoint Templates

Page 6

Instruction hazards (contd..)


Instruction queue F : Fetch instruction Instruction fetch unit

Queue can hold several instructions

Fetch unit fetches instructions before they are needed & stores them in a queue

D : Dispatch/ Decode unit

E : Execute instruction

W : Write results

Dispatch unit takes instructions from the front of the queue and dispatches them to the Execution unit. Dispatch unit also decodes the instruction.

Powerpoint Templates

Page 7

Instruction hazards (contd..)


Fetch unit must have sufficient decoding and processing capability to recognize and execute branch instructions. Pipeline stalls because of a data hazard: Dispatch unit cannot issue instructions from the queue. Fetch unit continues to fetch instructions and add them to the queue. Delay in fetching because of a cache miss or a branch: Dispatch unit continues to dispatch instructions from the instruction queue.

Powerpoint Templates

Page 8

Instruction hazards (contd..)


Clock cycle Queue length 1 1 F1 2 1 D1 F2 3 1 E1 D2 F3 F4 F5 D5 F6 X Fk Dk Fk+1 Ek Wk 4 1 E1 5 2 E1 6 3 W1 E2 D3 W2 E3 D4 W3 E4 W4 7 2 8 1 9 1 10 1 I1 I2 I3 I4 I5 (Branch) I6 I5 is a branch instruction with target instruction Ik. Ik Ik is fetched in cycle 7, and I6 is discarded. Ik+1 However, this does not stall the pipeline, since I4 is dispatched.

Initial length of the queue is 1. Fetch adds 1 to the queue, dispatch reduces the length by 1. Queue length remains the same for first 4 clock cycles. I1 stalls the pipeline for 2 cycles. Queue has space, so the fetch unit continues and queue length rises to 3 in clock cycle 6.

D k+1 E k+1

I2, I3, I4 and Ik are executed in successive clock cycles. Fetch unit computes the branch address concurrently with the execution of other Powerpoint branch folding. instructions. This is called as Templates Page 9 9

Instruction hazards (contd..)


Branch folding can occur if there is at least one instruction available in the queue other than the branch instruction. Queue should ideally be full most of the time. Increasing the rate at which the fetch unit reads instructions from the cache. Most processors allow more than one instruction to be fetched from the cache in one clock cycle. Fetch unit must replenish the queue quickly after a branch has occurred. Instruction queue also mitigates the impact of cache misses: In the event of a cache miss, the dispatch unit continues to send instructions to the execution unit as long as the queue is full. In the meantime, the desired cache block is read. If the queue does not become empty, cache miss has no effect on the rate of instruction execution.

Powerpoint Templates

Page 10

You might also like