Hakim Weatherspoon
CS 3410, Spring 2012
Computer Science
Cornell University
See P&H Chapter 4.6
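A Processor
Review: Single-cycle processor
[Figure: single-cycle datapath with PC and +4 adder, instruction memory, register file, control, immediate extend, ALU, branch comparison (=?), branch/jump target computation (offset), and data memory (addr, din, dout)]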
What determines the performance of a processor?
A) Critical path
B) Clock cycle time
C) Cycles per instruction (CPI)
D) All of the above
E) None of the above
Review: Single-Cycle Processor
Advantages
A single cycle per instruction makes the logic and the clock simple
Disadvantages
Instructions take different amounts of time to finish, so memory and functional units are not used efficiently
Cycle time is set by the longest delay (the load instruction)
Best possible CPI is 1
However, MIPS is lower and the clock period is longer (lower clock frequency), so performance is lower
Review: Multi-Cycle Processor
Advantages
Better MIPS and a shorter clock period (higher clock frequency)
Hence, better performance than the single-cycle processor
Disadvantages
Higher CPI than the single-cycle processor
Pipelining: Want Better Performance
Want a small CPI (close to 1) with high MIPS and a short clock period (high clock frequency)
CPU time = instruction count × CPI × clock cycle time
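The following minimal Python sketch of the CPU-time equation shows how the three factors trade off. The instruction count, CPI values, and clock periods are hypothetical numbers chosen for illustration. They are not measurements of any real processor.

```python
def cpu_time_ns(instruction_count: int, cpi: float, clock_period_ns: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (result in ns)."""
    return instruction_count * cpi * clock_period_ns


# Hypothetical designs: (CPI, clock period in ns). Illustrative values only.
DESIGNS = {
    "single-cycle": (1.0, 10.0),  # CPI of 1, but the clock must fit the slowest instruction
    "multi-cycle": (4.0, 2.0),    # short clock, but several cycles per instruction
    "pipelined": (1.0, 2.0),      # the goal: CPI close to 1 with a short clock
}

N = 1_000_000  # hypothetical instruction count
for name, (cpi, period) in DESIGNS.items():
    print(f"{name:>12}: {cpu_time_ns(N, cpi, period) / 1e6:.1f} ms")
```

With these made-up numbers the sketch prints 10.0 ms, 8.0 ms, and 2.0 ms. The pipelined entry is the ideal case, because it ignores pipeline fill time and hazards.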
Single-Cycle vs. Pipelined Processor
See: P&H Chapter 4.5
The Kids
Alice
Bob
They don't always get along
The Bicycle
The Materials
Drill
Saw
Glue
Paint
The Instructions
N pieces, each built following the same sequence:
Saw
Drill
Glue
Paint
Design 1: Sequential Schedule
Alice owns the room
Bob can enter when Alice is finished
Repeat for remaining tasks
No possibility for conflicts
Sequential Performance
[Figure: timeline (time 1, 2, ...) in which Alice completes all four tasks before Bob starts]
Elapsed time for Alice: 4
Elapsed time for Bob: 4
Total elapsed time: 4*N
Latency:
Throughput:
Concurrency:
CPI =
Can we do better?
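The elapsed times above can be checked with a short sketch. It assumes, as the slide's numbers suggest, that each of the four tasks takes one time unit and that only one builder is in the room at a time. The function names are hypothetical.

```python
TASKS = ["saw", "drill", "glue", "paint"]  # assumed: each task takes 1 time unit


def sequential_schedule(builders):
    """Design 1: one builder owns the room and does every task before the next enters."""
    schedule = []  # list of (time step, builder, task)
    t = 1
    for who in builders:
        for task in TASKS:
            schedule.append((t, who, task))
            t += 1
    return schedule


def total_time(n):
    """Elapsed time for n bicycles built sequentially."""
    return len(sequential_schedule(range(n)))


for t, who, task in sequential_schedule(["Alice", "Bob"]):
    print(t, who, task)

latency = len(TASKS)                    # time from start to finish of one bicycle
throughput = 1 / len(TASKS)             # bicycles completed per time unit
concurrency = 1                         # only one builder is ever working
time_per_bicycle = total_time(10) / 10  # stays at len(TASKS) for any N
```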
Design 2: Pipelined Design
Partition the room into stages of a pipeline
[Figure: Dave, Carol, Bob, and Alice, each standing in one of the four stages]
One person owns a stage at a time
4 stages
4 people working simultaneously
Everyone moves right in lockstep
Pipelined Performance
[Figure: timeline over time steps 1 through 7, with the four builders staggered one stage apart]
Latency:
Throughput:
Concurrency:
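The pipelined timeline can be generated the same way. This sketch assumes each stage takes one time unit and that builder i enters the first stage at time i+1. With four builders it ends at time step 7, which matches the slide's timeline. The names are hypothetical.

```python
TASKS = ["saw", "drill", "glue", "paint"]  # assumed: each stage takes 1 time unit


def pipelined_schedule(builders):
    """Design 2: builder i enters stage 0 at time i+1 and moves one stage right per step."""
    grid = {}  # (time step, stage index) -> builder
    for i, who in enumerate(builders):
        for stage in range(len(TASKS)):
            grid[(i + stage + 1, stage)] = who
    return grid


def total_time(n):
    """The first bicycle needs len(TASKS) steps; each later one finishes one step after."""
    return len(TASKS) + n - 1


builders = ["Alice", "Bob", "Carol", "Dave"]
grid = pipelined_schedule(builders)
end = max(t for t, _ in grid)
for t in range(1, end + 1):
    print(t, [grid.get((t, s), "-") for s in range(len(TASKS))])

# Largest number of builders working in the same time step.
concurrency = max(
    sum((t, s) in grid for s in range(len(TASKS))) for t in range(1, end + 1)
)
```

Latency per bicycle is still 4 steps. Once the pipeline is full, however, one bicycle finishes every step, so N bicycles take N + 3 steps instead of 4N.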
Lessons
Principle:
Throughput increased by parallel execution
Pipelining:
Identify pipeline stages
Isolate stages from each other
Resolve pipeline hazards (Thursday)
A Processor
[Figure: the single-cycle datapath divided into five regions: Instruction Fetch (PC, +4, instruction memory), Instruction Decode (register file, control, immediate extend), Execute (ALU; compute jump/branch targets), Memory (data memory), and Write Back]
Basic Pipeline
Five-stage RISC load-store architecture
1. Instruction fetch (IF)
   get instruction from memory, increment PC
2. Instruction Decode (ID)
   translate opcode into control signals and read registers
3. Execute (EX)
   perform ALU operation, compute jump/branch targets
4. Memory (MEM)
   access memory if needed
5. Writeback (WB)
   update register file
Time Graphs
Clock cycle:  1    2    3    4    5    6    7    8    9
Instr 1:      IF   ID   EX   MEM  WB
Instr 2:           IF   ID   EX   MEM  WB
Instr 3:                IF   ID   EX   MEM  WB
Instr 4:                     IF   ID   EX   MEM  WB
Instr 5:                          IF   ID   EX   MEM  WB
Latency:
Throughput:
Concurrency:
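The staircase diagram above can be generated for any number of instructions. This small sketch assumes one clock cycle per stage and no stalls. It also counts how many instructions are in flight in each cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def time_graph(n_instructions):
    """Row i holds the stage instruction i occupies in each clock cycle ('' = not in pipeline)."""
    n_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in cycle i+1 and advances each cycle
        rows.append(row)
    return rows


rows = time_graph(5)
n_cycles = len(rows[0])
print("cycle  " + "".join(f"{c:<5}" for c in range(1, n_cycles + 1)))
for i, row in enumerate(rows, start=1):
    print(f"inst{i}  " + "".join(f"{cell:<5}" for cell in row))

latency = len(STAGES)  # cycles from IF to WB for a single instruction
busy = [sum(1 for row in rows if row[c]) for c in range(n_cycles)]
concurrency = max(busy)  # instructions in flight at once
```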
Principles of Pipelined Implementation
Break instructions across multiple clock cycles (five, in this case)
Design a separate stage for the execution performed during each clock cycle
Add pipeline registers (flip-flops) to isolate signals between different stages
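The following rough sketch shows why splitting an instruction across stages shortens the clock. The per-stage delays and the pipeline-register overhead are hypothetical numbers, not values from P&H or this lecture. The single-cycle clock must cover the sum of all stage delays, while the pipelined clock only has to cover the slowest stage plus its pipeline register.

```python
# Hypothetical stage delays in picoseconds; illustrative only, not from P&H.
STAGE_DELAY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}
PIPELINE_REG_OVERHEAD_PS = 20  # assumed cost of each pipeline register (setup + clock-to-Q)


def single_cycle_period(delays):
    """One cycle must fit an instruction that uses every stage (e.g. a load)."""
    return sum(delays.values())


def pipelined_period(delays, reg_overhead):
    """Each cycle only needs to fit the slowest stage plus its pipeline register."""
    return max(delays.values()) + reg_overhead


def run_time_ps(n_instructions, period, depth=1):
    """Cycles = n + (depth - 1): the first instruction fills the pipe, then one finishes per cycle."""
    return (n_instructions + depth - 1) * period


single = single_cycle_period(STAGE_DELAY_PS)                        # 800 ps
piped = pipelined_period(STAGE_DELAY_PS, PIPELINE_REG_OVERHEAD_PS)  # 220 ps
print(run_time_ps(1000, single), run_time_ps(1000, piped, depth=5))
print("latency of one instruction:", single, "ps vs", 5 * piped, "ps")
```

With these assumed numbers, a single instruction takes longer in the pipeline (1100 ps vs. 800 ps). A run of 1000 instructions, however, finishes about 3.6 times sooner. This is the same latency-versus-throughput trade-off as in the bicycle example.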
Pipelined Processor
See: P&H Chapter 4.6
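Pipelined Processor
[Figure: five-stage pipelined datapath. Pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separate Instruction Fetch (PC, +4, instruction memory), Instruction Decode (register file, control, immediate extend), Execute (ALU; compute jump/branch targets), Memory (data memory), and Write Back; control bits (ctrl) travel down the pipeline with each instruction]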
IF
Stage 1: Instruction Fetch
Fetch a new instruction every cycle
Current PC is the index into instruction memory
Increment the PC at the end of the cycle (assume no branches for now)
Write values of interest to pipeline register (IF/ID)
Instruction bits (for later decoding)
PC+4 (for later computing branch targets)
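The following minimal Python sketch models the fetch stage under the same "no branches" assumption. The `IFID` class and `fetch` function are hypothetical names. Instruction memory is modeled as a dict from byte address to 32-bit word, and the two words are standard MIPS32 encodings chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class IFID:
    """Hypothetical model of the IF/ID pipeline register."""
    inst: int       # instruction bits, decoded later in ID
    pc_plus_4: int  # kept for computing branch targets later


def fetch(pc, imem):
    """One IF cycle: read imem at PC, latch IF/ID, and return the incremented PC."""
    inst = imem.get(pc, 0)  # assumption: unmapped addresses read as 0x00000000 (a MIPS nop)
    latch = IFID(inst=inst, pc_plus_4=pc + 4)
    return latch, pc + 4    # no branches yet, so the next PC is always PC+4


# Instruction memory modeled as {byte address: 32-bit word}.
imem = {0: 0x012A4020, 4: 0x8D280014}  # add $t0,$t1,$t2 ; lw $t0,20($t1)
pc = 0
for _ in range(3):
    latch, pc = fetch(pc, imem)
    print(f"inst={latch.inst:#010x} pc_plus_4={latch.pc_plus_4}")
```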
[Figure: IF stage: the PC indexes instruction memory (addr, mc), and a +4 adder produces the new PC]
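[Figure: IF stage detail: instruction memory read with mc = 00 (read word) and a write enable (WE); PC register; +4 adder; a pcsel mux chooses the new PC from PC+4, pcrel, pcabs, or pcreg supplied by the rest of the pipeline; inst and PC+4 are written to IF/ID]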
ID
Stage 2: Instruction Decode
On every cycle:
Read IF/ID pipeline register to get instruction bits
Decode instruction, generate control signals
Read from register file
Write values of interest to pipeline register (ID/EX)
Control information, Rd index, immediates, offsets
Contents of Ra, Rb
PC+4 (for computing branch targets later)
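The following sketch models decode for a tiny subset of MIPS (R-type add, lw, sw) and uses the standard MIPS field positions. The control-signal names (`reg_write`, `mem_read`, `mem_write`, `alu_src_imm`) and the `IDEX` layout are simplified, hypothetical choices. They are not the course datapath's actual signals.

```python
from dataclasses import dataclass


@dataclass
class IDEX:
    """Hypothetical, simplified ID/EX pipeline register."""
    reg_write: bool    # WB will write a register
    mem_read: bool     # MEM performs a load
    mem_write: bool    # MEM performs a store
    alu_src_imm: bool  # ALU's second operand is the immediate rather than B
    rd: int            # destination register index carried down to WB
    a: int             # contents of rs
    b: int             # contents of rt
    imm: int           # sign-extended 16-bit immediate
    pc_plus_4: int     # for branch targets in EX


def sign_extend16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def decode(inst, pc_plus_4, regs):
    """One ID cycle for a tiny subset of MIPS: R-type add, lw, and sw."""
    op = (inst >> 26) & 0x3F
    rs, rt, rd = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F, (inst >> 11) & 0x1F
    if op == 0x00:    # R-type; this sketch assumes funct = 0x20 (add)
        ctrl = dict(reg_write=True, mem_read=False, mem_write=False, alu_src_imm=False, rd=rd)
    elif op == 0x23:  # lw: the destination is rt
        ctrl = dict(reg_write=True, mem_read=True, mem_write=False, alu_src_imm=True, rd=rt)
    elif op == 0x2B:  # sw: no register is written
        ctrl = dict(reg_write=False, mem_read=False, mem_write=True, alu_src_imm=True, rd=0)
    else:
        raise NotImplementedError(f"opcode {op:#x} is not modeled in this sketch")
    return IDEX(a=regs[rs], b=regs[rt], imm=sign_extend16(inst), pc_plus_4=pc_plus_4, **ctrl)


regs = [0] * 32
regs[9], regs[10] = 36, 9  # hypothetical contents of $t1 and $t2
print(decode(0x012A4020, 4, regs))  # add $t0, $t1, $t2
print(decode(0x8D280014, 8, regs))  # lw  $t0, 20($t1)
```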
[Figure: ID stage: inst from IF/ID supplies register indices Ra and Rb to the register file (write port: D, Rd, WE); the register values and PC+4 are written to ID/EX]
[Figure: ID stage detail: a decode block generates control signals and an extend unit widens the immediate; the register file's write port (result, dest) is driven from the Write Back stage]
EX
Stage 3: Execute
On every cycle:
Read ID/EX pipeline register to get values and control bits
Perform ALU operation
Compute targets (PC+4+offset, etc.) in case this is a branch
Decide if jump/branch should be taken
Write values of interest to pipeline register (EX/MEM)
Control information, Rd index
Result of ALU operation
Value in case this is a memory store instruction
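The following sketch models the execute stage: an ALU operation plus branch-target computation. It assumes standard MIPS semantics, where the 16-bit branch offset counts words, so the target is PC+4 plus the sign-extended offset shifted left by 2. The slide's "PC+4+offset" expresses the same idea with the offset already in bytes. The operation names and the `execute` signature are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Wrap an integer to a 32-bit two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """A few 32-bit ALU operations (hypothetical operation names)."""
    results = {
        "add": a + b,
        "sub": a - b,
        "and": a & b,
        "nand": ~(a & b),
    }
    return to_signed32(results[op])


def execute(op, a, b, imm, pc_plus_4, use_imm=False, is_beq=False):
    """One EX cycle: ALU result, branch target, and the branch decision."""
    result = alu(op, a, imm if use_imm else b)
    # MIPS branch offsets count words, so shift left 2 before adding to PC+4.
    target = (pc_plus_4 + (imm << 2)) & MASK32
    taken = is_beq and a == b  # equality comparison for beq
    return result, target, taken


print(execute("add", 9, 0, 20, 12, use_imm=True))  # lw address: 9 + 20
print(execute("sub", 5, 5, -2, 100, is_beq=True))  # beq back 2 words from PC+4
```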
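[Figure: EX stage: the ALU operates on values from ID/EX, with imm as an optional operand; target logic produces the pcrel, pcabs, and pcreg next-PC candidates; the branch? decision, ORed with jump (j), drives pcsel in Instruction Fetch; the ALU result, target, and ctrl are written to EX/MEM]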
MEM
Stage 4: Memory
On every cycle:
Read EX/MEM pipeline register to get values and control bits
Perform memory load/store if needed
address is the ALU result
Write values of interest to pipeline register (MEM/WB)
Control information, Rd index
Result of memory operation
Pass result of ALU operation
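The following sketch models the memory stage. The `EXMEM` and `MEMWB` layouts are hypothetical simplifications. Data memory is a dict keyed by address, so word versus byte granularity is ignored. The addresses and values (29, 99, 57, 22) are illustrative and match the worked example later in the lecture.

```python
from dataclasses import dataclass


@dataclass
class EXMEM:
    """Hypothetical, simplified EX/MEM pipeline register."""
    mem_read: bool
    mem_write: bool
    reg_write: bool
    rd: int
    alu_result: int   # also the memory address for loads and stores
    store_value: int  # contents of rt, used only by stores


@dataclass
class MEMWB:
    """Hypothetical, simplified MEM/WB pipeline register."""
    reg_write: bool
    mem_to_reg: bool  # WB should pick the loaded value instead of the ALU result
    rd: int
    mem_data: int
    alu_result: int   # passed through unchanged


def memory_stage(x, dmem):
    """One MEM cycle; dmem is a dict {address: value}."""
    data = 0
    if x.mem_read:
        data = dmem.get(x.alu_result, 0)  # assumption: untouched memory reads as 0
    if x.mem_write:
        dmem[x.alu_result] = x.store_value
    return MEMWB(reg_write=x.reg_write, mem_to_reg=x.mem_read, rd=x.rd,
                 mem_data=data, alu_result=x.alu_result)


dmem = {29: 99}
print(memory_stage(EXMEM(True, False, True, 4, 29, 0), dmem))    # a load from address 29
print(memory_stage(EXMEM(False, True, False, 0, 57, 22), dmem))  # a store of 22 to address 57
print(dmem)
```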
[Figure: MEM stage: data memory uses the ALU result as addr, the store value as din, and mc as the access mode; dout and ctrl are written to MEM/WB]
[Figure: MEM stage together with the branch path: the branch? decision and the pcrel, pcabs, and pcreg targets feed pcsel in Instruction Fetch; data memory (addr, din, dout, mc) feeds MEM/WB]
WB
Stage 5: Writeback
On every cycle:
Read MEM/WB pipeline register to get values and control bits
Select value and write to register file
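The write-back selection can be sketched as a single mux plus a guarded register write. The argument names are hypothetical, and treating register 0 as hard-wired to zero is an assumption borrowed from MIPS.

```python
def write_back(regs, reg_write, mem_to_reg, rd, mem_data, alu_result):
    """One WB cycle: pick the loaded value or the ALU result and write it to regs[rd]."""
    if reg_write and rd != 0:  # assumption: register 0 is hard-wired to zero, as in MIPS
        regs[rd] = mem_data if mem_to_reg else alu_result
    return regs


regs = [0] * 8                            # an eight-register machine
write_back(regs, True, False, 3, 0, 45)   # ALU instruction: write the ALU result
write_back(regs, True, True, 4, 99, 29)   # load: write the value read from memory
write_back(regs, False, False, 7, 0, 57)  # store: nothing is written
print(regs)
```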
[Figure: WB stage reading control bits from MEM/WB]
[Figure: WB stage: the selected result and its dest register index are sent back to the register file's write port (D)]
[Figure: complete five-stage pipelined datapath, with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB carrying the opcode (OP), destination register (Rd), and data values between stages]
Administrivia
HW2 due today
Fill out the survey online. Receive credit/points on homework for the survey:
https://cornell.qualtrics.com/SE/?SID=SV_5olFfZiXoWz6pKI
Survey is anonymous
Project 1 (PA1) due the week after the prelim
Continue working diligently. Use design doc momentum
Save your work!
Save often. Verify the file is non-zero. Periodically save to Dropbox, email.
Beware of Mac OS X 10.5 (Leopard) and 10.6 (Snow Leopard)
Use your resources
Lab Section, Piazza.com, Office Hours, Homework Help Session,
Class notes, book, Sections, CSUG Lab
Administrivia
Prelim 1: next Tuesday, February 28th, in the evening
We will start at 7:30pm sharp, so come early
Prelim Review: This Wed/Fri, 3:30-5:30pm, in 155 Olin
Closed Book
Cannot use electronic devices or outside material
Practice prelims are online in CMS
Material covered: everything up to the end of this week
Appendix C (logic, gates, FSMs, memory, ALUs)
Chapter 4 (pipelined [and non-pipelined] MIPS processor with hazards)
Chapter 2 (Numbers/Arithmetic, simple MIPS instructions)
Chapter 1 (Performance)
HW1, HW2, Lab0, Lab1, Lab2
Administrivia
Check online syllabus/schedule
http://www.cs.cornell.edu/Courses/CS3410/2012sp/schedule.html
Slides and Reading for lectures
Office Hours
Homework and Programming Assignments
Prelims (in evenings):
Tuesday, February 28th
Thursday, March 29th
Thursday, April 26th
Schedule is subject to change
Collaboration, Late, Regrading Policies
Blackboard Collaboration Policy
Can discuss approach together on a blackboard
Leave and write up solution independently
Do not copy solutions
Late Policy
Each person has a total of four slip days
Max of two slip days for any individual assignment
Slip days are deducted first for any late assignment; you cannot selectively apply slip days
For projects, slip days are deducted from all partners
20% deducted per day late after slip days are exhausted
Regrade policy
Submit a written request to the lead TA, and the lead TA will pick a different grader
Submit another written request, and the lead TA will regrade directly
Submit yet another written request for the professor to regrade
Example: Sample Code (Simple)
Slides thanks to Sally McKee
Assume an eight-register machine
Run the following code on a pipelined datapath:
add  r3, r1, r2
nand r6, r4, r5
lw   r4, 20(r2)
add  r5, r2, r5
sw   r7, 12(r3)
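Before following the pipeline, it helps to know what result it must produce. The sketch below executes the five instructions one at a time, which is the ISA-level (logical) view. The starting register values (R0-R7 = 0, 36, 9, 12, 18, 7, 41, 22) and memory[29] = 99 are read off the datapath snapshots in these slides. Treating values as 32-bit two's-complement integers and indexing memory directly by address are assumptions.

```python
MASK32 = 0xFFFFFFFF


def s32(x):
    """Wrap to a 32-bit two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


# (op, x, y, z): add/nand -> (dest, src1, src2); lw/sw -> (data reg, base reg, offset)
PROGRAM = [
    ("add", 3, 1, 2),   # add  r3, r1, r2
    ("nand", 6, 4, 5),  # nand r6, r4, r5
    ("lw", 4, 2, 20),   # lw   r4, 20(r2): r4 <- mem[r2 + 20]
    ("add", 5, 2, 5),   # add  r5, r2, r5
    ("sw", 7, 3, 12),   # sw   r7, 12(r3): mem[r3 + 12] <- r7
]


def run_sequential(regs, mem):
    """Execute PROGRAM one instruction at a time (no pipelining)."""
    regs, mem = list(regs), dict(mem)
    for op, x, y, z in PROGRAM:
        if op == "add":
            regs[x] = s32(regs[y] + regs[z])
        elif op == "nand":
            regs[x] = s32(~(regs[y] & regs[z]))
        elif op == "lw":
            regs[x] = mem.get(regs[y] + z, 0)
        elif op == "sw":
            mem[regs[y] + z] = regs[x]
    return regs, mem


regs, mem = run_sequential([0, 36, 9, 12, 18, 7, 41, 22], {29: 99})
print(regs)  # [0, 36, 9, 45, 99, 16, -3, 22]
print(mem)   # {29: 99, 57: 22}
```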
MIPS instruction formats
All MIPS instructions are 32 bits long and come in 3 formats:
R-type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | func (6 bits)
I-type: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J-type: op (6 bits) | immediate (target address) (26 bits)
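All three formats share the 6-bit op field in the same place. A decoder can therefore extract every field unconditionally and let the opcode decide which ones matter. This sketch uses the standard MIPS bit positions. The example words 0x012A4020 (add $t0, $t1, $t2) and 0x8D280014 (lw $t0, 20($t1)) are standard MIPS32 encodings chosen for illustration, and the function names are hypothetical.

```python
def fields(word):
    """Extract every MIPS field; the opcode decides which ones are meaningful."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26 (all formats)
        "rs": (word >> 21) & 0x1F,     # bits 25-21 (R, I)
        "rt": (word >> 16) & 0x1F,     # bits 20-16 (R, I)
        "rd": (word >> 11) & 0x1F,     # bits 15-11 (R)
        "shamt": (word >> 6) & 0x1F,   # bits 10-6  (R)
        "func": word & 0x3F,           # bits 5-0   (R)
        "imm": word & 0xFFFF,          # bits 15-0  (I)
        "target": word & 0x3FFFFFF,    # bits 25-0  (J)
    }


def encode_r(op, rs, rt, rd, shamt, func):
    """Pack R-type fields back into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func


add = fields(0x012A4020)  # add $t0, $t1, $t2
lw = fields(0x8D280014)   # lw  $t0, 20($t1)
print({k: add[k] for k in ("op", "rs", "rt", "rd", "shamt", "func")})
print({k: lw[k] for k in ("op", "rs", "rt", "imm")})
```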
MIPS Instruction Types
Arithmetic/Logical
R-type: result and two source registers, shift amount
I-type: 16-bit immediate with sign/zero extension
Memory Access
load/store between registers and memory
word, half-word and byte operations
Control flow
conditional branches: pc-relative addresses
jumps: fixed offsets, register absolute
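The following sketch covers immediate extension and the two kinds of control-flow addresses, following standard MIPS behavior. Sign extension applies to addi, load/store offsets, and branch offsets. Zero extension applies to logical immediates such as andi and ori. Branches are PC-relative in words, and j combines the top 4 bits of PC+4 with a 26-bit word index. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Copy bit 15 into the upper bits (addi, lw/sw offsets, branch offsets)."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def zero_extend16(imm):
    """Fill the upper bits with zeros (logical immediates such as andi and ori)."""
    return imm & 0xFFFF


def branch_target(pc, imm16):
    """PC-relative: the sign-extended offset counts words from PC+4."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


def jump_target(pc, target26):
    """j/jal: upper 4 bits of PC+4 joined with the 26-bit word index shifted left 2."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


# jr is "register absolute": the new PC is simply the value in the register.
print(sign_extend16(0xFFFF), zero_extend16(0xFFFF))  # -1 65535
print(hex(branch_target(0x100, 0xFFFF)))             # branch back one word: 0x100
```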
Time Graphs
Clock cycle:  1    2    3    4    5    6    7    8    9
add           IF   ID   EX   MEM  WB
nand               IF   ID   EX   MEM  WB
lw                      IF   ID   EX   MEM  WB
add                          IF   ID   EX   MEM  WB
sw                                IF   ID   EX   MEM  WB
Latency:
Throughput:
Concurrency:
[Figure: pipelined datapath used in the example: PC and instruction memory; IF/ID (instruction, PC+4); register file R0-R7 read through regA and regB, plus a sign-extend unit; ID/EX (op, valA, valB, imm, dest); ALU with operand mux; EX/MEM (op, ALU result, valB, dest); data memory; MEM/WB (op, data, dest), with a mux selecting the ALU result or memory data for write-back]
[Figure, Time 1: IF fetches add r3,r1,r2; ID, EX, MEM, and WB hold nops. Register file: R0=0, R1=36, R2=9, R3=12, R4=18, R5=7, R6=41, R7=22]
[Figure, Time 2: IF fetches nand r6,r4,r5; ID decodes add r3,r1,r2, reading R1 = 36 and R2 = 9 with destination 3; EX, MEM, and WB hold nops]
[Figure, Time 3: IF fetches lw r4,20(r2); ID decodes nand r6,r4,r5, reading R4 = 18 and R5 = 7 with destination 6; EX executes add: 36 + 9 = 45 (destination 3); MEM and WB hold nops]
[Figure, Time 4: IF fetches add r5,r2,r5; ID decodes lw r4,20(r2), reading base R2 = 9 and offset 20 with destination 4; EX executes nand: nand(18, 7) = -3 (destination 6); MEM passes add's result 45 (destination 3); WB holds a nop]
[Figure, Time 5: IF fetches sw r7,12(r3); ID decodes add r5,r2,r5, reading R2 = 9 and R5 = 7 with destination 5; EX computes the lw address 9 + 20 = 29; MEM passes nand's result -3; WB writes R3 = 45]
[Figure, Time 6: no more instructions to fetch; ID decodes sw r7,12(r3), reading R3 = 45 and R7 = 22 with offset 12; EX executes add: 9 + 7 = 16 (destination 5); MEM loads memory[29] = 99 for lw (destination 4); WB writes R6 = -3]
[Figure, Time 7: IF and ID hold nops; EX computes the sw address 45 + 12 = 57; MEM passes add's result 16 (destination 5); WB writes R4 = 99]
[Figure, Time 8: IF, ID, and EX hold nops; MEM stores R7 = 22 to memory[57] for sw; WB writes R5 = 16]
[Figure, Time 9: sw reaches WB and writes no register; all other stages hold nops. Final register file: R0=0, R1=36, R2=9, R3=45, R4=99, R5=16, R6=-3, R7=22]
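The nine snapshots can be reproduced with a small cycle-by-cycle simulator. This is a sketch, not the course's simulator. Pipeline registers are Python dicts with hypothetical field names. WB is assumed to write the register file before ID reads it in the same cycle, and there is no forwarding or stall logic. On the example, it runs for nine cycles and ends with the same register file as the Time 9 snapshot.

```python
MASK32 = 0xFFFFFFFF


def s32(x):
    """Wrap to a 32-bit two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


# (op, x, y, z): add/nand -> (dest, src1, src2); lw/sw -> (data reg, base reg, offset)
PROGRAM = [("add", 3, 1, 2), ("nand", 6, 4, 5), ("lw", 4, 2, 20),
           ("add", 5, 2, 5), ("sw", 7, 3, 12)]


def simulate(program, regs, mem):
    """Cycle-by-cycle 5-stage pipeline with no forwarding and no stalls."""
    regs, mem = list(regs), dict(mem)
    pc, history = 0, []
    if_id = id_ex = ex_mem = mem_wb = None  # None models a nop (bubble)
    while True:
        # WB: write the value latched into MEM/WB on the previous cycle.
        if mem_wb is not None and mem_wb["dest"] is not None:
            regs[mem_wb["dest"]] = mem_wb["value"]
        # MEM: loads read memory, stores write it, ALU results pass through.
        new_mem_wb = None
        if ex_mem is not None:
            op, result = ex_mem["op"], ex_mem["result"]
            if op == "lw":
                value = mem.get(result, 0)
            elif op == "sw":
                mem[result] = ex_mem["store"]
                value = None
            else:
                value = result
            new_mem_wb = {"dest": ex_mem["dest"], "value": value}
        # EX: nand, or an add (which also forms lw/sw addresses).
        new_ex_mem = None
        if id_ex is not None:
            op, a, b = id_ex["op"], id_ex["a"], id_ex["b"]
            result = s32(~(a & b)) if op == "nand" else s32(a + b)
            new_ex_mem = {"op": op, "dest": id_ex["dest"], "result": result,
                          "store": id_ex["store"]}
        # ID: read registers after this cycle's WB (assumed write-then-read).
        new_id_ex = None
        if if_id is not None:
            op, x, y, z = if_id
            if op in ("add", "nand"):
                new_id_ex = {"op": op, "dest": x, "a": regs[y], "b": regs[z], "store": None}
            elif op == "lw":
                new_id_ex = {"op": op, "dest": x, "a": regs[y], "b": z, "store": None}
            else:  # sw writes no register
                new_id_ex = {"op": op, "dest": None, "a": regs[y], "b": z, "store": regs[x]}
        # IF: fetch the next instruction, if there is one.
        new_if_id = None
        if pc < len(program):
            new_if_id, pc = program[pc], pc + 1
        # Clock edge: all pipeline registers update together.
        if_id, id_ex, ex_mem, mem_wb = new_if_id, new_id_ex, new_ex_mem, new_mem_wb
        history.append(list(regs))  # register file as seen during this cycle
        if if_id is None and id_ex is None and ex_mem is None and mem_wb is None:
            return regs, mem, history


regs, mem, history = simulate(PROGRAM, [0, 36, 9, 12, 18, 7, 41, 22], {29: 99})
print(len(history), "cycles")
print(regs, mem)
```

Without hazard handling, this sketch only works because no instruction reads a register written by either of the two instructions just ahead of it. The only dependence, sw reading r3 from the first add, spans four instructions. Resolving pipeline hazards, listed above for Thursday, removes this restriction.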
Pipelining Recap
Powerful technique for masking latencies
Logically, instructions execute one at a time
Physically, instructions execute in parallel
Instruction-level parallelism
Abstraction promotes decoupling
Interface (ISA) vs. implementation (Pipeline)
[Figure: pipelined datapath with the example program in instruction memory (0: add r3,r1,r2; 1: nand r6,r4,r5; 2: lw r4,20(r2); 3: add r5,r2,r5; 4: sw r7,12(r3)) and register file r0-r7 = 0, 36, 9, 12, 18, 7, 41, 22]