Chapter 5 Exploiting Memory Hierarchy
Saturday, May 22, 2010 Chapter 5 — Large and Fast: Exploiting Memory Hierarchy 2
Q. How do architects address this gap?
A. Put smaller, faster "cache" memories between the CPU and DRAM, creating a "memory hierarchy".

[Figure: processor-memory performance gap — performance (1/latency) vs. year, with CPU performance growing 60% per year (2X in 1.5 yrs).]
Saturday, May 22, 2010 CSEN 601 - Computer Architecture, Spring 2010
Starting point on the figure's Year axis: Apple ][ (1977) — CPU: 1000 ns, DRAM: 400 ns.
The memory hierarchy, from the upper level (smaller, faster, costlier; transfers are faster between levels near the top) to the lower level (larger, slower, cheaper):

Level           Capacity     Access time            Cost/bit             Staging unit (xfer size)    Managed by
CPU Registers   100s bytes   <10s ns                                     Instr. operands (1-8 B)     prog./compiler
Cache           K bytes      10-100 ns              1-0.1 cents          Blocks (8-128 B)            cache controller
Main Memory     M bytes      200 ns - 500 ns        .0001-.00001 cents   Pages (512 B - 4 KB)        OS
Disk            G bytes      10 ms (10,000,000 ns)  10^-5 - 10^-6 cents  Files (MBytes)              user/operator
Tape            infinite     sec-min                10^-8 cents                                      user/operator
[Figure: example cache hierarchy — Registers, L1 cache (64K instruction), L2 cache (512K).]
Block (aka line): the unit of copying between levels
May be multiple words

If accessed data is present in the upper level:
Hit: access satisfied by the upper level
▪ Hit ratio: hits/accesses

If accessed data is absent:
Miss: block copied from the lower level
▪ Time taken: miss penalty
▪ Miss ratio: misses/accesses = 1 - hit ratio
Then the accessed data is supplied from the upper level

Two questions follow: How do we know if the data is present? Where do we look?
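The ratio definitions can be checked with a trivial calculation; the access counts below are made up purely for illustration:

```python
# Hypothetical access counts, chosen only to illustrate the definitions above.
hits, accesses = 950, 1000

hit_ratio = hits / accesses     # hits / accesses
miss_ratio = 1 - hit_ratio      # misses / accesses = 1 - hit ratio

print(f"hit ratio = {hit_ratio:.2%}, miss ratio = {miss_ratio:.2%}")
# hit ratio = 95.00%, miss ratio = 5.00%
```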
Location is determined by the address
Direct mapped: only one choice
(Block address) modulo (#Blocks in cache)
#Blocks is a power of 2, so use the low-order address bits
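Because #Blocks is a power of two, the modulo operation is just a matter of keeping the low-order bits. A minimal sketch, using block address 22 from the worked example in these slides:

```python
# Direct-mapped placement for a cache with 8 (= 2**3) blocks.
NUM_BLOCKS = 8

block_addr = 22                   # binary 10110
index = block_addr % NUM_BLOCKS   # low-order 3 bits: 110 -> cache block 6
tag = block_addr // NUM_BLOCKS    # remaining high-order bits: 10 -> 2

print(index, tag)                 # 6 2
```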
How do we know which particular block is stored in a cache location?
Store the block address as well as the data
Actually, only the high-order bits are needed
Called the tag
What if there is no data in a location?
Valid bit: 1 = present, 0 = not present
Initially 0
8 blocks, 1 word/block, direct mapped
Initial state:
Index V Tag Data
000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N
Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Miss      110
26         11 010       Miss      010
22         10 110       Hit       110
26         11 010       Hit       010
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000
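The access sequence above can be replayed with a small simulator. This is a sketch written for these notes, not code from the slides; it models 8 one-word blocks with valid bits and tags, exactly as in the example:

```python
# Minimal direct-mapped cache simulation: 8 blocks, 1 word per block.
NUM_BLOCKS = 8
cache = [{"valid": False, "tag": None} for _ in range(NUM_BLOCKS)]

def access(addr):
    """Return 'Hit' or 'Miss' for a word address, updating the cache."""
    index = addr % NUM_BLOCKS    # low-order bits select the cache block
    tag = addr // NUM_BLOCKS     # high-order bits identify which block it is
    line = cache[index]
    if line["valid"] and line["tag"] == tag:
        return "Hit"
    line["valid"], line["tag"] = True, tag   # copy block from the lower level
    return "Miss"

# The access sequence from the table above:
for a in (22, 26, 22, 26, 16, 3, 16):
    print(a, access(a))   # Miss, Miss, Hit, Hit, Miss, Miss, Hit
```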
Larger blocks should reduce the miss rate
Due to spatial locality
But in a fixed-sized cache:
Larger blocks ⇒ fewer of them
▪ More competition ⇒ increased miss rate
Larger blocks ⇒ pollution
Larger miss penalty
Can override the benefit of the reduced miss rate
Early restart and critical-word-first can help (will be studied in the Computer Architecture course)
On cache hit, CPU proceeds normally
On cache miss
Stall the CPU pipeline
Fetch block from next level of hierarchy
Instruction cache miss
▪ Restart instruction fetch
Data cache miss
▪ Complete data access
On a data-write hit, we could just update the block in cache
But then cache and memory would be inconsistent
Write through: also update memory
But this makes writes take longer
e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles
▪ Effective CPI = 1 + 0.1 × 100 = 11
Solution: write buffer
Holds data waiting to be written to memory
CPU continues immediately
▪ Only stalls on a write if the write buffer is already full
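The effective-CPI arithmetic above can be checked directly. A trivial sketch (the function name is mine, not from the slides), modeling write-through with no write buffer, where every store stalls for the full memory-write latency:

```python
def effective_cpi(base_cpi, store_fraction, write_penalty):
    """CPI when every store stalls write_penalty cycles
    (write-through, no write buffer)."""
    return base_cpi + store_fraction * write_penalty

# Base CPI 1, 10% stores, 100-cycle memory write:
print(effective_cpi(1, 0.10, 100))   # 11.0
```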
Alternative (write back): on a data-write hit, just update the block in cache
Keep track of whether each block is dirty
When a dirty block is replaced:
Write it back to memory
Can use a write buffer to allow the replacing block to be read first
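The dirty-bit bookkeeping can be sketched for a direct-mapped cache with write-allocate. This is an illustrative sketch written for these notes; the field names and addresses are assumptions, not from the slides:

```python
# Write-back bookkeeping for a direct-mapped, write-allocate cache.
NUM_BLOCKS = 8
cache = [{"valid": False, "dirty": False, "tag": None} for _ in range(NUM_BLOCKS)]
writebacks = []   # block addresses that had to be written back to memory

def write(addr):
    index, tag = addr % NUM_BLOCKS, addr // NUM_BLOCKS
    line = cache[index]
    if not (line["valid"] and line["tag"] == tag):        # write miss
        if line["valid"] and line["dirty"]:               # victim is dirty:
            writebacks.append(line["tag"] * NUM_BLOCKS + index)  # write it back
        line["valid"], line["tag"] = True, tag            # fetch the block
    line["dirty"] = True                                  # update only the cached copy

write(6)    # miss, but the victim line is empty: no write-back
write(14)   # maps to the same line (14 % 8 == 6): dirty block 6 is written back
print(writebacks)   # [6]
```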
What should happen on a write miss?
Alternatives for write-through:
Allocate on miss: fetch the block
Write around: don't fetch the block
▪ Since programs often write a whole block before reading it (e.g., initialization)
For write-back:
Usually fetch the block
Dr. Mohamed F. Ahmed
Chapter 5 Exploiting Memory Hierarchy
The exam will cover only what we covered after the mid-term.
What you are taking away with you is the basic concepts of computer organization: it is about understanding and appreciating the various design options and tradeoffs inside a computing machine.
What you will get by the end of this course does not represent more than 5% to 7% of the body of knowledge in this area of computer science, but its significance is very high for you as a computer scientist/engineer.
Our job in this exam is to measure your understanding and give you feedback about your level in this course.
If you already read the book, go through it quickly again. If not, you must.
Go through the slides to know which sections are covered; this is written in the margins of the slides.
Solve the practice assignments again. The majority of the Final Exam questions will be similar to these questions.
Make sure you understand the rationale behind the CPU & pipeline design.
Converting HLC to MIPS code
Tracing MIPS code and writing the output
Finding problems with MIPS code
Tracing the execution of one or more instructions on the detailed diagram
Hazard detection and resolution
Basics of the memory hierarchy, i.e., where data should be placed, when it is replaced, etc.
Tracing instructions in the pipeline
A change in the pipeline and its effect on the datapath