Carnegie Mellon

The Memory Hierarchy
15-213: Introduction to Computer Systems
9th Lecture, Sep. 21, 2010
Instructors:
Randy Bryant and Dave O'Hallaron

Today
• Storage technologies and trends
• Locality of reference
• Caching in the memory hierarchy

Random-Access Memory (RAM)
• Key features
  - RAM is traditionally packaged as a chip.
  - Basic storage unit is normally a cell (one bit per cell).
  - Multiple RAM chips form a memory.
• Static RAM (SRAM)
  - Each cell stores a bit with a four- or six-transistor circuit.
  - Retains value indefinitely, as long as it is kept powered.
  - Relatively insensitive to electrical noise (EMI), radiation, etc.
  - Faster and more expensive than DRAM.
• Dynamic RAM (DRAM)
  - Each cell stores a bit with a capacitor. One transistor is used for access.
  - Value must be refreshed every 10-100 ms.
  - More sensitive to disturbances (EMI, radiation, ...) than SRAM.
  - Slower and cheaper than SRAM.

SRAM vs DRAM Summary
      Trans.   Access  Needs     Needs
      per bit  time    refresh?  EDC?   Cost  Applications
SRAM  4 or 6   1X      No        Maybe  100x  Cache memories
DRAM  1        10X     Yes       Yes    1X    Main memories, frame buffers

Conventional DRAM Organization
• d x w DRAM:
  - dw total bits organized as d supercells of size w bits
[Figure: a 16 x 8 DRAM chip drawn as a 4 x 4 array of supercells (rows 0-3, cols 0-3) with an internal row buffer; supercell (2,1) is highlighted. The memory controller (to/from CPU) connects to the chip through 2-bit addr lines and 8-bit data lines.]

Reading DRAM Supercell (2,1)
[Figure: the memory controller sends row address RAS = 2 over the 2-bit addr lines of the 16 x 8 DRAM chip; row 2 is copied into the internal row buffer.]

Reading DRAM Supercell (2,1)
[Figure: the memory controller sends column address CAS = 1 over the 2-bit addr lines of the 16 x 8 DRAM chip; supercell (2,1) is copied from the internal row buffer onto the 8-bit data lines and passed on to the CPU.]
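The two-step addressing above can be sketched in C. This is only an illustration: the linear, row-major supercell index and the names `dram_addr` and `split_supercell` are assumptions introduced here, not part of the lecture.

```c
/* Hypothetical geometry matching the 16 x 8 chip on the slides:
   16 supercells arranged as 4 rows x 4 columns. */
enum { ROW_BITS = 2, COL_BITS = 2 };

typedef struct {
    unsigned ras;   /* row address, sent first */
    unsigned cas;   /* column address, sent second */
} dram_addr;

/* Split a supercell index (0..15, assumed row-major) into the row and
   column addresses that travel over the same 2-bit addr lines. */
static dram_addr split_supercell(unsigned index)
{
    dram_addr a;
    a.ras = (index >> COL_BITS) & ((1u << ROW_BITS) - 1);
    a.cas = index & ((1u << COL_BITS) - 1);
    return a;
}
```

Under this numbering, supercell (2,1) is index 9, which splits into RAS = 2 and CAS = 1. Sending the address in two halves is what lets the chip get by with 2 address pins instead of 4.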

Memory Modules
[Figure: a 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 to DRAM 7). The memory controller sends addr (row = i, col = j) to every chip, and each chip returns its supercell (i,j). DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, DRAM 2 bits 16-23, DRAM 3 bits 24-31, DRAM 4 bits 32-39, DRAM 5 bits 40-47, DRAM 6 bits 48-55, and DRAM 7 bits 56-63 of the 64-bit doubleword at main memory address A.]
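A minimal sketch of how the memory controller could combine the eight supercells into one doubleword, assuming chip k supplies bits 8k to 8k+7 as in the figure. The function name `assemble_doubleword` is hypothetical.

```c
#include <stdint.h>

enum { NUM_DRAMS = 8 };

/* Combine the 8-bit supercell returned by each chip into one 64-bit
   doubleword. Chip k is assumed to supply bits 8k .. 8k+7. */
static uint64_t assemble_doubleword(const uint8_t supercell[NUM_DRAMS])
{
    uint64_t word = 0;
    for (int k = 0; k < NUM_DRAMS; k++)
        word |= (uint64_t)supercell[k] << (8 * k);
    return word;
}
```

All eight chips see the same (row, col) address, so one addressing step yields 64 bits.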

Enhanced DRAMs
• Basic DRAM cell has not changed since its invention in 1966.
  - Commercialized by Intel in 1970.
• DRAM cores with better interface logic and faster I/O:
  - Synchronous DRAM (SDRAM)
    - Uses a conventional clock signal instead of asynchronous control
    - Allows reuse of the row addresses (e.g., RAS, CAS, CAS, CAS)
  - Double data-rate synchronous DRAM (DDR SDRAM)
    - Double edge clocking sends two bits per cycle per pin
    - Different types distinguished by size of small prefetch buffer:
      DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits)
    - By 2010, standard for most server and desktop systems
    - Intel Core i7 supports only DDR3 SDRAM

Nonvolatile Memories
• DRAM and SRAM are volatile memories
  - Lose information if powered off.
• Nonvolatile memories retain value even if powered off
  - Read-only memory (ROM): programmed during production
  - Programmable ROM (PROM): can be programmed once
  - Erasable PROM (EPROM): can be bulk erased (UV, X-Ray)
  - Electrically erasable PROM (EEPROM): electronic erase capability
  - Flash memory: EEPROMs with partial (sector) erase capability
    - Wears out after about 100,000 erasings.
• Uses for Nonvolatile Memories
  - Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, ...)
  - Solid state disks (replace rotating disks in thumb drives, smart phones, mp3 players, tablets, laptops, ...)
  - Disk caches

Traditional Bus Structure Connecting CPU and Memory
• A bus is a collection of parallel wires that carry address, data, and control signals.
• Buses are typically shared by multiple devices.
[Figure: the CPU chip (register file, ALU, bus interface) is connected by the system bus to the I/O bridge, which is connected by the memory bus to main memory.]

Memory Read Transaction (1)
• CPU places address A on the memory bus.
[Figure: load operation movl A, %eax. The bus interface drives address A through the I/O bridge toward main memory, which holds word x at address A.]

Memory Read Transaction (2)
• Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
[Figure: load operation movl A, %eax. Word x travels from main memory through the I/O bridge toward the bus interface.]

Memory Read Transaction (3)
• CPU reads word x from the bus and copies it into register %eax.
[Figure: load operation movl A, %eax. Register %eax in the register file now holds x.]

Memory Write Transaction (1)
• CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive.
[Figure: store operation movl %eax, A. Register %eax holds y; address A is on the bus.]

Memory Write Transaction (2)
• CPU places data word y on the bus.
[Figure: store operation movl %eax, A. Word y travels from the bus interface through the I/O bridge toward main memory.]

Memory Write Transaction (3)
• Main memory reads data word y from the bus and stores it at address A.
[Figure: store operation movl %eax, A. Main memory now holds y at address A.]

What's Inside A Disk Drive?
[Figure: photo of an opened disk drive with labels for the spindle, arm, actuator, platters, electronics (including a processor and memory!), and SCSI connector. Image courtesy of Seagate Technology.]

Disk Geometry
• Disks consist of platters, each with two surfaces.
• Each surface consists of concentric rings called tracks.
• Each track consists of sectors separated by gaps.
[Figure: one surface around the spindle, with labels for tracks, track k, sectors, and gaps.]

Disk Geometry (Multiple-Platter View)
• Aligned tracks form a cylinder.
[Figure: three platters (platter 0 to platter 2) on one spindle, giving surfaces 0 to 5; cylinder k is made up of track k on every surface.]

Disk Capacity
• Capacity: maximum number of bits that can be stored.
  - Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes (Lawsuit pending! Claims deceptive advertising).
• Capacity is determined by these technology factors:
  - Recording density (bits/in): number of bits that can be squeezed into a 1 inch segment of a track.
  - Track density (tracks/in): number of tracks that can be squeezed into a 1 inch radial segment.
  - Areal density (bits/in^2): product of recording and track density.
• Modern disks partition tracks into disjoint subsets called recording zones
  - Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
  - Each zone has a different number of sectors/track

Computing Disk Capacity
Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
Example:
  - 512 bytes/sector
  - 300 sectors/track (on average)
  - 20,000 tracks/surface
  - 2 surfaces/platter
  - 5 platters/disk
Capacity = 512 x 300 x 20000 x 2 x 5
         = 30,720,000,000 bytes
         = 30.72 GB
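The capacity formula translates directly into C. The function name and parameter names below are illustrative choices, not from the lecture. A 64-bit type is used because the example result does not fit in 32 bits.

```c
#include <stdint.h>

/* Capacity in bytes: the product of the five geometry factors. */
static uint64_t disk_capacity_bytes(uint64_t bytes_per_sector,
                                    uint64_t avg_sectors_per_track,
                                    uint64_t tracks_per_surface,
                                    uint64_t surfaces_per_platter,
                                    uint64_t platters_per_disk)
{
    return bytes_per_sector * avg_sectors_per_track * tracks_per_surface
         * surfaces_per_platter * platters_per_disk;
}
```

Calling `disk_capacity_bytes(512, 300, 20000, 2, 5)` reproduces the 30,720,000,000 bytes of the example.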

Disk Operation (Single-Platter View)
[Figure: a single platter spinning on its spindle, with the arm and read/write head.]
The disk surface spins at a fixed rotational rate.
The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
By moving radially, the arm can position the read/write head over any track.

Disk Operation (Multi-Platter View)
[Figure: several platters stacked on one spindle, with an arm carrying the read/write heads.]
Read/write heads move in unison from cylinder to cylinder.

Disk Structure - top view of single platter
Surface organized into tracks
Tracks divided into sectors

Disk Access
[Figure sequence: top view of a single platter. Each line below is the caption of one animation frame.]
Head in position above a track.
Rotation is counter-clockwise.
Disk Access - Read: about to read blue sector.
Disk Access - Read: after reading blue sector.
Disk Access - Read: red request scheduled next.
Disk Access - Seek: seek to red's track.
Disk Access - Rotational Latency: wait for red sector to rotate around.
Disk Access - Read: complete read of red.
Disk Access - Service Time Components: after the BLUE read, servicing the RED request takes a seek, then rotational latency, then data transfer.

Disk Access Time
• Average time to access some target sector approximated by:
  - Taccess = Tavg seek + Tavg rotation + Tavg transfer
• Seek time (Tavg seek)
  - Time to position heads over cylinder containing target sector.
  - Typical Tavg seek is 3-9 ms
• Rotational latency (Tavg rotation)
  - Time waiting for first bit of target sector to pass under r/w head.
  - Tavg rotation = 1/2 x 1/RPMs x 60 sec/1 min
  - Typical rotational rate = 7200 RPMs
• Transfer time (Tavg transfer)
  - Time to read the bits in the target sector.
  - Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 secs/1 min.

Disk Access Time Example
• Given:
  - Rotational rate = 7,200 RPM
  - Average seek time = 9 ms.
  - Avg # sectors/track = 400.
• Derived:
  - Tavg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec ≈ 4 ms.
  - Tavg transfer = 60/7200 RPM x 1/(400 sectors/track) x 1000 ms/sec ≈ 0.02 ms
  - Taccess = 9 ms + 4 ms + 0.02 ms = 13.02 ms
• Important points:
  - Access time dominated by seek time and rotational latency.
  - First bit in a sector is the most expensive, the rest are free.
  - SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
    - Disk is about 40,000 times slower than SRAM,
    - 2,500 times slower than DRAM.
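The derivation above can be checked with a few lines of C. The function names are illustrative, and all times are in milliseconds. The unrounded results are slightly larger than the rounded figures on the slide (about 4.17 ms rotation and 0.021 ms transfer).

```c
/* Average rotational latency: half of one full rotation, in ms. */
static double avg_rotation_ms(double rpm)
{
    return 0.5 * (60.0 / rpm) * 1000.0;
}

/* Average transfer time: the fraction of one rotation that a single
   sector occupies, in ms. */
static double avg_transfer_ms(double rpm, double avg_sectors_per_track)
{
    return (60.0 / rpm) / avg_sectors_per_track * 1000.0;
}

/* Taccess = Tavg seek + Tavg rotation + Tavg transfer, in ms. */
static double access_ms(double seek_ms, double rpm,
                        double avg_sectors_per_track)
{
    return seek_ms + avg_rotation_ms(rpm)
         + avg_transfer_ms(rpm, avg_sectors_per_track);
}
```

With the slide's inputs, `access_ms(9.0, 7200.0, 400.0)` comes to roughly 13.19 ms, of which the transfer itself is a tiny part.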

Logical Disk Blocks
• Modern disks present a simpler abstract view of the complex sector geometry:
  - The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)
• Mapping between logical blocks and actual (physical) sectors
  - Maintained by hardware/firmware device called disk controller.
  - Converts requests for logical blocks into (surface, track, sector) triples.
• Allows controller to set aside spare cylinders for each zone.
  - Accounts for the difference in "formatted capacity" and "maximum capacity".
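To make the logical-to-physical mapping concrete, here is a deliberately simplified sketch. It assumes a hypothetical uniform geometry with no recording zones and no spare cylinders, one sector per logical block, and blocks that fill a whole cylinder before moving to the next track. Real disk controllers are more involved; the type `chs` and the function `map_logical_block` are invented for illustration.

```c
/* Hypothetical uniform geometry: no zones, no spare cylinders. */
enum { SURFACES = 4, SECTORS_PER_TRACK = 8 };

typedef struct {
    unsigned surface;
    unsigned track;
    unsigned sector;
} chs;

/* Convert a logical block number into a (surface, track, sector) triple,
   assuming one sector per logical block and cylinder-by-cylinder layout. */
static chs map_logical_block(unsigned block)
{
    chs p;
    p.sector  = block % SECTORS_PER_TRACK;
    p.surface = (block / SECTORS_PER_TRACK) % SURFACES;
    p.track   = block / (SECTORS_PER_TRACK * SURFACES);
    return p;
}
```

Under these assumptions, logical block 37 lands on surface 0, track 1, sector 5. Because software only ever sees block numbers, the controller is free to remap a block onto a spare cylinder without the host noticing.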

I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) is connected by the system bus to the I/O bridge and from there by the memory bus to main memory. The I/O bridge also drives the I/O bus, which hosts a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

Reading a Disk Sector (1)
[Figure: the CPU chip, main memory, and the I/O bus with its USB controller (mouse, keyboard), graphics adapter (monitor), and disk controller (disk).]
CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.

Reading a Disk Sector (2)
[Figure: the CPU chip, main memory, and the I/O bus with its USB controller (mouse, keyboard), graphics adapter (monitor), and disk controller (disk).]
Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.

Reading a Disk Sector (3)
[Figure: the CPU chip, main memory, and the I/O bus with its USB controller (mouse, keyboard), graphics adapter (monitor), and disk controller (disk).]
When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special interrupt pin on the CPU).

Solid State Disks (SSDs)
[Figure: a solid state disk (SSD). Requests to read and write logical disk blocks arrive over the I/O bus at the flash translation layer, which sits in front of the flash memory. The flash memory is organized as blocks 0 to B-1, each holding pages 0 to P-1.]
• Pages: 512 B to 4 KB, Blocks: 32 to 128 pages
• Data read/written in units of pages.
• Page can be written only after its block has been erased
• A block wears out after 100,000 repeated writes.

SSD Performance Characteristics
Sequential read tput  250 MB/s    Sequential write tput  170 MB/s
Random read tput      140 MB/s    Random write tput      14 MB/s
Random read access    30 us       Random write access    300 us
• Why are random writes so slow?
  - Erasing a block is slow (around 1 ms)
  - Write to a page triggers a copy of all useful pages in the block
    - Find an unused block (new block) and erase it
    - Write the page into the new block
    - Copy other pages from old block to the new block
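The cost of a single page update can be illustrated by counting page writes. This is a rough sketch under stated assumptions: a block of 32 pages (one value within the 32 to 128 range given for SSD blocks), and a hypothetical `useful[]` array marking which pages of the old block still hold live data. It models only the copy step, not erase time.

```c
#include <stdbool.h>

enum { PAGES_PER_BLOCK = 32 };   /* assumed block size, for illustration */

/* Count the page writes needed to update one page of a block: the
   updated page itself plus every other page that still holds useful
   data and must be copied from the old block to the new block. */
static int page_writes_for_update(const bool useful[PAGES_PER_BLOCK],
                                  int target)
{
    int writes = 1;                      /* the page being updated */
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        if (p != target && useful[p])
            writes++;                    /* copied to the new block */
    return writes;
}
```

In a completely full block, updating one page costs 32 page writes plus a block erase, which gives some intuition for why random writes are much slower than random reads.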

SSD Tradeoffs vs Rotating Disks
• Advantages
  - No moving parts -> faster, less power, more rugged
• Disadvantages
  - Have the potential to wear out
    - Mitigated by "wear leveling logic" in flash translation layer
    - E.g. Intel X25 guarantees 1 petabyte (10^15 bytes) of random writes before they wear out
  - In 2010, about 100 times more expensive per byte
• Applications
  - MP3 players, smart phones, laptops
  - Beginning to appear in desktops and servers

Storage Trends

SRAM
Metric             1980    1985   1990  1995  2000  2005  2010  2010:1980
$/MB               19,200  2,900  320   256   100   75    60    320
access (ns)        300     150    35    15    3     2     1.5   200

DRAM
Metric             1980    1985   1990  1995  2000  2005   2010   2010:1980
$/MB               8,000   880    100   30    1     0.1    0.06   130,000
access (ns)        375     200    100   70    60    50     40     9
typical size (MB)  0.064   0.256  4     16    64    2,000  8,000  125,000

Disk
Metric             1980  1985  1990  1995   2000    2005     2010       2010:1980
$/MB               500   100   8     0.30   0.01    0.005    0.0003     1,600,000
access (ms)        87    75    28    10     8       4        3          29
typical size (MB)  1     10    160   1,000  20,000  160,000  1,500,000  1,500,000

CPU Clock Rates
                           1980  1990  1995     2000   2003  2005    2010     2010:1980
CPU                        8080  386   Pentium  P-III  P-4   Core 2  Core i7  ---
Clock rate (MHz)           1     20    150      600    3300  2000    2500     2500
Cycle time (ns)            1000  50    6        1.6    0.3   0.50    0.4      2500
Cores                      1     1     1        1      1     2       4        4
Effective cycle time (ns)  1000  50    6        1.6    0.3   0.25    0.1      10,000
Inflection point in computer history when designers hit the "Power Wall" (clock rate peaks in the 2003 column).
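The rows of the clock-rate table are related by simple arithmetic, sketched below. The function names are illustrative. Dividing cycle time by the core count is inferred from the 2005 and 2010 columns rather than stated in the lecture.

```c
/* Cycle time in ns for a clock rate given in MHz: 1000 / MHz. */
static double cycle_time_ns(double clock_mhz)
{
    return 1000.0 / clock_mhz;
}

/* Effective cycle time, assuming all cores retire work in parallel. */
static double effective_cycle_time_ns(double clock_mhz, int cores)
{
    return cycle_time_ns(clock_mhz) / cores;
}
```

For the 2010 Core i7 column (2500 MHz, 4 cores) this gives 0.4 ns and 0.1 ns. Some earlier table entries, such as 1.6 ns at 600 MHz, appear to be truncated rather than rounded.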

The CPU-Memory Gap
[Figure: log-scale plot of time in ns (0.1 to 100,000,000) against year (1980 to 2010) for disk seek time, flash SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time. The Disk, SSD, DRAM, and CPU curves are labeled.]

Locality to the Rescue!
The key to bridging this CPU-Memory gap is a fundamental property of computer programs known as locality

Today
• Storage technologies and trends
• Locality of reference
• Caching in the memory hierarchy

Locality
• Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently
• Temporal locality:
  - Recently referenced items are likely to be referenced again in the near future
• Spatial locality:
  - Items with nearby addresses tend to be referenced close together in time

Locality Example
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;
• Data references
  - Reference array elements in succession (stride-1 reference pattern). [Spatial locality]
  - Reference variable sum each iteration. [Temporal locality]
• Instruction references
  - Reference instructions in sequence. [Spatial locality]
  - Cycle through loop repeatedly. [Temporal locality]

Qualitative Estimates of Locality
• Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
• Question: Does this function have good locality with respect to array a?
int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

Locality Example
• Question: Does this function have good locality with respect to array a?
int sum_array_cols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
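One way to approach the two questions is to measure how far apart consecutive accesses are in memory. The sketch below assumes hypothetical sizes M = 4 and N = 8 and relies on C's row-major array layout. The helper names are invented for illustration.

```c
#include <stddef.h>

enum { M = 4, N = 8 };   /* hypothetical sizes, for illustration only */

/* Byte distance between consecutive accesses when j is the inner loop
   (row-wise scan): a[i][j] to a[i][j+1]. */
static ptrdiff_t row_scan_stride(int a[M][N])
{
    return (char *)&a[0][1] - (char *)&a[0][0];
}

/* Byte distance between consecutive accesses when i is the inner loop
   (column-wise scan): a[i][j] to a[i+1][j]. */
static ptrdiff_t col_scan_stride(int a[M][N])
{
    return (char *)&a[1][0] - (char *)&a[0][0];
}
```

The row-wise scan steps by `sizeof(int)` bytes (stride 1 in elements), while the column-wise scan jumps `N * sizeof(int)` bytes on every access (stride N), so only the row-wise version has good spatial locality with respect to a.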

Locality Example
• Question: Can you permute the loops so that the function scans the 3-d array a with a stride-1 reference pattern (and thus has good spatial locality)?
int sum_array_3d(int a[M][N][N])
{
    int i, j, k, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                sum += a[k][i][j];
    return sum;
}
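One possible answer, offered as a sketch: order the loops so that the rightmost index changes fastest. The version below assumes all three dimensions equal a hypothetical N = 4, because the indexing a[k][i][j] in the question is only in bounds when M equals N.

```c
enum { N = 4 };   /* hypothetical size; all dimensions equal so that
                     a[k][i][j] stays in bounds */

static int sum_array_3d(int a[N][N][N])
{
    int i, j, k, sum = 0;

    for (k = 0; k < N; k++)              /* leftmost index: outermost loop */
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)      /* rightmost index: innermost loop */
                sum += a[k][i][j];
    return sum;
}
```

With this ordering each access touches the element immediately after the previous one in memory, giving the stride-1 pattern the question asks for.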

Memory Hierarchies
• Some fundamental and enduring properties of hardware and software:
  - Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
  - The gap between CPU and main memory speed is widening.
  - Well-written programs tend to exhibit good locality.
• These fundamental properties complement each other beautifully.
• They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

Today
• Storage technologies and trends
• Locality of reference
• Caching in the memory hierarchy

An Example Memory Hierarchy
[Figure: pyramid of storage levels. Toward the top: smaller, faster, costlier per byte. Toward the bottom: larger, slower, cheaper per byte.]
L0: Registers. CPU registers hold words retrieved from L1 cache.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from L2 cache.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (tapes, distributed file systems, Web servers).

Caches
• Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
• Fundamental idea of a memory hierarchy:
  - For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
• Why do memory hierarchies work?
  - Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
  - Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
• Big Idea: The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

General Cache Concepts
[Figure: a larger, slower, cheaper memory viewed as partitioned into "blocks" 0 to 15, below a smaller, faster, more expensive cache that holds a subset of the blocks (8, 9, 14, 3). Data is copied in block-sized transfer units; the animation shows blocks 4 and 10 being copied from memory into the cache.]

General Cache Concepts: Hit
[Figure: memory holds blocks 0 to 15; the cache holds blocks 8, 9, 14, and 3.]
Request: 14
Data in block b is needed
Block b is in cache: Hit!

General Cache Concepts: Miss
[Figure: memory holds blocks 0 to 15; the cache holds blocks 8, 9, 14, and 3 before the request, and block 12 is copied in from memory.]
Request: 12
Data in block b is needed
Block b is not in cache: Miss!
Block b is fetched from memory
Block b is stored in cache
  - Placement policy: determines where b goes
  - Replacement policy: determines which block gets evicted (victim)
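The hit and miss cases can be sketched with a tiny 4-slot cache. The placement policy used here, block b goes in slot (b mod 4), is an assumption chosen for illustration; it happens to match the cache contents 8, 9, 14, 3 in the figures, but the figures do not state a policy. The type `cache` and function `cache_access` are hypothetical names.

```c
#include <stdbool.h>

enum { NUM_SLOTS = 4 };

typedef struct {
    int  block[NUM_SLOTS];   /* block number held in each slot */
    bool valid[NUM_SLOTS];   /* false until the slot is first filled */
} cache;

/* Return true on a hit. On a miss, fetch block b and store it,
   evicting whatever occupied slot (b mod 4). */
static bool cache_access(cache *c, int b)
{
    int slot = b % NUM_SLOTS;            /* placement policy (assumed) */

    if (c->valid[slot] && c->block[slot] == b)
        return true;                     /* block b is in cache: hit */

    c->block[slot] = b;                  /* miss: victim is replaced */
    c->valid[slot] = true;
    return false;
}
```

Starting from a cache that holds blocks 8, 9, 14, and 3, a request for 14 hits, while a request for 12 misses and, under this policy, evicts block 8. With only one candidate slot per block, the replacement policy is trivial; more flexible placement makes the choice of victim matter.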

General Caching Concepts: Types of Cache Misses
• Cold (compulsory) miss
  - Cold misses occur because the cache is empty.
• Conflict miss
  - Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
    - E.g. Block i at level k+1 must be placed in block (i mod 4) at level k.
  - Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
    - E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
• Capacity miss
  - Occurs when the set of active cache blocks (working set) is larger than the cache.
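The conflict-miss example can be reproduced by counting misses on a reference trace. This sketch uses the (i mod 4) placement from the example, starts from an empty cache, and assumes non-negative block numbers. The function name `count_misses` is illustrative.

```c
enum { NUM_SLOTS = 4 };

/* Count misses for a reference trace on an initially empty cache that
   places block i in slot (i mod 4). Block numbers must be >= 0. */
static int count_misses(const int *trace, int n)
{
    int slot_block[NUM_SLOTS];
    int misses = 0;

    for (int s = 0; s < NUM_SLOTS; s++)
        slot_block[s] = -1;              /* -1 marks an empty slot */

    for (int t = 0; t < n; t++) {
        int slot = trace[t] % NUM_SLOTS;
        if (slot_block[slot] != trace[t]) {
            misses++;                    /* cold or conflict miss */
            slot_block[slot] = trace[t];
        }
    }
    return misses;
}
```

The trace 0, 8, 0, 8, 0, 8 misses all six times because both blocks map to slot 0, even though the cache has room for four blocks. The trace 0, 1, 0, 1, 0, 1 incurs only the two cold misses.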

Examples of Caching in the Hierarchy
Cache Type            What is Cached?       Where is it Cached?  Latency (cycles)  Managed By
Registers             4-8 byte words        CPU core             0                 Compiler
TLB                   Address translations  On-Chip TLB          0                 Hardware
L1 cache              64-byte block         On-Chip L1           1                 Hardware
L2 cache              64-byte block         On/Off-Chip L2       10                Hardware
Virtual Memory        4-KB page             Main memory          100               Hardware + OS
Buffer cache          Parts of files        Main memory          100               OS
Disk cache            Disk sectors          Disk controller      100,000           Disk firmware
Network buffer cache  Parts of files        Local disk           10,000,000        AFS/NFS client
Browser cache         Web pages             Local disk           10,000,000        Web browser
Web cache             Web pages             Remote server disks  1,000,000,000     Web proxy server

Summary
• The speed gap between CPU, memory and mass storage continues to widen.
• Well-written programs exhibit a property called locality.
• Memory hierarchies based on caching close the gap by exploiting locality.
