Professional Documents
Culture Documents
Topics
class10.ppt
Tran.
per bit
Access
time
Persist? Sensitive?
Cost
Applications
SRAM
1X
Yes
No
100x
cache memories
DRAM
10X
No
Yes
1X
Main memories,
frame buffers
15-213, F02
addr
1
rows
memory
controller
supercell
(2,1)
(to CPU)
8 bits
/
data
15-213, F02
RAS = 2
2
/
addr
1
rows
memory
controller
2
8
/
data
15-213, F02
CAS = 1
2
/
addr
To CPU
1
rows
memory
controller
supercell
(2,1)
2
8
/
data
supercell
(2,1)
15-213, F02
Memory Modules
addr (row = i, col = j)
: supercell (i,j)
DRAM 0
64 MB
memory module
consisting of
eight 8Mx8 DRAMs
DRAM 7
63
56 55
48 47
40 39
32 31
24 23 16 15
8 7
bits
0-7
Memory
controller
64-bit doubleword
7
15-213, F02
Enhanced DRAMs
All enhanced DRAMs are built around the conventional
DRAM core.
signals.
signals.
15-213, F02
Nonvolatile Memories
DRAM and SRAM are volatile memories
Types of ROMs
Firmware
15-213, F02
bus interface
10
I/O
bridge
memory bus
main
memory
15-213, F02
register file
%eax
I/O bridge
bus interface
11
main memory
0
x
15-213, F02
I/O bridge
bus interface
12
main memory
0
x
15-213, F02
I/O bridge
bus interface
13
main memory
0
x
15-213, F02
I/O bridge
bus interface
14
main memory
0
A
15-213, F02
register file
%eax
I/O bridge
bus interface
15
main memory
0
A
15-213, F02
I/O bridge
bus interface
16
main memory
0
y
15-213, F02
Disk Geometry
Disks consist of platters, each with two surfaces.
Each surface consists of concentric rings called tracks.
Each track consists of sectors separated by gaps.
tracks
surface
track k
gaps
spindle
sectors
17
15-213, F02
surface 1
surface 2
platter 1
surface 3
surface 4
platter 2
surface 5
spindle
18
15-213, F02
Disk Capacity
Capacity: maximum number of bits that can be stored.
19
(# platters/disk)
Example:
512 bytes/sector
300 sectors/track (on average)
20,000 tracks/surface
2 surfaces/platter
5 platters/disk
15-213, F02
spindle
spindle
spindle
21
15-213, F02
arm
spindle
22
15-213, F02
Time waiting for first bit of target sector to pass under r/w head.
Tavg rotation = 1/2 x 1/RPMs x 60 sec/1 min
23
15-213, F02
Derived:
Important points:
24
15-213, F02
The set of available sectors is modeled as a sequence of bsized logical blocks (0, 1, 2, ...)
25
I/O Bus
CPU chip
register file
ALU
system bus
memory bus
main
memory
I/O
bridge
bus interface
I/O bus
USB
controller
mouse keyboard
26
graphics
adapter
disk
controller
monitor
disk
15-213, F02
main
memory
bus interface
I/O bus
USB
controller
mouse keyboard
graphics
adapter
disk
controller
monitor
disk
27
15-213, F02
main
memory
bus interface
I/O bus
USB
controller
mouse keyboard
graphics
adapter
disk
controller
monitor
disk
28
15-213, F02
main
memory
bus interface
I/O bus
USB
controller
mouse keyboard
graphics
adapter
disk
controller
monitor
disk
29
15-213, F02
Storage Trends
SRAM
DRAM
Disk
30
metric
1980
1985
1990
1995
2000
2000:1980
$/MB
access (ns)
19,200
300
2,900
150
320
35
256
15
100
2
190
100
metric
1980
1985
1990
1995
2000
2000:1980
$/MB
8,000
access (ns)
375
typical size(MB) 0.064
880
200
0.256
100
100
4
30
70
16
1
60
64
8,000
6
1,000
metric
1985
1990
1995
2000
2000:1980
100
75
10
8
28
160
0.30
10
1,000
0.05
8
9,000
10,000
11
9,000
1980
$/MB
500
access (ms)
87
typical size(MB) 1
15-213, F02
processor
clock rate(MHz)
cycle time(ns)
31
1980
8080
1
1,000
1985
286
6
166
1990
386
20
50
1995
Pent
150
6
2000
P-III
750
1.6
2000:1980
750
750
15-213, F02
ns
100,000
10,000
1,000
100
10
1
1980
1985
1990
1995
2000
year
32
15-213, F02
Locality
Principle of Locality:
Locality Example:
sum = 0;
for (i = 0; i < n; i++)
sum += a[i];
return sum;
Data
Reference array elements in succession
(stride-1 reference pattern): Spatial locality
Reference sum each iteration: Temporal locality
Instructions
Reference instructions in sequence: Spatial locality
Cycle through loop repeatedly: Temporal locality
33
15-213, F02
Locality Example
Claim: Being able to look at code and get a qualitative
sense of its locality is a key skill for a professional
programmer.
15-213, F02
Locality Example
Question: Does this function have good locality?
35
15-213, F02
Locality Example
Question: Can you permute the loops so that the
function scans the 3-d array a[] with a stride-1
reference pattern (and thus has good spatial
locality)?
int sumarray3d(int a[M][N][N])
{
int i, j, k, sum = 0;
for (i = 0; i < M; i++)
for (j = 0; j < N; j++)
for (k = 0; k < N; k++)
sum += a[k][i][j];
return sum
}
36
15-213, F02
Memory Hierarchies
Some fundamental and enduring properties of hardware
and software:
Fast storage technologies cost more per byte and have less
capacity.
The gap between CPU and main memory speed is widening.
Well-written programs tend to exhibit good locality.
15-213, F02
L0:
registers
L1: on-chip L1
cache (SRAM)
L2:
L3:
Larger,
slower,
and
cheaper
(per byte)
storage
devices
L5:
38
L4:
off-chip L2
cache (SRAM)
main memory
(DRAM)
Caches
Cache: A smaller, faster storage device that acts as a
staging area for a subset of the data in a larger,
slower device.
Fundamental idea of a memory hierarchy:
39
8
4
10
4
Level k+1:
40
14
10
10
11
12
13
14
15
15-213, F02
Level
k:
Cache hit
4*
12
14
12
4*
Level
k+1:
41
Request
12
14
Cache miss
Request
12
4
4*
10
11
12
13
14
15
Conflict miss
Most caches limit blocks at level k+1 to a small subset
Capacity miss
Occurs when the set of active cache blocks (working set) is
42
15-213, F02
What Cached
Where Cached
Registers
4-byte word
CPU registers
0 Compiler
TLB
Address
translations
32-byte block
32-byte block
4-KB page
On-Chip TLB
0 Hardware
On-Chip L1
Off-Chip L2
Main memory
Parts of files
Main memory
1 Hardware
10 Hardware
100 Hardware+
OS
100 OS
L1 cache
L2 cache
Virtual
Memory
Buffer cache
Local disk
Web cache
Remote server
disks
43
Web pages
Local disk
Latency
(cycles)
Managed
By
10,000,000 AFS/NFS
client
10,000,000 Web
browser
1,000,000,000 Web proxy
server
15-213, F02