(Study Chapter 7)
04/8/2008
[Diagram: miniMIPS connected to MEMORY — the CPU drives MADDR, MDATA, and Wr; the memory side presents ADDR, DATA, and R/W]
            Register   SRAM     DRAM    Hard disk*   Want
Capacity
Latency     10 ps      0.2 ns   5 ns    10 ms        0.2 ns
Cost        $$$$       $$$      $       cheap

* non-volatile
[Figure: DRAM array — a Row Address Decoder drives the word lines (Row 1, Row 2, … Row 2^N); the bit lines (Col. 1, Col. 2, Col. 3, … Col. 2^M) feed a Column Multiplexer/Shifter that produces Data out; a waveform shows the access timing t1–t4 on D]

The first thing that should pop into your mind when asked to speed up a digital design: PIPELINING

Synchronous DRAM (SDRAM) ($100 per GByte)
Double-clocked Synchronous DRAM (DDR SDRAM)
L19 Memory Hierarchy
Typical high-end drive:
  Average latency = 4 ms (7200 rpm)
  Average seek time = 8.5 ms
  Transfer rate = 300 MBytes/s (SATA)
  Capacity = 250 GBytes
  Cost = $85 (about 33¢/GByte)
Comp 411 Spring 2008 04/8/2008
Quantity vs Quality
Your memory system can be BIG and SLOW... or SMALL and FAST.
We've explored a range of device-design trade-offs.

[Plot: cost in $/GByte (1000 down to .01) versus access time (10^-8 s up to 100 s) for the technologies above]
How do we make the most effective use of each memory? We could push all of these issues off to programmers:
  Keep the most frequently used variables and the stack in SRAM
  Keep large data structures (arrays, lists, etc.) in DRAM
  Keep the biggest data structures (databases) on DISK
It is harder than you think: data usage evolves over a program's execution.
We'd like to have a memory system that PERFORMS like 2 GBytes of SRAM but COSTS like 512 MBytes of slow memory.

SURPRISE: We can (nearly) get our wish!
KEY: Use a hierarchy of memory technologies:

[Diagram: CPU ↔ SRAM ↔ MAIN MEM]
Key IDEA
Keep the most often-used data in a small, fast SRAM (often local to the CPU chip). Refer to Main Memory only rarely, for the remaining data. The reason this strategy works: LOCALITY

Locality of Reference:
A reference to location X at time t implies that a reference to location X+ΔX at time t+Δt becomes more probable as ΔX and Δt approach zero.
Cache
cache (kash) n.
1. A hiding place used especially for storing provisions.
2. A place for concealment and safekeeping, as of valuables.
3. The store of goods or valuables concealed in a hiding place.
4. Computer Science. A fast storage buffer in the central processing unit of a computer. In this sense, also called cache memory.
Cache Analogy
You are writing a term paper at a table in the library.
As you work you realize you need a book.
You stop writing, fetch the reference, and continue writing.
You don't immediately return the book; maybe you'll need it again.
Soon you have a few books at your table and no longer have to fetch more.
The table is a CACHE for the rest of the library.
MEMORY TRACE: A temporal sequence of memory references (addresses) from a real program.

[Plot: address versus time for such a trace, showing distinct stack, data, and program regions]
Working Set
S is the set of locations accessed during an interval Δt.

[Plot: the trace above with a sliding window of width Δt; |S| spans parts of the stack, data, and program regions]
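The working-set definition can be made concrete with a short sketch; the trace and window values below are illustrative choices, not from the slides:

```python
def working_set(trace, t, dt):
    """S = set of distinct locations referenced during the window (t - dt, t]."""
    return {addr for time, addr in trace if t - dt < time <= t}

# A toy trace of (time, address) pairs: a loop touching 1000/1004 repeatedly,
# plus one later stack access at 2000 (illustrative numbers).
trace = [(1, 1000), (2, 1004), (3, 1000), (4, 1004), (5, 2000)]
S = working_set(trace, t=4, dt=3)
print(sorted(S), len(S))  # → [1000, 1004] 2
```

Growing dt grows |S|, but with diminishing returns once the window covers a program phase — which is why a modest cache can capture most references.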
[Diagram: CPU ↔ small static RAM ↔ dynamic RAM MAIN MEMORY ↔ HARD DISK]
Why We Care
CPU performance is dominated by memory performance.
More significant than: ISA, circuit optimization, pipelining, etc.
[Diagram: CPU ↔ DYNAMIC RAM "MAIN MEMORY" ↔ HARD DISK]

TRICK #1: How to make slow MAIN MEMORY appear faster than it is.
Technique: caching — put a fast "CACHE" between the CPU and the DYNAMIC RAM "MAIN MEMORY".

TRICK #2: How to make a small MAIN MEMORY appear bigger than it is.
Technique: virtual memory — use the HARD DISK as a backing store for MAIN MEMORY.
The cache contains TEMPORARY COPIES of selected main memory locations, e.g. Mem[100] = 37.

Challenges:
1) Speed: make the average access time approach that of the cache alone.
   HIT RATIO (α): fraction of references found in the CACHE.
   MISS RATIO (1 − α): the remaining references.

   t_ave = α·t_c + (1 − α)·(t_c + t_m) = t_c + (1 − α)·t_m

   (t_c = cache access time, t_m = main-memory access time)

   Example: with α = 98% and t_m = 10·t_c, t_ave = t_c + 0.02·10·t_c = 1.2·t_c.

2) Transparency (compatibility, programming ease)
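The average-access-time relation can be checked numerically; a minimal sketch (function and variable names are mine):

```python
def t_ave(alpha, t_c, t_m):
    """Every reference pays the cache time t_c; the (1 - alpha)
    fraction that misses additionally pays the memory time t_m."""
    return alpha * t_c + (1 - alpha) * (t_c + t_m)

# 98% hit ratio, main memory 10x slower than the cache:
print(round(t_ave(0.98, 1.0, 10.0), 3))  # → 1.2 (in units of t_c)
```

Even a memory ten times slower than the cache only costs 20% on average once the hit ratio is high.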
Cache
Sits between the CPU and main memory.
A very fast table that stores a TAG and DATA.
The TAG is the memory address; the DATA is a copy of memory at the address given by the TAG.
Memory:                  Cache:
1000: 17   1024: 44      Tag    Data
1004: 23   1028: 99      1000   17
1008: 11   1032: 97      1040   1
1012: 5    1036: 25      1032   97
1016: 29   1040: 1       1008   11
1020: 38   1044: 4
Cache Access
On a load we look in the TAG entries for the address we're loading.
Found → a HIT: return the DATA.
Not found → a MISS: go to memory for the data, and put it and its address (TAG) in the cache.
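The lookup just described can be sketched with the cache as a small TAG → DATA table; the memory and cache contents match the slide's example, the function name is mine:

```python
# Main memory and the current cache contents from the example.
memory = {1000: 17, 1004: 23, 1008: 11, 1012: 5, 1016: 29, 1020: 38,
          1024: 44, 1028: 99, 1032: 97, 1036: 25, 1040: 1, 1044: 4}
cache = {1000: 17, 1040: 1, 1032: 97, 1008: 11}

def load(addr):
    if addr in cache:          # HIT: return the cached copy
        return cache[addr], "hit"
    data = memory[addr]        # MISS: fetch from main memory...
    cache[addr] = data         # ...and install the TAG/DATA pair
    return data, "miss"

print(load(1032))  # → (97, 'hit')
print(load(1036))  # → (25, 'miss'), then 1036 is cached
```

A second load of 1036 would now hit — exactly the library-table analogy above.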
Cache Lines
Usually we get more data than requested. (Why?)
  A LINE is the unit of memory stored in the cache.
  It is usually much bigger than 1 word; 32 bytes per line is common.
  A bigger LINE means fewer misses, because of spatial locality...
  ...but a bigger LINE also means a longer time on each miss.
Memory:                  Cache (8-byte / 2-word lines):
1000: 17   1024: 44      Tag    Data
1004: 23   1028: 99      1000   17  23
1008: 11   1032: 97      1040   1   4
1012: 5    1036: 25      1032   97  25
1016: 29   1040: 1       1008   11  5
1020: 38   1044: 4
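Line-granularity fetching can be sketched under the example's parameters (8-byte lines of two 4-byte words; helper names are mine):

```python
LINE_WORDS = 2      # 8-byte lines of 4-byte words, as in the example
memory = {1000: 17, 1004: 23, 1008: 11, 1012: 5, 1016: 29, 1020: 38,
          1024: 44, 1028: 99, 1032: 97, 1036: 25, 1040: 1, 1044: 4}
cache = {}          # tag (line base address) -> list of LINE_WORDS words

def load(addr):
    base = addr - addr % (4 * LINE_WORDS)    # align to the line boundary
    if base not in cache:                    # MISS: fetch the WHOLE line,
        cache[base] = [memory[base + 4 * i] for i in range(LINE_WORDS)]
    return cache[base][(addr - base) // 4]   # so neighbors now hit

print(load(1000))  # miss: fetches 1000 and 1004 together → 17
print(load(1004))  # hit, thanks to spatial locality → 23
```

One miss paid for the whole line; the sequential access that followed was free.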
[Diagram: cache lookup hardware — the incoming address is compared (=?) against each stored TAG; on a match, HIT is asserted and the corresponding Data is driven out]
Direct-Mapping Example
With 8-byte lines, the bottom 3 bits determine the byte within the line.
With 4 cache lines, the next 2 bits determine which line to use.

1024d = 100 0000 0000b → line 00b = 0d
1000d = 011 1110 1000b → line 01b = 1d
1040d = 100 0001 0000b → line 10b = 2d
Memory:                  Cache (direct-mapped, 4 lines of 2 words):
1000: 17   1024: 44      Line  Tag    Data
1004: 23   1028: 99      0     1024   44  99
1008: 11   1032: 97      1     1000   17  23
1012: 5    1036: 25      2     1040   1   4
1016: 29   1040: 1       3     1016   29  38
1020: 38   1044: 4
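The bit-slicing above can be written out directly (parameter and function names are mine; 8-byte lines and 4 lines, as in the example):

```python
OFFSET_BITS = 3   # 8-byte lines -> bottom 3 bits select the byte
INDEX_BITS = 2    # 4 lines -> next 2 bits select the line

def line_index(addr):
    """Which direct-mapped line an address falls in."""
    return (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

def tag(addr):
    """The remaining high bits, stored to identify the line's contents."""
    return addr >> (OFFSET_BITS + INDEX_BITS)

# Reproducing the slide's examples:
print(line_index(1024))  # → 0
print(line_index(1000))  # → 1
print(line_index(1040))  # → 2
```

Note that a real cache stores only the high bits (`tag`) rather than the full address shown in the table, since the index and offset are implied by the line's position.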
A load of address 1008 (line 10b = 2d) then evicts the entry with tag 1040:

Memory:                  Cache (direct-mapped, 4 lines of 2 words):
1000: 17   1024: 44      Line  Tag    Data
1004: 23   1028: 99      0     1024   44  99
1008: 11   1032: 97      1     1000   17  23
1012: 5    1036: 25      2     1008   11  5
1016: 29   1040: 1       3     1016   29  38
1020: 38   1044: 4
Average CPI = 10
Suppose the MISS PENALTY = 120 cycles? Then CPI = 11 (slower memory doesn't hurt much).
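The arithmetic behind that claim works out if misses are rare; a minimal sketch, assuming about 1,000 misses per 120,000 instructions (my illustrative ratio — the slide gives only the penalty and the resulting CPI):

```python
def effective_cpi(base_cpi, instructions, misses, miss_penalty):
    """Base CPI plus total miss-stall cycles spread over all instructions."""
    return base_cpi + misses * miss_penalty / instructions

# 1,000 misses x 120 cycles over 120,000 instructions adds 1 cycle/instr:
print(effective_cpi(10, 120_000, 1_000, 120))  # → 11.0
```

With a base CPI this high, a 120-cycle penalty adds only 10% — the same penalty hurts far more on a low-CPI pipelined machine.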