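[Figure 5.1: Memory connected to the processor. MAR drives a k-bit address bus (up to 2^k addressable locations), MDR connects to an n-bit data bus, and the control lines include R/W and MFC.]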
Memory access time
- A measurement of a single access
Memory cycle time
- A measurement of how quickly two back-to-back accesses of a memory chip can be made
- Cycle time > access time, due to the latency between successive memory accesses
DRAM (used to construct main memory)
- Access time: 50 to 150 nanoseconds
- Requires a pause (refresh) between back-to-back accesses
SRAM
- Access time: 10 nanoseconds
- No pause between back-to-back accesses
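[Figure 5.2: Organization of bit cells in a memory chip. Address inputs A0-A3 select one of the word lines W0-W15; each cell (FF) sits on a bit-line pair served by a Sense/Write circuit for data lines b7 ... b0; control inputs are R/W and CS. External connections: 4 address + 2 control + 8 data = 14.]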
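[Figure 5.3: Organization of a 1K × 1 memory chip. The 10-bit address is divided so that 5 bits drive a 32-to-1 multiplexer; the chip also has Sense/Write circuitry and R/W and CS inputs.]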
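The pin count noted for the 16 × 8 chip (4 + 2 + 8 = 14) follows a simple rule that can be sketched as below. The function name and the default of two control lines (R/W and CS) are assumptions made for illustration, and power and ground pins are not counted.

```python
def external_connections(words: int, bits_per_word: int, control_lines: int = 2) -> int:
    """Address + data + control connections (power and ground excluded)."""
    # Number of address bits needed to select one of `words` locations.
    address_lines = (words - 1).bit_length()
    return address_lines + bits_per_word + control_lines

pins_16x8 = external_connections(16, 8)     # 4 address + 8 data + 2 control
pins_1kx1 = external_connections(1024, 1)   # 10 address + 1 data + 2 control
print(pins_16x8, pins_1kx1)
```

Under these assumptions the 1K × 1 organization needs fewer connections than its cell count might suggest, because the data width shrinks to a single line.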
SRAM
Static Random Access Memory
- Read/write very fast
- Needs 6 transistors per cell, so it costs more and needs more area
- Does not need to be refreshed
- Low power consumption
- Implementation technology: CMOS
- Used to construct cache memory
[Figure 5.5: CMOS SRAM cell. Transistors T1-T6 connected to V supply, with internal nodes X and Y.]
DRAM
Dynamic Random Access Memory
- Needs 1 transistor and 1 capacitor per cell
- Lower cost and compact
- Each bit must be refreshed periodically
- Implementation technology: CMOS
- Used to construct main memory
[Figure 5.6: DRAM cell, consisting of transistor T and capacitor C.]
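Asynchronous DRAM
[Figure 5.7: Internal organization of a 2M × 8 DRAM chip. The cell array is 4096 × (512 × 8). Address bits A20-9 are latched as the row address under RAS, and A8-0 as the column address under CAS; Sense/Write circuits connect to data lines D7 ... D0; control inputs are CS and R/W.]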
Synchronous DRAM (SDRAM)
[Figure 5.8: SDRAM organization. Read/Write circuitry driven by the control inputs Clock, RAS, CAS, R/W, and CS.]
SDRAM
- Supports burst operation
- Automatic column address increment, so no external CAS cycle is needed to select each column address
- Interleaved memory: contains two banks of memory internally instead of one. This allows the second bank to be "precharging" (RAS and CAS activation) while the first bank is transferring data
DDR SDRAM
- Transfers data on both the rising and the falling edge of the clock
- This doubles the memory bandwidth by transferring data twice per clock cycle
- DDR: 100 MHz driven clock -> 100 MHz data buffers -> DDR applied -> 200 MHz final data frequency
- DDR-II: 100 MHz driven clock -> 200 MHz data buffers -> DDR applied -> 400 MHz final data frequency
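The two data-frequency chains above reduce to one multiplication, sketched here. The function name and the idea of expressing the buffer speed as a multiplier of the driven clock are assumptions made for illustration.

```python
def final_data_frequency_mhz(clock_mhz: float, buffer_multiplier: int,
                             transfers_per_clock: int = 2) -> float:
    """Driven clock -> data buffer frequency -> double-data-rate transfer frequency."""
    buffer_mhz = clock_mhz * buffer_multiplier   # 100 MHz for DDR, 200 MHz for DDR-II
    return buffer_mhz * transfers_per_clock      # both clock edges carry data

ddr = final_data_frequency_mhz(100, buffer_multiplier=1)
ddr2 = final_data_frequency_mhz(100, buffer_multiplier=2)
print(ddr, ddr2)
```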
SIMM vs DIMM
SIMM: Single In-line Memory Module
- 30 pins (8-bit bus version)
- 72 pins (wider bus, more address lines)
DIMM: Dual In-line Memory Module
RAMBUS
- Developed by the RAMBUS company
- Makes a single chip act more like a memory system than a memory component
- Each chip has interleaved memory and a high-speed interface
RDRAM (1st generation)
- Drops RAS/CAS, replacing them with a bus that allows other accesses over the bus between the sending of the address and the return of the data
- Runs at a 300 MHz clock
Direct RDRAM
- Separate row- and column-command buses instead of the conventional multiplexing
- Runs at a 400 MHz clock
- 16 RDRAM
RIMM
Other memory
- ROM
- PROM
- EPROM
- EEPROM
- Flash: low power consumption; used in portable systems such as PDAs, mobile phones, digital cameras, and MP3 players
Memory hierarchy
[Figure 5.13: Memory hierarchy, including the L1 and L2 cache levels.]
Memory hierarchy
Level  Name          Size      Access time    Bandwidth               Managed by
1      Registers     <1 KB     0.25-0.5 ns    20,000-100,000 MB/sec   Compiler
2      Cache         <16 MB    0.5-25 ns      5,000-10,000 MB/sec     Hardware
3      Main memory   <16 GB    80-250 ns      1,000-5,000 MB/sec      OS
4      Disk storage  >100 GB   5,000,000 ns   20-150 MB/sec           OS/operator
Cache Terms
- Locality of reference: temporal, spatial
- Cache block (cache line)
- Replacement algorithm
- Read/write hit/miss
- Write-through
- Write-back (copy-back)
- Dirty bit (modified bit)
- Valid bit: set every time a row is loaded into the cache on a cache miss; it can only be reset by the flush line
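N-way associative mapping
Direct mapping
[Figure 5.15: Direct-mapped cache. Main memory blocks 0-4095 map onto cache blocks 0-127, each stored with a tag. Address fields: tag 5 bits, block 7 bits, word 4 bits.]
[Figure 5.16: Associative-mapped cache. Any of main memory blocks 0-4095 can occupy any cache block. Address fields: tag 12 bits, word 4 bits.]
[Figure 5.17: Set-associative-mapped cache with 2 blocks per set. Cache blocks 0-127 form sets 0-63, each block stored with a tag. Address fields: tag 6 bits, set 6 bits, word 4 bits.]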
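The three address layouts in Figures 5.15 to 5.17 (5/7/4, 12/4, and 6/6/4 bits) can be compared with a small field splitter. This is a sketch only: `split_fields` is a hypothetical helper, and the example address (word 3 of main memory block 129) is chosen purely for illustration.

```python
def split_fields(addr: int, widths: list[int]) -> list[int]:
    """Split addr into bit fields; widths are listed from most to least significant."""
    fields = []
    for width in reversed(widths):
        fields.append(addr & ((1 << width) - 1))  # take the low `width` bits
        addr >>= width
    return fields[::-1]

addr = 129 * 16 + 3  # word 3 of main memory block 129 (16 words per block)

direct = split_fields(addr, [5, 7, 4])       # tag, cache block, word
associative = split_fields(addr, [12, 4])    # tag, word
set_assoc = split_fields(addr, [6, 6, 4])    # tag, set, word
print(direct, associative, set_assoc)
```

With direct mapping, block 129 can only go to cache block 1 (129 mod 128); with the two-way set-associative layout it can go to either block of set 1 (129 mod 64); with associative mapping the whole block number serves as the tag.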
Replacement algorithm
- LRU: least recently used
- Random
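A minimal sketch of LRU replacement inside one cache set is shown below. The class name `LRUSet` and its interface are assumptions made for illustration; real hardware tracks recency with a few counter bits per block rather than an ordered dictionary.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set holding up to `ways` block tags, evicting the least recently used."""

    def __init__(self, ways: int):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, ordered from oldest to newest use

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, load the block (evicting if full) and return False."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)   # evict the least recently used tag
        self.blocks[tag] = None
        return False

cache_set = LRUSet(ways=2)
results = [cache_set.access(tag) for tag in [1, 2, 1, 3, 2]]
print(results)
```

In this trace the third access hits, the fourth evicts tag 2 (tag 1 was used more recently), and the fifth therefore misses. Random replacement would instead pick the victim without tracking any history.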
68040 cache
- 4K data cache
- 4K instruction cache
- Contains 64 sets
- Every set contains 4 blocks (4-way associative mapping)
- 1 cache block contains 4 long words
- 1 valid bit per cache block
- 1 dirty bit per long word
- Write-back/write-through
- Random replacement
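[Figure 5.23: 68040 cache. The address is split into a 22-bit tag, a 6-bit set index, and a 4-bit byte field. The tag is compared (=?) with the tags of the four blocks in the selected set (sets 0 to 63); each block carries a valid bit (v) and one dirty bit (d) per long word. A match gives Hit = 1, Miss = 0; otherwise Miss = 1, Hit = 0. Example tag values 000BF2 and 0CA020 appear in the figure.]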
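The 68040 numbers are mutually consistent, which a short geometry check can confirm. This is a sketch under two stated assumptions: a long word is 4 bytes (so a block is 16 bytes) and addresses are 32 bits wide. The function name is hypothetical.

```python
def cache_fields(cache_bytes: int, ways: int, block_bytes: int,
                 address_bits: int = 32) -> tuple[int, int, int, int]:
    """Return (sets, tag_bits, set_bits, offset_bits) for a set-associative cache."""
    sets = cache_bytes // (ways * block_bytes)
    offset_bits = (block_bytes - 1).bit_length()   # selects a byte within the block
    set_bits = (sets - 1).bit_length()             # selects the set
    tag_bits = address_bits - set_bits - offset_bits
    return sets, tag_bits, set_bits, offset_bits

# 4 KB cache, 4 blocks per set, 4 long words x 4 bytes per block (assumed sizes)
m68040 = cache_fields(4096, ways=4, block_bytes=16)
print(m68040)
```

The result matches the 64 sets stated for the 68040 and the 22/6/4 field widths that survive from Figure 5.23.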
ARM710T cache
- Only one cache for both data and instructions
- 4 KB cache
- 64 sets
- 1 set contains 4 blocks (4-way associative mapping)
Pentium III cache
L1 cache
- Data cache: 4-way, write-back or write-through
- Instruction cache: 2-way, no write strategy needed because it holds pure code
L2 cache
- 512 KB, 4-way, write-back or write-through
- Coppermine: 256 KB, 8-way
Pentium 4 cache
L1 cache
L2 cache
- Within the CPU
- 256 KB, 8-way
- Block contains 128 bytes
- Write-back
L3 cache
- Server-based CPUs
Memory Performance
- Every memory module has an address buffer register (ABR) and a data buffer register (DBR)
- Option 1: consecutive words are placed in a single module
- Option 2: consecutive words are placed in consecutive modules
Interleaved memory
- A CPU reference to consecutive memory addresses accesses multiple modules concurrently (the low-order address bits select the module)
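The two placements of consecutive words can be contrasted with a short sketch. The function names, the choice of 4 modules, and the 8 words per module are assumptions made for illustration.

```python
def interleaved_location(addr: int, num_modules: int) -> tuple[int, int]:
    """Low-order bits select the module; high-order bits select the word inside it."""
    return addr % num_modules, addr // num_modules

def sequential_location(addr: int, words_per_module: int) -> tuple[int, int]:
    """High-order bits select the module; consecutive words stay in one module."""
    return addr // words_per_module, addr % words_per_module

interleaved_modules = [interleaved_location(a, 4)[0] for a in range(8)]
sequential_modules = [sequential_location(a, 8)[0] for a in range(8)]
print(interleaved_modules)
print(sequential_modules)
```

With interleaving, eight consecutive addresses are spread across all four modules, so their accesses can overlap; with the sequential layout they all land in module 0 and must be served one after another.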
See the examples on p5-54 ~ p5-59.
One cache level: Tave = hC + (1-h)M, where h is the hit rate, M is the miss penalty, and C is the access time for the cache.
Two cache levels: Tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M, where
- h1: hit rate for the L1 cache
- C1: access time for the L1 cache
- h2: hit rate for the L2 cache
- C2: access time for the L2 cache
- M: access time for main memory
Note: if h1 = h2 = 0.9, then (1-h1)(1-h2) = (1-0.9)(1-0.9) = 0.01 = 1%. This means that with a two-level cache whose levels each have a 0.9 hit rate, only 1% of memory accesses pay the main memory penalty.
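The two average-access-time formulas can be evaluated directly, as in this sketch. The timing values (C1 = 1, C2 = 10, M = 100 cycles) are assumed for illustration and do not come from the slides; only the hit rates of 0.9 do.

```python
def t_ave_one_level(h: float, C: float, M: float) -> float:
    """Tave = hC + (1-h)M"""
    return h * C + (1 - h) * M

def t_ave_two_level(h1: float, C1: float, h2: float, C2: float, M: float) -> float:
    """Tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M"""
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

# Assumed timings in cycles: L1 = 1, L2 = 10, main memory = 100
one_level = t_ave_one_level(0.9, 1, 100)
two_level = t_ave_two_level(0.9, 1, 0.9, 10, 100)
main_memory_fraction = (1 - 0.9) * (1 - 0.9)
print(one_level, two_level, main_memory_fraction)
```

Under these assumed timings the second cache level cuts the average from roughly 10.9 cycles to roughly 2.8 cycles, because only about 1% of accesses still reach main memory.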
Write buffer
- Located in the CPU
- Writes go to the write buffer rather than to memory, so the CPU does not need to wait for the memory write
Prefetch
- Compiler
Lockup-free cache
- Allows the data cache to continue to supply cache hits during a miss
- Helpful for processors that support out-of-order completion (e.g., via Tomasulo's algorithm)
Virtual memory
- Virtual address (logical address)
- MMU (built into the CPU)
- Physical address
- Page table (in main memory)
- Page frame
- Address translation
- TLB: a cache built into the CPU that holds recently used address translations; LRU replacement
[Figure 5.26: Virtual memory organization, showing the MMU and DMA transfers.]
Page table entry
- Valid bit
- Dirty bit
- Access rights of the program to the page
[Figure 5.27: Address translation through the page table.]
The entries in the page directory point to page tables, and the entries in a page table point to pages in physical memory. This paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32 bytes (4 GBytes).
To select the various table entries, the linear address is divided into three sections:
- Page-directory entry: bits 22 through 31 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a page table.
- Page-table entry: bits 12 through 21 of the linear address provide an offset to an entry in the selected page table. This entry provides the base physical address of a page in physical memory.
- Page offset: bits 0 through 11 provide an offset to a physical address in the page.
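The three-way division of the linear address can be sketched with shifts and masks. The function name and the example address are assumptions made for illustration; the sketch extracts only the three indices and does not model the directory or page-table lookups themselves.

```python
def split_linear_address(addr: int) -> tuple[int, int, int]:
    """Return (page-directory index, page-table index, page offset) of a 32-bit address."""
    directory_index = (addr >> 22) & 0x3FF   # bits 22-31: 10 bits, 1024 directory entries
    table_index = (addr >> 12) & 0x3FF       # bits 12-21: 10 bits, 1024 table entries
    page_offset = addr & 0xFFF               # bits 0-11: 12 bits, 4 KB page
    return directory_index, table_index, page_offset

print(split_linear_address(0x00403004))
```

The 10 + 10 index bits give the 2^20 pages mentioned above, and the 12 offset bits give 4 KB per page, for 2^32 bytes in total.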
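[Figure 5.28: TLB. The virtual page number issued by the CPU is compared (=?) with the entries in the TLB; a match is a hit, otherwise a miss.]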
[Figure 5.30: surviving labels are the pairs (3, n), (0, 1), and (0, 0).]
[Figure 5.31]
- RAID0: data striping, no redundancy. Level 0 stripes data at the block level.
- RAID1: mirroring (shadowing).
- RAID01 (RAID0+1): mirrored stripes.
- RAID2: error-correcting coding with Hamming code. Not a typical implementation and rarely used; Level 2 stripes data at the bit level rather than the block level.
- RAID3: bit-interleaved parity. Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service simultaneous multiple requests, is also rarely used.
- RAID4: dedicated parity drive. A commonly used implementation of RAID, Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage of Level 4 is that the parity disk can create write bottlenecks.
- RAID5: block-interleaved distributed parity. Provides data striping at the block level and also stripes the error correction information across the disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.
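How parity data can recreate a failed disk is sketched below using bytewise XOR, the usual parity function for RAID levels 3 to 5. The helper name and the tiny two-byte "disks" are assumptions made for illustration.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks holding one two-byte block each (illustrative contents)
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Suppose disk 1 fails: XOR the surviving data blocks with the parity block
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Every write must also update the parity block, which is why a single dedicated parity disk (Level 4) becomes a write bottleneck and Level 5 distributes the parity across all disks.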
CD-ROM 1X: 150 KB/sec
CD-ROM 40X: 150 x 40 = 6,000 KB/sec (about 6 MB/sec)
DVD (Digital Versatile Disk)
- DVD+R: a non-rewritable format, compatible with about 89% of all DVD players and most DVD-ROMs
- DVD+R/W: has some "better" features than DVD-R/W, such as lossless linking and both CAV and CLV writing
- DVD-R: a non-rewritable format, compatible with about 93% of all DVD players and most DVD-ROMs
- DVD-R/W: was the first DVD recording format released that was compatible with standalone DVD players
DVD Sizes
- DVD-5: holds around 4,700,000,000 bytes, which is 4.37 computer GB (where 1 kbyte is 1024 bytes). DVD+R/W and DVD-R/W support this format. Also called Single Sided Single Layered. This is the most common DVD media, often called 4.7 GB media.
- DVD-10: holds around 9,400,000,000 bytes, which is 8.75 computer GB. DVD+R/W and DVD-R/W support this format. Also called Double Sided Single Layered.
- DVD-9: holds around 8,540,000,000 bytes, which is 7.95 computer GB. DVD+R supports this format. Also called Single Sided Dual Layered. This media is called DVD+R9, DVD+R DL, or 8.5 GB media.
- DVD-18: holds around 17,080,000,000 bytes, which is 15.9 computer GB. DVD+R supports this format. Also called Double Sided Dual Layered.
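The "computer GB" figures above come from dividing the byte count by 1024^3, as this sketch shows. The function and dictionary names are assumptions made for illustration.

```python
def computer_gb(n_bytes: int) -> float:
    """Convert a byte count to GB where 1 kbyte = 1024 bytes (1 GB = 1024**3 bytes)."""
    return n_bytes / 1024**3

dvd_bytes = {
    "DVD-5": 4_700_000_000,
    "DVD-10": 9_400_000_000,
    "DVD-9": 8_540_000_000,
    "DVD-18": 17_080_000_000,
}
dvd_gb = {name: computer_gb(size) for name, size in dvd_bytes.items()}
for name, size in dvd_gb.items():
    print(f"{name}: {size:.3f} GB")
```

The marketing capacity (4.7 GB) counts 10^9 bytes per GB, which is why an operating system that uses 1024-based units reports a smaller number for the same disc.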
Single Layer (4.7 GB) write speeds
- 1x (CLV) = about 58 minutes
- 2x (CLV) = about 29 minutes
- 2.4x (CLV) = about 24 minutes
- 4x (CLV) = about 14.5 minutes
- 6x (CLV/ZCLV) = about 10-12 minutes
- 8x (PCAV/ZCLV) = about 8-10 minutes
- 12x (PCAV/ZCLV) = about 6.5-7.5 minutes
- 16x (CAV/ZCLV) = about 6-7 minutes
Dual/Double Layer (8.5 GB) write speeds
- 1x CLV = about 105 minutes
- 2.4x CLV = about 44 minutes
- 4x CLV = about 27 minutes
Single Layer (4.7 GB) read speeds
- 6x CAV (avg. ~4x): max read speed 7.93 MB/s = ~14 minutes
- 8x CAV (avg. ~6x): max read speed 10.57 MB/s = ~10 minutes
- 12x CAV (avg. ~8x): max read speed 15.85 MB/s = ~7 minutes
- 16x CAV (avg. ~12x): max read speed 21.13 MB/s = ~5 minutes
[Figure 5.33]