
Memory system

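[Figure 5.1: connection of the memory to the processor. MAR drives a k-bit address bus (up to 2^k addressable locations), MDR connects to an n-bit data bus, and the control lines include R/W and MFC.]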

Basic Computer Organization Revisited


[Figure: block diagram of a basic computer. Memory (program and data), I/O, and a processor containing general-purpose registers, control logic, ALUs, MAR, MDR, and PC.]

Access time vs cycle time


Memory access time
  A measurement of a single access

Memory cycle time
  A measurement of how quickly two back-to-back accesses of a memory chip can be made
  Cycle time > access time due to latency between successive memory accesses

DRAM (used to construct main memory)
  Access time: 50 to 150 nanoseconds
  Requires a pause (refresh) between back-to-back accesses

SRAM (used to construct cache memory)
  Access time: 10 nanoseconds
  No pause between back-to-back accesses


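[Figure 5.2: organization of bit cells in a 16 x 8 memory chip. Address lines A0-A3 feed a decoder that drives word lines W0-W15; each column of flip-flop (FF) cells shares bit lines connected to a Sense/Write circuit with data lines b7 ... b0; control inputs are R/W and CS. External connections: 4 address + 2 control + 8 data = 14.]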

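RAS: Row address strobe
CAS: Column address strobe

[Figure 5.3: organization of a 1K x 1 memory chip. The 10-bit address is split into a 5-bit row address, which selects one of word lines W0-W31 in a 32 x 32 cell array, and a 5-bit column address, which drives a 32-to-1 multiplexer; Sense/Write circuitry, R/W, and CS complete the chip.]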

SRAM

Static Random Access Memory
  Read/write very fast
  Needs 6 transistors per bit, thus high cost and more area
  Does not need to be refreshed
  Low power consumption
Implementation technology
  CMOS
Used to construct
  Cache memory

[Figure 5.5: CMOS SRAM cell with transistors T1-T6, internal nodes X and Y, and V supply.]

DRAM

Dynamic Random Access Memory
  Needs 1 transistor and 1 capacitor per bit
  Lower cost and compact
  Each bit must be refreshed periodically
Implementation technology
  CMOS
Used to construct
  Main memory

[Figure 5.6: single-transistor DRAM cell with transistor T and capacitor C.]

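Asynchronous DRAM

[Figure 5.7: internal organization of a 2M x 8 dynamic memory chip. The cell array is 4096 x (512 x 8); the multiplexed address lines carry the row address A20-9 (latched by RAS) and the column address A8-0 (latched by CAS); Sense/Write circuits, CS, and R/W control the data lines D7-D0.]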

Fast Page Mode


Conventional DRAM requires that a row address and a column address be sent for each access.
FPM sends the row address just once for many accesses to memory locations near each other, improving access time.
That is, the row address is decoded once, and a different column address is decoded for each byte accessed on the same row (see p. 5-54, 5.1).
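As a rough illustration of why page mode helps, the sketch below counts how often a row address must be sent for a run of accesses. It assumes the 2M x 8 geometry of Figure 5.7 (12 row bits, 9 column bits); the function names and the sample address run are hypothetical, and real timing depends on the chip.

```python
ROW_BITS = 12  # 4096 rows (address lines A20-9), assumed from Figure 5.7
COL_BITS = 9   # 512 byte-wide columns per row (address lines A8-0)


def split_address(addr):
    """Split a 21-bit byte address into (row, column)."""
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def count_row_activations(addresses, page_mode):
    """Count how many times a row address has to be sent.

    Without page mode, every access sends both row and column.
    With page mode, the row is re-sent only when it changes.
    """
    activations = 0
    open_row = None
    for addr in addresses:
        row, _ = split_address(addr)
        if not page_mode or row != open_row:
            activations += 1
            open_row = row
    return activations


# Eight consecutive bytes that all fall in the same row.
burst = list(range(0x1000, 0x1000 + 8))
print("conventional:", count_row_activations(burst, page_mode=False))
print("fast page mode:", count_row_activations(burst, page_mode=True))
```

For this run the conventional scheme sends the row address eight times and page mode sends it once; a run that crosses a row boundary costs one extra activation.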

Extended Data Out (EDO) DRAM


EDO DRAM is also called hyper page mode DRAM.
EDO memory has modified timing circuits so that one access to the memory can begin before the previous one has finished (note: conventional DRAM needs some delay between two consecutive accesses).

Synchronous DRAM (SDRAM)

[Figure 5.8: synchronous DRAM, with Read/Write circuitry and control inputs Clock, RAS, CAS, R/W, and CS.]

SDRAM

Supports burst operation
  Automatic column address increment, so no external CAS cycle is needed to select each column address
Interleaved memory
  Contains two banks of memory internally instead of one
  This allows the second bank to be "precharging" (RAS and CAS activation) while the first bank is transferring data
Will replace older DRAM technologies

DDR SDRAM

Double data rate SDRAM
  Transfers data on both the rising and falling edges of the clock
  Thus doubles the bandwidth of the memory by transferring data twice per clock
  Standard SDRAM takes action only at the rising edge of the clock
DDR II
  Memory core runs at 1/2 the clock frequency of the I/O buffers

DDR: 100 MHz driven clock -> 100 MHz data buffers -> DDR applied -> 200 MHz final data frequency
DDR-II: 100 MHz driven clock -> 200 MHz data buffers -> DDR applied -> 400 MHz final data frequency
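The two data-rate chains above reduce to one multiplication each. The sketch below reproduces that arithmetic; the function name and the `buffer_multiplier` parameter are illustrative, not terms from the slides.

```python
def final_data_rate_mhz(driven_clock_mhz, buffer_multiplier):
    """Effective data frequency for a double-data-rate memory.

    buffer_multiplier is the ratio of the I/O buffer clock to the
    driven clock: 1 for DDR, 2 for DDR-II in the slide's example.
    """
    buffer_clock_mhz = driven_clock_mhz * buffer_multiplier
    # Data moves on both clock edges, so the rate is doubled.
    return buffer_clock_mhz * 2


print("DDR   :", final_data_rate_mhz(100, 1), "MHz")
print("DDR-II:", final_data_rate_mhz(100, 2), "MHz")
```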

SIMM vs DIMM

SIMM: Single In-line Memory Module
  30 pins (8-bit bus version)
  72 pins (wider bus, more address lines)

DIMM: Dual In-line Memory Module
  168 pins

RAMBUS

Developed by the Rambus company
Makes a single chip act more like a memory system than a memory component
Each chip has interleaved memory and a high-speed interface

RDRAM (1st generation)
  Drops RAS/CAS, replacing it with a bus that allows other accesses over the bus between the sending of the address and the return of the data
  Runs at a 300 MHz clock

DRDRAM (2nd generation)
  Direct RDRAM
  Separate row- and column-command buses instead of the conventional multiplexing
  Runs at a 400 MHz clock

RIMM
  16 RDRAM

Other memory

ROM
PROM
EPROM
EEPROM
Flash
  Low power consumption
  Used in portable systems such as PDAs, mobile phones, digital cameras, and MP3 players


Memory hierarchy

[Figure 5.13: memory hierarchy, including the L1 and L2 cache levels.]

Memory hierarchy

Level  Name          Size     Access time    Bandwidth              Managed by
1      Registers     <1KB     0.25-0.5 ns    20,000-100,000 MB/sec  Compiler
2      Cache         <16MB    0.5-25 ns      5,000-10,000 MB/sec    Hardware
3      Main memory   <16GB    80-250 ns      1,000-5,000 MB/sec     OS
4      Disk storage  >100GB   5,000,000 ns   20-150 MB/sec          OS/operator

Cache Terms

Locality of reference
  Temporal
  Spatial
Cache block (cache line)
Replacement algorithm
Read/write hit/miss
Write-through
Write-back (copy-back)
  Dirty bit/modified bit
Valid bit
  The valid bit is set every time a row is loaded into the cache by a cache miss, and can only be reset by the flush line

Cache mapping functions


Direct mapping
Fully associative mapping
Set associative mapping
  N-way associative mapping

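Direct mapping

[Figure 5.15: direct-mapped cache. Main memory blocks 0-4095 map onto cache blocks 0-127, each stored with a tag; memory blocks 0, 128, 256, ... share cache block 0, blocks 1, 129, 257, ... share cache block 1, and so on. Main memory address fields: tag 5 bits, block 7 bits, word 4 bits.]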
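The field split in Figure 5.15 can be tried out directly. This is a minimal sketch assuming the 16-bit address layout shown there (5-bit tag, 7-bit block, 4-bit word); the helper names are hypothetical.

```python
WORD_BITS = 4   # 16 words per block
BLOCK_BITS = 7  # 128 cache blocks
TAG_BITS = 5    # 4096 memory blocks / 128 cache blocks = 32 tags


def split_direct(addr):
    """Split a 16-bit address into (tag, cache block, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = (addr >> (WORD_BITS + BLOCK_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, block, word


def address_of(memory_block, word=0):
    """Address of a given word inside a given main memory block."""
    return (memory_block << WORD_BITS) | word


# Memory blocks 0, 128, and 256 compete for cache block 0
# and are told apart only by their tags.
for memory_block in (0, 128, 256, 4095):
    tag, block, _ = split_direct(address_of(memory_block))
    print(f"memory block {memory_block:4d} -> cache block {block:3d}, tag {tag}")
```

Because the cache block is fixed by the address, two blocks with the same block field evict each other even when the rest of the cache is empty, which is the motivation for the associative schemes that follow.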

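Fully associative mapping

[Figure 5.16: associative-mapped cache. Any main memory block (0-4095) can be placed in any of cache blocks 0-127, each stored with a tag. Main memory address fields: tag 12 bits, word 4 bits.]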

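Set associative mapping

[Figure 5.17: set-associative-mapped cache with 2 blocks per set. Cache blocks 0-127 are grouped into sets 0-63, each block stored with a tag; memory blocks 0, 64, 128, ... map to set 0, blocks 1, 65, 129, ... map to set 1, and so on up to block 4095. Main memory address fields: tag 6 bits, set 6 bits, word 4 bits.]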

Replacement algorithm

LRU: Least recently used
Random
First in First out (FIFO)
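To make LRU replacement concrete, here is a small sketch of a 2-way set-associative cache with 64 sets, matching the geometry of Figure 5.17. The class name and the access sequence are hypothetical, and the model tracks only tags, not data.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of a set-associative cache with LRU replacement."""

    def __init__(self, num_sets=64, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set: oldest (least recently used) entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, memory_block):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        index = memory_block % self.num_sets
        tag = memory_block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used block
        lines[tag] = True
        return False


cache = SetAssociativeCache()
# Blocks 0, 64, and 128 all map to set 0.
results = [cache.access(block) for block in (0, 64, 0, 128, 64)]
print(results)
```

In this trace the third access hits, the load of block 128 evicts block 64 (the least recently used entry in set 0), and the final access to block 64 therefore misses. Swapping `popitem(last=False)` for a random choice, or skipping `move_to_end`, would give the Random and FIFO policies respectively.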

68040 cache

4K data cache
4K instruction cache
Each cache contains 64 sets
Every set contains 4 blocks
  4-way associative mapping
1 cache block contains 4 long words
1 valid bit per cache block
1 dirty bit per long word
Write-back/write-through
Random replacement
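The parameters listed for the 68040 determine how an address is divided. The sketch below derives the field widths from cache size, associativity, and block size; it assumes 32-bit addresses, 4-byte long words, and power-of-two sizes, and the function name is hypothetical.

```python
def cache_fields(cache_bytes, ways, block_bytes, address_bits=32):
    """Derive set count and address field widths for a set-associative cache.

    Assumes all sizes are powers of two.
    """
    num_sets = cache_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "sets": num_sets,
        "tag_bits": tag_bits,
        "index_bits": index_bits,
        "offset_bits": offset_bits,
    }


# 68040 data cache: 4 KB, 4-way, blocks of 4 long words (assumed 16 bytes).
print(cache_fields(4 * 1024, ways=4, block_bytes=16))
```

With these inputs the result is 64 sets and a 22/6/4 split of the address, which matches the set count stated above.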

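[Figure 5.23: data cache organization in the 68040. The address is split into a 22-bit tag, a 6-bit set index, and a 4-bit offset; the tags of the four blocks in the selected set (sets 0-63), each with valid (v) and dirty (d) bits, are compared (=?) against the address tag to signal hit or miss. The figure uses the example tags 000BF2 and 0CA020.]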

ARM710T cache

Only one cache for both data and instructions
4 KB cache
64 sets
1 set contains 4 blocks
  4-way associative mapping
1 cache block contains 4 words (32 bits each) = 16 bytes
Write-through
Random replacement

Pentium III cache

L1 cache
  16KB data cache
    4-way
    Write-back or write-through
  16KB instruction cache
    2-way
    No write strategy due to pure code

L2 cache
  512KB
  4-way
  Write-back or write-through

Coppermine
  L2 cache built in CPU
  256KB
  8-way

Pentium 4 cache

L1 cache
  8KB data cache
    4-way
    Block contains 64 bytes
    Write-through

L2 cache
  Within CPU
  256KB
  8-way
  Block contains 128 bytes
  Write-back

L3 cache
  Server-based CPU

[Figure 5.24: Pentium III cache organization, showing the two L1 caches and the L2 cache.]

Memory Performance
Every memory module has an address buffer register (ABR) and a data buffer register (DBR)
Two ways to place consecutive words
  Consecutive words in a single module
  Consecutive words in consecutive modules
Interleaved memory
  CPU references to consecutive memory addresses access multiple modules concurrently (lower bits select modules)
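The phrase "lower bits select modules" can be illustrated with a short sketch. It assumes four modules, which is a hypothetical choice; the slides do not fix a module count.

```python
NUM_MODULES = 4  # hypothetical; any power of two works the same way


def locate(addr, num_modules=NUM_MODULES):
    """Map a word address to (module, address within module).

    With low-order interleaving, the low bits pick the module,
    so consecutive addresses land in different modules.
    """
    module = addr % num_modules
    address_in_module = addr // num_modules
    return module, address_in_module


for addr in range(8):
    module, inner = locate(addr)
    print(f"address {addr} -> module {module}, word {inner}")
```

Addresses 0 to 3 fall in four different modules, so a block transfer can keep all four busy at once instead of waiting on one module for every word.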

Calculate miss penalty

See the examples on pp. 5-54 to 5-59.
One cache level: $T_{ave} = hC + (1-h)M$, where h is the hit rate, M the miss penalty, and C the access time for the cache.
Two cache levels: $T_{ave} = h_1 C_1 + (1-h_1) h_2 C_2 + (1-h_1)(1-h_2) M$, where
  h1: hit rate for L1 cache
  C1: access time for L1 cache
  h2: hit rate for L2 cache
  C2: access time for L2 cache
  M: access time for main memory
Note: if h1 = h2 = 0.9, the fraction of accesses that reach main memory is (1-0.9)(1-0.9) = 1%. With a two-level cache and a 0.9 hit rate at each level, only 1% of memory accesses pay the main memory penalty.
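The two formulas are easy to evaluate numerically. In the sketch below the hit rates come from the note above, while the access times (1, 10, and 100 cycles) are made-up values chosen only to show the arithmetic.

```python
def t_ave_one_level(h, C, M):
    """Average access time with a single cache: hC + (1-h)M."""
    return h * C + (1 - h) * M


def t_ave_two_level(h1, C1, h2, C2, M):
    """Average access time with L1 and L2 caches."""
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M


def main_memory_fraction(h1, h2):
    """Fraction of accesses that miss in both caches."""
    return (1 - h1) * (1 - h2)


# Hypothetical timings in clock cycles: C1 = 1, C2 = 10, M = 100.
print(t_ave_one_level(0.9, 1, 100))
print(t_ave_two_level(0.9, 1, 0.9, 10, 100))
print(main_memory_fraction(0.9, 0.9))
```

With these assumed timings, adding the second level cuts the average from about 10.9 cycles to about 2.8, because only 1% of accesses still reach main memory.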

Other methods to reduce miss penalty

Write buffer (improvement for write-through)
  Built in CPU
  Writes go to the write buffer rather than to memory, so the CPU doesn't need to wait for the memory write

Prefetch
  Compiler inserts prefetch instructions (via analyzing code)

Lockup-free cache
  Allows the data cache to continue to supply cache hits during a miss
  Helpful for processors that support out-of-order completion (e.g. via Tomasulo's algorithm)

Virtual memory

Virtual address (logical address)
MMU (built in CPU)
Physical address
Page table (in main memory)
Page frame
Address translation
TLB
  Cache built within the CPU for holding recently used address translations
Page fault
Replacement algorithm
  LRU

[Figure 5.26: virtual memory organization, showing the MMU and DMA transfers.]

Page table entry
  Valid bit
  Dirty bit
  Access rights of the program to the page

[Figure 5.27: virtual-memory address translation.]

Intel IA-32 Processors Memory management


Intel IA-32 Page Translation

The entries in the page directory point to page tables, and the entries in a page table point to pages in physical memory. This paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32 bytes (4 GBytes).

To select the various table entries, the linear address is divided into three sections:
  Page-directory entry: Bits 22 through 31 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a page table.
  Page-table entry: Bits 12 through 21 of the linear address provide an offset to an entry in the selected page table. This entry provides the base physical address of a page in physical memory.
  Page offset: Bits 0 through 11 provide an offset to a physical address in the page.
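The three-way split of the linear address can be written as a few shifts and masks. This sketch covers only the field extraction described above, not the table walk itself; the function name and sample address are hypothetical.

```python
def split_linear_address(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory_index = (addr >> 22) & 0x3FF  # bits 22-31: page-directory entry
    table_index = (addr >> 12) & 0x3FF      # bits 12-21: page-table entry
    page_offset = addr & 0xFFF              # bits 0-11: offset within the page
    return directory_index, table_index, page_offset


print(split_linear_address(0x00403025))

# 1024 directory entries x 1024 table entries = 2^20 pages of 4 KB each.
assert 1024 * 1024 == 2**20
assert 2**20 * 4096 == 2**32
```

For the sample address the fields come out as directory entry 1, table entry 3, and offset 0x25.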

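TLB

[Figure 5.28: use of a TLB. The virtual page number from the CPU is compared (=?) against the TLB entries; a match signals a hit, otherwise a miss.]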

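[Figure 5.30: organization of one surface of a disk, with sectors labeled by (sector, track), for example (0, 0), (0, 1), and (3, n).]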

Disk Access Time


Seek time
Rotation time (latency time)
Transfer time
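The three components add up to the total access time. The sketch below uses the common assumption that the average rotational latency is half a revolution; the drive parameters (6 ms seek, 7200 rpm, 100 MB/s) are hypothetical.

```python
def disk_access_time_ms(seek_ms, rpm, transfer_bytes, bytes_per_second):
    """Estimate disk access time as seek + rotational latency + transfer.

    Assumes the average rotational latency is half of one revolution.
    """
    rotation_ms = 60_000 / rpm
    latency_ms = rotation_ms / 2
    transfer_ms = transfer_bytes / bytes_per_second * 1000
    return seek_ms + latency_ms + transfer_ms


# Hypothetical drive: 6 ms average seek, 7200 rpm, 100 MB/s, one 4 KB block.
total = disk_access_time_ms(6.0, 7200, 4096, 100_000_000)
print(f"{total:.3f} ms")
```

For a small transfer like this one, seek and rotational latency dominate and the transfer time is a negligible part of the total.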

[Figure 5.31]

RAID: Redundant Array of Inexpensive Disks

RAID0: data striping, no redundancy. Level 0 stripes data at the block level.
RAID1: mirroring (shadowing).
RAID01 (RAID0+1): mirrored stripes.
RAID2: error-correcting coding with Hamming code. Not a typical implementation and rarely used; Level 2 stripes data at the bit level rather than the block level.
RAID3: bit-interleaved parity. Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service multiple requests simultaneously, is also rarely used.
RAID4: dedicated parity drive. A commonly used implementation of RAID, Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage of Level 4 is that the parity disk can create write bottlenecks.
RAID5: block-interleaved distributed parity. Provides data striping at the block level and also stripes the error correction information across the disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.
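The parity used by Levels 3, 4, and 5 is a bytewise XOR across the data blocks of a stripe, which is what lets a failed disk be rebuilt. A minimal sketch, with made-up block contents and a hypothetical helper name:

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (contents are arbitrary).
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data)

# Disk 1 fails: XOR of the surviving data blocks and the parity restores it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same computation explains the Level 4 write bottleneck: every write to any data disk also has to update the single parity disk, whereas Level 5 spreads those parity updates over all disks.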

Compact Disc (CD)


CD-ROM 1X: 150 KB/sec
CD-ROM 40X: 150 x 40 = 6000 KB/sec = 6 MB/sec

DVD (Digital Versatile Disk)

DVD+R
  Is a non-rewritable format and is compatible with about 89% of all DVD players and most DVD-ROMs.
DVD+R/W
  Has some "better" features than DVD-R/W, such as lossless linking and both CAV and CLV writing.
DVD-R
  Is a non-rewritable format and is compatible with about 93% of all DVD players and most DVD-ROMs.
  Was the first DVD recording format released that was compatible with standalone DVD players.
DVD-R/W

DVD Sizes

DVD-5 holds around 4 700 000 000 bytes, which is 4.37 computer GB (where 1 kbyte is 1024 bytes). DVD+R/W and DVD-R/W support this format. Also called Single Sided Single Layered. This is the most common DVD media, often called 4.7 GB media.

DVD-10 holds around 9 400 000 000 bytes, which is 8.75 computer GB. DVD+R/W and DVD-R/W support this format. Also called Double Sided Single Layered.

DVD-9 holds around 8 540 000 000 bytes, which is 7.95 computer GB. DVD+R supports this format. Also called Single Sided Dual Layered. This media is called DVD+R9, DVD+R DL, or 8.5 GB media.

DVD-18 holds around 17 080 000 000 bytes, which is 15.9 computer GB. DVD+R supports this format. Also called Double Sided Dual Layered.
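The "computer GB" figures follow from dividing the byte counts by 1024^3. The sketch below repeats that conversion for the four capacities quoted above; the dictionary and function names are illustrative.

```python
CAPACITY_BYTES = {
    "DVD-5": 4_700_000_000,
    "DVD-10": 9_400_000_000,
    "DVD-9": 8_540_000_000,
    "DVD-18": 17_080_000_000,
}


def to_computer_gb(num_bytes):
    """Convert bytes to gigabytes with 1 GB = 1024**3 bytes."""
    return num_bytes / 1024**3


for name, size in CAPACITY_BYTES.items():
    print(f"{name}: {to_computer_gb(size):.3f} computer GB")
```

The printed values agree with the slide figures up to the last digit, which the slide truncates rather than rounds in some cases.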

DVD Write and read speeds

Single Layer (4.7GB) write speeds
  1x (CLV) = about 58 minutes
  2x (CLV) = about 29 minutes
  2.4x (CLV) = about 24 minutes
  4x (CLV) = about 14.5 minutes
  6x (CLV/ZCLV) = about 10-12 minutes
  8x (PCAV/ZCLV) = about 8-10 minutes
  12x (PCAV/ZCLV) = about 6.5-7.5 minutes
  16x (CAV/ZCLV) = about 6-7 minutes

Dual/Double Layer (8.5GB) write speeds
  1x CLV = about 105 minutes
  2.4x CLV = about 44 minutes
  4x CLV = about 27 minutes

Single Layer (4.7GB) read speeds
  6x CAV (avg. ~4x): max read speed 7.93MB/s = ~14 minutes
  8x CAV (avg. ~6x): max read speed 10.57MB/s = ~10 minutes
  12x CAV (avg. ~8x): max read speed 15.85MB/s = ~7 minutes
  16x CAV (avg. ~12x): max read speed 21.13MB/s = ~5 minutes

[Figure 5.33: figure labeled with the values 7 and 9.]