You are on page 1of 81

Academic Session 2016/2017

Semester 1

EMT 475/3
Computer Organization
& Architecture
Chapter 3 : Cache Memory

School of Microelectronic Engineering


Universiti Malaysia Perlis

Outline
Computer Memory System Overview
Cache Memory Principles
Cache Design
Cache Organization

School of Microelectronic Engineering


Universiti Malaysia Perlis

Outline
Computer Memory System Overview
Cache Memory Principles
Cache Design
Cache Organization

School of Microelectronic Engineering


Universiti Malaysia Perlis

Computer Memory System Overview


Characteristic
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation
School of Microelectronic Engineering
Universiti Malaysia Perlis

Computer Memory System Overview


Characteristic: Location
CPU; requires its own local memory in the form of
registers, e.g. PC & IR
Internal memory; often equated with main memory;
cache
External memory; consists of peripheral storage devices
e.g. disk, tape, that are accessible to the processor via
I/O controller.

PC Program Counter
IR Instruction Register

School of Microelectronic Engineering


Universiti Malaysia Perlis

Computer Memory System Overview


Characteristic: Capacity
Word size
The natural unit of organization
Typically equal to the number of bits used to represent
integer and instruction length.
Common word lengths are 8, 16 and 32 bits.
External memory capacity is typically expressed in terms
of bytes (1GB, 20MB)
Number of words
or Bytes
School of Microelectronic Engineering
Universiti Malaysia Perlis

Computer Memory System Overview


Characteristic: Unit of Transfer
Internal
Number of bits read out of or written into memory at a
time.
The unit of transfer is equal to the number of data lines of
the memory module.
Equal to the word length but is often larger, such as 64,128 or
256 bits.
External
Usually a block which is much larger than a word
Addressable unit
Smallest location which can be uniquely addressed
Word internally
School of Microelectronic Engineering
7
Universiti Malaysia Perlis
Cluster on M$ disks

Computer Memory System Overview


Characteristic: Access Method
Sequential
Start at the beginning and read through in order
Access time depends on location of data and previous
location
e.g. tape
Direct
Individual blocks have unique address
Access is by jumping to vicinity plus sequential search
Access time depends on location and previous location
e.g. disk
School of Microelectronic Engineering
Universiti Malaysia Perlis

Computer Memory System Overview


Characteristic: Access Method (cont.)
Random
Individual addresses identify locations exactly
Access time is independent of location or previous access
e.g. RAM
Associative
Data is located by a comparison with contents of a portion
of the store
Access time is independent of location or previous access
e.g. cache
School of Microelectronic Engineering
Universiti Malaysia Perlis

Computer Memory System Overview


Characteristic: Performance
Access time (latency)
Time between presenting the address and getting the
valid data; or
time it takes to perform a read or write operation
Memory Cycle time
Time may be required for the memory to recover
before next access
Cycle time is access time + additional time required
before a second access can commence
Transfer Rate
Rate at which data can be moved
School of Microelectronic Engineering
Universiti Malaysia Perlis

10

Computer Memory System Overview


Characteristic: Physical Types
Semiconductor
RAM
Magnetic
Disk & Tape
Optical
CD & DVD
Others
Bubble
Hologram
School of Microelectronic Engineering
Universiti Malaysia Perlis

11

Computer Memory System Overview


Characteristic: Physical Characteristics
Volatility
Volatile information decays naturally or is lost when
electrical power is switched off.
Non volatile information once recorded remains without
deterioration until deliberately changed and no electrical
electric power is needed to retain information.
e.g. magnetic-surface memory.
Erasable
Non erasable cannot be altered, except by destroying the
storage unit. e.g. read-only memory (ROM)
Power consumption
School of Microelectronic Engineering
Universiti Malaysia Perlis

12

Computer Memory System Overview


Characteristic: Organization
Smaller, more expensive, faster memories are supplemented
by larger, cheaper, slower memories.
The key to the success of this organization is decreasing
frequency of access. This concept is explained details in
discussing on
cache memory ( next slide )
virtual memory

School of Microelectronic Engineering


Universiti Malaysia Perlis

13

Computer Memory System Overview


Memory Hierarchy
Registers
In CPU
Internal or Main memory
May include one or more levels of cache
RAM
External memory
Backing store

School of Microelectronic Engineering


Universiti Malaysia Perlis

14

Computer Memory System Overview


Memory Hierarchy - Diagram
Category

Parts

Inboard Memory

Registers
Cache
Main memory

Outboard
Memory

Magnetic-disk
CD-ROM
CD-RW
DVD-RW
DVD-RAM

Off-line storage

Magnetic tape
MO
WORM

MO - Magneto-optical
WORM write once read many

School of Microelectronic Engineering


Universiti Malaysia Perlis

15

Computer Memory System Overview


Memory Hierarchy Diagram
Based on the figure, as one goes down the hierarchy the
following occur
Decreasing cost per bit
Increasing capacity
Increasing Access time
Decreasing the frequency of access of the memory by
the processor

School of Microelectronic Engineering


Universiti Malaysia Perlis

16

Computer Memory System Overview


Cost, Capacity & Access Time
There is a trade-off among the three key characteristics
of memory (cost, capacity & access time)

Faster access time, greater cost per bit


Greater capacity, smaller cost per bit
Greater capacity, slower access time

School of Microelectronic Engineering


Universiti Malaysia Perlis

17

Computer Memory System Overview


The Bottom Line
How much?
Capacity
How fast?
Time is money
How expensive?

School of Microelectronic Engineering


Universiti Malaysia Perlis

18

Computer Memory System Overview


Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
School of Microelectronic Engineering
Universiti Malaysia Perlis

19

Computer Memory System Overview


So you want fast?
It is possible to build a computer which uses only static
RAM (see later)
This would be very fast
This would need no cache
How can you cache cache?
This would cost a very large amount

School of Microelectronic Engineering


Universiti Malaysia Perlis

20

Computer Memory System Overview


Locality of Reference
During the course of the execution of a program,
memory references tend to cluster
e.g. loops
Special Locality: Reflects the tendency of the processor
to access data locations sequentially.
Temporal Locality: Reflects the tendency of the
processor to access the recently used memory locations.
School of Microelectronic Engineering
Universiti Malaysia Perlis

21

Outline
Computer Memory System Overview
Cache Memory Principles
Cache Design
Cache Organization

School of Microelectronic Engineering


Universiti Malaysia Perlis

22

Cache Principles
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module

School of Microelectronic Engineering


Universiti Malaysia Perlis

23

Cache Principles

School of Microelectronic Engineering


Universiti Malaysia Perlis

24

Cache Principles
An intermediate buffer between CPU and main memory.
Use to reduce the CPU waiting time during the main
memory accesses.
Without cache memory, every main memory access results in
delay in instruction processing.
Implement due to higher main memory access time
compare to processor clock period.
Cause the high speed CPUs time wasted during memory
access (instruction fetch, operand fetch or result storing)
To minimize the waiting time for the CPU, a small but fast
memory is introduced as a cache buffer between main
memory and CPU.
School of Microelectronic Engineering
Universiti Malaysia Perlis

25

Cache Principles
A portion of program and data is brought into the cache memory in
advance.
The cache controller keeps track of locations of the main memory
which are copied In cache memory
When CPU needs for instruction or operand, it receives from the
cache memory (if available) instead of accessing the main
memory.
Thus the memory access is fast because the slow main memory
appears as a fast memory.
The cache memory capacity is very small compared to main
memory but the speed is several times better than main memory.
Transfer between the CPU and cache memory usually one word at
a time.
The cache memory usually can be physically with processor IC as
internal cache, also known as on-chip cache.
School of Microelectronic Engineering
Universiti Malaysia Perlis

26

Cache Principles
Cache Level

Level

Devices Cached
Level2 cache, system RAM, Hard Disk,
Level1
DC-ROM
Level2
System RAM, Hard Disk, CD-ROM
System RAM Hard Disk, CD-ROM
Hard Disk/
-CD-ROM
School of Microelectronic Engineering
Universiti Malaysia Perlis

27

Cache Principles
More about Cache.
Disk cache, peripheral cache (e.g.: CDs, very slow initial
access time), Internet cache (software).
A 512kb level 2 cache, caching 64MB of system memory,
can supply 90-95% of info. <=1% in size.
This is due to locality of reference (large programs with
several MBs of instructions, only small portions are
loaded at a time (e.g.: Opening a word document)).
SRAM 7-20 ns, DRAM50-70 ns
Level2 256-512KB, level18-64KB
School of Microelectronic Engineering
Universiti Malaysia Perlis

28

Cache/Main Memory Structure


1. Cache consists of C-lines.
2. Each line contains K words
and a tag of a few bits.
3. The number of words in the
line referred as line size.
4. Tag- a portion of main
memory address.
5. Each line includes a tag
identifies which particular
block is currently being
stored.
School of Microelectronic Engineering
Universiti Malaysia Perlis

29

Cache Principles
Remember!

Unit
KB
MB
GB
TB

Value
210 Bytes = 1024 Bytes
220 Bytes = 1024 KB
230 Bytes = 1024 MB
240 Bytes = 1024 GB

School of Microelectronic Engineering


Universiti Malaysia Perlis

30

Cache Principles
Example:
Cache Memory
Cache of 64KBytes
Cache block of 4 bytes each.
64K/4 = 64(1024)bytes/4bytes =16K = 214 lines
i.e. cache is 16K = (214) lines of 4 bytes each

Main Memory
16MBytes main memory
(24)(220)bytes = (224) =16MBytes
24bit address
School of Microelectronic Engineering
Universiti Malaysia Perlis

31

Cache Principles
Cache Operation: Overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to
cache
Then deliver from cache to CPU
Cache includes tags to identify which block of main
memory is in each cache slot
School of Microelectronic Engineering
Universiti Malaysia Perlis

32

Cache Principles
Cache Operation
Flowchart
Read operation

RA Read address

School of Microelectronic Engineering


Universiti Malaysia Perlis

33

Outline
Computer Memory System Overview
Cache Memory Principles
Cache Design
Cache Organization

School of Microelectronic Engineering


Universiti Malaysia Perlis

34

Cache Design
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches

School of Microelectronic Engineering


Universiti Malaysia Perlis

35

Cache Design
Size
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time

School of Microelectronic Engineering


Universiti Malaysia Perlis

36

Cache Design

School of Microelectronic Engineering


Universiti Malaysia Perlis

37

Cache Design
Comparison of Cache Sizes
Processor

Type

Year of
Introduction

L1 cache

L2 cache

L3 cache

IBM 360/85

Mainframe

1968

16 -32 kB

PDP-11/70

Minicomputer

1975

1 kB

VAX 11/780

Minicomputer

1978

16 kB

IBM 3033

Mainframe

1978

64 kB

IBM 3090

Mainframe

1985

128 256 kB

Intel 80486

PC

1989

8 kB

Pentium

PC

1993

8 kB/8 kB

256 512 kB

PowerPC 601

PC

1993

32 kB

PowerPC 602

PC

1996

32 kB/32 kB

PowerPC G4

PC/Server

1999

32 kB/32 kB

256 kB to 1 MB

2 MB

IBM S/390 G6

Mainframe

1999

256 kB

8 MB

School of Microelectronic Engineering


Universiti Malaysia Perlis

38

Cache Design
Comparison of Cache Sizes
Processor

Type

Year of
Introduction

L1 cache

L2 cache

L3 cache

Pentium 4

PC/Server

2000

8 kB/8 kB

256 kB

IBM SP

High-end server/supercomputer

2000

64 kB/32 kB

8M

CRAY MTA

Supercomputer

2000

8 kB

2 MB

Itanium

PC/Server

2001

16 kB/16 kB

96 kB

4 MB

Itanium 2

PC/Server

2002

32 kB

256 kB

6 MB

IBM POWER5

High-end server

2003

64 kB

1.9MB

36 MB

CRAY XD-1

Supercomputer

2004

64 kB/64 kB

1 MB

IBM POWER6

PC/Server

2007

64 kB/64 kB

4 MB

32 MB

IBM z10

Mainframe

2008

64 kB/128 kB

3 MB

24 48 MB

Intel Core i7 EE 990

Workstation/server

2011

6 x 32 kB/32 kB

1.5 MB

12 MB

IBM zEnterprise 196

Mainframe/server

2011

24 x 64 kB/128 kB

24 x 1.5 MB

24 MB L3
192 MB L4

School of Microelectronic Engineering


Universiti Malaysia Perlis

39

Cache Design
Mapping Function
Fewer cache lines than the main memory block.
Therefore, an algorithm is needed for mapping main memory
blocks into cache lines.
Needs for determining which memory block currently
occupies a cache line Mapping function
The choice of the mapping function, determines the how the
cache is organized.
Three techniques :Direct
Associative
Set associative
School of Microelectronic Engineering
Universiti Malaysia Perlis

40

Cache Design
Direct Mapping
Each block of main memory maps to only one cache line
i.e. if a block is in cache, it must be in one specific place
Address is in two parts
Least Significant w bits identify unique word
Most Significant s bits specify one memory block
The MSBs are split into a cache line field r and a tag of
s-r (most significant)
School of Microelectronic Engineering
Universiti Malaysia Perlis

41

Cache Design
Direct Mapping

School of Microelectronic Engineering


Universiti Malaysia Perlis

42

Cache Design
Direct Mapping Address Structure
Tag s-r
Line or Slot r
Word w
8-bit
2-bit
14-bit
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
8 bit tag (=22-14)
14 bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
School of Microelectronic Engineering
Universiti Malaysia Perlis

43

Cache Design
Direct Mapping - Cache Line Table
Cache
line
0

Main Memory Block


Assigned
s

0, m, 2m, 3m 2 -m

1,m+1, 2m+1 2 -m+1

m-1

m-1, 2m-1,3m-1 2 -1
School of Microelectronic Engineering
Universiti Malaysia Perlis

44

Cache Design
Direct Mapping Cache Organization

School of Microelectronic Engineering


Universiti Malaysia Perlis

45

Cache Design
Direct Mapping
Example

School of Microelectronic Engineering


Universiti Malaysia Perlis

46

Cache Design
Direct Mapping

School of Microelectronic Engineering


Universiti Malaysia Perlis

47

Cache Design
Direct Mapping - Summary
Address length = (s + w) bits
s+w
Number of addressable units = 2 words or bytes
w
Block size = line size = 2 words or bytes
s+w
w
s
Number of blocks in main memory = 2 /2 = 2
r
Number of lines in cache = m = 2
Size of tag = (s r) bits

School of Microelectronic Engineering


Universiti Malaysia Perlis

48

Cache Design
Direct Mapping pros & cons
Simple
Inexpensive
Fixed location for given block
If a program accesses 2 blocks that map to the same
line repeatedly, cache misses are very high

School of Microelectronic Engineering


Universiti Malaysia Perlis

49

Cache Design
Associative Mapping
A main memory block can load into any line of cache
Memory address is interpreted as tag and word
Tag uniquely identifies block of memory
Every lines tag is examined for a match
Cache searching gets expensive

School of Microelectronic Engineering


Universiti Malaysia Perlis

50

Cache Design
Associative Mapping Address Structure
Tag

Word w

22bit

2bit

22 bit tag stored with each 32 bit block of data


Compare tag field with tag entry in cache to check for
hit
Least significant 2 bits of address identify which 16 bit
word is required from 32 bit data block
Tag
Data
Cache line
e.g. Address
FFFFFC

3FFFFF

24682468

School of Microelectronic Engineering


Universiti Malaysia Perlis

3FFF
51

Cache Design
Associative Mapping

School of Microelectronic Engineering


Universiti Malaysia Perlis

52

Cache Design
Fully Associative Cache Organization

School of Microelectronic Engineering


Universiti Malaysia Perlis

53

Cache Design
Associative Mapping
Example

Address

Tag

Data

Cache line

FFFFFC

3FFFFF

24682468

3FFF

School of Microelectronic Engineering


Universiti Malaysia Perlis

54

Cache Design
Associative
Mapping

School of Microelectronic Engineering


Universiti Malaysia Perlis

55

Cache Design
Associative Mapping Summary
Address length = (s + w) bits
s+w
Number of addressable units = 2 words or bytes
w
Block size = line size = 2 words or bytes
s+w
w
s
Number of blocks in main memory = 2 / 2 = 2
Number of lines in cache = undetermined
Size of tag = s bits

School of Microelectronic Engineering


Universiti Malaysia Perlis

56

Cache Design
Set Associative Mapping
Cache is divided into a number of sets
Each set contains a number of lines
A given block maps to any line in a given set
e.g. Block B can be in any line of set i
e.g. 2 lines per set
2 way associative mapping
A given block can be in one of 2 lines in only one set
School of Microelectronic Engineering
Universiti Malaysia Perlis

57

Cache Design
Set Associative
Mapping

School of Microelectronic Engineering


Universiti Malaysia Perlis

58

Cache Design
Set Associative
Mapping

School of Microelectronic Engineering


Universiti Malaysia Perlis

59

Cache Design
Set Associative Mapping Example
13 bit set number
Block number in main memory is modulo 213
000000, 00A000, 00B000, 00C000 map to same set

School of Microelectronic Engineering


Universiti Malaysia Perlis

60

Cache Design
Set Associative Mapping Address Structure
Tag
Set
9bit
13bit
Use set field to determine cache set to look in
Compare tag field to see if we have a hit
e.g.
Address
167FFC
FFFFFC

Tag
02C
1FF

Data
12345678
11223344

School of Microelectronic Engineering


Universiti Malaysia Perlis

Word
2bit

Set Number
1FFF
1FFF
61

Cache Design
Two Way
Set Associative
Cache
Organization

School of Microelectronic Engineering


Universiti Malaysia Perlis

62

Cache Design
Two Way
Set Associative
Mapping Example

School of Microelectronic Engineering


Universiti Malaysia Perlis

63

Cache Design
Set Associative
Mapping

School of Microelectronic Engineering


Universiti Malaysia Perlis

64

Cache Design
Set Associative Mapping Summary
Address length = (s + w) bits
s+w
Number of addressable units = 2 words or bytes
w
Block size = line size = 2 words or bytes
d
Number of blocks in main memory = 2
Number of lines in set = k
d
Number of sets,v = 2
d
Number of lines in cache, m = k v = k 2
Size of tag = (s d) bits
School of Microelectronic Engineering
Universiti Malaysia Perlis

65

Cache Design
Comparison

Cache Type

Hit Ratio

Search Speed

Direct Mapping

Good

Best

Fully Associative
Mapping

Best

Moderate

Very Good,
k-way SetBetter as k
Associative Mapping
increases

Good, Worse as k
increases

School of Microelectronic Engineering


Universiti Malaysia Perlis

66

Cache Design
Replacement Algorithms (1): Direct mapping
No choice
Each block only maps to one line
Replace that line

School of Microelectronic Engineering


Universiti Malaysia Perlis

67

Cache Design
Replacement Algorithms (2): Associative & Set Associative
Hardware implemented algorithm (speed)
Least Recently used (LRU)
e.g. in 2 way set associative
Which of the 2 block is LRU?
First in first out (FIFO)
replace block that has been in cache longest
Least frequently used
replace block which has had fewest hits
Random
68
School of Microelectronic Engineering
Universiti Malaysia Perlis

Cache Design
Write Policy
Must not overwrite a cache block unless main memory is
up to date
Multiple CPUs may have individual caches
I/O may address main memory directly

School of Microelectronic Engineering


Universiti Malaysia Perlis

69

Cache Design
Write through
All writes go to main memory as well as cache
Multiple CPUs can monitor main memory traffic to keep
local (to CPU) cache up to date
Lots of traffic
Slows down writes

School of Microelectronic Engineering


Universiti Malaysia Perlis

70

Cache Design
Write back
Updates initially made in cache only
Update bit for cache slot is set when update occurs
If block is to be replaced, write to main memory only if
update bit is set
Other caches get out of sync
I/O must access main memory through cache
N.B. 15% of memory references are writes
School of Microelectronic Engineering
Universiti Malaysia Perlis

71

Outline
Computer Memory System Overview
Cache Memory Principles
Cache Design
Cache Organization

School of Microelectronic Engineering


Universiti Malaysia Perlis

72

Cache Organization
Pentium 4 Cache
Pentium 4
L1 caches
80386 no on chip cache
8k bytes
80486 8k using 16 byte lines
64 byte lines
and four way set associative
organization
four way set associative
Pentium (all versions) two L2 cache
on chip L1 caches
Feeding both L1 caches
Data & instructions
256k
128 byte lines
Pentium III L3 cache added
off chip
8 way set associative
L3 cache on chip
School of Microelectronic Engineering
Universiti Malaysia Perlis

73

Cache Organization
Intel Cache Evolution
Problem

Solution

Processor on which
feature first appears

External memory slower than the system bus.

Add external cache using faster memory technology.

386

Increased processor speed results in external bus


becoming a bottleneck for cache access.

Move external cache on-chip, operating at the same speed as the


processor.

486

Internal cache is rather small, due to limited


space on chip

Add external L2 cache using faster technology than main memory

486

Contention occurs when both the Instruction


Prefetcher and the Execution Unit simultaneously
require access to the cache. In that case, the
Prefetcher is stalled while the Execution Units
data access takes place.

Create separate data and instruction caches.

Increased processor speed results in external bus


becoming a bottleneck for L2 cache access.
Some applications deal with massive databases
and must have rapid access to large amounts of
data. The on-chip caches are too small.

Create separate back-side bus that runs at higher speed than the
main (front-side) external bus. The BSB is dedicated to the L2 cache.

Pentium

Pentium Pro

Move L2 cache on to the processor chip.

Pentium II

Add external L3 cache.

Pentium III

Move L3 cache on-chip.

Pentium 4

School of Microelectronic Engineering


Universiti Malaysia Perlis

74

Cache Organization
Pentium 4 Block Diagram

School of Microelectronic Engineering


Universiti Malaysia Perlis

75

Cache Organization
Pentium 4 Core Processor
dependence and resources
Fetch/Decode Unit
May speculatively execute
Fetches instructions from
L2 cache
Execution units
Decode into micro-ops
Execute micro-ops
Store micro-ops in L1 cache Data from L1 cache
Out of order execution logic
Results in registers
Schedules micro-ops
Memory subsystem
Based on data
L2 cache and systems bus
School of Microelectronic Engineering
Universiti Malaysia Perlis

76

Cache Organization
Pentium 4 Design Reasoning
Decodes instructions into RISC
like micro-ops before L1 cache
Micro-ops fixed length
Superscalar pipelining and
scheduling
Pentium instructions long &
complex
Performance improved by
separating decoding from
scheduling & pipelining
(More later ch14)
Data cache is write back
Can be configured to write

through
L1 cache controlled by 2 bits in
register
CD = cache disable
NW = not write through
2 instructions to invalidate
(flush) cache and write back
then invalidate
L2 and L3 8-way set-associative
Line size 128 bytes

School of Microelectronic Engineering


Universiti Malaysia Perlis

77

Cache Organization
PowerPC Cache Organization
64kb L1 cache
601 single 32kb 8 way set
associative
8 way set associative
256k, 512k or 1M L2 cache
603 16kb (2 x 8kb) two way
set associative
two way set associative
604 32kb
G5
32kB instruction cache
620 64kb
64kB data cache
G3 & G4

School of Microelectronic Engineering


Universiti Malaysia Perlis

78

Cache Organization
PowerPC G5 Block Diagram

School of Microelectronic Engineering


Universiti Malaysia Perlis

79

Internet Sources
Manufacturer sites
Intel
IBM/Motorola
Search on cache

School of Microelectronic Engineering


Universiti Malaysia Perlis

80

Finish!
Q&A

School of Microelectronic Engineering


Universiti Malaysia Perlis

81

You might also like