
TMA 1271

INTRODUCTION TO MACHINE
ARCHITECTURE

Week 9 and 10 – Lec10


Cache Memory – Principles,
Elements of Cache Design
What are you going to study?
❚ Cache Memory
❙ Typical organization
❙ Operation -overview
❙ Elements of Cache Design
❘ Mapping - Direct, Associative, Set
Associative
❘ Replacement Algorithms
❘ Write Policy
❘ Block Size
❘ Number of Caches

RK 2
Cache

❚ Small amount of fast memory
❚ Sits between normal main memory and the CPU
❚ May be located on the CPU chip or module

Cache

Cache operation - overview

Cache operation - overview

❚ CPU requests contents of a memory location
❚ Check cache for this data
❚ If present, get from cache (fast)
❚ If not present, read required block from main memory into cache
❚ Then deliver from cache to CPU
❚ Cache includes tags to identify which block of main memory is in each cache slot
❚ C << M (the number of cache lines is much smaller than the number of main memory blocks)
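The read flow above can be sketched as a toy model (a direct-mapped cache held in a Python dict of line → (tag, word); one-word blocks and the 2-bit line field are simplifications for illustration, not the organization used later in the slides):

```python
# Toy model of the read flow: check the cache, deliver on a hit,
# otherwise fill the line from main memory and then deliver.

def read(cache, memory, addr, line_bits=2):
    line = addr & ((1 << line_bits) - 1)   # which cache line
    tag = addr >> line_bits                # identifies the memory block
    if line in cache and cache[line][0] == tag:
        return cache[line][1], True        # hit: get from cache (fast)
    cache[line] = (tag, memory[addr])      # miss: read block into cache
    return cache[line][1], False           # then deliver from cache

cache = {}
memory = list(range(100, 116))             # 16-word "main memory"
print(read(cache, memory, 5))              # miss: (105, False)
print(read(cache, memory, 5))              # hit:  (105, True)
```

Note that a second access to the same address hits, while an access to another block that maps to the same line evicts the first.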
Typical Cache Organization

Elements of Cache Design

❚ Size
❚ Mapping Function
❚ Replacement Algorithm
❚ Write Policy
❚ Block Size
❚ Number of Caches

Size does matter

❚ Cost
❙ More cache is expensive
❚ Speed
❙ More cache is faster (up to a point)
❙ Larger caches involve larger gate counts for addressing, which slows them down
❙ Checking the cache for data takes time
❙ Studies show that sizes between 1K and 512K words are effective

Mapping Function
❚ Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines (Direct / Associative / Set Associative)
❚ A means is needed to determine which main memory block currently occupies a cache line
❚ Ex: Assume a cache of 64 KBytes
❙ Blocks of 4 bytes - the unit of data transfer between memory and cache
❘ i.e. cache is organized as 16K (2^14) lines of 4 bytes each
❙ Assume 16 MBytes of main memory
❘ Each byte directly addressable by a 24-bit address (2^24 = 16M)
Direct Mapping
❚ Each block of main memory maps to only
one cache line
❙ i.e. if a block is in cache, it must be in one
specific place
❚ Address is in two parts
❚ Least Significant w bits identify unique
word or byte within a block of main
memory
❚ Most significant s bits specify one of 2^s memory blocks
❚ The MSBs are split into a cache line field of r bits and a tag field of s-r bits
Direct Mapping
Address Structure
Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits

❚ 24 bit address
❚ 2 bit word identifier (4 byte block)
❚ 22 bit block identifier
❙ 8 bit tag (=22-14)
❙ 14 bit slot or line
❚ No two blocks in the same line have the same Tag
field
❚ Check contents of cache by finding line and
checking Tag
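As a sketch, the field split above can be computed with shifts and masks (the example address 0x16CA47 is an arbitrary illustration, not taken from the slides):

```python
# Split a 24-bit address into the direct-mapping fields above:
# 8-bit tag, 14-bit line, 2-bit word (64 KB cache, 4-byte blocks).

WORD_BITS = 2
LINE_BITS = 14

def split_direct(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

print([hex(f) for f in split_direct(0x16CA47)])  # ['0x16', '0x3291', '0x3']
```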
Direct Mapping
Cache Line Table
❚ Cache line 0 ← main memory blocks 0, m, 2m, 3m, …, 2^s - m
❚ Cache line 1 ← main memory blocks 1, m+1, 2m+1, …, 2^s - m + 1
❚ Cache line m-1 ← main memory blocks m-1, 2m-1, 3m-1, …, 2^s - 1
❚ m = 2^r = number of lines in the cache
❚ Mapping: i = j modulo m (cache line i, main memory block j)
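The mapping can be checked directly; blocks whose numbers differ by a multiple of m land on the same line:

```python
m = 2 ** 14  # number of cache lines (r = 14)

def line_for_block(j):
    """Direct mapping: main memory block j goes to cache line j mod m."""
    return j % m

# Blocks 0, m, 2m, ... all compete for line 0
print(line_for_block(0), line_for_block(m), line_for_block(2 * m))  # 0 0 0
```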
Direct Mapping Cache
Organization

Direct Mapping Example

Direct Mapping pros & cons

❚ Simple
❚ Inexpensive
❚ Fixed location for given block
❙ If a program accesses 2 blocks that map to the
same line repeatedly, cache misses are very
high

Associative Mapping

❚ A main memory block can load into any line of cache
❚ Memory address is interpreted as tag and
word
❚ Tag uniquely identifies block of memory
❚ Every line’s tag is examined for a match
❚ Cache searching gets expensive
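A minimal sketch of the tag search (real hardware compares every tag in parallel, which is what makes it expensive; the loop here is only an illustration, and the tag values are made up):

```python
def associative_lookup(line_tags, tag):
    """Compare the tag against every line; return line index or None."""
    for line, stored in enumerate(line_tags):
        if stored == tag:
            return line          # hit
    return None                  # miss: block not in cache

tags = [0x3A0B17, 0x000042, 0x15F3C2]      # made-up 22-bit tags
print(associative_lookup(tags, 0x000042))  # 1
print(associative_lookup(tags, 0x123456))  # None
```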

Fully Associative Cache
Organization

Associative Mapping Example

Associative Mapping
Address Structure
Tag: 22 bits | Word: 2 bits
❚ 22 bit tag stored with each 32 bit block of
data
❚ Compare tag field with tag entry in cache
to check for hit
❚ Least significant 2 bits of address identify which byte is required from the 32 bit (4 byte) data block
Set Associative Mapping

❚ Cache is divided into a number of sets
❚ Each set contains a number of lines
❚ A given block maps to any line in a given set
❙ e.g. Block B can be in any line of set i
❚ e.g. 2 lines per set
❙ 2 way associative mapping
❙ A given block can be in one of 2 lines in only
one set

Set Associative Mapping
Example

❚ 13 bit set number
❚ Block number in main memory is modulo 2^13
❚ 000000, 00A000, 00B000, 00C000 … map to the same set

Two Way Set Associative
Cache Organization

Set Associative Mapping
Address Structure

Tag: 9 bits | Set: 13 bits | Word: 2 bits

❚ Use set field to determine which cache set to look in
❚ Compare tag field to see if we have a hit
❚ e.g.
❙ Address 1FF 7FFC → Tag 1FF, Data 12345678, Set number 1FFF
❙ Address 001 7FFC → Tag 001, Data 11223344, Set number 1FFF
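A sketch of the decode for this address structure; the two example addresses reproduce the rows above (tags 1FF and 001, both falling in set 1FFF):

```python
# Split a 24-bit address into the set-associative fields above:
# 9-bit tag, 13-bit set, 2-bit word.

WORD_BITS, SET_BITS = 2, 13

def split_set_assoc(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

# 24-bit addresses written as tag | set | word
print([hex(f) for f in split_set_assoc(0xFFFFFC)])  # ['0x1ff', '0x1fff', '0x0']
print([hex(f) for f in split_set_assoc(0x00FFFC)])  # ['0x1', '0x1fff', '0x0']
```

Both addresses share set 1FFF, so in a 2-way cache they can be resident at the same time, one per line of the set.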
Two Way Set Associative
Mapping Example

Replacement Algorithms (1)
Direct mapping

❚ No choice
❚ Each block only maps to one line
❚ Replace that line

Replacement Algorithms (2)
Associative & Set Associative
❚ Hardware implemented algorithm (speed)
❚ Least Recently used (LRU)
❚ e.g. in 2-way set associative
❙ Which of the 2 blocks is the LRU?
❚ First in first out (FIFO)
❙ replace block that has been in cache longest
❚ Least frequently used
❙ replace block which has had fewest hits
❚ Random
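A toy model of LRU for one 2-way set (a Python list ordered by recency; real hardware typically tracks this with a single use bit per pair):

```python
class TwoWaySet:
    """One 2-way set with LRU replacement (most recently used last)."""
    def __init__(self):
        self.lines = []               # stored tags, LRU first

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU line if full."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.append(tag)    # mark as most recently used
            return True
        if len(self.lines) == 2:
            self.lines.pop(0)         # evict least recently used
        self.lines.append(tag)
        return False

s = TwoWaySet()
print(s.access(1), s.access(2), s.access(1))  # False False True
s.access(3)                                   # evicts tag 2, not tag 1
print(s.lines)                                # [1, 3]
```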
Write Policy

❚ Must not overwrite a cache block unless main memory is up to date
❚ Multiple CPUs may have individual caches
❚ I/O may address main memory directly

Write through

❚ All writes go to main memory as well as cache
❚ Multiple CPUs can monitor main memory
traffic to keep local (to CPU) cache up to
date
❚ Lots of traffic
❚ Slows down writes
❚ Remember bogus write through caches!

Write back

❚ Updates are initially made in cache only
❚ Update bit for cache slot is set when
update occurs
❚ If block is to be replaced, write to main
memory only if update bit is set
❚ Other caches get out of sync
❚ I/O must access main memory through
cache
❚ N.B. 15% of memory references are writes
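A minimal sketch of the update-bit mechanics described above (main memory modeled as a dict keyed by block number; all names are illustrative):

```python
class WriteBackLine:
    """One cache line with the update ('dirty') bit described above."""
    def __init__(self):
        self.block_no, self.data, self.dirty = None, None, False

    def write(self, block_no, data):
        self.block_no, self.data = block_no, data
        self.dirty = True              # update made in cache only

    def evict(self, memory):
        if self.dirty:                 # write back only if updated
            memory[self.block_no] = self.data
        self.block_no, self.data, self.dirty = None, None, False

memory = {}
line = WriteBackLine()
line.write(7, "new value")
print(memory)        # {} - main memory not yet updated
line.evict(memory)
print(memory)        # {7: 'new value'}
```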
Line Size
❚ As the block size increases from very small to
larger size, the cache hit ratio will at first
increase because of principle of locality.
❚ As the block size increases, more useful data are
brought into the cache.
❚ However, the cache hit ratio will begin to
decrease
❙ Larger blocks reduce the number of blocks that fit into a
cache. Because each block fetch overwrites older cache
contents, a small number of blocks results in data being
overwritten shortly after they are fetched.
❙ As a block becomes larger, each additional word is
farther from the requested word, therefore less likely to
be needed in the near future.

Multi-Level Caches

❚ Increases in transistor density have allowed caches to be placed inside the processor chip
❚ Internal caches have very short wires (within the chip itself) and are therefore quite fast, even faster than any zero wait-state memory access outside the chip
❚ This means that a super fast internal cache (level 1) can sit inside the chip while an external cache (level 2) provides access faster than main memory
Unified versus Split Caches

❚ Split into two caches – one for instructions, one for data
❚ Disadvantages
❙ A unified cache balances the load between instructions and data automatically, based on hit rate alone
❙ Hardware is simpler with a unified cache
❚ Advantage
❙ A split cache effectively provides one cache for the instruction decoder and one for the execution unit
❙ This supports pipelined architectures

Problem

❚ A set associative cache consists of 64 lines, or slots, divided into four-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.

❚ A two-way set associative cache has lines of 16 bytes and a total size of 8 KBytes. The 64-MByte main memory is byte-addressable. Show the format of main memory addresses.

Textbook – Chapter 4, pp. 126
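A sketch of the arithmetic both problems call for, assuming the standard tag/set/word split from the earlier slides (the function name and the worked parameters are illustrative):

```python
from math import log2

def field_widths(addr_bits, words_per_block, num_sets):
    """Return (tag, set, word) field widths for a set associative cache."""
    word = int(log2(words_per_block))
    set_f = int(log2(num_sets))
    return addr_bits - set_f - word, set_f, word

# Problem 1: 4K blocks x 128 words -> 19-bit word address;
# 64 lines / 4 lines per set = 16 sets
print(field_widths(19, 128, 16))    # (8, 4, 7)

# Problem 2: 64 MB byte-addressable -> 26-bit address;
# (8 KB / 16 B) = 512 lines, / 2 ways = 256 sets; 16 bytes per line
print(field_widths(26, 16, 256))    # (14, 8, 4)
```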
