
ECE 2300

Digital Logic & Computer Organization


Spring 2018

More Caches

Lecture 21: 1
Announcements
• Prelim 2 stats
– High: 79.5 (out of 80), Mean: 65.9, Median: 68

• Prelab 5(C) deadline extended to Saturday 3pm
– No further extensions (including slip days) allowed

Lecture 21: 2
Hexadecimal Notation (used in HW7)
• Often convenient to write binary (base-2)
numbers as hexadecimal (base-16) numbers
– Fewer digits: 4 bits per hex digit
– Less error prone: long strings of 1’s and 0’s
(such as memory addresses) are easy to misread

Binary Hex Decimal Binary Hex Decimal


0000 0 0 1000 8 8
0001 1 1 1001 9 9
0010 2 2 1010 A 10
0011 3 3 1011 B 11
0100 4 4 1100 C 12
0101 5 5 1101 D 13
0110 6 6 1110 E 14
0111 7 7 1111 F 15

Lecture 21: 3
Converting from Binary to Hex
• Every group of four bits is a hex digit
– Start grouping from right-hand side

0011 1010 1000 1111 0100 1101 0111
  3    A    8    F    4    D    7
This is not a new machine representation;
just a compact way to write the number
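
As a quick sanity check (my illustration, not from the slides), the grouping above can be reproduced with a short Python sketch:

```python
def bin_to_hex(bits: str) -> str:
    """Convert a binary string to hex by grouping 4 bits at a time from the right."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)          # pad on the left to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(g, 2), "X") for g in groups)

print(bin_to_hex("0011101010001111010011010111"))        # prints 3A8F4D7
```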

Lecture 21: 4
Cache Basics: True or False?
• Cache is usually implemented in DRAM?

• Memory block size is larger than the cache block size?

• Memory block address is shorter than the memory address?

• Direct mapped cache allows a memory block to have more than one cache location?

Lecture 21: 5
Example: DM Cache Address Breakdown
• Assuming 16-bit memory addresses, how many
bits are associated with the tag, index, and
offset of the following configurations for a direct
mapped cache?

• (a) 16 blocks, 4 bytes per block
– Byte offset: 2 bits; Index: 4 bits; Tag: 10 bits

• (b) 32 blocks, 8 bytes per block
– Byte offset: 3 bits; Index: 5 bits; Tag: 8 bits
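
A tiny Python sketch (mine, not part of the slides; the helper name dm_fields is illustrative) that computes these field widths for a direct mapped cache:

```python
from math import log2

def dm_fields(addr_bits: int, num_blocks: int, block_bytes: int):
    """Return (tag, index, offset) bit widths for a direct mapped cache."""
    offset = int(log2(block_bytes))    # bits to select a byte within a block
    index = int(log2(num_blocks))      # bits to select a cache block
    tag = addr_bits - index - offset   # remaining address bits
    return tag, index, offset

print(dm_fields(16, 16, 4))   # (10, 4, 2)  -> case (a)
print(dm_fields(16, 32, 8))   # (8, 5, 3)   -> case (b)
```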

Lecture 21: 6
Block Placement in DM Cache
• Direct mapped cache: Each memory block maps to one
cache block
– Mapping conflicts may increase miss rate

[Figure: memory with 8 blocks (Block 0 - Block 7) mapping into a direct mapped
cache with 4 blocks; each memory block maps to cache block (block address mod 4)]

Lecture 21: 7


More Flexible Block Placement
• K-way set associative cache: each memory block maps
to one set, which contains K blocks
– A block can be stored anywhere in the set

[Figure: memory with 8 blocks (Block 0 - Block 7) mapping into a 2-way set
associative cache with 4 blocks (Set 0 and Set 1, each with Way 0 and Way 1)]

Lecture 21: 8


Associative Caches
• K-way set associative
– Index bits determine which set to address
– Each set contains K entries (ways)
– All ways in the selected set are searched in parallel
• K comparators (more expensive than direct mapped)

• An extreme case: Fully associative
– Block can go in any cache location
• Only one set => No need for index bits
– All entries are searched in parallel
• Comparator per entry (most expensive)

Lecture 21: 9
Address Translation for Associative Caches

• Breakdown of an n-bit memory address for cache use

  (n - i - b) tag bits | i index bits | b byte offset bits

• Parameters for a K-way set associative cache
– Size of each cache block is 2^b bytes
– Number of sets is 2^i
– Number of blocks is K × 2^i
– Total cache size is K × 2^(b+i) bytes
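
For illustration, here is a hedged Python sketch of the same breakdown as bit operations, assuming b byte-offset bits and i index bits as defined above (the function name split_address is mine, not from the slides):

```python
def split_address(addr: int, i: int, b: int):
    """Split an address into (tag, index, byte offset) for a cache with
    2**i sets and 2**b-byte blocks; the way count K does not affect this split."""
    offset = addr & ((1 << b) - 1)         # low b bits
    index = (addr >> b) & ((1 << i) - 1)   # next i bits
    tag = addr >> (b + i)                  # remaining high bits
    return tag, index, offset

# Example: 16-bit address, cache with 32 sets (i = 5) and 8-byte blocks (b = 3)
print(split_address(0xBEEF, i=5, b=3))     # (190, 29, 7)
```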

Lecture 21: 10
4-way Set Associative Cache

• Index bits address one cache set
• [Figure: cache organized as 256 sets, 4 ways per set, 1024 blocks total]
• All 4 ways within the selected cache set are searched in parallel
Lecture 21: 11
2-way Set Associative Example
• Size of each block is 4 bytes
• Cache holds 4 blocks, 2-way set associative
• Memory holds 16 blocks
• Memory address: 3 tag bits | 1 index bit | 2 byte offset bits
  (2 sets, 2 ways)

[Figure: empty cache with 2 sets (index 0 and 1), each holding 2 ways;
each way stores a valid bit (V), a tag, and data]
Lecture 21: 12
2-way Set Associative Example (walkthrough)
• Memory contents (block address : block data):
  0000:100  0001:110  0010:120  0011:130  0100:140  0101:150  0110:160  0111:170
  1000:180  1001:190  1010:200  1011:210  1100:220  1101:230  1110:240  1111:250

• Access sequence (each 6-bit address = 3 tag bits | 1 index bit | 2 offset bits):
  R1 <= M[000000]  miss  -> Set 0, Way 0: V=1, tag=000, data=100;  R1 = 100
  R2 <= M[000100]  miss  -> Set 1, Way 0: V=1, tag=000, data=110;  R2 = 110
  R3 <= M[010000]  miss  -> Set 0, Way 1: V=1, tag=010, data=140;  R3 = 140
  R2 <= M[011100]  miss  -> Set 1, Way 1: V=1, tag=011, data=170;  R2 = 170
  R1 <= M[000000]  hit   (Set 0, Way 0);                           R1 = 100
  R1 <= M[000100]  hit   (Set 1, Way 0);                           R1 = 110

Lecture 21: 13-23
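
The walkthrough above can be replayed with a small Python sketch (an illustration I added, not the course's code), using the parameters from slide 12: 2 sets, 2 ways, 4-byte blocks, and block data 100, 110, ..., 250:

```python
# Minimal sketch: a 2-way set associative cache with 2 sets and 4-byte blocks.
memory = {blk: 100 + 10 * blk for blk in range(16)}   # block address -> block data
cache = [[None, None], [None, None]]                  # cache[set][way] = (tag, data)

def access(addr):
    index = (addr >> 2) & 0b1           # 1 index bit
    tag = addr >> 3                     # 3 tag bits
    ways = cache[index]
    for way in ways:
        if way is not None and way[0] == tag:
            return "hit", way[1]
    data = memory[addr >> 2]            # fetch the block on a miss
    slot = 0 if ways[0] is None else 1  # fill an empty way (no eviction needed in this trace)
    ways[slot] = (tag, data)
    return "miss", data

for a in [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]:
    print(format(a, "06b"), *access(a))   # reproduces miss, miss, miss, miss, hit, hit
```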
Spectrum of Associativity
• A K-way set associative cache with N blocks
– Number of cache sets S = N / K
• Number of index bits = log2(S)
– When K = N, fully associative cache
• One cache set → zero index bits
– When K = 1 (one-way), direct mapped cache
• N cache sets

• Increasing the associativity
– Typically improves the hit rate (fewer conflicts)
– But increases the hit time (takes longer to search)
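
A few lines of Python (illustrative, not from the slides) make the spectrum concrete for the 8-block cache shown on the next slide:

```python
from math import log2

N = 8                      # total number of cache blocks
for K in (1, 2, 4, 8):     # 1-way (direct mapped) up to 8-way (fully associative)
    S = N // K             # number of sets
    print(f"{K}-way: {S} set(s), {int(log2(S))} index bit(s)")
# prints 8, 4, 2, 1 sets with 3, 2, 1, 0 index bits, respectively
```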

Lecture 21: 24
Spectrum of Associativity
• For a cache with 8 blocks

Lecture 21: 25
Exercise: Set Associative Cache Address Breakdown
• Assuming 16-bit addresses, how many bits are
associated with the tag, index, and offset of the
following cache configurations?

• 16 blocks, 16 bytes per block, 4-way set associative
– Byte offset: 4 bits; Index: 2 bits; Tag: 10 bits

• 16 blocks, 16 bytes per block, fully associative
– Byte offset: 4 bits; Index: 0 bits; Tag: 12 bits

Lecture 21: 26
Miss Classification
• Compulsory (Cold) misses
– Caused by the first access to a memory block

• Capacity misses
– Occur because the cache might not be big enough to
hold the active set of memory blocks needed during
program execution

• Conflict misses
– Occur in a direct mapped or set associative cache when
multiple memory blocks compete for the same set due to
the inflexibility of block placement
– Would not occur in a fully associative cache
Lecture 21: 27
Misses vs. Associativity Example
• Compare different caches
– Capacity: 4 blocks
– Direct mapped, 2-way set associative, fully associative
– Block address sequence: 0, 8, 0, 6, 8 (in decimal)

• Direct mapped (cache index = block address mod 4)

  Block    Cache   Hit/    Cache contents after access
  address  index   miss    Block 0   Block 1   Block 2   Block 3
  0        0       miss    Mem[0]
  8        0       miss    Mem[8]
  0        0       miss    Mem[0]
  6        2       miss    Mem[0]              Mem[6]
  8        0       miss    Mem[8]              Mem[6]

  (The first accesses to blocks 0, 8, and 6 are cold misses;
   the repeated misses on 0 and 8 are conflict misses)

Lecture 21: 28-29
Misses vs. Associativity Example
• 2-way set associative (2 sets; cache index = block address mod 2)

  Block    Cache   Hit/    Cache contents after access
  address  index   miss    Set 0 (2 ways)       Set 1 (2 ways)
  0        0       miss    Mem[0]
  8        0       miss    Mem[0]  Mem[8]
  0        0       hit     Mem[0]  Mem[8]
  6        0       miss    Mem[0]  Mem[6]
  8        0       miss    Mem[8]  Mem[6]

• Fully associative

  Block    Hit/    Cache contents after access
  address  miss
  0        miss    Mem[0]
  8        miss    Mem[0]  Mem[8]
  0        hit     Mem[0]  Mem[8]
  6        miss    Mem[0]  Mem[8]  Mem[6]
  8        hit     Mem[0]  Mem[8]  Mem[6]

  (The first accesses to blocks 0, 8, and 6 are cold misses; the final miss on 8
   in the 2-way cache is a conflict miss; the fully associative cache has no
   conflict misses)

Lecture 21: 30-31
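
The results above can be reproduced with a short, generic miss counter (a sketch I added, assuming LRU replacement within each set where a choice exists):

```python
def simulate(trace, num_blocks, ways):
    """Count misses for a cache of num_blocks blocks organized as a
    ways-way set associative cache with LRU replacement in each set."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]     # per set: blocks in LRU order (front = oldest)
    misses = 0
    for block in trace:
        s = sets[block % num_sets]
        if block in s:
            s.remove(block)                  # hit: re-inserted below as most recent
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)                     # evict the least recently used block
        s.append(block)                      # most recently used goes to the back
    return misses

for ways in (1, 2, 4):   # direct mapped, 2-way, fully associative (4 blocks total)
    print(f"{ways}-way: {simulate([0, 8, 0, 6, 8], 4, ways)} misses")
# prints 5, 4, and 3 misses, matching the tables above
```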
Block Replacement Policy
• Direct mapped: no choice

• Set associative and fully associative


– Pick non-valid entry, if there is one
– Otherwise, choose among entries in the set

• Least recently used (LRU)


– Choose the one unused for the longest time
– Requires extra bits to order the blocks
– High overhead beyond 4-way set associative

• Random
– Similar performance to LRU for high associativity
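
As a concrete illustration (not from the slides), here is a minimal Python sketch of a fully associative LRU cache that tracks an age per block; replaying the access sequence from the next slide reproduces its hit/miss pattern and age values:

```python
class LRUCache:
    """Fully associative cache; keeps blocks in recency order so that each
    block's 'age' (0 = most recently used) matches the next slide's numbers."""
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.order = []                      # most recently used first

    def access(self, block):
        hit = block in self.order
        if hit:
            self.order.remove(block)
        elif len(self.order) == self.num_blocks:
            self.order.pop()                 # evict the least recently used block
        self.order.insert(0, block)          # accessed block becomes age 0
        return hit

    def ages(self):
        return {blk: age for age, blk in enumerate(self.order)}

cache = LRUCache(4)
for blk in [0, 4, 2, 6, 8, 0, 4, 2, 6, 8, 2, 6, 2, 0]:
    print(blk, "hit " if cache.access(blk) else "miss", cache.ages())
```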

Lecture 21: 32
LRU Replacement Example
• Fully associative, 4 blocks; (X) = LRU age (2 bits in this case, 0 = most recently used)

Block    Hit/miss   Cache contents after access
address
0 miss Mem[0] (0)
4 miss Mem[0] (1) Mem[4] (0)
2 miss Mem[0] (2) Mem[4] (1) Mem[2] (0)
6 miss Mem[0] (3) Mem[4] (2) Mem[2] (1) Mem[6] (0)
8 miss Mem[8] (0) Mem[4] (3) Mem[2] (2) Mem[6] (1)
0 miss Mem[8] (1) Mem[0] (0) Mem[2] (3) Mem[6] (2)
4 miss Mem[8] (2) Mem[0] (1) Mem[4] (0) Mem[6] (3)
2 miss Mem[8] (3) Mem[0] (2) Mem[4] (1) Mem[2] (0)
6 miss Mem[6] (0) Mem[0] (3) Mem[4] (2) Mem[2] (1)
8 miss Mem[6] (1) Mem[8] (0) Mem[4] (3) Mem[2] (2)
2 hit Mem[6] (2) Mem[8] (1) Mem[4] (3) Mem[2] (0)
6 hit Mem[6] (0) Mem[8] (2) Mem[4] (3) Mem[2] (1)
2 hit Mem[6] (1) Mem[8] (2) Mem[4] (3) Mem[2] (0)
0 miss Mem[6] (2) Mem[8] (3) Mem[0] (0) Mem[2] (1)

(The first access to each of blocks 0, 4, 2, 6, and 8 is a cold miss; all remaining
misses are capacity misses, since a fully associative cache has no conflict misses)


Lecture 21: 33
Before Next Class
• H&H 7.5.5, 8.2

Next Time

More Caches
Measuring Performance

Lecture 21: 34
