More Caches
Lecture 21: 1
Announcements
• Prelim 2 stats
– High: 79.5 (out of 80), Mean: 65.9, Median: 68
Hexadecimal Notation (used in HW7)
• Often convenient to write binary (base-2)
numbers as hexadecimal (base-16) numbers
– Fewer digits: 4 bits per hex digit
– Less error prone: a long string of 1’s and 0’s (such as a memory address) is easy to misread
Converting from Binary to Hex
• Every group of four bits is a hex digit
– Start grouping from right-hand side
0011 1010 1000 1111 0100 1101 0111
  3    A    8    F    4    D    7
This is not a new machine representation, just a compact way to write the same number
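A minimal Python sketch of the grouping rule: pad on the left to a multiple of four bits (grouping starts from the right-hand side), then map each 4-bit group to one hex digit. The name `bin_to_hex` is only illustrative; Python's built-in `format(int(bits, 2), "X")` yields the same digits except that it drops leading zero digits.

```python
HEX_DIGITS = "0123456789ABCDEF"

def bin_to_hex(bits: str) -> str:
    """Convert a string of 1s and 0s to hex, four bits per hex digit."""
    # Grouping starts from the right-hand side, so pad the left end
    # with zeros until the length is a multiple of 4.
    bits = "0" * (-len(bits) % 4) + bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each 4-bit group becomes exactly one hex digit.
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)

print(bin_to_hex("0011101010001111010011010111"))  # 3A8F4D7
```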
Cache Basics: True or False?
• Cache is usually implemented in DRAM?
Example: DM Cache Address Breakdown
• Assuming 16-bit memory addresses, how many
bits are associated with the tag, index, and
offset of the following configurations for a direct
mapped cache?
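The configurations for this exercise did not survive extraction, so the sketch below uses made-up example sizes. It assumes byte-addressable memory and power-of-two block size and block count: the offset selects a byte within a block, the index selects one cache block, and the tag is whatever remains of the address. `dm_breakdown` and `log2_exact` are hypothetical helper names.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two, computed with integer arithmetic."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def dm_breakdown(addr_bits: int, num_blocks: int, block_bytes: int):
    """Return (tag, index, offset) bit widths for a direct mapped cache."""
    offset = log2_exact(block_bytes)   # byte within the block
    index = log2_exact(num_blocks)     # which cache block (one block per index)
    tag = addr_bits - index - offset   # remaining high-order address bits
    return tag, index, offset

# Hypothetical configurations with 16-bit addresses
print(dm_breakdown(16, 64, 8))    # (7, 6, 3)
print(dm_breakdown(16, 256, 4))   # (6, 8, 2)
```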
Block Placement in DM Cache
• Direct mapped cache: each memory block maps to exactly one cache block
– Mapping conflicts may increase the miss rate
[Figure: memory blocks 0 to 7 and a 2-way set associative cache with 4 blocks, organized as Set 0 and Set 1, each with Way 0 and Way 1]
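Block placement is commonly computed with a modulo: a memory block goes to set (block address mod number of sets), and a direct mapped cache is the special case where each set holds one block. A minimal sketch, assuming that modulo mapping, for memory blocks 0 to 7 and a 4-block cache; `cache_set` is an illustrative name.

```python
def cache_set(block_addr: int, num_blocks: int, ways: int) -> int:
    """Set (or, when ways == 1, cache block) that a memory block maps to."""
    num_sets = num_blocks // ways
    return block_addr % num_sets

for block in range(8):
    dm = cache_set(block, num_blocks=4, ways=1)   # direct mapped: 4 sets of 1 block
    sa = cache_set(block, num_blocks=4, ways=2)   # 2-way: 2 sets of 2 blocks
    print(f"memory block {block}: DM cache block {dm}, 2-way set {sa}")
```

In the 2-way cache, blocks 0, 2, 4, and 6 all compete for the two ways of Set 0, while in the direct mapped cache blocks 0 and 4 compete for a single cache block.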
Address Translation for Associative Caches
4-way Set Associative Cache
• 256 sets, 4 ways per set (1024 blocks in total)
• All 4 ways within the selected cache set are searched in parallel
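The lookup can be sketched in Python. The block size and address width are not given here, so this sketch assumes 32-bit byte addresses and 4-byte blocks (2 offset bits); 256 sets then need 8 index bits, leaving 22 tag bits. Hardware compares the four tags simultaneously; the loop only models the outcome of those four comparators. `split`, `lookup`, and `Line` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

OFFSET_BITS = 2   # assumption: 4-byte blocks
INDEX_BITS = 8    # 256 sets
WAYS = 4

@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: int = 0

# cache[set][way]: 256 sets x 4 ways = 1024 blocks
cache = [[Line() for _ in range(WAYS)] for _ in range(1 << INDEX_BITS)]

def split(addr: int) -> Tuple[int, int, int]:
    """Split an address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(addr: int) -> Tuple[bool, Optional[int]]:
    """Return (hit, way). Only the 4 ways of the selected set are examined."""
    tag, index, _ = split(addr)
    for way, line in enumerate(cache[index]):
        if line.valid and line.tag == tag:   # one tag comparator per way
            return True, way
    return False, None

# Place a block in way 2 of its set, then look it up
tag, index, _ = split(0x12345678)
cache[index][2] = Line(valid=True, tag=tag, data=42)
print(lookup(0x12345678))              # (True, 2)
print(lookup(0x12345678 ^ (1 << 20)))  # same set, different tag: (False, None)
```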
2-way Set Associative Example
• Size of each block is 4 bytes
• Cache holds 4 blocks, 2-way set associative
• Memory holds 16 blocks
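• Memory address: 6 bits (16 blocks × 4 bytes = 64 bytes)
– 3-bit tag | 1-bit index (2 sets) | 2-bit offset (4-byte blocks)
• Cache, initially empty (all valid bits 0):
Set  Way 0 (V tag data)  Way 1 (V tag data)
0    0 - -               0 - -
1    0 - -               0 - -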
2-way Set Associative Example
• Memory contents (block address: data)
– 0000: 100, 0001: 110, 0010: 120, 0011: 130, 0100: 140, 0101: 150, 0110: 160, 0111: 170
– 1000: 180, 1001: 190, 1010: 200, 1011: 210, 1100: 220, 1101: 230, 1110: 240, 1111: 250
• Access sequence (each 6-bit address = 3-bit tag | 1-bit index | 2-bit offset)
Instruction       Tag  Index  Result  Cache update                      Register
R1 <= M[000000]   000  0      miss    fill Set 0, Way 0 (tag 000, 100)  R1 = 100
R2 <= M[000100]   000  1      miss    fill Set 1, Way 0 (tag 000, 110)  R2 = 110
R3 <= M[010000]   010  0      miss    fill Set 0, Way 1 (tag 010, 140)  R3 = 140
R2 <= M[011100]   011  1      miss    fill Set 1, Way 1 (tag 011, 170)  R2 = 170
R1 <= M[000000]   000  0      hit     (no change)                       R1 = 100
R1 <= M[000100]   000  1      hit     (no change)                       R1 = 110
• Final cache state
Set  Way 0 (V tag data)  Way 1 (V tag data)
0    1 000 100           1 010 140
1    1 000 110           1 011 170
• Final registers: R1 = 110, R2 = 170, R3 = 140 (R0 is not written)
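The walkthrough can be replayed with a small simulator. This is a sketch, assuming the example's parameters (6-bit addresses, 4-byte blocks, 2 sets, 2 ways), that each memory block holds the single value shown in the memory column, and LRU replacement within a set; the sequence never evicts, so the replacement policy does not change these results. `access` and `lru_way` are illustrative names.

```python
OFFSET_BITS, INDEX_BITS, WAYS = 2, 1, 2
NUM_SETS = 1 << INDEX_BITS

# Memory column from the example: block 0000 holds 100, 0001 holds 110, ..., 1111 holds 250
memory = {block: 100 + 10 * block for block in range(16)}

# cache[set][way] is None (invalid) or a (tag, data) pair
cache = [[None] * WAYS for _ in range(NUM_SETS)]
lru_way = [0] * NUM_SETS   # way to evict next in each set (1 bit suffices for 2 ways)

def access(addr: int):
    """Load from a 6-bit address; return ("hit" or "miss", data)."""
    block = addr >> OFFSET_BITS
    index = block % NUM_SETS
    tag = block >> INDEX_BITS
    ways = cache[index]
    for way, line in enumerate(ways):
        if line is not None and line[0] == tag:
            lru_way[index] = 1 - way   # 2-way only: the other way is now LRU
            return "hit", line[1]
    # Miss: use an invalid way if there is one, otherwise evict the LRU way
    way = ways.index(None) if None in ways else lru_way[index]
    ways[way] = (tag, memory[block])
    lru_way[index] = 1 - way
    return "miss", memory[block]

program = [("R1", 0b000000), ("R2", 0b000100), ("R3", 0b010000),
           ("R2", 0b011100), ("R1", 0b000000), ("R1", 0b000100)]
regs = {}
results = []
for reg, addr in program:
    outcome, value = access(addr)
    regs[reg] = value
    results.append((outcome, value))
    print(f"{reg} <= M[{addr:06b}]: {outcome}, {reg} = {value}")
```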
Spectrum of Associativity
• A K-way set associative cache with N blocks
– Number of cache sets S = N / K
• Number of index bits = log2(S)
– When K = N, fully associative cache
• One cache set → zero index bits
– When K = 1 (one-way), direct mapped cache
• N cache sets
Spectrum of Associativity
• For a cache with 8 blocks
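Applying S = N / K and log2(S) index bits to a cache with 8 blocks gives the whole spectrum, from direct mapped (K = 1) to fully associative (K = N). A short sketch; `spectrum` is an illustrative name, and only power-of-two associativities are listed.

```python
def spectrum(num_blocks: int):
    """List (K, S, index bits) for K = 1, 2, 4, ... up to num_blocks."""
    rows = []
    k = 1
    while k <= num_blocks:
        sets = num_blocks // k                          # S = N / K
        rows.append((k, sets, sets.bit_length() - 1))   # log2(S) for a power of two
        k *= 2
    return rows

N = 8
for k, sets, index_bits in spectrum(N):
    if k == 1:
        kind = "direct mapped"
    elif k == N:
        kind = "fully associative"
    else:
        kind = f"{k}-way set associative"
    print(f"K = {k}: {sets} sets, {index_bits} index bits ({kind})")
```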
Exercise: Set Associative Cache
Address Breakdown
• Assuming 16-bit addresses, how many bits are
associated with the tag, index, and offset of the
following cache configuration?
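The configuration for this exercise was lost in extraction, so the example below plugs in hypothetical sizes (a 1 KiB cache with 16-byte blocks). The method extends the direct mapped case: N = cache size / block size, S = N / K, index bits = log2(S), offset bits = log2(block size), and the tag takes the rest of the 16 address bits. `sa_breakdown` and `log2_exact` are illustrative helpers; byte addressing and power-of-two sizes are assumed.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two, computed with integer arithmetic."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def sa_breakdown(addr_bits: int, cache_bytes: int, block_bytes: int, ways: int):
    """Return (tag, index, offset) bit widths for a K-way set associative cache."""
    num_blocks = cache_bytes // block_bytes   # N
    num_sets = num_blocks // ways             # S = N / K
    offset = log2_exact(block_bytes)
    index = log2_exact(num_sets)
    tag = addr_bits - index - offset
    return tag, index, offset

# Hypothetical 1 KiB cache, 16-byte blocks (64 blocks), 16-bit addresses
for ways in (1, 4, 64):
    print(ways, sa_breakdown(16, 1024, 16, ways))
# 1 (6, 6, 4)    direct mapped: 64 sets
# 4 (8, 4, 4)    4-way: 16 sets
# 64 (12, 0, 4)  fully associative: 1 set
```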
Miss Classification
• Compulsory (Cold) misses
– Caused by the first access to a memory block
• Capacity misses
– Occur because the cache is not big enough to hold all the memory blocks the program actively uses during execution
• Conflict misses
– Occur with a direct mapped or set-associative cache
when multiple memory blocks compete in the same
set due to the inflexibility of block placement
– Would not occur in a fully associative cache
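One common way to sort misses into these three categories is sketched below under stated assumptions: a miss is compulsory if the block has never been referenced before; otherwise it is a conflict miss if a fully associative cache of the same capacity would have hit, and a capacity miss if that cache would also have missed. LRU replacement (within each set, and in the fully associative reference cache) and modulo set mapping are assumed; `classify_misses` is an illustrative name. For the block sequence 0, 8, 0, 6, 8 on a 4-block direct mapped cache it reports 3 compulsory and 2 conflict misses.

```python
from collections import OrderedDict

def classify_misses(blocks, num_blocks, ways):
    """Label each access 'hit', 'compulsory', 'capacity', or 'conflict'."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # real cache; each set kept in LRU order
    fully = OrderedDict()                            # fully associative cache, same capacity
    seen = set()
    labels = []

    def touch(lru, block, capacity):
        """LRU access on one set; returns True on a hit."""
        if block in lru:
            lru.move_to_end(block)       # now most recently used
            return True
        if len(lru) == capacity:
            lru.popitem(last=False)      # evict least recently used
        lru[block] = True
        return False

    for b in blocks:
        real_hit = touch(sets[b % num_sets], b, ways)
        fully_hit = touch(fully, b, num_blocks)
        if real_hit:
            labels.append("hit")
        elif b not in seen:
            labels.append("compulsory")
        elif fully_hit:
            labels.append("conflict")   # only the restricted placement caused this miss
        else:
            labels.append("capacity")   # even a fully associative cache missed
        seen.add(b)
    return labels

print(classify_misses([0, 8, 0, 6, 8], num_blocks=4, ways=1))
# ['compulsory', 'compulsory', 'conflict', 'compulsory', 'conflict']
```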
Misses vs. Associativity Example
• Compare different caches
– Capacity: 4 blocks
– Direct mapped, 2-way set associative, fully associative
– Block address sequence: 0, 8, 0, 6, 8 (in decimal)
• Direct mapped
Block    Cache  Hit/miss  Cache contents after access
address  index            Block 0  Block 1  Block 2  Block 3
0        0      miss      Mem[0]
8        0      miss      Mem[8]
0        0      miss      Mem[0]
6        2      miss      Mem[0]            Mem[6]
8        0      miss      Mem[8]            Mem[6]
Misses vs. Associativity Example
• 2-way set associative
Block    Cache  Hit/miss  Cache contents after access
address  index            Set 0            Set 1
0        0      miss      Mem[0]
8        0      miss      Mem[0]  Mem[8]
0        0      hit       Mem[0]  Mem[8]
6        0      miss      Mem[0]  Mem[6]
8        0      miss      Mem[8]  Mem[6]
• Fully associative
Block    Hit/miss  Cache contents after access
address
0        miss      Mem[0]
8        miss      Mem[0]  Mem[8]
0        hit       Mem[0]  Mem[8]
6        miss      Mem[0]  Mem[8]  Mem[6]
8        hit       Mem[0]  Mem[8]  Mem[6]
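The three tables can be reproduced with a small LRU simulator, sketched below. It assumes LRU replacement and modulo set mapping, and it lists each set's contents in LRU order (least recently used first) rather than by physical way, so the order within a set can differ from the tables even when the contents match. `simulate` is an illustrative name.

```python
def simulate(blocks, num_blocks, ways):
    """Return a trace of (block, set index, 'hit'/'miss', contents of every set)."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each set: block addresses, LRU first
    trace = []
    for b in blocks:
        index = b % num_sets
        s = sets[index]
        if b in s:
            outcome = "hit"
            s.remove(b)
        else:
            outcome = "miss"
            if len(s) == ways:
                s.pop(0)        # evict the least recently used block
        s.append(b)             # b becomes most recently used
        trace.append((b, index, outcome, [list(x) for x in sets]))
    return trace

sequence = [0, 8, 0, 6, 8]
for name, ways in [("Direct mapped", 1), ("2-way set associative", 2), ("Fully associative", 4)]:
    trace = simulate(sequence, num_blocks=4, ways=ways)
    misses = sum(1 for _, _, outcome, _ in trace if outcome == "miss")
    print(f"{name}: {misses} misses")
# Direct mapped: 5 misses
# 2-way set associative: 4 misses
# Fully associative: 3 misses
```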
• Random replacement
– Performs similarly to LRU at high associativity
LRU Replacement Example
• Fully associative cache with 4 blocks; (X) = LRU age
– 0 = most recently used, 3 = least recently used (replaced on a miss when the cache is full); 2 bits per block in this case
Block    Hit/miss  Cache contents after access
address            Block 0     Block 1     Block 2     Block 3
0        miss      Mem[0] (0)
4        miss      Mem[0] (1)  Mem[4] (0)
2        miss      Mem[0] (2)  Mem[4] (1)  Mem[2] (0)
6        miss      Mem[0] (3)  Mem[4] (2)  Mem[2] (1)  Mem[6] (0)
8        miss      Mem[8] (0)  Mem[4] (3)  Mem[2] (2)  Mem[6] (1)
0        miss      Mem[8] (1)  Mem[0] (0)  Mem[2] (3)  Mem[6] (2)
4        miss      Mem[8] (2)  Mem[0] (1)  Mem[4] (0)  Mem[6] (3)
2        miss      Mem[8] (3)  Mem[0] (2)  Mem[4] (1)  Mem[2] (0)
6        miss      Mem[6] (0)  Mem[0] (3)  Mem[4] (2)  Mem[2] (1)
8        miss      Mem[6] (1)  Mem[8] (0)  Mem[4] (3)  Mem[2] (2)
2        hit       Mem[6] (2)  Mem[8] (1)  Mem[4] (3)  Mem[2] (0)
6        hit       Mem[6] (0)  Mem[8] (2)  Mem[4] (3)  Mem[2] (1)
2        hit       Mem[6] (1)  Mem[8] (2)  Mem[4] (3)  Mem[2] (0)
0        miss      Mem[6] (2)  Mem[8] (3)  Mem[0] (0)  Mem[2] (1)
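The age counters in this example can be reproduced with the sketch below. It assumes the update rule the table follows: on a miss every resident block ages by one and the new block gets age 0, replacing the oldest block once all 4 slots are full; on a hit the block's age resets to 0 and only blocks younger than it age by one. `lru_age_sim` is an illustrative name.

```python
def lru_age_sim(blocks, num_slots=4):
    """Fully associative cache with a per-block LRU age counter.

    Returns a list of (block, 'hit'/'miss', [(block, age), ...]) with slots in
    physical order, like the table. Age 0 = most recently used.
    """
    slots = []   # each entry is [block, age]
    trace = []
    for b in blocks:
        hit = next((s for s in slots if s[0] == b), None)
        if hit is not None:
            old_age = hit[1]
            for s in slots:
                if s[1] < old_age:
                    s[1] += 1          # blocks used more recently than b get one step older
            hit[1] = 0
            outcome = "hit"
        else:
            for s in slots:
                s[1] += 1              # every resident block gets one step older
            if len(slots) < num_slots:
                slots.append([b, 0])   # fill an empty slot
            else:
                # The oldest block (age num_slots after aging) is the LRU victim;
                # overwriting it keeps every stored age within 0..num_slots-1.
                victim = max(slots, key=lambda slot: slot[1])
                victim[0], victim[1] = b, 0
            outcome = "miss"
        trace.append((b, outcome, [tuple(s) for s in slots]))
    return trace

sequence = [0, 4, 2, 6, 8, 0, 4, 2, 6, 8, 2, 6, 2, 0]
for b, outcome, contents in lru_age_sim(sequence):
    print(b, outcome, " ".join(f"Mem[{blk}] ({age})" for blk, age in contents))
```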
Next Time
More Caches
Measuring Performance