#1 Multilevel Caches
The wide gap between CPU and main memory (MM) access times might lead to a catch-22:
o Fast cache: small size, hence a low number of hits.
o Large cache: slow access time, hence CPU stalls.
Can we have the best of both? Consider:
o Level 1 cache: matches CPU cycle time, but low total memory capacity.
o Level 2 cache: buffered from the CPU cycle time, but high hit ratio.
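The trade-off above can be made concrete with the standard two-level average memory access time (AMAT) formula from Hennessy and Patterson. The cycle counts below are illustrative assumptions, not measured values:

```python
# AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + MissRate_L2 * MissPenalty_L2)

def amat_two_level(hit_l1, miss_l1, hit_l2, miss_l2, penalty_l2):
    """Average memory access time (clock cycles) for an L1/L2 hierarchy."""
    return hit_l1 + miss_l1 * (hit_l2 + miss_l2 * penalty_l2)

# Assumed figures: 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit,
# 50% L2 local miss rate, 100-cycle penalty to main memory.
print(amat_two_level(hit_l1=1, miss_l1=0.05, hit_l2=10, miss_l2=0.5, penalty_l2=100))
```

With these assumed numbers the hierarchy averages 4 cycles per access, even though a miss all the way to MM costs over 100 cycles: the small fast L1 handles the common case, and L2 catches most of the remainder.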
CSCI3121
Hennessy and Patterson, Computer Architecture a Quantitative Approach (3rd Ed) Chapter 5 Memory Hierarchy Design.
Global (cache) Miss Rate =
    (Number of misses up to the specified cache level) /
    (Total number of CPU memory references generated)
With respect to L1 the global miss rate is therefore:
o Miss Rate (L1)
With respect to L2 the global miss rate is therefore:
o Miss Rate (L1) x Miss Rate (L2)
Note:
o The L2 local miss rate will be poor. Why? L2 only sees the references that already missed in L1, so the easy hits have been filtered out before they reach it.
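The distinction between local and global miss rates can be illustrated with a small numeric sketch; all the counts here are hypothetical:

```python
# Local miss rate  = misses at this level / accesses reaching this level.
# Global miss rate = misses at this level / all CPU memory references.

cpu_refs  = 1000   # total CPU memory references (assumed)
l1_misses = 40     # misses in L1 (assumed)
l2_misses = 20     # misses in L2, out of the 40 references it sees (assumed)

local_l1  = l1_misses / cpu_refs    # L1 sees every reference: 0.04
local_l2  = l2_misses / l1_misses   # L2 sees only L1 misses:  0.50
global_l2 = l2_misses / cpu_refs    # = local_l1 * local_l2:   0.02

print(local_l1, local_l2, global_l2)
```

The L2 local miss rate (50%) looks terrible, yet the global rate (2%) shows the hierarchy is working well; this is why the global measure is the fair one for lower cache levels.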
L2 cache policies
What size should L2 be?
(a) Multilevel inclusion: data in L1 is always present in L2.
Implications:
o L2 size >> L1 size;
o Consistency between memory contents is ensured;
L1/L2 block size policy?
o Case of L1 block < L2 block: an L2 miss forces a flush of the L1 blocks mapped to the L2 block being replaced.
(b) Multilevel exclusion: data in L1 is never in L2.
o An L1 miss results in a swap with the L2 block that hits.
Summary:
o L1 is designed to minimize hit time;
o L2 is designed to maximize hits.
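The two policies can be sketched as invariants over sets of block addresses; all cache contents below are hypothetical, and the L1 replacement choice is deliberately elided:

```python
l1 = {0x10, 0x20, 0x30}

# (a) Multilevel inclusion: everything in L1 must also be in L2.
l2_inclusive = {0x10, 0x20, 0x30, 0x40, 0x50}
assert l1 <= l2_inclusive             # inclusion invariant holds

# (b) Multilevel exclusion: L1 and L2 never hold the same block.
l2_exclusive = {0x40, 0x50}
assert l1.isdisjoint(l2_exclusive)

def exclusive_swap(l1, l2, addr):
    """On an L1 miss that hits in L2, swap rather than copy the block."""
    victim = l1.pop()                 # evict an arbitrary L1 block (policy elided)
    l1.add(addr)
    l2.discard(addr)
    l2.add(victim)                    # victim moves down, preserving exclusion

exclusive_swap(l1, l2_exclusive, 0x40)
assert l1.isdisjoint(l2_exclusive)    # invariant still holds after the swap
```

The swap is what lets an exclusive hierarchy use the combined capacity of both levels, at the cost of extra traffic between them.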
Critical Word First
o Benefit: the CPU commences execution whilst the remainder of the block is stored in the cache.
o Drawback: a complex memory interface, which may actually delay access time.
Early Restart
o Interface to MM: fetch the words of the block in the order they are stored.
o Method: forward the word causing the miss to the CPU as soon as it is encountered.
o Advantage: simple memory interface.
Note: typically only of significance when the block size increases.
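The difference between the two schemes is visible in how long the CPU waits for the missing word. A minimal sketch, assuming an 8-word block delivered one word per cycle (both figures are assumptions), where the miss fell on word k of the block:

```python
BLOCK_WORDS = 8
CYCLES_PER_WORD = 1

def wait_simple(k):
    """Naive fetch: the CPU waits for the whole block regardless of k."""
    return BLOCK_WORDS * CYCLES_PER_WORD

def wait_early_restart(k):
    """Early restart: words arrive in stored order; CPU resumes at word k."""
    return (k + 1) * CYCLES_PER_WORD

def wait_critical_word_first(k):
    """Critical word first: memory returns word k immediately."""
    return 1 * CYCLES_PER_WORD

for k in (0, 7):
    print(k, wait_simple(k), wait_early_restart(k), wait_critical_word_first(k))
```

When the miss is on the last word of the block (k = 7), early restart saves nothing over the naive fetch, while critical word first still delivers in one word time; this is why the techniques only matter as block size grows.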
#4 Victim Cache
Write-back policy:
o At some point a dirty block needs to be replaced.
o Let such a block be a victim.
o A small (1 to 5 blocks) victim cache is inserted between the cache and the next memory level.
o IF (a hit is made on the victim cache), THEN (swap the block with an alternative cache block).
See Figure 1.
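A minimal victim-cache sketch, under assumed sizes and block addresses: evicted blocks land in a small FIFO buffer, and a later miss that hits there swaps the block back instead of paying the cost of the next memory level:

```python
from collections import deque

VICTIM_SLOTS = 4
victim = deque(maxlen=VICTIM_SLOTS)   # oldest victim falls off the far end

def evict(block):
    """Main cache discards a block; the victim cache catches it."""
    victim.append(block)

def lookup_on_miss(addr):
    """On a main-cache miss, check the victim cache before memory."""
    if addr in victim:
        victim.remove(addr)           # swap back into the main cache
        return True                   # recovered without a memory access
    return False                      # must fetch from the next level

evict(0xA0)
evict(0xB0)
print(lookup_on_miss(0xA0))   # True  - recently discarded, still recoverable
print(lookup_on_miss(0xC0))   # False - never evicted, go to the next level
```

This matches the "waste basket" picture in Figure 1: only the most recently discarded blocks are retrievable, which is why a handful of slots suffices.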
Figure 1: Relationship between victim cache and other forms of memory. Victim cache acts as a waste basket from which most recently discarded items can be retrieved.
Summary
Four basic policies are utilized in cache miss penalty reduction:
o More the merrier: multi-level caches.
o Impatience: forward the required word to the CPU before the entire block is read.
o Preference: stall writes, but prioritize reads.
o Recycling: once the high cost of a block transfer has been paid, attempt to retain the block for as long as possible.
Reducing cache miss penalty and rates Overlapping execution with memory accesses
#1 Non-blocking caches: stall reduction on cache misses
Out-of-order execution, and compiler optimizations that schedule loads away from operand use, imply that the CPU should not have to stall on a cache miss. Non-blocking (lockup-free) caches support hits on instructions issued after a cache miss ("hit under miss"). Cache controller complexity increases:
o It is required to track multiple outstanding misses and hits.
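The bookkeeping the controller needs can be sketched with a simplified miss-tracking table (analogous to MSHRs, miss status holding registers); the structures and addresses below are assumptions for illustration:

```python
cache = {0x100: "data-A"}   # resident blocks
outstanding = {}            # block addr -> list of waiting request ids

def access(addr, req_id):
    if addr in cache:
        return ("hit", cache[addr])         # hit under miss: served immediately
    # Miss: merge with an existing outstanding miss to the same block,
    # otherwise allocate a new entry and (conceptually) start the fetch.
    outstanding.setdefault(addr, []).append(req_id)
    return ("miss-pending", None)

def fill(addr, data):
    """Memory returns the block; wake every merged request at once."""
    cache[addr] = data
    return outstanding.pop(addr, [])

print(access(0x200, "r1"))    # miss: entry allocated, fetch begins
print(access(0x100, "r2"))    # hit under the outstanding miss
print(access(0x200, "r3"))    # second miss to the same block: merged
print(fill(0x200, "data-B"))  # both r1 and r3 woken together
```

Merging misses to the same block is what keeps the CPU running while MM is accessed, and it is exactly this tracking that makes the controller more complex than a blocking design.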
Figure 2 indicates that, o FP programs benefit from supporting higher depths of hit under miss. o INT programs gain most from a single hit under miss.
Figure 2: SPEC 92 benchmark of memory stall time under different levels of hit under miss provision.
Example
An UltraSPARC III has a 64 KB data cache in which prefetching reduces the data miss rate by 20%. A prefetch hit takes one clock cycle, whereas a miss on both the cache and the prefetch buffer costs 15 clock cycles. Data references per instruction are 22%, and Table 1 details misses per instruction for different cache sizes. What is the effective miss rate of the UltraSPARC III using prefetching? How much bigger a data cache would the UltraSPARC III need to match the average access time if prefetching were not available?
Table 1: Misses per 1000 instructions on instruction, data and unified caches

Size               8 KB    16 KB   32 KB   64 KB   128 KB  256 KB
Instruction Cache  8.16    3.82    1.36    0.61    0.30    0.02
Data Cache         44.0    40.9    38.4    36.9    35.3    32.6
Unified Cache      63.0    51.0    43.3    39.4    36.2    32.9
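One hedged reading of the example, worked through with the data-cache row of Table 1; the interpretation that "reduces the data miss rate by 20%" means the effective miss count is 80% of the 64 KB figure is an assumption:

```python
# Data-cache misses per 1000 instructions, from Table 1.
misses_per_1k = {8: 44.0, 16: 40.9, 32: 38.4, 64: 36.9, 128: 35.3, 256: 32.6}
data_refs_per_instr = 0.22          # 220 data references per 1000 instructions

base = misses_per_1k[64] / (1000 * data_refs_per_instr)  # miss rate per data reference
effective = base * (1 - 0.20)                            # with prefetching

print(f"base miss rate:      {base:.3f}")       # roughly 0.168
print(f"effective miss rate: {effective:.3f}")  # roughly 0.134

# To match this without prefetching we would need at most
# 0.8 * 36.9 = 29.52 misses per 1000 instructions; even the 256 KB
# cache (32.6) falls short, so the data cache would have to grow
# beyond 256 KB, i.e. more than 4x the 64 KB size.
target = misses_per_1k[64] * 0.8
print(target < min(misses_per_1k.values()))
```

The striking part of the answer is the second question: no cache size in the table matches prefetching, which is the point of the example.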
Summary
Out-of-order processors require non-blocking cache operation in order to avoid stalling the CPU.
o Several misses are combined such that CPU operation continues whilst MM is accessed.
Prefetching attempts to anticipate cache fetch requirements, but
o It is not good for power-sensitive embedded applications.