5.3 Cache Memory

Cache Memory

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.

Caches are generally the top level of the memory hierarchy and are almost always constructed out of SRAM. The main structural difference between a cache and the other levels of the memory hierarchy is that caches contain hardware to track the memory addresses that are contained in the cache and to move data into and out of the cache as necessary. [1]

Locality Principle

Most memory references in an executing program are made to a small number of locations.

Temporal locality: when a program references a memory location, it is likely to reference that same memory location again soon. [2]

A small but fast cache memory, in which the contents of the most commonly accessed locations are maintained, can be placed between the main memory and the CPU. When the program executes, the cache memory is searched first, and the referenced word is accessed in the cache if the word is present. If the referenced word is not in the cache, then a free location is created in the cache and the referenced word is brought into the cache from the main memory. Although this process takes longer than accessing main memory directly, the overall performance can be improved if a high proportion of memory accesses are satisfied by the cache, as is normally the case.

In principle, a large cache memory is desirable so that a large number of memory references can be satisfied by the cache. Unfortunately, a large cache is slower and more expensive than a small cache. A compromise that works well in practice is to first make a cache memory that is close to the CPU as fast as it can be within a given size constraint, and then, to compensate for it not being large enough, add another cache that is closer to the main memory. Modern memory systems may have several levels of cache, referred to as Level 1 (L1), Level 2 (L2) and even, in some cases, Level 3 (L3), with the L1 cache being closest to the CPU and organized for speed, and the L3 cache being the farthest from the CPU (and closest to the main memory) and organized primarily to satisfy memory references missed by the L1 and L2 caches.

A cache memory is faster than main memory for a number of reasons. Faster (and more expensive) electronics can be used; because the cache is small, this increase in cost is relatively small. A cache memory also has fewer locations than a main memory, which reduces the access time. Finally, the cache is placed both physically closer and logically closer to the CPU than the main memory, and this placement avoids communication delays.

Placement of Cache Memory in a Computer

[Figure 1: a simple computer without a cache (left) and the same computer with a cache on a backside bus (right)]

A simple computer without a cache is shown on the left side of Figure 1. This cacheless computer contains a CPU that has a clock speed of 3.8 GHz, but communicates over a slower 800 MHz system bus to a main memory that supports a lower clock speed of 533 MHz. The system bus is commonly referred to as the frontside bus.

A few bus cycles are normally needed to synchronize the CPU with the bus, and thus the difference in speed between main memory and the CPU can be as large as a factor of ten or more. A cache memory can be positioned closer to the CPU, as shown on the right side of Figure 1, so that the CPU sees fast accesses over a 3.8 GHz direct path (the backside bus) to the cache.
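The claim that the average latency stays close to the cache latency once most references hit can be illustrated with a few lines of code. The following is a minimal sketch, not taken from the text: the latencies, the cache size, the fully associative organization, and the FIFO replacement policy are all assumptions chosen only to demonstrate the effect of temporal locality.

```python
T_CACHE = 5     # assumed cache access time in ns (illustrative only)
T_MAIN = 100    # assumed main memory access time in ns (illustrative only)
CACHE_SIZE = 8  # assumed number of cached words (illustrative only)

def simulate(addresses):
    """Tiny fully associative cache with FIFO replacement."""
    cache = []
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1                 # referenced word found in the cache
        else:
            misses += 1               # miss: word fetched from main memory
            if len(cache) == CACHE_SIZE:
                cache.pop(0)          # evict the oldest cached word
            cache.append(addr)
    hit_rate = hits / (hits + misses)
    avg_time = T_CACHE * hit_rate + T_MAIN * (1 - hit_rate)
    return hit_rate, avg_time

# Temporal locality: the same four words are referenced over and over,
# so after four compulsory misses every reference is a hit.
trace = [0x100, 0x104, 0x108, 0x10C] * 100
hit_rate, avg_time = simulate(trace)
print(f"hit rate {hit_rate:.1%}, average access time {avg_time:.2f} ns")
# Prints a 99.0% hit rate and an average of 5.95 ns -- close to T_CACHE.
```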
Cache Mapping

The choice of cache mapping scheme affects cost and performance, and there is no single best method that is appropriate for all situations. Two ways of mapping a cache are associative mapping and direct mapping; for this subject, only direct mapping will be discussed.

Direct Mapping

[Figure 2: a direct mapping scheme for a 2^32 word memory]

Figure 2 shows a direct mapping scheme for a 2^32 word memory. As before, the memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots.

There are more main memory blocks than there are cache slots, and a total of 2^27 / 2^14 = 2^13 main memory blocks can be mapped onto each cache slot. In order to keep track of which of the 2^13 possible blocks is in each slot, a 13-bit tag field is added to each slot, which holds an identifier in the range from 0 to 2^13 - 1.

This scheme is called direct mapping because each cache slot corresponds to an explicit set of main memory blocks. For a direct-mapped cache, each main memory block can be mapped to only one slot, but each slot can receive more than one block.

The mapping from main memory blocks to cache slots is performed by partitioning an address into fields for the tag, the slot, and the word. The 32-bit main memory address is partitioned into a 13-bit tag field, a 14-bit slot field and a 5-bit word field:

    Referenced address:  | Tag (13 bits) | Slot (14 bits) | Word (5 bits) |

When a reference is made to a main memory address, the slot field identifies in which of the 2^14 slots the block will be found, if it is in the cache.

Case I: If the valid bit is 1, the tag field of the referenced address is compared with the tag field of the slot. If the tag fields are the same, then the word selected by the word field is taken from the slot (a hit).

Case II: If the valid bit is 1 and the tag fields are not the same, then the slot is written back to main memory if the dirty bit is set, and the corresponding main memory block is then read into the slot.

Case III: For a program that has just started execution, the valid bit will be 0, so the block (data from main memory) is simply written to the slot (cache memory). The valid bit is then set to 1, and the program resumes execution.

Example 1

Consider how an access to memory location A035F014_16 is mapped to the cache. The bit pattern is partitioned according to the address format above.

Solution:

    A035F014_16 = 1010 0000 0011 0101 1111 0000 0001 0100

    Tag  (13 bits): 1010000000110  = 1406_16
    Slot (14 bits): 10111110000000 = 2F80_16
    Word (5 bits):  10100          = 14_16

If the addressed word is in the cache, it will be found in word 14_16 of slot 2F80_16, which will have a tag of 1406_16.

Cache Performance

Improved run-time performance is the purpose of using a cache memory. Figure 3 shows the cache read and write policies.

[Figure 3: cache read and write policies]

The policies depend upon whether or not the requested word is in the cache. When the address that an operation references is found in a level, a hit is said to have occurred in that level; otherwise a miss is said to have occurred. [1]

If a cache read operation is taking place, and the referenced word is in the cache, then there is a cache hit and the referenced data is immediately forwarded to the CPU. When a cache miss occurs, the entire line that contains the referenced word is read into the cache.

Load-through: the word that causes the miss is forwarded to the CPU as soon as it is read into the cache, rather than waiting for the remainder of the cache slot to be filled.

A write operation involves two different copies of the word: one in the cache, and one in main memory.

Write-through: both copies are updated simultaneously.

Write-back: the write is deferred until the cache line is flushed from the cache.
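Returning to Example 1: the field extraction can be checked mechanically. The short Python sketch below applies the 13/14/5-bit partitioning described above; the function name is my own invention, but the field widths and the expected values come straight from the example.

```python
def partition_address(addr):
    """Split a 32-bit address into (tag, slot, word) fields using the
    direct-mapped format above: 13-bit tag | 14-bit slot | 5-bit word."""
    word = addr & 0x1F             # low 5 bits: word within the block
    slot = (addr >> 5) & 0x3FFF    # next 14 bits: cache slot number
    tag = (addr >> 19) & 0x1FFF    # top 13 bits: stored in the slot's tag
    return tag, slot, word

tag, slot, word = partition_address(0xA035F014)
print(f"tag={tag:04X} slot={slot:04X} word={word:02X}")
# Prints tag=1406 slot=2F80 word=14, matching Example 1.
```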
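The three lookup cases can likewise be written out as a small model. The following is a toy sketch under stated assumptions, not an implementation from the text: the class and method names are my own, the slot count is scaled down from 2^14 to keep the model cheap, and main memory is just a Python list. It implements the valid-bit test, the tag comparison, and the dirty-bit write-back of Cases I to III, together with a write-back, write-allocate write path.

```python
class DirectMappedCache:
    """Toy model of a direct-mapped cache (Cases I-III above).

    Each slot holds a valid bit, a dirty bit, a tag, and a block of
    32 words. NUM_SLOTS is scaled down from the text's 2^14 slots;
    the control logic is unchanged.
    """
    NUM_SLOTS = 16    # assumption: small model (the text uses 2^14)
    BLOCK_WORDS = 32  # 2^5 words per block, as in the text

    def __init__(self, main_memory):
        self.memory = main_memory  # list of words standing in for DRAM
        self.slots = [{"valid": False, "dirty": False, "tag": 0,
                       "block": [0] * self.BLOCK_WORDS}
                      for _ in range(self.NUM_SLOTS)]

    def read(self, addr):
        word = addr % self.BLOCK_WORDS
        slot_idx = (addr // self.BLOCK_WORDS) % self.NUM_SLOTS
        tag = addr // (self.BLOCK_WORDS * self.NUM_SLOTS)
        slot = self.slots[slot_idx]

        if slot["valid"] and slot["tag"] == tag:
            return slot["block"][word]        # Case I: hit

        if slot["valid"] and slot["dirty"]:   # Case II: evicted block is
            old_base = ((slot["tag"] * self.NUM_SLOTS + slot_idx)
                        * self.BLOCK_WORDS)   # dirty, so write it back
            self.memory[old_base:old_base + self.BLOCK_WORDS] = slot["block"]

        # Cases II and III: read the referenced block into the slot,
        # set the valid bit, and clear the dirty bit.
        base = (addr // self.BLOCK_WORDS) * self.BLOCK_WORDS
        slot["block"] = self.memory[base:base + self.BLOCK_WORDS]
        slot.update(valid=True, dirty=False, tag=tag)
        return slot["block"][word]

    def write(self, addr, value):
        """Write-back with write-allocate: load the block if needed,
        update only the cached copy, and mark the slot dirty."""
        self.read(addr)  # reuse the miss handling of Cases II and III
        slot = self.slots[(addr // self.BLOCK_WORDS) % self.NUM_SLOTS]
        slot["block"][addr % self.BLOCK_WORDS] = value
        slot["dirty"] = True

mem = list(range(2048))       # toy main memory of 2048 words
cache = DirectMappedCache(mem)
print(cache.read(1234))       # Case III miss, then the word: 1234
cache.write(1234, 99)         # cached copy updated, slot marked dirty
print(mem[1234])              # still 1234: write-back defers the update
```

Note how the dirty bit defers the main memory update until the slot is evicted; a write-through version would update main memory on every write and need no dirty bit, which is exactly the distinction drawn above and picked up again below.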
The use of a dirty bit is only needed for a write-back policy. Even if the data item is not in the cache when the write occurs, there is the choice of bringing the block containing the word into the cache and then updating it, known as write-allocate, or updating it in main memory without involving the cache, known as write-no-allocate.

Question 1

If a level of the memory hierarchy has a hit rate of 75%, memory requests take 12 ns to complete if they hit in the level, and memory requests that miss in the level take 100 ns to complete, what is the average access time of the level?

Solution:

Average access time = (T_hit x P_hit) + (T_miss x P_miss)
                    = (12 ns x 0.75) + (100 ns x 0.25)
                    = 34 ns

Question 2

A memory system contains a cache, a main memory, and a virtual memory. The access time of the cache is 5 ns, and it has an 80% hit rate. The access time of the main memory is 100 ns, and it has a 99.5% hit rate. The access time of the virtual memory is 10 ms. What is the average access time of the hierarchy?

Solution: To solve the problem, start from the bottom of the hierarchy and work up. Since the hit rate of the virtual memory (the bottom level) is 100%, its access time of 10 ms is the miss penalty for main memory.

Average access time of main memory = (100 ns x 0.995) + (10 ms x 0.005) = 50099.5 ns

Average access time of the cache, and thus of the hierarchy = (5 ns x 0.80) + (50099.5 ns x 0.20) = 10023.9 ns ≈ 10024 ns

References

1. Nicholas Carter, Computer Architecture, McGraw-Hill, 2002.
2. Miles Murdocca and Vincent Heuring, Computer Architecture and Organization: An Integrated Approach, John Wiley and Sons, 2007.