5.3 Cache Memory

Cache Memory

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.

Caches are generally the top level of the memory hierarchy and are almost always constructed out of SRAM. The main structural difference between a cache and the other levels of the memory hierarchy is that caches contain hardware to track the memory addresses that are contained in the cache and to move data into and out of the cache as necessary. [1]

Locality Principle

Most memory references in an executing program are made to a small number of locations.

Temporal locality: when a program references a memory location, it is likely to reference that same memory location again soon. [2]

A small but fast cache memory, in which the contents of the most commonly accessed locations are maintained, can be placed between the main memory and the CPU. When the program executes, the cache memory is searched first, and the referenced word is accessed in the cache if the word is present. If the referenced word is not in the cache, then a free location is created in the cache and the referenced word is brought into the cache from the main memory. Although this process takes longer than accessing main memory directly, the overall performance can be improved if a high proportion of memory accesses are satisfied by the cache, as is normally the case.

In principle, a large cache memory is desirable so that a large number of memory references can be satisfied by the cache. Unfortunately, a large cache is slower and more expensive than a small cache. A compromise that works well in practice is to first make a cache memory that is close to the CPU as fast as it can be within a given size constraint, and then, to compensate for it not being large enough, add another cache that is closer to the main memory. Modern memory systems may have several levels of cache, referred to as Level 1 (L1), Level 2 (L2) and even, in some cases, Level 3 (L3), with the L1 cache being closest to the CPU and organized for speed, and the L3 cache being the farthest from the CPU (and closest to the main memory) and organized primarily to satisfy memory references missed by the L1 and L2 caches.

A cache memory is faster than main memory for a number of reasons. Faster (and more expensive) electronics can be used; because the cache is small, this increase in cost is relatively small. A cache memory also has fewer locations than a main memory, which reduces the access time. Finally, the cache is placed both physically closer and logically closer to the CPU than the main memory, and this placement avoids communication delays.

Placement of Cache Memory in a Computer

[Figure 1: a simple computer without a cache (left) and the same computer with a cache on a backside bus (right)]

A simple computer without a cache is shown on the left side of Figure 1. This cacheless computer contains a CPU that has a clock speed of 3.8 GHz, but communicates over a slower 800 MHz system bus to a main memory that supports a lower clock speed of 533 MHz. The system bus is commonly referred to as the frontside bus.

A few bus cycles are normally needed to synchronize the CPU with the bus, and thus the difference in speed between main memory and the CPU can be as large as a factor of ten or more. A cache memory can be positioned closer to the CPU, as shown on the right side of Figure 1, so that the CPU sees fast accesses over a 3.8 GHz direct path (the backside bus) to the cache.
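The claim that the average latency stays close to the cache latency once most references hit can be illustrated with a few lines of code. The following is a minimal sketch, not taken from the text: the latencies, the cache size, the fully associative organization, and the FIFO replacement policy are all assumptions chosen only to demonstrate the effect of temporal locality.

```python
T_CACHE = 5     # assumed cache access time in ns (illustrative only)
T_MAIN = 100    # assumed main memory access time in ns (illustrative only)
CACHE_SIZE = 8  # assumed number of cached words (illustrative only)

def simulate(addresses):
    """Tiny fully associative cache with FIFO replacement."""
    cache = []
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1                 # referenced word found in the cache
        else:
            misses += 1               # miss: word fetched from main memory
            if len(cache) == CACHE_SIZE:
                cache.pop(0)          # evict the oldest cached word
            cache.append(addr)
    hit_rate = hits / (hits + misses)
    avg_time = T_CACHE * hit_rate + T_MAIN * (1 - hit_rate)
    return hit_rate, avg_time

# Temporal locality: the same four words are referenced over and over,
# so after four compulsory misses every reference is a hit.
trace = [0x100, 0x104, 0x108, 0x10C] * 100
hit_rate, avg_time = simulate(trace)
print(f"hit rate {hit_rate:.1%}, average access time {avg_time:.2f} ns")
# Prints a 99.0% hit rate and an average of 5.95 ns -- close to T_CACHE.
```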
Cache Mapping

The choice of cache mapping scheme affects cost and performance, and there is no single best method that is appropriate for all situations. Two ways of mapping a cache are associative mapping and direct mapping; for this subject, only direct mapping will be discussed.

Direct Mapping

[Figure 2: a direct mapping scheme for a 2^32 word memory]

Figure 2 shows a direct mapping scheme for a 2^32 word memory. As before, the memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots.

There are more main memory blocks than there are cache slots, and a total of 2^27 / 2^14 = 2^13 main memory blocks can be mapped onto each cache slot. In order to keep track of which of the 2^13 possible blocks is in each slot, a 13-bit tag field is added to each slot, which holds an identifier in the range from 0 to 2^13 - 1.

This scheme is called direct mapping because each cache slot corresponds to an explicit set of main memory blocks. For a direct-mapped cache, each main memory block can be mapped to only one slot, but each slot can receive more than one block.

The mapping from main memory blocks to cache slots is performed by partitioning an address into fields for the tag, the slot, and the word. The 32-bit main memory address is partitioned into a 13-bit tag field, a 14-bit slot field and a 5-bit word field:

    Referenced address:  | Tag (13 bits) | Slot (14 bits) | Word (5 bits) |

When a reference is made to a main memory address, the slot field identifies in which of the 2^14 slots the block will be found, if it is in the cache.

Case I: If the valid bit is 1, the tag field of the referenced address is compared with the tag field of the slot. If the tag fields are the same, then the word selected by the word field is taken from the slot (a hit).

Case II: If the valid bit is 1 and the tag fields are not the same, then the slot is written back to main memory if the dirty bit is set, and the corresponding main memory block is then read into the slot.

Case III: For a program that has just started execution, the valid bit will be 0, so the block (data from main memory) is simply written to the slot (cache memory). The valid bit is then set to 1, and the program resumes execution.

Example 1

Consider how an access to memory location A035F014_16 is mapped to the cache. The bit pattern is partitioned according to the address format above.

Solution:

    A035F014_16 = 1010 0000 0011 0101 1111 0000 0001 0100

    Tag  (13 bits): 1010000000110  = 1406_16
    Slot (14 bits): 10111110000000 = 2F80_16
    Word (5 bits):  10100          = 14_16

If the addressed word is in the cache, it will be found in word 14_16 of slot 2F80_16, which will have a tag of 1406_16.

Cache Performance

Improved run-time performance is the purpose of using a cache memory. Figure 3 shows the cache read and write policies.

[Figure 3: cache read and write policies]

The policies depend upon whether or not the requested word is in the cache. When the address that an operation references is found in a level, a hit is said to have occurred in that level; otherwise a miss is said to have occurred. [1]

If a cache read operation is taking place, and the referenced word is in the cache, then there is a cache hit and the referenced data is immediately forwarded to the CPU. When a cache miss occurs, the entire line that contains the referenced word is read into the cache.

Load-through: the word that causes the miss is forwarded to the CPU as soon as it is read into the cache, rather than waiting for the remainder of the cache slot to be filled.

A write operation involves two different copies of the word: one in the cache, and one in main memory.

Write-through: both copies are updated simultaneously.

Write-back: the write is deferred until the cache line is flushed from the cache.
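Returning to Example 1: the field extraction can be checked mechanically. The short Python sketch below applies the 13/14/5-bit partitioning described above; the function name is my own invention, but the field widths and the expected values come straight from the example.

```python
def partition_address(addr):
    """Split a 32-bit address into (tag, slot, word) fields using the
    direct-mapped format above: 13-bit tag | 14-bit slot | 5-bit word."""
    word = addr & 0x1F             # low 5 bits: word within the block
    slot = (addr >> 5) & 0x3FFF    # next 14 bits: cache slot number
    tag = (addr >> 19) & 0x1FFF    # top 13 bits: stored in the slot's tag
    return tag, slot, word

tag, slot, word = partition_address(0xA035F014)
print(f"tag={tag:04X} slot={slot:04X} word={word:02X}")
# Prints tag=1406 slot=2F80 word=14, matching Example 1.
```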
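The three lookup cases can likewise be written out as a small model. The following is a toy sketch under stated assumptions, not an implementation from the text: the class and method names are my own, the slot count is scaled down from 2^14 to keep the model cheap, and main memory is just a Python list. It implements the valid-bit test, the tag comparison, and the dirty-bit write-back of Cases I to III, together with a write-back, write-allocate write path.

```python
class DirectMappedCache:
    """Toy model of a direct-mapped cache (Cases I-III above).

    Each slot holds a valid bit, a dirty bit, a tag, and a block of
    32 words. NUM_SLOTS is scaled down from the text's 2^14 slots;
    the control logic is unchanged.
    """
    NUM_SLOTS = 16    # assumption: small model (the text uses 2^14)
    BLOCK_WORDS = 32  # 2^5 words per block, as in the text

    def __init__(self, main_memory):
        self.memory = main_memory  # list of words standing in for DRAM
        self.slots = [{"valid": False, "dirty": False, "tag": 0,
                       "block": [0] * self.BLOCK_WORDS}
                      for _ in range(self.NUM_SLOTS)]

    def read(self, addr):
        word = addr % self.BLOCK_WORDS
        slot_idx = (addr // self.BLOCK_WORDS) % self.NUM_SLOTS
        tag = addr // (self.BLOCK_WORDS * self.NUM_SLOTS)
        slot = self.slots[slot_idx]

        if slot["valid"] and slot["tag"] == tag:
            return slot["block"][word]        # Case I: hit

        if slot["valid"] and slot["dirty"]:   # Case II: evicted block is
            old_base = ((slot["tag"] * self.NUM_SLOTS + slot_idx)
                        * self.BLOCK_WORDS)   # dirty, so write it back
            self.memory[old_base:old_base + self.BLOCK_WORDS] = slot["block"]

        # Cases II and III: read the referenced block into the slot,
        # set the valid bit, and clear the dirty bit.
        base = (addr // self.BLOCK_WORDS) * self.BLOCK_WORDS
        slot["block"] = self.memory[base:base + self.BLOCK_WORDS]
        slot.update(valid=True, dirty=False, tag=tag)
        return slot["block"][word]

    def write(self, addr, value):
        """Write-back with write-allocate: load the block if needed,
        update only the cached copy, and mark the slot dirty."""
        self.read(addr)  # reuse the miss handling of Cases II and III
        slot = self.slots[(addr // self.BLOCK_WORDS) % self.NUM_SLOTS]
        slot["block"][addr % self.BLOCK_WORDS] = value
        slot["dirty"] = True

mem = list(range(2048))       # toy main memory of 2048 words
cache = DirectMappedCache(mem)
print(cache.read(1234))       # Case III miss, then the word: 1234
cache.write(1234, 99)         # cached copy updated, slot marked dirty
print(mem[1234])              # still 1234: write-back defers the update
```

Note how the dirty bit defers the main memory update until the slot is evicted; a write-through version would update main memory on every write and need no dirty bit, which is exactly the distinction drawn above and picked up again below.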
The use of a dirty bit is only needed for a write-back policy. Even if the data item is not in the cache when the write occurs, there is the choice of bringing the block containing the word into the cache and then updating it, known as write-allocate, or updating it in main memory without involving the cache, known as write-no-allocate.

Question 1

If a level of the memory hierarchy has a hit rate of 75%, memory requests take 12 ns to complete if they hit in the level, and memory requests that miss in the level take 100 ns to complete, what is the average access time of the level?

Solution:

Average access time = (T_hit x P_hit) + (T_miss x P_miss)
                    = (12 ns x 0.75) + (100 ns x 0.25)
                    = 34 ns

Question 2

A memory system contains a cache, a main memory, and a virtual memory. The access time of the cache is 5 ns, and it has an 80% hit rate. The access time of the main memory is 100 ns, and it has a 99.5% hit rate. The access time of the virtual memory is 10 ms. What is the average access time of the hierarchy?

Solution: To solve the problem, start from the bottom of the hierarchy and work up. Since the hit rate of the virtual memory (the bottom level) is 100%, its access time of 10 ms is the miss penalty for main memory.

Average access time of main memory = (100 ns x 0.995) + (10 ms x 0.005) = 50099.5 ns

Average access time of the cache, and thus of the hierarchy = (5 ns x 0.80) + (50099.5 ns x 0.20) = 10023.9 ns ≈ 10024 ns

References

1. Nicholas Carter, Computer Architecture, McGraw-Hill, 2002.
2. Miles Murdocca and Vincent Heuring, Computer Architecture and Organization: An Integrated Approach, John Wiley and Sons, 2007.