
EA 2004

Computer Architecture - II

Memory Organization
Memory Hierarchy
In a computer system, the memory unit is the storage
place used to store programs and data on a
temporary or permanent basis.
Typically, memory technologies can be categorized
into three groups:
semiconductor memory devices
Registers, SRAM, DRAM, FLASH
These operate on solid-state principles, making them faster.
magnetic memory devices
HDD and FDD
optical memory devices
CDs and DVDs
Memory Hierarchy

Typically, magnetic and optical memory devices are
much slower than semiconductor memory devices,
since they are larger and involve moving parts.
However, optical and magnetic memory devices have
relatively more capacity and are less expensive
compared to semiconductor memory devices.
Memory Hierarchy
Main memory
The memory unit that communicates directly with the
CPU
Only programs and data currently needed by the processor
reside in the main memory
All other information is stored in auxiliary memory and
transferred to main memory when needed
Auxiliary memory
The devices that provide backup storage space are
called the auxiliary memory.
magnetic disks and magnetic tapes
used to store system programs, data files and other
backup information.
Memory Hierarchy
Extended Memory Hierarchy
Typical Memory Hierarchy System

[Figure: typical memory hierarchy. Auxiliary memory (magnetic disks,
magnetic tapes) provides backup storage and connects to main memory
through the I/O processor. Main memory occupies the central position
and communicates directly with the CPU, the auxiliary memory and the
cache memory. The cache is a special high-speed memory that supplies
current programs and data to the CPU at a rapid rate.]
Types of RAMs
Static RAM vs. Dynamic RAM
Static RAM                          Dynamic RAM
Built from transistors              Built from capacitors
Developed using silicon &           Developed using silicon &
  other semiconductors                other semiconductors
High-speed switching                Slower performance
Retains its state as long as        Automatically discharges after
  power is applied                    some time, needs refreshing
Reliable                            Less reliable
Low density                         High density
High power consumption              Low power consumption
High cost (per bit)                 Low cost (per bit)
Types of RAMs (Cont.)

SRAM (Static RAM)
More reliable, but more expensive and consumes more power
Typically used for cache memories
DRAM (Dynamic RAM)
Contents are stored as charges in small capacitors
Capacitors must be re-charged from time to time
The bulk of PC memory is made out of DRAM
Hypothetical Library

[Figure: a customer, a librarian at a counter, and the book store room.]
Hypothetical Library
Imagine a library
There is a librarian who is responsible for lending and
returning books, and a store room where the books are
kept.
When a customer arrives and asks for a book, the
librarian goes to the store room, finds the book, comes
back and gives the book to the customer.
When the customer returns the book, the librarian goes
to the store room, places the book there and returns to
the counter to wait for another customer.
The librarian has to do the same complete cycle for
each book, even for popular ones that are requested
frequently.
Hypothetical Library
Suppose the library gets upgraded and the librarian gets a
'cache': a small bookshelf behind his counter which can
hold 20 books.
A new day starts and the small bookshelf is empty.
The first customer comes and asks for the book 'ABC'.
The librarian goes to the store room, finds the book and gives
it to the customer.
When the customer returns the book, instead of making the
round trip to the store room, the librarian puts it on the
small bookshelf.
Hypothetical Library

Another customer comes and asks for the same book
'ABC'.
The librarian looks on the small shelf for the book and gives it
to the customer.
The process is more efficient now, with a cache.
So by placing the books frequently requested in a day on
the quickly accessible bookshelf, the average time taken to
find a book can be reduced.
What if the book is not on the bookshelf?
Hypothetical Library

If the book is not on the bookshelf, the librarian first
examines the bookshelf and then goes to the store
room to bring the book.
This time the librarian has to spend some extra time
checking his cache.
Since the cache is small (here it holds only 20 books), the time
taken to check it is much smaller than the time
taken for the round trip to the store room.
Hypothetical Library

Normally the cache is designed to minimize the cache
searching time, by keeping it small.
In a computer system, the cache uses a faster but smaller
memory type.
It is possible to have multiple levels of cache (in the
above example, a larger bookshelf of 40 books behind
the counter would act as a two-level cache).
Cache Memory
A small amount of memory that is faster than RAM
Slower than registers
Built using SRAM
Ranges from a few KB to a few MB
Used by the CPU to store frequently used instructions &
data
Exploits spatial & temporal locality
Modern systems use multiple levels of cache
L1 Cache: very fast, usually within the CPU itself
L2 Cache: slower than L1, but faster than RAM
Today there's even L3 Cache
Cache Memory
General Caching Concepts
[Figure: a 4-slot cache in front of a 16-block memory (blocks 0-15);
after the requests below, the cache holds blocks 12, 9, 14 and 3.]

The program needs an object that is stored in memory.

Cache hit
The program requests block 14, which is already in the cache,
so it is served directly from the cache.

Cache miss
The program requests block 12, which is not in the cache, so the
cache must fetch it from memory.
If the cache is full, then some current block (a victim) must
be replaced (evicted).
Cache Operation

The CPU requests the contents of a memory location
The cache is checked for this data
If present, it is fetched from the cache (fast)
If not present, the required block is read from main
memory into the cache
Then the data is delivered from the cache to the CPU
The cache includes tags to identify which block of
main memory is in each cache slot
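As a sketch, the read flow above can be modelled with a dictionary standing in for the cache (the block size and memory size are illustrative assumptions, not any particular machine):

```python
BLOCK_SIZE = 4                       # words per block (assumption)

main_memory = list(range(64))        # 64 words = 16 blocks
cache = {}                           # block number -> list of words

def read(address):
    """Return the word at `address`, filling the cache on a miss."""
    block_no = address // BLOCK_SIZE       # which block the word lives in
    offset = address % BLOCK_SIZE          # position inside the block
    if block_no not in cache:              # miss: read whole block from memory
        start = block_no * BLOCK_SIZE
        cache[block_no] = main_memory[start:start + BLOCK_SIZE]
    return cache[block_no][offset]         # deliver from cache to CPU

print(read(13))   # miss: block 3 is fetched, then word 13 returned
print(read(14))   # hit: block 3 is already in the cache
```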
Hit Ratio
The ratio of the total number of hits to the
total number of CPU accesses to memory (i.e. hits plus misses) is
called the hit ratio.
hit: the CPU finds the word in the cache
miss: the word is not found in the cache (the CPU must read main
memory)
The performance of the cache memory is measured by the
hit ratio.

Hit ratio = hits / (hits + misses)
Hit Ratio Example

Hit ratio = hits / (hits + misses)

Consider a computer with a cache access time of 100ns,
a main memory access time of 1000ns and a hit ratio of
0.9.
Average access time without a cache memory = 1000ns
Average access time with the cache memory =
100*0.9 + 1000*0.1 = 190ns
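The arithmetic above can be checked with a small helper, assuming the simple model used in the example where a hit costs one cache access and a miss one main-memory access:

```python
def average_access_time(t_cache, t_memory, hit_ratio):
    """Average access time under the simple hit/miss cost model."""
    miss_ratio = 1 - hit_ratio
    return t_cache * hit_ratio + t_memory * miss_ratio

# The example above: ~190 ns, versus 1000 ns with no cache at all.
print(average_access_time(100, 1000, 0.9))
```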
Mapping Techniques

The transformation of data from main memory to
cache memory is called the mapping process. The mapping
process is divided into three types:

1. Direct mapping
2. Associative mapping
3. Set-associative mapping
without a cache memory
[Figure: the CPU's d-bit memory register addressing main memory. The
high d-b bits of the address give the block number and the low b bits
the location (word) within the block. The figure shows 4 blocks
(Block 0 .. Block 3) of memory locations (words 0 .. 5).]
without a cache memory
[Figure: a CPU with a 5-bit memory register addressing a main memory of
four blocks (Block 0 .. Block 3); the high 5-3 = 2 bits give the block
number and the low 3 bits the location.]

For example, if the CPU has a 5-bit memory register, it can address 2^5
memory locations in main memory.
If the main memory has a block size of 2^3, then there will be 2^2 blocks.
Suppose the CPU addresses memory location 11011; that word is at
location 011 (location 3) within block 11 (i.e. Block 3).
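The address split can be reproduced with bit operations; note that the high two bits of 11011 are 11, i.e. block 3:

```python
b = 3                                # low 3 bits locate the word in its block
address = 0b11011

block = address >> b                 # high 2 bits: 0b11 = block 3
location = address & ((1 << b) - 1)  # low 3 bits:  0b011 = location 3

print(block, location)   # 3 3
```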
with cache memory

[Figure: the CPU's d-bit memory register, a cache memory of two blocks
(Block 0, Block 1) and a main memory of four blocks (Block 0 .. Block 3).
Which main-memory block should go into which cache block?]
Direct mapping Main Memory

Suppose we have a computer with:
Main memory of 256 words
Cache memory of 16 words
Block size of 8 words
Therefore the no. of blocks
in main memory = 256/8 = 32
Therefore the no. of blocks
in cache memory = 16/8 = 2

[Figure: main memory Block 0 .. Block 31 alongside a cache memory of
Block 0 and Block 1.]
Direct mapping Main Memory
Main-memory block i maps to cache block i % 2:
0%2=0
1%2=1
2%2=0
3%2=1
...
31%2=1

[Figure: the CPU's d-bit memory register, a cache memory of Block 0 and
Block 1, and a main memory of Block 0 .. Block 31.]
Direct mapping
0%2=0
1%2=1
2%2=0
3%2=1
...
31%2=1

Cache Memory
Block 0 holds one of {0, 2, 4, 6 .. 28, 30}
Block 1 holds one of {1, 3, 5, 7 .. 29, 31}

Tag bits identify which of the main-memory blocks that
compete for one particular block in cache memory is
currently stored there.
Direct mapping
Suppose we have a computer with:
Main memory of 2^d words
Cache memory of 2^c blocks
Block size of 2^b words
Therefore the no. of blocks in main memory = 2^d / 2^b = 2^(d-b)
Therefore the no. of main-memory blocks competing for each cache
block = 2^(d-b) / 2^c = 2^(d-b-c), so d-b-c tag bits are required.
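A minimal sketch of the index/tag computation in this notation (the function name and the example address are illustrative):

```python
def direct_map(address, d, b, c):
    """Split a d-bit address into (cache block index, tag) for direct mapping."""
    block_no = address >> b                   # strip the in-block offset
    cache_index = block_no & ((1 << c) - 1)   # low c bits pick the cache block
    tag = block_no >> c                       # remaining d-b-c bits are the tag
    return cache_index, tag

# The 256-word example above: d=8, b=3 (8-word blocks), c=1 (2 cache blocks).
print(direct_map(0b10110101, d=8, b=3, c=1))   # (0, 11)
```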
Direct Mapping pros & cons
Simple
Inexpensive
One disadvantage of direct mapping is that the hit ratio
can drop significantly if two addresses with the same index
but different tags are accessed repeatedly.
The other disadvantage is that since each block of main
memory has a definite location in cache memory, there
could be certain locations in cache memory
which remain empty.
Associative mapping
The fastest and most flexible cache uses associative
mapping.
Unlike in direct mapping, there are no definite locations
for the blocks of main memory.
The associative memory stores both the address and the
content of the memory block.
This permits any block location of the cache to
store any block from main memory.
Associative mapping
Associative mapping
The least significant b bits of the CPU memory address
represent the location of the word inside the block,
and the most significant d-b bits represent the location of
the block in main memory.
When the CPU calls for the data at a certain memory address
in main memory, it first checks each and every block in
cache memory.
If the address stored alongside a block in cache
memory matches the address the CPU is calling, then it is
a hit and the required word is in that block.
Associative mapping
Otherwise it is a miss: the required word is read from
main memory and that block is then stored in the cache
memory together with its address.
If the cache is full, then an address-block pair needs to
be displaced in order to make space for the new
address-block pair.
The decision as to which pair is replaced is determined
by a replacement algorithm that the designer chooses.
Associative memory mapping is more complex and
expensive, since it requires more logic than direct
mapping.
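A fully associative cache can be sketched as a small map from block address to contents; here FIFO stands in for the designer-chosen replacement algorithm, and the two-slot size is an assumption for illustration:

```python
from collections import OrderedDict

CACHE_SLOTS = 2

cache = OrderedDict()                      # block number -> block contents

def access(block_no, fetch):
    """Return (block contents, 'hit' or 'miss'), evicting FIFO-style if full."""
    if block_no in cache:
        return cache[block_no], "hit"
    if len(cache) >= CACHE_SLOTS:          # cache full: displace the oldest pair
        cache.popitem(last=False)
    cache[block_no] = fetch(block_no)      # store the address-block pair
    return cache[block_no], "miss"

fetch = lambda n: f"block-{n}"
print(access(5, fetch))    # ('block-5', 'miss')
print(access(5, fetch))    # ('block-5', 'hit')
```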
Set-Associative mapping
Set-associative mapping can be considered a
combination of both associative mapping and direct
mapping.
As mentioned before, the main disadvantage of
direct mapping is that two blocks with the same index but
with different tag values cannot reside in cache
memory at the same time.
However, in set-associative mapping the cache can store 2
or more memory blocks under the same index.
Set-Associative mapping

The figure represents a set-associative memory organization for a set size of 3.


Set-Associative mapping
When the CPU requires a word at a certain memory address in
main memory, it first checks the c bits in the address and
determines in which SET of cache memory the data should
be.

Then it compares each tag field stored in that set with the
tag field of the memory address.

If there is a match, then it is a hit and the word is read from
the cache.
Set-Associative mapping
Otherwise the CPU goes to the main memory using the
memory address, reads the required word, and then that
word's block is transferred to the corresponding SET of the cache and
stored together with its tag.

If the SET is full, then it is necessary to replace an existing
tag-block pair with the new one. As in associative mapping, a
suitable replacement algorithm is used in that process.
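The set-then-tag lookup above can be sketched as follows (4 sets is an arbitrary assumption; the set size of 3 comes from the figure, and evicting the first pair found stands in for the replacement algorithm):

```python
NUM_SETS = 4        # 2**c sets (assumption)
WAYS = 3            # set size of 3, as in the figure

sets = [dict() for _ in range(NUM_SETS)]   # each set maps tag -> contents

def lookup(block_no):
    """Return (contents, 'hit' or 'miss') for a main-memory block number."""
    index = block_no % NUM_SETS            # c index bits choose the set
    tag = block_no // NUM_SETS             # remaining bits identify the block
    s = sets[index]
    if tag in s:                           # compare every tag in the set
        return s[tag], "hit"
    if len(s) >= WAYS:                     # set full: replace a tag-block pair
        s.pop(next(iter(s)))
    s[tag] = f"block-{block_no}"           # fetch from main memory
    return s[tag], "miss"

print(lookup(6))    # miss: set 2, tag 1, fetched from memory
print(lookup(6))    # hit: found by tag comparison within set 2
```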
Writing into Cache
When memory write operations are performed, the CPU first
writes into the cache memory. These modifications made
by the CPU during write operations, on the data saved in the
cache, need to be written back to main memory or to
auxiliary memory.

The two popular cache write policies (schemes) are:


Write-Through
Write-Back
Write-Through
In a write-through cache, the main memory is updated each
time the CPU writes into the cache.

The advantage of the write-through cache is that the main


memory always contains the same data as the cache
contains.
This characteristic is desirable in a system which uses a direct
memory access (DMA) scheme of data transfer: the I/O devices
communicating through DMA receive the most recent data.
Write-Back
In a write-back scheme, only the cache memory is updated
during a write operation.

The updated locations in the cache memory are marked by


a flag so that later on, when the word is removed from the
cache, it is copied into the main memory.

Words are removed from the cache from time to time to
make room for a new block of words.
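The two policies differ only in when main memory is touched. A write-back sketch with the flag (dirty bit) described above might look like this (the data values are illustrative):

```python
main_memory = {0: "old"}
cache = {}                    # block number -> (contents, dirty flag)

def write(block, value):
    """Write-back policy: update only the cache and mark the block dirty."""
    cache[block] = (value, True)

def evict(block):
    """On removal from the cache, copy the block back only if modified."""
    contents, dirty = cache.pop(block)
    if dirty:
        main_memory[block] = contents

write(0, "new")
print(main_memory[0])   # still "old": main memory not yet updated
evict(0)
print(main_memory[0])   # "new": copied back into main memory on eviction
```

Under write-through, by contrast, `write` would update `main_memory` immediately and no dirty flag would be needed.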
Cache Initialization
The cache is initialized when power is applied.
After initialization the cache may contain some non-valid
data.
So a valid bit is included with each word in the
cache to indicate whether or not the word contains valid
data.
The cache is initialized by setting each valid bit to 0.
The valid bit of a word is set whenever the word is read from
main memory and updated in the cache.
If the valid bit is 0, the new word automatically replaces the invalid
data.
Virtual Memory
In a computer memory system, all programs and data are stored
in auxiliary memory.
When the CPU needs programs or data, they are brought from
auxiliary memory to main memory.
However, sometimes the main memory is not large
enough to store all the programs and data that users expect to
run at the same time.
For example, if you load the Windows 7 operating system,
Microsoft Word, the Firefox web browser and Adobe Photoshop into
RAM simultaneously, your 1 GB RAM (main memory) might not
be enough to hold them all.
That is the moment that the concept of virtual memory comes into
the picture.
Virtual Memory
With virtual memory, the computer identifies the areas of main
memory (RAM) that have not been used recently and copies them
to auxiliary memory (hard disk), freeing up space in main
memory to load the new application or data.

Without virtual memory, your computer would get stuck or ask you to
close some running applications to get enough space for new
applications.
Virtual Memory
Virtual memory gives the user or programmer the illusion that the
computer has a large memory, even though the computer has a relatively
small main memory.

Normally, auxiliary memory is much cheaper than main memory.

However, the read/write speed of auxiliary memory is very much
slower than that of main memory.

Therefore, if the computer has to depend too much on virtual
memory, you will feel a significant performance drop.
Address Space and Memory Space

An address used by a programmer is called a 'virtual
address', and the set of such addresses the 'address
space'.

An address in main memory is called a 'physical
address', and the set of such physical addresses the
'memory space'.

A virtual memory system provides a mechanism for
translating program-generated virtual addresses into the
correct physical addresses (main memory addresses).
Address Mapping Using Pages

The physical memory is broken into groups of equal
size called blocks.

The address space is also divided into pages of the
same size.

Although a page and a block are of equal size, a page
refers to the organization of the address space and a block
to the organization of the memory space.
Address Mapping Using Pages
For example, suppose a computer with an address space of 8K
and a memory space of 4K.
Furthermore, suppose that each page or block consists of 1K words.
Then we obtain eight pages and four blocks, as in the figure.

In this example a virtual address has 13 bits (8K = 8*1024 = 2^13) and a
physical address has 12 bits (4K = 4*1024 = 2^12).
Address Mapping Using Pages

How is an address translated from virtual to physical?
It is done by using a memory mapping table (page table).
Similar to the cache case, the virtual address is divided into two parts,
namely the page number and the line number.
In the above example the virtual address has 13 bits.
Since each page consists of 1K (2^10 = 1024) words, the highest 3
bits represent the page number in the address space and the
lowest 10 bits of the virtual address represent the line
address within that page.
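The page/line split above in code (the 13-bit address and 10-bit line field come from the running example):

```python
PAGE_BITS = 10                       # 2**10 = 1024 words per page

virtual_address = 0b1010101010011    # 13-bit virtual address

page = virtual_address >> PAGE_BITS              # high 3 bits: 0b101 = page 5
line = virtual_address & ((1 << PAGE_BITS) - 1)  # low 10 bits: the line number

print(page, bin(line))   # 5 0b101010011
```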
Address Mapping Using Pages
[Figure: translation of the virtual address 1 0 1 0 1 0 1 0 1 0 0 1 1.
The high 3 bits (101) give the page number and the low 10 bits
(0101010011) the line number. The memory page table holds, for each
page, a block number and a presence bit:

Table address   Block   Presence bit
000              -      0
001              11     1
010              00     1
011              -      0
100              -      0
101              01     1
110              10     1
111              -      0

Page 101 is present in block 01, so the main memory address register
receives 01 0101010011 and the word is read from main memory
(Block 0 .. Block 3) into the MBR.]
Associative Memory Page Table
The major drawback of the above page table is its inefficient
storage utilization, since there is an entry for each page
number.
In the above example, observe that at least 4 entries of
the page table are always marked empty, because main memory cannot
accommodate more than four blocks.
Consider a case where the address space is 1024K words and
the memory space 32K.
If each page or block contains 1K words, then there will be
1024 pages and 32 blocks.
Then the capacity of the memory page table has to be
1024 words (entries), so at any given time at least 992
locations of the page table will be empty and not in use!
Associative Memory Page Table
The more efficient way is to use an associative memory page
table, in which the number of entries in the page table is
equal to the number of blocks in main memory.
In this method each page table entry stores both the page number
and the associated block number.
Associative Memory Page Table
Here there are two fields in the page table, corresponding
to the page number and the block number.

When a program generates a virtual address, the page
number field of each entry of the page table is
compared with the page number field of the virtual
address.

If a match occurs, the block number stored
in that entry is used to identify the
corresponding block in main memory, and the
required word is read from that block using the
line number of the virtual address.
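A sketch of the associative lookup, using an illustrative page table with one entry per main-memory block (the page-to-block assignments are example values):

```python
page_table = {              # page number -> block number; 4 entries = 4 blocks
    0b001: 0b11,
    0b010: 0b00,
    0b101: 0b01,
    0b110: 0b10,
}

def translate(virtual_address, page_bits=10):
    """Map a virtual address to a physical one, or None on a page fault."""
    page = virtual_address >> page_bits
    line = virtual_address & ((1 << page_bits) - 1)
    if page not in page_table:                      # no page-number match
        return None
    return (page_table[page] << page_bits) | line   # block number ++ line

# Page 101 maps to block 01; the line bits pass through unchanged.
print(format(translate(0b1010101010011), "012b"))   # 010101010011
```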
Page Replacement
The program is executed from main memory until a
required page is not available.
If a page is not available, this condition is called a page
fault.
When it occurs, the present program is suspended until
the required page is brought into main memory.
If main memory is full, the pages to remove are
determined by the replacement algorithm used.
The two most common replacement algorithms
are first-in first-out (FIFO) and least recently used
(LRU).
Page Replacement
First-In-First-Out (FIFO):
FIFO replacement is easy to implement, but it has the
disadvantage that under certain circumstances pages could
be removed and loaded from memory too frequently.

Least Recently Used (LRU):

LRU is more difficult to implement, but has the
advantage that the least recently used page is a better
candidate for removal than the first-in (least recently
loaded) page.
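To make the difference concrete, a small simulation (the reference string and 3-page capacity are arbitrary choices) counts page faults under both policies:

```python
from collections import OrderedDict

def count_faults(references, capacity, lru):
    """Count page faults for a reference string under FIFO or LRU."""
    cache = OrderedDict()
    faults = 0
    for page in references:
        if page in cache:
            if lru:                       # LRU: a hit makes the page "recent"
                cache.move_to_end(page)
            continue
        faults += 1                       # page fault
        if len(cache) >= capacity:        # evict front: oldest-loaded (FIFO)
            cache.popitem(last=False)     # or least recently used (LRU)
        cache[page] = True
    return faults

refs = [1, 2, 3, 1, 4, 1, 5]
print(count_faults(refs, 3, lru=False))  # 6 faults under FIFO
print(count_faults(refs, 3, lru=True))   # 5 faults under LRU
```

Here LRU does better because the frequently reused page 1 stays resident, while FIFO evicts it simply for being the oldest-loaded page.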
Thank You
