More Elaborations With Cache & Virtual Memory: CMPE 421 Parallel Computer Architecture

CMPE 421 Parallel Computer Architecture
PART5 More Elaborations with cache & Virtual Memory

1
Cache Optimization into categories
Reducing Miss Penalty

Multilevel caches Critical word first: Dont wait for the full block to be loaded before sending the requested word and restarting the CPU
Read Miss Before write miss: This optimization serves reads before writes have been completed.
SW R2, 512(R0) ; M[512] R2 (cache index 0) LW R1,1024(R0) ; R1 M[1024] (cache index 0)
LW R2,512(R0) ; R2 M[512] (cache index 0)
- If the write buffer hasnt completed writing to location 512 in memory, the read of location 512 will put the old, wrong value into the cache block, and then into R2.
Victim Caches
2
Victim Caches
One approach to lower miss penalty is to remember what was discarded in case it is needed again. This victim cache contains only blocks that are discarded from a cache because of a miss victimsand are checked on a miss to see if they have the desired data before going to the next lower-level memory. The AMD Athlon has a victim cache with eight entries.
3
Jouppi [1990] found that victim caches of one to ve entries are effective at reducing misses, especially for small, direct-mapped data caches. Depending on the program, a four-entry victim cache might remove one quarter of the misses in a 4-KB direct-mapped data cache.
Cache Optimization into categories
Reducing the miss rate

Larger block size, Larger cache size,
Higher associativity,
Way prediction Pseudo-associativity,
- In way-prediction, extra bits are kept in the cache to predict the set of the next cache access.
Compiler optimizations
small and simple caches,
Reducing the time to hit in the cache
avoiding address translation,

and pipelined cache access.
4
Cache Optimization
Complier-based cache optimization reduces the miss rate without any hardware change For Instructions

Reorder procedures in memory to reduce conflict Profiling to determine likely conflicts among groups of instructions
For Data
Merging Arrays: improve spatial locality by single array of compound elements vs. two arrays Loop Interchange: change nesting of loops to access data in order stored in memory Loop Fusion: Combine two independent loops that have same looping and some variables overlap Blocking: Improve temporal locality by accessing blocks of data repeatedly vs. going down whole columns or rows
5
Examples
Reduces misses by improving spatial locality through combined arrays that are accessed simultaneously
Sequential accesses instead of striding through memory every 100 words; improved spatial locality
Examples
Some programs have separate sections of code that access the same arrays (performing different computation on common data)
Fusing multiple loops into a single loop allows the data in cache to be used repeatedly before being swapped out
Loop fusion reduces missed through improved temporal locality (rather than spatial locality in array merging and loop interchange)
Accessing array a and c would have caused twice the number of misses without loop fusion
Blocking Example
Example
B called Blocking Factor Conflict misses can go down too Blocking is also useful for register allocation
Summary of performance equations
10
VIRTUAL MEMORY
Youre running a huge program that requires 32MB
Your PC has only 16MB available...
Rewrite your program so that it implements overlays Execute the first portion of code (fit it in the available
memory) When you need more memory... Find some memory that isnt needed right now Save it to disk Use the memory for the latter portion of code So on... The memory is to disk as registers are to memory Disk as an extension of memory Main memory can act as a cache for the secondary stage (magnetic disk)
11
A Memory Hierarchy
CPU
Registers
Store Load or I-Fetch
Extend the hierarchy
Main memory acts like a cache for the disk

HW manages movement
Cache: About $20/Mbyte <2ns access time
Cache
512KB typical
Memory: About $0.15/MBtye, 50ns access time
256MB typical
Main Memory (DRAM)
Disk: About $0.0015/MByte, 15ms (15,000,000 ns) access SW time
manages movement The operating system is responsible for managing the movement of memory between disk and main memory, and for keeping the address translation table accurate.
40GB typical
Disk
12
Virtual Memory
Idea: Keep only the portions of a program (code, data) that are currently needed in Main Memory

Currently unused data is saved on disk, ready to be brought in when needed Appears as a very large virtual memory (limited only by the disk size)
Advantages:
Programs that require large amounts of memory can be run (As
long as they dont need it all at once) Multiple programs can be in virtual memory at once, only active programs will be loaded into memory A program can be written (linked) to use whatever addresses it wants to! It doesnt matter where it is physically loaded! When a program is loaded, it doesnt need to be placed in continuous memory locations
Disadvantages:
The memory a program needs may all be on disk The operating system has to manage virtual memory
13
Virtual Memory
We will focus on using the disk as a storage area for chunks of main memory that are not being used.
The basic concepts are similar to providing a cache for main memory, although we now view part of the hard disk as being the memory.

Only few programs are active An active might not need all the memory that has been reserved by the program (store rest in the Hard disk)
14
The Virtual Memory Concept

Virtual Memory Space: All possible memory addresses (4GB in 32-bit systems) All that can be held as an option (conceived) . Virtual Memory Space Disk Swap Space: Area on hard disk that can be used as an extension of memory. (Typically equal to ram size) All that can be used. Main Memory: Physical memory. (Typically 1GB) All that physically exists.
Disk Swap Space
Main Memory
15
The Virtual Memory Concept

This address can be conceived of, but doesnt correspond to any memory. Accessing it will produce an error. This address can be accessed. However, it currently is only on disk and must be read into main memory before being used. A table maps from its virtual address to the disk location. This address can be accessed immediately since it is already in memory. A table maps from its virtual address to its physical address. There will also be a back-up location on disk.
Virtual Memory Space Error Disk Swap Space Disk Address: 58984 Not in main memory
Main Memory Physical Address: 883232 Disk Address: 322321
16
The Process
The CPU deals with Virtual Addresses 1. Convert the virtual address to a physical address Need a special table (Virtual Addr-> Physical Addr.) The table may indicate that the desired address is on disk, but not in physical memory
Read the location from the disk into memory (this may require
moving something else out of memory to make room)
Steps to accessing memory with a virtual address
2. Do the memory access using the physical address Check the cache first (note: cache uses only physical addresses) Update the cache if needed
17
Structure of Virtual Memory

Return our Library Analogy Virtual addresses as the title of a book Physical address as the location of that in the library From Processor
Virtual Address Page fault Using elaborate Software page fault Handling algorithm
Address Translator
Physical Address
To Memory
18
Translation (hardware that translates these virtual addresses to physical addresses)
Since the hardware access memory, we need to convert from a logical address to a physical address in hardware The Memory Management Unit (MMU) provides this functionality
2n-1
CPU
Virtual Address (Logical)
MMU
Physical Address (Real)
Physical Memory
19
Address Translation
In Virtual Memory, blocks of memory (called pages) are mapped from one set of address (called virtual addresses) to another set (called physical addresses)
20
Page Faults
If the valid bit for a virtual page is off, a page fault occurs. The operating system must be given control. Once the operating system gets control, it must find the page in the next level of the hierarchy (usually magnetic disk) and decide where to place the requested page in main memory.
21
Terminology
page: The unit of memory transferred between disk and the main memory. page fault: when a program accesses a virtual memory location that is not currently in the main memory. address translation: the process of finding the physical address that corresponds to a virtual address.
Cache Block Cache miss Block addressing
Virtual memory Page page fault Address translation
22
Difference between virtual and cache memory
The miss penalty is huge (millions of seconds)
Solution: Increase block size (page size) around 8KB

- Because transfers have a large startup time, but data transfer is relatively fast after started
Even on faults (misses) VM must provide info on the disk location

VM system must have an entry for all possible locations When there is a hit, the VM system provides the physical address in memory (not the actual data, in the cache we have data itself )
- Saves room one address rather than 8 KB data
Since miss penalty is very huge, VM systems typically have a miss (page fault) rate of 0.00001- 0.0001%
23
In Virtual Memory Systems
Pages should be large enough to amortize the high
access time. (from 4 kB to 16 kB are typical, and some

designers are considering size as large as 64 kB.)
Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages. (e.g. fully associative)
Sophisticated LRU replacement policy is preferable
Page
faults can be handled in software. (Write-through scheme does not work.)

24
Write-back
we
need a scheme that reduce the number of disk writes.
Keeping track of pages: The page table

All programs use the same virtual addressing space Each program must have its own memory mapping Each program has its own page table to map virtual addresses to physical addresses
virtual Address
Page Table
Physical Address
The page table resides in memory, and is pointed to by the page table register The page table has an entry for every possible page (in principle, not in practice...), no tags are necessary. A valid bit indicates whether the page is in memory or on disk.
25
Virtual to Physical Mapping

Both virtual and physical address are broken down a page number and page offset
31
(index)
No tag - All entries are unique

13 12 0
Virtual Page Number

Translation 23
Page Offset
Note: may involve reading from disk Page tables are stored in main MEM 13 12 0
Example 4GB (32-bit) Virtual Address Space 16MB (24-bit) Physical Address Space 8 KB (13-bit) page size (block size)
Physical Page Number Page Offset
A 32-bit virtual address is given to the V.M. hardware
The virtual page number (index) is derived from this by removing the page (block) offset
The Virtual Page Number is looked up in a page table When found, entry is either: The physical page number, if in memory The disk address, if not in memory (a page fault) If not found, the address is invalid
V->1 V->0
Virtual Memory (32-bit system): 8KB page size,16MB Mem

31 13 12 0
Virtual Address
4GB / 8KB = 512K entries
Index
19
219=512K
Page offset
13
Virt. Pg.# V Phys. Page # 0 1 2 ...

... 512K
11
23 13 12 0
Disk Address
Physical Address
27
Virtual Memory Consists

Bits for page address Bits for virtual page number Number of virtual pages Entries in the page table Bits for physical page number Number of physical pages Bits per page table line Total page table size
28
Write issues
Write Through - Update both disk and memory

+ Easy to implement - Requires a write buffer - Requires a separate disk write for every write to memory - A write miss requires reading in the page first, then writing back the single word
Write Back - Write only to main memory. Write to the disk

only when block is replaced. + Writes are fast + Multiple writes to a page are combined into one disk write - Must keep track of when page has been written (dirty bit)
29
Page replacement policy

Exact Least Recently Used (LRU) but it is expensive. So, use Approximate LRU: a use bit (or reference bit) is added to every page table line
If there is a hit, PPN is used to form the address and reference bit is turned on so the bit is set at every access
the OS periodically clears all use bits the page to replace is chosen among the ones with their use bit at zero
Choose one entry as a victim randomly
If the OS chooses to replace the page, the dirty bit indicates whether the page to be written out before its location in memory can be given to another (give a Figure)
30
Virtual memory example

System with 20-bit V.A., 16KB pages, 256KB of physical memory Page offset takes 14 bits, 6 bits for V.P.N. and 4 bits for P.P.N. Page Table: Virtual Page # (index) 000000 000001 000010 000011 000100 000101 000110 000111 Valid Bit 1 0 1 0 1 1 0 0 1 1 Physical Page #/ Disk address 1001 sector 5000... 0010 sector 4323 1011 1010 sector xxxx... sector 1239... 1010 0001 Access to: 0000 1000 1100 1010 1010 PPN = 0010
Physical Address: 00 1000 1100 1010 1010

Access to: 0001 1001 0011 1100 0000
PPN = Page Fault to sector 1239...
Pick a page to kick out of memory (use LRU). Read data from sector 1239 Assume LRU is VPN 000101 for this example. into PPN 1010
31

More Elaborations With Cache & Virtual Memory: CMPE 421 Parallel Computer Architecture

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

More Elaborations With Cache & Virtual Memory: CMPE 421 Parallel Computer Architecture

Uploaded by

Copyright:

Available Formats

CMPE 421 Parallel Computer Architecture

PART5 More Elaborations with cache & Virtual Memory

Cache Optimization into categories

Reducing Miss Penalty

LW R2,512(R0) ; R2 M[512] (cache index 0)

Cache Optimization into categories

Reducing the miss rate

Larger block size, Larger cache size,

Reducing the time to hit in the cache

avoiding address translation,

Summary of performance equations

Youre running a huge program that requires 32MB

Your PC has only 16MB available...

Extend the hierarchy

Main memory acts like a cache for the disk

Cache: About $20/Mbyte <2ns access time

Memory: About $0.15/MBtye, 50ns access time

Main Memory (DRAM)

Disk: About $0.0015/MByte, 15ms (15,000,000 ns) access SW time

The Virtual Memory Concept

Disk Swap Space

The Virtual Memory Concept

Main Memory Physical Address: 883232 Disk Address: 322321

Steps to accessing memory with a virtual address

Structure of Virtual Memory

Translation (hardware that translates these virtual addresses to physical addresses)

Physical Address (Real)

Cache Block Cache miss Block addressing

Virtual memory Page page fault Address translation

Difference between virtual and cache memory

The miss penalty is huge (millions of seconds)

Solution: Increase block size (page size) around 8KB

Even on faults (misses) VM must provide info on the disk location

In Virtual Memory Systems

Pages should be large enough to amortize the high

access time. (from 4 kB to 16 kB are typical, and some

Sophisticated LRU replacement policy is preferable

faults can be handled in software. (Write-through scheme does not work.)

need a scheme that reduce the number of disk writes.

Keeping track of pages: The page table

Virtual to Physical Mapping

No tag - All entries are unique

Virtual Page Number

Physical Page Number Page Offset

A 32-bit virtual address is given to the V.M. hardware

Virtual Memory (32-bit system): 8KB page size,16MB Mem

Virt. Pg.# V Phys. Page # 0 1 2 ...

Virtual Memory Consists

Write Through - Update both disk and memory

Write Back - Write only to main memory. Write to the disk

Page replacement policy

Choose one entry as a victim randomly

Virtual memory example

Physical Address: 00 1000 1100 1010 1010

PPN = Page Fault to sector 1239...

You might also like