Pres - Opteron Slides - PDF - Vers - Magliozzi

AMD Opteron Quad-Core
a brief overview
Daniele Magliozzi
Politecnico di Milano
Opteron Memory Architecture
• native quad-core design (four cores on a single die for more efficient data sharing)
• enhanced cache structure
• integrated memory controller
• sustains multi-threaded application throughput, fitting the needs of modern servers and workstations
Daniele Magliozzi - Politecnico di Milano -1- AMD Opteron Quad-Core : A Brief Overview
3 levels of dedicated & shared cache
4 different caches accelerate instruction execution and data processing
• L1 Instruction Cache: 64-Kbyte, 2-way set-associative, 64-byte line length, LRU replacement; serves instruction loads, instruction prefetching, instruction predecoding, and branch prediction.
• L1 Data Cache: 64-Kbyte, 2-way set-associative, write-allocate and write-back with LRU replacement, divided into eight banks (each 16 bytes wide), with a prefetcher and a 3-cycle load-to-use latency.
• L2 Cache: contains only victim or copy-back blocks from L1.
• L3 Cache: dynamically shared, non-inclusive victim cache, with blocks allocated on L2 victims/copy-backs. A hit in L3 can either leave the data there (for data accessed by multiple cores) or remove the data from L3 and place it solely in L1 (for data accessed by a single core).
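The L1 geometry quoted above (64 Kbyte, 2-way, 64-byte lines) fixes how an address maps to a cache set. The following is a minimal sketch of that arithmetic for a generic set-associative cache; the function name `split_address` is hypothetical, and the sketch does not claim which address bits the real hardware uses for indexing.

```python
# Geometry taken from the slide: 64-Kbyte, 2-way set-associative, 64-byte lines.
CACHE_BYTES = 64 * 1024
WAYS = 2
LINE_BYTES = 64

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # 6 bits select a byte within a line
INDEX_BITS = NUM_SETS.bit_length() - 1          # 9 bits select a set


def split_address(addr):
    """Split an address into (tag, set index, byte offset) for this geometry."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345)
```

In this model, addresses that are 32 Kbyte apart share a set index, so three such lines compete for the two ways of one set and one of them has to be evicted as a victim.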
DDR2 SDRAM with integrated memory controller
• SDRAM: stores data in memory cells whose operation is synchronized with the external data bus by a clock signal.
• DDR2 SDRAM: (double data rate synchronous dynamic random access memory) cells transfer data on both the rising and falling edges of the clock (a technique called "double pumping").
• Improvement: the external data bus runs at twice the clock rate of the memory cells, giving twice the bandwidth of its predecessor (DDR) at the same cell clock.
• Memory Controller: integrated on-die, manages the flow of data going to and from main memory, optimizing memory performance and bandwidth per CPU and reducing the latency inherent in front-side bus architectures.
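The bandwidth claim can be checked with a short calculation. This is a sketch under stated assumptions: a 64-bit data bus per memory channel (not stated in the slides), a 200 MHz cell clock chosen only as an example, and hypothetical function names.

```python
# Ratio of external bus clock to memory-cell clock for each generation.
BUS_CLOCK_MULTIPLIER = {"DDR": 1, "DDR2": 2}


def peak_bandwidth(cell_clock_hz, generation, bus_width_bits=64):
    """Peak transfer rate in bytes per second for one memory channel."""
    bus_clock_hz = cell_clock_hz * BUS_CLOCK_MULTIPLIER[generation]
    transfers_per_second = bus_clock_hz * 2      # double pumping: both clock edges
    return transfers_per_second * bus_width_bits // 8


ddr = peak_bandwidth(200_000_000, "DDR")     # 200 MHz cells, 200 MHz bus
ddr2 = peak_bandwidth(200_000_000, "DDR2")   # 200 MHz cells, 400 MHz bus
```

With the same 200 MHz cell clock, the sketch gives 3.2 GB/s for DDR and 6.4 GB/s for DDR2, which is the factor of two mentioned above.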
Direct Connect Architecture
Front-side bus eliminated; each core is directly connected to:
• memory controller
• I/O subsystem
• other processors
by high-bandwidth HyperTransport links.

Improves overall system performance and efficiency by eliminating the traditional bottlenecks inherent in legacy front-side bus architectures.
HyperTransport Technology
High-speed, low-latency, point-to-point, unidirectional links between two devices, capable of extremely fast signaling (up to 800 MHz clock speed) and compatible with the PCI interface.
• “Packetized” bus: addresses, data, and commands are sent along the same wires, allowing narrower links that are easier to route.
• HT System: a processor with a HyperTransport port (called the HyperTransport host), the HyperTransport bus, and any I/O channels connected to it.
• Differential signaling: (employed by the links) uses two wires for each signal, and the received value is the difference between the two signals sent. It does not suffer from the problems associated with the single-ended signaling of high-speed parallel buses (bouncing signals, interference, cross-talk).
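Why the difference between two wires is robust can be shown with a toy model. The sketch below assumes idealized voltages and noise that couples equally onto both wires; the voltage swing, the noise values, and all function names are illustrative assumptions, not HyperTransport electrical parameters.

```python
def drive(bit, swing=0.3):
    """Return (positive wire, negative wire) voltages for one bit."""
    level = swing if bit else -swing
    return (level, -level)


def add_common_noise(pair, noise):
    """Interference that hits both wires of the pair equally."""
    positive, negative = pair
    return (positive + noise, negative + noise)


def receive_differential(pair):
    """Decide the bit from the difference between the two wires."""
    positive, negative = pair
    return 1 if positive - negative > 0 else 0


def receive_single_ended(voltage):
    """Decide the bit from one wire against a fixed 0 V threshold."""
    return 1 if voltage > 0 else 0


bits = [1, 0, 1, 1, 0]
noise = [0.5, -0.7, 0.9, -0.4, 0.6]   # made-up interference samples, in volts

noisy_pairs = [add_common_noise(drive(b), n) for b, n in zip(bits, noise)]
differential = [receive_differential(p) for p in noisy_pairs]
single_ended = [receive_single_ended(p[0]) for p in noisy_pairs]
```

In this model the noise term cancels in the subtraction, so `differential` reproduces `bits`, while the single-ended receiver misreads the last two samples.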
HyperTransport Technology (Switch Topology)
Supports multiple connection topologies: daisy chain, switch, star.

Switch Topology
The host communicates directly with the switch chip, which in turn manages multiple independent slaves, including tunnels, bridges, and end-device chips (parallelizing the daisy chain).
Each port on the switch benefits from the full bandwidth of the HyperTransport technology I/O link, because the switch directs the flow of electrical signals between the slave devices connected to it.
AMD Virtualization
To allow multiple operating systems to run on the same physical platform, a software layer (the Hypervisor) decouples each operating system from the underlying hardware. It also acts as a translation layer for guest virtual addresses, and can operate in 2 ways:
• SW: the Hypervisor either modifies the guest source code so that the guest cooperates with it, or rewrites the guest's privileged operations at run time so that it can control them.
• HW-assisted virtualization: the Hypervisor uses a set of processor extensions (e.g. AMD-V) to intercept and emulate guest privileged operations.
With AMD-V, the Hypervisor specifies how the processor should handle privileged operations in the guest, so that they can be handled in the guest itself without transferring control to the Hypervisor. This makes switching between VMs more efficient, which helps performance, and effectively isolates VMs for secure operation.
Rapid Virtualization Indexing (RVI)
• Paging enabled: the operating system defines a set of Page Tables, used by the Page Walker (implemented in processor HW) to translate “linear addresses” to physical addresses.
• guest Page Table (gPT): virtualization adds another level of translation. The Hypervisor can manage it via SW (with a shadow Page Table) or via HW:
• nested Page Tables (nPT): set up by the Hypervisor and used by the Page Walker as a second level of translation, reducing the overhead found in equivalent shadow paging implementations; recent translations are stored in an internal translation look-aside buffer (TLB).
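The two levels of translation can be sketched as a toy model. It assumes 4-Kbyte pages and flat, single-level tables held in dictionaries; a real page walker uses multi-level tables and also translates the guest table pointers through the nested tables. The class and attribute names are hypothetical.

```python
PAGE_SIZE = 4096


class NestedWalker:
    """Toy page walker: guest virtual -> guest physical -> host physical."""

    def __init__(self, guest_pt, nested_pt):
        self.guest_pt = guest_pt     # gPT: guest virtual page -> guest physical page
        self.nested_pt = nested_pt   # nPT: guest physical page -> host physical page
        self.tlb = {}                # caches guest virtual page -> host physical page
        self.walks = 0               # number of full two-level walks performed

    def translate(self, guest_virtual):
        page, offset = divmod(guest_virtual, PAGE_SIZE)
        if page not in self.tlb:
            self.walks += 1
            guest_physical_page = self.guest_pt[page]      # first level, set by guest OS
            self.tlb[page] = self.nested_pt[guest_physical_page]  # second level, set by hypervisor
        return self.tlb[page] * PAGE_SIZE + offset


walker = NestedWalker(guest_pt={0: 5, 1: 7}, nested_pt={5: 40, 7: 12})
first = walker.translate(0x10)    # walks both tables
second = walker.translate(0x20)   # same page: served from the TLB
third = walker.translate(4097)    # new page: walks again
```

The guest OS only ever edits `guest_pt` and the hypervisor only edits `nested_pt`, which is the property that removes the need to keep a shadow table in sync by software.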
Power Performance
• Enhanced AMD PowerNow! with Independent Dynamic Core: allows processors and cores to operate at various voltages and frequencies.
• AMD CoolCore Technology: reduces processor energy consumption by turning off unused parts of the processor.
• AMD Smart Fetch Technology: allows a core to enter a "halt" state and draw less power, reducing CPU power consumption.
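The benefit of lowering voltage and frequency per core can be estimated with the usual CMOS approximation that dynamic power scales with V² · f. The sketch below is a rough model only: the operating points are hypothetical values chosen for illustration (not AMD specifications), and static leakage is ignored.

```python
def relative_dynamic_power(voltage, frequency, ref_voltage, ref_frequency):
    """Dynamic power relative to a reference point, using P ~ C * V^2 * f."""
    return (voltage / ref_voltage) ** 2 * (frequency / ref_frequency)


# Hypothetical operating points as (volts, hertz); not taken from the slides.
FULL = (1.20, 2.8e9)
REDUCED = (1.00, 1.4e9)


def chip_power(core_states):
    """Total dynamic power of all cores, in units of one core running at FULL."""
    return sum(relative_dynamic_power(v, f, *FULL) for v, f in core_states)


all_full = chip_power([FULL] * 4)               # four busy cores
one_busy = chip_power([FULL] + [REDUCED] * 3)   # one busy core, three scaled down
```

Under these assumed numbers, scaling three idle cores down roughly halves the dynamic power of the four cores compared with running all of them at the full operating point.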
Opteron 4-C 3rd generation optimizations
1. Load-Execute Instructions (for floating-point or integer operands)
2. Write-Combining (merging multiple memory-write cycles in a 64-byte buffer)
3. Branches That Depend on Random Data (avoid conditional branches on random data)
4. Loop Unrolling
5. Pointer Arithmetic in Loops (using the loop counter as an index into memory arrays)
6. Explicit Load Instructions
7. Reuse of Dead Registers
8. ccNUMA (cache-coherent non-uniform memory access)
9. Prefetch and Streaming Instructions
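As an illustration of item 2, the effect of write-combining on the number of memory-write cycles can be modeled in a few lines. This is a deliberately simplified sketch with a single 64-byte buffer that is flushed whenever a store targets a different 64-byte block; real processors have several buffers and further flush conditions. The function name and the store traces are hypothetical.

```python
BUFFER_BYTES = 64


def count_memory_writes(stores, combine=True):
    """Count memory-write cycles for a list of (address, size) stores in program order."""
    if not combine:
        return len(stores)           # every store goes to memory on its own
    flushes = 0
    current_block = None
    for addr, size in stores:
        # Simplifying assumption: no store straddles a 64-byte boundary.
        assert addr % BUFFER_BYTES + size <= BUFFER_BYTES
        block = addr // BUFFER_BYTES
        if block != current_block:
            if current_block is not None:
                flushes += 1         # buffer for the previous block is written out
            current_block = block
    if current_block is not None:
        flushes += 1                 # final flush of the last buffer
    return flushes


sequential = [(addr, 8) for addr in range(0, 128, 8)]    # 16 stores, 2 blocks
scattered = [(0, 8), (64, 8), (8, 8), (72, 8)]           # alternates between blocks

seq_combined = count_memory_writes(sequential)
seq_plain = count_memory_writes(sequential, combine=False)
scattered_combined = count_memory_writes(scattered)
```

In this model, sixteen sequential 8-byte stores collapse into two write cycles, while the alternating pattern gains nothing, which is why store order matters for this optimization.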
Some Technical Data
Core Speed: 2800 MHz
System Bus Speed: 2200 MHz
Integrated Memory Speed: 2200 MHz
Wattage: 75 W
L1 Cache Size: 64 Kbyte (x 4 caches)