
AMD Opteron Quad-Core

a brief overview

Daniele Magliozzi

Politecnico di Milano
Opteron Memory Architecture

• native quad-core design (four cores on a single die for more efficient data sharing)
• enhanced cache structure
• integrated memory controller
• sustained multi-threaded application throughput, fitting the needs of modern servers and workstations

Daniele Magliozzi - Politecnico di Milano -1- AMD Opteron Quad-Core : A Brief Overview
3 levels of dedicated & shared cache
4 different caches accelerate instruction execution and data processing:
• L1 Instruction Cache: 64-Kbyte, 2-way set-associative, 64-byte line length, LRU replacement; used for instruction loads, instruction prefetching, instruction predecoding, and branch prediction.
• L1 Data Cache: 64-Kbyte, 2-way set-associative, write-allocate and write-back with LRU replacement, divided into eight banks (each 16 bytes wide), with a prefetcher and a 3-cycle load-to-use latency.
• L2 Cache: contains only victim or copy-back blocks from L1.
• L3 Cache: dynamically shared, non-inclusive victim cache with blocks allocated on L2 victim/copy-backs. A hit in L3 can either leave the data there (for data accessed by multiple cores) or remove the data from L3 and place it solely in L1 (for data accessed by a single core).
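
The L1 figures above (64 Kbyte, 2-way, 64-byte lines) fix the number of sets and the way an address splits into tag, index, and offset. The sketch below works this out for a generic power-of-two set-associative cache; the function names are illustrative, and it is a simplified model that does not claim which address bits the real hardware uses.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count and address-bit split (sizes assumed powers of two)."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = (line_bytes - 1).bit_length()  # selects a byte within a line
    index_bits = (sets - 1).bit_length()         # selects a set
    return {"sets": sets, "offset_bits": offset_bits, "index_bits": index_bits}


def split_address(addr: int, geom: dict) -> tuple:
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & ((1 << geom["offset_bits"]) - 1)
    index = (addr >> geom["offset_bits"]) & ((1 << geom["index_bits"]) - 1)
    tag = addr >> (geom["offset_bits"] + geom["index_bits"])
    return tag, index, offset


# L1 parameters from the slide: 64 Kbyte, 2-way, 64-byte lines
L1_GEOMETRY = cache_geometry(64 * 1024, ways=2, line_bytes=64)
print(L1_GEOMETRY)                          # 512 sets, 6 offset bits, 9 index bits
print(split_address(0x12345, L1_GEOMETRY))  # (2, 141, 5)
```

With only 2 ways per set, three frequently used lines that map to the same index will evict each other, which is one reason the L2 and L3 victim caches matter.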

DDR2 SDRAM with integrated memory controller

• SDRAM: stores data in memory cells driven by a clock signal, so that their operation is synchronized with an external data bus.

• DDR2 SDRAM (double data rate synchronous dynamic random access memory): cells transfer data on both the rising and the falling edge of the clock (a technique called "double pumping").

• Improvement: the external data bus runs at twice the clock rate of the memory cells, giving twice the bandwidth of its predecessor (DDR) at the same cell clock.

• Memory Controller: integrated on-die, it manages the flow of data going to and from main memory, optimizing memory performance and bandwidth per CPU and reducing the latency inherent in front-side bus architectures.

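
The bandwidth claim can be checked with simple arithmetic. The sketch below assumes a 64-bit (8-byte) data bus and a 200 MHz cell clock as example values; neither figure comes from the slides, and the function name is illustrative.

```python
def peak_bandwidth_mb_s(cell_clock_mhz: int, bus_clock_multiplier: int,
                        bus_width_bytes: int = 8) -> int:
    """Peak transfer rate in MB/s for a double-pumped memory bus."""
    bus_clock_mhz = cell_clock_mhz * bus_clock_multiplier
    transfers_per_clock = 2  # one on the rising edge, one on the falling edge
    return bus_clock_mhz * transfers_per_clock * bus_width_bytes


# Assumed example: memory cells clocked at 200 MHz in both generations
ddr_mb_s = peak_bandwidth_mb_s(200, bus_clock_multiplier=1)   # DDR: bus at cell clock
ddr2_mb_s = peak_bandwidth_mb_s(200, bus_clock_multiplier=2)  # DDR2: bus at 2x cell clock
print(ddr_mb_s, ddr2_mb_s)  # 3200 6400
```

These are peak figures per channel; sustained bandwidth is lower because of refresh, row activation, and bus turnaround.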
Direct Connect Architecture

The front-side bus is eliminated; the cores are directly connected to:
• the memory controller
• the I/O subsystem
• other processors

by high-bandwidth HyperTransport links.

This improves overall system performance and efficiency by eliminating the traditional bottlenecks inherent in legacy front-side bus architectures.

HyperTransport Technology
High-speed, low-latency, point-to-point, unidirectional links between two devices, capable of extremely fast signaling (clock speeds up to 800 MHz) and compatible with the PCI interface.
• “Packetized” bus: addresses, data, and commands are sent along the same wires, allowing narrower links that are easier to route.
• HT System: a processor with a HyperTransport port (called the HyperTransport host), the HyperTransport bus, and any I/O channels connected to it.
• Differential signaling (employed by the links): two wires are used for each signal, and the received value is the difference between the two signals sent. This avoids the problems associated with the single-ended signaling of high-speed parallel buses (bouncing signals, interference, cross-talk).

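
A toy model shows why taking the difference of two wires rejects noise that hits both wires equally. All voltage levels and function names below are hypothetical values chosen for illustration, not HyperTransport electrical parameters.

```python
def drive_pair(bit: int, swing: float = 0.3, common: float = 0.6) -> tuple:
    """Encode one bit as two complementary voltages (hypothetical levels)."""
    half = swing / 2 if bit else -swing / 2
    return common + half, common - half


def receive_differential(bits, noise):
    """Receiver decides on the sign of the difference between the two wires."""
    received = []
    for bit, n in zip(bits, noise):
        v_pos, v_neg = drive_pair(bit)
        # The same noise couples onto both wires and cancels in the subtraction.
        received.append(1 if (v_pos + n) - (v_neg + n) > 0 else 0)
    return received


def receive_single_ended(bits, noise, high: float = 1.2, threshold: float = 0.6):
    """Receiver compares one wire against a fixed threshold."""
    return [1 if (high * bit + n) > threshold else 0 for bit, n in zip(bits, noise)]


sent = [1, 0, 1, 0]
noise = [0.8, 0.8, -0.8, -0.8]  # large common-mode disturbance per bit
print(receive_differential(sent, noise))  # [1, 0, 1, 0]
print(receive_single_ended(sent, noise))  # [1, 1, 0, 0]
```

The model only covers noise that is identical on both wires; noise that differs between the wires is not cancelled.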
HyperTransport Technology (Switch Topology)

Supports multiple connection topologies: daisy chain, switch, star.

Switch Topology
The host communicates directly with the switch chip, which in turn manages multiple independent slaves, including tunnels, bridges, and end-device chips (in effect, daisy chains in parallel).
Each port on the switch benefits from the full bandwidth of the HyperTransport technology I/O link, because the switch directs the flow of electrical signals between the slave devices connected to it.

AMD Virtualization
To allow multiple operating systems to run on the same physical platform, a software platform layer (the Hypervisor) decouples the operating system from the underlying hardware. It also acts as a translation layer for guest virtual addresses, and can operate in 2 ways:
• SW: the Hypervisor modifies the guest source code so that the guest cooperates with it, or controls the guest's privileged operations at run-time.
• HW-assisted virtualization: the Hypervisor uses a set of processor extensions (e.g. AMD-V) to intercept and emulate guest privileged operations.
With AMD-V technology, the Hypervisor specifies how the processor should handle privileged operations in the guest itself, without transferring control to the Hypervisor. This makes switching between VMs more efficient, which helps performance, and effectively isolates VMs for secure operation.

Rapid Virtualization Indexing (RVI)
• Paging enabled: the operating system defines a set of Page Tables, used by the Page Walker (implemented in processor HW) to translate “linear addresses” to physical addresses.
• guest Page Table (gPT): under virtualization the guest's Page Tables only produce guest physical addresses, so another level of translation (to system physical addresses) is needed. The Hypervisor can provide it via SW (with a shadow Page Table) or via HW:
  - nested Page Tables (nPT): set up by the Hypervisor and handed to the Page Walker, which then performs the second level of translation itself. This reduces the overheads found in equivalent shadow paging implementations, and recent translations are stored in an internal translation look-aside buffer (TLB).
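
The two levels of translation can be sketched as two table lookups with a TLB in front. This is a deliberately simplified model: flat dictionaries stand in for the multi-level gPT and nPT, 4-Kbyte pages are assumed, and the class and method names are hypothetical. A real nested walk also translates every gPT access through the nPT, which this sketch omits.

```python
PAGE_SIZE = 4096  # assumption: 4-Kbyte pages


def translate(page_table: dict, addr: int) -> int:
    """Map an address through one flat page table (KeyError models a page fault)."""
    page, offset = divmod(addr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


class NestedWalker:
    """Toy page walker: gPT (set by the guest OS), then nPT (set by the Hypervisor)."""

    def __init__(self, gpt: dict, npt: dict):
        self.gpt = gpt
        self.npt = npt
        self.tlb = {}   # guest linear page -> system physical page
        self.walks = 0  # number of full two-level walks performed

    def to_system_physical(self, guest_linear: int) -> int:
        page, offset = divmod(guest_linear, PAGE_SIZE)
        if page not in self.tlb:
            self.walks += 1
            guest_physical = translate(self.gpt, page * PAGE_SIZE)   # first level
            system_physical = translate(self.npt, guest_physical)    # second level
            self.tlb[page] = system_physical // PAGE_SIZE
        return self.tlb[page] * PAGE_SIZE + offset


walker = NestedWalker(gpt={0: 5, 1: 7}, npt={5: 40, 7: 12})
print(walker.to_system_physical(0x10))  # 163856 (page 40, offset 0x10)
print(walker.to_system_physical(0x20))  # 163872, served from the TLB
print(walker.walks)                     # 1
```

The Hypervisor only fills in the nPT; it is not involved when the guest edits its own gPT, which is where shadow paging pays its overhead.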

Power Performance

• Enhanced AMD PowerNow! with Independent Dynamic Core: allows processors and cores to operate at various voltages and frequencies.

• AMD CoolCore Technology: reduces processor energy consumption by turning off unused parts of the processor.

• AMD Smart Fetch Technology: allows a core to enter a "halt" state and draw less power, reducing CPU power consumption.
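
Why lowering both voltage and frequency pays off can be estimated with the usual first-order model of dynamic CMOS power, P ≈ C·V²·f. The sketch below applies that model; the voltages and the reduced frequency are hypothetical operating points, not AMD specifications, and leakage power is ignored.

```python
def relative_dynamic_power(voltage: float, freq_mhz: float,
                           ref_voltage: float, ref_freq_mhz: float) -> float:
    """Dynamic power relative to a reference point, using P ~ C * V^2 * f."""
    return (voltage / ref_voltage) ** 2 * (freq_mhz / ref_freq_mhz)


# Hypothetical operating points: 1.2 V at 2800 MHz versus 1.0 V at 1400 MHz
ratio = relative_dynamic_power(1.0, 1400, ref_voltage=1.2, ref_freq_mhz=2800)
print(f"{ratio:.3f}")  # 0.347
```

Halving the frequency alone would halve dynamic power; the voltage reduction accounts for the rest because it enters the model squared.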

Opteron 4-C 3rd generation optimizations

1. Load-Execute Instructions (for Floating-Point or Integer Operands)
2. Write-Combining (multiple memory-write cycles merged in a 64-byte buffer)
3. Branches That Depend on Random Data (avoid conditional branches on random data)
4. Loop Unrolling
5. Pointer Arithmetic in Loops (using the loop counter as an index into memory arrays)
6. Explicit Load Instructions
7. Reuse of Dead Registers
8. ccNUMA (cache coherent non-uniform memory access)
9. Prefetch and Streaming Instructions
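
Items 3, 4, and 5 describe source-level transformations whose shape can be shown in a few lines. Python is used here only to illustrate the form; the performance benefit applies to compiled code, and the function names are illustrative.

```python
def branchless_min(a: int, b: int) -> int:
    """Select the smaller value with a mask instead of an if/else (item 3)."""
    mask = -(a < b)               # -1 (all ones) if a < b, else 0
    return b ^ ((a ^ b) & mask)   # yields a when mask is all ones, b when it is 0


def sum_unrolled_by_4(values) -> int:
    """Sum a sequence with the loop body unrolled four times (item 4)."""
    total0 = total1 = total2 = total3 = 0
    n = len(values)
    main = n - n % 4
    # The loop counter doubles as the array index (item 5).
    for i in range(0, main, 4):
        total0 += values[i]
        total1 += values[i + 1]
        total2 += values[i + 2]
        total3 += values[i + 3]
    total = total0 + total1 + total2 + total3
    for i in range(main, n):      # leftover elements when n is not a multiple of 4
        total += values[i]
    return total


print(branchless_min(3, 7), branchless_min(7, 3))  # 3 3
print(sum_unrolled_by_4(list(range(10))))          # 45
```

Using four independent accumulators lets a pipelined core overlap the additions; with floating-point data this changes the order of the additions and can change rounding.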



Some Technical Data

Core Speed                 2800 MHz
System Bus Speed           2200 MHz
Integrated Memory Speed    2200 MHz
Wattage                    75 W
L1 Cache Size              64 Kbyte (x 4 caches)
L2 Cache Size              512 Kbyte (x 4 caches)
L3 Cache Size              6144 Kbyte (x 1 cache)

