Performance Projections
David J. Frank
12/4/09
International Winter School for Graduate Students
IIT, Bombay, India
1. Introduction
[Figures: log(Performance) vs. Time (yr); log(System Performance) vs. Miniaturization]
Outline
1.
2.
3.
4.
1. Limitations to Scaling
1. Electrostatic constraints
2. Quantum mechanical leakage currents
3. Discreteness of matter and energy
4. Thermodynamic limitations
5. Practical and environmental constraints on power
Basic idea of scaling: adjust dimensions, voltages, and doping to achieve a smaller FET with the same electrostatic behavior.
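The scaling idea above can be illustrated with the classic constant-field rule, in which lengths and voltages shrink by a factor kappa while doping grows by kappa. This is a minimal sketch under that assumption; the `Fet` class, the function name, and the numeric values are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Fet:
    length_nm: float    # gate length
    tox_nm: float       # gate oxide thickness
    vdd_v: float        # supply voltage
    doping_cm3: float   # channel doping concentration


def constant_field_scale(fet: Fet, kappa: float) -> Fet:
    """Shrink a FET by kappa (> 1), assuming constant-field scaling rules."""
    return Fet(
        length_nm=fet.length_nm / kappa,
        tox_nm=fet.tox_nm / kappa,
        vdd_v=fet.vdd_v / kappa,
        doping_cm3=fet.doping_cm3 * kappa,
    )


# Illustrative (hypothetical) starting device.
old = Fet(length_nm=90.0, tox_nm=2.0, vdd_v=1.2, doping_cm3=1e18)
new = constant_field_scale(old, kappa=1.4)

# The vertical field Vdd / tox is the same before and after scaling.
field_old = old.vdd_v / old.tox_nm
field_new = new.vdd_v / new.tox_nm
print(new, field_old, field_new)
```

Because voltage and oxide thickness shrink together, the oxide field is preserved, which is what "same electrostatic behavior" means in this simple picture.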
[Figure: long-channel vs. short-channel behavior; labels: gate, source (S)]
Everything becomes leaky.
[Figure: FET 'ON' and FET 'OFF' states between Vdd and Gnd; electrons (e-) passing from source through channel to drain]
Atomistic effects
The number of dopant atoms in the depletion layer of a MOSFET has been scaling roughly as Leff^1.5.
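A rough sketch of why this matters: if the dopant count follows the Leff^1.5 trend and the count is assumed to fluctuate with Poisson statistics (an assumption added here, not stated on the slide), the relative fluctuation grows as the device shrinks. The reference count `n_ref` and reference length are hypothetical.

```python
import math


def dopant_count(leff_nm: float, n_ref: float = 1000.0,
                 leff_ref_nm: float = 100.0) -> float:
    """Dopant count scaled as Leff**1.5 from a hypothetical reference point."""
    return n_ref * (leff_nm / leff_ref_nm) ** 1.5


def relative_fluctuation(n: float) -> float:
    """Relative standard deviation sqrt(N)/N, assuming Poisson statistics."""
    return 1.0 / math.sqrt(n)


for leff in (100.0, 50.0, 25.0):
    n = dopant_count(leff)
    print(f"Leff={leff:5.1f} nm  N={n:7.1f}  sigma/N={relative_fluctuation(n):.3f}")
```

Under these assumptions the relative fluctuation scales as Leff^-0.75, so a 4x shorter device sees roughly 2.8x larger relative dopant-count variation.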
Thermodynamic limitations
The Boltzmann distribution determines the subthreshold slope and leakage current, VT, and the diode leakage currents as well.
$I_{subVT} = I_0 \, e^{e(V_G - V_T)/kT}$
[Figure: electrons (e-) passing from source over the channel barrier to drain]
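The subthreshold expression can be evaluated directly. The sketch below assumes the ideal form of the formula (no ideality factor) and uses hypothetical function names; it shows the swing of about 60 mV per decade at room temperature that follows from kT/e.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k: float) -> float:
    """kT/e in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(vg: float, vt: float, i0: float,
                         temp_k: float = 300.0) -> float:
    """Ideal subthreshold current I0 * exp(e (VG - VT) / kT)."""
    return i0 * math.exp((vg - vt) / thermal_voltage(temp_k))


def swing_mv_per_decade(temp_k: float = 300.0) -> float:
    """Gate voltage change (mV) needed for a 10x change in current."""
    return 1000.0 * math.log(10.0) * thermal_voltage(temp_k)


swing_v = swing_mv_per_decade() / 1000.0
# Lowering VT by one swing raises the off-current (VG = 0) tenfold.
leak_ratio = (subthreshold_current(0.0, 0.3 - swing_v, 1.0)
              / subthreshold_current(0.0, 0.3, 1.0))
print(swing_mv_per_decade(), leak_ratio)
```

This is why VT cannot be scaled down freely: every reduction of about one swing in VT costs a decade of leakage.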
[Figure: power vs. miniaturization (large to small), showing dynamic power and leakage power]
Declining available dynamic power overwhelms speed improvements of scaling.
[Figure: log(Performance) vs. miniaturization (large to small)]
[Flowchart: optimization loop. Blocks: Variables (initial guess); Device Structure; Area Model; Thermal Model; Wiring statistics; Wire Capacitance; IV Model; Leakage Model; Delay and Leakage Power (each with tolerance adjustments); Total Power; Constrained optimizer, which returns new values (improved guess).]
[Figure: chip components and their treatment. Clock: fudge. Repeaters and logic: treat in detail. Memory: fudge.]
[Figure: chip area breakdowns for Alpha 21264 ('96; 15M FETs, L1 cache only), Dothan (140M FETs), and Power4 (174M FETs). Categories: caches (L1), macros, register files, custom & RLMs, latches and LCBs, logic, and caps / clock distribution / unused. Data from M. Scheuermann and M. Wisniewski.]
Optimization Approaches
1. Engineering approach:
Maximize system performance, at fixed power.
Use total logic transition rate (LTR),
LTR = Ngates x activity factor/logic depth x 1/Delay
Relatively little dependence on architectural details.
2. Business approach:
Maximize Return on Investment (ROI).
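The LTR metric from the engineering approach is simple to compute. The following is a minimal sketch of the formula as written on the slide; the function name and the example numbers are hypothetical.

```python
def logic_transition_rate(n_gates: float, activity_factor: float,
                          logic_depth: float, delay_s: float) -> float:
    """LTR = Ngates x (activity factor / logic depth) x (1 / Delay)."""
    return n_gates * (activity_factor / logic_depth) / delay_s


# Hypothetical chip: 1e8 gates, 10% activity, logic depth 10, 10 ps gate delay.
ltr = logic_transition_rate(1e8, 0.1, 10.0, 10e-12)
print(f"LTR = {ltr:.3e} transitions/sec")
```

Halving the gate delay doubles the LTR at a fixed gate count, which is why the optimizer trades delay against the number of gates that fit within the power budget.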
FET Model
Using a general temperature-dependent short-channel FET model in
which VT, tD, and tox are coupled, halo doping effects are included, and
VT is set by the doping.
Modified alpha power model:
[Equation: modified alpha-power drain current $I_D(V_{GS})$ in terms of $W_{eff}$, $t_{ox}$, $kT/e$, $E_C$, $L_{CH}$, $V_{GS} - V_T$, and a Fermi-Dirac integral]
[Figure: 10 W FET, Lg = 28 nm; 1 mW FET, Lg = 45 nm]
Propagation delay
[Equation: propagation delay as a function of $V_T / V_{DD}$]
Power Calculation
$P_{TOT} = P_{DYN} + P_{subVT} + P_{OX} + P_{B2B}$
[Equations: $P_{DYN}$ in terms of $\tau_{lD}$, $N_{CKT}$, and $\tfrac{1}{2} C (V_H - V_L) V_{DD}$; LTR in terms of $\tau_{lD}$ and $N_{CKT}$]
$\tau_{lD}$ = mean delay for a single loaded logic gate.
Note that cross-through power is not included.
The powers are computed separately for logic and for repeaters.
[Equations: number of repeaters $N_{Rptr}$ from the wire-length distribution $i_{net}(l)$, the repeater spacing $l_R$, the maximum wire length $l_{Max}$, the fan-out FO, and $N_{CKT}$]
[Figure: number of wiring levels required (from optimizations), plotted from 1E+5 to 1E+8; inset: log(number of wires) vs. log(length)]
Repeater Model
[Figure: repeater spacing and repeater width; Pecon = 10 W/cm2; curves labeled 9S to 13S]
Variation sources:
Consequences modeled:
Increased static power
combine 1 sigma of doping, length, and noise
Worst-case vectors: MPWC vector and Murphy vector.
[Figure: MPWC and Murphy worst-case vectors; contour labels 3, 2, 1]
[Figure: design corners VDD+5% and VDD-10% around the nominal design point (VDD = 1.058 V); curves for P = 0.01, P = 1, and P = 50; annotation: 230 W]
Increased variability requires:
[Figure: values from 0.7 to 1.2 vs. relative margin (0% to 200%)]
3. Optimization Results
General results
Evaluating specific possible device
directions
Increasing mobility
High-k gate dielectric and metal gates
3D stacking
Better heat sinks
Sub-ambient cooling
Multi-processor tradeoffs
Note that the LG, tox, VDD, VT, width, etc. are NOT preselected.
They are solved for by the optimizations.
Optimization results: Gate Length vs Power
[Figure: optimized gate length vs. power (0.01 to 100) for the 90 nm, 65 nm, 45 nm, and 32 nm nodes; oxynitride gate dielectric]
Optimization results: Voltages vs Power
[Figure: optimized Vdd and VT vs. power (0.01 to 100) for the 90 nm, 65 nm, 45 nm, and 32 nm nodes]
Power Allocation
[Figure: power allocation (0% to 100%) at power levels 1, 10, 30, 100, and 300]
Mobility dependence
Enhanced mobility has greatest benefit at high power.
Even for large mobility enhancements, performance boost is
modest: 10-15%.
[Figures: relative performance vs. mobility enhancement factor for 1 W, 10 W, and 100 W chips. One panel: 45nm technology, dual core processor, water cooled. Other panel: 32nm technology, 8 core processor, air cooled.]
[Figure: relative performance vs. workfunction offset from band edge (eV), for 5 W and 50 W chips with high-k]
3D stacking
Multiple layers offer higher performance
due to shorter wires.
[Figures: relative performance vs. chip power (W) for 1 layer (red) and 2 layers (green); total Si area and footprint vs. chip power]
[Figure: optimized performance for four cooling options: -40C liquid, 18C water, high-performance air, and low-cost air]
Forced liquid cooling through microchannel fins may permit very high
power densities.
Optimized (maximum) performance increases roughly as the log of the power.
Optimized over 7 variables: Lg, tox, Nd, <w>, Drptr, <wrptr>, Vdd.
Low temperature case does not include refrigerator power.
Multiprocessor motivation
The energy / performance tradeoff is very steep at the high end.
Lower power, more parallel processors potentially offer more
computation for the same total power level.
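The multiprocessor argument can be illustrated with a toy model. The sketch below assumes that single-core performance grows as the log of its power (the trend noted for the optimized results) and that throughput adds perfectly across cores; both are simplifications, and the reference power `p0_w` is a hypothetical parameter.

```python
import math


def core_performance(power_w: float, p0_w: float = 0.01) -> float:
    """Toy model: performance ~ log10(power / p0). Assumption, not a fit."""
    if power_w <= p0_w:
        raise ValueError("power must exceed the reference power p0_w")
    return math.log10(power_w / p0_w)


def chip_throughput(total_power_w: float, n_cores: int,
                    p0_w: float = 0.01) -> float:
    """Total throughput when the power budget is split evenly across cores,
    assuming perfectly parallel work (an idealization)."""
    return n_cores * core_performance(total_power_w / n_cores, p0_w)


for n in (1, 4, 16):
    print(n, round(chip_throughput(100.0, n), 2))
```

In this idealized model, splitting a fixed 100 W budget across more, slower cores raises total throughput, because each core gives up only a little performance when its power is cut. Real workloads that do not parallelize perfectly would see a smaller gain.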
[Figures: multi-core tradeoff plots, including a 4 core processor case and curves for 2, 4, 8, and 16 cores; annotation: 3x]
Device options
General results
Technology projections
Beyond 11nm?
Device Options
PDSOI: IBM's best understood technology.
FinFET
ETSOI
Shallow Bulk
[Diagrams: FinFET and ETSOI device structures; labels: gate, source, drain]
Comparable source/drain resistance and parasitic capacitance models were implemented for ETSOI, FinFET, and shallow bulk MOSFETs.
As always, the easiest way to increase performance is to increase the power.
[Figures: optimized results vs. power (0.01 to 1000) for the 22nm, 16nm, and 11nm nodes, including frequency (Hz); annotations: 10x, 4x]
[Figure: off-current (A/cm) vs. on-current (A/cm) for the 22 nm, 16 nm, and 11 nm nodes under high-bias and low-bias conditions, with 1000:1 and 10000:1 lines]
Performance increases at 32nm due to hi-k introduction, but then falls as strain diminishes and gate dielectric does not scale further.
[Figures: frequency (GHz) and other optimized quantities vs. technology node (45nm, 32nm, 22nm, 16nm, 11nm) at 10 W/cm2, 25 W/cm2, and 50 W/cm2]
Conditions: PDSOI, 4 core processor chip, constraining total chip power density
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater
size and spacing.
[Figure: frequency (GHz) vs. technology node (45nm to 11nm) for Bulk, PDSOI, FinFET, and ETSOI at 25 W/cm2]
[Figures: area (cm2) and a second quantity vs. technology node (45nm to 11nm) for PDSOI (PD) and FinFET (Fin) at 1e15 and 1.5e15; annotations: ~3.9 GHz, ~2.6 GHz]
Conditions: PDSOI and FinFET, 4 core processor chip, constraining total chip
performance
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater
size and spacing, fin height, sidewall thickness
[Figure: relative performance vs. technology node (45nm to 11nm) for PDSOI (PD) and FinFET (Fin) at 25 W/cm2 and 50 W/cm2]
Conditions: PDSOI and FinFET, variable # core processor chip, constraining both
chip power and chip area (4 cm2).
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater
size and spacing, fin height, sidewall thickness, and number of cores.
[Figure: number of cores vs. technology node (45nm to 11nm) for PDSOI (PD) and FinFET (Fin) at 25 W/cm2 and 50 W/cm2]
[Figure: relative performance at 25 W/cm2 when varying Vdd, activity factor, tolerance, and mobility; annotations: 0.7X variability, 1.4X mobility, 18%, 6%]
Optimizing everything except VDD.
Conditions: PDSOI, 4 core processor chip, constraining total chip power density
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater
size and spacing.
Shorter gate lengths necessitate:
Thinner tSi
Higher DIBL
Higher VDD
Ultimately, lower performance
[Figures: DIBL (V), relative performance, and other optimized quantities for ETSOI and FinFET at the 11 nm node]
[Figures: DIBL (V/V) and optimized results vs. technology node (22nm, 16nm, 11nm) for Bulk, FinFET, and ETSOI, at 25 W/cm2 and at 1 W]
III-V FinFETs
Higher mobility improves drive current
Tunnel FETs
Improved subthreshold slope enables low VDD and low energy
operation.
To model this device properly, one must be able to calculate the tunneling barrier shapes and band-edge alignments.
We are in the process of developing a compact model for TFETs for the
optimizer, but results are not yet available. As an interim measure, we
can alter the Boltzmann constant in the conventional FET model, to see
the impact of steeper subthreshold slope.
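The interim measure described above can be sketched by scaling the Boltzmann constant inside the ideal subthreshold swing. This is an illustrative calculation only, not the optimizer's model; the `k_scale` parameter and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def swing_mv_per_decade(temp_k: float = 300.0, k_scale: float = 1.0) -> float:
    """Ideal subthreshold swing with an artificially scaled Boltzmann constant.
    k_scale < 1 mimics a steeper-than-thermal device such as a TFET."""
    return 1000.0 * math.log(10.0) * (k_scale * K_B) * temp_k / Q_E


def gate_swing_for_ratio_mv(decades: float, temp_k: float = 300.0,
                            k_scale: float = 1.0) -> float:
    """Gate voltage (mV) needed to span a given number of current decades
    entirely in the subthreshold regime."""
    return decades * swing_mv_per_decade(temp_k, k_scale)


# Voltage needed for a 10000:1 current ratio, conventional vs. halved k.
conventional = gate_swing_for_ratio_mv(4.0)
steeper = gate_swing_for_ratio_mv(4.0, k_scale=0.5)
print(round(conventional, 1), round(steeper, 1))
```

Halving the effective constant halves the voltage needed for the same current ratio, which is the mechanism by which a steeper slope enables lower VDD.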
[Figure: PDSOI, FinFET, CNFET, and TFET compared vs. frequency (Hz), 1.E+06 to 1.E+11]
Summary
Acknowledgements
Wilfried Haensch
Leland Chang
Paul Solomon
Steve Koester
Lan Wei
Philip Wong
Ghavam Shahidi
Mike Scheuermann
Phillip Restle
Omer Dokumaci
Mary Wisniewski
Steve Kosonocky
Yuan Taur
Bob Dennard
Extra slides
Comparison of simplified analytic model with detailed numerical model.
This model is red. Kai's FE model data is blue.
[Diagram: thermal stack, from heat sources through the Si wafer, an interface, and a heat spreader (e.g., SiC or Cu), to the interface to the final coolant (e.g., air or water)]
III-V FinFETs
III-V FinFETs are modeled by increasing the mobility in the conventional model.
Increased mobility enables an improved energy/performance tradeoff by
reducing the voltage needed for high performance designs.
[Figures: energy/transition (fJ) vs. performance (transitions/sec) and vs. frequency (GHz) for FinFETs with 1x (= Si), 2x (~GaAs), and 4x (~InGaAs) mobility]
[Figure: TFET band diagram with P++ source, gate, and n++ drain (Ec, Ev, EG), with tunneling expressions in EG and Vgs; log Ids vs. Vgs showing the 60mV/dec slope and the high-Vth, low-power space] [Haensch]
TFET Heterostructures
[Diagrams: planar SiGe H-TFET and III-V H-TFET; labels: gate (poly), source, drain, p++ SiGe, n+ Si, p-Si, buried oxide; band diagrams for the ON-state and the OFF-state (Vg = 0)]
High on current
Ambipolar device suppressed
Wire Model
Assumed constant 2:1 height-to-width wiring with equal lines and spaces (0.062 kBEOL fF/um).
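As a quick illustration of the wire model's capacitance figure, the sketch below multiplies the stated 0.062 kBEOL fF/um by a wire length. It assumes kBEOL is the relative dielectric constant of the back-end insulator; the function name and the example values are hypothetical.

```python
CAP_COEFF_FF_PER_UM = 0.062  # fF/um per unit of kBEOL, from the wire model


def wire_capacitance_ff(length_um: float, k_beol: float) -> float:
    """Wire capacitance in fF for 2:1 aspect-ratio wires with equal
    lines and spaces, using C/L = 0.062 * kBEOL fF/um."""
    return CAP_COEFF_FF_PER_UM * k_beol * length_um


# Hypothetical example: 1 mm of wire with a dielectric constant of 3.0.
cap_ff = wire_capacitance_ff(1000.0, 3.0)
print(f"{cap_ff:.1f} fF")
```

Because the aspect ratio and spacing are held constant, the capacitance per unit length depends only on the dielectric, so shorter wires (for example from 3D stacking) translate directly into lower wire capacitance.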
Relative Performance change due to R(T)
[Figure: relative performance for 10 W, 1 W, and 0.1 W chips, with temperature-dependent wire resistance and with constant R; horizontal axis from 50 to 400]