Mark Horowitz
Computer Systems Laboratory
Stanford University
horowitz@stanford.edu
[Figure: FO4 inverter chain with relative sizes 1, 4, 16]
[Figure: Inverter Delays (log scale) versus Process Corner (TT, FF, SS, FS, SF) and versus Power Supply (2 to 3.5 V)]
• Can use FO4 delays to look at memory access time too.
• Delay is log(Size) for an optimized design.
• Wire delay is
[Figure: Delay (τ fo4) curves for Total Delay, Decoder Delay, and Output Path Delay (no wire res.), and Total Delay (with wire res.)]
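One way to see why delay can grow as log(Size): if every stage drives four times its own input capacitance, the number of FO4 stages needed to reach a load grows with the logarithm of that load. The sketch below assumes an idealized buffer chain with no wire resistance; `fo4_stages` is a hypothetical helper, not something taken from the slides.

```python
import math


def fo4_stages(load_ratio: float) -> float:
    """Idealized count of fan-out-of-4 stages needed to drive a load.

    Assumption: each stage has an electrical fan-out of exactly 4, so a
    load that is `load_ratio` times the input capacitance needs
    log4(load_ratio) stages. Wire resistance is ignored.
    """
    if load_ratio < 1:
        raise ValueError("load_ratio must be >= 1")
    return math.log(load_ratio, 4)


if __name__ == "__main__":
    # Each 4x increase in load adds roughly one FO4 of delay.
    for ratio in (4, 16, 64, 256, 1024):
        print(f"load ratio {ratio:5d}: about {fo4_stages(ratio):.1f} FO4 stages")
```

Under this model, quadrupling the load adds one stage, which is the logarithmic behavior the bullet describes.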
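[Figure: wire cross-section. A metal wire of width W and height H sits in ILDmiddle (permittivity ε2), with spacings SL and SR on its left and right. ILDTop (thickness TT) lies above and ILDBot (thickness TB) below, both with permittivity ε1.]
Parallel-plate capacitance terms:
– Top: ε1W/TT
– Bottom: ε1W/TB
– Left: ε2H/SL
– Right: ε2H/SR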
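The four parallel-plate terms in the cross-section can be summed to estimate wire capacitance per unit length. This is a rough sketch: it assumes ε1 and ε2 are given as relative permittivities, ignores fringing fields, and uses made-up example dimensions; the function and parameter names are hypothetical.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m


def wire_cap_per_length(w, h, t_top, t_bot, s_left, s_right, k1, k2):
    """Parallel-plate estimate of wire capacitance per unit length (F/m).

    w, h            : wire width and height
    t_top, t_bot    : dielectric thickness above and below the wire
    s_left, s_right : spacing on each side of the wire
    k1, k2          : relative permittivity of the top/bottom and middle ILD
    All lengths must use the same unit, since only their ratios matter.
    Fringing capacitance is ignored (assumption).
    """
    eps1 = k1 * EPS0
    eps2 = k2 * EPS0
    vertical = eps1 * w / t_top + eps1 * w / t_bot
    lateral = eps2 * h / s_left + eps2 * h / s_right
    return vertical + lateral


if __name__ == "__main__":
    # Hypothetical geometry: every dimension 0.5 um, oxide-like dielectric.
    c = wire_cap_per_length(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, k1=3.9, k2=3.9)
    print(f"{c * 1e9:.3f} fF/um")  # 1 F/m equals 1e9 fF/um
```

Because only ratios such as W/TT and H/SL appear, shrinking every dimension by the same factor leaves this estimate unchanged.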
• Resistance is simpler
– R/µm = ρ/wh
– Scales up as the technology shrinks
– Main reason that wire height has not scaled much
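[Figure: wire cross-sections and their resistance. Width W1, height H1: R. Width αW1, height H1: 1.4R. Width αW1, height αH1: 2R.]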
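The 1.4R and 2R values follow directly from R/µm = ρ/wh. A small sketch, assuming a scale factor α of 0.7 (an assumption: it is the value that matches the 1.4R and 2R labels, but the slides do not state it); the function name is hypothetical.

```python
def resistance_per_um(rho_ohm_um: float, width_um: float, height_um: float) -> float:
    """Wire resistance per micron of length: R/um = rho / (w * h)."""
    return rho_ohm_um / (width_um * height_um)


if __name__ == "__main__":
    ALPHA = 0.7   # assumed scale factor per technology generation
    RHO = 0.027   # ohm*um, roughly aluminum; it cancels out of the ratios
    W1, H1 = 1.0, 1.0

    base = resistance_per_um(RHO, W1, H1)
    width_only = resistance_per_um(RHO, ALPHA * W1, H1)
    both = resistance_per_um(RHO, ALPHA * W1, ALPHA * H1)

    # Scaling only the width costs 1/alpha; scaling height too costs 1/alpha^2.
    print(f"width scaled only:       {width_only / base:.2f} R")
    print(f"width and height scaled: {both / base:.2f} R")
```

Keeping the height fixed limits the resistance increase to 1/α per generation instead of 1/α², which is the motivation given in the bullets for not scaling wire height.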
[Figure: two panels plotted against Technology Ldrawn (µm), from 0.25 down to 0.035; the vertical axis of one panel is in pF]
MAH VLSI Scaling for Architects 19
Scaling Module Wires
[Figure: two panels plotted against Technology Ldrawn (µm), from 0.25 down to 0.035; the vertical axis of one panel is in pF]
Module Wires
Conservative scaling
[Figure: module wires plotted against Technology Ldrawn (µm), from 0.25 down to 0.035]
This first cut seems to imply that scaled wires aren’t a problem
• Delay of these wires scales (mostly) with gate speed
• Long wires get worse, but pretty slowly
• So the job a designer (or CAD tool) sees stays the same, right?
[Figure: plot against Technology Ldrawn (µm), from 0.25 down to 0.035; data labels include "9 modules, 22 exceptions", "19 modules, 24 exceptions", and "19 modules, 49 exceptions"]
Is this important?
[Figure: M3 and M5 curves versus Wire width in lambdas, 4 to 16]
Designer Responses
[Figure: wire broken into repeated segments, each with wire capacitance Cw, driving a load Cload]
[Figure: Delay (FO4) versus Distance (mm), 0 to 20 mm, for M3 and M5 wires, including curves with Repeaters]
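The benefit of repeaters on a delay-versus-distance plot can be illustrated with a simple distributed-RC model: an unrepeated wire's delay grows with the square of its length, while a wire cut into fixed-length repeated segments grows linearly. This is a simplified sketch, not the model behind the slides. It uses the standard 0.38·R·C estimate for the 50% delay of a distributed RC line, treats each repeater as a fixed delay, and the resistance, capacitance, and repeater values are hypothetical.

```python
def unrepeated_delay(r_per_mm: float, c_per_mm: float, length_mm: float) -> float:
    """50% delay of a distributed RC wire: about 0.38 * R_total * C_total."""
    return 0.38 * (r_per_mm * length_mm) * (c_per_mm * length_mm)


def repeated_delay(r_per_mm: float, c_per_mm: float, length_mm: float,
                   n_segments: int, t_repeater: float) -> float:
    """Delay of a wire cut into n equal segments, each driven by a repeater.

    Simplification (assumption): each repeater adds a fixed delay
    `t_repeater`, and driver/load interaction with the wire is ignored.
    """
    segment_mm = length_mm / n_segments
    per_segment = t_repeater + unrepeated_delay(r_per_mm, c_per_mm, segment_mm)
    return n_segments * per_segment


if __name__ == "__main__":
    R = 100.0       # ohm/mm, hypothetical
    C = 0.2e-12     # F/mm, hypothetical
    T_REP = 50e-12  # s per repeater, hypothetical

    for length in (5, 10, 20):
        n = max(1, round(length / 2.5))  # one repeater per 2.5 mm (assumed spacing)
        plain = unrepeated_delay(R, C, length)
        rep = repeated_delay(R, C, length, n, T_REP)
        print(f"{length:2d} mm: unrepeated {plain * 1e12:7.1f} ps, "
              f"{n} segments {rep * 1e12:7.1f} ps")
```

With the segment length held constant, doubling the distance doubles the repeated delay but quadruples the unrepeated delay.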
[Figure: M3 and M5 under Conservative and SIA scaling versus Feature size (µm), from 0.25 down to 0.05]
A Different View
[Figure: SpecInt95 (log scale) versus Year, Jan-85 to Jan-97, for the 80386, 80486, Pentium, and Pentium II; a second series labeled ISPEC spans 1993 to 1999]
• Plot of IPC
– Compiler + IPC
– OOO is old idea
– Uses lots of wires
• What next?
– Wider machines
– Threads
– Speculation
• Guess answers to create parallelism
– Have high wire costs
[Figure: SpecInt95 / MHz versus date, Dec-83 to Dec-98]
Architecture Scaling Issues
[Figure: clock frequency in MHz (log scale) versus date, Dec-83 to Dec-98, for the 80386, 80486, Pentium, and Pentium II]
• Caused by:
– Faster circuit families (dynamic logic)
– Better optimization
– Better micro-architecture
– Better adder/mem arch
• All this generally requires more transistors
[Figure: 80386, 80486, Pentium, and Pentium II plotted against date, Dec-83 to Dec-98]
Gates Per Clock Limits
[Figure: Mainframe and uP compared]
Yes and No
• Uniprocessor performance growth will slow down
– Latest jump is getting to the 16ish FO4 cycles
• People will change the benchmarks to fix this problem
– More data parallel applications
• Multi-media / streaming applications
– More threaded applications