Rabaey
Anantha Chandrakasan
Borivoje Nikolic
One die
Wafer
Up to 12" (30 cm)
From http://www.amd.com
The number of transistors and resistors on a chip doubles every 18 months [Gordon Moore, 1964]
[Figure: Number of transistors versus year for static memory and Intel microprocessors, log scale from 10^8 to 10^11; memory capacities marked at 64 M, 256 M, and 1 G]
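The doubling rule can be written as N(t) = N0 · 2^(t / 1.5), with t in years. The sketch below only illustrates that arithmetic; the function name and the starting count of 1,000 are hypothetical and not taken from the slides.

```python
def projected_count(initial_count: float, years: float,
                    doubling_period_years: float = 1.5) -> float:
    """Project a count that doubles once every `doubling_period_years`.

    The 1.5-year default is the 18-month doubling period quoted above.
    """
    return initial_count * 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point of 1,000 devices (illustration only).
    for years in (0, 1.5, 3, 15):
        print(f"after {years:>4} years: {projected_count(1_000, years):,.0f}")
```

Fifteen years contain ten doubling periods, so the count grows by a factor of 2^10 = 1024 over that span.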
[Figure: Logic transistors per chip (K) and productivity (Trans./Staff-Month) versus year, 1981 to 2009, both on log scales; productivity growth rate: 21%/yr. compound]
A growing gap between design complexity and design productivity
Source: sematech97
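To get a feel for how quickly such a gap opens, the sketch below compounds the two growth rates. It assumes that complexity follows the 18-month doubling stated earlier in these slides (the chart's own complexity growth figure did not survive in the text) and that productivity grows at the 21% per year shown on the chart. All function names are hypothetical.

```python
def annual_rate_from_doubling(doubling_period_years: float) -> float:
    """Convert a doubling period into an equivalent compound annual growth rate."""
    return 2.0 ** (1.0 / doubling_period_years) - 1.0


def gap_factor(years: float, complexity_rate: float,
               productivity_rate: float = 0.21) -> float:
    """Ratio of complexity growth to productivity growth after `years`."""
    return ((1.0 + complexity_rate) / (1.0 + productivity_rate)) ** years


if __name__ == "__main__":
    # Assumption: complexity doubles every 18 months (1.5 years).
    complexity_rate = annual_rate_from_doubling(1.5)
    print(f"complexity growth: {complexity_rate:.1%} per year")
    for years in (1, 5, 10):
        print(f"gap after {years:>2} years: {gap_factor(years, complexity_rate):.1f}x")
```

Under these assumptions the ratio grows geometrically, so a staff of constant size falls further behind every year.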
[Figure: Design abstraction levels, from top to bottom: System, Register-Transfer, Gate, Circuit, Device]
[Figure: Circuit design flowchart with preparation, problem solving, and circuit re-simulation steps; each "Meets requirements?" check branches Yes/No]
Parameters:
Reliability
Stability
Area
Switching speed (delay)
Leakage power
Fan-in, fan-out
Noise immunity
- Bipolar
* with saturated transistors - TTL, I²L
* with non-saturated transistors - ECL
- Unipolar
* FET
* NMOS, PMOS, CMOS
- Bipolar CMOS
© Digital Integrated Circuits2nd Design Methodologies
Exploding NRE / Mask Costs
[Figure: Mask costs ($M) versus process geometry (micron), 0.05 to 0.2]
Cost: ¢ per transistor
[Figure: Fabrication capital cost per transistor (Moore’s law), in cents on a log scale from 1 down to 0.0000001, 1982 to 2012]
In the cost and yield relations:
Nt, Ng - total number of ICs and number of good (yielding) ICs on one wafer, respectively
CW, CD - wafer cost and die cost, respectively
Dw - wafer diameter; Ad - die area
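The equations these symbols belong to did not survive in the text. As a sketch only, the code below uses a commonly used textbook approximation, which is an assumption here and not necessarily the form shown on the slide: Nt ≈ π(Dw/2)²/Ad − πDw/√(2·Ad), Ng = Y·Nt for a die yield Y, and CD = CW/Ng. The function names and the example numbers are hypothetical.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Approximate total dies per wafer (Nt).

    Assumed approximation: wafer area divided by die area, minus a
    correction for partial dies lost along the wafer edge.
    """
    gross = math.pi * (wafer_diameter_cm / 2.0) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return gross - edge_loss


def die_cost(wafer_cost: float, wafer_diameter_cm: float,
             die_area_cm2: float, die_yield: float) -> float:
    """Die cost CD = CW / Ng, where Ng = die_yield * Nt (good dies per wafer)."""
    n_total = dies_per_wafer(wafer_diameter_cm, die_area_cm2)
    n_good = die_yield * n_total
    return wafer_cost / n_good


if __name__ == "__main__":
    # Hypothetical numbers: 30 cm wafer, 1 cm^2 die, 70% yield, wafer cost 3000.
    print(f"Nt = {dies_per_wafer(30.0, 1.0):.0f} dies")
    print(f"CD = {die_cost(3000.0, 30.0, 1.0, 0.7):.2f} per good die")
```

Because Nt falls faster than linearly as Ad grows, and yield typically drops with area as well, die cost rises steeply with die size.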
[Figure: Digital processor architecture with MEMORY, CONTROL, DATAPATH, and INPUT/OUTPUT blocks]
Courtesy: Philips
[Figure: Energy efficiency (in MOPS/mW) of implementation choices: hardwired custom, configurable/parameterizable, (e.g. DSP), and embedded microprocessor; the ranges shown are 0.1-1, 1-10, 10-100, and 100-1000]
• Design process traverses iteratively between three abstractions:
behavior, structure, and geometry
• More and more automation for each of these steps
[Figure: Implementation approaches: Custom and Semicustom; Semicustom divides into Cell-based and Array-based]
Intel 4004
[Figure: Cell-based layout showing feedthrough cells, logic cells, routing channels, and a functional module (RAM, multiplier, …)]
Routing channel requirements are reduced by the presence of more interconnect layers.
[Brodersen92]
Cell structure hidden under interconnect layers
Initial transistor geometries → Placed transistors → Routed cell → Compacted cell → Finished cell
[Figure: PLA structure: inputs x0, x1, x2 drive the AND plane, which forms the product terms; the OR plane combines them into outputs f0, f1]
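Every logic function can be expressed in sum-of-products format (AND-OR)
[Figure: PLA schematic with And-Plane and Or-Plane between VDD and GND, clock φ, pull-up devices on both planes, true and complemented inputs x0, x1, x2, outputs f0, f1; one product term is labelled as a minterm]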
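The AND-OR structure can be mimicked in a few lines of software. The sketch below models a PLA as two tables: one listing the literals of each product term (AND plane) and one listing which product terms feed each output (OR plane). The personality shown, f0 = x0·x1 + x2' and f1 = x0'·x2, is a hypothetical example and is not taken from the figure.

```python
from itertools import product

# AND plane: one dict per product term, mapping input index -> required value
# (1 = true literal, 0 = complemented literal). Hypothetical personality.
PRODUCT_TERMS = [
    {0: 1, 1: 1},   # x0 · x1
    {2: 0},         # x2'
    {0: 0, 2: 1},   # x0' · x2
]

# OR plane: for each output, the indices of the product terms it sums.
OUTPUT_CONNECTIONS = [
    [0, 1],         # f0 = x0·x1 + x2'
    [2],            # f1 = x0'·x2
]


def and_plane(inputs, product_terms):
    """Evaluate every product term for the given input vector."""
    return [all(inputs[i] == v for i, v in term.items()) for term in product_terms]


def or_plane(term_values, output_connections):
    """OR together the product terms connected to each output."""
    return [any(term_values[t] for t in terms) for terms in output_connections]


def pla(inputs, product_terms=PRODUCT_TERMS, output_connections=OUTPUT_CONNECTIONS):
    """Sum-of-products evaluation: AND plane followed by OR plane."""
    return or_plane(and_plane(inputs, product_terms), output_connections)


if __name__ == "__main__":
    for x in product((0, 1), repeat=3):
        f0, f1 = pla(x)
        print(x, int(f0), int(f1))
```

Changing the two tables reprograms the function without touching the evaluation code, which mirrors how a PLA's regular layout is personalized.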
River PLAs
A cascade of multiple-output PLAs.
Adjacent PLAs are connected via river routing.
[Figure: Cascaded PLA stages, each made up of PRE-CHARGE and BUFFER blocks]
• Output buffers and the input buffers of the next stage are shared.
Implementation      Area    Delay
RPLAs (2 layers)    1.23    1.04
SCs (3 layers)      1.00    1.00
NPLAs (4 layers)    1.31    1.09

Synthesis time: for RPLAs, synthesis time equals design time; SCs and NPLAs still need P&R.
Also: RPLAs are regular and predictable
[Figure: comparison plot with area on the horizontal axis]
256×32 (or 8192 bit) SRAM
Generated by hard-macro module generator
A Protocol Processor for Wireless
[Figure: Design flow with the stages Design Capture (HDL), Pre-Layout Simulation, Logic Synthesis, Floorplanning, Placement, Post-Layout Simulation, and Tape-out]
Iterative Removal of Timing Violations (white lines)
[Figure: Flow from Physical Synthesis through Place-and-Route Optimization to Artwork]
Array-based:
Pre-diffused (Gate Arrays)
Pre-wired (FPGAs)
[Figure: Gate-array layout: rows of uncommitted cells separated by routing channels; an uncommitted cell (polysilicon, metal, possible contacts, VDD, GND) next to a committed cell (4-input NOR) with output Out]
[Figure: Cell layout using oxide isolation, with rows of PMOS and NMOS transistors]
From Smith97
[Figure: Chip with random logic and a memory subsystem]
Via programmable gate array (VPGA)
Via-programmable cross-point
[Figure: programmable via between metal-5 and metal-6]
[Figure: Antifuse cross-section: antifuse polysilicon, ONO dielectric, n+ antifuse diffusion; dimension 2λ]
[Figure: Three programmable array organizations with inputs I0 to I5 (or I0 to I3): two with a programmable OR array and one with a fixed OR array]
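[Figure: Programmed array with inputs 1, X2, X1, X0 and outputs f1, f0 (two outputs marked NA); dots mark programmed nodes]
[Figure: Macrocell-based device: i inputs (A, B, C, …) feed a programmable AND array (2i × jk) that supplies product terms to k macrocells; each macrocell has a j-wide OR array and a D flip-flop (D, Q, CLK) driving OUT]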
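[Figure: 2-to-1 multiplexer with data inputs A (selected when S = 0) and B (selected when S = 1), select input S, and output F]

Configuration
A   B   S   F =
0   0   0   0
0   X   1   X
0   Y   1   Y
0   Y   X   XY
X   0   Y   XY'
Y   0   X   X'Y
Y   1   X   X + Y
1   0   X   X'
1   0   Y   Y'
1   1   1   1
(' denotes the complement)

[Figure: Logic cell built from 2-to-1 multiplexers, with data inputs A, B, C, D, select inputs SA, SB, S0, S1, and output Y]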
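The configuration table follows from the multiplexer equation F = A·S' + B·S. The sketch below checks a few of the configurations exhaustively; the dictionary layout and the helper names are hypothetical, chosen only for this illustration.

```python
from itertools import product


def mux2(a: int, b: int, s: int) -> int:
    """2-to-1 multiplexer: returns a when s == 0 and b when s == 1."""
    return b if s else a


# Each entry: (wiring of A, B, S in terms of X and Y, expected function F).
CONFIGURATIONS = {
    "X AND Y":     (lambda x, y: (0, y, x), lambda x, y: x & y),
    "X AND NOT Y": (lambda x, y: (x, 0, y), lambda x, y: x & (1 - y)),
    "NOT X AND Y": (lambda x, y: (y, 0, x), lambda x, y: (1 - x) & y),
    "X OR Y":      (lambda x, y: (y, 1, x), lambda x, y: x | y),
    "NOT X":       (lambda x, y: (1, 0, x), lambda x, y: 1 - x),
    "NOT Y":       (lambda x, y: (1, 0, y), lambda x, y: 1 - y),
}


def check_all() -> bool:
    """Return True if every configuration matches its expected function."""
    for wiring, expected in CONFIGURATIONS.values():
        for x, y in product((0, 1), repeat=2):
            a, b, s = wiring(x, y)
            if mux2(a, b, s) != expected(x, y):
                return False
    return True


if __name__ == "__main__":
    print("all configurations match:", check_all())
```

A single multiplexer therefore covers AND, OR, and inversion purely through how its three pins are tied to constants or signals, which is the idea behind multiplexer-based configurable cells.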
In    Out
00    0
01    1
10    1
11    0
[Figure: Look-up table with inputs In1, In2 and output Out]
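A look-up table is simply a small memory addressed by the inputs. The sketch below stores the four output bits from the table above (an XOR function) and reads them back; `make_lut` is a hypothetical helper name.

```python
def make_lut(config_bits):
    """Build a look-up table from its stored configuration bits.

    `config_bits[k]` is the output for the input combination whose binary
    value is k, with the first input as the most significant bit.
    """
    table = tuple(config_bits)

    def lut(*inputs: int) -> int:
        if 2 ** len(inputs) != len(table):
            raise ValueError("number of inputs does not match table size")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return table[address]

    return lut


# Contents from the table above: 00 -> 0, 01 -> 1, 10 -> 1, 11 -> 0.
xor_lut = make_lut([0, 1, 1, 0])

if __name__ == "__main__":
    for in1 in (0, 1):
        for in2 in (0, 1):
            print(in1, in2, xor_lut(in1, in2))
```

Loading a different bit pattern, for example `[0, 0, 0, 1]` for AND, changes the function without changing the hardware, which is what makes the cell programmable.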
Figure must be updated
[Figure: Configurable logic block with two logic-function blocks (inputs D1 to D4 and F1 to F4), a logic function H, control inputs C1....C4, and multiplexers controlled by the configuration program; configuration bits are marked x]
Xilinx 4000 Series
[Figure: Array interconnect: cells surrounded by horizontal and vertical tracks, joined through connect boxes and switch boxes; each interconnect point is controlled by a memory cell (M)]
Courtesy: DeHon and Wawrzynek
Primary inputs Macrocell
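[Figure: Two interconnect architectures. Array-based (MAX 3000-7000): LABs (LAB1, LAB2, …, LAB6) connected through a central PIA with delay tPIA. Mesh-based (MAX 9000): LABs connected through row channels and column channels.]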
[Figure: Standard-cell like floorplan surrounded by I/O buffers on all four sides]
[Figure: Routing resources around a CLB: 12 quad, 8 single, 4 double, and 3 long lines, plus 2 direct-connect lines]
Xilinx XC4000ex
Array Size: 8x8 (2 x 4 LUT)
Power Supply: 1.5V & 0.8V
Configuration: Mapped as RAM
Toggle Frequency: 125MHz
Area: 3mm x 3mm
600k transistors
208-pin PGA
fclock = 50 MHz
Pav = 3.6 W @ 5V
Embedded applications where cost, performance, and energy are the real issues!
DSP and control intensive
Mixed-mode
Combines programmable and application-specific modules
Software plays crucial role
[Figure: System-on-a-chip example with blocks labelled Multi-Spectral Imager, Analog, RAM, 500 k Gates FPGA + 1 Gbit DRAM, Preprocessing, 64 SIMD Processor Array + SRAM, Image Conditioning 100 GOPS, µC system + 2 Gbit DRAM, Recognition]
Silicon System Platform
Flexible architecture for hardware and software
Specific (programmable) components
Network architecture
Software modules
Rules and guidelines for design of HW and SW
Has been successful in PC’s
Dominance of a few players who specify and control architecture
Application-domain specific (difference in constraints)
Speed (compute power)
Dissipation
Costs
Real / non-real time data
• 0.25 µm 6-level metal CMOS
• 5.2 mm x 6.7 mm
• 1.2 million transistors
• 40 MHz at 1 V
• 2 extra supplies: 0.4 V, 1.5 V
[Figure: Die photo with FPGA, Reconfigurable Data-path, and Interface blocks]
FPGA Fabric
Embedded memories
Embedded PowerPC
Hardwired multipliers
High-speed I/O
Digital CMOS design is kicking and healthy
Some major challenges down the road caused by deep sub-micron:
Super-GHz design
Power consumption!!!!
Reliability – making it work
Some new circuit solutions are bound to emerge
Who can afford design in the years to come?
Some major design methodology change is in the making!