Introduction to Computer Aided Design
Physical Design Automation
Malgorzata Marek‐Sadowska
256 A

Goals of the course
• Introduce fundamental concepts and algorithms used in CAD layout of ICs and IC‐based systems
• Provide a broad and state‐of‐the‐art context for electronic design automation
• Identify promising new areas and problems for research
Administrative details
• Instructor: Malgorzata Marek‐Sadowska
– 4111 Howard Frank Hall
– mms@ece.ucsb.edu
– 893‐2721
– Office hours: Tuesday 4‐5pm or by appointment.
• Grading:
– Homeworks: 20%
– Final project: 70%
– Project presentation: 10%

Moore’s Law (1965)
The number of transistors that can be integrated on a single chip would double about every two years.
[Figure: Dual Core Itanium with 1.7B transistors. Courtesy, Intel ®]
Technology scaling road map (ITRS)
Year                 2004  2006  2008  2010  2012
Feature size (nm)      90    65    45    32    22
Integration (BT)        2     4     6    16    32
Delay = CV/I: 0.7, ~0.7, >0.7 (delay scaling will slow down)
• Fun facts about 45nm transistors
– 30 million can fit on the head of a pin
– You could fit more than 2,000 across the width of a human hair
– If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent
• More fun facts about 45nm transistors
– It can switch on and off about 300 billion times a second
– A beam of light travels less than a tenth of an inch during the time it takes a 45nm transistor to switch on and off
9/23/2010
Technology scaling road map
Year                  2004  2006  2008  2010  2012
Feature size (nm)       90    65    45    32    22
Intg. Capacity (BT)      2     4     6    16    32
Delay = CV/I: 0.7, ~0.7, >0.7 (delay scaling will slow down)
Energy/Logic Op: ~0.35, ~0.5, >0.5 (energy scaling will slow down)
• A 60% decrease in feature size increases the heat flux (W/cm2) by six times
Courtesy: M.J.Irwin, PSU

Technology Scaling Road Map
Year                  2004  2006  2008  2010  2012
Feature size (nm)       90    65    45    32    22
Intg. Capacity (BT)      2     4     6    16    32
Delay = CV/I: 0.7, ~0.7, >0.7 (delay scaling will slow down)
Energy/Logic Op: >0.35, >0.5, >0.5 (energy scaling will slow down)
Process Variability: Medium, High, Very High
• Transistors in a 90nm part have 30% variation in frequency, 20x variation in leakage
Courtesy: M.J.Irwin, PSU
VLSI Chip Power Density
[Figure: power density (W/cm2, log axis from 1 to 10000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle and the Sun’s surface. Source: Intel]

Scaling
• Wire delay trends:
» wire pitch must continue to shrink because smaller transistors must be connected.
• Time-distance trends:
» rule of thumb: RC=internal gate delay for optimal signal speed delay => buffering scheme and signal drive region.
» if gate delay decreases, the signal drive distance will decrease faster-than-linear.
Consequences
• IC chip will be a set of isolated islands of logic
• It is impossible to build super-highways to improve this delay
• Commonly used architectural elements like register files and crossbar switches that increase

A generic multi‐core platform
[Figure: array of tiles, each containing a PE, Memory, NIC and R block]
• General and special purpose cores (PEs)
• PEs likely to have the same ISA
Courtesy: M.J. Irwin (PSU)
Rapid Increase in Manufacturing Cost
Single Mask cost ($K)   1.5  1.5  2.5  4.5  7.5   12     40     60
# of Masks               12   12   12   16   20   26     30     34
Mask Set cost ($K)       18   18   30   72  150  312  1,000  2,000
[Figure: Cost/Mask ($K) by process node: 250nm, 180nm, 130nm, 100nm]

The Cost of Next Generation Product
Total Product Cost ($M): $30M ~ $50M @ 90nm (wireless chip case)
[Figure: total product cost ($M) at 0.18um, 0.15um, 0.13um and 90nm]
• Engineering Cost – 60% up
• Product Manufacturing Cost – 40% up
• NRE/Mask Cost – 100% up
• Respin cost – 78% up
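The mask figures listed for this slide are related by simple arithmetic: the mask set cost is roughly the cost of a single mask times the number of masks. A minimal sketch that recomputes the row from the other two; the variable and function names are illustrative only, and the last two listed totals appear to be rounded on the slide.

```python
# Per-mask cost ($K), mask count, and listed mask set cost ($K),
# copied column by column from the slide.
single_mask_cost_k = [1.5, 1.5, 2.5, 4.5, 7.5, 12, 40, 60]
num_masks = [12, 12, 12, 16, 20, 26, 30, 34]
listed_set_cost_k = [18, 18, 30, 72, 150, 312, 1000, 2000]


def mask_set_cost(cost_per_mask_k, masks):
    """Mask set cost ($K) = cost of a single mask ($K) x number of masks."""
    return cost_per_mask_k * masks


computed = [mask_set_cost(c, n) for c, n in zip(single_mask_cost_k, num_masks)]

for value, listed in zip(computed, listed_set_cost_k):
    print(f"computed {value:7.1f}   listed {listed:5d}")
```

The first six columns match the slide exactly; the last two differ from the product (1,200 and 2,040), which suggests the slide rounds them.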
The Process of Design
• Design
– Initial concept: what is the function performed by the design?
– Constraints: Timing? Area? Cost?
– Map abstract functional blocks into circuit realizations
• Implementation
– Assemble primitives into more complex building blocks
– Composition via wiring
– Choice among alternatives
• Debug
– Faulty systems: design, composition, component, modeling flaws
– Design to make debugging easier
– Testing, diagnosing, and troubleshooting

Simplified model of design
• Behavior
– Functions the system must implement; with constraints such as time, area, power, etc.
– Implementation‐independent description.
• Register
– Components and their interconnections.
– Standard components, ROM, ASIC, PLD.
– Timing constraints for components.
• Gate
– Low‐level components and nets in terms of ASIC library
• Mask
– Physical layout of IC board.
Functional Specification Input
Schematic diagram
Assembling sub circuits Placement, routing, compaction
Logic Optimization
Layout
Design rule checking, circuit
Technology Mapping Layout verification extraction, electrical rule checking
256a 17 256a 18
Physical Design
• Mapping logic to physical implementation
– Implementation
• Selecting components
– locations
– wiring
– shapes
– Examples
• TTL chips on a PC board
• FPGA
• Custom CMOS chip

Deep Submicron Design Challenges
• Physical effects are increasingly significant
– Parasitics, reliability issues, power management, process variation, etc.
• Design complexity is high
– Multi‐functionality integration
– Design verification is a major limitation on time‐to‐market
• Cost of fabrication facilities and mask making has increased significantly
Mask Costs
[Figure: blocks labeled Design, OPC, Fracture and Mask, with Mask Cost and Data Volume]
• OPC, PSM, Fill: increased feature complexity, increased mask cost

Design Rules Explosion
[Figure: number of design rules per process node (axis 0 to 700) for 0.35um, 0.25um, 180nm, 150nm, 130nm and 90nm]
Courtesy Synopsys Inc.
Courtesy: A.B.Kahng, UCSD
Interconnect Synthesis
• Constraints:
– Delay
– Skew
– Signal integrity
– ...
• Optimized interconnect designs:
– Topology
– Sizing
– Spacing
• Other possible optimizations: buffer insertion, simultaneous device and interconnect sizing …
• Automatic solutions guided by accurate interconnect models

Process Variation
• In film thickness, lateral dimensions, doping
• Measured:
– From wafer to wafer
– From die to die – inter‐die
– Across die – intra‐die or process tilt
Process Variation Sources
• Wafer: topography, reflectivity
• Reticle: CD error, proximity effects, defects
• Stepper: Lens heating, focus, dose, lens aberrations
• Etch: Power, pressure, flow rate
• Resist: Thickness, refractive index
• Develop: Time, temperature, rinse
• Environment: Humidity, pressure

Important Device Variations
• Channel length L
– Photolithography proximity effects
– Optics deviations
– Plasma etch dependencies
• Oxide thickness tox
– Well‐controlled ‐‐ only significant between wafers
• Threshold voltage Vt
– Varying doping
– Annealing effects
– Mobile Q in gate oxide
– Discrete dopant variations (few dopant atoms in transistors)
Inter‐chip variation
• Many of the sources of variation affect all objects on the same layer of the same chip.
• Examples:
– Metal or dielectric layers might be thicker/thinner
– Each exposure could be over/under exposed
– Each layer could be over/under etched

Interconnect Variations
• Line width and line spacing
– Photolithography
– Etching proximity effects
• Metal and dielectric thickness
– Chemical Mechanical Polishing
• Contact resistance
– Contact dimensions
– Etch and clean steps
Interconnect variation
• Looking at chip cross section
• Pitch is well controlled, so spacing is not independent
[Figure: chip cross section with labels P0 to P5. Pitch is well controlled; width and spacing are not independent; the remaining marked dimensions can vary independently]

Intra‐chip Deterministic Variation
• Optical Proximity Effects
• Metal Density Effects
• Center vs corner focus
[Figure: "You draw this" versus "You get this"]
Reducing leakage power
• Most important for mobile and internet servers, as important as speed!
• Standby leakage
– power consumed when whole chip is idle, Tj is NOT high (Spec temp. for mobile at 50C)
– impact on battery life for portable devices
• Active leakage
– power consumed due to device leakage when chip is working, and Tj is high (110C)
• Subthreshold and Gate leakage significantly higher
– impact on overall chip thermal design power and frequency
• Ptot = Pswitch + Pleak

Leakage Gating with Sleep Transistor
• Leakage is a main concern below 90nm
• Partition the chip to allow individual control of the sleep transistors
– Sleep transistor is on while the block is working
– Sleep transistor is off while the block is idle
[Figure: Blocks A, B, C and D, each with its own Sleep Control]
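The relation Ptot = Pswitch + Pleak and the per-block sleep control can be put together in a small bookkeeping sketch. Everything numeric here is an assumption for illustration: the per-block power values are made up, and `gated_leak_fraction` (the share of leakage that remains when a sleep transistor is off) is a hypothetical parameter, not a figure from the slides.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    p_switch_w: float  # switching power while working (W), assumed value
    p_leak_w: float    # leakage power with the sleep transistor on (W), assumed value
    working: bool      # sleep transistor is on only while the block is working


def total_power(blocks, gated_leak_fraction=0.05):
    """Ptot = Pswitch + Pleak, summed over blocks.

    A working block pays its switching power plus its full leakage.
    An idle block has its sleep transistor off, so it pays only a
    residual fraction of its leakage (hypothetical parameter).
    """
    total = 0.0
    for b in blocks:
        if b.working:
            total += b.p_switch_w + b.p_leak_w
        else:
            total += b.p_leak_w * gated_leak_fraction
    return total


blocks = [
    Block("A", p_switch_w=1.0, p_leak_w=0.2, working=True),
    Block("B", p_switch_w=1.0, p_leak_w=0.2, working=False),
]
print(total_power(blocks))                           # with sleep gating
print(total_power(blocks, gated_leak_fraction=1.0))  # idle block leaks fully
```

Setting `gated_leak_fraction=1.0` models a chip without sleep transistors, which makes the saving from gating the idle block easy to read off.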
Sleep transistors in timing
• Difficult to comprehend in STA
– Many cells share same virtual ground through one sleep transistor (legged/distributed in reality)
– Voltage of virtual ground depends on current drawn by all active gates on same sleep transistor
• Need to guarantee max/min voltage on virtual ground
• How to verify statically min/max GND voltage
• Need cell models and interaction models for cells on different virtual ground
– Logic grouping, by time of common switching
– Estimate current needed in worst case
• Lack of support in timing tools is main limiting factor for using this technique

Why Automated Design Optimization?
• Complexity (circuits and design processes)
– Abstraction level of optimization
– Abstraction level of design restriction
• Design times (and redesign times)
• Design metrics in digital circuits:
– Timing: signal delay in critical paths
– Area
– Power dissipation
• Design metrics in analog circuits:
– Various: bandwidth, gain, noise figure, etc., etc.
– Specific circuit knowledge required
Implementation Choices Application Specific Integrated Circuits
• Very high capacity today ‐‐ 10‐100M transistors
• Very high speed – 500MHz+
Digital Circuit Implementation Approaches Memories
– Integration
– Specificity
• Unique features:
STANDARD GATE ARRAY,
– Very high manufacturing volume
SEA OF GATES FPGA CPLD
CELL
256a 39 256a 40
256a 41 256a 42
[Figure: gate array with VDD and GND rails, metal, possible contact, rows of uncommitted cells and a routing channel; an uncommitted cell next to a committed cell (4‐input NOR) with inputs In1 to In4 and output Out]
• Gate Array
– Two‐dimensional array of logic gates
– Traditionally connected with customized metal
– Every logic circuit (customer) needs a custom‐manufactured chip
• Field Programmable
– Customized by programming after manufacture
– One FPGA can serve every customer
• FPGA: re‐programmable hardware
Field‐Programmable Gate Arrays
• Based on Configurable Logic Blocks (CLB)
[Figure: regular array of CLBs]
• Types: SRAM and fuse/anti‐fuse
• Fuse‐based are cheaper, smaller and slower than SRAM‐based.
• Switchboxes in SRAM‐based FPGA can occupy 70% of the chip area.
• SRAM very desirable in development phase due to re‐programmability.

Programmable Logic Devices
• Early version: Mask‐Programmable Gate Arrays
– Build standard layout of transistors on chip
– Customer specifies wiring to connect transistors into gates/system
– Only has to go through last few mask steps of fabrication process, so faster than full chip fabrication
– May become popular again in the near future
• Newer version: Programmable Logic Devices (PLD)
– Use AND‐OR array to implement arbitrary boolean functions
– Programmed by burning fuses that define connection from input wires to gates
– Customer site programming allows rapid prototyping
– Limited capacity, functionality
• Generally have to be used in conjunction with other parts to hold state
• Used to implement logic with moderate number of inputs (< 20)
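How an AND‐OR array implements an arbitrary boolean function can be shown with a tiny evaluator. This is a sketch under assumptions: the data layout (`and_plane` as a list of product terms, `or_plane` as lists of product-term indices) and the function name `pla_eval` are invented for illustration and do not describe any particular device's programming format.

```python
def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a two-level AND-OR array.

    inputs:    list of 0/1 input values
    and_plane: list of product terms; each term maps an input index to the
               value (0 or 1) that input must have for the term to be true.
               Inputs not mentioned in a term are "don't care" (no connection).
    or_plane:  one list per output, holding the indices of the product
               terms that are OR-ed together to form that output.
    """
    products = [
        all(inputs[i] == required for i, required in term.items())
        for term in and_plane
    ]
    return [any(products[p] for p in terms) for terms in or_plane]


# Two inputs a, b. Output 0 = a XOR b (sum of two products), output 1 = a AND b.
and_plane = [
    {0: 1, 1: 0},  # a AND (NOT b)
    {0: 0, 1: 1},  # (NOT a) AND b
    {0: 1, 1: 1},  # a AND b
]
or_plane = [[0, 1], [2]]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, pla_eval([a, b], and_plane, or_plane))
```

In this picture, "burning fuses" corresponds to choosing which entries appear in `and_plane` and `or_plane`; the evaluation logic itself is fixed.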
Today ‐ Two Major Types of Programmable Logic
• CPLD (complex programmable logic device)
– coarse‐grained two‐level AND‐OR programmable logic arrays (PLAs)
– fast and more predictable delay
– simpler interconnect structures
• FPGA (field programmable gate array)
– fine‐grained logic cells
– high logic density
– good design flexibility

FPGA design‐manufacturing characteristics
• Enablers:
– Use of memory elements to customize prefabricated devices
• Problems:
– Poor cost/performance ratio
• Unique features:
– Short TAT (total turnaround time)
– No or very low NRE (non‐recurring expenses)
– Field‐reprogrammable
– Platform‐based design
FPGA design‐manufacturing characteristics
• Abstraction level of key decision:
– Netlist
• Granularity of manipulated objects:
– Logic/gate level
• Predictability: high
• Reusability: Very high
• Design flexibility: very limited
• Design cost: very low
• Performance: very low

Standard Cell Design
• Design circuit using standard cells
– cells are small numbers of gates, latches, etc.
• Technology mapping selects cells
• Place and wire them
– cells placed in rows
• all cells same height, different widths
– wiring between rows or over the cells
• CAD problems
– cell placement ‐ row and location within row
– wiring in channels
– minimize area, delay
[Figure: rows of cells with feedthrough cells, logic cells, routing channels and a functional module (RAM, multiplier, …)]
Standard Cell — Example
Standard Cell – The New Generation
Cell‐structure
hidden under
interconnect layers
[Brodersen92]
256a 51 256a 52
Source: J. Rabaey, UCB.
ASIC design‐manufacturing characteristics
• Enablers:
– Advanced EDA infrastructure
• Problems:
– Lithographic resolution
– Cost
• Unique features:
– Complex and expensive design process that can be justified only for very high volume IC devices

ASIC design‐manufacturing characteristics
• Abstraction level of key design decision:
– All levels – logic and above
• Granularity of manipulated objects:
– Standard cells and macros
• Predictability: poor
• Reusability: moderate
• Design flexibility: low
• Design cost: moderate
• Performance: much below possible
General Cell Design
• Generalization of standard cells
• Cells can be large, irregularly shaped
– standard cells, RAMs, ROMs, datapaths, etc.
• Used in large designs
– e.g. Pentium has datapaths, RAM, ROM, standard cells, etc.
• CAD problems
– placement and routing of arbitrary shapes is difficult
[Figure: floorplan with blocks labeled Cache, Decode, RAM, PLA, Data path, uCode ROM and Std. Cells]

FPGAs vs. Standard Cell ASICs
Parameter                            FPGA               Standard Cell
CAD tool Cost                        $2000              $Millions
Mask Cost                            0                  $1.4M US @ 90 nm
Bug Fix                              1 hour             ~10 weeks
Electrical & Optical Check & Debug   Vendor’s Problem   Your Problem!
Time to Market                       Fast               Slow
Die Size                             2X to 20X          1X
Volume Cost                          1X to 20X          1X
Speed                                0.3X to 0.6X       1X
Power                                2X to 5X           1X
Source: Altera Corporation
Std Cell ASIC Development Cost Trend
[Figure: total development costs ($M, axis 0 to 45) at 0.18 µm, 0.15 µm, 0.13 µm, 90 nm, 65 nm and 45 nm, broken down into Masks & Wafers, Test & Product Engineering, Software, and Design/Verification & Layout]
Note: Conservative estimate; does not include re‐spins.
Source: Altera Corporation

Result: Declining ASIC Starts
[Figure: Standard Cell/Gate Array design starts (axis 0 to 12000) per year, 1996 to 2007]
Source: Dataquest/Gartner
The Custom Approach
[Figure: Intel 4004]

Design Methods
• Full custom design
– no constraints ‐ output is geometry
• highest‐volume, highest performance designs
– requires some handcrafted design
• 5‐10 transistors/day for custom layout
– use to design cells for other methods
– primary CAD tools
• layout editor, plotter
• Cell‐based design
– compose design using a library of cells
– at board‐level, cells are chips
– cell = single gate up to microprocessor
– primary CAD tools
• partitioning
• placement and routing
Design Methods
• Symbolic design
– reduce problem to topology
– let tools determine geometry (following design rules)
– can reuse same topology when design rules change
• e.g. shrink wires from 2 microns to 1 micron
– used mostly to design cells
• Procedural design
– “cells” are programs
– module generation ‐ ROMs, RAMs, PLAs
– silicon compilation ‐ module assembly from HLL
• Analysis and verification
– design rule checking ‐ geometry widths, spacings ok?
– circuit extraction ‐ geometry => circuit
– interconnect verification ‐ circuit A == circuit B?

Analysis and Verification
• Analysis
– circuit extraction
• determine circuit from geometry
• compute circuit parameters from geometry
• resistance, capacitance, transistor sizes
– feed back to logic design, place & route
• Verification
– design rules
• geometry rules ‐ e.g. widths, spacings
• electrical rules ‐ e.g. no floating gate inputs
– interconnect
• compare designed and extracted circuit
• pin‐point difference if there is one
– catch human and CAD tool bugs
Traditional design flow
• Synthesis uses simple model for wire loads
» optimizes a measure which captures the number of wires
» gate delays
• Layout design phase does not change gate implementations
» optimizes summation of wire lengths
» or a measure of topological longest delay
• Mismatches between predicted delays after synthesis and actual delays are of the order 100-200%.

High‐performance logical/physical flow
• logic optimization is poorly able to:
– initially estimate delay in wires
– perform incremental logic optimization ‐ each iteration results in major perturbation in design
• place and route is poorly able to:
– perform accurate timing analysis
– perform timing‐driven place and route (i. e. accept forward constraints)
– perform incremental place and route ‐ each iteration results in major perturbation in design
• Timing closure problem causing 10’s, 100’s, 1000’s of iterations
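The "summation of wire lengths" that the layout phase optimizes has to be estimated before routing exists. One common estimate, used here as an assumption since the slides do not name a specific metric, is the half-perimeter wirelength (HPWL) of each net's bounding box. The placement and netlist below are made-up illustrative data.

```python
def hpwl(pins):
    """Half-perimeter of the bounding box around a net's pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(placement, nets):
    """Sum of HPWL over all nets.

    placement: dict mapping cell name to (x, y) location
    nets:      list of nets, each a list of cell names it connects
    """
    return sum(hpwl([placement[cell] for cell in net]) for net in nets)


# Illustrative data (not from the slides).
placement = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [["a", "b"], ["a", "b", "c"], ["b", "c"]]
print(total_wirelength(placement, nets))
```

Because the estimate ignores the actual routes and any gate resizing, it illustrates why delays predicted this way can differ substantially from post-layout delays.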
• Compaction (spacing)
– Input:
• (possibly symbolic) layout (components and wires)
• design rules
– Output:
• legal layout with minimized area, i.e. positions of components and dimensions of wires are adapted in a way to minimize chip area
– Initial layout may be illegal
– Sometimes component dimensions are adjusted as well (resizing)
– Technology migration

• Design rule checking
– Input
• Hierarchical layout description, i.e. set of polygons, macros and macro‐cells
• set of design rules: minimal width, spacing, enclosure …
– Output
• Location of design rule violations
• Features to be checked often must first be isolated by “Boolean mask operations”.
[Figure: transistor gate isolated as Green AND red, with its width and length marked]
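A minimal sketch of the two simplest design rule checks named above, minimal width and minimal spacing, on a single layer. It assumes a flat list of axis-aligned rectangles given as `(x1, y1, x2, y2)` rather than a hierarchical polygon description, and it treats touching or overlapping rectangles as one connected shape instead of a spacing violation; both simplifications are assumptions of this example.

```python
from itertools import combinations


def width_violations(rects, min_width):
    """Rectangles whose smaller dimension is below the minimal width."""
    return [r for r in rects if min(r[2] - r[0], r[3] - r[1]) < min_width]


def spacing(a, b):
    """Euclidean gap between two axis-aligned rectangles (0 if they touch or overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def spacing_violations(rects, min_space):
    """Pairs of separate rectangles that are closer than the minimal spacing."""
    return [
        (a, b)
        for a, b in combinations(rects, 2)
        if 0 < spacing(a, b) < min_space
    ]


# Illustrative layout: two wires 1 unit apart and one wire only 1 unit wide.
layout = [(0, 0, 10, 2), (0, 3, 10, 5), (0, 10, 1, 20)]
print(width_violations(layout, min_width=2))
print(spacing_violations(layout, min_space=2))
```

The all-pairs loop is quadratic in the number of shapes; it is kept here only because it makes the rule itself easy to read.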
• Circuit extraction
– Input
• Full layout
• Possibly label information
– Output
• circuit level net list
• transistors
• wires
• capacitance
• resistors
– Main sub‐tasks:
• device recognition (by Boolean mask operation)
• analysis of devices to calculate parameters
• extraction of connectivity information

• Mask data preparation
– Fracturing
• Input: polygons
• Output: rectangles, trapezoids
– Over/under‐sizing
• Input: polygons
• Output: polygons with extended/retracted boundary
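Device recognition by a Boolean mask operation and over/under‐sizing can both be illustrated on axis-aligned rectangles. This is a deliberately reduced sketch: real mask data consists of general polygons, the layer names `poly` and `diffusion` are assumptions of this example, and the helper names are invented.

```python
def rect_and(a, b):
    """Boolean AND of two rectangles (x1, y1, x2, y2); None if the result is empty."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def size(rect, d):
    """Move every edge outward by d (oversize) or inward for negative d (undersize).

    Returns None when undersizing makes the rectangle vanish.
    """
    x1, y1, x2, y2 = rect
    r = (x1 - d, y1 - d, x2 + d, y2 + d)
    return r if r[0] < r[2] and r[1] < r[3] else None


# Assumed layers: a vertical poly strip crossing a horizontal diffusion strip.
poly = (4, 0, 6, 10)
diffusion = (0, 3, 10, 7)

gate = rect_and(poly, diffusion)  # candidate transistor gate region
print(gate)
print(size(gate, 1))    # oversized by 1 unit on every side
print(size(gate, -1))   # undersized by 1 unit: the 2-unit-wide gate vanishes
```

Once the gate region is isolated this way, its two dimensions are what the "analysis of devices to calculate parameters" step would measure.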
Techniques proven useful in all these subtasks of physical design and mask data processing
• Graph theory
– We use graphs to represent relations between objects
– An object might be:
• a piece of geometry, component
• a routing channel, a contact
– A relation could be:
• is located above
• must be processed before
• is connected to

• Typical problems in graph theory:
– Proof or calculation of a property
• connection
• tree structure
• acyclic
• planarity
• chromatic number
– Derivation of specific sub graphs
• connected components
• decomposition into paths
• find one (or all) paths between two specified nodes
• find one (or all) spanning trees
– Calculation of optimal sub graphs
• shortest/longest paths
• minimal spanning trees
• minimal cut sets
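As one instance of "calculation of optimal sub graphs", a minimal spanning tree can be computed with Kruskal's algorithm and a union-find structure. The choice of Kruskal and the example (four pins connected by Manhattan-distance edges) are assumptions made for illustration; the slides do not prescribe a particular algorithm.

```python
def kruskal(num_nodes, edges):
    """Minimal spanning tree by Kruskal's algorithm.

    edges: list of (weight, u, v) tuples with node ids 0..num_nodes-1.
    Returns the list of edges chosen for the tree.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Find the set representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two different components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Illustrative pins; edge weight = Manhattan distance between pins.
pins = [(0, 0), (0, 3), (4, 0), (4, 3)]
edges = [
    (abs(pins[i][0] - pins[j][0]) + abs(pins[i][1] - pins[j][1]), i, j)
    for i in range(len(pins))
    for j in range(i + 1, len(pins))
]
mst = kruskal(len(pins), edges)
print(mst, sum(w for w, _, _ in mst))
```

The same union-find bookkeeping also answers the "connected components" question from the list above: two nodes are in the same component exactly when `find` returns the same representative for both.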
Computational geometry deals with algorithmic aspects of geometrical problems
• Often restricted to one plane
• Objects:
– points
– lines and line segments
– planar subdivision of the plane (straight line embedding of planar graphs)
• Typical questions:
– intersection (segment with segment, polygon with a polygon)
– point location
– decomposition problems
– efficient storage and retrieval

256A class
• Graph algorithms
• General techniques
• Data structures
• Signal routing
• Steiner trees
• Placement
• Floor planning
• Clock routing
• Partitioning
• Cell design
• Physical design of FPGAs
• Layout verification
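The segment-with-segment intersection question can be answered with the standard orientation test. A short sketch, assuming points are `(x, y)` tuples with integer coordinates (so the cross product is exact) and that touching at an endpoint counts as intersecting; the function names are illustrative.

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def on_segment(p, q, r):
    """For r collinear with segment pq: does r lie within pq's bounding box?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """True if segment ab and segment cd share at least one point."""
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True  # endpoints of each segment lie on opposite sides of the other
    # Collinear special cases: an endpoint lies on the other segment.
    if o1 == 0 and on_segment(a, b, c):
        return True
    if o2 == 0 and on_segment(a, b, d):
        return True
    if o3 == 0 and on_segment(c, d, a):
        return True
    if o4 == 0 and on_segment(c, d, b):
        return True
    return False


print(segments_intersect((0, 0), (4, 4), (0, 4), (4, 0)))  # crossing diagonals
print(segments_intersect((0, 0), (1, 1), (2, 2), (3, 3)))  # collinear, disjoint
```

Testing every pair of segments this way is quadratic; the "efficient storage and retrieval" question above is about avoiding that with suitable data structures.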