ASYNC 2003 Conference Slide 1 Dr. Ted Williams, Infineon Technologies MorphICs
Overview
• Learning from Asynchronous and Synchronous designs
• Classic Asynchronous benefits
• Planning for real wire delays
• Multiple solutions for “avoiding clock skew”
• Real voltages aren’t digital → Physics always wins
• Handling Min/Max bounds on all timing calculations
• Re-evaluation of Asynchronous claims/myths
• Summary of “analogous benefits” by sync design
• Conclusion: General good design principles
Synchronous Motivations
Clean logic state machines
Separating Logic specification from Timing specification
Asynchronous Benefits
Divide-and-Conquer applied to clock distribution
• Claim of avoiding the “clock skew penalty”
Easier Interfaces
• Arbiters are still required whenever sampling signals of unknown phase
Real systems are already both sync & async
Systems → SoC (System-on-a-chip)
Diameter of a clock period → Synchronous horizon
Pipelining in space → 3D Compilers (x,y,t)
Appropriate Goals
Throughout this presentation, it’s important to emphasize the context for judgement of ideas.
Irrelevant goals:
• Delay insensitivity
• The only reasons it is useful “for its own sake” are intrinsic elegance
or academic purity, and these are unimportant for engineered products.
Gates versus wires
Old view → Gates are what matter
Modern view → Wires are what matter
Planning with linearized signal velocity
Asynchronous design doesn’t change the reality that
distance determines delay.
• Timing Closure → Performance Closure
• Timing Closure of synchronous hierarchical design is more
parallelizable across a design team.
Plan for Velocity of signals
µm per ps = mm per ns
During floorplanning, estimate delay as the Manhattan distance
times a routing non-ideality factor, divided by the velocity.
Assume ideal repeaters (typically every 1 mm to 2 mm in a
0.13 µm process) get added later. Without repeaters, it only
gets worse.
The mm per ns metric assumes no fanout. Linearized velocity
delay must be added to the delay of the buffer trees that drive the actual
loads.
The real case (on nets with more than 1 endpoint) will be between
two extremes (so examine both during floorplanning):
• Best case, assuming no fanout: just the linearized velocity
delay.
• Worst case: add the delay of a buffer tree to drive the
entire capacitive load of the wire in a Steiner tree linking all
endpoints on the net.
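A minimal Python sketch of the floorplanning estimate above, assuming the designer supplies the velocity, the non-ideality factor, and the buffer-tree delay per technology. The function names, the default non-ideality of 1.2, and all numbers in the usage line are hypothetical, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class NetDelayEstimate:
    best_ns: float   # linearized-velocity delay only (no fanout)
    worst_ns: float  # plus a buffer tree driving the whole Steiner-tree load


def linearized_delay_ns(manhattan_mm: float, velocity_mm_per_ns: float,
                        nonideality: float = 1.2) -> float:
    """Delay = routed length / velocity, routed length = Manhattan distance * non-ideality.

    Velocity in mm/ns is numerically the same as um/ps. Assumes ideal repeaters.
    """
    routed_mm = manhattan_mm * nonideality
    return routed_mm / velocity_mm_per_ns


def estimate_net(manhattan_mm: float, velocity_mm_per_ns: float,
                 buffer_tree_ns: float, nonideality: float = 1.2) -> NetDelayEstimate:
    best = linearized_delay_ns(manhattan_mm, velocity_mm_per_ns, nonideality)
    # Worst case: add the delay of a buffer tree sized for the full capacitive load
    return NetDelayEstimate(best_ns=best, worst_ns=best + buffer_tree_ns)


# Hypothetical numbers: 5 mm apart, 4 mm/ns repeated-wire velocity, 0.3 ns buffer tree
est = estimate_net(manhattan_mm=5.0, velocity_mm_per_ns=4.0, buffer_tree_ns=0.3)
print(f"best {est.best_ns:.2f} ns, worst {est.worst_ns:.2f} ns")
```

Both bounds are kept because a multi-endpoint net will land somewhere between them.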
Classic Synchronous Constraints
Classic goals of synchronous design:
Separate task of clock distribution from logic propagation.
Equalize clock arrival times at all registers
Tolerate skew by subtracting it from the maximum clock frequency (adding it
to the minimum clock period), and adding it to the required minimum logic delay.
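The classic budget can be sketched as two inequalities with a single, chip-wide skew number: the minimum period grows by the skew, and so does the minimum logic delay needed to avoid hold failures. This is a simplified illustration; the function names and the picosecond values are hypothetical.

```python
def min_clock_period_ps(t_clk_to_q: int, t_logic_max: int, t_setup: int, t_skew: int) -> int:
    # Skew is charged against every path: it lengthens the minimum period
    return t_clk_to_q + t_logic_max + t_setup + t_skew


def required_min_logic_delay_ps(t_clk_to_q: int, t_hold: int, t_skew: int) -> int:
    # Hold requirement: t_clk_to_q + t_logic_min >= t_hold + t_skew
    return max(0, t_hold + t_skew - t_clk_to_q)


# Hypothetical values in ps, with one global skew budget for the whole chip
period = min_clock_period_ps(t_clk_to_q=200, t_logic_max=2000, t_setup=100, t_skew=200)
print(period, "ps ->", 1e6 / period, "MHz")
print(required_min_logic_delay_ps(t_clk_to_q=200, t_hold=50, t_skew=200), "ps")
```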
Classic Synchronous clocks
Classic Synchronous Clock Problems
More complex logic → More total load.
Today’s SoCs can have more than 500K registers
loading the clock tree.
Typical Asynchronous Answer
Since low-skew distribution is hard → do away with the
clock entirely?
Re-think issue of skew
The hard part in clock distribution is getting low skew, but that
doesn’t mean we can’t have a clock.
If we don’t force low skew, then we can save effort and
power in clock distribution.
Rather, just build a timing methodology to robustly ensure we have
full analysis of actual skew effects on both setup and hold.
Tolerate larger skew, and actually get a more conservative design.
Remove “boundary” at register clock pins
Modern, better approach to combine clock tree analysis and improvement
with combinational path optimization:
• Use actual clock paths feeding every launching & receiving register pair
• Trace clock paths toward the root only back to the point of “reconvergence”
  (this feature was added to commercial tools only in 2001)
• Don’t require clock distribution to be equal to all points →
  effectively, a relaxing of the “synchronicity” constraint
• Let unified critical path analysis drive improvement in the clock tree, but
  only as required
• Let useful skew help
• Maximum skew doesn’t matter except when in series with top critical
  paths
• Make sure analysis includes real max/min path combinations for every
  clock check
Effectively, this is making a synchronous design
more asynchronous, or GALS-like, even though
all regions still use the same root clock.
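Tracing the two clock paths only back to their reconvergence point can be sketched as follows; commercial tools commonly call this clock reconvergence pessimism removal. The tree, node names, and delays are hypothetical. The launching clock is taken late and the receiving clock early, and the shared segment's min/max spread is credited back because one buffer cannot be both fast and slow on the same edge.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical clock tree: node -> (parent, min_delay_ps, max_delay_ps)
ClockTree = Dict[str, Tuple[Optional[str], int, int]]


def root_path(tree: ClockTree, leaf: str) -> List[str]:
    """Clock path from the root down to `leaf`."""
    path: List[str] = []
    node: Optional[str] = leaf
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path[::-1]


def arrival(tree: ClockTree, path: List[str], late: bool) -> int:
    return sum(tree[n][2] if late else tree[n][1] for n in path)


def shared_segment(a: List[str], b: List[str]) -> List[str]:
    """Common part of two root paths, ending at the reconvergence point."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def setup_slack_ps(tree: ClockTree, launch: str, capture: str, data_max: int,
                   t_setup: int, period: int, trace_to_reconvergence: bool = True) -> int:
    lp, cp = root_path(tree, launch), root_path(tree, capture)
    launch_late = arrival(tree, lp, late=True)      # launching clock as late as possible
    capture_early = arrival(tree, cp, late=False)   # receiving clock as early as possible
    credit = 0
    if trace_to_reconvergence:
        # The shared segment cannot be both fast and slow at once: give back its spread
        common = shared_segment(lp, cp)
        credit = arrival(tree, common, True) - arrival(tree, common, False)
    return (period + capture_early + credit - t_setup) - (launch_late + data_max)


tree: ClockTree = {
    "root": (None, 100, 120),
    "b1": ("root", 200, 260),
    "ff_a": ("b1", 50, 60),
    "ff_b": ("b1", 50, 60),
}
print(setup_slack_ps(tree, "ff_a", "ff_b", data_max=500, t_setup=30, period=1000))
print(setup_slack_ps(tree, "ff_a", "ff_b", data_max=500, t_setup=30, period=1000,
                     trace_to_reconvergence=False))
```

Only the diverging branches (`ff_a` and `ff_b` here) contribute real skew; without the credit, the whole tree's spread is charged against the path.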
Setup and Hold Checks
[Figure: launching register (D, Q) → combinational logic → receiving register (D, Q)]
Why do we have to be careful?
We have to be sure our path-tracing accounts for all
possible paths for both setup and hold checks.
With increasing divergence in actual capacitance due to
various factors, EVERY computation needs to be
viewed as requiring analyses using both min and max
values.
If we miss any path that does have a large difference in
clock arrival times at its launching and receiving registers,
we would suffer functional failures (hold violations) or
performance loss (setup violations).
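A minimal sketch of setup and hold checks where every quantity is a min/max pair, assuming integer picosecond values and a hypothetical `Bounds` type. Setup compares the latest data against the earliest capture edge; hold compares the earliest data against the latest capture edge. The last line shows how one "typical" number per quantity can hide the hold violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Min/max pair for one delay or arrival time (ps)."""
    lo: int
    hi: int


def setup_slack(launch_clk: Bounds, data: Bounds, capture_clk: Bounds,
                t_setup: int, period: int) -> int:
    # Latest data arrival against the earliest next-cycle capture edge
    return (period + capture_clk.lo - t_setup) - (launch_clk.hi + data.hi)


def hold_slack(launch_clk: Bounds, data: Bounds, capture_clk: Bounds, t_hold: int) -> int:
    # Earliest data arrival against the latest same-cycle capture edge
    return (launch_clk.lo + data.lo) - (capture_clk.hi + t_hold)


launch, capture, data = Bounds(300, 380), Bounds(350, 450), Bounds(40, 900)
print(setup_slack(launch, data, capture, t_setup=50, period=1000))  # 20
print(hold_slack(launch, data, capture, t_hold=30))                 # -140: hold violation

# The same path judged with single "typical" numbers looks safe
flat = Bounds(400, 400)
print(hold_slack(flat, Bounds(40, 40), flat, t_hold=30))            # 10
```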
Why is this method still hard?
Most static timing analyzers today still use the model of
initiating all path tracing at the Q outputs of registers.
Then, to do this type of analysis, they “correct” the reported
paths by the differences in the traced arrival times.
But the correction must be applied to all paths, prior to
sorting, to be sure the top-N list (that we need to optimize
and fix) really is right.
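Why the order matters can be shown with a few hypothetical paths: picking the top-N on raw slack and only then applying the clock correction misses a path whose correction is negative. The tuple layout and function names are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical reported path: (name, raw_slack_ps, clock_correction_ps).
# Raw slack is traced from register Q outputs; the correction is the difference in
# traced clock arrival times at the launching and receiving registers.
Path = Tuple[str, int, int]


def top_n_correct_after_sorting(paths: List[Path], n: int) -> List[str]:
    """What the slide warns against: pick the top-N on raw slack, then correct."""
    picked = sorted(paths, key=lambda p: p[1])[:n]
    return [name for name, raw, corr in sorted(picked, key=lambda p: p[1] + p[2])]


def top_n_correct_before_sorting(paths: List[Path], n: int) -> List[str]:
    """Apply the correction to every path first, then sort."""
    ranked = sorted(paths, key=lambda p: p[1] + p[2])
    return [name for name, raw, corr in ranked[:n]]


paths = [("a", -50, 80), ("b", -40, 60), ("c", -10, -70), ("d", 5, 0)]
print(top_n_correct_after_sorting(paths, 2))   # misses the real worst path "c"
print(top_n_correct_before_sorting(paths, 2))
```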
Real circuits aren’t digital
Physics → Nothing is ever instantaneous in the real world. Gates don’t “trigger”.
Actual signal transition times (often defined as 10% to 90% of the swing) are 3x to 10x gate “delays”.
[Figure: a signal transition between Gnd and Vdd, crossing the switching “threshold”]
Speed versus “noise margin” tradeoffs, especially if not full-swing, such as sense amps
Asynchronous designs shouldn’t have cycle times less than transition times!
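The lower end of the 3x figure can be sanity-checked on an idealized single-pole RC node driven by a step: the 10% to 90% transition takes τ·ln 9 while the 50% crossing takes τ·ln 2, a ratio of about 3.2. This is only an illustration under that idealized assumption (real gate waveforms are not single-pole), and the τ value is hypothetical.

```python
import math


def crossing_time(tau: float, fraction: float) -> float:
    """Time for a single-pole step response 1 - exp(-t/tau) to reach `fraction` of the swing."""
    return -tau * math.log(1.0 - fraction)


def transition_and_delay(tau: float):
    t_10_90 = crossing_time(tau, 0.9) - crossing_time(tau, 0.1)  # equals tau * ln 9
    t_50 = crossing_time(tau, 0.5)                                # equals tau * ln 2
    return t_10_90, t_50


tr, td = transition_and_delay(tau=10.0)  # tau in ps (hypothetical)
print(f"10-90% transition {tr:.1f} ps, 50% delay {td:.1f} ps, ratio {tr / td:.2f}")
```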
Accuracy versus uncertainty
Physics → Every quantity has an error bar
A value quoted as a single number is always wrong!
Margin Types
Little or no margin
• The most common reality is that design teams run STA with just one configuration, and strive (fruitlessly) to make it “accurate”.
Explicit spread between min/max
• Once the methodology is enhanced to have both min and max calculations, distinct min and max values can be chosen for known effects.
• Appropriate for aggressor cross-coupling, where definitive calculations are possible to get relative magnitudes.
Additive margin
• Adds an extra pessimistic value into each setup path check and hold path check.
• Independent of path length, i.e., applied “once” per path.
• Additive margin is appropriate to model:
  ◦ Inaccuracies in setup and hold characterization.
Multiplicative margin
• Widens the min/max range by multiplying/dividing each capacitance, resistance, or delay by a pessimistic factor.
• Since it is applied “per gate”, it gains conservatism proportional to path length.
• Multiplicative margin (sometimes called applying a “library derating factor”) is appropriate to model:
  ◦ Inaccuracies in capacitance extraction
  ◦ Inaccuracies in library or delay calculation
  ◦ Power supply noise, including inductive bounces
  ◦ Power supply resistive voltage drops
Statistical margin
• Gives each data value an error bar, and propagates the potential errors assuming a symmetric Gaussian distribution.
• Statistical margin is appropriate for issues that are statistical, such as:
  ◦ Fabrication parameter spreads (transistors, metals, dielectrics)
• Assumes long paths will see cancellation of effects, which may not be true depending on the direction of on-die process “tilt”.
• Not the right substitute for min/max treatment of issues that may not actually cancel in long paths.
Corner “margin”
• Replication of the entire computation set with different fabrication/circuit parameters.
• Appropriate for process, because vendors specify representative points (typically hot/typ/cold for libraries; true circuit analysis will also include the cross-corners fast-n/slow-p and fast-p/slow-n).
• Not a substitute for the other types of margin, which express spreads within a corner; we do want to rerun using all the other types of margin at every process corner.
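A minimal sketch of how the additive, multiplicative, and statistical margins combine on one path, rerun at each corner. The derating factor, sigma values, corner scale factors, and the 3-sigma choice are hypothetical; subtracting/adding the additive margin directly to the path delay (rather than to the setup/hold check) is a simplification.

```python
import math
from typing import Optional, Sequence, Tuple


def path_delay_bounds(gate_delays: Sequence[float], derate: float = 1.0,
                      additive: float = 0.0, sigmas: Optional[Sequence[float]] = None,
                      n_sigma: float = 3.0) -> Tuple[float, float]:
    """Return (min, max) path delay with several margin types applied together."""
    # Multiplicative margin: divide/multiply every gate and wire delay by the
    # derating factor, so the spread grows in proportion to path length.
    lo = sum(d / derate for d in gate_delays)
    hi = sum(d * derate for d in gate_delays)
    # Statistical margin: independent Gaussian spreads add root-sum-square, so they
    # grow only with sqrt(path length); this assumes effects cancel along the path.
    if sigmas:
        spread = n_sigma * math.sqrt(sum(s * s for s in sigmas))
        lo -= spread
        hi += spread
    # Additive margin: one pessimistic amount per path check, independent of length.
    return lo - additive, hi + additive


# Corner "margin": rerun everything at each corner (hypothetical scale factors);
# min/max values are never combined across corners.
CORNER_SCALE = {"fast": 0.8, "typ": 1.0, "slow": 1.25}
nominal = [100.0] * 10   # ten 100 ps stages (hypothetical)
sigma = [5.0] * 10       # per-stage 1-sigma spread in ps (hypothetical)

for corner, k in CORNER_SCALE.items():
    lo, hi = path_delay_bounds([d * k for d in nominal], derate=1.1, additive=20.0,
                               sigmas=[s * k for s in sigma])
    print(f"{corner}: {lo:.0f} .. {hi:.0f} ps")
```

Keeping the margin types as separate parameters makes it visible which issue each one is covering, instead of folding everything into one fudge factor.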
Margin summary: Use all together!
▪ Explicit min/max spread for aggressor coupling
▪ Additive margin at register endpoints
▪ Multiplicative margin for all gate & wire delays in a path
▪ Statistical margin for on-chip process tilt
▪ Corner “margin” for die-to-die variation
Timing analysis should use all of the margins together, since each is
appropriate for different issues.
At each corner - all of the other margins should be applied
(it is unnecessary to combine min/max across corners, which can often differ by large factors)
Now more detail about what goes into the individual margins…
Effective coupling capacitance
[Figure: effective coupling capacitance ratio versus slew ratio, with curves for an opposing aggressor (switching in the opposite direction from the victim signal), no aggressor switching, and an aiding aggressor (switching in the same direction as the victim signal); vertical-axis marks at 3, 2, 0 and -1, horizontal axis spanning 0.2 to 5.]
Effective capacitance ratio = (effective capacitance of switching aggressor) / (capacitance of quiet aggressor)
Slew ratio = (signal transition time of victim) / (signal transition time of aggressor)
Complementary Min/Max delays
Using a single delay value is never right!
Example: Compare two clock trees with the “same”
computed delay, but of greatly differing height.
The shorter one is better (less min/max spread) but a
single number doesn’t convey the advantage.
Choosing Min/Max values
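Max C: (1+x) * (Csupply (vdd/vss) + 2 * Csignal-cross-coupling)
Min C: (1-x) * (Csupply (vdd/vss) + 0 * Csignal-cross-coupling)
where x is the fractional multiplicative margin, and the factors 2 and 0 model every cross-coupled aggressor switching in the opposite direction and in the same direction as the victim, respectively.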
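The two formulas translate directly into code. This sketch assumes capacitances in fF and a hypothetical 10% margin; the function name is illustrative.

```python
from typing import Tuple


def cap_bounds(c_supply_ff: float, c_cc_ff: float, x: float) -> Tuple[float, float]:
    """(min, max) net capacitance following the Max C / Min C formulas.

    c_supply_ff: capacitance to vdd/vss (fF)
    c_cc_ff:     total signal cross-coupling capacitance (fF)
    x:           fractional multiplicative margin, e.g. 0.1 for 10% (hypothetical)
    """
    # Max: every coupled aggressor switches opposite to the victim (Miller factor 2)
    c_max = (1 + x) * (c_supply_ff + 2 * c_cc_ff)
    # Min: every coupled aggressor switches with the victim (Miller factor 0)
    c_min = (1 - x) * (c_supply_ff + 0 * c_cc_ff)
    return c_min, c_max


c_min, c_max = cap_bounds(c_supply_ff=20.0, c_cc_ff=10.0, x=0.1)
print(f"min {c_min:.1f} fF, max {c_max:.1f} fF")  # min 18.0 fF, max 44.0 fF
```

Note how a modest coupling fraction already produces a wide spread (18 fF to 44 fF here), which no single "accurate" value can represent.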
Clock tree tuning based on min/max path tracing
Applies to clock trees both within blocks and again at top level.
Practical details with today’s tools
Need to truly choose all min delays and max delays correctly.
Static timing analysis must have both min and max delay available
for each gate transition in the same timing run.
Current versions of PrimeTime do not allow multiple SPEF
capacitance datapoints for the same node, so separate delay
calculation runs must pre-compute SDF delays, which can then be
loaded to simultaneously specify multiple min/max wire delays for
each node.
With min/max SDF delay data on every node, PrimeTime can then
choose the correct ones along clock paths leading separately to
launching and receiving registers, using the mode called “on-chip
variation”.
So, each block needs three PrimeTime runs (2 for SDF calculation + 1
main) for each voltage/temperature/process operating condition.
For (cold, typ, hot) process, use a total of 9 runs at block level.
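The run count is simply three runs per operating condition times three conditions. The sketch below enumerates that plan; the run labels are hypothetical names, not tool commands.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical labels for the per-condition runs described above
CORNERS: Tuple[str, ...] = ("cold", "typ", "hot")
RUNS_PER_CORNER: Tuple[str, ...] = (
    "delay_calc_min_spef",   # pre-compute min SDF from min-capacitance SPEF
    "delay_calc_max_spef",   # pre-compute max SDF from max-capacitance SPEF
    "main_ocv",              # main run loading both SDFs in on-chip-variation mode
)


def block_run_plan(corners: Tuple[str, ...] = CORNERS) -> List[Tuple[str, str]]:
    return list(product(corners, RUNS_PER_CORNER))


plan = block_run_plan()
print(len(plan))
```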
Rigorous Setup and Hold checking using simultaneous min/max delays
[Figure: launching register (D, Q) → combinational logic → receiving register (D, Q), annotated:]
• Max RC includes 2x all potential signal cross-coupling capacitances.
• Min RC contains only capacitance to vdd/vss/substrate.
• All annotated delays include additional multiplicative margin to account for:
  ◦ tool inaccuracies
  ◦ on-chip process tilt
  ◦ IR-drop supply variation
Handling min/max throughout hierarchy
Currently, commercial Static-Timing-Analysis tools have “on-chip-variation”
modes that allow the correct choices of min/max
delays for setup/hold path tracing within a block only.
For “true-hierarchical” design, we must create a parent-level timing run
that does not have to see all the details in every child block.
But the “on-chip-variation” modes that work in blocks aren’t
implemented for creation of abstract timing models for use in a
parent.
So, we need to more explicitly force the selection of correct min/max
combinations:
• Annotate mixes of min/max capacitance prior to timing block abstraction
• Using the RC data (.spef) instead of pre-computed delays (.sdf) for model abstraction has the
  additional advantage that the abstract model can express dependencies on input edge-rate
  and output loads.
• Consider the different possible arcs that can be generated…
Abstraction of paths into constraint arcs
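[Figure: a block with input port, output port, and clock ports, containing combinational logic and registers, showing three kinds of abstracted arcs: a combinational through-timing arc (input port to output port), a setup/hold timing arc (input port relative to clock port), and an output propagation timing arc (clock port to output port).]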
No single choice of min/max net delay annotation is sufficient
[Figure: a path through combinational logic and a clock net crossing a hierarchical boundary, with nets annotated “min”, “max”, and “min or max ?”.]
The min/max values shown are the ones that would be needed for the parent HOLD check run.
No single choice of min/max net delay annotation is sufficient
[Figure: a child block with input port → combinational logic → register and register → combinational logic → output port, fed from clock ports; data nets are annotated max and clock nets are annotated max and min.]
An all-max model, often used for setup checks, is still not correct.
Take annotation from 4 separate pre-processed capacitance sets, and recombine into 2 models per process corner
[Figure: the model needed for the setup check in the parent and the model needed for the hold check in the parent, each drawn as input port → combinational logic → register → combinational logic → output port with a clock port, and with each group of nets annotated min or max.]
Each grouping is a pre-processed timing analysis run type, from which arcs are pulled to create these combined models for use in a parent run.
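One plausible reading of the 4-sets-into-2-models recipe, sketched as data: each pre-processed run annotates data nets and clock nets independently with min or max capacitance (2 x 2 = 4 run types), and each arc of the setup or hold model is pulled from the run whose combination matches standard late/early reasoning. The mapping and all names are assumptions for illustration, not a transcription of the figure.

```python
from itertools import product
from typing import Dict, List, Tuple

Annot = Tuple[str, str]  # (annotation on data nets, annotation on clock nets)
RUN_TYPES: List[Annot] = list(product(("min", "max"), repeat=2))

# Which run type each abstracted arc is pulled from (assumed late/early reasoning)
MODEL_RECIPES: Dict[str, Dict[str, Annot]] = {
    "setup_model": {
        "setup_constraint_arc": ("max", "min"),    # late data vs. early capture clock
        "output_propagation_arc": ("max", "max"),  # late launch clock + late data
    },
    "hold_model": {
        "hold_constraint_arc": ("min", "max"),     # early data vs. late capture clock
        "output_propagation_arc": ("min", "min"),  # early launch clock + early data
    },
}


def runs_needed(recipes: Dict[str, Dict[str, Annot]]) -> List[Annot]:
    return sorted({annot for arcs in recipes.values() for annot in arcs.values()})


print(runs_needed(MODEL_RECIPES))  # all four run types are used
```

Under this reading, every one of the four runs feeds exactly one arc, which is why no single min/max annotation can produce either model on its own.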
Need Multiple STA runs to get all arcs
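[Figure: a child block across a hierarchical boundary, showing the setup constraint timing arc (input port to register) and the output propagation timing arc (clock port through register to output port), with data and clock nets annotated with different min/max combinations.]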
Recombine arcs into hold-check model
[Figure: the same block showing the hold constraint timing arc and the output propagation timing arc, with data and clock nets annotated with different min/max combinations.]
Parent runs then get correct min/max treatment recursively
[Figure: parent-level setup/hold check with clock paths traced back to the clock reconvergent node, annotated:]
• Max RC includes Miller factors for all potential signal cross-coupling capacitances.
• Min RC contains only capacitance to vdd/vss/substrate.
• All annotated delays include additional multiplicative margin to account for:
  ◦ tool inaccuracies
  ◦ on-chip process tilt
  ◦ IR-drop supply variation
Benefits of full-complementary timing analysis
More Robust, even for large integration complexity
▪ Both additive and multiplicative margin built into delay equations.
▪ Margins built into all setup/hold checks ensure functionality and timing performance
  corresponding to simulation.
▪ Let min/max methodology encompass clock-skew judgement, to take advantage of useful
  skew, and not penalize irrelevant skew.
Summary: Min/Max Flow vs. Traditional Additive Margins
Myth: Clock skew will stop synchronous progress
Problem: Clock skew does get worse in finer geometry
technologies, and with increasingly complex designs.
Myth: Asynchronous designs are safer
Problem: Process variation does get worse in finer
geometry technologies.
Variations in actual operating conditions (process,
voltage, temperature, coupling) do affect circuit speed.
Myth: Completely new tools needed to handle Signal Integrity
Problem: Aggressor cross-coupling (as a percentage of
total capacitance) does get worse in finer geometry
technologies.
Myth: Statistical timing takes the place of margins
Problem: A new theme is that variability in delays should be
handled by timing analysis that treats every delay as statistical.
This is good in the sense of adding an “error bar”, but bad in creating the
impression that other margins are no longer needed, even though they
are more appropriate for the calculable known effects, such as
aggressor coupling and power-supply drops.
Myth: Useful clock skew can only apply inside a block
Problem: Previous algorithms for taking advantage of
useful skew only help for individual signals.
[Figure: data buses (D) and the clock]
Myth: Async design means not thinking about timing
Problem: In an isolated sense, a chain of asynchronous
logic will still work independent of actual timing, but not
thinking about timing will almost assuredly mean
inadequate performance.
Myth: QDI design makes circuits independent of wire delay
Problem: QDI (Quasi-Delay-Insensitive) design focuses
on correctness, but can ignore performance.
Myth: Performance closure is easier in Async design
Problem: Performance through an asynchronous
pipeline is constrained by the local cycle times of
handshake loops, and forward and reverse latencies of
each stage. Any of these values can be influenced by
physical wire distance, or mis-sized gates.
Asynchronous designs also have “critical paths” that
determine performance, and they often aren’t as cleanly
partitioned, and therefore require more thought to find
and care to fix.
Myth: Asynchronous designs are more robust across operating
voltage changes
Myth: Timing should use instance-specific power-supply voltages
Myth: Async circuits lower power due to being data-driven
Problem: Circuits that transition when no new data
is needed can waste power.
Synchronous designers believe myths too
Myth: Clock gating introduces too much skew
Problem: Inserting clock gating elements into the clock
tree does change the distribution latency, and can
complicate balancing that seeks matched arrival times.
Also, fine-grained gating introduces more irregularity,
making it harder to match delays by replication.
Myth: Chip area is determined by transistor count
Problem: Design complexity has historically been measured through transistor
count, or effective gate count. Density is usually quoted in gates/mm²,
with the implication of linear scaling.
Reality: Transistors (gates) are the free objects that sit under the
wires. Density of logic is determined by wire connectivity
requirements, for both sync & async.
Maximum total wire density is:
(number of effective metal layers) / (wire pitch) = (wire length) / (area)
Example: A 6-layer process may effectively have 4 usable layers.
For a wire pitch of 0.5 µm, the maximum density is:
4 / 0.5 µm = 8 wires per µm of width = 8000 mm of wire per mm² = 8000 mm⁻¹
A 1.25 cm² (125 mm²) chip would have a maximum total wire length of 1 km.
(Analogous to the “ideal” quoted density of back-to-back packed NAND gates)
Typically, routed regions will be lucky to get a third to half of this.
Quoting wire-utilization density is also a fairer way of expressing top-level
area and complexity, where there may be few gates.
Density measurements in wirelength/area are better than gates/area.
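The wire-density arithmetic above, as a small function. The layer count, pitch, and die area are the example's values; the one-third to one-half utilization range is the slide's rule of thumb, and the function name is illustrative.

```python
def max_wire_length_m(usable_layers: int, pitch_um: float, die_area_mm2: float) -> float:
    """Upper bound on total routed wire length, from density = layers / pitch."""
    wires_per_um = usable_layers / pitch_um     # wires crossing each um of width
    density_per_mm = wires_per_um * 1000.0      # mm of wire per mm^2 (units of mm^-1)
    total_mm = density_per_mm * die_area_mm2
    return total_mm / 1000.0                    # metres


ideal_m = max_wire_length_m(usable_layers=4, pitch_um=0.5, die_area_mm2=125.0)
print(f"ideal {ideal_m:.0f} m, typically routed {ideal_m / 3:.0f} to {ideal_m / 2:.0f} m")
```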
Myth: “Synchronizers” fix asynchronous interfaces
Problem: Any time a signal of unknown phase is
sampled (including in an asynchronous arbiter), it might
be transitioning just at the wrong time, causing an
intermediate voltage value to be trapped.
Metastability can result in the trapped voltage persisting
indefinitely, with recovery based on the time constants
of the feedback loops to amplify the node away from its
midpoint.
Failures occur at the point where the signal branches,
and different receivers treat the analog voltages
differently.
Reality: Using multiple registers in series reduces the failure
probability exponentially by allowing recovery times
greater than a single clock cycle, but it never results in a
zero failure probability.
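The exponential-but-never-zero behavior can be illustrated with the commonly used first-order synchronizer model, MTBF = exp(t_r / τ) / (T0 · f_clk · f_data), where t_r is the resolution time allowed. The simplification that each register in series adds one clock period (minus a fixed overhead) of resolution time, and all parameter values, are hypothetical.

```python
import math


def synchronizer_mtbf_s(resolve_time_s: float, tau_s: float, t0_s: float,
                        f_clk_hz: float, f_data_hz: float) -> float:
    """First-order model: MTBF = exp(t_r / tau) / (T0 * f_clk * f_data)."""
    return math.exp(resolve_time_s / tau_s) / (t0_s * f_clk_hz * f_data_hz)


def mtbf_with_stages(stages: int, t_clk_s: float, overhead_s: float, tau_s: float,
                     t0_s: float, f_data_hz: float) -> float:
    # Simplifying assumption: each register in series contributes one clock period
    # (minus clock-to-Q/setup overhead) of extra resolution time.
    resolve = stages * (t_clk_s - overhead_s)
    return synchronizer_mtbf_s(resolve, tau_s, t0_s, 1.0 / t_clk_s, f_data_hz)


params = dict(t_clk_s=2e-9, overhead_s=0.5e-9, tau_s=20e-12, t0_s=100e-12, f_data_hz=50e6)
for n in (1, 2):
    print(n, "stage(s):", f"{mtbf_with_stages(n, **params):.3e}", "s")
```

Each added stage multiplies the MTBF by exp((t_clk - overhead) / τ), yet the result stays finite, so the failure probability is reduced but never zero.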
Valid: Asynchronous better enables precharged logic
Problem: In synchronous design, there is always a
challenge in how to generate precharge control signals.
Common methods either add an interval at the
beginning or end of a clock cycle, or use a clock phase
for precharging, wasting parts of a clock cycle.
Valid: Good timing principles apply equally to all circuits
Timing dominated by issues outside of gate internals
(wire RC, aggressor coupling, power supply variations).
Physics: Every number has an uncertainty, and
computations should use these bounds.
Min/Max margin analysis puts design improvements
where they really do add safety and performance, for
both sync and async designs.
Summary: Similar palettes for both
SoC design performance is all about Floorplanning and
Architecture that accounts for physical on-chip distance.
Gate-sizing and repeater insertion (“wire-synthesis”) must be
automated, because they are now fundamental.
Complexity and area are determined by wires, not transistors.
Domain crossings and sampling of unknown-phase signals must
always be handled with care, with correct synchronizers and
quantified metastability MTBF.
Varying the power supply voltage allows speed/power tradeoff.
Optimizing transition count is optimizing power consumption.
Removing unnecessary low-skew and synchronization constraints
(at register clock pins) allows design to focus improvement where it
matters, and is a step toward more general GALS design.
Conclusions
Many of the historically claimed advantages of
asynchronous design are really Myths, because they
can be solved with equivalently good solutions in
synchronous design.
Example: Clock skew is effectively handled through
better techniques that minimize the overall penalty.