
Clock Skew and other Myths

Dr. Ted Williams


Infineon Technologies MorphICs

ASYNC 2003 Conference Slide 1 Dr. Ted Williams, Infineon Technologies MorphICs
Overview
• Learning from Asynchronous and Synchronous designs
• Classic Asynchronous benefits
• Planning for real wire delays
• Multiple solutions for “avoiding clock skew”
• Real voltages aren’t digital → Physics always wins
• Handling Min/Max bounds on all timing calculations
• Re-evaluation of Asynchronous claims/myths
• Summary of “analogous benefits” by sync design
• Conclusion: General good design principles

Synchronous Motivations
Clean logic state machines
Separating Logic specification from Timing specification

Separating Logic verification from Timing verification:


Formal Verification can verify logic without timing

Performance optimization can be automated independently of logic functionality.

Asynchronous Benefits
Divide-and-Conquer applied to clock distribution
• Claim of avoiding “clock skew penalty”

Robustness with respect to variable delays
• From data, process, voltage, temperature, coupling, characterization
• Scalability and portability

Operation at actual rather than worst-case
• Less margining

More self-consistent for precharged logic
• Many better choices for control of precharge/postcharge intervals
• Dual usage of “completion detection” for asynchronous handshaking

Easier Interfaces
• Arbiters still required whenever sampling signals of unknown phase

Real Systems already both sync & async
Systems → SoC (System-on-a-chip)
Diameter of a clock period → Synchronous horizon
Pipelining in space → 3D Compilers (x,y,t)

[Figure: Chip Die Area, with the Event Horizon @ 100MHz and the Event Horizon @ 500MHz marked]

Clock domain crossing → Check phase-correlation


Globally Asynchronous - Locally Synchronous (GALS)
Learning from each other
In the last 3 decades, synchronous designers have been
usefully challenged by asynchronous designers.

Today, modern SoC designs always have both aspects.

The technical part of this presentation will delve into
some “synchronous details”, which are useful even for
asynchronous designers as a way to learn the principles needed.

Goal: High-performance design

Theme: If you can analyze, you can optimize.

Appropriate Goals
Throughout this presentation, it’s important to emphasize the context for judgement of ideas.

Relevant goals for improvement:
• Performance
• Cost
• Area, yield
• Robustness
• Reliability, Manufacturability
• Difficulty
• Schedule, manpower
• Risk

Irrelevant goals:
• Delay insensitivity
• The only reasons it is useful “for its own sake” are intrinsic elegance
  or academic purity, and these are unimportant for engineered products.

Gates versus wires
Old view → Gates are what matter
Modern view → Wires are what matter

Changes to basic optimization and planning strategies

Synchronous world-view changes:
• Academics now counting interconnect instead of counting transistors
• “Design compiler” counting cell area versus new “Physical compilers”
• Wire-load model farce becoming increasingly unable to cope

Asynchronous world-view changes:
• Stop focus on gate delays separated by “isochronic forks”
• QDI: Quasi-Delay-Insensitive (but ignores rather than addresses the issue)
• Encourage pipelining on the wires

Planning with linearized signal velocity
Asynchronous design doesn’t change the reality that
distance determines delay
• Timing Closure → Performance Closure
• Timing Closure of synchronous hierarchical design is more
  parallelizable across a design team.

Floorplan with an early view of timing issues
• Predominant effect on achieving timing goal is wire lengths to actual
  physical locations of gates and ports.
• Run early top-level timing budgeting, assuming registered block
  inputs/outputs (best case, but add some margin)
• Drives analysis of the long paths, more pre-planning of buses, tuning
  of block region ports, guidance for RAM instance placements, and
  study of actual gate placements along critical paths.
• Separate wire planning from later actual repeater insertion.

Plan for Velocity of signals
µm per ps = mm per ns
During floorplanning, estimate delay as Manhattan distance
times a routing non-ideality factor, divided by the velocity.
Assume ideal repeaters (typically every 1mm to 2mm in a
0.13µm process) get added later. Without repeaters, it only
gets worse.
The mm per ns metric assumes no fanout. Linearized velocity
delay must be added to delay of buffer trees to drive actual
loads.
Real case (on nets with more than 1 endpoint) will be between
two extremes (so examine both during floorplanning):
• Best case, assuming no fanout, just linearized velocity
  delay.
• Worst case, add the delay of a buffer tree to drive the
  entire capacitive load of the wire in a Steiner tree linking all
  endpoints on net.
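
A minimal sketch of the floorplanning estimate described on this slide, assuming velocity is quoted in mm per ns. The function names, the default routing factor of 1.2, and the single lumped buffer-tree delay are illustrative assumptions, not values from the presentation.

```python
def wire_delay_ns(p1, p2, velocity_mm_per_ns, routing_factor=1.2):
    """Best-case (no fanout) delay between two (x, y) points given in mm.

    Delay = Manhattan distance * routing non-ideality factor / velocity.
    Assumes ideal repeaters are inserted later, as the slide states.
    """
    manhattan_mm = abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])
    return manhattan_mm * routing_factor / velocity_mm_per_ns


def net_delay_bounds_ns(driver, sinks, velocity_mm_per_ns,
                        buffer_tree_delay_ns, routing_factor=1.2):
    """Return (best, worst) delay to the farthest sink of a multi-endpoint net.

    best:  linearized velocity delay only (no fanout penalty).
    worst: the same delay plus a hypothetical lumped buffer-tree delay
           for driving the whole Steiner-tree load.
    """
    farthest = max(
        wire_delay_ns(driver, s, velocity_mm_per_ns, routing_factor)
        for s in sinks
    )
    return farthest, farthest + buffer_tree_delay_ns


if __name__ == "__main__":
    best, worst = net_delay_bounds_ns(
        driver=(0.0, 0.0), sinks=[(1.0, 1.0), (3.0, 4.0)],
        velocity_mm_per_ns=7.0, buffer_tree_delay_ns=0.5,
    )
    print(f"best {best:.2f} ns, worst {worst:.2f} ns")
```

Examining both bounds for each long net during floorplanning matches the advice above: the real delay on a net with more than one endpoint falls somewhere between them.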

Classic Synchronous Constraints
Classic goals of synchronous design:
Separate task of clock distribution from logic propagation.
Equalize clock arrival times at all registers
Tolerate skew by subtracting from maximum clock
frequency, and adding to required minimum logic delay.

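$$\text{ClockPeriod} \ge \text{LogicDelay}_{\text{maximum}} + \text{Setup}_{D \to \text{Clock}} + \text{Prop}_{\text{Clock} \to Q} + \text{Skew}_{\text{maximum}}$$

$$\text{LogicDelay}_{\text{minimum}} \ge \text{Hold}_{D \to \text{Clock}} + \text{Skew}_{\text{maximum}}$$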

Consider every register to require a known outcome to
2 races, one for setup and one for hold constraints
across all conditions
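
The two classic constraints above can be evaluated directly. This is a small sketch with hypothetical function names and all quantities in picoseconds; it only restates the two inequalities and shows that the same skew budget is charged to both the setup race and the hold race.

```python
def min_clock_period_ps(logic_delay_max, setup, clk_to_q, skew_max):
    """Smallest clock period that satisfies the classic setup constraint."""
    return logic_delay_max + setup + clk_to_q + skew_max


def min_required_logic_delay_ps(hold, skew_max):
    """Smallest logic delay that satisfies the classic hold constraint."""
    return hold + skew_max


if __name__ == "__main__":
    # Illustrative numbers only.
    for skew in (0, 50, 100):
        period = min_clock_period_ps(700, 50, 60, skew)
        min_logic = min_required_logic_delay_ps(30, skew)
        print(f"skew {skew} ps: period >= {period} ps, logic delay >= {min_logic} ps")
```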

Classic Synchronous clocks

Before describing a fresher approach, review steps of the
“old” way:
• Assume a uniform clock-arrival time at all registers
• Assess the skew that will still be present even after
  seeking to make the clock arrive synchronously
  everywhere.
• Penalize the attainable clock-frequency by adding
  the skew into every setup path computation
• Penalize area by adding delay elements required to
  correct hold violations assuming the whole skew
  budget
• Work hard to distribute the clock within the
  specified skew, even in cases where the paths
  actually have plenty of slack.

Classic Synchronous Clock Problems
More complex logic → More total load.
Today’s SoC can have > 500K registers
loading the clock tree.

Longer latency of clock tree to buffer up to bigger load,
so more proportional variability
More cross-coupling → more variability
Denser clock grids → higher power

→ Low-skew clock tree distribution becoming harder

Typical Asynchronous Answer
Since low-skew distribution is hard → do away with the
clock entirely?
• Loses the simplicity of synchronous state model
• Compounds logic verification dilemma, by coupling it to
  fine-grain event sequencing
• Misses the point that the problem isn’t the clock itself or the
  clock latency, but only the handling of the calculable part of the
  arrival time difference (“skew”).

Re-think issue of skew
The hard part in clock distribution is getting low-skew, but that
doesn’t mean we can’t have a clock.
If we don’t really force low-skew, then we can save effort and
power in clock distribution.
Rather, just build a timing methodology to robustly ensure we have
full analysis of actual skew effects on both setup and hold.
Tolerate larger skew, and actually get a more conservative design.

To implement, start tracing of races at the point in clock distribution
where the paths to the launching register and receiving register
diverge.
Or backtracing from the registers, find the point in the clock tree
where the reverse paths from launching register and receiving
register re-converge.
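
The back-tracing described above amounts to finding the nearest common ancestor of two clock pins in the clock tree. A minimal sketch, assuming the tree is available as a child-to-parent mapping; the data structure and every node name are hypothetical and not taken from any tool.

```python
def path_to_root(parent, node):
    """List of nodes from `node` up to the clock root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def reconvergence_point(parent, launching_reg, receiving_reg):
    """First clock-tree node shared by both registers' reverse clock paths.

    Everything above this node is common to both sides of the race, so
    min/max spread should only be applied below it.
    """
    launch_ancestors = set(path_to_root(parent, launching_reg))
    for node in path_to_root(parent, receiving_reg):
        if node in launch_ancestors:
            return node
    return None  # the registers are not on the same clock tree


if __name__ == "__main__":
    # Hypothetical clock tree: child -> parent.
    clock_tree = {
        "buf_a": "root", "buf_b": "root",
        "buf_a1": "buf_a",
        "reg1": "buf_a1", "reg2": "buf_a1", "reg3": "buf_b",
    }
    print(reconvergence_point(clock_tree, "reg1", "reg2"))
    print(reconvergence_point(clock_tree, "reg1", "reg3"))
```

Registers on neighboring branches reconverge low in the tree, so little of the clock path contributes to the race; registers on remote branches reconverge near the root.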

Remove “boundary” at register clock pins
Modern, better approach to combine clock tree analysis and improvement
with combinational path optimization:
• Use actual clock paths feeding every launching & receiving register pair
• Trace clock paths toward root only back to point of “reconvergence”
  (this feature added to commercial tools just in 2001)
• Don’t require clock distribution to be equal to all points →
  Effectively, relaxing of “synchronicity” constraint
• Let unified critical path analysis drive improvement in the clock tree, but
  only as required
• Let useful skew help
• Maximum skew doesn’t matter except when in series with top critical
  paths
• Make sure analysis includes real max/min path combinations for every
  clock check
Effectively, this is making a synchronous design
more asynchronous, or GALS-like, even though
regions are all still using the same root clock.
Setup and Hold Checks
[Figure: a Launching Register and a Receiving Register (each D, Q) joined by Combinational Logic, with both clock pins traced back from Clock to the Clock Reconvergent Node]

Setup Path Check: Red < Green + Period
Hold Path Check: Red > Green

Observe: No explicit clock skew budget needs to be added,
because path tracing already includes effect of any arrival time
differences.

Observe: Variability in the “Common part” has no effect on
the setup and hold races for this pair of registers.

Why do we have to be careful?
We have to be sure our path-tracing accounts for all
possible paths for both setup and hold checks.
With increasing divergence in actual capacitance due to
various factors, EVERY computation needs to be
viewed as requiring analyses using both min and max
values.
If we miss any paths that did have large differences in
clock arrival times at launching and receiving registers,
we would suffer functional failures (hold violations) or
performance loss (setup violations).

Old method even worse: Adding large enough additive
margin to account for biggest possible difference
between all longest max and shortest min paths

Why is this method still hard?
Most static timing analyzers today still use the model of
initiating all path tracing at the Q outputs of registers.
Then to do this type of analysis, they “correct” the reported
paths by the differences in the traced arrival times.
But the correction must be applied to all paths, prior to
sorting, to be sure the top-N list (that we need to optimize
and fix) really is right.

Also, back-tracing must stop at point of clock-reconvergence.
Applying separate min/max delays to the common part of the
clock tree would be an incorrect and overly severe penalty.

Now examine in more detail how to handle min/max issues
and where the spreads come from…

Real circuits aren’t digital
Physics → Nothing ever is instantaneous in real world. Gates don’t “trigger”
Actual signal transition times (often defined as 10% to 90%) are 3x to 10x gate “delays”

[Figure: signal waveforms swinging between Gnd and Vdd and crossing a “Threshold” level, with the delays t1, t2, t3 marked]

Can even move threshold definition and change individual delays,
but still get same sum through a path.

Speed versus “noise margin” tradeoffs, esp. if not full-swing, such as sense amps
Asynchronous designs shouldn’t have cycle times less than transition times!

Accuracy versus uncertainty
Physics → Every quantity has an error bar
A value quoted as a single number is always wrong!

Calculated delays always have a range due to:
• Fabrication parameter spreads (transistors, metals, dielectrics) ← process corners
• Aggressor/victim signal couplings ← address explicitly
• Inaccuracies in capacitance extraction
• Inaccuracies in delay calculation
• Switching threshold approximations
• Power supply noise, including inductive bounces
• Power supply resistive voltage drops
Inaccuracies, threshold approximations, and supply effects ← address by multiplicative margin factors

Don’t get bogged down in 3% “correlation to spice” when
already ignoring many 10% factors

Margin Types
Little or no margin
• Most common reality is that design teams run STA with just one configuration, and strive (fruitlessly) to make it “accurate”.

Explicit spread between min/max
• Once methodology enhanced to have both min/max calculations, can choose distinct values differently for known effects.
• Appropriate for aggressor cross-coupling where definitive calculations are possible to get relative magnitudes.

Additive margin
• Adding an extra pessimistic value into each setup path and hold path check.
• Independent of path length, or namely “once” per path.
• Additive margin is appropriate to model:
  • Inaccuracies in setup and hold characterization.

Multiplicative margin
• Widen min/max range by multiplying/dividing each capacitance, resistance, or delay by a pessimistic factor.
• Since “per gate”, it gains conservatism proportional to path length.
• Multiplicative margin (sometimes called applying a “library derating factor”) is appropriate to model:
  • Inaccuracies in capacitance extraction
  • Inaccuracies in library or delay calculation
  • Power supply noise, including inductive bounces
  • Power supply resistive voltage drops

Statistical margin
• Gives each data value an error bar, and propagates the potential errors assuming a symmetric gaussian distribution.
• Statistical margin is appropriate for issues that are statistical, such as:
  • Fabrication parameter spreads (transistors, metals, dielectrics)
• Assumes long paths will see cancellation of effects, which may not be true depending on the direction of on-die process “tilt”.
• Not the right substitute for min/max of issues that may not actually cancel in long paths.

Corner “margin”
• Replication of entire computation set with different fabrication/circuit parameters.
• Appropriate for process, because vendors specify representative points (typically hot/typ/cold for libraries, and true circuit
  analysis will also include the cross-corners: fast-n/slow-p and fast-p/slow-n)
• Not a substitute for the other types of margin that need to express spreads within a corner, but we do want to run again
  using all the other types of margin at every process corner.
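
To illustrate the difference between additive and multiplicative margin described above, here is a small sketch. The function name, the derating factor, and the way the additive term is applied symmetrically are assumptions chosen for illustration, not the author's tool settings.

```python
def path_delay_bounds(stage_delays_ps, derate=1.10, additive_ps=0.0):
    """Return (min, max) path delay within one process corner.

    Multiplicative margin: the nominal delay is multiplied by `derate`
    for max and divided by it for min, so the spread grows with path length.
    Additive margin: applied once per path, independent of path length.
    """
    nominal = sum(stage_delays_ps)
    max_delay = nominal * derate + additive_ps
    min_delay = nominal / derate - additive_ps
    return min_delay, max_delay


if __name__ == "__main__":
    for n_stages in (2, 4, 8):
        lo, hi = path_delay_bounds([100] * n_stages, derate=1.25, additive_ps=10)
        print(f"{n_stages} stages: min {lo:.0f} ps, max {hi:.0f} ps, spread {hi - lo:.0f} ps")
```

Running the loop shows the multiplicative part of the spread scaling with the number of stages while the additive part stays fixed, which is why each type suits different effects.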

Margin summary: Use all together!
• Explicit min/max spread for aggressor coupling
• Additive margin at register endpoints
• Multiplicative margin for all gate & wire delays in a path
• Statistical margin for on-chip process tilt
• Corner “margin” for die-to-die variation

Timing analysis should use all of the margins together, since each is
appropriate for different issues.
At each corner, all of the other margins should be applied
(it is unnecessary to combine min/max across corners, which can often differ by large factors)

For asynchronous designs, margin analysis is still needed to quantify the
performance, and thus to verify the meeting of the performance targets.

Now more detail about what goes into the individual margins…

Effective coupling capacitance
[Figure: plot of Effective Capacitance Ratio (vertical axis, marked -1, 0, 2, 3) against Slew Ratio (horizontal axis, marked 0.2, 1, 5), with an inset of Voltage versus Time. Curves are labeled:]
• Opposing aggressors: aggressor switching in opposite direction as victim signal
• No aggressor
• Aiding aggressor: aggressor switching in same direction as victim signal

Effective Capacitance Ratio = Effective capacitance of switching aggressor / Capacitance of quiet aggressor
Slew Ratio = Signal transition time of victim / Signal transition time of aggressor

Complementary Min/Max delays
Using a single delay value is never right!
Example: Compare two clock trees with the “same”
computed delay, but of greatly differing height.
The shorter one is better (less min/max spread) but a
single number doesn’t convey the advantage.

Every delay is really within a range, so bound it by min/max
Every constraint has a “dual”
Every calculation needs the “pessimistic” combination

Unfortunately, often a significant CAD tool change:
• Doubling size of many data structures
• Must pick the correct combinations

Choosing Min/Max values
$$C_{\max} = (1+x)\,\bigl(C_{\text{supply (vdd/vss)}} + 2\,C_{\text{signal-cross-coupling}}\bigr)$$

$$C_{\min} = (1-x)\,\bigl(C_{\text{supply (vdd/vss)}} + 0\cdot C_{\text{signal-cross-coupling}}\bigr)$$

Using these Max and Min RC values in all Setup/Hold
checks back to point of clock reconvergence forces
attention to both potential data and clock improvements
without paying global arbitrary penalties.
• More sophisticated capacitance munging can take into account some actual
  ratios of correlated aggressor coupling to total coupling.
• Min and Max variations (1 +/- x) factors can apply to resistances too.
• Multiplicative factors more important than simple additive ps margins.
• These RC changes are not meant to handle process corners, but to be
  applied within a process corner.
• All comparison is done within a process corner, but multiple corners used for
  the checks (setup interesting at cold process, hold interesting at hot process)
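
The Max C and Min C expressions at the top of this slide translate directly into code. This is a sketch with hypothetical names; `x` is the multiplicative spread fraction, and the coupling weights 2 and 0 are the ones given on the slide.

```python
def capacitance_bounds(c_supply, c_coupling, x):
    """Return (c_min, c_max) for one net within a single process corner.

    c_supply:   capacitance to vdd/vss
    c_coupling: capacitance to neighboring signal wires
    x:          multiplicative spread, e.g. 0.1 for +/-10%
    """
    c_max = (1 + x) * (c_supply + 2 * c_coupling)  # opposing aggressors
    c_min = (1 - x) * (c_supply + 0 * c_coupling)  # coupling fully cancelled
    return c_min, c_max


if __name__ == "__main__":
    c_min, c_max = capacitance_bounds(c_supply=10.0, c_coupling=5.0, x=0.25)
    print(f"Min C = {c_min}, Max C = {c_max}")
```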

Clock tree tuning based on min/max path tracing

Tuning to fix paths can be either in comb logic or in clock trees

Registers on neighboring clock branches see little clock skew.
Registers on remote clock branches see big skew, and it is real!

Without doing this analysis, we might be inserting delay to “match
skew” when it actually hurts performance or conservatism.

Applies to clock trees both within blocks and again at top level.

Tuning of clock trees and delay matching to account for varying
block insertion delays is done at typical process, but analysis is
seen at the corners.

Practical details with today’s tools
Need to truly choose all min delays and max delays correctly.
Static timing analysis must have both min and max delay available
for each gate transition in the same timing run.
Current versions of PrimeTime do not allow multiple SPEF
capacitance datapoints for the same node, so must use separate delay
calculation runs to pre-compute SDF delays, which can then be
loaded to simultaneously specify multiple min/max wire delays for
each node.
With min/max SDF delay data on every node, PrimeTime can then
choose the correct ones along clock paths leading separately to
launching and receiving registers, using the mode called “on-chip
variation”.
So, for each block need three PrimeTime runs (2 for SDF calc + 1
main) for each voltage/temperature/process operating condition.
For (cold,typ,hot) process, use total of 9 runs at block level.

Rigorous Setup and Hold checking using simultaneous min/max delays

[Figure: Launching Register (D, Q) → Combinational Logic → Receiving Register (D, Q), with both clock pins traced back from Clock to the Clock Reconvergent Node]

Max RC includes 2x all potential signal cross-coupling capacitances
Min RC contains only capacitance to vdd/vss/substrate

All annotated delays include additional multiplicative margin to account for:
• tool inaccuracies
• on-chip process tilt
• IR-drop supply variation

Setup Path Check:
• Max RC in clock feeding Launching Register
• Register clk->q prop delay
• Max logic choices in Combinational Logic
• Max RC in Combinational Logic
• Min RC in clock feeding Receiving Register
• Library Setup spec for Register

Hold Path Check:
• Min RC in clock feeding Launching Register
• Register clk->q prop delay
• Min logic choices in Combinational Logic
• Min RC in Combinational Logic
• Max RC in clock feeding Receiving Register
• Library Hold spec for Register + margin
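
The setup and hold path checks on this slide can be sketched as two slack computations that pick the pessimistic min or max for each term. The `Bounds` class, the function names, and the assumption that the clk->q delay also has a min and a max are illustrative; clock delays are measured from the clock reconvergent node, and all values are in picoseconds.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    """Min and max delay (ps) of one path segment, margins already applied."""
    min_ps: float
    max_ps: float


def setup_slack_ps(period, launch_clk, clk_to_q, logic, capture_clk, setup):
    """Late data (all max) against an early capturing clock (min)."""
    arrival = launch_clk.max_ps + clk_to_q.max_ps + logic.max_ps
    required = capture_clk.min_ps + period - setup
    return required - arrival


def hold_slack_ps(launch_clk, clk_to_q, logic, capture_clk, hold, margin=0.0):
    """Early data (all min) against a late capturing clock (max)."""
    arrival = launch_clk.min_ps + clk_to_q.min_ps + logic.min_ps
    required = capture_clk.max_ps + hold + margin
    return arrival - required


if __name__ == "__main__":
    # Illustrative numbers only.
    launch = Bounds(90, 110)
    capture = Bounds(95, 120)
    clk_q = Bounds(40, 60)
    logic = Bounds(100, 700)
    print("setup slack:", setup_slack_ps(1000, launch, clk_q, logic, capture, setup=50))
    print("hold slack:", hold_slack_ps(launch, clk_q, logic, capture, hold=30))
```

No separate skew term appears: the difference between the two clock branches is already contained in the launching and receiving clock bounds.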

Handling min/max throughout hierarchy
Currently, commercial Static-Timing-Analysis tools have “on-chip-variation”
modes that allow the correct choices of min/max
delays for setup/hold path tracing within a block only.
For “true-hierarchical” design, must create a parent-level timing run
that does not have to see all the details in every child block.
But, “on-chip-variation” modes that work in blocks aren’t
implemented for creation of abstract timing models for use in a
parent.
So, need to more explicitly force the selection of correct min/max
combinations:
• Annotate mixes of min/max capacitance prior to timing block abstraction
• Using the RC data (.spef) instead of pre-computed delays (.sdf) for model abstraction has
  the additional advantage that the abstract model enables expression of dependencies on input
  edge-rate and output loads.
• Consider the different possible arcs that can be generated…

Abstraction of paths into constraint arcs

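[Figure, top: a block with an Input Port, an Output Port, and a Clock Port, containing registers (D, Q) and Comb Logic on the data and clock paths]

[Figure, bottom: the same block abstracted into three kinds of arcs:]
• Combinational Through-Timing Arc, from Input Port to Output Port
• Setup/Hold Timing Arc, between Input Port and Clock Port
• Output Propagation Timing Arc, from Clock Port to Output Port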
No single choice of min/max net delay annotation is sufficient

[Figure: three registers (D, Q) and Comb Logic spanning a Hierarchical boundary and sharing one Clock. Paths are annotated min or max, and one path is marked “min or max ?”]

min/max values shown are the ones that would be needed for the parent HOLD check run

No single choice of min/max net delay annotation is sufficient

[Figure, top: block with Input Port, Output Port, and Clock Port, with every Comb Logic path annotated max]
all-max model, often used for setup checks, but still not correct

[Figure, bottom: the same block with every Comb Logic path annotated min]
all-min model, often used for hold checks, but still not correct

Take annotation from 4 separate pre-processed capacitance
sets, and recombine into 2 models per process corner

[Figure, top: block with the Input Port and Output Port data paths annotated max, and the two clock-path Comb Logic groupings annotated min and max]
model needed for setup check in parent

[Figure, bottom: block with the Input Port and Output Port data paths annotated min, and the two clock-path Comb Logic groupings annotated max and min]
model needed for hold check in parent

Each grouping is a pre-processed timing analysis run type, from which arcs are pulled to create these combined models for use in a parent run.

Need Multiple STA runs to get all arcs

Need a total of 4 static-timing-analysis runs, one for each of
the min/max groupings, for each process corner.

For example, to cover (cold,typ,hot) process corners,
need 4*3 = 12 block characterization runs.

The timing arcs for block_input → clock and
clock → block_output are taken from the above runs, but must
use external timing arc swizzling to combine (from
different block runs) the right arcs into the models
needed for the parent-level setup and hold runs.
Always combine arcs only from the same process
corner. Keep process corners separate, but perform
same swizzling of min/max arcs again at each corner.
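
A rough sketch of the run enumeration and arc swizzling described above. The dictionary layout, the arc names, and the specific pairing of (data, clock) annotations to each model are one reading of the slides, labeled here as assumptions; they are not a tool format or the author's scripts.

```python
from itertools import product

CORNERS = ("cold", "typ", "hot")
# Assumed (data annotation, clock annotation) for the 4 groupings per corner.
GROUPINGS = (("max", "max"), ("max", "min"), ("min", "max"), ("min", "min"))


def characterization_runs():
    """All block characterization runs: one per grouping per corner."""
    return [(corner, data, clock)
            for corner, (data, clock) in product(CORNERS, GROUPINGS)]


def build_parent_models(arcs_by_run, corner):
    """Pull arcs from different runs of ONE corner into setup/hold models.

    arcs_by_run maps (corner, data, clock) to a dict holding the
    'input_to_clock' and 'clock_to_output' arcs from that run.
    """
    setup_model = {
        # Late input data checked against an early clock.
        "input_to_clock": arcs_by_run[(corner, "max", "min")]["input_to_clock"],
        # Late clock followed by late output data.
        "clock_to_output": arcs_by_run[(corner, "max", "max")]["clock_to_output"],
    }
    hold_model = {
        # Early input data checked against a late clock.
        "input_to_clock": arcs_by_run[(corner, "min", "max")]["input_to_clock"],
        # Early clock followed by early output data.
        "clock_to_output": arcs_by_run[(corner, "min", "min")]["clock_to_output"],
    }
    return setup_model, hold_model


if __name__ == "__main__":
    runs = characterization_runs()
    print(len(runs), "block characterization runs")
    # Placeholder arcs: each arc just records which run it came from.
    arcs = {run: {"input_to_clock": run, "clock_to_output": run} for run in runs}
    for corner in CORNERS:
        print(corner, build_parent_models(arcs, corner))
```

The lookup key always carries the corner, so arcs from different process corners can never be mixed into the same parent model.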
Recombine arcs into setup-check model

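[Figure, top: block with Input Port, Output Port, and Clock Port; the input and output data paths through Comb Logic are annotated max, and the two clock-path Comb Logic groupings are annotated min and max]

Hierarchical boundary

[Figure, bottom: the abstracted model seen by the parent, with a Setup Constraint Timing Arc at the Input Port and an Output Propagation Timing Arc at the Output Port, both relative to the Clock Port, carrying the min/max annotations they were built from]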

Recombine arcs into hold-check model

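[Figure, top: block with Input Port, Output Port, and Clock Port; the input and output data paths through Comb Logic are annotated min, and the two clock-path Comb Logic groupings are annotated max and min]

Hierarchical boundary

[Figure, bottom: the abstracted model seen by the parent, with a Hold Constraint Timing Arc at the Input Port and an Output Propagation Timing Arc at the Output Port, both relative to the Clock Port, carrying the min/max annotations they were built from]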

Parent runs then get correct min/max treatment recursively

“Registers” can now be whole Mgate blocks

[Figure: Launching Register (D, Q) → Combinational Logic → Receiving Register (D, Q), with both clock pins traced back from Clock to the Clock Reconvergent Node]

Max RC includes Miller factors for all potential signal cross-coupling capacitances
Min RC contains only capacitance to vdd/vss/substrate

All annotated delays include additional multiplicative margin to account for:
• tool inaccuracies
• on-chip process tilt
• IR-drop supply variation

Setup Path Check:
• Max RC in clock feeding Launching Register
• Register clk->q prop delay
• Max logic choices in Combinational Logic
• Max RC in Combinational Logic
• Min RC in clock feeding Receiving Register
• Library Setup spec for Register

Hold Path Check:
• Min RC in clock feeding Launching Register
• Register clk->q prop delay
• Min logic choices in Combinational Logic
• Min RC in Combinational Logic
• Max RC in clock feeding Receiving Register
• Library Hold spec for Register + margin

Benefits of full-complementary timing analysis
More Robust, even for large integration complexity
• Both additive and multiplicative margin built into delay equations.
• Margins built into all setup/hold checks ensure functionality and timing performance
  corresponding to simulation.
• Let min/max methodology encompass clock-skew judgement, to take advantage of useful
  skew, and not penalize irrelevant skew.

Lower-power clock distribution
• Different blocks can have different clock-distribution skew requirements
• Enables tall-thin clock trees with reconvergent tracing of launching and receiving clock trees,
  applying min/max spread only where paths diverge.
• Increased shippable yield throughout process spread.
• Can take advantage of useful clock skew between registers and blocks

Automated, simultaneous setup + hold improvements
• Normal critical path analysis and optimization improves clock-tree as well as datapaths.
• Can allow buffer insertion and gate-sizing tools (example: Sequence Design PhysStudio) to
  improve clock tree as well as combinational logic.
• Delay buffers for hold-fixes inserted at points of maximum setup slack.
• Timing closure: touching fewer items each pass (primarily closure of only hold violations in
  later passes).

Summary: Min/Max Flow vs. Traditional Additive Margins

By adding min/max computation into the fundamental path timing
computation, normal critical path sorting will show where there is
the least slack, allowing optimization and designer attention to fix it.

This single robust method automatically handles:
• Analyzing clock tree together with the downstream data paths.
• Allowing different blocks to have different clock-distribution skew
  requirements (lower power & area, avoiding unnecessary gates and work)
• Taking advantage of useful clock skew between registers (automatic)
• Taking advantage of useful clock skew between blocks (crafted choices,
  but automatic verification)
• Requires no special casing for gated clocks that insert different local
  skew. Control signal for gated clock must come from a register inside
  block so its own setup/hold checks are handled at block level.

Min/Max timing analysis applies to Async performance
analysis too, even if not required for correctness.

Myth : Clock skew will stop synchronous progress
Problem: Clock skew does get worse in finer geometry
technologies, and with increasingly complex designs.

Reality: Methods that handle the skew analysis along
with other path analysis can tackle this problem, and
enable increasingly higher-frequency synchronous
designs.
The separation of logic verification and timing
verification will continue to promote synchronous
designs.

Myth : Asynchronous designs are safer
Problem: Process variation does get worse in finer
geometry technologies.
Variations in actual operating conditions (process,
voltage, temperature, coupling) do affect circuit speed.

Reality: Consistently handling min/max delays and RC
data can tackle this problem, and enable increasingly
robust synchronous designs too.

Analysis of the expected worst-case is always needed
anyway, to ensure correct functional behavior of the
enclosing system.

Myth : Completely new tools needed to handle Signal Integrity
Problem: Aggressor cross-coupling (as a percentage of
total capacitance) does get worse in finer geometry
technologies.

Reality: Aggressor couplings are one piece of delay
variation that can be accounted for using min/max RC
data in full-complementary timing analysis.
Min/Max analysis does not necessarily require the use
of “PrimeTime-SI” or any other “signal-integrity” tool.
These tools often haven’t implemented a multiplicative
min/max spread to account for extraction inaccuracy.
But, these tools do have advantage of more
sophisticated exclusion of aggressor couplings that
can’t occur due to non-overlapping switching windows.

Myth : Statistical timing takes the place of margins
Problem: A new theme is that variability in delays should be
handled by timing analysis that projects every delay to be statistical.
Good in the sense of adding an “error bar”, but bad by creating
the impression that other margins are no longer needed, even though they
are more appropriate for the calculable known effects, such as
aggressor coupling and power-supply drops.

Reality: When propagating through logic, min/max values may not
be gaussian or even symmetrical.
While statistical tools can expose some of the same potential faults,
they won’t be guaranteed to find them as exhaustively as the
min/max bounded approach will. The best use is to combine both
approaches, using a component of statistical variation on top of the
min/max approach for the other effects (such as aggressor coupling
and IR-drops) that won’t cancel out in long paths.
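
To make the combination concrete, here is a hedged sketch that puts a statistical component on top of bounded min/max spreads. It assumes the random per-stage variations are independent and gaussian, so they add as a root-sum-square, while the systematic spreads (such as aggressor coupling and IR-drop) add linearly because they do not cancel. The function names and the 3-sigma choice are illustrative assumptions.

```python
import math


def rss_spread(sigmas_ps):
    """Root-sum-square of independent per-stage sigmas (partial cancellation)."""
    return math.sqrt(sum(s * s for s in sigmas_ps))


def bounded_spread(spreads_ps):
    """Linear sum of per-stage spreads that are assumed NOT to cancel."""
    return sum(spreads_ps)


def combined_path_spread(random_sigmas_ps, systematic_spreads_ps, n_sigma=3.0):
    """Statistical component on top of the min/max bounded component."""
    return n_sigma * rss_spread(random_sigmas_ps) + bounded_spread(systematic_spreads_ps)


if __name__ == "__main__":
    stages = 16
    print("statistical only:", rss_spread([1.0] * stages))
    print("bounded only:", bounded_spread([1.0] * stages))
    print("combined:", combined_path_spread([1.0] * stages, [1.0] * stages))
```

For a long path the statistical term grows only with the square root of the stage count, which is exactly why it under-covers effects that do not cancel.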

Myth : Useful clock skew can only apply inside a block
Problem: Previous algorithms for taking advantage of
useful skew only help for individual signals.

Reality: It is possible to deliberately skew the arrival
time of the clock to whole regions, to adjust for the
harder direction of data travel. The fully-complementary
timing analysis can still completely verify the
interconnect timing across hierarchy.

[Figure: Block A and Block B connected by Data buses C and Data buses D, with both blocks fed from a common Clock]
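
A simplified illustration of skewing the clock to a whole region: delaying the clock of Block B gives the A-to-B buses more time and the B-to-A buses less. The sketch ignores setup times, clk->q delays, and hold checks, and the names and the midpoint rule are assumptions for illustration rather than the author's method.

```python
def region_setup_slacks(period_ps, delay_a_to_b_ps, delay_b_to_a_ps, skew_b_ps):
    """Setup slack in each direction when Block B's clock arrives skew_b_ps later."""
    slack_a_to_b = period_ps + skew_b_ps - delay_a_to_b_ps
    slack_b_to_a = period_ps - skew_b_ps - delay_b_to_a_ps
    return slack_a_to_b, slack_b_to_a


def balancing_skew_ps(delay_a_to_b_ps, delay_b_to_a_ps):
    """Skew on Block B's clock that equalizes the two setup slacks."""
    return (delay_a_to_b_ps - delay_b_to_a_ps) / 2


if __name__ == "__main__":
    period, a_to_b, b_to_a = 1000, 1100, 700  # illustrative ps values
    print("no skew:", region_setup_slacks(period, a_to_b, b_to_a, 0))
    skew = balancing_skew_ps(a_to_b, b_to_a)
    print("with skew", skew, "ps:", region_setup_slacks(period, a_to_b, b_to_a, skew))
```

The chosen skew is a crafted decision, but the min/max analysis at the parent level can still verify both directions automatically, including the hold checks omitted here.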

Myth : Async design means not thinking about timing
Problem: In an isolated sense, a chain of asynchronous
logic will still work independent of actual timing, but not
thinking about timing will almost assuredly mean
inadequate performance.

Reality: Good performance doesn’t come for free.
Even asynchronous pipelines must analyze timing
latency and throughput to ensure desired operation.

Myth : QDI design makes circuits independent of wire delay
Problem: QDI (Quasi-Delay-Insensitive) design focuses
on correctness, but can ignore performance.

Reality: The performance is still dependent on the
actual wire distances (delays).
QDI misses the point by hiding the issue, rather than
exposing it.
Without highlighting of performance loss, insufficient
attention is brought to wire distances and floorplanning.
Even after insertion of async handshaking buffers, the
design still needs “performance closure”.

Myth : Performance closure is easier in Async design
Problem: Performance through an asynchronous
pipeline is constrained by the local cycle times of
handshake loops, and forward and reverse latencies of
each stage. Any of these values can be influenced by
physical wire distance, or mis-sized gates.
Asynchronous designs also have “critical paths” that
determine performance, and they often aren’t as cleanly
partitioned, and therefore require more thought to find
and care to fix.

Reality: Synchronous designs allow better separation
of issues, and parallelizable effort.
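
The handshake-loop constraint on this slide can be illustrated with a deliberately simple model. Everything here is an assumption for illustration: a linear pipeline, a local cycle time equal to forward latency plus reverse latency plus a round trip on the wire to the next stage, and the hypothetical names `Stage` and `analyze`. Real handshake protocols have more detailed loop equations.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    forward_ps: int      # forward (data) latency of the stage
    reverse_ps: int      # reverse (acknowledge/reset) latency of the stage
    wire_ps: int = 0     # assumed one-way wire delay to the next stage

    def local_cycle_ps(self):
        # Assumed loop: forward, then reverse, plus a wire round trip.
        return self.forward_ps + self.reverse_ps + 2 * self.wire_ps

def analyze(stages):
    """Return (total forward latency, worst local cycle, index of that stage)."""
    latency = sum(s.forward_ps + s.wire_ps for s in stages)
    cycles = [s.local_cycle_ps() for s in stages]
    worst = max(cycles)
    return latency, worst, cycles.index(worst)

# Hypothetical pipeline where the middle stage drives a long wire.
pipeline = [Stage(100, 150), Stage(100, 150, wire_ps=200), Stage(120, 150)]
latency_ps, worst_cycle_ps, critical_stage = analyze(pipeline)
```

In this example the middle stage's 650 ps loop sets the throughput of the whole pipeline even though its gates are no slower than its neighbours', which is the kind of wire-driven critical path the slide says must be found and fixed.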

Myth : Asynchronous designs are more robust across operating voltage changes

Problem: Synchronous libraries are typically
characterized at only specific voltages.
But, the fundamental switching digital circuits do work
over the same ranges in both async and sync design.
The multiplicative factor in min/max complementary
timing can handle non-linear differences in the scaling
of delays with voltage for different gate types.

Reality: The real problem is in generation of the clock
of exactly the right frequency for the synchronous circuit
to just work. The fact that asynchronous designs are
“self-adjusting” is a valid advantage.

Myth : Timing should use instance-specific power-supply voltages

Problem: Power-supply IR drop does affect delay
timing (almost linearly). If ignored, this can cause
incorrect timing analysis.

Reality: The min/max timing method should choose a
multiplicative factor that does account for the potential
range of power-supply drops.
This will ensure correct robust handling.
But it is silly to rely upon the drops for correct operation,
which could happen if a timing tool tried to
calculate a single “corrected” delay accounting for
power-supply drop, without doing min/max bounding.
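
The risk named above can be shown with a small sketch. It assumes delay grows linearly with supply drop (the slide says "almost linearly"), and the function names, the 10% drop range, the 8% predicted drop, and the 105 ps requirement are all hypothetical.

```python
def delay_bounds(nominal_ps, max_drop_frac, sensitivity=1.0):
    """Min/max delay when the supply drop may be anywhere in [0, max_drop_frac]."""
    return nominal_ps, nominal_ps * (1.0 + sensitivity * max_drop_frac)

def corrected_delay(nominal_ps, predicted_drop_frac, sensitivity=1.0):
    """Single 'corrected' delay that trusts one predicted drop value."""
    return nominal_ps * (1.0 + sensitivity * predicted_drop_frac)

# Hypothetical min-delay (hold-style) requirement on a 100 ps path.
min_required_ps = 105.0
lo, hi = delay_bounds(100.0, max_drop_frac=0.10)
single = corrected_delay(100.0, predicted_drop_frac=0.08)

single_passes = single >= min_required_ps   # relies on the drop being present
bounded_passes = lo >= min_required_ps      # checks the no-drop case too
```

Here the single corrected value passes only because the predicted drop slows the path, while the min bound exposes that the path fails whenever the drop is absent.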

Myth : Async circuits lower power due to being data-driven
Problem: Circuits that transition when there is no need
for new data can waste power.

Reality: Much of the benefit of data-driven transitions is
attainable by clock gating in synchronous systems.

Further, dual-monotonic (dual-rail) pairs, often used in
asynchronous precharged logic, can actually consume
more power because they always must have one
polarity transition, instead of synchronous logic that
need not transition if the data values are unchanged.
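
A rough transition count makes the comparison above concrete. This sketch assumes every dual-rail bit is evaluated and precharged once per data word, counts only data-wire transitions (no clock, control, or glitch activity), and uses hypothetical function names and data.

```python
def static_transitions(words, width):
    """Bit toggles between consecutive words on a single-rail static bus."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))

def dual_rail_precharged_transitions(words, width):
    """Per word and per bit, one rail rises in evaluate and falls in precharge."""
    return 2 * width * len(words)

# Hypothetical 4-bit stream in which the value rarely changes.
stream = [0b1010, 0b1010, 0b1010, 0b1010, 0b1011]
static_count = static_transitions(stream, width=4)
dual_rail_count = dual_rail_precharged_transitions(stream, width=4)
```

For this slowly changing stream the static bus toggles once, while the dual-rail precharged bus makes 40 transitions, since its activity is independent of the data values.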

Synchronous designers believe myths too

Myth : Clock gating introduces too much skew
Problem: Inserting clock gating elements into the clock
tree does change the distribution latency, and can
complicate balancing that seeks matched arrival times.
Also, fine-grained gating introduces more irregularity,
making it less possible to match delays by replication.

Reality: The fully-complementary timing approach
enables easier clock-gating because it doesn’t have to
be an exception to the general timing methodology, and
doesn’t add to a skew budget where it is not used.
The normal min/max analysis and tracing of paths
(through potential clock-gating elements) back to the point
of clock-tree reconvergence fully analyzes this, and
therefore allows optimization.
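
The tracing back to the reconvergence point can be sketched as follows. The data layout (a list of `(element_name, min_ps, max_ps)` tuples from the clock root to each register pin), the function name, and the delay values are assumptions for illustration. Elements shared by both paths are treated as contributing no skew.

```python
def skew_bounds(launch_path, capture_path):
    """Bound (capture clock arrival - launch clock arrival) in ps.

    Shared leading elements lie before the reconvergence point and are
    dropped; only the diverging tails are bounded with min/max delays.
    """
    shared = 0
    for a, b in zip(launch_path, capture_path):
        if a[0] != b[0]:
            break
        shared += 1
    launch_tail = launch_path[shared:]
    capture_tail = capture_path[shared:]
    launch_min = sum(e[1] for e in launch_tail)
    launch_max = sum(e[2] for e in launch_tail)
    capture_min = sum(e[1] for e in capture_tail)
    capture_max = sum(e[2] for e in capture_tail)
    return capture_min - launch_max, capture_max - launch_min

# Hypothetical tree: the capture register sits behind a clock-gating cell.
launch = [("root_buf", 50, 70), ("buf_a", 40, 50)]
capture = [("root_buf", 50, 70), ("clock_gate", 45, 60), ("buf_b", 20, 25)]
skew_min, skew_max = skew_bounds(launch, capture)
```

The gating cell is handled like any other element on the path, so in this example the gated branch simply yields a skew range of 15 ps to 45 ps that the ordinary setup and hold checks can consume.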

Myth : Chip area is determined by transistor count
Problem: Design complexity is historically measured through transistor
count, or effective gate count. Density is usually quoted in gates/mm²,
with the implication of linear scaling.
Reality: Transistors (gates) are the free objects that sit under the
wires. Density of logic is determined by wire connectivity
requirements, for both sync & async.
Maximum total wire density is:
(num effective metal layers)/(wire pitch) = (wire length) / area
Example: A 6-layer process may effectively have 4 usable layers.
For a wire pitch of 0.5 µm, max density is:
4/0.5 = 8 wires per µm of width = 8000 mm wire / mm² = 8000 mm⁻¹
A 1.25 cm² chip would have maximum total wire length = 1 km of wire
(Analogous to “ideal” quoted density of back-to-back packed NAND gates)
Typically, routed regions will be lucky to get a third to half of this.
Quoting wire-utilization density also is a fairer way of expressing top-
level area and complexity, where there may be few gates.
Density measured in wire length/area is a better metric than gates/area.
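
The slide's arithmetic can be checked with a few lines. The function name is hypothetical; the inputs (4 usable layers, 0.5 µm pitch, 1.25 cm² die, one third to one half achieved) are the slide's own numbers.

```python
def max_wire_density_mm_per_mm2(usable_layers, pitch_um):
    """Upper bound on routed wire length per unit area, in mm of wire per mm^2."""
    wires_per_um = usable_layers / pitch_um   # parallel tracks per um of width
    return wires_per_um * 1000.0              # 1000 um per mm

density = max_wire_density_mm_per_mm2(usable_layers=4, pitch_um=0.5)
area_mm2 = 1.25 * 100.0                       # 1.25 cm^2 expressed in mm^2
max_total_km = density * area_mm2 / 1e6       # mm of wire converted to km

# The slide's "a third to half" of the ideal for routed regions.
typical_km = (max_total_km / 3.0, max_total_km / 2.0)
```

This reproduces the 8000 mm⁻¹ bound and the 1 km total, and puts the typical routed length for such a die at roughly 0.33 km to 0.5 km.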
Myth : “Synchronizers” fix asynchronous interfaces
Problem: Any time a signal of unknown phase is
sampled (including in an asynchronous arbiter), it might
be transitioning just at the wrong time, causing an
intermediate voltage value to be trapped.
Metastability can result in the trapped voltage persisting
indefinitely, with recovery based on the time constants
of the feedback loops to amplify the node away from its
midpoint.
Failures occur at the point where the signal branches,
and different receivers treat the analog voltages
differently.
Reality: Using multiple registers in series reduces the
failure probability exponentially by allowing recovery times
greater than a single clock cycle, but it never results in a
zero failure probability.
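
The exponential improvement can be illustrated with the commonly used synchronizer model MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). The talk does not give this formula or any parameters, so the model choice, the function name, and every number below are assumptions.

```python
import math

def synchronizer_mtbf_s(resolve_time_s, tau_s, window_s, f_clk_hz, f_data_hz):
    """Mean time between failures (seconds) under the assumed model.

    resolve_time_s: time allowed for a metastable node to recover
    tau_s:          regeneration time constant of the feedback loop
    window_s:       width of the vulnerable sampling window
    """
    return math.exp(resolve_time_s / tau_s) / (window_s * f_clk_hz * f_data_hz)

# Hypothetical parameters: 500 MHz clock (2 ns period), 50 MHz data rate.
params = dict(tau_s=50e-12, window_s=100e-12, f_clk_hz=500e6, f_data_hz=50e6)
mtbf_one_period = synchronizer_mtbf_s(resolve_time_s=2e-9, **params)
mtbf_two_periods = synchronizer_mtbf_s(resolve_time_s=4e-9, **params)
improvement = mtbf_two_periods / mtbf_one_period
```

Under these assumed numbers, a second clock period of recovery time multiplies the MTBF by exp(40), yet the failure rate 1/MTBF remains strictly positive for any finite recovery time.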
Valid: Asynchronous better enables precharged logic
Problem: In synchronous design, there is always a
challenge in how to generate precharge control signals.
Common methods either add an interval at the
beginning or end of a clock cycle, or use a clock phase
for precharging, wasting parts of a clock cycle.

Reality: Asynchronous handshake signals are natural
sources for precharge control. Dynamic precharged
logic is more self-consistent with asynchronous design,
and can provide a 2x performance advantage compared
with fully-complementary static logic.

Valid : Good timing principles apply equally to all circuits
Timing dominated by issues outside of gate internals
(wire RC, aggressor coupling, power supply variations).
Physics: Every number has an uncertainty, and
computations should use these bounds.
Min/Max margin analysis puts design improvements
where they really do add safety and performance, for
both sync and async designs.

Analysis of arrival time differences in distribution of
high-fanout clocks and asynchronous control signals.

Delivered performance is determined by signal timing.

Summary: Similar palettes for both
SoC design performance is all about Floorplanning and
Architecture that accounts for physical on-chip distance.
Gate-sizing and repeater insertion (“wire-synthesis”) must be
automated, because they are now fundamental.
Complexity and area are determined by wires, not transistors.
Domain crossings and sampling of unknown-phase signals must
always be handled with care, with correct synchronizers, and
quantified metastability MTBF.
Varying the power supply voltage allows speed/power tradeoff.
Optimizing transition count is optimizing power consumption.
Removing unnecessary low-skew and synchronization constraints
(at register clock pins) allows design to focus improvement where it
matters, and is a step toward more general GALS design.

Conclusions
Many of the historically claimed advantages of
asynchronous design are really Myths, because they
can be solved with equivalently good solutions in
synchronous design.
Example: Clock skew effectively handled through
better techniques that minimize overall penalty.

Asynchronous design can be advantageous for
performance in self-timed precharged pipelines.

Asynchronous designers tend to be good circuit
designers because they don’t get overly
constrained by thinking only in a rigid synchronous
framework.