Professional Documents
Culture Documents
Incorrect timing in next logic stage
Logic failure
17
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
What can happen with a timing violation?
D changes during setup and hold
Ck
D
Q
t
hold
t
SetUp
Q changes in WRONG clock cycle
- Racethrough
18
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Latch Based Design
D-latch
Q follows D while clock is high
(transparent)
Value on D when clock goes low
is stored on Q
Set-up and hold times:
D can not change close to the falling
(latching) clock edge.
t_clock-Q
Delay from clock going high to Q
changing
t_D-Q
Delay from D changing to Q changing
while clock high (assumed = t_ck-Q
here)
D Q
Ck
Ck
D
Q
t
hold
t
SetUp
t
clock-Q
Register Register
Comb
Logic
+/-t
skew
D1
Q1
D2
Q2
t
Logic-delay
19
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Latch Timing Constraints
Set-up constraints under nominal design same as for flip-flop
Transparency of latch can be used to improve flexibility of timing
e.g.: If critical path is in logic block 1:
skew up set ic Q clock clock
t t t t t + + + >
max log max
t1 t2
clock
t1 t2
t1 longer than for FF exploit transparent latch
20
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Latch timing constraints WITH cycle stealing
To prevent set-up violations:
clock
clock
Q1
D2
Notes:
The percentage of time the clock is high is referred to as the duty-cycle
If part of the following clock-high time is used to allow this logic to be slower, then
the logic-block connected to Q2 must be proportionally faster
Using the clock-high time like this is called cycle-stealing
tskew
tclock-Q
tlogic
tset-up
tslack
skew up set ic Q clock high clock clock
t t t t t t + + + > +
max log max max
21
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Latch Set Up Violations
Notes:
The percentage of time the clock is high is referred to as the duty-cycle
If part of the following clock-high time is used to allow this logic to be
slower, then the logic-block connected to Q2 must be proportionally
faster
Using the clock-high time like this is called cycle-stealing
Normally cycle stealing is not enabled
22
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Latches Cycle Stealing
Can use up to a total of t_clock_high within a pipeline structure, to help
in timing closure
Example:
t1 t2 t3
clock
t1 t2
t1,t2 > tclock (cycle stealing)
What can t3 be?
T3=tclock
t3
23
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Latch timing constraints
To prevent hold violations:
clock
clock
Q1
D2
Note:
Hold violations are harder to prevent in latch-based designs
tskew
tlogic
thold
Hold
Violation
min log min max
+ s + +
ic Q clock skew hold high clock
t t t t t
24
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Revision So far
What is a set-up violation?
How is a set-up violation fixed?
What is a hold-violation?
How is a hold-violation fixed?
Why are edge-triggered flip-flops preferred over latches?
D input for flip-flop changing after set-up time before clock edge
D input for flip-flop changing during hold period after clock edge, as a
result of race-through
Reduce critical path, or increase clock period
Increase shortest logic path(s)
Much less suscpetible to hold violations
25
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Example:
D-flip-flop
D-flip-flop
min : typ : max
Tclock-Q = 3 : 4 : 5
TNOR = 1 : 2 : 3
Tsu_max = 1
Thold_max = 2
Tskew = 1 ns
If this is the critical path, what is the fastest clock frequency?
Is there potential for a hold violation?
skew up set ic Q clock clock
t t t t t + + + >
max log max
= 5 + 6 + 1 + 1 = 13 ns I.e. 76 MHz
Setup:
Hold:
min log min
+ s +
ic Q clock skew hold
t t t t
Is
?
2 + 1
3 + 1
< OK. No hold violation
26
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
CMOS Drive Strength
Revision: CMOS transistors operating in the linear region:
on ds ds
ox
ds ds t gs ds
R V I
L W
L W t
V V V V I
/
ion, approximat first a To i.e.
length channel the is and r width, transisto the is where
) / )( ( where
2 / ) ((
2
~
/ =
=
c |
|
Thus, delay in CMOS circuits depends largely on W/L of the drive transistor
and the capacitance of the load it is driving.
Different drive cell sizes can be selected by synthesis tool (x1, x2, x3, etc.)
That capacitance consists of:
Input gates of cells being driven, and Capacitance of wiring
R
on
~ 1/ |(V
GS
-V
T
) t = R
on
C
load
27
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Estimating and Improving Performance
With a focus on timing:
Topics:
Metrics : FO-4
Typical timing budgets
Pipelining and Parallelism
Logic style
28
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Delay Metric
Usual Metric for delay:
Fanout of 4 inverter delay: FO4
Estimating FO4:
Typical ~ 360 x Leff (ps)
Worst Case ~ 600 x Leff (ps)
Leff = Effective gate length in um ~ 0.7 x Ldrawn
E.g. In a 0.18 um process, Leff = 0.126 um and FO-4 <= 75 ps
Exemplar delays:
Inverter = FO-4 2-input NAND gate = 2 FO4
1-bit adder = 10 FO4 2-input Multiplexer = 4 FO4
Flip-flop t_cp-Q = 4 FO4 Flip-flop t_su / t_h = 2 FO4
Clock skew = 4 FO4
Clock jitter = 2 FO4
29
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Examples of Improving Timing Performance
Example 1 : Benefits of Pipelining and Parellism
Example:
If t_comparator = 20 FO4, what is the clock period?
(Use values on previous page)
Compare
Compare
Compare
T_cp = t_ck-Q + t_logic + t_su + t_skew + t_jitter
= 4 + 20*3 + 2 + 4 + 2 = 72 FO4
Critical path
30
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Pipelining
Replace with:
What is the delay improvement?
What is the drawback?
Compare
Compare
Compare
T_cp = t_ck-Q + t_logic + t_su + t_skew + t_jitter
= 4 + 20 + 2 + 4 + 2 = 32 FO4
2.25x (note: NOT 3x)
Increased area AND power
BUT performance/area increased
31
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Logic Level Parallelism
Replace with:
Clock Period = 52 FO-4
No increase in area
Compare
Compare
Compare
To see how to code these three structures in Verilog, please refer to the end
of the Verilog1 notes.
32
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Retiming
Impact of critical paths can often be reduced by retiming or rebalancing a
design:
Example:
Before:
After:
Note: Clock level logic sequence has been changed
20FO4 5FO4 10FO4
T_cp = 4 + 20 + 5 + 2 + 4 + 2 = 37 FO4
20FO4
5FO4
10FO4
T_cp = 4 + 20 + 2 + 4 + 2 = 32 FO4
33
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Timing in CAD Flow
During synthesis
Tool calculates path delays under worst-case delay conditions
Determines critical path
Moves logic to a faster path if setup violation predicted
Some tools do a preliminary placement while doing logic synthesis so
that wiring delay can be properly estimated
After synthesis:
Perform Static Timing Analysis
Determine no setup violations exist under worst case conditions
Determine no hold violations exist under best case conditions
After place and route:
Perform Timing Analysis
i.e. Run timing verification tools on netlist with actual delays
Back-annotate actual delays to netlist from later tools
34
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Initial Delay Estimation Flow
Primetime: Gate-level static timing
analysis tool
Report timing for critical path
Back-annotate wire parasitics for
more accuracy
SPEF file from place and route tool
Start
Load, link, and
constrain design in
Pr i m et i m e
Verilog structural
description of
design (post clock
tree synthesis)
Post-route SPEF
file from SoC
Encoutner
Back-annotate
wiring parasitics in
Pr i m et i m e
Report timing in
Pr i m et i m e
Finish
35
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Sidebar
Not Examinable!
What is asynchronous design?
Can we use deliberate local clock skew in a design?
Are you sure flip-flops are better, cycle stealing sounds
useful?
Message: The CAD tools capabilities
constrain your design flexibility.
36
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Remember
Methodology for purposes of ECE 520
If at all possible one-edge of one clock
If you need multiple clocks, they must have a common root and be
related by factors of 2
E.g. Root clock : Tclock = 5 ns
This is BEST: Tclock only!!
This is OK:
Tclock, Tclock10, Tclock20
Tclock10 = 10 ns (Tclock*2)
Tclock20 = 20 ns (Tclock*4)
This is NOT OK
Tclock, Tclock15, Tclock17
Tclock15 = 15 ns (Tclock*3)
Tclock17 = 17 ns (??)
ONE CLOCK
ONE EDGE
37
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Summary
What determines the maximum clock frequency?
What is a hold violation?
Why do we prefer flip-flop designs over using latches?
What tool is used to check timing at design closure?
Total delay in critical path + Clock skew
Determined in part by maximum logic depth
When logic change runs through storage elements in sequence
in one clock edge.
Determined in part by minimum logic depth
Less potential for hold violations
Static timing analysis, after logic & RC extraction.
38
2012 Dr. Paul D. Franzon, www.ece.ncsu.edu/erl/faculty/paulf.html
ECE 520 Class Notes
Summary
What determines clock skew?
Can the designer set the clock skew to be positive or negative?
Does the logic design have a strong impact on achieving a desired clock
speed?
Random variations in clock distribution delay
No, (usually) clock skew can be either, depending on random result
Yes, design structure has a STRONG impact