
SI "prevention and optimization" Flow for Clock and Signal Nets in UDSM Designs

Panditharadhya Maregoud
Tirendra Kumar
Jwalant Trivedi

The Science of ASICs

Agenda

Motivation
Challenges and Solutions
Conclusion
Acknowledgements
Q&A


Motivation
SI closure is becoming more and more challenging at 40nm and below, and SI can often become a serious hindrance to achieving the desired performance of the SoC.

Critical-path optimization is already exhausted in meeting base timing (without SI), so SI can sometimes cause more than 50MHz of degradation.

The default flow does not provide enough mechanisms for SI fixing to achieve last-minute SI closure.


Challenges and Solutions

SI-free clock network, mostly without shielding
Using MaxDist along with slew constraints can help build an SI-free clock network without shielding.

Selective aggressor downsizing
Timing-aware downsizing of the aggressor net driver to fix SI issues.

Routing-based SI fixes
Change the routing topology of aggressor and/or victim net segments.

AAE for simultaneous base and SI delay optimization
Using AAE/CoE to fix base and SI delay concurrently.


SI free clock network without shielding

Three runs were launched with the following options:
1. No MaxDistance and no shielding
2. MaxDistance of 500um but no shielding
3. Shielding but no MaxDistance
In each run the clock tree is built, and the clock and signal nets are routed. No timing optimization is done. Timing is reported after the routing stage, with SI.
Design details:
14 million placeable instances
3 functional clocks, with 700MHz max frequency
12mm x 12mm chip
40nm, flip-chip

SI free clock network without shielding


Metric                                 No MaxDistance,   MaxDistance 500um,   Shielding,
                                       no shielding      no shielding         no MaxDistance
Nr. of Subtrees                        13                13                   13
Nr. of Sinks                           387               387                  387
Nr. of Buffer                          377               501                  370
Nr. of Level (including gates)         59                66                   63
Root Rise Input Tran (ps)              0.1               0.1                  0.1
Root Fall Input Tran (ps)              0.1               0.1                  0.1
Max trig. edge delay at sink(R) (ps)   4105.4            3750.4               4246.3
Min trig. edge delay at sink(R) (ps)   3926.8            3584.1               3996.8
Skew, view setup_func (ps)             178.6             166.3                249.5
Skew, view hold_func (ps)              187.9             164.7                241.4


SI free clock network without shielding


Metric                                 No MaxDistance,   MaxDistance 500um,   Shielding,
                                       no shielding      no shielding         no MaxDistance
Nr. of Subtrees                        13                13                   13
Nr. of Sinks                           229               229                  229
Nr. of Buffer                          311               397                  281
Nr. of Level (including gates)         43                53                   44
Root Rise Input Tran (ps)              0.1               0.1                  0.1
Root Fall Input Tran (ps)              0.1               0.1                  0.1
Max trig. edge delay at sink(R) (ps)   3101.1            2886.5               3119.9
Min trig. edge delay at sink(R) (ps)   2976.2            2750.1               2950.7
Skew, view setup_func (ps)             124.9             136.4                169.2
Skew, view hold_func (ps)              122.8             131.4                161.7
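
The skew figures in the two clock tree reports can be cross-checked against the reported sink delays. The following is a minimal Python sketch, assuming skew is simply the maximum minus the minimum trigger-edge delay at the sinks; `report_1` and `report_2` are illustrative labels for the first and second report, not names taken from the slides.

```python
# Max and min trigger-edge delays at the sinks (ps), copied from the reports.
# "report_1" / "report_2" are illustrative labels, not names from the slides.
SINK_DELAYS_PS = {
    "report_1": {
        "no MaxDistance, no shielding": (4105.4, 3926.8),
        "MaxDistance 500um, no shielding": (3750.4, 3584.1),
        "shielding, no MaxDistance": (4246.3, 3996.8),
    },
    "report_2": {
        "no MaxDistance, no shielding": (3101.1, 2976.2),
        "MaxDistance 500um, no shielding": (2886.5, 2750.1),
        "shielding, no MaxDistance": (3119.9, 2950.7),
    },
}


def skew_ps(max_delay_ps, min_delay_ps):
    """Skew taken as max minus min sink delay, rounded to 0.1 ps."""
    return round(max_delay_ps - min_delay_ps, 1)


skews = {
    report: {run: skew_ps(*delays) for run, delays in runs.items()}
    for report, runs in SINK_DELAYS_PS.items()
}

for report, runs in skews.items():
    for run, skew in runs.items():
        print(f"{report:9s} {run:32s} skew = {skew} ps")
```

Under this assumption the computed values line up with the setup_func skew figures in the reports. The hold_func skews come from a different analysis view and are not reproduced by this check.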


SI free clock network without shielding


Delays (ns)                 No MaxDist,   MaxDist 500,   Shielding,
                            no shield     no shield      no MaxDist

Capture clock path delay    1.835         1.672          1.885
Launch clock path delay     2.94          2.814          3.013
Clock period                2             2              2
Setup time                  0.092         0.104          0.1

Capture clock path delay    1.835         1.672          1.885
Launch clock path delay     2.784         2.658          2.852
Clock period                2             2              2
Setup time                  0.086         0.098          0.103

Capture clock path delay    3.328         3.027          3.449
Launch clock path delay     3.919         3.585          4.066
Clock period                1.33          1.33           1.33
Setup time                  0.064         0.064          0.064

Capture clock path delay    1.299         1.123          1.327
Launch clock path delay     1.793         1.641          1.74
Clock period                1.6           1.6            1.6
Setup time                  0.25          0.248          0.265

Observation: Although the MaxDistance 500um run uses more clock buffers, its clock latency and skew are generally lower than in the other two runs, and the timing looks more positive.
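
The trade-off in this observation can be put in numbers. The sketch below is a rough illustration in Python that compares the MaxDistance 500um run against the run with no MaxDistance and no shielding, using the buffer counts and maximum sink delays from the two clock tree reports. Using the maximum sink delay as a stand-in for clock latency is an assumption, and `report_1` / `report_2` are illustrative labels.

```python
def pct_change(baseline, value):
    """Relative change of value versus baseline, in percent (0.1% resolution)."""
    return round(100.0 * (value - baseline) / baseline, 1)


# (baseline run, MaxDistance 500um run), values copied from the reports.
# "report_1" / "report_2" are illustrative labels, not names from the slides.
RUNS = {
    "report_1": {"buffers": (377, 501), "max_delay_ps": (4105.4, 3750.4)},
    "report_2": {"buffers": (311, 397), "max_delay_ps": (3101.1, 2886.5)},
}

changes = {
    report: {metric: pct_change(*pair) for metric, pair in metrics.items()}
    for report, metrics in RUNS.items()
}

for report, metrics in changes.items():
    print(report, metrics)
```

A positive number means the MaxDistance run is higher than the baseline (more buffers), and a negative number means it is lower (shorter maximum sink delay).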


Selective Aggressor Downsizing Flow

By default, EDI does not do aggressor downsizing because it makes the design hard to converge.

We used selective aggressor downsizing as one of the approaches.

We saw a good amount of TNS reduction using this method.


Flow for selective Aggressor downsizing

By default, EDI does not dump the list of aggressor and victim pairs.
Use the following command to get the aggressor and victim pairs:
setSIMode -insCeltICPostTcl {generate_report -delay max -threshold 0.0 -txtfile <file name>}
Collect the victim nets on critical timing paths that have an incr_delay (SI delay) of more than 5ps.
Get the list of aggressor nets for these victim nets.
Get the list of drivers for these aggressor nets.
Put dont_touch on the entire design except these aggressor driver instances, and use reclaimArea to downsize them.
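
The selection steps above can be sketched in Python. This is only an illustration of the filtering logic, assuming the report has already been parsed into simple dictionaries; the data structures, net names and instance names below are hypothetical and do not reflect the actual report format.

```python
def select_aggressor_drivers(victim_incr_delay_ps, critical_nets,
                             aggressors_of, driver_of, threshold_ps=5.0):
    """Return driver instances of aggressors that hurt critical victim nets.

    victim_incr_delay_ps: {victim net: SI incremental delay in ps}
    critical_nets:        set of nets that lie on critical timing paths
    aggressors_of:        {victim net: list of aggressor nets}
    driver_of:            {net: driving instance name}
    All four inputs are hypothetical, pre-parsed structures.
    """
    # Step 1: victims on critical paths with SI delay above the threshold.
    victims = {
        net for net, delay in victim_incr_delay_ps.items()
        if delay > threshold_ps and net in critical_nets
    }
    # Step 2: all aggressor nets coupled to those victims.
    aggressors = set()
    for victim in victims:
        aggressors.update(aggressors_of.get(victim, ()))
    # Step 3: the instances driving those aggressor nets.
    return sorted(driver_of[net] for net in aggressors if net in driver_of)


# Hypothetical example data.
drivers = select_aggressor_drivers(
    victim_incr_delay_ps={"n1": 7.2, "n2": 3.0, "n3": 12.5},
    critical_nets={"n1", "n2"},
    aggressors_of={"n1": ["a1", "a2"], "n2": ["a3"], "n3": ["a4"]},
    driver_of={"a1": "U10", "a2": "U11", "a3": "U12", "a4": "U13"},
)
print(drivers)
```

In this example only n1 qualifies: n2 is below the 5ps threshold and n3 is not on a critical path. The returned instances are the ones that would be left without dont_touch before running reclaimArea.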


Results of Selective Aggressor Downsizing Flow

Timing QOR Summary without resizing flow

Setup Mode        all       reg2reg   in2reg   reg2out   clkgate   default   m2reg    reg2m
WNS (ns)          -0.072    -0.062    -0.072   -0.015    -0.063    0         -0.039   -0.03
TNS (ns)          -56.327   -46.05    -2.222   -0.019    -4.384    0         -4.364   -0.055
Violating Paths   4986      4279      135      2         305       0         350      9

Timing QOR Summary with Selective Resizing Flow

Setup Mode        all       reg2reg   in2reg   reg2out   clkgate   default   m2reg    reg2m
WNS (ns)          -0.54     -0.54     -0.012   0.037     -0.053    0         -0.048   0.004
TNS (ns)          -41.024   -34.599   -0.082   0         -5.116    0         -3.059   0
Violating Paths   2440      2055      12       0         297       0         217      0
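
As a worked check of the TNS reduction claimed for this flow, the short Python sketch below computes the relative improvement of the "all" column between the two summaries. The numbers are copied from the two tables; the helper name is illustrative.

```python
def reduction_pct(before, after):
    """Percentage by which the magnitude shrinks from before to after."""
    return round(100.0 * (abs(before) - abs(after)) / abs(before), 1)


# "all" column of the two setup-mode summaries.
without_resizing = {"tns_ns": -56.327, "violating_paths": 4986}
with_resizing = {"tns_ns": -41.024, "violating_paths": 2440}

improvement = {
    key: reduction_pct(without_resizing[key], with_resizing[key])
    for key in without_resizing
}
print(improvement)
```

This covers only TNS and the violating-path count of the "all" column; WNS is left out of the calculation.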


Routing based SI fixes

By default, NR (the signoff router) does a pretty good job of creating an SI-preventive routing topology.
NR uses a heuristic approach to create the SI-preventive routing topology.
At the final SI analysis stage, you might see cases where a small routing change can correct small SI violations.
It is difficult to fix small, distributed SI with optimization transforms; a routing topology change is the best bet to fix it.


Flow used for routing changes

Use the same flow as described in the selective aggressor downsizing flow to identify the aggressor and victim pairs.
dbNetDeleteAggrNearestWire deletes the nearest overlapping segments in both the victim and the aggressor. If you only want to rip up the aggressor, pass the last argument as 0.

dbNetDeleteAggrNearestWire <vicNetPtr> <aggrNetPtr> 1 1
dbSetIsNetRouteDirty <aggrNetPtr> 1
dbSetIsNetRouteDirty <vicNetPtr> 1
dbSetNetPrefExtraSpace <vicNetPtr> 1

We have seen 3 to 20 ps of improvement on violating paths.


Simultaneous base + SI delay timing optimization Flow


The present Celtic-based closure within EDI has a higher runtime and uses a very limited set of transforms for SI fixing:
> Buffer insertion
> Cell sizing

Cadence came up with the Advanced Analysis Engine (AAE) and combined it with GigaOpt (the same optimizer for pre-route and post-route transforms). This flow allows much more powerful optimization tricks at the post-route stage.
AAE allows simultaneous base delay + SI delay calculation.
The engine is incremental and multi-threaded, which speeds up every timing analysis run.
In the postRoute stage, setup timing, hold timing and leakage/dynamic power optimizations use AAE when it is enabled.


Simultaneous base + SI delay timing optimization

How to use?
setDelayCalMode -SIAware true
optDesign -postRoute
This forces combined base + SI delay setup timing optimization.

Notes
For setup optimization, a single call to optDesign -postRoute replaces the traditional regular + SI flow (a.k.a. optDesign -postRoute + optDesign -postRoute -si).
The same concept applies to the hold optimization flow.
The timing reported at the end of optDesign -postRoute comes from the SI estimation engine used to drive postroute optimization. Use timeDesign -si with setDelayCalMode -siAware false -engine signalStorm -signoff true to get accurate signoff SI timing.

EDI9.1 (optimization engine: SGS, signoff engine: SGS)
optDesign -postRoute
optDesign -postRoute -si
optDesign -postRoute -hold
optDesign -postRoute -si -hold
timeDesign -signOff -si
timeDesign -signOff -si -hold

EDI10.1 (optimization engine: AAE, signoff engine: SGS)
optDesign -postRoute
optDesign -postRoute -hold
timeDesign -signOff -si
timeDesign -signOff -si -hold


Flow advantage of AAE

Result with default flow

Setup Mode        all       reg2reg   in2reg    reg2out   clkgate   default   m2reg    reg2m
WNS (ns)          -0.166    -0.166    -0.097    0.079     -0.083    0         -0.059   -0.02
TNS (ns)          -73.76    -52.792   -10.067   0         -9.003    0         -6.717   -0.084
Violating Paths   3775      2823      453       0         356       0         424      7

Result with AAE based flow

Setup Mode        all       reg2reg   in2reg    reg2out   clkgate   default   m2reg    reg2m
WNS (ns)          -0.194    -0.162    -0.194    0.082     -0.087    0         -0.04    -0.008
TNS (ns)          -37.151   -25.33    -6.492    0         -4.768    0         -1.346   -0.033
Violating Paths   2157      1084      850       0         158       0         159      8
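
To see where the AAE based flow helps most, the TNS rows of the two result tables can be compared per path group. The Python sketch below is an illustrative calculation on the numbers from the tables; groups with zero TNS in the default flow are reported as `None` because no percentage can be formed.

```python
# TNS (ns) per path group, copied from the two result tables.
DEFAULT_TNS = {"all": -73.76, "reg2reg": -52.792, "in2reg": -10.067,
               "reg2out": 0.0, "clkgate": -9.003, "default": 0.0,
               "m2reg": -6.717, "reg2m": -0.084}
AAE_TNS = {"all": -37.151, "reg2reg": -25.33, "in2reg": -6.492,
           "reg2out": 0.0, "clkgate": -4.768, "default": 0.0,
           "m2reg": -1.346, "reg2m": -0.033}


def tns_reduction_by_group(before, after):
    """Percent TNS reduction per group; None where the baseline TNS is zero."""
    result = {}
    for group, base in before.items():
        if base == 0:
            result[group] = None
        else:
            result[group] = round(100.0 * (abs(base) - abs(after[group])) / abs(base), 1)
    return result


tns_gain = tns_reduction_by_group(DEFAULT_TNS, AAE_TNS)
for group, gain in tns_gain.items():
    print(f"{group:8s} {gain}")
```

The comparison covers TNS only. In the same tables the overall WNS moves from -0.166 ns to -0.194 ns and the in2reg violating-path count rises from 453 to 850, so the gain shows up in total negative slack and in the overall number of violating paths rather than in the worst path.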


Conclusion
Use unconventional methods to achieve final SI closure.
Shielding can be used for chips with certain applications, but MaxDist should also be considered as an alternative (and sometimes a better alternative) for achieving an SI-free clock network.
Careful and selective aggressor downsizing can help not only in reducing SI but also in saving power and area.
Routing-based SI fixes should be used for fixing small but distributed SI issues on a given timing path.


Acknowledgements

Shrikirshna Mehetre - Open-Silicon


Cadence Team


Q&A
