
LEAKAGE POWER OPTIMIZATION FLOW

GURUDEV BHAT SIRSI








CADENCE DESIGN SYSTEMS
408-944-7431
gurudev@cadence.com


















INTERNATIONAL CADENCE USERS GROUP CONFERENCE
September 13-15, 2004
Santa Clara, CA



ABSTRACT

Power has become a major issue in today's sub-micron SoC designs. It is now essential to control and
address power dissipation throughout the design cycle, starting at the architectural level. For
technologies at 90 nm and below, leakage dominates over dynamic power and contributes nearly 50% or
more of total power dissipation. Device modeling is the key factor in achieving the best trade-off
between power and other characteristics. This paper proposes a power optimization flow using CAD
tools for ASIC design that optimizes leakage power from RTL synthesis through the place-and-route
stage without affecting circuit speed. Analysis has also been carried out at various stages of the
design, with due consideration to the impact of the power optimization techniques on delay.

INTRODUCTION

As design geometries shrink into the ultra-deep sub-micron range and density increases to millions of gates
in a system on chip, the large power dissipation due to sub-threshold leakage is becoming difficult to
control in practice. It is gaining importance because it is becoming the dominant component of overall
chip power. Much recent research aims to extend battery life, as handheld mobile devices are densely
packed with multiple functions.
Power consumption in silicon consists primarily of two components, dynamic power and static power,
represented as:

$$\langle P_{total} \rangle = \langle P_{dynamic} \rangle + \langle P_{static} \rangle \qquad (1.1)$$

Static power comprises leakage power due to sub-threshold current and standby power. I call this passive
energy, because the leakage is dissipated while the device is not in its active mode of operation, for a
duration t_off. Some leakage power is also dissipated during the active mode. Passive energy has emerged
as one of the important design parameters in sub-micron low-power systems, where battery life is critical
for today's portable electronic gadgets. This pushes the limits of design performance, with the
power-delay product as a measurable parameter. Leakage power dissipation can never be made zero; it can
only be minimized.

Multiple techniques have been used in the past to reduce dynamic power dissipation, and they have been
implemented successfully at different levels of design abstraction. Leakage power is a growing concern
because process scaling brings down the supply voltage without a proportionate scaling of the threshold
voltage, V_t. Sub-threshold leakage increases exponentially as the threshold voltage is reduced.

[Figure 1 plot: sub-threshold leakage power (y-axis, 0% to 50%) versus technology scaling (x-axis: 0.18u, 0.13u, 0.09u, 0.065u, 0.045u)]
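
The exponential dependence of sub-threshold leakage on threshold voltage can be illustrated with the
standard first-order model I_sub ~ exp(-V_t / (n * v_T)). The sketch below is not part of the original
flow. The slope factor n = 1.5, the thermal voltage of about 26 mV and the two V_t values are assumed
purely for illustration.

```python
import math

THERMAL_VOLTAGE_V = 0.026  # kT/q near room temperature (assumed value)
SLOPE_FACTOR = 1.5         # sub-threshold slope factor n (assumed value)


def relative_leakage(vt_high: float, vt_low: float) -> float:
    """Ratio of sub-threshold leakage at vt_low to that at vt_high,
    using the first-order model I_sub ~ exp(-Vt / (n * vT))."""
    return math.exp((vt_high - vt_low) / (SLOPE_FACTOR * THERMAL_VOLTAGE_V))


# Hypothetical thresholds: lowering Vt by 100 mV multiplies leakage by this factor.
ratio = relative_leakage(vt_high=0.40, vt_low=0.30)
print(f"Leakage increase for a 100 mV lower Vt: {ratio:.1f}x")
```

Under these assumed parameters, a 100 mV reduction in V_t raises leakage by roughly an order of
magnitude, which is why the choice between low-V_t and high-V_t cells matters so much for static power.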

Gate delay increases as the threshold voltage increases (high V_t), whereas static power decreases as the
threshold voltage increases. This can be represented by a simple gate delay equation:

$$T_d = \frac{C_L V_{dd}}{(V_{dd} - V_t)^{\alpha}} \qquad (1.2)$$

In equation 1.2, T_d is the propagation delay, C_L is the load capacitance, V_dd is the supply voltage
and V_t is the threshold voltage of the transistor. The coefficient α, which appears as the exponent,
represents the effect of shortening the device channel length (technology scaling).
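
To make the delay penalty of a higher threshold concrete, equation 1.2 can be evaluated for two
threshold voltages. This is a minimal sketch: the supply voltage, the two V_t values, the unit load and
α = 1.3 are hypothetical numbers, not data from the designs in this paper.

```python
def gate_delay(c_load: float, vdd: float, vt: float, alpha: float = 1.3) -> float:
    """Relative gate delay from equation 1.2: C_L * V_dd / (V_dd - V_t)^alpha."""
    if vdd <= vt:
        raise ValueError("V_dd must exceed V_t for the gate to switch")
    return c_load * vdd / (vdd - vt) ** alpha


# Hypothetical values: 1.2 V supply, unit load, low Vt = 0.25 V, high Vt = 0.40 V.
delay_lvt = gate_delay(c_load=1.0, vdd=1.2, vt=0.25)
delay_hvt = gate_delay(c_load=1.0, vdd=1.2, vt=0.40)
print(f"HVT cell is {delay_hvt / delay_lvt:.2f}x slower than the LVT cell")
```

The ratio depends only on (V_dd - V_t) and α, so the same supply and load give a slower gate whenever
V_t is raised.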

Meanwhile, interconnect delay has come to dominate gate delay, which causes many issues in meeting chip
performance parameters such as power, delay, area and signal integrity. To reduce the interconnect delay
on a path, the path must be buffered with a chain of repeaters, accounting for the driver size and the
load size, so that the delay constraint on that path is met. The repeaters in this chain should be
correctly sized and characterized for low power so that both the timing and the power constraints are
met. This is a tedious task for a CAD tool, although the tool may arrive at a good trade-off. Designers
try different circuit techniques and schemes to minimize leakage power at the expense of circuit speed.
Controlling leakage power tends to degrade circuit speed, which is a real problem in any
high-performance design.

All of the factors discussed above need to be considered carefully at each level of design abstraction
for deep sub-micron SoC designs.
In this paper, I propose an optimization flow that selects between low-V_t and high-V_t cells to reduce
leakage power while meeting delay constraints, from RTL down to the placed-and-routed stage, in both the
active and the static or standby modes of the circuit. The optimization flow described in this paper,
together with circuit design techniques, can help achieve low-power, high-performance chips.

Figure 1: Rate of increase of sub-threshold leakage power with the scaling of device technology.

For integrated circuits at 90 nm and beyond, leakage power contributes a substantial percentage of
total power, so there is a great need for novel techniques to reduce it.

ASSUMPTIONS

The max (worst-case) corner is the specified corner throughout the flow for analyzing both delay and
leakage power. The optimizations are based on the given RTL, a given clock frequency and the optimal
supply voltages. Some paths in the design might have stringent delay requirements. Only leakage in the
logic portion is considered; memory is neglected in building the design flow.

LEAKAGE POWER OPTIMIZATION METHODOLOGY

Given the above discussion, a good optimal starting point is necessary when both the leakage power and
the speed of the design are critical. Once the architectural work is complete and a qualified RTL for the
design is ready, the remaining work is left to the design automation tools, from synthesis all the way to
the final routed stage. Several factors affect the performance of a low-power design.

1. Area Reclamation

Chip density has received almost no attention in past discussions of low power. Reducing the density of
the chip by removing redundant logic and unused components is always a very good starting point. It has
several advantages: it reduces the number of resources used and reduces the overall congestion of the
design. It also reduces the amount of interconnect and hence the associated delays. This is a very
important first step in optimizing high-performance designs.
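
One simple form of area reclamation is removing logic that no primary output depends on. The sketch
below illustrates that idea on a toy netlist. It is an assumption-laden example, not the algorithm
used by the Cadence tools: the netlist representation (a mapping from each cell to its fan-in cells)
and the function name `reclaim_unused_cells` are hypothetical.

```python
def reclaim_unused_cells(fanin: dict[str, list[str]], outputs: list[str]) -> set[str]:
    """Return the cells that can be removed because no primary output depends on them."""
    live: set[str] = set()
    stack = list(outputs)
    # Walk backwards from the outputs, marking every cell in their fan-in cones.
    while stack:
        cell = stack.pop()
        if cell in live:
            continue
        live.add(cell)
        stack.extend(fanin.get(cell, []))
    return set(fanin) - live


# Toy netlist: g2 and g3 drive nothing that reaches the output "out".
netlist = {"out": ["g1"], "g1": ["a", "b"], "g2": ["a"], "g3": ["g2"], "a": [], "b": []}
removable = reclaim_unused_cells(netlist, ["out"])
print(sorted(removable))
```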

2. Characterization of Library Cells

Library cell characterization plays an important role because the design automation tools rely on the
accuracy of the library. A typical low-power library has not only area and delay cost functions but also
power cost functions, with leakage power characterized for each input pin state. For maximum benefit, the
target library used in technology mapping should contain cells with multiple V_t options so that
multiple-V_t schemes can be applied to leakage power optimization. In this flow, two V_t variants of each
logic function in the library are used: (1) high V_t and (2) low V_t. A third variant, nominal V_t, is
optional in this flow. This is the minimum requirement on the technology library for leakage power
optimization using the Cadence tools. No other changes to the design procedure are required, because
technology mapping handles the use of the multiple-V_t technology. Besides purely high-V_t and purely
low-V_t cells, combinations of high-V_t and low-V_t devices can be used to characterize complex mixed-V_t
cells in the library. Libraries characterized this way are considerably larger. It is also worth noting
that constant leakage power values will gradually disappear from libraries, because constant models are
not accurate at 90 nm and below.
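
State-dependent leakage characterization can be pictured as a small table per cell, with one leakage
value per input pin state. The following sketch shows how such a table might be combined with input
state probabilities to estimate average leakage. All numbers and names here are hypothetical; they are
not taken from any real technology library.

```python
# Hypothetical per-state leakage (nW) for a 2-input NAND in two Vt flavors.
NAND2_LEAKAGE_NW = {
    "LVT": {"00": 4.0, "01": 9.0, "10": 8.0, "11": 15.0},
    "HVT": {"00": 0.4, "01": 0.9, "10": 0.8, "11": 1.5},
}


def average_leakage(cell_leakage: dict[str, float], state_prob: dict[str, float]) -> float:
    """Weight each input state's leakage by the probability of that state."""
    return sum(cell_leakage[s] * state_prob.get(s, 0.0) for s in cell_leakage)


# Assume all four input states are equally likely.
uniform = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
lvt_nw = average_leakage(NAND2_LEAKAGE_NW["LVT"], uniform)
hvt_nw = average_leakage(NAND2_LEAKAGE_NW["HVT"], uniform)
print(f"LVT: {lvt_nw:.2f} nW, HVT: {hvt_nw:.2f} nW")
```

A single constant leakage number per cell would hide the spread between input states, which is the
inaccuracy the paragraph above refers to.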

4. Gate Sizing

Gates can be resized according to their fan-out load to minimize power dissipation in the design. The
static power dissipation of a gate can be expressed as a function of delay (from eqn 1.2) and of the
device geometry, represented by the W/L ratio. Resizing gates to reduce leakage also reduces the area of
the circuit. However, decreasing the size of a gate makes it slower because its drive capability is
reduced. The tool must therefore be capable of identifying the loads and drivers along the entire path
and applying the transformations accordingly. Combined with good logic restructuring, this technique can
minimize the delay and power of a path concurrently, with most of the energy dissipated near the final
load, while satisfying the input timing constraints and the initial timing targets. A good trade-off
between power and delay must be struck when resizing a gate.
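
The sizing trade-off can be sketched as picking the smallest drive strength that still meets a delay
target for a given load, since smaller gates leak less but drive more slowly. The linear delay model,
the parameter values and the function name below are assumptions made for illustration only.

```python
def pick_smallest_size(sizes, load_ff, max_delay_ps, r_unit_kohm=2.0, intrinsic_ps=10.0):
    """Return the smallest drive strength whose delay meets the target, or None.

    Assumed delay model: intrinsic + (r_unit / size) * load, where kOhm * fF = ps.
    """
    for size in sorted(sizes):
        delay_ps = intrinsic_ps + (r_unit_kohm / size) * load_ff
        if delay_ps <= max_delay_ps:
            return size
    return None


# Hypothetical cell family with drive strengths X1, X2, X4 and X8.
chosen = pick_smallest_size(sizes=[1, 2, 4, 8], load_ff=20.0, max_delay_ps=25.0)
print(f"Smallest size meeting timing: X{chosen}")
```

Returning None when no size meets the target reflects the case where sizing alone is not enough and
logic restructuring or buffering would be needed.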

5. Signal Integrity

Multiple-V_t techniques are applied during optimization to reduce power while still meeting the delay
constraints. High-V_t devices are used on the slower paths to reduce leakage power, whereas low-V_t
devices are used on critical paths where the delay constraint is very tight. Because low-V_t devices
switch faster, they are also more sensitive to noise. One factor to keep in mind when optimizing power
after routing is therefore the noise penalty of low-V_t devices on the faster paths. Interestingly, as
high-V_t cells reduce the leakage power, the noise is reduced as well. Considering noise while optimizing
for power is currently one of the CAD tool limitations. Noise due to coupling can, however, be analyzed
once the optimization is complete.

The Multi-VT Leakage Power Optimization Flow

In this work, I deploy the multi-V_t flow to optimize the leakage power of a design. Several other
techniques are also in use, such as MTCMOS, input vector control, supply gating, sizing and V_t control.
Of all these techniques, multi-V_t has proved to be an efficient method of minimizing leakage power
without sacrificing much design performance. It reduces leakage power not only in standby mode but also
during active-mode operation of the device.
Low-V_t and high-V_t devices have inverse relationships with respect to gate delay and leakage power.
High-V_t cells are therefore assigned to non-critical paths and low-V_t cells to critical timing paths to
achieve a low-power, high-performance design.
I describe three primary flows using three Cadence tools: RTL Compiler, PKS and SoC Encounter-GPS.
Designers can extend these flows to suit their design implementation. The flows are applied to two
designs, at 130 nm and 90 nm respectively, to validate the final results. Both designs are very
timing-critical and dense.
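
The assignment of high-V_t cells to non-critical paths can be sketched as a greedy, slack-driven swap:
start with every cell as LVT and convert a cell to HVT only if every path through it still meets the
clock period. This is a simplified illustration with hypothetical delays and an explicit path list. It
is not the algorithm implemented in RTL Compiler, PKS or SoC Encounter, which rely on full static timing
analysis.

```python
def assign_hvt(cells, paths, clock_ns):
    """Greedy sketch: start all-LVT, swap a cell to HVT if every path still meets timing.

    cells: name -> (lvt_delay_ns, hvt_delay_ns); paths: lists of cell names.
    """
    use_hvt = {name: False for name in cells}

    def path_delay(path):
        return sum(cells[c][1] if use_hvt[c] else cells[c][0] for c in path)

    for name in cells:
        use_hvt[name] = True  # tentatively swap to the slower, lower-leakage cell
        if any(path_delay(p) > clock_ns for p in paths if name in p):
            use_hvt[name] = False  # swap would violate timing; keep LVT
    return use_hvt


# Hypothetical delays: u1 and u2 share a tight path, u3 sits on a path with plenty of slack.
cells = {"u1": (1.0, 1.3), "u2": (1.0, 1.3), "u3": (0.5, 0.7)}
paths = [["u1", "u2"], ["u3"]]
assignment = assign_hvt(cells, paths, clock_ns=2.4)
print(assignment)
```

In this toy case only one of the two cells on the tight path can be swapped, while the cell on the
relaxed path is swapped freely, mirroring the HVT-on-non-critical, LVT-on-critical principle.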

POWER FLOW 1

Power Flow-1 uses three Cadence tools: (1) RTL Compiler, (2) PKS and (3) SoC Encounter-GPS. Once the data
is set up, RTL synthesis and logic optimization are performed in RTL Compiler using low-V_t cells only.
Once the delay is met, leakage power optimization is run in RTL Compiler with a target slack of zero. RTL
Compiler does a very good job here, reducing the leakage power from 6.5 mW to 2.8 mW for the 130 nm test
case without violating the timing slack. About 15% to 16% of the design area is also recovered in this
step. The netlist is then ready for place and route. Once physical optimization is complete, PKS is used
to reduce the leakage power further. PKS runs concurrent power and timing optimization, applying
transforms such as gate resizing and cell swapping. It could reduce the power by 1 mW, but the timing
gets worse. This flow has several drawbacks.

[Flowchart, Power Flow 1. Inputs: LVT/HVT libraries, RTL, timing constraints; constraints, LEF. Steps: RTL synthesis, area reclamation, timing optimization using LVT, timing met? (yes/no), leakage power optimization using HVT, floorplanning and placement, global physical optimization, concurrent power and timing optimization, routing, post-route leakage power/timing optimization, sign-off analysis. Tool labels: RTL Compiler, Physically Knowledgeable Synthesis, SoC Encounter.]

Although RTL Compiler optimizes leakage power using multi-V_t (HVT/LVT) cells and statistical wireload
models, the accuracy at the logic synthesis stage is not good for deep sub-micron designs. The critical
paths change once the design is placed and routed, because physical information and real interconnect
delays are then taken into account. This forces the layout optimizations to work hard to recover timing
on the critical paths, and they run for hours.

POWER FLOW 2:


Power Flow-2 is similar to Power Flow-1 and likewise performs leakage power reduction in the logic
synthesis phase. PKS is not used in this flow, and the results did not vary much. This flow is suitable
for designs that are not timing-critical and have low density. It did not deliver good results for either
the 130 nm or the 90 nm design used in the analysis, because of the lack of information up front during
logic synthesis, even though synthesis can meet timing and reduce leakage power. With this flow, run
times are longer during physical optimization and during post-route optimization to achieve the clock
speed. This creates congestion issues that can lead to an unroutable design in the end.

[Flowchart, Power Flow 2. Inputs: LVT/HVT libraries, RTL, timing constraints; constraints, LEF. Steps: RTL synthesis, area reclamation, timing optimization using LVT, timing met? (yes/no), leakage power optimization using HVT, floorplanning and placement, global physical optimization, routing, post-route leakage power/timing optimization, sign-off analysis. Tool labels: RTL Compiler, SoC Encounter.]

POWER FLOW 3:


Power Flow-3 uses only two Cadence tools: RTL Compiler and SoC Encounter.
It uses the same initial data setup as Flow-1 and Flow-2. This is a very simple flow in which RTL Compiler
recovers up to 16% of the area in the logic synthesis stage. No leakage power optimization is performed at
this stage. Timing was met easily with RTL Compiler using only LVT cells. The synthesized netlist is then
handed over to SoC Encounter for floorplanning, placement, physical optimization, clock tree synthesis and
routing. In SoC Encounter, leakage optimization is carried out in both the pre-route and the post-route
stages of the design.

In deep sub-micron designs at 130 nm and below, interconnect delay dominates gate delay and limits circuit
speed. This flow produced better results because it accounts for accurate interconnect delays in the
post-route stage and then swaps cells from low V_t to high V_t to optimize leakage power, using the
superior post-route optimization algorithm in SoC Encounter.


[Flowchart, Power Flow 3. Inputs: LVT/HVT libraries, RTL, timing constraints; constraints, LEF. Steps: RTL synthesis, area reclamation, timing optimization using LVT, timing met? (yes/no), floorplanning and placement, global physical optimization, pre-route leakage optimization, routing, post-route leakage power/timing optimization, sign-off analysis. Tool labels: RTL Compiler, SoC Encounter.]

SUMMARY OF EXPERIMENTAL RESULTS

(i) Design- D1

Design statistics:
Clock speed: 200 MHz
Technology: 130 nm
Cell instances: 120K

Initial leakage power with only LVT cells: 6.5mW

Flow used   Leakage (mW)   Timing Violation (ns)   HVT (%)   LVT (%)   Leakage Power Savings (%)
Flow-1      3.89           -0.30                   78        22        40
Flow-2      4.19           -0.361                  82        18        35.5
Flow-3      3.345          0                       95        5         48.5




As the results above show, the script for Flow-3 achieved a very good leakage power reduction while
meeting the clock speed for the 130 nm design.
The total area reduction using Flow-3 was about 2.32%, obtained by using the HVT cells.
This is a significant area saving from Flow-3 alone, achieved while meeting the timing and power
constraints.
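
The savings column for Design D1 follows directly from the initial LVT-only leakage of 6.5 mW and the
leakage reported for each flow. The short sketch below recomputes it from those reported values; the
function and variable names are illustrative only.

```python
def leakage_savings_pct(baseline_mw: float, optimized_mw: float) -> float:
    """Percentage reduction in leakage relative to the LVT-only baseline."""
    return (baseline_mw - optimized_mw) / baseline_mw * 100.0


BASELINE_D1_MW = 6.5  # initial leakage with only LVT cells (Design D1)
d1_leakage_mw = {"Flow-1": 3.89, "Flow-2": 4.19, "Flow-3": 3.345}

d1_savings = {
    flow: round(leakage_savings_pct(BASELINE_D1_MW, mw), 1)
    for flow, mw in d1_leakage_mw.items()
}
for flow, pct in d1_savings.items():
    print(f"{flow}: {pct}% leakage savings")
```

The recomputed figures agree with the reported savings to within rounding.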

[Chart: Performance of 3 Flows for 130nm design. Series: Total Leakage Power (mW), Timing Violation (ns). Categories: Flow-1, Flow-2, Flow-3. Axis range: -1 to 5.]

(ii) Design- D2

Design statistics:
Clock speed: 360 MHz
Technology: 90 nm
Cell instances: 100K
Initial leakage power with only LVT cells: 19.6 mW

Flow used   Leakage (mW)   Timing Violation (ns)   HVT (%)   LVT (%)   Leakage Power Savings (%)
Flow-1      7.8            -0.432                  68        32        60.6
Flow-2      8.4            -0.488                  71        29        57.4
Flow-3      6.23           -0.263                  82        18        68.2




As the results above show for the 90 nm design, the script using Flow-3 again achieved a very good leakage
power reduction without sacrificing much speed compared with Flow-1 and Flow-2. Flow-3 was very effective
in reducing leakage power, and it struck a very good trade-off between leakage power and overall
performance, although it did not meet the clock speed in this case. The total area saving using Flow-3 was
about 2.8% compared with the LVT-only flow, in which timing was easily met.




[Chart: Performance of 3 Flows for 90nm design. Series: Total Leakage Power (mW), Timing Violation (ns). Categories: Flow-1, Flow-2, Flow-3. Axis range: -1 to 9.]

Current limitations of CAD tools

1. The current technology library has many limitations, and these in turn become limitations of the CAD
tool. Passive energy cannot be calculated accurately because it depends on the real-time operation of the
devices in a circuit. The complete charging and discharging of the device capacitance is also not
accounted for.
2. Crosstalk noise cannot be considered during low-power optimization, even though it affects delay and is
a significant factor in deep sub-micron designs.
3. The energy dissipated when a device transitions from 0->1 and 1->0 cannot be accounted for accurately,
because each device is characterized differently.
4. The optimizations cannot yet detect and balance the optimal energy-delay trade-off points without
sacrificing the maximum performance of the design under the power constraints.

CONCLUSIONS

This paper presents three methodologies, of which the optimization methodology using Flow-3 is the most
effective: it reduced leakage power by 48.5% for the 130 nm design and 68.2% for the 90 nm design. Flow-3,
which is based on layout-driven leakage power optimization using Cadence SoC Encounter-GPS, produced the
best results for the deep sub-micron designs used in the experimental runs because it accounts for
accurate physical and interconnect delay information. Estimation during logic synthesis is inaccurate
because its results cannot be reproduced in the layout. The proposed simple power optimization script
based on the Flow-3 experiments, using multiple-V_t techniques, achieved not only the maximum
sub-threshold power benefits but also met the circuit speed (for the 130 nm design) and reduced the design
area. This methodology can find the best trade-off point between power and delay while preserving the
performance of the nanometer design.

FUTURE WORK

Crosstalk, or signal integrity, can be accounted for, since it affects the total delay through the
effective coupling capacitance in the design. Optimization and analysis can be run at two other operating
conditions, the min and typical corners, to see the effect on chip performance. Multiple supply voltages,
or voltage islands for different modules, can be applied to reduce power dissipation. Other optimization
techniques beyond simultaneous slack-based multiple-V_t assignment also need to be explored.

References:

* User guide to Cadence RTL Compiler, Cadence Physically Knowledgeable Synthesis and Cadence
SoC Encounter GPS.