
Power Management in Complex SoC Design
Jim Flynn – Senior IC Design Engineer, Synopsys Professional Services
Brandon Waldo – Senior IC Design Engineer, Synopsys Professional Services

http://www.synopsys.com/sps

April 2004

©2004 Synopsys, Inc.


The need to reduce power consumption—long recognized as a significant design issue—becomes more
critical as larger, faster ICs go into portable applications. As a result, techniques for managing power
throughout the design flow are evolving to assure that all parts of the product receive power properly and
efficiently, and that the product is reliable. Techniques such as multi-voltage islands and dynamic scaling of
both clock frequency and threshold voltage help conserve battery power in portable applications, while
delivering high performance.

Perhaps more critically, increases in system-on-chip (SoC) size and speed have led to power consumption
challenges across a broad range of designs that have not been viewed traditionally as supply-limited. In
these designs, heat dissipation and reliability issues such as electromigration and IR drop have become
vitally important. (For information on dealing with power-related reliability issues, please consult the Synopsys
Professional Services’ White paper “Design Planning Strategies to Improve Physical Design Flows—
Floorplanning and Power Planning” http://www.synopsys.com/cgi-bin/sps/wp/dps/paper1.cgi)

Power issues in mainstream deep submicron designs may limit functionality or performance and severely
affect manufacturability and yield. Higher power dissipation increases junction temperature, which slows
transistors and increases interconnect resistance. Design techniques aimed at improving performance may
therefore fall short if power is not considered. Lower-than-expected performance decreases device yield.
Additionally, higher power dissipation requires more system-level measures for thermal management. In
general, these power issues are increasing SoC and system costs. Managing power consumption at
appropriate points in the SoC design flow keeps these costs under control.

Where an SoC consumes power


The total power consumed by a chip equals dynamic power plus static power. Dynamic power is the power
consumed in switching logic states, both internal to the cells (internal power) and for driving the chip’s nets
and external loads (switching power):
$$\text{Dynamic power} = C V^2 F$$

where C is the load capacitance, V is the voltage swing and F is the switching frequency (the number of logic-state transitions per unit time).

As semiconductor structures become smaller, device and interconnect capacitances decrease, allowing for
higher performance and lower power. Countering these factors are power increases due to larger designs
and higher switching rates.

Static power (leakage power) is consumed while transistors are not switching:

$$\text{Static power} = V I_{STAT}$$

Although transistors have some reverse-biased diode leakage from drain to substrate, the larger portion of
leakage power is due to the sub-threshold current through a transistor that is turned off. This sub-threshold
current results from the conduction between source and drain through the transistor channel.

The sub-threshold leakage current is problematic because it increases as transistor threshold voltages (Vth)
decrease. In fact, the move to 130 nanometer (nm) and beyond may boost leakage power as high as 50
percent of the total chip power (Figure 1). Increased leakage power also contributes to an exponential increase in
reliability-related failures in chips (even in standby).



Figure 1: Increase in leakage power—Bringing down transistor threshold voltages helps decrease dynamic
power but increases sub-threshold leakage current. A power-aware design flow is thus needed to meet timing
requirements and keep power consumption within acceptable limits. Source: Intel. Published in IC Insights Inc.
2003 Technology Trends.

As CMOS technologies scale down, the main approach for reducing power has been to scale down the
supply voltage VDD. Voltage scaling is a good technique for controlling a chip’s dynamic power because of
the quadratic effect of voltage on power consumption. However, just reducing the power supply degrades
circuit speed because the switching delay time is proportional to the load capacitance and the ratio Vth/VDD.
To maintain sufficient drive strength for fast switching, Vth must decrease in proportion to VDD. This relationship
leads to the leakage power increase. Fortunately, a power-aware design flow helps balance timing
requirements with various power goals.

Power solutions
The higher the level of design abstraction, the greater the influence on power consumption. At the system
and algorithm levels, for example, using a parallel approach rather than a serial implementation reduces
clock frequencies, which helps to decrease power consumption significantly. The lower power of the parallel
approach may come at the expense of somewhat greater area or slower performance.

To give an example of the effect of parallel vs. sequential architectures, in one chip that received data
samples serially, the samples were processed in parallel to reduce this logic’s clock speed from 80 to 10
MHz. Additionally, the supply voltage was reduced from 1.8V to 1.25V. The parallel processing logic was
much larger than the serial processing equivalent, but the logic’s reduced voltage and operating frequency
reduced the power consumption by 75 percent. This parallel approach was able to save power because
power varies with the square of the voltage but only linearly with frequency and switching activity. In other
designs, the area penalty has been small but the power savings significant, so it is worth exploring the tradeoffs.
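
The arithmetic behind this example can be checked with a short sketch based on the dynamic power relation given earlier. The voltages and frequencies come from the text. The capacitance ratio is not stated in the paper, so the script solves for the growth in switched capacitance that would be consistent with the quoted 75 percent saving; treat that figure as an illustration rather than a measured value.

```python
def relative_dynamic_power(cap_ratio, v_old, v_new, f_old, f_new):
    """Return new/old dynamic power, using P = C * V^2 * F."""
    return cap_ratio * (v_new / v_old) ** 2 * (f_new / f_old)

# Figures from the text: 1.8 V -> 1.25 V and 80 MHz -> 10 MHz.
same_cap = relative_dynamic_power(1.0, 1.8, 1.25, 80.0, 10.0)

# Assumption: the quoted 75 percent saving (ratio 0.25) is entirely dynamic power.
# Solve for the switched-capacitance growth of the larger parallel logic.
implied_cap_ratio = 0.25 / same_cap

print(f"power ratio with unchanged capacitance: {same_cap:.3f}")
print(f"capacitance growth consistent with a 75% saving: {implied_cap_ratio:.2f}x")
```

Under these assumptions the parallel logic could switch roughly four times the capacitance of the serial version and still deliver the quoted saving, which shows how much headroom the combined voltage and frequency reduction provides.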



System Design
  Power optimization: architecture optimization (e.g. parallel vs. serial), supply voltage scaling, clock frequency scaling
  Power analysis: power estimates based on estimated gate counts and estimated activity

RTL Design
  Power optimization: module clock gating
  Power analysis: RTL power analysis based on defined clocks and registers, estimated gate counts and realistic activity

Floorplanning
  Power optimization: voltage islands

Synthesis
  Power optimization: threshold voltage scaling, power optimization in synthesis, RTL clock gating
  Power analysis: gate level power analysis based on actual gate counts, realistic activity, wireload models and final libraries

Place and Route
  Power analysis: gate level power analysis based on actual gate counts, realistic activity, accurate routing and final libraries

Figure 2: In the context of the design flow, the potential for power savings is greatest early in the flow, while
the accuracy of power estimates improves at each later stage.

Figure 2 references several power optimization and analysis techniques that can be used throughout an
SoC design flow. The power solutions covered in this paper include:

■ Module clock gating


■ Multiple supply voltages
■ Multiple threshold voltages

■ Power optimization in synthesis, including RTL clock gating

Because techniques such as clock gating and dividing affect design for test (DFT), that topic is also
addressed. A brief design example at the end of the paper shows the benefits of combining dynamic
frequency and voltage scaling.

Power estimation and analysis


Over the course of the design flow, it is useful to estimate power consumption at four stages (Table 1). The
accuracy of the estimate improves at each stage as additional design and library information becomes available.



Table 1: Four stages of power consumption estimation (Recommended).

When to perform the estimation   How gates are calculated   How load is calculated   Estimation tool(s) used
1. Design/library exploration    Rough estimation           Unknown/in definition    Spreadsheet
2. Pre/early synthesis           Rough estimation           DC wire load models      Design Compiler, Power Compiler
3. Post-synthesis                Accurate (placed)          Wire load models/SPEF    Power Compiler, Physical Compiler, PrimePower
4. Post-layout                   Exact                      Extracted (SPEF)         PrimePower

RTL power analysis


In the earliest stages of a design flow, power analysis provides rough estimates of a design’s power
consumption. Libraries may not be selected yet, so library data may be limited. Before the library is
selected, a spreadsheet analysis can be used to reveal the best power-conscious libraries and design
architectures. After the library is selected, Design Compiler® and Power Compiler™ can be used instead of
the spreadsheet method or to supply values for use in the spreadsheets.

The power-analysis spreadsheet includes approximate gate counts, rough activity-per-block values, side-
by-side vendor µW/MHz data, and relative power estimates. The analysis at this point also helps to show
if a design consumes too much power to be practical–thus avoiding weeks of design effort to implement
a chip that will never be manufactured.

To use the spreadsheet analysis method, it is necessary to estimate each block’s gate count (number of
library cells of each type) and activity level. The amount of energy consumed by the switching of each cell
type is also needed; data from a library vendor’s manuals can be used to assign an appropriate power
value relative to speed (in µW/MHz). A block’s internal power consumption for a particular type of cell is
given by the equation:

Power consumption = Gate Count * µW/MHz * Activity * Frequency

Summing these power values for all the different types of cells in a block gives the block’s overall internal
active-power estimate. Before synthesis, gate counts are estimated based on architectural choices and an
understanding of the design. For example, approximate gate counts can be drawn from features such as
bus sizes, word lengths, control layers and memory depth. When the library has been selected, the gate
counts for a block can be estimated by using Design Compiler’s report_reference capability after
early synthesis, which reports the number of each instance type for the design.

A key aspect of the power calculation is assigning the activity levels. The gates of a design have different
activity levels that can be estimated with or without a simulation to extract switching activity. After the
library is selected, however, a functional simulation is recommended to determine the switching activity.

Switching activity is measured in terms of a toggle rate (TR). Toggle rate is the number of logic-0-to-logic-1
and logic-1-to-logic-0 transitions of a design object (for example, a net, pin or port) per unit of time. A net
having an activity of 50 logic-1-to-logic-0 transitions and 50 logic-0-to-logic-1 transitions during a 100ns
interval has a TR of 1. A net having an activity of five logic-1-to-logic-0 transitions and five logic-0-to-logic-1
transitions during a 10ns interval also has a TR of 1. These examples have nanoseconds as the unit of
time, and a TR of 1 indicates one activity transition per ns. Power and TR can be related by understanding
that for each transition an amount of energy must be supplied to change the state of an internal circuit
during the time interval of the state change.
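
The two toggle-rate examples above can be reproduced with a few lines of code. This is a minimal sketch; the function name `toggle_rate` is made up for illustration and is not a tool command.

```python
def toggle_rate(rising, falling, interval_ns):
    """Toggle rate: logic-0-to-1 plus logic-1-to-0 transitions per nanosecond."""
    if interval_ns <= 0:
        raise ValueError("interval must be positive")
    return (rising + falling) / interval_ns

# The two nets described in the text.
print(toggle_rate(50, 50, 100.0))  # 1.0
print(toggle_rate(5, 5, 10.0))     # 1.0
```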



Keep in mind that power estimates at any level of abstraction are meaningful only when the switching activity
represents the chip’s actual working operation. A common mistake is to use a vector set that simulates
system boot sequences when trying to determine activity. This activity rarely represents actual working
conditions and therefore leads to inaccurate power estimates. An RTL simulator helps to generate a
Switching Activity Interchange Format (SAIF) file automatically, but the activity values are accurate only if
the vector set is realistic. Current tools are not able to generate such vectors automatically—the task requires
an understanding of the circuit’s intent.

Figure 3 shows the Programming Language Interface (PLI) system tasks that can be used within VCS® to
generate an SAIF file during simulation. Power Compiler™ offers a power_estimate capability that uses an
SAIF file to define libraries and constraints and annotate the design for power estimation. Power Compiler’s
default switching activity for non-annotated ports is 0.25 toggle per positive edge; this value is applied and
propagated throughout the block.

$set_gate_level_monitoring ("rtl_on");
$set_toggle_region;
$toggle_start;
$toggle_stop;
$toggle_report;

Figure 3: Programming Language Interface (PLI) commands — These commands cause VCS to generate an SAIF
file for use in Power Compiler.

Table 2 lists examples of results estimated using the above methods. After calculating internal power,
switching power can be estimated as 30 percent of internal power. Without accurate load and switching
data, this value is only a rough estimate. Such estimates are useful mainly as a way to compare the power
implications of various design strategies rather than as predictors of a chip’s actual power consumption.
As mentioned earlier, however, rough estimates at the RTL stage do provide an early warning that a design
may turn out to be unacceptably hot.

Table 2: Example of estimation results using the spreadsheet method.

Block 1 (Frequency = 100 MHz)

Gate type   Gate count   Activity   µW/MHz
A           125          0.25       5
B           150          0.05       12
C           50           0.1        16
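
As an illustration, the spreadsheet calculation for Block 1 can be written as a short script. The gate counts, activities and µW/MHz values are those in Table 2. The 30 percent switching-power factor is the rule of thumb stated in the text, and the variable names are chosen here for readability only.

```python
# (gate count, activity, uW/MHz) per cell type, taken from Table 2.
BLOCK1 = {
    "A": (125, 0.25, 5.0),
    "B": (150, 0.05, 12.0),
    "C": (50, 0.10, 16.0),
}
FREQ_MHZ = 100.0

def internal_power_uw(cells, freq_mhz):
    """Sum Gate Count * uW/MHz * Activity * Frequency over all cell types."""
    return sum(
        count * uw_per_mhz * activity * freq_mhz
        for count, activity, uw_per_mhz in cells.values()
    )

internal_uw = internal_power_uw(BLOCK1, FREQ_MHZ)
switching_uw = 0.30 * internal_uw  # rough estimate: 30 percent of internal power
total_mw = (internal_uw + switching_uw) / 1000.0

print(f"internal power:  {internal_uw / 1000.0:.1f} mW")
print(f"switching power: {switching_uw / 1000.0:.1f} mW")
print(f"total estimate:  {total_mw:.1f} mW")
```

With these inputs the internal power comes to about 32.6 mW. As the text notes, the result is mainly useful for comparing design alternatives, not for predicting absolute power.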

Table 3: Example of report_reference command output.

report_reference:

Reference   Library    Unit Area   Count   Total Area   Attributes
INV         tech_lib   1.00        1       1.00
MX1P        tech_lib   8.00        8       64.00        n
NAND2       tech_lib   1.00        6       6.00
NAND3       tech_lib   2.00        1       2.00

Total 8 references                         174.00



Switching power is usually the most important value to determine in early analysis, but it is also possible to
estimate leakage power based on each cell type’s leakage data. Since leakage is different for high and low
states, the leakage analysis must be based on the static probability that a signal is at a certain logic state.
Static probability is expressed as a value between 0 and 1. This value can be estimated based on a signal’s
function. For example, an active-low reset signal typically has a logic-one static probability (SP1) at or near
1.0 (100 percent). For a data-bus signal, SP1 can usually be assumed as 0.5 (50 percent) unless some
architectural characteristic suggests otherwise. After the library is selected, static probability can be calculated
during simulation by comparing the time a signal is at a certain logic state to the total time of simulation.
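
A minimal sketch of this leakage estimate follows. It assumes a simplified model in which a cell has one leakage value for the logic-1 state and one for the logic-0 state of a single signal; real libraries may characterize leakage per input combination. The leakage numbers are hypothetical placeholders, not vendor data.

```python
def static_probability(time_high_ns, total_time_ns):
    """SP1: fraction of simulation time the signal spends at logic 1."""
    if total_time_ns <= 0:
        raise ValueError("total simulation time must be positive")
    return time_high_ns / total_time_ns

def expected_leakage_nw(sp1, leak_high_nw, leak_low_nw):
    """Leakage weighted by the probability of each logic state."""
    if not 0.0 <= sp1 <= 1.0:
        raise ValueError("static probability must be between 0 and 1")
    return sp1 * leak_high_nw + (1.0 - sp1) * leak_low_nw

# Hypothetical cell: 4 nW when the signal is high, 2 nW when it is low.
sp1_data_bus = 0.5                             # data-bus assumption from the text
sp1_measured = static_probability(750.0, 1000.0)  # e.g. high for 750 ns of 1000 ns

print(expected_leakage_nw(sp1_data_bus, 4.0, 2.0))
print(expected_leakage_nw(sp1_measured, 4.0, 2.0))
```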

Gate-level power analysis


After synthesis, it is possible to get fairly accurate power estimates from Power Compiler based on actual
gate counts and simulated activity. The most significant sources of inaccuracy at this point are the activity
and pre-layout wire-load values. Accuracy is improved by generating an SAIF file from gate-level simulations.
In VCS, the same commands shown in Figure 3 generate the SAIF file, except that the first command
should be:

$set_gate_level_monitoring ("on");

Again it must be emphasized that activity values are accurate only when the simulation vectors represent
actual application behavior. Physical Compiler® helps improve the accuracy of the load values by using the
write_parasitics -distributed command after physical optimization. This command produces a
SPEF file annotating Steiner route and RC parasitic estimates.

After layout, a gate-level simulation helps generate a Value Change Dump (VCD) file for use in PrimePower®
analysis. VCD files log changes to signal values during a simulation and provide the design’s nodal activity,
structural data, hierarchical connectivity, path delays, timing and event information.

Note that chip I/Os can be a significant source of inaccuracy if they are numerous, switching at high speed
and driving long wires. If design goals require accurate rather than worst-case power estimates, lumped
load models for the I/Os may produce overly pessimistic results. To get a more accurate picture, HSPICE®
simulations can be performed on critical I/O cell types with accurate distributed-impedance models. The
I/O cell power can then be calculated using numeric methods that determine charge and energy per
rising/falling edge. Given the HSPICE output of current and time, the internal energy per transient is
calculated using the trapezoidal integration method (in Matlab, for example). The I/O activity recorded
during PrimePower analysis is used to scale I/O power, and the total I/O power is combined with the core
power for an overall power estimate.
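
The I/O calculation can be sketched as follows (the text mentions Matlab; Python is used here only for illustration). The sketch assumes a constant supply voltage during the transient, so that energy equals the supply voltage times the integrated current. The waveform samples and edge rates are hypothetical stand-ins for HSPICE output and PrimePower activity data.

```python
def trapezoid(ys, xs):
    """Trapezoidal integration of samples ys over sample points xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching samples")
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )

def energy_per_edge_j(times_s, currents_a, vdd):
    """Energy of one transition: Vdd times the charge drawn (integral of I dt)."""
    return vdd * trapezoid(currents_a, times_s)

def io_power_w(e_rise_j, e_fall_j, rises_per_s, falls_per_s):
    """Scale per-edge energies by the recorded edge rates."""
    return e_rise_j * rises_per_s + e_fall_j * falls_per_s

# Hypothetical triangular current pulse: 2 mA peak over 2 ns, 2.5 V supply.
times = [0.0, 1e-9, 2e-9]
currents = [0.0, 2e-3, 0.0]
e_edge = energy_per_edge_j(times, currents, 2.5)

# Hypothetical activity: 10 million rising and 10 million falling edges per second.
print(io_power_w(e_edge, e_edge, 10e6, 10e6))
```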

To show how power estimates vary using the methods described here over different phases of the design
and implementation cycle, Figure 4 shows examples based on one block (a high-speed FIR filter) in a DSP
design. This example demonstrates how the power estimates vary depending on the accuracy of the
information supplied. The graph shows how the estimates changed for an example block at four points in
the flow:

■ Case 1: An estimate using worst-case switching activity and worst-case wire load estimates
■ Case 2: An estimate using more accurate wire load estimates and worst-case activity
■ Case 3: An estimate using accurate wire load estimates and realistic activity
■ Case 4: An estimate using exact wire loads (extracted) and realistic activity based on SPICE-accurate simulation



[Bar chart: estimated power in mW by case. Case 1: 518; Case 2: 430; Case 3: 260; Case 4: 237.]

Figure 4: In the course of a design flow, power estimates can vary considerably.

Power optimization techniques


Figure 5 categorizes the power optimization techniques with respect to static vs. dynamic power and the
level of design abstraction at which the technique is applied. The use of one or more of these methodologies
depends on the design goals. Incorporating the methods into a design flow provides an integrated power-
management design strategy.



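[Diagram: power optimization techniques arranged by design stage, from IC Design (RTL coding) at the top to
IC Implementation (RTL to GDSII) at the bottom, and by power type, from Static Power on the left to Dynamic
Power on the right.]

1. Physical approach (power supply control or source voltage island): multi-power supply, power gating
2. Design approach: multi-clock, clock gating
3. Synthesis approach: multi-VT synthesis (static power), low power synthesis (dynamic power)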

Figure 5: Power optimization techniques for different stages of a design flow (from top to bottom) and how they
affect static or dynamic power (from left to right).

Module clock gating


Module clock gating can be used at the architectural level to disable the clock to parts of the design that
are not in use. Power Compiler helps replace the clock gating logic inserted manually, gating the clock to
any module using an Integrated Clock Gating (ICG) cell from the library. The tool automatically identifies
such combinational logic once the clock is explicitly created by the user in the script.

Module clock gating can be applied in a series of levels, including the chip level, domain level (DSP, CPU, etc.),
module and sub-module. When the whole chip is in idle mode but must respond to external wakeup
events, an application can gate the chip clock. The same is true at the lowest level; when no memory
access is needed, the clock to the SDRAM controller can be switched off, given that the SDRAM is first
set to self-refresh mode. In addition to turning clocks on and off, the gating structure can include configurable
clock dividers to change the clock speed to various parts of the design.

Designing such a clock structure depends on an understanding of the chip’s function and insights from
power analysis about how much power can be saved by clock-gating ever-smaller portions of the design.
In general, clock switching power is more than 30 percent of a chip’s total power consumption, so clock
gating at all levels is usually well worth the effort.

Clock gating challenges


Beyond the complexities of deciding where and how to gate and/or divide clocks, high-level clock gating
involves a number of timing and DFT issues (more on DFT later). The timing issues can be appreciated
by observing that a long path in a clock structure might go through a DPLL, a clock divider, several
mode-switching multiplexers and several levels of clock gating.



While a tool such as Astro™ CTS (clock tree synthesis) synthesizes high-quality clock trees for typical
chips, complex gated clocks and dividers can require manual intervention, largely based on the need to
modify parts of the design outside the purview of the tool. This intervention may be needed to prevent
severe clock phase delay, for example. Clock phase delay might occur because registers and non-CTS
cells in a high-level clock hierarchy are placed far apart, causing an increase in high-level expanded clock
tree insertion delays and thus an increase in clock phase delay. Netweight-based placement control of
non-CTS cells can avoid the problem. This method involves extracting nets that connect the clock gating
cells, switching multiplexers and driven CTS macros, then applying heavy net weights to these nets to pull
the cells close to each other in the placement optimization. The optimization then minimizes the cells’ load
and hence cell delays and output slews.

A poor floorplan for clock distribution can also cause phase delay problems because clock tree synthesis
balances the clock tree according to the delay of the longest clock tree branch. A single long clock path
due to a poor floorplan therefore increases the entire clock tree insertion delay. Careful floorplanning
constraints for better clock tree balancing prevent this problem.

Other sources of clock phase delay are bad placement of non-CTS cells and large slew at non-CTS cell
outputs. The Synopsys Professional Services paper “Clock Distribution and Balancing In a Large and
Complex ASIC: Issues and Solutions” gives solutions to these problems as well as methods for dealing
with three other clock distribution issues: clock skew reduction, clock duty cycle distortion reduction and
clock gating efficiency (The paper is available at http://www.synopsys.com/sps/techpapers.html.). The
paper also provides a clock-balancing automation strategy. Manual clock tree analysis and balancing
methods are not suitable for complex ASIC designs due to time-to-market constraints. The automation
strategy involves three steps: extracting a common shared clock distribution topology, defining a local
balance strategy for each clock path that does not fit in the common clock distribution, and combining
these local balance constraints with the constraints of the common clock distribution. The result is a
clock tree synthesis constraint for the CTS tools to balance the complete clock distribution automatically.

Another timing issue is the clock glitch that can occur when restarting a clock asynchronously. (Figure 6
shows how this glitch occurs.) It is therefore necessary to include a circuit that times the restart to avoid
the glitch.

[Schematic and timing diagram: signals CLK0, CLK1, Select and Out Clock, with a glitch marked on Out Clock.]

Figure 6: Clock switching glitch — After “turning off” a clock using clock gating, the clock restart must be timed to
avoid the glitch shown here.



Multiple voltage islands
While clock gating helps limit dynamic power consumption, the use of multiple supply and/or threshold
voltages can help manage both dynamic and leakage power. Threshold voltage does not have to scale
directly with supply voltage, though the two are related, as explained earlier.

The use of voltage islands or voltage domains offers a way to meet both power consumption and performance
requirements. In this scheme, sections of logic are grouped physically into separate regions according to
their functionality. The logic regions that must operate at the highest speed use the highest supply voltage,
while less timing-critical regions use lower supply voltages.

Frequency scaling is thus necessary along with the voltage scaling, so the voltage island approach works
well with clock gating. The logic in a clock-gated block constantly consumes leakage power, but reducing
the supply voltage to this block reduces the leakage.

Multiple supply voltages must be provided through separate power pins or analog voltage regulators
integrated into the device. The efficiency of these voltage regulators must be included in power calculations
for the device. If only a small portion of the design will operate at a lower voltage, more power may be lost
in the voltage regulator than is saved in the lower-voltage logic. Note that voltage island design may require
level-shifter cells to ensure a proper rail shift for signals traveling between voltage domains.
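
The regulator tradeoff can be illustrated with a rough model. Everything here is an assumption made for the sketch: only dynamic power is considered and scaled by the square of the voltage ratio, the regulator is described by a single efficiency figure plus a fixed quiescent overhead, and the numeric values are hypothetical.

```python
def net_saving_mw(island_mw, v_high, v_low, efficiency, quiescent_mw):
    """Power saved at the supply pin by running an island at v_low through a regulator.

    island_mw is the island's dynamic power at v_high. Leakage changes and
    any frequency change are ignored in this simplified model.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    scaled_mw = island_mw * (v_low / v_high) ** 2      # quadratic voltage effect
    drawn_mw = scaled_mw / efficiency + quiescent_mw   # regulator losses included
    return island_mw - drawn_mw

# Hypothetical regulator: 85 percent efficient with 5 mW of fixed overhead.
large_island = net_saving_mw(100.0, 1.8, 1.2, 0.85, 5.0)
small_island = net_saving_mw(5.0, 1.8, 1.2, 0.85, 5.0)

print(f"large island net saving: {large_island:.1f} mW")
print(f"small island net saving: {small_island:.1f} mW")
```

In this model the large island shows a clear net saving, while the small island comes out negative because the regulator overhead exceeds what the lower voltage saves.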

In addition to reducing supply voltages, it is possible to vary the supply voltage of an island depending on
system requirements. Among other challenges, this method requires the use of cells that have been
characterized at all voltages. Synopsys Scalable Polynomial Models (SPMs) support the necessary timing
and power information. Non-Linear look-up table Models (NLMs) can also be used for voltage-island designs.

An SoC can also be designed to power-down certain voltage islands to eliminate their leakage power.
Such islands require the use of power isolation cells, which can be simple AND gates. The outputs from a
powered-down section into an active power domain should never be allowed to float. Power isolation logic
ensures that all inputs to the active power domain are clamped to a stable value. Additionally, a state-
retention technique may be required so that the blocks can resume operation when powered-up.
Powering-down various islands’ voltages or scaling their voltages dynamically may also require power-
sequencing circuitry to ensure correct operation of the chip.

Multiple-threshold design
Multiple supply-voltage islands work well with multi-threshold synthesis. Optimization meets timing goals by
using low-Vth cells on critical timing paths and high-Vth cells on non-critical paths. Note that better leakage
quality of results can be obtained by using state-dependent leakage models, if the silicon vendor provides
such models.

A one- or two-pass synthesis flow can be used for multi-threshold designs, depending on the design team’s
methodology or preference. In the two-pass flow, initial synthesis may be performed with the low-Vth,
high-performance library, followed by an incremental compile using multi-Vth libraries to reduce leakage current.
For designs in which both timing and leakage are important, one-pass synthesis uses multi-Vth libraries
simultaneously. The design is first optimized for timing, then leakage power optimization is performed without
degrading the achieved timing (i.e., the worst negative slack, or WNS). The power optimization is followed by
area optimization. The use of multi-Vth libraries is recommended in the synthesis environment (using Power
Compiler with Design Compiler or Physical Compiler) when optimizing for leakage power for either the one- or
two-pass flow.

The flow relies on the use of a reasonable leakage constraint, set in Power Compiler by the
set_max_leakage_power command.



Power optimization in synthesis
Synthesis tools have the ability to optimize designs for power consumption with techniques such as RTL
clock gating insertion and gate-level power optimization. These techniques are implemented by Power
Compiler in conjunction with Design Compiler and/or Physical Compiler.

RTL clock gating shuts down the clock to large register banks when the outputs of these flip-flops are not
needed. Figure 7 shows the difference between a clock gating circuit and the synchronous load-enable
circuit that Design Compiler would otherwise use. The feedback net and multiplexer of the synchronous
load-enable circuit are replaced by a latch and a two-input gate inserted in the register’s clock net.

Fig 7a: Synchronous load-enable implementation without clock gating (elaborate)

    always @(posedge CLK)
      if (EN)
        D_out <= D_in;

[Schematic: the FSM generates EN; D_in, EN and CLK feed the register bank, which drives D_out.]

Fig 7b: Synchronous load-enable implementation with clock gating (elaborate -gate_clock)

    always @(posedge CLK)
      if (EN)
        D_out <= D_in;

[Schematic: the FSM generates EN; EN and CLK pass through a latch to produce the gated clock G_CLK for the
register bank, which takes D_in and drives D_out.]

Figure 7: Power optimization during synthesis. Power Compiler automatically inserts clock-gating circuits,
replacing typical Design Compiler implementations (a) with the gating circuit (b).

This type of clock gating has a relatively low impact on area because the gating circuits replace muxes;
in fact, it reduces the area used by 5 to 15 percent. Power Compiler implements the gating automatically,
and it requires no RTL code change, though it is possible to specify the gating manually using a variety of
coding styles.


Power Compiler also has the capability to replace the manually inserted clock gates with an ICG from the
library. This feature helps support the legacy blocks or IPs that have manual clock gates throughout the
physical flow. Power Compiler recognizes the ICG’s power-related attributes, which aid in the placement of
such cells. For advanced users of clock gating, Power Compiler helps obtain greater power savings by
performing multi-stage clock gating. In this technique, one clock gating cell feeds another clock gating cell
instead of a register bank. (This technique is also an RTL-based feature.)

RTL clock gating saves power in several ways. Internal power consumption decreases because the clock
does not continuously feed register banks, switching power decreases because of reduced capacitance on
the clock network, and power decreases further because downstream logic does not change.

When Power Compiler works with Physical Compiler, the placements for the clock gating cells are optimized.
Within the Physical Compiler flow, Power Compiler makes sure that the gate element cells are placed close
together and that the gating element is placed close to the sequential elements it drives. This layout reduces
the clock skew that can otherwise occur with clock gating.

Clock gating can reduce a chip’s testability unless specific DFT features are added. Because the clock signal
is gated with an internal signal, a test engineer cannot control the loading of the DFT scan flip-flops. This
problem is avoided by adding a test pin and assigning a fixed value (1'b1) to it during test compilation. No
specific coding style is required. Figure 8 shows a clock gating circuit with a control point added.

[Schematic labels: control point ("OR" gate), Test_Mode, Enable, Clock, Reset, sequential circuit, +1, Mux, Data_out.]

Figure 8: Clock gating circuit with added control point — Because clock gating makes part of the circuit untestable,
clock-gated designs require the addition of control points, as shown here.

The options of Power Compiler’s set_clock_gating_style command improve the chip’s testability by
specifying the amount and type of testability logic added during clock gating. It is possible to add a control
point for testing before or after the clock-gating latch, for example, and choose test_mode or
scan_enable mode. Other options add observability logic or setup and hold-time margin. Before using the
Design Compiler commands check_test or check_dft, run the hookup_testports and
set_test_hold 1 Test_Mode commands.

Note that clock gating should not be used on designs that have variables (or signals) from which Design
Compiler implements master/slave flip-flops. Design Compiler uses the clocked_on_also signal-type
attribute in implementing these flip-flops. At the abstraction level at which clock-gating occurs, Power
Compiler does not recognize this attribute and will gate only the slave clock of the flip-flop. It is possible
to use the set_clock_gating_signals command to exclude specific design variables (or signals) that are
implemented as master-slave flip-flops:

dc_shell> set_clock_gating_signals -design TOP -exclude { A B }

In general, the coding that works best is a basic synchronous load-enable implementation in one of four
styles that can be mixed or nested together:

■ “If–Else” statements
■ Conditional assignments
■ “Case” statements
■ “For” loops

In addition to RTL optimization, Power Compiler optimizes power simultaneously with timing and area using
the following gate-level optimization techniques (in order of priority):

■ Sizing
■ Technology mapping
■ Pin swapping
■ Factoring
■ Buffer insertion
■ Phase assignment

These optimizations require the use of a power-characterized library. Because Power Compiler maintains
timing automatically and keeps area within the designer’s constraints, the tool provides “push-button” power
savings at the gate level.

High-level power management example


To show the potential for high-level power management in SoCs, Synopsys partnered with ARM®, National
Semiconductor® and Artisan Components® to create a test chip that demonstrates dramatic power savings.
The chip uses specialized hardware and software to control the voltage and clock frequency of various
chip domains, applying high-level control for the voltage and frequency scaling techniques described earlier
in this paper.

The control elements include ARM Intelligent Energy Manager software that balances processor workload
and energy consumption. PowerWise hardware from National monitors performance and communicates
with voltage regulators to scale the supply voltage to the minimum operating level at each operating
frequency. This system compensates for silicon performance variations due to the manufacturing process
as well as run-time performance changes due to temperature fluctuations.

The 240-MHz chip is partitioned into three primary power domains: voltage-scaled CPU and memory power
domains and a standard fixed-voltage domain for the rest of the chip. The independent power domains
allow precise voltage control and current measurement for the CPU and RAM. Standard cells and level
shifters operate in the 0.7-1.32V range.
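
As a rough illustration of scaling the supply to the minimum operating level at each frequency, the sketch
below searches the 0.7 to 1.32 V range for the lowest voltage that still supports a target clock. It is not
the PowerWise or Intelligent Energy Manager algorithm. The alpha-power-law delay model, the threshold
voltage, the exponent, the 10 mV step and the calibration point (240 MHz at 1.2 V) are all assumptions
chosen for illustration, and the function names are hypothetical.

```python
# Hypothetical model: none of these constants are measured values from the test chip.
V_MIN_MV, V_MAX_MV, STEP_MV = 700, 1320, 10   # range from the text; step size is assumed
VT = 0.35       # assumed threshold voltage, in volts
ALPHA = 1.5     # assumed alpha-power-law exponent
# Calibrate the model so that the estimated maximum clock at 1.2 V is 240 MHz.
K = 240.0 * 1.2 / (1.2 - VT) ** ALPHA


def f_max_mhz(v):
    """Estimate the highest safe clock (MHz) at supply v (volts) under the assumed model."""
    return K * (v - VT) ** ALPHA / v


def min_voltage_mv(target_mhz):
    """Return the lowest supply in mV whose estimated f_max meets the target, or None."""
    for mv in range(V_MIN_MV, V_MAX_MV + 1, STEP_MV):
        if f_max_mhz(mv / 1000.0) >= target_mhz:
            return mv
    return None
```

A real closed-loop system would replace f_max_mhz with feedback from on-chip performance monitors, which is
how it can track process and temperature variation instead of relying on a fixed model.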

For cache-intensive workloads, both the power consumption and the precise time to process a workload
were measured to compare dynamic frequency scaling alone with dynamic voltage and frequency scaling.
Figure 9 summarizes the results normalized to the 1.2V operating voltage. Note that this diagram shows
the power savings only for the chip’s dynamic-voltage-and-frequency-scaling subsystem. Normally in such
SoCs, some of the chip will not be voltage scalable. Components such as external memories typically
operate at a fixed voltage, so design partitioning and planning must take into account the system-level
power savings.

Figure 9: The ARM test chip shows that a combination of power-saving techniques can reduce a device’s
power consumption dramatically.

The figure shows that voltage and frequency scaling can significantly reduce energy consumption compared
to frequency scaling alone. Running at 120 MHz cuts power requirements by half, for example, but scaling
the supply voltage at the same time slashes power consumption to about 20 percent of full power.
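
These figures are consistent with the usual first-order model in which dynamic power is proportional to
C·V²·f. The sketch below works through the arithmetic. It assumes that model, ignores leakage, and
assumes that run time scales inversely with clock frequency, which may not hold exactly for
cache-intensive workloads. The function names are hypothetical, and the implied voltage is a back-of-envelope
estimate, not a measured value from the test chip.

```python
import math


def relative_dynamic_power(freq_ratio, volt_ratio=1.0):
    """Dynamic power relative to full speed, assuming P is proportional to V^2 * f."""
    return freq_ratio * volt_ratio ** 2


def implied_voltage_ratio(power_ratio, freq_ratio):
    """Voltage ratio that would give power_ratio at freq_ratio under the same model."""
    return math.sqrt(power_ratio / freq_ratio)


def relative_energy_per_task(power_ratio, freq_ratio):
    """Energy for a fixed workload, assuming run time scales as 1 / freq_ratio."""
    return power_ratio / freq_ratio


freq_ratio = 120 / 240                                    # half the full clock rate
p_freq_only = relative_dynamic_power(freq_ratio)          # frequency scaling alone
v_ratio = implied_voltage_ratio(0.20, freq_ratio)         # about 20 percent of full power
v_implied = 1.2 * v_ratio                                 # relative to the 1.2 V reference
e_freq_only = relative_energy_per_task(p_freq_only, freq_ratio)
e_dvfs = relative_energy_per_task(0.20, freq_ratio)
```

Under these assumptions, frequency scaling alone halves power but leaves the energy per task unchanged,
because the task takes twice as long. Reaching about 20 percent of full power at 120 MHz implies a supply
near 0.76 V and cuts the energy per task to roughly 40 percent.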

Summary
Dramatic power reductions such as those achieved by the Synopsys, ARM, National and Artisan test chip
are possible through a combination of high- and low-level power management techniques. The typical SoC
may not require all of these techniques, but mainstream solutions are available to meet all design requirements.

Choosing the right solutions depends on careful power analysis as well as understanding the capabilities
of available tools. Analyzing power requirements as early as possible in the design flow helps avoid power-
related disasters. Early analysis also makes power goals easier to attain because higher-level techniques
save the greatest amount of power.

700 East Middlefield Road, Mountain View, CA 94043 T 650 584 5000 www.synopsys.com

Synopsys and Vera are registered trademarks and SystemC and OpenVera are trademarks of Synopsys, Inc.
All other trademarks or registered trademarks mentioned in this release are the intellectual property
of their respective owners and should be treated as such. All rights reserved. Printed in the U.S.A.
©2004 Synopsys, Inc. 05/04.KF.WO.04-12222
