I. Introduction
Domino logic is used extensively in high-speed circuit
design. The main reason for the higher performance of
domino logic compared to static CMOS is the reduced input
capacitance seen by driver gates in domino logic [1]. In
CMOS, both PMOS and NMOS transistors are driven at the
input stage, whereas in domino logic only NMOS transistors,
which have lower gate capacitance than PMOS, are driven.
The higher performance of domino logic comes at the
expense of higher power consumption. The switching activity
in domino circuits is, on average, double that in CMOS
circuits. This leads to higher power consumption even though
the switched capacitance is lower.
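The trade-off between switching activity and switched capacitance can be illustrated with the standard first-order dynamic power relation P = a * C * Vdd^2 * f. The sketch below is only an illustration: the function name and all numeric values are hypothetical and are not taken from the measurements in this paper.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical values: domino logic has twice the switching activity
# but a smaller switched capacitance than the static CMOS equivalent.
cmos = dynamic_power(activity=0.25, capacitance_f=10e-15, vdd_v=1.8, freq_hz=1e9)
domino = dynamic_power(activity=0.50, capacitance_f=7e-15, vdd_v=1.8, freq_hz=1e9)

# Ratio of domino power to CMOS power under these assumptions.
ratio = domino / cmos
```

With these assumed numbers the doubled activity outweighs the 30% reduction in capacitance, so the domino circuit still dissipates more power.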
The use of dual supply voltages promises to be an effective
way of reducing power consumption in digital circuits
[7][8][9][10]. However, its implementation in CMOS logic
necessitates the use of a level shifter whenever a low voltage
gate drives a high voltage gate. In the absence of a level
shifter, the PMOS transistors in the high voltage gate are not
completely turned off by the high output of the low voltage
gate. This causes significant energy wastage due to high
current flow from supply voltage to ground. The level shifters
limit the logic granularity at which dual voltages can be used,
which in turn reduces the effectiveness of using dual voltages
in CMOS logic. Clustered Voltage Scaling (CVS) [7][8] and
module level voltage scaling [9][10] are used in CMOS to
reduce the overhead of level shifters. Domino logic, however,
does not have this problem since there are no PMOS
transistors that are driven by the previous logic gates, making
it possible to use dual supply voltages at the gate level.
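The level-shifter condition can be sketched as a simple check. This is a minimal first-order model, assuming a PMOS transistor conducts whenever its source-to-gate voltage exceeds the magnitude of its threshold voltage; the function names and voltage values are hypothetical.

```python
def pmos_stays_on(receiver_vdd, driver_vdd, vtp_abs):
    """True if a PMOS with source at receiver_vdd still conducts when its
    gate is driven only up to driver_vdd (first-order threshold model)."""
    return (receiver_vdd - driver_vdd) > vtp_abs


def needs_level_shifter(driver_vdd, receiver_vdd, vtp_abs, input_drives_pmos):
    """A shifter is needed only if the input drives a PMOS that would leak."""
    if not input_drives_pmos:
        # Domino case: the previous gate drives NMOS transistors only.
        return False
    return pmos_stays_on(receiver_vdd, driver_vdd, vtp_abs)


# Hypothetical supplies: 1.2 V low, 1.8 V high, |Vtp| = 0.4 V.
cmos_low_to_high = needs_level_shifter(1.2, 1.8, 0.4, input_drives_pmos=True)
domino_low_to_high = needs_level_shifter(1.2, 1.8, 0.4, input_drives_pmos=False)
cmos_high_to_low = needs_level_shifter(1.8, 1.2, 0.4, input_drives_pmos=True)
```

Under this model only the CMOS low-to-high interface requires a level shifter, which matches the argument above.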
Shieh et al. [2] use dual supply voltages, gate sizing, and
a contention-alleviated static keeper (CASK) to reduce power
consumption in domino circuits while keeping the delay
fixed. This approach needs two separate supply voltages for
the gates and a bias voltage for the CASK circuitry, which is
used to speed up the interfaces between the two supply
voltage domains. Jung et al. [3] use
Adit D. Singh
0-7803-8736-8/05/$20.00 ©2005 IEEE.
ASP-DAC 2005
Figure 1. Domino logic 3-input AND gate with PMOS (a) and
NMOS (b) pull-up
Figure 7. Variation of E'"' for PPD and NPD AND3 and OR3 gates
with different drivers and load capacitances (x-axis: load
capacitance; one curve per combination of gate, AND3 or OR3, PPD
or NPD, and driver, PPD or NPD)
inverters of the gates which will not be switching at the same
time. This reduces the area without decreasing the
performance significantly. We observed that putting one
NMOS transistor of width W from power supply to the output
Figure 6. Variation of E'"' for PPD and NPD AND3 and OR3
gates with different drivers and load capacitances (x-axis: load
capacitance)
NPD Gates
To reduce the energy consumption of combinational
domino logic circuits, we propose replacing the fast, high
energy PPD gates on the non-critical paths with the slow, low
energy NPD gates. The total delay of the circuit remains the
same as the original circuit with only PPD gates.
We first represent the combinational circuit as a directed
acyclic graph (DAG), G(V,E). If the circuit has multiple
primary inputs (PIs), we create a dummy PI vertex, PId, which
fans out to the original PIs. Underlying this is the assumption
that all inputs arrive simultaneously. Similarly, for multiple
primary outputs (POs), we create a dummy PO vertex, POd,
which has fan-ins from the original POs. Each vertex v of the
DAG has associated with it the following information:
(1) The logic function computed by the gate
corresponding to the vertex.
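The DAG construction with dummy terminals can be sketched as follows. This is a minimal illustration, assuming the circuit is given as a list of (driver, receiver) edges; the function name `add_dummy_terminals` and the example netlist are hypothetical.

```python
from collections import defaultdict


def add_dummy_terminals(edges, source="PId", sink="POd"):
    """Return successor lists with one dummy PI and one dummy PO added.

    edges: iterable of (u, v) pairs meaning vertex u drives vertex v.
    """
    succ = defaultdict(list)
    nodes, has_pred, has_succ = set(), set(), set()
    for u, v in edges:
        succ[u].append(v)
        nodes.update((u, v))
        has_succ.add(u)
        has_pred.add(v)
    # Vertices without predecessors are the original primary inputs.
    for n in sorted(nodes - has_pred):
        succ[source].append(n)
    # Vertices without successors are the original primary outputs.
    for n in sorted(nodes - has_succ):
        succ[n].append(sink)
    succ[sink] = []
    return dict(succ)


# Hypothetical circuit: inputs a and b, outputs g3 and g4.
edges = [("a", "g1"), ("b", "g1"), ("b", "g2"),
         ("g1", "g3"), ("g2", "g3"), ("g2", "g4")]
graph = add_dummy_terminals(edges)
```

After this step the graph has a single source and a single sink, so arrival and required times can be propagated from one start vertex and one end vertex.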
Algorithm PPD-NPD
Inputs: Topologically sorted list of circuit vertices, V;
Output: Circuit with off-critical path PPD gates replaced with
NPD gates.
For every vertex v in order of decreasing metric value {
  If v has any predecessor mapped to an NPD gate {
    high_delay ← v.NNdelay;
    low_energy ← v.NNenergy;
  }
  Else {
    high_delay ← v.PNdelay;
    low_energy ← v.PNenergy;
  }
  If ((high_delay - v.delay) ≤ v.ts) {
    flag ← 0;
    For every successor p of v {
      If p is mapped to an NPD gate {
        If ((v.es + high_delay) > p.es) {
          If ((v.es + high_delay + p.NNdelay) > p.lf) { flag ← 1; break; }
        }
        Else if ((p.NNdelay - p.delay) > p.ts) { flag ← 1; break; }
      }
      Else {
        If ((v.es + high_delay) > p.es) {
          If ((v.es + high_delay + p.NPdelay) > p.lf) { flag ← 1; break; }
        }
        Else if ((p.NPdelay - p.delay) > p.ts) { flag ← 1; break; }
      }
    }
    If (flag = 0) {
      Map v to an NPD gate.
      v.delay ← high_delay;
      v.energy ← low_energy;
      For every successor p of v {
        If p is mapped to an NPD gate {
          p.delay ← p.NNdelay;
          p.energy ← p.NNenergy;
        }
        Else {
          p.delay ← p.NPdelay;
          p.energy ← p.NPenergy;
        }
      }
      Update-Time-Slacks(V)
    }
  }
}
Figure 8. Algorithm for replacing off-critical path PPD gates with
NPD gates
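The procedure in Figure 8 can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' C++ implementation: the metric is taken as a precomputed input because its definition is not given here, es, lf and ts are read as earliest start, latest finish and total slack, the successor update after a replacement is inferred from the symmetry of the figure, and the gate characterization in `gate` uses hypothetical ratios.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    name: str
    delay: float      # current delay; starts as a PPD gate with PPD drivers
    energy: float     # current energy; same starting assumption
    PNdelay: float    # as an NPD gate with only PPD drivers
    PNenergy: float
    NNdelay: float    # as an NPD gate with at least one NPD driver
    NNenergy: float
    NPdelay: float    # as a PPD gate with at least one NPD driver
    NPenergy: float
    metric: float = 0.0   # replacement priority (assumed precomputed)
    npd: bool = False
    preds: list = field(default_factory=list)
    succs: list = field(default_factory=list)
    es: float = 0.0   # earliest start
    lf: float = 0.0   # latest finish
    ts: float = 0.0   # total slack


def update_time_slacks(order, t_max):
    """Recompute es, lf and ts; order must be topologically sorted."""
    for v in order:
        v.es = max((p.es + p.delay for p in v.preds), default=0.0)
    for v in reversed(order):
        v.lf = min((s.lf - s.delay for s in v.succs), default=t_max)
        v.ts = v.lf - v.es - v.delay


def blocks_replacement(v, p, high_delay):
    """True if successor p would miss its latest finish time."""
    new_p_delay = p.NNdelay if p.npd else p.NPdelay
    if v.es + high_delay > p.es:
        return v.es + high_delay + new_p_delay > p.lf
    return new_p_delay - p.delay > p.ts


def ppd_to_npd(order, t_max):
    """Replace PPD gates with NPD gates without exceeding t_max."""
    update_time_slacks(order, t_max)
    for v in sorted(order, key=lambda x: x.metric, reverse=True):
        if any(q.npd for q in v.preds):
            high_delay, low_energy = v.NNdelay, v.NNenergy
        else:
            high_delay, low_energy = v.PNdelay, v.PNenergy
        if high_delay - v.delay > v.ts:
            continue  # v itself lacks the slack
        if any(blocks_replacement(v, p, high_delay) for p in v.succs):
            continue  # a successor lacks the slack
        v.npd, v.delay, v.energy = True, high_delay, low_energy
        for p in v.succs:
            if p.npd:
                p.delay, p.energy = p.NNdelay, p.NNenergy
            else:
                p.delay, p.energy = p.NPdelay, p.NPenergy
        update_time_slacks(order, t_max)
    return [v for v in order if v.npd]


def gate(name, delay, metric):
    # Hypothetical characterization: fixed ratios relative to the PPD gate.
    return Vertex(name, delay, 1.0, PNdelay=1.5 * delay, PNenergy=0.6,
                  NNdelay=1.8 * delay, NNenergy=0.7,
                  NPdelay=1.2 * delay, NPenergy=1.05, metric=metric)


def connect(u, v):
    u.succs.append(v)
    v.preds.append(u)


# Critical path a -> c (delay 5); off-critical path b -> d (delay 2).
a, b, c, d = gate("a", 3.0, 0), gate("b", 1.0, 2), gate("c", 2.0, 0), gate("d", 1.0, 1)
connect(a, c)
connect(b, d)
order = [a, b, c, d]
replaced = ppd_to_npd(order, t_max=5.0)
```

In this small example only the off-critical gates b and d are mapped to NPD gates, while a and c stay as PPD gates because they have no slack, so the longest path remains at 5.0.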
metric, and attempt to replace the gate corresponding to the
vertex with the NPD equivalent. This might not always be
possible, even for a vertex with non-zero metric value,
because the metric for the vertex was computed in step (a)
under the assumption that all other vertices are mapped to
PPD gates and hence the vertex had a lot of slack. As the
replacement proceeds, the slack available to a vertex keeps on
reducing and might not be sufficiently large to allow
Table 1. Results
Columns: Circuit, #Gates, Delay (sec), Initial Energy (J), Final
Energy (J), % Saving, Fraction of NPD Gates, CPU Time
Circuits: C1908, C2670, C3540, C5315, C7552, Average (average %
saving: 16.25)
V. Methodology
We tested our scheme on the ISCAS85 benchmark
circuits. The circuits were synthesized using Synopsys Design
Compiler to a target library that was reduced, for simplicity,
to only two- to four-input AND and OR gates and INVERTER
gates. Circuits were optimized for minimum delay.
Since domino logic can only implement non-inverting
functions, the resulting circuit was not suitable for mapping to
domino logic gates. We implemented the bubble pushing and
VI. Results
We implemented the algorithm in Figure 8 in C++.
The compiled program was run for each of the benchmark
circuits on a Sun Sparc Ultra-80 machine. As mentioned in
Section IV, the delay and energy characteristics of the NPD
gates can be varied by changing the width of the inverter
NMOS pull-up transistor. This variation in the delay and
energy characteristics leads to a variation in the number of
PPD gates that are replaced with NPD gates by our algorithm
and hence a variation in the overall energy savings obtained.
Figure 9 shows that as we increase the width of the inverter
NMOS pull-up in the NPD gates, the energy savings go up.
This is expected because both the delay and energy of NPD
gates go down with this increased width as shown in
VII. Conclusion
We presented a method to save dynamic energy in domino
logic circuits by replacing the PMOS pull-up transistor by an
NMOS transistor and adding an additional NMOS transistor
between power supply and the output inverter. This method
gives an average energy saving of 16.25% for the ISCAS85
benchmark circuits. This method makes it possible to exploit
the advantages of using dual supply voltage without the
necessity of a second supply voltage or level shifters. Apart
REFERENCES
J. P. Uyemura, CMOS Logic Circuit Design, Kluwer
Academic Publishers, March 1999.
S. J. Shieh, J. S. Wang, Design of low-power domino circuits
using multiple supply voltages, IEEE International Conference
on Electronics, Circuits and Systems, Sept. 2001, pp. 711-714.
S. O. Jung, K. W. Kim, S. M. Kang, Low-swing clock domino
logic incorporating dual supply and dual threshold voltages,
Design Automation Conference, June 2002, pp. 467-472.
M. R. Prasad, D. Kirkpatrick, R. K. Brayton, Domino logic
synthesis and technology mapping, Int. Workshop on Logic
Synthesis, 1997.