
Digital Integrated Circuits Lecture 16: Design for Low Power

Chih-Wei Liu VLSI Signal Processing LAB National Chiao Tung University cwliu@twins.ee.nctu.edu.tw

DIC-Lec16 cwliu@twins.ee.nctu.edu.tw

Outline

Power and Energy
Dynamic Power
Static Power
Low Power Design


Power and Energy

Power is drawn from a voltage source attached to the VDD pin(s) of a chip.

Instantaneous power:

$$P(t) = i_{DD}(t)\,V_{DD}$$

Energy:

$$E = \int_0^T P(t)\,dt = \int_0^T i_{DD}(t)\,V_{DD}\,dt$$

Average power:

$$P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt$$
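The integrals above can be approximated numerically from a sampled supply current. The following is a minimal sketch, assuming uniformly spaced samples and the trapezoidal rule; the function name `energy_and_average_power` and its arguments are illustrative and do not come from the lecture.

```python
def energy_and_average_power(i_dd_samples, v_dd, dt):
    """Approximate E and P_avg from uniformly sampled supply current.

    i_dd_samples: supply current samples in amperes (hypothetical input)
    v_dd: supply voltage in volts
    dt: spacing between samples in seconds
    """
    if len(i_dd_samples) < 2:
        raise ValueError("need at least two samples")
    # Trapezoidal rule for the integral of i_DD(t) dt, i.e. the total charge.
    charge = sum((a + b) / 2 * dt
                 for a, b in zip(i_dd_samples, i_dd_samples[1:]))
    energy = v_dd * charge                    # E = V_DD * integral of i_DD dt
    interval = dt * (len(i_dd_samples) - 1)   # T
    return energy, energy / interval          # (E, P_avg)
```

For a constant current the result reduces to E = I·VDD·T and P_avg = I·VDD, which is a quick sanity check on any measured waveform.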


Dynamic Power

Dynamic power is required to charge and discharge load capacitances when transistors switch.
One cycle involves a rising and a falling output.
On a rising output, charge $Q = C V_{DD}$ is required.
On a falling output, the charge is dumped to GND.
This repeats $T f_{sw}$ times over an interval of $T$.


Dynamic Power Cont.


$$P_{dynamic} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$$


Activity Factor

Suppose the system clock frequency is $f$.
Let $f_{sw} = \alpha f$, where $\alpha$ is the activity factor.

If the signal is a clock, $\alpha = 1$.
If the signal switches once per cycle, $\alpha = 1/2$.
Dynamic gates: switch either 0 or 2 times per cycle, $\alpha = 1/2$.
Static gates: depends on the design, but typically $\alpha = 0.1$.

Dynamic power:

$$P_{dynamic} = \alpha C V_{DD}^2 f$$
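To make the role of the activity factor concrete, here is a small sketch that evaluates the dynamic power formula for a clock net and for a typical static gate. The 1 pF load, 1.2 V supply and 1 GHz clock are hypothetical values chosen only for illustration.

```python
def dynamic_power(alpha, c_load, v_dd, f_clk):
    """Dynamic power in watts: alpha * C * V_DD^2 * f."""
    return alpha * c_load * v_dd ** 2 * f_clk

# Hypothetical node: 1 pF, 1.2 V supply, 1 GHz system clock.
p_clock = dynamic_power(1.0, 1e-12, 1.2, 1e9)  # clock net, alpha = 1
p_gate = dynamic_power(0.1, 1e-12, 1.2, 1e9)   # static gate, alpha = 0.1
```

Under these assumptions the same capacitance burns ten times more power on the clock net than on an average static gate output.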

Short Circuit Current

When transistors switch, both the nMOS and pMOS networks may be momentarily ON at once.
This leads to a blip of short-circuit current.
It is < 10% of dynamic power if rise/fall times are comparable for input and output.


Example

200 Mtransistor chip
  20M logic transistors
    Average width: 12 λ
  180M memory transistors
    Average width: 4 λ
1.2 V 100 nm process
Cg = 2 fF/μm


Dynamic Example

Static CMOS logic gates: activity factor = 0.1
Memory arrays: activity factor = 0.05 (many banks!)
Estimate dynamic power consumption per MHz.
Neglect wire capacitance and short-circuit current.


Dynamic Example Cont.

$$C_{logic} = (20 \times 10^6)(12\lambda)(0.05\ \mu m/\lambda)(2\ fF/\mu m) = 24\ nF$$

$$C_{mem} = (180 \times 10^6)(4\lambda)(0.05\ \mu m/\lambda)(2\ fF/\mu m) = 72\ nF$$

$$P_{dynamic} = \left[0.1\,C_{logic} + 0.05\,C_{mem}\right](1.2)^2 f = 8.6\ mW/MHz$$
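The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch of the estimate above; the constant and function names are illustrative, and the value of 0.05 μm per λ follows from the 100 nm process used in the example.

```python
LAMBDA_UM = 0.05      # micrometres per lambda (100 nm process)
CG_F_PER_UM = 2e-15   # gate capacitance per micrometre of width, in farads
VDD = 1.2             # supply voltage in volts

def gate_capacitance(n_transistors, width_lambda):
    """Total gate capacitance in farads for n transistors of a given width."""
    return n_transistors * width_lambda * LAMBDA_UM * CG_F_PER_UM

c_logic = gate_capacitance(20e6, 12)   # logic transistors, 12 lambda wide
c_mem = gate_capacitance(180e6, 4)     # memory transistors, 4 lambda wide

# Dynamic power at f = 1 MHz, i.e. watts per MHz of clock frequency.
p_per_mhz = (0.1 * c_logic + 0.05 * c_mem) * VDD ** 2 * 1e6
```

The result is about 8.64e-3 W per MHz, which matches the 8.6 mW/MHz on the slide; wire capacitance and short-circuit current are neglected, as in the problem statement.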

Static Power

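Static power is consumed even when the chip is quiescent.
Ratioed circuits burn power in the fight between ON transistors.
Leakage draws power from nominally OFF devices:

$$I_{ds} = I_{ds0}\, e^{\frac{V_{gs} - V_t}{n v_T}} \left[1 - e^{-\frac{V_{ds}}{v_T}}\right]$$

$$V_t = V_{t0} - \eta V_{ds} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)$$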
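The two leakage expressions can be evaluated directly. The sketch below is a plain transcription of the subthreshold current and threshold voltage formulas; the default thermal voltage of 0.026 V (roughly room temperature) and every numeric parameter in the usage lines are assumptions chosen for illustration, not values from the lecture.

```python
import math

def threshold_voltage(v_t0, v_ds, v_sb, eta, gamma, phi_s):
    """V_t = V_t0 - eta*V_ds + gamma*(sqrt(phi_s + V_sb) - sqrt(phi_s))."""
    return v_t0 - eta * v_ds + gamma * (math.sqrt(phi_s + v_sb) - math.sqrt(phi_s))

def subthreshold_current(i_ds0, v_gs, v_ds, v_t, n, v_thermal=0.026):
    """I_ds = I_ds0 * exp((V_gs - V_t)/(n*v_T)) * (1 - exp(-V_ds/v_T))."""
    return (i_ds0 * math.exp((v_gs - v_t) / (n * v_thermal))
            * (1.0 - math.exp(-v_ds / v_thermal)))

# Hypothetical device: V_t0 = 0.4 V, eta = 0.1, gamma = 0.4, phi_s = 0.8 V, n = 1.5.
vt_no_bias = threshold_voltage(0.4, 1.2, 0.0, 0.1, 0.4, 0.8)
vt_reverse_bias = threshold_voltage(0.4, 1.2, 0.5, 0.1, 0.4, 0.8)
i_no_bias = subthreshold_current(1e-6, 0.0, 1.2, vt_no_bias, 1.5)
i_reverse_bias = subthreshold_current(1e-6, 0.0, 1.2, vt_reverse_bias, 1.5)
```

With these assumed numbers, a source-to-body voltage of 0.5 V raises the threshold and lowers the OFF current, which is the body-bias effect listed later under leakage reduction.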

Ratio Example

The chip contains a 32 word x 48 bit ROM.
It uses a pseudo-nMOS decoder and bitline pullups.
On average, one wordline and 24 bitlines are high.
Find the static power drawn by the ROM.
β = 75 μA/V², Vtp = -0.4 V


Ratio Example Cont.

Solution:

$$I_{\text{pull-up}} = \beta\,\frac{\left(V_{DD} - |V_{tp}|\right)^2}{2} = 24\ \mu A$$

$$P_{\text{pull-up}} = V_{DD}\, I_{\text{pull-up}} = 29\ \mu W$$

$$P_{static} = (31 + 24)\, P_{\text{pull-up}} = 1.6\ mW$$
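The solution can be checked numerically. This sketch assumes the reading that the 31 wordlines and 24 bitlines that are low are the ones whose pull-ups conduct static current; the variable names are illustrative.

```python
BETA = 75e-6   # pull-up transconductance, A/V^2
VDD = 1.2      # supply voltage, V
VTP = -0.4     # pMOS threshold voltage, V

# Saturation current of one always-ON pMOS pull-up.
i_pullup = BETA * (VDD - abs(VTP)) ** 2 / 2
p_pullup = VDD * i_pullup

# Assumption: lines that are low fight their pull-up and draw static current.
n_low_wordlines = 32 - 1    # one wordline is high
n_low_bitlines = 48 - 24    # 24 bitlines are high
p_static = (n_low_wordlines + n_low_bitlines) * p_pullup
```

The unrounded values are 24 μA, 28.8 μW and about 1.58 mW, consistent with the rounded figures in the solution.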


Leakage Example

The process has two threshold voltages and two oxide thicknesses.
Subthreshold leakage:
  20 nA/μm for low Vt
  0.02 nA/μm for high Vt
Gate leakage:
  3 nA/μm for thin oxide
  0.002 nA/μm for thick oxide
Memories use low-leakage transistors everywhere.
Gates use low-leakage transistors on 80% of logic.


Leakage Example Cont.

Estimate static power:

High leakage: $(20 \times 10^6)(0.2)(12\lambda)(0.05\ \mu m/\lambda) = 2.4 \times 10^6\ \mu m$

Low leakage: $(20 \times 10^6)(0.8)(12\lambda)(0.05\ \mu m/\lambda) + (180 \times 10^6)(4\lambda)(0.05\ \mu m/\lambda) = 45.6 \times 10^6\ \mu m$

$$I_{static} = (2.4 \times 10^6\ \mu m)\left[(20\ nA/\mu m)/2 + (3\ nA/\mu m)\right] + (45.6 \times 10^6\ \mu m)\left[(0.02\ nA/\mu m)/2 + (0.002\ nA/\mu m)\right] = 32\ mA$$

$$P_{static} = I_{static} V_{DD} = 38\ mW$$

If no low leakage devices, Pstatic = 749 mW (!)
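The leakage estimate, including the all-high-leakage comparison, can be reproduced as follows. This is a sketch of the slide's arithmetic only; the helper name `leakage_current` is illustrative, and the division of the subthreshold term by 2 is taken directly from the expression above.

```python
LAMBDA_UM = 0.05   # micrometres per lambda
VDD = 1.2          # supply voltage, V

def leakage_current(width_um, i_sub_na_per_um, i_gate_na_per_um):
    """Leakage in amperes; the subthreshold term is halved as on the slide."""
    return width_um * (i_sub_na_per_um / 2 + i_gate_na_per_um) * 1e-9

# Total transistor widths in micrometres.
w_high = 20e6 * 0.2 * 12 * LAMBDA_UM                           # 20% of logic
w_low = 20e6 * 0.8 * 12 * LAMBDA_UM + 180e6 * 4 * LAMBDA_UM    # rest + memory

i_static = leakage_current(w_high, 20, 3) + leakage_current(w_low, 0.02, 0.002)
p_static = i_static * VDD

# Same chip built entirely from high-leakage devices.
p_all_high = leakage_current(w_high + w_low, 20, 3) * VDD
```

The unrounded results are about 31.7 mA, 38.1 mW and 748.8 mW, so the low-leakage devices cut static power by roughly a factor of 20 in this example.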



Low Power Design

Reduce dynamic power
  α: clock gating, sleep mode
  C: small transistors (esp. on clock), short wires
  VDD: lowest suitable voltage
  f: lowest suitable frequency
Reduce static power
  Selectively use ratioed circuits
  Selectively use low Vt devices
  Leakage reduction: stacked devices, body bias, low temperature

Recall CMOS Power Consumption

(Dynamic) power dissipation

$$P = C_{total} V_0^2 f$$

$C_{total}$: the total capacitance of the CMOS circuit

Propagation delay

$$T_{pd} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$

$C_{charge}$: the capacitance to be charged or discharged in a single clock cycle (along the critical path)
$V_0$: the supply voltage; $V_t$: the threshold voltage
$k$: technology parameter


Reduce
  Capacitances
    Transistor/Gate C
    Load C
    Interconnects
    External
  Activity
  Frequency
  Power supply

Some comments
  Off-chip connections have high capacitive load
  System integration


Pipelining for Low Power (1/2)

For an M-level pipelined architecture, the critical path is reduced to 1/M, and the capacitance to be charged or discharged in a single cycle (i.e. $C_{charge}$) is also reduced to 1/M.
If the same clock speed is maintained (i.e. $f = 1/T_{pd}$), only 1/M of the non-pipelined capacitance has to be charged or discharged per cycle, which leaves room for voltage reduction.
Suppose the voltage can be reduced to $\beta V_0$. The power consumption of the pipelined architecture then becomes

$$P_{pipelined} = C_{total} (\beta V_0)^2 f = \beta^2 P_{\text{non-pipelined}}$$

Remarks
Propagation delay of the original filter and the pipelined filter


Pipelining for Low Power (2/2)

Solving for $\beta$

Propagation delay of the original architecture:

$$T_{\text{non-pipelined}} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$

Propagation delay of the pipelined architecture:

$$T_{pipelined} = \frac{(C_{charge}/M)\,\beta V_0}{k (\beta V_0 - V_t)^2}$$

Setting the two delays equal gives the following quadratic equation in $\beta$:

$$M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$


Example: Reduce Power by Pipelining

Consider the following two FIR filters.


[Figure: the original FIR filter and its fine-grain pipelined version, with input x(n), output y(n), delay elements D, and each multiplier split into stages m1 and m2 in the pipelined version]

Assume
  the multiplication and the addition take 10 u.t. and 2 u.t. respectively
  the capacitance of the multiplier is 5 times that of an adder
  in the fine-grain pipelined filter, the multiplier is broken into m1 and m2, with 6 and 4 u.t. computation delay and 3 and 2 times the capacitance of an adder
  supply voltage V0 and threshold voltage Vt are 5 V and 0.6 V respectively

Problems:
  What is the supply voltage of the pipelined architecture if the clock periods are identical?
  What is the relative power consumption?


Solution

Calculate $C_{charge}$: let $C_M$, $C_A$, $C_{m1}$ and $C_{m2}$ be the capacitances of the multiplier, the adder, and the two fine-grain stages of the multiplier in the pipelined FIR filter, respectively.

Non-pipelined: $C_{charge} = C_M + C_A = 6C_A$
Pipelined: $C_{charge} = C_{m1} = C_{m2} + C_A = 3C_A$

Notice that the pipelining level is M = 2 and the charging capacitance of the pipelined filter is one-half of that of the original one. Solve

$$2(5\beta - 0.6)^2 = \beta(5 - 0.6)^2$$

$$50\beta^2 - 31.36\beta + 0.72 = 0$$

$$\beta = 0.6033 \text{ or } 0.0239 \quad (\text{invalid, since } \beta V_0 = 0.1195 < V_t)$$

The supply voltage of the pipelined filter is

$$V_{pipelined} = \beta V_0 = 3.0165\ V$$

The power consumption ratio is $\beta^2 = 36.4\%$.
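The quadratic for β can be solved for any pipelining level. The sketch below expands M(βV0 − Vt)² = β(V0 − Vt)² into standard form and keeps the root for which the scaled supply stays above the threshold; the function name `pipelined_voltage_scale` is illustrative.

```python
import math

def pipelined_voltage_scale(m_levels, v0, vt):
    """Solve M*(beta*V0 - Vt)^2 = beta*(V0 - Vt)^2 for the valid beta."""
    # Expanded form: a*beta^2 + b*beta + c = 0.
    a = m_levels * v0 ** 2
    b = -(2 * m_levels * v0 * vt + (v0 - vt) ** 2)
    c = m_levels * vt ** 2
    disc = math.sqrt(b * b - 4 * a * c)
    roots = ((-b + disc) / (2 * a), (-b - disc) / (2 * a))
    # Only a root with beta*V0 > Vt leaves the transistors able to switch.
    return next(r for r in roots if r * v0 > vt)

beta = pipelined_voltage_scale(2, 5.0, 0.6)
v_pipelined = beta * 5.0
power_ratio = beta ** 2
```

For M = 2, V0 = 5 V and Vt = 0.6 V this gives β ≈ 0.6033 and a power ratio of about 36.4%, in line with the solution above. The slide's 3.0165 V comes from multiplying the rounded β by 5; the unrounded value differs only in the fourth decimal place.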



Parallel Processing for Low Power (1/2)

For an L-parallel architecture, the charging capacitance remains the same, but the total capacitance (i.e. $C_{total}$) is increased L times.
The clock speed of the L-parallel architecture is reduced to 1/L (i.e. $f = 1/(L\,T_{pd})$) to maintain the same sample rate, which means $C_{charge}$ has L times longer to charge or discharge.
The supply voltage can be reduced to $\beta V_0$, since more time is allowed to charge or discharge the same capacitance. Thus the power consumption of the parallel architecture becomes

$$P_{parallel} = (L\,C_{total}) (\beta V_0)^2 \frac{f}{L} = \beta^2 P_{\text{non-parallel}}$$


Remarks
Propagation delay of the original filter and the parallel filter


Parallel Processing for Low Power (2/2)

Solving for $\beta$

Propagation delay of the original architecture:

$$T_{\text{non-parallel}} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$

Propagation delay of the parallel architecture:

$$T_{parallel} = \frac{C_{charge}\,\beta V_0}{k (\beta V_0 - V_t)^2} = L\,T_{\text{non-parallel}}$$

Substituting the first expression into the second gives the following quadratic equation in $\beta$:

$$L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$


Example: Reduce Power by Parallel

Consider the following two FIR filters, with critical paths denoted by dashed lines.

[Figure: the original FIR filter with input x(n), output y(n) and delay elements D, and its 2-parallel version with inputs x(2k), x(2k+1) and outputs y(2k), y(2k+1)]

Assume
  the multiplication and the addition take 8 u.t. and 1 u.t. respectively
  the capacitance of the multiplier is 8 times that of an adder
  both architectures operate at the sample period of 9 u.t.
  supply voltage V0 and threshold voltage Vt are 3.3 V and 0.45 V respectively

What is the supply voltage of the parallel architecture?
What is the relative power consumption?


Solution

Calculate $C_{charge}$: let $C_M$ and $C_A$ be the capacitances of the multiplier and the adder respectively.

Non-parallel: $C_{charge} = C_M + C_A = 9C_A$
Parallel: $C_{charge} = C_M + 2C_A = 10C_A$

Notice that the charging capacitances of the two architectures are not equal, so the equation $L(\beta V_0 - V_t)^2 = \beta(V_0 - V_t)^2$ cannot be used directly. Solve

$$T_{\text{non-parallel}} = \frac{9 C_A V_0}{k (V_0 - V_t)^2} \quad \text{and} \quad T_{parallel} = \frac{10 C_A\,\beta V_0}{k (\beta V_0 - V_t)^2}$$

Since $T_{parallel} = 2\,T_{\text{non-parallel}}$,

$$5\beta (V_0 - V_t)^2 = 9 (\beta V_0 - V_t)^2$$

$$98.01\beta^2 - 67.3425\beta + 1.8225 = 0$$

$$\beta = 0.6589 \text{ or } 0.0282 \quad (\text{invalid, since } \beta V_0 = 0.09306 < V_t)$$

The supply voltage of the parallel filter is $V_{parallel} = \beta V_0 = 2.17437$ V.
The power consumption ratio is $\beta^2 = 43.41\%$.
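Because the charging capacitances differ here, it helps to check the result against the delay model itself. The sketch below solves the generalized condition r·β(V0 − Vt)² = L(βV0 − Vt)², where r is the ratio of parallel to non-parallel charging capacitance, and then confirms that the parallel delay is twice the original. The function names, the normalization k = 1 and the use of C_A as the capacitance unit are assumptions made for illustration.

```python
import math

def delay(c_charge, v, vt, k=1.0):
    """Propagation delay C*V / (k*(V - Vt)^2), with k normalized to 1."""
    return c_charge * v / (k * (v - vt) ** 2)

def parallel_voltage_scale(l_par, cap_ratio, v0, vt):
    """Solve cap_ratio*beta*(V0 - Vt)^2 = L*(beta*V0 - Vt)^2 for the valid beta."""
    a = l_par * v0 ** 2
    b = -(2 * l_par * v0 * vt + cap_ratio * (v0 - vt) ** 2)
    c = l_par * vt ** 2
    disc = math.sqrt(b * b - 4 * a * c)
    roots = ((-b + disc) / (2 * a), (-b - disc) / (2 * a))
    # Keep the root that leaves the scaled supply above the threshold.
    return next(r for r in roots if r * v0 > vt)

beta = parallel_voltage_scale(2, 10 / 9, 3.3, 0.45)
v_parallel = beta * 3.3
# Capacitances in units of C_A: 10 for the parallel path, 9 for the original.
delay_ratio = delay(10, v_parallel, 0.45) / delay(9, 3.3, 0.45)
```

With these inputs β comes out near 0.6589 and the delay ratio is 2, so the parallel filter at the reduced supply just meets the doubled clock period.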


Parallel Processing

Block processing

The number of inputs processed in a clock cycle is referred to as the block size.

[Figure: a SISO system mapping x(n) to y(n), and its 3-parallel equivalent, in which a serial-to-parallel converter turns x(n) into x(3k), x(3k+1), x(3k+2), a MIMO system produces y(3k), y(3k+1), y(3k+2), and a parallel-to-serial converter turns these back into y(n)]

At the k-th clock cycle, three inputs x(3k), x(3k+1), and x(3k+2) are processed simultaneously to generate y(3k), y(3k+1), and y(3k+2).
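Block processing with a block size of 3 can be sketched in software. The example below uses a hypothetical 2-tap FIR filter y(n) = a·x(n) + b·x(n−1), which is not the filter from the slides, and shows that the MIMO form produces the same output sequence as the SISO form while consuming three samples per iteration.

```python
def fir_serial(x, a, b):
    """SISO form: y(n) = a*x(n) + b*x(n-1), with x(-1) taken as 0."""
    return [a * x[n] + b * (x[n - 1] if n > 0 else 0) for n in range(len(x))]

def fir_block3(x, a, b):
    """MIMO form: each loop iteration models one clock cycle of block size 3."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size")
    y = []
    prev = 0  # x(3k-1), carried over from the previous block
    for k in range(len(x) // 3):
        # Serial-to-parallel conversion: pick up x(3k), x(3k+1), x(3k+2).
        x0, x1, x2 = x[3 * k], x[3 * k + 1], x[3 * k + 2]
        # The three outputs have no dependence on each other within a block.
        y0 = a * x0 + b * prev
        y1 = a * x1 + b * x0
        y2 = a * x2 + b * x1
        # Parallel-to-serial conversion: emit y(3k), y(3k+1), y(3k+2) in order.
        y.extend([y0, y1, y2])
        prev = x2
    return y
```

In hardware the three output expressions would be evaluated by separate units in the same clock cycle, which is what allows the clock to run at one third of the sample rate.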


I/O Conversion

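Serial to parallel converter

[Figure: serial-to-parallel converter; input x(n) with sampling period T/3, outputs x(3k+2), x(3k+1), x(3k)]

Parallel to serial converter

[Figure: parallel-to-serial converter; inputs y(3k+2), y(3k+1), y(3k), two delay elements D of T/3 each, output y(n)]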
