Chih-Wei Liu VLSI Signal Processing LAB National Chiao Tung University cwliu@twins.ee.nctu.edu.tw
DIC-Lec16 cwliu@twins.ee.nctu.edu.tw
Outline
Power and Energy
Dynamic Power
Static Power
Low Power Design
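Power and Energy
Power is drawn from a voltage source attached to the VDD pin(s) of a chip.
Instantaneous power: $P(t) = i_{DD}(t)\,V_{DD}$
Energy: $E = \int_0^T P(t)\,dt = \int_0^T i_{DD}(t)\,V_{DD}\,dt$
Average power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt$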
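As a sketch of the two integrals above, the Python snippet below estimates $E$ and $P_{avg}$ from sampled supply current using the trapezoidal rule. The function name `supply_energy_and_power` and the triangular test waveform are hypothetical; a real measurement would supply its own samples and sampling interval.

```python
def supply_energy_and_power(i_dd, vdd, dt):
    """Return (E, P_avg) from supply-current samples i_dd (A) taken every dt seconds.

    E = integral_0^T i_DD(t) * V_DD dt is approximated with the trapezoidal rule,
    and P_avg = E / T with T = (len(i_dd) - 1) * dt.
    """
    if len(i_dd) < 2:
        raise ValueError("need at least two current samples")
    period = (len(i_dd) - 1) * dt
    energy = 0.0
    for k in range(len(i_dd) - 1):
        # Instantaneous power P(t) = i_DD(t) * V_DD, averaged over one sample interval
        energy += 0.5 * (i_dd[k] + i_dd[k + 1]) * vdd * dt
    return energy, energy / period


# Hypothetical waveform: a 1 mA triangular current pulse lasting 2 ns at V_DD = 1.2 V
E, P = supply_energy_and_power([0.0, 1e-3, 0.0], vdd=1.2, dt=1e-9)
print(E, P)
```

The trapezoidal rule is exact for piecewise-linear current, which is why the triangular pulse is used as the illustration.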
Dynamic Power
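Dynamic power is required to charge and discharge load capacitances when transistors switch. One cycle involves a rising and a falling output. On a rising output, charge $Q = C V_{DD}$ is drawn from the supply; on a falling output, that charge is dumped to GND. This repeats $T f_{sw}$ times over an interval $T$, so
$$P_{dynamic} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\left[T f_{sw}\, C V_{DD}\right] = C V_{DD}^2 f_{sw}$$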
Activity Factor
Suppose the system clock frequency is $f$. Let $f_{sw} = \alpha f$, where $\alpha$ is the activity factor.
If the signal is a clock, $\alpha = 1$.
If the signal switches once per cycle, $\alpha = 1/2$.
Dynamic gates: switch either 0 or 2 times per cycle, $\alpha = 1/2$.
Static gates: depends on the design, but typically $\alpha = 0.1$.
Dynamic power:
$$P_{dynamic} = \alpha C V_{DD}^2 f$$
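A minimal sketch of the activity-factor model in Python: `dynamic_power` simply evaluates $\alpha C V_{DD}^2 f$, and the constants restate the slide's $\alpha$ values. The 1 nF load, 1.2 V supply, and 100 MHz clock are hypothetical numbers chosen only for illustration.

```python
# Activity factors quoted above
ALPHA_CLOCK = 1.0           # a clock rises and falls every cycle
ALPHA_ONCE_PER_CYCLE = 0.5  # one transition per cycle = one charge/discharge every 2 cycles
ALPHA_DYNAMIC_GATE = 0.5    # precharge/evaluate: 0 or 2 transitions per cycle
ALPHA_STATIC_TYPICAL = 0.1  # design-dependent typical value for static gates


def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    """P_dynamic = alpha * C * V_DD^2 * f, i.e. C * V_DD^2 * f_sw with f_sw = alpha * f."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz


# Hypothetical numbers: 1 nF switched at 100 MHz from a 1.2 V supply
p_clock = dynamic_power(ALPHA_CLOCK, 1e-9, 1.2, 100e6)
p_static_logic = dynamic_power(ALPHA_STATIC_TYPICAL, 1e-9, 1.2, 100e6)
print(p_clock, p_static_logic)
```

The same capacitance costs ten times more power on a clock net than in typical static logic, which is why clock nets are a primary target for power reduction.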
Short-Circuit Current
When transistors switch, both the nMOS and pMOS networks may be momentarily ON at once. This leads to a blip of short-circuit current, which is typically < 10% of the dynamic power if the rise/fall times of the input and output are comparable.
Dynamic Example
The chip has 20 M logic transistors (average width 12λ) and 180 M memory transistors (average width 4λ), with λ = 0.05 μm, gate capacitance 2 fF/μm, and VDD = 1.2 V.
Static CMOS logic gates: activity factor = 0.1
Memory arrays: activity factor = 0.05 (many banks!)
Estimate the dynamic power consumption per MHz. Neglect wire capacitance and short-circuit current.
$C_{logic} = (20\times10^6)(12\lambda)(0.05\,\mu m/\lambda)(2\,fF/\mu m) = 24\,nF$
$C_{mem} = (180\times10^6)(4\lambda)(0.05\,\mu m/\lambda)(2\,fF/\mu m) = 72\,nF$
$P_{dynamic} = \left[0.1\,C_{logic} + 0.05\,C_{mem}\right](1.2)^2 f = 8.6\,mW/MHz$
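The arithmetic above can be checked with a short Python sketch. It uses only the numbers in the example; `gate_cap_nf` is a hypothetical helper, and the unit bookkeeping relies on nF × V² × MHz = mW.

```python
LAMBDA_UM = 0.05     # 0.05 um per lambda
CG_FF_PER_UM = 2.0   # gate capacitance per um of transistor width
VDD = 1.2            # volts


def gate_cap_nf(n_transistors, avg_width_lambda):
    """Total gate capacitance in nF: count * width * (um/lambda) * (fF/um)."""
    width_um = n_transistors * avg_width_lambda * LAMBDA_UM
    return width_um * CG_FF_PER_UM * 1e-6   # 1 nF = 1e6 fF


c_logic = gate_cap_nf(20e6, 12)   # logic transistors
c_mem = gate_cap_nf(180e6, 4)     # memory transistors

# nF * V^2 * MHz = mW, so this is the dynamic power per MHz of clock frequency
p_mw_per_mhz = (0.1 * c_logic + 0.05 * c_mem) * VDD ** 2
print(f"C_logic = {c_logic:.1f} nF, C_mem = {c_mem:.1f} nF, P = {p_mw_per_mhz:.2f} mW/MHz")
```

The exact value is 8.64 mW/MHz, which the example rounds to 8.6 mW/MHz.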
Static Power
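Static power is consumed even when the chip is quiescent.
Ratioed circuits burn power in the fight between ON transistors.
Leakage draws power from nominally OFF devices. The subthreshold current is
$$I_{ds} = I_{ds0}\, e^{\frac{V_{gs} - V_t}{n v_T}}\left(1 - e^{-\frac{V_{ds}}{v_T}}\right)$$
$$V_t = V_{t0} - \eta V_{ds} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)$$
where $v_T = kT/q$ is the thermal voltage, $n$ the subthreshold slope factor, $\eta$ the DIBL coefficient, $\gamma$ the body-effect coefficient, and $\phi_s$ the surface potential.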
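A minimal Python sketch of the subthreshold model above. All device parameters (`ids0`, `vt0`, `n`, `eta`, `gamma`, `phi_s`) are hypothetical values picked only to show the trends; they do not describe any particular process. It illustrates why lowering $V_{ds}$ (as in stacked devices) or applying reverse body bias reduces leakage.

```python
import math

V_THERMAL = 0.0259   # kT/q at about 300 K, volts


def threshold_voltage(vt0, vds, vsb, eta, gamma, phi_s):
    """V_t = V_t0 - eta*V_ds + gamma*(sqrt(phi_s + V_sb) - sqrt(phi_s))."""
    return vt0 - eta * vds + gamma * (math.sqrt(phi_s + vsb) - math.sqrt(phi_s))


def subthreshold_current(vgs, vds, vsb, ids0, vt0, n, eta, gamma, phi_s, vth=V_THERMAL):
    """I_ds = I_ds0 * exp((V_gs - V_t)/(n v_T)) * (1 - exp(-V_ds/v_T))."""
    vt = threshold_voltage(vt0, vds, vsb, eta, gamma, phi_s)
    return ids0 * math.exp((vgs - vt) / (n * vth)) * (1.0 - math.exp(-vds / vth))


# Hypothetical device parameters, chosen only to illustrate the trends
params = dict(ids0=1e-7, vt0=0.4, n=1.4, eta=0.1, gamma=0.4, phi_s=0.8)
off_full_vds = subthreshold_current(0.0, 1.2, 0.0, **params)   # OFF device, full V_DS
off_low_vds = subthreshold_current(0.0, 0.1, 0.0, **params)    # smaller V_DS: less DIBL
off_body_bias = subthreshold_current(0.0, 1.2, 0.3, **params)  # reverse body bias raises V_t
print(off_full_vds, off_low_vds, off_body_bias)
```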
Ratio Example
The chip contains a 32 word × 48 bit ROM.
It uses a pseudo-nMOS decoder and bitline pull-ups.
On average, one wordline and 24 bitlines are high.
Find the static power drawn by the ROM.
$\beta = 75\,\mu A/V^2$, $V_{tp} = -0.4$ V
Solution: with VDD = 1.2 V, each pMOS pull-up that fights an ON pull-down network draws
$$I_{pull\text{-}up} = \beta\,\frac{(V_{DD} - |V_{tp}|)^2}{2} = 24\,\mu A$$
$$P_{pull\text{-}up} = V_{DD}\, I_{pull\text{-}up} = 29\,\mu W$$
Only gates whose output is low draw this current: the 31 low wordlines and the 24 low bitlines. Hence
$$P_{static} = (31 + 24)\, P_{pull\text{-}up} = 1.6\,mW$$
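A short Python check of the ROM calculation. The supply VDD = 1.2 V is an assumption carried over from the earlier examples (it is the value consistent with 24 μA and 29 μW); the variable names are hypothetical.

```python
BETA = 75e-6    # A/V^2, pMOS pull-up
VTP = -0.4      # V
VDD = 1.2       # V, assumed supply (consistent with the 24 uA / 29 uW figures)

WORDS, BITS = 32, 48
HIGH_WORDLINES, HIGH_BITLINES = 1, 24

# Saturated pMOS pull-up current while fighting an ON pull-down network
i_pullup = BETA / 2 * (VDD - abs(VTP)) ** 2
p_pullup = VDD * i_pullup
# Only pseudo-nMOS gates whose output is LOW draw static current
n_low = (WORDS - HIGH_WORDLINES) + (BITS - HIGH_BITLINES)
p_static = n_low * p_pullup
print(f"I = {i_pullup * 1e6:.0f} uA, P_pullup = {p_pullup * 1e6:.1f} uW, "
      f"P_static = {p_static * 1e3:.2f} mW")
```

The unrounded result is 55 × 28.8 μW ≈ 1.58 mW, matching the 1.6 mW above.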
Leakage Example
The process has two threshold voltages and two oxide thicknesses.
Subthreshold leakage: 20 nA/μm for low Vt, 0.02 nA/μm for high Vt.
Gate leakage: 3 nA/μm for thin oxide, 0.002 nA/μm for thick oxide.
Memories use low-leakage transistors everywhere.
Gates use low-leakage transistors on 80% of the logic.
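Estimate static power:
High leakage: $(20\times10^6)(0.2)(12\lambda)(0.05\,\mu m/\lambda) = 2.4\times10^6\,\mu m$
Low leakage: $(20\times10^6)(0.8)(12\lambda)(0.05\,\mu m/\lambda) + (180\times10^6)(4\lambda)(0.05\,\mu m/\lambda) = 45.6\times10^6\,\mu m$
Assuming half of the transistors are OFF (subthreshold leakage), while gate leakage flows through all of them:
$$I_{static} = (2.4\times10^6\,\mu m)\left[\frac{20\,nA/\mu m}{2} + 3\,nA/\mu m\right] + (45.6\times10^6\,\mu m)\left[\frac{0.02\,nA/\mu m}{2} + 0.002\,nA/\mu m\right] = 32\,mA$$
$$P_{static} = I_{static}\, V_{DD} = 38\,mW$$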
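A Python sketch of the leakage estimate, assuming (as in the calculation above) that half of the transistor width is OFF for subthreshold leakage and that gate leakage applies to the full width. VDD = 1.2 V is taken from the earlier examples; the constant names are hypothetical.

```python
LAMBDA_UM = 0.05   # um per lambda
VDD = 1.2          # V

# Leakage per um of transistor width, in nA/um
SUB_LOW_VT, SUB_HIGH_VT = 20.0, 0.02
GATE_THIN, GATE_THICK = 3.0, 0.002

# Total width of high-leakage (low-Vt, thin-oxide) and low-leakage transistors, in um
w_high = 20e6 * 0.2 * 12 * LAMBDA_UM
w_low = 20e6 * 0.8 * 12 * LAMBDA_UM + 180e6 * 4 * LAMBDA_UM

# Assumption: half of the transistors are OFF, so subthreshold leakage uses half the
# width; gate leakage is charged to the full width
i_static_na = (w_high * (SUB_LOW_VT / 2 + GATE_THIN)
               + w_low * (SUB_HIGH_VT / 2 + GATE_THICK))
i_static_ma = i_static_na * 1e-6   # nA -> mA
p_static_mw = i_static_ma * VDD
print(f"W_high = {w_high:.3g} um, W_low = {w_low:.3g} um, "
      f"I = {i_static_ma:.1f} mA, P = {p_static_mw:.1f} mW")
```

Almost all of the 32 mA comes from the 20% of logic built with leaky devices, which is why selective use of low-Vt, thin-oxide transistors matters.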
Low Power Design
Reduce dynamic power
  α: clock gating, sleep mode
  C: small transistors (esp. on clock), short wires
  VDD: lowest suitable voltage
  f: lowest suitable frequency
Reduce static power
  Selectively use ratioed circuits
  Selectively use low Vt devices
  Leakage reduction: stacked devices, body bias, low temperature
Power consumption: $P = C_{total} V_0^2 f$
Propagation delay:
$$T_{pd} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$
$C_{charge}$: the capacitance to be charged or discharged in a single clock cycle (along the critical path)
$V_0$: the supply voltage; $V_t$: the threshold voltage
$k$: technology parameter
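To make the voltage/speed trade-off concrete, the Python sketch below evaluates the delay model $T_{pd}$. It assumes normalized $C_{charge} = k = 1$ and borrows $V_t = 0.6$ V from the pipelining example below; `propagation_delay` is a hypothetical helper name.

```python
def propagation_delay(c_charge, v0, vt, k=1.0):
    """T_pd = C_charge * V0 / (k * (V0 - Vt)^2); only meaningful for V0 > Vt."""
    if v0 <= vt:
        raise ValueError("the supply voltage must exceed the threshold voltage")
    return c_charge * v0 / (k * (v0 - vt) ** 2)


# Normalized example (C_charge = k = 1), Vt = 0.6 V: lowering V0 from 5 V to 3 V
t_5v = propagation_delay(1.0, 5.0, 0.6)
t_3v = propagation_delay(1.0, 3.0, 0.6)
power_ratio = (3.0 / 5.0) ** 2   # P is proportional to V0^2 at fixed C_total and f
print(f"delay grows by {t_3v / t_5v:.2f}x while power falls to {power_ratio:.0%}")
```

Lowering the supply roughly doubles the delay here, so the lost speed must be recovered by reducing $C_{charge}$ or by allowing more time per operation, which is exactly what pipelining and parallel processing do.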
Reduce Capacitances
For an M-level pipelined architecture, the critical path is reduced to 1/M, and the capacitance to be charged or discharged in a single cycle (i.e. $C_{charge}$) is also reduced to 1/M. If the same clock speed is maintained (i.e. $f = 1/T_{pd}$), only 1/M of the non-pipelined capacitance must be charged or discharged per cycle, which leaves room to reduce the supply voltage. If the voltage can be reduced to $\beta V_0$ ($0 < \beta < 1$), the power consumption of the pipelined architecture becomes
$$P_{pipelined} = C_{total}\,(\beta V_0)^2 f = \beta^2 P_{non\text{-}pipelined}$$
Remarks
Propagation delay of the original filter and the pipelined filter:
$$T_{non\text{-}pipelined} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$
$$T_{pipelined} = \frac{\frac{C_{charge}}{M}\,\beta V_0}{k (\beta V_0 - V_t)^2}$$
Setting the two equal yields the following quadratic equation in β:
$$M(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
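A minimal Python sketch that solves the quadratic above in closed form. Expanding gives $a\beta^2 + b\beta + c = 0$; of the two roots, only the one with $\beta V_0 > V_t$ is physical. `solve_beta` is a hypothetical name, and the sweep over M reuses $V_0 = 5$ V and $V_t = 0.6$ V from the example below.

```python
import math


def solve_beta(m, v0, vt):
    """Supply scaling factor beta satisfying M*(beta*V0 - Vt)^2 = beta*(V0 - Vt)^2.

    Expanding gives a*beta^2 + b*beta + c = 0 with
      a = M*V0^2,  b = -(2*M*V0*Vt + (V0 - Vt)^2),  c = M*Vt^2.
    One root lies below Vt/V0 (the gates would no longer switch); the larger
    root satisfies beta*V0 > Vt and is the physical solution.
    """
    a = m * v0 ** 2
    b = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    c = m * vt ** 2
    beta = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    assert beta * v0 > vt
    return beta


for m in (1, 2, 3, 4):
    beta = solve_beta(m, 5.0, 0.6)
    print(f"M = {m}: beta = {beta:.4f}, V = {beta * 5.0:.3f} V, P ratio = {beta ** 2:.3f}")
```

M = 1 returns β = 1 (no voltage reduction), and deeper pipelining gives diminishing returns because the supply cannot approach $V_t$. The L-parallel condition derived later has the same form with M replaced by L, so the same function applies there.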
[Figure: FIR filter with input x(n) and output y(n), and its fine-grain pipelined version in which each multiplier is split into stages m1 and m2; D denotes a delay element]
Assume:
the multiplication and the addition take 10 u.t. and 2 u.t., respectively;
the capacitance of the multiplier is 5 times that of an adder;
in the fine-grain pipelined filter, the multiplier is broken into m1 and m2, with computation delays of 6 and 4 u.t. and capacitances 3 and 2 times that of an adder, respectively;
the supply voltage V0 and threshold voltage Vt are 5 V and 0.6 V, respectively.
Problems:
What is the supply voltage of the pipelined architecture if the clock periods are identical? What is the relative power consumption?
Solution
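Calculate $C_{charge}$: let $C_M$, $C_A$, $C_{m1}$ and $C_{m2}$ be the capacitances of the multiplier, the adder, and the two fine-grain pipelined parts of the multiplier, respectively:
Original filter: $C_{charge} = C_M + C_A = 6C_A$
Pipelined filter: $C_{charge} = C_{m1} = C_{m2} + C_A = 3C_A$
Notice that the pipelining level is M = 2 and the charging capacitance of the pipelined filter is one-half that of the original one. Solve
$$2(5\beta - 0.6)^2 = \beta(5 - 0.6)^2 \;\Rightarrow\; \beta = 0.6033$$
The supply voltage of the pipelined filter is $V_{pipelined} = \beta V_0 = 3.0165$ V, and the power consumption ratio is $\beta^2 = 36.4\%$.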
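As an independent check of this solution, the Python sketch below derives $C_{charge}$ from the stated capacitances and then finds β by bisection directly on the delay condition, rather than through the quadratic. It assumes $k = 1$ (it cancels) and normalizes $C_A = 1$; the names are hypothetical.

```python
C_A = 1.0                   # adder capacitance (normalized)
C_M = 5 * C_A               # multiplier
C_M1, C_M2 = 3 * C_A, 2 * C_A
V0, VT = 5.0, 0.6
M = 2                       # pipelining level


def delay(c_charge, v):
    """T_pd = C_charge * V / (k * (V - Vt)^2) with k normalized to 1 (it cancels)."""
    return c_charge * v / (v - VT) ** 2


c_orig = C_M + C_A   # original critical path: multiplier + adder = 6 C_A
c_pipe = C_M1        # pipelined: m1 alone, or m2 + adder, both 3 C_A
assert c_pipe == C_M2 + C_A and c_pipe == c_orig / M

# Bisection on beta: the pipelined path at beta*V0 must meet the original clock period.
# delay(c_pipe, beta*V0) decreases monotonically for beta*V0 > Vt.
target = delay(c_orig, V0)
lo, hi = VT / V0 * (1 + 1e-9), 1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if delay(c_pipe, mid * V0) > target:
        lo = mid   # still too slow: the supply must be higher
    else:
        hi = mid
beta = 0.5 * (lo + hi)
print(f"beta = {beta:.4f}, V = {beta * V0:.4f} V, power ratio = {beta ** 2:.1%}")
```

Bisection is used here because the delay is monotonic on $(V_t/V_0, 1]$, so it needs no algebraic rearrangement; the result agrees with 3.0165 V to within the rounding of β.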
For an L-parallel architecture, the charging capacitance remains the same, but the total capacitance (i.e. $C_{total}$) is increased L times. The clock speed of the L-parallel architecture is reduced to 1/L (i.e. $f/L$, a clock period of $L\,T_{pd}$) to maintain the same sample rate, which means that $C_{charge}$ has L times longer to be charged or discharged. The supply voltage can therefore be reduced to $\beta V_0$, since more time is allowed to charge or discharge the same capacitance. Thus the power consumption of the parallel architecture becomes
$$P_{parallel} = (L\,C_{total})(\beta V_0)^2\,\frac{f}{L} = \beta^2 P_{non\text{-}parallel}$$
Remarks
Propagation delay of the original filter and the parallel filter:
$$T_{non\text{-}parallel} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$
$$T_{parallel} = \frac{C_{charge}\,\beta V_0}{k (\beta V_0 - V_t)^2} = L\,T_{non\text{-}parallel}$$
Substituting the two delay expressions into this relation yields the following quadratic equation in β:
$$L(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
Consider the following two FIR filters, with their critical paths denoted by dashed lines.
[Figure: FIR filter with input x(n) and output y(n), and its 2-parallel version with inputs x(2k), x(2k+1) and outputs y(2k), y(2k+1); D denotes a delay element]
Assume:
the multiplication and the addition take 8 u.t. and 1 u.t., respectively;
the capacitance of the multiplier is 8 times that of an adder;
both architectures operate at a sample period of 9 u.t.;
the supply voltage V0 and threshold voltage Vt are 3.3 V and 0.45 V, respectively.
What is the supply voltage of the parallel architecture? What is the relative power consumption?
Solution
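Calculate $C_{charge}$: let $C_M$ and $C_A$ be the capacitances of the multiplier and the adder, respectively:
Original filter: $C_{charge} = C_M + C_A = 9C_A$
Parallel filter: $C_{charge} = C_M + 2C_A = 10C_A$
Notice that the charging capacitances of the two architectures are not equal, so the equation above cannot be used directly. Solve
$$T_{non\text{-}parallel} = \frac{9C_A V_0}{k (V_0 - V_t)^2}, \qquad T_{parallel} = \frac{10C_A\,\beta V_0}{k (\beta V_0 - V_t)^2}$$
Since $T_{parallel} = 2\,T_{non\text{-}parallel}$:
$$5\beta (V_0 - V_t)^2 = 9(\beta V_0 - V_t)^2 \;\Rightarrow\; \beta = 0.6589$$
The supply voltage of the parallel filter is $V_{parallel} = \beta V_0 = 2.17437$ V, and the power consumption ratio is $\beta^2 = 43.41\%$.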
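When the two critical paths have different capacitances, the delay condition becomes $L(\beta V_0 - V_t)^2 = r\,\beta (V_0 - V_t)^2$ with $r = C_{charge,parallel}/C_{charge,non\text{-}parallel}$. The Python sketch below builds $r$ from the stated capacitances and solves that generalized quadratic; normalizing $C_A = 1$ and the variable names are assumptions for illustration.

```python
import math

C_A = 1.0
C_M = 8 * C_A          # multiplier capacitance = 8 adders
V0, VT = 3.3, 0.45
L = 2                  # 2-parallel: each clock period spans 2 sample periods

c_non_parallel = C_M + C_A       # multiplier + one adder: 9 C_A
c_parallel = C_M + 2 * C_A       # multiplier + two adders: 10 C_A
cap_ratio = c_parallel / c_non_parallel

# T_parallel = L * T_non_parallel with T = C_charge * V / (k (V - Vt)^2) gives
#   L * (beta*V0 - Vt)^2 = cap_ratio * beta * (V0 - Vt)^2
# (with cap_ratio = 10/9 and L = 2 this is 9(beta*V0 - Vt)^2 = 5*beta*(V0 - Vt)^2)
a = L * V0 ** 2
b = -(2 * L * V0 * VT + cap_ratio * (V0 - VT) ** 2)
c = L * VT ** 2
beta = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # larger root keeps beta*V0 > Vt
print(f"beta = {beta:.4f}, V_parallel = {beta * V0:.4f} V, power ratio = {beta ** 2:.2%}")
```

Setting `cap_ratio = 1` recovers the equal-capacitance equation $L(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$.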
Parallel Processing
Block processing
The number of inputs processed in a clock cycle is referred to as the block size.
[Figure: a single-input single-output (SISO) system mapping x(n) to y(n), and its multiple-input multiple-output (MIMO) block-processing counterpart]
At the k-th clock cycle, three inputs x(3k), x(3k+1), and x(3k+2) are processed simultaneously to generate y(3k), y(3k+1), and y(3k+2).
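A minimal Python sketch of block processing with block size 3, using a hypothetical two-tap FIR filter $y(n) = a\,x(n) + b\,x(n-1)$ (the coefficients and filter are illustrative, not taken from the slides). The block version computes y(3k), y(3k+1), y(3k+2) together from x(3k), x(3k+1), x(3k+2) plus one stored sample, and matches the SISO reference.

```python
def fir_serial(x, a, b):
    """Reference SISO filter y(n) = a*x(n) + b*x(n-1), with x(-1) = 0."""
    y = []
    prev = 0.0
    for sample in x:
        y.append(a * sample + b * prev)
        prev = sample
    return y


def fir_block3(x, a, b):
    """3-parallel (block size 3) MIMO version of the same filter."""
    if len(x) % 3 != 0:
        raise ValueError("length of x must be a multiple of the block size 3")
    y = []
    last = 0.0  # x(3k-1): last sample of the previous block, held in a delay element
    for k in range(len(x) // 3):
        x0, x1, x2 = x[3 * k], x[3 * k + 1], x[3 * k + 2]
        # All three outputs of block k are produced in the same clock cycle
        y.extend([a * x0 + b * last,
                  a * x1 + b * x0,
                  a * x2 + b * x1])
        last = x2
    return y


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(fir_block3(samples, 0.5, 0.25) == fir_serial(samples, 0.5, 0.25))
```

The hardware runs at one third of the sample rate while producing three outputs per cycle, which is the extra time per operation that the parallel low-power analysis exploits.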
I/O Conversion
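[Figure: serial-to-parallel conversion of x(n) into x(3k), x(3k+1), x(3k+2) using delay elements D clocked at T/3 (the sampling period), and the corresponding parallel-to-serial conversion that produces y(n)]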