
LOW-POWER VLSI SIGNAL PROCESSING:

LOW COMPLEXITY DESIGN

Prof. Kaushik Roy @ Purdue Univ.
VLSI Signal Processing

• Mainly multiply and accumulate operations

• Need to optimize multiply and add operations

• Need for proper rounding/truncation etc.

• Approximate nature of signal processing and its applications

• Proper approximations to achieve low-power/error resiliency

Source: Intel

Shared Multiplier

• Reduction of redundant computation by increasing computation re-use

• Complexity reduction in FIR implementation

• High performance

• Low power

• Works efficiently if embedded in large DSP systems

Vector scaling operation
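[Figure: transposed direct form FIR filter. The input X(n) is multiplied by each coefficient C0, C1, C2, ..., CM-3, CM-2, CM-1; the products pass through a chain of Z^-1 delay elements to produce Y(n).]

[c0, c1, c2, ..., cM-2, cM-1] · X(n)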

• FIR filtering operation can be expressed as a product of coefficient vector C and scalar X(n)

• Vector scaling operation: Y = C · x
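
To make the vector scaling view concrete, here is a minimal Python sketch of a transposed direct form FIR filter. The function name `transposed_fir` and the register layout are illustrative assumptions, not taken from the slides; the point is only that every input sample x(n) multiplies the whole coefficient vector C at once.

```python
def transposed_fir(coeffs, samples):
    """Transposed direct form FIR (illustrative sketch, integer arithmetic)."""
    m = len(coeffs)
    regs = [0] * (m - 1)  # Z^-1 delay registers between the adders
    out = []
    for x in samples:
        # Vector scaling operation: the scalar x(n) scales the vector C.
        products = [c * x for c in coeffs]
        # y(n) is the first product plus the first delay register.
        out.append(products[0] + (regs[0] if regs else 0))
        # Each register stores its tap product plus the old value of the
        # register to its right (ascending order keeps old values intact).
        for i in range(m - 1):
            right = regs[i + 1] if i + 1 < m - 1 else 0
            regs[i] = products[i + 1] + right
    return out
```

Because all M products in one time step share the same operand x(n), any partial result that depends only on x(n) can be reused across the taps.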

Shared Multiplier Algorithm

• Specifically targets the reduction of redundant computation in the vector scaling operation.

< Coefficient Decomposition >

c = 111010001100
c = 2^9 (111) + 2^7 (1) + 2^2 (11)

alphabet set = {1, 11, 111}
Alphabets: chosen basic bit sequences
Alphabet set: a set of alphabets that covers all the coefficients in vector C

c · x = 111010001100 · x
c · x = 2^9 (0111 · x) + 2^7 (0001 · x) + 2^2 (0011 · x)

If 0111 · x, 0001 · x and 0011 · x are available, c · x can be significantly simplified to add and shift operations.
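
The decomposition above can be sketched in Python as follows. The greedy scan from the least significant bit and the names `decompose` and `shift_add_multiply` are assumptions made for illustration; the slides do not say how the alphabets are matched against the coefficient bits.

```python
def decompose(c, alphabets=(0b1, 0b11, 0b111)):
    """Split c into (shift, alphabet) terms so that c == sum(a << s).

    Assumption: greedy matching from the LSB, trying wider alphabets first.
    """
    widest_first = sorted(alphabets, key=lambda a: a.bit_length(), reverse=True)
    terms = []
    shift = 0
    while c:
        if c & 1 == 0:  # skip zero bits
            c >>= 1
            shift += 1
            continue
        for a in widest_first:
            width = a.bit_length()
            if c & ((1 << width) - 1) == a:
                terms.append((shift, a))
                c >>= width
                shift += width
                break
        else:
            raise ValueError("alphabet set does not cover this coefficient")
    return terms


def shift_add_multiply(c, x, alphabets=(0b1, 0b11, 0b111)):
    """Compute c * x using only precomputed alphabet products, shifts and adds."""
    precomputed = {a: a * x for a in alphabets}  # a * x, computed once per x
    return sum(precomputed[a] << s for s, a in decompose(c, alphabets))
```

For the slide's coefficient, `decompose(0b111010001100)` yields the terms (2, 11), (7, 1) and (9, 111) in binary alphabet notation, matching 2^9 (111) + 2^7 (1) + 2^2 (11).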

Shared Multiplier Architecture
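[Figure: shared multiplier architecture. The input x feeds a precomputer bank of 8 alphabets; each select unit (SHIFTER, 8:1 MUX, ISHIFTER, AND gate) handles 4 coefficient bits, and adders combine the select unit outputs into the product. Bus widths of 16 and 20 bits are marked.]

Precomputer bank (8 alphabets):
0: 1·x (1x)
1: 11·x (3x)
2: 101·x (5x)
3: 111·x (7x)
4: 1001·x (9x)
5: 1011·x (11x)
6: 1101·x (13x)
7: 1111·x (15x)

Example: coefficient 11101000
Lower 4 bits 1000: SHIFTER 1000 → 0001 (shift 3); MUX select 000 picks 1x; ISHIFTER (<<3) gives 1000x
Upper 4 bits 1110: SHIFTER 1110 → 0111 (shift 1); MUX select 011 picks 111x; ISHIFTER (<<1) gives 1110x, then (<<4)
Adder: Product = 11101000 · x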
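
A behavioural Python sketch of the precomputer bank and select unit may help in reading the architecture. It is a software model under stated assumptions, not the circuit: function names are hypothetical, operands are non-negative integers, and `k * x` stands in for the shift-and-add hardware that would form each odd multiple.

```python
def precomputer_bank(x):
    """Odd multiples 1x, 3x, ..., 15x (the 8 alphabets), indexed 0..7."""
    return [k * x for k in range(1, 16, 2)]


def select_unit(nibble, bank):
    """Multiply x by a 4-bit coefficient slice using the shared bank."""
    if nibble == 0:
        return 0  # role of the AND gate: force a zero output
    shift = 0
    while nibble & 1 == 0:  # SHIFTER: strip trailing zeros, e.g. 1000 -> 0001
        nibble >>= 1
        shift += 1
    select = nibble >> 1  # 3-bit MUX select, e.g. 0111 -> 011
    return bank[select] << shift  # ISHIFTER restores the stripped zeros


def shared_multiply(c, x, width=16):
    """Sum the select unit outputs, one per 4-bit slice of coefficient c."""
    bank = precomputer_bank(x)
    product = 0
    for k in range(0, width, 4):
        nibble = (c >> k) & 0xF
        product += select_unit(nibble, bank) << k  # align slice, then add
    return product
```

For the coefficient 11101000, the lower slice 1000 selects 1x shifted by 3 and the upper slice 1110 selects 111x shifted by 1 and then by 4, as in the figure's example.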

16 × 16 Shared Multiplier Implementation

[Figure: input X feeds a bank of precomputers; four select units handle coefficient bits C0-3, C4-7, C8-11 and C12-15 (4 bits each); a carry save adder sums their outputs to give X·C. The select units and adders form the critical path.]

• 16 × 16 Wallace tree multiplier (WTM) and carry save array multiplier (CSAM) are also implemented for comparison.

         Precomputer    4 Select units & Adders    WTM            CSAM
Delay    6.923 ns       11.231 ns                  16.638 ns      23.398 ns
Power    18.06 mW       18.91 mW                   22.80 mW       21.78 mW
Area     162340 µm^2    252120 µm^2                241000 µm^2    175640 µm^2

• CMU library (0.35 µm technology)

FIR filter using Shared Multiplier

[Figure: FIR filter using the shared multiplier. X(n) feeds a single precomputer bank whose outputs go to the select units and adders for each coefficient C0, C1, ..., CM-2, CM-1; the tap outputs are accumulated through a chain of adders and Z^-1 delays to produce y(n).]

$y(n) = \sum_{i=0}^{M-1} c_i \, x(n-i)$

• Computations ak · x are performed just once for all alphabets, and these values are shared by all the select units

• Only the select units and adders lie on the critical path
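
The sharing across taps can be sketched as a small Python model of the filter. This is an illustrative sketch under assumptions: the names `select_unit` and `fir_shared` are hypothetical, coefficients are non-negative integers that fit in `width` bits, and the bank of odd multiples is formed with ordinary multiplication rather than hardware adders.

```python
def select_unit(nibble, bank):
    """Return nibble * x using the shared bank of odd multiples of x."""
    if nibble == 0:
        return 0
    shift = (nibble & -nibble).bit_length() - 1  # number of trailing zeros
    return bank[nibble >> (shift + 1)] << shift


def fir_shared(coeffs, samples, width=16):
    """Transposed direct form FIR where all taps share one precomputer bank."""
    m = len(coeffs)
    state = [0] * m  # delayed partial sums; the last entry always stays 0
    out = []
    for x in samples:
        # Alphabet products 1x, 3x, ..., 15x: computed once per input sample.
        bank = [k * x for k in range(1, 16, 2)]
        # Every tap only selects, shifts and adds the shared values.
        products = [
            sum(select_unit((c >> k) & 0xF, bank) << k for k in range(0, width, 4))
            for c in coeffs
        ]
        out.append(products[0] + state[0])
        state = [products[j + 1] + state[j + 1] for j in range(m - 1)] + [0]
    return out
```

The output should agree with the convolution sum y(n) = sum of c_i · x(n−i) for i = 0 to M−1, while the bank is built once per sample regardless of the number of taps M.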
