
Hardware Efficient Fast FIR Filter Based on Karatsuba Algorithm
Evangelos Kyritsis and Kiamal Pekmestzi
Microprocessors and Digital Systems Lab (MicroLab)
School of Electrical and Computer Engineering, National Technical University of Athens (NTUA), Athens, Greece
evkiritsis@gmail.com, pekmes@cs.ntua.gr
Abstract
In this work, an efficient implementation of a programmable Finite Impulse Response (FIR) filter based on the Karatsuba Multiplication Algorithm (KMA) is presented. In this FIR filter circuit, a parallel, Modified Booth (MB) pre-encoded, Carry-Save (CS) Wallace tree multiplier is used as a building block. The KMA is a fast divide-and-conquer algorithm for the multiplication of large numbers. As a result, the proposed circuit is highly efficient in terms of speed, area and power compared with the conventional FIR filter architecture. Simulations of transposed-form FIR filters, implemented with standard cells in Faraday 90 nm technology, show average reductions of about 15% in delay, 9% in area and 17% in power.

Building Blocks

A. Sub-filter
The three sub-filters are in transposed form.
The sub-filters are composed of multipliers (MU), 4:2 CSA Wallace trees and the required delay units (D).
Both the intermediate products (MU results) and the final results of the sub-filters are in Carry-Save form.
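A transposed-form sub-filter that keeps its partial sums in Carry-Save form can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the authors' circuit: the helper names (`csa_3to2`, `csa_4to2`, `transposed_fir_cs`, `direct_fir`) are hypothetical, each MU result is modelled as the trivial CS pair `(h[k]*x, 0)` rather than a real Wallace-tree output, and Python integers stand in for fixed-width words. It shows that every delay register can hold a (sum, carry) pair and that a single carry-propagate addition at the output recovers the same result as a direct-form filter.

```python
# Behavioural sketch (hypothetical helper names, not the authors' RTL):
# a transposed-form FIR whose registers hold carry-save (sum, carry) pairs.


def csa_3to2(a: int, b: int, c: int) -> tuple[int, int]:
    """Row of full adders: returns (s, cy) with a + b + c == s + cy."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy


def csa_4to2(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """4:2 compressor modelled as two cascaded 3:2 rows."""
    s1, c1 = csa_3to2(a, b, c)
    return csa_3to2(s1, c1, d)


def transposed_fir_cs(h: list[int], x: list[int]) -> list[int]:
    """Transposed-form FIR; partial sums stay in carry-save form until the output."""
    taps = len(h)
    z = [(0, 0)] * taps  # z[k]: CS register feeding the adder of tap k-1 (index 0 unused)
    y = []
    for xn in x:
        # Assumption: each MU result is modelled as the CS pair (h[k] * xn, 0).
        prods = [(hk * xn, 0) for hk in h]
        new_z = [(0, 0)] * taps
        for k in range(1, taps):
            upstream = z[k + 1] if k + 1 < taps else (0, 0)
            new_z[k] = csa_4to2(*prods[k], *upstream)
        s, c = csa_4to2(*prods[0], *z[1]) if taps > 1 else prods[0]
        y.append(s + c)  # the only carry-propagate addition in the loop
        z = new_z
    return y


def direct_fir(h: list[int], x: list[int]) -> list[int]:
    """Reference direct form: y(n) = sum_k h(k) x(n-k), zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]
```

For example, `transposed_fir_cs([3, -1, 4, 2], [1, 5, -9, 2, 6, -5, 3])` matches `direct_fir` on the same arguments. The 4:2 compressor is built from two 3:2 rows only to keep the model short; it preserves the sum of its four inputs, which is the property the sub-filter relies on.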

Main Objectives

Propose a novel FIR filter architecture based on the Karatsuba formula
Utilize the Carry-Save form to shorten the critical path and speed up calculations
Evaluate the proposed architecture by comparing it with the conventional FIR filter architecture

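Karatsuba Formula
Let us consider two numbers $a$ and $b$ of 2N bits. Each number can be divided into two sub-words of N bits, as follows:
$$a = 2^N a_H + a_L \quad \text{and} \quad b = 2^N b_H + b_L$$
The product of these numbers is given by the following relation:
$$p = a\,b = 2^{2N} a_H b_H + 2^N \left(a_H b_L + a_L b_H\right) + a_L b_L$$
We also compute the following quantity:
$$p_M = (a_H + a_L)(b_H + b_L) = a_H b_H + a_H b_L + a_L b_H + a_L b_L = p_H + p_L + \left(a_H b_L + a_L b_H\right)$$
where $p_H = a_H b_H$ and $p_L = a_L b_L$. Consequently, the middle term is given by the following relation:
$$a_H b_L + a_L b_H = p_M - p_H - p_L$$
Finally, the product becomes:
$$p = 2^{2N} p_H + 2^N \left(p_M - p_H - p_L\right) + p_L$$
Thus only three sub-word multiplications ($p_H$, $p_L$ and $p_M$, the last on (N+1)-bit operands) are needed instead of four.

Fig. 2: Sub-filter structure in transposed form
MU: Multiplication Unit

B. Multiplier
Parallel Wallace multiplier
Coefficients pre-encoded and stored in Modified Booth form
Result in Carry-Save form
Only the part of the correction terms containing the input carries is added
The part of the correction term containing the ones from the sign extension of all partial products is isolated to form a final term for all MUs, which is added only once

[Fig. 3 diagram: MB Encoding (sign, one, two) feeding a Partial Product Generator and a CSA Wallace Tree, with Correction Terms (only input carries)]
Fig. 3: Multiplication Unit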
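The Modified Booth pre-encoding used by the Multiplication Unit can be sketched behaviourally. This is an illustration under assumptions, not the authors' design: the function names are hypothetical, radix-4 recoding of a `width`-bit two's-complement coefficient is assumed, and the signal names `sign`, `one` and `two` are borrowed from the Fig. 3 labels. The sketch shows why each negative partial product needs exactly one input carry (ones' complement plus 1). It does not model the CSA Wallace tree or the isolated sign-extension constant; the rows are simply added with Python integer arithmetic.

```python
# Behavioural sketch (hypothetical names): radix-4 Modified Booth pre-encoding of a
# coefficient into (sign, one, two) signals, and partial-product generation in which
# negative digits use the ones' complement plus one input carry per row.


def mb_encode(h: int, width: int) -> list[tuple[int, int, int]]:
    """Recode a width-bit two's-complement coefficient; least-significant digit first."""
    assert width % 2 == 0 and -(1 << (width - 1)) <= h < (1 << (width - 1))

    def bit(j: int) -> int:
        return (h >> j) & 1 if j >= 0 else 0  # b_{-1} = 0

    digits = []
    for i in range(width // 2):
        b_hi, b_mid, b_lo = bit(2 * i + 1), bit(2 * i), bit(2 * i - 1)
        sign = b_hi
        one = b_mid ^ b_lo  # |digit| == 1
        two = (b_hi & (1 - b_mid) & (1 - b_lo)) | ((1 - b_hi) & b_mid & b_lo)  # |digit| == 2
        digits.append((sign, one, two))
    return digits


def mb_digit(sign: int, one: int, two: int) -> int:
    """Digit value in {-2, -1, 0, 1, 2}."""
    magnitude = one + 2 * two
    return -magnitude if sign else magnitude


def mb_multiply(digits: list[tuple[int, int, int]], x: int) -> int:
    """Add the rows (pp_i + carry_i) * 4**i; a real MU would reduce them in a CSA tree."""
    total = 0
    for i, (sign, one, two) in enumerate(digits):
        magnitude = x * one + (x << 1) * two  # select 0, x or 2x
        pp = ~magnitude if sign else magnitude  # ones' complement for negative digits
        carry_in = sign  # input carry that completes the two's complement
        total += (pp + carry_in) << (2 * i)
    return total
```

Because the coefficients are fixed, `mb_encode` would run once per coefficient (the "pre-encoded and stored" step), while `mb_multiply` corresponds to the per-sample work. For an 8-bit coefficient the recoding yields four digits, so four partial-product rows and four input carries.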

Proposed Architecture
The Karatsuba formula is applied to split the original filter into three sub-filters of reduced dynamic range, working in parallel. Each 2N-bit coefficient and input sample is split into N-bit halves, $h(k) = 2^N h_H(k) + h_L(k)$ and $x(n) = 2^N x_H(n) + x_L(n)$, and for a T-tap filter the sub-filters compute:
$$y_L(n) = \sum_{k=0}^{T-1} h_L(k)\,x_L(n-k)$$
$$y_H(n) = \sum_{k=0}^{T-1} h_H(k)\,x_H(n-k)$$
$$y_M(n) = \sum_{k=0}^{T-1} \left[h_H(k) + h_L(k)\right]\left[x_H(n-k) + x_L(n-k)\right]$$
The required adders and hardwired shifters are used to implement the input and output conversions. The original filter output is rebuilt from the three sub-results by applying the Karatsuba formula:
$$y(n) = 2^{2N} y_H(n) + 2^N \left[y_M(n) - y_H(n) - y_L(n)\right] + y_L(n)$$

Results
The proposed architecture and the conventional FIR filter were synthesized using the Faraday 90 nm technology library. The following tools were used for synthesis and simulation: Synopsys Design Compiler, PrimeTime, PrimePower and ModelSim. The synthesis constraints were set for optimal results without preserving the design hierarchy. Power consumption was estimated by full timing simulations. The area and power measurements were obtained at the maximum clock frequency at which the conventional filter could be synthesized.

[Charts: Critical Time Delay (T = 16 tap) and Critical Time Delay (T = 32 tap); critical time (ns) versus word length (16 and 32 bit) for the Karatsuba and conventional filters]
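The three-sub-filter decomposition described under Proposed Architecture can be checked numerically with a short behavioural model. This sketch assumes plain integer arithmetic with no Carry-Save or bit-width modelling, and the names `fir` and `karatsuba_fir` are hypothetical. It splits every coefficient and sample into a signed high half and an unsigned N-bit low half, runs the low, high and sum sub-filters, and recombines them with the Karatsuba output formula, which should reproduce the direct filter exactly.

```python
# Behavioural sketch of the Karatsuba FIR decomposition (hypothetical names; no
# carry-save or bit-width modelling): three sub-filters on N-bit halves plus the
# output recombination y = 2^(2N) y_H + 2^N (y_M - y_H - y_L) + y_L.


def fir(h: list[int], x: list[int]) -> list[int]:
    """Direct-form reference: y(n) = sum_k h(k) x(n-k), zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]


def karatsuba_fir(h: list[int], x: list[int], n_bits: int) -> list[int]:
    """Filter x with taps h using three sub-filters of reduced dynamic range."""
    mask = (1 << n_bits) - 1
    # Input conversion: v == 2**n_bits * hi + lo, with 0 <= lo < 2**n_bits.
    h_hi, h_lo = [c >> n_bits for c in h], [c & mask for c in h]
    x_hi, x_lo = [s >> n_bits for s in x], [s & mask for s in x]
    y_lo = fir(h_lo, x_lo)  # sub-filter on the low halves
    y_hi = fir(h_hi, x_hi)  # sub-filter on the high halves
    y_mid = fir([a + b for a, b in zip(h_hi, h_lo)],  # sub-filter on the half sums
                [a + b for a, b in zip(x_hi, x_lo)])
    # Output conversion: the shifts are hardwired, leaving only additions/subtractions.
    return [(yh << (2 * n_bits)) + ((ym - yh - yl) << n_bits) + yl
            for yl, yh, ym in zip(y_lo, y_hi, y_mid)]
```

For instance, with 8-bit data and `n_bits=4`, `karatsuba_fir(h, x, 4)` should equal `fir(h, x)` for any integer taps and samples. The middle sub-filter works on operands one bit wider than the halves, which mirrors the (N+1)-bit term $p_M$ in the Karatsuba formula.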


[Charts: Area (T = 16 tap) and Area (T = 32 tap) in mm2; Power (T = 16 tap) and Power (T = 32 tap) in mW; each plotted against word length (16 and 32 bit) for the Karatsuba and conventional filters]

Fig. 1: Karatsuba FIR filter

Conclusions
We have presented an efficient FIR filter based on the Karatsuba formula. The proposed architecture is composed of three sub-filters of reduced dynamic range. A parallel, MB pre-encoded, CS Wallace tree multiplier was designed and used as a building block. We chose CS arithmetic in order to increase the delay savings. Compared with the conventional transposed FIR filter, the proposed Karatsuba design shows improved performance, a smaller circuit area and lower power consumption.