Karatsuba FIR

Hardware Efficient Fast FIR Filter Based on Karatsuba Algorithm
Evangelos Kyritsis and Kiamal Pekmestzi
Microprocessors and Digital Systems Lab (MicroLab)
School of Electrical and Computer Engineering, National Technical University of Athens (NTUA), Athens, Greece
evkiritsis@gmail.com, pekmes@cs.ntua.gr
Abstract
In this work, an efficient implementation of a programmable Finite Impulse Response (FIR) filter based on the Karatsuba Multiplication Algorithm (KMA) is presented. The FIR filter circuit uses a parallel, Modified Booth (MB) pre-encoded, Carry-Save (CS) Wallace tree multiplier as a building block. The KMA is a fast divide-and-conquer algorithm for the multiplication of large numbers. By exploiting it, the proposed circuit improves speed, area and power compared with the conventional FIR filter architecture. Simulations of FIR filters in transposed form, using a standard-cell implementation in Faraday 90nm technology, show an average reduction of about 15% in delay, 9% in area and 17% in power.

Main Objectives
Propose a novel FIR filter architecture based on the Karatsuba formula
Use Carry-Save form to shorten the critical path and speed up calculations
Evaluate the proposed architecture by comparing it with the conventional FIR filter architecture

Building Blocks
A. Sub-filter
The three sub-filters are in transposed form.
The sub-filters are composed of multipliers (MU), 4:2 CSA Wallace trees and the required delay units (D).
Both the intermediate products (the MU results) and the final results of the sub-filters are in Carry-Save form.
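A minimal Python sketch of one sub-filter in transposed form, assuming plain Python integers (no fixed word length or overflow is modelled). Each delay unit holds a (sum, carry) pair, every MU output stays in carry-save form, and each tap merges two carry-save numbers with a 4:2 compressor built from two 3:2 carry-save adders, as the 4:2 blocks of the sub-filter suggest. All function names are hypothetical, and `mu_cs` is a simple shift-and-add stand-in, not the MB pre-encoded Wallace tree multiplier described in the poster.

```python
from typing import List, Tuple

CS = Tuple[int, int]  # carry-save pair (sum, carry); represented value = sum + carry


def csa(a: int, b: int, c: int) -> CS:
    """3:2 carry-save adder (a row of full adders): a + b + c == s + cy."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy


def compress_4_2(u: CS, v: CS) -> CS:
    """4:2 compressor modelled as two 3:2 layers: adds two CS numbers, result stays in CS form."""
    s1, c1 = csa(u[0], u[1], v[0])
    return csa(s1, c1, v[1])


def mu_cs(h: int, x: int, width: int = 16) -> CS:
    """Hypothetical stand-in MU: one partial product per bit of the two's-complement
    input x, reduced with 3:2 adders and no final carry-propagate addition."""
    bits = x & ((1 << width) - 1)
    s, c = 0, 0
    for i in range(width):
        pp = (h << i) if (bits >> i) & 1 else 0
        if i == width - 1:
            pp = -pp  # the sign bit of x has weight -2**(width - 1)
        s, c = csa(s, c, pp)
    return s, c


def transposed_fir_cs(h: List[int], xs: List[int], width: int = 16) -> List[int]:
    """Transposed-form FIR whose delay units (D) hold carry-save pairs."""
    T = len(h)
    regs: List[CS] = [(0, 0)] * (T - 1)  # regs[j] holds z_{j+1}(n-1)
    ys: List[int] = []
    for x in xs:
        prods = [mu_cs(hk, x, width) for hk in h]  # every MU sees the same x(n)
        out = compress_4_2(prods[0], regs[0]) if T > 1 else prods[0]
        # The output also stays in CS form; it is resolved here only for checking.
        ys.append(out[0] + out[1])
        # z_k(n) = h_k x(n) + z_{k+1}(n-1); the last register takes h_{T-1} x(n) alone.
        new_regs = [compress_4_2(prods[k], regs[k]) for k in range(1, T - 1)]
        if T > 1:
            new_regs.append(prods[T - 1])
        regs = new_regs
    return ys


def direct_fir(h: List[int], xs: List[int]) -> List[int]:
    """Reference direct-form convolution y(n) = sum_k h(k) x(n - k)."""
    return [sum(h[k] * xs[n - k] for k in range(len(h)) if n >= k) for n in range(len(xs))]


h = [3, -5, 7, 2]
xs = [1, -2, 300, -32768, 32767, 0, 5]
assert transposed_fir_cs(h, xs) == direct_fir(h, xs)
```

Because `transposed_fir_cs` only relies on each MU returning a (sum, carry) pair, a more faithful multiplier model could replace `mu_cs` without changing the filter loop.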
Fig. 2: Sub-filter structure in transposed form (MU: Multiplication Unit)

B. Multiplier
Parallel Wallace multiplier
Coefficients pre-encoded and stored in Modified Booth form
Result in Carry-Save form
Only the part of the correction terms that contains the input carries is added
The part of the correction terms that contains the ones from the sign extension of all partial products is isolated to form a single final term for all MUs, which is added only once
Fig. 3: Multiplication Unit (blocks: MB Encoding, Partial Product Generator driven by the sign/one/two signals, Correction Terms (only input carries), CSA Wallace Tree)

Karatsuba Formula
Let us consider two numbers a and b of 2N bits. Each number can be split into two sub-words of N bits, as follows:
$$a = a_1 \cdot 2^N + a_0$$
and
$$b = b_1 \cdot 2^N + b_0$$
The product of these numbers is given by the following relation:
$$ab = a_1 b_1 \cdot 2^{2N} + (a_1 b_0 + a_0 b_1) \cdot 2^N + a_0 b_0$$
We also compute the following quantity:
$$p = (a_1 + a_0)(b_1 + b_0) = a_1 b_1 + a_0 b_0 + (a_1 b_0 + a_0 b_1)$$
Consequently, the term $a_1 b_0 + a_0 b_1$ is given by the following relation:
$$a_1 b_0 + a_0 b_1 = p - a_1 b_1 - a_0 b_0$$
Finally, the product is:
$$ab = a_1 b_1 (2^{2N} - 2^N) + p \cdot 2^N + a_0 b_0 (1 - 2^N)$$
Thus the 2N-bit product needs only three sub-word multiplications instead of four, at the cost of some additions and shifts.
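The formula can be checked numerically with a short Python sketch. It assumes 2N-bit two's-complement operands with N = 8, an upper sub-word a_1 that keeps the sign and an unsigned lower sub-word a_0; the helper names (`split`, `karatsuba_product`, `fir`, `karatsuba_fir`) are hypothetical. The second part applies the same identity tap by tap, in the way the Proposed Architecture section splits the filter into three sub-filters, and compares the rebuilt output with a direct convolution.

```python
from typing import List, Sequence, Tuple

N = 8                            # sub-word width (assumed); operands are 2N = 16-bit signed values
MASK = (1 << N) - 1
W_HI = (1 << 2 * N) - (1 << N)   # weight of a1*b1: 2^{2N} - 2^N
W_P = 1 << N                     # weight of p:     2^N
W_LO = 1 - (1 << N)              # weight of a0*b0: 1 - 2^N


def split(a: int) -> Tuple[int, int]:
    """a = a1 * 2**N + a0, with a1 signed (arithmetic shift) and 0 <= a0 < 2**N."""
    return a >> N, a & MASK


def karatsuba_product(a: int, b: int) -> int:
    a1, a0 = split(a)
    b1, b0 = split(b)
    p = (a1 + a0) * (b1 + b0)  # the sums need a wider range than N bits
    # Three sub-word products; the weights are shifts and subtractions in hardware.
    return a1 * b1 * W_HI + p * W_P + a0 * b0 * W_LO


def fir(h: Sequence[int], xs: Sequence[int]) -> List[int]:
    """Direct convolution y(n) = sum_k h(k) x(n - k), zero initial state."""
    return [sum(h[k] * xs[n - k] for k in range(len(h)) if n >= k) for n in range(len(xs))]


def karatsuba_fir(h: Sequence[int], xs: Sequence[int]) -> List[int]:
    """Three reduced-range sub-filters followed by the output recombination."""
    h1 = [split(c)[0] for c in h]
    h0 = [split(c)[1] for c in h]
    x1 = [split(x)[0] for x in xs]
    x0 = [split(x)[1] for x in xs]
    y1 = fir(h1, x1)
    y0 = fir(h0, x0)
    yp = fir([u + v for u, v in zip(h1, h0)], [u + v for u, v in zip(x1, x0)])
    return [a * W_HI + p * W_P + c * W_LO for a, p, c in zip(y1, yp, y0)]


assert karatsuba_product(-32768, 32767) == -32768 * 32767
h = [1200, -45, 32767, -32768]
xs = [5, -1, 30000, -20000, 7, 0, -32768]
assert karatsuba_fir(h, xs) == fir(h, xs)
```

Since convolution is linear, applying the identity to every coefficient-sample product and summing over the taps gives exactly the three sub-filter outputs weighted by the same factors as in the scalar formula.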
Proposed Architecture
The Karatsuba formula is applied to split the original filter into three sub-filters of reduced dynamic range, working in parallel. Each input sample and coefficient of the T-tap filter is split into N-bit halves, $x(n) = x_1(n) \cdot 2^N + x_0(n)$ and $h(k) = h_1(k) \cdot 2^N + h_0(k)$, and the sub-filters compute:
$$y_1(n) = \sum_{k=0}^{T-1} h_1(k)\, x_1(n-k)$$
$$y_0(n) = \sum_{k=0}^{T-1} h_0(k)\, x_0(n-k)$$
$$y_p(n) = \sum_{k=0}^{T-1} \big(h_1(k) + h_0(k)\big)\big(x_1(n-k) + x_0(n-k)\big)$$
The required adders and hardwired shifters are used to implement the input and output conversions.
The original filter output is rebuilt from the three sub-results by applying the Karatsuba formula:
$$y(n) = y_1(n)(2^{2N} - 2^N) + y_p(n) \cdot 2^N + y_0(n)(1 - 2^N)$$
Fig. 1: Karatsuba FIR filter

Results
The proposed architecture and the conventional FIR filter were synthesized with the Faraday 90nm technology library. Synopsys Design Compiler, PrimeTime, PrimePower and ModelSim were used for synthesis and simulation. The synthesis constraints were set for optimal results, without preserving the design hierarchy. Power consumption was estimated from full timing simulations. Area and power were measured at the maximum clock frequency at which the conventional filter could be synthesized.

Critical time delay, area and power of the conventional and Karatsuba filters (T = number of taps, Bits = word length, Change = Karatsuba relative to conventional):

Metric              Taps  Bits  Conventional  Karatsuba  Change
Critical time (ns)  16    16    0.98          0.82       -16%
Critical time (ns)  16    32    1.28          1.08       -16%
Critical time (ns)  32    16    1.07          0.9        -16%
Critical time (ns)  32    32    1.55          1.37       -12%
Area (mm2)          16    16    0.164         0.145      -11%
Area (mm2)          16    32    0.46          0.409      -11%
Area (mm2)          32    16    0.301         0.283      -6%
Area (mm2)          32    32    0.924         0.86       -7%
Power (mW)          16    16    132.9         106.2      -20%
Power (mW)          16    32    264.6         213.7      -19%
Power (mW)          32    16    222.1         197.1      -11%
Power (mW)          32    32    659.6         550.2      -17%
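The measured values for the four tap/word-length configurations can be used to check the average reductions quoted in the abstract. A small Python sketch, assuming each reduction is (conventional - Karatsuba) / conventional and that the four configurations are averaged with equal weight; `RESULTS`, `reductions` and `mean_reduction` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (conventional, Karatsuba) pairs, in the order
# (T=16, 16-bit), (T=16, 32-bit), (T=32, 16-bit), (T=32, 32-bit)
RESULTS: Dict[str, List[Tuple[float, float]]] = {
    "delay": [(0.98, 0.82), (1.28, 1.08), (1.07, 0.90), (1.55, 1.37)],          # ns
    "area": [(0.164, 0.145), (0.46, 0.409), (0.301, 0.283), (0.924, 0.86)],     # mm2
    "power": [(132.9, 106.2), (264.6, 213.7), (222.1, 197.1), (659.6, 550.2)],  # mW
}


def reductions(pairs: List[Tuple[float, float]]) -> List[float]:
    """Percentage reduction of the Karatsuba design relative to the conventional one."""
    return [100.0 * (conv - kara) / conv for conv, kara in pairs]


def mean_reduction(pairs: List[Tuple[float, float]]) -> float:
    """Unweighted mean of the per-configuration reductions."""
    r = reductions(pairs)
    return sum(r) / len(r)


for metric, pairs in RESULTS.items():
    per_case = ", ".join(f"{v:.1f}%" for v in reductions(pairs))
    print(f"{metric}: {per_case} (mean {mean_reduction(pairs):.1f}%)")
```

The means come out at about 14.9% (delay), 8.9% (area) and 16.8% (power), in line with the rounded 15%, 9% and 17% reported in the abstract.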
Conclusions
We have presented an efficient FIR filter based on the Karatsuba formula. The architecture is composed of three sub-filters of reduced dynamic range. A parallel, MB pre-encoded, CS Wallace tree multiplier was designed and used as a building block. We chose CS arithmetic to increase the delay savings. Compared with the conventional transposed FIR filter, the proposed Karatsuba design shows a shorter critical path, a smaller circuit area and lower power consumption.
