
1st Int'l Conf. on Recent Advances in Information Technology | RAIT-2012 |

Design and Implementation of Two Variable Multiplier Using KCM and Vedic Mathematics
L. Sriraman, T. N. Prabakar
Department of ECE
Oxford Engineering College, Trichy, Tamilnadu, India.
tnpn]s@gmail.com, tnprabakar@gmail.com

Abstract: In this paper, a novel multiplier architecture based on the ROM approach using Vedic Mathematics is proposed. The architecture of this multiplier is similar to that of a Constant Coefficient Multiplier (KCM). However, in a KCM one input has to be fixed, while the proposed multiplier can multiply two variables. The proposed multiplier is implemented on a Cyclone III FPGA and compared with the Array Multiplier and the Urdhava Multiplier for both the 8-bit and the 16-bit cases, and the results are presented. The proposed multiplier is 1.5 times faster than the other multipliers in the 16x16 case, and it consumes only 76% of the logic area of the Urdhava multiplier in the 8x8 case and 42% in the 16x16 case.

Keywords: KCM; Urdhava; Vedic Maths; Array Multiplier; FPGA.

I. INTRODUCTION

Multiplication is one of the more silicon-intensive functions, especially when implemented in programmable logic. Multipliers are key components of many high-performance systems such as FIR filters, microprocessors and digital signal processors. A system's performance is generally determined by the performance of its multiplier, because the multiplier is usually the slowest element in the system. Furthermore, it is usually the most area-consuming element. Hence, optimizing the speed and area of the multiplier is a major design issue.

Vedic mathematics [1] is the ancient Indian system of mathematics, which mainly deals with Vedic mathematical formulae and their application to various branches of mathematics. The word 'Vedic' is derived from the word 'Veda', which means the storehouse of all knowledge. Vedic mathematics was reconstructed from the ancient Indian scriptures (Vedas) by Sri Bharati Krshna Tirthaji (1884-1960) after eight years of research on the Vedas [1]. According to his research, Vedic mathematics is mainly based on sixteen principles or word-formulae, which are termed Sutras. This field offers some effective algorithms that can be applied to various branches of engineering, such as computing and digital signal processing.

II. ARRAY MULTIPLIER

In the array multiplier [2], AND gates are used to generate the bit-products, and adders are used to accumulate them. All bit-products are generated in parallel and collected through an array of full adders or other types of adders. Since the array multiplier has a regular structure, wiring and layout are much simpler. Therefore, among the multiplier structures, the array multiplier takes up the least area. However, it is also the slowest, with latency proportional to $O(W_d)$, where $W_d$ is the word length of the operand. Example 1 describes the multiplication process using the array multiplier, and Fig. 1 depicts its structure. Instead of a Ripple Carry Adder (RCA), a Carry Save Adder (CSA) is used here to add each group of partial product terms, because the RCA is the slowest of the available adder types. In a multiplier with CSA [5], partial product addition is carried out in carry-save form, and an RCA is used only for the final addition.

Example 1: (1011 x 1101) = 10001111

        1011
      x 1101
    --------
        1011
       0000      <- left shift by 1 bit
      1011       <- left shift by 2 bits
     1011        <- left shift by 3 bits
    --------
    10001111

From the above example, it is inferred that the partial products are generated sequentially, which reduces the speed of the multiplier. However, the structure of the multiplier is regular.
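The array-multiplier scheme above can be sketched as a short behavioural model in Python (an illustration, not the authors' hardware): bit-products are formed with AND, accumulated in carry-save form with a 3:2 compressor, and resolved by one final carry-propagate addition, which is the role the RCA plays in the text. The function name `array_multiply` and its `width` parameter are hypothetical.

```python
def array_multiply(a: int, b: int, width: int) -> int:
    """Behavioural model of a width x width array multiplier with CSA accumulation."""
    # Bit-products: row j is a AND b_j, i.e. a copy of `a` (or zero) shifted left by j.
    rows = [(a if (b >> j) & 1 else 0) << j for j in range(width)]

    # Carry-save accumulation: each row is folded into a (sum, carry) pair with a
    # 3:2 compressor, so no carry ripples across bit positions at this stage.
    s, c = 0, 0
    for row in rows:
        s, c = s ^ c ^ row, ((s & c) | (s & row) | (c & row)) << 1

    # Final carry-propagate addition (performed by the RCA in hardware).
    return s + c


# Example 1 from the text: 1011 x 1101
print(format(array_multiply(0b1011, 0b1101, 4), "b"))  # prints 10001111
```

The invariant is that `s + c` always equals the sum of the rows processed so far, so only the last step needs a carry-propagating adder.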

978-1-4577-0697-4/12/$26.00 ©2012 IEEE



Fig. 1: Array Multiplier using CSA Hardware Architecture

III. URDHAVA MULTIPLIER
Urdhava Tiryakbhyam [1][3] (Vertically and Crosswise) is one of the sixteen Vedic Sutras and deals with the multiplication of numbers. The sutra is illustrated in Example 2, and the hardware architecture is depicted in Fig. 3. In this example, two decimal numbers (31 x 35) are multiplied. The line diagram for the multiplication of two-, three- and four-digit numbers using the Urdhava method is shown in Fig. 2. The digits at the two ends of each line are multiplied and the result is added to the previous carry. When more than one line is present in a step, all the results are added to the previous carry. The least significant digit of the number thus obtained acts as one of the result digits, and the rest act as the carry for the next step. Initially, the carry is taken to be zero.

Example 2: 31 x 35 = 1085

    Step 1 (vertical, units):      1 x 5 = 5                      -> digit 5, carry 0
    Step 2 (crosswise, tens):      3 x 5 + 1 x 3 = 15 + 3 = 18    -> digit 8, carry 1 to next stage
    Step 3 (vertical, hundreds):   3 x 3 + 1 = 9 + 1 = 10         -> digits 10
    Answer: 31 x 35 = 1085

In general, when a two-digit number ab is multiplied by cd, the three steps produce bd, (ad + cb) and ac. If there is a carry from the (ad + cb) term, it is added to ac (likewise, any carry from bd is added to (ad + cb)).

Fig. 2: Line Diagram for Urdhava Multiplication of 2, 3 and 4 digits

From Example 2, it is observed that all the partial products are generated in parallel, so the multiplier is faster than the array multiplier.

The above discussion can now be extended to the binary number system, with the preliminary knowledge that the multiplication of two bits $a_0$ and $b_0$ is just an AND operation and can be implemented using a simple AND gate. To illustrate this multiplication scheme in the binary number system, consider the multiplication of two binary numbers $a_3a_2a_1a_0$ and $b_3b_2b_1b_0$. As the result of this multiplication is more than 4 bits wide, the product is expressed as $r_7r_6r_5r_4r_3r_2r_1r_0$. The least significant bit $r_0$ is obtained by multiplying the least significant bits of the multiplicand and the multiplier, as shown in Fig. 2. The digits on both sides of each line are multiplied and added to the carry from the previous step. This generates one bit of the result ($r_n$) and a carry ($c_n$). This carry is added in the next step, and the process continues. If there is more than one line in a step, all the results are added to the previous carry. In each step, the least significant bit acts as the result bit and all the other bits act as the carry. For example, if an intermediate step yields 110, then 0 acts as the result bit and 11 as the carry (referred to as $c_n$ in this text). Note that $c_n$ may be a multi-bit number. In the expressions below, $c_n r_n$ denotes the concatenation of the carry and the result bit, i.e., the right-hand side equals $2c_n + r_n$. Thus, the following expressions (1) to (7) are derived:

$$r_0 = a_0 b_0 \tag{1}$$
$$c_1 r_1 = a_1 b_0 + a_0 b_1 \tag{2}$$
$$c_2 r_2 = c_1 + a_2 b_0 + a_1 b_1 + a_0 b_2 \tag{3}$$
$$c_3 r_3 = c_2 + a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3 \tag{4}$$
$$c_4 r_4 = c_3 + a_3 b_1 + a_2 b_2 + a_1 b_3 \tag{5}$$
$$c_5 r_5 = c_4 + a_3 b_2 + a_2 b_3 \tag{6}$$
$$c_6 r_6 = c_5 + a_3 b_3 \tag{7}$$
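A minimal Python sketch of the vertically-and-crosswise procedure, assuming operands are given as equal-length digit lists with the least significant digit first (a convention chosen here, not taken from the paper). Each loop iteration forms one column: the previous carry plus every cross product whose digit indices add up to k, then splits the sum into a result digit and a possibly multi-digit carry. With `base=10` it reproduces Example 2; with `base=2` the column sums are exactly the right-hand sides of (1) to (7). The name `urdhava_multiply` is hypothetical, and the Python loop evaluates columns one after another, whereas the hardware forms all cross products in parallel.

```python
def urdhava_multiply(a_digits, b_digits, base=10):
    """Vertically-and-crosswise multiplication of two equal-length digit lists.

    Digits are least significant first. Returns (product, steps), where
    steps[k] = (column_sum, r_k, c_k) for each column k.
    """
    n = len(a_digits)
    assert len(b_digits) == n, "operands are assumed to have equal length"
    carry, steps, result = 0, [], []
    for k in range(2 * n - 1):
        # The "lines" of step k: all products a_i * b_j with i + j == k.
        col = carry + sum(a_digits[i] * b_digits[k - i]
                          for i in range(max(0, k - n + 1), min(k, n - 1) + 1))
        # Least significant digit is the result digit; the rest is the carry.
        r, carry = col % base, col // base
        steps.append((col, r, carry))
        result.append(r)
    # The final carry (c_6 in the 4-bit binary case) supplies the top digits.
    product = carry
    for r in reversed(result):
        product = product * base + r
    return product, steps


print(urdhava_multiply([1, 3], [5, 3]))  # (1085, [(5, 5, 0), (18, 8, 1), (10, 0, 1)])
```

The three steps printed match the columns of Example 2 (5, 15 + 3 = 18, 9 + 1 = 10). For the operands of Example 1, `urdhava_multiply([1, 1, 0, 1], [1, 0, 1, 1], base=2)` returns 143 (binary 10001111) as its first element.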

with $c_6r_6r_5r_4r_3r_2r_1r_0$ being the final product. The partial products are calculated in parallel, and hence the delay involved is just the time it takes for the signal to propagate through the gates.

Fig. 3: Urdhava Multiplier Hardware Architecture

The main advantage of the Vedic multiplication algorithm (Urdhava Tiryakbhyam Sutra) stems from the fact that it can be easily implemented in an FPGA due to its simplicity and regularity [3]. The digital hardware realization of a 4-bit multiplier using this Sutra is shown in Fig. 3. This hardware design is very similar to that of the array multiplier, where an array of adders is required to arrive at the final product. In the Urdhava multiplier, all the partial products are calculated in parallel, and the delay is mainly the time taken by the carry to propagate through the adders.

IV. PROPOSED METHOD

The proposed method is based on the ROM approach; however, both inputs of the multiplier can be variables. In the proposed method, a ROM is used to store the squares of numbers, whereas a KCM stores the multiples of its constant input.

Method: To find (a x b), first determine whether the difference between a and b is odd or even. Based on the difference, the product is calculated using (8) or (9).

i. In case of an even difference:

$$a \times b = [\text{Average}]^2 - [\text{Deviation}]^2 \tag{8}$$

ii. In case of an odd difference:

$$a \times b = [\text{Average} \times (\text{Average} + 1)] - [\text{Deviation} \times (\text{Deviation} + 1)] \tag{9}$$

where Average = (a + b)/2 and Deviation = Average - smallest(a, b). In the odd case both values end in .5, and (9) uses their integer parts (13 and 1 in Example 4).

Example 3 (even difference) and Example 4 (odd difference) depict the multiplication process. Thus, two-variable multiplication is performed by averaging, squaring and subtraction. The division by 2 needed for the average (a + b)/2 is performed by right-shifting the sum by one bit. If the squares of the numbers are stored in a ROM, the result can be obtained almost instantaneously. However, in the case of an odd difference the process is different, because the average is not an integer. To handle this fractional value, Ekadikena Purvena, the Vedic Sutra used to find the squares of numbers ending in 5, is applied; Example 5 illustrates it. In this case, instead of squaring the average and the deviation, [Average x (Average + 1)] - [Deviation x (Deviation + 1)] is used. Instead of performing these multiplications, however, the same ROM is used and the result is obtained using (10):

$$n(n+1) = n^2 + n \tag{10}$$

Here, $n^2$ is read from the ROM and added to the address $n$, which gives $n(n+1)$. Sample ROM contents are given in Table 1.

TABLE 1: ROM CONTENTS

    Address    Memory Content (Square)
    1          1
    2          4
    3          9
    4          16
    ...        ...

Thus, division and multiplication operations are effectively converted to subtraction and addition operations using Vedic Maths. The squares of both the Average and the Deviation are read out simultaneously using a two-port memory to reduce memory access time.

Example 3: 16 x 12 = 192
1) Find the difference: (16 - 12) = 4 → even number.
2) For an even difference, Product = [Average]^2 - [Deviation]^2:
   i. Average = (a + b)/2 = (16 + 12)/2 = 28/2 = 14
   ii. smallest(a, b) = smallest(16, 12) = 12
   iii. Deviation = Average - smallest(a, b) = 14 - 12 = 2
3) Product = 14^2 - 2^2 = 196 - 4 = 192

Example 4: 15 x 12 = 180
1) Find the difference: (15 - 12) = 3 → odd number.
2) For an odd difference, find the Average and the Deviation:
   i. Average = (a + b)/2 = (15 + 12)/2 = 13.5
   ii. Deviation = Average - smallest(a, b) = 13.5 - smallest(15, 12) = 13.5 - 12 = 1.5
3) Product = (13 x 14) - (1 x 2) = 182 - 2 = 180

Example 5: 25^2 = 625
1) To find the square of 25, first find the square of 5, which is 25, and put 2 in the tens place and 5 in the ones place of the answer.
2) To find the number in the hundreds place, multiply 2 by its immediate next number, 3: (2 x 3) = 6.
3) Answer: 25^2 = 625
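The even/odd rule of (8) to (10) can be checked with the following Python sketch, assuming the ROM is modelled as a plain list of squares indexed by address; the helper names `build_square_rom`, `rom_multiply` and `square_ending_in_5` are hypothetical, and the two-port simultaneous read is not modelled. As in Example 4, the odd case works on the integer parts m and d of Average and Deviation, which is valid because $(m + 0.5)^2 - (d + 0.5)^2 = m(m+1) - d(d+1)$, so no fractional arithmetic is needed.

```python
def build_square_rom(addr_bits: int) -> list[int]:
    """Hypothetical ROM model: entry n holds n*n for every addr_bits-bit address n."""
    return [n * n for n in range(1 << addr_bits)]


def rom_multiply(a: int, b: int, rom: list[int]) -> int:
    """a x b using only square lookups, a shift, additions and subtractions."""
    avg = (a + b) >> 1           # (a+b)/2 as a right shift; keeps the integer part
    dev = avg - min(a, b)        # integer part of Deviation = Average - smallest(a, b)
    if (a - b) % 2 == 0:
        # Even difference, eq. (8): Average^2 - Deviation^2
        return rom[avg] - rom[dev]
    # Odd difference, eq. (9) evaluated via eq. (10): n(n+1) = n^2 + n,
    # i.e. the ROM output plus its own address; no multiplier is needed.
    return (rom[avg] + avg) - (rom[dev] + dev)


def square_ending_in_5(x: int) -> int:
    """Ekadikena Purvena for x = 10k + 5: x^2 = 100 * k * (k + 1) + 25."""
    assert x % 10 == 5, "the sutra applies only to numbers ending in 5"
    k = x // 10
    return 100 * k * (k + 1) + 25


rom = build_square_rom(8)         # 256 entries suffice for 8-bit operands
print(rom_multiply(16, 12, rom))  # Example 3: 192
print(rom_multiply(15, 12, rom))  # Example 4: 180
print(square_ending_in_5(25))     # Example 5: 625
```

A 256-entry table is enough for 8-bit operands because Average never exceeds 255 and Deviation never exceeds 127.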

Fig. 4 depicts the RTL view of the proposed multiplier for the 4x4 case as a sample, implemented on a Cyclone II device. The 8x8 multiplier is implemented using the ROM approach, by storing the squares of the numbers from 0000 0000 to 1111 1111 in the memory. The memory requirement for an 8x8 multiplication is 8 Kb. In the case of a 16x16 multiplier, however, the memory requirement would be huge: $2^{16} \times 32$ bits = 2 Mb. Therefore, to reduce the memory requirements for higher-order multiplication (16x16, 32x32, etc.), lower-order (8x8) multipliers can be instantiated [17]. In this way, the constraint of large memory requirements can be overcome.

TABLE 3: RESULTS FOR 16x16 MULTIPLIER

    Criterion                          Array        Urdhava      Proposed
                                       Multiplier   Multiplier   Multiplier
    Area
      Total Combinational Functions    738          694          291
      Dedicated Logic Registers        96           96           48
      Total Memory Bits (Kb)           0            0            16
    Transitions                        3173         3084         5341
    Speed (After Pipelining) (MHz)     77.65        80.23        119.76
    Power
      Static Power (mW)                46.20        46.19        46.17
      Dynamic Power (mW)               4.41         3.61         9.57
      I/O Power (mW)                   17.37        17.34        17.41
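The instantiation of lower-order multipliers described above can be sketched as follows, assuming the usual split of each 16-bit operand into high and low bytes and four 8x8 sub-products. The paper does not state how the instantiated blocks are wired, so this decomposition, and the names `SQUARES`, `mul8` and `mul16`, are illustrative assumptions. Each 8x8 block needs only a 256-entry table of squares instead of the $2^{16}$-entry table a direct 16x16 design would require.

```python
SQUARES = [n * n for n in range(256)]   # hypothetical 8-bit-address square ROM


def mul8(a: int, b: int) -> int:
    """8x8 multiplier using the square-ROM scheme of eqs. (8)-(10); 0 <= a, b <= 255."""
    avg = (a + b) >> 1
    dev = avg - min(a, b)
    if (a + b) & 1:                      # odd sum <=> odd difference
        return (SQUARES[avg] + avg) - (SQUARES[dev] + dev)
    return SQUARES[avg] - SQUARES[dev]


def mul16(a: int, b: int) -> int:
    """16x16 product built from four 8x8 sub-products (assumed decomposition)."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    # (a_hi*2^8 + a_lo)(b_hi*2^8 + b_lo)
    #   = a_hi*b_hi*2^16 + (a_hi*b_lo + a_lo*b_hi)*2^8 + a_lo*b_lo
    return ((mul8(a_hi, b_hi) << 16)
            + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8)
            + mul8(a_lo, b_lo))


print(mul16(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF)  # True
```

The shifts and additions that combine the sub-products play the same role as the adder stage that joins the instantiated blocks in hardware.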

Fig. 4: RTL View of Proposed Multiplier (4x4)

V. EXPERIMENTAL RESULTS

From Table 2 and Table 3, it is inferred that the proposed multiplier is best suited for higher-order multiplication (i.e., larger than 8x8). Since an FPGA has a sufficient amount of on-chip memory, which can be used to store the squares of the numbers, the proposed multiplier consumes fewer logic elements for its implementation.

TABLE 2: RESULTS FOR 8x8 MULTIPLIER

    Criterion                          Array        Urdhava      Proposed
                                       Multiplier   Multiplier   Multiplier
    Area
      Total Combinational Functions    163          149          114
      Dedicated Logic Registers        48           48           16
      Total Memory Bits (Kb)           0            0            8
    Transitions                        1557         1501         1782
    Speed (After Pipelining) (MHz)     137.46       142.67       129.52
    Power

Fig. 5a: Area Comparison for (8x8)

Fig. 5b: Speed Comparison for (8x8)
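The headline numbers in the abstract can be recomputed from Tables 2 and 3. The short Python script below (its variable and function names are hypothetical) takes total combinational functions as the area measure and the post-pipelining clock rate as the speed measure.

```python
# Logic area (total combinational functions) and Fmax after pipelining (MHz),
# copied from Table 2 (8x8) and Table 3 (16x16).
AREA = {"8x8": {"array": 163, "urdhava": 149, "proposed": 114},
        "16x16": {"array": 738, "urdhava": 694, "proposed": 291}}
FMAX_MHZ = {"8x8": {"array": 137.46, "urdhava": 142.67, "proposed": 129.52},
            "16x16": {"array": 77.65, "urdhava": 80.23, "proposed": 119.76}}


def area_ratio(size: str, baseline: str) -> float:
    """Proposed-multiplier area as a fraction of the baseline's area."""
    return AREA[size]["proposed"] / AREA[size][baseline]


def speedup(size: str, baseline: str) -> float:
    """Proposed-multiplier Fmax divided by the baseline's Fmax."""
    return FMAX_MHZ[size]["proposed"] / FMAX_MHZ[size][baseline]


for size in ("8x8", "16x16"):
    for baseline in ("array", "urdhava"):
        print(f"{size} vs {baseline}: area {area_ratio(size, baseline):.1%}, "
              f"speed x{speedup(size, baseline):.2f}")
```

Against the Urdhava multiplier this gives 76.5% (8x8) and 41.9% (16x16) of the logic area, and for 16x16 a 1.49x higher clock rate (1.54x against the array multiplier), matching the abstract. For 8x8 the proposed design clocks lower than both baselines, consistent with the observation that it is best suited to multiplications larger than 8x8.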

Fig. 5c: Power Comparison for (8x8)

Fig. 6a: Area Comparison for (16x16)

Fig. 6b: Speed Comparison for (16x16)

Fig. 6c: Power Comparison for (16x16)

From Fig. 5 and Fig. 6, it is clear that the proposed multiplier outperforms both the Array Multiplier and the Urdhava Multiplier in area, and in speed for higher-order (16x16) multiplication.

VI. CONCLUSIONS

Thus, the proposed multiplier provides higher performance for higher-order multiplication. For higher-order multiplication, i.e., 16x16 and above, the proposed multiplier is realized by instantiating lower-order multipliers such as 8x8, mainly because of memory constraints. More effective memory implementation and the deployment of memory compression algorithms could yield even better results.

REFERENCES

[1] Swami Bharati Krshna Tirthaji, "Vedic Mathematics," Delhi: Motilal Banarsidass Publishers, 1965.
[2] K. K. Parhi, "VLSI Digital Signal Processing Systems: Design and Implementation," John Wiley & Sons, 1999.
[3] Harpreet Singh Dhillon and Abhijit Mitra, "A Digital Multiplier Architecture using Urdhava Tiryakbhyam Sutra of Vedic Mathematics," IEEE Conference Proceedings, 2008.
[4] Asmita Haveliya, "A Novel Design for High Speed Multiplier for Digital Signal Processing Applications (Ancient Indian Vedic Mathematics Approach)," International Journal of Technology and Engineering System (IJTES), Vol. 2, No. 1, Jan-March 2011.
[5] Raminder Preet Pal Singh, Parveen Kumar and Balwinder Singh, "Performance Analysis of 32-Bit Array Multiplier with a Carry Save Adder and with a Carry-Look-Ahead Adder," International Journal of Recent Trends in Engineering, Vol. 2, No. 6, November 2009.
[6] Parth Mehta and Dhanashri Gawali, "Conventional versus Vedic Mathematical Method for Hardware Implementation of a Multiplier," 2009 International Conference on Advances in Computing, Control, and Telecommunication Technologies.
[7] Prabir Saha, Arindam Banerjee, Partha Bhattacharyya and Anup Dandapat, "High Speed ASIC Design of Complex Multiplier Using Vedic Mathematics," Proceedings of the 2011 IEEE Students' Technology Symposium, IIT Kharagpur, 14-16 January 2011.
[8] H. D. Tiwari, G. Gankhuyag, C. M. Kim and Y. B. Cho, "Multiplier Design Based on Ancient Indian Vedic Mathematics," in Proceedings of the IEEE International SoC Design Conference, Busan, Nov. 24-25, 2008, pp. 65-68.
[9] H. Thapliyal, M. B. Srinivas and H. R. Arabnia, "Design and Analysis of a VLSI Based High Performance Low Power Parallel Square Architecture," in Proc. Int. Conf. Algo. Math. Comp. Sc., Las Vegas, June 2005, pp. 72-76.
[10] P. D. Chidgupkar and M. T. Karad, "The Implementation of Vedic Algorithms in Digital Signal Processing," Global J. of Engg. Edu., Vol. 8, No. 2, pp. 153-158, 2004.
[11] H. Thapliyal and M. B. Srinivas, "High Speed Efficient N x N Bit Parallel Hierarchical Overlay Multiplier Architecture Based on Ancient Indian Vedic Mathematics," Enformatika Trans., Vol. 2, pp. 225-228, Dec. 2004.
[12] J. F. Wakerly, "Digital Design: Principles and Practices," 4th Edition, Pearson Prentice Hall, 2006.
[13] J. Bhasker, "Verilog HDL Primer," BSP Publishers, 2003.
[14] Himanshu Thapliyal, S. Kotiyal and M. B. Srinivas, "Design and Analysis of a Novel Parallel Square and Cube Architecture Based on Ancient Indian Vedic Mathematics," Proceedings of the 48th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS 2005).

[15] Himanshu Thapliyal and Hamid R. Arabnia, "A Time-Area-Power Efficient Multiplier and Square Architecture Based on Ancient Indian Vedic Mathematics," Proceedings of VLSI'04, Las Vegas, U.S.A., June 2004.
[16] M. Ramalatha, K. Deena Dayalan, P. Dharani and S. Deborah Priya, "High Speed Energy Efficient ALU Design using Vedic Multiplication Techniques," ACTEA 2009, Zouk Mosbeh, Lebanon.
[17] B. C. Paul, S. Fujita and M. Okajima, "ROM-Based Logic (RBL) Design: A Low-Power 16 Bit Multiplier," IEEE Journal of Solid-State Circuits, Vol. 44, Issue 11, pp. 2935-2942, Nov. 2009.
