
A HIGH PERFORMANCE BINARY TO BCD CONVERTER FOR DECIMAL MULTIPLICATION
Jairaj Bhattacharya, Aman Gupta, Anshul Singh
Centre for VLSI and Embedded System Technologies (CVEST),
International Institute of Information Technology (IIIT)
Gachibowli, Hyderabad, 500032, India
Email: {jairaj, aman, anshul_singh}@students.iiit.ac.in

ABSTRACT
Decimal data processing applications have grown exponentially
in recent years, thereby increasing the need for hardware support
for decimal arithmetic. Binary to BCD conversion forms the basic
building block of decimal digit multipliers. This paper presents a novel
high-speed, low-power architecture for fixed bit binary to BCD
conversion which is at least 28% better in terms of power-delay
product than existing designs.

1. INTRODUCTION
Decimal arithmetic is receiving significant attention in
commercial business and internet-based applications; providing
hardware support in this direction is therefore necessary.
Recognition of this need has led to specifications for decimal
floating point arithmetic being added to the draft revision
of the IEEE-P754 standard [1]. Decimal arithmetic operations are
generally slow and complex, and their hardware occupies more area.
They are typically implemented using iterative approaches or lookup
table based reduction schemes. This has motivated work on
improving BCD architectures to enable faster and more compact
arithmetic.
In this paper we introduce a new architecture for binary to BCD
conversion of partial products, which forms the core of decimal
multiplication algorithms such as [7] and [8]. The speedup, area
reduction and power consumption of the proposed architecture are
analyzed and comparisons with recent architectures are provided. The
current state of the art conversion scheme [7] is studied and
irregularities in the implementation of its converter are discussed.
The results show that the proposed design brings considerable
improvement in terms of latency, area and power consumption.
The rest of this paper is organized as follows: Section 2 gives an
overview of general BCD conversion and its need. Section 3
discusses prior work in the field of decimal multiplication and binary
to BCD conversion. It also analyzes two recent architectures and
examines the misinterpretation of the conversion algorithm in one of
them. Section 4 explains the proposed algorithm and
Section 5 discusses the architecture in detail. A comparison of results
with those of recent contributions, based on timing, area and power, is
presented in Section 6, and Section 7 concludes the paper.

978-1-4244-5271-2/10/$26.00 ©2010 IEEE

2. BACKGROUND
BCD is a decimal representation of a number directly coded in
binary, digit by digit. For example, the number (9321)10 = (1001
0011 0010 0001)BCD. Each digit of the decimal number is coded in
binary and the codes are concatenated to form the BCD
representation of the decimal number. As any BCD digit lies
in [0, 9] or [0000, 1001], multiplying two BCD digits can
result in numbers in [0, 81]. All the possible products can
be represented by a 7-bit binary number, (81)10 or
(1010001)2 being the highest. In BCD multiplication where 4-bit
binary multipliers are used to multiply two BCD numbers X and Y
with digits Xi and Yj, respectively, a partial product Pij is generated
of the form (p6p5p4p3p2p1p0)2. Conversion of Pij from binary to a
two-digit BCD number BiCj, where Xi × Yj = 10Bi + Cj, needs fast and
efficient BCD converters. Binary to BCD conversion is generally
inefficient if the binary number is very large. Hence the conversion
can be done in parallel for every partial product after each pair of
BCD digits is multiplied, as shown in Figure 1, and the resulting BCD
numbers can be added using BCD adders. An alternative
is to compress the partial products of all binary terms in
parallel and then convert them to BCD, as done in [8].
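The digit-by-digit coding and the partial products described above can be sketched in a few lines of Python. This is only an illustrative software model: the function names `to_bcd` and `digit_partial_products` are hypothetical, and `divmod` stands in as a reference for the hardware converter.

```python
def to_bcd(n: int) -> str:
    """Encode each decimal digit of n as its own 4-bit binary group."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return " ".join(format(int(d), "04b") for d in str(n))


def digit_partial_products(x: int, y: int):
    """Yield (i, j, Pij as a 7-bit string, (Bi, Cj)) for every digit pair.

    Index 0 is the least significant decimal digit. (Bi, Cj) is the
    two-digit BCD value that a binary to BCD converter must produce,
    computed here with divmod as a reference model.
    """
    xs = [int(d) for d in reversed(str(x))]
    ys = [int(d) for d in reversed(str(y))]
    for i, xd in enumerate(xs):
        for j, yd in enumerate(ys):
            p = xd * yd  # at most 9 * 9 = 81, so 7 bits suffice
            yield i, j, format(p, "07b"), divmod(p, 10)


if __name__ == "__main__":
    print(to_bcd(9321))
    for row in digit_partial_products(97, 9):
        print(row)
```

Each yielded tuple corresponds to one 7-bit partial product that can be converted independently of the others, which is what makes the per-partial-product conversion parallel.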

Figure 1: Illustration of BCD conversion in BCD multiplication

3. PRIOR WORK
Extensive work has already been done in the field of decimal
multiplication and binary to BCD conversion. Schulte et al. [2, 3]
have proposed different architectures for BCD multiplication.
Schwarz and Schulte [2] have proposed techniques for efficient
partial product generation using a recoding scheme for signed
magnitude partial products. Erle and Schulte [3] have proposed
data-independent optimization techniques which help reduce the
average latency of the arithmetic. Serial, semi-parallel and fully
parallel approaches to decimal multiplication are available
in the literature [4, 9]. Owing to the particularities of a radix-10
multiplication scheme, one needs to generate BCD partial
products, followed by BCD multi-operand addition. Partial
product generation and reduction techniques have been explored in
[9]. Many general architectures [10, 11, 12] for n-bit conversion are
available in the literature, but using such converters for fixed bit
binary to BCD conversion can be expensive in terms of delay, power
and area. Recently, a series of BCD multipliers have been proposed
[6, 7, 8] which use fixed bit binary to BCD conversion. As shown in
Figure 1, the underlying task is to convert the 7-bit partial product
Pij = p6p5p4p3p2p1p0 to its corresponding 8-bit BCD number BiCj,
where Cj represents the lower BCD digit and Bi represents the higher
BCD digit; this has been done by various novel conversion
architectures [7, 8].
Sreehari et al. [8] reduce the area and improve the speed of
their design by proposing better correction logic. The underlying
principle behind the conversion architecture is the Shift-Add3
algorithm. The correction block uses multiplexers to add 3, and the
critical path passes through five correction blocks, resulting in a
delay of 10 multiplexers. Another conversion algorithm, used in the
state of the art BCD multiplier [9] along with [6, 7], is shown in
Figure 2(a).
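For readers unfamiliar with the Shift-Add3 scheme, the following Python sketch models the textbook shift-and-add-3 (double dabble) algorithm in software. It is not the multiplexer-based circuit of [8]; the function name `shift_add3` and its parameters are assumptions made for illustration.

```python
def shift_add3(value: int, bits: int = 7, digits: int = 2) -> list[int]:
    """Convert an unsigned integer to BCD digits, most significant first,
    using the shift-and-add-3 scheme."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given bit width")
    bcd = [0] * digits  # bcd[0] is the most significant digit
    for i in reversed(range(bits)):
        # Correction: a digit of 5 or more gets +3 before the shift, so
        # that doubling it carries correctly into the next decimal digit.
        bcd = [d + 3 if d >= 5 else d for d in bcd]
        # Shift the digit register left by one bit, pulling in input bit i.
        carry = (value >> i) & 1
        for k in reversed(range(digits)):
            shifted = (bcd[k] << 1) | carry
            carry = shifted >> 4
            bcd[k] = shifted & 0xF
        if carry:
            raise OverflowError("value needs more BCD digits")
    return bcd


if __name__ == "__main__":
    print(shift_add3(63))
```

Every input bit requires one correction step per digit, which is why a direct hardware mapping has a long chain of correction blocks on its critical path.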

Figure 2: (a) BCD conversion principle by [7]

The authors in [7] have misinterpreted this principle of
conversion and have made an incorrect implementation of the design,
as shown in Figure 3. For the example (63)10 of Figure 2(b), their
implementation computes b3:0 as (6)10 and c3:0 as (7)10, resulting in
the wrong value (67)10. Some of the errors in their implementation
are listed in Table 1. The results have been tested using Verilog HDL
simulations.
Table 1: Some errors (marked with *) in the architecture proposed in [7].

Binary number to be converted   BCD value from [7]'s circuit   Actual BCD value
9 × 7 = (63)10 = (0111111)2     B = (0110)2, C = (0111)2*      B = (0110)2, C = (0011)2
8 × 4 = (32)10 = (0100000)2     B = (0001)2*, C = (0010)2      B = (0011)2, C = (0010)2
3 × 9 = (27)10 = (0011011)2     B = (0010)2, C = (0001)2*      B = (0010)2, C = (0111)2
6 × 4 = (24)10 = (0011000)2     B = (0010)2, C = (0110)2*      B = (0010)2, C = (0100)2

Because the implementation proposed in [7] is logically incorrect,
a straightforward architecture based on the
underlying principle is shown in Figure 4. This architecture has been
logically verified in Verilog HDL.

Figure 2: (b) An example for the case of (63)10= (0111111)2


The first row (marked in bold) in Figure 2 shows the BCD weights
corresponding to each bit of the binary number. The weights of p0, p1,
p2 and p3 remain the same, but the weights of p4, p5 and p6, which were
16, 32 and 64, have been decomposed into (10, 4, 2), (20, 10, 2) and
(40, 20, 4) respectively, as explained in [7].
According to their principle, the four BCD digits in the right
column formed by the partial product bits, as shown in Figure 2(a),
have to be added by BCD adders to derive the BCD digit
C (c3c2c1c0)2, and the decimal carries obtained are added along with
the two BCD digits in the left column to derive the BCD digit
B (b3b2b1b0)2.
In Figure 2(b) we take (63)10 as an example, where p6 = 0
and p5p4p3p2p1p0 = 111111. The numbers are arranged according to
the above principle into 4 BCD numbers {(0111)2 = (7)10,
(0110)2 = (6)10, (0010)2 = (2)10 and (1000)2 = (8)10} in the right
column and 2 BCD numbers {(0011)2 = (3)10, (0001)2 = (1)10} in the
left column. Column-wise decimal addition leads to {7 + 6 + 2 + 8} =
{23} in the right column, hence c3:0 = (0011)2 = (3)10 and a carry of
(2)10 is added to the next significant digit. Similar addition in the
left column along with the carry (2)10 from the previous stage leads
to {3 + 1 + 2} = {6}, hence b3:0 = (0110)2 = (6)10.
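The weight decomposition can be checked with a short Python model. This is a sketch under simplifying assumptions: it sums the units and tens columns as plain integers instead of grouping them into the BCD numbers and BCD adders of Figure 2(a), and the names `DECOMPOSED_WEIGHTS` and `convert_by_columns` are hypothetical.

```python
# Decimal weights of each bit as listed in the text: p0..p3 keep weights
# 1, 2, 4, 8, while p4 = 16, p5 = 32 and p6 = 64 are decomposed.
DECOMPOSED_WEIGHTS = {
    0: (1,), 1: (2,), 2: (4,), 3: (8,),
    4: (10, 4, 2), 5: (20, 10, 2), 6: (40, 20, 4),
}


def convert_by_columns(p: int) -> tuple[int, int]:
    """Return (B, C) for a binary value p below 100 by column-wise addition."""
    if not 0 <= p <= 99:
        raise ValueError("p must fit in two decimal digits")
    units = 0  # right column: weights below 10
    tens = 0   # left column: weights that are multiples of 10
    for bit, weights in DECOMPOSED_WEIGHTS.items():
        if (p >> bit) & 1:
            for w in weights:
                if w >= 10:
                    tens += w // 10
                else:
                    units += w
    carry, c = divmod(units, 10)  # decimal carry out of the right column
    return tens + carry, c


if __name__ == "__main__":
    print(convert_by_columns(63))
```

For (63)10 the right column sums to 23 and the left column to 4, so the carry of 2 gives B = 6 and C = 3, matching the worked example.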

Figure 3: Binary to BCD conversion logic in [7]

Figure 4: Corrected Architecture


The critical path (marked by bold lines) in Figure 4 has a delay of
approximately {3 × delay of a BCD adder + delay of a half adder}.
The implementation is slower than [8] even when
high speed BCD adders [5] are used. The next few sections explain our
proposed architecture and conversion principle, followed by
test results for comparison.

4. PROPOSED ALGORITHM
The main objective of the proposed algorithm is to perform
highly efficient fixed bit binary to BCD conversion in terms of delay,
power and area. As mentioned earlier, most of the recently proposed
multipliers use 7-bit binary to 8-bit/2-digit BCD converters. The
proposed algorithm has been specifically designed for such
converters. The following paragraphs explain the proposed
algorithm.
Let p6p5p4p3p2p1p0 be the seven binary bits to be converted into
two BCD digits. To convert these bits into 2-digit BCD we
split the binary number into two parts: the first part contains the
lower significant bits (LSBs) p3, p2, p1 and p0, while the second part
contains the remaining higher significant bits (HSBs) p6, p5 and p4.
The LSBs have the same weights as the bits of
a BCD digit and can be used directly to represent a BCD digit. The
only exception arises when p3p2p1p0 exceeds (1001)2 or (9)10. To
convert the LSBs into a valid BCD number we check whether
p3p2p1p0 exceeds (1001)2 and, if it does, we add (0110)2 to it. This
procedure of adding (0110)2 whenever the number exceeds (1001)2 is
called correction in BCD arithmetic. The carry obtained from this
procedure is added to the higher significant BCD digit calculated
from the HSBs of the original binary number. The HSBs contribute not
only to the higher significant BCD digit but also to the lower
significant BCD digit. These contributions of the HSBs to the
lower significant digit are added after BCD correction. The resulting
sum is then checked against (1001)2 and corrected if
needed to obtain the final lower significant BCD digit. A possible
carry from this operation is added to the higher significant digit,
resulting in the final higher significant BCD digit.
When two BCD digits are multiplied, only six combinations of p6,
p5 and p4 (HSBs) are possible: 000, 001, 010, 011, 100 and
101. Each of these combinations has a different contribution
towards the lower and higher significant BCD digits. This
contribution can be easily calculated by evaluating the weight of the
pattern, which is p6×2^6 + p5×2^5 + p4×2^4. For example, 011 has
weight 32 + 16 = 48, contributing 4 to the higher digit and 8 to the
lower digit. The contribution of each pattern towards the lower and
higher BCD digits is shown in Table 2.
Table 2: Contribution of HSBs

Higher Significant   BCD weight: Higher        BCD weight: Lower
Bits (HSBs)          Significant BCD Digit     Significant BCD Digit
000                  0000                      0000
001                  0001                      0110
010                  0011                      0010
011                  0100                      1000
100                  0110                      0100
101                  1000                      0000
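The algorithm of this section can be summarized by the following behavioral Python sketch. It follows the described steps (correct the LSBs, add the Table 2 contribution, correct again, accumulate the carries) but is a software model, not the hardware architecture; the names `HSB_CONTRIBUTION`, `bcd_correct` and `binary_to_bcd` are hypothetical.

```python
# Table 2: HSB pattern (p6 p5 p4) -> (higher digit, lower digit) contribution.
HSB_CONTRIBUTION = {
    0b000: (0, 0), 0b001: (1, 6), 0b010: (3, 2),
    0b011: (4, 8), 0b100: (6, 4), 0b101: (8, 0),
}


def bcd_correct(value: int) -> tuple[int, int]:
    """Add (0110)2 when value exceeds (1001)2; return (carry, 4-bit digit)."""
    if value > 9:
        value += 6
    return value >> 4, value & 0xF


def binary_to_bcd(p: int) -> tuple[int, int]:
    """Return (higher digit, lower digit) for a 7-bit partial product p."""
    lsbs = p & 0xF   # p3 p2 p1 p0
    hsbs = p >> 4    # p6 p5 p4
    if p < 0 or hsbs not in HSB_CONTRIBUTION:
        raise ValueError("HSB pattern cannot occur for a product of BCD digits")
    high, low_contribution = HSB_CONTRIBUTION[hsbs]
    c1, corrected = bcd_correct(lsbs)                    # first correction
    c2, low = bcd_correct(corrected + low_contribution)  # second correction
    return high + c1 + c2, low


if __name__ == "__main__":
    print(binary_to_bcd(63))
```

For (63)10 the LSBs 1111 are corrected to 0101 with a carry, the contribution 1000 of HSBs 011 gives 1101, and the second correction yields 0011 with another carry, so the higher digit is 4 + 1 + 1 = 6.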

Figure 5: Proposed Algorithm for BCD conversion

Figure 5 shows an example of the algorithm for the number (111111)2 or
(63)10 or (0110 0011)BCD.

5. PROPOSED ARCHITECTURE
This section describes the proposed architecture in detail. The
architecture exploits the fact that only a small number of outcomes
are possible for the conversion, in order to reduce delay, power and
area. Figure 6 shows the proposed architecture.
As shown in Figure 6, p6p5p4p3p2p1p0 are the binary bits to be
converted into BCD bits z7z6z5z4 z3z2z1z0. p6, p5 and p4 are the HSBs
while p3, p2, p1 and p0 are the LSBs. z0 is the same as p0 and hence no
operation is done on p0. {p3, p2 and p1} are used to check whether the
LSBs are greater than (1001)2 using equation (1) and are sent
to the BCD Correction block.

$$C_1 = p_3 (p_2 + p_1) \qquad (1)$$

Whenever C1 is high, the BCD Correction block adds 011 to the
input bits. Figure 7 shows the implementation of the BCD Correction
block.

Figure 6: Proposed Architecture

Figure 7: BCD Correction

Figure 8: 2-bit One Adder

In parallel, the HSBs along with p3 are fed to a simple logic block
known as the Contribution Generator, which produces the higher
significant BCD digit. The logic implemented by the Contribution
Generator is as follows, where an overbar denotes complement:

$$t_3 = p_6 p_4 \qquad (2)$$

$$t_2 = p_5 (p_4 + p_3) + p_6 \overline{p_4} \qquad (3)$$

$$t_1 = (p_5 + p_6)\,\overline{p_4} \qquad (4)$$

$$t_0 = \overline{p_6}\,\overline{p_5}\,p_4 + p_5 \overline{p_4} \qquad (5)$$

C1 is the carry from the lower significant digit, so it is added to
the higher significant digit t3t2t1t0. It is found that very few cases
lead to the propagation of the incoming carry from t1 to t2. Hence, we
take advantage of this and implement {t3, t2} directly in combinational
logic, removing the need to add C1 to these terms and thus saving
hardware and complexity. A 2-bit One Adder, as shown in Figure 8, is
used to add C1 to t0 and t1. A carry may also be
generated when the contributions of the HSBs are added to the
corrected LSBs (a3, a2 and a1). This carry is calculated beforehand by
a Carry Generator block using C1 and input bits p6 to p1. The logic
implemented by the Carry Generator is given by equation (6).

$$C_2 = C_1 (p_4 (p_3 + p_2) + p_3 p_5) + p_6 p_3 + p_4 p_3 p_1 \qquad (6)$$

C2 is added to the result of the first 2-bit One Adder using
another 2-bit One Adder, and the final higher significant digit is
obtained. {t3 and t2} are equal to z7 and z6 respectively and are
directly available from the Contribution Generator block.
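The higher-digit path can be modelled bit by bit in Python as a sanity check. This sketch rests on two assumptions that are not taken from the printed equations: the Contribution Generator terms use the complemented literals implied by Table 2 (for example t1 = (p5 + p6)·NOT p4), and C2 is computed behaviorally from its description (the carry produced when the HSB contribution is added to the corrected LSBs) rather than from equation (6). The function name `higher_digit` is hypothetical.

```python
def higher_digit(p: int) -> int:
    """Higher BCD digit z7z6z5z4 for a 7-bit product p of two BCD digits."""
    p1, p2, p3, p4, p5, p6 = ((p >> i) & 1 for i in range(1, 7))
    n4, n5, n6 = 1 - p4, 1 - p5, 1 - p6  # complemented inputs (assumed)

    # Contribution Generator, equations (2)-(5) with assumed complements.
    t3 = p6 & p4
    t2 = (p5 & (p4 | p3)) | (p6 & n4)
    t1 = (p5 | p6) & n4
    t0 = (n6 & n5 & p4) | (p5 & n4)

    # Equation (1): the LSBs exceed (1001)2.
    c1 = p3 & (p2 | p1)

    # Behavioral stand-in for the Carry Generator (assumption).
    lsbs = p & 0xF
    corrected = lsbs - 10 if c1 else lsbs
    low_contribution = (16 * (p >> 4)) % 10  # 0, 6, 2, 8, 4, 0 as in Table 2
    c2 = 1 if corrected + low_contribution > 9 else 0

    # Two cascaded 2-bit One Adders; the carry out of t1 is dropped
    # because t3 and t2 already account for it.
    s = (((t1 << 1) | t0) + c1) & 0b11
    s = (s + c2) & 0b11
    return (t3 << 3) | (t2 << 2) | s


if __name__ == "__main__":
    print(higher_digit(63))
```

Under these assumptions the model reproduces the tens digit for every product of two BCD digits, which illustrates why {t3, t2} never need an adder of their own.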
The contribution of the HSBs towards the lower significant BCD digit
is fixed and unique, and is known once the HSBs are known. We have
implemented four distinct adder units which add only specified
values to the inputs in parallel, according to the contributions in
Table 2. The different adder blocks, +1, +2, +3 and +4 (shown in
Figure 9), add 001, 010, 011 and 100 to the input bits respectively;
these are the Table 2 lower-digit contributions 0010, 0100, 0110 and
1000 with the least significant bit dropped, since z0 = p0 bypasses
the adders.
The adder blocks take the corrected LSBs (a3, a2, a1) as inputs and
add specific numbers to them. The appropriate result is then
selected by a multiplexer whose selection bits are p6, p5 and p4
(HSBs). The result from the multiplexer is fed to a BCD
Correction block, which takes C2 as input to decide whether
correction has to be done. The outputs of this BCD
Correction block are z3, z2 and z1 which, along with z0, form the final
lower significant BCD digit.
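The lower-digit datapath can likewise be sketched in Python on the three bits that actually pass through the adders. This is an illustrative model: the multiplexer table `ADDEND` is derived from Table 2 by dropping the least significant bit of each contribution, C2 is computed behaviorally as an assumption instead of through equation (6), and all names are hypothetical.

```python
# Multiplexer: HSB pattern (p6 p5 p4) -> value added to (a3, a2, a1).
# Table 2 lower-digit contributions 0, 6, 2, 8, 4, 0 shifted right by one bit.
ADDEND = {0b000: 0, 0b001: 3, 0b010: 1, 0b011: 4, 0b100: 2, 0b101: 0}


def bcd_correction(bits3: int, enable: int) -> int:
    """BCD Correction block: add 011 to a 3-bit input when enabled."""
    return (bits3 + 0b011) & 0b111 if enable else bits3


def lower_digit(p: int) -> int:
    """Lower BCD digit z3z2z1z0 for a 7-bit product p of two BCD digits."""
    hsbs = p >> 4
    if p < 0 or hsbs not in ADDEND:
        raise ValueError("HSB pattern cannot occur for a product of BCD digits")
    p0 = p & 1
    upper = (p >> 1) & 0b111  # p3 p2 p1
    p1, p2, p3 = upper & 1, (upper >> 1) & 1, upper >> 2
    c1 = p3 & (p2 | p1)            # equation (1)
    a = bcd_correction(upper, c1)  # corrected LSBs (a3, a2, a1)
    total = a + ADDEND[hsbs]       # output of the selected adder block
    c2 = 1 if total > 4 else 0     # behavioral stand-in for C2 (assumption)
    z321 = bcd_correction(total & 0b111, c2)
    return (z321 << 1) | p0        # z0 is p0, passed through unchanged


if __name__ == "__main__":
    print(lower_digit(63))
```

For (63)10, bits p3p2p1 = 111 are corrected to 010, the +4 block gives 110, and the second correction yields 001, so the lower digit is 0011.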

(a) +1 block    (b) +2 block    (c) +3 block    (d) +4 block
Figure 9: +1, +2, +3, +4 adder blocks

6. SIMULATION AND RESULTS
All the architectures have been described using Verilog HDL.
Delay, power and area values for the designs are obtained by
synthesizing the Verilog HDL descriptions in Synopsys Design
Compiler (which includes Power Compiler) using the Oklahoma State
University 180 nm standard cell library.
The inputs are applied at a clock frequency of 330 MHz. The
architectures are simulated for all possible conversion cases and
the comparison of results is shown in Table 3.

Table 3: Average Delay, Power, Power-Delay Product and Area
comparison of various architectures

Architecture                         Delay (ns)   Power (nW)   Power-delay product (10^-18 J)   Area (µm^2)
Proposed                             1.57         557.09       874.66                           1862
Srihari [8]                          1.85         661.39       1223.57                          2087
Corrected Architecture in Figure 4   1.99         835.17       1661.98                          3469

It is clear from Table 3 that the proposed design improves delay by
15.135% and power by 15.769%, giving a 28.52% improvement in
power-delay product over the most efficient architecture in the
literature [8]. The proposed design also shows a significant
improvement of 10.68% in area over [8].

7. CONCLUSION
This paper presented a novel architecture for binary to BCD
conversion used in decimal multiplication. The proposed converter
is flexible and can be plugged into any homogeneous multiplication
architecture to achieve better performance irrespective of the
method used to generate binary partial products. The proposed
architecture shows, on average, an improvement of 28% in terms
of power-delay product over the most efficient architecture in the
literature.
NOTE: Owing to lack of space the design proposed in [8] is not
shown in the paper.

8. REFERENCES
[1] IEEE standard for floating-point arithmetic. IEEE SC, Oct.
2006 [Online]. Available:
http://754r.ucbtest.org/drafts/archive/2006-10-04.pdf
[2] M. A. Erle, E. M. Schwarz and M. J. Schulte, "Decimal
multiplication with efficient partial product generation," 17th
IEEE Symposium on Computer Arithmetic, 2005, pp. 21-28.
[3] M. A. Erle and M. J. Schulte, "Decimal multiplication via
carry-save addition," Proceedings. IEEE International
Conference on Application-Specific Systems, Architectures,
and Processors, 2003, pp. 348-358.
[4] A. Vazquez, E. Antelo and P. Montuschi, "A New Family of
High-Performance Parallel Decimal Multipliers," 18th IEEE
Symposium on Computer Arithmetic, 2007, pp. 195-204.
[5] Sreehari Veeramachaneni, M. Keerthi Krishna, L. Avinesh, P.
Sreekanth Reddy and M. B. Srinivas, "Novel High-Speed
16-Digit BCD Adders Conforming to IEEE 754r Format," IEEE
Computer Society Annual Symposium on VLSI, 2007, pp. 343-350.
[6] R. K. James, T. K. Shahana, K. P. Jacob and S. Sasi, "Decimal
multiplication using compact BCD multiplier," International
Conference on Electronic Design, 2008, pp. 1-6.
[7] G. Jaberipur and A. Kaivani, "Binary-coded decimal digit
multipliers," IET Computers and Digital Techniques, vol.
1, issue 4, July 2007, pp. 377-381.
[8] Srihari Veeramachaneni and M. B. Srinivas, "Novel
High-Speed Architecture for 32-Bit Binary Coded Decimal (BCD)
Multiplier," International Symposium on Communications and
Information Technologies, 2008, pp. 543-546.
[9] G. Jaberipur and A. Kaivani, "Improving the Speed of Parallel
Decimal Multiplication," IEEE Transactions on Computers,
vol. 58, issue 11, Nov. 2009, pp. 1539-1552.
[10] M. Schmookler, "High-speed binary-to-decimal conversion,"
IEEE Transactions on Computers, 1968, vol. 17, issue 5, pp.
506-508.
[11] V. T. Rhyne, "Serial binary-to-decimal and decimal-to-binary
conversion," IEEE Transactions on Computers, 1970, vol. 19,
issue 9, pp. 808-812.
[12] B. Arazi and D. Naccache, "Binary-to-decimal conversion
based on the divisibility of 2^8 - 1 by 5," Electronics Letters,
1992, vol. 28, issue 23, pp. 2151-2152.
