MULTIPLICATION
Jairaj Bhattacharya, Aman Gupta, Anshul Singh
Centre for VLSI and Embedded System Technologies (CVEST),
International Institute of Information Technology (IIIT)
Gachibowli, Hyderabad, 500032, India
Email: {jairaj, aman, anshul_singh}@students.iiit.ac.in
ABSTRACT
Decimal data processing applications have grown exponentially
in recent years, increasing the need for hardware support for
decimal arithmetic. Binary to BCD conversion forms the basic
building block of decimal digit multipliers. This paper presents a novel
high-speed, low-power architecture for fixed-bit binary to BCD
conversion that is at least 28% better in terms of power-delay
product than existing designs.
1. INTRODUCTION
Decimal arithmetic is receiving significant attention in
commercial business and Internet-based applications, so providing
hardware support in this direction is necessary. The recognized need
for decimal arithmetic has led to specifications for decimal
floating-point arithmetic being added to the draft revision of the
IEEE-P754 standard [1]. Decimal arithmetic operations are generally
slow and complex, and their hardware requires more area than binary
hardware. They are typically implemented using iterative approaches
or lookup-table-based reduction schemes. This motivates improving
BCD architectures to enable faster and more compact arithmetic.
In this paper, we introduce a new architecture for binary to BCD
conversion of partial products, a step that forms the core of decimal
multiplication algorithms such as [7] and [8]. The speedup, area
reduction and power consumption of the proposed architecture are
analyzed, and comparisons with recent architectures are provided. The
current state-of-the-art conversion scheme [7] is studied, and
irregularities in the implementation of its converter are discussed.
The results show that the proposed design brings considerable
improvement in latency, area and power consumption.
The rest of this paper is organized as follows: Section 2 gives an
overview of BCD conversion and the need for it. Section 3
discusses prior work in the field of decimal multiplication and binary
to BCD conversion. It also analyzes two recent architectures and
examines the misinterpretation of the conversion algorithm in one of
the architectures. Section 4 explains the proposed algorithm and
Section 5 discusses the architecture in detail. A comparison of results
with those of recent contributions, based on timing, area and power is
presented in Section 6 and finally Section 7 concludes the paper.
2. BACKGROUND
BCD is a decimal representation of a number in which each decimal
digit is coded directly in binary. For example, (9321)10 = (1001
0011 0010 0001)BCD: each digit of the decimal number is coded in
4-bit binary, and the codes are concatenated to form the BCD
representation of the number. Since every BCD digit lies in
[0, 9], i.e. [0000, 1001], the product of two BCD digits lies in
[0, 81]. All the possible combinations can
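A minimal Python sketch of the two facts stated above: digit-by-digit BCD packing, and the range of a product of two BCD digits, which fits in 7 bits. The helper name `to_bcd` is hypothetical and chosen only for illustration.

```python
def to_bcd(n: int) -> int:
    """Pack a non-negative integer into BCD, 4 bits per decimal digit (illustrative helper)."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    bcd, shift = 0, 0
    while True:
        bcd |= (n % 10) << shift  # place the current least significant decimal digit
        n //= 10
        shift += 4                # the next decimal digit occupies the next nibble
        if n == 0:
            return bcd


# (9321)10 -> 1001 0011 0010 0001
print(f"{to_bcd(9321):016b}")

# Every product of two BCD digits lies in [0, 81], so 7 bits are enough to hold it.
products = {a * b for a in range(10) for b in range(10)}
print(min(products), max(products), max(products).bit_length())
```

Because 81 < 2^7, a binary digit multiplier produces a 7-bit partial product, which is the input width assumed by the converters discussed in the rest of the paper.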
3. PRIOR WORK
Extensive work has already been done in the field of decimal
multiplication and binary to BCD conversion. Schulte et al. [2, 3]
have proposed different architectures for BCD multiplication.
Schwarz and Schulte [2] have proposed techniques for efficient
partial product generation using a recoding scheme for signed
magnitude partial products. Erle and Schulte [3] have proposed
data-independent optimization techniques that help reduce the
average latency of arithmetic operations. Serial, semi-parallel and
fully parallel approaches to decimal multiplication are available in
the literature [4, 9]. Owing to the particularities of radix-10
multiplication, BCD partial products must be generated and then
reduced by BCD multi-operand addition. Partial product generation
and reduction techniques have been explored in [9]. Many general
architectures [10, 11, 12] for n-bit conversion are available in the
literature, but using such converters for fixed-bit binary to BCD
conversion can be expensive in terms of delay, power and area.
Recently, a series of BCD multipliers that use fixed-bit binary to
BCD conversion have been proposed [6, 7, 8]. Figure 1 shows the
underlying principle: the 7-bit partial product Pij = p6p5p4p3p2p1p0
is converted to its corresponding 8-bit BCD number BiCj, where Cj
represents the lower BCD digit and Bi the higher BCD digit. Various
novel conversion architectures [7, 8] implement this step.
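As a point of reference for the hardware converters, the conversion of Figure 1 can be stated behaviorally as a division by ten. The sketch below is a golden model only, useful for checking a gate-level design; the function name `bin7_to_bcd8_reference` is hypothetical.

```python
def bin7_to_bcd8_reference(p: int) -> tuple[int, int, int]:
    """Behavioral reference: 7-bit partial product Pij -> (Bi, Cj, packed 8-bit BCD)."""
    if not 0 <= p <= 81:
        raise ValueError("a product of two BCD digits lies in [0, 81]")
    b_i, c_j = divmod(p, 10)           # higher and lower BCD digits
    return b_i, c_j, (b_i << 4) | c_j  # packed as Bi Cj, i.e. z7..z4 z3..z0


p = 0b1010001  # (81)10
b, c, packed = bin7_to_bcd8_reference(p)
print(b, c, f"{packed:08b}")
```

Any of the architectures discussed here, including the proposed one, should agree with this model on all inputs from 0 to 81.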
Sreehari et al. [8] reduce area and improve speed in their design
by proposing better correction logic. The underlying principle of
their conversion architecture is the Shift-Add3 algorithm. The
correction block uses multiplexers to add 3, and the critical path
spans five correction blocks, resulting in a delay of 10 multiplexers.
Another conversion algorithm, used in the state-of-the-art BCD
multiplier [9] as well as in [6, 7], is shown in Figure 2(a).
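The Shift-Add3 principle behind [8] (also known as "double dabble") can be sketched in software as follows. This models the algorithm only, not the multiplexer-based correction blocks or the critical path of [8]; the function name `shift_add3` is hypothetical.

```python
def shift_add3(p: int, nbits: int = 7) -> int:
    """Shift-Add3 (double dabble): convert an nbits-wide binary value to packed BCD."""
    if not 0 <= p < (1 << nbits):
        raise ValueError("p does not fit in nbits")
    n_digits = len(str((1 << nbits) - 1))  # decimal digits needed for the largest input
    bcd = 0
    for i in range(nbits - 1, -1, -1):     # scan the input bits MSB first
        # Correction: add 3 to every BCD digit that is 5 or more, so the following
        # left shift carries into the next digit exactly when the digit reaches 10.
        for d in range(n_digits):
            if ((bcd >> (4 * d)) & 0xF) >= 5:
                bcd += 3 << (4 * d)
        bcd = (bcd << 1) | ((p >> i) & 1)  # shift in the next binary bit
    return bcd


print(hex(shift_add3(81)))
```

Each "add 3 if at least 5" step corresponds to one correction block in hardware; the serial dependence between successive shifts is what makes the critical path of such converters long.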
4. PROPOSED ALGORITHM
The main objective of the proposed algorithm is to perform fixed-bit
binary to BCD conversion that is highly efficient in terms of delay,
power and area. As mentioned earlier, most of the recently proposed
multipliers use 7-bit binary to 8-bit/2-digit BCD converters. The
proposed algorithm has been specifically designed for such
converters. The following paragraphs explain the proposed
algorithm.
Let p6p5p4p3p2p1p0 be the seven binary bits to be converted into
two BCD digits. To convert these bits into a 2-digit BCD number, we
split the binary number into two parts: the first part contains the
lower significant bits (LSBs) p3, p2, p1 and p0, while the second part
contains the remaining higher significant bits (HSBs) p6, p5 and p4.
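The split gives Pij = 16 · (p6p5p4) + (p3p2p1p0). The short sketch below (the helper name `split_7bit` is hypothetical) performs the split and lists the decimal contribution of each HSB pattern h, i.e. the tens and units parts of 16 · h. Every units contribution turns out to be even, which is consistent with bit p0 passing straight through to the result.

```python
def split_7bit(p: int) -> tuple[int, int]:
    """Split p6..p0 into the HSB field (p6 p5 p4) and the LSB nibble (p3 p2 p1 p0)."""
    if not 0 <= p < 128:
        raise ValueError("expected a 7-bit value")
    return p >> 4, p & 0xF


# The LSB nibble has weight 1, like a BCD digit; the HSB field has weight 16.
# Decimal contribution of each HSB pattern h = p6p5p4, i.e. divmod(16 * h, 10):
for h in range(8):
    tens, units = divmod(16 * h, 10)
    print(f"p6p5p4={h:03b}  16*h={16 * h:3d}  tens digit +{tens}  units digit +{units}")
```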
The lower significant part (LSBs) has the same weight as a BCD
digit and can be used directly to represent a BCD digit. The only
exception arises when p3p2p1p0 exceeds (1001)2 or (9)10. To
while p3, p2, p1 and p0 are the LSBs. z0 is the same as p0, so no
operation is performed on p0. The bits p3, p2 and p1 are used in
equation (1) to check whether the LSBs are greater than (1001)2, and
are then sent to the BCD Correction block.
$$C_1 = p_3 \cdot (p_2 + p_1) \qquad (1)$$
Whenever C1 is high, the BCD Correction block adds 011 to its
input bits. Figure 7 shows the implementation of the BCD Correction
block.
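A bit-level sketch of equation (1) and the first BCD Correction, assuming the "input bits" of the correction block are p3 p2 p1 (the bits the text says are sent to it). Because these bits carry weight 2 inside the nibble, adding 011 to them is the same as adding 0110 (six) to p3p2p1p0, the usual BCD correction; the overflow bit is dropped and the carry is C1 itself. The function name `lsb_correction` is hypothetical.

```python
def lsb_correction(p: int) -> tuple[int, int, int]:
    """Model equation (1) and the first BCD Correction on the LSB nibble of p.

    Only bits p3..p0 of p are used. Returns (C1, a, z0), where a is the
    3-bit corrected value a3 a2 a1.
    """
    p0, p1, p2, p3 = p & 1, (p >> 1) & 1, (p >> 2) & 1, (p >> 3) & 1
    c1 = p3 & (p2 | p1)                   # equation (1): p3p2p1p0 > (1001)2
    x = (p3 << 2) | (p2 << 1) | p1        # p3 p2 p1, weight 2 inside the nibble
    a = (x + 0b011) & 0b111 if c1 else x  # add 011 and drop the overflow bit
    return c1, a, p0                      # z0 = p0 passes through unchanged


for nibble in range(16):
    c1, a, z0 = lsb_correction(nibble)
    print(f"{nibble:04b} -> C1={c1}  corrected digit={2 * a + z0}")
```

For nibbles 1010 to 1111, C1 is 1 and the corrected digit is the nibble minus ten, so C1 acts as a carry of one into the higher BCD digit.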
5. PROPOSED ARCHITECTURE
This section describes the proposed architecture in detail.
To reduce delay, power and area, the architecture makes maximum use
of the fact that only a small, limited number of conversion outcomes
is possible. Figure 6 shows the proposed architecture.
As shown in Figure 6, p6p5p4p3p2p1p0 are the binary bits to be
converted into the BCD bits z7z6z5z4z3z2z1z0. p6, p5 and p4 are the HSBs
obtained. The bits t3 and t2 are equal to z7 and z6, respectively, and
are directly available from the Contribution Generator block.
The contribution of the HSBs to the lower significant BCD digit is
fixed and unique, and is known as soon as the HSBs are known. We have
implemented four distinct adder units which, in parallel, add only
specified values to their inputs according to the contributions in
Table 2. The different adder blocks, +1, +2, +3 and +4 (shown in
Figure 9), add 001, 010, 011 and 100 to the input bits, respectively.
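Table 2 is not reproduced in this excerpt, so the mapping below is a reconstruction under a stated assumption: each HSB pattern h contributes (16 · h) mod 10 to the lower BCD digit, and since that value is always even it can be added to the weight-2 field a3 a2 a1 as half its value. Under that assumption, the four adder blocks plus a pass-through cover every case. The names `adder_for` and `ADDERS` are hypothetical.

```python
# Assumption: HSB pattern h = p6 p5 p4 contributes (16 * h) % 10 to the lower digit.
# That contribution is always even, so it is added to a3 a2 a1 (weight 2) as half its value.
ADDERS = {0: "pass-through", 1: "+1", 2: "+2", 3: "+3", 4: "+4"}


def adder_for(h: int) -> int:
    """Constant added to a3 a2 a1 for HSB pattern h (hypothetical reading of Table 2)."""
    if not 0 <= h < 8:
        raise ValueError("h must be a 3-bit value")
    return (16 * h) % 10 // 2


for h in range(8):
    print(f"p6p5p4={h:03b}  units contribution={(16 * h) % 10}  block={ADDERS[adder_for(h)]}")
```

Under this reading, the largest constant needed is 100 (for h = 011, a contribution of 8), which matches the widest adder block listed above.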
The adder blocks take the corrected LSBs (a3, a2, a1) as inputs and
add specific numbers to them. The appropriate result is then
selected by a multiplexer whose selection bits are the HSBs p6, p5
and p4. The multiplexer output is then fed to the BCD Correction
block, which takes C2 as input to decide whether correction has to
be done. The outputs of the BCD Correction block are z3, z2 and z1,
which, together with z0, form the final lower significant BCD digit.
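Putting the pieces together, the lower-digit datapath can be sketched as a bit-level Python model. Two parts are assumptions, because they are not defined in this excerpt: the multiplexer selection follows the reconstructed contributions (16 · h) mod 10, and C2 is taken to be high when the selected sum s satisfies s ≥ 5, i.e. when 2s + z0 would exceed 9. With that definition the same "+011" correction as in equation (1) applies. The function name `lower_bcd_digit` is hypothetical.

```python
def lower_bcd_digit(p: int) -> tuple[int, int, int]:
    """Bit-level sketch of the lower-digit path for a 7-bit p: returns (z3z2z1z0, C1, C2)."""
    if not 0 <= p < 128:
        raise ValueError("expected a 7-bit value")
    h = p >> 4                                   # HSBs p6 p5 p4: multiplexer select
    p3, p2, p1, z0 = (p >> 3) & 1, (p >> 2) & 1, (p >> 1) & 1, p & 1
    c1 = p3 & (p2 | p1)                          # equation (1)
    x = (p3 << 2) | (p2 << 1) | p1
    a = (x + 0b011) & 0b111 if c1 else x         # first BCD Correction -> a3 a2 a1
    # The adder blocks work in parallel on a3 a2 a1; the multiplexer picks one result.
    candidates = [a, a + 1, a + 2, a + 3, a + 4]  # pass-through, +1, +2, +3, +4
    s = candidates[(16 * h) % 10 // 2]          # assumed Table 2 selection
    c2 = int(s >= 5)                             # assumed C2: 2*s + z0 would exceed 9
    z = (s + 0b011) & 0b111 if c2 else s         # second BCD Correction -> z3 z2 z1
    return (z << 1) | z0, c1, c2


print(lower_bcd_digit(79))
```

As a consistency check only (not a description of the Contribution Generator), the higher digit equals (16 · h) div 10 + C1 + C2 for every 7-bit input, so the two correction carries are the only information the lower-digit path needs to pass upward.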
7. CONCLUSION
This paper presented a novel architecture for binary to BCD
conversion used in decimal multiplication. The proposed converter
is flexible and can be plugged into any homogeneous multiplication
architecture to achieve better performance, irrespective of the
method used to generate the binary partial products. The proposed
architecture shows, on average, an improvement of 28% in terms
of power-delay product over the most efficient architecture in the
literature.
NOTE: Owing to lack of space, the design proposed in [8] is not
shown in the paper.
8. REFERENCES
[1]
Figure 9: +1, +2, +3, +4 adder blocks ((a) +1 block, (b) +2 block, (c) +3 block, (d) +4 block)