JULY 2001
1. The organization of an (n + 1)-bit LNS digital word.
LNS Basics
The LNS maps a linear number $X$ to a triplet as follows:
$$X \xrightarrow{\text{LNS}} (z, s, x = \log_b |X|), \qquad (2)$$
(3)
Mappings (2) and (3) are required when an LNS processor receives its input and transmits its output as linear data in digital format. Since all arithmetic operations can be performed in the logarithmic domain, only an initial conversion is imposed. The conversion overhead therefore remains constant, and its contribution to power dissipation becomes negligible as the amount of processing implemented in LNS grows.
In stand-alone DSP systems a different approach is possible. The LNS forward and inverse mapping overhead can be mitigated [...]
CIRCUITS & DEVICES
RNS Basics
The RNS maps an integer $X$ to an $N$-tuple of residues $x_i$,
$$X \xrightarrow{\text{RNS}} \{x_1, x_2, \ldots, x_N\}, \qquad (4)$$
where
$$x_i = \langle X \rangle_{m_i} \qquad (5)$$
Linear Operation                                              Logarithmic Operation
$Z = XY = b^x b^y = b^{x+y}$                                  $z = \log_b Z = x + y$
$Z = X/Y = b^x / b^y = b^{x-y}$                               $z = x - y$
$Z = \sqrt[m]{X} = \sqrt[m]{b^x} = b^{x/m}$                   $z = x/m$, $m$ integer
$Z = X^m = (b^x)^m = b^{mx}$                                  $z = mx$, $m$ integer
$Z = X + Y = b^x + b^y = b^x (1 + b^{y-x})$                   $z = x + \log_b (1 + b^{y-x})$
$Z = X - Y = b^x - b^y = b^x (1 - b^{y-x})$                   $z = x + \log_b (1 - b^{y-x})$
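The correspondence between linear and logarithmic operations can be illustrated with a minimal Python sketch. It is not taken from the article: the base `BASE = 2.0`, the `LNS` tuple, the helper names, the sign handling by XOR, and the inverse mapping X = (1 - z)(-1)^s b^x are all assumptions made for illustration, and floating-point logarithms stand in for the fixed-point words a hardware LNS would use.

```python
import math
from typing import NamedTuple

BASE = 2.0  # assumed logarithm base b; the article leaves b as a parameter


class LNS(NamedTuple):
    z: bool   # zero flag
    s: int    # sign bit (0 positive, 1 negative)
    x: float  # log_b |X|; ignored when z is set


def to_lns(value: float, b: float = BASE) -> LNS:
    """Forward mapping X -> (z, s, log_b |X|)."""
    if value == 0.0:
        return LNS(True, 0, 0.0)
    return LNS(False, int(value < 0), math.log(abs(value), b))


def from_lns(v: LNS, b: float = BASE) -> float:
    """Assumed inverse mapping: X = (1 - z) * (-1)^s * b^x."""
    return 0.0 if v.z else (-1.0) ** v.s * b ** v.x


def lns_mul(p: LNS, q: LNS) -> LNS:
    """Multiplication becomes an addition of exponents; signs are XORed."""
    if p.z or q.z:
        return LNS(True, 0, 0.0)
    return LNS(False, p.s ^ q.s, p.x + q.x)


def lns_add(p: LNS, q: LNS, b: float = BASE) -> LNS:
    """Same-sign addition: z = x + log_b(1 + b^(y - x))."""
    if p.z:
        return q
    if q.z:
        return p
    if p.s != q.s:
        raise ValueError("operands of opposite sign need the subtraction rule")
    x, y = max(p.x, q.x), min(p.x, q.x)  # keep y - x <= 0 so b^(y - x) <= 1
    return LNS(False, p.s, x + math.log(1.0 + b ** (y - x), b))
```

For example, `from_lns(lns_mul(to_lns(6.0), to_lns(-7.0)))` is approximately -42, obtained with a single addition of exponents. In hardware, the term log_b(1 + b^(y - x)) is the costly part of LNS addition; here it is simply evaluated with `math.log`.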
[Figure: p01 (vertical axis, 0.1 to 0.5) versus bit position (horizontal axis, bits 15 to 0); curves labeled Twos Complement, b = 1.5, b = 1.3, and b = 1.1.]
$$z_i = \langle x_i \circ y_i \rangle_{m_i}, \qquad (6)$$
where $i = 1, 2, \ldots, M$, and the symbol $\circ$ stands for addition, subtraction, or multiplication. Every integer in the range $0 \le X < \prod_{i=1}^{N} m_i$ has a unique RNS representation. Inverse conversion is accomplished by means of the Chinese Remainder Theorem (CRT) or mixed-radix conversion [16].
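Forward conversion, channel-wise arithmetic, and CRT-based inverse conversion can be sketched in a few lines of Python. The moduli (7, 11, 13) and all function names are illustrative assumptions, not values from the article; any set of pairwise-coprime moduli would serve. The sketch needs Python 3.8 or later for `math.prod` and for `pow` with a negative exponent.

```python
from math import prod

MODULI = (7, 11, 13)  # illustrative pairwise-coprime moduli; range is 0 <= X < 1001


def to_rns(value: int, moduli=MODULI) -> tuple:
    """Forward conversion: x_i = X mod m_i for each modulus."""
    return tuple(value % m for m in moduli)


def rns_op(xs, ys, op, moduli=MODULI) -> tuple:
    """Apply op independently in each residue channel, modulo m_i."""
    return tuple(op(x, y) % m for x, y, m in zip(xs, ys, moduli))


def from_rns(residues, moduli=MODULI) -> int:
    """Inverse conversion by the Chinese Remainder Theorem."""
    total = prod(moduli)
    acc = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the multiplicative inverse of partial modulo m
        acc += r * partial * pow(partial, -1, m)
    return acc % total
```

With these moduli, `from_rns(rns_op(to_rns(23), to_rns(41), lambda a, b: a * b))` recovers 943, the product of 23 and 41, although no channel ever handles a value larger than the square of its own modulus. No carries propagate between channels, which is the property that lets the residues be processed independently.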
The basic architecture of an RNS processor in comparison to
a binary counterpart is depicted in Fig. 3. Figure 3 shows that
the word length n of the binary counterpart is partitioned into M
subwords, the residues, which can be processed independently
and are of word length significantly smaller than n. The ith residue channel performs arithmetic modulo mi . Conceptually, RNS
introduces a subword-level parallelism into an algorithm; therefore, its hardware implementation can enjoy the low-power benefits of parallel architectures [2].
3. Structure of a binary architecture (a) and the corresponding RNS processor (b). In (b), a forward converter feeds the residue channels mod m1, mod m2, ..., mod mM, each n/M bits wide, and an inverse converter follows them.
(7)
$$= (ac - bd) + j(bc + da), \qquad (8)$$
where $j$ is the imaginary unit (i.e., $\sqrt{-1}$), and $a$, $b$, $c$, and $d$ are real numbers. Parhami [5] shows a different technique that reduces the number of multiplications to three, at the cost of five additions or subtractions and an extra computational step. According to this technique, the complex product is computed as
$$p = [c(a + b) - b(c + d)] + j[c(a + b) - a(c - d)]. \qquad (9)$$
Equations (16) and (17) show that a complex multiplication requires only two residue multiplications instead of four multiplications, an addition, and a subtraction. Therefore, by paying an initial cost for conversion, the QRNS mapping achieves a significant reduction in computational complexity, which translates directly into power savings.
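The equivalence of the four-multiplication and three-multiplication forms of the complex product can be checked with a short Python sketch. The function names are hypothetical and the operands are treated as plain numbers; the sketch only verifies the algebra of Eqs. (8) and (9).

```python
def complex_mul_4(a, b, c, d):
    """(a + jb)(c + jd) with four multiplications, as in Eq. (8)."""
    return a * c - b * d, b * c + d * a


def complex_mul_3(a, b, c, d):
    """Same product with three multiplications and five additions/subtractions, Eq. (9)."""
    shared = c * (a + b)         # multiplication 1, reused by both parts
    real = shared - b * (c + d)  # multiplication 2
    imag = shared - a * (c - d)  # multiplication 3
    return real, imag
```

For (3 + 4j)(5 + 6j) both functions return (-9, 38). The shared term c(a + b) is the extra computational step: it must be available before either the real or the imaginary part can be formed.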
$$(a_i, b_i) \xrightarrow{\text{QRNS}} (q_i, q_i^*), \qquad (10)$$
where $q_i$ and $q_i^*$ are the quadratic images of $a_i$ and $b_i$, respectively. The quadratic images are obtained as
$$q_i = \langle a_i + j b_i \rangle_{m_i}, \qquad q_i^* = \langle a_i - j b_i \rangle_{m_i}. \qquad (11)$$
$$\langle j^2 + 1 \rangle_{m_i} = 0. \qquad (13)$$
(14)
(15)
The quadratic mapping is of practical importance because it removes the dependence of both the real and the imaginary part of a complex product on the real and imaginary parts of both operands, a dependence that is evident in Eq. (8). In other words, it eliminates the cross-product terms. Therefore, by exploiting the QRNS, the complex product $\{(q_{pi}, q_{pi}^*) \mid i = 1, 2, \ldots, N\}$ of two QRNS-encoded complex numbers, $\{(q_{ai}, q_{ai}^*) \mid i = 1, 2, \ldots, N\}$ and $\{(q_{bi}, q_{bi}^*) \mid i = 1, 2, \ldots, N\}$, can be evaluated as the direct product of the corresponding quadratic images; i.e.,
$$q_{pi} = \langle q_{ai} \, q_{bi} \rangle_{m_i}, \qquad (16)$$
$$q_{pi}^* = \langle q_{ai}^* \, q_{bi}^* \rangle_{m_i}. \qquad (17)$$
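A single-modulus Python sketch can make the direct-product property concrete. The modulus 13, the root j = 5 of j^2 + 1 = 0 modulo 13, and the function names are assumptions chosen for illustration; a practical QRNS would use several such moduli in parallel. The inverse mapping below follows from adding and subtracting the two quadratic images.

```python
M = 13  # illustrative prime modulus of the form 4k + 1 (an assumption)
J = 5   # a root of j^2 + 1 = 0 modulo 13, since 5 * 5 = 25 = 2 * 13 - 1


def to_qrns(a: int, b: int, m: int = M, j: int = J):
    """Quadratic images q = <a + j b>_m and q* = <a - j b>_m."""
    return (a + j * b) % m, (a - j * b) % m


def qrns_mul(p, q, m: int = M):
    """Complex product: two independent residue multiplications."""
    return (p[0] * q[0]) % m, (p[1] * q[1]) % m


def from_qrns(q: int, q_star: int, m: int = M, j: int = J):
    """Recover (a, b) modulo m from the quadratic images."""
    inv2 = pow(2, -1, m)   # multiplicative inverse of 2 modulo m
    inv_j = pow(j, -1, m)  # multiplicative inverse of j modulo m
    a = (inv2 * (q + q_star)) % m
    b = (inv2 * inv_j * (q - q_star)) % m
    return a, b
```

For instance, (2 + 3j)(1 + 4j) = -10 + 11j, which is (3, 11) modulo 13, and `from_qrns(*qrns_mul(to_qrns(2, 3), to_qrns(1, 4)))` returns the same pair using only two modular multiplications for the product itself.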
4. Number of low-to-high transitions, assuming strongly anti-correlated ($\rho = -0.99$) Gaussian data, for two's-complement (TC), RNS, and sign-magnitude (SM) number systems for 100 Monte Carlo runs.
$$a_i = \langle 2^{-1} (q_i + q_i^*) \rangle_{m_i}, \qquad b_i = \langle 2^{-1} j^{-1} (q_i - q_i^*) \rangle_{m_i}. \qquad (12)$$
7. Number of low-to-high transitions for complex-number multiplication (TC, SM, and QRNS), assuming uncorrelated ($\rho = 0$) Gaussian operands.
compared to QRNS operations that cover a dynamic range in excess of 20 bits. The experiment comprises ten Monte Carlo runs of 1000 samples each and is repeated for uncorrelated ($\rho = 0$), correlated ($\rho = 0.99$), and anti-correlated ($\rho = -0.99$) Gaussian data; results are shown in Figs. 7-9, respectively. Even though QRNS provides a significantly larger dynamic range, the bit activity is reduced by a factor of approximately two.
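The kind of measurement behind such activity figures can be sketched in Python. This is not the authors' experiment: the 16-bit word length, the standard deviation of 64, the first-order autoregressive model for correlated Gaussian data, and all names are assumptions, and the sketch counts transitions on data words in two's-complement and sign-magnitude form only, not inside a QRNS multiplier.

```python
import random

WORD = 16  # assumed word length in bits


def twos_complement(v: int, n: int = WORD) -> int:
    """n-bit two's-complement encoding of a signed integer."""
    return v & ((1 << n) - 1)


def sign_magnitude(v: int, n: int = WORD) -> int:
    """n-bit sign-magnitude encoding: sign in the top bit, |v| below it."""
    sign = (1 << (n - 1)) if v < 0 else 0
    return (abs(v) & ((1 << (n - 1)) - 1)) | sign


def low_to_high(words) -> int:
    """Count bits that go from 0 to 1 between consecutive words."""
    return sum(bin(~prev & cur).count("1") for prev, cur in zip(words, words[1:]))


def gaussian_sequence(n: int, rho: float, sigma: float = 64.0, seed: int = 0):
    """First-order autoregressive Gaussian samples with lag-one correlation rho."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma)
    out = []
    for _ in range(n):
        out.append(int(round(x)))
        x = rho * x + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, sigma)
    return out
```

Calling `low_to_high([twos_complement(v) for v in gaussian_sequence(1000, -0.99)])` and the same expression with `sign_magnitude` gives one transition count per representation. For anti-correlated data the sign changes on almost every sample, so the sign-extension bits of the two's-complement word toggle together, while the sign-magnitude word toggles a single sign bit.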
D'Amora et al. compared the implementation of a direct-form complex FIR filter with its QRNS counterpart [18]. They report that, for a particular throughput rate, the QRNS-based implementation requires half the area and a third of the power dissipation of the conventional implementation. The conventional implementation is assumed to utilize the four-multiplication scheme for complex-number multiplication, while the QRNS implementation exploits the index transform.
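The index transform reduces a modulo-$m$ multiplication to a modulo-$(m - 1)$ addition, for $m$ prime, resembling the reduction of multiplication to addition by the LNS. An integer root, denoted here by $g$, can be determined such that each residue $r \in [1, m)$ can be written as
$$r = \langle g^n \rangle_m. \qquad (18)$$
The product of two residues $r_1 = \langle g^{n_1} \rangle_m$ and $r_2 = \langle g^{n_2} \rangle_m$ then satisfies
$$\langle r_1 r_2 \rangle_m = \langle g^{n_1} g^{n_2} \rangle_m = \langle g^{n_1 + n_2} \rangle_m = \langle g^n \rangle_m, \qquad (19)$$
where
$$n = \langle n_1 + n_2 \rangle_{m-1}. \qquad (20)$$
8. Number of low-to-high transitions for complex-number multiplication (TC, SM, and QRNS), assuming strongly correlated ($\rho = 0.99$) Gaussian operands.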
Hence, modulo multiplication can be performed as residue addition, preceded and followed by a mapping of the operands to
their indices and of the result to the residue. These mappings are
commonly implemented as table look-ups [16].
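A table-based index transform can be sketched in Python as follows. The prime modulus 13, its primitive root 2, and the table and function names are illustrative assumptions; zero has no index and is treated as a special case here, which is one common convention rather than something the article specifies.

```python
M = 13  # illustrative prime modulus (an assumption)
G = 2   # 2 is a primitive root modulo 13

# Look-up tables: index -> residue and residue -> index
ANTILOG = [pow(G, n, M) for n in range(M - 1)]
INDEX = {r: n for n, r in enumerate(ANTILOG)}


def index_mul(r1: int, r2: int) -> int:
    """Multiply two residues modulo M using a modulo-(M - 1) addition of indices."""
    if r1 == 0 or r2 == 0:  # zero has no index and is handled separately
        return 0
    n = (INDEX[r1] + INDEX[r2]) % (M - 1)
    return ANTILOG[n]
```

The two dictionary and list accesses play the role of the look-up tables mentioned above. If one operand is a fixed filter coefficient, its index can be stored once, so only the data operand needs a forward look-up per multiplication.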
The QRNS can exploit the index transform because the utilized moduli need to be prime. Hence, in DSP architectures such as FIR filters, the coefficients can be stored directly in index-residue form. The strength of each multiplication is thereby further reduced, since the corresponding indices are not determined anew for every residue multiplication. The significant power dissipation savings reported by D'Amora et al. assume the utilization of the index transform for residue multiplication [18].
Conclusions
9. Number of low-to-high transitions for complex-number multiplication (TC, SM, and QRNS), assuming strongly anti-correlated ($\rho = -0.99$) Gaussian operands.
Recent advances in computer arithmetic offer interesting alternative solutions for low-power design. Depending on an assortment of factors that need to be considered, such as signal
statistics, computational load, type of arithmetic operations, accuracy and dynamic range, it is worth evaluating the LNS or the
Thanos Stouraitis received a B.S. in physics and an M.S. in electronic automation from the University of Athens, Greece, in 1979 and 1981, respectively; an M.S. in electrical engineering from the University of Cincinnati in 1983; and the Ph.D. degree from the University of Florida in 1986. He was awarded the Outstanding Ph.D. Dissertation award of the University of Florida and a Certificate of Appreciation by the IEEE Circuits and Systems Society in 1997. He is a professor of electrical and computer engineering at the University of Patras, Greece. He has served on the faculty of the University of Florida and the Ohio State University. He has published two books, several book chapters, and approximately 30 journal and 70 conference papers in the areas of computer architecture, computer arithmetic, VLSI signal and image processing, and low-power processing. He serves on the IEEE Circuits and Systems Society's technical committee on VLSI Systems and Applications and the digital signal processing and the multimedia systems committees (e-mail: thanos@ee.upatras.gr).

Vassilis Paliouras received the Diploma in electrical engineering in 1992 and the Ph.D. degree in electrical engineering in 1999, from the Electrical and Computer Engineering Department, University of Patras, Greece. He works as a researcher at the VLSI Design Laboratory, ECE Dept., while teaching microprocessor-based system design at the Computer Engineering and Informatics Department, both at the University of Patras. His research interests include computer arithmetic algorithms and circuits, microprocessor architecture, and VLSI signal processing, areas where he has published more than 30 conference and journal articles. Dr. Paliouras received the MEDCHIP VLSI Design Award in 1997. He is also the recipient of the 2000 IEEE Circuits and Systems Society Guillemin-Cauer Award. He is a Member of ACM, SIAM, and the Technical Chamber of Greece.

References
[1] A.P. Chandrakasan, S. Sheng, and R. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, pp. 473-484, Apr. 1992.
[2] J.M. Rabaey and M. Pedram, Low Power Design Methodologies. Boston, MA: Kluwer, 1996.
[3] K.K. Parhi, "Low-energy CSMT carry generators and binary adders," IEEE Trans. VLSI Syst., vol. 7, pp. 450-462, Dec. 1999.
[4] T.K. Callaway and E.E. Swartzlander, Jr., "Power-delay characteristics of CMOS multipliers," in Proc. 13th Symp. Computer Arithmetic (ARITH13), Asilomar, USA, July 1997, pp. 26-32.
[5] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. New York: Oxford Univ. Press, 2000.
[6] P.E. Landman and J.M. Rabaey, "Architectural power analysis: The dual bit type method," IEEE Trans. VLSI Syst., vol. 3, pp. 173-187, June 1995.
[7] T.K. Callaway and E.E. Swartzlander, "Low power arithmetic components," in Low Power Design Methodologies, J.M. Rabaey and M. Pedram, Eds. Boston, MA: Kluwer, 1996.
[8] E. Swartzlander and A. Alexopoulos, "The sign/logarithm number system," IEEE Trans. Computers, vol. 24, pp. 1238-1242, Dec. 1975.
[9] R.E. Morley, Jr., G.L. Engel, T.J. Sullivan, and S.M. Natarajan, "VLSI based design of a battery-operated digital hearing aid," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 2512-2515, 1988.
[10] J.R. Sacha and M.J. Irwin, "The logarithmic number system for strength reduction in adaptive filtering," in Proc. Int. Symp. Low-Power Electronics and Design (ISLPED'98), Monterey, CA, 1998, pp. 256-261.
[11] V. Paliouras and T. Stouraitis, "Signal activity and power consumption reduction using the Logarithmic Number System," in Proc. 2001 IEEE Int. Symp. Circuits and Systems (ISCAS), vol. 2, pp. II-653-II-656, 2001.
[12] V. Paliouras and T. Stouraitis, "Low-power properties of the Logarithmic Number System," in Proc. 15th Symp. Computer Arithmetic (ARITH15), 2001.
[13] N. Szabó and R. Tanaka, Residue Arithmetic and its Applications to Computer Technology. New York: McGraw-Hill, 1967.
[14] W.L. Freking and K.K. Parhi, "Low-power FIR digital filters using residue arithmetic," in Proc. 31st Asilomar Conference on Signals, Systems, and Computers, vol. 1, pp. 739-743, 1997.
[15] W.A. Chren, Jr., "One-hot residue coding for low delay-power product CMOS design," IEEE Trans. Circuits Syst. II, vol. 45, pp. 303-313, March 1998.
[16] M.A. Soderstrand, W.K. Jenkins, G.A. Jullien, and F.J. Taylor, Residue Number Arithmetic: Modern Applications in Digital Signal Processing. Piscataway, NJ: IEEE Press, 1986.
[17] M.K. Ibrahim, "Novel digital filter implementations using hybrid RNS-binary arithmetic," Signal Processing, vol. 40, no. 2-3, pp. 287-294, 1994.
[18] A. D'Amora, A. Nannarelli, M. Re, and G.C. Cardarilli, "Reducing power dissipation in complex digital filters by using the Quadratic Residue Number System," in Proc. 34th Asilomar Conference on Signals, Systems, and Computers, 2000.