
IJSAA

International Journal of Systems, Algorithms & Applications

Design of Weighted Modulo $2^n + 1$ Adder Using Diminished-1 Adder with Correction Circuits
V. Chandrasekhar¹, D. Maruthi Kumar²
¹ M. Tech student, Department of ECE, SKTRMCE, Kondair, India.
² M. Tech, Department of ECE, SRIT, Anantapur, India.
e-mail: ¹chandrasekhar14@gmail.com, ²maruthi.srit@gmail.com

Abstract - In this brief, we propose an improved area-efficient weighted modulo $2^n+1$ adder. This is achieved by modifying the existing diminished-1 modulo $2^n+1$ adder to incorporate simple correction schemes. Our proposed adders can produce modulo sums within the range $\{0, 2^n\}$, which is wider than the range $\{0, 2^n-1\}$ produced by existing diminished-1 modulo $2^n+1$ adders. We have implemented the proposed adders using 0.13-µm CMOS technology, and the area required for our adders is less than that of previously reported weighted modulo $2^n+1$ adders under the same delay constraints.

Key-words: modulo $2^n+1$ adder, residue number system (RNS), VLSI design.

I. INTRODUCTION

The residue number system (RNS) [1] has been employed for efficient parallel carry-free arithmetic computations (addition, subtraction, and multiplication) in DSP applications, as the computations for each residue channel can be done independently without carry propagation. Since RNS-based computations can achieve significant speedup over binary-system-based computation, they are widely used in DSP processors, FIR filters, and communication components [2]-[4]. Arithmetic modulo $2^n+1$ computation is one of the most common RNS operations and is used in pseudorandom number generation and cryptography. Modulo $2^n+1$ addition is the most crucial step among the commonly used moduli sets, such as $\{2^n-1, 2^n, 2^n+1\}$, $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$, and $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$.

There are many previously reported methods to speed up modulo $2^n+1$ addition. Depending on the input/output data representation, these methods can be classified into two categories, namely, diminished-1 [5]-[7] and weighted [10], [11]. In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo $2^n+1$ addition, leading to smaller and faster components.
However, this incurs an overhead due to the translators from/to the binary weighted system. On the other hand, the weighted representation uses (n+1)-bit operands for computations, avoiding the overhead of translators, but requires larger area than the diminished-1 representation. The general operations in modulo $2^n+1$ addition, covering both diminished-1 and weighted modulo addition, have been discussed in earlier work. In [6] and [7], the authors proposed efficient parallel-prefix adders for diminished-1 modulo $2^n+1$ addition. To improve the area-time and time-power products, the circular carry selection scheme was used to efficiently select the correct carry-in signals for the final modulo addition [9].

The aforementioned methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo $2^n+1$ addition, and hence a zero-detection circuit is required to avoid incorrect computation. This leads to increased hardware cost, which was not considered in the designs proposed in [6]-[9].

In [10], the authors proposed a unified approach for weighted and diminished-1 modulo $2^n+1$ addition. This approach is based on making the modulo $2^n+1$ addition of two (n+1)-bit input numbers A and B congruent to $Y+U+1$, where Y and U are two n-bit numbers. Thus, any diminished-1 adder can be used to perform weighted modulo $2^n+1$ addition of Y and U. In [11], the authors first used translators to decrease the sum of two n-bit inputs A and B by 1 and then performed the weighted modulo $2^n+1$ addition using diminished-1 adders. It should be noted that, for the architecture in [11], the range of the two inputs A and B is smaller than that in [10] (i.e., $\{0, 2^n-1\}$ versus $\{0, 2^n\}$).

In this brief, we propose an improved area-efficient weighted modulo $2^n+1$ adder design using diminished-1 adders with simple correction schemes. This is achieved by subtracting the constant $(2^n+1)$ from the sum of two (n+1)-bit input numbers and producing carry and sum vectors. The modulo $2^n+1$ addition can then be performed using parallel-prefix-structure diminished-1 adders by taking in the sum and carry vectors plus the inverted end-around carry with simple correction schemes. Compared with the work in [10], the area cost of our proposed adders is lower. In addition, our proposed adders do not require the hardware for zero detection that is needed in diminished-1 modulo $2^n+1$ addition.
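To make the translator and zero-detection overhead mentioned above concrete, the following is a minimal Python sketch of a diminished-1 translator pair. The function names and the explicit zero flag are illustrative assumptions (a common convention), not details taken from the cited designs.

```python
def to_diminished1(x, n):
    """Translate a weighted value x in [0, 2^n] to (is_zero, n-bit diminished-1 value).

    Assumption: zero is carried as a separate flag, because x - 1 has no
    n-bit encoding when x == 0.
    """
    assert 0 <= x <= (1 << n)
    if x == 0:
        return True, 0          # value field is unused when the flag is set
    return False, x - 1         # always fits in n bits: 0 .. 2^n - 1


def from_diminished1(is_zero, d):
    """Translate back to the weighted representation."""
    return 0 if is_zero else d + 1


# Round trip over the full weighted range for n = 4.
for x in range(0, (1 << 4) + 1):
    flag, d = to_diminished1(x, 4)
    assert from_diminished1(flag, d) == x
```

The extra flag and the two increment/decrement steps are the kind of cost that a weighted adder avoids.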
Synthesis results show that our proposed adders are comparable to the work proposed in [10] and [11]. The rest of this brief is organized as follows. In Section II, we review the design of two previous weighted modulo $2^n+1$ adders. Our proposed area-efficient weighted modulo $2^n+1$ adder is presented in Section III. The synthesis results and comparisons are given in Section IV, and Section V concludes this brief.

Volume 2, Issue 2, February 2012, ISSN Online: 2277-2677

II. RELATED WORK

Given two (n+1)-bit numbers A and B, where $0 \le A, B \le 2^n$, the diminished-1 values of A and B are denoted by $A^* = A-1$ and $B^* = B-1$, respectively. The diminished-1 sum $S^*$ can be computed by

$$S^* = |S-1|_{2^n+1} = |A+B-1|_{2^n+1} = |A^*+B^*+1|_{2^n+1} = |A^*+B^*|_{2^n} + \overline{c_{out}} \qquad (1)$$

where $|X|_Z$ is defined as X modulo Z, and $\overline{c_{out}}$ denotes the inverted end-around carry of the modulo $2^n$ sum of the n-bit values $A^*$ and $B^*$.

A. Vergos and Efstathiou [10]

In [10], the authors first compute a sum congruent to $A+B$ to produce Y and U, and then the final modulo sum is performed by any diminished-1 modulo adder as follows. Suppose A and B are two (n+1)-bit input numbers, i.e., $A = a_n a_{n-1} \ldots a_0 = a_n 2^n + A_n$ and $B = b_n b_{n-1} \ldots b_0 = b_n 2^n + B_n$, where $0 \le A, B \le 2^n$, and $A_n$ and $B_n$ are two n-bit numbers; then

$$|A+B|_{2^n+1} = \big|\,|A_n+B_n+D+1|_{2^n+1} + 1\,\big|_{2^n+1} = |Y+U+1|_{2^n+1} \qquad (2)$$

In (2), $D = 2^n - 4 + 2\overline{c_{n+1}} + \overline{s_n}$, which is equivalent to the binary vector $11\ldots1\,\overline{c_{n+1}}\,\overline{s_n}$, where $c_{n+1} = a_n \wedge b_n$ ($\wedge$ denotes the logic AND operation) and $s_n = a_n \oplus b_n$ ($\oplus$ denotes the logic EXCLUSIVE-OR operation); $\overline{c_{n+1}}$ and $\overline{s_n}$ are the bits of D with binary weights $2^1$ and $2^0$, respectively. The first step of (2) computes a modulo $2^n+1$ carry-save addition, giving the carry vector Y and the sum vector U, where $Y = y_{n-2}\, y_{n-3} \ldots y_0\, \overline{y_{n-1}}$ and $U = u_{n-1}\, u_{n-2} \ldots u_0$ are produced by adding $A_n$, $B_n$, and D. It can be seen that the bits of D with binary weights $2^2$ through $2^{n-1}$ are all 1, which simplifies the adders so that the carries and sums are produced directly by OR and XNOR gates at every such bit position [denoted FA+ in Fig. 1(a)]. For the bits of D with binary weights $2^1$ and $2^0$, the adders must be modified to accept the values $\overline{c_{n+1}}$ and $\overline{s_n}$, respectively, as shown in Fig. 1(b).

B. Vergos and Bakalis [11]

In [11], the authors subtract 1 from the sum of the two n-bit inputs A and B to produce the diminished-1 values $A'$ and $B'$, and the modulo $2^n$ sum of $A'$ and $B'$ can be performed by any diminished-1 architecture, as follows:

$$\big|\,|A+B|_{2^n+1}\big|_{2^n} = |A'+B'|_{2^n} + \overline{c_{out}} \qquad (3)$$
Fig. 2 Architecture proposed in [11].
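The identities in (1) and (2) can be checked numerically for a small word length. The sketch below is an arithmetic model only (it does not model the carry-save stage or any gates); the function names are hypothetical, and the form of D assumes that the complemented carry and sum of $a_n + b_n$ fill its two least significant bits, which is what makes the congruence in (2) hold.

```python
def diminished1_add(a_star, b_star, n):
    """Right-hand side of (1): |A* + B*|_{2^n} plus the inverted end-around carry."""
    total = a_star + b_star
    cout = total >> n                          # carry out of the n-bit addition
    return (total & ((1 << n) - 1)) + (1 - cout)


def vergos_efstathiou_sum(a, b, n):
    """Arithmetic view of (2): |A_n + B_n + D + 2|_{2^n+1}."""
    mask = (1 << n) - 1
    a_msb, b_msb = a >> n, b >> n
    c = a_msb & b_msb                          # c_{n+1}
    s = a_msb ^ b_msb                          # s_n
    d = (1 << n) - 4 + 2 * (1 - c) + (1 - s)   # assumption: complemented bits in D
    return ((a & mask) + (b & mask) + d + 2) % ((1 << n) + 1)


n = 4
modulus = (1 << n) + 1
for a in range(1, modulus):                    # diminished-1 operands exclude zero
    for b in range(1, modulus):
        assert diminished1_add(a - 1, b - 1, n) == (a + b - 1) % modulus
for a in range(modulus):                       # weighted operands 0 .. 2^n
    for b in range(modulus):
        assert vergos_efstathiou_sum(a, b, n) == (a + b) % modulus
```

Note that `diminished1_add` returns $2^n$ when the weighted sum is zero, a value an n-bit datapath cannot hold; this is the case that the zero-detection logic of a diminished-1 adder has to cover.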

The architecture proposed in [10] makes use of a constant-time operator composed of a simplified carry-save adder stage, leading to an efficient modulo $2^n+1$ adder. The architecture proposed in [11] can be applied in the design of area-efficient residue generators and multi-operand modulo adders. However, it should be noted that, in [10], the value D combined with the inputs A and B is not a constant. In [11], the way to implement the translator for decreasing the sum of the two inputs by 1 was not mentioned. Furthermore, in [11], the range of the two inputs A and B is smaller than the one in [10] (i.e., $\{0, 2^n-1\}$ versus $\{0, 2^n\}$). To remedy these problems, we propose our area-efficient weighted modulo $2^n+1$ adder design in the next section.

III. AREA & DELAY-EFFICIENT WEIGHTED MODULO $2^n+1$ ADDER

Instead of combining the sum of A and B with D, which is not a constant, as proposed in [10], we subtract the constant value $(2^n+1)$ from the sum of A and B. In addition, we allow the two inputs A and B to be in the range $\{0, 2^n\}$, which contains one more value than the range $\{0, 2^n-1\}$ used in [11]. In the following, we present the design of our proposed weighted modulo $2^n+1$ adder.

From [8], given two (n+1)-bit inputs $A = a_n a_{n-1} \ldots a_0$ and $B = b_n b_{n-1} \ldots b_0$, where $0 \le A, B \le 2^n$, the weighted modulo $2^n+1$ sum of A and B can be represented as follows:

$$|A+B|_{2^n+1} = \begin{cases} A+B-(2^n+1), & \text{if } (A+B) > 2^n \\ A+B, & \text{otherwise} \end{cases} \qquad (4)$$

Equation (4) can be stated as

$$\big|\,|A+B|_{2^n+1}\big|_{2^n} = \begin{cases} |A+B-(2^n+1)|_{2^n}, & \text{if } (A+B) > 2^n \\ |A+B-(2^n+1)|_{2^n} + |(2^n+1)|_{2^n}, & \text{otherwise} \end{cases} \qquad (5)$$

This can then be expressed as

$$\big|\,|A+B|_{2^n+1}\big|_{2^n} = \begin{cases} |A+B-(2^n+1)|_{2^n}, & \text{if } (A+B) > 2^n \\ |A+B-(2^n+1)|_{2^n} + 1, & \text{otherwise} \end{cases} \qquad (6)$$

From (6), it can easily be seen that the value of the weighted modulo $2^n+1$ addition can be obtained by first subtracting $(2^n+1)$ from the sum of A and B (i.e., adding $0111\ldots1$) and then using the diminished-1 adder to get the final modulo sum, with the inverted end-around carry as the carry-in.
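As a quick sanity check of (4) and (6), the following sketch evaluates both sides for every valid operand pair at a small word length. It is a plain arithmetic model with hypothetical function names. The right-hand side of (6) is compared modulo $2^n$, since the "+1" branch can reach $2^n$ before the outer reduction.

```python
def weighted_mod_sum(a, b, n):
    """Equation (4): one conditional subtraction of 2^n + 1."""
    total = a + b
    return total - ((1 << n) + 1) if total > (1 << n) else total


def eq6_rhs(a, b, n):
    """Right-hand side of (6), before any final reduction modulo 2^n."""
    two_n = 1 << n
    shifted = (a + b - (two_n + 1)) % two_n    # |A + B - (2^n + 1)|_{2^n}
    return shifted if a + b > two_n else shifted + 1


n = 4
for a in range((1 << n) + 1):
    for b in range((1 << n) + 1):
        ref = (a + b) % ((1 << n) + 1)
        assert weighted_mod_sum(a, b, n) == ref
        assert (eq6_rhs(a, b, n) - ref) % (1 << n) == 0
```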
Fig. 1. (a) Architecture of the weighted modulo $2^n + 1$ adder proposed in [10]. (b) Architecture of FA1, FA0, and FA+, respectively.

In (3), $\overline{c_{out}}$ is the inverted end-around carry produced by $A' + B'$, and the architecture of [11] is shown in Fig. 2.

Now, we present the method of weighted modulo $2^n+1$ addition of A and B as follows. Denoting $Y'$ and $U'$ as the carry and sum vectors of the summation of A, B, and $-(2^n+1)$, where $Y' = y'_{n-2}\, y'_{n-3} \ldots y'_0\, \overline{y'_{n-1}}$ and $U' = u'_{n-1}\, u'_{n-2}\, u'_{n-3} \ldots u'_0$, the modulo addition can be expressed as follows:

$$\begin{aligned}
|A+B-(2^n+1)|_{2^n} &= \Big|\sum_{i=0}^{n-2} 2^i (a_i+b_i) + 2^{n-1}(2a_n+2b_n+a_{n-1}+b_{n-1}) + 0111\ldots1\Big|_{2^n} \\
&= \Big|\sum_{i=0}^{n-2} 2^i (a_i+b_i+1) + 2^{n-1}(2a_n+2b_n+a_{n-1}+b_{n-1}+1)\Big|_{2^n} \\
&= \Big|\sum_{i=0}^{n-2} 2^i (2y'_i+u'_i) + 2^{n-1}(2a_n+2b_n+a_{n-1}+b_{n-1}+1)\Big|_{2^n}
\end{aligned}$$

For $i = 0$ to $n-2$, the values of $y'_i$ and $u'_i$ can be expressed as $y'_i = a_i \vee b_i$ and $u'_i = \overline{a_i \oplus b_i}$, respectively ($\vee$ denotes the logic OR operation). Since the bit widths of $Y'$ and $U'$ are only n bits, the values of $y'_{n-1}$ and $u'_{n-1}$ must be computed taking $a_n$, $b_n$, $a_{n-1}$, and $b_{n-1}$ into consideration (i.e., $y'_{n-1}$ and $u'_{n-1}$ are the carry and the sum produced by $2a_n+2b_n+a_{n-1}+b_{n-1}+1$, respectively). It should be noted that $0 \le A, B \le 2^n$, which means that $a_n = a_{n-1} = 1$ or $b_n = b_{n-1} = 1$ would cause the value of A or B to exceed the range $\{0, 2^n\}$. Thus, these input combinations are not allowed and can be viewed as don't-care conditions, which helps simplify the circuits for generating $y'_{n-1}$ and $u'_{n-1}$. That is, the maximum value of $2a_n+2b_n+a_{n-1}+b_{n-1}+1$ is 5, which occurs at $a_n = b_n = 1$ (i.e., the maximum value of $y'_{n-1}$ is 2).

The truth table for generating $y'_{n-1}$, $u'_{n-1}$, and FIX is given in Table I, where X denotes don't care.

TABLE I: Truth table for generating $y'_{n-1}$, $u'_{n-1}$, and FIX (X: don't care; *: conditions when $y'_{n-1} = 2$)

| $a_n$ | $b_n$ | $a_{n-1}$ | $b_{n-1}$ | $y'_{n-1}$ | $u'_{n-1}$ | FIX |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | 1 | 1 | 1 | 0 |
| 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | X | X | X |
| 0 | 1 | 1 | 0 | 1* | 0 | 1 |
| 0 | 1 | 1 | 1 | X | X | X |
| 1 | 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 1* | 0 | 1 |
| 1 | 0 | 1 | 0 | X | X | X |
| 1 | 0 | 1 | 1 | X | X | X |
| 1 | 1 | 0 | 0 | 1* | 1 | 1 |
| 1 | 1 | 0 | 1 | X | X | X |
| 1 | 1 | 1 | 0 | X | X | X |
| 1 | 1 | 1 | 1 | X | X | X |

The reason for FIX is that, under some conditions, $y'_{n-1} = 2$ (e.g., $a_n = b_n = 1$ and $a_{n-1} = b_{n-1} = 0$), which cannot be represented by a 1-bit line (marked as * in Table I); therefore, the value of $y'_{n-1}$ is set to 1, and the remaining value of the carry (i.e., 1) is assigned to FIX. Notice that FIX is wired-OR with the carry-out of $Y'+U'$ (i.e., $c_{out}$) to form the inverted end-around carry (denoted by $\overline{c_{out} \vee FIX}$), which is the carry-in for the diminished-1 addition stage later on. When $y'_{n-1} = 2$, FIX = 1; otherwise, FIX = 0. According to Table I, we have $y'_{n-1} = a_n \vee b_n \vee a_{n-1} \vee b_{n-1}$, $u'_{n-1} = \overline{a_{n-1} \oplus b_{n-1}}$, and $FIX = a_n b_n \vee b_n a_{n-1} \vee a_n b_{n-1}$, respectively.

Based on the aforementioned, our proposed weighted modulo $2^n+1$ addition of A and B is equivalent to

$$\big|\,|A+B|_{2^n+1}\big|_{2^n} = |A+B-(2^n+1)|_{2^n} = |Y'+U'|_{2^n} + \overline{c_{out} \vee FIX} \qquad (7)$$

Two examples of our proposed addition method are given as follows.

Example 1: Suppose $n = 8$, $A = 100_{10} = 001100100_2$, and $B = 150_{10} = 010010110_2$.
Step 1: $(A + B) - (2^n + 1)$ => $Y' = 11101100_2$, $U' = 00001101_2$, FIX = 0.
Step 2: $Y' + U' = 11111001_2$, $c_{out} = 0$ => $|Y' + U'|_{2^n} + \overline{c_{out} \vee FIX} = 11111010_2 = 250_{10} = |100 + 150|_{257}$.

Example 2: Suppose $n = 8$, $A = 200_{10} = 011001000_2$, and $B = 150_{10} = 010010110_2$.
Step 1: $(A + B) - (2^n + 1)$ => $Y' = 10111100_2$, $U' = 10100001_2$, FIX = 0.
Step 2: $Y' + U' = 1\,01011101_2$, so $|Y' + U'|_{2^n} = 01011101_2$ and $c_{out} = 1$ => $|Y' + U'|_{2^n} + \overline{c_{out} \vee FIX} = 01011101_2 = 93_{10} = |200 + 150|_{257}$.
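The translation and correction steps can be followed bit by bit in a short behavioral model. This is a sketch under stated assumptions, not the gate-level design: `translate` and `weighted_mod_add` are hypothetical names, the final diminished-1 stage is modeled with ordinary integer addition, and the result is returned as an (n+1)-bit value so that a sum of exactly $2^n$ is visible.

```python
def translate(a, b, n):
    """Carry-save step for A + B - (2^n + 1): return (Y', U', FIX)."""
    assert 0 <= a <= (1 << n) and 0 <= b <= (1 << n)
    bit = lambda x, i: (x >> i) & 1
    y_vec, u_vec = 0, 0
    for i in range(n - 1):                         # bit positions 0 .. n-2
        ai, bi = bit(a, i), bit(b, i)
        y_vec |= (ai | bi) << (i + 1)              # carry y'_i = OR, weight 2^(i+1)
        u_vec |= (1 ^ ai ^ bi) << i                # sum   u'_i = XNOR
    an, bn = bit(a, n), bit(b, n)
    an1, bn1 = bit(a, n - 1), bit(b, n - 1)
    y_top = an | bn | an1 | bn1                    # y'_{n-1}, saturated to one bit
    u_vec |= (1 ^ an1 ^ bn1) << (n - 1)            # u'_{n-1}
    fix = (an & bn) | (bn & an1) | (an & bn1)      # second carry unit when y'_{n-1} = 2
    y_vec |= 1 ^ y_top                             # inverted y'_{n-1} enters at the LSB
    return y_vec, u_vec, fix


def weighted_mod_add(a, b, n):
    """Equation (7): |Y' + U'|_{2^n} plus the inverted (c_out OR FIX)."""
    y_vec, u_vec, fix = translate(a, b, n)
    total = y_vec + u_vec
    cout = total >> n
    return (total & ((1 << n) - 1)) + (1 ^ (cout | fix))


# The two worked examples, then an exhaustive check for small n.
assert weighted_mod_add(100, 150, 8) == 250
assert weighted_mod_add(200, 150, 8) == 93
for n in range(2, 7):
    for a in range((1 << n) + 1):
        for b in range((1 << n) + 1):
            assert weighted_mod_add(a, b, n) == (a + b) % ((1 << n) + 1)
```

For all valid operands, `c_out` and FIX are never 1 at the same time, which is why a single OR gate is enough to merge them into one carry-in.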

The architecture of our proposed adder is given in Fig. 3. From Fig. 3, the FIX signal can be computed in parallel with the translation to $Y' + U'$, leading to efficient correction. In addition, the hardware cost of our correction scheme and FAF is less than that of the design in [10], because the translation stage in [10] must process two non-constant values [i.e., through FA1 and FA0, as shown in Fig. 1(b)].

Fig. 3. Architecture of our proposed weighted modulo $2^n + 1$ adder with the correction scheme.
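Because FIX and the top carry/sum bits depend only on the two most significant bits of each operand, their logic can be checked in isolation from the rest of the datapath. The sketch below regenerates the non-don't-care rows of Table I from the arithmetic definition and compares them with the Boolean expressions; both function names are hypothetical.

```python
from itertools import product


def top_cell_arith(an, bn, an1, bn1):
    """Arithmetic definition: carry and sum of 2a_n + 2b_n + a_{n-1} + b_{n-1} + 1.

    Returns None for don't-care rows (operand would exceed 2^n).
    """
    if (an and an1) or (bn and bn1):
        return None
    carry, u = divmod(2 * an + 2 * bn + an1 + bn1 + 1, 2)
    y = min(carry, 1)                 # a carry of 2 is saturated to 1 ...
    fix = 1 if carry == 2 else 0      # ... and the excess unit goes to FIX
    return y, u, fix


def top_cell_logic(an, bn, an1, bn1):
    """Boolean expressions read from Table I."""
    y = an | bn | an1 | bn1
    u = 1 ^ an1 ^ bn1
    fix = (an & bn) | (bn & an1) | (an & bn1)
    return y, u, fix


for row in product((0, 1), repeat=4):
    expected = top_cell_arith(*row)
    if expected is not None:
        assert top_cell_logic(*row) == expected
```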

IV. SYNTHESIS RESULTS AND COMPARISONS

We use structural Verilog hardware description language to design our proposed adders and the existing work in [10] in order to compare delay/area performance. We adopt Sklansky-style [5], [12] and Brent-Kung-style [13] parallel-prefix structures for the diminished-1 adder implementations.
TABLE IV: Delay synthesis results for various modulo $(2^n + 1)$ adders (units in nanoseconds); Sklansky-style parallel-prefix structure-based diminished-1 adder

| n = 8 | Logical delay | Routing delay | Total delay |
|---|---|---|---|
| Existing SK | 12.00 (49.9%) | 12.025 (50.1%) | 24.025 |
| Implemented SK | 10.475 (53.6%) | 9.065 (46.4%) | 19.540 |

TABLE V: Delay synthesis results for various modulo $(2^n + 1)$ adders; Brent-Kung-style parallel-prefix structure-based diminished-1 adder

| n = 8 | Logical delay | Routing delay | Total delay |
|---|---|---|---|
| Existing BK | 10.728 (51.4%) | 10.124 (48.6%) | 20.852 |
| Implemented BK | 10.475 (53%) | 9.303 (47%) | 19.778 |

The Sklansky-style parallel-prefix structure with correction circuits for our proposed weighted modulo $2^8 + 1$ adder is shown in Fig. 4. The square and diamond nodes denote the pre- and postprocessing stages of the operands, respectively. The black nodes evaluate the prefix operator, and the white nodes pass the unchanged signals to the next prefix level.

Fig. 4. Diminished-1 adder based on the Sklansky-style parallel-prefix structure with the correction circuits for our proposed weighted modulo $2^8 + 1$ adder.
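The role of the black and white nodes in a Sklansky-style prefix tree can be illustrated with a behavioral model. The following is a sketch, not the netlist of Fig. 4: the function names are hypothetical, the tree is built for an arbitrary width n, and gating the extra most significant output bit with the carry-in is an assumption of this sketch so that the case FIX = 1 is handled.

```python
def sklansky_prefix(g, p):
    """Return group generate/propagate (G[i], P[i]) covering bits i..0."""
    n = len(g)
    G, P = list(g), list(p)
    span = 1
    while span < n:
        for i in range(n):
            if i & span:                          # black node: apply prefix operator
                j = (i & ~(span - 1)) - 1         # last bit of the lower block
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
            # otherwise: white node, signals pass through unchanged
        span <<= 1
    return G, P


def diminished1_prefix_add(y, u, fix, n):
    """|Y' + U'|_{2^n} plus inverted (c_out OR FIX), via the prefix tree."""
    g = [((y >> i) & (u >> i)) & 1 for i in range(n)]    # preprocessing: generate
    p = [((y >> i) ^ (u >> i)) & 1 for i in range(n)]    # preprocessing: propagate
    G, P = sklansky_prefix(g, p)
    cin = 1 ^ (G[n - 1] | fix)                    # inverted end-around carry
    result, carry = 0, cin
    for i in range(n):                            # postprocessing: sum bits
        result |= (p[i] ^ carry) << i
        carry = G[i] | (P[i] & cin)               # carry into bit i + 1
    result |= (P[n - 1] & cin) << n               # extra MSB when the sum is 2^n
    return result


# Vectors from Example 1: Y' = 11101100, U' = 00001101, FIX = 0.
assert diminished1_prefix_add(0b11101100, 0b00001101, 0, 8) == 250
```

Each pass of the `while` loop corresponds to one prefix level, so an n-bit adder needs $\lceil \log_2 n \rceil$ levels between the pre- and postprocessing stages.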

It should be noted that the value of $s_n$ (i.e., $c_{out}$) can be computed by wiring-AND all propagate signals (i.e., $s_n = \bigwedge_{i=0}^{n-1} p_i$). All the implementations for [10] and in this brief are synthesized in 0.13-µm TSMC CMOS technology, and the results are given in Table II. Since the authors of [11] did not mention the way to implement the translator for producing $A'$ and $B'$, we use a carry-save architecture similar to the one used in [10] and in this brief for adding the n-bit A, B, and the value $-1$ (i.e., n-bit 1s) to produce the carry and sum vectors for the diminished-1 adder. The area cost in [11] is less than the one in [10] and in this brief, as the range of the inputs in [11] is within $\{0, 2^n - 1\}$ and not within $\{0, 2^n\}$ as for the inputs in [10] and in this brief. This leads to hardware savings in the translation stage. We also compare the area cost using the Brent-Kung-style parallel-prefix-based diminished-1 adder for implementation, and the results are given in Tables III and V. It can be observed that the area cost of our proposed adders is less than that of [10] under the same delay. This is due to the constant value $(2^n + 1)$ that we have used, instead of the non-constant value used for translating the inputs A and B in [10].
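The relative savings implied by the reported figures can be computed directly. The snippet below is a small sketch that copies the n = 8 values from Tables II to V and prints the percentage reduction of the implemented design relative to the existing one; the dictionary layout and function name are illustrative choices.

```python
# (existing, implemented) pairs copied from Tables II-V for n = 8.
RESULTS = {
    "Sklansky slices": (29, 24),
    "Sklansky gate count": (336, 273),
    "Sklansky total delay": (24.025, 19.540),
    "Brent-Kung slices": (30, 25),
    "Brent-Kung gate count": (336, 270),
    "Brent-Kung total delay": (20.852, 19.778),
}


def reduction_percent(existing, implemented):
    """Percentage by which the implemented figure is lower than the existing one."""
    return 100.0 * (existing - implemented) / existing


for name, (existing, implemented) in RESULTS.items():
    print(f"{name}: {reduction_percent(existing, implemented):.1f}% lower")
```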
TABLE II: Area synthesis results for various modulo $(2^n + 1)$ adders; Sklansky-style parallel-prefix structure-based diminished-1 adder

| n = 8 | Area (no. of slices) | Gate count | Levels of logic | No. of input LUTs |
|---|---|---|---|---|
| Existing SK | 29 | 336 | 16 | 54 |
| Implemented SK | 24 | 273 | 12 | 43 |

TABLE III: Area synthesis results for various modulo $(2^n + 1)$ adders; Brent-Kung-style parallel-prefix structure-based diminished-1 adder

| n = 8 | Area (no. of slices) | Gate count | Levels of logic | No. of input LUTs |
|---|---|---|---|---|
| Existing BK | 30 | 336 | 13 | 54 |
| Implemented BK | 25 | 270 | 12 | 44 |

V. CONCLUSION

In this brief, an improved area-efficient weighted modulo $2^n + 1$ adder has been proposed. This has been achieved by modifying the existing diminished-1 modulo adders to incorporate simple correction schemes. The proposed adders can perform weighted modulo $2^n + 1$ addition and produce sums that are within the range $\{0, 2^n\}$. Synthesis results show that our proposed adders can outperform the previously reported weighted modulo adder in terms of area and delay.

ACKNOWLEDGMENT

The authors would like to express their cordial thanks to their esteemed patron Mr. Sharad Kulkarni, Head of the Electronics and Communication Department, SKTRMCE, M. Nagar, and also to our beloved Asst. Professors, P. Naresh Kumar and D. Maruthi Kumar.

REFERENCES

[1] M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, and F. J. Taylor, Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. New York: IEEE Press, 1986.
[2] J. Ramírez, A. García, S. López-Buedo, and A. Lloris, "RNS-enabled digital signal processor design," Electron. Lett., vol. 38, no. 6, pp. 266-268, Mar. 2002.
[3] G. C. Cardarilli, A. Nannarelli, and M. Re, "Reducing power dissipation in FIR filters using the residue number system," in Proc. 43rd IEEE MWSCAS, Aug. 2000, pp. 320-323.
[4] A. Nannarelli, M. Re, and G. C. Cardarilli, "Tradeoffs between residue number system and traditional FIR filters," in Proc. IEEE ISCAS, May 2001, pp. 305-308.
[5] R. Zimmermann, "Efficient VLSI implementation of modulo $(2^n \pm 1)$ addition and multiplication," in Proc. 14th IEEE Symp. Comput. Arithmetic, Apr. 1999, pp. 158-167.
[6] H. T. Vergos, C. Efstathiou, and D. Nikolos, "Diminished-one modulo $2^n + 1$ adder design," IEEE Trans. Comput., vol. 51, no. 12, pp. 1389-1399, Dec. 2002.
[7] C. Efstathiou, H. T. Vergos, and D. Nikolos, "Modulo $2^n \pm 1$ adder design using select-prefix blocks," IEEE Trans. Comput., vol. 52, no. 11, pp. 1399-1406, Nov. 2003.
[8] B. Cao, C. H. Chang, and T. Srikanthan, "An efficient reverse converter for the 4-moduli set $\{2^n - 1, 2^n, 2^n + 1, 2^{2n} + 1\}$ based on the new Chinese remainder theorem," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 10, pp. 1296-1303, Oct. 2003.
[9] T.-B. Juang, M.-Y. Tsai, and C.-C. Chiu, "Corrections on 'VLSI design of diminished-one modulo $2^n + 1$ adder using circular carry selection'," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 3, pp. 260-261, Mar. 2009.
[10] H. T. Vergos and C. Efstathiou, "A unifying approach for weighted and diminished-1 modulo $2^n + 1$ addition," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 10, pp. 1041-1045, Oct. 2008.
[11] H. T. Vergos and D. Bakalis, "On the use of diminished-1 adders for weighted modulo $2^n + 1$ arithmetic components," in Proc. 11th EUROMICRO Conf. Digit. Syst. Des. Archit., Methods Tools, Sep. 2008, pp. 752-759.
