e-ISSN: 2455-5703
Research Scholar
1,2 Department of Electronics and Communication Engineering
1,2 Dr. Mahalingam College of Engineering and Technology, Pollachi-642002, INDIA
Abstract
Modular multiplication is a key operation in many public-key cryptosystems. Montgomery multiplication is one of the well-known algorithms for performing modular multiplication quickly. Carry-save adders (CSAs) are employed to avoid carry propagation in each addition operation. To reduce the number of extra clock cycles, a configurable carry-save adder (CCSA), which can operate either as one full adder or as two half adders, can be employed. In addition, a mechanism has been developed that skips unnecessary carry-save addition operations in the one-level CCSA while maintaining a short critical path delay. In the proposed architecture, the maximum worst-case delay path is analyzed in order to enhance throughput. Additional buffers are introduced in this path so that the clocking is synchronized and the worst-case delay is reduced. This leads to a pipelined architecture, which increases speed and achieves high throughput. The pipelined architecture is applied to the RSA public-key algorithm to increase the throughput of the RSA cryptosystem.
Keywords: Carry-save addition, Montgomery modular multiplier, Pipelining, RSA
I. INTRODUCTION
With the growth of data communication and Internet services such as electronic commerce, security plays an important role in networked systems. Public-key cryptosystems, such as the one proposed by Rivest, R.L. et al., provide data security for such networks. In these cryptosystems, modular multiplication (MM) is the central arithmetic operation, and MM with large integers is used to enhance security. Montgomery multiplication, proposed by Montgomery, P.L., is an efficient algorithm for carrying out MM. It determines the quotient using only the least significant digit of the operands and replaces the costly division with a series of shift-and-add operations. The Montgomery MM of A and B is given by $\mathrm{MM}(A, B) = A \cdot B \cdot R^{-1} \bmod N$, where N is the k-bit modulus, $R = 2^k \bmod N$, and $R^{-1}$ is the inverse of R modulo N, that is, $R \cdot R^{-1} \equiv 1 \pmod{N}$. Hence it can be implemented efficiently in VLSI circuits to speed up encryption and decryption. Long carry propagation is a major problem when adding large operands in binary representation. To solve this problem, several approaches based on carry-save addition have been proposed to achieve a significant speedup of Montgomery MM. These approaches can be divided into the semi-carry-save (SCS) and full-carry-save (FCS) strategies.
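As a small worked check of this definition (toy parameters chosen only for illustration, far below cryptographic sizes), take the 4-bit modulus N = 13, so k = 4:

```latex
\begin{aligned}
R &= 2^{4} \bmod 13 = 3, \qquad R^{-1} = 9 \quad \text{since } 3 \cdot 9 = 27 \equiv 1 \pmod{13},\\
\mathrm{MM}(7, 5) &= 7 \cdot 5 \cdot R^{-1} \bmod 13 = 315 \bmod 13 = 3,\\
3 \cdot R &= 9 \equiv 35 = 7 \cdot 5 \pmod{13}.
\end{aligned}
```

The last line confirms that the Montgomery product differs from the ordinary product only by the factor $R^{-1}$.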
The works of Kim, Y.S. et al., Bunimov, V. et al. and Zhengbing, H. et al. use the semi-carry-save format, in which the inputs and outputs of the Montgomery multiplication are represented in binary form but the intermediate results are kept in carry-save form to avoid carry propagation. However, the carry-save representation of the final product must be converted into binary representation at the end of each modular multiplication. This conversion can be accomplished simply by adding the carry and sum terms of the carry-save representation, but that addition still suffers from long carry propagation, and extra circuitry and time may be needed for these conversions.
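The difference between carry-save accumulation and the final format conversion can be sketched in Python. This is a behavioural model under our own assumptions, not the hardware of the cited works: the helper names `csa`, `accumulate_carry_save` and `to_binary` are hypothetical, and Python integers stand in for bit vectors. Each `csa` call works bit by bit without propagating carries; only `to_binary` performs a carry-propagate addition.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on non-negative integers viewed as bit vectors.

    Every bit position is an independent full adder, so no carry ripples
    across positions: sum bits are x^y^z, carry bits are the majority
    function shifted one position to the left.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def accumulate_carry_save(operands: list[int]) -> tuple[int, int]:
    """Add many operands while keeping the running total as (sum, carry)."""
    ss, sc = 0, 0
    for v in operands:
        ss, sc = csa(ss, sc, v)
    return ss, sc


def to_binary(ss: int, sc: int) -> int:
    """Format conversion: a single carry-propagate addition at the end."""
    return ss + sc


if __name__ == "__main__":
    ops = [0b1011, 0b0111, 0b1101, 0b0110]
    ss, sc = accumulate_carry_save(ops)
    print(bin(ss), bin(sc), to_binary(ss, sc), sum(ops))
```

The invariant `ss + sc == running total` holds after every step, which is why the intermediate results never need a full-width carry chain.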
In the full-carry-save format, used by Walter, C.D. and Zhengbing, H. et al., all the inputs and outputs of the Montgomery modular multiplication are kept in carry-save form, except in the final step that produces the result of the modular exponentiation. However, this implies that the number of operands in each modular multiplication increases, so additional registers are required to store these operands. Therefore, FCS-based Montgomery multipliers may have higher hardware complexity and a longer critical path than SCS-based multipliers.
A. Montgomery multiplication
Modular multiplication of two integers A and B computes
$S = A \cdot B \bmod N$.
Given an integer $a < N$, where N is the k-bit modulus, A is said to be the N-residue of a with respect to r if
$A = a \cdot r \bmod N$, where $r = 2^k$.
Likewise, given an integer $b < N$, B is said to be its N-residue with respect to r if
$B = b \cdot r \bmod N$.
Fig. 1: MM algorithm
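Fig. 1 is not reproduced in this text. As a stand-in, the following minimal Python sketch implements the textbook radix-2 (bit-serial) Montgomery algorithm together with the N-residue mapping defined above. It assumes an odd modulus $N < 2^k$, $A < 2^k$ and $B < N$; the names `montgomery_mul` and `to_residue` are hypothetical, and the loop is not necessarily identical to the algorithm in Fig. 1.

```python
def montgomery_mul(A: int, B: int, N: int, k: int) -> int:
    """Radix-2 Montgomery product A*B*2^(-k) mod N.

    Assumes N is odd, N < 2**k, A < 2**k and B < N.
    """
    if N % 2 == 0:
        raise ValueError("N must be odd")
    S = 0
    for i in range(k):
        a_i = (A >> i) & 1
        # choose q_i so that S + a_i*B + q_i*N is even (N is odd)
        q_i = (S + a_i * B) & 1
        # exact division by 2 replaces trial division
        S = (S + a_i * B + q_i * N) >> 1
    # under the assumptions above the loop leaves S < 2N,
    # so one conditional subtraction suffices
    if S >= N:
        S -= N
    return S


def to_residue(a: int, N: int, k: int) -> int:
    """N-residue of a with respect to r = 2**k, i.e. a*r mod N."""
    return (a << k) % N


if __name__ == "__main__":
    N, k = 13, 4
    a, b = 7, 5
    A, B = to_residue(a, N, k), to_residue(b, N, k)
    P = montgomery_mul(A, B, N, k)  # N-residue of a*b
    print(montgomery_mul(P, 1, N, k), (a * b) % N)
```

Because `montgomery_mul` returns $A \cdot B \cdot r^{-1} \bmod N$, the product of two N-residues is again an N-residue, and a final Montgomery multiplication by 1 converts the result back to ordinary representation.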
B. RSA
RSA is one of the most widely used public key algorithms at present.
The RSA encryption and decryption functions are given by
$C = M^e \bmod n$
$M = C^d \bmod n$
respectively, where M is a plaintext message block, C is a ciphertext block, n is the k-bit modulus, and e and d are the public and private exponents, respectively. The relation $ed \equiv 1 \pmod{(p-1)(q-1)}$ must also hold, where p and q are two large prime numbers and $n = pq$. Thus, an RSA operation is a modular exponentiation with operands satisfying the conditions stated above. RSA requires repeated modular multiplications to compute the modular exponentiation, and the modulus is generally at least 1024 bits long for long-term security.
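To illustrate how repeated Montgomery multiplications yield an RSA modular exponentiation, here is a sketch using left-to-right square-and-multiply, one common choice. The paper does not state which exponentiation order its RSA implementation uses, so this ordering is an assumption, as are the function names `mont_mul` and `rsa_modexp`. The key in the usage example is a well-known toy key (p = 61, q = 53), far too small for real use.

```python
def mont_mul(A: int, B: int, N: int, k: int) -> int:
    """Radix-2 Montgomery product A*B*2^(-k) mod N (N odd, A, B < N < 2**k)."""
    S = 0
    for i in range(k):
        a_i = (A >> i) & 1
        q_i = (S + a_i * B) & 1            # makes the sum below even
        S = (S + a_i * B + q_i * N) >> 1
    return S - N if S >= N else S


def rsa_modexp(M: int, e: int, n: int) -> int:
    """M**e mod n by left-to-right square-and-multiply on Montgomery residues."""
    k = n.bit_length()
    x = (1 << k) % n                       # Montgomery form of 1
    M_bar = (M << k) % n                   # Montgomery form of M
    for bit in bin(e)[2:]:                 # scan exponent bits from the MSB
        x = mont_mul(x, x, n, k)           # square
        if bit == "1":
            x = mont_mul(x, M_bar, n, k)   # multiply
    return mont_mul(x, 1, n, k)            # leave Montgomery form


if __name__ == "__main__":
    n, e, d = 3233, 17, 2753               # toy key: n = 61 * 53
    C = rsa_modexp(65, e, n)
    print(C, rsa_modexp(C, d, n))
```

Only the two conversions at the start and the single multiplication by 1 at the end leave the Montgomery domain, so every multiplication inside the loop is a plain Montgomery MM, which is the operation a hardware multiplier accelerates.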
This paper aims to enhance the performance of the CSA-based SCS Montgomery multiplier by using pipelining to reduce delay. The proposed method is applied to the RSA public-key algorithm to increase the speed and throughput of RSA cryptosystems.
In the next clock cycle of the ith iteration, SM3 outputs the proper x according to the values generated in the ith iteration, as shown in steps 9-12, and M1 and M2 output the correct SC and SS according to $skip_{i+1}$, which is also generated in the ith iteration. If $skip_{i+1} = 0$, $SC \gg 1$ and $SS \gg 1$ are selected. Otherwise, $SC \gg 2$ and $SS \gg 2$ are selected, so that the 1-bit right-shift operations in steps 13 and 17 of the SCS-MM-New algorithm are performed together in the next cycle of iteration i.
In addition, M4 and M5 select and output the correct $SC[i]_{2:0}$ and $SS[i]_{2:0}$ according to $skip_{i+1}$ generated in the ith iteration. Note that $SC[i]_{2:0}$ and $SS[i]_{2:0}$ could also be obtained from M1 and M2, but that would incur a longer delay because M1 and M2 are 4-to-1 multiplexers. After the while loop in steps 7-24 is completed, the corresponding values stored in the FFs are reset to 0. Then, the format conversion in steps 26 and 27 can be performed by the SCS-MM-New multiplier, similar to the computation in steps 3 and 4. Finally, $SS[k+5]$ equals 0.
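The datapath description above depends on multiplexer details that are only partly preserved here. As a rough behavioural model only, the following Python sketch keeps the running value of a radix-2 Montgomery multiplication in semi-carry-save form (SS, SC) and converts it to binary once at the end. It does not reproduce the SCS-MM-New skip logic or its two-bit shifts; instead it counts iterations whose addend x is zero, one plausible reading of the "unnecessary" carry-save additions that a skip mechanism targets. The names `csa` and `scs_montgomery` are hypothetical.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Bitwise 3:2 carry-save adder: (sum, carry) with sum + carry == x + y + z."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def scs_montgomery(A: int, B: int, N: int, k: int) -> tuple[int, int]:
    """Semi-carry-save radix-2 Montgomery product A*B*2^(-k) mod N.

    A, B and the result are binary; only the running value is kept as
    (SS, SC). Also returns the number of iterations with a zero addend x,
    where the carry-save addition does no useful work and only a shift
    remains. Assumes N odd, N < 2**k, A < 2**k and B < N.
    """
    SS = SC = 0
    zero_addends = 0
    for i in range(k):
        a_i = (A >> i) & 1
        # q_i from the parity of SS + SC + a_i*B (only the LSBs are needed)
        q_i = (SS ^ SC ^ (a_i & B)) & 1
        x = a_i * B + q_i * N             # one of 0, B, N, B + N
        if x == 0:
            zero_addends += 1
        SS, SC = csa(SS, SC, x)
        # SS + SC is now even and SC's LSB is 0, so SS's LSB is 0 too:
        # shifting both halves right by one bit is exact
        SS >>= 1
        SC >>= 1
    S = SS + SC                           # format conversion (carry-propagate add)
    return (S - N if S >= N else S), zero_addends


if __name__ == "__main__":
    print(scs_montgomery(7, 5, 13, 4))
```

Because SS + SC always equals the value a binary implementation would hold, the result matches an ordinary radix-2 Montgomery multiplier while each iteration uses only a carry-free addition.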
The proposed pipelined architecture is applied to the Rivest-Shamir-Adleman (RSA) algorithm, one of the public-key cryptosystems. Modular exponentiation is the main operation performed in RSA, and it is achieved through repeated modular multiplication. Hence, the modular multiplications within the RSA modular exponentiation are performed by the proposed pipelined Montgomery modular multiplier to increase the speed and throughput of RSA cryptosystems. The Montgomery algorithm is thus used to compute the modular exponentiation in the RSA algorithm. The simulation result is shown in Fig. 6.
Key size (bits)   Multiplier   Delay (ns)   Area*   Power (W)
1024              Existing     5.379        33      3.237
1024              Proposed     5.136        36      3.171
2048              Existing     7.381        38      3.171
2048              Proposed     7.138        43      3.122
Table 1: Comparison of existing and pipelined Montgomery multipliers with 1024- and 2048-bit key sizes.
* Number of slices occupied.
V. CONCLUSION
The SCS-MM-New multiplier has the shortest critical path delay and needs fewer clock cycles to complete one Montgomery MM; it therefore has the least execution time and achieves the highest throughput rate. This paper presented a pipelined Montgomery modular multiplier architecture that reduces delay and power. When the proposed multiplier architecture is used in present-day RSA implementations, their computational speed increases, which is a major advantage in cryptosystems.
REFERENCES
[1] Bunimov, V., Schimmler, M. and Tolg, B. (2002). A complexity-effective version of Montgomery's algorithm. Proc. Workshop Complexity Effective Designs.
[2] Kim, Y.S., Kang, W.S. and Choi, J.R. (2000). Asynchronous implementation of 1024-bit modular processor for RSA cryptosystem. Proc. 2nd IEEE Asia-Pacific Conf. ASIC, pp. 187-190.
[3] Kuang, S.R., Wang, J.P., Chang, K.C. and Hsu, H.W. (2013). Energy-efficient high-throughput Montgomery modular multipliers for RSA cryptosystems. IEEE Trans. VLSI Syst., Vol. 21, no. 11, pp. 1999-2009.
[4] Kuang, S.R., Wu, K.Y. and Lu, R.Y. (2015). Low-cost high-performance VLSI architecture for Montgomery modular multiplication. IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. PP, no. 99.
[5] McIvor, C., McLoone, M. and McCanny, J.V. (2004). Modified Montgomery modular multiplication and RSA exponentiation techniques. IEE Proc.-Comput. Digit. Techn., Vol. 151, no. 6, pp. 402-408.
[6] Montgomery, P.L. (1985). Modular multiplication without trial division. Math. Comput., Vol. 44, no. 170, pp. 519-521.
[7] Rivest, R.L., Shamir, A. and Adleman, L. (1978). A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM, Vol. 21, no. 2, pp. 120-126.
[8] Walter, C.D. (1999). Montgomery exponentiation needs no final subtractions. Electron. Lett., Vol. 35, no. 21, pp. 1831-1832.
[9] Zhengbing, H., Al Shboul, R.M. and Shirochin, V.P. (2007). An efficient architecture of 1024-bits cryptoprocessor for RSA cryptosystem based on modified Montgomery's algorithm. Proc. 4th IEEE Int. Workshop Intell. Data Acquisition Adv. Comput. Syst., pp. 643-646.