
CHAPTER-1

INTRODUCTION
1.1 BACKGROUND AND OBJECTIVES:
In today's digital world, encryption has become an integral part of all
communication networks and information processing systems, protecting
data both at rest and in transit. Encryption is the transformation of
plain data (known as plaintext) into unintelligible data (known as
ciphertext) through an algorithm referred to as a cipher. Numerous
encryption algorithms are in common use, but the U.S. government has
adopted the Advanced Encryption Standard (AES) for use by Federal
departments and agencies to protect sensitive information. The National
Institute of Standards and Technology (NIST) has published the
specification of this encryption standard as Federal Information
Processing Standards (FIPS) Publication 197. [1]

Any conventional symmetric cipher, such as AES, uses a single key for
both encryption and decryption; the key is independent of the plaintext
and of the cipher itself. It should be impractical to retrieve the
plaintext from the ciphertext and the encryption algorithm alone,
without knowing the encryption key. The secrecy of the encryption key is
therefore of high importance in symmetric ciphers such as AES. A
software implementation of an encryption algorithm cannot fully
guarantee the secrecy of the key, because the operating system on which
the encryption software runs may itself be vulnerable to attack.

1.2 DRAWBACKS OF SOFTWARE:

Software implementations of encryption algorithms have other important
drawbacks, including the lack of CPU instructions that operate on very
large operands, word size mismatches across operating systems, and
limited parallelism. In addition, a software implementation may not meet
the speed required by time-critical encryption applications. Hardware
implementation of encryption algorithms is therefore an important
alternative: it offers better protection of the encryption key, higher
speed, and greater efficiency through higher levels of parallelism.

1.3 ENCRYPTION METHODS:


1.3.1 KEY BASED APPROACH:

Different versions of the AES algorithm exist today (AES-128, AES-192,
and AES-256), depending on the size of the encryption key. In this
project, a hardware model implementing the AES-128 algorithm was
developed using the Verilog hardware description language. A distinctive
feature of the proposed design is that the round keys, which are
consumed during successive iterations of encryption, are generated in
parallel with the encryption process.

1.3.2 LANGUAGES:
The hardware model was then verified using a testbench, which took
advantage of Verilog's programming features by constructing random test
objects and applying them to the model. The verified model was then
synthesized using the Synopsys Design Compiler tool to estimate the
number of gates, the area and the timing of the hardware model. Finally,
the performance of the software and hardware implementations was
compared.

1.3 FINITE FIELDS:

In this section, the preliminaries on finite fields (also known as
Galois fields) used in the subsequent sections are presented. A detailed
description of these fields can be found in a number of publications;
see for example [18] and [19]. According to Lin and Costello [19], a
finite field is defined as follows. Let F be a set of elements on which
two binary operations, addition and multiplication, denoted by + and ·
respectively, are defined. Then the set F together with these operations
forms a finite field if the following conditions are satisfied:

1. The set F is a commutative group under addition. The identity element
with respect to addition is zero.

2. The non-zero elements of F form a commutative group under
multiplication. The identity element with respect to multiplication is
one.

3. Multiplication is distributive over addition, i.e., for a, b and c in
F we have a · (b + c) = a · b + a · c.

4. The number of elements in the field is finite.

For a prime number q, GF(q) is the finite field with elements
{0, 1, 2, ..., q - 1}. In this finite field all operations are carried
out mod q. A polynomial P(x) of degree m, written
$P(x) = \sum_{i=0}^{m} p_i x^i$ with all coefficients in GF(q), is
called irreducible, and can be used to construct a finite field, when it
is not the product of two or more polynomials of lower degree over
GF(q). For any prime number q and degree m there exists a field GF(q^m)
whose elements are polynomials of degree at most m - 1 and whose
irreducible polynomial has degree m. The polynomial representation of
each element of GF(q^m) therefore has m coefficients.

In the AES, the irreducible polynomial P(x) = x^8 + x^4 + x^3 + x + 1
is used to construct GF(2^8). Each element of GF(2^8) is represented by
a polynomial of degree at most 7, having 8 coefficients in GF(2).
All field operations are carried out modulo this irreducible
polynomial.
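
The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the hardware model: the function names `gf256_add` and `gf256_mul` are chosen here for illustration, and each field element is assumed to be stored as an integer whose bit i is the coefficient of x^i.

```python
AES_POLY = 0x11B  # bit pattern of x^8 + x^4 + x^3 + x + 1


def gf256_add(a: int, b: int) -> int:
    """Addition in GF(2^8): coefficient-wise addition mod 2, i.e. XOR."""
    return a ^ b


def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements, reducing by AES_POLY along the way."""
    result = 0
    while b:
        if b & 1:          # lowest bit of b set: add the current multiple of a
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: subtract (XOR) the modulus
            a ^= AES_POLY
        b >>= 1            # move on to the next coefficient of b
    return result


if __name__ == "__main__":
    print(hex(gf256_mul(0x57, 0x83)))
```

Because addition is a plain XOR and multiplication needs only shifts and XORs, both operations map naturally onto hardware.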

Cryptosystems and Public key cryptography:


The word cryptography is derived from Greek and literally means secret
writing. Cryptography has been in use for more than a thousand years;
the Romans, for example, used simple cipher techniques to hide the
meaning of messages. Early and popular cryptographic techniques include
the Caesar cipher, substitution ciphers and transposition ciphers.
Cryptography converts plain text into incomprehensible cipher text by
the process of encryption, and converts it back to plain text by the
process of decryption.

Cryptographic systems are generally classified on the following basis:

1. Type of operations used for transforming plaintext to cipher
text: Most encryption algorithms are based on two general principles,
a. Substitution, in which each element in the plaintext is mapped
to some other element to form the cipher text
b. Transposition, in which elements in the plaintext are rearranged
to form the cipher text.
2. Number of keys used: If both the sender and the receiver use the
same key, the system is referred to as symmetric, single-key,
secret-key or conventional encryption. If the sender and receiver use
different keys, the system is called asymmetric, two-key, or
public-key encryption.
3. Processing of plaintext: A block cipher processes the input one
block at a time, producing an output block for each input block. A
stream cipher processes the input elements continuously, producing
output elements on the fly.

Most cryptographic algorithms are either symmetric or asymmetric key
algorithms.
1. Secret Key Cryptography: This type of cryptosystem uses the
same key for both encryption and decryption. Some of the advantages
of such a system are
- Very fast relative to public key cryptography
- Considered secure, as long as the key is strong
Symmetric key cryptosystems have some disadvantages too.
Exchange and administration of the key become complicated, and
non-repudiation is not possible. Examples of symmetric key
cryptosystems include DES, 3-DES, RC4 and RC5.

2. Public Key Cryptography:

This type of cryptosystem uses different keys for encryption and
decryption. Each user has a public key, which is known to all others,
and a private key, which remains secret. The private key and public
key are mathematically linked. Encryption is performed with the
public key and decryption with the private key. Public key
cryptosystems are considered to be very secure and support
non-repudiation. No secret key needs to be exchanged, which reduces
key administration to a minimum. However, they are much slower than
symmetric key algorithms, and the cipher text tends to be larger
than the plaintext. Examples of public key cryptosystems include
Diffie-Hellman, RSA and Elliptic Curve Cryptography.

Brief Overview of some known algorithms:

Diffie-Hellman (DH) public-key algorithm:


Diffie-Hellman, published in 1976, was the first public-key algorithm.
Its security rests on the difficulty of computing discrete logarithms in
a finite field. The idea behind the algorithm is to let two parties
derive a shared secret key, which can later be used for communication,
without ever sending that key over the channel. Two people, say Alice
and Bob, can use it to generate and distribute a secret key. First,
Alice and Bob agree on a large prime number n and a number g such that g
is primitive mod n. They can do this over an insecure channel. Alice and
Bob then perform the following steps.
1. Alice chooses a random large integer x and sends Bob
a = g^x mod n
2. Similarly, Bob chooses a random large integer y and sends
Alice b = g^y mod n
3. Alice computes k from the value b that Bob sent: k = b^x mod n
4. Similarly, Bob computes k' = a^y mod n
Both k and k' are equal to g^(xy) mod n. Anyone listening to the
conversation learns only n, g, a and b, and cannot recover x or y
because of the discrete logarithm problem. The security depends on
choosing a sufficiently large n. The Diffie-Hellman key exchange
protocol can easily be extended to three or more people.
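
The exchange above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `dh_public` and `dh_shared` are hypothetical, and the parameters n = 23, g = 5 are toy values far too small for real use.

```python
import secrets


def dh_public(g: int, x: int, n: int) -> int:
    """Value sent over the insecure channel: g^x mod n."""
    return pow(g, x, n)


def dh_shared(received: int, private: int, n: int) -> int:
    """Shared key computed from the other party's public value."""
    return pow(received, private, n)


if __name__ == "__main__":
    n, g = 23, 5                      # toy parameters; 5 is primitive mod 23
    x = secrets.randbelow(n - 3) + 2  # Alice's secret exponent in [2, n - 2]
    y = secrets.randbelow(n - 3) + 2  # Bob's secret exponent in [2, n - 2]
    a = dh_public(g, x, n)            # Alice -> Bob
    b = dh_public(g, y, n)            # Bob -> Alice
    # Both sides arrive at g^(xy) mod n without ever transmitting x or y.
    assert dh_shared(b, x, n) == dh_shared(a, y, n) == pow(g, x * y, n)
    print("shared key:", dh_shared(b, x, n))
```

Python's three-argument `pow` performs modular exponentiation without building the huge intermediate power, which is what makes the same code usable with realistically large n.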

Rounds and Transformations:


Here we briefly explain the four transformations of each round of the
encryption. Each transformation in every round of encryption/decryption
acts on its 128-bit input, which is considered as a four by four matrix
called the state (shown in dotted rectangles in Figure 2.1), whose
entries are eight bits each.

Linear cryptanalysis is one of the two most widely used attacks on
block ciphers; the other is differential cryptanalysis.
Differential cryptanalysis:
Differential cryptanalysis is a general form of cryptanalysis applicable
primarily to block ciphers, but also to stream ciphers. In the broadest
sense, it is the study of how differences in an input can affect the resultant
difference at the output. In the case of a block cipher, it refers to a set of
techniques for tracing differences through the network of transformations,
discovering where the cipher exhibits non-random behaviour, and
exploiting such properties to recover the secret key.

Brute-force attack: A brute-force attack is a method of defeating a
cryptographic scheme by trying a large number of possibilities, for
example all possible keys, in order to decrypt a message. In most
schemes the theoretical possibility of a brute-force attack is
recognized, but the scheme is set up so that carrying it out would be
computationally infeasible. Accordingly, one definition of "breaking" a
cryptographic scheme is to find a method faster than a brute-force
attack.
Side-channel attack: A side-channel attack is any attack based on
additional information gained from the physical implementation of a
cryptosystem, rather than brute force or theoretical weaknesses in the
algorithms (compare cryptanalysis). For example, timing information,
power consumption, electromagnetic leaks or even sound can provide an
extra source of information that can be exploited to break the system.
Many side-channel attacks require considerable technical knowledge of
the internal operation of the system on which the cryptography is
implemented.

2.3 Analysis of symmetric-encryption Algorithms
In this section several symmetric encryption algorithms are analyzed.
The history of the algorithms, their key sizes, block sizes, code sizes,
number of rounds and security levels are discussed. We focus on block
ciphers since they are easier to implement in software: as long as they
operate on data in computer-sized blocks, a software implementation
avoids time-consuming bit manipulations.

Security of RSA:
Three possible approaches to attacking the RSA algorithm are as follows:
Brute force: This involves trying all possible private keys.
Mathematical attacks: There are several approaches, all equivalent
in effect to factoring the product of two primes.
Timing attacks: These depend on the running time of the
decryption algorithm.

Choosing large values of p and q makes the brute-force and mathematical
attacks impractical. The security of RSA thus lies in choosing the value
n so that such attacks become extremely difficult.

CHAPTER-2
PROJECT DESCRIPTION

2.1 LITERATURE REVIEW:

This chapter describes the RC4 encryption standard and the hardware
implementations that have been proposed for RC5 encryption and
decryption. The RC4 algorithm consists of two parts: the key scheduling
algorithm (KSA) and the pseudo-random generation algorithm (PRGA). The
algorithms are shown in Figure 1, where l is the length of the secret
key in bytes and N is the size of the array S, or S-box, in words. A
common key size in RC4 is between 5 and 32 bytes. In most applications
RC4 is used with a word size n = 8 and array size N = 2^8. In the first
phase of RC4 operation an identity permutation (0, 1, ..., N - 1) is
loaded into the array S. A secret key K is then used to initialize S to
a random permutation by shuffling the words in S. During the second
phase of the operation, the PRGA produces random words from the
permutation in S.
Each iteration of the PRGA loop produces one output word, and these
words constitute the running key stream. The key stream is bit-wise
XORed with the plaintext to obtain the cipher text. All the operations
described in Figure 1 are byte operations (n = 8). Most modern
processors, however, operate on 32-bit or 64-bit words. If the word size
in RC4 is increased to n = 32 or n = 64 to improve performance, the size
of array S becomes 2^32 or 2^64 entries, which is not practical. Note
that these are the array sizes needed to store all the 32-bit or 64-bit
permutations, respectively. Cryptanalysis of RC4 attracted a lot of
attention in the cryptographic community after the algorithm was made
public in 1994. Numerous significant weaknesses were discovered,
including Finney's forbidden states [2], classes of weak keys [23],
patterns that appear with twice the expected probability (the second
byte bias) [14], partial message recovery [14], full key recovery
attacks [4], analysis of the biased distribution of the RC4 initial
permutation [17], and predicting and distinguishing attacks [13].
Knudsen et al. attacked versions of RC4 with n < 8 using their
backtracking algorithm [11]. Fluhrer et al. observed the most serious
weakness in RC4 in [4], where a practical attack on RC4 as used in the
security protocol WEP was demonstrated.
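
The two phases described above can be sketched in Python for the usual case n = 8, N = 256. This is a minimal sketch of RC4 as it is commonly published, not a reproduction of the report's Figure 1; the function names are chosen here for illustration.

```python
def rc4_ksa(key: bytes) -> list[int]:
    """Key scheduling: shuffle the identity permutation using the key."""
    n_entries = 256                      # N = 2^8 for word size n = 8
    s = list(range(n_entries))           # identity permutation (0, 1, ..., N-1)
    j = 0
    for i in range(n_entries):
        j = (j + s[i] + key[i % len(key)]) % n_entries
        s[i], s[j] = s[j], s[i]
    return s


def rc4_keystream(s: list[int], length: int) -> bytes:
    """PRGA: produce `length` key stream bytes from the permutation."""
    s = s.copy()                         # leave the caller's array untouched
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the key stream."""
    stream = rc4_keystream(rc4_ksa(key), len(data))
    return bytes(d ^ k for d, k in zip(data, stream))


if __name__ == "__main__":
    ciphertext = rc4_crypt(b"Key", b"Plaintext")
    print(ciphertext.hex())
    assert rc4_crypt(b"Key", ciphertext) == b"Plaintext"
```

Since encryption is an XOR with the key stream, applying the same function twice with the same key restores the plaintext. Given the weaknesses listed above, such code is suitable for study only.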

2.1.1 EXISTING SYSTEM:

2.1.2 PROPOSED SYSTEM:

2.2 The RC5 encryption algorithm


The RC5 encryption algorithm is a block cipher that converts plaintext
data blocks of 32, 64, or 128 bits (two words of w = 16, 32, or 64 bits)
into ciphertext blocks of the same length [8-10]. It uses a key of
selectable length b (0, 1, 2, ..., 255) bytes. The algorithm is
organized as a set of iterations called rounds; the number of rounds r
takes values in the range (0, 1, 2, ..., 255), as illustrated in Fig. 1.
An expanded key array is created from the original key by means of a key
schedule. The expanded key array is used by both the encryption and
decryption routines, and its length depends on the number of rounds. The
operations performed on the data blocks include bitwise exclusive-OR of
words, data-dependent rotations by means of circular left and right
rotations, and two's complement addition/subtraction of words, which is
addition/subtraction modulo 2^w, where w is the word size in bits. Each
operation always acts on a complete 16, 32 or 64-bit word at a time.
There are two inputs to the encryption function: the plain image to be
encrypted and the expanded secret key. For RC5 image encryption, the
image header is extracted from the image to be encrypted and the image
data stream is divided into blocks of 64-bit length [11]. The first
64-bit block of the image is entered as the plain image to the
encryption function of RC5. The second input to the RC5 encryption
algorithm is the expanded secret key, which is derived from the
user-supplied secret key by the key schedule. Then, the next 64-bit
plain image block follows it, and so on with the scan

Groups:
A mathematical structure consisting of a set G and a binary operator
$*$ on G is a group if,
$\forall a, b \in G$, if $c = a * b$, then $c \in G$ (Closure)
$a * (b * c) = (a * b) * c$, $\forall a, b, c \in G$ (Associative)
$\exists e \in G$ such that $\forall a \in G$, $a * e = e * a = a$ (Identity element)
$\forall a \in G$, $\exists a' \in G$ such that $a * a' = a' * a = e$. $a'$ is
unique for each a and is called the inverse of a.

The group is represented as $\langle G, * \rangle$. Additionally, a
group is said to be abelian if it also satisfies the commutative
property, i.e., $\forall a, b \in G$, $a * b = b * a$.

Rings:
A ring is a set R with two binary operations $+$ and $\cdot$ (addition
and multiplication) defined on R such that the following conditions are
satisfied.
$\langle R, + \rangle$ is an abelian group
$a \cdot (b \cdot c) = (a \cdot b) \cdot c$, $\forall a, b, c \in R$ (Associativity of $\cdot$)
$a \cdot (b + c) = (a \cdot b) + (a \cdot c)$, $\forall a, b, c \in R$ (Distributivity of $\cdot$ over $+$)

A ring in which $\cdot$ is commutative is called a commutative ring.
Further, if the ring contains an identity element with respect to
$\cdot$, i.e. $\exists e \in R$ such that $\forall a \in R$,
$a \cdot e = e \cdot a = a$, then e is called the identity element or
the unity element and is represented by 1. If R contains a unity
element, then R is called a unitary ring.

Fields and Vector Spaces:


A field F is a commutative and unitary ring such that
$F^* = \{a \mid a \in F \text{ and } a \neq 0\}$ is a multiplicative
group. The ring $Z_p$ is a field if and only if p is a prime.

Let F be a field. A subset K of F that is also a field under the
operations of F (with restriction to K) is called a subfield of F. In
this case, F is called an extension field of K. If $K \neq F$ then K is
a proper subfield of F. A field is called prime if it has no proper
subfield.

If F is a field and V is an additive abelian group, then V is called a
vector space over F if an operation $F \times V \to V$ is defined such
that:
$a(v + u) = av + au$
$(a + b)v = av + bv$
$a(bv) = (a \cdot b)v$
$1 \cdot v = v$
where $a, b \in F$ and $u, v \in V$.
The elements of F are called the scalars and the elements of V are
called the vectors.

If $v_1, v_2, \dots, v_m \in V$ and $f_1, f_2, \dots, f_m \in F$, then
the vector $v = \sum_{i=1}^{m} f_i v_i$ is a linear combination of the
vectors $v_1, v_2, \dots, v_m$. The set of all such linear combinations
is called the span of $v_1, v_2, \dots, v_m$.

The vectors $v_1, v_2, \dots, v_m \in V$ are said to be linearly
independent over F if there exist no scalars
$f_1, f_2, \dots, f_m \in F$, not all zero, such that
$\sum_{i=1}^{m} f_i v_i = 0$.

A set $S = \{u_1, u_2, \dots, u_n\}$ is said to be a basis of V iff the
elements of S are linearly independent and span V. If a vector space V
over a field F has a basis of a finite number of vectors, then this
number is called the dimension of V over F.

If F is an extension field of a field $F_p$, then F is a vector space
over $F_p$. The dimension of F over $F_p$ is called the degree of the
extension of F over $F_p$.

Finite Fields:
A field with a finite number of elements is denoted $F_q$ or GF(q),
where q is the number of elements. This is also known as a Galois field.

The order of a finite field $F_q$ is the number of elements in $F_q$.
Further, there exists a finite field $F_q$ of order q iff q is a prime
power, i.e. either q is prime or $q = p^m$, where p is prime. In the
latter case, p is called the characteristic of $F_q$, m is called the
extension degree of $F_q$, and every element of $F_q$ is a root of the
polynomial $x^{p^m} - x$ over $Z_p$.

Let us consider two classes of finite fields: $F_p$ (prime field, p is a
prime number) and $F_{2^m}$ (binary finite field).

Prime Field $F_p$:

The prime field $F_p$ consists of the set of integers
$\{0, 1, 2, \dots, p - 1\}$, with the following arithmetic operations
defined over it.
Addition: $\forall a, b \in F_p$, $r \in F_p$, where $r = (a + b) \bmod p$
Multiplication: $\forall a, b \in F_p$, $s \in F_p$, where $s = (a \cdot b) \bmod p$

Binary Finite Field $F_{2^m}$:

The finite field $F_{2^m}$, called a characteristic two finite field or
a binary finite field, can be viewed as a vector space of dimension m
over $F_2$, which consists of the 2 elements 0 and 1. There exist m
elements $\alpha_0, \alpha_1, \alpha_2, \dots, \alpha_{m-1}$ in
$F_{2^m}$ such that each element $\alpha \in F_{2^m}$ can be uniquely
represented as $\alpha = \sum_{i=0}^{m-1} a_i \alpha_i$, where
$a_i \in \{0, 1\}$, $0 \leq i < m$.

The set $\{\alpha_0, \alpha_1, \alpha_2, \dots, \alpha_{m-1}\}$ is
called a basis of $F_{2^m}$ over $F_2$. Given such a basis, every field
element can be represented as a bit string $(a_0 a_1 a_2 \dots a_{m-1})$.
Generally two kinds of basis are used to represent binary finite fields:
polynomial basis and normal basis.

Polynomial basis representation of $F_{2^m}$:

Let $f(x) = x^m + f_{m-1}x^{m-1} + \dots + f_2 x^2 + f_1 x + f_0$, where
$f_i \in \{0, 1\}$, $0 \leq i < m$, be an irreducible polynomial of
degree m over $F_2$. f(x) is called the reduction polynomial of
$F_{2^m}$.

The finite field $F_{2^m}$ is comprised of all polynomials over $F_2$ of
degree less than m, i.e.:
$F_{2^m} = \{a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \dots + a_2 x^2 + a_1 x + a_0 : a_i \in \{0, 1\}\}$.

The field element
$a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \dots + a_2 x^2 + a_1 x + a_0$ is
usually represented by the bit string
$(a_{m-1} a_{m-2} \dots a_2 a_1 a_0)$ of length m, such that
$F_{2^m} = \{(a_{m-1} a_{m-2} \dots a_2 a_1 a_0) : a_i \in \{0, 1\}\}$.

Thus, the elements of $F_{2^m}$ can be represented by the set of all
binary strings of length m. The multiplicative identity 1 is represented
by the bit string $(00 \dots 01)$ and the bit string of all zeroes
represents the additive identity 0.

The following operations are defined on the elements of $F_{2^m}$ when
using f(x) as the reduction polynomial.

Addition: If $a = (a_{m-1} a_{m-2} \dots a_2 a_1 a_0)$ and
$b = (b_{m-1} b_{m-2} \dots b_2 b_1 b_0)$ are elements of $F_{2^m}$,
then $c = a + b = (c_{m-1} c_{m-2} \dots c_2 c_1 c_0)$, where
$c_i = (a_i + b_i) \bmod 2 = a_i \oplus b_i$.
Multiplication: If $a = (a_{m-1} a_{m-2} \dots a_2 a_1 a_0)$ and
$b = (b_{m-1} b_{m-2} \dots b_2 b_1 b_0)$ are elements of $F_{2^m}$,
then $c = a \cdot b = (c_{m-1} c_{m-2} \dots c_2 c_1 c_0)$, where the
polynomial $c_{m-1}x^{m-1} + c_{m-2}x^{m-2} + \dots + c_2 x^2 + c_1 x + c_0$
is the remainder when the polynomial
$(a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \dots + a_1 x + a_0)(b_{m-1}x^{m-1} + b_{m-2}x^{m-2} + \dots + b_1 x + b_0)$
is divided by f(x) over $F_2$.
Inversion: If a is a nonzero element in $F_{2^m}$, then the inverse of
a, denoted $a^{-1}$, is the unique element $c \in F_{2^m}$ such that
$a \cdot c = c \cdot a = 1$.
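
The three polynomial-basis operations can be sketched for a general m as follows. This is an illustrative sketch only: the function names are hypothetical, elements are assumed to be stored as integers whose bit i holds the coefficient a_i, and the small field F_2^4 with f(x) = x^4 + x + 1 is an example choice. Inversion is done by raising a to the power 2^m - 2, one of several possible methods.

```python
def f2m_add(a: int, b: int) -> int:
    """c_i = (a_i + b_i) mod 2, i.e. a bit-wise XOR."""
    return a ^ b


def f2m_mul(a: int, b: int, f: int, m: int) -> int:
    """Multiply the two polynomials, then take the remainder modulo f(x)."""
    prod = 0
    while b:                       # carry-less (XOR) polynomial product
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    # Cancel every term of degree >= m, highest degree first.
    for deg in range(prod.bit_length() - 1, m - 1, -1):
        if (prod >> deg) & 1:
            prod ^= f << (deg - m)
    return prod


def f2m_inv(a: int, f: int, m: int) -> int:
    """Inverse of a nonzero element, using a^(2^m - 1) = 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    result, exponent = 1, (1 << m) - 2
    while exponent:                # square-and-multiply
        if exponent & 1:
            result = f2m_mul(result, a, f, m)
        a = f2m_mul(a, a, f, m)
        exponent >>= 1
    return result


if __name__ == "__main__":
    F, M = 0b10011, 4              # f(x) = x^4 + x + 1 over F_2 (example)
    c = f2m_mul(0b0110, 0b0101, F, M)
    print(format(c, "04b"), format(f2m_inv(c, F, M), "04b"))
```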

Normal basis representation of $F_{2^m}$:

A normal basis of $F_{2^m}$ over $F_2$ is a basis of the form
$\{\beta, \beta^2, \beta^{2^2}, \dots, \beta^{2^{m-1}}\}$, where
$\beta \in F_{2^m}$. Any element $a \in F_{2^m}$ can be written as
$a = \sum_{i=0}^{m-1} a_i \beta^{2^i}$, where $a_i \in \{0, 1\}$.
Gaussian Normal Bases (GNB): A GNB representation of $F_{2^m}$ exists if
there exists a positive integer T such that $p = Tm + 1$ is prime and
$\gcd(Tm/k, m) = 1$, where k is the multiplicative order of 2 modulo p.
The GNB representation is called a type T GNB for $F_{2^m}$.

The following operations are defined over $F_{2^m}$ when using a type T
GNB representation.

Addition: If $a = (a_{m-1} a_{m-2} \dots a_2 a_1 a_0)$ and
$b = (b_{m-1} b_{m-2} \dots b_2 b_1 b_0)$ are elements of $F_{2^m}$,
then $c = a + b = (c_{m-1} c_{m-2} \dots c_2 c_1 c_0)$, where
$c_i = (a_i + b_i) \bmod 2 = a_i \oplus b_i$.
Squaring: Let $a = (a_{m-1} a_{m-2} \dots a_2 a_1 a_0) \in F_{2^m}$.
Squaring is a linear operation in $F_{2^m}$. Hence
$$a^2 = \left(\sum_{i=0}^{m-1} a_i \beta^{2^i}\right)^2 = \sum_{i=0}^{m-1} a_i \beta^{2^{i+1}} = \sum_{i=0}^{m-1} a_{i-1} \beta^{2^i} = (a_{m-2} a_{m-3} \dots a_0 a_{m-1}),$$
with indices taken modulo m. Hence squaring a field element is simply a
rotation of the vector representation.
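Multiplication: Let $p = Tm + 1$ and let $u \in F_p$ be an element of
order T. Let us define a sequence $F(1), F(2), \dots, F(p - 1)$ by
$F(2^i u^j \bmod p) = i$, for $0 \leq i < m$, $0 \leq j < T$.
If $a = (a_{m-1} a_{m-2} \dots a_2 a_1 a_0)$ and
$b = (b_{m-1} b_{m-2} \dots b_2 b_1 b_0)$ are elements of $F_{2^m}$,
then the product $c = a \cdot b = (c_{m-1} c_{m-2} \dots c_2 c_1 c_0)$,
where
$$c_i = \begin{cases} \sum_{k=1}^{p-2} a_{F(k+1)+i}\, b_{F(p-k)+i} & \text{if } T \text{ is even} \\ \sum_{k=1}^{m/2} \left(a_{k+i-1}\, b_{m/2+k+i-1} + a_{m/2+k+i-1}\, b_{k+i-1}\right) + \sum_{k=1}^{p-2} a_{F(k+1)+i}\, b_{F(p-k)+i} & \text{if } T \text{ is odd} \end{cases}$$
for each i, $0 \leq i < m$, where indices are reduced modulo m.

Inversion: If a is a nonzero element in $F_{2^m}$, then the inverse of
a, denoted $a^{-1}$, is the unique element $c \in F_{2^m}$ such that
$a \cdot c = c \cdot a = 1$.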

2.3 CIPHER TRANSFORMATIONS:


The RC5 cipher either operates on individual bytes of the State or on
an entire row/column. At the start of the cipher, the input is copied
into the State as described in Section 2.2. Then, an initial Round Key
addition is performed on the State. Round keys are derived from the
cipher key using the Key Expansion routine, which generates a series of
round keys, one for each round of transformations performed on the
State.

The transformations performed on the State are similar among all RC5
versions, but the number of transformation rounds depends on the cipher
key length. The final round in all RC5 versions differs slightly from
the first Nr - 1 rounds, as it has one less transformation performed on
the State. Each round of the RC5 cipher (except the last one) consists
of all of the following transformations, which use three primitive
operations and their inverses.

1. Addition of words, denoted +. This is addition modulo 2^w; the
inverse operation is subtraction of words, denoted -.
2. Bit-wise exclusive OR (XOR) of words.
3. Rotation of word x left by y bits, denoted x <<< y. The inverse
operation is the rotation of word x right by y bits, denoted x >>> y.
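
The three primitives and their inverses can be sketched as follows for w = 32. The word size and the function names are assumptions made for this illustration; only the low log2(w) bits of the rotation amount matter, which is why it is reduced modulo w.

```python
W = 32                      # word size in bits (assumed; 16 and 64 also occur)
MASK = (1 << W) - 1         # keeps every result inside one w-bit word


def add(x: int, y: int) -> int:
    """Addition of words modulo 2^w."""
    return (x + y) & MASK


def sub(x: int, y: int) -> int:
    """Subtraction of words modulo 2^w, the inverse of add."""
    return (x - y) & MASK


def rotl(x: int, y: int) -> int:
    """x <<< y: rotate the w-bit word x left by y bits."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rotr(x: int, y: int) -> int:
    """x >>> y: rotate the w-bit word x right by y bits."""
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


if __name__ == "__main__":
    word = 0x80000001
    print(hex(rotl(word, 1)), hex(rotr(rotl(word, 1), 1)))
    print(hex(sub(add(word, 0xFFFFFFFF), 0xFFFFFFFF)))
```

XOR needs no helper of its own because it is its own inverse and never leaves the w-bit range.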

CHAPTER-3
WORKING OF PROJECT

3.1 OVERVIEW:
Cryptographic algorithms can be divided on the basis of key usage
into symmetric and asymmetric ciphers. In symmetric ciphers a key is
used as a parameter to the encryption algorithm, which takes the data
and converts it into a seemingly random sequence of characters that
(ideally) has no apparent relation to the original data. This sequence
of characters is known as cipher text.

This cipher text is sent to the receiver over the medium. The receiver
then gives the same key as input to the decryption algorithm and
converts the cipher text back to the plain text. If the key used for
encryption is not the same as the key used for decryption, the cipher is
asymmetric. Asymmetric ciphers are mainly used to exchange the symmetric
keys that are then used to establish a secure connection between
devices. They are not used extensively because they are inherently
slower than symmetric ciphers.

Ciphers can also be divided into stream ciphers and block ciphers,
based on how the plain text is processed. A block cipher encrypts the
plain text one fixed-size block at a time, so the cipher text has the
same length as the (padded) plain text. A stream cipher combines the
plain text with a pseudo-random key stream one bit or byte at a time.
Though stream ciphers are less complex and easier to implement than
block ciphers, they have security issues arising from the pseudo-random
generators used in them.

3.2 BASIC PRIMITIVES:

RC5 has a variable word size, a variable number of rounds and a
variable-length secret key. A particular RC5 algorithm is designated
RC5-w/r/b, where w denotes the word size in bits (the standard values
are 16, 32 and 64); r denotes the number of rounds, with allowable
values from 0 to 255; and b denotes the length of the user's secret key
in bytes, with allowable values from 0 to 255. The parameters we have
used are RC5-32/12/16. RC5 consists of three components: a key
expansion algorithm, an encryption algorithm and a decryption
algorithm.

3.2.1 Key Expansion

This routine expands the user's secret key K to fill the expanded key
array S, so that S resembles an array of t = 2(r + 1) random binary
words determined by K. It uses two word-sized binary constants Pw and
Qw.

3.2.2 Specifications

The algorithm relies heavily on data-dependent rotations to randomize
the data during encryption. The decryption stage performs the inverse of
the operations performed in the encryption stage to recover the original
data, or plain text. Both the encryption and decryption stages use the
expanded version of the key, called the S array. The flexibility of the
algorithm comes from the fact that the word length (w), key size (b) and
number of rounds (r) are variable and can be adjusted depending on the
requirements. The word length specifies the number of bits in each word
that the algorithm takes as input. Increasing the word length increases
the throughput, but in software implementations it is necessary to
consider the register size of the CPU: any word length greater than the
size of the CPU registers degrades performance. The key size is the
length of the key in bytes. Increasing the key size improves security by
making brute-force attacks more expensive. The number of rounds
specifies the number of iterations in the encryption and decryption
procedures. More rounds randomize the data further and make attacks
more difficult or even infeasible, at the cost of longer encryption and
decryption times.

3.3 ENCRYPTION AND DECRYPTION

START
1. Divide the plaintext block into two 32-bit words: A and B
2. Initialize the round counter; initialize the key
3. A = A + S[0]; B = B + S[1]
4. A = ((A XOR B) <<< B) + S[2i]; B = ((B XOR A) <<< A) + S[2i+1]
5. Round = Round - 1
6. If Round = 0, go to END; otherwise repeat from step 4
END

Fig FLOW CHART

We assume that the input block is given in two w-bit registers A and
B. We also assume that key expansion has already been performed, so
that the array S[0...t-1] has been computed. The encryption algorithm
is shown in Figure 1.
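
The encryption steps, and the decryption that undoes them, can be sketched as follows for w = 32. This is a software illustration rather than the hardware design: the function names are hypothetical, and the array `demo_s` below is an arbitrary placeholder standing in for a properly expanded key, used only to show that decryption inverts encryption.

```python
W = 32
MASK = (1 << W) - 1


def rotl(x: int, y: int) -> int:
    """Rotate the w-bit word x left by y bits (y taken modulo w)."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rotr(x: int, y: int) -> int:
    """Rotate the w-bit word x right by y bits (y taken modulo w)."""
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def rc5_encrypt_block(a: int, b: int, s: list[int], rounds: int) -> tuple[int, int]:
    """Encrypt one two-word block with expanded key s of 2*(rounds+1) words."""
    a = (a + s[0]) & MASK
    b = (b + s[1]) & MASK
    for i in range(1, rounds + 1):
        a = (rotl(a ^ b, b) + s[2 * i]) & MASK
        b = (rotl(b ^ a, a) + s[2 * i + 1]) & MASK
    return a, b


def rc5_decrypt_block(a: int, b: int, s: list[int], rounds: int) -> tuple[int, int]:
    """Undo rc5_encrypt_block by applying the inverse steps in reverse order."""
    for i in range(rounds, 0, -1):
        b = rotr((b - s[2 * i + 1]) & MASK, a) ^ a
        a = rotr((a - s[2 * i]) & MASK, b) ^ b
    return (a - s[0]) & MASK, (b - s[1]) & MASK


if __name__ == "__main__":
    rounds = 12
    # Placeholder key material, NOT the output of the real key expansion.
    demo_s = [(0x01234567 * (k + 1)) & MASK for k in range(2 * (rounds + 1))]
    block = (0x11223344, 0x55667788)
    cipher = rc5_encrypt_block(*block, demo_s, rounds)
    assert rc5_decrypt_block(*cipher, demo_s, rounds) == block
    print([hex(word) for word in cipher])
```

Each half-round updates one word using the other as both XOR mask and rotation amount, which is where the data-dependent rotations enter.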

This diagram illustrates the Feistel structure, which is the basic
principle of the symmetric data security process. The basic operation of
the RC5 encryption algorithm was discussed in Chapter 2.

Fig .1 Encryption modules


3.3.1 Mixing in the secret key

The third process is to mix the user's secret key into the array S and
the array L, where c is the number of words in L.

i = j = 0;
A = B = 0;
Do 3*max(t, c) times:

A = S[i] = (S[i] + A + B) <<< 3;

B = L[j] = (L[j] + A + B) <<< (A + B);

i = (i + 1) mod (t);

j = (j + 1) mod (c);

3.3.2 Initializing the array S

The second process is to initialize the array S to a pseudo-random bit
pattern, using an arithmetic progression determined by the constants Pw
and Qw.

S[0] = Pw;

For i = 1 to t-1 do

S[i] = S[i-1] + Qw
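
The complete key expansion (copying the key into L, initializing S, then mixing) can be sketched as follows for w = 32. The function name `rc5_expand_key` is hypothetical, and the constants P32 = 0xB7E15163 and Q32 = 0x9E3779B9 are the values published for RC5 with 32-bit words; they are not stated in the text above. The first step, loading the key bytes into the word array L in little-endian order, is assumed here since the text does not describe it.

```python
W = 32
MASK = (1 << W) - 1
P32 = 0xB7E15163            # published RC5 constant Pw for w = 32
Q32 = 0x9E3779B9            # published RC5 constant Qw for w = 32


def rotl(x: int, y: int) -> int:
    """Rotate the w-bit word x left by y bits (y taken modulo w)."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rc5_expand_key(key: bytes, rounds: int) -> list[int]:
    """Return the expanded key array S with t = 2*(rounds + 1) words."""
    u = W // 8                                   # bytes per word
    c = max(1, (len(key) + u - 1) // u)          # words needed to hold the key
    # First process (assumed): copy the key bytes into L, little-endian.
    l = [0] * c
    for idx in range(len(key) - 1, -1, -1):
        l[idx // u] = ((l[idx // u] << 8) + key[idx]) & MASK
    # Second process: S[0] = Pw, S[i] = S[i-1] + Qw.
    t = 2 * (rounds + 1)
    s = [P32]
    for _ in range(1, t):
        s.append((s[-1] + Q32) & MASK)
    # Third process: mix the secret key into S and L.
    i = j = a = b = 0
    for _ in range(3 * max(t, c)):
        a = s[i] = rotl((s[i] + a + b) & MASK, 3)
        b = l[j] = rotl((l[j] + a + b) & MASK, a + b)
        i = (i + 1) % t
        j = (j + 1) % c
    return s


if __name__ == "__main__":
    expanded = rc5_expand_key(bytes(16), rounds=12)   # RC5-32/12/16, zero key
    print(len(expanded), hex(expanded[0]))
```

For RC5-32/12/16 this yields t = 26 words, which is the S array consumed by the encryption and decryption routines.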

CHAPTER-4
HARDWARE AND SOFTWARE USED
4.1 FPGA
A field-programmable gate array (FPGA) is an integrated circuit
designed to be configured by the customer or designer after
manufacturing, hence "field-programmable".
The FPGA configuration is generally specified using a hardware
description language (HDL), similar to that used for an application-
specific integrated circuit (ASIC) (circuit diagrams were previously used
to specify the configuration, as they were for ASICs, but this is
increasingly rare). FPGAs can be used to implement any logical function
that an ASIC could perform.

FPGAs contain programmable logic components called "logic blocks",
and a hierarchy of reconfigurable interconnects that allow the blocks
to be "wired together", somewhat like a one-chip programmable
breadboard.

Logic blocks can be configured to perform complex combinational
functions, or merely simple logic gates like AND and XOR. In most
FPGAs, the logic blocks also include memory elements, which may be
simple flip-flops or more complete blocks of memory.

4.1.1 Introduction

The area of field programmable gate array (FPGA) design is
evolving at a rapid pace. The increase in the complexity of the FPGA's
architecture means that it can now be used in far more applications than
before. The newer FPGAs are steering away from the plain vanilla type
"logic only" architecture to one with embedded dedicated blocks for
specialized applications.
Definitions of relevant terminology:
Field-Programmable Device (FPD): a general term that refers to any
type of integrated circuit used for implementing digital hardware,
where the chip can be configured by the end user to realize different
designs.

PLA: a Programmable Logic Array (PLA) is a relatively small FPD that
contains two levels of logic, an AND-plane and an OR-plane, where both
levels are programmable.

PAL: a Programmable Array Logic (PAL) is a relatively small FPD that
has a programmable AND-plane followed by a fixed OR-plane.

SPLD: refers to any type of Simple PLD, usually either a PLA or PAL.

CPLD: a more Complex PLD that consists of an arrangement of multiple
SPLD-like blocks on a single chip.

4.1.2 The FPGA Landscape


In the semiconductor industry, the programmable logic segment is
the best indicator of the progress of technology. No other segment has
such varied offerings as field programmable gate arrays. It is no wonder
that FPGAs were among the first semiconductor products to move to the
0.13 µm technology, and again recently to 90 nm technology.

Fig. 13 Structure of an FPGA

The players in the current programmable logic market are Altera,
Atmel, Actel, Cypress, Lattice, QuickLogic and Xilinx. Some of the
larger and more popular device families are Stratix from Altera,
Axcelerator from Actel, ispXPGA from Lattice and Virtex from Xilinx.
Between them, these FPGA devices cover many major electronics
applications such as communications, video, image and digital signal
processing, storage area networks and aerospace.

4.1.3 FPGA synthesis: the vendor-independent approach


Dedicated memory blocks offer data storage and can be configured as
basic single-port RAMs, ROMs (read-only memories), FIFOs (first-in
first-out buffers), or CAMs (content-addressable memories). The data
processing or logic fabric of these FPGAs varies widely in size, with
the biggest Xilinx Virtex-II Pro offering up to 100K LUT4s. Interfacing
the FPGA with backplanes, high-speed buses, and memories is made
possible by support for various single-ended and differential I/O
standards.
For programmable systems applications requiring embedded processors,
the Virtex-II Pro with its 32-bit RISC processor (PowerPC 405) would be
an ideal choice.

Table 4.1 Features Offered In FPGA

Features | Xilinx Virtex-II Pro | Altera Stratix | Actel Axcelerator | Lattice ispXPGA
Clock management | DCM, up to 12 | PLL, up to 12 | PLL, up to 8 | sysCLOCK PLL, up to 8
Embedded memory blocks | Block RAM, up to 10 Mbit | TriMatrix memory, up to 10 Mbit | Embedded RAM, up to 338K | sysMEM blocks, up to 414K
Data processing | CLB and 18-bit x 18-bit multipliers | LEs and embedded multipliers | Logic modules (C-cell & R-cell) | PFU based
Programmable I/Os | SelectIO | Advanced I/O support | Advanced I/O support | sysIO
Special features | Embedded PowerPC 405 cores | DSP blocks | Per-pin FIFOs for bus applications | sysHSI for high-speed serial interface

4.1.4 Applications of FPGAs


A list of typical applications includes: random logic, integrating
multiple SPLDs, device controllers, communication encoding and
filtering, small to medium sized systems with SRAM blocks, and many
more.

4.2 Altera DE0 Board


The DE0 board has many features that allow the user to implement a
wide range of designed circuits, from simple circuits to various
multimedia projects. The following hardware is provided on the DE0
board:
Altera Cyclone III 3C16 FPGA device
Altera Serial Configuration device EPCS4
USB Blaster (on board) for programming and user API control; both JTAG and Active Serial (AS) programming modes are supported
8-Mbyte SDRAM
4-Mbyte Flash memory
SD Card socket
3 pushbutton switches
10 toggle switches
10 green user LEDs
50-MHz oscillator for clock sources
VGA DAC (4-bit resistor network) with VGA-out connector
RS-232 transceiver
PS/2 mouse/keyboard connector
Two 40-pin Expansion Headers

Figure 14. The DE0 board.

4.2.1 Block Diagram of the DE0 Board:

Figure 15 gives the block diagram of the DE0 board. To provide
maximum flexibility for the user, all connections are made through the
Cyclone III FPGA device. Thus, the user can configure the FPGA to
implement any system design.

Figure 15. Block diagram of the DE0 board.


4.2.2 Cyclone III 3C16 FPGA
15,408 LEs
56 M9K Embedded Memory Blocks
504K total RAM bits
56 embedded multipliers
4 PLLs
346 user I/O pins
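
As a quick consistency check on the memory figures above, the total RAM
bit count can be recomputed from the number of M9K blocks. This is a
small sketch that assumes each M9K block holds 9 Kbits (9,216 bits,
parity bits included); the constant names are illustrative.

```python
M9K_BLOCKS = 56           # from the device summary above
BITS_PER_M9K = 9 * 1024   # assumption: 9,216 bits per block, parity included

total_bits = M9K_BLOCKS * BITS_PER_M9K
total_kbits = total_bits // 1024

print(total_bits, total_kbits)  # 516096 504
```

Under that assumption the result agrees with the 504K total RAM bits
listed for the device.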

Built-in USB Blaster circuit


On-board USB Blaster for programming and user API
(Application programming interface) control
Using the Altera EPM240 CPLD

SDRAM
One 8-Mbyte Single Data Rate Synchronous Dynamic RAM memory chip
Supports 16-bit data bus
Flash memory
4-Mbyte NOR Flash memory
Supports Byte (8-bit)/Word (16-bit) mode
SD card socket
Provides both SPI and SD 1-bit mode SD Card access
Pushbutton switches
3 pushbutton switches
Normally high; generates one active-low pulse when the switch is
pressed
Slide switches
10 Slide switches
A switch causes logic 0 when in the DOWN position and logic 1
when in the UP position
General User Interfaces
10 Green color LEDs (Active high)
4 seven-segment displays (Active low)
16x2 LCD Interface (LCD module not included)
Clock inputs
50-MHz oscillator
VGA output
Uses a 4-bit resistor-network DAC
With 15-pin high-density D-sub connector
Supports up to 1280x1024 at 60-Hz refresh rate

Serial ports
One RS-232 port (without DB-9 serial connector)
One PS/2 port (can be used through a PS/2 Y cable to connect a keyboard and mouse to one port)
Two 40-pin expansion headers
72 Cyclone III I/O pins, as well as 8 power and ground lines, are brought out to two 40-pin expansion connectors
Each 40-pin header is designed to accept a standard 40-pin ribbon cable used for IDE hard drives

4. SOFTWARE USED

We used ModelSim and Quartus II. Each is described briefly below.

4.1 MODELSIM:

ModelSim: High Performance and Capacity Mixed HDL Simulation

Mentor Graphics was the first to combine single kernel simulator
(SKS) technology with a unified debug environment for Verilog, VHDL,
and SystemC. The combination of native SKS performance with an
integrated debug and analysis environment makes ModelSim a common
choice of simulator for both ASIC and FPGA design. Its standards and
platform support make it easy to adopt in the majority of process and
tool flows.

ModelSim-Altera Edition

Recommended for simulating all FPGA designs (Cyclone, Arria, and Stratix series FPGA designs)
33 percent faster simulation performance than ModelSim-Altera Starter Edition
No line limitations
Priced at $945

ModelSim-Altera Starter Edition

Support for simulating small FPGA designs
Limited to 10,000 executable lines

4.2 QUARTUS II:

Quartus II is a software tool produced by Altera for analysis and
synthesis of HDL designs, which enables the developer to compile
their designs, perform timing analysis, examine RTL diagrams,
simulate a design's reaction to different stimuli, and configure the
target device with the programmer.

Quartus II Web Edition

The Web Edition is a free version of Quartus II that can be
downloaded or delivered by mail. This edition provides compilation
and programming for a limited number of Altera devices. The low-cost
Cyclone family of FPGAs is fully supported by this edition, as well as
the MAX family of CPLDs, meaning small developers and educational
institutions have no overheads from the cost of development software.

License registration is required to use the Web Edition of Quartus II;
registration is free and can be renewed an unlimited number of times.

Quartus II Subscription Edition

Quartus II Subscription Edition is also available for free download,
but a license must be paid for to use the full functionality in the
software. The free Web Edition license can be used on this software,
restricting the devices that can be used.

CHAPTER-5
IMPLEMENTATION OF ALGORITHM
5.1 OVERVIEW:
A primary objective of this project was to develop a synthesizable
model for the AES128 encryption algorithm. Synthesis is the process of
converting the register transfer level (RTL) representation of a design into
an optimized gate-level netlist. This is a major step in ASIC design flow
that takes an RTL model closer to a low-level hardware implementation.

Synthesis consists of three main steps. The first step is
translation, which involves converting the RTL description of a
design into a non-optimized intermediate representation that is used by
the synthesis tool. The second step is logic optimization, which
optimizes the internal representation by removing redundant logic and
performing Boolean logic optimizations. The third step is technology
mapping and optimization, which maps the internal representation to an
optimized gate-level representation using the technology library cells
based on design constraints. [3]

In this chapter, we describe how the Synopsys Design Compiler tool
was used to synthesize the verified AES128 model, using a script
developed to perform the synthesis under certain constraints. The
script generates several reports about the synthesis outcome,
including timing and area estimates.

5.2 SYNTHESIS METHODOLOGY:


The first step in the synthesis process is to read all the components in
the design hierarchy. There are three components in the 3-level design
hierarchy that need to be synthesized. Since the RTL model uses a
Verilog package, the synthesis tool needs to enable the semantics of a
package. In addition, the synthesis tool needs to know whether an
automatic function is called from multiple places in the design, so
that it preserves separate values for each instance.

After the design files are read, they are analyzed and elaborated,
which converts the RTL code into the Synopsys Design Compiler (SDC)
internal format. [6] The intermediate results are stored in the defined
working library.

After this step, a 40 MHz clock signal is applied to the clock port of
the root module, and the synthesis tool is instructed not to modify the
clock tree during the optimization phase. In addition, an arbitrary
delay of 5 ns with respect to the clock port is applied to all input and
output ports (except the clock port itself) to set a safe margin for
any unintended source of delay, such as the delay associated with the
driving module or modules.

Then, the design is constrained with a hypothetical maximum area of
zero to force the tool to make the gate-level netlist as compact as
possible.

In the next steps, the tool is instructed to create a unique design
for each cell instance by removing the multiply-instantiated hierarchy
in the current design. Then, the synthesis script removes the boundaries
from all the components in the design hierarchy and removes all levels
of hierarchy.

Finally, the tool compiles the design with high effort and reports any
warnings related to the mapping and final optimization step. At the end,
the tool generates reports for the optimized gate-level netlist area,
the worst combinational path timing, and any violated design
constraints.

5.3 SYNTHESIS TIMING RESULT:


The synthesis tool optimizes the combinational paths in a design. In
general, four types of combinational paths can exist in any design: [3]
1- Input port of the design under test to input of one internal
flip-flop
2- Output of an internal flip-flop to input of another flip-flop
3- Output of an internal flip-flop to output port of the design under
test
4- A combinational path connecting the input and output ports of
the design under test

The last DC command in the script developed in the previous section
instructs the tool to report the path with the worst timing. In this
case, the path with the worst timing is a combinational path of type
two. The delay associated with this path is the sum of the delays of all
combinational gates in the path plus the Clock-To-Q delay of the
originating flip-flop, which was calculated as 24.09 ns. Adding the
setup time of the destination flip-flop in this path, which is 0.85 ns,
gives 24.94 ns. This is within the 25 ns period of the 40 MHz clock, so
the clock constraint is met on the worst combinational path. The delays
of combinational gates, setup time of flip-flops and Clock-To-Q values
are derived from the LSI_10K library file that was used for the mapping
step during synthesis.
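
The timing check described above can be reproduced with a few lines of
arithmetic. The sketch below uses only the figures quoted in the text;
the function and variable names are illustrative, and it assumes an
ideal clock with no skew or uncertainty, which the text does not
discuss.

```python
CLOCK_MHZ = 40.0
PATH_DELAY_NS = 24.09  # Clock-To-Q plus combinational gate delays (from the text)
SETUP_NS = 0.85        # setup time of the destination flip-flop (from the text)


def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def setup_slack_ns(period_ns, path_ns, setup_ns):
    """Positive slack means the register-to-register path meets timing."""
    return period_ns - (path_ns + setup_ns)


period = clock_period_ns(CLOCK_MHZ)
slack = setup_slack_ns(period, PATH_DELAY_NS, SETUP_NS)
# Highest clock frequency this path alone would allow (ideal clock assumed).
max_freq_mhz = 1000.0 / (PATH_DELAY_NS + SETUP_NS)

print(f"period = {period:.2f} ns, slack = {slack:.2f} ns")
```

The slack of roughly 0.06 ns is positive but small, so the design meets
the 40 MHz constraint with very little margin.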
The synthesis timing report is shown below:

5.4 SYNTHESIS AREA RESULT:
The synthesis area report shows the total number of cells and nets in
the netlist. It also uses the area parameter associated with each cell in the
LSI_10K library file, to calculate the total combinational and sequential
area of the netlist. The total area of the gate level netlist is unknown
since it depends on total area of the interconnects, which itself is a
function of the wiring load model used in physical design. The total cell
area in the netlist is reported as 22978 units, which is the sum of
combinational and sequential areas. The synthesis area report is shown
below:

5.5 SYNTHESIS CONSTRAINT VIOLATORS RESULT:


To force the synthesis tool to create the most compact netlist, the
area of the gate-level netlist was constrained to zero during the
synthesis process. As a result, the only constraint violation, which is
expected, is related to the area, as shown below:

CHAPTER-6
RESULT AND DISCUSSION:
MODELSIM OUTPUT:

Figure 16. Simulated output.

AREA UTILIZATION REPORT:

Figure 17. Flow summary report.

PERFORMANCE REPORT:

Figure 18. Fmax summary report of the slow corner.

PERFORMANCE REPORT:

Figure 19. Fmax summary report of the fast corner.

SYNTHESIS REPORT:

Figure 20. RTL schematic report.

MAP VIEWER:

Technology map viewer

POWER ANALYSIS:

Fig. Power dissipation report

CONCLUSIONS AND FUTURE WORK

In our project we studied the concept of cryptography, including the
various schemes classified by the kind of key and a few algorithms
such as RSA and AES. We studied in detail the mathematical foundations
of AES-based systems, namely the concepts of rings, fields, groups, and
Galois (finite) fields and their properties. The various algorithms for
the computation of the scalar product of a point were studied and their
complexity was analyzed.

The advantages of this scheme over other fault detection systems are
demonstrated by the reported parameters. Its key strength in comparison
to other systems is that fault detection is implemented at all levels
of the algorithm implementation, which increases reliability.

FUTURE WORK
Future research should also design and implement a fault detection
scheme for the AES that is independent of the internal structure of the
S-box (and inverse S-box). For this reason, and considering that the
S-box and inverse S-box consist of inversion in GF(2^8) and an affine
(inverse affine) transformation, we denote the input and the output of
the multiplicative inversion section of the S-box as the 8-bit values I
and I^-1, respectively. The above-mentioned scheme can also be expanded
to a multiple-bit parity scheme, depending on the fault coverage needed
and the critical path delay and area overhead that can be tolerated;
i.e., instead of one-bit parity, one can use n-bit parity. As n
increases, the timing and area overhead increase while better fault
coverage is achieved.
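
The two ideas in this paragraph can be illustrated with a short Python
sketch: the S-box as a GF(2^8) inversion followed by the affine
transformation, and the effect of moving from one-bit to n-bit parity.
This is an assumption-laden software illustration, not the hardware
scheme proposed in this report. The helper names (gf_mul, gf_inv,
affine, sbox, group_parities, fault_detected) and the way the byte is
split into equal parity groups are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(x):
    """Multiplicative inverse as x^254; 0 maps to 0 by AES convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


def affine(y):
    """AES affine transformation over GF(2) with constant 0x63."""
    out = 0
    for i in range(8):
        bit = ((y >> i) ^ (y >> ((i + 4) % 8)) ^ (y >> ((i + 5) % 8))
               ^ (y >> ((i + 6) % 8)) ^ (y >> ((i + 7) % 8))
               ^ (0x63 >> i)) & 1
        out |= bit << i
    return out


def sbox(x):
    """S-box value: inversion in GF(2^8), then the affine transformation."""
    return affine(gf_inv(x))


def parity(x):
    """Even/odd parity (0 or 1) of the set bits in x."""
    return bin(x).count("1") & 1


def group_parities(value, n, width=8):
    """Split `width` bits into n equal groups (hypothetical grouping)."""
    size = width // n
    mask = (1 << size) - 1
    return [parity((value >> (g * size)) & mask) for g in range(n)]


def fault_detected(good, faulty, n):
    """True if the n-bit parity of the faulty value differs from the good one."""
    return group_parities(good, n) != group_parities(faulty, n)


good = sbox(0x53)
faulty = good ^ 0x11  # inject a double-bit fault: bit 0 and bit 4 flipped
print(fault_detected(good, faulty, 1), fault_detected(good, faulty, 2))
```

In this example the double-bit fault leaves the overall parity unchanged
and escapes one-bit parity, while two-bit parity (one bit per nibble)
catches it. Each extra parity bit would cost additional prediction logic
in hardware, which is the overhead trade-off noted above.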

Finally, considering the power consumption as one of the critical
factors in FPGA or ASIC designs, the mentioned fault detection schemes
can be designed and optimized for low power implementations.

REFERENCES:
[1] M. Akkar and C. Giraud, An Implementation of DES and AES,
Secure against Some Attacks, In Proc. of the Workshop on
Cryptographic Hardware and Embedded Systems (CHES2001), Paris,
France, pp. 315-325, May 2001.
[2] http://www.altera.com/products/software/products/quartus2/qts-
index.html
[3] R. Anderson, E. Biham, and L. Knudsen, Serpent: A Proposal for the
Advanced Encryption Standard, AES algorithm submission, June 1998.
[4] G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri, Error
Analysis and Detection Procedures for a Hardware Implementation of the
Advanced Encryption Standard, IEEE Trans. on Computers, vol. 52, no.
4, pp. 492-505, April 2003.
[5] G. Bertoni, L. Breveglieri, I. Koren, and P. Maistri, An efficient
hardware-based fault diagnosis scheme for AES: performances and cost,
In Proc. of the IEEE International Symposium on Defect and Fault
Tolerance in VLSI Systems (DFT2004), Cannes, France, pp. 130-138,
Oct. 2004.
[6] D. Boneh, R. A. DeMillo, and R. J. Lipton, On the Importance of
Eliminating Errors in Cryptographic Computations, Journal of
Cryptology, vol. 14, no. 2, pp. 101-119, 2001.
[7] L. Breveglieri, I. Koren, and P. Maistri, Incorporating Error
Detection and Online Reconfiguration into a Regular Architecture for the
Advanced Encryption Standard, In Proc. of the IEEE International
Symposium on Defect and Fault Tolerance in VLSI Systems (DFT2005),
Monterey, CA, USA, pp. 72-80, Oct. 2005.
[8] C. Burwick et al., MARS-A Candidate Cipher for AES, AES
algorithm submission, August 1999, available at http://www.nist.gov/.

[9] D. Canright, A Very Compact Rijndael S-box, Naval Postgraduate
School Technical Report: NPS-MA-05-001, May 2005.
[10] G. C. Cardarilli, M. Ottavi, S. Pontarelli, M. Re, and A. Salsano,
Fault localization, error correction, and graceful degradation in radix 2
signed digit-based adders, IEEE Trans. on Computers, vol. 55, no. 5, pp.
534-540, May 2006.
[11] G. C. Cardarilli, S. Pontarelli, M. Re, and A. Salsano, A self
checking Reed Solomon encoder: design and analysis, In Proc. of the
IEEE International Symposium on Defect and Fault Tolerance in VLSI
Systems (DFT2005), Monterey, CA, USA, pp. 111-119, Oct. 2005.
[12] A. Elbirt, W. Yip, B. Chetwynd, and C. Paar, An FPGA-based
performance evaluation of the AES block cipher candidate algorithm
finalists, IEEE Trans. of VLSI Systems, pp. 545-557, August 2001.
[13] S. Fenn, M. Gossel, M. Benaissa, and D. Taylor, On-Line Error
Detection for Bit-Serial Multipliers in GF(2^m ), Journal of Electronic
Testing: Theory and Applications, vol. 13, no. 1, August 1998.
[14] A. Hodjat and I. Verbauwhede, Area-Throughput Trade-Offs for
Fully Pipelined 30 to 70 Gbits/s AES Processors, IEEE Trans. on
Computers, vol. 55, no. 4, pp. 366-372, April 2006.
[15] T. Ichikawa et al., Hardware Evaluation of the AES Finalists, In
Proc. 3rd AES Candidate Conference, New York, April 2000.
[16] R. Karri, W. Kaijie, P. Mishra, and K. Yongkook, Fault-based Side-
Channel Cryptanalysis Tolerant Rijndael Symmetric Block Cipher
Architecture, In Proc. of the IEEE International Symposium on Defect
and Fault Tolerance in VLSI Systems (DFT2001), San Francisco, CA,
USA, pp. 418-426, 2001.
[17] M. Karpovsky, K. J. Kulikowski, and A. Taubin, Differential Fault
Analysis Attack Resistant Architectures for the Advanced Encryption
Standard, CARDIS 04: Sixth Smart Card Research and Advanced
Application IFIP Conference, Toulouse, France, vol. 153, pp. 177-192,
August 2004.
[18] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their
Applications, Cambridge University Press, 1994.
[19] S. Lin and D. J. Costello, Error Control Coding, Prentice Hall,
second edition, Upper Saddle River, NJ, USA, 2004.
[20] T. G. Malkin, F. Standaert, and M. Yung, A Comparative
Cost/Security Analysis of Fault Attack Countermeasures, In Proc. of the
Workshop on Fault Diagnosis and Tolerance in Cryptography
(FDTC2006), Yokohama, Japan, pp. 159-172, Oct. 2006.
