INTRODUCTION
1.1 Introduction
Security is a broad topic that covers a multitude of sins. Most security problems are intentionally
caused by malicious people trying to gain some benefit or harm someone. The requirements of information
security have undergone two major changes in the last two decades. In earlier days, sensitive documents
were stored in cabinets with a combination lock. With the introduction of the computer, the need for
automated tools for protecting files and other information became evident. This is especially important
for shared systems and for data networks such as the Internet. The generic term for the collection of
tools designed to protect data and thwart hackers is computer security.
In today's digital world, the Internet is involved in nearly every step of daily life, such as banking,
payments and other financial transactions, so the importance of network security is also increasing.
Security forms the backbone of today's digital world.
1.1.1 Aim
To implement an area-efficient universal cryptography processor for smart cards.
2. Three versions of the algorithm are available differing only in the key generation procedure and in the
number of rounds the data is processed for a complete encryption (decryption).
3. The 128-bit input data is considered as a 4 × 4 array of 8-bit bytes (also called the state in the algorithm).
1.2 Objectives
The objective of this project is to find concurrent, structure-independent fault detection schemes that
reach reasonable fault coverage. Such schemes make the implementation of AES robust against fault
attacks while keeping efficiency high, with reasonable area and time overheads.
CHAPTER 2
GENERAL THEORY
2.1 Introduction to VLSI
Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining
thousands of transistors into a single chip. VLSI began in the 1970s when complex semiconductor and
communication technologies were being developed. The microprocessor is a VLSI device. Before the
introduction of VLSI technology most ICs had a limited set of functions they could perform. An electronic
circuit might consist of a CPU, ROM, RAM and other glue logic. VLSI lets IC designers add all of these
into one chip.
Overview:
The first semiconductor chips held one transistor each. Subsequent advances added more and more
transistors, and as a consequence, more individual functions or systems were integrated over time. The first
integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and
capacitors, making it possible to fabricate one or more logic gates on a single device. This is now known
retrospectively as "small-scale integration" (SSI). Improvements in technique led to devices with hundreds
of logic gates, known as medium-scale integration (MSI), and then to large-scale integration (LSI), i.e.,
systems with at least a thousand logic gates. Current technology has moved far past this mark, and today's
microprocessors have many millions of gates and hundreds of millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale integration above
VLSI. Terms like ultra-large-scale integration (ULSI) were used. But the huge number of gates and
transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater
than VLSI levels of integration are no longer in widespread use. Even VLSI is now somewhat quaint, given
the common assumption that all microprocessors are VLSI or better.
As of early 2008, billion-transistor processors are commercially available, an example of which is
Intel's Montecito Itanium chip. This is expected to become more commonplace as semiconductor
fabrication moves from the current generation of 65 nm processes to the next 45 nm generation (while
experiencing new challenges such as increased variation across process corners). Another notable example is
NVIDIA's 280 series GPU.
JOGINPALLY M.N. RAO WOMENs ENGINEERING COLLEGE
This GPU is unique in that its 1.4 billion transistors, capable of a teraflop of performance, are
almost entirely dedicated to logic (Itanium's transistor count is largely due to its 24 MB L3 cache).
Current designs, as opposed to the earliest devices, use extensive design automation and automated logic
synthesis to lay out the transistors, enabling higher levels of complexity in the resulting logic
functionality. Certain high-performance logic blocks like the SRAM cell, however, are still designed by
hand to ensure the highest efficiency (sometimes by bending or breaking established design rules to obtain
the last bit of performance by trading stability).
What is VLSI?
VLSI stands for "Very Large Scale Integration". This is the field which involves packing more and
more logic devices into smaller and smaller areas.
4. Low-cost terminals for Web browsing still require sophisticated electronics, despite their dedicated
function.
5. Personal computers and workstations provide word-processing, financial analysis, and games.
Computers include both central processing units (CPUs) and special-purpose hardware for disk access,
faster screen display, etc.
6. Medical electronic systems measure bodily functions and perform complex processing algorithms to
warn about unusual conditions. The availability of these complex systems, far from overwhelming
consumers, only creates demand for even more complex systems.
The growing sophistication of applications continually pushes the design and manufacturing of
integrated circuits and electronic systems to new levels of complexity.
Perhaps the most amazing characteristic of this collection of systems is its variety: as systems
become more complex, we build not a few general-purpose computers but an ever wider range of
special-purpose systems. Our ability to do so is a testament to our growing mastery of both integrated
circuit manufacturing and design, but the increasing demands of customers continue to test the limits of
design and manufacturing.
Development of Cryptography
The history of cryptography begins thousands of years ago. Until recent decades, it has been the story
of what might be called classic cryptography, that is, of methods of encryption that use pen and paper, or
perhaps simple mechanical aids. In the early 20th century, the invention of complex mechanical and
electromechanical machines, such as the Enigma rotor machine, provided more sophisticated and efficient
means of encryption; and the subsequent introduction of electronics and computing has allowed elaborate
schemes of still greater complexity, most of which are entirely unsuited to pen and paper.
The development of cryptography has been paralleled by the development of cryptanalysis, the
"breaking" of codes and ciphers. The discovery and application, early on, of frequency analysis to the reading
of encrypted communications has on occasion altered the course of history. Thus the Zimmermann
Telegram triggered the United States' entry into World War I; and Allied reading of Nazi Germany's ciphers
shortened World War II, in some evaluations by as much as two years.
Until the 1970s, secure cryptography was largely the preserve of governments. Two events have since
brought it squarely into the public domain: the creation of a public encryption standard (DES), and the
invention of public-key cryptography.
Need of Cryptography
The main uses of cryptography are listed below:
1) Privacy or confidentiality
2) Data integrity
3) Authentication
4) Non-repudiation
1. Confidentiality is a service used to keep the content of information from all but those authorized to
possess it. Secrecy is a term synonymous with confidentiality and privacy. There are numerous
approaches to providing confidentiality, ranging from physical protection to mathematical algorithms
which render data unintelligible.
2. Data integrity is a service which addresses the unauthorized alteration of data. To assure data
integrity, one must have the ability to detect data manipulation by unauthorized parties. Data
manipulation includes such things as insertion, deletion, and substitution.
3. Authentication is a service related to identification. This function applies to both entities and
information itself. Two parties entering into a communication should identify each other. Information
delivered over a channel should be authenticated as to origin, date of origin, data content, time sent,
etc. For these reasons this aspect of cryptography is usually subdivided into two major classes: Entity
authentication and data origin authentication. Data origin authentication implicitly provides data
integrity (for if a message is modified, the source has changed).
4. Non-repudiation is a service which prevents an entity from denying previous commitments or actions.
When disputes arise due to an entity denying that certain actions were taken, a means to resolve the
situation is necessary. For example, one entity may authorize the purchase of property by another
entity and later deny such authorization was granted. A procedure involving a trusted third party is
needed to resolve the dispute.
A fundamental goal of cryptography is to adequately address these four areas in both theory and
practice. Cryptography is about the prevention and detection of cheating and other malicious activities,
and about securing sensitive information.
While cryptography is the science of securing data, cryptanalysis is the science of analyzing and
breaking secure communication. Classical cryptanalysis involves an interesting combination of analytical
reasoning, application of mathematical tools, pattern finding, patience, determination, and luck. Cryptanalysts
are also called attackers. Cryptology embraces both cryptography and cryptanalysis.
A related discipline is steganography, which is the science of hiding messages rather than making
them unreadable. Steganography is not cryptography; it is a form of coding. It relies on the secrecy of the
mechanism used to hide the message. If, for example, you encode a secret message by putting each letter as
the first letter of the first word of every sentence, it is secret until someone knows to look for it, and
then it provides no security at all.
How Does Cryptography Work?
A cryptographic algorithm, or cipher, is a mathematical function used in the encryption and
decryption process. A cryptographic algorithm works in combination with a key (a word, number, or
phrase) to encrypt the plaintext. The same plaintext encrypts to different ciphertext with different keys.
The security of encrypted data is entirely dependent on two things: the strength of the cryptographic
algorithm and the secrecy of the key. A cryptographic algorithm, plus all possible keys and all the
protocols that make it work, comprise a cryptosystem. PGP is a cryptosystem.
In cryptographic systems, the term key refers to a numerical value used by an algorithm to alter
information, making that information secure and visible only to individuals who have the corresponding key
to recover the information.
2.8.1
Secret key cryptography is also known as symmetric key cryptography. With this type of cryptography,
both the sender and the receiver know the same secret code, called the key. Messages are encrypted by the
sender using the key and decrypted by the receiver using the same key.
This method works well if you are communicating with only a limited number of people, but it
becomes impractical to exchange secret keys with large numbers of people. In addition, there is also the
problem of how you communicate the secret key securely.
Block diagram of secret key cryptography: Plaintext → Encryption → Ciphertext → Decryption → Plaintext
Self-synchronizing stream ciphers calculate each bit in the keystream as a function of the previous n
bits in the keystream.
Synchronous stream ciphers generate the keystream in a fashion independent of the message stream
but by using the same keystream generation function at sender and receiver.
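The synchronous case can be illustrated with a small sketch. It is only a toy, written under the assumption that a SHA-256 hash of the key and a block counter is an acceptable stand-in for a keystream generator; the names `keystream` and `xor_stream` are hypothetical and do not come from any standard. The point is that the keystream depends only on the shared key, so sender and receiver generate it independently of the message.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream generator (assumption): SHA-256 of key + block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the keystream (XOR is its own inverse)."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

message = b"payroll data"
ciphertext = xor_stream(b"shared secret", message)    # sender
recovered = xor_stream(b"shared secret", ciphertext)  # receiver, same key
other = xor_stream(b"another key", message)           # same plaintext, different key
```

Because both sides run the same generator with the same secret key, decryption is the same operation as encryption. Encrypting the same plaintext under a different key gives a different ciphertext. This sketch must not be used to protect real data.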
Block Ciphers
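Block ciphers can operate in one of several modes; the four most important are Electronic Codebook (ECB),
Cipher Block Chaining (CBC), Cipher Feedback (CFB) and Output Feedback (OFB). Secret key algorithms in
common use include:
i. DES
ii. TRIPLE-DES
iii. BLOWFISH
iv. IDEA
v. RC4 (a stream cipher)
vi. RC5
vii. TwoFish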
Ann wants to communicate secretly with Bill. Ann encrypts her message using Bill's public key
(which Bill made available to everyone) and Ann sends the scrambled message to Bill.
When Bill receives the message, he uses his private key to unscramble the message so that he can read
it.
When Bill sends a reply to Ann, he scrambles the message using Ann's public key.
When Ann receives Bill's reply, she uses her private key to unscramble his message.
The major advantage asymmetric encryption offers over symmetric key cryptography is that senders and
receivers do not have to exchange secret keys in advance; secure communication is possible using only the
public keys.
Block diagram of public-key cryptography: Plaintext → Encryption (public key) → Ciphertext → Decryption (private key) → Plaintext
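The exchange between Ann and Bill can be sketched with textbook RSA. This is an illustration under stated assumptions: RSA is chosen here only as one well-known public-key scheme (the text above does not name one), the primes 61 and 53 are far too small for real use, and no padding is applied. The function names are hypothetical.

```python
# Bill's key pair, built from toy primes (real keys use primes of 1024 bits or more).
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def encrypt(m: int, public_key) -> int:
    """Anyone holding the public key can encrypt."""
    exponent, modulus = public_key
    return pow(m, exponent, modulus)

def decrypt(c: int, private_key) -> int:
    """Only the holder of the private key can decrypt."""
    exponent, modulus = private_key
    return pow(c, exponent, modulus)

bill_public, bill_private = (e, n), (d, n)
message = 65                                    # a message encoded as a number < n
ciphertext = encrypt(message, bill_public)      # Ann's side
recovered = decrypt(ciphertext, bill_private)   # Bill's side
```

Ann never needs Bill's private exponent `d`; only `(e, n)` is published.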
Digital Signatures:
1. A major benefit of public key cryptography is that it provides a method for employing digital
signatures. Digital signatures let the recipient of information verify the authenticity of the
information's origin, and also verify that the information was not altered while in transit. Thus, public
key digital signatures provide authentication and data integrity. These features are every bit as
fundamental to cryptography as privacy, if not more.
2. A digital signature serves the same purpose as a seal on a document, or a handwritten signature.
However, because of the way it is created, it is superior to a seal or signature in an important way. A
digital signature not only attests to the identity of the signer, but it also shows that the contents of the
information signed have not been modified. A physical seal or handwritten signature cannot do that.
However, like a physical seal that can be created by anyone with possession of the signet, a digital
signature can be created by anyone with the private key of that signing keypair.
3. Some people tend to use signatures more than they use encryption. For example, you may not care if
anyone knows that you just deposited $1,000 in your account, but you do want to be darn sure it was
the bank teller you were dealing with.
4. The basic manner in which digital signatures are created is shown in the following figure. The
signature algorithm uses your private key to create the signature and the public key to verify it. If the
information can be decrypted with your public key, then it must have originated with you.
Block diagram of private key and public key: Original Text → Signing (Private Key) → Signed Text → Verifying (Public Key) → Verified Text
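The signing and verifying flow can be sketched in the same toy setting. The assumptions are loud ones: textbook RSA with tiny primes, and a SHA-256 digest reduced modulo n so that it fits the toy modulus. Real signature schemes use large keys and a proper padding scheme. The names `digest`, `sign` and `verify` are hypothetical.

```python
import hashlib

# Signer's toy RSA key pair (assumption: illustrative sizes only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

def digest(message: bytes) -> int:
    """Hash the message and reduce it modulo n so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes, private_key) -> int:
    """Create the signature with the private key."""
    exponent, modulus = private_key
    return pow(digest(message), exponent, modulus)

def verify(message: bytes, signature: int, public_key) -> bool:
    """Check the signature with the public key."""
    exponent, modulus = public_key
    return pow(signature, exponent, modulus) == digest(message)

text = b"deposit $1,000"
signature = sign(text, (d, n))
is_valid = verify(text, signature, (e, n))
```

With a realistic modulus, a modified message would fail verification; with this toy modulus, digests of different messages can occasionally collide, which is one reason the sketch is not secure.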
The advantages of public-key cryptography compared with secret-key cryptography are as follows:
i. The primary advantage of public-key cryptography is increased security and convenience: private keys
never need to be transmitted or revealed to anyone. In a secret-key system, by contrast, the secret keys
must be transmitted (either manually or through a communication channel), and there may be a chance
that an enemy can discover the secret keys during their transmission.
ii. Another major advantage of public-key systems is that they can provide a method for digital
signatures. Authentication via secret-key systems requires the sharing of some secret and sometimes
requires trust of a third party as well. As a result, a sender can repudiate a previously authenticated
message by claiming that the shared secret was somehow compromised by one of the parties sharing
the secret. For example, the Kerberos secret-key authentication system involves a central database that
keeps copies of the secret keys of all users; an attack on the database would allow widespread forgery.
Public-key authentication, on the other hand, prevents this type of repudiation; each user has sole
responsibility for protecting his or her private key. This property of public-key authentication is often
called non-repudiation.
The disadvantages of public-key cryptography compared with secret-key cryptography are as follows:
i. A disadvantage of using public-key cryptography for encryption is speed: there are popular secret-key
encryption methods that are significantly faster than any currently available public-key encryption
method. Nevertheless, public-key cryptography can be used with secret-key cryptography to get the
best of both worlds. For encryption, the best solution is to combine public- and secret-key systems in
order to get both the security advantages of public-key systems and the speed advantages of secret-key
systems. The public-key system can be used to encrypt a secret key which is used to encrypt the bulk
of a file or message. Such a protocol is called a digital envelope.
ii.
are not available. A successful attack on a certification authority will allow an adversary to
impersonate whomever the adversary chooses to by using a public-key certificate from the
compromised authority to bind a key of the adversary's choice to the name of another user.
iii.
In some situations, public-key cryptography is not necessary and secret-key cryptography alone is
sufficient. This includes environments where secure secret-key agreement can take place, for example
by users meeting in private. It also includes environments where a single authority knows and
manages all the keys, e.g., a closed banking system. Since the authority knows everyone's keys
already, there is not much advantage for some to be "public" and others "private." Also, public-key
cryptography is usually not necessary in a single-user environment. For example, if you want to keep
your personal files encrypted, you can do so with any secret-key encryption algorithm using, say, your
personal password as the secret key. In general, public-key cryptography is best suited for an open
multi-user environment.
iv.
Public-key cryptography is not meant to replace secret-key cryptography, but rather to supplement it,
to make it more secure. The first use of public-key techniques was for secure key exchange in an
otherwise secret-key system; this is still one of its primary functions. Secret-key cryptography remains
extremely important and is the subject of much ongoing study and research. Some secret-key
cryptosystems are discussed in the sections on block ciphers and stream ciphers.
So if XYZ is a new client for ABC, ABC must send XYZ a copy of the secret key so that XYZ
can then encrypt its payroll information and transmit it to ABC. ABC, using the same key, decrypts
XYZ's information and processes the payroll data. Since a system is only as strong as its weakest link,
key security during transmission becomes as important for XYZ as encrypting the data.
3. As mentioned earlier, public key cryptography lends itself to a new technology called digital
signatures. Digital signatures involve a reversing of the normal public/private encryption/decryption
process. Here is an example that demonstrates its use. Suppose Mary wants to send the ABC company
a request for a special document. Before the ABC company can send that document, they must be
assured that the requestor is actually Mary.
A digital signature can verify Mary's validity to ABC in the following way. Mary first encrypts
her name using her private key. She then encrypts the request along with the encrypted name using the
ABC company's well-known public key. When the ABC company receives the message, it decrypts
the request using its private key and then decrypts the signature using Mary's well-publicized public
key. If the name decrypts successfully, then it must be Mary's signature since she is the only one who
could have encrypted it with her secret private key. The request can be safely processed.
4. Digital signatures are gaining popularity in many Internet transactions involving signature verification
such as contracts and other legal negotiations as well as court documents. Recent enhancements to
digital signatures include digital time stamps. Digital timestamps apply a "when" criterion to a digital
signature by attaching a widely publicized summary number to the signature.
That summary number is only produced at some given point in time, essentially linking that
signature to a certain date/time. It's an especially effective technology since it doesn't rely on the
security of keys.
5. As mentioned earlier, for large documents the use of public key cryptography is prohibitive because
public key encryption is so slow. By using something called a digital envelope, the best of both
symmetric (speed) and public key (security) cryptography can be used. Here is an
example of how a digital envelope works. Mary wants to send a very large document to her main
office overseas. Because of its sensitivity, Mary believes it should be sent using public key
cryptography but knows she can't because it's too large. She decides to use a digital envelope.
6. Mary first creates a special session key and uses this key to symmetrically encrypt her document. That
is, she uses a symmetric cryptographic algorithm. She then encrypts the session key with her
organization's public key. So now the document is encrypted using symmetric cryptography and the
key that encrypted it is encrypted using public key cryptography. The encrypted key is called the
digital envelope. She then transmits both the encrypted key and the encrypted document to the main office.
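Mary's digital envelope can be sketched as follows. Everything here is a toy under stated assumptions: textbook RSA with tiny primes stands in for the public key system, and a hash-based XOR stream stands in for the symmetric algorithm. The names `seal`, `open_envelope` and `xor_stream` are hypothetical.

```python
import hashlib
import secrets

# Main office's toy RSA key pair (assumption: illustrative sizes only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256 based keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(document: bytes, public_key):
    """Encrypt the document with a fresh session key, then wrap that key."""
    exponent, modulus = public_key
    session_key = secrets.randbelow(modulus - 2) + 2          # random value in [2, modulus - 1]
    body = xor_stream(session_key.to_bytes(2, "big"), document)
    envelope = pow(session_key, exponent, modulus)            # the digital envelope
    return envelope, body

def open_envelope(envelope: int, body: bytes, private_key) -> bytes:
    """Recover the session key with the private key, then decrypt the document."""
    exponent, modulus = private_key
    session_key = pow(envelope, exponent, modulus)
    return xor_stream(session_key.to_bytes(2, "big"), body)

document = b"very large confidential report " * 100
envelope, body = seal(document, (e, n))
restored = open_envelope(envelope, body, (d, n))
```

Only the short session key goes through the slow public key operation; the bulk of the document is handled by the fast symmetric step.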
CHAPTER 3
HARDWARE DESCRIPTION
3.1 Advanced Encryption Standard
The Advanced Encryption Standard (AES), also referenced as Rijndael (its original name), is a
specification for the encryption of electronic data established by the U.S. National Institute of Standards and
Technology (NIST) in 2001.
AES is based on the Rijndael cipher developed by Joan Daemen and Vincent Rijmen, who submitted a
proposal to NIST during the AES selection process. Rijndael is a family of ciphers with different key and
block sizes.
For AES, NIST selected three members of the Rijndael family, each with a block size of 128 bits, but
three different key lengths: 128, 192 and 256 bits.
AES has been adopted by the U.S. government and is now used worldwide. It supersedes the Data
Encryption Standard (DES), which was published in 1977. The algorithm described by AES is a
symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.
In the United States, AES was announced by the NIST as U.S. FIPS PUB 197 (FIPS 197) on November
26, 2001. This announcement followed a five-year standardization process in which fifteen competing designs
were presented and evaluated, before the Rijndael cipher was selected as the most suitable.
AES became effective as a federal government standard on May 26, 2002 after approval by
the Secretary of Commerce. AES is included in the ISO/IEC 18033-3 standard. AES is available in many
different encryption packages, and is the first publicly accessible and open cipher approved by the National
Security Agency (NSA) for top secret information when used in an NSA approved cryptographic module.
The name Rijndael (Dutch pronunciation: [ˈrɛindaːl]) is a play on the names of the two inventors (Joan
Daemen and Vincent Rijmen). It is also a combination of the Dutch name for the Rhine river and a Dale.
ii. AES allows for three different key lengths: 128, 192, or 256 bits. Most of our discussion will assume
that the key length is 128 bits. With a key length other than 128 bits, the main thing that changes in
AES is how the key schedule is generated from the key. The notion of a key schedule in AES is
explained later in this chapter.
Block diagram of Advanced Encryption Standard.
iii. Encryption consists of 10 rounds of processing for 128-bit keys, 12 rounds for 192-bit keys, and 14
rounds for 256-bit keys.
iv. Except for the last round in each case, all other rounds are identical.
v. Each round of processing includes one single-byte based substitution step, a row-wise permutation
step, a column-wise mixing step, and the addition of the round key. The order in which these four
steps are executed is different for encryption and decryption.
vi. To appreciate the processing steps used in a single round, it is best to think of a 128-bit block as
consisting of a 4 × 4 matrix of bytes, arranged as follows:
    byte0  byte4  byte8   byte12
    byte1  byte5  byte9   byte13
    byte2  byte6  byte10  byte14
    byte3  byte7  byte11  byte15
vii. Therefore, the first four bytes of a 128-bit input block occupy the first column in the 4 × 4 matrix of
bytes. The next four bytes occupy the second column, and so on.
x. Each round of processing works on the input state array and produces an output state array.
xi. The output state array produced by the last round is rearranged into a 128-bit output block.
xii. Unlike DES, the decryption algorithm differs substantially from the encryption algorithm. Although,
overall, the same steps are used in encryption and decryption, the order in which the steps are carried
out is different, as mentioned previously.
xiii. AES, notified by NIST as a standard in 2001, is a slight variation of the Rijndael cipher invented by
two Belgian cryptographers, Joan Daemen and Vincent Rijmen.
xiv. Whereas AES requires the block size to be 128 bits, the original Rijndael cipher works with any block
size (and any key size) that is a multiple of 32 bits between 128 and 256 bits. The state array for the
different block sizes still has only four rows in the Rijndael cipher. However, the number of columns
depends on the size of the block. For example, when the block size is 192, the Rijndael cipher requires a
state array to consist of 4 rows and 6 columns.
xv. DES was based on the Feistel network. On the other hand, what AES uses
is a substitution-permutation network in a more general sense. Each round of processing in AES
involves byte-level substitutions followed by word-level permutations. Speaking generally, DES also
involves substitutions and permutations, except that the permutations are based on the Feistel notion of
dividing the input block into two halves, processing each half separately, and then swapping the two
halves.
xvi. The nature of substitutions and permutations in AES allows for a fast software implementation of the
algorithm.
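The column-by-column arrangement of a 128-bit block into the state array, described in the list above, can be sketched as follows. The helper names `to_state` and `from_state` are hypothetical, and the state is represented as a list of four rows purely for illustration.

```python
def to_state(block: bytes):
    """Arrange 16 input bytes into state[row][col]; byte i lands in row i % 4, column i // 4."""
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes")
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]

def from_state(state) -> bytes:
    """Read the state array back out column by column into a 16-byte block."""
    return bytes(state[row][col] for col in range(4) for row in range(4))

block = bytes(range(16))
state = to_state(block)
first_column = [state[row][0] for row in range(4)]
```

With the input bytes numbered 0 to 15, the first column holds bytes 0 to 3 and the first row holds bytes 0, 4, 8 and 12, and `from_state` returns the original block.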
i. Assuming a 128-bit key, the key is also arranged in the form of a matrix of 4 × 4 bytes. As with the
input block, the first word from the key fills the first column of the matrix, and so on.
ii. The four column words of the key matrix are expanded into a schedule of 44 words. (How exactly
this is done is explained later in this chapter.) Each round consumes four words from the key
schedule.
iii. The figure below depicts the arrangement of the encryption key in the form of 4-byte words and the
expansion of the key into a key schedule consisting of 44 4-byte words.
Fig 3.2 The four words of the original 128-bit key being expanded into a key schedule consisting of 44
words
ii. The number of rounds shown in Fig 3.3, namely 10, is for the case when the encryption key is 128 bits
long.
iii. Before any round-based processing for encryption can begin, the input state array is XORed with the
first four words of the key schedule. The same thing happens during decryption except that now we
XOR the ciphertext state array with the last four words of the key schedule.
iv. For encryption, each round consists of the following four steps: 1) Substitute bytes, 2) Shift rows, 3)
Mix columns, and 4) Add round key. The last step consists of XORing the output of the previous three
steps with four words from the key schedule.
v. For decryption, each round consists of the following four steps: 1) Inverse shift rows, 2) Inverse
substitute bytes, 3) Add round key, and 4) Inverse mix columns. The third step consists of XORing the
output of the previous two steps with four words from the key schedule. Note the differences between
the order in which substitution and shifting operations are carried out in a decryption round vis-a-vis
the order in which similar operations are carried out in an encryption round.
vi. The last round for encryption does not involve the Mix columns step. The last round for decryption
does not involve the Inverse mix columns step.
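The encryption round order in the list above can be put together in a compact software sketch of AES-128. This is an illustration, not the hardware design of this project: the S-box is computed by brute force rather than stored, the state is kept as a flat list of 16 bytes in input order, and all function names are hypothetical. The key expansion details (rotate, substitute, round constant) follow FIPS 197.

```python
def xtime(a):
    """Multiply a byte by 0x02 in GF(2^8): shift left, conditional XOR with 0x1B."""
    a <<= 1
    return (a & 0xFF) ^ 0x1B if a > 0xFF else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result

def make_sbox():
    """S-box: multiplicative inverse (0 maps to 0) followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()

def expand_key(key):
    """Expand a 16-byte key into 11 round keys of 16 bytes each (44 words)."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # rotate, then substitute
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]

def mix_columns(state):
    """Mix each 4-byte column with the fixed MixColumns matrix."""
    out = []
    for c in range(0, 16, 4):
        a0, a1, a2, a3 = state[c:c + 4]
        out += [gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
                a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
                a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
                gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2)]
    return out

def encrypt_block(block, key):
    """Encrypt one 16-byte block with a 16-byte key (10 rounds)."""
    round_keys = expand_key(key)
    state = [b ^ k for b, k in zip(block, round_keys[0])]           # initial key addition
    for rnd in range(1, 11):
        state = [SBOX[b] for b in state]                            # 1) substitute bytes
        state = [state[(i + 4 * (i % 4)) % 16] for i in range(16)]  # 2) shift rows
        if rnd != 10:
            state = mix_columns(state)                              # 3) mix columns (not in last round)
        state = [b ^ k for b, k in zip(state, round_keys[rnd])]     # 4) add round key
    return bytes(state)

key = bytes(range(16))
plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
ciphertext = encrypt_block(plaintext, key)
```

The key and plaintext used here are the AES-128 example vector from FIPS 197, whose expected ciphertext is 69c4e0d86a7b0430d8cdb78070b4c55a.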
Block diagram of overall structure of AES
Fig 3.3 The overall structure of AES for the case of 128-bit encryption key
3.3 Individual blocks
Fig 3.4 One round of encryption is shown at left and one round of decryption at right
3.3.1
In the SubBytes step, each byte of the state is substituted using an 8-bit substitution box, the
Rijndael S-box. This operation provides the non-linearity in the cipher. The S-box used is derived from
the multiplicative inverse over GF($2^8$), known to have good non-linearity properties. To avoid attacks
based on simple algebraic properties, the S-box is constructed by combining the inverse function with an
invertible affine transformation. The S-box is also chosen to avoid any fixed points (and so is
a derangement), i.e., $S(a_{i,j}) \neq a_{i,j}$, and also any opposite fixed points, i.e.,
$S(a_{i,j}) \oplus a_{i,j} \neq \mathrm{FF}_{16}$.
While performing the decryption, the Inverse SubBytes step is used, which requires first taking the
inverse of the affine transformation and then finding the multiplicative inverse. In the SubBytes step,
each byte in the state is replaced with its entry in a fixed 8-bit lookup table, $S$; $b_{i,j} = S(a_{i,j})$.
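The construction of the S-box from the multiplicative inverse and the affine transformation can be sketched as below. The brute-force search for the inverse is an assumption made for clarity (a hardware design would use a lookup table or composite-field arithmetic), and the function names are hypothetical. The affine transformation is written in its rotate-and-XOR form with the constant 0x63 from the AES specification.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def inverse(x):
    """Multiplicative inverse in GF(2^8); 0 has no inverse and maps to itself."""
    if x == 0:
        return 0
    return next(y for y in range(1, 256) if gmul(x, y) == 1)

def sbox_entry(x):
    """Inverse followed by the affine transformation: XOR of four rotations and 0x63."""
    inv = inverse(x)
    s = inv
    for shift in range(1, 5):
        s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return s ^ 0x63

SBOX = [sbox_entry(x) for x in range(256)]

# The inverse S-box used by Inverse SubBytes is the reverse lookup.
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x

no_fixed_points = all(SBOX[x] != x for x in range(256))
no_opposite_fixed_points = all(SBOX[x] ^ x != 0xFF for x in range(256))
```

The two final checks correspond to the fixed-point and opposite-fixed-point conditions stated above.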
Matrix multiplication is composed of multiplication and addition of the entries, and here the
multiplication operation can be defined as follows: multiplication by 1 means no change, multiplication by 2
means shifting to the left, and multiplication by 3 means shifting to the left and then performing XOR with the
initial unshifted value. After shifting, a conditional XOR with 0x1B should be performed if the shifted value
is larger than 0xFF. (These are special cases of the usual multiplication in GF($2^8$).) Addition is simply XOR.
In a more general sense, each column is treated as a polynomial over GF($2^8$) and is then multiplied
modulo $x^4 + 1$ with a fixed polynomial $c(x) = \mathtt{0x03} \cdot x^3 + x^2 + x + \mathtt{0x02}$. The
coefficients are displayed in their hexadecimal equivalent of the binary representation of bit polynomials
from GF(2)[x]. The MixColumns step can also be viewed as a multiplication by a particular MDS matrix in
the finite field GF($2^8$).
In the MixColumns step, each column of the state is multiplied with a fixed polynomial $c(x)$.
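The shift-and-XOR rules for multiplication by 1, 2 and 3 can be sketched directly. The function names are hypothetical; the matrix rows used in `mix_column` are the circulant rows (2 3 1 1), (1 2 3 1), (1 1 2 3), (3 1 1 2) of the MixColumns matrix in the AES specification.

```python
def mul2(a):
    """Multiply by 2: shift left, then XOR with 0x1B if the result exceeds 0xFF."""
    a <<= 1
    if a > 0xFF:
        a = (a & 0xFF) ^ 0x1B
    return a

def mul3(a):
    """Multiply by 3: multiply by 2, then XOR with the unshifted value."""
    return mul2(a) ^ a

def mix_column(column):
    """Apply the MixColumns matrix to one 4-byte column; addition is XOR."""
    a0, a1, a2, a3 = column
    return [
        mul2(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ mul2(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ mul2(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ mul2(a3),
    ]

mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
unchanged = mix_column([0x01, 0x01, 0x01, 0x01])
```

For the first column, 0xDB shifted left is 0x1B6, which exceeds 0xFF, so it becomes 0xB6 XOR 0x1B = 0xAD. The second column is left unchanged because each matrix row XORs to 1.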
3.3.4
For each round, a subkey is derived from the main key using Rijndael's key schedule; each subkey is the
same size as the state. The subkey is added by combining each byte of the state with the corresponding
byte of the subkey using bitwise XOR.
In the AddRoundKey step, each byte of the state is combined with a byte of the round subkey using
the XOR operation ($\oplus$).
byte-oriented approach, it is possible to combine
i. Each round has its own round key that is derived from the original 128-bit encryption key in the
manner described in this section. One of the four steps of each round, for both encryption and
decryption, involves XORing of the round key with the state array.
ii. The AES Key Expansion algorithm is used to derive the 128-bit round key for each round from the
original 128-bit encryption key. As you'll see, the logic of the key expansion algorithm is designed to
ensure that if you change one bit of the encryption key, it should affect the round keys for several
rounds.
iii. In the same manner as the 128-bit input block is arranged in the form of a state array, the algorithm
first arranges the 16 bytes of the encryption key in the form of a 4 × 4 array of bytes.
iv. The first four bytes of the encryption key constitute the word w0, the next four bytes the word w1, and
so on.
v. The algorithm subsequently expands the words [w0, w1, w2, w3] into a 44-word key schedule that can
be labeled w0, w1, w2, w3, ..., w43.
vi. Of these, the words [w0, w1, w2, w3] are bitwise XORed with the input block before the round-based
processing begins.
vii. The remaining 40 words of the key schedule are used four words at a time in each of the 10 rounds.
viii. The above two statements are also true for decryption, except for the fact that we now reverse the
order of the words in the key schedule. The last four words of the key schedule are bitwise XORed
with the 128-bit ciphertext block before any round-based processing begins. Subsequently, the
remaining 40 words of the key schedule are used four words at a time in each of the ten rounds of
processing.
32
ix.
Now comes the dicult part: How does the Key Expansion Algorithm expand four words
w0,w1,w2,w3 into the 44 words w0,w1,w2,w3,w4,w5,........,w43
x.
The key expansion algorithm will be explained in the next subsection with the help of Figure 3.8. As
shown in the gure, the key expansion takes place on a four-word to four-word basis, in the sense that
each grouping of four words decides what the next grouping of four words will be.
Fig 3.9 The key expansion takes place on a four-word to four-word basis.
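The four-word to four-word expansion can be sketched as follows. This is a hedged illustration rather than the report's implementation: the per-group transformation used here (rotate the last word, substitute its bytes through the S-box, XOR a round constant) is the standard AES-128 schedule and is not spelled out in the list above, and the helper names gf_mul, build_sbox and expand_key are assumptions made for this example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Build the S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for value in range(256):
        # Brute-force inverse; 0x00 has no inverse and maps to itself.
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        out = 0x63
        for shift in range(5):  # inv XOR its left rotations by 1, 2, 3, 4
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(out)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Expand a 16-byte key into the 44 words w0..w43 (each a list of 4 bytes)."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]  # w0..w3
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:                          # first word of each new group
            temp = temp[1:] + temp[:1]          # rotate the word by one byte
            temp = [SBOX[b] for b in temp]      # substitute each byte
            temp[0] ^= rcon                     # add the round constant
            rcon = gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words
```

With this layout, the round key for round r is words[4*r : 4*r + 4]; decryption would walk the same list from the last group back to the first.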
3. We next replace the value in each cell by its multiplicative inverse in GF(2^8) based on the
irreducible polynomial x^8 + x^4 + x^3 + x + 1.
4. The hex value 0x00 is replaced by itself since this element has no multiplicative inverse.
5. After the above step, let's represent a byte stored in a cell of the table by b7b6b5b4b3b2b1b0,
where b7 is the MSB and b0 the LSB. For example, the byte stored in the cell (9, 5) of the
above table is the multiplicative inverse (MI) of 0x95, which is 0x8A. Therefore, at this point,
the bit pattern stored in the cell with row index 9 and column index 5 is 10001010, implying
that b7 is 1 and b0 is 0. [Verify the fact that the MI of 0x95 is indeed 0x8A. The polynomial
representation of 0x95 (bit pattern: 10010101) is x^7 + x^4 + x^2 + 1, and the same for 0x8A (bit
pattern: 10001010) is x^7 + x^3 + x. Now show that the product of these two polynomials
modulo the polynomial x^8 + x^4 + x^3 + x + 1 is indeed 1.] For bit scrambling, we next apply the
following transformation to each bit b_i of the byte stored in a cell of the lookup table:
\[ b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i \]
where c_i is the ith bit of a specially designated byte c whose hex value is 0x63
(c7c6c5c4c3c2c1c0 = 01100011).
6. The above bit-scrambling step is better visualized as a vector-matrix operation b' = A·b + c,
in which row i of the 8 × 8 binary matrix A has 1s in columns i, (i+4) mod 8, (i+5) mod 8,
(i+6) mod 8 and (i+7) mod 8. Note that all of the additions in the product of the matrix and the
vector are actually XOR operations. [Because of the [A]x + b appearance of this transformation, it is
commonly referred to as the affine transformation.]
7. The very important role played by the c byte of value 0x63: Consider the following two
conditions on the SubBytes step: (1) In order for the byte substitution step to be invertible, the
byte-to-byte mapping given to us by the 16 × 16 table must be one-one. That is, for each input
byte, there must be a unique output byte. And, to each output byte there must correspond only
one input byte. (2) No input byte should map to itself, since a byte mapping to itself would
weaken the cipher.
8. Taking multiplicative inverses in the construction of the table does give us unique entries in the
table for each input byte except for the input byte 0x00, since there is no MI defined for the
all-zeros byte. What is interesting is that if it were not for the c byte, the bit scrambling step would
also leave the input byte 0x00 unchanged. With the affine mapping shown above, the 0x00
input byte is mapped to 0x63. At the same time, it preserves the one-one mapping for all other
bytes.
9. In addition to ensuring that every input byte is mapped to a different and unique output byte,
the bit-scrambling step also breaks the correlation between the bits before the substitution and
the bits after the substitution.
10. The 16 × 16 table created in this manner is called the S-Box. The S-Box is the same for all the
bytes in the state array.
11. The steps that go into constructing the 16 × 16 lookup table are reversed for the decryption
table, meaning that you first apply the reverse of the bit-scrambling operation to each byte, as
explained in the next step, and then you take its multiplicative inverse in GF(2^8).
12. For bit scrambling for decryption, you carry out the following bit-level transformation in each
cell of the table:
\[ b'_i = b_{(i+2) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus d_i \]
where d_i is the ith bit of a specially designated byte d whose hex value is 0x05
(d7d6d5d4d3d2d1d0 = 00000101). Finally, you replace the byte in the cell by its
multiplicative inverse in GF(2^8).
13. The bytes c and d are chosen so that the S-box has no fixed points. That is, we do not want
S-box(a) = a for any a. Neither do we want S-box(a) = ā, where ā is the bitwise complement of
a.
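The construction steps listed above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the multiplicative inverse is found by brute force, the helper names (gf_mul, gf_inverse, bit, affine, inverse_affine) are invented for this example, and the bit positions used for the decryption scrambling (i+2, i+5, i+7) are those of the standard AES inverse affine map.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); 0x00 is replaced by itself."""
    if a == 0:
        return 0
    return next(c for c in range(1, 256) if gf_mul(a, c) == 1)


def bit(byte, i):
    """Return bit (i mod 8) of a byte, with bit 0 as the LSB."""
    return (byte >> (i % 8)) & 1


def affine(b, c=0x63):
    """Bit scrambling used for the encryption table."""
    out = 0
    for i in range(8):
        new_bit = (bit(b, i) ^ bit(b, i + 4) ^ bit(b, i + 5)
                   ^ bit(b, i + 6) ^ bit(b, i + 7) ^ bit(c, i))
        out |= new_bit << i
    return out


def inverse_affine(b, d=0x05):
    """Bit scrambling used for the decryption table."""
    out = 0
    for i in range(8):
        new_bit = bit(b, i + 2) ^ bit(b, i + 5) ^ bit(b, i + 7) ^ bit(d, i)
        out |= new_bit << i
    return out


# Encryption table: inverse first, then scramble. Decryption table: the reverse.
SBOX = [affine(gf_inverse(x)) for x in range(256)]
INV_SBOX = [gf_inverse(inverse_affine(y)) for y in range(256)]
```

With these tables, gf_inverse(0x95) evaluates to 0x8A and SBOX[0x00] to 0x63, matching the worked example and the role of the c byte described above; the one-one property can be confirmed by checking that sorted(SBOX) equals list(range(256)).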
i.
The ShiftRows transformation consists of (i) not shifting the first row of the state array at
all; (ii) circularly shifting the second row by one byte to the left; (iii) circularly shifting the
third row by two bytes to the left; and (iv) circularly shifting the last row by three bytes to
the left.
ii.
Recall again that the input block is written column-wise. That is, the first four bytes of the
input block fill the first column of the state array, the next four bytes the second column, etc.
As a result, shifting the rows in the manner indicated scrambles up the byte order of the input block.
iii.
For decryption, the corresponding step shifts the rows in exactly the opposite fashion. The
first row is left unchanged, the second row is shifted to the right by one byte, the third row
to the right by two bytes, and the last row to the right by three bytes, all shifts being
circular.
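A small Python sketch of the row shifts, assuming the state is held as a list of four rows filled column-wise from the 16-byte block (the function names are chosen only for this example):

```python
def to_state(block):
    """Arrange 16 bytes column-wise into a 4 x 4 state (a list of rows)."""
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def shift_rows(state):
    """Circularly shift row r to the left by r bytes (encryption)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Circularly shift row r to the right by r bytes (decryption)."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(state)]
```

For the block 0, 1, ..., 15 the second row [1, 5, 9, 13] becomes [5, 9, 13, 1] after shift_rows, and inv_shift_rows restores the original state.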
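i.
More precisely, each byte in a column is replaced by two times that byte, plus three times
the next byte, plus the byte that comes next, plus the byte that follows. [Additions in
GF(2^8) mean the same thing as XOR, so "plus" implies XOR.] The words "next" and "follow"
refer to bytes in the same column, and their meaning is circular, in the sense that the byte
that is next to the one in the last row is the one in the first row. [By "two times" and
"three times", we mean multiplications in GF(2^8) by the bit patterns 00000010 and 00000011,
respectively.]
ii.
For the bytes in the first row of the state array, this operation can be stated as
\[ s'_{0,j} = (0x02 \times s_{0,j}) \oplus (0x03 \times s_{1,j}) \oplus s_{2,j} \oplus s_{3,j} \]
where s_{i,j} is the byte in row i and column j of the state array and s'_{i,j} is the byte that replaces it.
iii.
For the bytes in the second row of the state array, this operation can be stated as
\[ s'_{1,j} = s_{0,j} \oplus (0x02 \times s_{1,j}) \oplus (0x03 \times s_{2,j}) \oplus s_{3,j} \]
iv.
For the bytes in the third row of the state array, this operation can be stated as
\[ s'_{2,j} = s_{0,j} \oplus s_{1,j} \oplus (0x02 \times s_{2,j}) \oplus (0x03 \times s_{3,j}) \]
v.
And, for the bytes in the fourth row of the state array, this operation can be stated as
\[ s'_{3,j} = (0x03 \times s_{0,j}) \oplus s_{1,j} \oplus s_{2,j} \oplus (0x02 \times s_{3,j}) \]
vi.
These four equations can be written compactly as the matrix product
\[
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\times
\begin{bmatrix} s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\ s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\ s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\ s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3} \end{bmatrix}
=
\begin{bmatrix} s'_{0,0} & s'_{0,1} & s'_{0,2} & s'_{0,3} \\ s'_{1,0} & s'_{1,1} & s'_{1,2} & s'_{1,3} \\ s'_{2,0} & s'_{2,1} & s'_{2,2} & s'_{2,3} \\ s'_{3,0} & s'_{3,1} & s'_{3,2} & s'_{3,3} \end{bmatrix}
\]
where, on the left hand side, when a row of the leftmost matrix multiplies a column of the state
array matrix, the additions involved are meant to be XOR operations.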
CHAPTER 4
SOFTWARE DESCRIPTION
4.1 Introduction to Xilinx
The Xilinx ISE is a design environment for FPGA products from Xilinx, and is tightly-coupled to the
architecture of such chips, and cannot be used with FPGA products from other vendors.[2] The Xilinx ISE is
primarily used for circuit synthesis and design, while the ModelSim logic simulator is used for system-level
testing. Other components shipped with the Xilinx ISE include the Embedded Development Kit (EDK), a
Software Development Kit (SDK) and ChipScope Pro.
The main challenging areas in VLSI are performance, cost, testing, area, reliability and power. The
demand for portable computing devices and communication systems is increasing rapidly. These applications
require low power dissipation for VLSI circuits [1]. The ability to design, fabricate and test Application
Specific Integrated Circuits (ASICs) as well as FPGAs with gate counts of the order of a few tens of millions
has led to the development of complex embedded SOCs. Hardware components in a SOC may include one or
more processors, memories, dedicated components, and interfaces to various peripherals. One of the
approaches for SOC design is the platform based approach. For
example, the platform FPGAs such as Xilinx Virtex II Pro and Altera Excalibur include custom designed
fixed programmable processor cores together with millions of gates of reconfigurable logic devices.
In addition to this, the development of Intellectual Property (IP) cores for the FPGAs for a variety of
standard functions including processors, enables a multimillion gate FPGA to be configured to contain all the
components of a platform based FPGA. Development tools such as the Altera System-On-Programmable
Chip (SOPC) builder enable the integration of IP cores and the user designed custom blocks with the Nios II
soft-core processor. Soft-core processors are far more flexible than the hard-core processors and they can be
enhanced with custom hardware to optimize them for specific applications. Power dissipation is a challenging
problem for today's System-on-Chip (SOC) design and test.
Evolution of Computer-Aided Digital Design
Digital circuit design has evolved rapidly over the last 25 years. The earliest digital circuits were
designed with vacuum tubes and transistors. Integrated circuits were then invented where logic gates were
placed on a single chip. The first integrated circuit (IC) chips were SSI (Small Scale Integration) chips where
JOGINPALLY M.N. RAO WOMENs ENGINEERING COLLEGE
the gate count was very small. As technologies became sophisticated, designers were able to place circuits
with hundreds of gates on a chip. These chips were called MSI (Medium Scale Integration) chips. With the
advent of LSI (Large Scale Integration), designers could put thousands of gates on a single chip. At this point,
design processes started getting very complicated, and designers felt the need to automate these
processes. Electronic Design Automation (EDA) techniques began to evolve. Chip designers began to use
circuit and logic simulation techniques to verify the functionality of building blocks of the order of about 100
transistors. The circuits were still tested on the breadboard, and the layout was done on paper or by hand on a
graphic computer terminal.
The earlier edition of the book used the term CAD tools. Technically, the term Computer-Aided
Design (CAD) tools refers to back-end tools that perform functions related to place and route, and layout of
the chip. The term Computer-Aided Engineering (CAE) tools refers to tools that are used for front-end
processes such as HDL simulation, logic synthesis, and timing analysis. Designers used the terms CAD and CAE
interchangeably. Today, the term Electronic Design Automation is used for both CAD and CAE. For the sake
of simplicity, in this book, we will refer to all design tools as EDA tools.
With the advent of VLSI (Very Large Scale Integration) technology, designers could design single chips with
more than 100,000 transistors. Because of the complexity of these circuits, it was not possible to verify these
circuits on a breadboard. Computer-aided techniques became critical for verification and design of VLSI
digital circuits. Computer programs to do automatic placement and routing of circuit layouts also became
popular. The designers were now building gate-level digital circuits manually on graphic terminals. They
would build small building blocks and then derive higher-level blocks from them. This process would
continue until they had built the top-level block. Logic simulators came into existence to verify the
functionality of these circuits before they were fabricated on chip.
As designs got larger and more complex, logic simulation assumed an important role in the design process.
Designers could iron out functional bugs in the architecture before the chip was designed further.
Emergence of HDLs
For a long time, programming languages such as FORTRAN, Pascal, and C were being used to
describe computer programs that were sequential in nature. Similarly, in the digital design field, designers felt
the need for a standard language to describe digital circuits. Thus, Hardware Description Languages (HDLs)
came into existence. HDLs allowed the designers to model the concurrency of processes found in hardware
elements. Hardware description languages such as Verilog HDL and VHDL became popular. Verilog HDL
originated in 1983 at Gateway Design Automation. Later, VHDL was developed under contract from
DARPA. Both Verilog and VHDL simulators to simulate large digital circuits quickly gained acceptance from
designers.
Even though HDLs were popular for logic verification, designers had to manually translate the HDL-based design into a schematic circuit with interconnections between gates. The advent of logic synthesis in the
late 1980s changed the design methodology radically. Digital circuits could be described at a register transfer
level (RTL) by use of an HDL. Thus, the designer had to specify how the data flows between registers and
how the design processes the data. The details of gates and their interconnections to implement the circuit
were automatically extracted by logic synthesis tools from the RTL description.
Thus, logic synthesis pushed the HDLs into the forefront of digital design. Designers no longer had to
manually place gates to build digital circuits. They could describe complex circuits at an abstract level in
terms of functionality and data flow by designing those circuits in HDLs. Logic synthesis tools would
implement the specified functionality in terms of gates and gate interconnections.
HDLs also began to be used for system-level design. HDLs were used for simulation of system boards,
interconnect buses, FPGAs (Field Programmable Gate Arrays), and PALs (Programmable Array Logic). A
common approach is to design each IC chip, using an HDL, and then verify system functionality via
simulation.
Today, Verilog HDL is an accepted IEEE standard. In 1995, the original standard IEEE 1364-1995
was approved. IEEE 1364-2001 made significant improvements to the original standard and was later
revised as IEEE 1364-2005.
New EDA tools have emerged to simulate behavioral descriptions of circuits. These tools combine the
powerful concepts from HDLs and object oriented languages such as C++. These tools can be used instead of
writing behavioral descriptions in Verilog HDL.
The behavioral description is manually converted to an RTL description in an HDL. The designer has
to describe the data flow that will implement the desired digital circuit. From this point onward, the design
process is done with the assistance of EDA tools.
Logic synthesis tools convert the RTL description to a gate-level netlist. A gate-level netlist is a
description of the circuit in terms of gates and connections between them. Logic synthesis tools ensure that
the gate-level netlist meets timing, area, and power specifications. The gate-level netlist is input to an
Automatic Place and Route tool, which creates a layout. The layout is verified and then fabricated on a chip.
Thus, most digital design activity is concentrated on manually optimizing the RTL description of the
circuit. After the RTL description is frozen, EDA tools are available to assist the designer in further processes.
Designing at the RTL level has shrunk the design cycle times from years to a few months. It is also possible to
do many design iterations in a short period of time.
Behavioral synthesis tools have begun to emerge recently. These tools can create RTL descriptions
from a behavioral or algorithmic description of the circuit. As these tools mature, digital circuit design will
become similar to high-level computer programming. Designers will simply implement the algorithm in an
HDL at a very abstract level. EDA tools will help the designer convert the behavioral description to a final IC
chip.
It is important to note that, although EDA tools are available to automate the processes and cut design
cycle times, the designer is still the person who controls how the tool will perform. EDA tools are also
susceptible to the "GIGO : Garbage In Garbage Out" phenomenon. If used improperly, EDA tools will lead to
inefficient designs. Thus, the designer still needs to understand the nuances of design methodologies, using
EDA tools to obtain an optimized design.
Importance of HDLs
HDLs have many advantages compared to traditional schematic-based design.
i.
Designs can be described at a very abstract level by use of HDLs. Designers can write their RTL
description without choosing a specific fabrication technology. Logic synthesis tools can automatically
convert the design to any fabrication technology. If a new technology emerges, designers do not need
to redesign their circuit.
ii.
They simply input the RTL description to the logic synthesis tool and create a new gate-level net list,
using the new fabrication technology. The logic synthesis tool will optimize the circuit in area and
timing for the new technology.
iii.
By describing designs in HDLs, functional verification of the design can be done early in the design
cycle. Since designers work at the RTL level, they can optimize and modify the RTL description until
it meets the desired functionality. Most design bugs are eliminated at this point. This cuts down design
cycle time significantly because the probability of hitting a functional bug at a later time in the
gate-level net list or physical layout is minimized.
iv.
Designing with HDLs is analogous to computer programming. A textual description with comments is
an easier way to develop and debug circuits. This also provides a concise representation of the design,
compared to gate-level schematics. Gate-level schematics are almost incomprehensible for very
complex designs.
v.
HDL-based design is here to stay.[3] With rapidly increasing complexities of digital circuits and
increasingly sophisticated EDA tools, HDLs are now the dominant method for large digital designs.
No digital circuit designer can afford to ignore HDL-based design.
vi.
New tools and languages focused on verification have emerged in the past few years. These languages
are better suited for functional verification. However, for logic design, HDLs continue as the preferred
choice.
Verilog HDL offers several features that make it popular among hardware designers:
i.
Verilog HDL is a general-purpose hardware description language that is easy to learn and easy to use.
It is similar in syntax to the C programming language. Designers with C programming experience will
find it easy to learn Verilog HDL.
ii.
Verilog HDL allows different levels of abstraction to be mixed in the same model. Thus, a designer
can define a hardware model in terms of switches, gates, RTL, or behavioral code. Also, a designer
needs to learn only one language for stimulus and hierarchical design.
iii.
Most popular logic synthesis tools support Verilog HDL. This makes it the language of choice for
designers.
iv.
All fabrication vendors provide Verilog HDL libraries for post-logic synthesis simulation. Thus,
designing a chip in Verilog HDL allows the widest choice of vendors.
v.
The Programming Language Interface (PLI) is a powerful feature that allows the user to write custom
C code to interact with the internal data structures of Verilog. Designers can customize a Verilog HDL
simulator to their needs with the PLI.
Trends in HDLs
The speed and complexity of digital circuits have increased rapidly. Designers have responded by
designing at higher levels of abstraction. Designers have to think only in terms of functionality. EDA tools
take care of the implementation details. With designer assistance, EDA tools have become sophisticated
enough to achieve a close-to-optimum implementation.
The most popular trend currently is to design in HDL at an RTL level, because logic synthesis tools
can create gate-level net lists from RTL level design. Behavioral synthesis allowed engineers to design
directly in terms of algorithms and the behavior of the circuit, and then use EDA tools to do the translation
and optimization in each phase of the design. However, behavioral synthesis did not gain widespread
acceptance. Today, RTL design continues to be very popular. Verilog HDL is also being constantly enhanced
to meet the needs of new verification methodologies.
Formal verification and assertion checking techniques have emerged. Formal verification applies
formal mathematical techniques to verify the correctness of Verilog HDL descriptions and to establish
equivalency between RTL and gate-level netlists. However, the need to describe a design in Verilog HDL will
not go away. Assertion checkers allow checking to be embedded in the RTL code. This is a convenient way to
do checking in the most important parts of a design.
New verification languages have also gained rapid acceptance. These languages combine the
parallelism and hardware constructs from HDLs with the object oriented nature of C++. These languages also
provide support for automatic stimulus creation, checking, and coverage. However, these languages do not
replace Verilog HDL. They simply boost the productivity of the verification process. Verilog HDL is still
needed to describe the design.
For very high-speed and timing-critical circuits like microprocessors, the gate-level net list provided
by logic synthesis tools is not optimal. In such cases, designers often mix gate-level description directly into
the RTL description to achieve optimum results. This practice is opposite to the high-level design paradigm,
yet it is frequently used for high-speed designs because designers need to squeeze the last bit of timing out of
circuits, and EDA tools sometimes prove to be insufficient to achieve the desired results.
Another technique that is used for system-level design is a mixed bottom-up methodology where the
designers use either existing Verilog HDL modules, basic building blocks, or vendor-supplied core blocks to
quickly bring up their system simulation. This is done to reduce development costs and compress design
schedules.
For example, consider a system that has a CPU, graphics chip, I/O chip, and a system bus. The CPU
designers would build the next-generation CPU themselves at an RTL level, but they would use behavioral
models for the graphics chip and the I/O chip and would buy a vendor-supplied model for the system bus.
Thus, the system-level simulation for the CPU could be up and running very quickly and long before the RTL
descriptions for the graphics chip and the I/O chip are completed.
Hierarchical Modeling Concepts
Before we discuss the details of the Verilog language, we must first understand basic hierarchical
modeling concepts in digital design. The designer must use a "good" design methodology to do efficient
Verilog HDL-based design. In this chapter, we discuss typical design methodologies and illustrate how these
concepts are translated to Verilog. A digital simulation is made up of various components. We talk about the
components and their interconnections.
Learning Objectives
i.
Describe four levels of abstraction - behavioral, data flow, gate level, and switch level - to represent
the same module.
ii.
Describe components required for the simulation of a digital design. Define a stimulus block and a
design block. Explain two methods of applying stimulus.
Design Methodologies
There are two basic types of digital design methodologies: a top-down design methodology and
a bottom-up design methodology. In a top-down design methodology, we define the top-level block and
identify the sub-blocks necessary to build the top-level block. We further subdivide the sub-blocks until we
come to leaf cells, which are the cells that cannot further be divided.
Block diagram of top-down design methodology
In a bottom-up design methodology, we first identify the building blocks that are available to us. We
build bigger cells, using these building blocks. These cells are then used for higher-level blocks until we build
the top-level block in the design. Figure 4.2 shows the bottom-up design process.
Typically, a combination of top-down and bottom-up flows is used. Design architects define the
specifications of the top-level block. Logic designers decide how the design should be structured by breaking
up the functionality into blocks and sub-blocks. At the same time, circuit designers are designing optimized
circuits for leaf-level cells. They build higher-level cells by using these leaf cells. The flow meets at an
intermediate point where the switch-level circuit designers have created a library of leaf cells by using
switches, and the logic level designers have designed from top-down until all modules are defined in terms of
leaf cells.
To illustrate these hierarchical modeling concepts, let us consider the design of a negative edge-triggered 4-bit ripple carry counter, described below.
4-bit Ripple Carry Counter
The ripple carry counter shown in Figure 4.3 is made up of negative edge-triggered toggle flipflops
(T_FF). Each of the T_FFs can be made up from negative edge-triggered D-flipflops (D_FF)
and inverters (assuming q_bar output is not available on the D_FF), as shown in Figure 4.4.
We combine small building blocks and build bigger blocks; e.g., we could build D_FF from AND and OR
gates, or we could build a custom D_FF from transistors. Thus, the bottom-up
flow meets the top-down flow at the level of the D_FF.
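The hierarchy described above (counter built from T_FFs, each T_FF built from a D_FF and an inverter) can be mimicked with a small behavioural model. The sketch below is written in Python rather than Verilog purely for illustration; it is a zero-delay model, and the class and method names are assumptions made for this example.

```python
class DFlipFlop:
    """Negative edge-triggered D flip-flop with an active-high reset."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def update(self, d, clk, reset=0):
        if reset:
            self.q = 0
        elif self._prev_clk == 1 and clk == 0:  # falling clock edge
            self.q = d
        self._prev_clk = clk
        return self.q


class TFlipFlop:
    """Toggle flip-flop built from a D flip-flop and an inverter."""

    def __init__(self):
        self.dff = DFlipFlop()

    @property
    def q(self):
        return self.dff.q

    def update(self, clk, reset=0):
        d = 1 - self.dff.q                      # inverter feeds NOT q back to d
        return self.dff.update(d, clk, reset)


class RippleCarryCounter:
    """4-bit ripple carry counter: each stage is clocked by the previous q."""

    def __init__(self):
        self.stages = [TFlipFlop() for _ in range(4)]

    @property
    def value(self):
        return sum(stage.q << i for i, stage in enumerate(self.stages))

    def update(self, clk, reset=0):
        stage_clk = clk
        for stage in self.stages:
            stage_clk = stage.update(stage_clk, reset)
        return self.value
```

Driving the clock high and then low once per cycle makes the counter step through 1, 2, 3, ... and wrap from 15 back to 0, because each stage toggles only on the falling edge of the stage before it.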
Modules
We now relate these hierarchical modeling concepts to Verilog. Verilog provides the concept of
a module. A module is the basic building block in Verilog. A module can be an element or a collection of
lower-level design blocks. Typically, elements are grouped into modules to provide common functionality
that is used at many places in the design. A module provides the necessary functionality to the higher-level
block through its port interface (inputs and outputs), but hides the internal implementation. This allows the
designer to modify module internals without affecting the rest of the design.
In Figure 4.5, ripple carry counter, T_FF, D_FF are examples of modules. In Verilog, a module is
declared by the keyword module. A corresponding keyword endmodule must appear at the end of the module
definition. Each module must have a module_name, which is the identifier for the module, and
a module_terminal_list, which describes the input and output terminals of the module.
Verilog is both a behavioral and a structural language. Internals of each module can be defined
at four levels of abstraction, depending on the needs of the design.
The module behaves identically with the external environment irrespective of the level of abstraction
at which the module is described. The internals of the module are hidden from the environment. Thus, the
level of abstraction to describe a module can be changed without any change in the environment. These levels
will be studied in detail in separate chapters later in the book. The levels are defined below.
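i.
Behavioral or algorithmic level
This is the highest level of abstraction provided by Verilog. A module can be implemented in terms
of the desired design algorithm without concern for the hardware implementation details.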
ii.
Dataflow level
At this level, the module is designed by specifying the data flow. The designer is aware of how
data flows between hardware registers and how the data is processed in the design.
iii.
Gate level
The module is implemented in terms of logic gates and interconnections between these gates.
Design at this level is similar to describing a design in terms of a gate-level logic diagram.
iv.
Switch level
This is the lowest level of abstraction provided by Verilog. A module can be implemented in terms
of switches, storage nodes, and the interconnections between them. Design at this level requires
knowledge of switch-level implementation details.
Verilog allows the designer to mix and match all four levels of abstractions in a design. In the digital
design community, the term register transfer level (RTL) is frequently used for a Verilog description that uses
a combination of behavioral and dataflow constructs and is acceptable to logic synthesis tools.
If a design contains four modules, Verilog allows each of the modules to be written at a different level
of abstraction. As the design matures, most modules are replaced with gate-level implementations.
Normally, the higher the level of abstraction, the more flexible and technology-independent the
design. As one goes lower toward switch-level design, the design becomes technology-dependent and
inflexible. A small modification can cause a significant number of changes in the design. Consider the
analogy with C programming and assembly language programming. It is easier to program in a higher-level
language such as C. The program can be easily ported to any machine. However, if you design at the
assembly level, the program is specific for that machine and cannot be easily ported to another machine.
Instances
A module provides a template from which you can create actual objects. When a module is invoked,
Verilog creates a unique object from the template. Each object has its own name, variables, parameters, and
I/O interface.
Components of a Simulation
Once a design block is completed, it must be tested. The functionality of the design block can be tested
by applying stimulus and checking results. We call such a block the stimulus block. It is good practice to keep
the stimulus and design blocks separate. The stimulus block can be written in Verilog. A separate language is
not required to describe stimulus. The stimulus block is also commonly called a test bench. Different test
benches can be used to thoroughly test the design block.
Two styles of stimulus application are possible. In the first style, the stimulus block instantiates the
design block and directly drives the signals in the design block.
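The first style can be pictured with a small behavioural analogy. The sketch below uses Python instead of Verilog only to keep the example runnable here; Counter4 stands in for a design block and stimulus for the stimulus block, and both names are assumptions made for this illustration.

```python
class Counter4:
    """Design block: behavioural 4-bit counter that counts on falling clock edges."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def drive(self, clk, reset):
        if reset:
            self.q = 0
        elif self._prev_clk == 1 and clk == 0:  # falling clock edge
            self.q = (self.q + 1) % 16
        self._prev_clk = clk
        return self.q


def stimulus(cycles):
    """Stimulus block: instantiates the design block and drives its inputs."""
    dut = Counter4()                  # instance of the design block
    observed = []
    dut.drive(clk=0, reset=1)         # apply reset before counting
    for _ in range(cycles):
        dut.drive(clk=1, reset=0)     # rising edge: no change expected
        observed.append(dut.drive(clk=0, reset=0))
    return observed
```

Keeping Counter4 free of any test code and putting all signal driving and result collection in stimulus mirrors the recommended separation of design and stimulus blocks.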
Fig 4.7 Stimulus and Design Blocks Instantiated in a Dummy Top-Level Module
Next, the FPGA that will be used with this project needs to be specified. The compilation process is
device specific, so the complete device specification (including package type) must be entered when creating
a new project. Look on the top of the FPGA you are using. The device model, package type, and speed grade
will be printed on it. The Spartan-IIE device used on the Digilab IIE board is shown in Figure L1.2. For the
Digilab IIE board, set the Device Family to Spartan2E. The device model is XC2S200E, the package type is
PQ208, and the speed grade is 6, so set the Device to xc2s200e-6pq208. Finally, set the Design Flow to XST
VHDL and click OK.
A project can be retargeted (i.e., compiled for another FPGA device) after the project is created by
selecting the xc2s200e-6pq208-XST VHDL item in the Module View window and then choosing Source ->
Properties.
The programmable logic boards used for CIS 372 are Xilinx Virtex-II Pro development systems. The
centerpiece of the board is a Virtex-II Pro XC2VP30 FPGA (field-programmable gate array), which can be
programmed via a USB cable or compact flash card. The board also features PS/2, serial, Ethernet, stereo
audio and VGA video ports, user buttons, switches and LEDs, and expansion ports for connecting to other
boards.
1. Preliminaries
Each Klab station contains a Windows machine on the left and a Linux machine on the right. The
software for programming the FPGA (Xilinx ISE Project Navigator) is on the Windows machine. Open ISE
from Start -> All Programs -> Xilinx ISE 8.2i -> Project Navigator.
On the Windows machine, your eniac account is mounted on the S: drive. Xilinx tools have to access
many files, and they get incredibly slow when they have to access those files over Samba. It is recommended that
you keep a copy of your project in your eniac account, copy the project directory to the local drive (C:\user is the
only writeable directory, so somewhere under there), use the local copy while in the lab, copy the project back
to your eniac account when you are done, and delete the local copy, making sure to empty the recycling bin.
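The copy-out and copy-back routine can be scripted so that no step is forgotten. A minimal Python sketch using only the standard library; the function names check_out and check_in are hypothetical, and the paths passed to them (for example a folder on S: and one under C:\user) are left to the user:

```python
import shutil
from pathlib import Path


def check_out(remote: Path, local: Path) -> None:
    """Copy the project from the network drive to a fresh local directory."""
    if local.exists():
        shutil.rmtree(local)  # discard any stale local copy first
    shutil.copytree(remote, local)


def check_in(local: Path, remote: Path) -> None:
    """Copy the local working copy back over the network copy, then delete it."""
    shutil.copytree(local, remote, dirs_exist_ok=True)  # requires Python 3.8+
    shutil.rmtree(local)
```

Note that shutil.rmtree deletes the directory directly rather than moving it to the recycling bin, so nothing is left behind on the lab machine.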
2. Creating a new project in ISE
1. First, ISE may have opened a previously used project. If so, close the project using File -> Close
Project.
2. An ISE project contains all the files needed to design a piece of hardware and download it to the
FPGA. Go to File -> New Project to create a new ISE project. Give the project a location on your
mapped Eniac drive and enter a name for the project, such as "tutorial". Set the Top-Level Source
Type to HDL and click Next.
3. The following screen allows you to set the properties for the FPGA you will be downloading your
design to. For our boards, the correct settings are Family = "Virtex2P", Device = "XC2VP30",
Package = "FF896", and Speed = "-7". Set the Synthesis Tool to "XST (VHDL/Verilog)" and
Simulator to "Modelsim-XE Verilog" and click Next.
4. On the next screen, click the New Source button. Select Verilog Module from the list and give the
module the file name "switch", then click Next. A Verilog module is a self-contained hardware unit
with an interface of inputs and outputs, which are specified on the next screen.
5. This screen takes your inputs and outputs and automatically generates code for your module.
6. Click Next and click Finish on the next screen. This will bring you back to the New Project window;
click Next twice and then Finish once to generate your module.
A completion icon should appear next to Check Syntax. If you get compilation errors, resolve them (ask a TA for help if
necessary) before continuing.
4. Assigning ports to pins
1. For the ports in the module (SWITCHES and LEDS) to control the components on the board, they
must be connected to pins on the FPGA. To do this, click on switch.v in the Sources window, expand
the User Constraints item in Processes and double-click on Assign Package Pins. When prompted to
add a UCF file to your project, click Yes.
2. Xilinx PACE will open up. On the left, the ports of your module will be listed, and on the right is a
diagram of the FPGA. Click the Package View tab at the bottom of this window to see a diagram of
the unconnected pins on the FPGA.
3. Each component on the board is connected to a pin on the FPGA. Connect the pins to your module's
ports by clicking in the Loc box next to each port and typing in the proper pin, as shown in the image
below. You should see each pin location fill in with blue on the pin diagram to the right. The other
information (I/O Std., Drive Str., etc.) does not have to be filled in. Your list of ports should look like
this when you are done.
4. When you are done entering the pin information, click Save. You may be prompted to choose a Bus
Delimiter; choose the top option, XST Default: < >, and click OK. You can then close PACE.
5. Go back to the ISE Project Navigator. In the Sources window, expand the hierarchy for the switch
module to see the new file, switch.ucf, that has been added to the project. By double-clicking on Edit
Constraints (Text) in the Processes window, you can see the format of the UCF file. If you made a
mistake in your pin locations or want to change them in the future, you can directly edit the UCF file
instead of using PACE.
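Because the UCF file is plain text, the pin assignments can also be generated rather than typed by hand. A minimal Python sketch, assuming the common NET "port" LOC = "pin"; constraint form, the < > bus delimiter chosen in PACE, and that both ports are buses. The pin names below are placeholders, not the board's real pin locations, which must come from the board documentation:

```python
def ucf_lines(port: str, pins: list[str]) -> list[str]:
    """Return one LOC constraint per bit of a bus port.

    pins[i] is the FPGA pin for bit i; the <> bus delimiter is assumed.
    """
    return [f'NET "{port}<{bit}>" LOC = "{pin}";' for bit, pin in enumerate(pins)]


# Placeholder pin names: replace with the locations from the board manual.
constraints = ucf_lines("SWITCHES", ["PIN_A", "PIN_B"]) + ucf_lines("LEDS", ["PIN_C", "PIN_D"])
print("\n".join(constraints))
```

The printed lines can be pasted into switch.ucf through Edit Constraints (Text) once the placeholders are replaced.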
5. Generating a programming file
1. Click on switch.v in the Sources window. In the Processes window, scroll down and double-click on
Generate Programming File. This will run all of the processes necessary to create a file that can be
downloaded onto the board to program the FPGA. Running these processes may take several minutes;
progress is indicated by a spinning icon next to the running process. When a process completes, a completion icon will
appear next to it. All errors must be resolved before a programming file can be generated. Errors are
output to the console, and can be more easily seen by clicking on the Errors tab. Warnings cause
a warning icon to appear next to the process and can be seen under the Warnings tab.
2. Scroll up in the Processes window and double-click on View Design Summary to see a report of your
design and links to more detailed reports.
Design Simulation
Verifying Functionality using Behavioral Simulation
Create a test bench waveform containing input stimulus you can use to verify the
functionality of the counter module. The test bench waveform is a graphical view of a test bench.
Create the test bench waveform as follows:
i.
ii.
iii. In the New Source Wizard, select Test Bench WaveForm as the source type, and type counter_tbw in
the File Name field.
iv. Click Next.
v. The Associated Source page shows that you are associating the test bench waveform with the source
file counter. Click Next.
vi. The Summary page shows that the source will be added to the project, and it displays the source
directory, type and name. Click Finish.
vii. You need to set the clock frequency, setup time and output delay times in the Initialize Timing dialog
box before the test bench waveform editing window opens.
viii. The blue shaded areas that precede the rising edge of the CLOCK correspond to the Input Setup Time
in the Initialize Timing dialog box. Toggle the DIRECTION port to define the input stimulus for the
counter design as follows:
1. Click on the blue cell at approximately 300 ns to assert DIRECTION high so that the counter
will count up.
2. Click on the blue cell at approximately 900 ns to assert DIRECTION low so that the counter
will count down.
ix.
x. In the Sources window, select the Behavioral Simulation view to see that the test bench waveform file
is automatically added to your project.
xi.
// Local Wires
//wire [31:0] w0, w1, w2, w3;
reg
[127:0] text_in_r;
reg
[127:0] text_out;
reg
[7:0]
reg
[7:0]
reg
[7:0]
reg
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
57
wire
[7:0]
wire
[7:0]
wire
[7:0]
wire
[7:0]
reg
done, ld_r;
reg
[3:0]
dcnt;
// Misc Logic
always @(posedge clk)
    if(!rst) dcnt <= #1 4'h0;
    else
    if(ld) dcnt <= #1 4'hb;
    else
    if(|dcnt) dcnt <= #1 dcnt - 4'h1;
always @(posedge clk) done <= #1 !(|dcnt[3:1]) & dcnt[0] & !ld;
always @(posedge clk) if(ld) text_in_r <= #1 text_in;
always @(posedge clk) ld_r <= #1 ld;
// Initial Permutation (AddRoundKey)
//always @(posedge clk)
// Round Permutations
assign sa00_sr = sa00_sub;
assign sa01_sr = sa01_sub;
assign sa02_sr = sa02_sub;
assign sa03_sr = sa03_sub;
assign sa10_sr = sa11_sub;
assign sa11_sr = sa12_sub;
function [31:0] mix_col;
input [7:0] s0,s1,s2,s3;
reg [7:0] s0_o,s1_o,s2_o,s3_o;
begin
mix_col[31:24]=xtime(s0)^xtime(s1)^s1^s2^s3;
mix_col[23:16]=s0^xtime(s1)^xtime(s2)^s2^s3;
mix_col[15:08]=s0^s1^xtime(s2)^xtime(s3)^s3;
mix_col[07:00]=xtime(s0)^s0^s1^s2^xtime(s3);
end
endfunction
function [7:0] xtime;
input [7:0] b; xtime={b[6:0],1'b0}^(8'h1b&{8{b[7]}});
endfunction
// Modules
aes_key_expand_128 u0(
.clk(clk),
.kld(ld),
.key(key),
.wo_0( w0),
.wo_1( w1),
.wo_2( w2),
.wo_3( w3));
aes_sbox us00(.a(sa00), .d(sa00_sub ));
aes_sbox us01(.a(sa01), .d(sa01_sub ));
aes_sbox us02(.a(sa02), .d(sa02_sub ));
aes_sbox us03(.a(sa03), .d(sa03_sub ));
aes_sbox us10(.a(sa10), .d(sa10_sub ));
aes_sbox us11(.a(sa11), .d(sa11_sub ));
aes_sbox us12(.a(sa12), .d(sa12_sub ));
aes_sbox us13(.a(sa13), .d(sa13_sub ));
aes_sbox us20(.a(sa20), .d(sa20_sub ));
aes_sbox us21(.a(sa21), .d(sa21_sub ));
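aes_sbox us22(.a(sa22), .d(sa22_sub ));
aes_sbox us23(.a(sa23), .d(sa23_sub ));
aes_sbox us30(.a(sa30), .d(sa30_sub ));
aes_sbox us31(.a(sa31), .d(sa31_sub ));
aes_sbox us32(.a(sa32), .d(sa32_sub ));
aes_sbox us33(.a(sa33), .d(sa33_sub ));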
endmodule
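The round permutations in the listing above can be cross-checked in software before simulation. The following Python sketch mirrors the Verilog xtime and mix_col functions term by term, and models the row rotation implied by the sa1x_sr assignments. It is a reference model under stated assumptions, not the project's code: the rotation of rows 2 and 3 follows the AES standard because that part of the listing is missing, and the 4x4 list layout of the state is chosen only for illustration.

```python
def xtime(b: int) -> int:
    """Multiply by 2 in GF(2^8): shift left, reduce with 0x1b if bit 7 was set."""
    return ((b << 1) & 0xFF) ^ (0x1B if b & 0x80 else 0x00)


def mix_col(s0: int, s1: int, s2: int, s3: int) -> list[int]:
    """Mirror of the Verilog mix_col function, one output byte per row."""
    return [
        xtime(s0) ^ xtime(s1) ^ s1 ^ s2 ^ s3,   # mix_col[31:24]
        s0 ^ xtime(s1) ^ xtime(s2) ^ s2 ^ s3,   # mix_col[23:16]
        s0 ^ s1 ^ xtime(s2) ^ xtime(s3) ^ s3,   # mix_col[15:08]
        xtime(s0) ^ s0 ^ s1 ^ s2 ^ xtime(s3),   # mix_col[07:00]
    ]


def shift_rows(state: list[list[int]]) -> list[list[int]]:
    """Rotate row r of a 4x4 state left by r bytes (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


print([hex(b) for b in mix_col(0xDB, 0x13, 0x53, 0x45)])
```

Feeding the same column into the Verilog function in simulation and comparing the four bytes is a quick way to catch a mistyped term.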
Test bench
module aes_crypto_processor_tb;
reg clk;
reg rst;
reg kld;
reg [31:0]bus_key;
reg [31:0]bus_text;
wire [31:0]bus_textout;
initial
begin
clk = 1'b0;
rst = 1'b0;
kld = 1'b0;
bus_key = 32'h0000000a;
bus_text = 32'h0000000a;
#10 rst = 1'b1;
#10 bus_key = 32'h0000000a;
bus_text = 32'h0000000b;
Inverse cipher:
module aes_inv_cipher_top(clk, rst, kld, ld, done, key, text_in, text_out );
input clk, rst;
input kld, ld;
output done;
input [127:0] key;
input [127:0] text_in;
output [127:0] text_out;
wire
reg
reg [127:0] text_in_r;
reg [127:0] text_out;
reg [7:0]
reg [7:0]
reg [7:0]
reg [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
wire [7:0]
reg
reg [3:0] dcnt;
// Misc Logic
go <= #1 1'b1;
else
if(done)
go <= #1 1'b0;
// Initial Permutation
always @(posedge clk)
// Round Permutations
assign sa00_sr = sa00;
assign sa01_sr = sa01;
assign sa02_sr = sa02;
assign sa03_sr = sa03;
assign sa10_sr = sa13;
function [31:0] inv_mix_col;
input [7:0] s0,s1,s2,s3;
begin
inv_mix_col[31:24]=pmul_e(s0)^pmul_b(s1)^pmul_d(s2)^pmul_9(s3);
inv_mix_col[23:16]=pmul_9(s0)^pmul_e(s1)^pmul_b(s2)^pmul_d(s3);
inv_mix_col[15:08]=pmul_d(s0)^pmul_9(s1)^pmul_e(s2)^pmul_b(s3);
inv_mix_col[07:00]=pmul_b(s0)^pmul_d(s1)^pmul_9(s2)^pmul_e(s3);
end
endfunction
function [7:0] pmul_e;
input [7:0] b;
reg [7:0] two,four,eight;
begin
two=xtime(b);four=xtime(two);eight=xtime(four);pmul_e=eight^four^two;
end
endfunction
function [7:0] pmul_b;
input [7:0] b;
reg [7:0] two,four,eight;
begin
two=xtime(b);four=xtime(two);eight=xtime(four);pmul_b=eight^two^b;
end
endfunction
function [7:0] xtime;
input [7:0] b;xtime={b[6:0],1'b0}^(8'h1b&{8{b[7]}});
endfunction
// Key Buffer
//
reg [127:0] kb[10:0];
reg [3:0] kcnt;
reg kdone;
reg kb_ld;
aes_inv_sbox us00(.a(sa00_sr),.d(sa00_sub));
aes_inv_sbox us01(.a(sa01_sr),.d(sa01_sub));
aes_inv_sbox us02(.a(sa02_sr),.d(sa02_sub));
aes_inv_sbox us03(.a(sa03_sr),.d(sa03_sub));
aes_inv_sbox us10(.a(sa10_sr),.d(sa10_sub));
aes_inv_sbox us11(.a(sa11_sr),.d(sa11_sub));
aes_inv_sbox us12(.a(sa12_sr),.d(sa12_sub));
aes_inv_sbox us13(.a(sa13_sr),.d(sa13_sub));
aes_inv_sbox us20(.a(sa20_sr),.d(sa20_sub));
aes_inv_sbox us21(.a(sa21_sr),.d(sa21_sub));
aes_inv_sbox us22(.a(sa22_sr),.d(sa22_sub));
aes_inv_sbox us23(.a(sa23_sr),.d(sa23_sub));
aes_inv_sbox us30(.a(sa30_sr),.d(sa30_sub));
aes_inv_sbox us31(.a(sa31_sr),.d(sa31_sub));
aes_inv_sbox us32(.a(sa32_sr),.d(sa32_sub));
aes_inv_sbox us33(.a(sa33_sr),.d(sa33_sub));
endmodule
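The constant multipliers used by inv_mix_col all follow one pattern: build 2b, 4b and 8b with xtime, then XOR together the terms selected by the bits of the constant. The Python sketch below models that pattern as a single helper, pmul, which is a hypothetical generalisation of the listing's pmul_9, pmul_b, pmul_d and pmul_e functions rather than a copy of them:

```python
def xtime(b: int) -> int:
    """Multiply by 2 in GF(2^8), as in the Verilog xtime function."""
    return ((b << 1) & 0xFF) ^ (0x1B if b & 0x80 else 0x00)


def pmul(b: int, k: int) -> int:
    """Multiply b by a 4-bit constant k (0x9, 0xb, 0xd or 0xe) in GF(2^8)."""
    two = xtime(b)
    four = xtime(two)
    eight = xtime(four)
    out = 0
    for bit, term in ((1, b), (2, two), (4, four), (8, eight)):
        if k & bit:          # e.g. k = 0xb selects eight ^ two ^ b
            out ^= term
    return out


def inv_mix_col(s0: int, s1: int, s2: int, s3: int) -> list[int]:
    """Mirror of the Verilog inv_mix_col function, one output byte per row."""
    return [
        pmul(s0, 0xE) ^ pmul(s1, 0xB) ^ pmul(s2, 0xD) ^ pmul(s3, 0x9),
        pmul(s0, 0x9) ^ pmul(s1, 0xE) ^ pmul(s2, 0xB) ^ pmul(s3, 0xD),
        pmul(s0, 0xD) ^ pmul(s1, 0x9) ^ pmul(s2, 0xE) ^ pmul(s3, 0xB),
        pmul(s0, 0xB) ^ pmul(s1, 0xD) ^ pmul(s2, 0x9) ^ pmul(s3, 0xE),
    ]


print([hex(b) for b in inv_mix_col(0x8E, 0x4D, 0xA1, 0xBC)])
```

Applying inv_mix_col to a column produced by the forward MixColumns step should return the original column, which makes a convenient self-check for the hardware functions.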
CHAPTER 5
RESULT ANALYSIS
5.1 Specifications
Software : Xilinx 9.2i
Family : Spartan
Device : XC3S200
Package : FT256
Speed Grade : -4
Top-Level Source Type : HDL
Synthesis Tool : XST (Verilog)
Simulator :
Preferred Language : Verilog
Key
Output
Output Message: 128 bits (h00112233445566778899aabbccddeeff);
The simulation above shows the encrypted result being converted back into the original message,
h00112233445566778899aabbccddeeff.
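The same round trip can be reproduced with a software reference model and compared against the waveforms. The Python sketch below is a compact AES-128 model, not the project's Verilog. The key value is missing from this section, so the FIPS-197 Appendix C.1 key 000102030405060708090a0b0c0d0e0f is assumed here because its plaintext matches the message above; all function names are illustrative.

```python
def xtime(b):
    return ((b << 1) & 0xFF) ^ (0x1B if b & 0x80 else 0)

def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p

def build_sbox():
    """Derive the S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for i in range(1, 5):
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()
INV_SBOX = [SBOX.index(v) for v in range(256)]

def expand_key(key):
    """Return the 11 round keys of AES-128 as lists of 16 bytes."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]
            t[0] ^= rcon
            rcon = xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]

def add_key(s, k):
    return [a ^ b for a, b in zip(s, k)]

def shift_rows(s, sign=1):
    """Byte i sits in row i % 4; sign=-1 gives the inverse rotation."""
    return [s[(i + sign * 4 * (i % 4)) % 16] for i in range(16)]

def mix_columns(s, m):
    """m is the first matrix row: [2,3,1,1] forward, [0xe,0xb,0xd,0x9] inverse."""
    out = []
    for i in range(16):
        base, r = i - i % 4, i % 4
        byte = 0
        for j in range(4):
            byte ^= gmul(s[base + j], m[(j - r) % 4])
        out.append(byte)
    return out

def encrypt(block, key):
    rk = expand_key(key)
    s = add_key(block, rk[0])
    for r in range(1, 11):
        s = shift_rows([SBOX[b] for b in s])
        if r < 10:
            s = mix_columns(s, [2, 3, 1, 1])
        s = add_key(s, rk[r])
    return s

def decrypt(block, key):
    rk = expand_key(key)
    s = add_key(block, rk[10])
    for r in range(9, -1, -1):
        s = [INV_SBOX[b] for b in shift_rows(s, -1)]
        s = add_key(s, rk[r])
        if r > 0:
            s = mix_columns(s, [0xE, 0xB, 0xD, 0x9])
    return s

key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")  # assumed key (FIPS-197 C.1)
msg = bytes.fromhex("00112233445566778899aabbccddeeff")
cipher = encrypt(msg, key)
print(bytes(cipher).hex(), bytes(decrypt(cipher, key)).hex())
```

If the hardware was simulated with a different key, only the key line needs to change; the decrypted value should still equal the original message.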
5.3.2
5.3.4
The diagram above shows the RTL schematic of the decryption output.
b) Disadvantages
1. Understanding the code takes a long time.
2. Writing the code takes a long time.
3. In the past, sending an encoded message to another person took a long time.
4. Overall, cryptography is a long process.
APPLICATIONS
1. Communications : GSM, Payphones
2. Entertainment : Pay-TV, Public event access control
3. Health care : Insurance data, Personal file
4. Government : Identification, Passport, Driving license
5. E-banking : Access to accounts, Performing transactions, Sharing
6. E-commerce : Sale of tickets, Reservations
7. Education : Student database, personal data like results
8. Biometric : Fingerprint recognition
CONCLUSION
This design presents, for the first time, a universal cryptography processor for smart-card applications
that supports both private-key and public-key cryptography algorithms. This is achieved by expressing the
primitives of three important algorithms for smart cards (DES, AES, and ECC) in terms of simple logical
operations that maximize the number of common blocks among them. This approach resulted in a crypto
processor that meets the power consumption and performance specifications of smart cards and occupies 2.25
mm² in 0.18-µm CMOS when SRAM memory blocks are used. This area represents just 9% of the
maximum available smart-card chip area of 25 mm². Using FeRAM instead of SRAM memory blocks
provides nonvolatile configuration at no extra area overhead.
FUTURE SCOPE
DNA Cryptography:
DNA cryptography is a newly emerging cryptographic field that arose from research on DNA computing,
in which DNA is used as the information carrier and modern biological technology is used as the
implementation tool.
The vast parallelism and extraordinary information density inherent in DNA molecules are explored
for cryptographic purposes such as encryption, authentication, and signature.
QUANTUM Cryptography:
Quantum cryptography attempts to achieve the same security of information as other forms of
cryptography but through the use of photons, or packets of light. The process, though still in
experimental stages, makes use of the polarization nature of light and is proving to be a very
promising defence against eavesdropping.