Semester 2, 13/14
Digital Hash
A hash is simply a "summary", or "tag", generated from a digital document using a
mathematical rule or algorithm.
It is designed so that a small change in the document produces a big change in the
hash.
A hash is not an encryption of the document and, most importantly, it is very difficult to find two
documents that have the same hash.
Hashes are used to check the integrity of files and documents.
Detection of bit errors caused by unreliable transmission links or faulty storage media.
o Solution: Message Digest acting as a unique fingerprint for the document (similar function
as CRC).
Protection against unauthorized modification
o Without protection a forger could create both an alternative document and its
corresponding correct message digest.
o Symmetric Key Solution: Message Authentication Code (MAC) formed by using a keyed
message digest function.
o Asymmetric Key Solution: Digital Signature formed by encrypting the message digest with
the document author's private key.
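The difference between a plain message digest and a keyed one can be sketched with Python's standard `hashlib` and `hmac` modules. This is a minimal illustration, not the course's reference implementation: the choice of SHA-256, the sample document and the key value are all assumptions made for the example.

```python
import hashlib
import hmac

document = b"Pay Alice 100.99"
key = b"shared-secret-key"  # hypothetical key known only to sender and receiver

# Plain digest: detects accidental bit errors, but a forger who changes the
# document can simply recompute a matching digest.
digest = hashlib.sha256(document).hexdigest()

# Keyed digest (MAC): cannot be recomputed without knowing the key.
mac = hmac.new(key, document, hashlib.sha256).digest()


def verify(key: bytes, received: bytes, received_mac: bytes) -> bool:
    """Recompute the MAC over the received document and compare safely."""
    expected = hmac.new(key, received, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_mac)


print(verify(key, document, mac))             # genuine document
print(verify(key, b"Pay Alice 900.19", mac))  # modified document
```

A digital signature follows the same pattern, except that the digest is processed with the author's private key and checked with the public key.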
Faculty of Computing
Universiti Teknologi Malaysia
SCSR3443 Introduction to Cryptography
Semester 2, 13/14
Hash Algorithm
A one-way hash function, H(m), is a mathematical function that takes a message string, m, of any
length (pre-string) and returns a smaller fixed-length string, h (hash value).
h = H(m), where h has a fixed length regardless of the length of m
Characteristics of a hash function:
Can take arbitrary-length input and return an output of fixed length.
Given m, it is easy to compute h.
Given h, it is hard to compute m such that H(m) = h (not easier than a brute-force search)
Given m, it is hard to find another message M (different from m) such that H(M) = H(m).
The hash value of a file is a small, practically unique fingerprint, also called a message digest or
one-way transformation.
The message digest value should depend on every bit of the corresponding message.
If a single bit of the original message changes its value, or one bit is added or deleted, then
about 50% of the digest bits should change their values in a random fashion.
A good hash function achieves a pseudo-random message-to-digest mapping, causing two
nearly identical messages to have totally different hash values.
A single bit change in a document should cause about 50% of the bits in the digest to change
their values!
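The "about 50% of the bits change" behaviour can be observed directly. The sketch below flips a single bit of a sample message and counts how many digest bits differ. SHA-256 and the sample message are assumptions chosen for illustration; the exact count depends on the input.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the bit at position index inverted."""
    out = bytearray(data)
    out[index // 8] ^= 1 << (index % 8)
    return bytes(out)


message = b"IOU100.99BOB"
altered = flip_bit(message, 0)  # the two messages differ in exactly one bit

d1 = hashlib.sha256(message).digest()
d2 = hashlib.sha256(altered).digest()
changed = bit_difference(d1, d2)

print(f"{changed} of {len(d1) * 8} digest bits changed")
```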
A message digest of a fixed size acts as a unique fingerprint for an arbitrary-sized message, document
or packed software distribution file.
With a common digest size of 128 to 256 bits, about 10^38 to 10^77 different fingerprint values can be
represented.
o If on every day of the 21st century 10 billion people wrote 100 letters each, this would
amount to 3.65 × 10^16 documents, so only a tiny fraction of all possible values would
be used.
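The arithmetic behind these figures can be checked in a few lines. The sketch assumes a century of 100 years with 365 days each, ignoring leap days, which is how the 3.65 figure arises.

```python
values_128 = 2 ** 128   # number of distinct 128-bit digests
values_256 = 2 ** 256   # number of distinct 256-bit digests

days = 365 * 100                      # one century, leap days ignored
documents = 10 ** 10 * 100 * days     # 10 billion people, 100 letters per day

fraction = documents / values_128     # share of the 128-bit space that is used

print(f"2^128     = {values_128:.2e}")
print(f"2^256     = {values_256:.2e}")
print(f"documents = {documents:.2e}")
print(f"fraction of 128-bit values used = {fraction:.2e}")
```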
Checksum
Let us convince ourselves that a simple checksum would make a poor message digest algorithm.
Suppose Bob owes Alice $100.99 and sends an IOU to Alice consisting of the text string
"IOU100.99BOB".
The ASCII representation (in hexadecimal notation) for these letters is 49, 4F, 55, 31, 30, 30,
2E, 39, 39, 42, 4F, 42.
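Message     ASCII representation (hex)
I O U 1     49 4F 55 31
0 0 . 9     30 30 2E 39
9 B O B     39 42 4F 42
Checksum    B2 C1 D2 AC

Message     ASCII representation (hex)
I O U 9     49 4F 55 39
0 0 . 1     30 30 2E 31
9 B O B     39 42 4F 42
Checksum    B2 C1 D2 AC

Adding up the 4-byte words gives the same checksum, B2 C1 D2 AC, for "IOU100.99BOB" and for the
altered message "IOU900.19BOB", so the checksum cannot detect the change.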
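The checksum for the two IOU messages can be reproduced with a short sketch. It assumes the checksum is the sum of the message's 4-byte words, reduced modulo 2^32; the function name `word_checksum` is hypothetical.

```python
def word_checksum(message: bytes, word_size: int = 4) -> bytes:
    """Sum the message's fixed-size words, discarding any overflow."""
    if len(message) % word_size:
        raise ValueError("message length must be a multiple of the word size")
    total = 0
    for i in range(0, len(message), word_size):
        total += int.from_bytes(message[i:i + word_size], "big")
    total %= 1 << (8 * word_size)
    return total.to_bytes(word_size, "big")


original = b"IOU100.99BOB"
forged = b"IOU900.19BOB"   # the digits 1 and 9 have been swapped

print(word_checksum(original).hex(" ").upper())  # B2 C1 D2 AC
print(word_checksum(forged).hex(" ").upper())    # B2 C1 D2 AC
```

Because addition does not care which word a byte value sits in, swapping values within the same column leaves the sum unchanged.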
Each block x_i then serves as input to an internal fixed-size hash function f, the compression
function of h, which computes a new intermediate result of bit length n for some fixed n, as a
function of the previous n-bit intermediate result and the next input block x_i.
H_{i-1} serves as the n-bit chaining variable between stage (i - 1) and stage i.
H_0 is a pre-defined starting value or initializing value (IV).
An optional output transformation g is used in a final step to map the n-bit chaining variable to an
m-bit result g(H_t).
Particular hash functions are distinguished by the nature of the preprocessing, compression
function, and output transformation.
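The iterated construction described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the block size, the 128-bit chaining variable, the all-zero IV, the zero-plus-length padding, and in particular the toy compression function, which simply reuses truncated SHA-256 and is not a real design.

```python
import hashlib

BLOCK_BYTES = 64         # size of each input block x_i (assumption)
STATE_BYTES = 16         # n = 128-bit chaining variable (assumption)
IV = bytes(STATE_BYTES)  # H_0: all-zero initializing value (assumption)


def compress(h_prev: bytes, block: bytes) -> bytes:
    """Toy compression function f(H_{i-1}, x_i) returning the next n-bit state."""
    return hashlib.sha256(h_prev + block).digest()[:STATE_BYTES]


def preprocess(message: bytes) -> list[bytes]:
    """Pad with zeros and an 8-byte length field, then split into blocks."""
    length_field = (8 * len(message)).to_bytes(8, "big")
    pad_len = -(len(message) + 8) % BLOCK_BYTES
    padded = message + bytes(pad_len) + length_field
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]


def iterated_hash(message: bytes, g=lambda h: h) -> bytes:
    """Chain the compression function over all blocks, then apply g."""
    h = IV
    for block in preprocess(message):
        h = compress(h, block)   # H_i = f(H_{i-1}, x_i)
    return g(h)                  # optional output transformation g(H_t)


print(iterated_hash(b"IOU100.99BOB").hex())
```

Here g defaults to the identity, so the result is just the final chaining variable H_t.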
Length of a Hash Value
The essential cryptographic properties of a hash function are that it is both one-way and
collision-free.
The most basic attack on a hash function is to choose inputs to the hash function at random
until either we find some input that will give us the target output value we are looking for
(thereby contradicting the one-way property), or we find two inputs that produce the same
output (thereby contradicting the collision-free property).
Suppose the hash function produces an n-bit long output. If we are trying to find some input
which will produce some target output value y, then since each output is equally likely we
expect to have to try 2^n possible input values.
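Both kinds of random search can be tried out on a deliberately tiny hash. The sketch below truncates SHA-256 to n bits (an assumption made so that the search finishes quickly) and counts the attempts. Finding a target output typically takes on the order of 2^n tries, while finding any two colliding inputs typically takes on the order of 2^(n/2) tries, the well-known birthday bound.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, n_bits: int) -> int:
    """Toy n-bit hash: the first n_bits bits of SHA-256."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - n_bits)


def find_preimage(target: int, n_bits: int) -> tuple[bytes, int]:
    """Try inputs 1, 2, 3, ... until one hashes to the target value."""
    for tries in count(1):
        candidate = str(tries).encode()
        if tiny_hash(candidate, n_bits) == target:
            return candidate, tries


def find_collision(n_bits: int) -> tuple[bytes, bytes, int]:
    """Try inputs 1, 2, 3, ... until two of them share a hash value."""
    seen = {}
    for tries in count(1):
        candidate = str(tries).encode()
        h = tiny_hash(candidate, n_bits)
        if h in seen:
            return seen[h], candidate, tries
        seen[h] = candidate


N_BITS = 16
target = tiny_hash(b"some target message", N_BITS)
preimage, preimage_tries = find_preimage(target, N_BITS)
first, second, collision_tries = find_collision(N_BITS)

print(f"preimage found after {preimage_tries} tries (2^n = {2 ** N_BITS})")
print(f"collision found after {collision_tries} tries (2^(n/2) = {2 ** (N_BITS // 2)})")
```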
Hash Algorithm
Block Algorithms
Both the MD5 and SHA-1 hash functions work on input data blocks of exactly 512 bits.
A document to be hashed must first be partitioned into an integer number of data blocks of
this size.
This is done by first appending a 64-bit document length L to the end of the document and
then inserting 0 to 511 padding bits in front of the document length field in order to fill the last
block up to 512 bits.
This block-by-block processing allows the hashing of arbitrarily large documents in a serial
fashion.
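Serial, block-by-block processing is what lets a library hash a document without holding it in memory all at once. The sketch below uses the incremental `update()` interface of Python's `hashlib`; the helper name `hash_stream`, the chunk size and the test data are assumptions for illustration.

```python
import hashlib
import io


def hash_stream(stream, chunk_size: int = 64 * 1024) -> str:
    """Feed a file-like object to SHA-1 one chunk at a time."""
    h = hashlib.sha1()
    while chunk := stream.read(chunk_size):
        h.update(chunk)   # internally processed in 512-bit blocks
    return h.hexdigest()


data = b"x" * 1_000_000

streamed = hash_stream(io.BytesIO(data))
one_shot = hashlib.sha1(data).hexdigest()

print(streamed == one_shot)                   # same digest either way
print(hashlib.sha1().block_size * 8, "bits")  # internal block size of SHA-1
print(hashlib.md5().block_size * 8, "bits")   # internal block size of MD5
```

For a file on disk, the same function works with an object opened in binary mode.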
Example 3: What is the number of padding bits if the length of the original message is 2590 bits?
Solution
We can calculate the number of padding bits as follows:
| |
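As a sketch of this kind of calculation, assume the padding length is chosen so that message, padding and length field together fill a whole number of blocks, that is, padding = (-message_bits - length_field_bits) mod block_bits. The example does not state which block size applies, so both common parameter sets are shown: 512-bit blocks with a 64-bit length field (as for MD5 and SHA-1) and 1024-bit blocks with a 128-bit length field (as for SHA-512). The function name is hypothetical.

```python
def padding_bits(message_bits: int, block_bits: int, length_field_bits: int) -> int:
    """Padding needed so message + padding + length field fills whole blocks."""
    return (-message_bits - length_field_bits) % block_bits


# 512-bit blocks, 64-bit length field
print(padding_bits(2590, 512, 64))     # 418

# 1024-bit blocks, 128-bit length field
print(padding_bits(2590, 1024, 128))   # 354
```

In both cases the padded message comes to 3072 bits: six 512-bit blocks or three 1024-bit blocks.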
Word Expansion
SHA-512: Functions
Example
Apply the Majority function on buffers A, B, and C. If the leftmost hexadecimal digits of these
buffers are 0x7, 0xA, and 0xE, respectively, what is the leftmost digit of the result?
Solution
The digits in binary are 0111, 1010, and 1110.
The first bits are 0, 1, and 1. The majority is 1.
The second bits are 1, 0, and 1. The majority is 1.
The third bits are 1, 1, and 1. The majority is 1.
The fourth bits are 1, 0, and 0. The majority is 0.
The result is 1110, or 0xE in hexadecimal.
Example
Apply the Conditional function on buffers E, F, and G. If the leftmost hexadecimal digits of these
buffers are 0x9, 0xA, and 0xF, respectively, what is the leftmost digit of the result?
Solution
The digits in binary are 1001, 1010, and 1111.
For each bit position, the result is the bit of F if the bit of E is 1, and the bit of G otherwise.
The first bits are 1, 1, and 1. Since E_1 = 1, the result is F_1, which is 1.
The second bits are 0, 0, and 1. Since E_2 = 0, the result is G_2, which is 1.
The third bits are 0, 1, and 1. Since E_3 = 0, the result is G_3, which is 1.
The fourth bits are 1, 0, and 1. Since E_4 = 1, the result is F_4, which is 0.
The result is 1110, or 0xE in hexadecimal.
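Both worked examples can be checked with bitwise operators. The sketch below uses the usual bitwise definitions, Majority(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z) and Conditional(x, y, z) = (x AND y) XOR (NOT x AND z), applied to single 4-bit digits. The `width` parameter is an assumption needed to keep Python's unbounded integers to a fixed size; in SHA-512 the buffers are 64 bits wide.

```python
def majority(x: int, y: int, z: int) -> int:
    """Each result bit is the value held by at least two of the three inputs."""
    return (x & y) ^ (x & z) ^ (y & z)


def conditional(x: int, y: int, z: int, width: int = 4) -> int:
    """Each result bit comes from y where x has a 1, and from z where x has a 0."""
    mask = (1 << width) - 1
    return ((x & y) ^ (~x & z)) & mask


print(hex(majority(0x7, 0xA, 0xE)))     # 0xe
print(hex(conditional(0x9, 0xA, 0xF)))  # 0xe
```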
SHA-512: Constants
Converting this number to binary with only 64 bits in the fraction part, we get
MD5 is a strengthened version of MD4 with four rounds; an attack against one round has been
found.
MD5 is more commonly used; its implementation is better optimized and thus faster on Intel
processors (893 K/sec).
All the MD algorithms produce 128-bit hashes.
SHA-1 produces a larger 160-bit hash. No attacks against it were known when these figures were
measured; practical collisions have since been demonstrated.
The first version of SHA had a weakness which was later corrected; the code used here
implements the second, corrected, version. It operates at 336 K/sec.
We use two common and trusted hashing algorithms - SHA-1 and MD5. We concatenate the
output of both functions into a 288-bit (36-byte) value. The reason we use both values is
twofold: i) we're just plain paranoid, and ii) it's been suggested that MD5 has some subtle
weaknesses.
Using two algorithms strengthens our hashing functions. Cryptographers of the future will
have to break both algorithms in order to break the strength of the combined hash functions.
For reference, the probability of finding a bit string that produces the same hash code as
another (different) bit string is 1 in 2^288, or about 1 in 4.97 × 10^86.
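The combined 288-bit value can be sketched with `hashlib`. The order of concatenation (SHA-1 first, then MD5) and the function name `combined_hash` are assumptions, since the text only says that the two outputs are concatenated.

```python
import hashlib


def combined_hash(data: bytes) -> bytes:
    """Concatenate the SHA-1 digest (20 bytes) and the MD5 digest (16 bytes)."""
    return hashlib.sha1(data).digest() + hashlib.md5(data).digest()


value = combined_hash(b"IOU100.99BOB")

print(len(value), "bytes =", len(value) * 8, "bits")
print(value.hex())
print(f"2^288 = {2 ** 288:.2e}")
```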
Rabin Scheme
Davies-Meyer Scheme