Bart 2014

Hash Functions - Bart Preneel
November 2014
Hash functions
X.509 Annex D
RIPEMD-160
MDC-2
SHA-256
MD2, MD4, MD5
SHA-512
SHA-3
SHA-1
Cryptographic Hash Functions

This is an input to a crypto
cryptographic hash function. The input
is a very long string, that is
reduced by the hash function to a
string of fixed length. There are
additional security conditions: it
should be very hard to find an
input hashing to a given value (a
preimage) or to find two colliding
inputs (a collision).
Bart Preneel
KU Leuven - COSIC
firstname.lastname@esat.kuleuven.be
1A3FD4128A198FB3CA345932
Univ. of Piraeus, 11 November 2014

Insert presenter logo
here on slide master
Applications
Agenda
Definitions
short unique identifier to a string

digital signatures
data authentication
Iterations (modes)
one-way function of a string
Compression functions
protection of passwords
micro-payments
Constructions
confirmation of knowledge/commitment
g
SHA-3
SHA 3
pseudo-random string generation/key derivation

entropy extraction
construction of MAC algorithms, stream ciphers, block
ciphers,
Conclusions
2005: 800 uses of MD5 in Microsoft Windows

3
Preimage resistance
Security requirements (n-bit result)

preimage
collision
2nd preimage
preimage
in a password file, one does not store
(username, password)
but
(username,hash(password))
h(x)
h(x)
2n
= h(x)
2n
h(x)
h(x)
h
h(x)
2n/2
this is sufficient to verify a password

an attacker with access to the
password file has to find a preimage
2n
5
November 2014
Second preimage resistance

2nd preimage
Collision resistance
hacker Alice prepares two versions
of a software driver for the O/S
company Bob
Channel 1: high capacity and insecure
h(x)
Channel 2: low capacity but secure
( authenticated
(=
th ti t d cannott be
b modified)
difi d)
an attacker can modify x but not h(x)
h(x)
= h(x)
he can only fool the recipient if he

finds a second preimage of x
2n
7
Collision resistance (2/2)

in many cryptographic protocols,
Alice wants to commit to a value x
without revealing it
Alice picks a secret random string r
and sends y = h(x || r) to Bob
in a later phase of the protocol
protocol, Alice
reveals x and r to Bob and he
checks that y is correct
if Alice can find a collision, that is
(x,r) and (x,r) with x x she can
cheat
if Bob can find a preimage, he can
learn x and cheat
Alice submits x for inspection to Bob

if Bob is satisfied, he digitally signs
h(x) with his private key
Alice now distributes x to users of
the O/S; these users verify the
signature with Bobs public key
h
h(x)
h(x)
2n/2
this signature works for x and for x,

since h(x) = h(x)
The birthday paradox
collision
x is correct code
x contains a backdoor that gives Alice
access to the machine
collision
how many people r do I need to have in a room to

have a probability of p=50% to have at least 2
people with the same birthday?
answer: 23
what is the probability that the birthdays of r people are distinct?
h
h(x)
r terms
q = 1 - p = 1 . 364/365 . 363/365 . 362/365 (365-(r-1))/365
q = 1-p 0.5 for r = 23
h(x)
intuition: number of distinct pairs of people is 23.22/2 = 253; each pair has
probability 1/365 to have the same birthday
2n/2
9
exercise: how many people do you need in a room to have a probability

of 0.50 to have 3 people with the same birthday?
10
The birthday paradox - proof
The birthday paradox (2)

given a set with S elements
choose r elements at random (with replacements) with r S
the probability p that there are at least 2 equal elements (a
collision) 1 - exp - r(r-1)/2S)
more precisely, it can be shown that
p 1 - exp - r(r-1)/2S)
if r < 2S then p 0.6 r (r-1)/2S
r terms
q = 1-p = 1 . ((S-1)/S) . ((S-2)/S) . ((S-(r-1))/S)

or q = k=1r-1 (S-k/S)
ln q = k=1r-1 ln (1-k/S) k=1r-1 -k/S = -r(r-1)/2S
Taylor: if x 1: ln (1-x) x
summation: k=1r-1 k = r (r-1)/2
for a hash function with an n-bit result, a collision can be

found in time 2n/2 (and memory 2n/2)
hence p = 1 q = 1 - exp - r(r-1)/2S)

11
12
November 2014
Brute force (2nd) preimage
Pseudo-random function
computationally indistinguishable from a random function
multiple target second preimage (1 out of many):
Advhprf = Pr [ K $ K: AhK(.) 1] - Pr [ f $ RAND(m,n): Af 1]
if one can attack 2t simultaneous targets, the effort to find a single

preimage is 2n-t
RAND(m,n): set of all functions from m-bit to n-bit strings
multiple target second preimage (many out of

many):
titime-memory trade-off
t d ff with
ith (2n) precomputation
t ti and
d
storage (22n/3) time per (2nd) preimage: (22n/3)
f
? or ?
[Hellman80]
This concept makes only

sense for a function with a
secret key
answer: randomize hash function with a parameter S

(salt, key, spice,)
13
14
Brute force attacks in practice
Quantum computers
in principle exponential parallelism
(2nd) preimage search

n = 128: 14 B$ for 1 year if one can attack
parallel
240
inverting a one-way function: 2n reduced to 2n/2
targets in
[Grover96]
collision search: can we do better than 2n/2 ?
parallel collision search: small memory using

cycle
y
finding
g algorithms
g
((distinguished
g
p
points))
2n/3 computation + hardware [Brassard-Hoyer-Tapp98] = 22n/3

[Bernstein09] classical collision search requires 2n/4 computation
and hardware (= standard cost of 2n/2 )
n = 128: 1 M$ for 5 hours (or 1 year on 60K PCs)

n = 160: 56 M$ for 1 year
need 256-bit result for long term security (30 years or more)
15
16
Properties in practice
collision resistance is not always necessary
other properties are needed:
PRF: pseudo-randomness if keyed (with secret key)

PRO: pseudo-random oracle property (formalization of security
properties when there is no key)
near-collision resistance
partial preimage resistance (most of input known)
multiplication freeness
Iteration
( d off compression
(mode
i ffunction)
ti )
how to formalize these requirements and the

relation between them?
17
18
18
November 2014
How not to construct a hash function
Hash function: iterated structure
Divide the message into t blocks xi of n bits each
IV
Message block 1: x1
Message block 2: x2
x1
H1
x2
H2
H3
x4
x3
split messages into blocks of fixed length and hash them

block by block with a compression function f
need padding at the end
Message block t: xt
efficient and elegant. but
Hash value h(x)

19
Security relation between f and h
Security relation between f and h (2)
iterating f can degrade its security
solution: Merkle-Damgrd (MD) strengthening

fix IV, use unambiguous padding and insert length at the end
trivial example: 2nd preimage
IV
H1
x2
x1
IV = H1
H2
x2
f is collision resistant h is collision resistant
H3
[Merkle89-Damgrd89]
H2
H3
f is ideally 2nd preimage resistant h is ideally 2nd

preimage resistant [Lai-Massey92]
x4
x3
20
many other results
x4
x3
21
22
Attacks on MD-type iterations
Security relation between f and h (3)

length extension: if one knows h(x), easy to compute h(x || y) without knowing x or IV
IV
x2
x1
IV
H1
H1
H2
H3
Sec security degrades lineary with number 2t of message blocks

hashed: 2n-t+1 + t 2n/2+1
appending the length does not help here!
H4= h(x || y)
x3
x1
H1
x2
multi-collision
multi collision attack and impact on concatenation
herding attack
[J
[Joux04
04]
[Kelsey-Kohno06]
reduces security of commitment using a hash function from 2n

on-line 2n-t + precomputation 2.2(n+t)/2 + storage 2t
solution: output transformation

IV
long message 2nd preimage attack

[Dean-Felten-Hu'99], [Kelsey-Schneier05]
H3= h(x)
x3
x2
x1
H2
H2
x3
H3
x4
example: t=n/2 on-line 2n/2 + precomputation 2.23n/4 + storage 2n/2
g
23
24
November 2014
How (NOT) to strengthen a hash function?
Multiple collisions multi-collision
[Coppersmith85][Joux04]
Assume ideal hash function h with n-bit result

(2n/2) evaluations of h (or steps): 1 collision
answer: concatenation
h1 (n1-bit result) and h2 (n2-bit result)
h(x)=h(x)
(r. 2n/2) steps: r2 collisions
iintuition:
t iti
th
the strength
t
th off g against
i t
collision/(2nd) preimage attacks is the
product of the strength of h1 and h2
if both are independent
h1
h(x1)=h(x1)) ; h(x2)=h(x2)) ; ; h(xr2)=h(xr2))
h2
(22n/3) steps: a 3-collision

h(x)= h(x)=h(x)
g(x) = h1(x) || h2(x)
(2n(t-1)/t) steps: a t-fold collision (multi-collision)
but.
h(x1)= h(x2)= =h(xt)

25
26
Multi-collisions on iterated hash function (2)

IV
x1, x1
H1
x2, x2
H2
x3, x3
H3
Multi-collisions [Coppersmith85][Joux 04]
finding multi-collisions for an iterated hash function is not
much harder than finding a single collision (if the size of the
internal memory is n bits)
algorithm
generate R =
x4, x4
f IV:
for
IV collision
lli i for
f block
bl k 1:
1 x1, x1
for H1: collision for block 2: x2, x2

R
2n1/2-fold
multi-collision
lti lli i ffor h2
in R: search by brute
force for h1
h1
Time: n1. 2n2/2 + 2n1/2
<< 2(n1 + n2)/2
now h(x1||x2||x3||x4) = h(x1||x2||x3||x4) = h(x1||x2||x3||x4) =

= h(x1||x2||x3||x4) a 16-fold collision (time: 4 collisions)
h2
g(x) = h1(x) || h2(x)
27
Multi-collisions [Coppersmith85][ Joux 04]
28
Improving MD iteration
consider h1 (n1-bit result) and h2 (n2-bit result), with n1 n2.

concatenation of 2 iterated hash functions (g(x)= h1(x) || h2(x))
is as most as strong as the strongest of the two (even if both
are independent)
salt + output transformation + counter + wide pipe

salt
IV
cost of collision attack against g at most

n1 . 2n2/2 + 2n1/2 << 2(n1 + n2)/2
cost of (2nd) preimage attack against g at most
n1 . 2n2/2 + 2n1 + 2n2 << 2n1 + n2
if either of the functions is weak, the attacks may work better
29
2n
x1
salt
H1
f
1
2n
x2
f
2
salt
H2
2
2n
x3
f
3
salt
salt
H3
2n
x4
security reductions well understood

many more results on property preservation
impact of theory limited
2n
|x|
4
30
November 2014
Improving MD iteration
Tree structure: parallelism
degradation with use: salting (family of functions,

randomization)
or should a salt be part of the input?
[Damgrd89], [Pal-Sarkar03], [Keccak team13]
x1 x2
x3
x7
x8
PRO: strong output transformation g
x5 x6
x4
also solves length extension
long message 2nd preimage: preclude fix points
counter f fi [Biham-Dunkelman07]
multi-collisions, herding: avoid breakdown at 2n/2

with larger internal memory: known as wide pipe
e.g., extended MD4, RIPEMD, [Lucks05]
31
32
Permutation () based: sponge

x1
x2
x3
h1
x4
Modes: summary
growing theory to reduce security properties of
hash function to that of compression function
(MD) or permutation (sponge)
h2
H10
r
preservation of large range of properties
H20
relation between properties
it is very nice to assume multiple properties of the

compression function f, but unfortunately it is very
hard to verify these
squeeze
absorb
if result has n bits, H1 has r bits (rate), H2 has c bits (capacity) and
the permutation is ideal collisions
min (2c/2 , 2n/2)
2nd preimage min (2c/2 , 2n)
preimage
min (2c , 2n)
33
still no single comprehensive theory
34
Single block length: [Rabin78]
IV
Compression
p
functions
H1
x4
x3
x2
x1
H2
E
H2
H3
H4
H2
Merkles meet-in-the-middle: (2nd) preimage in time 2n/2

select 2n/2 values for (x1,x2) and compute forward H2
select 2n/2 values for (x3,x4) and compute backward H2
by the birthday paradox expect a match and thus a (2nd) preimage
35
35
36
November 2014
Block cipher (EK) based: single block length
Permutation () based
small permutation
Grstl
parazoa
Miyaguchi-Preneel
Davies-Meyer
JH
Hi
Hi-1
xi
xi
Hi
H1i-1
H2i-1
Hi-1
H2i
Hi-1
xi
xi
H1i
output length = block length m; rate 1; 1 key schedule per encryption
Hi
12 secure compression functions (in ideal cipher model)
lower bounds: collision 2m/2, (2nd) preimage 2m

[Preneel+93], [Black-Rogaway-Shrimpton02], [Duo-Li06], [Stam09],
37
Block cipher (EK) based: double block length

(3n to 2n compression)
what is the best

collision/preimage security for
2 block cipher calls?
[Mennink12]
Number of
block cipher
calls
For optimal collision security:

what is the best preimage
security for s block cipher
calls? (upper bounds are
known)
[Jetchev-zen-Stam12]
(2)
2n/3
MDC-4
MDC-2
(2)
analysis of slightly more complex schemes very

difficult
MD versus sponge debate:
(4)
sponge is simpler
should xi and Hi-1 be treated differently?
trivial
(1)
n 5n/4
3n/2
2n
preimage
39
40
Hash function history 101

RSA
DES
1980
Hash function
constructions
t ti
HARDWARE
5n/8
3n/5
n/2
powerful tools available
single
block
length
1990
2000
SOFTWARE
(3)
Iteration modes and compression functions

security of simple modes well understood
Open problems:
collision
38
double
block
length
AES
small
permutations
ad hoc
schemes
Dedicated
MD2
MD4
MD5
security
reduction
for
factoring,
DLOG,
lattices
SNEFRU
SHA-1
RIPEMD-160
SHA-2
Whirlpool
SHA-3
2010
41
41
42
November 2014
MDx-type hash function history

MD4
Ext. MD4
HAVAL
RIPEMD
x0
RIPEMD-160
SHA-2
92
94
95
02
SHA-3
12
A0
B0
A1
B1
C0
D0
C1
D1
C16
D16
93
SHA(-0)
[Rivest91]: 4 rounds of 16 steps
H i-1
90
91
MD5
SHA-1
MD5
x15
A16
B16
xp(0)
A17
B17
xp(15)
A32
B32
xq(0)
A33
B33
xq(15)
A48
B48
C48
D48
xr(0)
A49
B49
C49
D49
xr(15)
A64
B64
C17
D17
C32
D32
C33
D33
xi
K
i
C64
D64
+
Hi
43
44
The complexity of collision attacks
State updates in the MD4 family
brute force: 1 million PCs (1 year) or US$ 100,000 hardware (4 days)

MD4
SHA/SHA-1
SHA-256
90
80
70
60
50
40
30
20
10
0
Design principles copied in MD5, RIPEMD, HAVAL, SHA,

SHA-1, SHA-256, ...
19
92
19
92
19
94
19
96
19
98
20
00
20
02
20
04
20
06
20
08
20
10
All hash functions in use today
MD4
MD5
SHA-0
SHA-1
Brute force
45
46
Slide credit: C. Rechberger
SHA-1
Rogue CA attack
designed by NIST (NSA) in 94
[Sotirov-Stevens-Appelbaum-Lenstra-Molnar-Osvik-de Weger 08]
log2 complexity
request user cert; by special
[Wang+05]
[Mendel+08]
[Wang+04]
[Sugita+06]
[Manuel+09]
[Stevens12]
collision this results in a fake CA

cert (need to predict serial
number + validity period)
impact: rogue CA
that can issue certs
that are trusted by
all browsers
[McDonald+09]
Most attacks
unpublished/withdrawn
User1
Self-signed
root key
CA1
CA2
User2
User x
Rogue CA
6CAshaveissuedcertificatessignedwithMD5in2008:
prediction: collision for SHA-1 in the next 12 months
47
RapidSSL,FreeSSL(freetrialcertificatesofferedbyRapidSSL),TCTrustCenter
AG,RSADataSecurity,Verisign.co.jp
48
November 2014
Upgrades
SHA-2 [FIPS180,NIST02]
SHA-224, SHA-256, SHA-384, SHA-512
RIPEMD-160 is good replacement for SHA-1
upgrading algorithms is always hard
non-linear message expansion

64/80 steps
SHA-384 and SHA-512: 64-bit architectures
SHA-256 collisions: 31/64 steps 265.5 [Mendel+13]
TLS uses MD5 || SHA

SHA-1
1 to protect algorithm
negotiation (up to v1.1)
free start collision: 52/64 steps (2128-) [Li+12]
non-randomness
d
4
47/64
/64 steps ((practical)
i l) [Bi
[Biryukov+11][Mendel+11]
k
11][M d l 11]
SHA-256 preimages: 45/64 steps (2256-) [Khovratovitch12]
upgrading negotiation algorithm is even

harder: need to upgrade TLS 1.1 to TLS 1.2
implementations today faster than anticipated

18 cycles/byte on Core 2 (2008) 7.8 cycles/byte on Haswell (2013)
adoption accelerated by other attacks on TLS
since 2013 deployment in TLS 1.2
49
50
NIST AHS competition (SHA-3)

SHA-3: 224, 256, 384, and 512-bit message digests
(similar to SHA-2)
Call:
02/11/07
Deadline (64): 31/10/08
SHA-3
Round 1 (51): 09/12/08
80
(bit and
(bits
db
bytes)
t )
60
Round 2 (14): 24/7/09
64
51
Final (5):
10/12/10
Selection:
02/10/12
40
14
20
Q4/10
Q4/12
0
Q4/08
Q3/09
round 1
final
52
The candidates
Preliminary cryptanalysis
53
Slide credit: Christophe De Cannire
round 2
51
51
54
November 2014
End of Round 1 candidates
Round 2 candidates
55
56
Properties: bits and bytes
Reductions: 256-bit result
[Watanabe10]
Blake-256
Grstl-256
JH-256
K
Keccak-256
k 256
Skein-256
pre
sec
coll.
indiff. assumption
256
256
256
256
256
256
256-L
256
256
256
128
128
128
128
128
128
128
256
256
256
E ideal
, ideal
ideal
ideal
id l
E ideal
ideal
SHAKE-128
128
128
128
128
NIST
256
256-L
128
57
58
Software performance
eBash [Bernstein-Lange]
Reductions: 512-bit result
Blake-512
Grstl-512
JH-512
K
Keccak-512
k 512
Skein-512
pre
sec
coll.
512
512
256
512
512
512
512-L
256
512
512
256
256
256
256
256
256
256
256
512
256
E ideal
, ideal
ideal
ideal
id l
E ideal
ideal
logarithmic scale
indiff. assumption
SHAKE-512
256
256
256
256
NIST
512
512-L
256
slower
59
factor 4 in cycles/byte
60
10
November 2014
Hardware: post-place & route results

ASIC 130nm [Guo-Huang-Nazhandali-Schaumont10]
Throughput
(Gbps)
20
Keccak
SHA256
Blake
Keccak
BMW
16
CubeHash
ECHO
Fugue
12
Grostl
Grstl
Hamsi
JH
Keccak
Luffa
Blake
Shabal
SIMD
nominal version:
Skein
Skein
5x5 array of 64 bits
0
0
permutation: 25, 50, 100, 200, 400, 800, 1600
SHAvite
JH
40,000
80,000
120,000
160,000
200,000
Area
(GateEqv)
24 rounds of 5 steps
61
62
Slide credit: Patrick Schaumont, Virginia Tech
Keccak: FIPS 202
Performance of hash functions [Bernstein-Lange]
(draft: 28 May 2014)
(cycles/byte) Intel Core 2 Quad Q9550; 4 x 2833MHz (2008)
append 2 extra bits for domain separation to allow
2001
flexible output length (XOFs or eXtendable Output Functions)

tree structure (Sakura) allowed by additional encoding
6 versions
SHA3-224: n=224; c =
SHA3-256: n=256; c =
SHA3 384: n=384; c =
SHA3-384:
SHA3-512: n=512; c =
SHAKE128: n=x; c =
SHAKE256: n=x; c =
448;
512;
768;
1024;
256;
512;
r = 1152
r = 1088
r = 832
r = 576
r = 1344
r = 1088
(72%)
(68%)
(52%)
(36%)
(84%)
(68%)
pad 01
pad 11 for XOF
if result has n bits, H1 has r bits (rate), H2 has c bits (capacity) and
the permutation is ideal collisions
min (2c/2 , 2n/2)
2nd preimage min (2c/2 , 2n)
preimage
min (2c , 2n)
(estimated)
63
64
Hash functions: conclusions
SHA-1 would have needed 128-160 steps

instead of 80
2004-2009 attacks: cryptographic meltdown but
not dramatic for most applications
theory is developing for more robust iteration
modes and extra features; still early for building
blocks
Nirwana: efficient hash functions with security
reduction
65
11

Bart 2014

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Bart 2014

Uploaded by

Copyright:

Available Formats

Hash Functions - Bart Preneel

MD2, MD4, MD5

Cryptographic Hash Functions

Univ. of Piraeus, 11 November 2014

short unique identifier to a string

one-way function of a string

pseudo-random string generation/key derivation

2005: 800 uses of MD5 in Microsoft Windows

Security requirements (n-bit result)

in a password file, one does not store

this is sufficient to verify a password

Hash Functions - Bart Preneel

Second preimage resistance

Channel 1: high capacity and insecure

an attacker can modify x but not h(x)

he can only fool the recipient if he

Collision resistance (2/2)

Alice submits x for inspection to Bob

this signature works for x and for x,

The birthday paradox

how many people r do I need to have in a room to

what is the probability that the birthdays of r people are distinct?

exercise: how many people do you need in a room to have a probability

The birthday paradox - proof

The birthday paradox (2)

q = 1-p = 1 . ((S-1)/S) . ((S-2)/S) . ((S-(r-1))/S)

for a hash function with an n-bit result, a collision can be

hence p = 1 q = 1 - exp - r(r-1)/2S)

Hash Functions - Bart Preneel

Brute force (2nd) preimage

multiple target second preimage (1 out of many):

Advhprf = Pr [ K $ K: AhK(.) 1] - Pr [ f $ RAND(m,n): Af 1]

if one can attack 2t simultaneous targets, the effort to find a single

RAND(m,n): set of all functions from m-bit to n-bit strings

multiple target second preimage (many out of

This concept makes only

answer: randomize hash function with a parameter S

Brute force attacks in practice

(2nd) preimage search

inverting a one-way function: 2n reduced to 2n/2

collision search: can we do better than 2n/2 ?

parallel collision search: small memory using

2n/3 computation + hardware [Brassard-Hoyer-Tapp98] = 22n/3

n = 128: 1 M$ for 5 hours (or 1 year on 60K PCs)

PRF: pseudo-randomness if keyed (with secret key)

how to formalize these requirements and the

Hash Functions - Bart Preneel

How not to construct a hash function

Hash function: iterated structure

Divide the message into t blocks xi of n bits each

split messages into blocks of fixed length and hash them

efficient and elegant. but

Hash value h(x)

Security relation between f and h

Security relation between f and h (2)

iterating f can degrade its security

solution: Merkle-Damgrd (MD) strengthening

trivial example: 2nd preimage

f is collision resistant h is collision resistant

f is ideally 2nd preimage resistant h is ideally 2nd

many other results

Attacks on MD-type iterations

Security relation between f and h (3)