Ch8.Dictionaries Hashing - Handouts

Dictionaries 2/19/08 14:15
Dictionaries & Hashing (Ch 8)

Outline and Reading

• Dictionary ADT (§8.1)
• Hash Tables (§8.2)
• Ordered Dictionaries (§8.3)
  • Lookup table (§8.3.2)
  • Binary search (§8.3.3)
• Skip Lists (§8.4)
Dictionary ADT (§8.1.1)

• The dictionary ADT models a searchable collection of key-element items
• The main operations of a dictionary are searching, inserting, and deleting items
• Multiple items with the same key are allowed
• Applications:
  • address book
  • credit card authorization
  • mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
• Dictionary ADT methods:
  • find(k): if the dictionary has an item with key k, returns the position of this element; else, returns a null position
  • insertItem(k, o): inserts item (k, o) into the dictionary
  • removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element. An error occurs if there is no such element
  • size(), isEmpty()
  • keys(), elements()

Unordered Sequence - Log File (§8.1.2)

• A log file is a dictionary implemented by means of an unsorted sequence
• We store the items of the dictionary in a sequence (based on a doubly-linked list or a circular array), in arbitrary order
• Performance:
  • insertItem takes O(1) time since we can insert the new item at the beginning or at the end of the sequence
  • find and removeElement take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
  • Space is O(n), where n is the number of elements in the dictionary
• The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)
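The log file described above can be sketched in a few lines. This is a minimal illustration, not the textbook's implementation: the class name LogFile and the snake_case method names are assumptions, a Python list stands in for the sequence (so insertion is amortized O(1) rather than strictly O(1)), and None stands in for the null position.

```python
class LogFile:
    """Dictionary kept as an unsorted sequence of (key, element) pairs."""

    def __init__(self):
        self._items = []

    def insert_item(self, k, o):
        # Append at the end of the sequence; no search is needed.
        self._items.append((k, o))

    def find(self, k):
        # O(n): in the worst case the whole sequence is scanned.
        for item in self._items:
            if item[0] == k:
                return item
        return None  # stands in for the "null position"

    def remove_element(self, k):
        # O(n): locate the first item with key k, then delete it.
        for idx, (key, elem) in enumerate(self._items):
            if key == k:
                del self._items[idx]
                return elem
        raise KeyError(k)  # "an error occurs if there is no such element"

    def size(self):
        return len(self._items)


log = LogFile()
log.insert_item("alice", "login 09:00")
log.insert_item("bob", "login 09:05")
log.insert_item("alice", "login 13:30")  # duplicate keys are allowed
```

The login records mirror the "historical record of logins" use case: many insertions, few searches.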
Direct Address Table

• A direct address table is a dictionary in which
  • The keys are in the range {0, 1, 2, …, N − 1}
  • Items are stored in an array of size N, T[0..N − 1]
  • The item with key k is stored in T[k]
• Performance:
  • insertItem, find, and removeElement all take O(1) time
  • Space - requires space O(N), independent of n, the number of items stored in the dictionary
• The direct address table is not space efficient unless the range of the keys is close to the number of elements to be stored in the dictionary, i.e., unless n is close to N.

Hash Tables

• Hashing
  • Hash table (an array) of size N, H[0..N − 1]
  • Hash function h that maps keys to indices in H
• Issues
  • Hash functions - we need a method to transform a key to an index in H that disperses the keys well
  • Collisions - some keys will map to the same index of H (otherwise we have a Direct Address Table). Several methods resolve collisions:
    • Chaining - put elements that hash to the same location in a linked list
    • Open addressing - if a collision occurs, use a method to select another location in the table
      • Probe sequences
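For contrast with hashing, a direct address table can be sketched as follows. The class and method names are illustrative assumptions, keys are assumed to be integers in 0..N−1, and None marks an empty cell (so None itself cannot be stored as an element in this sketch).

```python
class DirectAddressTable:
    """One array cell per possible key; the key is the index."""

    def __init__(self, N):
        self._T = [None] * N  # O(N) space regardless of how many items are stored

    def insert_item(self, k, o):
        self._T[k] = o        # O(1)

    def find(self, k):
        return self._T[k]     # O(1); None means "no item with key k"

    def remove_element(self, k):
        o = self._T[k]
        if o is None:
            raise KeyError(k)
        self._T[k] = None
        return o


t = DirectAddressTable(8)
t.insert_item(3, "c")
```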
Hash Functions and Hash Tables (§8.2)

• A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]
• Example: h(x) = x mod N is a hash function for integer keys
• The integer h(x) is called the hash value of key x
• A hash table for a given key type consists of
  • Hash function h
  • Array (called table) of size N
• When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k)

Example

• We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
• Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x

  0     ∅
  1     025-612-0001
  2     981-101-0002
  3     ∅
  4     451-229-0004
  …
  9997  ∅
  9998  200-751-9998
  9999  ∅
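A short sketch of the SSN example, assuming the keys arrive as hyphenated strings exactly as drawn in the figure. The function name h follows the slide; the stored name is a placeholder, since the slide shows no names.

```python
N = 10_000


def h(ssn: str) -> int:
    """Keep the last four digits: strip hyphens, then take x mod 10,000."""
    return int(ssn.replace("-", "")) % N


table = [None] * N
for ssn in ["025-612-0001", "981-101-0002", "451-229-0004", "200-751-9998"]:
    # Store item (k, o) at index i = h(k); the element is a placeholder.
    table[h(ssn)] = (ssn, "name placeholder")
```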
Hash Functions (§8.2.2)

• A hash function is usually specified as the composition of two functions:
  • Hash code map: h1: keys → integers
  • Compression map: h2: integers → [0, N − 1]
• The hash code map is applied first, and the compression map is applied next on the result, i.e., h(x) = h2(h1(x))
• The goal of the hash function is to "disperse" the keys in an apparently random way

Hash Code Maps (§8.2.3)

• Memory address:
  • We reinterpret the memory address of the key object as an integer
  • Good in general, except for numeric and string keys
• Integer cast:
  • We reinterpret the bits of the key as an integer
  • Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines)
• Component sum:
  • We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
  • Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines)
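The component-sum hash code can be illustrated as below. This is a sketch under stated assumptions: the key is a non-negative Python integer, the component width defaults to 32 bits, and "ignoring overflows" is modeled by masking the running sum to that width. The function name is hypothetical.

```python
def component_sum_hash(key: int, bits: int = 32) -> int:
    """Split the key into `bits`-wide components and add them, discarding overflow."""
    if key < 0:
        raise ValueError("this sketch only handles non-negative keys")
    mask = (1 << bits) - 1
    total = 0
    while key:
        total = (total + (key & mask)) & mask  # add the low component, drop overflow
        key >>= bits                           # move on to the next component
    return total
```

For example, a 64-bit key is reduced to the sum of its high and low 32-bit halves.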
Hash Code Maps (cont.)

• Polynomial accumulation:
  • We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): a_0 a_1 … a_{n−1}
  • We evaluate the polynomial p(z) = a_0 + a_1 z + a_2 z^2 + … + a_{n−1} z^{n−1} at a fixed value z, ignoring overflows
  • Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
• Polynomial p(z) can be evaluated in O(n) time using Horner's rule:
  • The following polynomials are successively computed, each from the previous one in O(1) time:
    p_0(z) = a_{n−1}
    p_i(z) = a_{n−i−1} + z p_{i−1}(z)   (i = 1, 2, …, n − 1)
  • We have p(z) = p_{n−1}(z)

Compression Maps (§8.2.4)

• Division:
  • h2(y) = y mod N
  • The size N of the hash table is usually chosen to be a prime
  • The reason has to do with number theory and is beyond the scope of this course
• Multiply, Add and Divide (MAD):
  • h2(y) = (ay + b) mod N
  • a and b are nonnegative integers such that a mod N ≠ 0
  • Otherwise, every integer would map to the same value b
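Horner's rule and the MAD compression map can be sketched together. Assumptions in this illustration: each character code (ord) is one component a_i, overflow is modeled by masking to 32 bits, and the function names are hypothetical.

```python
def polynomial_hash(s: str, z: int = 33, bits: int = 32) -> int:
    """Evaluate p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1} by Horner's rule."""
    mask = (1 << bits) - 1
    p = 0
    for ch in reversed(s):            # first iteration yields p_0 = a_{n-1}
        p = (ord(ch) + z * p) & mask  # p_i = a_{n-i-1} + z * p_{i-1}
    return p


def mad_compress(y: int, a: int, b: int, N: int) -> int:
    """Multiply, Add and Divide: h2(y) = (a*y + b) mod N, requiring a mod N != 0."""
    if a % N == 0:
        raise ValueError("a mod N must be nonzero, or every y maps to b mod N")
    return (a * y + b) % N
```

Composing the two gives h(x) = h2(h1(x)), e.g. mad_compress(polynomial_hash("ab"), 3, 7, 13).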
Collision Handling (§8.2.5)

• Collisions occur when different elements are mapped to the same cell
• Chaining: let each cell in the table point to a linked list of elements that map there
• Chaining is simple, but requires additional memory outside the table

  0  ∅
  1  025-612-0001
  2  ∅
  3  ∅
  4  451-229-0004 → 981-101-0004

Exercise: chaining

Assume you have a hash table H with N = 9 slots (H[0,8]) and let the hash function be h(k) = k mod N.

Demonstrate (by picture) the insertion of the following keys into a hash table with collisions resolved by chaining.
• 5, 28, 19, 15, 20, 33, 12, 17, 10
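One way to check a hand-drawn answer to the chaining exercise is the sketch below. It assumes each chain is a Python list and that new keys are appended at the tail of the chain; inserting at the head is equally valid and would reverse the order within each chain.

```python
N = 9


def h(k: int) -> int:
    return k % N


# One (initially empty) chain per table cell.
H = [[] for _ in range(N)]
for k in [5, 28, 19, 15, 20, 33, 12, 17, 10]:
    H[h(k)].append(k)


def find(k: int) -> bool:
    # Only the chain at cell h(k) has to be scanned.
    return k in H[h(k)]
```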
Linear Probing

• Open addressing: the colliding item is placed in a different cell of the table
• Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell. So the i-th cell checked is:
  H(k,i) = (h(k) + i) mod N
• Each table cell inspected is referred to as a "probe"
• Colliding items lump together, causing future collisions to cause a longer sequence of probes
• Example:
  • h(x) = x mod 13
  • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

   0   1   2   3   4   5   6   7   8   9  10  11  12
   ∅   ∅  41   ∅   ∅  18  44  59  32  22  31  73   ∅

Search with Linear Probing

• Consider a hash table A that uses linear probing
• find(k)
  • We start at cell h(k)
  • We probe consecutive locations until one of the following occurs
    • An item with key k is found, or
    • An empty cell is found, or
    • N cells have been unsuccessfully probed

Algorithm find(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return Position(null)
    else if c.key() = k
      return Position(c)
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return Position(null)
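The linear probing example (N = 13) can be replayed with a short sketch. The table stores bare keys, None plays the role of the empty cell ∅, find returns a cell index instead of a Position object, and the insert helper is an assumption added so the example can be run.

```python
N = 13
A = [None] * N  # None stands in for the empty cell


def h(k: int) -> int:
    return k % N


def insert(k: int) -> int:
    i = h(k)
    for _ in range(N):          # at most N probes
        if A[i] is None:
            A[i] = k
            return i
        i = (i + 1) % N         # next cell, circularly
    raise OverflowError("table is full")


def find(k: int):
    i = h(k)
    for _ in range(N):          # mirrors "repeat ... until p = N"
        c = A[i]
        if c is None:
            return None         # empty cell: k is not in the table
        if c == k:
            return i
        i = (i + 1) % N
    return None                 # N cells probed without success


for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    insert(key)
```

Key 31 hashes to cell 5 and ends up in cell 10 after five collisions, which shows how clustered items lengthen later probe sequences.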
Updates with Linear Probing

• To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements
• removeElement(k)
  • We search for an item with key k
  • If such an item (k, o) is found, we replace it with the special item AVAILABLE and we return the position of this item
  • Else, we return a null position
• insertItem(k, o)
  • We throw an exception if the table is full
  • We start at cell h(k)
  • We probe consecutive cells until one of the following occurs
    • A cell i is found that is either empty or stores AVAILABLE, or
    • N cells have been unsuccessfully probed
  • We store item (k, o) in cell i

Exercise: open addressing & linear probing

Assume you have a hash table H with N = 11 slots (H[0,10]) and let the hash function be h(k) = k mod N.

Demonstrate (by picture) the insertion of the following keys into a hash table with collisions resolved by linear probing.
• 10, 22, 31, 4, 15, 28, 17, 88, 59
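The AVAILABLE marker can be sketched as follows, using the exercise's table (N = 11) as data. The class name, the item count n, and returning a cell index in place of a Position are assumptions of this sketch. Note that find must skip AVAILABLE cells rather than stop at them.

```python
AVAILABLE = object()  # sentinel that replaces deleted items


class LinearProbingTable:
    def __init__(self, N):
        self.N = N
        self.cells = [None] * N  # None = never-used (empty) cell
        self.n = 0               # number of stored items

    def _h(self, k):
        return k % self.N

    def find(self, k):
        i = self._h(k)
        for _ in range(self.N):
            c = self.cells[i]
            if c is None:
                return None                      # empty cell ends the search
            if c is not AVAILABLE and c[0] == k:
                return i
            i = (i + 1) % self.N                 # skip AVAILABLE and other keys
        return None

    def insert_item(self, k, o):
        if self.n == self.N:
            raise OverflowError("table is full")
        i = self._h(k)
        # A free cell exists because n < N, so this loop terminates.
        while self.cells[i] is not None and self.cells[i] is not AVAILABLE:
            i = (i + 1) % self.N
        self.cells[i] = (k, o)
        self.n += 1
        return i

    def remove_element(self, k):
        i = self.find(k)
        if i is None:
            return None                          # "null position"
        self.cells[i] = AVAILABLE
        self.n -= 1
        return i


t = LinearProbingTable(11)
for k in [10, 22, 31, 4, 15, 28, 17, 88, 59]:
    t.insert_item(k, None)
layout = [c[0] if isinstance(c, tuple) else None for c in t.cells]
t.remove_element(15)              # cell 5 now holds AVAILABLE
pos_59 = t.find(59)               # still reachable past the AVAILABLE cell
pos_26 = t.insert_item(26, None)  # reuses the AVAILABLE cell
```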
Double Hashing

• Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
  h(k,i) = (h(k) + i d(k)) mod N   for i = 0, 1, …, N − 1
• The secondary hash function d(k) cannot have zero values
• The table size N must be a prime to allow probing of all the cells
• Common choice of compression map for the secondary hash function:
  d2(k) = q − k mod q
  where
  • q < N
  • q is a prime
• The possible values for d2(k) are 1, 2, …, q

Example of Double Hashing

• Consider a hash table storing integer keys that handles collision with double hashing
  • N = 13
  • h(k) = k mod 13
  • d(k) = 7 − k mod 7
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

  k    h(k)  d(k)  Probes
  18   5     3     5
  41   2     1     2
  22   9     6     9
  44   5     5     5 10
  59   7     4     7
  32   6     3     6
  31   5     4     5 9 0
  73   8     4     8

   0   1   2   3   4   5   6   7   8   9  10  11  12
  31   ∅  41   ∅   ∅  18  32  59  73  22  44   ∅   ∅
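The double hashing example can be replayed with the sketch below, which records the probe sequence of each insertion. The helper name insert and the trace dictionary are assumptions for illustration; h and d follow the slide.

```python
N, q = 13, 7


def h(k: int) -> int:
    return k % N


def d(k: int) -> int:
    return q - k % q  # always in 1..q, never zero


def insert(table, k):
    """Place k in the first free cell of (h(k) + i*d(k)) mod N; return cells probed."""
    probes = []
    for i in range(N):
        cell = (h(k) + i * d(k)) % N
        probes.append(cell)
        if table[cell] is None:
            table[cell] = k
            return probes
    raise OverflowError("no free cell found")


table = [None] * N
trace = {k: insert(table, k) for k in [18, 41, 22, 44, 59, 32, 31, 73]}
```

The recorded probes for 44 and 31 match the "Probes" column of the example table.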
Exercise: open addressing & double hashing

Assume you have a hash table H with N = 11 slots (H[0,10]) and let the hash function be h(k) = k mod N.

Demonstrate (by picture) the insertion of the following keys into a hash table with collisions resolved by double hashing with secondary hash function h2(k) = 1 + (k mod (N − 1))
• 10, 22, 31, 4, 15, 28, 17, 88, 59

Performance of Hashing

• In the worst case, searches, insertions and removals on a hash table take O(n) time
• The worst case occurs when all the keys inserted into the dictionary collide
• The load factor α = n/N affects the performance of a hash table
• Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is
  1 / (1 − α)
• The expected running time of all the dictionary ADT operations in a hash table is O(1)
• In practice, hashing is very fast provided the load factor is not close to 100%
• Applications of hash tables:
  • small databases
  • compilers
  • browser caches
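To get a feel for the 1 / (1 − α) estimate, the sketch below evaluates it for a given number of items n and table size N. The function name is hypothetical, and the formula is the expected-probe estimate quoted above under the random-hash-value assumption, not a measurement.

```python
def expected_insert_probes(n: int, N: int) -> float:
    """Expected probes for an open-addressing insertion at load factor n/N."""
    alpha = n / N
    if alpha >= 1:
        raise ValueError("the table is full; the estimate is undefined")
    return 1 / (1 - alpha)
```

At half load the estimate is 2 probes; at 90% load it grows to about 10, which is why the load factor should stay well below 100%.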
Universal Hashing

• A family of hash functions is universal if, for any 0 ≤ j, k ≤ M − 1 with j ≠ k,
  Pr(h(j) = h(k)) ≤ 1/N.
• Choose p as a prime between M and 2M.
• Randomly select 0 < a < p and 0 ≤ b < p, and define
  h(k) = ((ak + b) mod p) mod N
• Theorem: The set of all functions, h, as defined here, is universal.

Proof of Universality (Part 1)

• Let f(k) = (ak + b) mod p
• Let g(k) = k mod N
• So h(k) = g(f(k)).
• f causes no collisions:
  • Let f(k) = f(j).
  • Suppose k < j. Then
    aj + b − ⌊(aj + b)/p⌋ p = ak + b − ⌊(ak + b)/p⌋ p
    a(j − k) = (⌊(aj + b)/p⌋ − ⌊(ak + b)/p⌋) p
  • So a(j − k) is a multiple of p
  • But both a and j − k are less than p, and p is prime, so p can divide neither factor unless that factor is 0
  • So a(j − k) = 0, i.e., j = k (contradiction)
  • Thus, f causes no collisions.
Proof of Universality (Part 2)

• If f causes no collisions, only g can make h cause collisions.
• Fix a number x. Of the p integers y = f(k), different from x, the number such that g(y) = g(x) is at most ⌈p/N⌉ − 1
• Since there are p choices for x, the number of h's that will cause a collision between j and k is at most
  p(⌈p/N⌉ − 1) ≤ p(p − 1)/N
• There are p(p − 1) functions h. So the probability of collision is at most
  (p(p − 1)/N) / (p(p − 1)) = 1/N
• Therefore, the set of possible h functions is universal.

Ordered Dictionaries (§8.3)

• In an ordered dictionary, we wish to perform the usual dictionary operations, but also maintain an order relation for the keys in the dictionary.
• Naturally supports
  • Look-Up Tables - store dictionary in a vector by nondecreasing order of the keys
  • Binary Search
• Ordered Dictionary ADT:
  • In addition to the generic dictionary ADT, the ordered dictionary ADT supports the following functions:
    • closestBefore(k): return the position of an item with the largest key less than or equal to k
    • closestAfter(k): return the position of an item with the smallest key greater than or equal to k
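The universal family h(k) = ((ak + b) mod p) mod N from the universality proof can be sampled and checked by brute force on a small case. In this sketch the function names are hypothetical, and p = 17, N = 5 and the key pair (3, 7) are arbitrary illustrative choices with both keys below p.

```python
import random


def make_universal_hash(p: int, N: int, rng=random):
    """Draw one member of the family: random 0 < a < p and 0 <= b < p."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % N


def count_colliding_functions(j: int, k: int, p: int, N: int) -> int:
    """Count the (a, b) pairs whose hash function sends j and k to the same cell."""
    return sum(
        ((a * j + b) % p) % N == ((a * k + b) % p) % N
        for a in range(1, p)
        for b in range(p)
    )


p, N = 17, 5
collisions = count_colliding_functions(3, 7, p, N)
h = make_universal_hash(p, N, random.Random(1))
```

Dividing collisions by the p(p − 1) functions in the family gives the collision probability for that key pair, which the theorem bounds by 1/N.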
Lookup Table

• A lookup table is a dictionary implemented by means of a sorted sequence
  • We store the items of the dictionary in an array-based sequence, sorted by key
  • We use an external comparator for the keys
• Performance:
  • find takes O(log n) time, using binary search
  • insertItem takes O(n) time since in the worst case we have to shift n/2 items to make room for the new item
  • removeElement takes O(n) time since in the worst case we have to shift n/2 items to compact the items after the removal
• The lookup table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations)

Binary Search

• Binary search performs operation find(k) on a dictionary implemented by means of an array-based sequence, sorted by key
  • similar to the high-low game
  • at each step, the number of candidate items is halved
  • terminates after a logarithmic number of steps
• Example: find(7) in the sequence
  0 1 3 4 5 7 8 9 11 14 16 18 19
  (figure: four copies of the sequence with the markers l, m and h narrowing at each step until l = m = h at key 7)
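Binary search on the example sequence, together with closestBefore and closestAfter, can be sketched as below. Function names are snake_case adaptations, results are array indices with None as the null position, and the standard-library bisect module is used for the two "closest" queries. Unlike the figure, this find returns as soon as the middle key matches rather than narrowing until l = m = h.

```python
from bisect import bisect_left, bisect_right

keys = [0, 1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19]


def find(keys, k):
    lo, hi = 0, len(keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2   # the candidate range halves every iteration
        if keys[mid] == k:
            return mid
        if keys[mid] < k:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


def closest_before(keys, k):
    """Index of the largest key <= k, or None if every key is greater."""
    i = bisect_right(keys, k)
    return i - 1 if i > 0 else None


def closest_after(keys, k):
    """Index of the smallest key >= k, or None if every key is smaller."""
    i = bisect_left(keys, k)
    return i if i < len(keys) else None
```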
Skip Lists

Outline and Reading

• What is a skip list (§8.4)
• Operations
  • Search (§8.4.1)
  • Insertion (§8.4.2)
  • Deletion (§8.4.2)
• Implementation
• Analysis (§8.4.3)
  • Space usage
  • Search and update times
What is a Skip List

• A skip list for a set S of distinct (key, element) items is a series of lists S_0, S_1, …, S_h such that
  • Each list S_i contains the special keys +∞ and −∞
  • List S_0 contains the keys of S in nondecreasing order
  • Each list is a subsequence of the previous one, i.e., S_0 ⊇ S_1 ⊇ … ⊇ S_h
  • List S_h contains only the two special keys
• We show how to use a skip list to implement the dictionary ADT

  S3  −∞ +∞
  S2  −∞ 31 +∞
  S1  −∞ 23 31 34 64 +∞
  S0  −∞ 12 23 26 31 34 44 56 64 78 +∞

Search

• We search for a key x in a skip list as follows:
  • We start at the first position of the top list
  • At the current position p, we compare x with y ← key(after(p))
    x = y: we return element(after(p))
    x > y: we "scan forward"
    x < y: we "drop down"
  • If we try to drop down past the bottom list, we return NO_SUCH_KEY
• Example: search for 78

  S3  −∞ +∞
  S2  −∞ 31 +∞
  S1  −∞ 23 31 34 64 +∞
  S0  −∞ 12 23 26 31 34 44 56 64 78 +∞
Randomized Algorithms

• A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution
• It contains statements of the type
    b ← random()
    if b = 0
      do A …
    else { b = 1 }
      do B …
• Its running time depends on the outcomes of the coin tosses
• We analyze the expected running time of a randomized algorithm under the following assumptions
  • the coins are unbiased, and
  • the coin tosses are independent
• The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give "heads")
• We use a randomized algorithm to insert items into a skip list

Insertion

• To insert an item (x, o) into a skip list, we use a randomized algorithm:
  • We repeatedly toss a coin until we get tails, and we denote with i the number of times the coin came up heads
  • If i ≥ h, we add to the skip list new lists S_{h+1}, …, S_{i+1}, each containing only the two special keys
  • We search for x in the skip list and find the positions p_0, p_1, …, p_i of the items with largest key less than x in each list S_0, S_1, …, S_i
  • For j ← 0, …, i, we insert item (x, o) into list S_j after position p_j
• Example: insert key 15, with i = 2

  Before (p_2, p_1, p_0 mark the −∞ position in S2 and S1 and the position of key 10 in S0):
  S2  −∞ +∞
  S1  −∞ 23 +∞
  S0  −∞ 10 23 36 +∞

  After:
  S3  −∞ +∞
  S2  −∞ 15 +∞
  S1  −∞ 15 23 +∞
  S0  −∞ 10 15 23 36 +∞
Deletion

• To remove an item with key x from a skip list, we proceed as follows:
  • We search for x in the skip list and find the positions p_0, p_1, …, p_i of the items with key x, where position p_j is in list S_j
  • We remove positions p_0, p_1, …, p_i from the lists S_0, S_1, …, S_i
  • We remove all but one list containing only the two special keys
• Example: remove key 34

  Before (p_2, p_1, p_0 mark key 34 in S2, S1, S0):
  S3  −∞ +∞
  S2  −∞ 34 +∞
  S1  −∞ 23 34 +∞
  S0  −∞ 12 23 34 45 +∞

  After:
  S2  −∞ +∞
  S1  −∞ 23 +∞
  S0  −∞ 12 23 45 +∞

Implementation

• We can implement a skip list with quad-nodes
• A quad-node stores:
  • item
  • link to the node before
  • link to the node after
  • link to the node below
  • link to the node above
• Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them
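The search, insertion and deletion steps can be put together in a compact sketch. It is a simplification, not the quad-node design: nodes keep only "after" and "below" links, float infinities stand in for PLUS_INF and MINUS_INF, keys are assumed distinct and finite, and None stands in for NO_SUCH_KEY. All class and method names are assumptions.

```python
import random

NEG_INF, POS_INF = float("-inf"), float("inf")


class Node:
    def __init__(self, key, elem=None):
        self.key, self.elem = key, elem
        self.after = None
        self.below = None


class SkipList:
    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.top = self._empty_list()  # S_h: only the two special keys
        self.h = 0

    @staticmethod
    def _empty_list():
        left, right = Node(NEG_INF), Node(POS_INF)
        left.after = right
        return left

    def _predecessors(self, x):
        """Return [p_0, ..., p_h]: in each list, the last node with key < x."""
        preds, p = [], self.top
        while p is not None:
            while p.after.key < x:     # scan forward
                p = p.after
            preds.append(p)
            p = p.below                # drop down
        preds.reverse()
        return preds

    def find(self, x):
        node = self._predecessors(x)[0].after
        return node.elem if node.key == x else None

    def insert(self, x, o):
        i = 0
        while self.rng.random() < 0.5:  # count heads until the first tails
            i += 1
        while i >= self.h:              # add lists until the height is i + 1
            new_top = self._empty_list()
            new_top.below = self.top
            self.top = new_top
            self.h += 1
        preds, below = self._predecessors(x), None
        for j in range(i + 1):          # splice x into S_0 .. S_i
            node = Node(x, o)
            node.after, preds[j].after = preds[j].after, node
            node.below, below = below, node

    def remove(self, x):
        preds = self._predecessors(x)
        if preds[0].after.key != x:
            return None
        elem = preds[0].after.elem
        for p in preds:                 # unlink x from every list holding it
            if p.after.key == x:
                p.after = p.after.after
        # Keep exactly one list that holds only the two special keys.
        while self.h > 0 and self.top.below.after.key == POS_INF:
            self.top = self.top.below
            self.h -= 1
        return elem

    def keys(self):
        p = self.top
        while p.below is not None:
            p = p.below
        out, p = [], p.after
        while p.key != POS_INF:
            out.append(p.key)
            p = p.after
        return out


sl = SkipList(random.Random(0))
for key in [31, 12, 78, 23, 64, 26, 56, 34, 44]:
    sl.insert(key, str(key))
```

The heights of the towers depend on the coin tosses, but the contents of S_0 and the results of find and remove do not.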
Space Usage

• The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm
• We use the following two basic probabilistic facts:
  Fact 1: The probability of getting i consecutive heads when flipping a coin is 1/2^i
  Fact 2: If each of n items is present in a set with probability p, the expected size of the set is np
• Consider a skip list with n items
  • By Fact 1, we insert an item in list S_i with probability 1/2^i
  • By Fact 2, the expected size of list S_i is n/2^i
• The expected number of nodes used by the skip list is
  ∑_{i=0}^{h} n/2^i = n ∑_{i=0}^{h} 1/2^i < 2n
• Thus, the expected space usage of a skip list with n items is O(n)

Height

• The running time of the search and insertion algorithms is affected by the height h of the skip list
• We show that with high probability, a skip list with n items has height O(log n)
• We use the following additional probabilistic fact:
  Fact 3: If each of n events has probability p, the probability that at least one event occurs is at most np
• Consider a skip list with n items
  • By Fact 1, we insert an item in list S_i with probability 1/2^i
  • By Fact 3, the probability that list S_i has at least one item is at most n/2^i
• By picking i = 3 log n, we have that the probability that S_{3 log n} has at least one item is at most
  n/2^{3 log n} = n/n^3 = 1/n^2
• Thus a skip list with n items has height at most 3 log n with probability at least 1 − 1/n^2
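The two bounds can be checked with exact arithmetic for a concrete n. This sketch assumes n is a power of two so that log n (base 2) is an integer; the function names are hypothetical.

```python
from fractions import Fraction


def expected_nodes(n: int, h: int) -> Fraction:
    """Sum of the expected list sizes n / 2^i for i = 0 .. h."""
    return sum(Fraction(n, 2**i) for i in range(h + 1))


def height_tail_bound(n: int, c: int = 3) -> Fraction:
    """Bound n / 2^(c log n) on the probability that list S_{c log n} is non-empty."""
    log_n = n.bit_length() - 1  # exact log2 when n is a power of two
    return Fraction(n, 2 ** (c * log_n))
```

For n = 1024 the expected node count stays below 2n = 2048 for any height, and the tail bound with c = 3 equals 1/n^2.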
Search and Update Times

• The search time in a skip list is proportional to
  • the number of drop-down steps, plus
  • the number of scan-forward steps
• The drop-down steps are bounded by the height of the skip list and thus are O(log n) with high probability
• To analyze the scan-forward steps, we use yet another probabilistic fact:
  Fact 4: The expected number of coin tosses required in order to get tails is 2
• When we scan forward in a list, the destination key does not belong to a higher list
  • A scan-forward step is associated with a former coin toss that gave tails
• By Fact 4, in each list the expected number of scan-forward steps is 2
• Thus, the expected number of scan-forward steps is O(log n)
• We conclude that a search in a skip list takes O(log n) expected time
• The analysis of insertion and deletion gives similar results

Summary

• A skip list is a data structure for dictionaries that uses a randomized insertion algorithm
• In a skip list with n items
  • The expected space used is O(n)
  • The expected search, insertion and deletion time is O(log n)
• Using a more complex probabilistic analysis, one can show that these performance bounds also hold with high probability
• Skip lists are fast and simple to implement in practice
