
CS221A Data Structures & Algorithms
Hashing

Agenda

Hashing Concepts and Preliminaries
Hash Function
Separate Chaining

Hashing

A technique for performing insertions, deletions, and searches in constant average time.
A scenario in which the keys themselves point directly to records.
Information encoded directly within a key can point us to its associated record.
Examine the key and simply know where to look.

Hashing

Determine the location of a record by performing an arithmetic computation on its key.
The result of this computation gives the location of the record in a table called the hash table.
This computation is referred to as the hash function.

Hashing

A typical hash table is an array of some fixed size, containing the keys.
A key is typically a string with an associated value.
Each key is mapped to some number in the range 0 to TableSize - 1 and placed in the appropriate cell.
The mapping is provided by the hash function.
A hash function should be simple to compute and should ideally ensure that any two distinct keys get different cells.


Hashing
The figure shows an ideal hash table.
All the distinct Greek letter names hash to distinct cells.
Beta hashes to 1.
Theta hashes to 3.
Epsilon hashes to 6.

[Figure: hash table whose cells hold, in order from index 0: Alpha, Beta, Gamma, Theta, Omega, Delta, Epsilon, Pi]

Hashing

In this case the keys are the names of the contacts, and the hash function maps each name to the index of the array where their phone number is stored.
The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be found.
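
The phone book idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the toy hash (first letter of the name), the table size of 26, and the names ToyHash, Put, and Lookup are all made up for illustration, and collisions are not handled at all.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 26   /* assumed: one slot per starting letter */

/* One slot of the table: the contact name (key) and phone number (value). */
struct Slot {
    const char *name;
    const char *phone;
};

/* Toy hash, assumed for illustration: index by the first letter A..Z. */
unsigned ToyHash(const char *name)
{
    return (unsigned)(name[0] - 'A') % TABLE_SIZE;
}

/* Store a contact in the slot its name hashes to (no collision handling). */
void Put(struct Slot table[], const char *name, const char *phone)
{
    unsigned i = ToyHash(name);
    table[i].name = name;
    table[i].phone = phone;
}

/* Return the phone number for name, or NULL if the slot holds another key. */
const char *Lookup(const struct Slot table[], const char *name)
{
    unsigned i = ToyHash(name);
    if (table[i].name != NULL && strcmp(table[i].name, name) == 0)
        return table[i].phone;
    return NULL;
}
```

With a zero-initialized table, Put(table, "Alice", "555-0100") fills slot 0, and Lookup(table, "Alice") goes straight to that slot without scanning the rest of the array.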

Hashing

The only issues are choosing the hash function and deciding what to do when two keys hash to the same value (a phenomenon known as a collision).

Hashing

Get the juices flowing, guys:
Problem Statement:
Let's assume we have to build an application that supports the customer service department of some company. To simplify operation for both representatives and customers, how will you store the data?

Hashing

Simple Solution:
Key the account records by telephone number. When answering a call, the service representative retrieves the account information by entering the customer's telephone number into the system.
What sort of hash function can you come up with?

Hash Function

For integer keys, simply returning Key % TableSize is generally a reasonable strategy for a hash function.
If we have 0 ≤ Key ≤ 99 and our table size is 10, what will be the worst-case scenario for this hash function?

Answer:
If all the keys end in 0, they all hash to cell 0.
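
The worst case can be checked with a small sketch. The names HashInt and CountInCell are hypothetical helpers, not part of the slides, and Step is assumed to be greater than 0.

```c
/* Hash for integer keys, as described above: Key % TableSize. */
unsigned HashInt(unsigned Key, unsigned TableSize)
{
    return Key % TableSize;
}

/* Count how many of the keys 0, Step, 2*Step, ... up to MaxKey land in Cell.
   Assumes Step > 0. */
unsigned CountInCell(unsigned Step, unsigned MaxKey,
                     unsigned TableSize, unsigned Cell)
{
    unsigned count = 0;
    for (unsigned key = 0; key <= MaxKey; key += Step)
        if (HashInt(key, TableSize) == Cell)
            count++;
    return count;
}
```

For the keys 0, 10, 20, ..., 90 and a table size of 10, CountInCell(10, 99, 10, 0) returns 10: every key lands in cell 0 and the other nine cells stay empty.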

Hash Function

For situations like this, it is preferable for the table size to be prime.
When the keys are random integers, this function is effective in distributing the keys evenly.
When the keys are strings, a simple hash function is to add up the ASCII values of the characters and apply the mod function to create the mapping.
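
A quick way to see why a prime table size helps is to count how many distinct cells a set of keys occupies. The sketch below fixes the keys at 0, 10, 20, ..., 90; the name CellsUsed and the limit of 64 cells are assumptions made only for this example.

```c
#include <stdbool.h>

/* Count how many distinct cells the keys 0, 10, 20, ..., 90 occupy
   when hashed with Key % TableSize. Assumes 1 <= TableSize <= 64. */
unsigned CellsUsed(unsigned TableSize)
{
    bool used[64] = { false };
    unsigned count = 0;

    for (unsigned key = 0; key <= 90; key += 10) {
        unsigned cell = key % TableSize;
        if (!used[cell]) {
            used[cell] = true;
            count++;
        }
    }
    return count;
}
```

CellsUsed(10) is 1, because all ten keys pile into one cell, while CellsUsed(11) is 10: with the prime size 11, each key gets a cell of its own.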

Hash Function

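typedef unsigned int Index;

/* Sum the character codes of Key and reduce modulo the table size. */
Index Hash(const char *Key, int TableSize)
{
    unsigned int HashValue = 0;

    while (*Key != '\0')
    {
        HashValue += *Key++;   /* add this character, then advance */
    }
    return HashValue % TableSize;
}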

Hash Function

Where is the hash function on the previous slide ineffective?

Hash Function

If the table size is large, the function doesn't distribute keys well.
Take a large prime table size, for example 10,007, and suppose all keys are at most 8 characters long. The largest value a char can have in ASCII is 127, so 127 * 8 = 1,016 is the largest value the sum can reach.
Only the values 0 to 1,016 are possible. Try taking the mod of these: every value is already smaller than 10,007, so only cells 0 to 1,016 are ever used.
When two or more keys hash to the same value, this is known as a collision. The fewer the collisions, the better your hash function.
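
The limited range is easy to confirm with a sketch of the additive hash. The name AdditiveHash is chosen here for illustration; the logic follows the sum-of-character-codes function from the earlier slide.

```c
/* Additive hash from the slides: sum of character codes mod TableSize. */
unsigned AdditiveHash(const char *Key, unsigned TableSize)
{
    unsigned sum = 0;
    while (*Key != '\0')
        sum += (unsigned char)*Key++;
    return sum % TableSize;
}
```

With TableSize 10,007, a key of eight characters that all have code 127 hashes to 1,016 and "zzzzzzzz" hashes to 976, so cells above 1,016 are never reached by keys of up to 8 ASCII characters. The function also ignores character order: "ab" and "ba" both sum to 195 and therefore collide.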

Separate Chaining

Keep a list of all elements that hash to the same value.

Separate Chaining

To perform a Find, we use the hash function to determine which list to traverse. We then traverse the list in the normal manner, returning the position where the item is found.

Separate Chaining

To perform an Insert, we traverse down the appropriate list to check whether the element is already in place.
If duplicates are expected, an extra field is usually kept, and this field is incremented in the event of a match.
If the element turns out to be new, it is inserted either at the front of the list or at the end of the list, whichever is easier given how frequently the element is expected to be retrieved.
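
The insert rule with a duplicate counter might look like the sketch below. It works on a single chain; the struct CountNode, its Count field, and the function ChainInsert are hypothetical names, and inserting at the front is just one of the two options mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/* List node with the extra field the slide mentions: a duplicate counter. */
struct CountNode {
    const char *Key;
    int Count;                 /* how many times Key was inserted */
    struct CountNode *Next;
};

/* Insert Key into the chain starting at *Head.
   On a match, bump the counter; otherwise add a new node at the front.
   Returns the node for Key, or NULL if allocation fails. */
struct CountNode *ChainInsert(struct CountNode **Head, const char *Key)
{
    for (struct CountNode *p = *Head; p != NULL; p = p->Next) {
        if (strcmp(p->Key, Key) == 0) {
            p->Count++;
            return p;
        }
    }

    struct CountNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->Key = Key;           /* the caller keeps the string alive */
    node->Count = 1;
    node->Next = *Head;
    *Head = node;
    return node;
}
```

Inserting "alpha", then "beta", then "alpha" again leaves two nodes in the chain: "beta" at the front with Count 1, followed by "alpha" with Count 2.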

Hashing Implementation

typedef struct ListNode *Position;
typedef struct HashTbl *HashTable;
typedef Position List;

struct ListNode
{
    ElementType Element;
    Position Next;
};

struct HashTbl
{
    int TableSize;
    List *TheLists;   /* array of lists, one per cell */
};

Hashing Implementation

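HashTable InitializeTable(int TableSize)
{
    HashTable H = malloc(sizeof(struct HashTbl));
    if (H == NULL)
        return NULL;

    /* NextPrime is assumed to return the smallest prime >= TableSize. */
    H->TableSize = NextPrime(TableSize);
    H->TheLists = malloc(sizeof(List) * H->TableSize);

    for (int i = 0; i < H->TableSize; i++)
    {
        /* Each cell starts as an empty list with a header node. */
        H->TheLists[i] = malloc(sizeof(struct ListNode));
        H->TheLists[i]->Next = NULL;
    }
    return H;
}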

Hashing Implementation

Position Find(ElementType Key, HashTable H)
{
    Position P;
    List L = H->TheLists[Hash(Key, H->TableSize)];

    P = L->Next;   /* skip the header node */
    /* For string keys, compare with strcmp instead of !=. */
    while (P != NULL && P->Element != Key)
    {
        P = P->Next;
    }
    return P;
}

Hashing Implementation

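void Insert(ElementType Key, HashTable H, ElementType RecordValue)
{
    Position Pos, NewCell;
    List L;

    Pos = Find(Key, H);
    if (Pos == NULL)   /* Key is not in the table yet */
    {
        NewCell = malloc(sizeof(struct ListNode));
        if (NewCell == NULL)
            return;

        /* Insert at the front of the list, right after the header node. */
        L = H->TheLists[Hash(Key, H->TableSize)];
        NewCell->Next = L->Next;
        /* Find compares Element with Key, so a later lookup succeeds only
           if RecordValue equals Key or the node gains a separate key field. */
        NewCell->Element = RecordValue;
        L->Next = NewCell;
    }
}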
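
Putting the pieces together, a self-contained version of the separate chaining table could look like the sketch below. Several things here are assumptions rather than part of the slides: ElementType is taken to be a C string, keys are compared with strcmp, NextPrime is a hypothetical helper written with trial division, and Insert stores only the key (no RecordValue) so that Find and Insert stay consistent.

```c
#include <stdlib.h>
#include <string.h>

typedef const char *ElementType;      /* assumption: keys are C strings */
typedef struct ListNode *Position;
typedef Position List;
typedef struct HashTbl *HashTable;

struct ListNode { ElementType Element; Position Next; };
struct HashTbl  { int TableSize; List *TheLists; };

/* Hypothetical helper: smallest prime >= N, by trial division. */
int NextPrime(int N)
{
    if (N < 2)
        N = 2;
    for (;; N++) {
        int isPrime = 1;
        for (int d = 2; d * d <= N; d++)
            if (N % d == 0) { isPrime = 0; break; }
        if (isPrime)
            return N;
    }
}

/* Additive string hash: sum of character codes mod TableSize. */
unsigned Hash(const char *Key, int TableSize)
{
    unsigned sum = 0;
    while (*Key != '\0')
        sum += (unsigned char)*Key++;
    return sum % (unsigned)TableSize;
}

HashTable InitializeTable(int TableSize)
{
    HashTable H = malloc(sizeof(struct HashTbl));
    if (H == NULL)
        return NULL;
    H->TableSize = NextPrime(TableSize);
    H->TheLists = malloc(sizeof(List) * H->TableSize);
    if (H->TheLists == NULL) { free(H); return NULL; }
    for (int i = 0; i < H->TableSize; i++) {
        H->TheLists[i] = malloc(sizeof(struct ListNode));   /* header node */
        if (H->TheLists[i] == NULL)
            return NULL;    /* sketch only: earlier allocations are leaked */
        H->TheLists[i]->Next = NULL;
    }
    return H;
}

Position Find(ElementType Key, HashTable H)
{
    Position P = H->TheLists[Hash(Key, H->TableSize)]->Next;
    while (P != NULL && strcmp(P->Element, Key) != 0)
        P = P->Next;
    return P;
}

void Insert(ElementType Key, HashTable H)
{
    if (Find(Key, H) != NULL)
        return;                       /* already present */
    Position NewCell = malloc(sizeof(struct ListNode));
    if (NewCell == NULL)
        return;
    List L = H->TheLists[Hash(Key, H->TableSize)];
    NewCell->Element = Key;
    NewCell->Next = L->Next;          /* insert right after the header */
    L->Next = NewCell;
}
```

Calling InitializeTable(10) gives a table of size 11. Inserting "ab" and then "ba" puts both keys in the same list, since their character sums are equal, with "ba" in front of "ab"; Find still returns the right node for each.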
