
HASHING Techniques

What is Hashing?

Hashing is a method of storing data in an array so that searching, inserting and deleting records is efficient
(in theory O(1) on average). For this, every record needs a unique key.
The basic idea is not to search for the correct position of a record with comparisons but to compute the position within
the array. The function that returns the position is called the hash function and the array is called a hash table.

Some Other Definitions


Hash Table

A hash table is a data structure that uses a random-access data structure, such as an array, and a mapping function, called
a hash function, to allow searches in constant time, O(1), on average.

Hash function

A hash function is a mapping between a set of input values and a set of integers, known as hash values. It is usually
denoted by H.

Hash of Key

Suppose h is a hash function and K is a key; then h(K) is called the hash of the key. The hash of the key is the index at which a
record with the key value K must be kept.

Direct Method

In direct hashing, the key is the address without any algorithmic manipulation.
Direct hashing is limited, but it can be very powerful because it guarantees that there are no synonyms (keys that hash to the
same address) and therefore no collisions.

Modulo-division Method

This is also known as division remainder method.


This algorithm works with any list size, but a list size that is a prime number produces fewer collisions than other list
sizes.
The formula to calculate the address is:
Address = key MODULO list-size + 1
where list-size is the number of elements in the array.
Example:
Given the keys 137456, 214562 and 140145 and a list size of 19:

137456 % 19 + 1 = 11
214562 % 19 + 1 = 15
140145 % 19 + 1 = 2
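
The calculation above can be reproduced with a short Python sketch. The function name `modulo_division_hash` is made up for illustration; the formula and the list size of 19 come from the example.

```python
def modulo_division_hash(key: int, list_size: int) -> int:
    """Return an address in the range 1..list_size (key MODULO list-size + 1)."""
    return key % list_size + 1


keys = [137456, 214562, 140145]
addresses = [modulo_division_hash(k, 19) for k in keys]
print(addresses)  # [11, 15, 2]
```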

Digit-extraction Method

Using digit extraction, selected digits are extracted from the key and used as the address.
Example:
Using a six-digit employee number to hash to a three-digit address (000-999), we could select the first, third, and fourth
digits (from the left) and use them as the address.

The keys are:


379452 -> 394
121267 -> 112
378845 -> 388
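
A minimal Python sketch of the same extraction follows. The function name and the `positions` parameter are illustrative assumptions; the address is returned as a string so that a leading zero (for example 012) is not lost.

```python
def digit_extraction_hash(key: int, positions=(1, 3, 4)) -> str:
    """Build an address from the digits at the given 1-based positions (from the left)."""
    digits = str(key)
    return "".join(digits[p - 1] for p in positions)


for key in (379452, 121267, 378845):
    print(key, "->", digit_extraction_hash(key))
# 379452 -> 394
# 121267 -> 112
# 378845 -> 388
```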

Folding Method

Two folding methods are used:

1. Fold shift
2. Fold boundary

Fold Shift
In fold shift, the key value is divided into parts whose size matches the size of the required address. Then the left and right
parts are shifted and added to the middle part.

Fold boundary
In fold boundary, the left and right numbers are folded on a fixed boundary between them and the center number. The
two outside values are thus reversed.
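
Both folding methods can be sketched in Python as below. The text gives no worked example, so the key 123456789, the three-digit address size, and the step of discarding any overflow digit (by taking the sum modulo 1000) are all assumptions made for illustration.

```python
def split_parts(key: int, size: int) -> list:
    """Split the key's digits into consecutive parts of `size` digits."""
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def fold_shift(key: int, size: int = 3) -> int:
    """Add the parts as they are; keep only the last `size` digits of the sum."""
    parts = split_parts(key, size)
    return sum(int(p) for p in parts) % 10 ** size


def fold_boundary(key: int, size: int = 3) -> int:
    """Reverse the digits of the two outside parts, then add all parts."""
    parts = split_parts(key, size)
    if len(parts) > 1:
        parts[0] = parts[0][::-1]
        parts[-1] = parts[-1][::-1]
    return sum(int(p) for p in parts) % 10 ** size


print(fold_shift(123456789))     # 123 + 456 + 789 = 1368 -> 368
print(fold_boundary(123456789))  # 321 + 456 + 987 = 1764 -> 764
```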

Midsquare Method

In midsquare hashing, the key is squared and the address is selected from the middle of the squared number.
A limitation is the size of the key: squaring a large key produces a very large number.

Example :
9452^2 = 89340304: address is 3403
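
The example can be checked with a small Python sketch: 9452 squared is 89340304, and the middle four digits are 3403. The function name and the `address_digits` parameter are illustrative assumptions.

```python
def midsquare_hash(key: int, address_digits: int = 4) -> int:
    """Square the key and take `address_digits` digits from the middle of the result."""
    squared = str(key * key)
    start = (len(squared) - address_digits) // 2
    return int(squared[start:start + address_digits])


print(9452 * 9452)           # 89340304
print(midsquare_hash(9452))  # 3403
```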

Rotation Method

Rotation method is generally not used by itself but rather is incorporated in combination with other hashing methods.
It is most useful when keys are assigned serially.

Pseudo-random Hashing

A common random-number generator is shown below.


y = ax + c
To use the pseudorandom-number generator as a hashing method, we set x to the key, multiply it by the coefficient a,
and then add the constant c. The result is then divided by the list size, with the remainder being the hashed address.

Example:

Y = ((17 * 121267) + 7) modulo 307
Y = (2061539 + 7) modulo 307
Y = 2061546 modulo 307
Y = 41
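
The same computation as a Python sketch, using the coefficient a = 17, the constant c = 7 and the list size 307 from the example. The function name is an illustrative assumption.

```python
def pseudorandom_hash(key: int, a: int = 17, c: int = 7, list_size: int = 307) -> int:
    """Compute y = a * key + c and return the remainder after dividing by list_size."""
    return (a * key + c) % list_size


print(17 * 121267 + 7)            # 2061546
print(pseudorandom_hash(121267))  # 41
```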

Open Addressing and Chaining

In hash tables, there is always a possibility that two data elements will hash to the same integer value. When this
happens, a collision takes place, i.e. two data elements try to occupy the same place in the hash table array. Methods for
dealing with such situations include open addressing and chaining.

There are three open addressing methods, which differ in the probe sequence used to find the next vacant cell. These are linear
probing, quadratic probing and double hashing.

Linear Probing / Sequential probing :

Linear probing resolves a hash collision by sequentially searching the hash table, beginning at the location returned by
the hash function.
In this case, the hash table is implemented using an array. The program stores the first element that generates a specific
array index at that index. For example, if the hash function generates 79, then array index 79 is used to store the
element. When the hash function generates the index 79 again, the program begins a sequential search starting at location
79, looking for the next available spot. The second element whose key the hash function transforms into 79 will be
stored at location 80, the third at 81 and so on. Of course, if 80 and 81 are already occupied, the elements will be
stored farther away from the location generated by the hash function.
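
The insertion described above might look like the following Python sketch. The table size of 100, the hash function `key % 100`, the sample keys, and the wrap-around at the end of the array are assumptions for illustration; the text only fixes the starting index 79.

```python
def linear_probe_insert(table: list, key: int, hash_fn) -> int:
    """Store key in the first free slot at or after hash_fn(key); return that index."""
    size = len(table)
    start = hash_fn(key)
    for step in range(size):
        index = (start + step) % size  # assumed: wrap around at the end of the array
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 100
slots = [linear_probe_insert(table, k, lambda key: key % 100) for k in (179, 279, 379)]
print(slots)  # [79, 80, 81]
```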

Quadratic Probing :

Quadratic probing is a different way of rehashing. With quadratic probing we are still looking for an empty
location. However, instead of incrementing the offset by 1 every time, as in linear probing, we increment the offset by
1, 3, 5, 7, ... We explore a sequence of locations until an empty one is found, as follows:
index, index + 1, index + 4, index + 9, index + 16, ...
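
A Python sketch of this probe sequence is shown below. Adding the increments 1, 3, 5, 7, ... to the offset yields the offsets 1, 4, 9, 16, ... The table size of 101, the hash function `key % 101`, the sample keys and the wrap-around are assumptions for illustration.

```python
def quadratic_probe_insert(table: list, key: int, hash_fn) -> int:
    """Probe index, index + 1, index + 4, index + 9, ... until a free slot is found."""
    size = len(table)
    start = hash_fn(key)
    offset, increment = 0, 1
    for _ in range(size):
        index = (start + offset) % size  # assumed: wrap around the array
        if table[index] is None:
            table[index] = key
            return index
        offset += increment  # offsets become 1, 4, 9, 16, ...
        increment += 2       # increments are 1, 3, 5, 7, ...
    raise RuntimeError("no free slot found in the probe sequence")


table = [None] * 101
slots = [quadratic_probe_insert(table, k, lambda key: key % 101) for k in (79, 180, 281)]
print(slots)  # [79, 80, 83]
```

All three keys hash to 79, so the second lands at 79 + 1 and the third, finding 79 and 80 occupied, lands at 79 + 4.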

Rehashing :

Resolving a collision by computing a new hash location (index) in the array.


Double Hashing :

Like linear probing, double hashing uses one hash value as a starting point and then repeatedly steps forward an interval until the
desired value is located, an empty location is reached, or the entire table has been searched; but this interval is decided
using a second, independent hash function (hence the name double hashing). Unlike linear probing and quadratic
probing, the interval depends on the data, so that even values mapping to the same location have different bucket
sequences; this minimizes repeated collisions and the effects of clustering.
In other words, given independent hash functions h1 and h2, the jth location in the bucket sequence for value k in a hash
table of size m is :
h(j, k) = (h1(k) + j * h2(k)) mod m
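
The formula can be sketched in Python as follows. The text does not specify h1 and h2, so the choices here, h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)) with m = 13, are assumptions. This h2 is never zero, so the probe always moves forward.

```python
def double_hash_sequence(key: int, m: int, count: int) -> list:
    """Return the first `count` locations h(j, k) = (h1(k) + j * h2(k)) mod m."""
    h1 = key % m              # assumed primary hash function
    h2 = 1 + key % (m - 1)    # assumed secondary hash function, always >= 1
    return [(h1 + j * h2) % m for j in range(count)]


print(double_hash_sequence(14, 13, 4))  # [1, 4, 7, 10]
print(double_hash_sequence(27, 13, 4))  # [1, 5, 9, 0]
```

Keys 14 and 27 both start at location 1, but their intervals differ (3 and 4), so their sequences separate after the first probe.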

Chaining :

In open addressing, collisions are resolved by looking for an open cell in the hash table. A different approach is to
create a linked list at each index in the hash table. A data item's key is hashed to the index in the usual way, and the item
is inserted into the linked list at that index.
Other items that hash to the same index are simply added to the linked list at that index. There is no need to search for
empty cells in the primary hash table array. This is known as the chaining method.
Let's consider the following example:

Elements : 89, 18, 49, 58, 9, 7

H(89) = 5 (using the division-remainder method)
H(18) = 4 (using the division-remainder method)
H(49) = 0 (using the division-remainder method)
H(58) = 2 (using the division-remainder method)
H(9) = 2 (using the division-remainder method)
Here, there is already one element in position 2, which is 58 in our example. But now 9 also hashes to position 2.
When this type of situation occurs, we say that a collision has taken place.

The chaining method resolves collisions using an adjacency list representation. Whenever a collision takes place, we just
add the element to the adjacency list at the header where the collision occurred.

In our example, a collision has occurred at header node 2, so we add both 58 and 9 to it as an adjacency list. If any further
collision occurs at 2, we add the new element to the existing list.
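
The example can be sketched in Python as below. The table size of 7 is an assumption inferred from the listed hash values (89, 18, 49 and 58 give 5, 4, 0 and 2 when taken modulo 7), and plain Python lists stand in for the linked lists.

```python
def build_chained_table(keys: list, table_size: int) -> list:
    """Hash each key with key mod table_size and append it to the chain at that index."""
    table = [[] for _ in range(table_size)]
    for key in keys:
        table[key % table_size].append(key)
    return table


table = build_chained_table([89, 18, 49, 58, 9, 7], 7)
for index, chain in enumerate(table):
    print(index, chain)
# 0 [49, 7]
# 1 []
# 2 [58, 9]
# 3 []
# 4 [18]
# 5 [89]
# 6 []
```

Under this assumption, 9 joins 58 in the chain at index 2, and 7 collides with 49 at index 0 in the same way.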


Difference between Linear Probing and Chaining :

Linear Probing
1) If the hash function generates the same index for two keys, the algorithm performs a sequential (linear) search
and places the value into the next available slot.
2) It uses an array as storage.

Chaining
1) If the hash function generates the same index for two keys, the algorithm creates a node of a linked list,
which is added after the last node generated with the same hash value.
2) It uses a linked list to store data with the same hash value.
