
Unit V

Sorting, Searching

Sorting:

Sorting refers to arranging data in a particular format. A sorting algorithm specifies the way to
arrange data in a particular order. The most common orders are numerical and lexicographical order.

The importance of sorting lies in the fact that data searching can be optimized to a very high
level if the data is stored in a sorted manner. Sorting is also used to represent data in more readable
formats. Following are some examples of sorting in real-life scenarios.

Telephone Directory - The telephone directory stores the telephone numbers of people
sorted by their names, so that names can be searched easily.

Dictionary - The dictionary stores words in alphabetical order so that searching for any
word becomes easy.

Types of Sorting Techniques

There are many types of sorting techniques, differentiated by their efficiency and space
requirements. Following are the sorting techniques which we will be covering in the next sections.

1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Shell Sort
5. Quick Sort
6. Merge Sort
7. Radix Sort

Bubble Sort

Bubble Sort is an algorithm used to sort N elements given in memory, e.g. an array with N
elements. Bubble Sort compares the elements one by one and sorts them based on their values.
It is also known as Sinking Sort.

It is called Bubble Sort because with each iteration the largest element in the list bubbles up
towards the last place, just like a water bubble rising to the water surface.

Sorting takes place by stepping through all the data items one by one in pairs, comparing
adjacent data items and swapping each pair that is out of order.

How Bubble Sort Works?

We take an unsorted array for our example.


Bubble sort starts with the very first two elements, comparing them to check which one is greater.

In this case, value 33 is greater than 14, so the two are already in sorted order. Next, we compare 33
with 27.

We find that 27 is smaller than 33 and these two values must be swapped.

The new array should look like this

Next we compare 33 and 35. We find that both are in already sorted positions.

Then we move to the next two values, 35 and 10.

We know then that 10 is smaller than 35. Hence they are not sorted.
We swap these values. We find that we have reached the end of the array. After one iteration, the
array should look like this

To be precise, we are now showing how the array should look after each iteration. After the
second iteration, it should look like this

Notice that after each iteration, at least one value moves to the end.

And when no swap is required, bubble sort learns that the array is completely sorted (a sketch of
this early-exit check follows below).
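The program given later in this section always performs all the passes. As a minimal sketch of the early-exit idea described above (the function and variable names here are our own illustration, not part of that program), a bubble sort that stops as soon as a pass makes no swap could look like this:

#include <iostream>
#include <utility> // std::swap
using namespace std;

// Bubble sort with an early exit: stop as soon as a full pass
// makes no swap, since the array must then already be sorted.
void bubbleSortEarlyExit(int arr[], int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        bool swapped = false;                // no swap seen in this pass yet
        for (int j = 0; j < n - i - 1; j++)
        {
            if (arr[j] > arr[j + 1])
            {
                swap(arr[j], arr[j + 1]);
                swapped = true;
            }
        }
        if (!swapped)                        // a clean pass means we are done
            break;
    }
}

On an already sorted array this version does a single pass and exits, which is what makes the optimized bubble sort adaptive.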

Algorithm

We assume list is an array of n elements. We further assume that the swap function swaps the
values of the given array elements.

begin BubbleSort(list)

   for all elements of list
      if list[i] > list[i+1]
         swap(list[i], list[i+1])
      end if
   end for

   return list

end BubbleSort

Program:
#include <iostream>

using namespace std;

// Sort arr[] of size n using Bubble Sort.


void BubbleSort (int arr[], int n)
{
int i, j;
for (i = 0; i < n; ++i)
{
for (j = 0; j < n-i-1; ++j)
{
// Comparing consecutive data and swapping values if value at j > j+1.
if (arr[j] > arr[j+1])
{
arr[j] = arr[j]+arr[j+1];
arr[j+1] = arr[j]-arr[j + 1];
arr[j] = arr[j]-arr[j + 1];
}
}
// Value at n-i-1 will be the maximum of all the values below this index.
}
}

int main()
{
int n, i;
cout<<"\nEnter the number of data element to be sorted: ";
cin>>n;

int arr[n];
for(i = 0; i < n; i++)
{
cout<<"Enter element "<<i+1<<": ";
cin>>arr[i];
}

BubbleSort(arr, n);

// Display the sorted data.


cout<<"\nSorted Data ";
for (i = 0; i < n; i++)
cout<<"->"<<arr[i];

return 0;
}

Insertion Sort

It is a simple sorting algorithm which sorts the array by shifting elements one by one. Following
are some of the important characteristics of Insertion Sort.

1. It has one of the simplest implementations.

2. It is efficient for smaller data sets, but very inefficient for larger lists.

3. Insertion Sort is adaptive, which means it reduces its total number of steps if given a
partially sorted list, hence increasing its efficiency.
4. It is better than Selection Sort and Bubble Sort algorithms.

5. Its space complexity is low. Like Bubble Sort, Insertion Sort requires only a single
additional memory space.

6. It is a stable sort, as it does not change the relative order of elements with equal keys.

How Insertion Sort Works?

We take an unsorted array for our example.

Insertion sort compares the first two elements.

It finds that both 14 and 33 are already in ascending order. For now, 14 is in sorted sub-list.

Insertion sort moves ahead and compares 33 with 27.

And finds that 33 is not in the correct position.

It swaps 33 with 27. It also checks with all the elements of sorted sub-list. Here we see that the
sorted sub-list has only one element 14, and 27 is greater than 14. Hence, the sorted sub-list
remains sorted after swapping.
By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10.

These values are not in a sorted order.

So we swap them.

However, swapping makes 27 and 10 unsorted.

Hence, we swap them too.

Again we find 14 and 10 in an unsorted order.

We swap them again. By the end of third iteration, we have a sorted sub-list of 4 items.

This process goes on until all the unsorted values are covered in a sorted sub-list.

Program for Insertion Sort

#include <iostream>

using namespace std;

//member functions declaration


void insertionSort(int arr[], int length);
void printArray(int array[],int size);

int main() {
int array[5]= {5,4,3,2,1};
insertionSort(array,5);
return 0;
}

void insertionSort(int arr[], int length) {


int i, j ,tmp;
for (i = 1; i < length; i++) {
j = i;
while (j > 0 && arr[j - 1] > arr[j]) {
tmp = arr[j];
arr[j] = arr[j - 1];
arr[j - 1] = tmp;
j--;
}
printArray(arr,5);
}
}

void printArray(int array[], int size){


cout<< "Sorting tha array using Insertion sort... ";
int j;
for (j = 0; j < size; j++)
cout <<" "<< array[j];
cout << endl;
}

Selection Sort

Selection sort is conceptually the simplest sorting algorithm. This algorithm first finds
the smallest element in the array and exchanges it with the element in the first position, then finds
the second smallest element and exchanges it with the element in the second position, and
continues in this way until the entire array is sorted.

Note: Selection sort is an unstable sort, i.e. it might change the relative order of two equal elements
in the list while sorting. But it can be made a stable sort when it is implemented using a linked list
data structure.
How Selection Sort Works?

Consider the following depicted array as an example.

For the first position in the sorted list, the whole list is scanned sequentially. At the first position,
where 14 is stored presently, we search the whole list and find that 10 is the lowest value.

So we replace 14 with 10. After one iteration 10, which happens to be the minimum value in the
list, appears in the first position of the sorted list.

For the second position, where 33 is residing, we start scanning the rest of the list in a linear
manner.

We find that 14 is the second lowest value in the list and it should appear at the second place. We
swap these values.

After two iterations, two least values are positioned at the beginning in a sorted manner.

The same process is applied to the rest of the items in the array.

Following is a pictorial depiction of the entire sorting process


Program for Selection Sort

#include <iostream>
using namespace std;

// Sort arr[] of size n using Selection Sort.


void SelectionSort (int arr[], int n)
{
int i, j;
for (i = 0; i < n; ++i)
{
for (j = i+1; j < n; ++j)
{
// Comparing values and swapping if the value at i > the value at j.
if (arr[i] > arr[j])
{
arr[i] = arr[i]+arr[j];
arr[j] = arr[i]-arr[j];
arr[i] = arr[i]-arr[j];
}
}
// Value at i will be the minimum of all the values from this index onward.
}
}

int main()
{
int n, i;
cout<<"\nEnter the number of data element to be sorted: ";
cin>>n;

int arr[n];
for(i = 0; i < n; i++)
{
cout<<"Enter element "<<i+1<<": ";
cin>>arr[i];
}

SelectionSort(arr, n);

// Display the sorted data.


cout<<"\nSorted Data ";
for (i = 0; i < n; i++)
cout<<"->"<<arr[i];

return 0;
}

Quick Sort

Quick Sort, as the name suggests, sorts any list very quickly. Quick sort is not a stable sort,
but it is very fast and requires very little additional space. It is based on the rule of Divide and
Conquer (it is also called partition-exchange sort). This algorithm divides the list into three main
parts:
1. Elements less than the Pivot element
2. Pivot element(Central element)
3. Elements greater than the pivot element
In the list of elements mentioned in the example below, we have taken 25 as the pivot. So after
the first pass, the list will be changed like this.

6 8 17 14 25 63 37 52

Hence after the first pass, the pivot is set at its position, with all the elements smaller than it on its
left and all the elements larger than it on its right. Now 6 8 17 14 and 63 37 52 are considered as
two separate lists, the same logic is applied to each of them, and we keep doing this until the
complete list is sorted.

Program for Quick Sort

#include<iostream>
#include<cstdlib>

using namespace std;

// Swapping two values.


void swap(int *a, int *b)
{
int temp;
temp = *a;
*a = *b;
*b = temp;
}

// Partitioning the array using the value at high as the pivot.


int Partition(int a[], int low, int high)
{
int pivot, index, i;
index = low;
pivot = high;
// Getting index of pivot.
for(i=low; i < high; i++)
{
if(a[i] < a[pivot])
{
swap(&a[i], &a[index]);
index++;
}
}
// Swapping value at high and at the index obtained.
swap(&a[pivot], &a[index]);

return index;
}

// Random selection of pivot.


int RandomPivotPartition(int a[], int low, int high)
{
int pvt, n, temp;
n = rand();
// Randomizing the pivot value in the given subpart of array.
pvt = low + n%(high-low+1);

// Swapping the value at pvt with the value at high, so it is used as the pivot while partitioning.
swap(&a[high], &a[pvt]);

return Partition(a, low, high);


}

// Implementing QuickSort algorithm.


int QuickSort(int a[], int low, int high)
{
int pindex;
if(low < high)
{
// Partitioning array using randomized pivot.
pindex = RandomPivotPartition(a, low, high);
// Recursively implementing QuickSort.
QuickSort(a, low, pindex-1);
QuickSort(a, pindex+1, high);
}
return 0;
}

int main()
{
int n, i;
cout<<"\nEnter the number of data element to be sorted: ";
cin>>n;

int arr[n];
for(i = 0; i < n; i++)
{
cout<<"Enter element "<<i+1<<": ";
cin>>arr[i];
}
QuickSort(arr, 0, n-1);

// Printing the sorted data.


cout<<"\nSorted Data ";
for (i = 0; i < n; i++)
cout<<"->"<<arr[i];

return 0;
}

Merge Sort

Merge Sort follows the rule of Divide and Conquer. In merge sort, the unsorted list is divided
into N sublists, each having one element, because a list consisting of one element is always
sorted. Then it repeatedly merges these sublists to produce new sorted sublists, until in the end
only one sorted list is produced.

Merge sort first divides the array into equal halves and then combines them in a sorted manner.
How Merge Sort Works?
To understand merge sort, we take an unsorted array as the following

We know that merge sort first divides the whole array iteratively into equal halves until the
atomic values are reached. We see here that an array of 8 items is divided into two arrays of size
4.

This does not change the sequence of appearance of items in the original. Now we divide these
two arrays into halves.

We further divide these arrays and we reach atomic values which can no longer be divided.

Now, we combine them in exactly the same manner as they were broken down. Please note the
color codes given to these lists.
We first compare the elements of each pair of lists and then combine them into another list in
sorted order. We see that 14 and 33 are in sorted positions. We compare 27 and 10, and in the
target list of 2 values we put 10 first, followed by 27. We change the order of 19 and 35, whereas
42 and 44 are placed sequentially.

In the next iteration of the combining phase, we compare lists of two data values and merge
them into a list of four data values, placing all in sorted order.
After the final merging, the list should look like this

Program for Merge Sort

#include <iostream>

using namespace std;

// A function to merge the two halves into sorted data.


void Merge(int *a, int low, int high, int mid)
{
// We have low to mid and mid+1 to high already sorted.
int i, j, k, temp[high-low+1];
i = low;
k = 0;
j = mid + 1;

// Merge the two parts into temp[].


while (i <= mid && j <= high)
{
if (a[i] < a[j])
{
temp[k] = a[i];
k++;
i++;
}
else
{
temp[k] = a[j];
k++;
j++;
}
}

// Insert all the remaining values from i to mid into temp[].


while (i <= mid)
{
temp[k] = a[i];
k++;
i++;
}

// Insert all the remaining values from j to high into temp[].


while (j <= high)
{
temp[k] = a[j];
k++;
j++;
}
// Assign sorted data stored in temp[] to a[].
for (i = low; i <= high; i++)
{
a[i] = temp[i-low];
}
}

// A function to split array into two parts.


void MergeSort(int *a, int low, int high)
{
int mid;
if (low < high)
{
mid=(low+high)/2;
// Split the data into two halves.
MergeSort(a, low, mid);
MergeSort(a, mid+1, high);

// Merge them to get sorted output.


Merge(a, low, high, mid);
}
}

int main()
{
int n, i;
cout<<"\nEnter the number of data element to be sorted: ";
cin>>n;

int arr[n];
for(i = 0; i < n; i++)
{
cout<<"Enter element "<<i+1<<": ";
cin>>arr[i];
}

MergeSort(arr, 0, n-1);

// Printing the sorted data.


cout<<"\nSorted Data ";
for (i = 0; i < n; i++)
cout<<"->"<<arr[i];

return 0;
}

Shell Sort

Shell sort is a highly efficient sorting algorithm based on the insertion sort algorithm. This
algorithm avoids the large shifts that occur in insertion sort when a small value is at the far right
and has to be moved to the far left.
This algorithm first uses insertion sort on widely spaced elements to sort them, and then sorts
the less widely spaced elements. This spacing is termed the interval. The interval can be
calculated using Knuth's formula:
Knuth's Formula
h = h * 3 + 1
where h is the interval, with initial value 1.
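Note that the C++ program later in this section uses the simpler gap sequence n/2, n/4, ..., 1 rather than Knuth's sequence. As a small illustrative sketch (our own, not part of that program), Knuth's intervals 1, 4, 13, 40, ... below a given n can be generated like this:

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    int n = 100;                  // hypothetical array size
    vector<int> gaps;
    for (int h = 1; h < n; h = 3 * h + 1)
        gaps.push_back(h);        // collects 1, 4, 13, 40 for n = 100
    // Shell sort would then run gapped insertion-sort passes
    // over these intervals from largest to smallest.
    for (int i = (int)gaps.size() - 1; i >= 0; i--)
        cout << gaps[i] << " ";   // prints 40 13 4 1
    return 0;
}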
How Shell Sort Works?

Let us consider the following example to get an idea of how shell sort works. We take the same
array we have used in our previous examples. For our example and ease of understanding, we
take an interval of 4 and make a virtual sub-list of all values located at an interval of 4 positions.
Here these values are {35, 14}, {33, 19}, {42, 27} and {10, 44}

We compare values in each sub-list and swap them (if necessary) in the original array. After this
step, the new array should look like this

Then, we take an interval of 2, and this gap generates two sub-lists - {14, 27, 35, 42} and
{19, 10, 33, 44}

We compare and swap the values, if required, in the original array. After this step, the array
should look like this

Finally, we sort the rest of the array using an interval of 1. Shell sort uses insertion sort to sort
the array.
Following is the step-by-step depiction

// C++ implementation of Shell Sort


#include <iostream>
using namespace std;

/* function to sort arr using shellSort */


int shellSort(int arr[], int n)
{
// Start with a big gap, then reduce the gap
for (int gap = n/2; gap > 0; gap /= 2)
{
// Do a gapped insertion sort for this gap size.
// The first gap elements a[0..gap-1] are already in gapped order
// keep adding one more element until the entire array is
// gap sorted
for (int i = gap; i < n; i += 1)
{
// add a[i] to the elements that have been gap sorted
// save a[i] in temp and make a hole at position i
int temp = arr[i];

// shift earlier gap-sorted elements up until the correct


// location for a[i] is found
int j;
for (j = i; j >= gap && arr[j - gap] > temp; j -= gap)
arr[j] = arr[j - gap];

// put temp (the original a[i]) in its correct location


arr[j] = temp;
}
}
return 0;
}

void printArray(int arr[], int n)


{
for (int i=0; i<n; i++)
cout << arr[i] << " ";
}

int main()
{
int arr[] = {12, 34, 54, 2, 3}, i;
int n = sizeof(arr)/sizeof(arr[0]);

cout << "Array before sorting: \n";


printArray(arr, n);

shellSort(arr, n);

cout << "\nArray after sorting: \n";


printArray(arr, n);

return 0;
}
Radix Sort

Radix sort is a method that many people intuitively use when alphabetizing a large list of
names: the list of names is first sorted according to the first letter of each name, that is, the names
are arranged in 26 classes.
Intuitively, one might want to sort numbers on their most significant digit. However, radix sort
works counter-intuitively by sorting on the least significant digits first. On the first pass, all the
numbers are sorted on the least significant digit and combined in an array. Then on the second
pass, the entire set of numbers is sorted again on the second least significant digit and combined
in an array, and so on. A sketch of this idea follows below.
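Since this section gives no program of its own, here is a minimal sketch of LSD radix sort for non-negative integers, using a stable counting sort on each decimal digit (the function names are our own illustration):

#include <iostream>
using namespace std;

// A stable counting sort of arr[] on the decimal digit selected by exp
// (exp = 1 for units, 10 for tens, and so on).
void countingSortByDigit(int arr[], int n, int exp)
{
    int output[n];
    int count[10] = {0};
    for (int i = 0; i < n; i++)
        count[(arr[i] / exp) % 10]++;              // histogram of digits
    for (int d = 1; d < 10; d++)
        count[d] += count[d - 1];                  // prefix sums give end positions
    for (int i = n - 1; i >= 0; i--)               // walking backwards keeps it stable
        output[--count[(arr[i] / exp) % 10]] = arr[i];
    for (int i = 0; i < n; i++)
        arr[i] = output[i];
}

// LSD radix sort: sort on the least significant digit first,
// then on each more significant digit in turn.
void radixSort(int arr[], int n)
{
    int maxVal = arr[0];
    for (int i = 1; i < n; i++)
        if (arr[i] > maxVal)
            maxVal = arr[i];
    for (int exp = 1; maxVal / exp > 0; exp *= 10)
        countingSortByDigit(arr, n, exp);
}

int main()
{
    int arr[] = {170, 45, 75, 90, 802, 24, 2, 66};
    int n = sizeof(arr) / sizeof(arr[0]);
    radixSort(arr, n);
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";                     // 2 24 45 66 75 90 170 802
    return 0;
}

Because each per-digit pass is stable, ties on the current digit preserve the order established by the earlier, less significant digits.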

Searching:
Searching is an operation or a technique that helps find the place of a given
element or value in a list. Any search is said to be successful or unsuccessful depending upon
whether the element that is being searched for is found or not. Some of the standard searching
techniques used in data structures are listed below:

Linear Search or Sequential Search
Binary Search
Linear Search:

This is the simplest method of searching. In this technique, the element to be found is searched
for sequentially in the list. This method can be performed on a sorted or an unsorted list (usually
arrays). In the case of a sorted list, searching starts from the 0th element and continues until the
element is found, or until an element whose value is greater than the value being searched for is
reached (assuming the list is sorted in ascending order).

In contrast, searching in an unsorted list also begins from the 0th element and continues
until the element is found or the end of the list is reached.

Example:
The list given below is a list of elements in an unsorted array. The array contains 10 elements.
Suppose the element to be searched is 46, so 46 is compared with all the elements starting from
the 0th element, and the searching process ends either where 46 is found or when the list ends.

The performance of the linear search can be measured by counting the comparisons done to
find an element.

Program for Linear Search:

#include <iostream>

#define MAX_SIZE 5

using namespace std;

int main() {
int arr_search[MAX_SIZE], i, element;

cout << "Simple C++ Linear Search Example - Array\n";


cout << "\nEnter " << MAX_SIZE << " Elements for Searching : " << endl;
for (i = 0; i < MAX_SIZE; i++)
cin >> arr_search[i];

cout << "\nYour Data :";


for (i = 0; i < MAX_SIZE; i++) {
cout << "\t" << arr_search[i];
}

cout << "\nEnter Element to Search : ";


cin>>element;
/* for: Check elements one by one - Linear */
for (i= 0; i < MAX_SIZE; i++) {
/*If for Check element found or not */
if(arr_search[i] == element) {
cout << "\nLinear Search : Element : " << element << " : Found :
Position : " << i + 1 << ".\n";
break;
}
}

if (i == MAX_SIZE)
cout << "\nSearch Element : " << element << " : Not Found \n";

return 0;
}

Binary Search

Binary search is a very fast and efficient searching technique. It requires the list to be in sorted
order. In this method, to search for an element we compare it with the element present at the
center of the list. If it matches, then the search is successful; otherwise the list is divided into two
halves: one from the 0th element to the middle element (the first half), and another from the
middle element to the last element (the second half), in which all values are greater than the
central element.

The searching mechanism proceeds from either of the two halves depending upon whether the
target element is greater or smaller than the central element. If the element is smaller than the
central element, then searching is done in the first half, otherwise searching is done in the second
half.

Example:

For a binary search to work, it is mandatory for the target array to be sorted. We shall learn the
process of binary search with a pictorial example. The following is our sorted array and let us
assume that we need to search the location of value 31 using binary search.

First, we shall determine the middle of the array by using this formula

mid = low + (high - low) / 2

Here it is, 0 + (9 - 0) / 2 = 4 (the integer part of 4.5). So, 4 is the mid of the array.
Now we compare the value stored at location 4 with the value being searched, i.e. 31. We find
that the value at location 4 is 27, which is not a match. As the target value is greater than 27 and
we have a sorted array, we also know that the target value must be in the upper portion of the
array.

We change our low to mid + 1 and find the new mid value again.

low = mid + 1
mid = low + (high-low) / 2

Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.

The value stored at location 7 is not a match, rather it is more than what we are looking for. So,
the value must be in the lower part from this location.

Hence, we calculate the mid again. This time it is 5.


We compare the value stored at location 5 with our target value. We find that it is a match.

We conclude that the target value 31 is stored at location 5.

Binary search halves the searchable items and thus reduces the count of comparisons to be made
to a very small number.

Program for Binary Search:

#include <iostream>

#define MAX_SIZE 5

using namespace std;

int main() {
int arr_search[MAX_SIZE], i, element;
int f = 0, r = MAX_SIZE - 1, mid; // elements must be entered in ascending order

cout << "Simple C++ Binary Search Example - Array\n";


cout << "\nEnter " << MAX_SIZE << " Elements for Searching : " << endl;
for (i = 0; i < MAX_SIZE; i++)
cin >> arr_search[i];
cout << "\nYour Data :";
for (i = 0; i < MAX_SIZE; i++) {
cout << "\t" << arr_search[i];
}
cout << "\nEnter Element to Search : ";
cin>>element;
while (f <= r) {
mid = (f + r) / 2;
if (arr_search[mid] == element) {
cout << "\nSearch Element : " << element << " : Found : Position
: " << mid + 1 << ".\n";
break;
} else if (arr_search[mid] < element)
f = mid + 1;
else
r = mid - 1;
}

if (f > r)
cout << "\nSearch Element : " << element << " : Not Found \n";

return 0;
}
Hashing:
It is a technique used for performing insertions, deletions and search operations in constant
average time by implementing the hash table data structure.

Instead of comparisons, it uses a mathematical function to locate keys.

Types of hashing

1. Static hashing - the hash function maps search key values to a fixed set of locations.

2. Dynamic hashing - the hash table can grow to handle more items at run time.

Hash table

The hash table data structure is an array of some fixed size, containing the keys. A key
value is associated with each record. A hash table is partitioned into an array of buckets. Each
bucket has many slots and each slot holds one record.

Hashing functions
A hashing function is a key-to-address transformation which acts upon a given key to
compute the relative position of the key in the hash table.

A key can be a number, string, record, etc.

A simple hash function

Hash (Key) = (Key) Mod (Table-size)

For example, if the key is 24 and the table size is 5, then

Hash (24) = 24 % 5 = 4

The key value 24 is placed in the relative location 4 in the hash table

Hash function

A good Hash Function should

Minimize collisions

Be easy and quick to compute

Distribute keys evenly in the hash table

Use all the information provided in the key


Routine for simple Hash Function

int Hash(const char *key, int Table_size)
{
    int Hash_value = 0;
    while (*key != '\0')   /* add up the character codes of the key */
    {
        Hash_value = Hash_value + *key;
        key++;
    }
    return (Hash_value % Table_size);
}
Methods of Hashing Function

1. Mid square method


2. Modulo division or division remainder
3. Folding method
4. Pseudo random number generator method
5. Digit or character extraction method
6. Radix transformation
1. Mid square method

The key is squared and the middle part of the result is taken as the hash value, based on
the number of digits required for addressing.

H(X) = middle digits of X

For example :

Map the key 2453 into a hash table of size 1000.

Now X = 2453 and X² = 6017209

The extracted middle value 172 is the hash value
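A minimal sketch of this method for a four-digit key hashed into a table of size 1000 (the function name and digit choices are our own illustration):

#include <iostream>
using namespace std;

// Mid-square hash: square the key and extract its middle digits.
// A four-digit key squares to at most seven digits; dropping the two
// low digits and keeping the next three suits a table of size 1000.
int midSquareHash(long key)
{
    long squared = key * key;       // 2453 * 2453 = 6017209
    return (squared / 100) % 1000;  // middle digits: 172
}

int main()
{
    cout << midSquareHash(2453);    // prints 172
    return 0;
}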

2. Modulo division

This method computes the hash value from the key using the modulo (%) operator

H(key) = Key % Table_size

For example:

Map the key 4 into a hash table of size 4

H(4) = 4 % 4 = 0

Index   Slot
0       4
1
2
3

3. Folding method

This method involves splitting the key into two or more parts, each of which has the same
length as the required address, and then adding the parts to form the hash value.

Two types
o Fold shifting method
o Fold boundary method

Fold shifting method


o key = 123203241
o Partition key into 3 parts of equal length.
o 123, 203 & 241
o Add these three parts
o 123+203+241 = 567 is the hash value
Fold boundary method
o Similar to fold shifting except the boundary parts are reversed
o 123203241 is the key
o 3 parts 123, 203, 241
o Reverse the boundary partitions
o 321, 203, 142
o 321 + 203 + 142 = 666 is the hash value
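As a small illustrative sketch of both folding variants (the helper function is our own, not from the text):

#include <iostream>
using namespace std;

// Reverse the decimal digits of a number, e.g. 123 -> 321.
int reverseDigits(int part)
{
    int rev = 0;
    while (part > 0)
    {
        rev = rev * 10 + part % 10;
        part /= 10;
    }
    return rev;
}

int main()
{
    // key = 123203241, partitioned into parts 123, 203, 241
    int p1 = 123, p2 = 203, p3 = 241;

    // Fold shifting: simply add the parts.
    int shifting = p1 + p2 + p3;                                // 567

    // Fold boundary: reverse the boundary parts before adding.
    int boundary = reverseDigits(p1) + p2 + reverseDigits(p3);  // 321 + 203 + 142 = 666

    cout << "Fold shifting : " << shifting << "\n";
    cout << "Fold boundary : " << boundary << "\n";
    return 0;
}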
4. Pseudo random number generator method
This method generates a random number, given a seed as a parameter, and the resulting
random number is then scaled into the possible address range using modulo division. The
random number produced can be transformed to produce a hash value.

5. Digit or Character extraction method

This method extracts selected digits from the key.

Example:
Map the key 123203241 to a hash table of size 1000.
Select the digits at positions 2, 5 and 8.
Now the hash value = 204.

6. Radix transformation
In this method, a key is transformed into another number base.

Example:
Map the key (8465)₁₀ using base 15.
Now (8465)₁₀ = (2795)₁₅.
Now the hash value is 2795.
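A short sketch of this base conversion (our own illustration; it reads the base-15 digit string back as a decimal number, which works cleanly here because every base-15 digit of 8465 is below 10):

#include <iostream>
using namespace std;

// Radix transformation: re-express the key in another base and
// read the resulting digit string as the hash value.
long radixTransform(long key, int base)
{
    long result = 0, place = 1;
    while (key > 0)
    {
        result += (key % base) * place; // next digit in the new base
        key /= base;
        place *= 10;                    // write digits into decimal positions
    }
    return result;
}

int main()
{
    cout << radixTransform(8465, 15);   // prints 2795
    return 0;
}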
Applications of Hash tables
Database systems
Symbol tables
Data dictionaries
Network processing algorithms
Browse caches
Collision

Collision occurs when the hash value of a record being inserted hashes to an address that already
contains a different record, i.e. when two key values hash to the same position.
Example: 37, 24, 7 with table size 5

Index   Slot
0
1
2       37
3
4       24

37 is placed in index 2 (37 % 5 = 2)
24 is placed in index 4 (24 % 5 = 4)
Now inserting 7:
Hash(7) = 7 mod 5 = 2
Index 2 collides.
Collision Resolution strategies

The process of finding another position for the colliding record is called a collision resolution
strategy.
Two categories:
1. Open hashing - separate chaining

   Each bucket in the hash table is the head of a linked list. All elements that hash to the
   same value are linked together.
2. Closed hashing - open addressing, rehashing and extendible hashing

   Colliding elements are stored at another slot in the table.
   This ensures that all elements are stored directly in the hash table.

1) Separate chaining

It is an open hashing technique.
A pointer field is added to each record location.
Example keys: 10, 11, 81, 7, 34, 94, 17, 29, 89
Routine for insertion in Separate chaining
void insert(int key, Hashtable H)
{
    Position Pos, Newcell;
    List L;
    Pos = Find(key, H);
    if (Pos == NULL)   /* key is not found */
    {
        Newcell = malloc(sizeof(struct ListNode));
        if (Newcell != NULL)
        {
            L = H->TheLists[Hash(key, H->Tablesize)];
            Newcell->Next = L->Next;
            Newcell->Element = key;
            L->Next = Newcell;
        }
    }
}

Position Find(int key, Hashtable H)
{
    Position P;
    List L;
    L = H->TheLists[Hash(key, H->Tablesize)];
    P = L->Next;
    while (P != NULL && P->Element != key)
        P = P->Next;
    return P;
}
Advantages and Disadvantages
Advantages

More elements can be inserted, as it uses an array of linked lists.
Collision resolution is simple and efficient.

Disadvantages
It requires pointers, which occupy more space.
It takes more effort to perform a search, since it takes time to evaluate the hash function
and also to traverse the list.

Open Addressing:

It is a closed hashing technique.
In this method, if a collision occurs, alternative cells are tried until an empty
cell is found.
There are three common methods:
Linear probing
Quadratic probing
Double hashing
a) Linear probing
In linear probing, for the ith probe the position to be tried is given by a linear function
F(i) = i, with Hash(X) = X % Tablesize
Hi(X) = (Hash(X) + F(i)) mod Tablesize
      = (Hash(X) + i) mod Tablesize
Example: To insert 42,39,69,21,71,55 to the hash table of size 10 using linear probing

1. H0(42) = 42 % 10 = 2
2. H0(39) = 39 %10 = 9
3. H0(69) = 69 % 10 = 9 collides with 39
H1(69) = (9+1) % 10 = 10 % 10 = 0
4. H0(21) = 21 % 10 = 1
5. H0(71) = 71 % 10 = 1 collides with 21
   H1(71) = (1+1) % 10 = 2 % 10 = 2 collides with 42
   H2(71) = (2+1) % 10 = 3 % 10 = 3

6. H0(55) = 55 % 10 = 5
Index   Empty Table   After 42   After 39   After 69   After 21   After 71   After 55
0       -             -          -          69         69         69         69
1       -             -          -          -          21         21         21
2       -             42         42         42         42         42         42
3       -             -          -          -          -          71         71
5       -             -          -          -          -          -          55
9       -             -          39         39         39         39         39
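A minimal sketch of linear-probing insertion into a table of size 10 (our own illustration; empty slots are marked with -1, and for simplicity it assumes the table never becomes completely full):

#include <iostream>
using namespace std;

const int TABLE_SIZE = 10;
const int EMPTY = -1;

// Linear probing: on a collision, try the next slot (wrapping
// around) until an empty one is found.
void insertLinear(int table[], int key)
{
    int index = key % TABLE_SIZE;
    while (table[index] != EMPTY)          // probe with F(i) = i
        index = (index + 1) % TABLE_SIZE;
    table[index] = key;
}

int main()
{
    int table[TABLE_SIZE];
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;

    int keys[] = {42, 39, 69, 21, 71, 55};
    for (int k : keys)
        insertLinear(table, k);

    // Matches the worked example: 69 at 0, 21 at 1, 42 at 2,
    // 71 at 3, 55 at 5, 39 at 9.
    for (int i = 0; i < TABLE_SIZE; i++)
        cout << i << " : " << table[i] << "\n";
    return 0;
}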

Advantages
It doesn't require pointers.
Disadvantages
It forms clusters, which degrade the performance of the hash table.
b) Quadratic probing
Based on a quadratic function, i.e. F(i) = i²
Hi(X) = (Hash(X) + F(i)) mod Tablesize
Example: To insert 89, 18, 49, 58, 69 into a hash table of size 10 using quadratic probing
1. H0(89) = 89 % 10 = 9
2. H0(18) = 18 % 10 = 8
3. H0(49) = 49 % 10 = 9 collides with 89
   H1(49) = (9 + 1²) % 10 = 10 % 10 = 0
4. H0(58) = 58 % 10 = 8 collides with 18
   H1(58) = (8 + 1²) % 10 = 9 % 10 = 9 collides with 89
   H2(58) = (8 + 2²) % 10 = 12 % 10 = 2
5. H0(69) = 69 % 10 = 9 collides with 89
   H1(69) = (9 + 1²) % 10 = 10 % 10 = 0 collides with 49
   H2(69) = (9 + 2²) % 10 = 13 % 10 = 3

Index   Empty Table   After 89   After 18   After 49   After 58   After 69
0       -             -          -          49         49         49
2       -             -          -          -          58         58
3       -             -          -          -          -          69
8       -             -          18         18         18         18
9       -             89         89         89         89         89

Limitations:
It faces secondary clustering, and it may become difficult to find an empty slot once the table is
more than half full.
C) Double Hashing
It uses the idea of applying a second hash function to the key when a collision occurs.

The result of the second hash function gives the number of positions from the point
of collision at which to attempt the insert.
F(i) = i * Hash2(X)
Hi(X) = (Hash(X) + F(i)) mod Tablesize
      = (Hash(X) + i * Hash2(X)) mod Tablesize
A popular second hash function is
Hash2(X) = R - (X % R)
where R is a prime number smaller than the table size
Insert: 89, 18, 49, 58, 69 using Hash2(X) = R - (X % R) and R = 7
Open addressing hash table using double hashing

Here Hash(X) = X % 10 and Hash2(X) = 7 - (X % 7)

Index   Empty Table   After 89   After 18   After 49   After 58   After 69
0       -             -          -          -          -          69
3       -             -          -          -          58         58
6       -             -          -          49         49         49
8       -             -          18         18         18         18
9       -             89         89         89         89         89

1. H0(89) = 89 % 10 = 9
2. H0(18) = 18 % 10 = 8
3. H0(49) = 49 % 10 = 9 collides with 89
   H1(49) = ((49 % 10) + 1 * (7 - (49 % 7))) % 10 = 16 % 10 = 6
4. H0(58) = 58 % 10 = 8 collides with 18
   H1(58) = ((58 % 10) + 1 * (7 - (58 % 7))) % 10 = 13 % 10 = 3
5. H0(69) = 69 % 10 = 9 collides with 89
   H1(69) = ((69 % 10) + 1 * (7 - (69 % 7))) % 10 = 10 % 10 = 0
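A short sketch of this probe computation (our own illustration, with table size 10 and R = 7 as above):

#include <iostream>
using namespace std;

const int TABLE_SIZE = 10;
const int R = 7;   // prime smaller than the table size

int hash1(int x) { return x % TABLE_SIZE; }
int hash2(int x) { return R - (x % R); }

// The i-th probe position for key x under double hashing:
// Hi(x) = (hash1(x) + i * hash2(x)) % TABLE_SIZE
int probe(int x, int i)
{
    return (hash1(x) + i * hash2(x)) % TABLE_SIZE;
}

int main()
{
    cout << probe(49, 0) << " then " << probe(49, 1) << "\n";   // 9 then 6
    cout << probe(58, 0) << " then " << probe(58, 1) << "\n";   // 8 then 3
    cout << probe(69, 0) << " then " << probe(69, 1) << "\n";   // 9 then 0
    return 0;
}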

Rehashing:

It is a closed hashing technique.

If the table gets too full, the rehashing method builds a new table that is about twice as
big, scans down the entire original hash table, computes the new hash value for each
element and inserts it in the new table.
Rehashing is expensive, since the running time is O(N): there are N elements to
rehash and the new table size is roughly 2N.
Rehashing can be triggered in several ways, for example:

a. Rehash as soon as the table is half full.

b. Rehash only when an insertion fails.

Routine for rehashing

HashTable Rehash(HashTable H)
{
    int i, oldsize;
    cell *oldcells;
    oldcells = H->Thecells;
    oldsize = H->Table_size;
    H = InitializeTable(2 * oldsize);
    /* Scan the old table and re-insert every legitimate element. */
    for (i = 0; i < oldsize; i++)
        if (oldcells[i].Info == Legitimate)
            Insert(oldcells[i].Element, H);
    free(oldcells);
    return H;
}

Example: Suppose the elements 13, 15, 24, 6 are inserted into an open addressing hash table of
size 7, using linear probing when a collision occurs.

Index   Slot
0       6
1       15
2       -
3       24
4       -
5       -
6       13
If 23 is inserted, the resulting table will be over 70 percent full.

Index   Slot
0       6
1       15
2       23
3       24
4       -
5       -
6       13
A new table is created. The size of the new table is 17, as this is the first prime number
that is at least twice as large as the old table size.

Index   Slot
0       -
1       -
2       -
3       -
4       -
5       -
6       6
7       23
8       24
9       -
10      -
11      -
12      -
13      13
14      -
15      15
16      -
Advantages
The programmer doesn't have to worry about the table size.
It is simple to implement.
Extendible Hashing:
When open addressing or separate chaining is used, collisions could cause several
blocks to be examined during a Find operation, even for a well-distributed hash table.
Furthermore, when the table gets too full, an extremely expensive rehashing step must
be performed, which requires O(N) disk accesses.
These problems can be avoided by using extendible hashing.
Extendible hashing uses a tree to insert keys into the hash table.

Example:
Consider keys that consist of several 6-bit integers.
The root of the tree contains 4 pointers, determined by the leading 2 bits of the keys.
In each leaf, the first 2 bits common to its keys are identified and indicated in parentheses.
D represents the number of bits used by the root (the directory).
The number of entries in the directory is 2^D.

Suppose we want to insert the key 100100.

This would go to the third leaf, but the third leaf is already full.
So we split this leaf into two leaves, which are now determined by the first three bits.
Now the directory size is increased to 3.
Similarly, if the key 000000 is to be inserted, then the first leaf is split into two leaves.

Advantages & Disadvantages:


Advantages

Provides quick access times for insert and find operations on large databases.
Disadvantages

This algorithm does not work if there are more than M duplicates, where M is the number of
elements a leaf can hold.
