
ANNA UNIVERSITY REGIONAL CENTRE COIMBATORE

NAME: N.RAJA

CLASS: First year MCA

SUBJECT: Sorting & Hashing

DATE: 05/11/2013

Sorting
A sorting algorithm is an algorithm that puts the elements of a list in a certain order. The most-used orders are numerical order and lexicographical order.

Sorting algorithms are classified by:
• Computational complexity
• Memory utilization
• Stability (maintaining the relative order of records with equal keys)
• Number of comparisons
• Method applied, such as insertion, exchange, selection, merging, etc.

Sorting is a process of linear ordering of a list of objects. Sorting techniques are categorized into:
1. Internal sorting
2. External sorting

Internal sorting: It takes place in the main memory of a computer. Eg: Bubble sort, Insertion sort, Shell sort, Quick sort, Heap sort, etc.

External sorting: It takes place in the secondary memory of a computer, since the number of objects to be sorted is too large to fit in main memory. Eg: Merge sort, Multiway merge, Polyphase merge.

Bubble Sort:
Bubble sort is a simple sorting algorithm. It works by repeatedly stepping through the list to be sorted, comparing two items at a time and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. The algorithm gets its name from the way smaller elements "bubble" to the top of the list. Because it only uses comparisons to operate on elements, it is a comparison sort.
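
The description above (repeat passes of adjacent comparisons until a pass makes no swaps) can be sketched as follows. This is only a minimal sketch: the function name bubble_sort and the returned pass count are assumptions added for illustration and do not come from the original program.

```c
#include <stdbool.h>
#include <stddef.h>

/* Adjacent-swap bubble sort with early exit.
   Returns the number of passes made (for illustration only). */
size_t bubble_sort(int a[], size_t n)
{
    size_t passes = 0;
    bool swapped = true;

    while (swapped && n > 1) {
        swapped = false;
        for (size_t i = 0; i + 1 < n; i++) {
            if (a[i] > a[i + 1]) {        /* neighbours out of order */
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
                swapped = true;
            }
        }
        passes++;
        n--;    /* the largest remaining element is now in place */
    }
    return passes;
}
```

On an already sorted list the first pass makes no swaps and the loop stops, which is where the O(n) best case comes from.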

Analysis of bubble sort:
Worst case performance: O(n^2)
Best case performance: O(n) (when a pass with no swaps ends the sort early)
Average case performance: O(n^2)

Advantages: It is very simple to implement.
Disadvantages: If the list is very large it takes a long time to sort.

Algorithm steps:
Step 1: Read the input list size.
Step 2: Read the list elements.
Step 3: Assign count = 0.
Step 4: Repeat the loop while count is less than list size - 1 (for n-1 passes).
   i) Assign inner_count = count + 1.
   ii) Repeat the loop while inner_count is less than list size (to arrange the elements):
      1) If list[count] > list[inner_count] (check element value)
      2) temp = list[count]
      3) list[count] = list[inner_count]
      4) list[inner_count] = temp

C Program To Implement Bubble Sort :


    #include <stdio.h>

    void bubble(int a[], int n);

    int main(void)
    {
        int i, n, a[25];

        printf("\n\nEnter how many numbers you want to sort\n\n");
        if (scanf("%d", &n) != 1 || n < 1 || n > 25)
            return 1;               /* a[] holds at most 25 numbers */
        printf("\nEnter the numbers \n");
        for (i = 0; i < n; i++)
            scanf("%d", &a[i]);

        bubble(a, n);

        printf("\n\nFinal sorted list is ");
        for (i = 0; i < n; i++)
            printf("%d ", a[i]);
        return 0;
    }

    /* Compare a[i] with every later element, swap when out of order,
       and print the list after each pass. */
    void bubble(int a[], int n)
    {
        int i, j, k, temp;

        for (i = 0; i < n - 1; i++) {
            printf("\n\n PASS->%d ", i + 1);
            for (j = i + 1; j <= n - 1; j++) {
                if (a[i] > a[j]) {
                    temp = a[i];
                    a[i] = a[j];
                    a[j] = temp;
                }
            }
            for (k = 0; k < n; k++)
                printf("%d ", a[k]);
            printf("\n");
        }
    }

SAMPLE INPUT AND OUTPUT:

    Enter how many elements you want to sort
    5
    Enter the numbers
    5 4 3 2 1
    PASS->1 1 5 4 3 2
    PASS->2 1 2 5 4 3
    PASS->3 1 2 3 5 4
    PASS->4 1 2 3 4 5
    Final sorted list is 1 2 3 4 5

Implementation Of Selection Sort


Selection sort: Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n^2) complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and also has performance advantages over more complicated algorithms in certain situations. Effectively, we divide the list into two parts: the sublist of items already sorted, which we build up from left to right and is found at the beginning, and the sublist of items remaining to be sorted, occupying the remainder of the array. It makes n(n-1)/2 comparisons.

Analysis of selection sort:
Worst case performance: O(n^2)
Best case performance: O(n^2)
Average case performance: O(n^2)

Advantages:
1. It is very simple to implement.

Disadvantages:
1. If the list is very large it takes a long time to sort.

Algorithm Steps:
1. Find the minimum value in the list.
2. Swap it with the value in the first position.
3. Repeat the steps above for the remainder of the list (starting at the second position and advancing each time).

C Program To Implement Selection Sort:


    #include <stdio.h>

    /* Sort x[0..length-1] in ascending order. */
    void insertion_sort(int x[], int length)
    {
        int key, i, j;

        for (j = 1; j < length; j++) {
            key = x[j];
            i = j - 1;
            /* Test i >= 0 first so that x[-1] is never read. */
            while (i >= 0 && x[i] > key) {
                x[i + 1] = x[i];
                i--;
            }
            x[i + 1] = key;
        }
    }

    int main(void)
    {
        int A[100];
        int x = 0, n = 0;

        printf("******INSERTION SORT******");
        printf("\n\nENTER THE LIMIT : ");
        if (scanf("%d", &n) != 1 || n < 1 || n > 100)
            return 1;               /* A[] holds at most 100 elements */
        printf("\n\nENTER THE ELEMENTS ONE BY ONE\n\n");
        for (x = 0; x < n; x++)
            scanf("%d", &A[x]);
        printf("\n\nNON SORTED LIST\n\n");
        for (x = 0; x < n; x++)
            printf("\n%d", A[x]);
        insertion_sort(A, n);
        printf("\n\nSORTED LIST\n\n");
        for (x = 0; x < n; x++)
            printf("\n%d", A[x]);
        return 0;
    }

Sample Input and Output:

    Enter how many elements you want to sort
    5
    Enter the numbers
    5 4 3 2 1
    PASS->1 1 5 4 3 2
    PASS->2 1 2 5 4 3
    PASS->3 1 2 3 5 4
    PASS->4 1 2 3 4 5
    Final sorted list is 1 2 3 4 5
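
The listing above shifts larger elements to the right and then drops the key into place, which is the insertion-sort pattern. A minimal sketch of the three selection-sort steps given earlier (find the minimum, swap it into the first unsorted position, repeat on the remainder) might look like the following. The function name selection_sort is an assumption for illustration and is not part of the original program.

```c
#include <stddef.h>

/* Selection sort: repeatedly move the minimum of the unsorted part
   to the front of that part. */
void selection_sort(int a[], size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;                   /* index of smallest seen so far */
        for (size_t j = i + 1; j < n; j++) {
            if (a[j] < a[min])
                min = j;
        }
        if (min != i) {                   /* at most one swap per pass */
            int temp = a[i];
            a[i] = a[min];
            a[min] = temp;
        }
    }
}
```

The inner loop always scans the whole unsorted part, which is why the best, average and worst cases are all O(n^2).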

Implementation Of Insertion Sort


Insertion Sort: Insertion sort is a simple sorting algorithm, a comparison sort in which the sorted array (or list) is built one entry at a time. It is much less efficient on large lists than more advanced algorithms such as quick sort, heapsort, or merge sort. Put another way, insertion sort works by taking elements from the list one by one and inserting them in their correct position into a new sorted list. Insertion sort consists of N-1 passes.

Insertion sort provides several advantages:

• Simple implementation
• Efficient for (quite) small data sets
• Adaptive, i.e. efficient for data sets that are already substantially sorted: the time complexity is O(n + d), where d is the number of inversions
• More efficient in practice than most other simple quadratic (i.e. O(n^2)) algorithms such as selection sort or bubble sort: the average running time is about n^2/4, and the running time is linear in the best case
• Stable, i.e. does not change the relative order of elements with equal keys
• In-place, i.e. only requires a constant amount O(1) of additional memory space
• Online, i.e. can sort a list as it receives it

Limitations of insertion sort: It is relatively efficient only for small lists and mostly-sorted lists. It is expensive because each insertion shifts all following elements by one.

Analysis of insertion sort:
Worst case performance: O(n^2)
Best case performance: O(n)
Average case performance: O(n^2)
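
To make the O(n + d) claim concrete, the sketch below counts how many element shifts an insertion sort performs. Each shift removes exactly one inversion, so the count returned equals d. The function name insertion_sort_count is an assumption introduced here for illustration.

```c
#include <stddef.h>

/* Insertion sort that also counts element shifts.
   Each shift removes exactly one inversion, so the total equals d. */
size_t insertion_sort_count(int x[], size_t n)
{
    size_t shifts = 0;

    for (size_t j = 1; j < n; j++) {
        int key = x[j];
        size_t i = j;
        while (i > 0 && x[i - 1] > key) {
            x[i] = x[i - 1];              /* shift one place to the right */
            i--;
            shifts++;
        }
        x[i] = key;
    }
    return shifts;
}
```

For the reversed list 5 4 3 2 1 every pair is an inversion, so the function returns 10; for a sorted list it returns 0 and only the n-1 outer-loop comparisons are made.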

Algorithm steps:
Step 1: Read the input list size.
Step 2: Read the list elements.
Step 3: pass = 1
Step 4: Repeat the following steps until pass reaches size-1 (for N-1 passes):
   i) key = list[pass]
   ii) swap = pass - 1
   iii) Repeat the following steps while swap >= 0 and list[swap] > key (for comparison):
      a) list[swap+1] = list[swap]
      b) swap--
   iv) list[swap+1] = key

C Program To Implement Insertion Sort


    #include <stdio.h>

    /* Sort x[0..length-1] in ascending order by insertion. */
    void insertion_sort(int x[], int length)
    {
        int key, i, j;

        for (j = 1; j < length; j++) {
            key = x[j];
            i = j - 1;
            /* Test i >= 0 first so that x[-1] is never read. */
            while (i >= 0 && x[i] > key) {
                x[i + 1] = x[i];
                i--;
            }
            x[i + 1] = key;
        }
    }

    int main(void)
    {
        int A[100];
        int x = 0, n = 0;

        printf("******INSERTION SORT******");
        printf("\n\nENTER THE LIMIT : ");
        if (scanf("%d", &n) != 1 || n < 1 || n > 100)
            return 1;               /* A[] holds at most 100 elements */
        printf("\n\nENTER THE ELEMENTS ONE BY ONE\n\n");
        for (x = 0; x < n; x++)
            scanf("%d", &A[x]);
        printf("\n\nNON SORTED LIST\n\n");
        for (x = 0; x < n; x++)
            printf("\n%d", A[x]);
        insertion_sort(A, n);
        printf("\n\nSORTED LIST\n\n");
        for (x = 0; x < n; x++)
            printf("\n%d", A[x]);
        return 0;
    }

SAMPLE INPUT AND OUTPUT:

    ENTER THE LIMIT : 5
    ENTER THE ELEMENTS ONE BY ONE
    5 4 3 2 1
    SORTED LIST : 1 2 3 4 5

Implementation Of Quick Sort


Quick Sort: Quick sort is one of the most efficient internal sorting techniques. It has very good average case behaviour among the sorting techniques. It is also called partitioning sort, and it uses the divide and conquer technique.

Definition: Pick an element from the array (the pivot), partition the remaining elements into those greater than and less than this pivot, and recursively sort the partitions. There are many variants of the basic scheme above: to select the pivot, to partition the array, to stop the recursion on small partitions, etc.
• Quick sort is a comparison sort and, in efficient implementations, is not a stable sort.
• Quick sort sorts by employing a divide and conquer strategy to divide a list into two sublists.

Analysis of the quick sort:
• Worst case performance: O(n^2)
• Best case performance: O(n log n)
• Average case performance: O(n log n)

Advantages of quick sort:
• In practice it is often faster than other O(n log n) algorithms.
• It has better cache performance and high speed.

Limitation: Requires more memory space, because the recursion uses extra stack space.

Algorithm steps:
• Pick an element, called a pivot, from the list.
• Reorder the list so that all elements which are less than the pivot come before the pivot and all elements greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
• Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

C Program To Implement Quick Sort


    #include <stdio.h>

    void quick(int a[], int first, int last);
    void swap(int a[], int i, int j);

    int main(void)
    {
        int i, n, a[20];

        printf("\n\nQUICK SORT");
        printf("\n\nEnter the limit : ");
        if (scanf("%d", &n) != 1 || n < 1 || n > 20)
            return 1;               /* a[] holds at most 20 elements */
        printf("\n\nEnter the element\n\n");
        for (i = 0; i < n; i++)
            scanf("%d", &a[i]);
        quick(a, 0, n - 1);
        printf("\n\nThe sorted list is \n\n");
        for (i = 0; i < n; i++)
            printf("%d ", a[i]);
        return 0;
    }

    /* i, j and pivot are local, so each recursive call keeps its own
       split point; with globals the second recursive call would use
       a j already overwritten by the first. */
    void quick(int a[], int first, int last)
    {
        int i, j, pivot;

        if (first < last) {
            pivot = a[first];
            i = first;
            j = last;
            while (i < j) {
                while (i < last && a[i] <= pivot)
                    i++;
                while (j > first && a[j] >= pivot)
                    j--;
                if (i < j)
                    swap(a, i, j);
            }
            swap(a, first, j);      /* pivot goes to its final position */
            quick(a, first, j - 1);
            quick(a, j + 1, last);
        }
    }

    void swap(int a[], int i, int j)
    {
        int temp;

        temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

SAMPLE INPUT AND OUTPUT:

    Enter the limit : 5
    Enter the elements
    5 4 3 2 1
    The sorted list is
    1 2 3 4 5

Implementation Of Merge Sort


Merge sort: The most common algorithm used in external sorting is the merge sort. This algorithm follows the divide and conquer strategy.
i) Dividing Phase. The problem is divided into smaller problems and solved recursively.
ii) Conquering Phase. The partitioned array is merged together recursively.

Merge sort is an O(n log n) comparison-based sorting algorithm. In most implementations it is stable, meaning that it preserves the input order of equal elements in the sorted output. It is an example of the divide and conquer algorithmic paradigm. It was invented by John von Neumann in 1945.

Analysis of the Merge sort:
1. Worst case performance: O(n log n)
2. Best case performance: O(n log n)
3. Average case performance: O(n log n)

Advantages:
1. It has better cache performance.
2. Merge sort is a stable sort.
3. It is simpler to understand than heapsort.

Limitation:
1. It needs a secondary storage device for a large amount of data.
2. It requires extra memory space.
3. A small list will take fewer steps to sort than a large list.

Algorithm steps:
• If the list is of length 0 or 1, then it is already sorted. Otherwise:
• Divide the unsorted list into two sublists of about half the size.
• Sort each sublist recursively by re-applying merge sort.
• Merge the two sublists back into one sorted list.

C Program To Implement Merge Sort


    #include <stdio.h>

    void merge_split(int a[], int first, int last);
    void merge(int a[], int f1, int l1, int f2, int l2);

    int a[25], b[25];               /* b[] is the scratch array for merging */

    int main(void)
    {
        int i, n;

        printf("\n\nMERGE SORT");
        printf("\n\n*********");
        printf("\n\nEnter the limit : ");
        if (scanf("%d", &n) != 1 || n < 1 || n > 25)
            return 1;               /* a[] holds at most 25 elements */
        printf("\nEnter the elements\n");
        for (i = 0; i < n; i++)
            scanf("%d", &a[i]);
        merge_split(a, 0, n - 1);
        printf("\n\nSorted list : ");
        for (i = 0; i < n; i++)
            printf("\n %d", a[i]);
        return 0;
    }

    /* Divide a[first..last] in half, sort each half, then merge. */
    void merge_split(int a[], int first, int last)
    {
        int mid;

        if (first < last) {
            mid = (first + last) / 2;
            merge_split(a, first, mid);
            merge_split(a, mid + 1, last);
            merge(a, first, mid, mid + 1, last);
        }
    }

    /* Merge the sorted runs a[f1..l1] and a[f2..l2] back into a[f1..l2]. */
    void merge(int a[], int f1, int l1, int f2, int l2)
    {
        int i, j, k = 0;

        i = f1;
        j = f2;
        while (i <= l1 && j <= l2) {
            /* <= takes equal elements from the left run first,
               which keeps the sort stable. */
            if (a[i] <= a[j])
                b[k] = a[i++];
            else
                b[k] = a[j++];
            k++;
        }
        while (i <= l1)
            b[k++] = a[i++];
        while (j <= l2)
            b[k++] = a[j++];
        i = f1;
        j = 0;
        while (i <= l2 && j < k)
            a[i++] = b[j++];
    }

SAMPLE INPUT AND OUTPUT:

    Enter the limit : 5
    Enter the elements
    5 4 3 2 1
    Sorted list : 1 2 3 4 5

Implementation Of Heap Sort


Heap sort: Heapsort is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quick sort, it has the advantage of a worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but is not a stable sort. In heap sort the array is interpreted as a binary tree. This method has 2 phases. In phase 1, a binary heap is constructed. In phase 2, the delete min routine is performed.

Analysis of the heap sort:
Worst case performance: O(n log n)
Best case performance: O(n log n)
Average case performance: O(n log n)

Advantages of heap sort:
1. It is efficient for sorting a large number of elements.
2. It has the advantage of a worst case O(N log N) running time.

Limitations:
1. It is not a stable sort.
2. It requires more processing time; in practice it is usually slower than a good quick sort.

Algorithm steps:
Step 1: Read the size of the list.
Step 2: Read the elements of the list.
Step 3: Binary heap construction.
   1) Structure Property: For any element in array position i, the left child is in position 2i and the right child is in position 2i+1, i.e. the cell after the left child.
   2) Heap Order Property: The value in the parent node is smaller than or equal to the key value of any of its child nodes. To build the heap, apply the heap order property starting from the rightmost non-leaf node at the bottom level.
Step 4: The delete min routine is performed. The array elements are sorted using the deletemin operation, which gives the elements arranged in descending order.
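
The four steps above can be sketched as follows, assuming a 1-based array (position 0 unused) so that the children of position i sit at 2i and 2i+1, and a min-heap as in the heap order property. The names sift_down and heap_sort_desc are assumptions for illustration only; this is a sketch under those assumptions, not the program given in these notes.

```c
/* Restore the min-heap order below position i in h[1..n] (1-based). */
static void sift_down(int h[], int i, int n)
{
    int element = h[i];
    int child = 2 * i;

    while (child <= n) {
        if (child < n && h[child + 1] < h[child])
            child++;                      /* pick the smaller child */
        if (element <= h[child])
            break;                        /* heap order already holds */
        h[child / 2] = h[child];          /* move the child up */
        child *= 2;
    }
    h[child / 2] = element;
}

/* Sort h[1..n] into descending order; h[0] is unused. */
void heap_sort_desc(int h[], int n)
{
    /* Phase 1: build the heap from the last non-leaf node upward. */
    for (int i = n / 2; i >= 1; i--)
        sift_down(h, i, n);

    /* Phase 2: repeated delete-min. The minimum is swapped into the
       cell freed at the end of the shrinking heap. */
    for (int i = n; i >= 2; i--) {
        int min = h[1];
        h[1] = h[i];
        h[i] = min;
        sift_down(h, 1, i - 1);
    }
}
```

Because each deleted minimum is parked at the end of the array, the smallest values end up last, which is why a min-heap gives descending order. Keeping the largest value at the root instead gives ascending order.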

C Program To Implement Heap Sort


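    #include <stdio.h>

    /* hsort[1..n] holds the elements; index 0 is unused.
       This program keeps the largest element at the root (a max-heap),
       so the output is in ascending order. */
    int hsort[25], n, i;

    void adjust(int, int);
    void heapify(void);

    int main(void)
    {
        int temp;

        printf("\n\t\t\t\tHEAP SORT");
        printf("\n\t\t\t\t**** ****\n\n\n");
        printf("\nenter no of elements:");
        if (scanf("%d", &n) != 1 || n < 1 || n > 24)
            return 1;               /* hsort[1..24] is the usable range */
        printf("\nenter elements to be sorted\n\n");
        for (i = 1; i <= n; i++)
            scanf("%d", &hsort[i]);
        heapify();
        for (i = n; i >= 2; i--) {
            temp = hsort[1];
            hsort[1] = hsort[i];
            hsort[i] = temp;
            adjust(1, i - 1);
        }
        printf("\nSORTED ELEMENT\n\n");
        for (i = 1; i <= n; i++)
            printf("%d\n", hsort[i]);
        return 0;
    }

    /* Build the heap from the last non-leaf node up to the root. */
    void heapify(void)
    {
        int i;

        for (i = n / 2; i >= 1; i--)
            adjust(i, n);
    }

    /* Sift hsort[i] down within hsort[1..n]. */
    void adjust(int i, int n)
    {
        int j, element;

        j = 2 * i;
        element = hsort[i];
        while (j <= n) {
            if ((j < n) && (hsort[j] < hsort[j + 1]))
                j++;                /* was j = j++, which is undefined */
            if (element >= hsort[j])
                break;
            hsort[j / 2] = hsort[j];
            j = 2 * j;
        }
        hsort[j / 2] = element;
    }

SAMPLE INPUT AND OUTPUT:

    Enter No Of Elements:5
    Enter Elements To Be Sorted
    5 4 3 2 1
    Sorted Element :
    1 2 3 4 5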

Hashing
We have all used a dictionary, and many of us have a word processor equipped with a limited dictionary, that is, a spelling checker. We consider the dictionary as an ADT. Examples of dictionaries are found in many applications, including the spelling checker, the thesaurus, the data dictionary found in database management applications, and the symbol tables generated by loaders, assemblers, and compilers.

Generally we would want to perform the following operations on any symbol table:
• Determine if a particular name is in the table
• Retrieve the attributes of that name
• Modify the attributes of that name
• Insert a new name and its attributes
• Delete a name and its attributes

Static Hashing

Hash Tables

In static hashing, we store the identifiers in a fixed size table called a hash table. We use an arithmetic function, f, to determine the address, or location, of an identifier, x, in the table. Thus, f(x) gives the hash, or home address, of x in the table. The hash table ht is stored in sequential memory locations that are partitioned into b buckets, ht[0], ..., ht[b-1]. Each bucket has s slots. Usually s = 1, which means that each bucket holds exactly one record. We use the hash function f(x) to transform the identifiers into an address in the hash table. Thus, f(x) maps the set of possible identifiers onto the integers 0 through b-1.

The identifier density of a hash table is the ratio n/T, where n is the number of identifiers in the table and T is the total number of possible identifiers. The loading density or loading factor of a hash table is α = n/(sb).

Hashing Function:

Definition: A hash function, f, transforms an identifier, x, into a bucket address in the hash table. We want a hash function that is easy to compute and that minimizes the number of collisions. We know that identifiers, whether they represent variable names in a program, words in a dictionary, or names in a telephone book, cluster around certain letters of the alphabet. To avoid collisions, the hash function should depend on all the characters in an identifier. It also should be unbiased, that is, a randomly chosen identifier should be equally likely to hash into any of the b buckets.

Mid-square:

The middle of square hash function is frequently used in symbol table applications. We compute the function fm by squaring the identifier and then using an appropriate number of bits from the middle of the square to obtain the bucket address. Since the middle bits of the square usually depend upon all the characters in an identifier, there is a high probability that different identifiers will produce different hash addresses, even when some of the characters are the same.
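
As a rough illustration of the mid-square idea, the sketch below assumes the identifier has already been converted to a 16-bit number (so its square fits in 32 bits) and that the table has 2^r buckets. The function name mid_square and both of these assumptions are introduced here for the example and are not taken from the notes.

```c
#include <stdint.h>

/* Mid-square hash: square the key and take r bits from the middle.
   Assumes a 16-bit key, so the square fits in 32 bits, and a table
   with 2^r buckets where 1 <= r <= 16. */
uint32_t mid_square(uint16_t key, unsigned r)
{
    uint32_t square = (uint32_t)key * key;
    unsigned shift = (32 - r) / 2;            /* low-order bits to drop */

    return (square >> shift) & ((1u << r) - 1u);
}
```

With r = 8 the function returns bits 12 to 19 of the square, so the address is always in the range 0 to 255.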

Division:
This hash function uses the modulus (%) operator. We divide the identifier x by some number M and use the remainder as the hash address of x. The hash function is: fD(x) = x % M. This gives bucket addresses that range from 0 to M-1, where M = the table size. The choice of M is critical. For example, if M is a power of 2, the address depends only on the low-order bits of x, so a prime M is usually a safer choice.

Folding:

In this method, we partition the identifier x into several parts. All parts, except for the last one, have the same length. We then add the parts together to obtain the hash address for x. There are two ways of carrying out this addition. In the first, the parts are aligned on their least significant digits and added as they are; this method is known as shift folding. The second method, known as folding at the boundaries, reverses every other partition before adding.
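
Both ways of folding can be sketched in one function. The sketch below assumes the identifier is given as a string of decimal digits and that width is greater than 0. The function name fold and its parameters are assumptions for illustration, and reducing the sum to a table index (for example with % M) is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Fold a string of decimal digits into a number by adding parts of
   `width` digits each (width must be > 0). If `at_boundaries` is
   nonzero, every other part (the 2nd, 4th, ...) is reversed first. */
unsigned long fold(const char *digits, size_t width, int at_boundaries)
{
    size_t len = strlen(digits);
    unsigned long sum = 0;
    int part_no = 0;

    for (size_t start = 0; start < len; start += width, part_no++) {
        size_t end = start + width < len ? start + width : len;
        unsigned long part = 0;

        if (at_boundaries && part_no % 2 == 1) {
            /* Read this part right to left, i.e. reversed. */
            for (size_t i = end; i > start; i--)
                part = part * 10 + (unsigned long)(digits[i - 1] - '0');
        } else {
            for (size_t i = start; i < end; i++)
                part = part * 10 + (unsigned long)(digits[i] - '0');
        }
        sum += part;
    }
    return sum;
}
```

For the digit string 12320324111220 split into 3-digit parts (123, 203, 241, 112, 20), shift folding gives 123 + 203 + 241 + 112 + 20 = 699, while folding at the boundaries gives 123 + 302 + 241 + 211 + 20 = 897.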

Digit Analysis:
The last method we will examine, digit analysis, is used with static files. A static file is one in which all the identifiers are known in advance. Using this method, we first transform the identifiers into numbers using some radix, r. We then examine the digits of each identifier, deleting those digits that have the most skewed distributions. We continue deleting digits until the number of remaining digits is small enough to give an address in the range of the hash table.

Overflow Handling:

There are two methods for detecting collisions and overflows in a static hash table; each method uses a different data structure to represent the hash table.

Two Methods:
• Linear Open Addressing (Linear probing)
• Chaining

Linear Open Addressing:

When we use linear open addressing, the hash table is represented as a one-dimensional array with indices that range from 0 to the desired table size - 1. The component type of the array is a struct that contains at least a key field. Since the keys are usually words, we use a string to denote them.

    #define MAX_CHAR 10
    #define TABLE_SIZE 13

    typedef struct {
        char key[MAX_CHAR];
    } element;

    element hash_table[TABLE_SIZE];

Before inserting any elements into this table, we must initialize the table to represent the situation where all slots are empty. This allows us to detect overflows and collisions when we insert elements into the table. The obvious choice for an empty slot is the empty string since it will never be a valid key in any application.

Initialization of a hash table:

    void init_table(element ht[])
    {
        short i;

        for (i = 0; i < TABLE_SIZE; i++)
            ht[i].key[0] = '\0';    /* the empty string marks an empty slot */
    }

To insert a new element into the hash table we convert the key field into a natural number, and then apply one of the hash functions discussed in Hashing Function. The function transform (below) uses a simplistic approach: it adds up the character values of the key. To find the hash address of the transformed key, hash (below) uses the division method.

    short transform(char *key)
    {
        short number = 0;

        while (*key)
            number += *key++;
        return number;
    }

    short hash(char *key)
    {
        return (transform(key) % TABLE_SIZE);
    }

Four outcomes result from the examination of a hash table bucket:

• The bucket contains x. In this case, x is already in the table. Depending on the application, we may either simply report a duplicate identifier, or we may update information in the other fields of the element.
• The bucket contains the empty string. In this case, the bucket is empty, and we may insert the new element into it.
• The bucket contains a nonempty string other than x. In this case we proceed to examine the next bucket.
• We return to the home bucket ht[f(x)] (j = TABLE_SIZE, where j is the number of buckets examined so far). In this case, the home bucket is being examined for the second time and all remaining buckets have been examined. The table is full and we report an error condition and exit.

Implementation of the insertion strategy:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    void linear_insert(element item, element ht[])
    {
        short i, hash_value;

        hash_value = hash(item.key);
        i = hash_value;
        while (strlen(ht[i].key)) {             /* bucket is occupied */
            if (!strcmp(ht[i].key, item.key)) {
                printf("Duplicate entry !\n");
                exit(1);
            }
            i = (i + 1) % TABLE_SIZE;           /* try the next bucket */
            if (i == hash_value) {              /* back at the home bucket */
                printf("The table is full !\n");
                exit(1);
            }
        }
        ht[i] = item;
    }
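
The same probing sequence is what a lookup has to follow. The sketch below is a self-contained variant under a few stated assumptions: it returns status codes instead of calling exit, it folds the transform step into hash, and the names linear_search and linear_try_insert are hypothetical helpers introduced here rather than functions from the notes.

```c
#include <string.h>

#define MAX_CHAR 10
#define TABLE_SIZE 13

typedef struct {
    char key[MAX_CHAR];
} element;

/* Division-method hash on the sum of the characters of the key. */
static int hash(const char *key)
{
    int number = 0;

    while (*key)
        number += (unsigned char)*key++;
    return number % TABLE_SIZE;
}

/* Return the bucket holding `key`, or -1 if it is absent. */
int linear_search(const element ht[], const char *key)
{
    int home = hash(key);
    int i = home;

    do {
        if (ht[i].key[0] == '\0')
            return -1;                    /* empty bucket: not present */
        if (strcmp(ht[i].key, key) == 0)
            return i;                     /* found */
        i = (i + 1) % TABLE_SIZE;
    } while (i != home);
    return -1;                            /* wrapped around a full table */
}

/* Insert `key`; return its bucket, -1 on a duplicate,
   or -2 if the table is full or the key is too long. */
int linear_try_insert(element ht[], const char *key)
{
    int home, i;

    if (strlen(key) >= MAX_CHAR)
        return -2;
    home = hash(key);
    i = home;
    while (ht[i].key[0] != '\0') {
        if (strcmp(ht[i].key, key) == 0)
            return -1;
        i = (i + 1) % TABLE_SIZE;
        if (i == home)
            return -2;
    }
    strcpy(ht[i].key, key);
    return i;
}
```

With an empty table, "ab" and "ba" have the same character sum (195) and so the same home bucket 0; inserting them in that order places "ab" in bucket 0 and "ba" in bucket 1, and a search for "ba" probes bucket 0 before finding it in bucket 1.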

Chaining:

Linear probing and its variations perform poorly because inserting an identifier requires the comparison of identifiers with different hash values. Chaining avoids these comparisons by keeping a separate list of identifiers for each bucket. Since we would not know the sizes of the lists in advance, we should maintain them as linked chains.

    #define MAX_CHAR 10
    #define TABLE_SIZE 13
    #define IS_FULL(ptr) (!(ptr))

    typedef struct {
        char key[MAX_CHAR];
    } element;

    typedef struct list *list_pointer;
    struct list {
        element item;
        list_pointer link;
    };

    list_pointer hash_table[TABLE_SIZE];

The function chain_insert (below) implements the chaining strategy. The function first computes the hash address for the identifier. It then examines the identifiers in the list for the selected bucket. If the identifier is found, we print an error message and exit.

If the identifier is not in the list, we insert it at the end of the list. If the list was empty, we change the head node to point to the new entry.

Implementation of the function chain_insert:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    void chain_insert(element item, list_pointer ht[])
    {
        short hash_value = hash(item.key);
        list_pointer ptr, trail = NULL, lead = ht[hash_value];

        /* Walk the chain; trail ends up on the last node, if any. */
        for ( ; lead; trail = lead, lead = lead->link) {
            if (!strcmp(lead->item.key, item.key)) {
                printf("The key is in the table \n");
                exit(1);
            }
        }
        ptr = malloc(sizeof *ptr);
        if (IS_FULL(ptr)) {
            printf("The memory is full \n");
            exit(1);
        }
        ptr->item = item;
        ptr->link = NULL;
        if (trail)
            trail->link = ptr;          /* append to the end of the chain */
        else
            ht[hash_value] = ptr;       /* the chain was empty */
    }

The following Java method hashes a name by multiplying the low 8 bits of its first three characters:

    public static long hash(String name, int tableSize) {
        // get bottom 8 bits of first char
        int tmp = name.charAt(0) % 256;
        // now multiply by next two 8-bit chars
        tmp = tmp * (name.charAt(1) % 256);
        tmp = tmp * (name.charAt(2) % 256);
        // Now return an index within table's bounds
        return (tmp % tableSize);
    }
