Backtracking
Backtracking is a general algorithm for finding all (or some) solutions to a computational problem. It incrementally builds candidates for a solution and abandons a partial candidate c ("backtracks") as soon as it determines that c cannot possibly be completed to a valid solution.
N Queens Problem
The n-queens problem consists of placing n non-attacking queens on an n-by-n chess board. A queen can attack another queen vertically, horizontally, or diagonally; e.g., placing a queen on a central square of the board blocks the row and column where it is placed, as well as the two diagonals (rising and falling) at whose intersection the queen was placed.
The algorithm to solve this problem uses backtracking. The basic idea is to place queens column by column, starting at the left. A new queen must not be attacked by the ones already placed to its left. Whenever a consistent position is found, we move on and place a queen in the next column; all rows in the current column are checked. We have found a solution once we have placed a queen in the rightmost column. A solution to the four-queens problem is shown.
MATLAB Implementation
clc;clear;
n = 4;
V = zeros(1,n);                  % V(k) = row of the queen in column k
Nqueens(1,n,V);

function Nqueens(k,n,V)
    for i=1:n
        if possible(i,k,V) == 1
            V(k)=i;
            if k==n
                V                % display a complete solution
            else
                Nqueens(k+1,n,V);
            end
        end
    end
end

function [x]=possible(i,k,V)
    % row i is safe in column k if no earlier queen shares
    % its row or either diagonal
    x = 1;
    for j=1:k-1
        if V(j) == i || abs(V(j)-i) == abs(j-k)
            x = 0;
        end
    end
end
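For cross-checking the MATLAB version, here is an equivalent Python sketch (0-indexed; the function names mirror the MATLAB ones, and collecting solutions in a list replaces the `V` display):

```python
def possible(row, col, V):
    # row is safe in this column if no earlier queen shares it
    # or either diagonal (mirrors the MATLAB `possible`)
    for j in range(col):
        if V[j] == row or abs(V[j] - row) == abs(j - col):
            return False
    return True

def nqueens(col, n, V, solutions):
    for row in range(n):
        if possible(row, col, V):
            V[col] = row
            if col == n - 1:
                solutions.append(V.copy())
            else:
                nqueens(col + 1, n, V, solutions)

solutions = []
nqueens(0, 4, [0] * 4, solutions)
print(solutions)   # [[1, 3, 0, 2], [2, 0, 3, 1]]
```

For n = 4 this finds the two known solutions of the four-queens problem.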
Backtracking Algorithms
Sum of Subsets Problem
In the sum-of-subsets problem we are given n positive weights w[1..n] and a target sum S, and we must find the subsets of the weights that sum exactly to S. We create a state space tree in which level i decides whether w[i] is included. A node is expanded only if it is promising (this assumes the weights are sorted in nondecreasing order, so w[i+1] is the lightest weight still available):
promising(i)
    return (weightSoFar + totalPossibleLeft >= S) && (weightSoFar == S || weightSoFar + w[i + 1] <= S)
MATLAB Implementation
clc;clear;
w=[3 4 5 6 9 10];
W = 9;
total = 0;
X=zeros(1,length(w));
for i=1:length(w)
    total = total+w(i);
end
w=sort(w);                       % promising() assumes w is sorted
sum_of_subsets(0,0,total,w,W,X);

function sum_of_subsets(i,weight,total,w,W,X)
    if promising(i,weight,total,w,W)==1
        if weight == W
            X                    % display a subset that sums to W
        else
            X(i+1) = w(i+1);     % include w(i+1)
            sum_of_subsets(i+1, weight+w(i+1), total-w(i+1),w,W,X);
            X(i+1) = 0;          % exclude w(i+1)
            sum_of_subsets(i+1,weight,total-w(i+1),w,W,X);
        end
    end
end

function [x]=promising(i,weight,total,w,W)
    % the remaining weights must be able to reach W, and either we hit W
    % exactly or the next (smallest unused) weight does not overshoot it
    x = (weight+total >= W) && (weight==W || ...
        (i<length(w) && weight+w(i+1)<=W));
end
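The same backtracking scheme can be sketched in Python (0-indexed; `out` collects the subsets instead of displaying `X`):

```python
def sum_of_subsets(i, weight, total, w, W, X, out):
    # promising: the remaining weights can still reach W, and either we hit
    # W exactly or the next (smallest unused) weight does not overshoot it
    promising = (weight + total >= W and
                 (weight == W or (i < len(w) and weight + w[i] <= W)))
    if promising:
        if weight == W:
            out.append([x for x in X if x])     # record the subset found
        else:
            X[i] = w[i]                         # include w[i]
            sum_of_subsets(i + 1, weight + w[i], total - w[i], w, W, X, out)
            X[i] = 0                            # exclude w[i]
            sum_of_subsets(i + 1, weight, total - w[i], w, W, X, out)

w = sorted([3, 4, 5, 6, 9, 10])
out = []
sum_of_subsets(0, 0, sum(w), w, 9, [0] * len(w), out)
print(out)   # [[3, 6], [4, 5], [9]]
```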
Graph Coloring
Let G be a graph with no loops. A k-coloring of G is an assignment of k colors to the vertices of G in such a way that adjacent vertices are assigned different colors. If G has a k-coloring, then G is said to be k-colorable. The chromatic number of G, denoted by χ(G), is the smallest number k for which G is k-colorable. For example,
MATLAB implementation
clc;clear;
A=[0 1 0 1;1 0 1 0;0 1 0 1;1 0 1 0];   % adjacency matrix
v=zeros(1,length(A));                   % v(i) = color of vertex i
colors=3;
mcoloring(0,A,v,colors);

function [x]=graphcolorpromising(i,A,v)
    % vertex i's color must differ from every adjacent, earlier vertex
    x=1;
    for j=1:i-1
        if A(i,j)==1 && v(i)==v(j)
            x=0;
        end
    end
end

function mcoloring(i,A,v,colors)
    if (graphcolorpromising(i,A,v)==1)
        if (i == length(A))
            v                           % display a complete coloring
        else
            for color = 1:colors
                v(i+1) = color;
                mcoloring(i+1,A,v,colors);
            end
        end
    end
end
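A Python port of the same procedure, collecting all proper colorings of the 4-cycle used in the MATLAB example (the adjacency matrix is taken from the code above; the counting assertion is my own check):

```python
def graphcolor_promising(i, A, v):
    # vertex i's color must differ from every adjacent, earlier vertex
    return all(not (A[i][j] == 1 and v[i] == v[j]) for j in range(i))

def mcoloring(i, A, v, colors, out):
    # i vertices are colored so far; vertex i-1 was the last one colored
    if i == 0 or graphcolor_promising(i - 1, A, v):
        if i == len(A):
            out.append(v.copy())
        else:
            for color in range(1, colors + 1):
                v[i] = color
                mcoloring(i + 1, A, v, colors, out)

A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]          # a 4-cycle
out = []
mcoloring(0, A, [0] * len(A), 3, out)
print(len(out))   # 18 proper 3-colorings of the 4-cycle
```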
3-coloring
4-coloring
5-coloring
Not a permissible coloring, since one of the edges has the color blue at both ends.
Lecture Notes on Dynamic Programming
Dynamic Programming
In mathematics and computer science, dynamic programming is a method of solving complex problems by breaking them down into simpler steps. It is applicable to problems that exhibit the properties of overlapping sub-problems and optimal substructure.
Top-down dynamic programming simply means storing the results of certain calculations, which are
then re-used later because the same calculation is a sub-problem in a larger calculation. Bottom-up
dynamic programming involves formulating a complex calculation as a recursive series of simpler
calculations.
Optimal Binary Search Trees
Given a sequence a1 < a2 < ··· < an of n sorted keys, with a search probability pi for each key ai, we want to build a binary search tree (BST) with minimum expected search cost.
The cost of a BST is
∑_{i=1}^{n} depth(ai) · pi
Observations:
• Optimal BST may not have smallest height or may not be height balanced.
• Optimal BST may not have highest-probability key at root.
Let C(i, j) denote the cost of an optimal binary search tree containing ai,…,aj .
The cost of the optimal binary search tree with ak as its root:
C(i, j) = min over i <= k <= j of { C(i, k-1) + C(k+1, j) } + ∑_{l=i}^{j} pl
MATLAB Implementation
clear;clc;
%A=[1 3;2 4;3 2;4 1];
A=[1 76;2 15;3 36;4 43;5 64];    % [key frequency]
n=length(A);
% Freq(i,j) = total frequency of keys i..j;
% Cost(i,j) is initialized for single-key ranges
for i=1:n
    s = 0;
    for j=1:n
        if (i<=j)
            s = s + A(j,2);
            Freq(i,j) = s;
            Cost(i,j) = s;
            Root(i,j)=A(j,1);
        end
    end
end
for d=1:n                        % consider ranges of increasing length
    for i=1:n-d
        j=i+d;
        mincost = Inf;
        for k=i:j                % try each key k as the root of keys i..j
            if k-1>=i
                left = Cost(i,k-1);
            else
                left = 0;
            end
            if k+1<=j
                right = Cost(k+1,j);
            else
                right = 0;
            end
            if left + right < mincost
                mincost = left + right;
                root=k;
            end
        end
        Cost(i,j)= Freq(i,j)+mincost;
        Root(i,j)= root;
    end
end
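A compact Python version of the same bottom-up DP, returning only the optimal cost (the example frequencies [34, 8, 50] and the expected answer 142 are my own small test case, not from the notes):

```python
def optimal_bst_cost(freq):
    # Cost[i][j] = cost of an optimal BST over keys i..j,
    # Freq[i][j] = total frequency of that key range
    n = len(freq)
    Freq = [[0] * n for _ in range(n)]
    Cost = [[0] * n for _ in range(n)]
    for i in range(n):
        s = 0
        for j in range(i, n):
            s += freq[j]
            Freq[i][j] = s
            Cost[i][j] = s              # single-key value; refined below
    for d in range(1, n):               # ranges of increasing length
        for i in range(n - d):
            j = i + d
            best = float('inf')
            for k in range(i, j + 1):   # try each key as the root
                left = Cost[i][k - 1] if k > i else 0
                right = Cost[k + 1][j] if k < j else 0
                best = min(best, left + right)
            Cost[i][j] = Freq[i][j] + best
    return Cost[0][n - 1]

print(optimal_bst_cost([34, 8, 50]))   # 142
```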
Matrix Chain Multiplication
Example:
A1 is 10 by 100 matrix ; A2 is 100 by 5 matrix; A3 is 5 by 50 matrix; A4 is 50 by 1 matrix; A1A2A3A4 is a 10
by 1 matrix
(A1(A2(A3A4)))
– A34 = A3A4 , 250 mults, result is 5 by 1
– A24 = A2A34 , 500 mults, result is 100 by 1
– A14 = A1A24 , 1000 mults, result is 10 by 1
– Total is 1750
((A1A2)(A3A4))
– A12 = A1A2 , 5000 mults, result is 10 by 5
– A34 = A3A4 , 250 mults, result is 5 by 1
– A14 = A12A34 , 50 mults, result is 10 by 1
– Total is 5300
(((A1A2)A3)A4)
– A12 = A1A2 , 5000 mults, result is 10 by 5
– A13 = A12A3 , 2500 mults, result is 10 by 50
– A14 = A13A4 , 500 mults, result is 10 by 1
– Total is 8000
((A1(A2A3))A4)
– A23 = A2A3 , 25000 mults, result is 100 by 50
– A13 = A1A23 , 50000 mults, result is 10 by 50
– A14 = A13A4 , 500 mults, result is 10 by 1
– Total is 75500
(A1((A2A3)A4))
– A23 = A2A3 , 25000 mults, result is 100 by 50
– A24 = A23A4 , 5000 mults, result is 100 by 1
– A14 = A1A24 , 1000 mults, result is 10 by 1
– Total is 31000
To calculate the product of a matrix-chain A1A2...An, n-1 matrix multiplications are needed,
though different orders have different costs.
For matrix-chain A1A2A3, if the three have sizes 10-by-2, 2-by-20, and 20-by-5, respectively, then
the cost of (A1A2)A3 is 10*2*20 + 10*20*5 = 1400, while the cost of A1(A2A3) is 2*20*5 +
10*2*5 = 300.
For matrix-chain Ai...Aj where each Ak has dimensions P(k-1)-by-Pk, the minimum cost of the product m[i,j] corresponds to the best way to cut it into Ai...Ak and A(k+1)...Aj:
m[i, j] = 0, if i = j
m[i, j] = min over i <= k < j of { m[i,k] + m[k+1,j] + P(i-1)·Pk·Pj }, if i < j
Use dynamic programming to solve this problem: calculating m for sub-chains with increasing
length, and using another matrix s to keep the cutting point k for each m[i,j].
MATLAB implementation
clear;clc;
%A=[5 4;4 6;6 2;2 7];
A=[15 55;55 9;9 20;20 13;13 16];   % row k = dimensions of matrix Ak
n=length(A);
for i=1:n
    M(i,i) = 0;                    % a single matrix needs no multiplication
end
for L=2:n                          % L = chain length
    for i=1:n-L+1
        j=i+L-1;
        M(i,j)=Inf;
        for k=i:j-1                % try each split point k
            q=M(i,k)+M(k+1,j)+A(i,1)*A(k,2)*A(j,2);
            if q < M(i,j)
                M(i,j) = q;
                S(i,j)=k;          % remember the best cut
            end
        end
    end
end
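The recurrence can be checked against the worked example above with a short Python sketch that takes the dimension vector directly (dims [10, 100, 5, 50, 1] are the four matrices A1..A4 from the example; 1750 is the best total found there):

```python
def matrix_chain_order(p):
    # p is the dimension vector: matrix Ai has dimensions p[i-1] x p[i];
    # returns the minimum number of scalar multiplications
    n = len(p) - 1
    M = [[0] * (n + 1) for _ in range(n + 1)]
    for L in range(2, n + 1):           # chain length
        for i in range(1, n - L + 2):
            j = i + L - 1
            M[i][j] = float('inf')
            for k in range(i, j):       # split into Ai..Ak and A(k+1)..Aj
                q = M[i][k] + M[k + 1][j] + p[i - 1] * p[k] * p[j]
                M[i][j] = min(M[i][j], q)
    return M[1][n]

print(matrix_chain_order([10, 100, 5, 50, 1]))   # 1750
```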
Example
Floyd-Warshall Algorithm (All-Pairs Shortest Paths)
FLOYD-WARSHALL(G, A)
n = length(G)
for i := 1 to n do
    for j := 1 to n do
        if i == j then
            A[i,j] := Inf      // diagonal starts at Inf, so A[i,i] ends up
        else                   // as the length of the shortest cycle through i
            A[i,j] := G(i,j)
        end if
    end for
end for
for k := 1 to n do
    for i := 1 to n do
        for j := 1 to n do
            if A[i,k] + A[k,j] < A[i,j] then
                A[i,j] := A[i,k] + A[k,j]
            end if
        end for
    end for
end for
END
Example –
G=
Inf 1 1 1
1 Inf 1 Inf
1 1 Inf 1
1 Inf 1 Inf
A=
2 1 1 1
1 2 1 2
1 1 2 1
1 2 1 2
Example 2
A=
Inf 10 15 5 20
10 Inf 5 5 10
15 5 Inf 10 15
5 5 10 Inf 15
20 10 15 15 Inf
D=
10 10 15 5 20
10 10 5 5 10
15 5 10 10 15
5 5 10 10 15
20 10 15 15 20
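The pseudocode translates directly into Python; running it on the first example graph reproduces the matrix A shown above (note the Inf diagonal convention):

```python
INF = float('inf')

def floyd_warshall(G):
    # A[i][i] starts at INF (as in the pseudocode), so diagonal entries
    # end up holding the length of the shortest cycle through each vertex
    n = len(G)
    A = [[INF if i == j else G[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:
                    A[i][j] = A[i][k] + A[k][j]
    return A

G = [[INF, 1, 1, 1],
     [1, INF, 1, INF],
     [1, 1, INF, 1],
     [1, INF, 1, INF]]
A = floyd_warshall(G)
print(A)
```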
Greedy Algorithms & Divide and Conquer
Greedy Algorithms
A greedy algorithm repeatedly executes a procedure which tries to maximize its return based on examining local conditions, in the hope that the locally best choices will lead to a desired outcome for the global problem. In some cases such a strategy is guaranteed to find optimal solutions; in other cases it may provide a compromise that produces acceptable approximations.
Typically, greedy algorithms employ strategies that are simple to implement and require a minimal amount of resources.
One example would be a cable TV company laying cable to a new neighbourhood. If it is constrained to
bury the cable only along certain paths, then there would be a graph representing which points are
connected by those paths. Some of those paths might be more expensive, because they are longer, or
require the cable to be buried deeper; these paths would be represented by edges with larger weights.
A spanning tree for that graph would be a subset of those paths that has no cycles but still connects to
every house. There might be several spanning trees possible. A minimum spanning tree would be one
with the lowest total cost.
Example –
Prim’s Algorithm
In computer science, Prim's algorithm is an algorithm that finds a minimum spanning tree for a
connected weighted undirected graph. This means it finds a subset of the edges that forms a tree that
includes every vertex, where the total weight of all the edges in the tree is minimized. Prim's algorithm
is an example of a greedy algorithm.
Prim's algorithm has the property that the edges in the set A always form a single tree. We begin with some vertex v in a given graph G = (V, E), defining the initial set of vertices A. Then, in each iteration, we choose a minimum-weight edge (u, v) connecting a vertex v in the set A to a vertex u outside of set A. Vertex u is then brought into A. This process is repeated until a spanning tree is formed.
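The iteration described above can be sketched in Python on an adjacency matrix; the example graph here is my own (the notes' figures are not reproduced), and the function returns only the total MST weight:

```python
INF = float('inf')

def prim(G):
    # grow the tree from vertex 0; repeatedly pull in the cheapest edge
    # crossing from the tree to an outside vertex (assumes G is connected)
    n = len(G)
    in_tree = [False] * n
    in_tree[0] = True
    dist = list(G[0])               # cheapest known edge into each vertex
    total = 0
    for _ in range(n - 1):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: dist[v])
        total += dist[u]
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and G[u][v] < dist[v]:
                dist[v] = G[u][v]
    return total

# a small example graph of my own (adjacency matrix, INF = no edge)
G = [[INF, 2, INF, 6, INF],
     [2, INF, 3, 8, 5],
     [INF, 3, INF, INF, 7],
     [6, 8, INF, INF, 9],
     [INF, 5, 7, 9, INF]]
print(prim(G))   # 16
```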
Kruskal's algorithm
Kruskal's algorithm is an algorithm in graph theory that finds a minimum spanning tree for a connected
weighted graph. This means it finds a subset of the edges that forms a tree that includes every vertex,
where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it
finds a minimum spanning forest (a minimum spanning tree for each connected component). Kruskal's
algorithm is an example of a greedy algorithm.
It proceeds by listing the weights in increasing order, and then choosing those edges having the smallest
weights, with the one restriction that we never want to complete a circuit. In other words, as we go
along the sorted list of weights, we will always select the corresponding edge for our spanning tree
unless that choice completes a circuit.
Algorithm Kruskal( G)
visited=1:length(G);                % parent pointer of each vertex
k=1;
% collect each undirected edge once (upper triangle of G)
for i=1:length(G)
    for j=i+1:length(G)
        if(G(i,j)<inf)
            B(k,1)=G(i,j);          % weight
            B(k,2)=i; B(k,3)=j;     % endpoints
            k=k+1;
        end
    end
end
B = sortrows(B);                    % sort edges by weight (first column)
for i=1:length(B)
    u=parent(B(i,2),visited);
    v=parent(B(i,3),visited);
    if(u~=v)                        % endpoints in different trees: accept edge
        A(B(i,2),B(i,3))= B(i,1);
        if u<v
            visited(v)=u;           % merge the two trees
        else
            visited(u)=v;
        end
    end
end

function [y]=parent(y,visited)
    while (visited(y)~=y)
        y=visited(y);
    end
end
Dijkstra's Algorithm
Algorithm Dijkstras(G)
n=length(G);
visited=zeros(1,n); path=zeros(1,n);
start=1; visited(start)=1;
for i=1:n
    d(i)=G(start,i);                % tentative distance from start
    path(i)=start;                  % predecessor on the best path so far
end
for i=1:n
    best=inf; u=0;
    for j=1:n                       % pick the closest unvisited vertex
        if visited(j)==0 && d(j)<best
            best=d(j);
            u=j;
        end
    end
    if u==0
        break;                      % remaining vertices are unreachable
    end
    visited(u)=1;
    for v=1:n                       % relax the edges out of u
        if visited(v)==0 && d(u)+G(u,v)<d(v)
            d(v)=d(u)+G(u,v);
            path(v)=u;
        end
    end
end
Hint : Try this on MATLAB
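A Python sketch of the same procedure, with an assumed example graph of my own (directed, 0 on the diagonal, INF for missing edges):

```python
INF = float('inf')

def dijkstra(G, start):
    # returns (d, path): d[v] = shortest distance from start,
    # path[v] = predecessor of v on that shortest path
    n = len(G)
    visited = [False] * n
    d = list(G[start])
    path = [start] * n
    d[start] = 0
    visited[start] = True
    for _ in range(n - 1):
        u, best = -1, INF
        for j in range(n):              # closest unvisited vertex
            if not visited[j] and d[j] < best:
                u, best = j, d[j]
        if u == -1:
            break                       # the rest are unreachable
        visited[u] = True
        for v in range(n):              # relax the edges leaving u
            if not visited[v] and d[u] + G[u][v] < d[v]:
                d[v] = d[u] + G[u][v]
                path[v] = u
    return d, path

# an assumed example graph of my own
G = [[0, 10, INF, 5, INF],
     [INF, 0, 1, 2, INF],
     [INF, INF, 0, INF, 4],
     [INF, 3, 9, 0, 2],
     [7, INF, 6, INF, 0]]
d, path = dijkstra(G, 0)
print(d)   # [0, 8, 9, 5, 7]
```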
The running time of this implementation is O(V^2).
Knapsack Problem
Given a set of items, each with a weight and a value, determine the number of each item to include in a
collection so that the total weight is less than a given limit and the total value is as large as possible. It
derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and
must fill it with the most useful items.
A thief robbing a store finds ‘n’ items; the i-th item is worth vi dollars and weighs wi pounds, where vi
and wi are integers. He wants to take as valuable a load as possible, but he can carry at most W pounds
in his knapsack for some integer W. Which items should he take? This is called the 0-1 knapsack problem
because each item must either be taken or left behind; the thief cannot take a fractional amount of an
item or take an item more than once.
In the fractional knapsack problem, the setup is the same, but the thief can take fractions of items,
rather than having to make a binary (0-1) choice for each item. You can think of an item in the 0-1
knapsack problem as being like a gold ingot, while an item in the fractional knapsack problem is more
like gold dust.
Fractional Knapsack
This is called the fractional knapsack problem because any fraction of each item can be used. Using a
greedy strategy to solve this problem, we pack the items in order of their benefit/weight value.
0/1 Knapsack
This is called the 0-1 knapsack problem because each item must be taken in its entirety. If we used a greedy strategy to solve this problem, we would take the objects in order of their benefit/weight value; unlike the fractional case, however, this greedy strategy does not always produce an optimal solution.
Algorithm FractionalKnapsack(P,W)
clear;clc;
P=[25 24 15]; W=[18 15 10]; m=20;   % profits, weights, capacity
n=length(P);
X=zeros(1,n);                       % X(j) = fraction of item j taken
for j=1:n
    B(j,1)=P(j)/W(j);               % profit-to-weight ratio
    B(j,2)=W(j);
    B(j,3)=j;                       % remember the original item index
end
B=sortrows(B,-1);                   % best ratio first
for i=1:n
    if B(i,2)<=m
        X(B(i,3))=1;                % the whole item fits
        m=m-B(i,2);
    else
        X(B(i,3))=m/B(i,2);         % take only the fraction that fits
        break;
    end
end
X
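The greedy rule can be verified in Python on the same data (P = [25, 24, 15], W = [18, 15, 10], capacity 20); the optimal value is 31.5:

```python
def fractional_knapsack(P, W, m):
    # take items in decreasing profit/weight order; the first item that
    # no longer fits is taken fractionally and the knapsack is full
    order = sorted(range(len(P)), key=lambda j: P[j] / W[j], reverse=True)
    X = [0.0] * len(P)          # X[j] = fraction of item j taken
    profit = 0.0
    for j in order:
        if W[j] <= m:
            X[j] = 1.0
            m -= W[j]
            profit += P[j]
        else:
            X[j] = m / W[j]
            profit += P[j] * X[j]
            break
    return X, profit

X, profit = fractional_knapsack([25, 24, 15], [18, 15, 10], 20)
print(X, profit)   # [0.0, 1.0, 0.5] 31.5
```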
Job Scheduling with Deadlines
Formally, the input to the problem is a sequence of pairs (d1, g1), (d2, g2), . . . , (dn, gn) where gi is a non-negative real number representing the profit obtainable from job i, and di is the deadline for job i.
A schedule is an array S(1), S(2), . . . , S(d) where d = max di (i.e., the latest deadline, beyond which no jobs can be scheduled), such that if S(t) = i, then job i is scheduled at time t, 1 <= t <= d. If S(t) = 0, then no job is scheduled at time t.
Algorithm JobSchedule(B)
//B=[99 2;67 3;45 1;34 4;23 5;10 3;];
//B(profit,deadline)
B=sortrows(B,-1);                   % most profitable job first
d=max(B(:,2));                      % latest deadline
S=zeros(1,d);                       % S(t) = profit of the job run at time t
for i=1:length(B)
    % place job i in the latest free slot at or before its deadline
    for t=B(i,2):-1:1
        if S(t)==0
            S(t)=B(i,1);
            break;
        end
    end
end
We have n jobs to execute, each of which takes unit time to process. At any time instant we can do only one job. Doing job i earns a profit pi, and the deadline for job i is di. Suppose n = 4; p = [50, 10, 15, 30]; d = [2, 1, 2, 1]. It should be clear that we can process no more than two jobs by their respective deadlines; the most profitable feasible schedule runs job 4 at time 1 and job 1 at time 2, for a profit of 30 + 50 = 80.
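The greedy schedule for this example can be reproduced with a small Python sketch (jobs as (profit, deadline) pairs; most profitable first, each placed in the latest free slot at or before its deadline):

```python
def job_schedule(jobs):
    # jobs = list of (profit, deadline) pairs
    jobs = sorted(jobs, reverse=True)       # most profitable job first
    d = max(dl for _, dl in jobs)           # latest deadline
    S = [0] * (d + 1)                       # S[t] = profit of job run at time t
    for p, dl in jobs:
        for t in range(dl, 0, -1):          # latest free slot first
            if S[t] == 0:
                S[t] = p
                break
    return S[1:]

S = job_schedule([(50, 2), (10, 1), (15, 2), (30, 1)])
print(S, sum(S))   # [30, 50] 80
```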
Optimal Storage on Tapes
Let tj be the time required to retrieve program ij, where the programs are stored in the order I = i1, i2, i3, …, in and program ik has length l(ik). The average of the retrieval times is called the mean retrieval time (MRT), where
tj = ∑_{k=1}^{j} l(ik)
Now the problem is to store the programs on the tape so that the MRT is minimized. From the above discussion one can observe that the MRT is minimized if the programs are stored in increasing order of length, i.e., l1 <= l2 <= l3 <= … <= ln. Hence this ordering minimizes the retrieval time.
Assume that 3 sorted files are given. Let the length of files A, B and C be 7, 3 and 5 units respectively. All
these three files are to be stored on to a tape S in some sequence that reduces the average retrieval
time. The table shows the retrieval time for all possible orders.
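The enumeration of all orders can be reproduced in Python; the best order stores the files in increasing length (B, C, A), with a total retrieval time of 3 + 8 + 15 = 26:

```python
from itertools import permutations

def total_retrieval_time(lengths):
    # t_j = l(i1) + ... + l(ij); the MRT is this total divided by n
    total, prefix = 0, 0
    for l in lengths:
        prefix += l
        total += prefix
    return total

files = {'A': 7, 'B': 3, 'C': 5}
best = min(permutations(files),
           key=lambda order: total_retrieval_time([files[f] for f in order]))
print(best, total_retrieval_time([files[f] for f in best]))   # ('B', 'C', 'A') 26
```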
Huffman Coding
Huffman code is a technique for compressing data. Huffman's greedy algorithm looks at the occurrence of each character and represents it as a binary string in an optimal way.
Suppose we have data consisting of 100,000 characters that we want to compress. The characters in the data occur with the following frequencies.

Character        a       b       c       d       e       f
Frequency   45,000  13,000  12,000  16,000   9,000   5,000

Consider the problem of designing a "binary character code" in which each character is represented by a unique binary string. In a fixed-length code each codeword has the same length; in a variable-length code codewords may have different lengths. Here are examples of fixed- and variable-length codes for our problem (the variable-length code shown is an optimal Huffman code for these frequencies).

Character                  a       b       c       d       e       f
Frequency             45,000  13,000  12,000  16,000   9,000   5,000
Fixed-length code        000     001     010     011     100     101
Variable-length code       0     101     100     111    1101    1100
The fixed-length code requires 300,000 bits, while the variable-length code requires only 224,000 bits.
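The 224,000-bit total can be checked without building the full tree: each merge of the two least frequent subtrees adds its combined frequency to the total encoded length, which equals the sum of frequency times codeword length:

```python
import heapq

def huffman_cost(freq):
    # repeatedly merge the two least frequent subtrees; the running total
    # of merge weights equals sum(frequency * codeword length) in bits
    heap = list(freq.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

freq = {'a': 45000, 'b': 13000, 'c': 12000, 'd': 16000, 'e': 9000, 'f': 5000}
print(huffman_cost(freq))   # 224000
```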
Divide and Conquer
This technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers (e.g., Karatsuba), syntactic analysis (e.g., top-down parsers), and computing the discrete Fourier transform (FFT).
T(n) = 2 T(n/2) + 2
Large Integer Multiplication
Assuming that each addition and multiplication between single digits takes O(1) time, the straightforward multiplication of two n-digit numbers takes O(n^2).
Imagine multiplying an n-digit number by another n-digit number, where n is a perfect power of 2. (This will make the analysis easier.) We can split up each of these numbers into two halves.
Say ‘A’ and ‘B’ are the numbers. We can split A into (AL x 10^(n/2) + AR) and B into (BL x 10^(n/2) + BR).
Written in this manner, we have broken down the problem of multiplying two n-digit numbers into 4 multiplications of n/2-digit numbers plus 3 additions. Thus, the running time satisfies T(n) = 4T(n/2) + O(n), which still solves to O(n^2).
Now, the question becomes: can we optimize this solution in any way? In particular, is there any way to reduce the number of multiplications done? Karatsuba's idea is to compute
P1 = (AL + AR)(BL + BR), P2 = AL x BL, P3 = AR x BR
so that A x B = P2 x 10^n + (P1 - P2 - P3) x 10^(n/2) + P3.
Now, consider the work necessary in computing P1, P2 and P3. Both P2 and P3 are n/2-digit multiplications. But P1 is a bit more complicated to compute: we do two n/2-digit additions (this takes O(n) time), and then one n/2-digit multiplication. (Potentially, n/2 + 1 digits…)
After that, we do two subtractions and another two additions, each of which still takes O(n) time. Thus, our running time T(n) obeys the recurrence relation T(n) = 3T(n/2) + O(n), which solves to O(n^(log2 3)) ≈ O(n^1.585).
Although this seems as if it would be slower initially because of the extra pre-computation before doing the multiplications, for very large integers this approach saves time.
Algorithm IntegerMultiplication(A, B, n)
If n = 1
    Return A x B
Else
    P = A div 10^(n/2); Q = A mod 10^(n/2)
    R = B div 10^(n/2); S = B mod 10^(n/2)
    P1 = IntegerMultiplication(P+Q, R+S, n/2)
    P2 = IntegerMultiplication(P, R, n/2)
    P3 = IntegerMultiplication(Q, S, n/2)
    Return P2 x 10^n + (P1 - P2 - P3) x 10^(n/2) + P3
End
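The pseudocode runs essentially unchanged in Python (`divmod` splits a number into its high and low halves exactly, so correctness does not depend on P+Q fitting in n/2 digits):

```python
def karatsuba(A, B, n):
    # multiply n-digit numbers (n a power of 2) with three recursive
    # multiplications instead of four
    if n == 1:
        return A * B
    half = 10 ** (n // 2)
    P, Q = divmod(A, half)              # high and low halves of A
    R, S = divmod(B, half)              # high and low halves of B
    P1 = karatsuba(P + Q, R + S, n // 2)
    P2 = karatsuba(P, R, n // 2)
    P3 = karatsuba(Q, S, n // 2)
    return P2 * half * half + (P1 - P2 - P3) * half + P3

print(karatsuba(1234, 5678, 4))   # 7006652
```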
Strassen's Matrix Multiplication
Strassen showed that 2x2 matrix multiplication can be accomplished in 7 multiplications and 18 additions or subtractions.
P1 = (A11 + A22)(B11 + B22)
P2 = (A21 + A22) * B11
P3 = A11 * (B12 - B22)
P4 = A22 * (B21 - B11)
P5 = (A11 + A12) * B22
P6 = (A21 - A11) * (B11 + B12)
P7 = (A12 - A22) * (B21 + B22)
C11 = P1 + P4 - P5 + P7
C12 = P3 + P5
C21 = P2 + P4
C22 = P1 + P3 - P2 + P6
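The seven products can be checked on scalar entries with a direct transcription (for larger matrices the entries would themselves be sub-blocks, recursively multiplied the same way):

```python
def strassen_2x2(A, B):
    # one level of Strassen on 2x2 matrices: 7 multiplications
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    P1 = (a11 + a22) * (b11 + b22)
    P2 = (a21 + a22) * b11
    P3 = a11 * (b12 - b22)
    P4 = a22 * (b21 - b11)
    P5 = (a11 + a12) * b22
    P6 = (a21 - a11) * (b11 + b12)
    P7 = (a12 - a22) * (b21 + b22)
    return [[P1 + P4 - P5 + P7, P3 + P5],
            [P2 + P4, P1 + P3 - P2 + P6]]

r = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(r)   # [[19, 22], [43, 50]]
```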
Closest Pair of Points
The closest pair resides in the left region, the right region, or across the borderline. The last case needs to deal only with points at distance less than d = min(dleft, dright) from the dividing line, where dleft and dright are the minimal distances for the left and right regions, respectively.
The points in the region around the boundary line are sorted along the y coordinate and processed in that order. The processing consists of comparing each of these points with the points that are ahead of it by at most d in their y coordinate. Since a window of size d × 2d can contain at most 6 points, at most five distances need to be evaluated for each of these points.
The sorting of the points along the x and y coordinates can be done once, before applying the recursive divide-and-conquer algorithm; this requires O(n log n) time.
The processing of the points along the boundary line takes O(n) time. Hence, the recurrence equation for the time complexity of the algorithm is T(n) = 2T(n/2) + O(n), which solves to O(n log n).
MATLAB Implementation
clc;clear;
A=[ 2.1 3;1 1;2.2 2.5;4 4;4 3;3 3;3 2;3 1;];
A=sortrows(A);                      % sort the points by x coordinate
d=closestpair(A,1,length(A))

function [d]=closestpair(A,i,j)
    if i>=j
        d=inf;                      % fewer than two points
    elseif j==i+1
        d=((A(i,1)-A(j,1))^2+(A(i,2)-A(j,2))^2)^0.5;
    else
        mid=floor((i+j)/2);
        dleft=closestpair(A,i,mid);
        dright=closestpair(A,mid+1,j);
        d=min(dleft,dright);
        % check pairs that straddle the dividing line and are
        % within d of each other in both coordinates
        for p=i:mid
            for q=mid+1:j
                if (A(q,1)-A(p,1) < d) && (abs(A(p,2)-A(q,2)) < d)
                    d1=((A(p,1)-A(q,1))^2+(A(p,2)-A(q,2))^2)^0.5;
                    if d1<d
                        d = d1;
                        A(p,:)      % display the new closest pair
                        A(q,:)
                    end
                end
            end
        end
    end
end
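A Python sketch of the same divide-and-conquer recursion on the same point set, compared against a brute-force scan over all pairs:

```python
from itertools import combinations
from math import dist, inf

def closest_pair(pts, i, j):
    # pts sorted by x; returns the minimum distance among pts[i..j]
    if i >= j:
        return inf
    if j == i + 1:
        return dist(pts[i], pts[j])
    mid = (i + j) // 2
    d = min(closest_pair(pts, i, mid), closest_pair(pts, mid + 1, j))
    # pairs straddling the dividing line, within d in both coordinates
    for p in range(i, mid + 1):
        for q in range(mid + 1, j + 1):
            if pts[q][0] - pts[p][0] < d and abs(pts[p][1] - pts[q][1]) < d:
                d = min(d, dist(pts[p], pts[q]))
    return d

pts = sorted([(2.1, 3), (1, 1), (2.2, 2.5), (4, 4), (4, 3), (3, 3), (3, 2), (3, 1)])
d = closest_pair(pts, 0, len(pts) - 1)
brute = min(dist(a, b) for a, b in combinations(pts, 2))
print(d, brute)
```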
Searching
Computer systems are often used to store large amounts of data from which individual records must be
retrieved according to some search criterion. Thus the efficient storage of data to facilitate fast
searching is an important issue. In this section, we shall investigate the performance of some searching
algorithms and the data structures which they use.
Sequential Search
If there are n items in our collection - whether it is stored as an array or as a linked list - then in the worst case, when there is no item in the collection with the desired key, n comparisons of the key with the keys of the items in the collection will have to be made.
Algorithm LinearSearch(A, Item)
Found = False
For i = 1 to length(A)
    If A[i] = Item
        Print “Item found at position”, i
        Found = True
        Break
    End If
End For
If Found = False
    Print “Item not found”
End If
Binary Search
However, if we place our items in an array and sort them in either ascending or descending order on the
key first, then we can obtain much better performance with an algorithm called binary search.
In binary search, we first compare the key with the item in the middle position of the array. If there's a
match, we can return immediately. If the key is less than the middle key, then the item sought must lie
in the lower half of the array; if it's greater then the item sought must lie in the upper half of the array.
So we repeat the procedure on the lower (or upper) half of the array.
Algorithm BinaryIterativeSearch(A, Item)
// ‘A’ sorted array of n elements
Low = 1
Hi = n
While (Low <= Hi)
    Mid = floor((Low + Hi)/2)
    If A[Mid] == Item
        Return Mid
    Else If A[Mid] < Item
        Low = Mid + 1
    Else
        Hi = Mid - 1
    End If
End While
Return “Item not found”
Each step of the algorithm divides the block of items being searched in half. We can divide a set of n
items in half at most log2 n times.
Thus the running time of a binary search is proportional to log n, and we say this is an O(log n) algorithm.
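The procedure can be exercised in Python as follows (a sketch; 0-indexed, returning -1 when the item is absent):

```python
def binary_search(A, item):
    # A must be sorted in ascending order; returns an index or -1
    low, hi = 0, len(A) - 1
    while low <= hi:
        mid = (low + hi) // 2
        if A[mid] == item:
            return mid
        elif A[mid] < item:
            low = mid + 1       # the item can only be in the upper half
        else:
            hi = mid - 1        # the item can only be in the lower half
    return -1

A = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(A, 23), binary_search(A, 7))   # 5 -1
```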
Interpolation Search
Interpolation search is an extension of binary search. The basic idea is similar to leafing through a telephone book: one does not simply choose the middle element of the search area, but estimates where the item is most likely to be, judging by the range limits.
To that end, the difference between the key being searched for and the smallest element of the interval is divided by the span of the interval's values, and the result is multiplied by the interval length. Added to the lower interval boundary, this gives the interpolated position at which to probe for the key.
With approximately evenly distributed values, the expected complexity of the Interpolation Search is
O(log log n)
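The interpolated probe position described above can be written out in Python (the guard against a zero-width value span is my own addition):

```python
def interpolation_search(A, key):
    # A sorted ascending; probe position estimated by linear interpolation
    # between the boundary values instead of taking the midpoint
    low, hi = 0, len(A) - 1
    while low <= hi and A[low] <= key <= A[hi]:
        if A[hi] == A[low]:                 # avoid division by zero
            pos = low
        else:
            pos = low + (key - A[low]) * (hi - low) // (A[hi] - A[low])
        if A[pos] == key:
            return pos
        elif A[pos] < key:
            low = pos + 1
        else:
            hi = pos - 1
    return -1

A = [10, 20, 30, 40, 50, 60, 70, 80]
print(interpolation_search(A, 60), interpolation_search(A, 35))   # 5 -1
```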
Hashing
Hashing is a method for storing and retrieving records from an array. It lets you insert, delete, and
search for records based on a search key value. When properly implemented, these operations can be
performed in constant time i.e O(1). In fact, a properly tuned hash system typically looks at only one or
two records for each search, insert, or delete operation. This is far better than the O(log n) time required
to do a binary search on a sorted array of n records, or the O(log n) time required to do an operation on
a binary search tree. However, even though hashing is based on a very simple idea, it is surprisingly
difficult to implement properly.
A hash system stores records in an array called a hash table. Hashing works by performing a
computation on a search key K in a way that is intended to identify the position in Hash table that
contains the record with key K. The function that does this calculation is called the hash function, and is
usually denoted by the letter ‘h’. Since hashing schemes place records in the table in whatever order
satisfies the needs of the address calculation, records are not ordered by value. A position in the hash
table is also known as a slot. The number of slots in the hash table will be denoted by the variable M, with slots numbered from 0 to M - 1.
The goal for a hashing system is to arrange things such that, for any key value K and some hash function
h, i = h(K) is a slot in the table such that 0 <= i < M, and we have the key of the record stored at A[i]
equal to K.
Hashing generally takes records whose key values come from a large range and stores those records in a
table with a relatively small number of slots. Collisions occur when two records hash to the same slot in
the table. If we are careful — or lucky — when selecting a hash function, then the actual number of
collisions will be few. Unfortunately, even under the best of circumstances, collisions are nearly
unavoidable.
Thus, hashing implementations must include some form of collision resolution policy. Collision
resolution techniques can be broken into two classes: open hashing (also called separate chaining) and
closed hashing (also called open addressing). The difference between the two has to do with whether
collisions are stored outside the table (open hashing), or whether collisions result in storing one of the
records at another slot in the table (closed hashing).
Open Hashing
The simplest form of open hashing defines each slot in the hash table to be the head of a linked list. All
records that hash to a particular slot are placed on that slot's linked list.
Closed Hashing
The simplest closed-hashing (open-addressing) strategy is linear probing: when a record hashes to an occupied slot, we step forward through the table until a free slot is found. In fact, linear probing is one of the worst collision resolution methods. The main problem is illustrated by the figure below. Here, we see a hash table of ten slots used to store four-digit numbers. The hash function used is h(K) = K mod 10. The four values 1001, 9050, 9877, and 2037 are inserted into the table.
In the above example you can see that 9877 occupies slot 7. When 2037 is entered next, it does not find slot 7 free, since that slot is already occupied, so it is pushed along to the next free slot, i.e. slot 8.
This tendency of linear probing to cluster items together is known as primary clustering. Small clusters
tend to merge into big clusters, making the problem worse.
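The insertion sequence from the example can be reproduced with a short Python sketch (h(K) = K mod 10, table of ten slots):

```python
def insert_linear_probing(table, key):
    # h(K) = K mod 10; on a collision, step forward to the next free slot
    i = key % len(table)
    while table[i] is not None:
        i = (i + 1) % len(table)
    table[i] = key

table = [None] * 10
for key in (1001, 9050, 9877, 2037):
    insert_linear_probing(table, key)
print(table)   # 2037 collides with 9877 at slot 7 and lands in slot 8
```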