
Data Structures & Algorithms

Introduction

Sk Mazharul Islam

Department of Computer Science & Engineering


RCC Institute of Information Technology, Kolkata
An equation has names (x and y) which hold values (data). That means the
names (x and y) are placeholders for representing data. Similarly, in computer
programming we need something for holding data, and variables are the way to
do that.
Abstract Data Types (ADTs)
• All primitive data types (int, float, etc.) support
basic operations such as addition and subtraction.
The system provides the implementations for the
primitive data types.
• For user-defined data types we also need to define
operations. The implementation for these
operations can be done when we want to actually
use them.
• That means, in general, user defined data types are
defined along with their operations.
• To simplify the process of solving problems,
we combine the data structures with their
operations
• we call this combination an Abstract Data Type (ADT). An
ADT consists of two parts:
1. Declaration of data
2. Declaration of operations
Commonly used ADTs include:
• Linked Lists, Stacks, Queues, Priority Queues,
Binary Trees, Dictionaries, Disjoint Sets (Union and
Find), Hash Tables, Graphs, and many others.
• For example, a stack stores its data using a LIFO (Last-In-
First-Out) mechanism: the last element inserted into the
stack is the first element that gets deleted.
• Its common operations are: creating the stack, pushing an
element onto the stack, popping an element from the stack,
finding the current top of the stack, finding the number of
elements in the stack, etc.
Important: While defining ADTs, do not worry about the
implementation details; they come into the picture only
when we want to use the ADT.
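The stack ADT above can be sketched in Python (an illustrative choice; the class and method names are mine, not from the slides). The declaration of operations is the public interface; the internal representation stays hidden:

```python
# A minimal stack ADT sketch. The operations (push, pop, top, size) form
# the ADT's interface; the internal list is an implementation detail.

class Stack:
    def __init__(self):
        self._items = []          # hidden internal representation

    def push(self, item):
        """Insert an element on top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the most recently pushed element (LIFO)."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def top(self):
        """Return the current top element without removing it."""
        if not self._items:
            raise IndexError("top of empty stack")
        return self._items[-1]

    def size(self):
        """Number of elements currently in the stack."""
        return len(self._items)

s = Stack()
s.push(10); s.push(20); s.push(30)
print(s.pop())    # 30 -- the last element inserted is the first deleted
print(s.top())    # 20
print(s.size())   # 2
```

Swapping the internal list for, say, a linked list would not change the interface at all, which is exactly the point of separating an ADT's declaration from its implementation.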
Introduction to Algorithms
What is an Algorithm?
Consider the preparation of an omelette.

An algorithm is the step-by-step unambiguous instructions to solve a given problem.


What is a problem?
• A problem is a question to which we seek an
answer.
• Examples
- We want to rearrange a list of numbers in
numerical order. (sort)
- Determine whether the number x is in a list S of n
numbers.
- What is the 25th Fibonacci number?
What is an instance of a problem?
• An instance of a problem is a specific
assignment of the parameters that define the
problem.
• For example, in the case of sorting, we must be given the
n numbers in a specific order along with n, the number of
values to sort. This creates the specific case we are
interested in.
What is an Algorithm?
• In mathematics and computer science, an algorithm (from
Algoritmi, the Latin form of al-Khwārizmī) is an effective
method expressed as a finite list of well-defined instructions
for calculating a function,
• i.e., a step-by-step solution to the problem.
• In computer systems, an algorithm is basically an instance of
logic written in software by software developers to be
effective for the intended "target" computer(s), in order for
the target machines to produce output from given input
(perhaps null).
• Program – an implementation of an algorithm in some programming language
• Data structure - Organization of data needed to solve the problem

Some Examples of Algorithms in daily life


Characteristics of an Algorithm
Communicating an algorithm to a programmer is called algorithm specification.
It can be given in natural language, pseudocode, or a programming language.
Stages of Problem Solving
• Understanding the problem
• Planning an algorithm
• Designing an algorithm
• Validating and verifying an algorithm
• Analyzing an algorithm
• Implementing an algorithm
• Performing empirical analysis (if necessary)
Computation Model
Data Organization
Algorithm Design

Some algorithm design techniques are:


1. Divide and conquer
2. Dynamic programming
3. Greedy method
4. Backtracking
5. Branch and bound
Algorithm Specification
Validating and Verification
Why the Analysis of Algorithms?
• To go from city “A” to city “B” there are
• many ways: by flight, by bus, by train, and also by bicycle;
we choose the one that suits us.
• Similarly, in computer science, multiple algorithms are
available for solving the same problem (for example, a
sorting problem has many algorithms, like insertion sort,
selection sort, quick sort and many more).
• Algorithm analysis helps us to determine which algorithm
is most efficient in terms of time and space consumed.

Goal of the Analysis of Algorithms: comparing algorithms
(or solutions) mainly in terms of running time, but also in
terms of other factors (e.g., memory, developer effort,
etc.)
Time and Space Complexity
How to Compare Algorithms

• To compare algorithms, let us define a few objective
measures:
• Execution times? Not a good measure as
execution times are specific to a particular
computer.
• Number of statements executed? Not a good
measure, since the number of statements
varies with the programming language as well
as the style of the individual programmer.
Ideal solution to compare algorithms?
• Express the running time of a given algorithm as a
function of the input size n (i.e., f(n)) and compare these
functions for the different algorithms. This kind of
comparison is independent of machine time, programming
style, etc. It is also called Asymptotic Analysis, which
allows approximations.
The Importance of Developing Efficient
Algorithms
– Sequential Search versus Binary Search
– Problem: determine whether x ∈ S.
Instance: S = [5, 7, 8, 10, 11, 13], n = 6, and x = 11
Answer → yes

– Algorithm1 : Sequential Search


procedure seqsearch ( n : integer; S : array[1..n] of keytype;
                      x : keytype; var location : index);
begin
  location := 1;
  while location ≤ n and S[location] ≠ x do
    location := location + 1;
  if location > n then location := 0
end;
– Algorithm 2 : Binary Search
procedure binsearch ( n : integer; S : array[1..n] of keytype;
                      x : keytype; var location : index);
var low, high, mid : index;
begin
  low := 1; high := n; location := 0;
  while low ≤ high and location = 0 do begin
    mid := (low + high) div 2;
    if x = S[mid] then location := mid
    else if x < S[mid] then high := mid − 1
    else low := mid + 1
  end
end;
Note: If the array S contains 32 items and x is not in the array, Sequential
Search compares x with all 32 items before determining that x is not in the
array. If n is a power of 2 and x is larger than all the items in an array of
size n, the number of comparisons done by Binary Search is lg n + 1.

– Two different techniques for the same problem → which is more efficient?
time : O(n) for Sequential Search, O(log n) for Binary Search
space : O(n) for both

n | #comparisons (Seq. Srch) | #comparisons (Bin. Srch)
128 | 128 | 8
1,024 | 1,024 | 11
1,048,576 | 1,048,576 | 21
4,294,967,296 | 4,294,967,296 | 33
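The comparison counts in the table can be reproduced with a small Python sketch (a translation of the pseudocode above; the comparison-counting instrumentation is my own addition):

```python
# Counting key comparisons in the worst case (x larger than every item).

def seq_search(S, x):
    comps = 0
    for item in S:
        comps += 1                    # one comparison of x with S[location]
        if item == x:
            return comps
    return comps                      # x not found: n comparisons

def bin_search(S, x):                 # S must be sorted
    comps, low, high = 0, 0, len(S) - 1
    while low <= high:
        mid = (low + high) // 2
        comps += 1                    # one comparison of x with S[mid]
        if x == S[mid]:
            return comps
        elif x < S[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return comps

for n in (128, 1024, 1_048_576):
    S = list(range(n))
    x = n                             # larger than all items: worst case
    print(n, seq_search(S, x), bin_search(S, x))
```

For n a power of 2, the binary-search count comes out to lg n + 1, matching the table rows.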

• Fibonacci Sequence

f 0  0, f1  1, f n  f n1  f n2
– Algorithm 1 (Recursion)
function fib (n : integer) : integer;
begin
  if n ≤ 1 then fib := n
  else fib := fib(n−1) + fib(n−2)
end;
– Both take a divide-and-conquer approach:
• Binary Search : no overlapping subproblems
• Fibonacci sequence : full of overlapping subproblems

ƒ(5)

ƒ(3) ƒ(4)

ƒ(1) ƒ(2) ƒ(2) ƒ(3)

ƒ(0) ƒ(1) ƒ(0) ƒ(1) ƒ(1) ƒ(2)

ƒ(0) ƒ(1)

Let T(n) be the number of terms in the recursion tree for n. The number
of terms more than doubles every time n increases by 2.
– Theorem: T(n) > 2^(n/2) for n ≥ 2
Proof: induction on n.
<Induction Basis>
n = 2 and n = 3:
T(2) = 3 > 2 = 2^(2/2)
T(3) = 5 > 2.83 ≈ 2^(3/2)
<Induction Hypothesis>
Suppose T(m) > 2^(m/2) for all m such that 2 ≤ m < n.
<Induction Step>
T(n) = T(n−1) + T(n−2) + 1
     > 2^((n−1)/2) + 2^((n−2)/2) + 1
     > 2^((n−2)/2) + 2^((n−2)/2)
     = 2 · 2^((n−2)/2) = 2^(n/2)
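The recurrence and the bound can also be checked numerically with a short Python sketch (the function name `terms` is my own):

```python
# T(n), the number of terms in the recursion tree for fib(n), satisfies
# T(0) = T(1) = 1 and T(n) = T(n-1) + T(n-2) + 1, and the theorem claims
# T(n) > 2^(n/2) for all n >= 2.

def terms(n):
    if n <= 1:
        return 1                     # a leaf of the recursion tree
    return terms(n - 1) + terms(n - 2) + 1

assert terms(2) == 3 and terms(3) == 5   # the induction-basis values
for n in range(2, 25):
    assert terms(n) > 2 ** (n / 2)       # the theorem's bound
print(terms(5))   # 15, matching the 15 nodes of the tree drawn above
```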
– Algorithm 2 (Iteration)
function fib2 (n : integer) : integer;
var f : array[0..n] of integer;
    i : index;
begin
  f[0] := 0;
  if n > 0 then begin
    f[1] := 1;
    for i := 2 to n do
      f[i] := f[i−1] + f[i−2]
  end;
  fib2 := f[n]
end;
fib2(n) computes only n + 1 terms, whereas the recursive fib(n)
computes more than 2^(n/2) terms.

n | Iterative Alg | Recursive Alg
40 | 41 ns | 1048 μs
60 | 61 ns | 1 s
80 | 81 ns | 18 min
100 | 101 ns | 13 days
120 | 121 ns | 36 years
160 | 161 ns | 3.8 × 10^7 years
200 | 201 ns | 4 × 10^13 years
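Translating both pseudocode versions to Python makes the gap concrete (the call counter is my own instrumentation, not part of the original algorithms):

```python
# Recursive vs. iterative Fibonacci, with a counter recording how many
# terms the recursive version actually computes.

def fib_rec(n, counter):
    counter[0] += 1                   # one term of the recursion tree
    if n <= 1:
        return n
    return fib_rec(n - 1, counter) + fib_rec(n - 2, counter)

def fib_iter(n):
    f = [0] * (n + 1)                 # computes exactly n + 1 terms
    if n > 0:
        f[1] = 1
        for i in range(2, n + 1):
            f[i] = f[i - 1] + f[i - 2]
    return f[n]

counter = [0]
assert fib_rec(20, counter) == fib_iter(20) == 6765
print(counter[0])    # 21891 terms recursively vs. 21 iteratively
```

Both return the same value; the difference is purely in the amount of repeated work, which is what the timing table above measures.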
– Algorithm Design : techniques
(D-&-C, Dynamic Programming, Greedy, Backtracking,
Branch-&-Bound, … )

• Several algorithms exist for the same problem:
“Which is the most efficient algorithm?”
• => Algorithm Analysis

♧ Note: problem analysis
(solvable, unsolvable, NP-Complete, …)
• Time complexity Analysis
– Actual CPU time? Not suitable: it is machine-dependent.
– Number of instructions? Not suitable: it depends on the programmer and the language.
→ We need some independent measure!
– Number of basic operations as a function of the input size n: n + 1, 2^(n/2), log n, …
– Input size?
(e.g.) graph problems: n + e; matrix product: n²; …
– Time complexity analysis:
• Determine the input size
• Choose the basic operation
• Determine how many times the basic operation is done for each value of the
input size
Input Size Revisited
• Normally we take n as the input size in our algorithms, because n has been a
reasonable measure of the amount of data in the input (e.g., sorting).
• What is the input size for the following prime-checking algorithm? Is it a
polynomial-time algorithm? (The number of passes through the while loop in
this prime-checking algorithm is clearly in Θ(n^(1/2)).)
• Definition: For a given algorithm, the input size is defined as the
number of characters it takes to write the input.
 Which encoding scheme?
 Answer: it should be reasonable.
 In a binary encoding scheme, a number x is encoded by ⌊lg x⌋ + 1 bits.
Ex.: 31 = 11111 and ⌊lg 31⌋ + 1 = 5.
 We simply say that it takes about lg x bits to encode a positive
integer x in binary.
Input size for an algorithm that sorts n positive integers:
• If the largest integer is L, each integer takes about lg L bits, so the input
size for the n integers is about n lg L bits (about n log L digits with a
decimal encoding).
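The ⌊lg x⌋ + 1 encoding length is easy to check in Python, whose built-in `int.bit_length()` method returns exactly that value for positive integers:

```python
# The number of bits needed to write a positive integer x in binary is
# floor(lg x) + 1, which Python exposes directly as x.bit_length().

import math

for x in (31, 32, 1000):
    bits = math.floor(math.log2(x)) + 1
    assert bits == x.bit_length() == len(bin(x)) - 2   # strip the "0b" prefix
    print(x, bin(x)[2:], bits)
# 31 is written 11111, i.e. 5 bits; so a list of n integers no larger
# than L takes about n * lg L bits altogether.
```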
• Definition: For a given algorithm, W(s) is defined as the maximum
number of steps done by the algorithm for an input of size s. W(s) is
called the worst-case time complexity of the algorithm.
• In the worst case, Exchange Sort does n(n − 1)/2 comparisons of keys and
3n(n − 1)/2 assignments to sort n positive integers. Therefore, the maximum
number of steps done by Exchange Sort is no greater than
n(n − 1)/2 + 3n(n − 1)/2 = 2n(n − 1).
Time complexity of the prime-checking algorithm?
• In the worst case there are ⌊n^(1/2)⌋ − 1 passes through the loop.
• Assume a decimal encoding, so the input size is s ≈ log n, i.e., n ≈ 10^s.
• Therefore the worst-case number of passes is ≈ 10^(s/2).
• With a binary encoding it is ≈ 2^(s/2), which is non-polynomial in the
input size s.
So the concept of precise input size is important.
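A sketch of the kind of prime-checking loop described above, assuming plain trial division up to √n (my own illustration; the slides do not show the algorithm's code):

```python
# Trial division makes about sqrt(n) passes: polynomial in the VALUE n,
# but exponential in the input SIZE s = lg n, since sqrt(n) = 2^(s/2).

def is_prime(n):
    passes = 0
    d = 2
    while d * d <= n:                 # i.e. d <= floor(sqrt(n))
        passes += 1
        if n % d == 0:
            return False, passes
        d += 1
    return True, passes

prime, passes = is_prime(2**31 - 1)   # a known Mersenne prime
s = (2**31 - 1).bit_length()          # input size: 31 bits
print(prime, passes)                  # passes grows like 2^(s/2) ~ 46341
```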
Complexity
• Every-Case Time Complexity
• Worst−Case Time Complexity
• Average-Case Time Complexity
• Best-Case Time Complexity
• Time complexity analysis of an algorithm determines
how many times the basic operation is done for each
value of the input size.
• In some cases the number of times it is done depends
not only on the input size, but also on the input's values.
• Example is Sequential Search. Here, if x is the first item
in the array, the basic operation is done once, whereas if
x is not in the array, it is done n times.
• In other cases, such as (Add Array Members), the basic
operation is always done the same number of times for
every instance of size n. When this is the case, T(n) is
defined as the number of times the algorithm does the
basic operation for an instance of size n. T(n) is called
the every-case time complexity of the algorithm, and
the determination of T(n) is called an every-case time
complexity analysis.
– Example of every-case/worst case/average case/best case
• Add array Members
T(n) = n (n : array size)
• Matrix multiplication
T(n) = n3 (n : # rows) “every-case time complexity”
• Sequential Search
– Worst-case t.c. T(n) = n = W(n) [if x is the last item in the
array or if x is not in the array]
– Best-case t.c B(n) = 1 [If x = S[1], there will be one pass
through the loop regardless of the size of n]
– Average-case t.c. A(n)
(i) x ∈ S (each of the n slots equally likely):
A(n) = Σ_{k=1}^{n} k · (1/n) = (1/n) · n(n+1)/2 = (n+1)/2
(ii) x may or may not be in S (p = probability that x ∈ S, each slot
equally likely):
A(n) = Σ_{k=1}^{n} k · (p/n) + n(1 − p)
     = (p/n) · n(n+1)/2 + n(1 − p)
     = n(1 − p/2) + p/2
Note: an average can be called “typical” only if the actual cases do not
deviate much from the average (that is, only if the standard deviation
is small).
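The case (i) result, A(n) = (n + 1)/2, can be verified exactly by averaging the comparison counts of Sequential Search over every possible position of x (a small check of my own):

```python
# Average-case comparisons of Sequential Search when x is known to be
# in the array and each position is equally likely.

def comparisons(S, x):
    for k, item in enumerate(S, start=1):
        if item == x:
            return k                  # x found at position k: k comparisons
    return len(S)                     # x absent: n comparisons

n = 1001
S = list(range(n))
avg = sum(comparisons(S, x) for x in S) / n
print(avg)    # exactly (n + 1) / 2 = 501.0
```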
– Time complexity
– Space complexity (memory)
– Complexity function f : N → R
– Note: 2 algorithms for the same problem, with running times
1000·n and n².
Which one is faster (or better)?
n² > 1000·n if n > 1000
Threshold: 1000
But, theoretically, 1000·n is better.
• Order
– 1000·n is better than 0.01·n² if n > 100,000
O(n): linear-time algorithm; O(n²): quadratic-time algorithm
n | 0.1n² | 0.1n² + n + 100
10 | 10 | 120
20 | 40 | 160
50 | 250 | 400
100 | 1,000 | 1,200
1000 | 100,000 | 101,100
– The quadratic term eventually dominates
– “Throw away” low-order terms:
g(n) = 5n² + 100n + 20 ∈ Θ(n²)
T(n) = n(n − 1)/2 ∈ Θ(n²)
• A “Θ(n²) algorithm,” or “quadratic algorithm.”
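A quick Python check of the table's message, namely that the low-order terms of 0.1n² + n + 100 stop mattering as n grows:

```python
# The ratio of 0.1n^2 + n + 100 to its quadratic term alone
# approaches 1 as n grows, so the low-order terms can be thrown away.

def f(n):
    return 0.1 * n * n + n + 100

for n in (10, 100, 1000, 100_000):
    print(n, f(n), f(n) / (0.1 * n * n))
```

At n = 10 the low-order terms dominate (ratio 12), but by n = 100,000 the ratio is already within 0.1% of 1.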


What is Rate of Growth?
• The rate at which the running time increases as a
function of the input size is called the rate of growth.
• Example: assume that you go to a shop to buy a car
and a bicycle. If your friend sees you there and asks
what you are buying, you would generally say “a car,”
because the cost of the car is so high compared to the
cost of the bicycle that the bicycle’s cost is absorbed
into the approximation.
Digital Media Lab.
Why is it called Asymptotic Analysis?
• From the previous discussion(for all three notations:
worst case, best case, and average case), we can
easily understand that, in every case for a given
function f(n) we are trying to find another function
g(n) which approximates f(n) at higher values of n.
That means g(n) is also a curve which approximates
f(n) at higher values of n.
• In mathematics we call such a curve an asymptotic
curve. In other terms, g(n) is the asymptotic curve
for f(n). For this reason, we call algorithm analysis
asymptotic analysis.
