(CS210/ESO207/ESO211)
Lecture 36
Sorting
beyond O(n log n) bound
Overview of today's lecture
The sorting algorithms you studied till now
Integer sorting
Solving 2 problems from Practice sheet 6 and one problem
from Practice sheet 5.
Sorting algorithms studied till now
Algorithms for Sorting n elements
Insertion sort: O(n^2)
Selection sort: O(n^2)
Bubble sort: O(n^2)
Merge sort: O(n log n)
Quick sort: worst case O(n^2)
] in O()
time and using O() space in word RAM model.
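As one standard illustration of integer sorting that is not bound by the comparison-sort limit, here is a minimal counting sort sketch. It assumes the keys are integers in the range 0..k-1; the function name `counting_sort` and the parameter `k` are illustrative choices, not taken from the lecture.

```python
def counting_sort(keys, k):
    """Sort integers drawn from range(k) in O(n + k) time with O(k) extra space."""
    counts = [0] * k              # counts[v] = number of occurrences of v
    for x in keys:
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)   # emit each value as many times as it occurred
    return out
```

When k is O(n), the running time is O(n), which is why no comparison between keys is needed.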
Practice sheet 6
We shall solve exercises 5 and 1 from this sheet
Important note
Though the solution is provided for this problem here, one should NOT feel that such
a problem will be asked in the end sem exam of this course. It was a mistake of the
instructor to put it in the practice sheet.
Problem 5 of practice sheet 6.
Description (in terms of intervals):
Given a set A of n intervals, compute a smallest set B of intervals from A such that for every
interval I in A\B, there is some interval in B which overlaps/intersects I.
[Figure: an instance A. The set of green intervals is a solution, but not an optimal solution.]
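The problem statement can be made concrete with a small validity check. This is only a sketch under assumptions that the lecture does not spell out: an interval is a `(start, finish)` tuple, intervals are closed (touching endpoints count as overlapping), and the names `overlaps` and `is_valid_solution` are hypothetical.

```python
def overlaps(a, b):
    """Closed intervals a and b intersect if neither lies entirely before the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_valid_solution(A, B):
    """True if every interval of A outside B is overlapped by some interval of B."""
    chosen = set(B)
    return all(
        any(overlaps(x, y) for y in chosen)
        for x in A
        if x not in chosen
    )
```

For instance, with A = [(1, 3), (2, 6), (5, 8), (7, 10)], the set {(2, 6), (7, 10)} is a valid solution, while {(1, 3)} alone is not because (5, 8) is left uncovered.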
Solution of Problem 5 of practice sheet 6.
Let I* be the interval with earliest finish time.
Let I be the interval with maximum finish time overlapping I*.
Lemma1: There is an optimal solution for set A that contains I.
[Figure: instance A with the intervals I* and I marked.]
Question: How to obtain a smaller instance A' using this greedy approach?
Naive approach (again inspired by the job scheduling problem): remove from A all
intervals which overlap with I. This is A'.
This approach does not work! Here is a counterexample.
The problem is that some deleted interval (in this case I', which overlaps I) could have been
used for intersecting many intervals if it were not deleted. But deleting it from the instance
prevents it from being selected in the solution.
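The failure of the naive approach can be reproduced on a small instance. The sketch below is an illustration only: the instance is hypothetical (it is not the one drawn on the slide), intervals are assumed to be closed `(start, finish)` tuples, and `naive_greedy` is an invented name.

```python
def overlaps(a, b):
    """Closed intervals a and b intersect if neither lies entirely before the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def naive_greedy(A):
    """Repeatedly pick I (max finish time overlapping I*) and delete everything I overlaps."""
    remaining = list(A)
    chosen = []
    while remaining:
        i_star = min(remaining, key=lambda iv: iv[1])           # earliest finish time
        i = max((iv for iv in remaining if overlaps(iv, i_star)),
                key=lambda iv: iv[1])                           # max finish overlapping I*
        chosen.append(i)
        remaining = [iv for iv in remaining if not overlaps(iv, i)]
    return chosen


# Hypothetical counterexample: (5, 12) overlaps the first pick (2, 6) and is deleted,
# although it alone would have covered both (7, 8) and (9, 10).
A = [(1, 3), (2, 6), (5, 12), (7, 8), (9, 10)]
print(naive_greedy(A))
```

On this instance the naive approach selects three intervals, whereas {(2, 6), (5, 12)} is a valid solution of size two.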
[Figure: counterexample instance A with the intervals I*, I and I' marked.]
Overview of the approach
In order to make sure we do not delete intervals (like the interval I' that overlaps I in the
previous slide) if they are essential for covering many other intervals, we make
some observations and introduce the notion of a "uniquely covered
interval". It turns out that we need to keep I' in the smaller instance if there is
an interval there which is uniquely covered by I'. Otherwise, we may discard
I'.
An Observation
We can delete all intervals whose finish time is before the finish time of I because any interval
overlapped by such intervals will anyway be overlapped by I. Let us consider intervals
which overlap with I, but have finish time greater than that of I. In the example shown
below, these intervals are those three intervals which cross the red line (the finish time of I).
Observation1: Among the intervals crossing the red line, we need to keep only that interval
which has maximum finish time. (I' in this picture)
Proof: Notice that each of these intervals is anyway intersected by I. As far as using them
to intersect other intervals is concerned, we may better choose I' for this purpose.
So from now onwards, we shall assume that there is exactly one interval I' in A which
overlaps I (intersects the red line) and has finish time larger than that of I.
[Figure: instance A with I*, I and I' marked; the red vertical line is at the finish time of I.]
Uniquely covered interval
I2 is said to be uniquely covered by I1 if
1. I2 is fully covered by I1.
2. Every interval overlapping I2 is also fully covered by I1.
Lemma2: There is an optimal solution containing I1.
Proof: Surely I2 or some other interval overlapping it must be there in the optimal solution. If
we replace that interval by I1, we still get a solution of the same size and hence an optimal solution.
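The definition of a uniquely covered interval translates directly into a predicate. The following is a sketch under the assumption that intervals are closed `(start, finish)` tuples; the names `covers` and `is_uniquely_covered` are hypothetical.

```python
def overlaps(a, b):
    """Closed intervals a and b intersect if neither lies entirely before the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def covers(outer, inner):
    """True if inner lies fully inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def is_uniquely_covered(i2, i1, A):
    """True if i2, and every interval of A overlapping i2, is fully covered by i1."""
    if not covers(i1, i2):
        return False
    return all(covers(i1, x) for x in A if overlaps(x, i2))
```

With A = [(5, 12), (7, 8), (9, 11), (10, 13)], the interval (7, 8) is uniquely covered by (5, 12), but (9, 11) is not, because (10, 13) overlaps it and sticks out of (5, 12).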
[Figure: interval I2 lying inside interval I1.]
We are now ready to describe the construction of the smaller instance A' from A. There will be
two cases. We shall then prove that |Opt(A)| = |Opt(A')| + 1 for each of
these cases.
Important note:
The reader is advised to fully understand Lemma1, Lemma2, Observation1,
and the notion of a uniquely covered interval. Also fully internalize the
notations I*, I, and I'. This will help the reader understand the rest of the
solution.
Constructing A' from A
[Figure: instance A with I*, I and I' (the interval crossing the red line) marked; the intervals to the right of the red line are grouped into D and E.]
We need to take care of intervals whose starting point is to the right of the red line
(finish time of I). We can partition these intervals into two sets.
D: those which overlap with I'.
E: those that start after the end of I' and hence do not overlap with I'.
Now we shall describe the two cases for construction of A'.
Case1: There is an interval I'' in D uniquely covered by I'
[Figure: the smaller instance A' for Case 1, containing I', I'' and the sets D and E.]
Constructing A' from A
If there is an interval I'' in D uniquely covered by I', then we define A' as
follows. Remove all intervals from A which overlap with I (this was our usual
way of defining A' in our wrong solution). Now add I' to this set. This set is
the smaller instance A' for Case 1.
We shall now define A' for Case 2.
Constructing A' from A
Case2: There is no interval uniquely covered by I'
[Figure: instance A with I*, I, I' and the sets D and E; below it, the smaller instance A' consisting of D and E.]
Constructing A' from A
If there is no interval in D uniquely covered by I', then we define A' as
follows. Remove all intervals from A which overlap with I (this was our usual
way of defining A' in our wrong solution). This set is the smaller instance A'
for Case 2.
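The two cases of the construction can be sketched in a few lines. This is an illustration under assumptions, not the lecture's own code: intervals are closed `(start, finish)` tuples, the name `smaller_instance` is hypothetical, and the unique-coverage test is run only against the intervals that survive the deletion, in line with the assumption made after Observation1 that I' is the only interval crossing the red line.

```python
def overlaps(a, b):
    """Closed intervals a and b intersect if neither lies entirely before the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def covers(outer, inner):
    """True if inner lies fully inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def finish(iv):
    return iv[1]


def smaller_instance(A):
    """Return (I, A') for a non-empty instance A, following Case 1 / Case 2."""
    i_star = min(A, key=finish)                                   # earliest finish time
    i = max((x for x in A if overlaps(x, i_star)), key=finish)    # the interval I
    i_prime = max((x for x in A if overlaps(x, i)), key=finish)   # the interval I'
    rest = [x for x in A if not overlaps(x, i)]                   # D and E together
    D = [x for x in rest if overlaps(x, i_prime)]

    def uniquely_covered(x):
        return covers(i_prime, x) and all(
            covers(i_prime, y) for y in rest if overlaps(y, x))

    if any(uniquely_covered(x) for x in D):
        return i, [i_prime] + rest    # Case 1: keep I' in the smaller instance
    return i, rest                    # Case 2: discard I'
```

On the hypothetical instance [(1, 3), (2, 6), (5, 12), (7, 8), (9, 10)], Case 1 applies and I' = (5, 12) is kept; on [(1, 3), (2, 6), (5, 8), (7, 10)], Case 2 applies and only (7, 10) remains.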
Theorem1: |Opt(A)| = |Opt(A')| + 1
We shall prove this theorem for Case 1 as well as
Case 2.
Case1: There is an interval I'' in D uniquely covered by I'
|Opt(A)| ≤ |Opt(A')| + 1
[Figure: instance A and the smaller instance A' for Case 1.]
Using Lemma2, it follows that there is an optimal solution for A' containing I'.
What should we add to this solution to get a solution for A?
We need to add just I to get a solution for A and we are done.
Case1: There is an interval I'' in D uniquely covered by I'
|Opt(A')| ≤ |Opt(A)| - 1
[Figure: instance A and the smaller instance A' for Case 1.]
Using Lemma1 and Lemma2, it follows that there is an optimal solution for A
containing I and I'.
We need to just remove I from this optimal solution for A to get
a solution for A' and we are done.
This finishes the proof of Theorem1 for Case 1.
We shall now analyze Case 2 and prove Theorem1 for this case as well.
Case2: There is no interval uniquely covered by I'
|Opt(A)| ≤ |Opt(A')| + 1
[Figure: instance A and the smaller instance A' (the sets D and E) for Case 2.]
Consider any optimal solution for A'. Note that this optimal
solution takes care of D and E.
So we just need to take care of intervals from A which intersect the red line.
These are taken care of by adding I to this solution. We are done.
Case2: There is no interval uniquely covered by I'
|Opt(A')| ≤ |Opt(A)| - 1
[Figure: instance A and the smaller instance A' (the sets D and E) for Case 2; the violet vertical line is at the finish time of I'.]
Using Lemma1, it follows that there is an optimal solution for A containing I.
If I' is not in this optimal solution, we can see that removing I from
this optimal solution gives a valid solution for A'.
So let us consider the case when I' is present in the optimal solution of A.
The problem is that I' is not present in A', so we need a substitute for I'
from A'.
Notice that I' can serve the purpose of overlapping intervals from D
only. So we should search for a substitute for I' from D only.
We replace I' by the interval from D which intersects the violet line and has earliest start
time. See the following slide for its justification.
Let J be the interval in D which intersects the violet vertical line (has finish time greater than
that of I') and has earliest start time. It suffices if we can show that every interval of D
overlaps with J. We proceed as follows. Consider any interval K in D. There are two cases.
1. Finish time of K is less than that of I'. In other words, K does not intersect the violet
line. In this case, there must be some other interval in D that overlaps K and intersects
the violet line (otherwise, K would be uniquely covered by I'); since the start time of J is no
later than that of this interval, K is overlapped by J as well.
2. Finish time of K is more than that of I'. In other words, K does intersect the violet line. Hence
K overlaps with J as well since the latter also intersects the violet line.
Hence if we remove I and I' from the given optimal solution of A, and add J to it, we get a
solution for A'. Since an optimal solution for A' has to be smaller than or equal in size to this
solution, we get |Opt(A')| ≤ |Opt(A)| - 1 for Case 2.
Hence we have proved Theorem1: |Opt(A)| = |Opt(A')| + 1
Now we need to design the algorithm for our problem based on the greedy strategy that
we used for constructing A' from A.
Simplification and efficient implementation of
the algorithm
Though the algorithm may look quite complex to implement, it is in fact quite simple,
as will soon become clear. We first introduce some notation to facilitate a clean
representation of the algorithm.
Notations:
f(I): finish time of interval I;
maxf(I,A): maximum finish time of an interval from A that overlaps with I. (If no interval
overlaps with I, then maxf(I,A)=f(I).)
maxf-Interval(I,A): the interval from A with maximum finish time that overlaps with I. (If no
interval overlaps with I, then maxf-Interval(I,A)=I.)
Cover: set of intervals selected till now. (At the end of the algo, Cover will be an optimal
solution.)
][: Empty interval.
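The notations above map onto small helper functions. This is a sketch assuming intervals are closed `(start, finish)` tuples; the Python names `f`, `maxf` and `maxf_interval` simply mirror the notation, and `overlaps` is a hypothetical helper.

```python
def overlaps(a, b):
    """Closed intervals a and b intersect if neither lies entirely before the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def f(iv):
    """Finish time of interval iv."""
    return iv[1]


def maxf_interval(iv, A):
    """Interval of A with maximum finish time that overlaps iv (iv itself if none does)."""
    candidates = [x for x in A if overlaps(x, iv)]
    return max(candidates, key=f) if candidates else iv


def maxf(iv, A):
    """Maximum finish time of an interval of A that overlaps iv (f(iv) if none does)."""
    return f(maxf_interval(iv, A))
```

Each call scans A once, so both `maxf` and `maxf_interval` take time linear in the size of A in this sketch.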
Algorithm
I' ← ][; Cover ← ∅; A' ← A;
While A' <> ∅ do
{ If (I' = ][)
  { let I* be the interval in A' with earliest finish time;
    let I ← maxf-Interval(I*, A');
    Cover ← Cover U {I};
    I' ← maxf-Interval(I, A');
    remove all intervals from A' that are overlapped by I;
  }
  Else If (there is an interval I'' ∈ A' with maxf(I'', A') < f(I'))
  { I ← I';
    Cover ← Cover U {I};
    I' ← maxf-Interval(I, A');
    remove all intervals from A' that are overlapped by I;
  }
  Else I' ← ][ ;
}
return Cover;
Algorithm
(further refinements of the same algo)
I' ← ][; Cover ← ∅; A' ← A;
While A' <> ∅ do
{ let I* be the interval in A' with earliest finish time;
  If (maxf(I*, A') < f(I')) I ← I';
  Else I ← maxf-Interval(I*, A');
  Cover ← Cover U {I};
  I' ← maxf-Interval(I, A');
  remove all intervals from A' that are overlapped by I;
}
return Cover;
It is easy to observe that each iteration of the while loop can be implemented in O(n)
time, since every step of an iteration is a single scan over A'.
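The refined algorithm can be sketched as follows. This is an illustrative implementation under assumptions rather than the lecture's own code: intervals are closed `(start, finish)` tuples with distinct endpoints, `None` stands in for the empty interval ][, and the names `greedy_cover`, `remaining` (for A') and `i_prime` (for I') are hypothetical.

```python
def overlaps(a, b):
    """Closed intervals a and b intersect if neither lies entirely before the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def maxf_interval(iv, pool):
    """Interval of pool with maximum finish time that overlaps iv (iv itself if none does)."""
    candidates = [x for x in pool if overlaps(x, iv)]
    return max(candidates, key=lambda x: x[1]) if candidates else iv


def greedy_cover(A):
    remaining = list(A)      # plays the role of A'
    cover = []
    i_prime = None           # plays the role of I'; None stands for ][
    while remaining:
        i_star = min(remaining, key=lambda x: x[1])      # earliest finish time in A'
        best = maxf_interval(i_star, remaining)
        if i_prime is not None and best[1] < i_prime[1]:
            chosen = i_prime                             # I* is uniquely covered by I'
        else:
            chosen = best
        cover.append(chosen)
        i_prime = maxf_interval(chosen, remaining)       # candidate for the next round
        remaining = [x for x in remaining if not overlaps(x, chosen)]
    return cover


print(greedy_cover([(1, 3), (2, 6), (5, 12), (7, 8), (9, 10)]))
```

Each iteration performs a constant number of linear scans over `remaining` and removes at least the interval I*, so the loop runs at most n times.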
Proof of correctness for the algorithm
Though we had derived a proof of correctness while arriving at the algorithm, the same can be
given now as well. This may be helpful if you are not interested in the way we arrived at the
algorithm and just wish to see the correctness of the algorithm.
Let Overlapped = A\A'. In plain words, Overlapped is the set of intervals from A which are
overlapped by some interval from Cover.
In the beginning of an iteration, the following assertions hold:
1. There is an optimal solution for A containing Cover.
2. Every interval from Overlapped is overlapped by an interval from Cover, and I' is an interval
with maximum finish time from the set Overlapped.
3. Every interval from A' has start time greater than the finish time of any interval from Cover.
The above assertions can be proved by induction on the number of iterations. The arguments
needed will be a small collection of the arguments used for proving Theorem1.
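Alongside the inductive proof, the size of an optimal solution can be cross-checked on small instances by exhaustive search. The sketch below is such a checker, assuming closed `(start, finish)` tuples; it takes exponential time and is meant for testing only. The name `brute_force_opt_size` is hypothetical.

```python
from itertools import combinations


def overlaps(a, b):
    """Closed intervals a and b intersect if neither lies entirely before the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_valid_solution(A, B):
    """True if every interval of A outside B is overlapped by some interval of B."""
    chosen = set(B)
    return all(any(overlaps(x, y) for y in chosen) for x in A if x not in chosen)


def brute_force_opt_size(A):
    """Size of a smallest valid solution, found by trying subsets in increasing size."""
    for size in range(len(A) + 1):
        for B in combinations(A, size):
            if is_valid_solution(A, B):
                return size
```

Comparing this value with the size of the set returned by a greedy implementation on many small random instances is a cheap way to catch implementation mistakes.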
Concluding slide for exercise 5
Theorem:
There is an O(
, let it be . If is
closer to than , then we can conclude the following:
1. all elements greater than and less than must be among the set of nearest
elements from . These / elements are eliminated from input and added to our
solution.
2. None of the elements which are greater than can be among the set of nearest
element from . These / elements are also removed from the input.
In this way, we have found / nearest element from . Moreover, the input has reduced
from to . Keep repeating it. We get nearest element from in O() time.
Finding DFS tree from start and finish time
There was a problem in practice sheet 5 where, given the start time and finish
time of DFS traversal for all vertices, the aim is to compute the DFN number and
the DFS tree.
A few students were facing the problem of determining the children of a node in
the DFS tree. An easy way to achieve this goal is an indirect one:
In order to compute the children of each vertex in the DFS tree, it suffices to
compute the parent of each vertex. We can do the latter task as follows.
Among all neighbours of a vertex u, find those whose
start time is smaller than that of u. All these vertices are ancestors of u. Who
among them will be the parent of u? Surely, the vertex with maximum start time.
So we can compute the parent of vertex u in O(deg(u)) time. Time spent over all
vertices will be O(m+n). Hence we can compute the children of each vertex
in the DFS tree, and hence the entire DFS tree structure, in O(m+n) time.
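The parent rule above can be sketched in a few lines. The sketch assumes an undirected graph given as an adjacency-list dictionary `adj` and a dictionary `start_time` of DFS start times; these names, the function name `dfs_tree_from_times`, and the convention that DFN numbers begin at 1 are all assumptions made for illustration. Finish times are not needed for this step.

```python
def dfs_tree_from_times(adj, start_time):
    """Recover parent, children and DFN number of every vertex from DFS start times."""
    parent = {}
    for u in adj:
        # Neighbours discovered before u are ancestors of u in the DFS tree.
        earlier = [v for v in adj[u] if start_time[v] < start_time[u]]
        # The parent is the most recently discovered ancestor; a root has none.
        parent[u] = max(earlier, key=lambda v: start_time[v]) if earlier else None

    children = {u: [] for u in adj}
    for u, p in parent.items():
        if p is not None:
            children[p].append(u)

    # DFN number = rank of the vertex in order of discovery.
    order = sorted(adj, key=lambda v: start_time[v])
    dfn = {u: k for k, u in enumerate(order, start=1)}
    return parent, children, dfn


adj = {'a': ['b', 'c', 'd'], 'b': ['a', 'd'], 'c': ['a'], 'd': ['b', 'a']}
start_time = {'a': 1, 'b': 2, 'd': 3, 'c': 6}
print(dfs_tree_from_times(adj, start_time))
```

The parent computation takes O(deg(u)) per vertex as described above. The call to `sorted` costs O(n log n) in this sketch; since start times are integers between 1 and 2n, it could be replaced by a linear-time integer sort to keep the whole procedure within O(m+n).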