Ch. 7 - QuickSort
Quick but not Guaranteed

Another Divide-and-Conquer sorting algorithm
As it turns out, MERGESORT and HEAPSORT, although $O(n \lg n)$
in their time complexity, have fairly large constants
and tend to move data around more than desirable (e.g.,
equal-key items may not maintain their relative position
from input to output).
We introduce another algorithm with better constants, but a
flaw: its worst case is $O(n^2)$. Fortunately, the worst case
is rare enough that the speed advantage holds the
overwhelming majority of the time, and the algorithm is
$O(n \lg n)$ on average.
03/27/15

91.404

Like in MERGESORT, we use Divide-and-Conquer:
1. Divide: partition A[p..r] into two subarrays A[p..q-1] and
A[q+1..r] such that each element of A[p..q-1] is $\le A[q]$,
and each element of A[q+1..r] is $\ge A[q]$. Compute q as
part of this partitioning.
2. Conquer: sort the subarrays A[p..q-1] and A[q+1..r] by
recursive calls to QUICKSORT.
3. Combine: the partitioning and recursive sorting leave us
with a sorted A[p..r]; no work is needed here.
An obvious difference is that we do most of the work in the
divide stage, with no work at the combine one.

The Pseudo-Code
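What follows is a minimal Python sketch of QUICKSORT and PARTITION, assuming the standard textbook scheme in which the last element A[r] serves as the pivot (the scheme the loop invariant in the correctness proof describes). The function names and the 0-based, inclusive indexing are choices made here. The comments number the PARTITION steps on the assumption that the slides follow the usual numbering, in which the for loop occupies lines 3-6 and the comparison is line 4.

```python
def partition(A, p, r):
    """Partition A[p..r] in place around the pivot A[r]; return the pivot's final index."""
    x = A[r]                          # line 1: the pivot
    i = p - 1                         # line 2: A[p..i] holds the elements <= x seen so far
    for j in range(p, r):             # line 3: j runs over p..r-1
        if A[j] <= x:                 # line 4: the comparison counted in the analysis
            i += 1                    # line 5
            A[i], A[j] = A[j], A[i]   # line 6
    A[i + 1], A[r] = A[r], A[i + 1]   # line 7: place the pivot between the two regions
    return i + 1                      # line 8


def quicksort(A, p=0, r=None):
    """Sort A[p..r] in place (inclusive bounds)."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = partition(A, p, r)        # divide
        quicksort(A, p, q - 1)        # conquer the left part
        quicksort(A, q + 1, r)        # conquer the right part; nothing to combine


data = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(data)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8]
```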

Proof of Correctness: PARTITION
We look for a loop invariant and observe that, at the
beginning of each iteration of the loop (lines 3-6), for any
array index k:

1. If $p \le k \le i$, then $A[k] \le x$;
2. If $i+1 \le k \le j-1$, then $A[k] > x$;
3. If $k = r$, then $A[k] = x$.
4. If $j \le k \le r-1$, then we don't know anything about A[k].
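The four conditions can be turned directly into runtime checks. The sketch below assumes the last-element-pivot PARTITION (pivot x = A[r]) and 0-based inclusive indices; `partition_checked` is a hypothetical name used only for this illustration.

```python
def partition_checked(A, p, r):
    """Last-element-pivot PARTITION that asserts the loop invariant before every iteration."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        # Invariant at the start of the iteration for this value of j:
        assert all(A[k] <= x for k in range(p, i + 1))   # condition 1: p <= k <= i
        assert all(A[k] > x for k in range(i + 1, j))    # condition 2: i+1 <= k <= j-1
        assert A[r] == x                                 # condition 3: k = r
        # Condition 4 (j <= k <= r-1) makes no claim, so there is nothing to check.
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    # At termination j = r, so conditions 1 and 2 cover all of A[p..r-1].
    assert all(A[k] <= x for k in range(p, i + 1))
    assert all(A[k] > x for k in range(i + 1, r))
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1


values = [2, 8, 7, 1, 3, 5, 6, 4]
print(partition_checked(values, 0, len(values) - 1), values)
```

If any assertion failed, the invariant would be violated for that input; running the function on many inputs is a sanity check, not a substitute for the proof.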

The Invariant

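Initialization. Before the first iteration: $i = p-1$, $j = p$. There
are no values between p and i, and no values between i+1 and j-1.
The first two conditions are trivially satisfied; the initial
assignment $x = A[r]$ satisfies condition 3.
Maintenance. Two cases:
1. $A[j] > x$. The loop only increments j. Condition 2 then holds
for A[j-1], and all other entries are unchanged.
2. $A[j] \le x$. The loop increments i, swaps A[i] and A[j], and
then increments j. After the swap $A[i] \le x$, so condition 1
holds, and the value moved to A[j-1] is $> x$, so condition 2 holds.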

Termination. j=r. Every entry in the array is in one of the three sets
described by the invariant. We have partitioned the values in the
array into three sets: less than or equal to x, greater than x, and a
singleton containing x.

Running time of PARTITION on A[p..r] is $\Theta(n)$, where $n = r-p+1$.

QUICKSORT: Performance - a quick look.
We first look at (apparent) worst-case partitioning:
$T(n) = T(n-1) + T(0) + \Theta(n) = T(n-1) + \Theta(n)$.
It is easy to show, using substitution, that $T(n) = \Theta(n^2)$.
We next look at (apparent) best-case partitioning:
$T(n) = 2T(n/2) + \Theta(n)$.
It is also easy to show (case 2 of the Master Theorem) that
$T(n) = \Theta(n \lg n)$.
Since the disparity between the two is substantial, we
need to look further.
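To make the worst-case recurrence concrete, the sketch below counts pivot comparisons for a last-element-pivot quicksort. It assumes that pivot rule, and `count_comparisons` is a hypothetical helper written for this illustration. It uses an explicit stack so that already-sorted input, which triggers the T(n-1) + T(0) split at every level, does not exhaust Python's recursion limit.

```python
import math
import random


def count_comparisons(A):
    """Sort a copy of A with last-element-pivot quicksort; return the number of pivot comparisons."""
    A = list(A)
    count = 0
    stack = [(0, len(A) - 1)]            # subarrays still to be sorted (inclusive bounds)
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x, i = A[r], p - 1
        for j in range(p, r):
            count += 1                   # one comparison of A[j] against the pivot
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]  # pivot lands at index i + 1
        stack.append((p, i))             # left part  A[p..q-1]
        stack.append((i + 2, r))         # right part A[q+1..r]
    return count


n = 1000
print(count_comparisons(range(n)))       # sorted input: n(n-1)/2 = 499500
shuffled = list(range(n))
random.shuffle(shuffled)
print(count_comparisons(shuffled), round(n * math.log2(n)))
```

On sorted input every pivot is the maximum of its subarray, so the count is exactly n(n-1)/2; on shuffled input the count varies from run to run but stays within a small constant factor of n lg n.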

QUICKSORT: Performance - Balanced Partitioning

QUICKSORT: Performance - the Average Case

As long as the number of good splits is bounded below by
a fixed percentage of all the splits, we maintain
logarithmic depth and so $O(n \lg n)$ time complexity.
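As a worked instance of the logarithmic-depth claim, assume (purely for illustration) that every call splits its subarray 9-to-1, which looks badly unbalanced. Writing $c$ for the constant hidden in the $\Theta(n)$ partitioning cost, the recurrence and its recursion-tree bound are:

```latex
\begin{aligned}
T(n) &= T(9n/10) + T(n/10) + cn, \\
\text{depth of the recursion tree} &= \log_{10/9} n = \Theta(\lg n), \\
\text{cost of each level} &\le cn, \\
T(n) &= O(n \lg n).
\end{aligned}
```

The same argument goes through for any split of constant proportionality; only the base of the logarithm, and hence the hidden constant, changes.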

QUICKSORT: Performance - Randomized QUICKSORT
We would like to ensure that the choice of pivot does not
critically impair the performance of the sorting algorithm.
The discussion to this point indicates that
randomizing the choice of the pivot should provide us
with good behavior (if at all possible with the data set we
are trying to sort). We introduce RANDOMIZED-PARTITION,
which picks the pivot at random.

The recursive procedure becomes RANDOMIZED-QUICKSORT, which calls
RANDOMIZED-PARTITION in place of PARTITION.

Every call to RANDOMIZED-PARTITION introduces
the (constant) extra overhead of a call to RANDOM.
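A minimal Python sketch of the randomized variant, assuming the usual construction: pick a uniformly random index in p..r, swap that element into position r, and then run the ordinary last-element-pivot PARTITION. The function names and the optional `rng` parameter are choices made here, and `random.randint` stands in for RANDOM.

```python
import random


def partition(A, p, r):
    """Last-element-pivot PARTITION on A[p..r]; returns the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1


def randomized_partition(A, p, r, rng=random):
    """Swap a uniformly chosen element of A[p..r] into position r, then partition."""
    i = rng.randint(p, r)            # plays the role of RANDOM(p, r); both ends inclusive
    A[r], A[i] = A[i], A[r]
    return partition(A, p, r)


def randomized_quicksort(A, p=0, r=None, rng=random):
    """Sort A[p..r] in place using randomly chosen pivots."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = randomized_partition(A, p, r, rng)
        randomized_quicksort(A, p, q - 1, rng)
        randomized_quicksort(A, q + 1, r, rng)


data = list(range(20))               # sorted input is no longer a special bad case
randomized_quicksort(data)
print(data == list(range(20)))       # True
```

With a random pivot no particular input ordering is bad; the quadratic behavior can still occur, but only through unlucky random choices.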

QUICKSORT: Performance - Rigorous Worst-Case Analysis
Since we do not, a priori, have any idea of what the splits of
the subarrays will be, we have to represent a possible
worst case (the bad-split example already shows that the
worst case is at least quadratic, so it could be worse,
although we hope not). The worst case leads to the recurrence
\[ T(n) = \max_{0 \le q \le n-1} \bigl(T(q) + T(n-q-1)\bigr) + \Theta(n), \]
where we remember that the pivot does not appear at the
next level (down) of the recursion.
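The recurrence can be evaluated exactly for small n by dynamic programming. The sketch below assumes, for illustration only, that the $\Theta(n)$ term is exactly n-1 (the number of comparisons PARTITION makes on n elements) and that T(0) = T(1) = 0; `worst_case_cost` is a hypothetical name.

```python
def worst_case_cost(n_max):
    """Evaluate T(n) = max over 0 <= q <= n-1 of (T(q) + T(n-q-1)) + (n-1).

    Returns (T, best_q), where best_q[n] is the first q that attains the maximum.
    """
    T = [0] * (n_max + 1)                 # T[0] = T[1] = 0 by assumption
    best_q = [None] * (n_max + 1)
    for n in range(2, n_max + 1):
        # Try every possible position q of the pivot in a subarray of size n.
        q = max(range(n), key=lambda s: T[s] + T[n - s - 1])
        best_q[n] = q
        T[n] = T[q] + T[n - q - 1] + (n - 1)
    return T, best_q


T, best_q = worst_case_cost(10)
print(T[10], best_q[10])                  # 45 0
```

Under these assumptions the maximum is always attained at an extreme split (q = 0, tied with q = n-1), and T(n) comes out as n(n-1)/2, which agrees with the quadratic bound.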

We have to come up with a guess, and the basis for the
guess is our likely bad-split case: it tells us we cannot
hope for any better than $\Theta(n^2)$. So we just hope it is no
worse. Guess $T(n) \le cn^2$ for some $c > 0$ and start doing
algebra for the induction:
\[
T(n) \le \max_{0 \le q \le n-1} \bigl(T(q) + T(n-q-1)\bigr) + \Theta(n)
     \le \max_{0 \le q \le n-1} \bigl(cq^2 + c(n-q-1)^2\bigr) + \Theta(n).
\]
Differentiate $cq^2 + c(n-q-1)^2$ twice with respect to $q$ to
obtain $4c > 0$ for all values of $q$.

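Since the expression represents a quadratic curve,
concave up, it reaches its maximum at one of the
endpoints $q = 0$ and $q = n-1$. As we evaluate, we find
\[
\begin{aligned}
\max_{0 \le q \le n-1} \bigl(cq^2 + c(n-q-1)^2\bigr) + \Theta(n)
  &\le c \cdot \max_{0 \le q \le n-1} \bigl(q^2 + (n-q-1)^2\bigr) + \Theta(n) \\
  &\le c(n-1)^2 + \Theta(n) \\
  &= cn^2 - c(2n-1) + \Theta(n) \\
  &\le cn^2,
\end{aligned}
\]
by choosing $c$ large enough that the $c(2n-1)$ term dominates
the $\Theta(n)$ term.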

QUICKSORT: Performance - Expected Run Time
Understanding partitioning.
1. Each time PARTITION is called, it selects a pivot element, and
this pivot element is never included in successive calls: the
total number of calls to PARTITION is at most n.
2. Each call to PARTITION costs O(1) plus an amount of time
proportional to the number of iterations of the for loop.
3. Each iteration of the for loop performs a comparison in
line 4, comparing the pivot to another element in A.
4. We need to count the number of times line 4 is executed.

Lemma 7.1. Let X be the number of comparisons
performed in line 4 of PARTITION over the entire execution
of QUICKSORT on an n-element array. Then the running
time of QUICKSORT is O(n+X).
Proof: follows from the four observations above.
We need to find X, the total number of comparisons
performed over all calls to PARTITION.

1. Rename the elements of A as $z_1, z_2, \ldots, z_n$, so that $z_i$ is the
ith smallest element of A.
2. Define the set $Z_{ij} = \{z_i, z_{i+1}, \ldots, z_j\}$.
3. Question: when does the algorithm compare $z_i$ and $z_j$?
4. Answer: at most once. Notice that all elements in every
(sub)array are compared to the pivot once, and will never
be compared to the pivot again (since the pivot is
removed from the recursion).
5. Define $X_{ij} = I\{z_i \text{ is compared to } z_j\}$, the indicator variable of
this event. Comparisons are over the full run of the
algorithm.

6. Since each pair is compared at most once, we can write
\[ X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}. \]
7. Taking expectations of both sides:
\[
E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right]
     = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]
     = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr\{z_i \text{ is compared to } z_j\}.
\]
8. We need to compute $\Pr\{z_i \text{ is compared to } z_j\}$.
9. We will assume all $z_i$ and $z_j$ are distinct.
10. For any pair $z_i$, $z_j$, once a pivot $x$ is chosen so that
$z_i < x < z_j$, $z_i$ and $z_j$ will never be compared again (why?).

11. If $z_i$ is chosen as a pivot before any other item in $Z_{ij}$, then
$z_i$ will be compared to every other item in $Z_{ij}$.
12. Same for $z_j$.
13. $z_i$ and $z_j$ are compared if and only if the first element to
be chosen as a pivot from $Z_{ij}$ is either $z_i$ or $z_j$.
14. What is that probability? Until an element of $Z_{ij}$ is chosen as
a pivot, the whole of $Z_{ij}$ is in the same partition, so every
element of $Z_{ij}$ is equally likely to be the first one chosen
as a pivot.

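15. Because $Z_{ij}$ has $j-i+1$ elements, and because pivots
are chosen randomly and independently, the probability
that any given element is the first one chosen as a pivot
is $1/(j-i+1)$. It follows that:
16. Since the two events are mutually exclusive,
\[
\begin{aligned}
\Pr\{z_i \text{ is compared to } z_j\}
  &= \Pr\{z_i \text{ or } z_j \text{ is first pivot chosen from } Z_{ij}\} \\
  &= \Pr\{z_i \text{ is first pivot chosen from } Z_{ij}\}
   + \Pr\{z_j \text{ is first pivot chosen from } Z_{ij}\} \\
  &= \frac{1}{j-i+1} + \frac{1}{j-i+1} = \frac{2}{j-i+1}.
\end{aligned}
\]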

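17. Replacing the right-hand side in 7, substituting $k = j-i$, and
grinding through some algebra:
\[
\begin{aligned}
E[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}
      = \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1} \\
     &< \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k}
      = \sum_{i=1}^{n-1} 2H_n
      = \sum_{i=1}^{n-1} O(\lg n)
      = O(n \lg n).
\end{aligned}
\]

And the result follows.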
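The probability in step 16 and the bound in step 17 can be checked exactly for small n. The sketch below models the process directly, under the same assumptions as the analysis: distinct elements, identified by their ranks, with every element of the current subarray equally likely to be the pivot. The function names are hypothetical, and exact arithmetic uses `fractions.Fraction`.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def prob_compared(lo, hi, i, j):
    """Exact Pr{z_i is compared to z_j} when the current subarray holds ranks lo..hi
    and each rank is equally likely to be the pivot (requires lo <= i < j <= hi)."""
    total = Fraction(0)
    for k in range(lo, hi + 1):                       # k = rank of the chosen pivot
        if k == i or k == j:
            total += 1                                # pivot is z_i or z_j: compared
        elif k < i:
            total += prob_compared(k + 1, hi, i, j)   # both end up right of the pivot
        elif k > j:
            total += prob_compared(lo, k - 1, i, j)   # both end up left of the pivot
        # i < k < j: the pivot separates z_i and z_j, contributing 0
    return total / (hi - lo + 1)


def expected_comparisons(n):
    """E[X] as the double sum over pairs i < j of Pr{z_i is compared to z_j}."""
    return sum(prob_compared(1, n, i, j)
               for i in range(1, n) for j in range(i + 1, n + 1))


def harmonic(n):
    """The n-th harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


n = 10
print(prob_compared(1, n, 3, 7))                      # 2/5, i.e. 2/(j-i+1)
print(float(expected_comparisons(n)), float(2 * (n - 1) * harmonic(n)))
```

The first line printed matches 2/(j-i+1) for i = 3, j = 7; the second shows the exact E[X] next to the upper bound of 2H_n summed over the n-1 values of i.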
