2 Dictionaries

An interesting alternative to balanced binary search trees for efficiently realizing the ordered dictionary abstract data type (ADT) is the skip list [3-7]. This structure makes random choices in arranging items in such a way that searches and updates take O(log n) time on average, where n is the number of items in the dictionary. Interestingly, the notion of average time used here does not depend on any probability distribution defined on the keys in the input. Instead, the running time is averaged over all possible outcomes of the random choices used when inserting items in the dictionary.

"heads." Thus, we expect S1 to have about n/2 items, S2 to have about n/4 items, and, in general, Si to have about n/2^i items. In other words, we expect the height h of S to be about log n.

Using the position abstraction used previously by the authors [2] for nodes in sequences and trees, we view a skip list as a two-dimensional collection of positions arranged horizontally into levels and vertically into towers. Each level corresponds to a sequence Si and each tower contains positions storing the same item across consecutive sequences. The positions in a skip list can be traversed using the following operations:
Algorithm SkipSearch(

Code Fragment 2: Insertion in a skip list, assuming random() returns a random number between 0 and 1, and we never insert past the top level.

More generally, given a constant c > 1, h is larger than c log n with probability at most 1/n^(c-1). Thus, with high probability, the height h of S is O(log n).
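The SkipSearch pseudocode and Code Fragment 2 do not survive intact in this extract, so the following is a hedged Python sketch of the same ideas: towers of forward links, a scan-forward/drop-down search, and coin-flip insertion capped at a top level. The class and method names are ours, not the authors'.

```python
import random

class _Node:
    """One tower of the skip list: a key plus its forward links."""
    __slots__ = ("key", "next")

    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] = successor at level i

class SkipList:
    MAX_HEIGHT = 32  # "we never insert past the top level"

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_HEIGHT)  # sentinel before all keys
        self._height = 1

    def _random_height(self):
        # Flip a fair coin; keep growing the tower while it comes up heads.
        h = 1
        while h < self.MAX_HEIGHT and self._rng.random() < 0.5:
            h += 1
        return h

    def search(self, k):
        """Two nested loops, as in the text: the inner loop scans forward
        while the next key is no greater than k; the outer loop drops
        down a level. Returns True iff k is in the skip list."""
        node = self._head
        for i in range(self._height - 1, -1, -1):  # drop-down steps
            while node.next[i] is not None and node.next[i].key <= k:
                node = node.next[i]                # scan-forward steps
        return node.key == k

    def insert(self, k):
        """Find the predecessor of k on every level, then splice in a new
        tower whose height is chosen by coin flips."""
        update = [self._head] * self.MAX_HEIGHT
        node = self._head
        for i in range(self._height - 1, -1, -1):
            while node.next[i] is not None and node.next[i].key < k:
                node = node.next[i]
            update[i] = node
        h = self._random_height()
        self._height = max(self._height, h)
        new = _Node(k, h)
        for i in range(h):
            new.next[i] = update[i].next[i]
            update[i].next[i] = new
```

Note that the answer returned by search does not depend on the coin flips; only the running time does, which is exactly the point of the average-case analysis below.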
Consider the running time of a search in skip list S, and recall that such a search involves two nested while loops. The inner loop performs a scan forward on a level of S as long as the next key is no greater than the search key k, and the outer loop drops down to the next level and repeats the scan-forward iteration. Since the height h of S is O(log n) with high probability, the number of drop-down steps is O(log n) with high probability.

So we have yet to bound the number of scan-forward steps we make. Let n_i be the number of keys examined while scanning forward at level i. Observe that, after the key at the starting position, each additional key examined in a scan-forward at level i cannot also belong to level i+1. If any of these items were on the previous level, we would have encountered them in the previous scan-forward step. Thus, the probability that any key is counted in n_i is 1/2. Therefore, the expected value of n_i is exactly equal to the expected number of times we must flip a fair coin before it comes up heads. This expected value is 2. Hence, the expected amount of time spent scanning forward at any level i is O(1). Since S has O(log n) levels with high probability, a search in S takes expected time O(log n). By a similar analysis, we can show that the expected running time of an insertion or a removal is O(log n).

Finally, let us turn to the space requirement of a skip list S. As we observed above, the expected number of items at level i is n/2^i, which means that the expected total number of items in S is

  sum_{i=0}^{h} n/2^i  =  n sum_{i=0}^{h} 1/2^i  <  2n.

Hence, the expected space requirement of S is O(n).

3 Sorting

One of the most popular sorting algorithms is the quick-sort algorithm, which uses a pivot element to split a sequence and then recursively sorts the subsequences. One common method for analyzing quick-sort is to assume that the pivot always divides the sequence almost equally. We feel such an assumption would presuppose knowledge about the input distribution that is typically not available, however. Since the intuitive goal of the partition step of the quick-sort method is to divide the sequence S almost equally, let us introduce randomization into the algorithm and pick as the pivot a random element of the input sequence. This variation of quick-sort is called randomized quick-sort, and is provided in Code Fragment 3.

Algorithm quickSort(S):
  Input: Sequence S of n comparable elements
  Output: A sorted copy of S

  if n = 1 then
    return S.
  end if
  pick a random integer r in the range [0, n-1]
  let x be the element of S at rank r.
  put the elements of S into three sequences:
    • L, storing the elements in S less than x
    • E, storing the elements in S equal to x
    • G, storing the elements in S greater than x.
  let L' ← quickSort(L)
  let G' ← quickSort(G)
  return L' + E + G'.

Code Fragment 3: Randomized quick-sort algorithm.

There are several analyses showing that the expected running time of randomized quick-sort is O(n log n) (e.g., see [1,4,8]), independent of any input distribution assumptions. The analysis we give here simplifies these analyses considerably.

Our analysis uses a simple fact from elementary probability theory: namely, that the expected number of times that a fair coin must be flipped until it shows "heads" k times is 2k. Consider now a single recursive invocation of randomized quick-sort, and let m denote the size of the input sequence for this invocation. Say that this invocation is "good" if the pivot chosen is such that subsequences L and G have size at least m/4 and at most 3m/4 each. Thus, since the pivot is chosen uniformly at random and there are m/2 pivots for which this invocation is good, the probability that an invocation is good is 1/2.

Consider now the recursion tree T associated with an instance of the quick-sort algorithm. If a node v of T of size m is associated with a "good" recursive call, then the input sizes of the children of v are each at most 3m/4 (which is the same as m/(4/3)). If we take any path in T from the root to an external node, then the length of this path is at most the number of invocations that have to be made (at each node on this path) until achieving log_{4/3} n good invocations. Applying the probabilistic fact reviewed above, the expected number of invocations we must make until this occurs is at most 2 log_{4/3} n. Thus, the expected length of any path from the root to an external node in T is O(log n). Observing that the time spent at each level of T is O(n), the expected running time of randomized quick-sort is O(n log n).

4 Selection

The selection problem asks that we return the kth smallest element in an unordered sequence S. Again using randomization, we can design a simple algorithm for this problem. We describe in Code Fragment 4 a simple and practical method, called randomized quick-select, for solving this problem.
Algorithm quickSelect(S, k):
  Input: Sequence S of n comparable elements, and an integer k ∈ [1, n]
  Output: The kth smallest element of S

  if n = 1 then
    return the (first) element of S.
  end if
  pick a random integer r in the range [0, n-1]
  let x be the element of S at rank r.
  put the elements of S into three sequences:
    • L, storing the elements in S less than x
    • E, storing the elements in S equal to x
    • G, storing the elements in S greater than x.
  if k ≤ |L| then
    return quickSelect(L, k)
  else if k ≤ |L| + |E| then
    return x {each element in E is equal to x}
  else
    return quickSelect(G, k - |L| - |E|)
  end if

Code Fragment 4: Randomized quick-select algorithm.

We note that randomized quick-select runs in O(n^2) worst-case time. Nevertheless, it runs in O(n) expected time, and is much simpler than the well-known deterministic selection algorithm that runs in O(n) worst-case time (e.g., see [1]). As was the case with our quick-sort analysis, our analysis of randomized quick-select is simpler than existing analyses, such as that in [1].

Let t(n) denote the running time of randomized quick-select on a sequence of size n. Since the randomized quick-select algorithm depends on the outcome of random events, its running time, t(n), is a random variable. We are interested in bounding E(t(n)), the expected value of t(n). Say that a recursive invocation of randomized quick-select is "good" if it partitions S so that the size of L and G is at most 3n/4. Clearly, a recursive call is good with probability 1/2. Let g(n) denote the number of consecutive recursive invocations (including the present one) before getting a good invocation. Then

  t(n) ≤ bn · g(n) + t(3n/4),

where b ≥ 1 is a constant (to account for the overhead of each call). We are, of course, focusing on the case where n is larger than 1, for we can easily characterize in a closed form that t(1) = b. Applying the linearity of expectation property to the general case, then, we get E(t(n)) ≤ bn · E(g(n)) + E(t(3n/4)). Since a recursive call is good with probability 1/2, independent of its parent call being good, the expected value of g(n) is the same as the expected number of times we must flip a fair coin before it comes up "heads." This implies that E(g(n)) = 2. Thus, if we let T(n) be a shorthand notation for E(t(n)) (the expected running time of the randomized quick-select algorithm), then we can write the case for n > 1 as T(n) ≤ T(3n/4) + 2bn. Converting this recurrence relation to a closed form, we get that

  T(n) ≤ 2bn · sum_{i=0}^{⌈log_{4/3} n⌉} (3/4)^i  <  8bn.

Thus, the expected running time of quick-select is O(n).

5 Conclusion

We have discussed the use of randomization in teaching several key concepts on data structures and algorithms. In particular, we have presented simplified analyses for skip lists and randomized quick-sort, suitable for a CS2 course, and for randomized quick-select, suitable for a DS&A course.

These simplified analyses, as well as some additional ones, can also be found in the recent book on data structures and algorithms by the authors [2]. The reader interested in further study of randomization in data structures and algorithms is also encouraged to examine the excellent book on Randomized Algorithms by Motwani and Raghavan [4] or the interesting book chapter by Seidel on "backwards analysis" of randomized algorithms [8].

References

[1] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.
[2] M. T. Goodrich and R. Tamassia. Data Structures and Algorithms in Java. John Wiley and Sons, New York, 1998.
[3] P. Kirschenhofer and H. Prodinger. The path length of random skip lists. Acta Informatica, 31:775-792, 1994.
[4] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York, NY, 1995.
[5] T. Papadakis, J. I. Munro, and P. V. Poblete. Average search and update costs in skip lists. BIT, 32:316-332, 1992.
[6] P. V. Poblete, J. I. Munro, and T. Papadakis. The binomial transform and its application to the analysis of skip lists. In Proceedings of the European Symposium on Algorithms (ESA), pages 554-569, 1995.
[7] W. Pugh. Skip lists: a probabilistic alternative to balanced trees. Commun. ACM, 33(6):668-676, 1990.