
Bucket Sort and Radix Sort

Time complexity of Sorting

Several sorting algorithms have been discussed, and the best ones so far are:
  Heap sort and Merge sort: O( n log n )
  Quick sort (best one in practice): O( n log n ) on average, O( n^2 ) worst case
Can we do better than O( n log n )?
  No.
  It can be proven that any comparison-based sorting algorithm needs to carry out Ω( n log n ) operations in the worst case

10/02/05
BucketSort
Slide 2

Restrictions on the problem

Suppose the values in the list to be sorted can repeat but the values have a limit (e.g., values are digits from 0 to 9)
Sorting, in this case, appears easier
Is it possible to come up with an algorithm better than O( n log n )?
  Yes
  Strategy will not involve comparisons
Slide 3

Decision-tree example
Sort $\langle a_1, a_2, \ldots, a_n \rangle$ (tree shown for n = 3)

                1:2
           /          \
        2:3            1:3
       /    \         /    \
     123    1:3     213    2:3
           /   \          /   \
         132   312      231   321

Each internal node is labeled i:j for $i, j \in \{1, 2, \ldots, n\}$.
The left subtree shows subsequent comparisons if $a_i \le a_j$.
The right subtree shows subsequent comparisons if $a_i > a_j$.

Slide 4

Decision-tree example
Sort $\langle a_1, a_2, a_3 \rangle = \langle 9, 4, 6 \rangle$
[Figure: the same decision tree. At the root 1:2, 9 > 4, so the right branch is taken.]
Slide 5

Decision-tree example
Sort $\langle a_1, a_2, a_3 \rangle = \langle 9, 4, 6 \rangle$
[Figure: the same decision tree. At node 1:3, 9 > 6, so the right branch is taken.]
Slide 6

Decision-tree example
Sort $\langle a_1, a_2, a_3 \rangle = \langle 9, 4, 6 \rangle$
[Figure: the same decision tree. At node 2:3, 4 ≤ 6, so the left branch is taken.]
Slide 7

Decision-tree example
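Sort $\langle a_1, a_2, a_3 \rangle = \langle 9, 4, 6 \rangle$
[Figure: the same decision tree. The path 1:2, 1:3, 2:3 ends at leaf 231, which gives 4 ≤ 6 ≤ 9.]
Each leaf contains a permutation $\langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$ to indicate that the ordering $a_{\pi(1)} \le a_{\pi(2)} \le \cdots \le a_{\pi(n)}$ has been established.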

Slide 8
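
The three-element decision tree can also be read as nested comparisons. The sketch below is only an illustration: the class name `DecisionTree3` and the method `leaf` are assumptions, not part of the slides. It returns the label of the leaf reached, taking the right branch when the first element is strictly greater.

```java
// Illustrative sketch; names are assumptions, not taken from the slides.
class DecisionTree3 {
    // Walks the decision tree for n = 3 and returns the leaf reached,
    // written as the 1-based indices of the elements in sorted order.
    static String leaf(int a1, int a2, int a3) {
        if (a1 <= a2) {                         // node 1:2, left branch
            if (a2 <= a3) return "123";         // node 2:3
            return (a1 <= a3) ? "132" : "312";  // node 1:3
        } else {                                // node 1:2, right branch
            if (a1 <= a3) return "213";         // node 1:3
            return (a2 <= a3) ? "231" : "321";  // node 2:3
        }
    }

    public static void main(String[] args) {
        System.out.println(leaf(9, 4, 6));      // prints 231, i.e. a2 <= a3 <= a1
    }
}
```

Every input follows exactly one root-to-leaf path, and no path uses more than three comparisons, which is the height of the tree.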

Decision-tree model
A decision tree can model the execution of
any comparison sort:
One tree for each input size n.
View the algorithm as splitting whenever
it compares two elements.
The tree contains the comparisons along
all possible instruction traces.
The running time of the algorithm = the length of the path taken.
Worst-case running time = height of tree.

Slide 9

Any comparison sort


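Can be turned into a Decision tree

class InsertionSortAlgorithm {
    void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int B = a[i];                        // element being inserted
            int j = i;
            while ((j > 0) && (a[j-1] > B)) {    // the only comparison between elements
                a[j] = a[j-1];                   // shift larger element right
                j--;
            }
            a[j] = B;
        }
    }
}

[Figure: the decision tree for n = 3, with internal nodes 1:2, 2:3, 1:3 and leaves 123, 213, 132, 312, 231, 321]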
Slide 10

Lower bound for decision-tree sorting

Theorem. Any decision tree that can sort n elements must have height $\Omega(n \lg n)$.
Proof. The tree must contain at least $n!$ leaves, since there are $n!$ possible permutations. A height-$h$ binary tree has at most $2^h$ leaves. Thus, $n! \le 2^h$.

$$
\begin{aligned}
h &\ge \lg(n!)                  && \text{(lg is monotonically increasing)} \\
  &\ge \lg\big((n/e)^n\big)     && \text{(Stirling's formula)} \\
  &= n \lg n - n \lg e \\
  &= \Omega(n \lg n).
\end{aligned}
$$

Slide 11
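
To make the bound concrete, the sketch below computes the smallest h with 2^h ≥ n!, which is the minimum height of any decision tree that sorts n elements. The class and method names are illustrative assumptions. For n = 3, lg(3!) = lg 6 ≈ 2.58, so at least 3 comparisons are needed in the worst case, matching the height of the example tree.

```java
import java.math.BigInteger;

// Illustrative sketch; names are assumptions, not taken from the slides.
class ComparisonLowerBound {
    // Returns ceil(lg(n!)): the smallest h such that 2^h >= n!.
    static int minHeight(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int k = 2; k <= n; k++) {
            factorial = factorial.multiply(BigInteger.valueOf(k));
        }
        // For f >= 1, the bit length of (f - 1) is exactly ceil(lg f).
        return factorial.subtract(BigInteger.ONE).bitLength();
    }

    public static void main(String[] args) {
        System.out.println(minHeight(3));   // prints 3
        System.out.println(minHeight(10));  // prints 22
    }
}
```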

Bucket sort

Idea: suppose the values are in the range 0..m-1; start with m empty buckets numbered 0 to m-1, scan the list and place element s[i] in bucket s[i], and then output the buckets in order
Will need an array of buckets, and the values in the list to be sorted will be the indexes to the buckets
  No comparisons will be necessary
Slide 12

Example
Input:    4 2 1 2 0 3 2 1 4 0 2 3 0

Bucket 0: 0 0 0
Bucket 1: 1 1
Bucket 2: 2 2 2 2
Bucket 3: 3 3
Bucket 4: 4 4

Output:   0 0 0 1 1 2 2 2 2 3 3 4 4
Slide 13

Bucket sort algorithm


Algorithm BucketSort( S )
( values in S are between 0 and m-1 )
for j ← 0 to m-1 do              // initialize m buckets
    b[j] ← 0
for i ← 0 to n-1 do              // place elements in their
    b[S[i]] ← b[S[i]] + 1        // appropriate buckets
i ← 0
for j ← 0 to m-1 do              // place elements in buckets
    for r ← 1 to b[j] do         // back in S
        S[i] ← j
        i ← i + 1
Slide 14
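
The pseudocode above translates almost line for line into Java. This is a minimal sketch, assuming plain `int` values in the range 0..m-1; the class name `CountingBucketSort` is an illustrative assumption.

```java
import java.util.Arrays;

// Illustrative sketch; the class name is an assumption.
class CountingBucketSort {
    // Sorts s in place; assumes every value is in the range 0..m-1.
    static void sort(int[] s, int m) {
        int[] b = new int[m];                 // m buckets (counters), all zero
        for (int value : s) {
            b[value]++;                       // count each value in its bucket
        }
        int i = 0;
        for (int j = 0; j < m; j++) {         // write the buckets back into s
            for (int r = 1; r <= b[j]; r++) {
                s[i++] = j;
            }
        }
    }

    public static void main(String[] args) {
        int[] s = {4, 2, 1, 2, 0, 3, 2, 1, 4, 0, 2, 3, 0};
        sort(s, 5);
        System.out.println(Arrays.toString(s));
        // prints [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4]
    }
}
```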

Values versus entries

If we were sorting values, each bucket is just a counter that we increment whenever a value matching the bucket's number is encountered
If we were sorting entries according to keys, then each bucket is a queue
  Entries are enqueued into a matching bucket
  Entries will be dequeued back into the array after the scan
Slide 15

Bucket sort algorithm


Algorithm BucketSort( S )
( S is an array of entries whose keys are between 0..m-1 )
for j ← 0 to m-1 do                        // initialize m buckets
    initialize queue b[j]
for i ← 0 to n-1 do                        // place in buckets
    b[S[i].getKey()].enqueue( S[i] )
i ← 0
for j ← 0 to m-1 do                        // place elements in
    while not b[j].isEmpty() do            // buckets back in S
        S[i] ← b[j].dequeue()
        i ← i + 1
Slide 16
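
A possible Java rendering of the queue-based version is sketched below. The `Entry` class, its fields, and the class name `EntryBucketSort` are assumptions made for illustration; only `getKey()` comes from the pseudocode. `java.util.ArrayDeque` stands in for the queue, with `add` as enqueue and `remove` as dequeue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustrative sketch; Entry and the class name are assumptions.
class EntryBucketSort {
    static final class Entry {
        final int key;
        final String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
        int getKey() { return key; }
        @Override public String toString() { return key + ":" + value; }
    }

    // Sorts s in place by key; assumes every key is in the range 0..m-1.
    static void sort(Entry[] s, int m) {
        List<Queue<Entry>> b = new ArrayList<>();
        for (int j = 0; j < m; j++) {
            b.add(new ArrayDeque<>());             // initialize m buckets
        }
        for (Entry e : s) {
            b.get(e.getKey()).add(e);              // enqueue into matching bucket
        }
        int i = 0;
        for (int j = 0; j < m; j++) {
            Queue<Entry> q = b.get(j);
            while (!q.isEmpty()) {
                s[i++] = q.remove();               // dequeue back into the array
            }
        }
    }

    public static void main(String[] args) {
        Entry[] s = { new Entry(2, "a"), new Entry(0, "b"),
                      new Entry(2, "c"), new Entry(1, "d") };
        sort(s, 3);
        System.out.println(Arrays.toString(s));    // prints [0:b, 1:d, 2:a, 2:c]
    }
}
```

Because each bucket is first-in-first-out, the two entries with key 2 keep their original relative order.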

Time complexity

Bucket initialization: O( m )
From array to buckets: O( n )
From buckets to array: O( n )
  Even though this stage is a nested loop, notice that all we do is dequeue from each bucket until they are all empty, so there are n dequeue operations in all
Since m will likely be small compared to n, Bucket sort is O( n )
  Strictly speaking, time complexity is O( n + m )
Slide 17

Sorting integers

Can we perform bucket sort on any array of (non-negative) integers?
  Yes, but note that the number of buckets will depend on the maximum integer value
If you are sorting 1000 integers and the maximum value is 999999, you will need 1 million buckets!
  Time complexity is not really O( n ) because m is much greater than n. Actual time complexity is O( m )
Can we do better?
Slide 18

Radix sort

Idea: repeatedly sort by digit: perform multiple bucket sorts on S starting with the rightmost digit
If maximum value is 999999, only ten buckets (not 1 million) will be necessary
Use this strategy when the keys are integers, and there is a reasonable limit on their values
  Number of passes (bucket sort stages) will depend on the number of digits in the maximum value
Slide 19

Example: first pass


Input:  12 58 37 64 52 36 99 63 18 9 20 88 47

Buckets (by rightmost digit):
0: 20
1: (empty)
2: 12 52
3: 63
4: 64
5: (empty)
6: 36
7: 37 47
8: 58 18 88
9: 99 9

Output: 20 12 52 63 64 36 37 47 58 18 88 99 9
Slide 20

Example: second pass


Input:  20 12 52 63 64 36 37 47 58 18 88 99 9

Buckets (by leftmost digit; 9 is treated as 09):
0: 9
1: 12 18
2: 20
3: 36 37
4: 47
5: 52 58
6: 63 64
7: (empty)
8: 88
9: 99

Output: 9 12 18 20 36 37 47 52 58 63 64 88 99
Slide 21

Example: 1st and 2nd passes


12 58 37 64 52 36 99 63 18 9 20 88 47
  sort by rightmost digit
20 12 52 63 64 36 37 47 58 18 88 99 9
  sort by leftmost digit
9 12 18 20 36 37 47 52 58 63 64 88 99
Slide 22
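
The two passes above can be generalized to any number of decimal digits. The following is a minimal sketch for non-negative `int` keys, not a definitive implementation; the class name `RadixSortInts` and its methods are illustrative assumptions. Each pass is a queue-based bucket sort on one digit, and the same ten buckets are reused in every pass.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustrative sketch; names are assumptions. Assumes non-negative keys.
class RadixSortInts {
    // One stable bucket-sort pass on the decimal digit selected by divisor
    // (1 = rightmost digit, 10 = next digit, and so on).
    static void pass(int[] s, int divisor, List<Queue<Integer>> buckets) {
        for (int value : s) {
            buckets.get((value / divisor) % 10).add(value);
        }
        int i = 0;
        for (Queue<Integer> q : buckets) {
            while (!q.isEmpty()) {
                s[i++] = q.remove();
            }
        }
    }

    static void sort(int[] s) {
        int max = 0;
        for (int v : s) max = Math.max(max, v);
        List<Queue<Integer>> buckets = new ArrayList<>();
        for (int d = 0; d < 10; d++) buckets.add(new ArrayDeque<>());
        // One pass per digit of the maximum value; long avoids overflow.
        for (long divisor = 1; max / divisor > 0; divisor *= 10) {
            pass(s, (int) divisor, buckets);
        }
    }

    public static void main(String[] args) {
        int[] s = {12, 58, 37, 64, 52, 36, 99, 63, 18, 9, 20, 88, 47};
        sort(s);
        System.out.println(Arrays.toString(s));
        // prints [9, 12, 18, 20, 36, 37, 47, 52, 58, 63, 64, 88, 99]
    }
}
```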

Radix sort and stability

Radix sort works as long as the bucket sort stages are stable sorts
Stable sort: in case of ties, the relative order of elements is preserved in the resulting array
  Suppose there are two elements whose first digit is the same; for example, 52 & 58
  If 52 occurs before 58 in the array prior to the sorting stage, 52 should occur before 58 in the resulting array
This way, the work carried out in the previous bucket sort stages is preserved
Slide 23
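
A small experiment shows why stability matters. In the sketch below (class and method names are illustrative assumptions, and keys are assumed to have at most two digits), the `stable` flag chooses whether each bucket is emptied first-in-first-out or last-in-first-out. The last-in-first-out variant reverses ties, so the second pass undoes the order that the first pass established between 52 and 58.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustrative sketch; names are assumptions. Assumes keys in 0..99.
class StabilityDemo {
    // Two-pass radix sort. When stable is false, each bucket is emptied
    // last-in-first-out, which reverses the relative order of ties.
    static int[] radixSort(int[] input, boolean stable) {
        int[] s = input.clone();
        List<Deque<Integer>> buckets = new ArrayList<>();
        for (int d = 0; d < 10; d++) buckets.add(new ArrayDeque<>());
        for (int divisor = 1; divisor <= 10; divisor *= 10) {
            for (int v : s) {
                buckets.get((v / divisor) % 10).addLast(v);
            }
            int i = 0;
            for (Deque<Integer> q : buckets) {
                while (!q.isEmpty()) {
                    s[i++] = stable ? q.pollFirst() : q.pollLast();
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int[] input = {58, 52};
        System.out.println(Arrays.toString(radixSort(input, true)));   // prints [52, 58]
        System.out.println(Arrays.toString(radixSort(input, false)));  // prints [58, 52]
    }
}
```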

Time complexity

If there is a fixed number p of bucket sort stages (six stages in the case where the maximum value is 999999), then radix sort is O( n )
  There are p bucket sort stages, each taking O( n ) time
Strictly speaking, time complexity is O( pn ), where p is the number of digits (note that p = ⌊log10 m⌋ + 1, which is roughly log10 m, where m is the maximum value in the list)
Slide 24
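
The number of stages p can be computed directly from the maximum value by counting its decimal digits. A small sketch, with the class name `RadixPasses` and method `passes` as illustrative assumptions:

```java
// Illustrative sketch; names are assumptions.
class RadixPasses {
    // Number of decimal digits of a non-negative maximum value,
    // i.e. the number of bucket sort stages radix sort needs.
    static int passes(int max) {
        int p = 1;
        while (max >= 10) {
            max /= 10;
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println(passes(999999));  // prints 6
        System.out.println(passes(99));      // prints 2
    }
}
```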

About Radix sort

Note that only 10 buckets are needed regardless of number of stages since the buckets are reused at each stage
Radix sort can apply to words
  Set a limit to the number of letters in a word
  Use 27 buckets (or more, depending on the letters/characters allowed), one for each letter plus a blank character
  The word-length limit is exactly the number of bucket sort stages needed
Slide 25
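
The word variant might look like the sketch below. It assumes words made only of the lowercase letters 'a' to 'z', treats a missing letter in a short word as a blank that sorts before 'a', and uses illustrative names (`RadixSortWords`, `bucketOf`) that do not come from the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustrative sketch; names are assumptions. Assumes lowercase a-z words.
class RadixSortWords {
    // Bucket for position pos: 0 for blank (word too short), 1..26 for 'a'..'z'.
    static int bucketOf(String w, int pos) {
        return pos < w.length() ? w.charAt(pos) - 'a' + 1 : 0;
    }

    // Sorts words of at most maxLen letters in place, one stage per position,
    // starting with the rightmost position.
    static void sort(String[] s, int maxLen) {
        List<Queue<String>> buckets = new ArrayList<>();
        for (int b = 0; b < 27; b++) buckets.add(new ArrayDeque<>());
        for (int pos = maxLen - 1; pos >= 0; pos--) {
            for (String w : s) {
                buckets.get(bucketOf(w, pos)).add(w);
            }
            int i = 0;
            for (Queue<String> q : buckets) {
                while (!q.isEmpty()) {
                    s[i++] = q.remove();
                }
            }
        }
    }

    public static void main(String[] args) {
        String[] words = {"cab", "ab", "abc", "b", "a"};
        sort(words, 3);
        System.out.println(Arrays.toString(words));  // prints [a, ab, abc, b, cab]
    }
}
```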

Summary

Bucket sort and Radix sort are O( n ) algorithms only because we have imposed restrictions on the input list to be sorted
Sorting, in general, can be done in O( n log n ) time
Slide 26