
Algorithm Design Techniques: Greedy Algorithms

Introduction
Algorithm Design Techniques
Design of algorithms: techniques commonly used to solve problems
Greedy, Divide and Conquer, Dynamic Programming, Randomized, Backtracking

For each technique: general approach, examples, and time and space complexity (where appropriate)

Greedy Algorithms
Choose the best option during each phase
Dijkstra, Prim, Kruskal

Making change
Choose the largest bill at each round. Does this always work?

Bad examples where greedy does not work?

Greedy Algorithms
Must have
Greedy-choice property: a globally optimal solution can be arrived at by making a locally optimal choice
Optimal substructure: an optimal solution to a problem contains optimal solutions to its subproblems

Making Change
Greedy choice property
The highest denomination coin ≤ n will reside in the solution: if not, it would be replaced by two or more smaller coins, which means more coins and a non-optimal solution
Is this also true for denominations 1, 7, 10? No: for n = 14, greedy picks 10 + 1 + 1 + 1 + 1 (5 coins), but 7 + 7 (2 coins) is optimal

Optimal substructure
Solution for (n − highest denomination coin) is optimal
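A minimal sketch of the greedy change-maker, together with the 1/7/10 counterexample (Python is my choice here, not the slides'):

```python
def greedy_change(n, denominations):
    """Greedy change-making: repeatedly take the largest coin that still fits."""
    coins = []
    for d in sorted(denominations, reverse=True):
        while n >= d:
            coins.append(d)
            n -= d
    return coins

# US-style denominations: greedy happens to be optimal.
print(greedy_change(63, [1, 5, 10, 25]))  # [25, 25, 10, 1, 1, 1]

# The 1, 7, 10 system: greedy fails for n = 14.
print(greedy_change(14, [1, 7, 10]))      # [10, 1, 1, 1, 1] -- 5 coins
# Optimal is 7 + 7 (2 coins): the greedy-choice property does not hold here.
```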

Scheduling
Given jobs j1, j2, j3, ..., jn with known running times t1, t2, t3, ..., tn, what is the best way to schedule the jobs to minimize the average completion time?

Job   j1   j2   j3   j4
Time  15    8    3   10

Scheduling
Schedule j1, j2, j3, j4: completion times 15, 23, 26, 36
Average completion time = (15+23+26+36)/4 = 25

Schedule j3, j2, j4, j1 (shortest first): completion times 3, 11, 21, 36
Average completion time = (3+11+21+36)/4 = 17.75
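The two schedules can be checked with a few lines of Python (a sketch; the helper name completion_times is mine, not the slides'):

```python
def completion_times(schedule):
    """Finish time of each job = prefix sums of the running times."""
    finish, elapsed = [], 0
    for t in schedule:
        elapsed += t
        finish.append(elapsed)
    return finish

times = {"j1": 15, "j2": 8, "j3": 3, "j4": 10}

fifo = completion_times([times[j] for j in ["j1", "j2", "j3", "j4"]])
sjf = completion_times(sorted(times.values()))  # shortest job first

print(fifo, sum(fifo) / len(fifo))  # [15, 23, 26, 36] 25.0
print(sjf, sum(sjf) / len(sjf))     # [3, 11, 21, 36] 17.75
```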

Scheduling
Greedy-choice property: if the shortest job (here j3, with time 3) does not go first, moving it to the front delays each of the y jobs that preceded it by only 3 time units, while j3 finishes earlier by the total running time of those y jobs, which is at least 3y; the average completion time cannot increase
Optimal substructure: if the shortest job is removed from an optimal solution, the remaining schedule for the n−1 jobs is optimal

Optimality Proof
Total cost of a schedule is

C = Σ_{k=1}^{N} (N − k + 1) t_{i_k}
  = t_{i_1} + (t_{i_1} + t_{i_2}) + (t_{i_1} + t_{i_2} + t_{i_3}) + ... + (t_{i_1} + t_{i_2} + ... + t_{i_N})

Rewriting,

C = (N + 1) Σ_{k=1}^{N} t_{i_k} − Σ_{k=1}^{N} k·t_{i_k}

The first term is independent of the ordering; as the second term increases, the total cost becomes smaller.

Scheduling
Suppose there is a job ordering with positions x > y but t_{i_x} < t_{i_y} (a shorter job scheduled after a longer one). Swapping the two jobs (smaller one first) increases the second term, decreasing the total cost.

Show: x·t_{i_x} + y·t_{i_y} < y·t_{i_x} + x·t_{i_y}

x·t_{i_x} + y·t_{i_y}
  = x·t_{i_x} + y·t_{i_x} + y·(t_{i_y} − t_{i_x})
  < x·t_{i_x} + y·t_{i_x} + x·(t_{i_y} − t_{i_x})     (since x > y and t_{i_y} − t_{i_x} > 0)
  = y·t_{i_x} + x·t_{i_x} + x·t_{i_y} − x·t_{i_x}
  = y·t_{i_x} + x·t_{i_y}

More Scheduling
Multiple processor case
Algorithm?

More Scheduling
Multiple processor case
Algorithm:
order jobs shortest first
schedule jobs round-robin among the processors
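A sketch of this two-step rule on a hypothetical six-job, two-processor instance (the helper names and the instance are mine, not the slides'):

```python
def multi_schedule(times, p):
    """Order jobs shortest first, then deal them round-robin to p processors."""
    queues = [[] for _ in range(p)]
    for i, t in enumerate(sorted(times)):
        queues[i % p].append(t)
    return queues

def avg_completion(queues):
    """Average completion time across all jobs on all processors."""
    finishes = []
    for q in queues:
        elapsed = 0
        for t in q:
            elapsed += t
            finishes.append(elapsed)
    return sum(finishes) / len(finishes)

jobs = [3, 5, 6, 10, 11, 14]
print(multi_schedule(jobs, 2))                  # [[3, 6, 11], [5, 10, 14]]
print(avg_completion(multi_schedule(jobs, 2)))  # 13.5
```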

Minimizing final completion time


When is this useful? How is this different? Problem is NP-Complete!

Huffman Codes
About 100 ASCII characters in common use; need ⌈log₂ 100⌉ = 7 bits to represent each character. A large file means lots of bits! We would like to reduce the number of bits.

Huffman Codes
Idea: encode frequently occurring characters using fewer bits
Need to make sure all characters are distinguishable:
if 01 = A and 0101 = B, then 010101 = ? It could be AAA, AB, or BA

No character code should be a prefix of another character code

Huffman Codes
Goal: find a full binary tree of minimum cost where characters are stored in the leaves Cost of tree: sum across all characters of the frequency of the character times its depth in the tree
frequently occurring characters should be highest in the tree

Huffman Codes
[Tree figure omitted: full binary tree with leaves a, e, i, s, t, sp, nl]

Character  Code    Frequency  Total Bits
a          001     10         30
e          01      15         30
i          10      12         24
s          00000    3         15
t          0001     4         16
space      11      13         26
newline    00001    1          5
Total                        146

Huffman's Algorithm
How do we produce a code?
Maintain a forest of trees
weight of a tree is the sum of the frequencies of the leaves start with C trees to represent each character
weight of each is frequency of that character

Until there is only 1 tree


choose the 2 trees with the smallest weights and merge them by creating a new root and making each tree a right or left subtree

Running time: O(C log C)
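A sketch of the forest-merging loop using Python's heapq; the nested-tuple tree encoding and the tie-breaking counter are implementation choices of mine. The C − 1 merges, each O(log C), give the O(C log C) bound:

```python
import heapq

def huffman(freqs):
    """Build a Huffman code from a {character: frequency} map."""
    # Forest of single-leaf trees; the counter breaks ties between equal weights.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                    # until there is only one tree
        w1, _, left = heapq.heappop(heap)   # the two smallest-weight trees
        w2, _, right = heapq.heappop(heap)
        count += 1
        heapq.heappush(heap, (w1 + w2, count, (left, right)))  # merged tree
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse left/right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: a character
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Frequencies from the slide's table.
freqs = {"a": 10, "e": 15, "i": 12, "s": 3, "t": 4, "sp": 13, "nl": 1}
codes = huffman(freqs)
total = sum(freqs[c] * len(codes[c]) for c in freqs)
print(total)  # 146
```

The particular codewords may differ from the table above (ties can be broken either way), but the total cost of any optimal prefix code for these frequencies is 146.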

Optimality Proof Idea


1. The tree must be full
   if it is not, move a leaf with no sibling up to its parent
2. Least frequent characters are the deepest nodes
   if not, a node can be swapped with an ancestor
3. Characters at the same depth can be swapped
4. As trees are merged, optimality holds

Optimality Proof Idea


Greedy-choice property: given x and y, the two characters with lowest frequency in alphabet C, there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit
Take an arbitrary optimal prefix code and modify its tree into one representing another optimal prefix code in which x and y are sibling leaves of maximum depth

Optimality Proof Idea


Optimal substructure: let C′ = C − {x, y} ∪ {z}, where f[z] = f[x] + f[y]. If T′ is an optimal tree for C′, replace the leaf z in T′ with an internal node having x and y as children. The result is an optimal prefix code tree for C.

Approximate Bin Packing


N items of sizes s1, s2, ..., sN, with 0 < si ≤ 1
Goal: pack the items into the fewest number of bins of size 1
NP-complete (related to the knapsack problem), but greedy algorithms produce solutions not too far from optimal
Examples?
Saving data to external media

Example Optimal Packing


Input: .2, .5, .4, .7, .1, .3, .8

[Figure: three bins, e.g. (.8, .2), (.7, .3), (.5, .4, .1) — the seven items fit exactly into 3 bins]

On-line vs Off-line
On-line
Process one item at a time
Cannot move an item once it is placed

Off-line
Look at all items before you place first item

On-line Algorithms
On-line algorithms cannot guarantee an optimal solution
Problem: cannot know when the input will end
Example: M small items followed by M large items, where one large and one small item fit together — the whole input fits into M bins
Because all the small items come first, an algorithm that reserves room for large items places each small item in its own bin
If the input then turns out to be only the M small items, it has used twice as many bins as necessary
There are inputs that force any on-line bin-packing algorithm to use at least 4/3 the optimal number of bins.

On-line Bin Packing Algorithms


Next fit First fit Best fit

On-line Bin Packing Algorithms


Next fit
Algorithm
if the item fits in the bin holding the last item, place it there; otherwise place it in a new bin

Result: (.2, .5) (.4) (.7, .1) (.3) (.8)
Running time? O(N): one pass, only the current bin is ever examined
Let M be the optimal number of bins required to pack a list I of items. Then next fit never uses more than 2M bins.
At most half of the space is wasted: the contents of adjacent bins must sum to more than 1 (Bj + Bj+1 > 1)
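A minimal next-fit sketch (variable names mine; the small epsilon guards against floating-point round-off):

```python
def next_fit(items):
    """Next fit: only the most recent bin stays open."""
    bins = [[]]
    remaining = 1.0
    for s in items:
        if s > remaining + 1e-9:  # does not fit in the current bin
            bins.append([])
            remaining = 1.0
        bins[-1].append(s)
        remaining -= s
    return bins

print(next_fit([0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8]))
# [[0.2, 0.5], [0.4], [0.7, 0.1], [0.3], [0.8]] -- 5 bins in one O(N) pass
```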

On-line Bin Packing Algorithms


First fit
Algorithm
Scan the open bins and place the item in the first bin large enough to hold it; if no bin is large enough, create a new bin

Result: (.2, .5, .1) (.4, .3) (.7) (.8)
Running time? O(N²) with a naive scan; can be improved to O(N log N) with a search tree over bin capacities
Let M be the optimal number of bins required to pack a list I of items. Then first fit never uses more than ⌈(17/10)M⌉ bins.
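A first-fit sketch in the same style (O(N²) as written; names are mine):

```python
def first_fit(items):
    """First fit: scan open bins left to right, use the first with room."""
    bins, space = [], []  # parallel lists: bin contents, remaining capacity
    for s in items:
        for i, free in enumerate(space):
            if s <= free + 1e-9:
                bins[i].append(s)
                space[i] -= s
                break
        else:  # no existing bin is large enough
            bins.append([s])
            space.append(1.0 - s)
    return bins

print(first_fit([0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8]))
# [[0.2, 0.5, 0.1], [0.4, 0.3], [0.7], [0.8]] -- 4 bins
```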

On-line Bin Packing Algorithms


Best fit
Algorithm
Scan the open bins and place the item in the bin with the tightest fit (the bin that will be fullest after the item is placed there); if no bin is large enough, create a new bin

Result: (.2, .5, .1) (.4) (.7, .3) (.8)
Running time? O(N²) with a naive scan; can be improved to O(N log N) with a search tree over bin capacities
Same worst-case performance bound as first fit.
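Best fit differs only in choosing the tightest bin rather than the first (a sketch, same conventions as above):

```python
def best_fit(items):
    """Best fit: place each item in the open bin with the least leftover
    capacity that still holds it."""
    bins, space = [], []  # parallel lists: bin contents, remaining capacity
    for s in items:
        best = None
        for i, free in enumerate(space):
            if s <= free + 1e-9 and (best is None or free < space[best]):
                best = i  # tightest bin found so far
        if best is None:  # no bin is large enough
            bins.append([s])
            space.append(1.0 - s)
        else:
            bins[best].append(s)
            space[best] -= s
    return bins

print(best_fit([0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8]))
# [[0.2, 0.5, 0.1], [0.4], [0.7, 0.3], [0.8]] -- 4 bins
```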

Off-line Bin Packing


Sort the items in decreasing order first, so the large, hard-to-place items are handled early
Then apply the first fit or best fit algorithm
First fit decreasing: (.8, .2) (.7, .3) (.5, .4, .1)

Let M be the optimal number of bins required to pack a list I of items. Then first fit decreasing never uses more than (11/9)M + 4 bins.
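First fit decreasing is just a sort in front of first fit (a self-contained sketch repeating a compact first-fit helper; names are mine):

```python
def first_fit(items):
    """First fit: place each item in the first open bin with room."""
    bins, space = [], []
    for s in items:
        for i, free in enumerate(space):
            if s <= free + 1e-9:
                bins[i].append(s)
                space[i] -= s
                break
        else:
            bins.append([s])
            space.append(1.0 - s)
    return bins

def first_fit_decreasing(items):
    """Sort decreasing so the large, hard-to-place items go in first."""
    return first_fit(sorted(items, reverse=True))

print(first_fit_decreasing([0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8]))
# [[0.8, 0.2], [0.7, 0.3], [0.5, 0.4, 0.1]] -- 3 bins, optimal for this input
```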
