
MH3400 Algorithms for the Real World

Andrew Lim

Course Administration

Instructor Information

Andrew Lim
Email : andrewlim@ntu.edu.sg
Office Phone : 6513 8652
Skype Contact: limandrew@hotmail.com, please identify
yourself when contacting me for the first time

Lecture
Every TUE from 1330-1530 hrs, SPMS LT2; please take
note of some changes

Tut/Lab
Every FRI from 0830-1030 hrs, COMPLAB1, from Wk2-Wk13
Some Tut/labs may be taught by my grad students.

Course Administration

Grading Criteria
2-hour Final Written Exam: 50%
Continual Assessment: 50%
Written and Lab Assignments; weights will be given for each
assignment.
Class participation coupon

Text Books and Materials


No required textbook. You can use the internet; it is a
great source. Below are some texts on algorithms:
Dasgupta, Papadimitriou and Vazirani, Algorithms, 2006
Mehlhorn and Sanders, Algorithms and Data Structures: The
Basic Toolbox, 2007
Lehman and Leighton, Mathematics for Computer Science

More public materials will be forthcoming when required

Schedule and Changes - Lectures


Jan 13: Lect 1
Jan 20: Postponed
Jan 27: Postponed
Feb 3: Lect 2
Feb 10: Lect 3
Feb 17: Lect 4
Feb 24: Lect 5
Feb 28: **Makeup, 1000-1200hrs Lect 6 and 1330-1530hrs Lect 7
Mar 10: Lect 8
Mar 17: Lect 9
Mar 24: Lect 10
Mar 31: Lect 11
Apr 7: Lect 12
Apr 14: Lect 13

Note that Mar 2-6 is term-break

Schedule Lab/Tutorial
Jan 16: No Lab
Jan 23: Lab 1
Jan 30: Lab 2
Feb 6: Lab 3
Feb 13: Lab 4
Feb 20: ** Lunar NY
Feb 27: Lab 5
Mar 13: Lab 6
Mar 20: Lab 7
Mar 27: Lab 8
Apr 3: ** Good Friday
Apr 10: Lab 9
Apr 17: Lab 10

Note that Mar 2-6 is term-break

Properties of an algorithm?

Input
Zero or more quantities are externally supplied

Output
At least one quantity is produced

Definiteness
Each instruction is clear and unambiguous

Finiteness
Terminates after a finite number of steps

Effectiveness
Every instruction must be basic enough that it can be
carried out, in principle, by a person using only pencil
and paper. It is not enough for an instruction to be
definite; it also has to be feasible

Four distinct areas of study


How to devise algorithms
How to validate algorithms
How to analyze algorithms
How to test algorithms

Sorting

Selection Sort
//third attempt

for (int i=1; i<=n; i++){
    // find the index j of the smallest element in a[i:n]
    int j=i;
    for (int k=i+1; k<=n; k++)
        if (a[k]<a[j]) j=k;
    // swap it into position i
    Type t=a[i]; a[i]=a[j]; a[j]=t;
}

Selection Sort
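//third attempt
template <class Type>
void selectionsort(Type a[], int n)
//sort the array a[1:n] into non-decreasing order
//(a must have at least n+1 slots; a[0] is unused)
{
    for (int i=1; i<=n; i++){
        // find the index j of the smallest element in a[i:n]
        int j=i;
        for (int k=i+1; k<=n; k++)
            if (a[k]<a[j]) j=k;
        // swap it into position i
        Type t=a[i]; a[i]=a[j]; a[j]=t;
    }
}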

Four distinct areas of study

How to devise algorithms


Selection Sort method, what other methods?

How to validate algorithms


How to prove correctness?

How to analyze algorithms


What about performance? Theoretically and in
practice?

How to test algorithms


Correctness
Performance

Other Sorting Algorithms


Insertionsort
Bubblesort
Mergesort
Quicksort


Insertion Sort
input array

at each iteration, the array is divided into two sub-arrays:

left sub-array: sorted
right sub-array: unsorted
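
The sorted/unsorted split described above can be sketched as follows. This is a minimal illustration, not the code from the lecture: it assumes a 0-indexed std::vector<int> rather than the a[1:n] convention used for selection sort, and the name insertionSort is our own.

```cpp
#include <cstddef>
#include <vector>

// Sorts v into non-decreasing order.
// Invariant: before iteration i, v[0..i-1] is sorted (left sub-array)
// and v[i..n-1] has not been examined yet (right sub-array).
void insertionSort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];          // first element of the unsorted part
        std::size_t j = i;
        // shift larger elements of the sorted part one slot to the right
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;              // drop the key into the gap
    }
}
```

Each iteration grows the sorted left sub-array by one element, so the array is fully sorted after n-1 iterations.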


Bubble Sort
[Figure: step-by-step trace of bubble sort with outer index i and inner index j]
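
Since the trace figure did not survive, here is one possible sketch of bubble sort. It assumes a left-to-right sweep over a 0-indexed std::vector<int>; the direction of the sweep in the original figure is unknown, and the early exit on a swap-free pass is an optional refinement of ours, not something stated on the slide.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v into non-decreasing order by repeatedly swapping
// adjacent elements that are out of order.
void bubbleSort(std::vector<int>& v) {
    const std::size_t n = v.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        bool swapped = false;
        // after this pass, the largest element of v[0..n-1-i]
        // has "bubbled" to position n-1-i
        for (std::size_t j = 0; j + 1 < n - i; ++j) {
            if (v[j] > v[j + 1]) {
                std::swap(v[j], v[j + 1]);
                swapped = true;
            }
        }
        if (!swapped) break;     // no swaps: already sorted
    }
}
```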

Mergesort - Divide
[Figure: divide step; the array is split recursively into halves]

Mergesort - Conquer and Merge

[Figure: conquer step; sorted halves are merged back together]
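
The divide and merge steps can be sketched as below. This is a minimal top-down version under our own assumptions: half-open index ranges [lo, hi) over a 0-indexed std::vector<int>, and a temporary buffer per merge. The names mergeRanges and mergeSort are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Merge the sorted ranges v[lo, mid) and v[mid, hi) into one sorted range.
void mergeRanges(std::vector<int>& v, std::size_t lo, std::size_t mid, std::size_t hi) {
    std::vector<int> tmp;
    tmp.reserve(hi - lo);
    std::size_t i = lo, j = mid;
    while (i < mid && j < hi) {
        // <= takes from the left half on ties, which keeps the sort stable
        if (v[i] <= v[j]) tmp.push_back(v[i++]);
        else tmp.push_back(v[j++]);
    }
    while (i < mid) tmp.push_back(v[i++]);
    while (j < hi) tmp.push_back(v[j++]);
    for (std::size_t k = 0; k < tmp.size(); ++k) v[lo + k] = tmp[k];
}

// Sort v[lo, hi): divide into halves, sort each, then merge.
void mergeSort(std::vector<int>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;                 // 0 or 1 element: already sorted
    std::size_t mid = lo + (hi - lo) / 2;    // divide
    mergeSort(v, lo, mid);                   // conquer left half
    mergeSort(v, mid, hi);                   // conquer right half
    mergeRanges(v, lo, mid, hi);             // merge
}
```

A whole vector v is sorted with mergeSort(v, 0, v.size()).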


Quicksort


Type of Analysis

Worst case
Provides an upper bound on running time

Best case
Provides a lower bound on running time

Average case
Provides a prediction about the running time
Assumes that the input is random

** Benchmark case
Provides a prediction about the running time on cases
that are relevant to the problem the algorithm is
solving

What to compare objectively?


Compare execution times?
Count the number of statements executed?
Express running time as a function of the input size n (i.e., f(n)).
Compare different functions corresponding to running times.
Such an analysis is independent of machine time,
programming style, etc.


Selection Sort
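
One way to make "count the number of statements executed" concrete for selection sort is to count element comparisons. The sketch below is our own instrumentation (0-indexed std::vector<int>, hypothetical name selectionSortCount), not the analysis shown in the lecture.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v into non-decreasing order and returns the number of
// element comparisons (a[k] < a[j]) that were performed.
std::size_t selectionSortCount(std::vector<int>& v) {
    std::size_t comparisons = 0;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::size_t j = i;                    // index of smallest seen so far
        for (std::size_t k = i + 1; k < n; ++k) {
            ++comparisons;
            if (v[k] < v[j]) j = k;
        }
        std::swap(v[i], v[j]);
    }
    return comparisons;
}
```

The inner loop runs n-1, n-2, ..., 1 times, so the count is n(n-1)/2 for every input of size n, whatever its initial order. For n = 5 that is 10 comparisons.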


Asymptotic Analysis
To compare two algorithms with running times f(n) and g(n), we need a rough
measure that characterizes how fast each function grows.
We use rate of growth
Compare functions in the limit, that is,
asymptotically! i.e., for large values of n

Rate of Growth
The low order terms in a function are relatively insignificant for large n
$n^4 + 10n^3 + 100n^2 + 1000n + 10 \sim n^4$

we say that $n^4 + 10n^3 + 100n^2 + 1000n + 10$ and $n^4$ have the same rate of growth
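
A short worked check of this claim, dividing the polynomial by its leading term (a sketch of the usual argument, not taken from the slides):

```latex
\lim_{n \to \infty} \frac{n^4 + 10n^3 + 100n^2 + 1000n + 10}{n^4}
  = \lim_{n \to \infty} \left( 1 + \frac{10}{n} + \frac{100}{n^2}
      + \frac{1000}{n^3} + \frac{10}{n^4} \right)
  = 1
```

Already at n = 1000 the ratio is about 1.0101, so the lower order terms contribute roughly 1% of the total.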


Asymptotic notation

$\Theta(g(n))$ is the set of functions
with the same order of growth
as g(n)
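
For reference, the standard textbook definitions of the three notations are given below. The figures from the original slides are not available, so these are the usual formulations and may differ in detail from what was shown in class.

```latex
O(g(n))      = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that }
                 0 \le f(n) \le c\, g(n) \ \ \forall n \ge n_0 \,\}

\Omega(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that }
                 0 \le c\, g(n) \le f(n) \ \ \forall n \ge n_0 \,\}

\Theta(g(n)) = \{\, f(n) : \exists\, c_1, c_2 > 0,\ n_0 > 0 \text{ such that }
                 0 \le c_1\, g(n) \le f(n) \le c_2\, g(n) \ \ \forall n \ge n_0 \,\}
```

In words: O gives an asymptotic upper bound, Ω a lower bound, and f(n) is in Θ(g(n)) exactly when it is in both O(g(n)) and Ω(g(n)).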


O-notation
$n^4 + 10n^3 + 100n^2 + 1000n + 10$ is $O(n^4)$
$12345$ is $O(1)$
$n^2 + 3n$ is $O(n^2)$ Selection Sort
Bubblesort?
Insertion sort?
Mergesort?
Quicksort?
Best Case? And Average case?

Class Exercise

For each of the following pairs of functions, either f(n) is
$O(g(n))$, f(n) is $\Omega(g(n))$, or f(n) = $\Theta(g(n))$. Determine
which relationship is correct.
$f(n) = \log n^2$; $g(n) = \log n + 5$
$f(n) = n$; $g(n) = \log n^2$
$f(n) = \log \log n$; $g(n) = \log n$
$f(n) = n$; $g(n) = \log^2 n$
$f(n) = n \log n + n$; $g(n) = \log n$
$f(n) = 10$; $g(n) = \log 10$
$f(n) = 2^n$; $g(n) = 10n^2$
$f(n) = 2^n$; $g(n) = 3^n$

Identifying the Repeated Element


Consider an array a[] of n numbers that has
n/2 distinct elements and n/2 copies of another
element. Propose an algorithm to find the
repeated element.
Method 1?

Method 2?


An Algorithm to identify repeated element


int RepeatedElement(Type a[], int n)
{
while (1) {
int i = random()%n+1;
int j = random()%n+1;
//i and j are random numbers in [1,n]
if ((i!=j) && (a[i]==a[j])) return (i);
}
}
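
A self-contained variant of the algorithm above is sketched below. It is an adaptation under our own assumptions: a 0-indexed std::vector<int> instead of a[1:n], and std::mt19937 with std::uniform_int_distribution instead of random()%n, which avoids modulo bias and is portable. The name repeatedElement and the rng parameter are illustrative.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Returns an index i such that a[i] is the repeated element.
// Precondition (assumed, not checked): a has n >= 4 entries, n/2 of
// which are copies of one value and the rest pairwise distinct;
// otherwise the loop may never terminate.
std::size_t repeatedElement(const std::vector<int>& a, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, a.size() - 1);
    while (true) {
        std::size_t i = pick(rng);   // i and j are uniform in [0, n-1]
        std::size_t j = pick(rng);
        // two different positions holding the same value can only
        // be two copies of the repeated element
        if (i != j && a[i] == a[j]) return i;
    }
}
```

For example, with a = {7, 1, 7, 2, 7, 3, 7, 4, 7, 5} and any seed, the returned index always points at a 7; only the number of iterations varies from run to run.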


Identifying the Repeated Element

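The probability that, in a given iteration, the repeated element is found and the loop quits is

$P = \frac{(n/2)(n/2 - 1)}{n^2} \ge \frac{1}{5}$ for all $n \ge 10$

This means that the probability that it won't quit in a given iteration is $\le \frac{4}{5}$

Probability that it won't quit in 10 iterations is
$\le (4/5)^{10} < 0.1074$

Probability that it won't quit in 100 iterations is
$\le (4/5)^{100} < 2.04 \times 10^{-10}$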
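
These numbers can be checked numerically. The helper names below (successProbability, failureBound) are our own, and the sketch assumes n is even so that n/2 is exact.

```cpp
#include <cmath>

// Probability that one iteration succeeds: i lands on a copy of the
// repeated element (n/2 choices) and j on a different copy
// (n/2 - 1 choices), out of n * n equally likely pairs.
double successProbability(double n) {
    return (n / 2.0) * (n / 2.0 - 1.0) / (n * n);
}

// Upper bound on the probability of no success in k independent
// iterations, using the per-iteration failure bound 4/5 (n >= 10).
double failureBound(int k) {
    return std::pow(4.0 / 5.0, k);
}
```

successProbability(10) is 0.2, the worst case of the bound, and the value rises towards 1/4 as n grows, since the expression simplifies to 1/4 - 1/(2n). failureBound(10) is about 0.10737 and failureBound(100) is about 2.037e-10, in line with the figures quoted above.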

Randomized Algorithm Asymptotic Complexities

Just as the $O()$ notation is used to
characterize the run times of non-randomized
algorithms, $\tilde{O}()$ is used for characterizing the run
times of Las Vegas algorithms. We say a Las
Vegas algorithm has a resource (time, space, etc.)
bound of $\tilde{O}(g(n))$ if there exists a constant c such
that the amount of resource used by the
algorithm (on any input of size n) is no more
than $c \alpha g(n)$ with probability $\ge 1 - \frac{1}{n^{\alpha}}$. We
shall refer to these bounds as high probability
bounds.

Identifying the Repeated Element


For a few million elements, any deterministic
algorithm will certainly spend a few million
steps while our simplistic randomized algorithm
will almost certainly quit in 100 steps.
In general, the probability that the algorithm does not quit in the
first $c \alpha \log n$ iterations is

$\le (4/5)^{c \alpha \log n} = n^{-c \alpha \log(5/4)} \le n^{-\alpha}$ if we pick $c \ge \frac{1}{\log(5/4)}$

Note that
$\log \frac{x}{y} = \log x - \log y$
$a^{\log_b x} = x^{\log_b a}$

Identifying the Repeated Element


This means that the algorithm terminates in
$\frac{1}{\log(5/4)} \alpha \log n$ iterations or
less with probability $\ge 1 - n^{-\alpha}$.
Since each iteration of the while loop takes
O(1) time, the run time of this algorithm is
$\tilde{O}(\log n)$.

And the Real World


Math + Computing: a killer combination
At about week 7, Algorithms (Lecture) and Real
World (Lab/Tut) will meet!
Real world dataset from a famous fashion
company. Dataset has undergone transformation
and masking to prevent leakage of company
sensitive information
Transformation maintained essential properties
of the data for analysis


Additional Course Materials Repository


URL:

http://data.computationallogistics.org:8081/NTU2015MH3400/

Username: NTU2015MH3400
Password: diFAcyugLeDu


Data Sample


Assignment 1 - Individual (150 marks)

Deadline: at the beginning of Lab 1 on Jan 23
Part 1 - 50 marks
Visit the url http://dev.mysql.com/downloads/mysql/
Download and install the mysql database on your
notebook/computer
Visit the url http://www.heidisql.com/
Download and install heidisql on your notebook/computer
Visit the url http://data.computationallogistics.org:8081/NTU2015MH3400/
Download the database dump sku_s_d_f.sql
Create a new database using heidisql
Load the above database dump into heidisql

Assignment 1 - Individual
Part 2 - 100 marks
Write the relevant SQL statements to generate the
following, and paste a screenshot of each result
into a Word document.
a) Find the # of product groups
b) Find the # of products
c) Find the # of sizes
d) Find the # of different colors
e) Find the # of stores

f) For each store, find its total sales/revenue
for the years 2012, 2013 and 2014
g) Find the total sales/revenue for each day of the
week (i.e. Monday, Tuesday, Wednesday, etc)
h) Find the total sales/revenue for 2012, 2013, 2014
i) Find the total sales/revenue for each month
from Jan to Dec for the year 2013
j) Find the total sales of each store for the year
2013
The assignment is not simple, start early!