Instructor:
Practice Associate Prof. Michelle CHEONG
michcheong@smu.edu.sg
Room: 80-04-036
Tel: 6828-0269
Week 10
Probability Functions
Random variables (r.v.) are either discrete or continuous
Discrete r.v. = Uniform, Binomial, Poisson
Continuous r.v. = Uniform, Exponential, Normal, and many more
Probability Functions
[Figure: Discrete Uniform Distribution (PMF and CDF) and Continuous Uniform Distribution (PDF and CDF), each plotted for x = 1 to 6, with the CDF rising to 1.0]
Accumulated over all values of x, the probabilities of both discrete and continuous distributions total 1.0 (the total probability). The running total is known as the cumulative distribution function (CDF).
Copyright Michelle Cheong Week 10
Bin(7,0.5)
Binomial Distribution
[Figure: probability plot labelled "= 3" (parameter symbol lost in extraction); x-axis ticks 0 1 2 3 4 5 6 7 ...]
Exponential PDF: f_X(x) = \lambda e^{-\lambda x}
[Figure: f_X(x) plotted against x; x-axis ticks at 10 and 20]
Poisson
POISSON(x, mean, cumulative)
x = number of events
mean = expected value
cumulative = TRUE returns CDF, FALSE returns PMF
Normal
NORMSDIST(z) = standard normal
z = value of interest
Returns only the CDF
Distribution
Probability = relative frequency
Cumulative probability (CDF) = cumulative relative frequency (CRFdistb)
Probability Function
[Figure: probability, and cumulative probability (CDF or CRFdistb)]
We match by minimizing the maximum absolute deviation (MAD): the largest gap between the cumulative relative frequency of a given data set (CRFdata) and that of its fitted statistical distribution (CRFdistb).
The best-fitted distribution is the one with the smallest MAD.
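As a minimal sketch of the MAD calculation, assuming a small made-up data set and a Normal candidate distribution with arbitrarily chosen parameters (the data values and the function name `mad_against_normal` are illustrative and not part of the course material):

```python
from statistics import NormalDist

def mad_against_normal(data, mean, sd):
    """Largest absolute gap between CRFdata and the Normal CDF (CRFdistb)."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(mean, sd)
    gaps = []
    for i, x in enumerate(xs, start=1):
        crf_data = i / n          # cumulative relative frequency of the data
        crf_distb = dist.cdf(x)   # cumulative probability of the candidate
        gaps.append(abs(crf_data - crf_distb))
    return max(gaps)

# Hypothetical data, for illustration only.
sample = [12, 15, 17, 18, 20, 21, 23, 24, 27, 30]
print(mad_against_normal(sample, mean=20.0, sd=5.0))
print(mad_against_normal(sample, mean=40.0, sd=5.0))  # poorer fit, larger MAD
```

The second call uses a mean far from the data, so its MAD should come out much larger than the first.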
We are trying to minimize the largest gap between the 2 curves by setting the parameters that best describe the distribution, so that the distribution fits the data as closely as possible. In every iteration, as we change the parameters for CRFdistb, the MAD changes position. We stop when we get the smallest MAD.
By minimizing the squared error between the points and the line, the parameters that describe the line change. We stop when the squared error is at its smallest, which gives the parameters that best describe the line.
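The line-fitting analogy can be sketched as follows. This is an illustration only: the points are made up, and the closed-form least-squares formulas are used in place of an iterative search.

```python
def fit_line(xs, ys):
    """Slope and intercept that minimize the squared error (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical gaps between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical points, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print("best line:", slope, intercept)
print("squared error at best line:", squared_error(xs, ys, slope, intercept))
print("squared error with a different slope:", squared_error(xs, ys, slope + 0.5, intercept))
```

Any other choice of slope or intercept gives a squared error at least as large, in the same way that any other choice of distribution parameters gives a MAD at least as large as the fitted one.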
Steps
1. Distribution functions are usually defined by 4 parameters: max, min, mean, standard deviation. Compute these parameters from the raw data.
2. Sort the raw data in ascending order.
3. Compute the cumulative relative frequency (CRFdata) of the data.
4. Compute the cumulative relative frequency (CRFdistb) of a known distribution using the raw data and arbitrary initial input parameters for the distribution.
   - For discrete data, we can use uniform, binomial, Poisson.
   - For continuous data, we can use uniform, exponential, normal.
5. Compute the MAD = max(abs(CRFdata - CRFdistb)).
6. Use Solver to minimize the MAD. The minimization process changes the parameters of the distribution to get the best-fit distribution.
7. Repeat 4 to 6 for another distribution to get its MAD.
8. Select the distribution with the smallest MAD.
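The steps above can be sketched in Python. This is a rough illustration under stated assumptions: the data set is made up, a coarse grid search stands in for Excel's Solver, only the Normal and Exponential candidates are tried, and all function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def crf_data(n):
    """Step 3: cumulative relative frequency i/n for sorted data without ties."""
    return [i / n for i in range(1, n + 1)]

def mad(xs, crf, cdf):
    """Step 5: largest absolute gap between CRFdata and CRFdistb."""
    return max(abs(c - cdf(x)) for x, c in zip(xs, crf))

def expon_cdf(mean_value):
    """Exponential CDF with rate 1/mean_value."""
    return lambda x: 1 - math.exp(-x / mean_value)

def grid(centre, half_width, steps=41):
    """Evenly spaced trial values in [centre - half_width, centre + half_width]."""
    return [centre + half_width * (2 * i / (steps - 1) - 1) for i in range(steps)]

def fit_normal(data):
    xs = sorted(data)                      # Step 2
    crf = crf_data(len(xs))
    m0, s0 = mean(xs), stdev(xs)           # Step 1: starting parameters
    best = (mad(xs, crf, NormalDist(m0, s0).cdf), m0, s0)
    for m in grid(m0, 0.5 * s0):           # Step 6: grid search replaces Solver
        for s in grid(s0, 0.5 * s0):
            best = min(best, (mad(xs, crf, NormalDist(m, s).cdf), m, s))
    return best                            # (MAD, mean, standard deviation)

def fit_exponential(data):
    xs = sorted(data)
    crf = crf_data(len(xs))
    m0 = mean(xs)
    best = (mad(xs, crf, expon_cdf(m0)), m0)
    for m in grid(m0, 0.5 * m0):
        best = min(best, (mad(xs, crf, expon_cdf(m)), m))
    return best                            # (MAD, mean)

# Hypothetical positive-valued data, for illustration only.
sample = [12, 15, 17, 18, 20, 21, 23, 24, 27, 30]
fits = {"normal": fit_normal(sample), "exponential": fit_exponential(sample)}  # Step 7
best_name = min(fits, key=lambda name: fits[name][0])                          # Step 8
for name, result in fits.items():
    print(name, "MAD =", round(result[0], 4), "parameters =", result[1:])
print("best fit:", best_name)
```

A grid search is crude compared with Solver, but it makes the idea visible: every trial parameter set yields one MAD, and the smallest one wins.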
Continuous data
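Freq   Cum Freq   CRF
1      1          1/10
1      2          2/10
1      3          3/10
1      4          4/10
1      5          5/10
1      6          6/10
1      7          7/10
1      8          8/10
1      9          9/10
1      10         10/10

Fragments of a second table with repeated values (other cells lost in extraction): Freq 3, 2; Cum Freq 4, 6, 10; CRF 4/10, 6/10, 10/10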
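A small sketch of how the Freq, Cum Freq and CRF columns can be built, including the case where values repeat. The data values and the function name `crf_table` are made up for illustration.

```python
from collections import Counter

def crf_table(data):
    """Rows of (value, freq, cum_freq, crf) for the distinct values in data."""
    counts = Counter(data)
    n = len(data)
    rows = []
    cum = 0
    for value in sorted(counts):
        cum += counts[value]               # running total of frequencies
        rows.append((value, counts[value], cum, cum / n))
    return rows

# Hypothetical data with repeated values, for illustration only.
observations = [1, 2, 2, 2, 3, 3, 4, 4, 5, 5]
n = len(observations)
print("Value  Freq  Cum Freq  CRF")
for value, freq, cum, crf in crf_table(observations):
    print(f"{value:<6} {freq:<5} {cum:<9} {cum}/{n}")
```

When every value is distinct, each frequency is 1 and the CRF simply steps up by 1/n per row, as in the continuous-data table.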
Normal
NORMDIST(x, mean, standard_dev, cumulative)
NORMDIST(x, mean, standard_dev, TRUE)
Exponential
EXPONDIST(x, 1/mean, cumulative)
EXPONDIST(x, 1/mean, TRUE)
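For readers working outside Excel, the cumulative (TRUE) forms of these two functions can be approximated in Python as below. The function names are hypothetical, and this is a sketch of the underlying formulas rather than a reproduction of Excel's implementation.

```python
import math

def normdist_cdf(x, mean, standard_dev):
    """Normal CDF, the counterpart of NORMDIST(x, mean, standard_dev, TRUE)."""
    z = (x - mean) / (standard_dev * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))

def expondist_cdf(x, mean):
    """Exponential CDF with rate 1/mean, the counterpart of EXPONDIST(x, 1/mean, TRUE).

    Assumption: returns 0 for negative x instead of raising an error.
    """
    if x < 0:
        return 0.0
    return 1 - math.exp(-x / mean)

# Illustrative inputs only.
print(normdist_cdf(10, 10, 2))    # x at the mean gives 0.5
print(expondist_cdf(5, 5))        # x at the mean gives 1 - e^(-1)
```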
Binomial
BINOMDIST(number_s, trials, probability_s, cumulative)
BINOMDIST(x, trials, probability_s, TRUE)
Poisson
POISSON(x, mean, cumulative)
POISSON(x, mean, TRUE)
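The discrete counterparts can be sketched the same way. The function names are hypothetical, and the Poisson mean of 3 is an arbitrary choice for the example; Bin(7, 0.5) is the binomial example used earlier in these slides.

```python
import math

def binom_pmf(k, trials, p):
    """P(X = k), the counterpart of BINOMDIST(k, trials, p, FALSE)."""
    return math.comb(trials, k) * p ** k * (1 - p) ** (trials - k)

def binom_cdf(x, trials, p):
    """P(X <= x), the counterpart of BINOMDIST(x, trials, p, TRUE)."""
    return sum(binom_pmf(k, trials, p) for k in range(x + 1))

def poisson_pmf(k, mean):
    """P(X = k), the counterpart of POISSON(k, mean, FALSE)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def poisson_cdf(x, mean):
    """P(X <= x), the counterpart of POISSON(x, mean, TRUE)."""
    return sum(poisson_pmf(k, mean) for k in range(x + 1))

print(binom_cdf(3, 7, 0.5))   # Bin(7, 0.5): half the probability lies at 3 or below
print(poisson_cdf(3, 3.0))    # arbitrary mean of 3, for illustration
```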
Hotel Apex
Fit a Normal distribution to room sales data to estimate the number of rooms to keep open during the refurbishment period to satisfy 70% of demand.
Because the maximum capacity is 150 rooms, the data given does not represent real demand: any demand exceeding 150 results in only 150 rooms being sold.
Thus, the data given is in fact sales data and NOT demand data. So, can we infer real demand data from sales data?
This exercise shows the importance of understanding the data collected and the power of data inference.
Yankee Fruits
Help Paul decide how many melons he has to purchase weekly to satisfy different service levels.
Use 4 methods:
1. Frequency bins & Lookup() function
2. CRF of raw data & Lookup() function
3. Percentile() function
4. NORMINV() function, assuming demand is Normal
Which method is the best? It depends on the situation.
Yankee Fruits
Method 1: Uses the Lookup() function with frequency bins. This method is coarse because the bins consolidate data into intervals, so the answer is also given in terms of large intervals. However, if the order size is in multiples of 50, then this answer is appropriate.
Method 2: Uses the Lookup() function with the CRF of the raw data. This method is finer than Method 1, but answers are given in terms of individual units.
Method 3: Uses Percentile(). This method interpolates and generates new data points, which may not be integers. Is such fine data needed?
Method 4: Uses the Norminv() function. This method assumes demand follows a normal distribution. For service level >= 0.95, the number of melons to order is larger than the largest raw data value of 716, so it seems to over-estimate.
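Methods 2 to 4 can be compared with a short sketch. Everything here is an assumption made for illustration: the demand figures are invented (not the Yankee Fruits data), the function names are hypothetical, the CRF lookup is read as "smallest observed value whose CRF reaches the service level", and the percentile uses linear interpolation at rank p*(n-1). Method 1 is omitted because it depends on the chosen bin boundaries.

```python
import math
from statistics import NormalDist, mean, stdev

def crf_lookup(data, service_level):
    """Method 2 (assumed reading): smallest value whose CRF >= service level."""
    xs = sorted(data)
    n = len(xs)
    for i, x in enumerate(xs, start=1):
        if i / n >= service_level:
            return x
    return xs[-1]

def percentile_interp(data, p):
    """Method 3: linear interpolation between sorted values at rank p*(n-1)."""
    xs = sorted(data)
    rank = p * (len(xs) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (rank - lo) * (xs[hi] - xs[lo])

def normal_quantile(data, p):
    """Method 4: inverse Normal CDF using the sample mean and standard deviation."""
    return NormalDist(mean(data), stdev(data)).inv_cdf(p)

# Hypothetical weekly demand, for illustration only.
demand = [520, 545, 560, 580, 600, 615, 630, 650, 680, 700]
for level in (0.5, 0.9, 0.99):
    print(level,
          crf_lookup(demand, level),
          round(percentile_interp(demand, level), 1),
          round(normal_quantile(demand, level), 1))
```

The first two methods can never return more than the largest observed value, while the Normal quantile can exceed it at high service levels, which is the over-estimation concern raised for Method 4.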