
Probability distribution functions

Normal distribution
Lognormal distribution
Mean, median and mode
Tails
Extreme value distributions

Normal (Gaussian) distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

Probability density function (PDF)

What does the figure tell us about the cumulative distribution function (CDF)?
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
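As a numerical check of the relation between f and F, here is a minimal Python sketch (standard library only) that integrates the normal PDF with the trapezoidal rule and compares the result with the closed form F(x) = ½[1 + erf((x − μ)/(σ√2))]. Replacing the lower limit −∞ by μ − 10σ and the step count are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf_closed_form(x, mu=0.0, sigma=1.0):
    """F(x) expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_cdf_by_integration(x, mu=0.0, sigma=1.0, steps=10_000):
    """Trapezoidal rule for F(x) = integral of f(t) dt from -inf to x.

    Assumption: -inf is replaced by mu - 10*sigma, where the neglected
    tail mass is below 1e-22.
    """
    lo = mu - 10.0 * sigma
    if x <= lo:
        return 0.0
    h = (x - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0, 1.96):
        print(x, normal_cdf_by_integration(x), normal_cdf_closed_form(x))
```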

More on the normal distribution

The normal distribution is denoted $N(\mu, \sigma^2)$, with the square giving the variance.


If X is normal, Y=aX+b is also normal. What
would be the mean and standard deviation of Y?
Similarly, if X and Y are jointly normal variables (for example, independent normals), any linear combination aX + bY is also normal.
A smooth function of normal random variables can often be approximated as normal by using a linear Taylor expansion about the mean.
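Example: $X \sim N(10, 0.5^2)$ and $Y = X^2$. Since $dY/dX = 2X = 20$ at the mean, $\sigma_Y \approx 20 \times 0.5 = 10$, so $Y \approx N(100, 10^2)$.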
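The rules in the list above can be checked with a minimal Monte Carlo sketch in Python (standard library only). It assumes X and Y are independent; the parameters of Y (N(2, 1.5²)) and the coefficients a = 3, b = −2 are hypothetical. For independent X and Y, aX + b has mean aμ_X + b and standard deviation |a|σ_X, and aX + bY has variance a²σ_X² + b²σ_Y². For the example Y = X², the exact moments are E[X²] = μ² + σ² = 100.25 and SD ≈ 10.006, so the linearization N(100, 10²) is very close because σ is small relative to μ.

```python
import math
import random
import statistics

# X ~ N(10, 0.5^2) as in the example; Y ~ N(2, 1.5^2) is a hypothetical, independent variable.
MU_X, SIGMA_X = 10.0, 0.5
MU_Y, SIGMA_Y = 2.0, 1.5
A, B = 3.0, -2.0  # hypothetical coefficients a and b


def linear_theory():
    """Exact (mean, SD) of aX + b and of aX + bY for independent X and Y."""
    affine = (A * MU_X + B, abs(A) * SIGMA_X)
    combo = (A * MU_X + B * MU_Y, math.sqrt((A * SIGMA_X) ** 2 + (B * SIGMA_Y) ** 2))
    return affine, combo


def taylor_square(mu=MU_X, sigma=SIGMA_X):
    """Linear Taylor approximation for X^2: N(mu^2, (2*mu*sigma)^2)."""
    return mu * mu, abs(2.0 * mu) * sigma


def mean_sd(values):
    return statistics.fmean(values), statistics.stdev(values)


def monte_carlo(n=50_000, seed=1):
    """Sample X and Y, then estimate (mean, SD) of aX + b, aX + bY and X^2."""
    rng = random.Random(seed)
    xs = [rng.gauss(MU_X, SIGMA_X) for _ in range(n)]
    ys = [rng.gauss(MU_Y, SIGMA_Y) for _ in range(n)]
    return (mean_sd([A * x + B for x in xs]),
            mean_sd([A * x + B * y for x, y in zip(xs, ys)]),
            mean_sd([x * x for x in xs]))


if __name__ == "__main__":
    print("theory:", linear_theory(), taylor_square())
    print("simulation:", monte_carlo())
```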

Estimating mean and standard deviation
Given a sample from a normally distributed variable,
the sample mean is the best linear unbiased
estimator (BLUE) of the true mean.
For the variance, the equation below gives the best unbiased estimator, but its square root is not an unbiased estimate of the standard deviation:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
For example, for a sample of 5 from a standard normal distribution, the standard deviation will be estimated on average as 0.94 (with a standard deviation of 0.34).
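The 0.94 and 0.34 can be reproduced with a short Python sketch. It uses the standard result E[s] = c₄(n)σ with c₄(n) = √(2/(n−1))·Γ(n/2)/Γ((n−1)/2) (not stated on the slide), so the standard deviation of s is σ√(1 − c₄²). The simulation also shows that s² averages to σ² = 1 while s averages below σ. The trial count and seed are arbitrary choices.

```python
import math
import random
import statistics


def c4(n):
    """E[s] / sigma for a normal sample of size n, where s uses the n - 1 divisor."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def sd_of_s(n):
    """Standard deviation of s in units of sigma: sqrt(1 - c4(n)^2)."""
    return math.sqrt(1.0 - c4(n) ** 2)


def simulate(n=5, trials=20_000, seed=3):
    """Draw many size-n standard normal samples; return mean of s, SD of s, mean of s^2."""
    rng = random.Random(seed)
    s_values = [statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
                for _ in range(trials)]
    return (statistics.fmean(s_values),
            statistics.stdev(s_values),
            statistics.fmean([s * s for s in s_values]))


if __name__ == "__main__":
    print("theory: E[s] =", round(c4(5), 4), " SD[s] =", round(sd_of_s(5), 4))
    print("simulation (mean s, SD of s, mean s^2):", simulate())
```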

Lognormal distribution
If ln(X) has a normal distribution, X has a lognormal distribution. That is, if X is normally distributed, exp(X) is lognormally distributed.
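Notation: $\ln N(\mu, \sigma^2)$, where μ and σ² are the mean and variance of ln X.
PDF:
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \qquad x > 0$$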

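Mean and variance
$$\mu_X = E[X] = \exp\left(\mu + \sigma^2/2\right), \qquad \sigma_X^2 = \mathrm{Var}[X] = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}$$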
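A minimal Python sketch that checks the closed-form lognormal mean exp(μ + σ²/2) and variance (e^{σ²} − 1)e^{2μ + σ²} against samples of exp(Z) with Z ~ N(μ, σ²). The parameters μ = 0, σ = 0.5 are hypothetical.

```python
import math
import random
import statistics


def lognormal_mean_var(mu, sigma):
    """Closed-form mean and variance of X = exp(Z), Z ~ N(mu, sigma^2)."""
    mean = math.exp(mu + sigma**2 / 2)
    var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean, var


def simulate(mu, sigma, n=50_000, seed=4):
    """Sample mean and variance of exp(Z) for Z ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    xs = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
    return statistics.fmean(xs), statistics.variance(xs)


if __name__ == "__main__":
    print("theory:    ", lognormal_mean_var(0.0, 0.5))
    print("simulation:", simulate(0.0, 0.5))
```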

Mean, mode and median


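Mode (highest point) = $e^{\mu - \sigma^2}$
Median (50% of samples) = $e^{\mu}$
Mean = $e^{\mu + \sigma^2/2}$, so mode < median < mean for σ > 0.
Figure for μ = 0.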
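A minimal Python sketch, assuming lnN(0, 1) as an example (hypothetical parameters), that evaluates the closed forms e^{μ−σ²}, e^{μ} and e^{μ+σ²/2}, locates the highest point of the PDF by a crude grid search, and checks that the sample median is close to e^{μ}.

```python
import math
import random
import statistics


def lognormal_pdf(x, mu, sigma):
    """Lognormal density for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def mode_median_mean(mu, sigma):
    return math.exp(mu - sigma**2), math.exp(mu), math.exp(mu + sigma**2 / 2)


def grid_mode(mu, sigma, x_max=5.0, steps=50_000):
    """Crude grid search for the highest point of the PDF on (0, x_max]."""
    xs = [x_max * i / steps for i in range(1, steps + 1)]
    return max(xs, key=lambda x: lognormal_pdf(x, mu, sigma))


def sample_median(mu, sigma, n=20_001, seed=5):
    """Median of n simulated lognormal values."""
    rng = random.Random(seed)
    return statistics.median([rng.lognormvariate(mu, sigma) for _ in range(n)])


if __name__ == "__main__":
    print("mode, median, mean:", mode_median_mean(0.0, 1.0))
    print("grid mode:", grid_mode(0.0, 1.0), " sample median:", sample_median(0.0, 1.0))
```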

Light and heavy tails


The normal distribution has a light tail; 4.5 sigma (one-sided) is equivalent to a 3.4e-6 failure or defect probability.
The lognormal can have a heavy tail.
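A minimal Python sketch (standard library only) of the tail numbers: the one-sided normal tail beyond 4.5σ, and, for comparison, the probability that a lognormal variable exceeds its own mean by more than 4.5 of its own standard deviations. The choice lnN(0, 1) is a hypothetical example.

```python
import math


def normal_tail(z):
    """One-sided tail P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lognormal_tail_beyond(mu, sigma, k):
    """P(X > mean + k*SD) for X ~ lnN(mu, sigma^2), via P(X > t) = P(Z > (ln t - mu)/sigma)."""
    mean = math.exp(mu + sigma**2 / 2)
    sd = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))
    return normal_tail((math.log(mean + k * sd) - mu) / sigma)


if __name__ == "__main__":
    print("normal, beyond 4.5 SD:  ", normal_tail(4.5))
    print("lnN(0,1), beyond 4.5 SD:", lognormal_tail_beyond(0.0, 1.0, 4.5))
```

For lnN(0, 1) the second probability is about 7.5e-3, more than a thousand times the normal value.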

Fitting distribution to data


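We usually fit the CDF so as to minimize the maximum distance between the fitted and empirical CDFs (the distance used in the Kolmogorov-Smirnov test).
Generated 20 points from N(3, 1^2).
Normal fit: N(3.48, 0.93^2)
Lognormal fit: lnN(1.24, 0.26^2)
The two fits have almost the same mean and standard deviation.
[Figure: CDF versus x for the 20 generated points ("experimental") with the fitted lognormal and normal CDFs.]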
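A minimal Python sketch of fitting by minimizing the K-S distance, using `statistics.NormalDist` for the CDFs. It swaps in a plain grid search (the hypothetical helper `fit_min_ks`) for whatever optimizer produced the slide's numbers, and because the 20 points are regenerated with an arbitrary seed, it will not reproduce the slide's fitted parameters.

```python
import math
import random
import statistics
from statistics import NormalDist


def ks_distance(sample, cdf):
    """Maximum vertical gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def normal_cdf(mu, sigma):
    return NormalDist(mu, sigma).cdf


def lognormal_cdf(mu, sigma):
    dist = NormalDist(mu, sigma)
    return lambda x: dist.cdf(math.log(x)) if x > 0 else 0.0


def fit_min_ks(sample, make_cdf, center, scale, steps=40):
    """Hypothetical grid search for (location, spread) minimizing the K-S distance.

    Location spans center +/- scale/2 and spread spans 0.5*scale..1.5*scale;
    with an even `steps`, the grid contains (center, scale) itself.
    """
    best = None
    for i in range(steps + 1):
        loc = center + scale * (i / steps - 0.5)
        for j in range(steps + 1):
            spread = scale * (0.5 + j / steps)
            d = ks_distance(sample, make_cdf(loc, spread))
            if best is None or d < best[0]:
                best = (d, loc, spread)
    return best  # (K-S distance, location, spread)


if __name__ == "__main__":
    rng = random.Random(5)
    data = [rng.gauss(3.0, 1.0) for _ in range(20)]  # 20 points from N(3, 1^2)
    m, s = statistics.fmean(data), statistics.stdev(data)
    print("normal fit (D, mu, sigma):", fit_min_ks(data, normal_cdf, m, s))
    logs = [math.log(x) for x in data if x > 0]
    lm, ls = statistics.fmean(logs), statistics.stdev(logs)
    print("lognormal fit (D, mu, sigma):", fit_min_ks(data, lognormal_cdf, lm, ls))
```

Starting the grid at the moment estimates guarantees the K-S fit is never worse than simply matching the sample mean and standard deviation; a real fit would use a proper optimizer.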

Extreme value distributions


No matter what distribution you sample from (as long as it has a finite variance), the mean of the sample tends to be normally distributed as the sample size increases (what mean and standard deviation?)
Similarly, the distributions of the minimum (or maximum) of samples tend toward other limiting distributions.
Even though there are infinitely many distributions, there are only three extreme value distributions:
Type I (Gumbel): e.g., the limit for the maximum of normal samples.
Type II (Fréchet): e.g., maximum daily rainfall.
Type III (Weibull): e.g., weakest-link failure.
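A minimal Python sketch answering the question in the first bullet for one example distribution, the uniform distribution on [0, 1] (an assumption for illustration): the mean of n samples has mean 0.5 and standard deviation √(1/12)/√n, which is exactly 1/12 for n = 12, and about 68% of the sample means fall within one standard deviation, as expected for a normal distribution.

```python
import math
import random
import statistics


def uniform_sample_means(n, trials=20_000, seed=6):
    """Means of `trials` independent samples of size n from U(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(trials)]


def clt_prediction(n):
    """CLT mean and SD of the sample mean for U(0, 1): 1/2 and sqrt(1/12)/sqrt(n)."""
    return 0.5, math.sqrt(1.0 / 12.0) / math.sqrt(n)


if __name__ == "__main__":
    n = 12
    means = uniform_sample_means(n)
    mu, sd = clt_prediction(n)
    within = sum(abs(m - mu) < sd for m in means) / len(means)
    print(statistics.fmean(means), statistics.stdev(means), mu, sd, within)
```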

Maximum of normal samples
With a normal distribution, the maximum of a sample is more narrowly distributed than the original distribution.
[Figure: histograms of the maximum of standard normal samples. Max of 10 samples: mean 1.54, standard deviation 0.59. Max of 100 samples: mean 2.50, standard deviation 0.43.]
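The statistics quoted for the figure can be reproduced approximately with a minimal Python simulation; the trial count and seed are arbitrary choices.

```python
import random
import statistics


def max_of_normals(k, trials=10_000, seed=7):
    """Mean and SD of the maximum of k independent standard normal draws."""
    rng = random.Random(seed)
    maxima = [max(rng.gauss(0.0, 1.0) for _ in range(k)) for _ in range(trials)]
    return statistics.fmean(maxima), statistics.stdev(maxima)


if __name__ == "__main__":
    for k in (10, 100):
        mean, sd = max_of_normals(k)
        print(f"max of {k}: mean {mean:.2f}, SD {sd:.2f}")
```

Both standard deviations are below 1, and the spread shrinks further as k grows, which is the narrowing described above.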

Gumbel distribution
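PDF:
$$f(x) = \frac{1}{\beta}\exp\left[-\left(z + e^{-z}\right)\right], \qquad z = \frac{x - \mu}{\beta}$$
CDF:
$$F(x) = \exp\left(-e^{-z}\right)$$
Mean, median, mode and variance:
$$\text{mean} = \mu + \beta\gamma, \qquad \text{median} = \mu - \beta\ln(\ln 2), \qquad \text{mode} = \mu, \qquad \text{variance} = \frac{\pi^2}{6}\beta^2$$
where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant.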

[Figure: fitted EV1 CDFs compared with the empirical CDFs of the -max10 and -max100 data (negated maxima of 10 and 100 standard normal samples).]
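A minimal Python sketch of the Gumbel distribution with location μ and scale β, CDF F(x) = exp(−e^{−(x−μ)/β}). It draws samples by inverting the CDF, compares them with the standard results mean = μ + βγ, median = μ − β ln(ln 2) and variance = π²β²/6, and recovers μ and β with a method-of-moments fit. The method of moments is swapped in for illustration and is not necessarily the fitting routine behind the figure; μ = 1 and β = 2 are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_cdf(x, mu, beta):
    """F(x) = exp(-exp(-(x - mu)/beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))


def gumbel_moments(mu, beta):
    """Closed-form mean, median, mode and variance of the Gumbel distribution."""
    return (mu + beta * EULER_GAMMA,
            mu - beta * math.log(math.log(2.0)),
            mu,
            (math.pi * beta) ** 2 / 6.0)


def gumbel_sample(mu, beta, n, seed=8):
    """Inverse-CDF sampling: solving u = F(x) gives x = mu - beta*ln(-ln u)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # random() can return 0.0 (never 1.0); skip it to avoid log(0)
            out.append(mu - beta * math.log(-math.log(u)))
    return out


def fit_gumbel_moments(data):
    """Method-of-moments fit: beta = s*sqrt(6)/pi, mu = mean - gamma*beta."""
    beta = statistics.stdev(data) * math.sqrt(6.0) / math.pi
    return statistics.fmean(data) - EULER_GAMMA * beta, beta


if __name__ == "__main__":
    xs = gumbel_sample(1.0, 2.0, 40_000)
    print("theory (mean, median, mode, var):", gumbel_moments(1.0, 2.0))
    print("sample mean, median, var:", statistics.fmean(xs), statistics.median(xs), statistics.variance(xs))
    print("moment fit (mu, beta):", fit_gumbel_moments(xs))
```

The same `fit_gumbel_moments` could be applied to simulated maxima of normal samples to compare with the figure.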

Weibull distribution
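Probability distribution:
$$f(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \qquad x \ge 0,\ k > 0,\ \lambda > 0$$
Its log has a Gumbel distribution: -ln X is Gumbel distributed with location -ln λ and scale 1/k.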
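A minimal Python check of the log-Gumbel relation, using `random.weibullvariate(alpha, beta)` (alpha is the scale λ, beta is the shape k). If X is Weibull, −ln X should be Gumbel with location −ln λ and scale 1/k, so its mean is −ln λ + γ/k and its standard deviation is π/(k√6). The values λ = 2, k = 1.5 are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def neg_log_weibull(lam, k, n=50_000, seed=9):
    """Samples of -ln X for X ~ Weibull(scale=lam, shape=k)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.weibullvariate(lam, k)  # alpha = scale, beta = shape
        if x > 0.0:  # guard against the (very unlikely) value 0.0
            out.append(-math.log(x))
    return out


def predicted_gumbel(lam, k):
    """Mean and SD of a Gumbel with location -ln(lam) and scale 1/k."""
    mu, beta = -math.log(lam), 1.0 / k
    return mu + beta * EULER_GAMMA, math.pi * beta / math.sqrt(6.0)


if __name__ == "__main__":
    ys = neg_log_weibull(2.0, 1.5)
    print("sample:   ", statistics.fmean(ys), statistics.stdev(ys))
    print("predicted:", predicted_gumbel(2.0, 1.5))
```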

Used to describe the distribution of strength or fatigue life in brittle materials.
If it describes time to failure, then
k<1 indicates that failure rate decreases with time,
k=1 indicates constant rate,
k>1 indicates increasing rate.
A third (location) parameter can be added by replacing x with x - c.
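The failure-rate statements follow from the Weibull hazard function h(x) = f(x)/(1 − F(x)) = (k/λ)(x/λ)^{k−1}, a standard identity not written on the slide. A minimal Python sketch evaluating it for k = 0.5, 1 and 2 with λ = 1 (hypothetical values):

```python
import math


def weibull_pdf(x, lam, k):
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


def weibull_survival(x, lam, k):
    """1 - F(x): probability of surviving beyond x."""
    return math.exp(-((x / lam) ** k))


def hazard(x, lam, k):
    """Failure rate h(x) = f(x) / (1 - F(x)) = (k/lam) * (x/lam)**(k-1)."""
    return weibull_pdf(x, lam, k) / weibull_survival(x, lam, k)


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0):
        rates = [hazard(t, 1.0, k) for t in (0.5, 1.0, 2.0)]
        print(f"k={k}: hazard at t=0.5, 1, 2 ->", [round(r, 3) for r in rates])
```

With k < 1 the rate falls with t, with k = 1 it is the constant 1/λ, and with k > 1 it rises.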
[Figure: CDF of the log of Weibull samples ("log weibull") compared with an EV1 fit.]

Exercises
Find how many samples of normally distributed numbers
you need in order to estimate the mean and standard
deviation with an error that will be less than 10% of the
true standard deviation most of the time.
Both the lognormal and Weibull distributions are used to
model strength. Find how closely you can approximate
data generated from a standard lognormal distribution by
fitting it with Weibull.
Take the introduction and preamble of the US Declaration
of Independence, and fit the distribution of word lengths
using the K-S criterion. What distribution fits best?
Compare the graphs of the CDFs. Compare to a more
contemporary text.
