
Probability Theory and Statistics

Lecture 6
October 7, 2014
Robert Dahl Jacobsen
robert@math.aau.dk
Department of Mathematical Sciences
Aalborg University

Robert DJ | Probability Theory and Statistics

Agenda

Estimation
Two means
Likelihoods
Matlab


Statistics in a nutshell

Model:
  X_i ~ N(μ, σ²)

Estimation:
  μ̂ = x̄,   σ̂² = s²

Hypothesis test:
  H₀: μ = μ₀,   σ² = σ₀²

Estimation

  population: μ, σ²        sample: μ̂, σ̂²

Point estimate: Estimate of a population parameter (θ) from the sample (θ̂).
Estimator: The corresponding random variable (Θ̂).

  parameter | estimate | estimator
  ----------|----------|----------
  μ         | x̄        | X̄
  σ²        | s²       | S²

Unbiased estimate

Unbiased estimator:
  E(θ̂) = θ

Example:
  X_i ~ N(μ, σ²),  i = 1, ..., n

  X̄ = (1/n) Σ_{i=1}^n X_i ~ N(μ, σ²/n)

  S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)² ~ (σ²/(n−1)) · χ²(n−1)

Then:
  X̄ and S² are independent
  E(X̄) = μ
  E(S²) = σ²

Confidence interval

Repeat 10 times:
- Draw 100 samples from N(0, 1).
- Compute the average x̄:
  0.015, 0.085, 0.036, 0.088, 0.0043,
  0.015, 0.015, 0.0067, 0.081, 0.14

Questions:
- Is it okay that all x̄ ≠ 0?
- How far from 0 can x̄ be before it is not okay?

Confidence interval: An interval that is pretty certain to contain μ.
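The sampling experiment above is easy to reproduce. A minimal Python/NumPy sketch (the seed is arbitrary; the deck itself uses Matlab):

```python
import numpy as np

# Repeat the slide's experiment: 10 times, draw 100 samples
# from N(0, 1) and record the sample average.
rng = np.random.default_rng(0)
averages = [rng.standard_normal(100).mean() for _ in range(10)]

# None of the averages is exactly 0, but each should be within a
# few standard errors (sigma/sqrt(n) = 1/10) of 0.
print([round(a, 4) for a in averages])
```

Running this gives ten small but nonzero averages, which is exactly the situation the confidence interval formalizes.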

Confidence interval for mean
Known variance

Sample:
  X_i ~ N(μ, σ²),  i = 1, ..., n

Notation:
  z_α = α fractile of N(0, 1)
  X̄ ~ N(μ, σ²/n)

(1 − α)·100% confidence interval for μ:

  x̄ + z_{α/2} σ/√n ≤ μ ≤ x̄ + z_{1−α/2} σ/√n

Shorthand:
  x̄ ± z_{1−α/2} σ/√n

Confidence interval: Interpretation

We are (1 − α)·100% confident that μ is in the CI.

20 samples with 100 observations from N(0, 2):

   1: x_{1,1}, x_{1,2}, ..., x_{1,100}    →  x̄₁
   2: x_{2,1}, x_{2,2}, ..., x_{2,100}    →  x̄₂
   ...
  20: x_{20,1}, x_{20,2}, ..., x_{20,100} →  x̄₂₀

95% confidence interval (one for each of the 20 samples):

  x̄ ± 1.96 · √2/10

Expect one confidence interval without 0:
  http://xkcd.com/882

Confidence intervals

[Figure: the 20 confidence intervals plotted against sample index 1–20; omitted]

Chocolate bars

In a sample of 20 chocolate bars the amount of calories has been measured. We have:
- the corresponding random variable is approx. normally distributed.
- the population standard deviation is 10 calories.
- the sample mean is 224 calories.

Calculate 90% and 95% confidence intervals for the mean. Which one is larger?

Solutions:
  90% confidence interval: 224 ± 1.64 · 10/√20 ≈ [220.3; 227.7].
  95% confidence interval: 224 ± 1.96 · 10/√20 ≈ [219.6; 228.4].
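The course software is Matlab, but the slide's numbers are easy to cross-check with an equivalent Python/SciPy sketch:

```python
import math
from scipy.stats import norm

# Chocolate-bar data from the slide: n = 20 bars, known
# standard deviation sigma = 10, sample mean 224.
n, sigma, xbar = 20, 10.0, 224.0

def ci_known_variance(xbar, sigma, n, alpha):
    """CI for the mean: xbar -/+ z_{1-alpha/2} * sigma / sqrt(n)."""
    z = norm.ppf(1 - alpha / 2)          # upper fractile of N(0, 1)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

lo90, hi90 = ci_known_variance(xbar, sigma, n, 0.10)
lo95, hi95 = ci_known_variance(xbar, sigma, n, 0.05)
print(round(lo90, 1), round(hi90, 1))   # slide: [220.3; 227.7]
print(round(lo95, 1), round(hi95, 1))   # slide: [219.6; 228.4]
```

The 95% interval is the larger one: a higher confidence level demands a wider interval.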

Confidence interval for mean
Unknown variance

Sample:
  X_i ~ N(μ, σ²),  i = 1, ..., n

Notation:
  t_α = α fractile of t(n − 1)

  s² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)²

(1 − α)·100% confidence interval for μ:

  x̄ + t_{α/2} s/√n ≤ μ ≤ x̄ + t_{1−α/2} s/√n

or

  x̄ ± t_{1−α/2} s/√n

Note: t_{α/2} < z_{α/2}, so the t interval is wider.

Normal or t distribution?

General form of confidence interval for mean:

  x̄ ± fractile · std/√n

Situation 1: Observations from N(μ, σ²) (unknown mean and variance)
  Estimate: mean = x̄, variance = s²
  Use:
  - fractile from t distribution
  - std = √s²

Situation 2: Observations from N(μ, σ²) (unknown mean)
  Estimate: mean = x̄
  Use:
  - fractile from normal distribution
  - std = √σ²

More chocolate bars

In a sample of 20 chocolate bars the amount of calories has been measured. We have:
- the corresponding random variable is approx. normally distributed.
- the sample standard deviation is 10 calories.
- the sample mean is 224 calories.

Calculate 90% and 95% confidence intervals for the mean.

Solutions:
  90% confidence interval: 224 ± 1.73 · 10/√20 ≈ [220.1; 227.9].
  95% confidence interval: 224 ± 2.1 · 10/√20 ≈ [219.3; 228.7].
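Again a Python/SciPy cross-check of the slide's numbers; the only change from the known-variance case is that the fractile comes from t(n − 1):

```python
import math
from scipy.stats import t

# Same chocolate-bar data, but now 10 is the *sample* standard
# deviation, so we use the t distribution with n - 1 = 19 df.
n, s, xbar = 20, 10.0, 224.0

def ci_unknown_variance(xbar, s, n, alpha):
    """CI for the mean: xbar -/+ t_{1-alpha/2}(n-1) * s / sqrt(n)."""
    tq = t.ppf(1 - alpha / 2, n - 1)
    half = tq * s / math.sqrt(n)
    return xbar - half, xbar + half

lo90, hi90 = ci_unknown_variance(xbar, s, n, 0.10)
lo95, hi95 = ci_unknown_variance(xbar, s, n, 0.05)
print(round(lo90, 1), round(hi90, 1))   # slide: [220.1; 227.9]
print(round(lo95, 1), round(hi95, 1))   # slide: [219.3; 228.7]
```

Both t intervals are slightly wider than their known-variance counterparts, reflecting the extra uncertainty from estimating σ.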

Confidence interval for variance

Sample:
  X_i ~ N(μ, σ²),  i = 1, ..., n

Notation:
  s² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)²

  (n − 1)S²/σ² ~ χ²(n − 1)

  χ²_{α,n−1} = α fractile of χ²(n − 1)

(1 − α)·100% confidence interval for σ²:

  (n − 1)s²/χ²_{1−α/2,n−1} ≤ σ² ≤ (n − 1)s²/χ²_{α/2,n−1}

Varying chocolate bars

In a sample of 20 chocolate bars the amount of calories has been measured. We have:
- the sample standard deviation is 10 calories.

Calculate 90% and 95% confidence intervals for the variance.

Solutions:
  90% confidence interval: [19·10²/30.1 ; 19·10²/10.1] ≈ [63.0; 187.8]
  95% confidence interval: [19·10²/32.9 ; 19·10²/8.9] ≈ [57.8; 213.3]
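A Python/SciPy cross-check of the variance intervals, using the χ² fractiles instead of the rounded table values on the slide:

```python
from scipy.stats import chi2

# Chocolate-bar data: n = 20, sample variance s^2 = 10^2.
n, s2 = 20, 10.0 ** 2

def ci_variance(s2, n, alpha):
    """CI for sigma^2: (n-1)s^2 divided by the upper/lower chi^2 fractiles."""
    lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)
    hi = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)
    return lo, hi

lo90, hi90 = ci_variance(s2, n, 0.10)
lo95, hi95 = ci_variance(s2, n, 0.05)
print(round(lo90, 1), round(hi90, 1))   # slide: [63.0; 187.8]
print(round(lo95, 1), round(hi95, 1))   # slide: [57.8; 213.3]
```

Note the interval is not symmetric around s² = 100: the χ² distribution is skewed.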

Difference in means
Known variances

Two populations:
  X_{1,i} ~ N(μ₁, σ₁²)
  X_{2,i} ~ N(μ₂, σ₂²)

Two samples:
  x_{1,1}, x_{1,2}, ..., x_{1,n₁}
  x_{2,1}, x_{2,2}, ..., x_{2,n₂}

Estimate of μ₁ − μ₂:

  x̄₁ − x̄₂ = (1/n₁) Σ_{i=1}^{n₁} x_{1,i} − (1/n₂) Σ_{i=1}^{n₂} x_{2,i}

Confidence interval:

  (x̄₁ − x̄₂) + z_{α/2} √(σ₁²/n₁ + σ₂²/n₂) ≤ μ₁ − μ₂ ≤ (x̄₁ − x̄₂) + z_{1−α/2} √(σ₁²/n₁ + σ₂²/n₂)
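A sketch of the two-sample interval in Python; all the sample numbers below (means, standard deviations, sample sizes) are hypothetical, chosen only to exercise the formula:

```python
import math
from scipy.stats import norm

# Hypothetical two-sample data with *known* standard deviations.
xbar1, sigma1, n1 = 224.0, 10.0, 20
xbar2, sigma2, n2 = 218.0, 12.0, 25

alpha = 0.05
z = norm.ppf(1 - alpha / 2)
# Standard error of xbar1 - xbar2: sqrt(sigma1^2/n1 + sigma2^2/n2).
se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
diff = xbar1 - xbar2
print(round(diff - z * se, 2), round(diff + z * se, 2))
```

If the interval contains 0, the data are consistent with μ₁ = μ₂ at the chosen level.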

Test of two means
Unknown & equal variances

Degrees of freedom: ν = n₁ + n₂ − 2

Pooled variance estimate:

  s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)

Confidence interval:

  (x̄₁ − x̄₂) + t_{α/2,ν} s_p √(1/n₁ + 1/n₂) ≤ μ₁ − μ₂ ≤ (x̄₁ − x̄₂) + t_{1−α/2,ν} s_p √(1/n₁ + 1/n₂)
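The pooled interval, sketched in Python with hypothetical sample numbers (the formulas are the slide's; the data are not):

```python
import math
from scipy.stats import t

# Hypothetical two-sample data; variances unknown but assumed equal.
xbar1, s1, n1 = 224.0, 10.0, 20
xbar2, s2, n2 = 218.0, 11.0, 25

nu = n1 + n2 - 2                       # degrees of freedom
# Pooled variance: a weighted average of the two sample variances.
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / nu
se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
half = t.ppf(0.975, nu) * se
diff = xbar1 - xbar2
print(round(diff - half, 2), round(diff + half, 2))
```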

Test of two means
Unknown & unequal variances

Degrees of freedom:

  ν = (s₁²/n₁ + s₂²/n₂)² / ( (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) )

Confidence interval:

  (x̄₁ − x̄₂) + t_{α/2,ν} √(s₁²/n₁ + s₂²/n₂) ≤ μ₁ − μ₂ ≤ (x̄₁ − x̄₂) + t_{1−α/2,ν} √(s₁²/n₁ + s₂²/n₂)
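The degrees-of-freedom formula (Welch–Satterthwaite) typically gives a non-integer ν; a small Python sketch with hypothetical sample numbers:

```python
# Welch-Satterthwaite degrees of freedom for unequal variances.
# The sample standard deviations and sizes below are hypothetical.
s1, n1 = 10.0, 20
s2, n2 = 15.0, 25

a = s1 ** 2 / n1                       # variance of xbar1
b = s2 ** 2 / n2                       # variance of xbar2
nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
print(round(nu, 1))
```

SciPy's `scipy.stats.ttest_ind(..., equal_var=False)` uses this same ν internally for the Welch test.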

Likelihood approach: Motivation

- Flip coin 10 times.
- X = # heads ~ B(10, p)
- We observe 3 heads.
- What is p / which p explains our data best?

  p ↦ b(3; 10, p)

[Plot of the likelihood function p ↦ b(3; 10, p); omitted]

Likelihood approach: Motivation

- Flip coin 10 times.
- X = # heads ~ B(10, p)
- We observe 8 heads.
- What is p / which p explains our data best?

  p ↦ b(3; 10, p)
  p ↦ b(8; 10, p)

[Plot of the two likelihood functions; omitted]
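The curves on these slides can be reproduced numerically: evaluate the binomial pmf on a grid of p values and pick the maximiser. A Python/SciPy sketch:

```python
import numpy as np
from scipy.stats import binom

# Evaluate the two likelihood curves from the slides on a grid of
# p values and locate their maximisers (k/n in both cases).
p = np.linspace(0.001, 0.999, 999)
lik3 = binom.pmf(3, 10, p)    # 3 heads in 10 flips
lik8 = binom.pmf(8, 10, p)    # 8 heads in 10 flips

print(round(p[np.argmax(lik3)], 2))   # 0.3
print(round(p[np.argmax(lik8)], 2))   # 0.8
```

The maximiser is the observed frequency k/n in both cases, previewing the maximum likelihood estimate introduced below.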

Likelihood, cont'd

Keeping the observed frequency of heads fixed at 0.3 while the number of flips grows:

  p ↦ b(3; 10, p)
  p ↦ b(6; 20, p)
  p ↦ b(9; 30, p)
  p ↦ b(12; 40, p)
  p ↦ b(15; 50, p)
  p ↦ b(18; 60, p)

[Plots of the six likelihood functions: each peaks at p = 0.3 and narrows as n grows; omitted]

Likelihood function
The general approach

Joint density function of X₁, X₂, ..., X_n:

  f(x₁, x₂, ..., x_n; θ)

θ is the parameter (vector) of f = parameter of interest.

The likelihood function:

  L(θ; x₁, x₂, ..., x_n) = f(x₁, x₂, ..., x_n; θ)

The log-likelihood function:

  l(θ; x₁, x₂, ..., x_n) = log L(θ; x₁, x₂, ..., x_n)

Notice:
  Density:    (x₁, x₂, ..., x_n) ↦ f(x₁, x₂, ..., x_n; θ)  (θ fixed)
  Likelihood: θ ↦ f(x₁, x₂, ..., x_n; θ)  (data fixed)

Likelihood function

Maximum likelihood estimate (MLE):

  θ̂ = argmax_θ f(x₁, x₂, ..., x_n; θ)

- MLE is not necessarily unique.
- Exact optimization can be difficult.
- Numerical optimization can be
  - time consuming to run
  - time consuming to program

Easier with independent observations:

  f(x₁, x₂, ..., x_n; θ) = Π_{i=1}^n f(x_i; θ)
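One practical payoff of the factorisation: the log-likelihood becomes a sum, which is numerically much better behaved than a product of many small densities. A Python sketch with hypothetical data:

```python
import numpy as np
from scipy.stats import norm

# With independent observations the likelihood factorises, so the
# log-likelihood is a sum of log-densities.  Hypothetical data and
# a fixed parameter theta = (mu, sigma):
x = np.array([1.2, -0.7, 0.3, 2.1, -1.5])
mu, sigma = 0.0, 1.0

loglik_sum = norm.logpdf(x, loc=mu, scale=sigma).sum()
loglik_prod = np.log(norm.pdf(x, loc=mu, scale=sigma).prod())
print(np.isclose(loglik_sum, loglik_prod))
```

For large n the product underflows to 0 in floating point while the sum of logs stays finite, which is why numerical MLE always works on the log scale.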

Likelihood function: Example

- Independent observations: x₁, x₂, ..., x_n,  X_i ~ N(μ, σ²)
- Parameter vector: θ = (μ, σ²).
- Likelihood function:

    L(θ; x₁, x₂, ..., x_n) = Π_{i=1}^n f(x_i; θ)
                           = Π_{i=1}^n (1/√(2πσ²)) exp(−(x_i − μ)²/(2σ²))
                           = (2πσ²)^(−n/2) exp(−(1/(2σ²)) Σ_{i=1}^n (x_i − μ)²)

Log-likelihood function:

  l(θ; x₁, x₂, ..., x_n) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σ_{i=1}^n (x_i − μ)²

Maximum likelihood estimate:

  μ̂ = x̄,   σ̂² = (1/n) Σ_{i=1}^n (x_i − x̄)² ≠ s²
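The ≠ s² point is worth checking numerically: the MLE of the variance divides by n, the unbiased estimator s² by n − 1. A Python sketch with hypothetical data:

```python
import numpy as np

# The normal MLE of the variance uses 1/n, while s^2 uses 1/(n-1).
# Hypothetical data:
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).sum() / len(x)   # MLE, divides by n
s2 = ((x - mu_hat) ** 2).sum() / (len(x) - 1)     # unbiased, divides by n-1

print(mu_hat, sigma2_hat, s2)
print(np.isclose(sigma2_hat, np.var(x)))          # np.var defaults to 1/n
print(np.isclose(s2, np.var(x, ddof=1)))          # ddof=1 gives 1/(n-1)
```

The gap between the two estimates shrinks as n grows, but for small samples the MLE systematically underestimates σ².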

Good vs Best

MLE gives the best parameter within the chosen model class.
This does not guarantee that the model is good.

Fit with normal distribution: x̄ ≈ 0, s² ≈ 9.8

[Figure: histogram of the data with the fitted normal density; omitted]


Matlab

(1 − α)·100% confidence interval for the mean with known variance
(sigma denotes the known standard deviation):

  mean(x) + [-1 1] * norminv(1-alpha/2) * sigma / sqrt(length(x))

(1 − α)·100% confidence interval for the mean with unknown variance:

  n = length(x);
  mean(x) + [-1 1] * tinv(1-alpha/2, n-1) * std(x) / sqrt(n)

(1 − α)·100% confidence interval for the variance:

  n = length(x);
  (n-1)*std(x)^2 ./ chi2inv( [1-alpha/2 alpha/2], n-1 )
