
Plan for lecture 12

1. The distribution of the sample variance of independent,
   normally distributed random variables (Chap 8.3)
2. The t and F distributions (Chap 8.4, until Beta Distribution)

The distribution of the sample variance

Theorem
(Theorem 8.3.6) Let $X_1, \ldots, X_n$ be independent, $N(\mu, \sigma^2)$
distributed. Then:
1. $\bar{X}$ and $X_i - \bar{X}$ are independent.
2. $\bar{X}$ and $S^2$ are independent.
3. $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$.

We now prove Statement 3.
Recall
\[
\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}
\quad \text{and} \quad
\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n)
\]

Interpretation: if we replace $\mu$ by its estimator $\bar{X}$, we have to
hand in a degree of freedom.
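Statement 3 is easy to check numerically. A minimal Monte Carlo sketch (not from the lecture; the values of `mu`, `sigma`, `n`, and the repetition count are arbitrary choices): the scaled statistic should have the mean $n-1$ and variance $2(n-1)$ of a $\chi^2(n-1)$ variable.

```python
# Monte Carlo check of Theorem 8.3.6(3): for a normal sample,
# (n-1)S^2/sigma^2 should follow a chi-square(n-1) distribution.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 8, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)            # sample variance S^2 of each row
w = (n - 1) * s2 / sigma**2           # scaled statistic

# chi-square(n-1) has mean n-1 and variance 2(n-1)
print(w.mean())   # close to 7
print(w.var())    # close to 14
```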

The distribution of the sample variance $S^2$

Proof
We prove the theorem for $n = 2$; in other words, we show that
$\frac{S^2}{\sigma^2} \sim \chi^2(1)$.
\[
S^2 = \frac{1}{2-1} \sum_{i=1}^{2} \left( X_i - \bar{X} \right)^2
    = \frac{(X_1 - X_2)^2}{2}.
\]
$X_1, X_2$ are $N(\mu, \sigma^2)$ distributed.
Consequently $X_1 - X_2$ is normally distributed (as a linear
combination of normally distributed variables), with expected value
$E(X_1) - E(X_2) = 0$ and variance $V(X_1) + V(X_2) = 2\sigma^2$ ($X_1$ and
$X_2$ are independent, so $\mathrm{cov}(X_1, X_2) = 0$).

The distribution of the sample variance $S^2$

- $Z = \dfrac{X_1 - X_2}{\sqrt{2}\,\sigma}$ is standard normally distributed.
- $Z^2$ is $\chi^2(1)$ distributed.
- From $Z^2 = \dfrac{(2-1)S^2}{\sigma^2}$ it follows that $\dfrac{(2-1)S^2}{\sigma^2}$ is $\chi^2(1)$ distributed.

- In the same way one can prove that $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$.
- Idea: Write
  \[
  \frac{(n-1)S^2}{\sigma^2}
  = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 - n\,\frac{(\bar{X} - \mu)^2}{\sigma^2}
  \]
- $\dfrac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 \sim \chi^2(n)$ and $n\,\dfrac{(\bar{X} - \mu)^2}{\sigma^2} \sim \chi^2(1)$.
- Show (using the method of the moment generating function)
  that if $U, V$ are two independent variables such that
  $U + V \sim \chi^2(k + l)$ and $V \sim \chi^2(l)$, then $U \sim \chi^2(k)$.
- $S^2$ and $\bar{X}$ are independent variables.
  Then $\dfrac{(n-1)S^2}{\sigma^2}$ and $n\,\dfrac{(\bar{X} - \mu)^2}{\sigma^2}$ are also independent.
- This gives $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$.

The expected value of $S^2$

$E(S^2) = \sigma^2$.
Proof:
- The expected value of a $\chi^2(k)$ distributed variable is $k$, so
  \[
  E(S^2) = \frac{\sigma^2}{n-1}\, E\!\left( \frac{(n-1)S^2}{\sigma^2} \right)
         = \frac{\sigma^2}{n-1}\,(n-1) = \sigma^2
  \]

Thus, $S^2$ on average equals $\sigma^2$.

The variance of $S^2$

\[
\mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}
\]
Interpretation: The larger the value of $n$, the more the
distribution of $S^2$ concentrates around $\sigma^2$ (the variance of $S^2$
approaches 0).
Proof:
- The variance of a $\chi^2(k)$-distributed variable is $2k$, so
  \[
  \mathrm{Var}(S^2)
  = \left( \frac{\sigma^2}{n-1} \right)^2 \mathrm{Var}\!\left( \frac{(n-1)S^2}{\sigma^2} \right)
  = \left( \frac{\sigma^2}{n-1} \right)^2 2(n-1)
  = \frac{2\sigma^4}{n-1}
  \]
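Both moment results can be checked together by simulation. A small sketch, with `sigma`, `n`, and the repetition count chosen arbitrarily:

```python
# Numerical check: E(S^2) = sigma^2 and Var(S^2) = 2*sigma^4/(n-1)
# for the sample variance of a normal sample.
import numpy as np

rng = np.random.default_rng(1)
sigma, n, reps = 3.0, 10, 300_000

s2 = rng.normal(0.0, sigma, size=(reps, n)).var(axis=1, ddof=1)

print(s2.mean())   # close to sigma^2 = 9
print(s2.var())    # close to 2*sigma^4/(n-1) = 18
```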

The t-distribution

- Let $X_1, \ldots, X_n$ be a sample from the $N(\mu, \sigma^2)$.
- $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$
- Which distribution can we use when $\sigma$ is not known and is
  estimated by $S$? In other words, what is the distribution of
  the following random variable?
  \[
  \frac{\bar{X} - \mu}{S/\sqrt{n}}
  \]
- Observe that
  \[
  \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{S/\sigma}
  \]
- We know that $V = (n-1)S^2/\sigma^2$ is $\chi^2(n-1)$ distributed;
  $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ and $V = (n-1)S^2/\sigma^2$ are independent.
- Hence,
  \[
  \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{S/\sigma}
  = \frac{Z}{\sqrt{\frac{V}{n-1}}}
  \]

The t-distribution

- Let $Z$ be a $N(0, 1)$ distributed variable.
- Let $V$ be a $\chi^2(\nu)$ distributed variable.
- If $Z$ and $V$ are independent, then
  \[
  T = \frac{Z}{\sqrt{V/\nu}}
  \]
  is t-distributed with $\nu$ degrees of freedom.

The t-distribution

In our case, if $X_1, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$:
- $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ is $N(0, 1)$ distributed;
- $V = (n-1)S^2/\sigma^2$ is $\chi^2(n-1)$ distributed;
- $Z$ and $V$ are independent.
Thus,
\[
\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{Z}{\sqrt{V/\nu}}
\]
is t-distributed with $\nu = n - 1$ degrees of freedom.
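This result can be checked against a library implementation of the t-distribution. A simulation sketch; `mu`, `sigma`, `n`, and the evaluation points are arbitrary illustration values:

```python
# Check that the studentized mean sqrt(n)*(Xbar - mu)/S follows t(n-1):
# compare empirical and theoretical cdf values at a few points.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 1.0, 4.0, 6, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
t_stat = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

for q in (-2.0, 0.0, 1.5):
    emp = (t_stat <= q).mean()           # empirical P(T <= q)
    print(emp, stats.t.cdf(q, df=n - 1)) # the two should be close
```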

The t-distribution
What are the pdf and cdf of a t distribution?
- Let $T = Z/\sqrt{V/\nu}$.
- We want to calculate the cdf and pdf of a t-distributed
  random variable. It is sufficient to calculate
  $F_T(t) = P(T \le t)$.
\[
P(T \le t) = P\!\left(Z/\sqrt{V/\nu} \le t\right)
= \int_0^{+\infty} P\!\left(Z/\sqrt{V/\nu} \le t \mid V = a\right) f_V(a)\, da
= \int_0^{+\infty} P\!\left(Z \le t\sqrt{a/\nu} \mid V = a\right) f_V(a)\, da
\]
Because $Z$ and $V$ are independent,
\[
P(T \le t) = \int_0^{+\infty} P\!\left(Z \le t\sqrt{a/\nu}\right) f_V(a)\, da
\]

The t-distribution
- By differentiating over $t$ it follows (using the chain rule) that:
  \[
  f_T(t) = \int_0^{+\infty} \sqrt{a/\nu}\; \varphi\!\left(t\sqrt{a/\nu}\right) f_V(a)\, da,
  \]
  where $\varphi$ is the $N(0,1)$ density. Writing this out,
  \[
  f_T(t) = \int_0^{+\infty} \sqrt{\frac{a}{\nu}}\, \frac{1}{\sqrt{2\pi}}\, e^{-t^2 a/(2\nu)}
  \cdot \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, a^{\nu/2-1} e^{-a/2}\, da
  = C \int_0^{+\infty} a^{(\nu+1)/2 - 1}\, e^{-(1 + t^2/\nu)\,a/2}\, da
  \]
  with $C = \dfrac{1}{\sqrt{\nu\pi}\; 2^{(\nu+1)/2}\, \Gamma(\nu/2)}$.
- We recognize that the integrand is close to the density function
  of a Gamma distribution with $k = \frac{\nu+1}{2}$ and $\lambda = \frac{2}{1 + t^2/\nu}$.
- Normalizing such an integral with $\lambda^k\, \Gamma(k)$ will result in 1.
  Hence the integral equals $\lambda^k\, \Gamma(k)$.
  \[
  f_T(t) = C\, \lambda^k\, \Gamma(k)
  = C \left( \frac{2}{1 + t^2/\nu} \right)^{\frac{\nu+1}{2}} \Gamma\!\left( \frac{\nu+1}{2} \right)
  = C\, 2^{\frac{\nu+1}{2}}\, \Gamma\!\left( \frac{\nu+1}{2} \right) \left( 1 + \frac{t^2}{\nu} \right)^{-\frac{\nu+1}{2}}
  = \frac{1}{\sqrt{\nu\pi}}\, \frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)} \left( 1 + \frac{t^2}{\nu} \right)^{-\frac{\nu+1}{2}}
  \]

Properties of the t distribution

Let $T$ be a t-distributed variable with $\nu$ degrees of freedom.
- The probability density function of a t-distributed variable is
  given by
  \[
  f(t) = \frac{1}{\sqrt{\nu\pi}}\, \frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)}
  \left( 1 + \frac{t^2}{\nu} \right)^{-\frac{\nu+1}{2}}
  \]
- For $\nu > 1$, $E(T) = 0$.
- For $\nu > 2$, $\mathrm{Var}(T) = \dfrac{\nu}{\nu - 2}$.
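The derived density can be compared directly with a library implementation. A small sketch; the values of $\nu$ and the evaluation grid are arbitrary:

```python
# Check the closed-form t density
#   f(t) = Gamma((v+1)/2) / (sqrt(v*pi)*Gamma(v/2)) * (1 + t^2/v)^(-(v+1)/2)
# against scipy.stats.t.pdf.
import numpy as np
from scipy import stats
from scipy.special import gammaln

def t_pdf(t, nu):
    # log-gamma form avoids overflow for large nu
    logc = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(nu * np.pi)
    return np.exp(logc) * (1 + t**2 / nu) ** (-(nu + 1) / 2)

ts = np.linspace(-5, 5, 101)
for nu in (1, 3, 30):
    assert np.allclose(t_pdf(ts, nu), stats.t.pdf(ts, df=nu))
print("derived formula matches scipy.stats.t.pdf")
```

For $\nu = 1$ this reduces to the Cauchy density $1/(\pi(1+t^2))$, which has no expected value, matching the restriction $\nu > 1$ for $E(T)$.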

The F-distribution

- Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from $N(\mu_1, \sigma_1^2)$ ($\sigma_1$ is
  unknown).
- Let $X_{21}, \ldots, X_{2n_2}$ be a different random sample from
  $N(\mu_2, \sigma_2^2)$. ($\sigma_2$ is unknown.)
- Compare $\sigma_1^2$ and $\sigma_2^2$.
- We can use
  \[
  S_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} \left( X_{1i} - \bar{X}_1 \right)^2,
  \qquad
  S_2^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} \left( X_{2i} - \bar{X}_2 \right)^2
  \]
  to estimate $\sigma_1^2$ and $\sigma_2^2$, respectively.

The F distribution

- $V_1$ is a $\chi^2(\nu_1)$-distributed random variable,
- $V_2$ is a $\chi^2(\nu_2)$-distributed random variable,
- $V_1$ and $V_2$ are independent.
- The distribution of
  \[
  \frac{V_1/\nu_1}{V_2/\nu_2}
  \]
  is called the F-distribution with $\nu_1$ degrees of freedom in the
  numerator and $\nu_2$ degrees of freedom in the denominator and
  is denoted by $F(\nu_1, \nu_2)$.
- The probability density function is given by
  \[
  \frac{\Gamma((\nu_1 + \nu_2)/2)}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}
  \left( \frac{\nu_1}{\nu_2} \right)^{\nu_1/2}
  \frac{x^{\nu_1/2 - 1}}{\left( 1 + (\nu_1/\nu_2)\,x \right)^{(\nu_1 + \nu_2)/2}}
  \quad \text{for } x \ge 0.
  \]
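As with the t density, this formula can be checked against a library implementation; the degrees of freedom and the grid below are arbitrary illustration values:

```python
# Check the F density formula against scipy.stats.f.pdf.
import numpy as np
from scipy import stats
from scipy.special import gamma

def f_pdf(x, v1, v2):
    c = gamma((v1 + v2) / 2) / (gamma(v1 / 2) * gamma(v2 / 2))
    return (c * (v1 / v2) ** (v1 / 2) * x ** (v1 / 2 - 1)
            / (1 + (v1 / v2) * x) ** ((v1 + v2) / 2))

xs = np.linspace(0.1, 6, 60)
for v1, v2 in [(8, 4), (4, 8), (2, 10)]:
    assert np.allclose(f_pdf(xs, v1, v2), stats.f.pdf(xs, v1, v2))
print("formula matches scipy.stats.f.pdf")
```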

The F distribution

- $V_1 = \dfrac{(n_1 - 1)S_1^2}{\sigma_1^2}$ has a $\chi^2(n_1 - 1)$ distribution,
- $V_2 = \dfrac{(n_2 - 1)S_2^2}{\sigma_2^2}$ has a $\chi^2(n_2 - 1)$ distribution,
- $V_1$ and $V_2$ are independent.
This gives: the random variable
\[
F = \frac{V_1/(n_1 - 1)}{V_2/(n_2 - 1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}
\]
is $F(n_1 - 1, n_2 - 1)$ distributed.
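A quick simulation of this variance-ratio statistic; the sample sizes, standard deviations, and repetition count are arbitrary choices, and we compare only the empirical median with the theoretical one:

```python
# Simulate (S1^2/sigma1^2)/(S2^2/sigma2^2) for two independent normal
# samples and compare with the F(n1-1, n2-1) distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n1, n2, s1, s2, reps = 7, 11, 2.0, 5.0, 100_000

v1 = rng.normal(0, s1, size=(reps, n1)).var(axis=1, ddof=1) / s1**2
v2 = rng.normal(0, s2, size=(reps, n2)).var(axis=1, ddof=1) / s2**2
ratio = v1 / v2

# empirical vs. theoretical median of F(6, 10)
print(np.median(ratio), stats.f.median(n1 - 1, n2 - 1))
```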

Property of the F-distribution

If $X \sim F(\nu_1, \nu_2)$, then $Y = \frac{1}{X} \sim F(\nu_2, \nu_1)$.
Thus $P(Y \le y) = P\!\left(\frac{1}{X} \le y\right) = P\!\left(X \ge \frac{1}{y}\right) = 1 - P\!\left(X < \frac{1}{y}\right)$.

Looking up in the table on page 609

1. Assume that we want to find $b$ for which $P(X < b) = 0.9$, if
   $X \sim F(8, 4)$.
   We search in the column $\nu_1 = 8$ and the group for $\nu_2 = 4$, row 0.9,
   and get $b = 3.95$.
2. Assume that we want to find $b$ for which $P(X < b) = 0.1$, if
   $X \sim F(8, 4)$.
   0.1 is not tabulated, but we can find it as follows.
   The probability $P(X < b) = P\!\left(\frac{1}{X} > \frac{1}{b}\right) = 1 - P\!\left(Y < \frac{1}{b}\right)$ with
   $Y \sim F(4, 8)$. We therefore have to find $b$ such that
   $P\!\left(Y < \frac{1}{b}\right) = 0.9$ for $Y \sim F(4, 8)$. In the column $\nu_1 = 4$, group
   $\nu_2 = 8$, we find the value 2.81. Therefore $b = \frac{1}{2.81}$.
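The same lookups can be reproduced without a table: `scipy.stats.f.ppf(q, v1, v2)` returns the $b$ with $P(X < b) = q$, and the reciprocal trick above becomes a one-liner.

```python
# Reproduce the two table lookups with scipy's quantile function.
from scipy import stats

b1 = stats.f.ppf(0.9, 8, 4)    # example 1: about 3.95
b2 = stats.f.ppf(0.1, 8, 4)    # example 2: about 1/2.81

print(round(b1, 2))
print(round(1 / b2, 2))        # equals the F(4, 8) 0.9-quantile
```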

Exercise 8.15

1. Let $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$ and $Z_i \sim N(0, 1)$, $i = 1, \ldots, k$ be
independent random variables. Give the probability distribution of
the following variables:
(h) $\dfrac{Z_1}{Z_2}$
(e) $\dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma S_Z}$
(p) $\dfrac{(k-1) \sum_{i=1}^{n} (X_i - \bar{X})^2}{(n-1)\, \sigma^2 \sum_{i=1}^{k} (Z_i - \bar{Z})^2}$

(h)
- $Z_1 \sim N(0, 1)$ and $Z_2^2 \sim \chi^2(1)$ (see Theorem 8.3.5).
- $Z_1$ and $Z_2$ are independent (VERY IMPORTANT), so from
  the definition of the t-distribution it follows that
  \[
  \frac{Z_1}{\sqrt{Z_2^2}} \sim t(1),
  \]
  and by the symmetry of $Z_1$ the ratio $Z_1/Z_2$ has the same distribution as $Z_1/\sqrt{Z_2^2}$.

(e) The variable whose distribution we have to determine is defined
in terms of the variables $\bar{X}$ and $S_Z$, so we think of
distributions related to these variables and try to combine them.
- $X_i \sim N(\mu, \sigma^2)$, so $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ (see slides Lecture 10 or
  Theorem 8.3.1 in the textbook (+corollary)).
- $Z_i \sim N(0, 1)$, $i = 1, \ldots, k$, so $\dfrac{(k-1)S_Z^2}{1^2} \sim \chi^2(k-1)$ (see Lecture
  11 or Theorem 8.3.6 (c) in the textbook).
- Because the $X_i$ and $Z_i$ are mutually independent, $\bar{X}$ and $S_Z^2$ are
  also mutually independent.
- According to the definition of the t-distribution, it follows
  that
  \[
  \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{(k-1)S_Z^2/(k-1)}} \sim t(k-1).
  \]
- After a few calculations it follows that:
  \[
  \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma S_Z} \sim t(k-1).
  \]

(p)
- $\dfrac{(n-1)S_X^2}{\sigma^2} \sim \chi^2(n-1)$
- $\dfrac{(k-1)S_Z^2}{1^2} \sim \chi^2(k-1)$
- Because $X_i$, $i = 1, \ldots, n$ and $Z_i$, $i = 1, \ldots, k$ are mutually
  independent, $S_X$ and $S_Z$ are also mutually independent.
- From the definition of the F-distribution it follows that:
  \[
  \frac{\left( (n-1)S_X^2/\sigma^2 \right)/(n-1)}{(k-1)S_Z^2/(k-1)} \sim F(n-1, k-1).
  \]
- This is equivalent to:
  \[
  \frac{(k-1) \sum_{i=1}^{n} (X_i - \bar{X})^2}{(n-1)\, \sigma^2 \sum_{i=1}^{k} (Z_i - \bar{Z})^2} \sim F(n-1, k-1).
  \]

Exercise 2
2. Let $V$ be a $\chi^2(k)$ distributed random variable. Give an
approximation for the probability $P(V/k \le u)$ for $k$ large.
- We can view $V$ as the sum of $Y_1, Y_2, \ldots, Y_k$, with the $Y_i$ being
  independent, $\chi^2(1)$ distributed random variables.
- $V/k$ can be seen as the sample mean of the random sample
  $Y_1, \ldots, Y_k$ from a $\chi^2(1)$ distribution.
- The $\chi^2(1)$ distribution has expected value $\mu = 1$ and variance
  $\sigma^2 = 2$.
- From the CLT (variant for the sample mean) it follows that $V/k$ can
  be approximated by a normal distribution with expected value
  1 and variance $2/k$ (standard deviation $\sqrt{2/k}$), so
  $P(V/k \le u) \approx \Phi\!\left( \frac{u - 1}{\sqrt{2/k}} \right)$.
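The quality of this approximation can be inspected numerically; the values of `u` and `k` below are arbitrary illustration choices, and the error should shrink as $k$ grows:

```python
# Compare the exact probability P(V/k <= u) for V ~ chi2(k) with the
# CLT approximation Phi((u - 1) / sqrt(2/k)).
import numpy as np
from scipy import stats

u = 1.1
for k in (10, 100, 1000):
    exact = stats.chi2.cdf(u * k, df=k)
    approx = stats.norm.cdf((u - 1) / np.sqrt(2 / k))
    print(k, exact, approx)
```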

Exercise 3

3. Let $X_1, \ldots, X_n$ be a random sample from a normal distribution
with variance $\sigma^2$. Give an approximation for the distribution of $S^2$
for $n$ large.
- Since $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$, it follows from Exercise 2 that the
  distribution of $S^2/\sigma^2$ can be approximated by a normal
  distribution with expected value 1 and variance $2/(n-1)$.
- From $S^2 = \sigma^2 \cdot S^2/\sigma^2$ and $S^2/\sigma^2$ being approximately normally
  distributed, it follows that $S^2$ is approximately normally
  distributed with expected value $\sigma^2$ and variance $2\sigma^4/(n-1)$.

Exercise 4
The Rayleigh distribution with parameter 1 has the probability
density function given by
\[
f(x) = \begin{cases} x\, e^{-x^2/2} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}
\]
(a) Show that if $X$ is Rayleigh distributed with parameter 1, the
moment generating function of $Y = X^2$ is given by
$M_Y(t) = (1 - 2t)^{-1}$ for $t < \frac{1}{2}$.
(b) Use the moment generating function to derive $E(Y)$ and
$\mathrm{Var}(Y)$.
(c) Define $S = \sum_{i=1}^{n} X_i^2$. Show that for large $n$, the distribution of
the variable
\[
U = \sqrt{n}\left( \frac{S}{2n} - 1 \right)
\]
can be approximated by a standard normal distribution.

Solution. (a) Substituting $w = x^2$,
\[
M_Y(t) = E[e^{tX^2}] = \int_0^{\infty} e^{tx^2}\, x\, e^{-x^2/2}\, dx
= \frac{1}{2} \int_0^{\infty} e^{tw}\, e^{-w/2}\, dw
= (1 - 2t)^{-1}
\]
for $t < \frac{1}{2}$.
(b) Calculate $E(Y) = M_Y'(0) = 2$ and $E(Y^2) = M_Y''(0) = 8$. The
variance of $Y$ is thus equal to 4.
(c) From the Central Limit Theorem (variant with the sample mean),
the standard normal distribution is the limit distribution of
\[
\frac{S/n - 2}{2/\sqrt{n}}.
\]
It follows that, for large $n$,
\[
U = \sqrt{n}\left( \frac{S}{2n} - 1 \right) = \frac{S/n - 2}{2/\sqrt{n}}
\]
could also be approximated by the $N(0, 1)$ distribution.
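All three parts can be checked by simulation. A sketch using numpy's Rayleigh sampler (with `scale=1.0` it has exactly the density $x e^{-x^2/2}$ above); `n` and the repetition count are arbitrary:

```python
# Check Exercise 4: for X ~ Rayleigh(1), Y = X^2 should have mean 2 and
# variance 4, and U = sqrt(n)*(S/(2n) - 1) should be roughly N(0, 1).
import numpy as np

rng = np.random.default_rng(4)
n, reps = 400, 50_000

x = rng.rayleigh(scale=1.0, size=(reps, n))
y = x**2

print(y.mean())   # close to E(Y) = 2
print(y.var())    # close to Var(Y) = 4

s = y.sum(axis=1)
u = np.sqrt(n) * (s / (2 * n) - 1)
print(u.mean(), u.var())   # close to 0 and 1
```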
