
Lecture Notes on MS237

Mathematical statistics
Lecture notes by Janet Godolphin

2010


Contents

1 Introductory revision material
  1.1 Basic probability
    1.1.1 Terminology
    1.1.2 Probability axioms
    1.1.3 Conditional probability
    1.1.4 Self-study exercises
  1.2 Random variables and probability distributions
    1.2.1 Random variables
    1.2.2 Expectation
    1.2.3 Self-study exercises

2 Random variables and distributions
  2.1 Transformations
    2.1.1 Self-study exercises
  2.2 Some standard discrete distributions
    2.2.1 Binomial distribution
    2.2.2 Geometric distribution
    2.2.3 Poisson distribution
    2.2.4 Self-study exercises
  2.3 Some standard continuous distributions
    2.3.1 Uniform distribution
    2.3.2 Exponential distribution
    2.3.3 Pareto distribution
    2.3.4 Self-study exercises
  2.4 The normal (Gaussian) distribution
    2.4.1 Normal distribution
    2.4.2 Properties
    2.4.3 Self-study exercises
  2.5 Bivariate distributions
    2.5.1 Definitions and notation
    2.5.2 Marginal distributions
    2.5.3 Conditional distributions
    2.5.4 Covariance and correlation
    2.5.5 Self-study exercises
  2.6 Generating functions
    2.6.1 General
    2.6.2 Probability generating function
    2.6.3 Moment generating function
    2.6.4 Self-study exercises

3 Further distribution theory
  3.1 Multivariate distributions
    3.1.1 Definitions
    3.1.2 Mean and covariance matrix
    3.1.3 Properties
    3.1.4 Self-study exercises
  3.2 Transformations
    3.2.1 The univariate case
    3.2.2 The multivariate case
    3.2.3 Self-study exercises
  3.3 Moments, generating functions and inequalities
    3.3.1 Moment generating function
    3.3.2 Cumulant generating function
    3.3.3 Some useful inequalities
    3.3.4 Self-study exercises
  3.4 Some limit theorems
    3.4.1 Modes of convergence of random variables
    3.4.2 Limit theorems for sums of independent random variables
    3.4.3 Self-study exercises
  3.5 Further discrete distributions
    3.5.1 Negative binomial distribution
    3.5.2 Hypergeometric distribution
    3.5.3 Multinomial distribution
    3.5.4 Self-study exercises
  3.6 Further continuous distributions
    3.6.1 Gamma and beta functions
    3.6.2 Gamma distribution
    3.6.3 Beta distribution
    3.6.4 Self-study exercises

4 Normal and associated distributions
  4.1 The multivariate normal distribution
    4.1.1 Multivariate normal
    4.1.2 Properties
    4.1.3 Marginal and conditional distributions
    4.1.4 Self-study exercises
  4.2 The chi-square, t and F distributions
    4.2.1 Chi-square distribution
    4.2.2 Student's t distribution
    4.2.3 Variance ratio (F) distribution
  4.3 Normal theory tests and confidence intervals
    4.3.1 One-sample t-test
    4.3.2 Two samples
    4.3.3 k samples (One-way Anova)
    4.3.4 Normal linear regression

MS237 Mathematical Statistics


Level 2

Spring Semester

Credits 15

Course Lecturer in 2010


D. Terhesiu

email: d.terhesiu@surrey.ac.uk

Class Test
The Class Test will be held on Thursday 11th March (week 5), starting at 12.00.
Class tests will include questions of the following types:
examples and proofs previously worked in lectures,
questions from the self-study exercises,
previously unseen questions in a similar style.
The Class Test will comprise 15% of the overall assessment for the course.

Coursework
Distribution: Coursework will be distributed at 14.00 on Friday 26th March.
Collection: Coursework will be collected on Thursday 29th April in Room LTB.
The Coursework will comprise 10% of the overall assessment for the course.

Chapter 1
Chapter 1 contains and reviews prerequisite material from MS132. Due to time
constraints, students are expected to work through at least part of this material
independently at the start of the course.

Objectives and learning outcomes


This module provides theoretical background for many of the topics introduced
in MS132 and for some of the topics which will appear in subsequent statistics
modules.
At the end of the module, you should
(1) be familiar with the main results of statistical distribution theory;
(2) be able to apply this knowledge to suitable problems in statistics.

Examples, exercises, and problems


Blank spaces have been left in the notes at various positions. These are for additional material and worked examples presented in the lectures. Most chapters end

with a set of self-study exercises, which you should attempt in your own study
time in parallel with the lectures.
In addition, six exercise sheets will be distributed during the course. You will
be given a week to complete each sheet, which will then be marked by the lecturer
and returned with model solutions. It should be stressed that completion of these
exercise sheets is not compulsory but those students who complete the sheets do
give themselves a considerable advantage!

Selected texts
Freund, J. E. Mathematical Statistics with Applications, Pearson (2004)
Hogg, R. V. and Tanis, E. A. Probability and Statistical Inference, Prentice Hall
(1997)
Lindgren, B. W. Statistical Theory, Macmillan (1976)
Mood, A. M., Graybill, F. G. and Boes, D. C. Introduction to the Theory of Statistics, McGraw-Hill (1974)
Wackerly, D.D., Mendenhall, W., and Scheaffer, R.L. Mathematical Statistics with
Applications, Duxbury (2002)

Useful series
These series will be useful during the course:

  (1 − x)^{−1} = Σ_{k=0}^∞ x^k = 1 + x + x² + x³ + · · ·

  (1 − x)^{−2} = Σ_{k=0}^∞ (k + 1)x^k = 1 + 2x + 3x² + 4x³ + · · ·

  e^x = Σ_{k=0}^∞ x^k/k! = 1 + x + x²/2! + x³/3! + · · ·

Chapter 1
Introductory revision material
This chapter contains and reviews prerequisite material from MS132. If necessary
you should review your notes for that module for additional details. Several examples, together with numerical answers, are included in this chapter. It is strongly
recommended that you work independently through these examples in order to
consolidate your understanding of the material.

1.1 Basic probability

Probability or chance can be measured on a scale which runs from zero, which
represents impossibility, to one, which represents certainty.

1.1.1 Terminology

A sample space, Ω, is the set of all possible outcomes of an experiment. An event E is a subset of Ω.
Example 1 Experiment: roll a die twice. Possible events are E1 = {1st face is a 6}, E2 =
{sum of faces = 3}, E3 = {sum of faces is odd}, E4 = {1st face - 2nd face =
3}. Identify the sample space and the above events. Obtain their probabilities
when the die is fair.

Answer: The sample space consists of the 36 ordered pairs (first roll, second roll):

  (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
  (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
  (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
  (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
  (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
  (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

p(E1) = 1/6;  p(E2) = 1/18;  p(E3) = 1/2;  p(E4) = 1/12.

Combinations of events
Given events A and B, further events can be identified as follows.
The complement of any event A, written Ā or A^c, means that A does not occur.
The union of any two events A and B, written A ∪ B, means that A or B or both occur.
The intersection of A and B, written A ∩ B, means that both A and B occur.
Venn diagrams are useful in this context.

1.1.2 Probability axioms

Let F be the class of all events in Ω. A probability (measure) P on (Ω, F) is a real-valued function satisfying the following three axioms:
1. P(E) ≥ 0 for every E ∈ F
2. P(Ω) = 1
3. Suppose the events E1 and E2 are mutually exclusive (that is, E1 ∩ E2 = ∅). Then
   P(E1 ∪ E2) = P(E1) + P(E2)
Some consequences:
(i) P(E^c) = 1 − P(E) (so in particular P(∅) = 0)
(ii) For any two events E1 and E2 we have the addition rule
   P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)

Example 1: (continued)
Obtain P(E1 ∩ E2), P(E1 ∪ E2), P(E1 ∩ E3) and P(E1 ∪ E3).
Answer: P(E1 ∩ E2) = P(∅) = 0
P(E1 ∪ E2) = P(E1) + P(E2) = 1/6 + 1/18 = 2/9
P(E1 ∩ E3) = P{(6,1), (6,3), (6,5)} = 3/36 = 1/12
P(E1 ∪ E3) = P(E1) + P(E3) − P(E1 ∩ E3) = 1/6 + 1/2 − 1/12 = 7/12

[Notes on axioms:
(1) In order to cope with infinite sequences of events, it is necessary to strengthen axiom 3 to
3'. P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) for any sequence (E1, E2, . . .) of mutually exclusive events.
(2) When Ω is uncountably infinite, in order to make the theory rigorous it is usually necessary to restrict the class of events F to which probabilities are assigned.]

1.1.3 Conditional probability

Suppose P(E2) ≠ 0. The conditional probability of the event E1 given E2 is defined as
  P(E1 | E2) = P(E1 ∩ E2)/P(E2).
The conditional probability is undefined if P(E2) = 0. The conditional probability formula above yields the multiplication rule:
  P(E1 ∩ E2) = P(E1)P(E2 | E1) = P(E2)P(E1 | E2)

Independence
Events E1 and E2 are said to be independent if
  P(E1 ∩ E2) = P(E1)P(E2).
Note that this implies that P(E1 | E2) = P(E1) and P(E2 | E1) = P(E2). Thus knowledge of the occurrence of one of the events does not affect the likelihood of occurrence of the other.
Events E1, . . . , Ek are pairwise independent if P(Ei ∩ Ej) = P(Ei)P(Ej) for all i ≠ j. They are mutually independent if, for all subsets, P(∩_j Ej) = ∏_j P(Ej).
Clearly, mutual independence ⇒ pairwise independence, but the converse is false (see question 4 of the self-study exercises).

Example 1 (continued): Find P(E1 | E2) and P(E1 | E3). Are E1, E2 independent?
Answer: P(E1 | E2) = P(E1 ∩ E2)/P(E2) = 0, P(E1 | E3) = P(E1 ∩ E3)/P(E3) = (1/12)/(1/2) = 1/6.
P(E1)P(E2) ≠ 0, so P(E1 ∩ E2) ≠ P(E1)P(E2) and thus E1 and E2 are not independent.

Law of total probability (partition law)

Suppose that B1, . . . , Bk are mutually exclusive and exhaustive events (i.e. Bi ∩ Bj = ∅ for all i ≠ j and ∪_i Bi = Ω).
Let A be any event. Then
  P(A) = Σ_{j=1}^k P(A | Bj)P(Bj)

Bayes' Rule
Suppose that events B1, . . . , Bk are mutually exclusive and exhaustive and let A be any event. Then
  P(Bj | A) = P(A | Bj)P(Bj)/P(A) = P(A | Bj)P(Bj) / Σ_i P(A | Bi)P(Bi)

Example 2: (Cancer diagnosis) A screening programme for a certain type of cancer has reliabilities P(A | D) = 0.98, P(A | D^c) = 0.05, where D is the event "disease is present" and A is the event "test gives a positive result". It is known that 1 in 10,000 of the population has the disease. Suppose that an individual's test result is positive. What is the probability that that person has the disease?
Answer: We require P(D | A). First find P(A).
P(A) = P(A | D)P(D) + P(A | D^c)P(D^c) = 0.98 × 0.0001 + 0.05 × 0.9999 = 0.050093.
By Bayes' rule: P(D | A) = P(A | D)P(D)/P(A) = (0.0001 × 0.98)/0.050093 = 0.002.

The person is still very unlikely to have the disease even though the test is positive.
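A quick numerical check of this calculation, as a sketch in Python (the function name is purely illustrative):

    # Screening example: prior P(D) = 1/10000, sensitivity P(A|D) = 0.98,
    # false positive rate P(A|not D) = 0.05.
    def posterior(prior, sens, fpr):
        """Return P(D|A) via the partition law and Bayes' rule."""
        p_a = sens * prior + fpr * (1 - prior)   # P(A)
        return sens * prior / p_a

    print(posterior(0.0001, 0.98, 0.05))   # about 0.00196, i.e. roughly 0.002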
Example 3: (Bertrand's Box Paradox) Three indistinguishable boxes contain black and white beads as shown: [ww], [wb], [bb]. A box is chosen at random and a bead chosen at random from the selected box. What is the probability that the [wb] box was chosen, given that the selected bead was white?
Answer: Let E = "chose the [wb] box" and W = "selected bead is white". By the partition law: P(W) = 1 × 1/3 + 1/2 × 1/3 + 0 × 1/3 = 1/2. Now using Bayes' rule
  P(E | W) = P(E)P(W | E)/P(W) = (1/3 × 1/2)/(1/2) = 1/3
(i.e. even though a bead from the selected box has been seen, the probability that the box is [wb] is still 1/3).

1.1.4 Self-study exercises

1. Consider families of three children, a typical outcome being bbg (boy-boy-girl in birth order) with probability 1/8. Find the probabilities of
(i) 2 boys and 1 girl (any order),
(ii) at least one boy,
(iii) consecutive children of different sexes.
Answer: (i) 3/8; (ii) 7/8; (iii) 1/4.
2. Use pA = P(A), pB = P(B) and pAB = P(A ∩ B) to obtain expressions for:
(a) P(A^c ∪ B^c),
(b) P(A^c ∩ B),
(c) P(A^c ∪ B),
(d) P(A^c ∩ B^c),
(e) P((A ∩ B^c) ∪ (A^c ∩ B)).
Describe each event in words. (Use a Venn diagram.)
Answer: (a) 1 − pAB; (b) pB − pAB; (c) 1 − pA + pAB; (d) 1 − pA − pB + pAB; (e) pA + pB − 2pAB.
3. (i) Express P(E1 ∪ E2 ∪ E3) in terms of the probabilities of E1, E2, E3 and their intersections only. Illustrate with a sketch.
(ii) Three types of fault can occur which lead to the rejection of a certain manufactured item. The probabilities of each of these faults (A, B and C) occurring are 0.1, 0.05 and 0.04 respectively. The three faults are known to be interrelated; the probability that A & B both occur is 0.04, A & C 0.02, and B & C 0.02. The probability that all three faults occur together is 0.01. What percentage of items are rejected?
Answer: (i) P(E1) + P(E2) + P(E3) − P(E1 ∩ E2) − P(E1 ∩ E3) − P(E2 ∩ E3) + P(E1 ∩ E2 ∩ E3)
(ii) P(A ∪ B ∪ C) = 0.1 + 0.05 + 0.04 − (0.04 + 0.02 + 0.02) + 0.01 = 0.12, so 12% of items are rejected.
4. Two fair dice are rolled: 36 possible outcomes, each with probability 1/36. Let E1 = {odd face 1st}, E2 = {odd face 2nd}, E3 = {one odd, one even}, so P(E1) = 1/2, P(E2) = 1/2, P(E3) = 1/2. Show that E1, E2, E3 are pairwise independent, but not mutually independent.
Answer: P(E2 | E1) = 1/2 = P(E2), P(E3 | E1) = 1/2 = P(E3), P(E3 | E2) = 1/2 = P(E3), so E1, E2, E3 are pairwise independent. But P(E1 ∩ E2 ∩ E3) = 0 ≠ P(E1)P(E2)P(E3), so E1, E2, E3 are not mutually independent.
5. An engineering company uses a selling aptitude test to aid it in the selection of its sales force. Past experience has shown that only 65% of all persons applying for a sales position achieved a classification of "satisfactory" in actual selling, and of these 80% had passed the aptitude test. Only 30% of the unsatisfactory persons had passed the test.
What is the probability that a candidate would be a satisfactory salesperson given that they had passed the aptitude test?
Answer: Let A = "pass aptitude test", S = "satisfactory". P(S) = 0.65, P(A | S) = 0.8, P(A | S^c) = 0.3. Therefore P(A) = (0.65 × 0.8) + (0.35 × 0.3) = 0.625, so P(S | A) = P(S)P(A | S)/P(A) = (0.65 × 0.8)/0.625 = 0.832.

1.2 Random variables and probability distributions

1.2.1 Random variables

A random variable X is a real-valued function on the sample space Ω; that is, X : Ω → R. If P is a probability measure on (Ω, F) then the induced probability measure on R is called the probability distribution of X.
A discrete random variable X takes values x1, x2, . . . with probabilities p(x1), p(x2), . . ., where p(x) = pr(X = x) = P({ω : X(ω) = x}) is the probability mass function (pmf) of X. (E.g. X = place of horse in race, grade of egg.)
Example 4: (i) Toss a coin twice: outcomes HH, HT, TH, TT. The random variable X = number of heads takes values 0, 1, 2.
(ii) Roll two dice: X = total score. Probabilities for X are P(X = 2) = 1/36, P(X = 3) = 2/36, P(X = 4) = 3/36, etc.
Example 5: X takes values 1, 2, 3, 4, 5 with probabilities k, 2k, 3k, 4k, 5k. Calculate k and P(2 ≤ X ≤ 4).
Answer: 1 = Σ_{x=1}^5 P(x) = k(1 + 2 + 3 + 4 + 5) = 15k, so k = 1/15.
P(2 ≤ X ≤ 4) = P(2) + P(3) + P(4) = 2/15 + 3/15 + 4/15 = 3/5.

A continuous random variable X takes values over an interval. E.g. X = time over racecourse, weight of egg. Its probability density function (pdf) f(x) is defined by
  pr(a < X < b) = ∫_a^b f(x) dx.
Note that f(x) ≥ 0 for all x, and ∫_{−∞}^{∞} f(x) dx = 1.

Example 6: Let f(x) = k(1 − x²) on (−1, 1). Calculate k and pr(|X| > 1/2).
Answer: 1 = ∫_{−1}^{1} f(x) dx = ∫_{−1}^{1} k(1 − x²) dx = k[x − x³/3]_{−1}^{1} = 4k/3, so k = 3/4.
P(|X| > 1/2) = 1 − P(−1/2 ≤ X ≤ 1/2) = 1 − 2∫_0^{1/2} k(1 − x²) dx = 1 − 11k/12 = 5/16.

A mixed discrete/continuous random variable is such that the probability is shared between discrete and continuous components, with Σ p(x) + ∫ f(x) dx = 1, e.g. rainfall on a given day, waiting time in a queue, flow in a pipe, contents of a reservoir.
The distribution function F of the random variable X is defined as
  F(x) = pr(X ≤ x) = P({ω : X(ω) ≤ x}).
Thus F(−∞) = 0, F(∞) = 1, F is monotone increasing, and pr(a < X ≤ b) = F(b) − F(a).
Discrete case: F(x) = Σ_{u ≤ x} p(u)
Continuous case: F(x) = ∫_{−∞}^{x} f(u) du and F′(x) = f(x).

1.2.2 Expectation

The expectation (or expected value or mean) of the random variable X is defined as
  μ = E(X) = Σ x p(x)       (X discrete)
           = ∫ x f(x) dx    (X continuous)
The variance of X is σ² = Var(X) = E{(X − μ)²}. Equivalently σ² = E(X²) − {E(X)}² (exercise: prove).
σ is called the standard deviation.
Functions of X:
(i) E{h(X)} = Σ h(x) p(x) (X discrete) or ∫ h(x) f(x) dx (X continuous)
(ii) E(aX + b) = aE(X) + b, Var(aX + b) = a²Var(X).

Proof (for discrete X)
(i) h(X) takes values h(x1), h(x2), . . . with probabilities p(x1), p(x2), . . ., so, by definition, E{h(X)} = h(x1)p(x1) + h(x2)p(x2) + · · · = Σ h(x)p(x).
(ii) E[aX + b] = Σ (ax + b)P(x) = a Σ x P(x) + b Σ P(x) = aE[X] + b
Var[aX + b] = E[{(aX + b) − E[aX + b]}²] = E[{aX + b − aE[X] − b}²] = E[a²(X − E[X])²] = a²Var[X]
Example 7: X = 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
Find E(X), E(X − 1), E(X²) and Var(X).
Answer: E[X] = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1
E[X − 1] = E[X] − 1 = 0, E[X²] = 0² × 1/4 + 1² × 1/2 + 2² × 1/4 = 3/2
Var[X] = E[X²] − E[X]² = 1/2.


Example 8: f(x) = k(1 + x)^{−4} on (0, ∞). Find k and hence obtain E(X), E{(1 + X)^{−1}}, E(X²) and Var(X).
Answer: 1 = k ∫_0^∞ (1 + x)^{−4} dx = k[−(1/3)(1 + x)^{−3}]_0^∞ = k/3, so k = 3.
E[X] = 3 ∫_0^∞ x(1 + x)^{−4} dx = 3 ∫_1^∞ (u − 1)u^{−4} du = 3[−(1/2)u^{−2} + (1/3)u^{−3}]_1^∞ = 3(1/2 − 1/3) = 1/2
E[(1 + X)^{−1}] = 3 ∫_0^∞ (1 + x)^{−5} dx = 3[−(1/4)(1 + x)^{−4}]_0^∞ = 3/4
E[X²] = 3 ∫_0^∞ x²(1 + x)^{−4} dx = 3 ∫_1^∞ (u − 1)²u^{−4} du = 3[−u^{−1} + u^{−2} − (1/3)u^{−3}]_1^∞ = 1
Var[X] = E[X²] − E[X]² = 3/4.


1.2.3 Self-study exercises

1. X takes values 0, 1, 2, 3 with probabilities 1/4, 1/5, 3/10, 1/4. Compute (as fractions) E(X), E(2X + 3), Var(X) and Var(2X + 3).
Answer: E(X) = 31/20, E(2X + 3) = 2E(X) + 3 = 61/10, E(X²) = 73/20, so Var(X) = E(X²) − E(X)² = 499/400, Var(2X + 3) = 4Var(X) = 499/100.

2. The random variable X has density function f(x) = kx(1 − x) on (0, 1), f(x) = 0 elsewhere. Calculate k and sketch f(x). Compute the mean and variance of X, and pr(0.3 ≤ X ≤ 0.6).
Answer: k = 6, E(X) = 1/2, Var(X) = 1/20, pr(0.3 ≤ X ≤ 0.6) = 0.432.

Chapter 2
Random variables and distributions
2.1 Transformations

Suppose that X has distribution function FX(x) and that the distribution function FY(y) of Y = h(X) is required, where h is a strictly increasing function. Then
  FY(y) = pr(Y ≤ y) = pr(h(X) ≤ y) = pr(X ≤ x) = FX(x)
where x ≡ x(y) = h^{−1}(y). If X is continuous and h is differentiable, then it follows that Y has density
  fY(y) = dFY(y)/dy = dFX(x)/dy = fX(x) dx/dy.
On the other hand, if h is strictly decreasing then
  FY(y) = pr(Y ≤ y) = pr(h(X) ≤ y) = pr(X ≥ x) = 1 − FX(x)
which yields fY(y) = −fX(x)(dx/dy). Both formulae are covered by
  fY(y) = fX(x) |dx/dy|.

Example 9: Suppose that X has pdf fX(x) = 2e^{−2x} on (0, ∞). Obtain the pdf of Y = log X.
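A sketch of the working (the worked version is given in lectures): h(x) = log x is strictly increasing with inverse x = e^y, so dx/dy = e^y and
  fY(y) = fX(e^y) e^y = 2 e^y e^{−2e^y},   −∞ < y < ∞.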

Probability integral transform. Let X be a continuous random variable with distribution function F(x). Then Y = F(X) is uniformly distributed on (0, 1).
Proof. First note that 0 ≤ Y ≤ 1. Let 0 ≤ y ≤ 1; then
  pr(Y ≤ y) = pr(F(X) ≤ y) = pr(X ≤ F^{−1}(y)) = F(F^{−1}(y)) = y,
so Y has pdf f(y) = 1 on (0, 1) (by differentiation), which is the density of the uniform distribution on (0, 1).
This result has an important application to the simulation of random variables:
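For example, if U is uniform on (0, 1) and F is a continuous distribution function then X = F^{−1}(U) has distribution function F, so uniform random numbers can be turned into samples from F. A minimal sketch in Python (NumPy assumed available), inverting the exponential cdf F(x) = 1 − e^{−2x}:

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 2.0
    u = rng.uniform(size=100_000)      # U ~ uniform(0, 1)
    x = -np.log(1 - u) / lam           # F^{-1}(u) for the exponential(lam) cdf

    # The sample mean and variance should be close to 1/lam and 1/lam^2.
    print(x.mean(), x.var())           # roughly 0.5 and 0.25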

2.1.1 Self-study exercises


1. X takes values 1, 2, 3, 4 with probabilities 1/10, 1/5, 3/10, 2/5 and Y = (X − 2)².
(i) Find E(Y) and Var(Y) using the formula for E{h(X)}.
(ii) Calculate the pmf of Y and use it to calculate E(Y) and Var(Y) directly.

2. The random variable X has pdf f(x) = 1/3, x = 1, 2, 3, zero elsewhere. Find the pdf of Y = 2X + 1.
3. The random variable X has pdf f(x) = e^{−x} on (0, ∞). Obtain the pdf of Y = e^{−X}.
4. Let X have the pdf f(x) = (1/2)^x, x = 1, 2, 3, . . . , zero elsewhere. Find the pdf of Y = X³.

2.2 Some standard discrete distributions


2.2.1 Binomial distribution

Consider a sequence of independent trials in each of which there are only two possible results, success, with probability θ, or failure, with probability 1 − θ (independent Bernoulli trials).
Outcomes can be represented as binary sequences, with 1 for success and 0 for failure, e.g. 110001 has probability θθ(1 − θ)(1 − θ)(1 − θ)θ, since the trials are independent.
Let the random variable X be the number of successes in n trials, with n fixed. The probability of a particular sequence of r 1s and (n − r) 0s is θ^r(1 − θ)^{n−r}, and the event {X = r} contains C(n, r) such sequences. Hence
  p(r) = pr(X = r) = C(n, r) θ^r (1 − θ)^{n−r},   r = 0, 1, . . . , n.
This is the pmf of the binomial (n, θ) distribution. The name comes from the binomial theorem
  {θ + (1 − θ)}^n = Σ_{r=0}^n C(n, r) θ^r (1 − θ)^{n−r},
from which Σ_r p(r) = 1 follows.
The mean is μ = nθ:

The variance is σ² = nθ(1 − θ) (see exercise 3).
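The mean calculation is left blank above for the lectures; one standard sketch (not necessarily the derivation given in class) writes X as a sum of indicators of success on each trial:
  X = I1 + · · · + In with pr(Ii = 1) = θ, so μ = E(X) = Σ_{i=1}^n E(Ii) = nθ.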


Example 10: A biased coin with pr(head) = 2/3 is tossed five times. Calculate
p(r).
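A quick numerical sketch of Example 10, using only the Python standard library:

    from math import comb

    theta, n = 2/3, 5
    for r in range(n + 1):
        # binomial(n, theta) pmf: C(n, r) theta^r (1 - theta)^(n - r)
        p = comb(n, r) * theta**r * (1 - theta)**(n - r)
        print(r, round(p, 4))
    # p(0..5) is approximately 0.0041, 0.0412, 0.1646, 0.3292, 0.3292, 0.1317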

2.2.2 Geometric distribution

Suppose now that, instead of a fixed number of Bernoulli trials, one continues until a success is achieved, so that the number of trials, N, is now a random variable. Then N takes the value n if and only if the previous (n − 1) trials result in failures and the nth trial results in a success. Thus
  p(n) = pr(N = n) = (1 − θ)^{n−1} θ,   n = 1, 2, . . . .
This is the pmf of the geometric (θ) distribution: the probabilities are in geometric progression. Note that the sum of the probabilities over n = 1, 2, . . . is 1.
The mean is μ = 1/θ:

The variance is σ² = (1 − θ)/θ² (see exercise 4).

E.g. Toss a biased coin with pr(head) = 2/3. Then, on average, it takes three tosses to get a tail.
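The mean calculation left blank above can be sketched with the series (1 − x)^{−2} = Σ (k + 1)x^k from the Useful series list:
  E(N) = Σ_{n=1}^∞ n θ(1 − θ)^{n−1} = θ Σ_{k=0}^∞ (k + 1)(1 − θ)^k = θ{1 − (1 − θ)}^{−2} = 1/θ.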

2.2.3 Poisson distribution

The pmf of the Poisson (λ) distribution is defined as
  p(r) = e^{−λ} λ^r / r!,   r = 0, 1, 2, . . . ,
where λ > 0. Note that the sum of the probabilities over r = 0, 1, 2, . . . is 1 (exponential series).
The mean is μ = λ:

The variance is σ² = λ (see exercise 6).

The Poisson distribution arises in various contexts, one being the limit of a binomial (n, θ) as n → ∞ and θ → 0 with nθ = λ fixed.
Example 11: (Random events in time.) Cars are recorded as they pass a checkpoint. The probability that a car is level with the checkpoint at any given instant is very small, but the number n of such instants in a given time period is large. Hence Xt, the number of cars passing the checkpoint during a time interval of t minutes, can be modelled as Poisson with mean proportional to t. For example, if the average rate is two cars per minute, find the probability of exactly 3 cars in 5 minutes.
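A sketch of the calculation (the worked version is given in lectures): with a rate of two cars per minute, X5 is Poisson with mean λ = 2 × 5 = 10, so
  pr(X5 = 3) = e^{−10} 10³ / 3! ≈ 0.0076.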
minutes.

2.2.4 Self-study exercises

1. In a large consignment of widgets 5% are defective. What is the probability of getting one or two defectives in a four-pack?
2. X is binomial with mean 2 and variance 1. Compute pr(X ≥ 1).
3. Derive the variance of the binomial (n, θ) distribution. [Hint: find E{X(X − 1)}.]
4. Derive the variance of the geometric (θ) distribution. [Hint: find E{X(X − 1)}.]
5. A leaflet contains one thousand words and the probability that any one word contains a misprint is 0.005. Use the Poisson distribution to estimate the probability of 2 or fewer misprints.
6. Derive the variance of the Poisson (λ) distribution. [Hint: find E{X(X − 1)}.]

2.3 Some standard continuous distributions

2.3.1 Uniform distribution


The pdf of the uniform (, ) distribution is
f (x) = ( )1 , < x < .
The mean is = ( + )/2:

The variance is 2 = ( )2 /12 (see exercise 1).


Application. Simulation of continuous random variables via the probability integral transform: see Section 2.1.

2.3.2 Exponential distribution

The pdf of the exponential (λ) distribution is
  f(x) = λ e^{−λx},   x > 0,
where λ > 0. The distribution function is F(x) = ∫_0^x λ e^{−λu} du = 1 − e^{−λx} (verify).
The mean is μ = 1/λ:

The variance is σ² = 1/λ² (see exercise 4).


Lack of memory property.
pr(X > a + b|X > a) = pr(X > b)
Proof:
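A short sketch of the argument (the lectures may set it out differently): since {X > a + b} ⊆ {X > a},
  pr(X > a + b | X > a) = pr(X > a + b)/pr(X > a) = e^{−λ(a+b)}/e^{−λa} = e^{−λb} = pr(X > b).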

For example, if the lifetime of a component is exponentially distributed, then the


fact that it has lasted for 100 hours does not affect its chances of failing during the
next 100 hours. That is, the component is not subject to ageing.
Application to random events in time.
Example: cars passing a checkpoint. The distribution of the waiting time, T, for the first event can be obtained as follows:
  pr(T > t) = pr(N_t = 0) = e^{−λt},
since N_t, the number of events occurring during the time interval (0, t), has a Poisson distribution with mean λt. Hence T has distribution function F(t) = 1 − e^{−λt}, that of the exponential (λ) distribution.

2.3.3 Pareto distribution

The Pareto (α, β) distribution has pdf
  f(x) = (α/β)(1 + x/β)^{−(α+1)},   x > 0,
where α > 0 and β > 0. The distribution function is F(x) = 1 − (1 + x/β)^{−α} (verify).
The mean is μ = β/(α − 1) for α > 1:

The variance is σ² = αβ²/{(α − 1)²(α − 2)} for α > 2.

2.3.4 Self-study exercises

1. Obtain the variance of the uniform (α, β) distribution.
2. The lifetime of a valve has an exponential distribution with mean 350 hours. What proportion of valves will last 400 hours or longer? For how many hours should the valves be guaranteed so that only 1% are returned under guarantee?
3. A machine suffers random breakdowns at a rate of three per day. Given that it is functioning at 10am, what is the probability that
(i) no breakdown occurs before noon?
(ii) the first breakdown occurs between 12pm and 1pm?
4. Obtain the variance of the exponential (λ) distribution.
5. The random variable X has the Pareto distribution with α = 3, β = 1. Find the probability that X exceeds μ + 2σ, where μ, σ are respectively the mean and standard deviation of X.


2.4 The normal (Gaussian) distribution

2.4.1 Normal distribution

The normal distribution is the most important distribution in Statistics, for both theoretical and practical reasons. Its pdf is
  f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)},   −∞ < x < ∞.
The parameters μ and σ² are the mean and variance respectively. The distribution is denoted by N(μ, σ²).
Mean:

The importance of the normal distribution follows from its use as an approximation in various statistical methods (a consequence of the Central Limit Theorem: see Section 3.4.2), its convenience for theoretical manipulation, and its application to describe observed data.

Standard normal distribution
The standard normal distribution is N(0, 1), for which the distribution function has the special notation Φ(x). Thus
  Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−u²/2} du.
The Φ function is tabulated widely (e.g. New Cambridge Statistical Tables). Useful values are Φ(1.64) = 0.95, Φ(1.96) = 0.975.
Example 12: Suppose that X is N(0, 1) and Y is N(2, 4). Use tables to calculate pr(X < 1), pr(X < −1), pr(−1.5 < X < 0.5), pr(Y < 1) and pr(Y² > 5Y − 6).
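The example is intended for the normal tables, but it is easy to check numerically; a sketch in Python assuming SciPy is available (note that scipy.stats.norm is parameterised by the standard deviation, so N(2, 4) has scale 2):

    from scipy.stats import norm

    X = norm(0, 1)       # N(0, 1)
    Y = norm(2, 2)       # N(2, 4): standard deviation 2

    print(X.cdf(1))                   # pr(X < 1)         ~ 0.841
    print(X.cdf(-1))                  # pr(X < -1)        ~ 0.159
    print(X.cdf(0.5) - X.cdf(-1.5))   # pr(-1.5 < X < 0.5) ~ 0.625
    print(Y.cdf(1))                   # pr(Y < 1)         ~ 0.309
    # Y^2 > 5Y - 6  <=>  (Y - 2)(Y - 3) > 0  <=>  Y < 2 or Y > 3
    print(Y.cdf(2) + Y.sf(3))         # ~ 0.809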


2.4.2 Properties
(i) If X is N(μ, σ²) then aX + b is N(aμ + b, a²σ²).
In particular, the standardized variate (X − μ)/σ is N(0, 1).
(ii) If X1 is N(μ1, σ1²), X2 is N(μ2, σ2²) and X1 and X2 are independent, then X1 + X2 is N(μ1 + μ2, σ1² + σ2²).
[Hence, from property (i), the distribution of X1 − X2 is N(μ1 − μ2, σ1² + σ2²).]
(iii) If Xi, i = 1, . . . , n, are independent N(μi, σi²), then Σ_i Xi is N(Σ_i μi, Σ_i σi²).
(iv) The moment generating function (see Section 2.6.3) of N(μ, σ²) is M(z) = E(e^{zX}) = e^{μz + σ²z²/2}.
(Properties (i)-(iii) are easily proved via mgfs - see Section 2.6.3.)
(v) Central moments of N(μ, σ²). Let μr = E{(X − μ)^r}, the rth central moment of X. Then
  μr = 0 for r odd,   μr = (σ/√2)^r r!/(r/2)! for r even.
Note that μ2 = σ², the variance of X.

Sampling distribution of the sample mean

Let X1, . . . , Xn be independently and identically distributed (iid) as N(μ, σ²). Then the distribution of X̄ = n^{−1} Σ Xi is N(μ, n^{−1}σ²). This is the sampling distribution of the sample mean, a result of fundamental importance in Statistics.
Proof:
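One route, as a sketch (the lecture proof may be set out differently), uses moment generating functions together with property (iv):
  M_{X̄}(z) = E(e^{zX̄}) = ∏_{i=1}^n E(e^{(z/n)Xi}) = {exp(μz/n + σ²z²/(2n²))}^n = exp(μz + (σ²/n)z²/2),
which is the mgf of N(μ, σ²/n), and the result follows by uniqueness of mgfs.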

2.4.3 Self-study exercises


1. The distribution of lengths of rivets is normal with mean 2.5cm and sd
0.02cm. In a batch of 500 rivets how many would you expect on average to
have length
(i) less than 2.46cm,
(ii) between 2.46cm and 2.53cm,
(iii) greater than 2.53cm?
(iv) What length is exceeded by only 1 in 1000 rivets?
2. Suppose that X is N(0, 1) and Y is N(2, 4). Use tables to calculate pr(Y − X < 1) and pr(X + (1/2)Y > 1.5).
3. Two resistors in series have resistances X1 and X2 ohms, where X1 is
N (200, 4) and X2 is N (150, 3). What is the distribution of the combined
resistance X = X1 + X2 ? Find the probability that X exceeds 355.5 ohms.
4. The fuel consumption of a fleet of 150 lorries is approximately normally
distributed with mean 15 mpg and sd 1.5 mpg.
(i) Compute the expected number of lorries that average between 13 and 14
mpg.
(ii) What is the probability that the average of a random sample of four
lorries exceeds 16 mpg?

2.5 Bivariate distributions

2.5.1 Definitions and notation


Suppose that X1 , X2 are two random variables defined on the same probability
space (, F, P ). Then P induces a joint distribution for X1 , X2 . The joint
distribution function is defined as
F (x1 , x2 ) = P ({ : X1 () x1 , X2 () x2 })
= pr(X1 x1 , X2 x2 ) .
In the discrete case the joint pmf is p(x1 , x2 ) = pr(X1 = x1 , X2 = x2 ). In the
1 ,x2 )
continuous case, the joint pdf is f (x1 , x2 ) = Fx(x1 x
.
2
Example 13: (discrete) Two biased coins are tossed. Score heads = 1 (with probability ), tails = 0 (with probability 1 ). Let X1 = sum of scores, X2 =
difference of scores (1st - 2nd). The tables below show
(i) the possible values of X1 , X2 and their probabilities,
(ii) the joint probability table for X1 , X2 .

(i)
  Outcome   00   01   10   11
  X1
  X2
  Prob

(ii)
            X2 = −1   X2 = 0   X2 = 1
  X1 = 0
  X1 = 1
  X1 = 2

Example 14: (continuous) Suppose X1 and X2 have joint pdf f(x1, x2) = k(1 − x1x2²) on (0, 1)². Obtain the value of k.


2.5.2 Marginal distributions

These follow from the law of total probability.
Discrete case. Marginal probability mass functions
  p1(x1) = pr(X1 = x1) = Σ_{x2} p(x1, x2)   and   p2(x2) = pr(X2 = x2) = Σ_{x1} p(x1, x2)
Continuous case. Marginal probability density functions
  f1(x1) = ∫ f(x1, x2) dx2   and   f2(x2) = ∫ f(x1, x2) dx1
Marginal means and variances.
  μ1 = E(X1) = Σ x1 p1(x1) (discrete) or ∫ x1 f1(x1) dx1 (continuous)
  σ1² = Var(X1) = E{(X1 − μ1)²} = E(X1²) − μ1²
Likewise μ2 and σ2².

2.5.3 Conditional distributions

These follow from the definition of conditional probability.
Discrete case. The conditional probability mass function of X1 given X2 is
  p1(x1 | X2 = x2) = pr(X1 = x1 | X2 = x2) = pr(X1 = x1, X2 = x2)/pr(X2 = x2) = p(x1, x2)/p2(x2).
Similarly
  p2(x2 | X1 = x1) = p(x1, x2)/p1(x1).
Continuous case. The conditional probability density function of X1 given X2 is
  f1(x1 | X2 = x2) = f(x1, x2)/f2(x2).
Similarly
  f2(x2 | X1 = x1) = f(x1, x2)/f1(x1).
Independence. X1 and X2 are said to be independent if F(x1, x2) = F1(x1)F2(x2). Equivalently, p(x1, x2) = p1(x1)p2(x2) (discrete), or f(x1, x2) = f1(x1)f2(x2) (continuous).


Example 15: Suppose that R and N have a joint distribution in which R | N is binomial (N, θ) and N is Poisson (λ). Show that R is Poisson (λθ).

2.5.4 Covariance and correlation

The covariance between X1 and X2 is defined as
  σ12 = Cov(X1, X2) = E{(X1 − μ1)(X2 − μ2)} = E(X1X2) − μ1μ2,
where E(X1X2) = Σ Σ x1x2 p(x1, x2) (discrete) or ∫∫ x1x2 f(x1, x2) dx1 dx2 (continuous).
The correlation between X1 and X2 is
  ρ = Corr(X1, X2) = σ12/(σ1σ2).

Example 13: (continued)
Marginal distributions:
  x1 = 0, 1, 2 with p1(x1) =
  x2 = −1, 0, 1 with p2(x2) =
Marginal means:
  μ1 = Σ x1 p1(x1) =
  μ2 = Σ x2 p2(x2) =
Variances:
  σ1² = Σ x1² p1(x1) − μ1² =
  σ2² = Σ x2² p2(x2) − μ2² =
Conditional distributions: e.g.
  p(x1 | X2 = 0):   x1 = 0:        x1 = 2:
Independence: e.g. p(0, 1) = 0 but p1(0)p2(1) ≠ 0, so X1, X2 are not independent.
Covariance: σ12 = Σ Σ x1x2 p(x1, x2) − μ1μ2 =
Example 14: (continued)
Marginal distributions:
  f1(x1) = ∫_0^1 k(1 − x1x2²) dx2 =
  f2(x2) = ∫_0^1 k(1 − x1x2²) dx1 =
Marginal means:
  μ1 = ∫_0^1 x1 f1(x1) dx1 =
  μ2 = ∫_0^1 x2 f2(x2) dx2 =
Variances:
  σ1² = ∫_0^1 x1² f1(x1) dx1 − μ1² =
  σ2² = ∫_0^1 x2² f2(x2) dx2 − μ2² =
Conditional distributions: e.g.
  f(x2 | X1 = 1/3) =
Independence:
  f(x1, x2) = k(1 − x1x2²), which does not factorise into f1(x1)f2(x2), so X1, X2 are not independent.
Covariance:
  σ12 = ∫∫ x1x2 f(x1, x2) dx1 dx2 − μ1μ2 =


Properties
(i) E(aX1 + bX2) = aμ1 + bμ2, Var(aX1 + bX2) = a²σ1² + 2abσ12 + b²σ2²
Cov(aX1 + b, cX2 + d) = ac σ12, Corr(aX1 + b, cX2 + d) = Corr(X1, X2)
(note: invariance under linear transformation)
Proof:

(ii) X1, X2 independent ⇒ Cov(X1, X2) = 0. The converse is false.
Proof:

(iii) −1 ≤ Corr(X1, X2) ≤ +1, with equality if and only if X1, X2 are linearly dependent.
Proof:

(iv) E(Y) = E{E(Y | X)} and Var(Y) = E{Var(Y | X)} + Var{E(Y | X)}
Proof:

2.5.5 Self-study exercises


1. Roll a fair die twice. Let X1 be the number of times that face 1 shows, and
let X2 = [sum of faces/4], where [x] denotes the integer part of x.
(a) Construct the joint probability table.
(b) Calculate the two marginal pmfs p1 (x1 ) and p2 (x2 ) and the conditional
pmfs p1 (x1 |x2 = 1) and p2 (x2 |x1 = 1). Are X1 and X2 independent?
(c) Compute the means, μ1 and μ2, variances, σ1² and σ2², and covariance σ12. Are X1 and X2 uncorrelated?
2. X1 and X2 have joint density f(x1, x2) = 4x1x2 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1. Calculate the marginal and conditional densities of X1 and X2, their means and variances, and their correlation.
3. Calculate, in terms of the means, variances and covariances of X1 , X2 and
X3 , E(2X1 + 3X2 ), Cov(2X1 , 3X2 ), Var(2X1 + 3X2 ) and Cov(2X1 +
3X2 , 4X2 + 5X3 ).

2.6 Generating functions

2.6.1 General
The generating function for a sequence (an : n ≥ 0) is A(z) = a0 + a1z + a2z² + · · · = Σ_{n=0}^∞ an z^n. Here z is a dummy variable. The definition is useful only if the series converges. The idea is to replace the sequence (an) by the function A(z), which may be easier to analyse than the original sequence.
Examples:
(i) If an = 1 for n = 0, 1, 2, . . . , then A(z) = (1 − z)^{−1} for |z| < 1 (geometric series).
(ii) If an = C(m, n) for n = 0, 1, . . . , m, and an = 0 for n > m, then A(z) = (1 + z)^m (binomial series).

2.6.2 Probability generating function

Let (pn) be the pmf of some discrete random variable X, so pn = pr(X = n) ≥ 0 and Σ_n pn = 1. Define the probability generating function (pgf) of X by
  P(z) = E(z^X) = Σ_n pn z^n.
Properties
(i) |P(z)| ≤ 1 for |z| ≤ 1.
Proof:

(ii) μ = E(X) = P′(1).
Proof:


(iii) σ² = Var(X) = P″(1) + P′(1) − {P′(1)}².


Proof:

(iv) Let X and Y be independent random variables with pgfs PX and PY respectively. Then the pgf of X + Y is given by PX+Y (z) = PX (z)PY (z) .
Proof:

Example 16: (i) Find the pgf of the Poisson (λ) distribution.
(ii) Let X1, X2 be independent Poisson random variables with parameters λ1, λ2 respectively. Obtain the distribution of X1 + X2.
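A sketch of the working (the lectures fill in the details):
(i) P(z) = E(z^X) = Σ_{r=0}^∞ z^r e^{−λ}λ^r/r! = e^{−λ} e^{λz} = e^{λ(z−1)}.
(ii) By property (iv), P_{X1+X2}(z) = e^{λ1(z−1)} e^{λ2(z−1)} = e^{(λ1+λ2)(z−1)}, which is the pgf of the Poisson (λ1 + λ2) distribution, so X1 + X2 is Poisson (λ1 + λ2).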

2.6.3 Moment generating function

The moment generating function (mgf) is defined as


M (z) = E(ezX ) .
The pgf tends to be used more for discrete distributions, and the mgf for continuous ones, although note that the two are related by M (z) = P (ez ).


Properties
(i) μ = E(X) = M′(0), σ² = Var(X) = M″(0) − μ².
Proof:

(ii) Let X and Y be independent random variables with mgfs MX (z) , MY (z)
respectively. Then the mgf of X + Y is given by MX+Y (z) = MX (z)MY (z) .
Proof:

Normal distribution. We prove properties (i) - (iv) of Section 2.4.2.


2.6.4 Self-study exercises

1. Show that the pgf of the binomial (n, θ) distribution is {θz + (1 − θ)}^n.
2. (Zero-truncated Poisson distribution) Find the pgf of the discrete distribution with pmf p(r) = e^{−λ}λ^r/{r!(1 − e^{−λ})} for r = 1, 2, . . .. Deduce the mean and variance.
3. The random variable X has density f(x) = k(1 + x)e^{−θx} on (0, ∞) with θ > 0. Find the value of k. Show that the moment generating function is M(z) = k{(z − θ)^{−2} − (z − θ)^{−1}}. Use it to calculate the mean and standard deviation of X.

Chapter 3
Further distribution theory
3.1 Multivariate distributions

Let X1, . . . , Xp be p real-valued random variables on (Ω, F) and consider the joint distribution of X1, . . . , Xp. Equivalently, consider the distribution of the random vector
  X = (X1, X2, . . . , Xp)^T.

3.1.1 Definitions

The joint distribution function:
  F(x) = pr(X ≤ x) = pr(X1 ≤ x1, . . . , Xp ≤ xp)
The joint probability mass function (pmf) (discrete case):
  p(x) = pr(X = x) = pr(X1 = x1, . . . , Xp = xp)
The joint probability density function (pdf) f(x) (continuous case) is such that
  pr(X ∈ A) = ∫_A f(x) dx


The marginal distributions are those of the individual components:
  Fj(xj) = pr(Xj ≤ xj) = F(∞, . . . , xj, . . . , ∞)
The conditional distributions are those of one component given another:
  F(xj | xk) = pr(Xj ≤ xj | Xk = xk)
The Xj's are independent if F(x) = ∏_j Fj(xj). Equivalently, p(x) = ∏_j pj(xj) (discrete case), or f(x) = ∏_j fj(xj) (continuous case).
Means: μj = E(Xj)
Variances: σj² = Var(Xj) = E{(Xj − μj)²} = E(Xj²) − μj²
Covariances: σjk = Cov(Xj, Xk) = E{(Xj − μj)(Xk − μk)} = E(XjXk) − μjμk
Correlations: ρjk = Corr(Xj, Xk) = σjk/(σjσk)

3.1.2 Mean and covariance matrix

The mean vector of X is μ = E(X) = (μ1, μ2, . . . , μp)^T.
The covariance matrix (variance-covariance matrix, dispersion matrix) of X is

        σ11  σ12  · · ·  σ1p
  Σ =   σ21  σ22  · · ·  σ2p
        ...
        σp1  σp2  · · ·  σpp

Since the (i, j)th element of (X − μ)(X − μ)^T is (Xi − μi)(Xj − μj), we see that
  Σ = E{(X − μ)(X − μ)^T} = E(XX^T) − μμ^T.

3.1.3 Properties

Let X have mean μ and covariance matrix Σ. Let a, b be p-vectors and A be a q × p matrix. Then
(i) E(a^T X) = a^T μ
(ii) Var(a^T X) = a^T Σ a. It follows that Σ is positive semi-definite.
(iii) Cov(a^T X, b^T X) = a^T Σ b
(iv) Cov(AX) = AΣA^T
(v) E(X^T AX) = trace(AΣ) + μ^T Aμ

Proof:


3.1.4 Self-study exercises

1. Let X1 = I1Y, X2 = I2Y, where I1, I2 and Y are independent and I1 and I2 take values ±1 each with probability 1/2.
Show that E(Xj) = 0, Var(Xj) = E(Y²), Cov(X1, X2) = 0.
2. Verify that E(X1 + · · · + Xp) = μ1 + · · · + μp and Var(X1 + · · · + Xp) = Σ_i Σ_j σij, where μi = E(Xi) and σij = Cov(Xi, Xj).
Suppose now that the Xi's are iid. Verify that X̄ has mean μ and variance σ²/p, where μ = E(Xi) and σ² = Var(Xi).

3.2 Transformations

3.2.1 The univariate case

Problem: to find the distribution of Y = h(X) from the known distribution of X. The case where h is a one-to-one function was treated in Section 2.1. When h is many-to-one we use the following generalised formulae:
  Discrete case: pY(y) = Σ pX(x)
  Continuous case: fY(y) = Σ fX(x) |dx/dy|
where in both cases the summations are over the set {x : h(x) = y}. That is, we add up the contributions to the mass or density at y from all x values which map to y.
Example 17: (discrete) Suppose pX(x) = px for x = 0, 1, 2, 3, 4, 5 and let Y = (X − 2)². Obtain the pmf of Y.


Example 18: (continuous) Suppose fX(x) = 2x on (0, 1) and let Y = (X − 1/2)². Obtain the pdf of Y.

3.2.2 The multivariate case

Problem: to find the distribution of Y = h(X), where Y is s × 1 and X is r × 1, from the known distribution of X.
Discrete case: pY(y) = Σ pX(x), with the summation over the set {x : h(x) = y}.
Continuous case:
Case (i): h is a one-to-one transformation (so that s = r). Then the rule is
  fY(y) = fX(x(y)) |dx/dy|
where dx/dy is the Jacobian (determinant) of the transformation, with (i, j)th element ∂xi/∂yj, and |·| denotes its absolute value.
Case (ii): s < r. First transform the s-vector Y to the r-vector Y′, where Y′i = Yi, i = 1, . . . , s, and Y′i, i = s + 1, . . . , r, are chosen for convenience. Now find the density of Y′ as above and then integrate out Y′_{s+1}, . . . , Y′_r to obtain the marginal density of Y, as required.
Case (iii): s = r but h(·) is not monotonic. Then there will generally be more than one value of x corresponding to a given y and we need to add the probability contributions from all relevant x's.


Example 19: (linear transformation) Suppose that Y = AX, where A is an r × r nonsingular matrix. Then fY(y) = fX(A^{−1}y) |det A|^{−1}.

Example 20: Suppose fX(x) = e^{−x1−x2} on (0, ∞)². Obtain the density of Y1 = (1/2)(X1 + X2).

Sums and products. If X1 and X2 are independent random variables with densities f1 and f2, then
(i) X1 + X2 has density g(u) = ∫ f1(u − v)f2(v) dv (convolution integral)
(ii) X1X2 has density g(u) = ∫ f1(u/v)f2(v)|v|^{−1} dv.
Proof:


3.2.3 Self-study exercises

1. If fX(x) = (2/9)(x + 1) on (−1, 2) and Y = X², find fY(y).
2. If X has density f(x) calculate the density g(y) of Y = X² when
(i) f(x) = 2x e^{−x²} on (0, ∞);
(ii) f(x) = (1/2)(1 + x) on |x| ≤ 1;
(iii) f(x) = 1/2 on 1/2 ≤ x ≤ 3/2.
3. Let X1 and X2 be independent exponential (λ), and let Y1 = X1 + X2 and Y2 = X1/X2. Show that Y1 and Y2 are independent and find their densities.

3.3 Moments, generating functions and inequalities

3.3.1 Moment generating function

The moment generating function of the random vector X is defined as
  M(z) = E(e^{z^T X}).
Here z^T X = Σ_j zj Xj.
Properties
Suppose X has mgf M(z). Then
(i) X + a has mgf e^{a^T z} M(z) and aX has mgf M(az).
(ii) The mgf of Σ_{j=1}^k Xj is M(z, . . . , z).
(iii) If X1, . . . , Xk are independent random variables with mgfs Mj(zj), j = 1, . . . , k, then the mgf of X = (X1, . . . , Xk)^T is M(z) = ∏_{j=1}^k Mj(zj), the product of the individual mgfs.
Proof:


3.3.2 Cumulant generating function

The cumulant generating function (cgf) of X is defined as K(z) = log M(z). The cumulants of X are defined as the coefficients κj in the power series expansion K(z) = Σ_{j=1}^∞ κj z^j/j!.
The first two cumulants are
  κ1 = μ = E(X),   κ2 = σ² = Var(X)
Similarly, the third and fourth cumulants are found to be κ3 = E(X − μ)³, κ4 = E(X − μ)⁴ − 3σ⁴. These are used to define the skewness, γ1 = κ3/κ2^{3/2}, and the kurtosis, γ2 = κ4/κ2².
Cumulants of the sample mean. Suppose that X1, . . . , Xn is a random sample from a distribution with cgf K(z) and cumulants κj. Then the mgf of X̄ = n^{−1} Σ_{j=1}^n Xj is {M(n^{−1}z)}^n, so the cgf is
  log{M(n^{−1}z)}^n = nK(n^{−1}z) = n Σ_{j=1}^∞ κj (n^{−1}z)^j / j!.
Hence the jth cumulant of X̄ is κj/n^{j−1}, and it follows that X̄ has mean κ1 = μ, variance κ2/n = σ²/n, skewness (κ3/n²)/(κ2/n)^{3/2} = γ1/n^{1/2} and kurtosis (κ4/n³)/(κ2/n)² = γ2/n.

3.3.3 Some useful inequalities

Markov's inequality
Let X be any random variable with finite mean. Then for all a > 0
  pr(|X| ≥ a) ≤ E|X|/a.
Proof:

Cauchy-Schwarz inequality
Let X, Y be any two random variables with finite variances. Then
  {E(XY)}² ≤ E(X²)E(Y²).
Proof:

Jensen's inequality
If u(x) is a convex function then
  E{u(X)} ≥ u(E(X)).
Note that u(·) is convex if the curve y = u(x) has a supporting line underneath at each point, e.g. bowl-shaped.
Proof:


Examples
1. Chebyshev's inequality.
Let Y be any random variable with mean μ and finite variance σ². Then for all a > 0
  pr(|Y − μ| ≥ a) ≤ σ²/a².

2. Correlation inequality.
{Cov(X, Y)}² ≤ σX² σY² (which implies that |Corr(X, Y)| ≤ 1).

3. |E(X)| ≤ E(|X|).
[It follows that |E{h(Y)}| ≤ E{|h(Y)|} for any function h(·).]

4. E{(|X|^s)^{r/s}} ≥ {E(|X|^s)}^{r/s} for r ≥ s > 0.
[Thus {E(|X|^r)}^{1/r} ≥ {E(|X|^s)}^{1/s} and it follows that {E(|X|^r)}^{1/r} is an increasing function of r.]

5. A cumulant generating function is a convex function; i.e. K″(z) ≥ 0.
Proof. K(z) = log M(z), so K′ = M′/M and K″ = {MM″ − (M′)²}/M². Hence M(z)²K″(z) = E(e^{zX})E(X²e^{zX}) − {E(Xe^{zX})}² ≥ 0, by the Cauchy-Schwarz inequality (on writing Xe^{zX} = (e^{zX/2})(Xe^{zX/2})).

3.3.4 Self-study exercises

1. Find the joint mgf M(z) of (X, Y) when the pdf is f(x, y) = (1/2)(x + y)e^{−(x+y)} on (0, ∞)². Deduce the mgf of U = X + Y.

2. Find all the cumulants of the N(μ, σ²) distribution.
[You may assume the mgf e^{μz + σ²z²/2}.]
3. Suppose that X is such that E(X) = 3 and E(X²) = 13. Use Chebyshev's inequality to determine a lower bound for pr(−2 < X < 8).
4. Show that {E(|X|)}^{−1} ≤ E(|X|^{−1}).

3.4 Some limit theorems

3.4.1 Modes of convergence of random variables

Let X1, X2, . . . be a sequence of random variables. There are a number of alternative modes of convergence of (Xn) to a limit random variable X. Suppose first that X1, X2, . . . and X are all defined on the same sample space Ω.


Convergence in probability
Xn →p X if pr(|Xn − X| > ε) → 0 as n → ∞ for all ε > 0. Equivalently, pr(|Xn − X| ≤ ε) → 1. Often X = c, a constant.

Almost sure convergence
Xn →a.s. X if pr(Xn → X) = 1. Again, often X = c. Also referred to as convergence with probability one.
Almost sure convergence is a stronger property than convergence in probability, i.e. a.s. ⇒ p, but p ⇏ a.s.
Example 21: Consider independent Bernoulli trials with constant probability of success 1/2.
A typical sequence would be 01001001110101100010 . . ..
Here the first 20 trials resulted in 9 successes, giving an observed proportion of X̄20 = 0.45 successes.
Intuitively, as we increase n we would expect this proportion to get closer to 1/2. However, this will not be the case for all sequences: for example, the sequence 11111111111111111111 has exactly the same probability as the earlier sequence, but X̄20 = 1.
It can be shown that the total probability of all infinite sequences for which the proportion of successes does not converge to 1/2 is zero; i.e. pr(X̄n → 1/2) = 1, so X̄n →a.s. 1/2 (and hence also X̄n →p 1/2).

Convergence in rth mean
Xn →r X if E|Xn − X|^r → 0 as n → ∞.
[rth mean ⇒ p, but rth mean ⇏ a.s.]
Suppose now that the distribution functions are F1, F2, . . . and F. The random variables need not be defined on the same sample spaces for the following definition.

Convergence in distribution
Xn →d X if Fn(x) → F(x) as n → ∞ at each continuity point of F. We say that the asymptotic distribution of Xn is F.
[p ⇒ d, but d ⇏ p]
A useful result.
Let (Xn), (Yn) be two sequences of random variables such that


Xn →d X and Yn →p c, a constant. Then
  Xn + Yn →d X + c,   XnYn →d cX,   Xn/Yn →d X/c (c ≠ 0).

3.4.2 Limit theorems for sums of independent random variables

Let X1, X2, . . . be a sequence of iid random variables with (common) mean μ. Let Sn = Σ_{i=1}^n Xi and X̄n = n^{−1}Sn.

Weak Law of Large Numbers (WLLN). If E|Xi| < ∞ then X̄n →p μ.
Proof (case σ² = Var(Xi) < ∞). Use Chebyshev's inequality: since E(X̄n) = μ we have, for every ε > 0,
  pr(|X̄n − μ| > ε) ≤ Var(X̄n)/ε² = σ²/(nε²) → 0
as n → ∞.
Example 21: (continued). Here σ² = Var(Xi) = 1/4 (Bernoulli r.v.), so the WLLN applies to X̄n, the proportion of successes.

Strong Law of Large Numbers (SLLN). If E|Xi| < ∞ then X̄n →a.s. μ.
[The proof is more tricky and is omitted.]
[The proof is more tricky and is omitted.]
Central Limit Theorem (CLT). If σ² = Var(Xi) < ∞ then
  (Sn − nμ)/(σ√n) →d N(0, 1).
Equivalently,
  √n(X̄n − μ)/σ →d N(0, 1).
Proof. Suppose that Xi has mgf M(z). Write Zn = (Sn − nμ)/(σ√n). The mgf of Zn is given by
  MZn(z) = E(e^{zZn}) = exp(−μ√n z/σ) {M(z/(σ√n))}^n.
Therefore the cgf of Zn is
  KZn(z) = log MZn(z) = −(μ√n/σ)z + nK(z/(σ√n))
         = −(μ√n/σ)z + n{ μ(z/(σ√n)) + (σ²/2)(z/(σ√n))² + O((z/(σ√n))³) }
         = −(μ√n z)/σ + (μ√n z)/σ + z²/2 + O(n^{−1/2})
         → z²/2
as n → ∞, which is the cgf of the N(0, 1) distribution, as required.
[Note on the proof of the CLT. In cases where the mgf does not exist, a similar proof can be given in terms of the function φ(z) = E(e^{izXj}) where i = √(−1). φ(·) is called the characteristic function and always exists.]

Example 21: (continued). Normal approximation to the binomial
Suppose now that the success probability is θ, so that pr(Xi = 1) = θ. Then μ = θ and σ² = θ(1 − θ), so the CLT gives that √n(X̄n − θ)/√{θ(1 − θ)} is approximately N(0, 1).
Furthermore, X̄n →p θ by the WLLN, and it follows from the useful result that √n(X̄n − θ)/√{X̄n(1 − X̄n)} is also approximately N(0, 1).
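A numerical sketch of this approximation in Python (SciPy assumed; the particular n, θ and cut-off are illustrative only):

    from scipy.stats import binom, norm
    import math

    n, theta = 100, 0.3
    mu, sigma = n * theta, math.sqrt(n * theta * (1 - theta))

    exact = binom.cdf(25, n, theta)             # exact binomial probability P(X <= 25)
    approx = norm.cdf((25.5 - mu) / sigma)      # CLT with a continuity correction

    print(exact, approx)   # the two values agree to about two decimal places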
Poisson limit of binomial. Suppose that Xn is binomial (n, θ) where θ is such that nθ → λ as n → ∞. Then Xn →d Poisson(λ).
Proof. Xn is expressible as Σ_{i=1}^n Yi, where the Yi are independent Bernoulli random variables with pr(Yi = 1) = θ. Thus Xn has pgf
  {θz + (1 − θ)}^n = {1 − n^{−1}λ(1 − z) + o(n^{−1})}^n → exp{−λ(1 − z)}
as n → ∞, which is the pgf of the Poisson (λ) distribution.

3.4.3 Self-study exercises


1. In a large consignment of manufactured items 25% are defective. A random
sample of 50 is drawn. Use the binomial distribution to compute the exact
probability that the number of defectives in the sample is five or fewer. Use
the CLT to approximate this answer.
2. The random variable Y has the Poisson (50) distribution. Use the CLT to find pr(Y = 50), pr(Y ≤ 45) and pr(Y > 60).


3. A machine in continuous use contains a certain critical component which


has an exponential lifetime distribution with mean 100 hours. When a component fails it is immediately replaced by one from the stock, originally of
90 such components. Use the CLT to find the probability that the machine
can be kept running for a year without the stock running out.

3.5 Further discrete distributions

3.5.1 Negative binomial distribution

Let X be the number of Bernoulli trials until the kth success. Then
  pr(X = x) = pr(k − 1 successes in the first x − 1 trials, followed by a success on the xth trial)
            = C(x − 1, k − 1) θ^{k−1} (1 − θ)^{x−k} θ
(where the first factor comes from the binomial distribution). Hence define the pmf of the negative binomial (k, θ) distribution as
  p(x) = C(x − 1, k − 1) θ^k (1 − θ)^{x−k},   x = k, k + 1, . . .
The mean is k/θ:

The variance is k(1 − θ)/θ² (see exercise 1).

The pgf is {θ/(z^{−1} − 1 + θ)}^k:

The name negative binomial comes from the binomial expansion
  1 = θ^k θ^{−k} = θ^k {1 − (1 − θ)}^{−k} = Σ_{x=k}^∞ p(x)
where p(x) are the negative binomial probabilities. (Exercise: verify)

3.5.2 Hypergeometric distribution

An urn contains n1 red beads and n2 black beads. Suppose that m beads are drawn without replacement and let X be the number of red beads in the sample. Note that, since X ≤ n1 and X ≤ m, the possible values of X are 0, 1, ..., min(n1, m). Then
  p(x) = pr(X = x) = (no. of selections of x reds and m − x blacks) / (total no. of selections of m beads)
       = C(n1, x) C(n2, m − x) / C(n1 + n2, m),   x = 0, 1, ..., min(n1, m).
This is the pmf of the hypergeometric (n1, n2, m) distribution.
The mean is n1m/(n1 + n2) and the variance is n1n2m(n1 + n2 − m)/{(n1 + n2)²(n1 + n2 − 1)}.

3.5.3 Multinomial distribution

An urn contains nj beads of colour j (j = 1, . . . , k). Suppose that m beads are drawn with replacement and let Xj be the number of beads of colour j in the sample. Then, for xj = 0, 1, . . . , m and Σ_{j=1}^k xj = m,
  p(x) = pr(X = x) = C(m; x1, . . . , xk) θ1^{x1} θ2^{x2} · · · θk^{xk},
where θj = nj / Σ_{i=1}^k ni. This is the pmf of the multinomial (k, m, θ) distribution. Here
  C(m; x1, . . . , xk) = no. of different orderings of x1 + · · · + xk beads = m!/(x1! · · · xk!)
and the probability of any given order is θ1^{x1} θ2^{x2} · · · θk^{xk}. The name multinomial comes from the multinomial expansion of (θ1 + · · · + θk)^m, in which the coefficient of θ1^{x1} θ2^{x2} · · · θk^{xk} is C(m; x1, . . . , xk).
The means are mθj:

The covariances are σjk = m(δjk θj − θjθk).

The joint pgf is E(∏_j zj^{Xj}) = (Σ_{j=1}^k θj zj)^m:

3.5.4 Self-study exercises

1. Derive the variance of the negative binomial (k, θ) distribution.
[You may assume the formula for the pgf.]
2. Suppose that X1, . . . , Xk are independent geometric (θ) random variables. Using pgfs, show that Σ_{j=1}^k Xj is negative binomial (k, θ).
[Hence, the waiting times Xj between successes in Bernoulli trials are independent geometric, and the overall waiting time to the kth success is negative binomial.]
3. If X is multinomial (k, m, θ) show that Xj is binomial (m, θj), Xj + Xk is binomial (m, θj + θk), etc.
[Either by direct calculation or using the pgf.]


3.6 Further continuous distributions

3.6.1 Gamma and beta functions

Gamma function: Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx for a > 0.
Integration by parts gives Γ(a) = (a − 1)Γ(a − 1).
In particular, for integer a, Γ(a) = (a − 1)! (since Γ(1) = 1). Also, Γ(1/2) = √π.
Beta function: B(a, b) = ∫_0^1 x^{a−1}(1 − x)^{b−1} dx for a > 0, b > 0.
Relationship with the Gamma function: B(a, b) = Γ(a)Γ(b)/Γ(a + b).

3.6.2 Gamma distribution

The pdf of the gamma (α, λ) distribution is defined as
  f(x) = λ^α x^{α−1} e^{−λx} / Γ(α),   x > 0,
where α > 0 and λ > 0. When α = 1, this is the exponential (λ) distribution.
The mean is α/λ:

The variance is α/λ² (see exercise 2).

The mgf is (1 − z/λ)^{−α}:

Note that the mode is (α − 1)/λ if α ≥ 1, but f(0) = ∞ if α < 1.


Example 22: The journey time of a bus on a nominal 1/2-hour route has the gamma (3, 6) distribution. What is the probability that the bus is over half an hour late?
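Reading "over half an hour late" on a nominal half-hour route as a journey time exceeding one hour, the required probability is pr(X > 1) with X gamma (3, 6). A sketch of a numerical check in Python (SciPy assumed; note that scipy.stats.gamma takes a shape and a scale = 1/rate):

    from scipy.stats import gamma

    alpha, lam = 3, 6                 # gamma(alpha, lam) in the notes' (shape, rate) form
    print(gamma.sf(1.0, a=alpha, scale=1/lam))   # pr(journey time > 1 hour), about 0.062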

Sums of exponential random variables. Suppose that X1, . . . , Xn are iid exponential (λ) random variables. Then Σ_{i=1}^n Xi is gamma (n, λ).
Proof:
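One sketch of the proof uses mgfs and the gamma mgf just stated (the lectures may argue via convolutions instead):
  M_{Xi}(z) = λ/(λ − z) = (1 − z/λ)^{−1}, so by independence M_{ΣXi}(z) = ∏_{i=1}^n M_{Xi}(z) = (1 − z/λ)^{−n},
which is the mgf of the gamma (n, λ) distribution.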

3.6.3 Beta distribution

The pdf of the beta (α, β) distribution is
  f(x) = x^{α−1}(1 − x)^{β−1} / B(α, β),   0 < x < 1,
where α > 0 and β > 0.

The mean is α/(α + β):

The variance is αβ/{(α + β)²(α + β + 1)}.

The mode is (α − 1)/(α + β − 2) if α ≥ 1 and α + β > 2.

Property If X1 and X2 are independent, respectively gamma (1 , ) and gamma


(2 , ), then U1 = X1 +X2 and U2 = X1 /(X1 +X2 ) are independent, respectively
gamma (1 + 2 , ) and beta (1 , 2 ).
Proof  The inverse transformation is

(x1, x2) = (u1 u2, u1(1 − u2))

with Jacobian

∂x/∂u = det( u2  u1 ; 1 − u2  −u1 ) = −u1 u2 − u1(1 − u2) = −u1,   so |∂x/∂u| = u1.


Therefore

f_U(u) = [ β^{α1} (u1 u2)^{α1−1} e^{−β u1 u2} / Γ(α1) ] × [ β^{α2} {u1(1 − u2)}^{α2−1} e^{−β u1 (1−u2)} / Γ(α2) ] × |−u1|

       = { β^{α1+α2} u1^{α1+α2−1} e^{−β u1} / Γ(α1 + α2) } × { [Γ(α1 + α2)/(Γ(α1)Γ(α2))] u2^{α1−1} (1 − u2)^{α2−1} }

on (0, ∞) × (0, 1) and the result follows.
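[A quick simulation check of this property (illustrative only; the values of α1, α2, β are arbitrary):]

```python
# Sketch: simulate X1 ~ gamma(a1, beta), X2 ~ gamma(a2, beta) and check that
# U1 = X1 + X2 behaves like gamma(a1 + a2, beta), U2 = X1/(X1 + X2) like beta(a1, a2),
# and that U1 and U2 are (nearly) uncorrelated, consistent with independence.
import numpy as np

rng = np.random.default_rng(0)
a1, a2, beta, N = 2.0, 3.0, 4.0, 200_000

x1 = rng.gamma(shape=a1, scale=1 / beta, size=N)
x2 = rng.gamma(shape=a2, scale=1 / beta, size=N)
u1, u2 = x1 + x2, x1 / (x1 + x2)

print(u1.mean(), (a1 + a2) / beta)            # gamma(a1 + a2, beta) mean
print(u2.mean(), a1 / (a1 + a2))              # beta(a1, a2) mean
print(np.corrcoef(u1, u2)[0, 1])              # close to 0
```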

3.6.4 Self-study exercises

1. Suppose X has the gamma (2, 4) distribution. Find the probability that X
   exceeds μ + 2σ, where μ, σ are respectively the mean and standard deviation of X.

2. Derive the variance of the gamma (α, β) distribution. [Either by direct calculation
   or using the mgf.]

3. Find the distribution of −log X when X is uniform (0,1). Hence show
   that if X1, . . . , Xk are iid uniform (0,1) then −log(X1 X2 ··· Xk) is gamma (k, 1).

4. If X is gamma (α, β) show that log X has mgf β^{−z} Γ(z + α)/Γ(α).

5. Suppose X is uniform (0, 1) and α > 0. Show that Y = X^{1/α} is beta (α, 1).


Chapter 4

Normal and associated distributions

4.1 The multivariate normal distribution

4.1.1 Multivariate normal

The multivariate normal distribution, denoted Np(μ, Σ), has pdf

f(x) = |2πΣ|^{−1/2} exp{ −(1/2)(x − μ)^T Σ^{−1} (x − μ) }

on (−∞, ∞)^p.

The mean is μ (p × 1) and the covariance matrix is Σ (p × p) (see property (v)).
Bivariate case, p = 2. Here

X = (X1, X2)^T,   μ = (μ1, μ2)^T,   Σ = ( σ11  σ12 ; σ21  σ22 ) = ( σ1^2  ρσ1σ2 ; ρσ1σ2  σ2^2 ),

|2πΣ| = (2π)^2 σ1^2 σ2^2 (1 − ρ^2),

Σ^{−1} = (1 − ρ^2)^{−1} ( 1/σ1^2  −ρ/(σ1σ2) ; −ρ/(σ1σ2)  1/σ2^2 ),

giving

f(x1, x2) = [2π σ1 σ2 √(1 − ρ^2)]^{−1} exp[ −{1/(2(1 − ρ^2))} { ((x1 − μ1)/σ1)^2 − 2ρ ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)^2 } ].
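[The bivariate density above can be compared with scipy.stats.multivariate_normal at a few points (an illustrative check with arbitrary parameter values):]

```python
# Sketch: compare the explicit bivariate normal density with scipy's implementation.
import numpy as np
from scipy import stats

mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.5
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
rv = stats.multivariate_normal(mean=[mu1, mu2], cov=Sigma)

def f(x1, x2):
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

for x in [(0.0, 0.0), (1.0, -1.0), (0.3, 2.5)]:
    assert np.isclose(f(*x), rv.pdf(x))
```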

4.1.2 Properties

(i) Suppose X is Np(μ, Σ) and let Y = T^{−1}(X − μ), where Σ = T T^T. Then
Yi, i = 1, . . . , p, are independent N(0, 1).

(ii) The joint mgf of Np(μ, Σ) is e^{μ^T z + (1/2) z^T Σ z}. (C.f. property (iv), Section 2.4.2.)


(iii) If X is Np(μ, Σ) then AX + b (where A is q × p and b is q × 1) is Nq(Aμ + b, AΣA^T).
(C.f. property (i), Section 2.4.2.)

(iv) If X_i, i = 1, . . . , n, are independent Np(μ_i, Σ_i), then ∑_i X_i is Np(∑_i μ_i, ∑_i Σ_i).
(C.f. property (iii), Section 2.4.2.)


(v) Moments of Np(μ, Σ). Obtain by differentiation of the mgf. In particular, differentiating
w.r.t. zj and zk gives E(Xj) = μj, Var(Xj) = σjj and Cov(Xj, Xk) = σjk.
Note that if X1, . . . , Xp are all uncorrelated (i.e. σjk = 0 for j ≠ k) then
X1, . . . , Xp are independent N(μj, σj^2).

(vi) If X is Np(μ, Σ) then a^T X and b^T X are independent if and only if a^T Σ b = 0.
Similarly for A^T X and B^T X.
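[Property (iii) can be illustrated by simulation (an aside; the matrices A, Σ and vectors μ, b below are arbitrary):]

```python
# Sketch: simulate X ~ N3(mu, Sigma) and check that Y = A X + b has mean close to
# A mu + b and covariance close to A Sigma A^T.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 0.0, -1.0])
L = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.2, -0.3, 1.0]])
Sigma = L @ L.T                                     # a valid covariance matrix
A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]])   # q x p with q = 2, p = 3
b = np.array([3.0, -2.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b

print(Y.mean(axis=0), A @ mu + b)                   # should agree closely
print(np.cov(Y.T), A @ Sigma @ A.T)                 # should agree closely
```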

4.1.3 Marginal and conditional distributions


Suppose that X is Np(μ, Σ). Partition X^T as (X_1^T, X_2^T) where X_1 is p1 × 1, X_2 is
p2 × 1 and p1 + p2 = p. Correspondingly μ^T = (μ_1^T, μ_2^T) and Σ = ( Σ11  Σ12 ; Σ21  Σ22 ).
Note that Σ21^T = Σ12, and X_1 and X_2 are independent if and only if Σ12 = 0 (since


the joint density factorises if and only if Σ12 = 0).


The marginal distribution of X_1 is Np1(μ_1, Σ11).
Proof:

The conditional distribution of X_2 | X_1 is Np2(μ_{2.1}, Σ_{22.1}), where

μ_{2.1} = μ_2 + Σ21 Σ11^{−1} (X_1 − μ_1)
Σ_{22.1} = Σ22 − Σ21 Σ11^{−1} Σ12

(proof omitted). Note that μ_{2.1} is linear in X_1.
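[For a concrete (hypothetical) partition, μ_{2.1} and Σ_{22.1} can be computed directly from these formulae:]

```python
# Sketch: conditional mean and covariance for a partitioned normal vector.
# The partition sizes and Sigma below are illustrative only.
import numpy as np

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
p1 = 1                                        # X_1 = first component, X_2 = last two

mu1, mu2 = mu[:p1], mu[p1:]
S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]

x1 = np.array([1.5])                          # observed value of X_1
mu_2_1 = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)
Sigma_22_1 = S22 - S21 @ np.linalg.solve(S11, S12)
print(mu_2_1)        # conditional mean of X_2 given X_1 = 1.5 (linear in x1)
print(Sigma_22_1)    # conditional covariance (does not depend on x1)
```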

4.1.4 Self-study exercises

1. Write down the joint density of the N2( (0, 1)^T, ( 1  1 ; 1  4 ) ) distribution
in component form.

2. Suppose that X_i, i = 1, . . . , n, are independent Np(μ, Σ). Show that the
sample mean vector, X̄ = n^{−1} ∑_i X_i, is Np(μ, n^{−1} Σ).
3. For the distribution in exercise 1, obtain the marginal distributions of X1
and X2 and the conditional distributions of X2 given X1 = x1 and X1 given
X2 = x2 .


4.2 The chi-square, t and F distributions


4.2.1 Chi-square distribution

The pdf of the chi-square distribution with ν degrees of freedom (ν > 0) is

f(u) = u^{ν/2 − 1} e^{−u/2} / { 2^{ν/2} Γ(ν/2) },   u > 0.

Denoted by χ²_ν. Note that the χ²_ν distribution is identical to the gamma (ν/2, 1/2)
distribution (c.f. Section 3.6). It follows that the mean is ν, the variance is 2ν and
the mgf is (1 − 2z)^{−ν/2}.
Properties
(i) Let ν be a positive integer and suppose that X1, . . . , Xν are iid N(0, 1). Then
∑_{i=1}^ν Xi² is χ²_ν. In particular, if X is N(0, 1) then X² is χ²_1.

(ii) If Ui, i = 1, . . . , n, are independent χ²_{νi} then ∑_{i=1}^n Ui is χ²_ν with ν = ∑_{i=1}^n νi.


(iii) If X is Np(μ, Σ) then (X − μ)^T Σ^{−1} (X − μ) is χ²_p.

Theorem (Joint distribution of the sample mean and variance)

Suppose that X1, . . . , Xn are iid N(μ, σ²). Let X̄ = n^{−1} ∑_i Xi be the sample
mean and S² = (n − 1)^{−1} ∑_i (Xi − X̄)² the sample variance.
Then X̄ is N(μ, σ²/n), (n − 1)S²/σ² is χ²_{n−1}, and X̄ and S² are independent.
Proof:
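[The theorem can be illustrated by simulation (not a proof; μ, σ and n are arbitrary):]

```python
# Sketch: check numerically that (n-1) S^2 / sigma^2 has mean n-1 (as a chi-square_{n-1}
# variable should) and that the sample mean and variance are (nearly) uncorrelated.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)

print(((n - 1) * s2 / sigma**2).mean())   # close to n - 1 = 9
print(np.corrcoef(xbar, s2)[0, 1])        # close to 0, consistent with independence
```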


4.2.2 Student's t distribution

The pdf of the Student's t distribution with ν degrees of freedom (ν > 0) is

f(t) = 1 / { B(1/2, ν/2) ν^{1/2} (1 + t²/ν)^{(ν+1)/2} },   −∞ < t < ∞.

Denoted by t_ν. The mean is 0 (provided ν > 1):

The variance is ν/(ν − 2) (provided ν > 2).

Theorem  If X is N(0, 1), U is χ²_ν and X and U are independent, then

T ≡ X / √(U/ν)   is t_ν.
Proof:


4.2.3 Variance ratio (F) distribution

The pdf of the variance ratio, or F, distribution with ν1, ν2 degrees of freedom
(ν1, ν2 > 0) is

f(x) = (ν1/ν2)^{ν1/2} x^{ν1/2 − 1} / { B(ν1/2, ν2/2) (1 + ν1 x/ν2)^{(ν1+ν2)/2} },   x > 0.

Denoted by F_{ν1,ν2}. The mean is ν2/(ν2 − 2) (provided ν2 > 2) and the variance is
2ν2²(ν1 + ν2 − 2)/{ν1(ν2 − 2)²(ν2 − 4)} (provided ν2 > 4).

Theorem.  If U1 and U2 are independent, respectively χ²_{ν1} and χ²_{ν2}, then

F ≡ (U1/ν1) / (U2/ν2)   is F_{ν1,ν2}.

Proof:

It follows from the above result that (i) F_{ν1,ν2} ≡ 1/F_{ν2,ν1} and (ii) F_{1,ν} ≡ t_ν².
(Exercise: check.)
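[A quick check of (ii) using scipy (illustrative):]

```python
# Sketch: the upper quantile of F_{1,nu} equals the squared two-sided t_{nu} quantile,
# since F_{1,nu} is the distribution of a squared t_{nu} variable.
from scipy import stats

nu, alpha = 7, 0.05
f_crit = stats.f(dfn=1, dfd=nu).ppf(1 - alpha)
t_crit = stats.t(df=nu).ppf(1 - alpha / 2)
print(f_crit, t_crit**2)     # the two values agree
```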

4.3 Normal theory tests and confidence intervals

4.3.1 One-sample t-test

Suppose that Y1, . . . , Yn are iid N(μ, σ²). Then, from Section 4.2, Ȳ = n^{−1} ∑_i Yi
(the sample mean) and S² = (n − 1)^{−1} ∑_i (Yi − Ȳ)² (the sample variance) are
independent, respectively N(μ, σ²/n) and σ² χ²_{n−1}/(n − 1). Hence

Z = (Ȳ − μ) / (σ/√n)


is N(0, 1),

U = (n − 1)S² / σ²

is χ²_{n−1}, and Z, U are independent.
It follows that

T = (Ȳ − μ) / (S/√n) = Z / √(U/(n − 1))

is t_{n−1}.

Applications:
Inference about μ: one-sample z-test (σ known) and t-test (σ unknown).
Inference about σ²: χ² test.
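[A typical software version of the one-sample t-test (the data below are invented purely for illustration):]

```python
# Sketch: one-sample t-test of H0: mu = mu0 using T = (Ybar - mu0)/(S/sqrt(n)).
import numpy as np
from scipy import stats

y = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])   # invented data
mu0 = 5.0

t_stat = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(len(y)))
t_check, p_value = stats.ttest_1samp(y, popmean=mu0)
print(t_stat, t_check)        # identical statistics
print(p_value)                # two-sided p-value from the t_{n-1} distribution
```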

4.3.2 Two samples

Two independent samples. Suppose that Y11, . . . , Y1n1 are iid N(μ1, σ1²) and Y21, . . . , Y2n2
are iid N(μ2, σ2²).
Summary statistics: (n1, Ȳ1, S1²) and (n2, Ȳ2, S2²)
Pooled sample variance: S² = {(n1 − 1)S1² + (n2 − 1)S2²} / (n1 + n2 − 2)
From Section 4.2, if σ1² = σ2² = σ², say, then Ȳ1 and (n1 − 1)S1² are independent,
N(μ1, n1^{−1}σ²) and σ² χ²_{n1−1} respectively, and Ȳ2 and (n2 − 1)S2² are independent,
N(μ2, n2^{−1}σ²) and σ² χ²_{n2−1} respectively.
Furthermore, (Ȳ1, (n1 − 1)S1²) and (Ȳ2, (n2 − 1)S2²) are independent.
Therefore (Ȳ1 − Ȳ2) is N(μ1 − μ2, (n1^{−1} + n2^{−1})σ²), (n1 + n2 − 2)S² is σ² χ²_{n1+n2−2}
and (Ȳ1 − Ȳ2) and (n1 + n2 − 2)S² are independent.

Therefore

T ≡ {(Ȳ1 − Ȳ2) − (μ1 − μ2)} / {S √(1/n1 + 1/n2)}

is t_{n1+n2−2}.
Also, since S1², S2² are independent,

F ≡ S1²/S2²   is F_{n1−1,n2−1}.

Applications:
Inference about μ1 − μ2: two-sample z-test (σ known) and t-test (σ unknown).
Inference about σ1²/σ2²: F (variance ratio) test.
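[In software, with invented data (illustrative only), the pooled t-test and the variance-ratio test are:]

```python
# Sketch: pooled two-sample t-test and variance-ratio F test.
import numpy as np
from scipy import stats

y1 = np.array([12.1, 11.8, 13.0, 12.4, 12.7])             # invented data
y2 = np.array([11.2, 11.9, 11.5, 12.0, 11.1, 11.6])

t_stat, p_t = stats.ttest_ind(y1, y2, equal_var=True)     # pooled-variance t-test
print(t_stat, p_t)

f_stat = y1.var(ddof=1) / y2.var(ddof=1)                  # S1^2 / S2^2
p_f = 2 * min(stats.f.sf(f_stat, len(y1) - 1, len(y2) - 1),
              stats.f.cdf(f_stat, len(y1) - 1, len(y2) - 1))   # two-sided p-value
print(f_stat, p_f)
```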


Matched pairs  Observations (Yi1, Yi2 : i = 1, . . . , n) where the differences Di =
Yi1 − Yi2 are independent N(μ, σ²). Then

T = (D̄ − μ) / (S/√n)

is t_{n−1}, where D̄ and S² are the sample mean and sample variance of the Di's.

Application:
Inference about μ from paired observations: paired-sample t-test.

4.3.3 k samples (One-way Anova)

Suppose we have k groups, with group means μ1, . . . , μk.

Denote the independent observations by (Yi1, . . . , Yini : i = 1, . . . , k) with Yij ∼
N(μi, σ²), j = 1, . . . , ni, i = 1, . . . , k.
Summary statistics: ((ni, Ȳi, Si²) : i = 1, . . . , k).

Total sum of squares: ssT = ∑_{ij} (Yij − Ȳ)², where Ȳ = n^{−1} ∑_{ij} Yij (the overall
mean) and n = ∑_i ni.
Then ssT = ssW + ssB where

ssW = ∑_{ij} (Yij − Ȳi)² = ∑_i (ni − 1)Si²   (the within-samples ss)

ssB = ∑_i ni (Ȳi − Ȳ)²   (the between-samples ss)

From Sections 4.1 and 4.2, (ni − 1)Si²/σ² is χ²_{ni−1}, independent of Ȳi.

Hence ssW/σ² is χ²_{n−k}, independent of ssB.
Also, by a similar argument to that of the Theorem in Section 4.2.1 (proof omitted),
ssB is σ² χ²_{k−1} when μi = μ, say, for all i.
Hence we obtain the F-test for equality of the group means μi:

F = {ssB/(k − 1)} / {ssW/(n − k)}

is F_{k−1,n−k} under the null hypothesis μ1 = ··· = μk.
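[The same F statistic is produced by scipy.stats.f_oneway (data invented for illustration):]

```python
# Sketch: one-way ANOVA F test for equality of the group means.
import numpy as np
from scipy import stats

groups = [np.array([20.1, 19.8, 21.0, 20.4]),      # invented data, k = 3 groups
          np.array([22.3, 21.9, 22.8]),
          np.array([19.5, 20.0, 19.2, 19.9, 20.3])]

f_stat, p_value = stats.f_oneway(*groups)
print(f_stat, p_value)     # compared against F_{k-1, n-k}
```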


4.3.4 Normal linear regression

Observations Y1, . . . , Yn are independently N(α + βxi, σ²), where x1, . . . , xn are
given constants.
The least-squares estimator (α̂, β̂) is found by minimizing the sum of squares
Q(α, β) = ∑_{i=1}^n (Yi − α − βxi)².
By partial differentiation with respect to α and β, we obtain

β̂ = Txy/Txx,   α̂ = Ȳ − β̂ x̄,

where Txx = ∑_i (xi − x̄)² and Txy = ∑_i (xi − x̄)(Yi − Ȳ).

Note that, since both α̂ and β̂ are linear combinations of Y = (Y1, . . . , Yn)^T, they
are jointly normally distributed.
Using properties of expectation and covariance matrices, we find that (α̂, β̂)^T is
bivariate normal with mean (α, β)^T and covariance matrix

V = (σ²/Txx) ( n^{−1} ∑_i xi²   −x̄ ; −x̄   1 ).
Sums of squares

Total ss: Tyy = ∑_i (Yi − Ȳ)²;
Residual ss: Q(α̂, β̂);
Regression ss: Tyy − Q(α̂, β̂).

Results:
(a) Residual ss = Tyy − Txx β̂²,  Regression ss = Txx β̂² = Txy²/Txx
(b) E(Total ss) = Txx β² + (n − 1)σ²,  E(Regression ss) = Txx β² + σ²,
E(Residual ss) = (n − 2)σ²
(c) By a similar argument to that of the Theorem in Section 4.2.1 (proof omitted),
Residual ss is σ² χ²_{n−2} and, if β = 0, Regression ss is σ² χ²_1, independently of
Residual ss.
Application:
The residual mean square, S² = Residual ss/(n − 2), is an unbiased estimator of
σ²; β̂ is an unbiased estimator of β with estimated standard error S/√Txx; and α̂ is
an unbiased estimator of α with estimated standard error (S/√Txx)(∑_i xi²/n)^{1/2}.
If β = β0 then

T = (β̂ − β0) / (S/√Txx)


is t_{n−2}, giving rise to tests and confidence intervals about β.

If β = 0 then

F = Regression ss / S²

is F_{1,n−2}, hence a test for β = 0.
(Alternatively, and equivalently, use T = β̂/(S/√Txx) as t_{n−2}.)

The coefficient of determination is

r² = Regression ss / Total ss = Txy² / (Txx Tyy)

(square of the sample correlation coefficient). The coefficient of determination
gives the proportion of Y-variation attributable to regression on x.
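[In software (data invented for illustration), scipy.stats.linregress reports these quantities; its stderr is the estimated standard error S/√Txx of β̂:]

```python
# Sketch: least-squares fit, t test for the slope, and coefficient of determination.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])       # invented data
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])

res = stats.linregress(x, y)
print(res.intercept, res.slope)      # alpha_hat, beta_hat
print(res.stderr)                    # estimated standard error of beta_hat, S / sqrt(Txx)
print(res.pvalue)                    # two-sided p-value for H0: beta = 0 (t_{n-2})
print(res.rvalue**2)                 # coefficient of determination r^2
```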
