
Lecture Notes on MS237

Mathematical statistics
Lecture notes by Janet Godolphin

2010


Contents

1 Introductory revision material
  1.1 Basic probability
    1.1.1 Terminology
    1.1.2 Probability axioms
    1.1.3 Conditional probability
    1.1.4 Self-study exercises
  1.2 Random variables and probability distributions
    1.2.1 Random variables
    1.2.2 Expectation
    1.2.3 Self-study exercises

2 Random variables and distributions
  2.1 Transformations
    2.1.1 Self-study exercises
  2.2 Some standard discrete distributions
    2.2.1 Binomial distribution
    2.2.2 Geometric distribution
    2.2.3 Poisson distribution
    2.2.4 Self-study exercises
  2.3 Some standard continuous distributions
    2.3.1 Uniform distribution
    2.3.2 Exponential distribution
    2.3.3 Pareto distribution
    2.3.4 Self-study exercises
  2.4 The normal (Gaussian) distribution
    2.4.1 Normal distribution
    2.4.2 Properties
    2.4.3 Self-study exercises
  2.5 Bivariate distributions
    2.5.1 Definitions and notation
    2.5.2 Marginal distributions
    2.5.3 Conditional distributions
    2.5.4 Covariance and correlation
    2.5.5 Self-study exercises
  2.6 Generating functions
    2.6.1 General
    2.6.2 Probability generating function
    2.6.3 Moment generating function
    2.6.4 Self-study exercises

3 Further distribution theory
  3.1 Multivariate distributions
    3.1.1 Definitions
    3.1.2 Mean and covariance matrix
    3.1.3 Properties
    3.1.4 Self-study exercises
  3.2 Transformations
    3.2.1 The univariate case
    3.2.2 The multivariate case
    3.2.3 Self-study exercises
  3.3 Moments, generating functions and inequalities
    3.3.1 Moment generating function
    3.3.2 Cumulant generating function
    3.3.3 Some useful inequalities
    3.3.4 Self-study exercises
  3.4 Some limit theorems
    3.4.1 Modes of convergence of random variables
    3.4.2 Limit theorems for sums of independent random variables
    3.4.3 Self-study exercises
  3.5 Further discrete distributions
    3.5.1 Negative binomial distribution
    3.5.2 Hypergeometric distribution
    3.5.3 Multinomial distribution
    3.5.4 Self-study exercises
  3.6 Further continuous distributions
    3.6.1 Gamma and beta functions
    3.6.2 Gamma distribution
    3.6.3 Beta distribution
    3.6.4 Self-study exercises

4 Normal and associated distributions
  4.1 The multivariate normal distribution
    4.1.1 Multivariate normal
    4.1.2 Properties
    4.1.3 Marginal and conditional distributions
    4.1.4 Self-study exercises
  4.2 The chi-square, t and F distributions
    4.2.1 Chi-square distribution
    4.2.2 Student's t distribution
    4.2.3 Variance ratio (F) distribution
  4.3 Normal theory tests and confidence intervals
    4.3.1 One-sample t-test
    4.3.2 Two samples
    4.3.3 k samples (One-way Anova)
    4.3.4 Normal linear regression

MS237 Mathematical Statistics


Level 2

Spring Semester

Credits 15

Course Lecturer in 2010


D. Terhesiu

email: d.terhesiu@surrey.ac.uk

Class Test
The Class Test will be held on Thursday 11th March (week 5), starting at 12.00.
Class tests will include questions of the following types:
examples and proofs previously worked in lectures,
questions from the self-study exercises,
previously unseen questions in a similar style.
The Class Test will comprise 15% of the overall assessment for the course.

Coursework
Distribution: Coursework will be distributed at 14.00 on Friday 26th March.
Collection: Coursework will be collected on Thursday 29th April in Room LTB.
The Coursework will comprise 10% of the overall assessment for the course.

Chapter 1
Chapter 1 contains and reviews prerequisite material from MS132. Due to time
constraints, students are expected to work through at least part of this material
independently at the start of the course.

Objectives and learning outcomes


This module provides theoretical background for many of the topics introduced
in MS132 and for some of the topics which will appear in subsequent statistics
modules.
At the end of the module, you should
(1) be familiar with the main results of statistical distribution theory;
(2) be able to apply this knowledge to suitable problems in statistics.

Examples, exercises, and problems


Blank spaces have been left in the notes at various positions. These are for additional material and worked examples presented in the lectures. Most chapters end

with a set of self-study exercises, which you should attempt in your own study
time in parallel with the lectures.
In addition, six exercise sheets will be distributed during the course. You will
be given a week to complete each sheet, which will then be marked by the lecturer
and returned with model solutions. It should be stressed that completion of these
exercise sheets is not compulsory but those students who complete the sheets do
give themselves a considerable advantage!

Selected texts
Freund, J. E. Mathematical Statistics with Applications, Pearson (2004)
Hogg, R. V. and Tanis, E. A. Probability and Statistical Inference, Prentice Hall
(1997)
Lindgren, B. W. Statistical Theory, Macmillan (1976)
Mood, A. M., Graybill, F. G. and Boes, D. C. Introduction to the Theory of Statistics, McGraw-Hill (1974)
Wackerly, D.D., Mendenhall, W., and Scheaffer, R.L. Mathematical Statistics with
Applications, Duxbury (2002)

Useful series
These series will be useful during the course:

  (1 − x)^{−1} = Σ_{k=0}^∞ x^k = 1 + x + x² + x³ + · · ·

  (1 − x)^{−2} = Σ_{k=0}^∞ (k + 1)x^k = 1 + 2x + 3x² + 4x³ + · · ·

  e^x = Σ_{k=0}^∞ x^k/k! = 1 + x + x²/2! + x³/3! + · · ·

Chapter 1
Introductory revision material
This chapter contains and reviews prerequisite material from MS132. If necessary
you should review your notes for that module for additional details. Several examples, together with numerical answers, are included in this chapter. It is strongly
recommended that you work independently through these examples in order to
consolidate your understanding of the material.

1.1 Basic probability

Probability or chance can be measured on a scale which runs from zero, which
represents impossibility, to one, which represents certainty.

1.1.1 Terminology

A sample space, Ω, is the set of all possible outcomes of an experiment. An event E is a subset of Ω.
Example 1 Experiment: roll a die twice. Possible events are E1 = {1st face is a 6}, E2 =
{sum of faces = 3}, E3 = {sum of faces is odd}, E4 = {1st face - 2nd face =
3}. Identify the sample space and the above events. Obtain their probabilities
when the die is fair.

Answer: The sample space consists of the 36 ordered pairs (first roll, second roll):

  (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
  (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
  (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
  (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
  (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
  (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

p(E1) = 1/6;  p(E2) = 1/18;  p(E3) = 1/2;  p(E4) = 1/12.

Combinations of events
Given events A and B, further events can be identified as follows.
The complement of any event A, written Ā or A^c, means that A does not occur.
The union of any two events A and B, written A ∪ B, means that A or B or both occur.
The intersection of A and B, written A ∩ B, means that both A and B occur.
Venn diagrams are useful in this context.

1.1.2 Probability axioms

Let F be the class of all events in Ω. A probability (measure) P on (Ω, F) is a real-valued function satisfying the following three axioms:
1. P(E) ≥ 0 for every E ∈ F
2. P(Ω) = 1
3. Suppose the events E1 and E2 are mutually exclusive (that is, E1 ∩ E2 = ∅). Then
   P(E1 ∪ E2) = P(E1) + P(E2)
Some consequences:
(i) P(E^c) = 1 − P(E) (so in particular P(∅) = 0)
(ii) For any two events E1 and E2 we have the addition rule
   P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)

Example 1: (continued)
Obtain P(E1 ∩ E2), P(E1 ∪ E2), P(E1 ∩ E3) and P(E1 ∪ E3).
Answer: P(E1 ∩ E2) = P(∅) = 0
P(E1 ∪ E2) = P(E1) + P(E2) = 1/6 + 1/18 = 2/9
P(E1 ∩ E3) = P{(6,1), (6,3), (6,5)} = 3/36 = 1/12
P(E1 ∪ E3) = P(E1) + P(E3) − P(E1 ∩ E3) = 1/6 + 1/2 − 1/12 = 7/12

[Notes on axioms:
(1) In order to cope with infinite sequences of events, it is necessary to strengthen axiom 3 to
3'. P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) for any sequence (E1, E2, . . .) of mutually exclusive events.
(2) When Ω is uncountably infinite, in order to make the theory rigorous it is usually necessary to restrict the class of events F to which probabilities are assigned.]

1.1.3 Conditional probability

Suppose P(E2) ≠ 0. The conditional probability of the event E1 given E2 is defined as
  P(E1 | E2) = P(E1 ∩ E2)/P(E2).
The conditional probability is undefined if P(E2) = 0. The conditional probability formula above yields the multiplication rule:
  P(E1 ∩ E2) = P(E1)P(E2 | E1) = P(E2)P(E1 | E2)

Independence
Events E1 and E2 are said to be independent if
  P(E1 ∩ E2) = P(E1)P(E2).
Note that this implies that P(E1 | E2) = P(E1) and P(E2 | E1) = P(E2). Thus knowledge of the occurrence of one of the events does not affect the likelihood of occurrence of the other.
Events E1, . . . , Ek are pairwise independent if P(Ei ∩ Ej) = P(Ei)P(Ej) for all i ≠ j. They are mutually independent if, for all subsets, P(∩_j Ej) = ∏_j P(Ej).
Clearly, mutual independence ⇒ pairwise independence, but the converse is false (see question 4 of the self-study exercises).

Example 1 (continued): Find P(E1 | E2) and P(E1 | E3). Are E1, E2 independent?
Answer: P(E1 | E2) = P(E1 ∩ E2)/P(E2) = 0, P(E1 | E3) = P(E1 ∩ E3)/P(E3) = (1/12)/(1/2) = 1/6.
P(E1)P(E2) ≠ 0, so P(E1 ∩ E2) ≠ P(E1)P(E2) and thus E1 and E2 are not independent.

Law of total probability (partition law)

Suppose that B1, . . . , Bk are mutually exclusive and exhaustive events (i.e. Bi ∩ Bj = ∅ for all i ≠ j and ∪_i Bi = Ω).
Let A be any event. Then
  P(A) = Σ_{j=1}^k P(A | Bj)P(Bj)

Bayes' Rule
Suppose that events B1, . . . , Bk are mutually exclusive and exhaustive and let A be any event. Then
  P(Bj | A) = P(A | Bj)P(Bj)/P(A) = P(A | Bj)P(Bj) / Σ_i P(A | Bi)P(Bi)

Example 2: (Cancer diagnosis) A screening programme for a certain type of cancer has reliabilities P(A | D) = 0.98, P(A | D^c) = 0.05, where D is the event "disease is present" and A is the event "test gives a positive result". It is known that 1 in 10,000 of the population has the disease. Suppose that an individual's test result is positive. What is the probability that that person has the disease?
Answer: We require P(D | A). First find P(A).
P(A) = P(A | D)P(D) + P(A | D^c)P(D^c) = 0.98 × 0.0001 + 0.05 × 0.9999 = 0.050093.
By Bayes' rule: P(D | A) = P(A | D)P(D)/P(A) = (0.0001 × 0.98)/0.050093 = 0.002.

The person is still very unlikely to have the disease even though the test is positive.
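A quick numerical check of this calculation, as a sketch in Python (the function name is purely illustrative):

    # Screening example: prior P(D) = 1/10000, sensitivity P(A|D) = 0.98,
    # false positive rate P(A|not D) = 0.05.
    def posterior(prior, sens, fpr):
        """Return P(D|A) via the partition law and Bayes' rule."""
        p_a = sens * prior + fpr * (1 - prior)   # P(A)
        return sens * prior / p_a

    print(posterior(0.0001, 0.98, 0.05))   # about 0.00196, i.e. roughly 0.002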
Example 3: (Bertrand's Box Paradox) Three indistinguishable boxes contain black and white beads as shown: [ww], [wb], [bb]. A box is chosen at random and a bead chosen at random from the selected box. What is the probability that the [wb] box was chosen, given that the selected bead was white?
Answer: Let E = "chose the [wb] box" and W = "selected bead is white". By the partition law: P(W) = 1 × 1/3 + 1/2 × 1/3 + 0 × 1/3 = 1/2. Now using Bayes' rule
  P(E | W) = P(E)P(W | E)/P(W) = (1/3 × 1/2)/(1/2) = 1/3
(i.e. even though a bead from the selected box has been seen, the probability that the box is [wb] is still 1/3).

1.1.4 Self-study exercises

1. Consider families of three children, a typical outcome being bbg (boy-boy-girl in birth order) with probability 1/8. Find the probabilities of
(i) 2 boys and 1 girl (any order),
(ii) at least one boy,
(iii) consecutive children of different sexes.
Answer: (i) 3/8; (ii) 7/8; (iii) 1/4.
2. Use pA = P(A), pB = P(B) and pAB = P(A ∩ B) to obtain expressions for:
(a) P(A^c ∪ B^c),
(b) P(A^c ∩ B),
(c) P(A^c ∪ B),
(d) P(A^c ∩ B^c),
(e) P((A ∩ B^c) ∪ (A^c ∩ B)).
Describe each event in words. (Use a Venn diagram.)
Answer: (a) 1 − pAB; (b) pB − pAB; (c) 1 − pA + pAB; (d) 1 − pA − pB + pAB; (e) pA + pB − 2pAB.
3. (i) Express P(E1 ∪ E2 ∪ E3) in terms of the probabilities of E1, E2, E3 and their intersections only. Illustrate with a sketch.
(ii) Three types of fault can occur which lead to the rejection of a certain manufactured item. The probabilities of each of these faults (A, B and C) occurring are 0.1, 0.05 and 0.04 respectively. The three faults are known to be interrelated; the probability that A & B both occur is 0.04, A & C 0.02, and B & C 0.02. The probability that all three faults occur together is 0.01. What percentage of items are rejected?
Answer: (i) P(E1) + P(E2) + P(E3) − P(E1 ∩ E2) − P(E1 ∩ E3) − P(E2 ∩ E3) + P(E1 ∩ E2 ∩ E3)
(ii) P(A ∪ B ∪ C) = 0.1 + 0.05 + 0.04 − (0.04 + 0.02 + 0.02) + 0.01 = 0.12, so 12% of items are rejected.
4. Two fair dice are rolled: 36 possible outcomes, each with probability 1/36. Let E1 = {odd face 1st}, E2 = {odd face 2nd}, E3 = {one odd, one even}, so P(E1) = 1/2, P(E2) = 1/2, P(E3) = 1/2. Show that E1, E2, E3 are pairwise independent, but not mutually independent.
Answer: P(E2 | E1) = 1/2 = P(E2), P(E3 | E1) = 1/2 = P(E3), P(E3 | E2) = 1/2 = P(E3), so E1, E2, E3 are pairwise independent. But P(E1 ∩ E2 ∩ E3) = 0 ≠ P(E1)P(E2)P(E3), so E1, E2, E3 are not mutually independent.
5. An engineering company uses a selling aptitude test to aid it in the selection of its sales force. Past experience has shown that only 65% of all persons applying for a sales position achieved a classification of "satisfactory" in actual selling, and of these 80% had passed the aptitude test. Only 30% of the unsatisfactory persons had passed the test.
What is the probability that a candidate would be a satisfactory salesperson given that they had passed the aptitude test?
Answer: Let A = "pass aptitude test", S = "satisfactory". P(S) = 0.65, P(A | S) = 0.8, P(A | S^c) = 0.3. Therefore P(A) = (0.65 × 0.8) + (0.35 × 0.3) = 0.625, so P(S | A) = P(S)P(A | S)/P(A) = (0.65 × 0.8)/0.625 = 0.832.

1.2 Random variables and probability distributions

1.2.1 Random variables

A random variable X is a real-valued function on the sample space Ω; that is, X : Ω → R. If P is a probability measure on (Ω, F) then the induced probability measure on R is called the probability distribution of X.
A discrete random variable X takes values x1, x2, . . . with probabilities p(x1), p(x2), . . ., where p(x) = pr(X = x) = P({ω : X(ω) = x}) is the probability mass function (pmf) of X. (E.g. X = place of horse in race, grade of egg.)
Example 4: (i) Toss a coin twice: outcomes HH, HT, TH, TT. The random variable X = number of heads takes values 0, 1, 2.
(ii) Roll two dice: X = total score. Probabilities for X are P(X = 2) = 1/36, P(X = 3) = 2/36, P(X = 4) = 3/36, etc.
Example 5: X takes values 1, 2, 3, 4, 5 with probabilities k, 2k, 3k, 4k, 5k. Calculate k and P(2 ≤ X ≤ 4).
Answer: 1 = Σ_{x=1}^5 P(x) = k(1 + 2 + 3 + 4 + 5) = 15k, so k = 1/15.
P(2 ≤ X ≤ 4) = P(2) + P(3) + P(4) = 2/15 + 3/15 + 4/15 = 3/5.

A continuous random variable X takes values over an interval. E.g. X = time over racecourse, weight of egg. Its probability density function (pdf) f(x) is defined by
  pr(a < X < b) = ∫_a^b f(x) dx.
Note that f(x) ≥ 0 for all x, and ∫_{−∞}^{∞} f(x) dx = 1.

Example 6: Let f(x) = k(1 − x²) on (−1, 1). Calculate k and pr(|X| > 1/2).
Answer: 1 = ∫_{−1}^{1} f(x) dx = ∫_{−1}^{1} k(1 − x²) dx = k[x − x³/3]_{−1}^{1} = 4k/3, so k = 3/4.
P(|X| > 1/2) = 1 − P(−1/2 ≤ X ≤ 1/2) = 1 − 2∫_0^{1/2} k(1 − x²) dx = 1 − 11k/12 = 5/16.

A mixed discrete/continuous random variable is such that the probability is shared between discrete and continuous components, with Σ p(x) + ∫ f(x) dx = 1, e.g. rainfall on a given day, waiting time in a queue, flow in a pipe, contents of a reservoir.
The distribution function F of the random variable X is defined as
  F(x) = pr(X ≤ x) = P({ω : X(ω) ≤ x}).
Thus F(−∞) = 0, F(∞) = 1, F is monotone increasing, and pr(a < X ≤ b) = F(b) − F(a).
Discrete case: F(x) = Σ_{u ≤ x} p(u)
Continuous case: F(x) = ∫_{−∞}^{x} f(u) du and F′(x) = f(x).

1.2.2 Expectation

The expectation (or expected value or mean) of the random variable X is defined as
  μ = E(X) = Σ x p(x)       (X discrete)
           = ∫ x f(x) dx    (X continuous)
The variance of X is σ² = Var(X) = E{(X − μ)²}. Equivalently σ² = E(X²) − {E(X)}² (exercise: prove).
σ is called the standard deviation.
Functions of X:
(i) E{h(X)} = Σ h(x) p(x) (X discrete) or ∫ h(x) f(x) dx (X continuous)
(ii) E(aX + b) = aE(X) + b, Var(aX + b) = a²Var(X).

Proof (for discrete X)
(i) h(X) takes values h(x1), h(x2), . . . with probabilities p(x1), p(x2), . . ., so, by definition, E{h(X)} = h(x1)p(x1) + h(x2)p(x2) + · · · = Σ h(x)p(x).
(ii) E[aX + b] = Σ (ax + b)P(x) = a Σ x P(x) + b Σ P(x) = aE[X] + b
Var[aX + b] = E[{(aX + b) − E[aX + b]}²] = E[{aX + b − aE[X] − b}²] = E[a²(X − E[X])²] = a²Var[X]
Example 7: X = 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
Find E(X), E(X − 1), E(X²) and Var(X).
Answer: E[X] = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1
E[X − 1] = E[X] − 1 = 0, E[X²] = 0² × 1/4 + 1² × 1/2 + 2² × 1/4 = 3/2
Var[X] = E[X²] − E[X]² = 1/2.


Example 8: f(x) = k(1 + x)^{−4} on (0, ∞). Find k and hence obtain E(X), E{(1 + X)^{−1}}, E(X²) and Var(X).
Answer: 1 = k ∫_0^∞ (1 + x)^{−4} dx = k[−(1/3)(1 + x)^{−3}]_0^∞ = k/3, so k = 3.
E[X] = 3 ∫_0^∞ x(1 + x)^{−4} dx = 3 ∫_1^∞ (u − 1)u^{−4} du = 3[−(1/2)u^{−2} + (1/3)u^{−3}]_1^∞ = 3(1/2 − 1/3) = 1/2
E[(1 + X)^{−1}] = 3 ∫_0^∞ (1 + x)^{−5} dx = 3[−(1/4)(1 + x)^{−4}]_0^∞ = 3/4
E[X²] = 3 ∫_0^∞ x²(1 + x)^{−4} dx = 3 ∫_1^∞ (u − 1)²u^{−4} du = 3[−u^{−1} + u^{−2} − (1/3)u^{−3}]_1^∞ = 1
Var[X] = E[X²] − E[X]² = 3/4.


1.2.3 Self-study exercises

1. X takes values 0, 1, 2, 3 with probabilities 1/4, 1/5, 3/10, 1/4. Compute (as fractions) E(X), E(2X + 3), Var(X) and Var(2X + 3).
Answer: E(X) = 31/20, E(2X + 3) = 2E(X) + 3 = 61/10, E(X²) = 73/20, so Var(X) = E(X²) − E(X)² = 499/400, Var(2X + 3) = 4Var(X) = 499/100.

2. The random variable X has density function f(x) = kx(1 − x) on (0, 1), f(x) = 0 elsewhere. Calculate k and sketch f(x). Compute the mean and variance of X, and pr(0.3 ≤ X ≤ 0.6).
Answer: k = 6, E(X) = 1/2, Var(X) = 1/20, pr(0.3 ≤ X ≤ 0.6) = 0.432.

Chapter 2
Random variables and distributions
2.1 Transformations

Suppose that X has distribution function FX(x) and that the distribution function FY(y) of Y = h(X) is required, where h is a strictly increasing function. Then
  FY(y) = pr(Y ≤ y) = pr(h(X) ≤ y) = pr(X ≤ x) = FX(x)
where x ≡ x(y) = h^{−1}(y). If X is continuous and h is differentiable, then it follows that Y has density
  fY(y) = dFY(y)/dy = dFX(x)/dy = fX(x) dx/dy.
On the other hand, if h is strictly decreasing then
  FY(y) = pr(Y ≤ y) = pr(h(X) ≤ y) = pr(X ≥ x) = 1 − FX(x)
which yields fY(y) = −fX(x)(dx/dy). Both formulae are covered by
  fY(y) = fX(x) |dx/dy|.

Example 9: Suppose that X has pdf fX(x) = 2e^{−2x} on (0, ∞). Obtain the pdf of Y = log X.
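A sketch of the working (the worked version is given in lectures): h(x) = log x is strictly increasing with inverse x = e^y, so dx/dy = e^y and
  fY(y) = fX(e^y) e^y = 2 e^y e^{−2e^y},   −∞ < y < ∞.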

Probability integral transform. Let X be a continuous random variable with distribution function F(x). Then Y = F(X) is uniformly distributed on (0, 1).
Proof. First note that 0 ≤ Y ≤ 1. Let 0 ≤ y ≤ 1; then
  pr(Y ≤ y) = pr(F(X) ≤ y) = pr(X ≤ F^{−1}(y)) = F(F^{−1}(y)) = y,
so Y has pdf f(y) = 1 on (0, 1) (by differentiation), which is the density of the uniform distribution on (0, 1).
This result has an important application to the simulation of random variables:
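For example, if U is uniform on (0, 1) and F is a continuous distribution function then X = F^{−1}(U) has distribution function F, so uniform random numbers can be turned into samples from F. A minimal sketch in Python (NumPy assumed available), inverting the exponential cdf F(x) = 1 − e^{−2x}:

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 2.0
    u = rng.uniform(size=100_000)      # U ~ uniform(0, 1)
    x = -np.log(1 - u) / lam           # F^{-1}(u) for the exponential(lam) cdf

    # The sample mean and variance should be close to 1/lam and 1/lam^2.
    print(x.mean(), x.var())           # roughly 0.5 and 0.25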

2.1.1 Self-study exercises


1. X takes values 1, 2, 3, 4 with probabilities 1/10, 1/5, 3/10, 2/5 and Y = (X − 2)².
(i) Find E(Y) and Var(Y) using the formula for E{h(X)}.
(ii) Calculate the pmf of Y and use it to calculate E(Y) and Var(Y) directly.

2. The random variable X has pdf f(x) = 1/3, x = 1, 2, 3, zero elsewhere. Find the pdf of Y = 2X + 1.
3. The random variable X has pdf f(x) = e^{−x} on (0, ∞). Obtain the pdf of Y = e^{−X}.
4. Let X have the pdf f(x) = (1/2)^x, x = 1, 2, 3, . . . , zero elsewhere. Find the pdf of Y = X³.

2.2 Some standard discrete distributions


2.2.1 Binomial distribution

Consider a sequence of independent trials in each of which there are only two possible results, success, with probability θ, or failure, with probability 1 − θ (independent Bernoulli trials).
Outcomes can be represented as binary sequences, with 1 for success and 0 for failure, e.g. 110001 has probability θθ(1 − θ)(1 − θ)(1 − θ)θ, since the trials are independent.
Let the random variable X be the number of successes in n trials, with n fixed. The probability of a particular sequence of r 1s and (n − r) 0s is θ^r(1 − θ)^{n−r}, and the event {X = r} contains C(n, r) such sequences. Hence
  p(r) = pr(X = r) = C(n, r) θ^r (1 − θ)^{n−r},   r = 0, 1, . . . , n.
This is the pmf of the binomial (n, θ) distribution. The name comes from the binomial theorem
  {θ + (1 − θ)}^n = Σ_{r=0}^n C(n, r) θ^r (1 − θ)^{n−r},
from which Σ_r p(r) = 1 follows.
The mean is μ = nθ:

The variance is σ² = nθ(1 − θ) (see exercise 3).
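The mean calculation is left blank above for the lectures; one standard sketch (not necessarily the derivation given in class) writes X as a sum of indicators of success on each trial:
  X = I1 + · · · + In with pr(Ii = 1) = θ, so μ = E(X) = Σ_{i=1}^n E(Ii) = nθ.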


Example 10: A biased coin with pr(head) = 2/3 is tossed five times. Calculate
p(r).
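A quick numerical sketch of Example 10, using only the Python standard library:

    from math import comb

    theta, n = 2/3, 5
    for r in range(n + 1):
        # binomial(n, theta) pmf: C(n, r) theta^r (1 - theta)^(n - r)
        p = comb(n, r) * theta**r * (1 - theta)**(n - r)
        print(r, round(p, 4))
    # p(0..5) is approximately 0.0041, 0.0412, 0.1646, 0.3292, 0.3292, 0.1317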

2.2.2 Geometric distribution

Suppose now that, instead of a fixed number of Bernoulli trials, one continues until a success is achieved, so that the number of trials, N, is now a random variable. Then N takes the value n if and only if the previous (n − 1) trials result in failures and the nth trial results in a success. Thus
  p(n) = pr(N = n) = (1 − θ)^{n−1} θ,   n = 1, 2, . . . .
This is the pmf of the geometric (θ) distribution: the probabilities are in geometric progression. Note that the sum of the probabilities over n = 1, 2, . . . is 1.
The mean is μ = 1/θ:

The variance is σ² = (1 − θ)/θ² (see exercise 4).

E.g. Toss a biased coin with pr(head) = 2/3. Then, on average, it takes three tosses to get a tail.
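The mean calculation left blank above can be sketched with the series (1 − x)^{−2} = Σ (k + 1)x^k from the Useful series list:
  E(N) = Σ_{n=1}^∞ n θ(1 − θ)^{n−1} = θ Σ_{k=0}^∞ (k + 1)(1 − θ)^k = θ{1 − (1 − θ)}^{−2} = 1/θ.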

2.2.3 Poisson distribution

The pmf of the Poisson (λ) distribution is defined as
  p(r) = e^{−λ} λ^r / r!,   r = 0, 1, 2, . . . ,
where λ > 0. Note that the sum of the probabilities over r = 0, 1, 2, . . . is 1 (exponential series).
The mean is μ = λ:

The variance is σ² = λ (see exercise 6).

The Poisson distribution arises in various contexts, one being the limit of a binomial (n, θ) as n → ∞ and θ → 0 with nθ = λ fixed.
Example 11: (Random events in time.) Cars are recorded as they pass a checkpoint. The probability that a car is level with the checkpoint at any given instant is very small, but the number n of such instants in a given time period is large. Hence Xt, the number of cars passing the checkpoint during a time interval of t minutes, can be modelled as Poisson with mean proportional to t. For example, if the average rate is two cars per minute, find the probability of exactly 3 cars in 5 minutes.
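A sketch of the calculation (the worked version is given in lectures): with a rate of two cars per minute, X5 is Poisson with mean λ = 2 × 5 = 10, so
  pr(X5 = 3) = e^{−10} 10³ / 3! ≈ 0.0076.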
minutes.

2.2.4 Self-study exercises

1. In a large consignment of widgets 5% are defective. What is the probability of getting one or two defectives in a four-pack?
2. X is binomial with mean 2 and variance 1. Compute pr(X ≥ 1).
3. Derive the variance of the binomial (n, θ) distribution. [Hint: find E{X(X − 1)}.]
4. Derive the variance of the geometric (θ) distribution. [Hint: find E{X(X − 1)}.]
5. A leaflet contains one thousand words and the probability that any one word contains a misprint is 0.005. Use the Poisson distribution to estimate the probability of 2 or fewer misprints.
6. Derive the variance of the Poisson (λ) distribution. [Hint: find E{X(X − 1)}.]

2.3 Some standard continuous distributions

2.3.1 Uniform distribution


The pdf of the uniform (, ) distribution is
f (x) = ( )1 , < x < .
The mean is = ( + )/2:

The variance is 2 = ( )2 /12 (see exercise 1).


Application. Simulation of continuous random variables via the probability integral transform: see Section 2.1.

2.3.2 Exponential distribution

The pdf of the exponential (λ) distribution is
  f(x) = λ e^{−λx},   x > 0,
where λ > 0. The distribution function is F(x) = ∫_0^x λ e^{−λu} du = 1 − e^{−λx} (verify).
The mean is μ = 1/λ:

The variance is σ² = 1/λ² (see exercise 4).


Lack of memory property.
pr(X > a + b|X > a) = pr(X > b)
Proof:
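A short sketch of the argument (the lectures may set it out differently): since {X > a + b} ⊆ {X > a},
  pr(X > a + b | X > a) = pr(X > a + b)/pr(X > a) = e^{−λ(a+b)}/e^{−λa} = e^{−λb} = pr(X > b).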

For example, if the lifetime of a component is exponentially distributed, then the


fact that it has lasted for 100 hours does not affect its chances of failing during the
next 100 hours. That is, the component is not subject to ageing.
Application to random events in time.
Example: cars passing a checkpoint. The distribution of the waiting time, T, for the first event can be obtained as follows:
  pr(T > t) = pr(N_t = 0) = e^{−λt},
since N_t, the number of events occurring during the time interval (0, t), has a Poisson distribution with mean λt. Hence T has distribution function F(t) = 1 − e^{−λt}, that of the exponential (λ) distribution.

2.3.3 Pareto distribution

The Pareto (α, β) distribution has pdf
  f(x) = (α/β)(1 + x/β)^{−(α+1)},   x > 0,
where α > 0 and β > 0. The distribution function is F(x) = 1 − (1 + x/β)^{−α} (verify).
The mean is μ = β/(α − 1) for α > 1:

The variance is σ² = αβ²/{(α − 1)²(α − 2)} for α > 2.

2.3.4 Self-study exercises

1. Obtain the variance of the uniform (α, β) distribution.
2. The lifetime of a valve has an exponential distribution with mean 350 hours. What proportion of valves will last 400 hours or longer? For how many hours should the valves be guaranteed so that only 1% are returned under guarantee?
3. A machine suffers random breakdowns at a rate of three per day. Given that it is functioning at 10am, what is the probability that
(i) no breakdown occurs before noon?
(ii) the first breakdown occurs between 12pm and 1pm?
4. Obtain the variance of the exponential (λ) distribution.
5. The random variable X has the Pareto distribution with α = 3, β = 1. Find the probability that X exceeds μ + 2σ, where μ, σ are respectively the mean and standard deviation of X.


2.4 The normal (Gaussian) distribution

2.4.1 Normal distribution

The normal distribution is the most important distribution in Statistics, for both theoretical and practical reasons. Its pdf is
  f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)},   −∞ < x < ∞.
The parameters μ and σ² are the mean and variance respectively. The distribution is denoted by N(μ, σ²).
Mean:

The importance of the normal distribution follows from its use as an approximation in various statistical methods (a consequence of the Central Limit Theorem: see Section 3.4.2), its convenience for theoretical manipulation, and its application to describe observed data.

Standard normal distribution
The standard normal distribution is N(0, 1), for which the distribution function has the special notation Φ(x). Thus
  Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−u²/2} du.
The Φ function is tabulated widely (e.g. New Cambridge Statistical Tables). Useful values are Φ(1.64) = 0.95, Φ(1.96) = 0.975.
Example 12: Suppose that X is N(0, 1) and Y is N(2, 4). Use tables to calculate pr(X < 1), pr(X < −1), pr(−1.5 < X < 0.5), pr(Y < 1) and pr(Y² > 5Y − 6).
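The example is intended for the normal tables, but it is easy to check numerically; a sketch in Python assuming SciPy is available (note that scipy.stats.norm is parameterised by the standard deviation, so N(2, 4) has scale 2):

    from scipy.stats import norm

    X = norm(0, 1)       # N(0, 1)
    Y = norm(2, 2)       # N(2, 4): standard deviation 2

    print(X.cdf(1))                   # pr(X < 1)         ~ 0.841
    print(X.cdf(-1))                  # pr(X < -1)        ~ 0.159
    print(X.cdf(0.5) - X.cdf(-1.5))   # pr(-1.5 < X < 0.5) ~ 0.625
    print(Y.cdf(1))                   # pr(Y < 1)         ~ 0.309
    # Y^2 > 5Y - 6  <=>  (Y - 2)(Y - 3) > 0  <=>  Y < 2 or Y > 3
    print(Y.cdf(2) + Y.sf(3))         # ~ 0.809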


2.4.2 Properties
(i) If X is N(μ, σ²) then aX + b is N(aμ + b, a²σ²).
In particular, the standardized variate (X − μ)/σ is N(0, 1).
(ii) If X1 is N(μ1, σ1²), X2 is N(μ2, σ2²) and X1 and X2 are independent, then X1 + X2 is N(μ1 + μ2, σ1² + σ2²).
[Hence, from property (i), the distribution of X1 − X2 is N(μ1 − μ2, σ1² + σ2²).]
(iii) If Xi, i = 1, . . . , n, are independent N(μi, σi²), then Σ_i Xi is N(Σ_i μi, Σ_i σi²).
(iv) The moment generating function (see Section 2.6.3) of N(μ, σ²) is M(z) = E(e^{zX}) = e^{μz + σ²z²/2}.
(Properties (i)-(iii) are easily proved via mgfs - see Section 2.6.3.)
(v) Central moments of N(μ, σ²). Let μr = E{(X − μ)^r}, the rth central moment of X. Then
  μr = 0 for r odd,   μr = (σ/√2)^r r!/(r/2)! for r even.
Note that μ2 = σ², the variance of X.

Sampling distribution of the sample mean

Let X1, . . . , Xn be independently and identically distributed (iid) as N(μ, σ²). Then the distribution of X̄ = n^{−1} Σ Xi is N(μ, n^{−1}σ²). This is the sampling distribution of the sample mean, a result of fundamental importance in Statistics.
Proof:
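One route, as a sketch (the lecture proof may be set out differently), uses moment generating functions together with property (iv):
  M_{X̄}(z) = E(e^{zX̄}) = ∏_{i=1}^n E(e^{(z/n)Xi}) = {exp(μz/n + σ²z²/(2n²))}^n = exp(μz + (σ²/n)z²/2),
which is the mgf of N(μ, σ²/n), and the result follows by uniqueness of mgfs.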

2.4.3 Self-study exercises


1. The distribution of lengths of rivets is normal with mean 2.5cm and sd
0.02cm. In a batch of 500 rivets how many would you expect on average to
have length
(i) less than 2.46cm,
(ii) between 2.46cm and 2.53cm,
(iii) greater than 2.53cm?
(iv) What length is exceeded by only 1 in 1000 rivets?
2. Suppose that X is N(0, 1) and Y is N(2, 4). Use tables to calculate pr(Y − X < 1) and pr(X + (1/2)Y > 1.5).
3. Two resistors in series have resistances X1 and X2 ohms, where X1 is
N (200, 4) and X2 is N (150, 3). What is the distribution of the combined
resistance X = X1 + X2 ? Find the probability that X exceeds 355.5 ohms.
4. The fuel consumption of a fleet of 150 lorries is approximately normally
distributed with mean 15 mpg and sd 1.5 mpg.
(i) Compute the expected number of lorries that average between 13 and 14
mpg.
(ii) What is the probability that the average of a random sample of four
lorries exceeds 16 mpg?

2.5 Bivariate distributions

2.5.1 Definitions and notation


Suppose that X1 , X2 are two random variables defined on the same probability
space (, F, P ). Then P induces a joint distribution for X1 , X2 . The joint
distribution function is defined as
F (x1 , x2 ) = P ({ : X1 () x1 , X2 () x2 })
= pr(X1 x1 , X2 x2 ) .
In the discrete case the joint pmf is p(x1 , x2 ) = pr(X1 = x1 , X2 = x2 ). In the
1 ,x2 )
continuous case, the joint pdf is f (x1 , x2 ) = Fx(x1 x
.
2
Example 13: (discrete) Two biased coins are tossed. Score heads = 1 (with probability ), tails = 0 (with probability 1 ). Let X1 = sum of scores, X2 =
difference of scores (1st - 2nd). The tables below show
(i) the possible values of X1 , X2 and their probabilities,
(ii) the joint probability table for X1 , X2 .

(i)
  Outcome   00   01   10   11
  X1
  X2
  Prob

(ii)
            X2 = −1   X2 = 0   X2 = 1
  X1 = 0
  X1 = 1
  X1 = 2

Example 14: (continuous) Suppose X1 and X2 have joint pdf f(x1, x2) = k(1 − x1x2²) on (0, 1)². Obtain the value of k.


2.5.2 Marginal distributions

These follow from the law of total probability.
Discrete case. Marginal probability mass functions
  p1(x1) = pr(X1 = x1) = Σ_{x2} p(x1, x2)   and   p2(x2) = pr(X2 = x2) = Σ_{x1} p(x1, x2)
Continuous case. Marginal probability density functions
  f1(x1) = ∫ f(x1, x2) dx2   and   f2(x2) = ∫ f(x1, x2) dx1
Marginal means and variances.
  μ1 = E(X1) = Σ x1 p1(x1) (discrete) or ∫ x1 f1(x1) dx1 (continuous)
  σ1² = Var(X1) = E{(X1 − μ1)²} = E(X1²) − μ1²
Likewise μ2 and σ2².

2.5.3 Conditional distributions

These follow from the definition of conditional probability.
Discrete case. The conditional probability mass function of X1 given X2 is
  p1(x1 | X2 = x2) = pr(X1 = x1 | X2 = x2) = pr(X1 = x1, X2 = x2)/pr(X2 = x2) = p(x1, x2)/p2(x2).
Similarly
  p2(x2 | X1 = x1) = p(x1, x2)/p1(x1).
Continuous case. The conditional probability density function of X1 given X2 is
  f1(x1 | X2 = x2) = f(x1, x2)/f2(x2).
Similarly
  f2(x2 | X1 = x1) = f(x1, x2)/f1(x1).
Independence. X1 and X2 are said to be independent if F(x1, x2) = F1(x1)F2(x2). Equivalently, p(x1, x2) = p1(x1)p2(x2) (discrete), or f(x1, x2) = f1(x1)f2(x2) (continuous).


Example 15: Suppose that R and N have a joint distribution in which R | N is binomial (N, θ) and N is Poisson (λ). Show that R is Poisson (λθ).

2.5.4 Covariance and correlation

The covariance between X1 and X2 is defined as
  σ12 = Cov(X1, X2) = E{(X1 − μ1)(X2 − μ2)} = E(X1X2) − μ1μ2,
where E(X1X2) = Σ Σ x1x2 p(x1, x2) (discrete) or ∫∫ x1x2 f(x1, x2) dx1 dx2 (continuous).
The correlation between X1 and X2 is
  ρ = Corr(X1, X2) = σ12/(σ1σ2).

Example 13: (continued)
Marginal distributions:
  x1 = 0, 1, 2 with p1(x1) =
  x2 = −1, 0, 1 with p2(x2) =
Marginal means:
  μ1 = Σ x1 p1(x1) =
  μ2 = Σ x2 p2(x2) =
Variances:
  σ1² = Σ x1² p1(x1) − μ1² =
  σ2² = Σ x2² p2(x2) − μ2² =
Conditional distributions: e.g.
  p(x1 | X2 = 0):   x1 = 0:        x1 = 2:
Independence: e.g. p(0, 1) = 0 but p1(0)p2(1) ≠ 0, so X1, X2 are not independent.
Covariance: σ12 = Σ Σ x1x2 p(x1, x2) − μ1μ2 =
Example 14: (continued)
Marginal distributions:
  f1(x1) = ∫_0^1 k(1 − x1x2²) dx2 =
  f2(x2) = ∫_0^1 k(1 − x1x2²) dx1 =
Marginal means:
  μ1 = ∫_0^1 x1 f1(x1) dx1 =
  μ2 = ∫_0^1 x2 f2(x2) dx2 =
Variances:
  σ1² = ∫_0^1 x1² f1(x1) dx1 − μ1² =
  σ2² = ∫_0^1 x2² f2(x2) dx2 − μ2² =
Conditional distributions: e.g.
  f(x2 | X1 = 1/3) =
Independence:
  f(x1, x2) = k(1 − x1x2²), which does not factorise into f1(x1)f2(x2), so X1, X2 are not independent.
Covariance:
  σ12 = ∫∫ x1x2 f(x1, x2) dx1 dx2 − μ1μ2 =


Properties
(i) E(aX1 + bX2) = aμ1 + bμ2, Var(aX1 + bX2) = a²σ1² + 2abσ12 + b²σ2²
Cov(aX1 + b, cX2 + d) = ac σ12, Corr(aX1 + b, cX2 + d) = Corr(X1, X2)
(note: invariance under linear transformation)
Proof:

(ii) X1, X2 independent ⇒ Cov(X1, X2) = 0. The converse is false.
Proof:

(iii) −1 ≤ Corr(X1, X2) ≤ +1, with equality if and only if X1, X2 are linearly dependent.
Proof:

(iv) E(Y) = E{E(Y | X)} and Var(Y) = E{Var(Y | X)} + Var{E(Y | X)}
Proof:

2.5.5 Self-study exercises


1. Roll a fair die twice. Let X1 be the number of times that face 1 shows, and
let X2 = [sum of faces/4], where [x] denotes the integer part of x.
(a) Construct the joint probability table.
(b) Calculate the two marginal pmfs p1 (x1 ) and p2 (x2 ) and the conditional
pmfs p1 (x1 |x2 = 1) and p2 (x2 |x1 = 1). Are X1 and X2 independent?
(c) Compute the means, μ1 and μ2, variances, σ1² and σ2², and covariance σ12. Are X1 and X2 uncorrelated?
2. X1 and X2 have joint density f(x1, x2) = 4x1x2 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1. Calculate the marginal and conditional densities of X1 and X2, their means and variances, and their correlation.
3. Calculate, in terms of the means, variances and covariances of X1 , X2 and
X3 , E(2X1 + 3X2 ), Cov(2X1 , 3X2 ), Var(2X1 + 3X2 ) and Cov(2X1 +
3X2 , 4X2 + 5X3 ).

2.6 Generating functions

2.6.1 General
The generating function for a sequence (an : n ≥ 0) is A(z) = a0 + a1z + a2z² + · · · = Σ_{n=0}^∞ an z^n. Here z is a dummy variable. The definition is useful only if the series converges. The idea is to replace the sequence (an) by the function A(z), which may be easier to analyse than the original sequence.
Examples:
(i) If an = 1 for n = 0, 1, 2, . . . , then A(z) = (1 − z)^{−1} for |z| < 1 (geometric series).
(ii) If an = C(m, n) for n = 0, 1, . . . , m, and an = 0 for n > m, then A(z) = (1 + z)^m (binomial series).

2.6.2 Probability generating function

Let (pn) be the pmf of some discrete random variable X, so pn = pr(X = n) ≥ 0 and Σ_n pn = 1. Define the probability generating function (pgf) of X by
  P(z) = E(z^X) = Σ_n pn z^n.
Properties
(i) |P(z)| ≤ 1 for |z| ≤ 1.
Proof:

(ii) μ = E(X) = P′(1).
Proof:


(iii) σ² = Var(X) = P″(1) + P′(1) − {P′(1)}².


Proof:

(iv) Let X and Y be independent random variables with pgfs PX and PY respectively. Then the pgf of X + Y is given by PX+Y (z) = PX (z)PY (z) .
Proof:

Example 16: (i) Find the pgf of the Poisson (λ) distribution.
(ii) Let X1, X2 be independent Poisson random variables with parameters λ1, λ2 respectively. Obtain the distribution of X1 + X2.
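A sketch of the working (the lectures fill in the details):
(i) P(z) = E(z^X) = Σ_{r=0}^∞ z^r e^{−λ}λ^r/r! = e^{−λ} e^{λz} = e^{λ(z−1)}.
(ii) By property (iv), P_{X1+X2}(z) = e^{λ1(z−1)} e^{λ2(z−1)} = e^{(λ1+λ2)(z−1)}, which is the pgf of the Poisson (λ1 + λ2) distribution, so X1 + X2 is Poisson (λ1 + λ2).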

2.6.3 Moment generating function

The moment generating function (mgf) is defined as


M (z) = E(ezX ) .
The pgf tends to be used more for discrete distributions, and the mgf for continuous ones, although note that the two are related by M (z) = P (ez ).


Properties
(i) μ = E(X) = M′(0), σ² = Var(X) = M″(0) − μ².
Proof:

(ii) Let X and Y be independent random variables with mgfs MX (z) , MY (z)
respectively. Then the mgf of X + Y is given by MX+Y (z) = MX (z)MY (z) .
Proof:

Normal distribution. We prove properties (i) - (iv) of Section 2.4.2.


2.6.4 Self-study exercises

1. Show that the pgf of the binomial (n, θ) distribution is {θz + (1 − θ)}^n.
2. (Zero-truncated Poisson distribution) Find the pgf of the discrete distribution with pmf p(r) = e^{−λ}λ^r/{r!(1 − e^{−λ})} for r = 1, 2, . . .. Deduce the mean and variance.
3. The random variable X has density f(x) = k(1 + x)e^{−θx} on (0, ∞) with θ > 0. Find the value of k. Show that the moment generating function is M(z) = k{(z − θ)^{−2} − (z − θ)^{−1}}. Use it to calculate the mean and standard deviation of X.

Chapter 3
Further distribution theory
3.1 Multivariate distributions

Let X1, . . . , Xp be p real-valued random variables on (Ω, F) and consider the joint distribution of X1, . . . , Xp. Equivalently, consider the distribution of the random vector
  X = (X1, X2, . . . , Xp)^T.

3.1.1 Definitions

The joint distribution function:
  F(x) = pr(X ≤ x) = pr(X1 ≤ x1, . . . , Xp ≤ xp)
The joint probability mass function (pmf) (discrete case):
  p(x) = pr(X = x) = pr(X1 = x1, . . . , Xp = xp)
The joint probability density function (pdf) f(x) (continuous case) is such that
  pr(X ∈ A) = ∫_A f(x) dx


The marginal distributions are those of the individual components:
  Fj(xj) = pr(Xj ≤ xj) = F(∞, . . . , xj, . . . , ∞)
The conditional distributions are those of one component given another:
  F(xj | xk) = pr(Xj ≤ xj | Xk = xk)
The Xj's are independent if F(x) = ∏_j Fj(xj). Equivalently, p(x) = ∏_j pj(xj) (discrete case), or f(x) = ∏_j fj(xj) (continuous case).
Means: μj = E(Xj)
Variances: σj² = Var(Xj) = E{(Xj − μj)²} = E(Xj²) − μj²
Covariances: σjk = Cov(Xj, Xk) = E{(Xj − μj)(Xk − μk)} = E(XjXk) − μjμk
Correlations: ρjk = Corr(Xj, Xk) = σjk/(σjσk)

3.1.2 Mean and covariance matrix

The mean vector of X is μ = E(X) = (μ1, μ2, . . . , μp)^T.
The covariance matrix (variance-covariance matrix, dispersion matrix) of X is

        σ11  σ12  · · ·  σ1p
  Σ =   σ21  σ22  · · ·  σ2p
        ...
        σp1  σp2  · · ·  σpp

Since the (i, j)th element of (X − μ)(X − μ)^T is (Xi − μi)(Xj − μj), we see that
  Σ = E{(X − μ)(X − μ)^T} = E(XX^T) − μμ^T.

3.1.3 Properties

Let X have mean μ and covariance matrix Σ. Let a, b be p-vectors and A be a q × p matrix. Then
(i) E(a^T X) = a^T μ
(ii) Var(a^T X) = a^T Σ a. It follows that Σ is positive semi-definite.
(iii) Cov(a^T X, b^T X) = a^T Σ b
(iv) Cov(AX) = AΣA^T
(v) E(X^T AX) = trace(AΣ) + μ^T Aμ

Proof:


3.1.4 Self-study exercises

1. Let X1 = I1Y, X2 = I2Y, where I1, I2 and Y are independent and I1 and I2 take values ±1 each with probability 1/2.
Show that E(Xj) = 0, Var(Xj) = E(Y²), Cov(X1, X2) = 0.
2. Verify that E(X1 + · · · + Xp) = μ1 + · · · + μp and Var(X1 + · · · + Xp) = Σ_i Σ_j σij, where μi = E(Xi) and σij = Cov(Xi, Xj).
Suppose now that the Xi's are iid. Verify that X̄ has mean μ and variance σ²/p, where μ = E(Xi) and σ² = Var(Xi).

3.2 Transformations

3.2.1 The univariate case

Problem: to find the distribution of Y = h(X) from the known distribution of X. The case where h is a one-to-one function was treated in Section 2.1. When h is many-to-one we use the following generalised formulae:
  Discrete case: pY(y) = Σ pX(x)
  Continuous case: fY(y) = Σ fX(x) |dx/dy|
where in both cases the summations are over the set {x : h(x) = y}. That is, we add up the contributions to the mass or density at y from all x values which map to y.
Example 17: (discrete) Suppose pX(x) = px for x = 0, 1, 2, 3, 4, 5 and let Y = (X − 2)². Obtain the pmf of Y.


Example 18: (continuous) Suppose fX(x) = 2x on (0, 1) and let Y = (X − 1/2)². Obtain the pdf of Y.

3.2.2 The multivariate case

Problem: to find the distribution of Y = h(X), where Y is s × 1 and X is r × 1, from the known distribution of X.
Discrete case: pY(y) = Σ pX(x), with the summation over the set {x : h(x) = y}.
Continuous case:
Case (i): h is a one-to-one transformation (so that s = r). Then the rule is
  fY(y) = fX(x(y)) |dx/dy|
where dx/dy is the Jacobian (determinant) of the transformation, with (i, j)th element ∂xi/∂yj, and |·| denotes its absolute value.
Case (ii): s < r. First transform the s-vector Y to the r-vector Y′, where Y′i = Yi, i = 1, . . . , s, and Y′i, i = s + 1, . . . , r, are chosen for convenience. Now find the density of Y′ as above and then integrate out Y′_{s+1}, . . . , Y′_r to obtain the marginal density of Y, as required.
Case (iii): s = r but h(·) is not monotonic. Then there will generally be more than one value of x corresponding to a given y and we need to add the probability contributions from all relevant x's.


Example 19: (linear transformation) Suppose that Y = AX, where A is an r × r nonsingular matrix. Then fY(y) = fX(A^{−1}y) |det A|^{−1}.

Example 20: Suppose fX(x) = e^{−x1−x2} on (0, ∞)². Obtain the density of Y1 = (1/2)(X1 + X2).

Sums and products. If X1 and X2 are independent random variables with densities f1 and f2, then
(i) X1 + X2 has density g(u) = ∫ f1(u − v)f2(v) dv (convolution integral)
(ii) X1X2 has density g(u) = ∫ f1(u/v)f2(v)|v|^{−1} dv.
Proof:


3.2.3 Self-study exercises

1. If fX(x) = (2/9)(x + 1) on (−1, 2) and Y = X², find fY(y).
2. If X has density f(x) calculate the density g(y) of Y = X² when
(i) f(x) = 2x e^{−x²} on (0, ∞);
(ii) f(x) = (1/2)(1 + x) on |x| ≤ 1;
(iii) f(x) = 1/2 on 1/2 ≤ x ≤ 3/2.
3. Let X1 and X2 be independent exponential (λ), and let Y1 = X1 + X2 and Y2 = X1/X2. Show that Y1 and Y2 are independent and find their densities.

3.3 Moments, generating functions and inequalities

3.3.1 Moment generating function

The moment generating function of the random vector X is defined as
  M(z) = E(e^{z^T X}).
Here z^T X = Σ_j zj Xj.
Properties
Suppose X has mgf M(z). Then
(i) X + a has mgf e^{a^T z} M(z) and aX has mgf M(az).
(ii) The mgf of Σ_{j=1}^k Xj is M(z, . . . , z).
(iii) If X1, . . . , Xk are independent random variables with mgfs Mj(zj), j = 1, . . . , k, then the mgf of X = (X1, . . . , Xk)^T is M(z) = ∏_{j=1}^k Mj(zj), the product of the individual mgfs.
Proof:


3.3.2 Cumulant generating function

The cumulant generating function (cgf) of X is defined as K(z) = log M(z). The cumulants of X are defined as the coefficients κj in the power series expansion K(z) = Σ_{j=1}^∞ κj z^j/j!.
The first two cumulants are
  κ1 = μ = E(X),   κ2 = σ² = Var(X)
Similarly, the third and fourth cumulants are found to be κ3 = E(X − μ)³, κ4 = E(X − μ)⁴ − 3σ⁴. These are used to define the skewness, γ1 = κ3/κ2^{3/2}, and the kurtosis, γ2 = κ4/κ2².
Cumulants of the sample mean. Suppose that X1, . . . , Xn is a random sample from a distribution with cgf K(z) and cumulants κj. Then the mgf of X̄ = n^{−1} Σ_{j=1}^n Xj is {M(n^{−1}z)}^n, so the cgf is
  log{M(n^{−1}z)}^n = nK(n^{−1}z) = n Σ_{j=1}^∞ κj (n^{−1}z)^j / j!.
Hence the jth cumulant of X̄ is κj/n^{j−1}, and it follows that X̄ has mean κ1 = μ, variance κ2/n = σ²/n, skewness (κ3/n²)/(κ2/n)^{3/2} = γ1/n^{1/2} and kurtosis (κ4/n³)/(κ2/n)² = γ2/n.

3.3.3 Some useful inequalities

Markov's inequality
Let X be any random variable with finite mean. Then for all a > 0
  pr(|X| ≥ a) ≤ E|X|/a.
Proof:

Cauchy-Schwarz inequality
Let X, Y be any two random variables with finite variances. Then
  {E(XY)}² ≤ E(X²)E(Y²).
Proof:

Jensen's inequality
If u(x) is a convex function then
  E{u(X)} ≥ u(E(X)).
Note that u(·) is convex if the curve y = u(x) has a supporting line underneath at each point, e.g. bowl-shaped.
Proof:


Examples
1. Chebyshev's inequality.
Let Y be any random variable with mean μ and finite variance σ². Then for all a > 0
  pr(|Y − μ| ≥ a) ≤ σ²/a².

2. Correlation inequality.
{Cov(X, Y)}² ≤ σX² σY² (which implies that |Corr(X, Y)| ≤ 1).

3. |E(X)| ≤ E(|X|).
[It follows that |E{h(Y)}| ≤ E{|h(Y)|} for any function h(·).]

4. E{(|X|^s)^{r/s}} ≥ {E(|X|^s)}^{r/s} for r ≥ s > 0.
[Thus {E(|X|^r)}^{1/r} ≥ {E(|X|^s)}^{1/s} and it follows that {E(|X|^r)}^{1/r} is an increasing function of r.]

5. A cumulant generating function is a convex function; i.e. K″(z) ≥ 0.
Proof. K(z) = log M(z), so K′ = M′/M and K″ = {MM″ − (M′)²}/M². Hence M(z)²K″(z) = E(e^{zX})E(X²e^{zX}) − {E(Xe^{zX})}² ≥ 0, by the Cauchy-Schwarz inequality (on writing Xe^{zX} = (e^{zX/2})(Xe^{zX/2})).

3.3.4 Self-study exercises

1. Find the joint mgf M(z) of (X, Y) when the pdf is f(x, y) = (1/2)(x + y)e^{−(x+y)} on (0, ∞)². Deduce the mgf of U = X + Y.

2. Find all the cumulants of the N(μ, σ²) distribution.
[You may assume the mgf e^{μz + σ²z²/2}.]
3. Suppose that X is such that E(X) = 3 and E(X²) = 13. Use Chebyshev's inequality to determine a lower bound for pr(−2 < X < 8).
4. Show that {E(|X|)}^{−1} ≤ E(|X|^{−1}).

3.4 Some limit theorems

3.4.1 Modes of convergence of random variables

Let X1, X2, . . . be a sequence of random variables. There are a number of alternative modes of convergence of (Xn) to a limit random variable X. Suppose first that X1, X2, . . . and X are all defined on the same sample space Ω.


Convergence in probability
Xn →p X if pr(|Xn − X| > ε) → 0 as n → ∞ for all ε > 0. Equivalently, pr(|Xn − X| ≤ ε) → 1. Often X = c, a constant.

Almost sure convergence
Xn →a.s. X if pr(Xn → X) = 1. Again, often X = c. Also referred to as convergence with probability one.
Almost sure convergence is a stronger property than convergence in probability, i.e. a.s. ⇒ p, but p ⇏ a.s.
Example 21: Consider independent Bernoulli trials with constant probability of success 1/2.
A typical sequence would be 01001001110101100010 . . ..
Here the first 20 trials resulted in 9 successes, giving an observed proportion of X̄20 = 0.45 successes.
Intuitively, as we increase n we would expect this proportion to get closer to 1/2. However, this will not be the case for all sequences: for example, the sequence 11111111111111111111 has exactly the same probability as the earlier sequence, but X̄20 = 1.
It can be shown that the total probability of all infinite sequences for which the proportion of successes does not converge to 1/2 is zero; i.e. pr(X̄n → 1/2) = 1, so X̄n →a.s. 1/2 (and hence also X̄n →p 1/2).

Convergence in rth mean
Xn →r X if E|Xn − X|^r → 0 as n → ∞.
[rth mean ⇒ p, but rth mean ⇏ a.s.]
Suppose now that the distribution functions are F1, F2, . . . and F. The random variables need not be defined on the same sample spaces for the following definition.

Convergence in distribution
Xn →d X if Fn(x) → F(x) as n → ∞ at each continuity point of F. We say that the asymptotic distribution of Xn is F.
[p ⇒ d, but d ⇏ p]
A useful result.
Let (Xn), (Yn) be two sequences of random variables such that


Xn →d X and Yn →p c, a constant. Then
  Xn + Yn →d X + c,   XnYn →d cX,   Xn/Yn →d X/c (c ≠ 0).

3.4.2 Limit theorems for sums of independent random variables

Let X1, X2, . . . be a sequence of iid random variables with (common) mean μ. Let Sn = Σ_{i=1}^n Xi and X̄n = n^{−1}Sn.

Weak Law of Large Numbers (WLLN). If E|Xi| < ∞ then X̄n →p μ.
Proof (case σ² = Var(Xi) < ∞). Use Chebyshev's inequality: since E(X̄n) = μ we have, for every ε > 0,
  pr(|X̄n − μ| > ε) ≤ Var(X̄n)/ε² = σ²/(nε²) → 0
as n → ∞.
Example 21: (continued). Here σ² = Var(Xi) = 1/4 (Bernoulli r.v.), so the WLLN applies to X̄n, the proportion of successes.

Strong Law of Large Numbers (SLLN). If E|Xi| < ∞ then X̄n →a.s. μ.
[The proof is more tricky and is omitted.]
[The proof is more tricky and is omitted.]
Central Limit Theorem (CLT). If σ² = Var(Xi) < ∞ then
  (Sn − nμ)/(σ√n) →d N(0, 1).
Equivalently,
  √n(X̄n − μ)/σ →d N(0, 1).
Proof. Suppose that Xi has mgf M(z). Write Zn = (Sn − nμ)/(σ√n). The mgf of Zn is given by
  MZn(z) = E(e^{zZn}) = exp(−μ√n z/σ) {M(z/(σ√n))}^n.
Therefore the cgf of Zn is
  KZn(z) = log MZn(z) = −(μ√n/σ)z + nK(z/(σ√n))
         = −(μ√n/σ)z + n{ μ(z/(σ√n)) + (σ²/2)(z/(σ√n))² + O((z/(σ√n))³) }
         = −(μ√n z)/σ + (μ√n z)/σ + z²/2 + O(n^{−1/2})
         → z²/2
as n → ∞, which is the cgf of the N(0, 1) distribution, as required.
[Note on the proof of the CLT. In cases where the mgf does not exist, a similar proof can be given in terms of the function φ(z) = E(e^{izXj}) where i = √(−1). φ(·) is called the characteristic function and always exists.]

Example 21: (continued). Normal approximation to the binomial
Suppose now that the success probability is θ, so that pr(Xi = 1) = θ. Then μ = θ and σ² = θ(1 − θ), so the CLT gives that √n(X̄n − θ)/√{θ(1 − θ)} is approximately N(0, 1).
Furthermore, X̄n →p θ by the WLLN, and it follows from the useful result that √n(X̄n − θ)/√{X̄n(1 − X̄n)} is also approximately N(0, 1).
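A numerical sketch of this approximation in Python (SciPy assumed; the particular n, θ and cut-off are illustrative only):

    from scipy.stats import binom, norm
    import math

    n, theta = 100, 0.3
    mu, sigma = n * theta, math.sqrt(n * theta * (1 - theta))

    exact = binom.cdf(25, n, theta)             # exact binomial probability P(X <= 25)
    approx = norm.cdf((25.5 - mu) / sigma)      # CLT with a continuity correction

    print(exact, approx)   # the two values agree to about two decimal places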
Poisson limit of binomial. Suppose that Xn is binomial (n, θ) where θ is such that nθ → λ as n → ∞. Then Xn →d Poisson(λ).
Proof. Xn is expressible as Σ_{i=1}^n Yi, where the Yi are independent Bernoulli random variables with pr(Yi = 1) = θ. Thus Xn has pgf
  {θz + (1 − θ)}^n = {1 − n^{−1}λ(1 − z) + o(n^{−1})}^n → exp{−λ(1 − z)}
as n → ∞, which is the pgf of the Poisson (λ) distribution.

3.4.3 Self-study exercises


1. In a large consignment of manufactured items 25% are defective. A random
sample of 50 is drawn. Use the binomial distribution to compute the exact
probability that the number of defectives in the sample is five or fewer. Use
the CLT to approximate this answer.
2. The random variable Y has the Poisson (50) distribution. Use the CLT to find pr(Y = 50), pr(Y ≤ 45) and pr(Y > 60).


3. A machine in continuous use contains a certain critical component which


has an exponential lifetime distribution with mean 100 hours. When a component fails it is immediately replaced by one from the stock, originally of
90 such components. Use the CLT to find the probability that the machine
can be kept running for a year without the stock running out.

3.5 Further discrete distributions

3.5.1 Negative binomial distribution

Let X be the number of Bernoulli trials until the kth success. Then
  pr(X = x) = pr(k − 1 successes in the first x − 1 trials, followed by a success on the xth trial)
            = C(x − 1, k − 1) θ^{k−1} (1 − θ)^{x−k} θ
(where the first factor comes from the binomial distribution). Hence define the pmf of the negative binomial (k, θ) distribution as
  p(x) = C(x − 1, k − 1) θ^k (1 − θ)^{x−k},   x = k, k + 1, . . .
The mean is k/θ:

The variance is k(1 − θ)/θ² (see exercise 1).

The pgf is {θ/(z^{−1} − 1 + θ)}^k:

The name negative binomial comes from the binomial expansion
  1 = θ^k θ^{−k} = θ^k {1 − (1 − θ)}^{−k} = Σ_{x=k}^∞ p(x)
where p(x) are the negative binomial probabilities. (Exercise: verify)

3.5.2 Hypergeometric distribution

An urn contains n1 red beads and n2 black beads. Suppose that m beads are drawn without replacement and let X be the number of red beads in the sample. Note that, since X ≤ n1 and X ≤ m, the possible values of X are 0, 1, ..., min(n1, m). Then
  p(x) = pr(X = x) = (no. of selections of x reds and m − x blacks) / (total no. of selections of m beads)
       = C(n1, x) C(n2, m − x) / C(n1 + n2, m),   x = 0, 1, ..., min(n1, m).
This is the pmf of the hypergeometric (n1, n2, m) distribution.
The mean is n1m/(n1 + n2) and the variance is n1n2m(n1 + n2 − m)/{(n1 + n2)²(n1 + n2 − 1)}.

3.5.3 Multinomial distribution

An urn contains nj beads of colour j (j = 1, . . . , k). Suppose that m beads are drawn with replacement and let Xj be the number of beads of colour j in the sample. Then, for xj = 0, 1, . . . , m and Σ_{j=1}^k xj = m,
  p(x) = pr(X = x) = C(m; x1, . . . , xk) θ1^{x1} θ2^{x2} · · · θk^{xk},
where θj = nj / Σ_{i=1}^k ni. This is the pmf of the multinomial (k, m, θ) distribution. Here
  C(m; x1, . . . , xk) = no. of different orderings of x1 + · · · + xk beads = m!/(x1! · · · xk!)
and the probability of any given order is θ1^{x1} θ2^{x2} · · · θk^{xk}. The name multinomial comes from the multinomial expansion of (θ1 + · · · + θk)^m, in which the coefficient of θ1^{x1} θ2^{x2} · · · θk^{xk} is C(m; x1, . . . , xk).
The means are mθj:

The covariances are σjk = m(δjk θj − θjθk).

The joint pgf is E(∏_j zj^{Xj}) = (Σ_{j=1}^k θj zj)^m:

3.5.4 Self-study exercises

1. Derive the variance of the negative binomial (k, θ) distribution.
[You may assume the formula for the pgf.]
2. Suppose that X1, . . . , Xk are independent geometric (θ) random variables. Using pgfs, show that Σ_{j=1}^k Xj is negative binomial (k, θ).
[Hence, the waiting times Xj between successes in Bernoulli trials are independent geometric, and the overall waiting time to the kth success is negative binomial.]
3. If X is multinomial (k, m, θ) show that Xj is binomial (m, θj), Xj + Xk is binomial (m, θj + θk), etc.
[Either by direct calculation or using the pgf.]


3.6 Further continuous distributions

3.6.1 Gamma and beta functions

Gamma function: Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx for a > 0.
Integration by parts gives Γ(a) = (a − 1)Γ(a − 1).
In particular, for integer a, Γ(a) = (a − 1)! (since Γ(1) = 1). Also, Γ(1/2) = √π.
Beta function: B(a, b) = ∫_0^1 x^{a−1}(1 − x)^{b−1} dx for a > 0, b > 0.
Relationship with the Gamma function: B(a, b) = Γ(a)Γ(b)/Γ(a + b).

3.6.2 Gamma distribution

The pdf of the gamma (α, λ) distribution is defined as
  f(x) = λ^α x^{α−1} e^{−λx} / Γ(α),   x > 0,
where α > 0 and λ > 0. When α = 1, this is the exponential (λ) distribution.
The mean is α/λ:

The variance is α/λ² (see exercise 2).

The mgf is (1 − z/λ)^{−α}:

Note that the mode is (α − 1)/λ if α ≥ 1, but f(0) = ∞ if α < 1.


Example 22: The journey time of a bus on a nominal 1/2-hour route has the gamma (3, 6) distribution. What is the probability that the bus is over half an hour late?
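Reading "over half an hour late" on a nominal half-hour route as a journey time exceeding one hour, the required probability is pr(X > 1) with X gamma (3, 6). A sketch of a numerical check in Python (SciPy assumed; note that scipy.stats.gamma takes a shape and a scale = 1/rate):

    from scipy.stats import gamma

    alpha, lam = 3, 6                 # gamma(alpha, lam) in the notes' (shape, rate) form
    print(gamma.sf(1.0, a=alpha, scale=1/lam))   # pr(journey time > 1 hour), about 0.062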

Sums of exponential random variables. Suppose that X1, . . . , Xn are iid exponential (λ) random variables. Then Σ_{i=1}^n Xi is gamma (n, λ).
Proof:
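One sketch of the proof uses mgfs and the gamma mgf just stated (the lectures may argue via convolutions instead):
  M_{Xi}(z) = λ/(λ − z) = (1 − z/λ)^{−1}, so by independence M_{ΣXi}(z) = ∏_{i=1}^n M_{Xi}(z) = (1 − z/λ)^{−n},
which is the mgf of the gamma (n, λ) distribution.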

3.6.3 Beta distribution

The pdf of the beta (α, β) distribution is
  f(x) = x^{α−1}(1 − x)^{β−1} / B(α, β),   0 < x < 1,
where α > 0 and β > 0.

The mean is α/(α + β):

The variance is αβ/{(α + β)²(α + β + 1)}.

The mode is (α − 1)/(α + β − 2) if α ≥ 1 and α + β > 2.

Property If X1 and X2 are independent, respectively gamma (1 , ) and gamma


(2 , ), then U1 = X1 +X2 and U2 = X1 /(X1 +X2 ) are independent, respectively
gamma (1 + 2 , ) and beta (1 , 2 ).
Proof  The inverse transformation is

(x1, x2) = (u1 u2, u1(1 − u2))

with Jacobian

∂x/∂u = det( u2  u1 ; 1 − u2  −u1 ) = −u1 u2 − u1(1 − u2) = −u1,   so |∂x/∂u| = u1.


Therefore

f_U(u) = [ β^{α1} (u1 u2)^{α1−1} e^{−β u1 u2} / Γ(α1) ] × [ β^{α2} {u1(1 − u2)}^{α2−1} e^{−β u1 (1−u2)} / Γ(α2) ] × |−u1|

       = { β^{α1+α2} u1^{α1+α2−1} e^{−β u1} / Γ(α1 + α2) } × { [Γ(α1 + α2)/(Γ(α1)Γ(α2))] u2^{α1−1} (1 − u2)^{α2−1} }

on (0, ∞) × (0, 1) and the result follows.
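[A quick simulation check of this property (illustrative only; the values of α1, α2, β are arbitrary):]

```python
# Sketch: simulate X1 ~ gamma(a1, beta), X2 ~ gamma(a2, beta) and check that
# U1 = X1 + X2 behaves like gamma(a1 + a2, beta), U2 = X1/(X1 + X2) like beta(a1, a2),
# and that U1 and U2 are (nearly) uncorrelated, consistent with independence.
import numpy as np

rng = np.random.default_rng(0)
a1, a2, beta, N = 2.0, 3.0, 4.0, 200_000

x1 = rng.gamma(shape=a1, scale=1 / beta, size=N)
x2 = rng.gamma(shape=a2, scale=1 / beta, size=N)
u1, u2 = x1 + x2, x1 / (x1 + x2)

print(u1.mean(), (a1 + a2) / beta)            # gamma(a1 + a2, beta) mean
print(u2.mean(), a1 / (a1 + a2))              # beta(a1, a2) mean
print(np.corrcoef(u1, u2)[0, 1])              # close to 0
```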

3.6.4 Self-study exercises

1. Suppose X has the gamma (2, 4) distribution. Find the probability that X
   exceeds μ + 2σ, where μ, σ are respectively the mean and standard deviation of X.

2. Derive the variance of the gamma (α, β) distribution. [Either by direct calculation
   or using the mgf.]

3. Find the distribution of −log X when X is uniform (0,1). Hence show
   that if X1, . . . , Xk are iid uniform (0,1) then −log(X1 X2 ··· Xk) is gamma (k, 1).

4. If X is gamma (α, β) show that log X has mgf β^{−z} Γ(z + α)/Γ(α).

5. Suppose X is uniform (0, 1) and α > 0. Show that Y = X^{1/α} is beta (α, 1).


Chapter 4

Normal and associated distributions

4.1 The multivariate normal distribution

4.1.1 Multivariate normal

The multivariate normal distribution, denoted Np(μ, Σ), has pdf

f(x) = |2πΣ|^{−1/2} exp{ −(1/2)(x − μ)^T Σ^{−1} (x − μ) }

on (−∞, ∞)^p.

The mean is μ (p × 1) and the covariance matrix is Σ (p × p) (see property (v)).
Bivariate case, p = 2. Here

X = (X1, X2)^T,   μ = (μ1, μ2)^T,   Σ = ( σ11  σ12 ; σ21  σ22 ) = ( σ1^2  ρσ1σ2 ; ρσ1σ2  σ2^2 ),

|2πΣ| = (2π)^2 σ1^2 σ2^2 (1 − ρ^2),

Σ^{−1} = (1 − ρ^2)^{−1} ( 1/σ1^2  −ρ/(σ1σ2) ; −ρ/(σ1σ2)  1/σ2^2 ),

giving

f(x1, x2) = [2π σ1 σ2 √(1 − ρ^2)]^{−1} exp[ −{1/(2(1 − ρ^2))} { ((x1 − μ1)/σ1)^2 − 2ρ ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)^2 } ].
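[The bivariate density above can be compared with scipy.stats.multivariate_normal at a few points (an illustrative check with arbitrary parameter values):]

```python
# Sketch: compare the explicit bivariate normal density with scipy's implementation.
import numpy as np
from scipy import stats

mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.5
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
rv = stats.multivariate_normal(mean=[mu1, mu2], cov=Sigma)

def f(x1, x2):
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

for x in [(0.0, 0.0), (1.0, -1.0), (0.3, 2.5)]:
    assert np.isclose(f(*x), rv.pdf(x))
```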

4.1.2 Properties

(i) Suppose X is Np(μ, Σ) and let Y = T^{−1}(X − μ), where Σ = T T^T. Then
Yi, i = 1, . . . , p, are independent N(0, 1).

(ii) The joint mgf of Np(μ, Σ) is e^{μ^T z + (1/2) z^T Σ z}. (C.f. property (iv), Section 2.4.2.)


(iii) If X is Np(μ, Σ) then AX + b (where A is q × p and b is q × 1) is Nq(Aμ + b, AΣA^T).
(C.f. property (i), Section 2.4.2.)

(iv) If X_i, i = 1, . . . , n, are independent Np(μ_i, Σ_i), then ∑_i X_i is Np(∑_i μ_i, ∑_i Σ_i).
(C.f. property (iii), Section 2.4.2.)


(v) Moments of Np(μ, Σ). Obtain by differentiation of the mgf. In particular, differentiating
w.r.t. zj and zk gives E(Xj) = μj, Var(Xj) = σjj and Cov(Xj, Xk) = σjk.
Note that if X1, . . . , Xp are all uncorrelated (i.e. σjk = 0 for j ≠ k) then
X1, . . . , Xp are independent N(μj, σj^2).

(vi) If X is Np(μ, Σ) then a^T X and b^T X are independent if and only if a^T Σ b = 0.
Similarly for A^T X and B^T X.
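[Property (iii) can be illustrated by simulation (an aside; the matrices A, Σ and vectors μ, b below are arbitrary):]

```python
# Sketch: simulate X ~ N3(mu, Sigma) and check that Y = A X + b has mean close to
# A mu + b and covariance close to A Sigma A^T.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 0.0, -1.0])
L = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.2, -0.3, 1.0]])
Sigma = L @ L.T                                     # a valid covariance matrix
A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]])   # q x p with q = 2, p = 3
b = np.array([3.0, -2.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b

print(Y.mean(axis=0), A @ mu + b)                   # should agree closely
print(np.cov(Y.T), A @ Sigma @ A.T)                 # should agree closely
```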

4.1.3 Marginal and conditional distributions


Suppose that X is Np(μ, Σ). Partition X^T as (X_1^T, X_2^T) where X_1 is p1 × 1, X_2 is
p2 × 1 and p1 + p2 = p. Correspondingly μ^T = (μ_1^T, μ_2^T) and Σ = ( Σ11  Σ12 ; Σ21  Σ22 ).
Note that Σ21^T = Σ12, and X_1 and X_2 are independent if and only if Σ12 = 0 (since


the joint density factorises if and only if Σ12 = 0).


The marginal distribution of X_1 is Np1(μ_1, Σ11).
Proof:

The conditional distribution of X_2 | X_1 is Np2(μ_{2.1}, Σ_{22.1}), where

μ_{2.1} = μ_2 + Σ21 Σ11^{−1} (X_1 − μ_1)
Σ_{22.1} = Σ22 − Σ21 Σ11^{−1} Σ12

(proof omitted). Note that μ_{2.1} is linear in X_1.
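[For a concrete (hypothetical) partition, μ_{2.1} and Σ_{22.1} can be computed directly from these formulae:]

```python
# Sketch: conditional mean and covariance for a partitioned normal vector.
# The partition sizes and Sigma below are illustrative only.
import numpy as np

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
p1 = 1                                        # X_1 = first component, X_2 = last two

mu1, mu2 = mu[:p1], mu[p1:]
S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]

x1 = np.array([1.5])                          # observed value of X_1
mu_2_1 = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)
Sigma_22_1 = S22 - S21 @ np.linalg.solve(S11, S12)
print(mu_2_1)        # conditional mean of X_2 given X_1 = 1.5 (linear in x1)
print(Sigma_22_1)    # conditional covariance (does not depend on x1)
```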

4.1.4 Self-study exercises

1. Write down the joint density of the N2( (0, 1)^T, ( 1  1 ; 1  4 ) ) distribution
in component form.

2. Suppose that X_i, i = 1, . . . , n, are independent Np(μ, Σ). Show that the
sample mean vector, X̄ = n^{−1} ∑_i X_i, is Np(μ, n^{−1} Σ).
3. For the distribution in exercise 1, obtain the marginal distributions of X1
and X2 and the conditional distributions of X2 given X1 = x1 and X1 given
X2 = x2 .


4.2 The chi-square, t and F distributions


4.2.1 Chi-square distribution

The pdf of the chi-square distribution with ν degrees of freedom (ν > 0) is

f(u) = u^{ν/2 − 1} e^{−u/2} / { 2^{ν/2} Γ(ν/2) },   u > 0.

Denoted by χ²_ν. Note that the χ²_ν distribution is identical to the gamma (ν/2, 1/2)
distribution (c.f. Section 3.6). It follows that the mean is ν, the variance is 2ν and
the mgf is (1 − 2z)^{−ν/2}.
Properties
(i) Let ν be a positive integer and suppose that X1, . . . , Xν are iid N(0, 1). Then
∑_{i=1}^ν Xi² is χ²_ν. In particular, if X is N(0, 1) then X² is χ²_1.

(ii) If Ui, i = 1, . . . , n, are independent χ²_{νi} then ∑_{i=1}^n Ui is χ²_ν with ν = ∑_{i=1}^n νi.


(iii) If X is Np(μ, Σ) then (X − μ)^T Σ^{−1} (X − μ) is χ²_p.

Theorem (Joint distribution of the sample mean and variance)

Suppose that X1, . . . , Xn are iid N(μ, σ²). Let X̄ = n^{−1} ∑_i Xi be the sample
mean and S² = (n − 1)^{−1} ∑_i (Xi − X̄)² the sample variance.
Then X̄ is N(μ, σ²/n), (n − 1)S²/σ² is χ²_{n−1}, and X̄ and S² are independent.
Proof:
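[The theorem can be illustrated by simulation (not a proof; μ, σ and n are arbitrary):]

```python
# Sketch: check numerically that (n-1) S^2 / sigma^2 has mean n-1 (as a chi-square_{n-1}
# variable should) and that the sample mean and variance are (nearly) uncorrelated.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)

print(((n - 1) * s2 / sigma**2).mean())   # close to n - 1 = 9
print(np.corrcoef(xbar, s2)[0, 1])        # close to 0, consistent with independence
```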


4.2.2 Student's t distribution

The pdf of the Student's t distribution with ν degrees of freedom (ν > 0) is

f(t) = 1 / { B(1/2, ν/2) ν^{1/2} (1 + t²/ν)^{(ν+1)/2} },   −∞ < t < ∞.

Denoted by t_ν. The mean is 0 (provided ν > 1):

The variance is ν/(ν − 2) (provided ν > 2).

Theorem  If X is N(0, 1), U is χ²_ν and X and U are independent, then

T ≡ X / √(U/ν)   is t_ν.
Proof:


4.2.3 Variance ratio (F) distribution

The pdf of the variance ratio, or F, distribution with ν1, ν2 degrees of freedom
(ν1, ν2 > 0) is

f(x) = (ν1/ν2)^{ν1/2} x^{ν1/2 − 1} / { B(ν1/2, ν2/2) (1 + ν1 x/ν2)^{(ν1+ν2)/2} },   x > 0.

Denoted by F_{ν1,ν2}. The mean is ν2/(ν2 − 2) (provided ν2 > 2) and the variance is
2ν2²(ν1 + ν2 − 2)/{ν1(ν2 − 2)²(ν2 − 4)} (provided ν2 > 4).

Theorem.  If U1 and U2 are independent, respectively χ²_{ν1} and χ²_{ν2}, then

F ≡ (U1/ν1) / (U2/ν2)   is F_{ν1,ν2}.

Proof:

It follows from the above result that (i) F_{ν1,ν2} ≡ 1/F_{ν2,ν1} and (ii) F_{1,ν} ≡ t_ν².
(Exercise: check.)
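[A quick check of (ii) using scipy (illustrative):]

```python
# Sketch: the upper quantile of F_{1,nu} equals the squared two-sided t_{nu} quantile,
# since F_{1,nu} is the distribution of a squared t_{nu} variable.
from scipy import stats

nu, alpha = 7, 0.05
f_crit = stats.f(dfn=1, dfd=nu).ppf(1 - alpha)
t_crit = stats.t(df=nu).ppf(1 - alpha / 2)
print(f_crit, t_crit**2)     # the two values agree
```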

4.3 Normal theory tests and confidence intervals

4.3.1 One-sample t-test

Suppose that Y1, . . . , Yn are iid N(μ, σ²). Then, from Section 4.2, Ȳ = n^{−1} ∑_i Yi
(the sample mean) and S² = (n − 1)^{−1} ∑_i (Yi − Ȳ)² (the sample variance) are
independent, respectively N(μ, σ²/n) and σ² χ²_{n−1}/(n − 1). Hence

Z = (Ȳ − μ) / (σ/√n)


is N(0, 1),

U = (n − 1)S² / σ²

is χ²_{n−1}, and Z, U are independent.
It follows that

T = (Ȳ − μ) / (S/√n) = Z / √(U/(n − 1))

is t_{n−1}.

Applications:
Inference about μ: one-sample z-test (σ known) and t-test (σ unknown).
Inference about σ²: χ² test.
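[A typical software version of the one-sample t-test (the data below are invented purely for illustration):]

```python
# Sketch: one-sample t-test of H0: mu = mu0 using T = (Ybar - mu0)/(S/sqrt(n)).
import numpy as np
from scipy import stats

y = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])   # invented data
mu0 = 5.0

t_stat = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(len(y)))
t_check, p_value = stats.ttest_1samp(y, popmean=mu0)
print(t_stat, t_check)        # identical statistics
print(p_value)                # two-sided p-value from the t_{n-1} distribution
```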

4.3.2 Two samples

Two independent samples. Suppose that Y11, . . . , Y1n1 are iid N(μ1, σ1²) and Y21, . . . , Y2n2
are iid N(μ2, σ2²).
Summary statistics: (n1, Ȳ1, S1²) and (n2, Ȳ2, S2²)
Pooled sample variance: S² = {(n1 − 1)S1² + (n2 − 1)S2²} / (n1 + n2 − 2)
From Section 4.2, if σ1² = σ2² = σ², say, then Ȳ1 and (n1 − 1)S1² are independent,
N(μ1, n1^{−1}σ²) and σ² χ²_{n1−1} respectively, and Ȳ2 and (n2 − 1)S2² are independent,
N(μ2, n2^{−1}σ²) and σ² χ²_{n2−1} respectively.
Furthermore, (Ȳ1, (n1 − 1)S1²) and (Ȳ2, (n2 − 1)S2²) are independent.
Therefore (Ȳ1 − Ȳ2) is N(μ1 − μ2, (n1^{−1} + n2^{−1})σ²), (n1 + n2 − 2)S² is σ² χ²_{n1+n2−2}
and (Ȳ1 − Ȳ2) and (n1 + n2 − 2)S² are independent.

Therefore

T ≡ {(Ȳ1 − Ȳ2) − (μ1 − μ2)} / {S √(1/n1 + 1/n2)}

is t_{n1+n2−2}.
Also, since S1², S2² are independent,

F ≡ S1²/S2²   is F_{n1−1,n2−1}.

Applications:
Inference about μ1 − μ2: two-sample z-test (σ known) and t-test (σ unknown).
Inference about σ1²/σ2²: F (variance ratio) test.
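[In software, with invented data (illustrative only), the pooled t-test and the variance-ratio test are:]

```python
# Sketch: pooled two-sample t-test and variance-ratio F test.
import numpy as np
from scipy import stats

y1 = np.array([12.1, 11.8, 13.0, 12.4, 12.7])             # invented data
y2 = np.array([11.2, 11.9, 11.5, 12.0, 11.1, 11.6])

t_stat, p_t = stats.ttest_ind(y1, y2, equal_var=True)     # pooled-variance t-test
print(t_stat, p_t)

f_stat = y1.var(ddof=1) / y2.var(ddof=1)                  # S1^2 / S2^2
p_f = 2 * min(stats.f.sf(f_stat, len(y1) - 1, len(y2) - 1),
              stats.f.cdf(f_stat, len(y1) - 1, len(y2) - 1))   # two-sided p-value
print(f_stat, p_f)
```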


Matched pairs  Observations (Yi1, Yi2 : i = 1, . . . , n) where the differences Di =
Yi1 − Yi2 are independent N(μ, σ²). Then

T = (D̄ − μ) / (S/√n)

is t_{n−1}, where D̄ and S² are the sample mean and sample variance of the Di's.

Application:
Inference about μ from paired observations: paired-sample t-test.

4.3.3 k samples (One-way Anova)

Suppose we have k groups, with group means μ1, . . . , μk.

Denote the independent observations by (Yi1, . . . , Yini : i = 1, . . . , k) with Yij ∼
N(μi, σ²), j = 1, . . . , ni, i = 1, . . . , k.
Summary statistics: ((ni, Ȳi, Si²) : i = 1, . . . , k).

Total sum of squares: ssT = ∑_{ij} (Yij − Ȳ)², where Ȳ = n^{−1} ∑_{ij} Yij (the overall
mean) and n = ∑_i ni.
Then ssT = ssW + ssB where

ssW = ∑_{ij} (Yij − Ȳi)² = ∑_i (ni − 1)Si²   (the within-samples ss)

ssB = ∑_i ni (Ȳi − Ȳ)²   (the between-samples ss)

From Sections 4.1 and 4.2, (ni − 1)Si²/σ² is χ²_{ni−1}, independent of Ȳi.

Hence ssW/σ² is χ²_{n−k}, independent of ssB.
Also, by a similar argument to that of the Theorem in Section 4.2.1 (proof omitted),
ssB is σ² χ²_{k−1} when μi = μ, say, for all i.
Hence we obtain the F-test for equality of the group means μi:

F = {ssB/(k − 1)} / {ssW/(n − k)}

is F_{k−1,n−k} under the null hypothesis μ1 = ··· = μk.
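[The same F statistic is produced by scipy.stats.f_oneway (data invented for illustration):]

```python
# Sketch: one-way ANOVA F test for equality of the group means.
import numpy as np
from scipy import stats

groups = [np.array([20.1, 19.8, 21.0, 20.4]),      # invented data, k = 3 groups
          np.array([22.3, 21.9, 22.8]),
          np.array([19.5, 20.0, 19.2, 19.9, 20.3])]

f_stat, p_value = stats.f_oneway(*groups)
print(f_stat, p_value)     # compared against F_{k-1, n-k}
```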


4.3.4 Normal linear regression

Observations Y1, . . . , Yn are independently N(α + βxi, σ²), where x1, . . . , xn are
given constants.
The least-squares estimator (α̂, β̂) is found by minimizing the sum of squares
Q(α, β) = ∑_{i=1}^n (Yi − α − βxi)².
By partial differentiation with respect to α and β, we obtain

β̂ = Txy/Txx,   α̂ = Ȳ − β̂ x̄,

where Txx = ∑_i (xi − x̄)² and Txy = ∑_i (xi − x̄)(Yi − Ȳ).

Note that, since both α̂ and β̂ are linear combinations of Y = (Y1, . . . , Yn)^T, they
are jointly normally distributed.
Using properties of expectation and covariance matrices, we find that (α̂, β̂)^T is
bivariate normal with mean (α, β)^T and covariance matrix

V = (σ²/Txx) ( n^{−1} ∑_i xi²   −x̄ ; −x̄   1 ).
Sums of squares

Total ss: Tyy = ∑_i (Yi − Ȳ)²;
Residual ss: Q(α̂, β̂);
Regression ss: Tyy − Q(α̂, β̂).

Results:
(a) Residual ss = Tyy − Txx β̂²,  Regression ss = Txx β̂² = Txy²/Txx
(b) E(Total ss) = Txx β² + (n − 1)σ²,  E(Regression ss) = Txx β² + σ²,
E(Residual ss) = (n − 2)σ²
(c) By a similar argument to that of the Theorem in Section 4.2.1 (proof omitted),
Residual ss is σ² χ²_{n−2} and, if β = 0, Regression ss is σ² χ²_1, independently of
Residual ss.
Application:
The residual mean square, S² = Residual ss/(n − 2), is an unbiased estimator of
σ²; β̂ is an unbiased estimator of β with estimated standard error S/√Txx; and α̂ is
an unbiased estimator of α with estimated standard error (S/√Txx)(∑_i xi²/n)^{1/2}.
If β = β0 then

T = (β̂ − β0) / (S/√Txx)


is t_{n−2}, giving rise to tests and confidence intervals about β.

If β = 0 then

F = Regression ss / S²

is F_{1,n−2}, hence a test for β = 0.
(Alternatively, and equivalently, use T = β̂/(S/√Txx) as t_{n−2}.)

The coefficient of determination is

r² = Regression ss / Total ss = Txy² / (Txx Tyy)

(square of the sample correlation coefficient). The coefficient of determination
gives the proportion of Y-variation attributable to regression on x.
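[In software (data invented for illustration), scipy.stats.linregress reports these quantities; its stderr is the estimated standard error S/√Txx of β̂:]

```python
# Sketch: least-squares fit, t test for the slope, and coefficient of determination.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])       # invented data
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])

res = stats.linregress(x, y)
print(res.intercept, res.slope)      # alpha_hat, beta_hat
print(res.stderr)                    # estimated standard error of beta_hat, S / sqrt(Txx)
print(res.pvalue)                    # two-sided p-value for H0: beta = 0 (t_{n-2})
print(res.rvalue**2)                 # coefficient of determination r^2
```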
