
MATH5315 Applied Statistics and Probability 2011-2012

Lecturer: Andrew J. Baczkowski, room 9.21i, email: sta6ajb@leeds.ac.uk


Regularly updated information about the module is available on the internet at:
http://www.maths.leeds.ac.uk/sta6ajb/math5315/math5315.html
Module Objective: The aim of the module is to provide a grounding in the aspects of statistics,
in particular statistical modelling, that are of relevance to actuarial and financial work. The
module introduces and develops the fundamental concepts of probability and statistics used in
applied financial analysis.
Provisional Detailed Syllabus:
Part I: Fundamentals of Probability (11 lectures)
Summarising data; Introduction to probability; Random variables; Probability distributions; Generating functions; Joint distributions; The central limit theorem; Conditional expectation.
Part II: Fundamentals of Statistics (9 lectures)
Sampling and statistical inference; Point estimation; Confidence intervals; Hypothesis testing.
Part III: Applied Statistics (10 lectures)
Correlation and regression (OLS); Analysis of variance (ANOVA); Univariate time series analysis
and forecasting (ARMA); Multivariate time series analysis (VAR); Cointegration; Volatility models
(ARCH/GARCH).
Booklist: Sections from the two books will form the course notes for this module.
1. Subject CT3 Probability and Mathematical Statistics Core Technical, Core Reading, published
by the Institute of Actuaries, price £45. Referred to as CT3.
2. Introductory Econometrics for Finance (2nd edition) by C. Brooks, published by Cambridge
University Press, 2008, price £40. Referred to as IEF.
It is ESSENTIAL that you have access to these books. You MUST prepare the material BEFORE
the lectures, which will consist of examples and further explanation to illustrate the book material.
Timetable:
Lectures (weeks 1-5): Tuesdays 10-11 in RSLT14 and Tuesdays 12-1 in RSLT08.
Lectures (weeks 1-4, 6-11): Fridays 1-3 in RSLT08.
Seminar (weeks 1-4, 6): Fridays 3-4 in E.C.Stoner Building, room 9.90.
Practical (weeks 7-11): Fridays 3-4 in Irene Manton North cluster.
(RSLT is the Roger Stevens Lecture Theatre Block).
Assessment:
70% of marks for two hour examination at end of semester.
30% of marks for continuous assessment practical work.
Examination Paper: Format currently planned as follows for the TWO hour paper. Eleven
section A questions each worth TWO marks. Eleven section B questions each worth THREE
marks. Eleven section C questions each worth FIVE marks. You attempt TEN section A questions,
TEN section B questions, and TEN section C questions.
Exercise Sheets for MATH5315: None. I will introduce examples as we need them. There are
some questions (and answers) available on the module web-page.



Lecture 1: Summarising Data
References: CT3 Unit 1.
2 Tabular and graphical methods.
2.1 Types of data. Discrete and continuous data.
2.2 Frequency distribution.

A line chart is better for discrete data than a bar chart!

2.3 Histograms.
2.5 Lineplots.

Dotplots. Cumulative frequency at x is the frequency of observations ≤ x.

3 Measures of Location.
3.1 The mean. Sample mean x̄.
3.2 The median.
4 Measures of spread.
4.1 The standard deviation. Sample standard deviation s, sample variance s².
4.2 Moments. Sample moments about the origin m_k = (1/n) Σ_{i=1}^n x_i^k; sample central moments (1/n) Σ_{i=1}^n (x_i − x̄)^k.

4.3 The range.


4.4 The interquartile range. More often people use the semi-interquartile range SIQ = ½(Q₃ − Q₁).

5 Symmetry and skewness.


5.1 Boxplots.
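
A minimal R sketch (hypothetical data; R is the software used in the practicals) illustrating these summaries:

  x <- c(28, 32, 35, 37, 41, 44, 46, 50, 53, 59)   # hypothetical sample
  mean(x); median(x)                               # measures of location
  sd(x); var(x)                                    # sample standard deviation s and variance s^2
  q <- quantile(x, c(0.25, 0.75))                  # Q1 and Q3
  unname(diff(q)) / 2                              # semi-interquartile range (Q3 - Q1)/2
  boxplot(x)                                       # boxplot for symmetry/skewness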



Lecture 2: Introduction to Probability
References: CT3 Unit 2.
1 Introduction to sets. Sample space S, event A.
1.1 Complementary sets. Complement Ā of A; more usually Aᶜ might be used.
1.2 Set operations. Union ∪ and intersection ∩.

2 Probability axioms and the addition rule.
2.1 Basic probability axioms. P{S} = 1, 0 ≤ P{A} ≤ 1. If A and B are mutually exclusive, P{A ∪ B} = P{A} + P{B}.
2.2 The addition rule. In general P{A ∪ B} = P{A} + P{B} − P{A ∩ B}.

3 Conditional probability. P{A|B} = P{A ∩ B}/P{B}.
3.1 Independent events. A and B are independent if and only if P{A ∩ B} = P{A} P{B}.
3.2 Theorem of total probability. P{A} = Σ_{j=1}^n P{A ∩ E_j}.
3.3 Bayes' Theorem. P{E_i|A} = P{A|E_i} P{E_i} / Σ_{j=1}^n P{A|E_j} P{E_j}.
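
A minimal R sketch (hypothetical probabilities) of the total probability and Bayes' theorem calculations:

  prior <- c(0.5, 0.3, 0.2)            # P{E_j}, hypothetical
  like  <- c(0.02, 0.05, 0.10)         # P{A | E_j}, hypothetical
  pA <- sum(prior * like)              # theorem of total probability: P{A}
  pA
  prior * like / pA                    # Bayes' theorem: posterior P{E_j | A}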



Lecture 3: Random Variables
References: CT3 Unit 3.
1 Discrete random variables. Random variable X. Probability function fX(x) = P{X = x}; notation pX(x) would be better!
Cumulative distribution function (cdf) FX(x) = P{X ≤ x} = Σ_{xi ≤ x} fX(xi).

2 Continuous random variables. Probability density function (pdf) fX(x).
P{a ≤ X ≤ b} = ∫_a^b fX(x) dx. Cdf FX(x) = ∫_{−∞}^x fX(t) dt, so fX(x) = dFX(x)/dx.
3 Expected values.
3.1 Mean. E[X] or μ. More generally E[g(X)].
3.2 Variance and standard deviation. Variance V[X] often denoted σ². (I prefer Var[X] as notation!) Standard deviation σ.
3.3 Linear functions of X. E[aX + b] = aμ + b. V[aX + b] = a²σ².
3.4 Moments. μ_k = E[(X − μ)^k] is the kth central moment of X about μ. Can measure skewness using μ₃/σ³.

4 Functions of a random variable. Y = u(X).
4.1 Discrete random variables. If we have a one-to-one mapping, P{Y = y₁} = P{X = x₁} where y₁ = u(x₁).
4.2 Continuous random variables. Y = u(X) so X = w(Y). fY(y) = fX(x) |dx/dy|.



Lecture 4: Probability Distributions I
References: CT3 Unit 4.
2 Discrete distributions.
2.1 Uniform distribution. P{X = x} = 1/k for x = 1, 2, ..., k.
2.2 Bernoulli distribution. Bernoulli trial.
2.3 Binomial distribution. If X is the number of successes in n Bernoulli trials, X ~ Bin(n, θ); P{X = x} = (n choose x) θ^x (1 − θ)^(n−x) for x = 0, 1, 2, ..., n.
2.4 Geometric distribution. If X is the number of Bernoulli trials until the first success, X ~ geometric(θ); P{X = x} = θ(1 − θ)^(x−1) for x = 1, 2, 3, ....
2.7 Poisson distribution. If X ~ Poisson(λ), then P{X = x} = λ^x e^(−λ)/x! for x = 0, 1, 2, ....
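
These probability functions are built into R; a small sketch with hypothetical parameter values:

  dbinom(3, size = 10, prob = 0.2)   # P{X = 3} for X ~ Bin(10, 0.2)
  dpois(2, lambda = 1.5)             # P{X = 2} for X ~ Poisson(1.5)
  ppois(2, lambda = 1.5)             # cdf P{X <= 2}
  # note: R's dgeom counts failures before the first success, so in the notes' numbering
  dgeom(4, prob = 0.2)               # P{5 trials to first success} = 0.2 * 0.8^4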



Lecture 5: Probability Distributions II
References: CT3 Unit 4.
3 Continuous distributions.
3.1 Uniform distribution. If X ~ uniform(α, β), fX(x) = 1/(β − α) for α < x < β.
3.2 Gamma distribution. Gamma function Γ(α): Γ(1) = 1, Γ(α) = (α − 1)Γ(α − 1), Γ(n) = (n − 1)! for positive integer n.
If X ~ gamma(α, λ), fX(x) = λ^α x^(α−1) e^(−λx) / Γ(α) for x > 0. E[X] = α/λ, V[X] = α/λ².
Exponential distribution: X ~ exponential(λ) ≡ gamma(1, λ).
Chi-squared distribution: X ~ χ²_ν ≡ gamma(α = ½ν, λ = ½).
3.3 Beta distribution. (NOT needed for the exam.)
fX(x) = x^(α−1) (1 − x)^(β−1) / B(α, β) for 0 < x < 1, where B(α, β) = Γ(α)Γ(β)/Γ(α + β).
Mean is α/(α + β). Variance is αβ / [(α + β)²(α + β + 1)].
3.4 Normal distribution. If X ~ N(μ, σ²), fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)} for −∞ < x < ∞.
If Z = (X − μ)/σ, then Z ~ N(0, 1). Values P{Z < z} = Φ(z) are tabulated.
3.6 t-distribution. If X ~ χ²_ν and Z ~ N(0, 1) independently, then T = Z/√(X/ν) ~ t_ν.
3.7 F-distribution. If X ~ χ²_{n₁} and Y ~ χ²_{n₂} and are independent, then F = (X/n₁)/(Y/n₂) ~ F_{n₁,n₂}.



Lecture 6: Generating Functions
References: CT3 Unit 5.
1 Probability generating functions. If X is discrete taking value k with probability p_k (k = 0, 1, 2, ...), then GX(t) = E[t^X] = Σ_{k=0}^∞ p_k t^k.
GX(1) = 1, GX(0) = p₀.
1.1 Important examples. Uniform. Binomial. Geometric (as special case of negative binomial). Poisson.
1.2 Evaluating moments. G′X(t) = Σ_{k=1}^∞ k p_k t^(k−1) so G′X(1) = Σ_{k=1}^∞ k p_k = E[X].
G″X(t) = Σ_{k=2}^∞ k(k − 1) p_k t^(k−2) so G″X(1) = Σ_{k=2}^∞ k(k − 1) p_k = E[X(X − 1)].
2 Moment generating function. mX(t) = E[e^(tX)].
In continuous case mX(t) = ∫_x e^(tx) fX(x) dx.
mX^(r)(t) = ∫_x x^r e^(tx) fX(x) dx so mX^(r)(0) = ∫_x x^r fX(x) dx = E[X^r].
2.1 Important examples. gamma(α, λ). N(μ, σ²).
4 Linear functions. If Y = a + bX, GY(t) = E[t^Y] = E[t^(a+bX)] = t^a E[(t^b)^X] = t^a GX(t^b).
Similarly, mY(t) = e^(at) mX(bt).
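
A numerical check of the moment results, using the Poisson(λ) pgf GX(t) = exp{λ(t − 1)} and mgf mX(t) = exp{λ(e^t − 1)}; a sketch with a hypothetical λ:

  lam <- 2
  G <- function(t) exp(lam * (t - 1))          # Poisson pgf
  m <- function(t) exp(lam * (exp(t) - 1))     # Poisson mgf
  h <- 1e-3
  (G(1 + h) - G(1 - h)) / (2 * h)              # ~ G'(1) = E[X] = lambda
  (m(h) - 2 * m(0) + m(-h)) / h^2              # ~ m''(0) = E[X^2] = lambda + lambda^2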



Lecture 7: Joint Distributions I
References: CT3 Unit 6.
1 Joint distributions.
1.1 Joint probability (density) functions.
Discrete case f(x, y) = P{X = x, Y = y}, though a better notation might be pX,Y(x, y) as in 1.3!
Continuous case f(x, y) is joint pdf. P{x₁ < X < x₂, y₁ < Y < y₂} = ∫_{y₁}^{y₂} ∫_{x₁}^{x₂} f(x, y) dx dy.
1.2 Marginal probability (density) functions.
Discrete case fX(x) = P{X = x} = Σ_y f(x, y), though a better notation might be pX(x) = P{X = x} as in 1.3!
Continuous case, marginal pdf is fX(x) = ∫_y f(x, y) dy.
1.3 Conditional probability (density) functions. Recall P{A|B} = P{A ∩ B}/P{B}.
Discrete case P{X = x|Y = y} = pX,Y(x, y)/pY(y). Continuous case fX|Y=y(x|y) = fX,Y(x, y)/fY(y).
1.4 Independence of random variables.
If X and Y are independent, then fX,Y(x, y) = fX(x) fY(y) for all x and y.
If X and Y are independent, then g(X) and h(Y) will be independent.
2 Expectations of functions of two random variables.
2.1 Expectations. Discrete case E[g(X, Y)] = Σ_x Σ_y g(x, y) P{X = x, Y = y}. Continuous case E[g(X, Y)] = ∫∫ g(x, y) fX,Y(x, y) dx dy.

2.2 Expectations of sums and products. E[ag(X) + bh(Y )] = aE[g(X)] + bE[h(Y )].
If X and Y are independent, E[g(X)h(Y )] = E[g(X)]E[h(Y )].



Lecture 8: Joint Distributions II
References: CT3 Unit 6.
2.3 Covariance and correlation coefficient.
cov(X, Y) = E[(X − μX)(Y − μY)] = E[XY] − μX μY. cov(X, X) = V[X].
corr(X, Y) = cov(X, Y)/√(V[X] V[Y]), often denoted ρ, −1 ≤ ρ ≤ 1.
If ρ = 0, X and Y are uncorrelated.
2.3.1 Useful results on handling covariances.
cov(aX + b, cY + d) = ac cov(X, Y). cov(X, Y + Z) = cov(X, Y) + cov(X, Z).
If X and Y are independent, cov(X, Y) = 0. Converse not necessarily true.
2.4 Variance of a sum. V[X + Y] = V[X] + V[Y] + 2 cov(X, Y).
3 Convolutions. Suppose Z = X + Y.
Discrete case P{Z = z} = Σ_x P{X = x, Y = z − x}.
Continuous case fZ(z) = ∫_x fX,Y(x, z − x) dx.
Simplifies in independence case.



Lecture 9: More on generating functions
References: CT3 Units 5 and 6.
3 (Unit 5) Cumulant generating function.
CX(t) = log mX(t). C′X(0) = E[X]. C″X(0) = V[X].
CX(t) = κ₁t + κ₂t²/2! + κ₃t³/3! + ···.
3.1 (Unit 6) Moments of linear combinations of random variables.
E[Σ_{i=1}^n c_i X_i] = Σ_{i=1}^n c_i E[X_i].
V[Σ_{i=1}^n c_i X_i] = Σ_{i=1}^n c_i² V[X_i] + 2 Σ_{1≤i<j≤n} c_i c_j cov(X_i, X_j). Special case is (mutual) independence case.
3.2 (Unit 6) Distributions of linear combinations of independent random variables.
Discrete case via probability generating function (pgf). Let S = c₁X + c₂Y.
GS(t) = E[t^(c₁X + c₂Y)] = E[(t^c₁)^X] E[(t^c₂)^Y] = GX(t^c₁) GY(t^c₂).
Binomial case: If X ~ Bin(m, θ) and Y ~ Bin(n, θ), X + Y ~ Bin(m + n, θ).
Poisson case: If X ~ Poisson(λ) and Y ~ Poisson(μ), then X + Y ~ Poisson(λ + μ).
Continuous case via moment generating functions (mgf). Let S = c₁X + c₂Y.
mS(t) = E[e^((c₁X + c₂Y)t)] = E[e^((c₁t)X)] E[e^((c₂t)Y)] = mX(c₁t) mY(c₂t).
Exponential case: If X_i ~ exponential(λ) independently for i = 1, 2, ..., k, then Σ_{i=1}^k X_i ~ gamma(k, λ).
Gamma case: If X ~ gamma(α, λ) and Y ~ gamma(β, λ), then X + Y ~ gamma(α + β, λ).
Chi-square case: If X ~ χ²_m and Y ~ χ²_n, then X + Y ~ χ²_{m+n}.
Normal case: If X ~ N(μX, σX²) and Y ~ N(μY, σY²), then X + Y ~ N(μX + μY, σX² + σY²).



Lecture 10: Central Limit Theorem
References: CT3 Unit 7.
1 The central limit theorem. For X₁, X₂, ..., X_n iid with common mean μ and common variance σ² (< ∞), (X̄ − μ)/(σ/√n) ≈ N(0, 1) for large n.
2 Normal approximations.
2.1 Binomial distribution. If X ~ Bin(n, θ) and n is large, X ≈ N(nθ, nθ(1 − θ)).
2.2 Poisson distribution. If X_i ~ Poisson(λ) independently, from lecture 8, Σ_{i=1}^n X_i ~ Poisson(nλ). For large n, this is approximately N(nλ, nλ).
2.3 Gamma distribution. From lecture 8, if X_i ~ exponential(λ) independently, Σ_{i=1}^n X_i ~ gamma(n, λ). For large n, this is approximately N(n/λ, n/λ²).
Since χ²_k ≡ gamma(½k, ½), χ²_k ≈ N(k, 2k) for large k.
3 The continuity correction. If X ~ Poisson(μ) and μ is large, X ≈ N(μ, σ² = μ).
P{X ≤ x} ≈ P{Z ≤ (x + ½ − μ)/σ}, where Z ~ N(0, 1).
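
A quick R check of the approximations (hypothetical parameter values):

  lam <- 20; x <- 24
  ppois(x, lambda = lam)                                # exact P{X <= 24}, X ~ Poisson(20)
  pnorm((x + 0.5 - lam) / sqrt(lam))                    # N(mu, mu) approximation with continuity correction
  n <- 50; th <- 0.3; y <- 18
  pbinom(y, size = n, prob = th)                        # exact P{X <= 18}, X ~ Bin(50, 0.3)
  pnorm((y + 0.5 - n * th) / sqrt(n * th * (1 - th)))   # normal approximation with continuity correction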



Lecture 11: Conditional Expectation
References: CT3 Unit 14.
1 The conditional expectation E[Y|X = x].
2 The random variable E[Y|X]. If g(x) = E[Y|X = x], then consider this as the observed value of a random variable g(X).
E[E[Y|X]] = E[Y].
3 The random variable V[Y|X] and the E[V] + V[E] result.
V[Y|X = x] = E[{Y − g(x)}²|X = x] = E[Y²|X = x] − g(x)².
V[Y|X] = E[{Y − g(X)}²|X] = E[Y²|X] − g(X)².
E[V[Y|X]] = E[E[Y²|X]] − E[g(X)²] = E[Y²] − E[g(X)²] so E[Y²] = E[V[Y|X]] + E[g(X)²].
V[Y] = E[Y²] − {E[Y]}² = E[V[Y|X]] + E[g(X)²] − {E[g(X)]}² = E[V[Y|X]] + V[g(X)] so that
V[Y] = E[V[Y|X]] + V[E[Y|X]].
5 Compound distributions. S = X₁ + X₂ + ··· + X_N.
E[S|N = n] = nμX, V[S|N = n] = nσX².
E[S] = E[E[S|N]] = E[NμX] = E[N]μX.
V[S] = E[V[S|N]] + V[E[S|N]] = E[NσX²] + V[NμX] = μN σX² + σN² μX².
Mgf of S: mS(t) = E[e^(tS)] = E[E[e^(tS)|N]].
E[e^(tS)|N = n] = {mX(t)}^n.
mS(t) = E[{mX(t)}^N] = GN(mX(t)) in terms of the pgf of N.
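
A simulation sketch of a compound distribution (hypothetical choices: N ~ Poisson(5), X_i ~ N(10, 3²)) checking the mean and variance formulae:

  set.seed(1)
  lamN <- 5; muX <- 10; sigX <- 3
  S <- replicate(1e5, sum(rnorm(rpois(1, lamN), mean = muX, sd = sigX)))
  c(mean(S), lamN * muX)                     # E[S] = E[N] mu_X
  c(var(S), lamN * sigX^2 + lamN * muX^2)    # V[S] = mu_N sig_X^2 + sig_N^2 mu_X^2 (both equal lamN here)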



Lecture 12: Poisson process and simulating random variables
References: CT3 Unit 4.
4 The Poisson process. Point-like events occur randomly and independently in time at an average rate λ per unit time. Let N(t) be the number of events in interval [0, t] with N(0) = 0.
Let p_n(t) = P{N(t) = n}.
p_n(t + h) ≈ p_{n−1}(t)[λh] + p_n(t)[1 − λh] so p_n(t + h) − p_n(t) ≈ λh[p_{n−1}(t) − p_n(t)].
As h → 0, p′_n(t) = λ[p_{n−1}(t) − p_n(t)].
Similarly p′₀(t) = −λp₀(t).
Define G(s, t) = Σ_{n=0}^∞ s^n p_n(t).
Thus ∂G(s, t)/∂t = λsG(s, t) − λG(s, t) so log G(s, t) = λt(s − 1) as G(s, 0) = 1.
Thus G(s, t) = exp{λt(s − 1)} so N(t) ~ Poisson(λt).
Time T₁ to first event satisfies T₁ ~ exponential(λ).
Time between events has the same exponential distribution.
5 Random number simulation.
5.1 Basic simulation method. Generate U ~ uniform(0, 1). Then use the inverse transformation method.
5.2 Continuous distributions. If F(x) = P{X ≤ x}, let x = F⁻¹(u). X ~ exponential(λ) example.
5.3 Discrete distributions.
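
A sketch of the inverse transformation method for the exponential(λ) case (hypothetical λ):

  set.seed(1)
  lam <- 0.5
  u <- runif(1e5)               # U ~ uniform(0, 1)
  x <- -log(1 - u) / lam        # x = F^{-1}(u) since F(x) = 1 - exp(-lam * x)
  c(mean(x), 1 / lam)           # sample mean vs theoretical mean 1/lambda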



Lecture 13: Sampling and Statistical Inference I
References: CT3 Unit 8.
1 Basic definitions.
2 Moments of the sample mean and variance.
2.1 The sample mean. X̄ = (1/n) Σ_{i=1}^n X_i. E[X̄] = μ, V[X̄] = σ²/n.
2.2 The sample variance. S² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)² = (1/(n − 1)) [Σ_{i=1}^n X_i² − nX̄²]. E[S²] = σ².
3 Sampling distributions for the normal.
3.1 The sample mean. If X_i ~ N(μ, σ²) independently, then Z = (X̄ − μ)/(σ/√n) ~ N(0, 1) for all n. In general Z = (X̄ − μ)/(σ/√n) ≈ N(0, 1) for large n.
3.2 The sample variance. If X_i ~ N(μ, σ²) independently, then (n − 1)S²/σ² ~ χ²_{n−1}.
For U ~ χ²_k, tabulated values χ²_k(α) satisfy P{U > χ²_k(α)} = α.
3.3 Independence of the sample mean and variance. X_i ~ N(μ, σ²) case.
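
A simulation sketch (hypothetical μ, σ, n) checking the sample-mean and sample-variance results:

  set.seed(1)
  n <- 10; mu <- 5; sig <- 2
  xbar <- replicate(1e4, mean(rnorm(n, mu, sig)))
  s2   <- replicate(1e4, var(rnorm(n, mu, sig)))
  c(var(xbar), sig^2 / n)                              # V[Xbar] = sigma^2 / n
  c(mean(s2), sig^2)                                   # E[S^2] = sigma^2
  mean((n - 1) * s2 / sig^2 > qchisq(0.95, n - 1))     # should be close to 0.05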



Lecture 14: Sampling and Statistical Inference II
References: CT3 Unit 8.
4 The t distribution. t_k = N(0, 1)/√(χ²_k/k) for independent N(0, 1) and χ²_k distributions.
If X_i ~ N(μ, σ²) independently, then t = (X̄ − μ)/(S/√n) ~ t_{n−1}.
Properties of t_k distribution. Tabulated values t_k(α) satisfy P{T > t_k(α)} = α for T ~ t_k. Also P{T < −t_k(α)} = α.
5 The F result for variance ratios.
If U ~ χ²_{ν₁} and V ~ χ²_{ν₂} are independent, then F = (U/ν₁)/(V/ν₂) ~ F_{ν₁,ν₂}.
If S₁² and S₂² are based on samples of size n₁ and n₂ respectively from normal populations with variances σ₁² and σ₂² respectively, then F = (S₁²/σ₁²)/(S₂²/σ₂²) ~ F_{n₁−1,n₂−1}.
Tabulated values F_{ν₁,ν₂}(α) satisfy P{F > F_{ν₁,ν₂}(α)} = α for F ~ F_{ν₁,ν₂}. Also P{F < 1/F_{ν₂,ν₁}(α)} = α.


Lecture 15: Point Estimation I
References: CT3 Unit 9.
1 The method of moments.
1.1 The one-parameter case.
1.2 The two-parameter case.
3 Unbiasedness. Let g(X) be an estimator of a parameter θ.
Bias is Bias(g(X)) = E[g(X)] − θ.
Unbiasedness.
4 Mean square error. Let g(X) be an estimator of a parameter θ. MSE(g(X)) = E[(g(X) − θ)²].
MSE(g(X)) = V[g(X)] + Bias(g(X))².



Lecture 16: Point Estimation II
References: CT3 Unit 9.
2 The method of maximum likelihood.
2.1 The one-parameter case. Likelihood L(θ) = Π_{i=1}^n f(x_i; θ).
2.1.1 Example.
2.2 The two-parameter case.
5 Asymptotic distribution of MLE.
For large n, θ̂ ≈ N(θ, v) where v = 1 / ( n E[ (∂ log f(X; θ)/∂θ)² ] ).
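
A sketch of numerical maximum likelihood for an exponential(λ) sample (hypothetical data), compared with the analytic MLE and its asymptotic variance:

  set.seed(1)
  x <- rexp(200, rate = 2)                                  # hypothetical sample
  loglik <- function(lam) sum(dexp(x, rate = lam, log = TRUE))
  optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum
  1 / mean(x)                                               # analytic MLE
  (1 / mean(x))^2 / length(x)                               # estimated asymptotic variance v = lambda^2 / n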



Lecture 17: Confidence Intervals I
References: CT3 Unit 10.
1 Confidence intervals in general. Want values θ̂₁ and θ̂₂ such that P{θ̂₁ < θ < θ̂₂} = 0.95.
2 Distribution of confidence intervals.
2.1 The pivotal method. Look for a pivotal quantity g(X, θ) such that, for example, if g(X, θ) increases as θ increases, then g(X, θ) < g₂ ⟺ θ < θ₂ and g₁ < g(X, θ) ⟺ θ₁ < θ.
For example, if X_i ~ N(μ, σ²) independently, then g(X, μ) = (X̄ − μ)/(σ/√n).
2.2 Confidence limits. The interval (X̄ − 1.96σ/√n, X̄ + 1.96σ/√n) can be written as X̄ ± 1.96σ/√n.
2.3 Sample size.
3 Confidence intervals for the normal distribution.
3.1 The mean. Recall t = (X̄ − μ)/(S/√n) ~ t_{n−1}, so a 95% confidence interval for μ is X̄ ± t_{n−1}(2.5%) S/√n.
3.2 The variance. Recall (n − 1)S²/σ² ~ χ²_{n−1}, so a 95% confidence interval for σ² is (n − 1)S²/χ²_{n−1}(2.5%) < σ² < (n − 1)S²/χ²_{n−1}(97.5%).
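
An R sketch (hypothetical data) of the 95% intervals for μ and σ²:

  set.seed(1)
  x <- rnorm(20, mean = 10, sd = 3)
  n <- length(x)
  mean(x) + c(-1, 1) * qt(0.975, n - 1) * sd(x) / sqrt(n)   # 95% CI for mu
  t.test(x)$conf.int                                        # same interval from t.test
  (n - 1) * var(x) / qchisq(c(0.975, 0.025), n - 1)         # 95% CI for sigma^2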



Lecture 18: Confidence Intervals II
References: CT3 Unit 10.
4 Confidence intervals for binomial and Poisson.
4.1 The binomial. Course text a bit muddled here I think! If P{h₁(θ) < X < h₂(θ)} ≥ 0.95, then P{X ≥ h₂(θ)} ≤ 0.025 and P{X ≤ h₁(θ)} ≤ 0.025. Now X < h₂(θ) if θ > θ₂(X) and similarly X > h₁(θ) if θ < θ₁(X). Thus interval is θ₂ < θ < θ₁, where, for example, P{X ≥ x|θ = θ₂} = 0.025.
4.1.1 The normal approximation.
5 Confidence intervals for two sample problems.
5.1 Two normal means.
Confidence interval with known variances based on fact that X̄₁ − X̄₂ ~ N(μ₁ − μ₂, σ₁²/n₁ + σ₂²/n₂) so
(X̄₁ − X̄₂ − (μ₁ − μ₂)) / √(σ₁²/n₁ + σ₂²/n₂) ~ N(0, 1).
If σ₁² = σ₂² = σ² is unknown, σ² can be estimated using s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2) and then
(X̄₁ − X̄₂ − (μ₁ − μ₂)) / (s_p √(1/n₁ + 1/n₂)) ~ t_{n₁+n₂−2}.
5.2 Two population variances.
Recall (S₁²/σ₁²)/(S₂²/σ₂²) ~ F_{n₁−1,n₂−1}, so (S₁²/S₂²)/(σ₁²/σ₂²) ~ F_{n₁−1,n₂−1}.
Also, if F ~ F_{ν₁,ν₂} with P{F > F_{ν₁,ν₂}(α)} = α, then
P{F_{n₁−1,n₂−1}(0.975) < (S₁²/S₂²)/(σ₁²/σ₂²) < F_{n₁−1,n₂−1}(0.025)} = 0.95 re-arranges to give
(S₁²/S₂²)(1/F_{n₁−1,n₂−1}(0.025)) < σ₁²/σ₂² < (S₁²/S₂²)(1/F_{n₁−1,n₂−1}(0.975)), where 1/F_{n₁−1,n₂−1}(0.975) = F_{n₂−1,n₁−1}(0.025).
5.3 Two population proportions.
X₁ ~ Bin(n₁, θ₁) ≈ N(n₁θ₁, n₁θ₁(1 − θ₁)) and X₂ ~ Bin(n₂, θ₂) ≈ N(n₂θ₂, n₂θ₂(1 − θ₂)).
Thus θ̂_i = X_i/n_i ≈ N(θ_i, θ_i(1 − θ_i)/n_i) for i = 1, 2 and so θ̂₁ − θ̂₂ ≈ N(θ₁ − θ₂, θ₁(1 − θ₁)/n₁ + θ₂(1 − θ₂)/n₂).
In practice we assume θ̂₁ − θ̂₂ ≈ N(θ₁ − θ₂, θ̂₁(1 − θ̂₁)/n₁ + θ̂₂(1 − θ̂₂)/n₂) so can obtain confidence interval for θ₁ − θ₂ by assuming the variance is known.
6 Paired data. (NOT needed for the exam.) Form pairs D_i = X₁ᵢ − X₂ᵢ. Then (D̄ − μ_D)/(S_D/√n) ~ t_{n−1}.



Lecture 19: Hypothesis Testing I
References: CT3 Unit 11.
1 Hypotheses, test statistics, decisions and errors.
Null hypothesis H₀. Alternative hypothesis H₁. Critical region.
α = P{Type I error} = P{Reject H₀ when H₀ true}.
β = P{Type II error} = P{Accept H₀ when H₀ false}.
Power = 1 − β = P{Reject H₀ when parameter is θ}.
2 Classical testing, significance and P-values.
2.1 Best tests. Neyman-Pearson Lemma: for H₀: θ = θ₀ vs. H₁: θ = θ₁ the best test is based on the likelihood ratio, with critical region C satisfying L₀/L₁ ≤ k.
2.2 P-values. P = P{A value occurs as or more extreme than the one observed | H₀ true}.
3 Basic tests: single parameter.
3.1 Testing the value of a population mean. Testing H₀: μ = μ₀.
Test based on Z = (X̄ − μ₀)/(σ/√n) ~ N(0, 1) or T = (X̄ − μ₀)/(S/√n) ~ t_{n−1} if H₀ true.
3.2 Testing the value of a population variance. Testing H₀: σ² = σ₀².
Test based on (n − 1)S²/σ₀² ~ χ²_{n−1} if H₀ true.
3.3 Testing the value of a population proportion. Testing H₀: θ = θ₀.
Test based on X ~ Bin(n, θ₀) ≈ N(nθ₀, nθ₀(1 − θ₀)) if H₀ true.
4 Basic tests: two independent samples.
4.1 Testing the value of the difference between two population means. Testing H₀: μ₁ − μ₂ = δ.
Test based on Z = (X̄₁ − X̄₂ − δ)/√(σ₁²/n₁ + σ₂²/n₂) ~ N(0, 1) or T = (X̄₁ − X̄₂ − δ)/(S_p√(1/n₁ + 1/n₂)) ~ t_{n₁+n₂−2} if H₀ true.
4.2 Testing the value of the ratio of two population variances. Testing H₀: σ₁² = σ₂².
Test based on S₁²/S₂² ~ F_{n₁−1,n₂−1} if H₀ true.
4.3 Testing the value of the difference between two population proportions. Testing H₀: θ₁ = θ₂ (= θ).
Test based on (θ̂₁ − θ̂₂)/√(θ̂(1 − θ̂)/n₁ + θ̂(1 − θ̂)/n₂) ≈ N(0, 1) if H₀ true.
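
R's built-in tests cover most of these; a sketch with hypothetical samples:

  set.seed(1)
  x <- rnorm(15, mean = 12, sd = 2); y <- rnorm(12, mean = 10, sd = 2)
  t.test(x, mu = 11)                      # one-sample t-test of H0: mu = 11
  t.test(x, y, var.equal = TRUE)          # pooled two-sample t-test of H0: mu1 = mu2
  var.test(x, y)                          # F-test of H0: sigma1^2 = sigma2^2
  prop.test(c(30, 18), c(100, 90))        # approximate test of H0: theta1 = theta2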


Lecture 20: Hypothesis Testing II
References: CT3 Unit 11.
7 χ² tests.
Test statistic is Σ_i (f_i − e_i)²/e_i where e_i are expected values under H₀ and f_i are observed values.
(I prefer the notation Σ_i (O_i − E_i)²/E_i !)
7.1 Goodness of fit.
7.1.1 Degrees of freedom.
Number of groups − Number of constraints on e_i − Number of fitted parameters.
7.1.2 The accuracy of the χ² approximation.
Ensure all e_i > 5 by combining groups (cells).
7.1.3 Example.
7.2 Contingency tables.
For an r × c table, degrees of freedom is (r − 1)(c − 1).
Expected frequencies for any cell are (Row total × Column total)/(Grand Total).
Tests of homogeneity and independence not clearly distinguished.
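
A sketch of both χ² tests in R (hypothetical counts):

  obs <- c(18, 55, 27)
  chisq.test(obs, p = c(0.25, 0.5, 0.25))    # goodness of fit against specified probabilities
  tab <- matrix(c(20, 30, 25, 25, 15, 35), nrow = 2, byrow = TRUE)   # hypothetical 2 x 3 contingency table
  chisq.test(tab)                            # df = (r - 1)(c - 1) = 2
  chisq.test(tab)$expected                   # row total x column total / grand total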



Lecture 21: Correlation and Regression I
References: CT3 Unit 12; IEF chapter 2, pages 27-38, 44-51.
0 (CT3) Introduction. Scatter plot and summary statistics Sxx, Syy, Sxy.
1 (CT3) Correlation analysis.
1.1 (CT3) Data summary. Sample correlation r = Sxy/√(Sxx Syy).
1.2 (CT3) The normal model and inference.
If ρ = 0, r√(n − 2)/√(1 − r²) ~ t_{n−2}.
If W = ½ log[(1 + r)/(1 − r)], then W ≈ N(½ log[(1 + ρ)/(1 − ρ)], 1/(n − 3)).
Can re-write this as W = tanh⁻¹ r, so that W ≈ N(tanh⁻¹ ρ, 1/(n − 3)).
2 (CT3) Regression analysis: the simple linear model.
2.1 (CT3) Introduction. Y_i = α + βx_i + e_i, i = 1, 2, ..., n. E[e_i] = 0, V[e_i] = σ².
2.2 (CT3) Fitting the model. Least squares derivation: minimise q = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − α − βx_i)².
Fitted line is ŷ = α̂ + β̂x where α̂ = ȳ − β̂x̄ and β̂ = Sxy/Sxx.
E[β̂] = β, V[β̂] = σ²/Sxx. σ̂² = (1/(n − 2)) Σ_{i=1}^n (y_i − ŷ_i)².
2.3 (CT3) Partitioning the variability of the responses.
Σ_i (y_i − ȳ)² = Σ_i (y_i − ŷ_i)² + Σ_i (ŷ_i − ȳ)², i.e., SSTOT = SSRES + SSREG.
SSTOT = Syy. SSREG = Sxy²/Sxx.
E[SSTOT] = (n − 1)σ² + β²Sxx, E[SSREG] = σ² + β²Sxx, E[SSRES] = (n − 2)σ².
Coefficient of determination R² = SSREG/SSTOT.
Cases where line closely fits the data and where line a poor fit to the data.
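
An lm() sketch (hypothetical data) matching the quantities above:

  set.seed(1)
  x <- 1:20
  y <- 3 + 0.8 * x + rnorm(20, sd = 2)       # hypothetical data from the simple linear model
  fit <- lm(y ~ x)
  coef(fit)                                  # alpha-hat and beta-hat
  summary(fit)$r.squared                     # R^2 = SSREG / SSTOT
  anova(fit)                                 # SSREG and SSRES
  plot(x, residuals(fit))                    # residual plot (see lecture 22, 2.7)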



Lecture 22: Correlation and Regression II
References: CT3 Unit 12; IEF chapter 2, pages 38-39, 51-66.
2.4 (CT3) The full normal model and inference.
Assumptions. β̂ ~ N(β, σ²/Sxx), independently of (n − 2)σ̂²/σ² ~ χ²_{n−2}.
2.5 (CT3) Inferences on the slope parameter β.
(β̂ − β)/√(σ²/Sxx) divided by √(((n − 2)σ̂²/σ²)/(n − 2)) gives (β̂ − β)/√(σ̂²/Sxx) ~ t_{n−2}.
2.6 (CT3) Estimating a mean response and predicting an individual response.
μ₀ = E[Y|x₀] = α + βx₀ estimated by μ̂₀ = α̂ + β̂x₀. E[μ̂₀] = μ₀. V[μ̂₀] = [1/n + (x₀ − x̄)²/Sxx]σ².
(μ̂₀ − μ₀)/√(V̂[μ̂₀]) ~ t_{n−2}.
Estimate individual response y₀ = α + βx₀ + e₀ by ŷ₀ = α̂ + β̂x₀.
Since E[ŷ₀ − y₀] = 0 and V[ŷ₀ − y₀] = σ² + V[μ̂₀], (ŷ₀ − y₀)/√(σ̂²[1 + 1/n + (x₀ − x̄)²/Sxx]) ~ t_{n−2}.
2.7 (CT3) Checking the model. Residuals ê_i = y_i − ŷ_i. Residual plot ê_i vs. x_i.
2.8 (CT3) Extending the scope of the linear model. (NOT needed for the exam.)
Transformations to give linearity.
3 (CT3) The multiple linear regression model. (NOT needed for the exam.)
Y_i = α + β₁x_i1 + ··· + β_k x_ik + e_i, i = 1, 2, ..., n.



Lecture 23: Analysis of Variance
References: CT3 Unit 13.
0 (CT3) Introduction.
1 (CT3) One-way analysis of variance.
1.1 (CT3) The model. Y_ij = μ + τ_i + e_ij for i = 1, 2, ..., k, j = 1, 2, ..., n_i, n = Σ_{i=1}^k n_i.
e_ij ~ N(0, σ²) independently, so Y_ij ~ N(μ + τ_i, σ²).
If Σ_{i=1}^k n_i τ_i = 0, then τ_i is the ith treatment effect, and μ is the overall mean.
1.2 (CT3) Estimation of the parameters. Minimise q = Σ_{i=1}^k Σ_{j=1}^{n_i} (Y_ij − μ − τ_i)² to give
μ̂ = Ȳ, τ̂_i = Ȳ_i − Ȳ. If S_i² = (1/(n_i − 1)) Σ_{j=1}^{n_i} (Y_ij − Ȳ_i)², then (n_i − 1)S_i²/σ² ~ χ²_{n_i−1} independently.
Hence σ̂² = (1/(n − k)) Σ_{i=1}^k (n_i − 1)S_i² satisfies (n − k)σ̂²/σ² ~ χ²_{n−k}.
1.3 (CT3) Partitioning the variability.
Σ_{i=1}^k Σ_{j=1}^{n_i} (Y_ij − Ȳ)² = Σ_{i=1}^k Σ_{j=1}^{n_i} (Y_ij − Ȳ_i)² + Σ_{i=1}^k n_i (Ȳ_i − Ȳ)²,
i.e., SST = SSR + SSB where SST is total sum of squares, SSR is residual sum of squares (within-treatments sum of squares), SSB is between-treatments sum of squares.
If H₀: τ₁ = τ₂ = ··· = τ_k = 0 is true, then MSB/MSR ~ F_{k−1,n−k}, where MSB = SSB/(k − 1) is the mean square between treatments and MSR = SSR/(n − k) is the residual mean square.
1.4 (CT3) Example.
1.5 (CT3) Checking the model. Residuals are r_ij = ê_ij = Y_ij − μ̂ − τ̂_i = Y_ij − Ȳ_i.
1.6 (CT3) Estimating the treatment means.
95% confidence interval for μ + τ_i is Ȳ_i ± t_{n−k}(2.5%) σ̂/√n_i.
95% confidence interval for τ_i − τ_j is Ȳ_i − Ȳ_j ± t_{n−k}(2.5%) σ̂ √(1/n_i + 1/n_j).
1.7 (CT3) Further comments. Linear regression model Y_i = a + bx_i + e_i, i = 1, 2, ..., n can be analysed as
Σ_{i=1}^n (Y_i − Ȳ)² = Σ_{i=1}^n (Y_i − Ŷ_i)² + Σ_{i=1}^n (Ŷ_i − Ȳ)², i.e., SST = SSR + SSREG.
If H₀: b = 0 is true, SSREG/(SSR/(n − 2)) ~ F_{1,n−2}.
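
A one-way ANOVA sketch in R (hypothetical data, k = 3 treatments with 5 observations each):

  set.seed(1)
  yield <- c(rnorm(5, 20, 2), rnorm(5, 23, 2), rnorm(5, 19, 2))
  treatment <- factor(rep(c("A", "B", "C"), each = 5))
  fit <- aov(yield ~ treatment)
  summary(fit)                        # SSB, SSR and the F statistic MSB/MSR on (k-1, n-k) df
  model.tables(fit, "means")          # treatment means
  plot(fitted(fit), residuals(fit))   # model check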



Lecture 24: Univariate Time Series Analysis and Forecasting I
References: IEF chapter 5, pages 206-223.
5.1 Introduction.
5.2 Some notation and concepts.
5.2.2 A weakly stationary process. E[Y_t] = μ, V[Y_t] = σ², cov(Y_{t₁}, Y_{t₂}) = γ_{t₁−t₂}.
Autocovariance function is cov(Y_t, Y_{t−s}) = γ_s. Autocorrelation function is τ_s = γ_s/γ₀.
5.2.3 A white noise process. E[Y_t] = μ, V[Y_t] = σ², γ_s = 0 for s ≠ 0.
If Y_t ~ N(μ, σ²) independently for t = 1, 2, ..., T, then τ̂_s ≈ N(0, 1/T).
Box-Pierce test that τ₁ = ··· = τ_m = 0; if H₀ true, Q = T Σ_{k=1}^m τ̂_k² ~ χ²_m.
5.3 Moving average processes. MA(q) is Y_t = μ + U_t + θ₁U_{t−1} + ··· + θ_qU_{t−q} with U_t ~ (0, σ²) independently.
Backshift operator L (most textbooks would use B!) satisfies LY_t = Y_{t−1}.
Thus Y_t = μ + θ(L)U_t with θ(L) = 1 + θ₁L + ··· + θ_qL^q.
V[Y_t] = γ₀ = (1 + θ₁² + ··· + θ_q²)σ², γ_s = (θ_s + θ_{s+1}θ₁ + ··· + θ_qθ_{q−s})σ² for s = 1, 2, ..., q.
Example 5.2.
5.4 Autoregressive processes. AR(p) is Y_t = μ + φ₁Y_{t−1} + φ₂Y_{t−2} + ··· + φ_pY_{t−p} + U_t.
Can write as φ(L)Y_t = μ + U_t with φ(L) = 1 − φ₁L − ··· − φ_pL^p.
5.4.1 The stationarity condition. AR(p) process stationary if roots of φ(z) = 0 lie outside the unit circle. Can then write AR(p) process as MA(∞): Y_t = φ⁻¹(L)U_t.
Example 5.3.
5.4.2 Wold's decomposition theorem. (NOT needed for the exam.)
All we really need here is that (1 − φ₁ − φ₂ − ··· − φ_p)E[Y_t] = μ and the autocorrelation function satisfies the Yule-Walker equations τ_r = φ₁τ_{r−1} + φ₂τ_{r−2} + ··· + φ_pτ_{r−p} for r = 1, 2, ..., p with τ_{−s} = τ_s.
Example 5.4.
5.5 The partial autocorrelation function. The pacf φ_{kk} can be found from fitting the model Y_t = μ + φ_{k,1}Y_{t−1} + ··· + φ_{k,k−1}Y_{t−k+1} + φ_{kk}Y_{t−k} + U_t.
5.5.1 The invertibility condition. MA(q) process is invertible if roots of θ(z) = 0 lie outside the unit circle. The process Y_t = θ(L)U_t can then be written as an AR(∞) process θ⁻¹(L)Y_t = U_t.



Lecture 25: Univariate Time Series Analysis and Forecasting II
References: IEF chapter 5, pages 223-238, 247-251.
5.6 ARMA processes. ARMA(p, q) process is φ(L)Y_t = μ + θ(L)U_t.
Mean satisfies (1 − φ₁ − φ₂ − ··· − φ_p)E[Y_t] = μ.
AR(p) process: acf (oscillatory) geometric decay, pacf zero after lag p.
MA(q) process: acf zero after lag q, pacf (oscillatory) geometric decay.
ARMA(p, q) process: acf like AR, pacf like MA.
5.6.1 Sample acf and pacf plots for standard processes. Correlogram (acf plot) has lines drawn at ±1.96/√n to indicate significant τ̂_k; pacf plot also has lines drawn at ±1.96/√n.
5.7 Building ARMA models: the Box-Jenkins approach. Determine model order; estimate parameters; check model validity. Parsimonious models best!
5.7.1 Information criteria for ARIMA model selection. AIC widely used (with SBIC).
5.7.3 ARIMA modelling. Data differenced to give stationarity.
5.8 Constructing ARMA models in EViews. (NOT needed for the exam.) We use R.
5.11.4 Forecasting with time series versus structural models. (NOT needed for the exam.) Conditional expectation is E[Y_{t+1}|Ω_t] = E[Y_{t+1}|Y₁, Y₂, ..., Y_t].
5.11.5 Forecasting with ARMA models. (NOT needed for the exam.) ARMA(p, q) model Y_t = Σ_{i=1}^p a_iY_{t−i} + Σ_{j=1}^q b_jU_{t−j}. Forecast at time t + s is Ŷ_{t+s} = Σ_{i=1}^p a_iŶ_{t+s−i} + Σ_{j=1}^q b_jÛ_{t+s−j}, where Ŷ_k = Y_k for k ≤ t, Û_k = U_k for k ≤ t, and Û_k = 0 for k > t. IEF uses notation f_{t,s} ≡ Ŷ_{t+s}.
5.11.6 Forecasting the future value of an MA(q) process. (NOT needed for the exam.)
MA(3) process Y_t = μ + U_t + θ₁U_{t−1} + θ₂U_{t−2} + θ₃U_{t−3}.
e.g., Y_{t+2} = μ + U_{t+2} + θ₁U_{t+1} + θ₂U_t + θ₃U_{t−1}. Thus E[Y_{t+2}|Ω_t] = μ + θ₂U_t + θ₃U_{t−1}.
5.11.7 Forecasting the future value of an AR(p) process. (NOT needed for the exam.)
AR(2) process Y_t = μ + φ₁Y_{t−1} + φ₂Y_{t−2} + U_t.
e.g., Y_{t+2} = μ + φ₁Y_{t+1} + φ₂Y_t + U_{t+2}. Thus E[Y_{t+2}|Ω_t] = μ + φ₁Ŷ_{t+1} + φ₂Y_t.
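
A Box-Jenkins sketch in base R on a simulated ARMA(1,1) series (hypothetical parameters):

  set.seed(1)
  y <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 200)
  acf(y); pacf(y)                           # correlogram and pacf plot with +/- 1.96/sqrt(n) bands
  fit <- arima(y, order = c(1, 0, 1))       # fit ARMA(1,1); order = (p, d, q)
  fit
  Box.test(residuals(fit), lag = 10, type = "Ljung-Box")   # portmanteau check of the residuals
  predict(fit, n.ahead = 5)$pred            # forecasts for t+1, ..., t+5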



Lecture 26: Multivariate Models I
References: IEF chapter 3, pages 88-93; chapter 6, pages 265-271, 276-277.
3.1 Generalising the simple model to multiple linear regression.
3.2 The constant term. Writing the linear model in the form y = Xβ + u.
3.3 How are the parameters (the elements of the β vector) calculated?
β̂ = (X′X)⁻¹X′y, found by minimising (y − Xβ)′(y − Xβ).
6.1 Motivations. Structural equations. Reduced form equations.
6.2 Simultaneous equations bias.
6.3 So how can simultaneous equations models be validly estimated?
6.4 Can the original coefficients be retrieved from the πs?
6.4.1 What determines whether an equation is identified or not?
6.8 Estimation procedures for simultaneous equations systems.
6.8.1 Indirect least squares (ILS). (NOT needed for the exam.)
6.8.2 Estimation of just identified and overidentified systems using 2SLS. (NOT needed for the exam.)
Using the R systemfit command.
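
A sketch of the matrix formula (hypothetical data), checked against lm():

  set.seed(1)
  n <- 50
  x1 <- rnorm(n); x2 <- rnorm(n)
  y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
  X <- cbind(1, x1, x2)                    # design matrix including the constant term
  solve(t(X) %*% X) %*% t(X) %*% y         # beta-hat = (X'X)^{-1} X'y
  coef(lm(y ~ x1 + x2))                    # same estimates from lm()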



Lecture 27: Multivariate Models II
References: IEF chapter 6, pages 290-293, 294-296, 298, 308-315.
6.11 Vector autoregressive models.
6.11.1 Advantages of VAR modelling.
6.11.2 Problems with VARs.
6.11.3 Choosing the optimal lag length for a VAR.
6.11.5 Information criteria for VAR lag length selection.
6.12 Does the VAR include contemporaneous terms?
6.14 VARs with exogenous variables.
6.17 VAR estimation in EViews. (NOT needed for the exam.)
We use the R package vars.
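
A small sketch using the vars package (assumed installed) on a simulated two-variable system; the data and lag choice are hypothetical:

  set.seed(1)
  y1 <- arima.sim(list(ar = 0.5), n = 200)
  y2 <- 0.3 * y1 + arima.sim(list(ar = 0.4), n = 200)
  y <- cbind(y1, y2)
  vars::VARselect(y, lag.max = 8, type = "const")$selection   # information criteria for lag length
  fit <- vars::VAR(y, p = 1, type = "const")
  summary(fit)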



Lecture 28: Cointegration
References: IEF chapter 7, pages 318-329, 335-341.
7.1 Stationarity and unit root testing.
7.1.1 Why are tests for non-stationarity necessary?
7.1.2 Two types of non-stationarity.
7.1.3 Some more definitions and terminology.
7.1.4 Testing for a unit root.
7.3 Cointegration.
7.3.1 Definition of cointegration.
7.4 Equilibrium correction or error correction models.
7.5 Testing for cointegration in regression: a residuals-based approach.
7.6 Methods of parameter estimation in cointegrated systems. (NOT needed for the
exam.)
7.6.1 The Engle-Granger 2-step method. (NOT needed for the exam.)
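
One option for unit root and cointegration testing in R is the tseries package (an assumption; it is not prescribed by the course text). A sketch with simulated I(1) series:

  set.seed(1)
  x <- cumsum(rnorm(300))                 # random walk, I(1)
  y <- 2 * x + rnorm(300)                 # cointegrated with x: the residual is stationary
  tseries::adf.test(x)                    # augmented Dickey-Fuller test; H0: unit root
  res <- residuals(lm(y ~ x))
  tseries::adf.test(res)                  # residuals-based (Engle-Granger style) check; strictly, different critical values apply
  tseries::po.test(cbind(y, x))           # Phillips-Ouliaris cointegration test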



Lecture 29: ARCH Models
References: IEF chapter 8, pages 379-381, 383-384, 385-389.
8.1 Motivations: an excursion into non-linearity land.
8.1.1 Types of non-linear models.
8.2 Models for volatility.
8.3 Historical volatility.
8.6 Autoregressive volatility models.
8.7 Autoregressive conditionally heteroscedastic (ARCH) models.
8.7.1 Another way of expressing ARCH models.
8.7.2 Non-negativity constraints.
8.7.3 Testing for ARCH effects.
8.7.5 Limitations of ARCH(q) models.



Lecture 30: GARCH Models
References: IEF chapter 8, pages 392-399.
8.8 Generalised ARCH (GARCH) models.
8.8.1 The unconditional variance under a GARCH specification.
8.9 Estimation of ARCH/GARCH models.
8.9.1 Parameter estimation using maximum likelihood. (NOT needed for the exam.)
8.9.2 Non-normality and maximum likelihood. (NOT needed for the exam.)
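
A GARCH(1,1) sketch: simulate the variance recursion directly, then fit with the tseries package (an assumption; any GARCH-capable package would do). Parameter values are hypothetical:

  set.seed(1)
  n <- 1000; omega <- 0.1; alpha <- 0.2; beta <- 0.7
  h <- numeric(n); r <- numeric(n)
  h[1] <- omega / (1 - alpha - beta)                       # unconditional variance
  for (t in 2:n) {
    h[t] <- omega + alpha * r[t - 1]^2 + beta * h[t - 1]   # conditional variance recursion
    r[t] <- sqrt(h[t]) * rnorm(1)
  }
  Box.test(r^2, lag = 12, type = "Ljung-Box")              # crude check for ARCH effects in squared returns
  summary(tseries::garch(r, order = c(1, 1)))              # fitted omega, alpha, beta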
