Lecture 1
Catching up: two-variable relationships
Irene Mammi
irene.mammi@unive.it
outline
▶ References:
▶ Johnston, J. and J. DiNardo (1997), Econometric Methods, 4th Edition, McGraw-Hill, New York, Chapters 1 and 2.
examples of bivariate relationships
examples of bivariate relationships (cont.)
Figure 3: natural log of gasoline consumption vs natural log of price per gallon
examples of bivariate relationships (cont.)
Figure 4: natural log of gasoline consumption vs natural log of income per capita
examples of bivariate relationships (cont.)
▶ data come in the form of n pairs of observations (X_i, Y_i), i = 1, 2, . . . , n
▶ when n gets large, we can consider a bivariate frequency distribution; each row below reports conditional means over successive classes of the conditioning variable:

mean of height given chest (inches): 66.31  66.84  67.89  69.16  70.53
mean of chest given height (inches): 38.41  39.19  40.26  40.76  41.80
correlation coefficient
▶ define deviations from the sample means:
x_i = X_i − X̄,   y_i = Y_i − Ȳ
correlation coefficient (cont.)
▶ the sign of ∑_{i=1}^n x_i y_i indicates whether the scatter slopes upward or downward
▶ better to express the sum in average terms, giving the sample covariance:
Cov(X, Y) = ∑_{i=1}^n (X_i − X̄)(Y_i − Ȳ)/n = ∑_{i=1}^n x_i y_i / n
▶ dividing by the sample standard deviations gives the sample correlation coefficient:
r = Cov(X, Y) / (√Var(X) √Var(Y)) = ∑_{i=1}^n x_i y_i / (n s_X s_Y) = ∑_{i=1}^n x_i y_i / √(∑_{i=1}^n x_i² ∑_{i=1}^n y_i²)
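▶ a minimal numerical sketch of these formulas in Python (the data are made up for illustration); the manual computation is checked against numpy's built-in correlation:

import numpy as np

# illustrative data, not from the lecture
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x = X - X.mean()                      # deviations from sample means
y = Y - Y.mean()
cov_xy = (x * y).sum() / len(X)       # Cov(X, Y) = sum(x_i y_i)/n
r = (x * y).sum() / np.sqrt((x**2).sum() * (y**2).sum())

print(cov_xy, r)
print(np.corrcoef(X, Y)[0, 1])        # agrees with the manual r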
probability models for two variables
discrete bivariate probability distribution

                X_1   ...   X_i   ...   X_m   | marginal probability
  Y_1          p_11  ...   p_i1  ...   p_m1  |   p_.1
  ...           ...         ...         ...   |   ...
  Y_j          p_1j  ...   p_ij  ...   p_mj  |   p_.j
  ...           ...         ...         ...   |   ...
  Y_p          p_1p  ...   p_ip  ...   p_mp  |   p_.p
  marginal     p_1.  ...   p_i.  ...   p_m.  |    1
  probability
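▶ a short Python sketch of how marginals and conditionals follow from a joint table (the 3×2 joint distribution is hypothetical):

import numpy as np

# hypothetical joint distribution p_ij = P(X = X_i, Y = Y_j); rows index X
p = np.array([[0.10, 0.20],
              [0.30, 0.15],
              [0.15, 0.10]])

p_x = p.sum(axis=1)               # marginal probabilities p_i. = sum_j p_ij
p_y = p.sum(axis=0)               # marginal probabilities p_.j = sum_i p_ij
assert np.isclose(p.sum(), 1.0)   # probabilities sum to one

# conditional distribution of Y given X = X_1: p_1j / p_1.
print(p[0] / p_x[0])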
probability models for two variables (cont.)
▶ for the discrete bivariate distribution, the covariance is
Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = ∑_i ∑_j (X_i − µ_X)(Y_j − µ_Y) p_ij
probability models for two variables (cont.)
conditional probabilities
▶ from the joint and marginal probabilities, the conditional probability of Y_j given X_i is p_ij / p_i.
probability models for two variables (cont.)
the bivariate normal distribution
▶ the most famous distribution for continuous variables is the bivariate normal
▶ when X and Y follow a bivariate normal distribution, the joint probability density function (pdf) is given by

f(x, y) = [1/(2π σ_X σ_Y √(1 − ρ²))] × exp{ −[1/(2(1 − ρ²))] [ ((x − µ_X)/σ_X)² − 2ρ((x − µ_X)/σ_X)((y − µ_Y)/σ_Y) + ((y − µ_Y)/σ_Y)² ] }

where x and y stand for the values taken by X and Y and ρ is the correlation coefficient between X and Y
▶ integrating over y gives the marginal distribution for X:

f(x) = [1/√(2π σ_X²)] exp[ −(1/2)((x − µ_X)/σ_X)² ],   −∞ < x < ∞

▶ the conditional distribution of Y given X = x is normal as well:

f(y|x) = f(x, y)/f(x) = [1/(√(2π) σ_{Y|X})] exp[ −(1/2)((y − µ_{Y|X})/σ_{Y|X})² ]

with conditional mean µ_{Y|X} = µ_Y + ρ(σ_Y/σ_X)(x − µ_X) and conditional variance

σ²_{Y|X} = σ_Y² (1 − ρ²)
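▶ a simulation sketch of the conditional-variance result σ²_{Y|X} = σ_Y²(1 − ρ²), with illustrative parameter values:

import numpy as np

rng = np.random.default_rng(0)
mu = [1.0, 2.0]
sigma_x, sigma_y, rho = 1.0, 2.0, 0.6
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
X, Y = rng.multivariate_normal(mu, cov, size=200_000).T

# residual variance of Y after the linear regression on X
beta = np.cov(X, Y, bias=True)[0, 1] / X.var()
resid = Y - Y.mean() - beta * (X - X.mean())
print(resid.var(), sigma_y**2 * (1 - rho**2))   # both close to 2.56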
the two variables linear regression model
a conditional model
▶ a conditional model specifies the expectation of Y given X as some function g of X:
E(Y|X) = g(X)
▶ the linear regression model takes this function to be linear:
E(Y|X) = α + βX
the two variables linear regression model (cont.)
least-squares estimators
▶ given fitted values Ŷ_i = a + bX_i, the residuals are
e_i = Y_i − Ŷ_i = Y_i − a − bX_i,   i = 1, 2, . . . , n
the two variables linear regression model (cont.)
▶ the least-squares principle chooses a and b to minimize the residual sum of squares, RSS = ∑_i e_i²
the two variables linear regression model (cont.)
▶ taking derivatives of RSS with respect to a and b and setting them to zero gives
∂(∑_i e_i²)/∂a = −2 ∑_i (Y_i − a − bX_i) = −2 ∑_i e_i = 0
∂(∑_i e_i²)/∂b = −2 ∑_i X_i (Y_i − a − bX_i) = −2 ∑_i X_i e_i = 0
▶ rearranging yields the normal equations
∑_i Y_i = na + b ∑_i X_i
∑_i X_i Y_i = a ∑_i X_i + b ∑_i X_i²
▶ solving them gives
b = ∑_i x_i y_i / ∑_i x_i²,   a = Ȳ − bX̄
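▶ a minimal Python sketch (made-up data) computing the LS coefficients from these formulas and checking that the residuals satisfy the normal equations:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

x = X - X.mean()
y = Y - Y.mean()
b = (x * y).sum() / (x**2).sum()    # slope: sum x_i y_i / sum x_i^2
a = Y.mean() - b * X.mean()         # intercept: a = Ybar - b*Xbar

e = Y - a - b * X                   # residuals satisfy the normal equations:
print(e.sum(), (X * e).sum())       # both ~0 up to floating-point error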
the two variables linear regression model (cont.)
▶ an unbiased estimator of the disturbance variance σ² is
s² = ∑_i e_i² / (n − 2)
the two variables linear regression model (cont.)
▶ each observation splits into a fitted value and a residual, and likewise in deviations from Ȳ:
Y_i = Ŷ_i + e_i
Y_i − Ȳ = (Ŷ_i − Ȳ) + e_i
the two variables linear regression model (cont.)
▶ ∑_i (Y_i − Ȳ)² = TSS: total sum of squared deviations in Y
▶ ∑_i (Ŷ_i − Ȳ)² = ESS: explained sum of squares from the regression of Y on X
▶ ∑_i e_i² = RSS: residual, or unexplained, sum of squares from the regression of Y on X
▶ the previous decomposition gives TSS = ESS + RSS, which can be rewritten as
R² = ESS/TSS = 1 − RSS/TSS
▶ the closer R² is to 1, the closer the sample values Y_i lie to the fitted line
▶ if r is the sample correlation coefficient, then R² = r²
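▶ a quick numerical check of the decomposition and of R² = r² (illustrative data):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.8, 4.2, 5.7, 8.3, 9.6])

b = np.cov(X, Y, bias=True)[0, 1] / X.var()
a = Y.mean() - b * X.mean()
Y_hat = a + b * X

TSS = ((Y - Y.mean())**2).sum()
ESS = ((Y_hat - Y.mean())**2).sum()
RSS = ((Y - Y_hat)**2).sum()

print(np.isclose(TSS, ESS + RSS))             # decomposition holds
print(ESS / TSS, np.corrcoef(X, Y)[0, 1]**2)  # R^2 equals r^2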
inference in the two variables least-squares model
properties of LS estimators
▶ the slope estimator can be written as
b = ∑_i w_i Y_i   where   w_i = (X_i − X̄) / ∑_i (X_i − X̄)²
so that the LS slope estimator is linear in the Y values
inference in the two variables least-squares model (cont.)
▶ substituting Y_i = α + βX_i + u_i and using the stochastic properties of u (note that ∑_i w_i = 0 and ∑_i w_i X_i = 1) we have
b = α(∑_i w_i) + β(∑_i w_i X_i) + ∑_i w_i u_i = β + ∑_i w_i u_i
from which
E(b) = β
that is, b is an unbiased estimator of β
▶ the variance is
var(b) = E[(b − β)²] = E[(∑_i w_i u_i)²]
which reduces to
var(b) = σ² / ∑_i (X_i − X̄)²
inference in the two variables least-squares model (cont.)
▶ the intercept estimator is likewise unbiased,
E(a) = α
with variance
var(a) = σ² [1/n + X̄² / ∑_i (X_i − X̄)²]
▶ the covariance of the two estimators is
cov(a, b) = −σ² X̄ / ∑_i (X_i − X̄)²
inference in the two variables least-squares model (cont.)
Gauss-Markov theorem
▶ the sampling variances of the LS estimators are the smallest that can be achieved by any linear unbiased estimator
▶ looking at estimators of β, let
b* = ∑_i c_i Y_i
unbiasedness for every β requires ∑_i c_i = 0 and ∑_i c_i X_i = 1, and among such weights the variance σ² ∑_i c_i² is minimized by c_i = w_i, i.e. by the LS estimator
inference in the two variables least-squares model (cont.)
inference procedures
▶ up to now, results only require the assumption that the u_i are i.i.d.(0, σ²)
▶ inference also requires the assumption of normality
▶ since linear combinations of normal variables are themselves normally distributed, the sampling distribution of (a, b) is bivariate normal
▶ thus
b ∼ N(β, σ² / ∑_i (X_i − X̄)²)
▶ the standard deviation of the sampling distribution is referred to as the standard error of b and denoted by se(b)
▶ the sampling distribution of the intercept term is
a ∼ N(α, σ² [1/n + X̄² / ∑_i (X_i − X̄)²])
inference in the two variables least-squares model (cont.)
▶ if σ² were known, we would also have
z = (b − β) / (σ/√(∑_i (X_i − X̄)²)) ∼ N(0, 1)
so a test of H0: β = β₀ could use
(b − β₀) / (σ/√(∑_i (X_i − X̄)²)) = (b − β₀)/se(b)
▶ σ² is unknown, but independently of b we have
∑_i e_i² / σ² ∼ χ²(n − 2)
inference in the two variables least-squares model (cont.)
▶ combining these results and replacing σ with s, we have
(b − β) / (s/√(∑_i (X_i − X̄)²)) ∼ t(n − 2)
▶ H0: β = β₀ would be rejected at the 5% level (two-tailed) if
|b − β₀| / (s/√(∑_i (X_i − X̄)²)) > t₀.₀₂₅(n − 2)
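▶ a sketch of the t-test for H0: β = 0 in Python (illustrative data; scipy supplies the t critical value):

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.3, 3.1, 4.2, 4.0, 5.8, 6.1])
n = len(X)

x = X - X.mean()
b = (x * (Y - Y.mean())).sum() / (x**2).sum()
a = Y.mean() - b * X.mean()
e = Y - a - b * X
s2 = (e**2).sum() / (n - 2)          # unbiased estimate of sigma^2
se_b = np.sqrt(s2 / (x**2).sum())    # standard error of the slope

t_stat = b / se_b                    # (b - beta_0)/se(b) with beta_0 = 0
print(t_stat, stats.t.ppf(0.975, n - 2))   # reject H0 if |t| > critical value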
prediction in the two variables regression model
▶ the point prediction at X = X₀ is given by the regression value
Ŷ₀ = a + bX₀ = Ȳ + bx₀
where x₀ = X₀ − X̄
▶ the true value of Y for the prediction period or observation is
Y₀ = α + βX₀ + u₀
while averaging the sample relation gives
Ȳ = α + βX̄ + ū
▶ subtracting gives
Y₀ = Ȳ + βx₀ + u₀ − ū
▶ the prediction error is defined as e₀ = Y₀ − Ŷ₀, with variance
var(e₀) = σ² [1 + 1/n + x₀² / ∑_i (X_i − X̄)²]
▶ replacing σ² by s² gives
(Y₀ − Ŷ₀) / (s √(1 + 1/n + (X₀ − X̄)²/∑_i (X_i − X̄)²)) ∼ t(n − 2)
prediction in the two variables regression model (cont.)
▶ everything is known except Y₀, so a 95% confidence interval for Y₀ is
(a + bX₀) ± t₀.₀₂₅ s √(1 + 1/n + (X₀ − X̄)²/∑_i (X_i − X̄)²)
▶ predicting the expected value instead, note that
E(Y₀) = α + βX₀
is simply the point on the population regression line E(Y|X) = α + βX
▶ throughout, the disturbances are assumed to satisfy
E(u_i) = 0 for all i
E(u_i²) = σ² for all i
E(u_i u_j) = 0 for all i ≠ j
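▶ a Python sketch of the 95% prediction interval at a new point X₀ (made-up data):

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.3, 3.1, 4.2, 4.0, 5.8, 6.1])
n, X0 = len(X), 7.0

x = X - X.mean()
b = (x * (Y - Y.mean())).sum() / (x**2).sum()
a = Y.mean() - b * X.mean()
e = Y - a - b * X
s = np.sqrt((e**2).sum() / (n - 2))

half = stats.t.ppf(0.975, n - 2) * s * np.sqrt(
    1 + 1/n + (X0 - X.mean())**2 / (x**2).sum())
print(a + b * X0 - half, a + b * X0 + half)   # interval bounds for Y_0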
time as a regressor
▶ many economic variables increase or decrease with time
▶ a linear trend relationship would be modeled as
Y = α + βT + u
which in first differences implies a constant change per period:
ΔY_t = β + (u_t − u_{t−1})
▶ constant growth at rate g instead gives
Y_t = Y₀(1 + g)^t
or equivalently
Y_t = Y₀ e^{βt}   or   ln Y_t = α + βt
▶ taking first differences gives
Δ ln Y_t = β = ln(1 + g) ≈ g

log-log transformation
Y_t = AX^β or ln Y_t = α + β ln X
▶ β represents the elasticity of Y with respect to X, ε = (dY/dX)(X/Y)

semilog transformation
ln Y = α + βX + u
▶ β = (1/Y)(dY/dX) represents the proportionate change in Y per unit change in X
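▶ a small sketch of the growth-rate result Δ ln Y_t = ln(1 + g) ≈ g, with an assumed growth rate of 3% and no noise for clarity:

import numpy as np

g, Y0, T = 0.03, 100.0, 40
t = np.arange(T)
Y = Y0 * (1 + g)**t            # Y_t = Y_0 (1+g)^t

dlnY = np.diff(np.log(Y))      # Delta ln Y_t = ln(1+g) for every t
print(dlnY.mean(), np.log(1 + g), g)   # 0.02956..., 0.02956..., 0.03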
lagged dependent variable as regressor
▶ consider a model with the lagged dependent variable as regressor, Y_t = α + βY_{t−1} + u_t; least squares gives the normal equations
∑ Y_t = na + b ∑ Y_{t−1}
∑ Y_t Y_{t−1} = a ∑ Y_{t−1} + b ∑ Y²_{t−1}
lagged dependent variable as regressor (cont.)
▶ by repeated substitution we obtain
Y₁ = α + βY₀ + u₁
Y₂ = α + β(α + βY₀ + u₁) + u₂ = α(1 + β) + β²Y₀ + (u₂ + βu₁)
and, in general,
Y_t = α(1 + β + β² + . . . + β^{t−1}) + β^t Y₀ + (u_t + βu_{t−1} + β²u_{t−2} + . . . + β^{t−1}u₁)
▶ hence Y_t is correlated with the current and all earlier disturbances:
E(Y_t u_t) = σ²
E(Y_t u_{t−1}) = βσ²
E(Y_t u_{t−2}) = β²σ²
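▶ a simulation sketch of these moments (assumed parameter values α = 0.5, β = 0.7, σ = 1), averaging across many replications:

import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma, T, R = 0.5, 0.7, 1.0, 200, 20_000

u = rng.normal(0, sigma, (R, T))
Y = np.zeros((R, T))               # arbitrary initial value Y_0 = 0
for t in range(1, T):
    Y[:, t] = alpha + beta * Y[:, t - 1] + u[:, t]   # AR(1) recursion

print((Y[:, -1] * u[:, -1]).mean())        # ~ sigma^2 = 1.0
print((Y[:, -1] * u[:, -2]).mean())        # ~ beta*sigma^2 = 0.7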
intro to asymptotics
intro to asymptotics (cont.)
convergence in probability
▶ consider the sample mean x̄_n from n i.i.d. drawings with mean µ and variance σ²; then
E(x̄_n) = µ   and   var(x̄_n) = σ²/n
so that x̄_n is an unbiased estimator and its variance tends to zero as n increases
▶ the distribution of x̄_n becomes more and more concentrated in the neighborhood of µ as n increases
▶ define µ ± ε to be a neighborhood around µ: convergence in probability means that, for any ε > 0, lim_{n→∞} Pr(µ − ε < x̄_n < µ + ε) = 1
intro to asymptotics (cont.)
▶ the shorthand expression is
plim x̄_n = µ
▶ the sample mean is a consistent estimator of µ
intro to asymptotics (cont.)
convergence in distribution
▶ a sequence of random variables converges in distribution when the corresponding sequence of distribution functions tends to a limiting distribution function; by the central limit theorem this applies to the standardized sample mean √n(x̄_n − µ)/σ, whose limiting distribution is N(0, 1)
intro to asymptotics (cont.)
▶ in shorthand, the asymptotic (large-n) distribution of the sample mean is written
x̄_n ∼ᵃ N(µ, σ²/n)
intro to asymptotics (cont.)
autoregressive equation
▶ consider again LS estimation of
Y_t = α + βY_{t−1} + u_t
with normal equations
∑ Y_t = na + b ∑ Y_{t−1}
∑ Y_t Y_{t−1} = a ∑ Y_{t−1} + b ∑ Y²_{t−1}
▶ it can be proved that √n(a − α) and √n(b − β) have a bivariate normal limiting distribution with zero means and finite variances and covariances
▶ thus the LS estimators are consistent for α and β
▶ the application of LS formulae to the AR model has an asymptotic, or large-sample, justification
intro to asymptotics (cont.)
▶ this asymptotic justification rests on two conditions:
(1) the u_t are i.i.d. with zero mean and finite variance
(2) the Y_t series is stationary
stationary and nonstationary series
▶ consider again the AR(1) model
Y_t = α + βY_{t−1} + u_t
for which repeated substitution gave
Y_t = α(1 + β + β² + . . . + β^{t−1}) + β^t Y₀ + (u_t + βu_{t−1} + β²u_{t−2} + . . . + β^{t−1}u₁)
▶ assume that the process started a very long time ago, so that we can write
E(Y_t) = α(1 + β + β² + . . .)
which only exists if the infinite geometric series on the RHS has a limit
▶ the necessary and sufficient condition is
|β| < 1
stationary and nonstationary series (cont.)
▶ the expectation is then
E(Y_t) = µ = α/(1 − β)
▶ in deviations from the mean,
Y_t − µ = u_t + βu_{t−1} + β²u_{t−2} + . . .
so the variance is
var(Y) = σ_Y² = σ²/(1 − β²)
stationary and nonstationary series (cont.)
▶ the Y series has a constant unconditional variance, independent of time
▶ define the autocovariance as the covariance of Y with a lagged value of itself
▶ for the stationary AR(1) process the autocovariance at lag s is
γ_s = β^s σ_Y²,   s = 0, 1, 2, . . .
stationary and nonstationary series (cont.)
▶ nb: the autocovariances depend only on the lag length and are independent of t
▶ γ₀ is the variance; dividing the autocovariances by the variance gives the set of autocorrelation coefficients (or serial correlation coefficients), defined by
ρ_s = γ_s/γ₀ = β^s,   s = 0, 1, 2, . . .
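▶ a simulation sketch comparing sample autocorrelations of a stationary AR(1) with the theoretical β^s (assumed β = 0.8, α = 0 for simplicity):

import numpy as np

rng = np.random.default_rng(2)
beta, T = 0.8, 100_000

u = rng.normal(size=T)
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = beta * Y[t - 1] + u[t]     # AR(1) with alpha = 0

y = Y - Y.mean()
for s in range(4):
    rho_s = (y[s:] * y[:T - s]).sum() / (y**2).sum()
    print(s, rho_s, beta**s)          # sample vs theoretical autocorrelation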
stationary and nonstationary series (cont.)
unit root
▶ when β = 1, the series has a unit root and becomes a random walk with drift; conditional on Y₀,
E(Y_t|Y₀) = αt + Y₀
▶ in the unit root case the unconditional mean and variance of Y do not exist: the conditional mean grows with t and the conditional variance, var(Y_t|Y₀) = tσ², grows without bound ⇒ the series is said to be nonstationary, and the asymptotic results do not hold
▶ when |β| > 1, the series exhibits explosive behavior
maximum likelihood estimation of the AR model
▶ if some assumptions are made about the specific form of the pdf for u, it is possible to derive maximum likelihood estimators (MLEs) of the parameters of the AR model
▶ MLEs are consistent, asymptotically normal and asymptotically efficient
▶ assume that the disturbances u_i are i.i.d. N(0, σ²), so that the pdf is
f(u_i) = (1/(σ√(2π))) e^{−u_i²/2σ²},   i = 1, 2, . . . , n
▶ given an arbitrary initial value Y₀, any observed set of sample values Y₁, Y₂, . . . , Y_n is generated by some set of u values
maximum likelihood estimation of the AR model (cont.)
▶ by independence, the probability of a set of u values is the product of the individual densities:
f(u₁, . . . , u_n) = ∏_{i=1}^n f(u_i)
▶ this density may be interpreted in two ways: (1) for given α, β, and σ² it indicates the probability of a set of sample outcomes; (2) it is a function of α, β, and σ², conditional on a set of sample outcomes
maximum likelihood estimation of the AR model (cont.)
▶ under interpretation (2), we refer to the density as the likelihood function L; its logarithm is denoted
ℓ = ln L
maximum likelihood estimation of the AR model (cont.)
▶ since ℓ is a monotonic transformation of L, the MLEs may equally be obtained by solving
∂ℓ/∂α = ∂ℓ/∂β = ∂ℓ/∂σ² = 0
▶ for the AR model, the log-likelihood (conditional on Y₀) is
ℓ = −(n/2) ln(2π) − (n/2) ln σ² − (1/2σ²) ∑_{t=1}^n (Y_t − α − βY_{t−1})²
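▶ a sketch of maximizing this conditional log-likelihood numerically (simulated data with assumed true values α = 0.5, β = 0.7, σ = 1; a general-purpose optimizer stands in for the analytical first-order conditions):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
alpha, beta, sigma, T = 0.5, 0.7, 1.0, 500
Y = np.zeros(T + 1)
for t in range(1, T + 1):
    Y[t] = alpha + beta * Y[t - 1] + rng.normal(0, sigma)

def neg_loglik(theta):
    a, b, s2 = theta
    if s2 <= 0:
        return np.inf                       # keep the variance positive
    e = Y[1:] - a - b * Y[:-1]              # residuals given (a, b)
    n = len(e)
    return 0.5 * n * np.log(2 * np.pi) + 0.5 * n * np.log(s2) \
        + (e**2).sum() / (2 * s2)           # minus the log-likelihood

res = minimize(neg_loglik, x0=[0.0, 0.0, 1.0], method="Nelder-Mead")
print(res.x)    # estimates close to (0.5, 0.7, 1.0)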
maximum likelihood estimation of the AR model (cont.)
properties of MLEs
▶ as noted above, MLEs are consistent, asymptotically normal and asymptotically efficient, which provides the large-sample justification for ML estimation of the AR model