
Copyright 1996 Lawrence C. Marsh

1.0
PowerPoint Slides for
Undergraduate Econometrics
by Lawrence C. Marsh

To accompany: Undergraduate Econometrics
by Carter Hill, William Griffiths and George Judge
Publisher: John Wiley & Sons, 1997
1.1
Chapter 1
The Role of Econometrics in Economic Analysis

Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.

1.2
The Role of Econometrics
Using Information:
1. Information from economic theory.
2. Information from economic data.

1.3
Understanding Economic Relationships
[Diagram: the Federal Reserve Discount Rate linked to many interrelated variables --
the Dow-Jones Stock Index, the money supply, the federal budget, short-term treasury bills,
inflation, the trade deficit, unemployment, the power of labor unions, the capital gains tax,
rent control laws, and the crime rate.]

1.4
Economic Decisions
To use information effectively:
economic theory + economic data  }  economic decisions
*Econometrics* helps us combine economic theory and economic data.

1.5
The Consumption Function
Consumption, c, is some function of income, i:
c = f(i)
For applied econometric analysis this consumption function must be specified more precisely.

1.6
demand, qd, for an individual commodity:
qd = f( p, pc, ps, i )     demand
p = own price; pc = price of complements; ps = price of substitutes; i = income

supply, qs, of an individual commodity:
qs = f( p, pc, pf )     supply
p = own price; pc = price of competitive products; ps = price of substitutes; pf = price of factor inputs

1.7
How much?
Listing the variables in an economic relationship is not enough.
For effective policy we must know the amount of change needed for a policy instrument
to bring about the desired effect:
• By how much should the Federal Reserve raise interest rates to prevent inflation?
• By how much can the price of football tickets be increased and still fill the stadium?

1.8
Answering the How Much? question
Need to estimate parameters that are both:
1. unknown
and
2. unobservable

1.9
The Statistical Model
Average or systematic behavior over many individuals or many firms.
Not a single individual or single firm.
Economists are concerned with the unemployment rate and not whether a particular individual gets a job.

1.10
The Statistical Model
Actual vs. Predicted Consumption:
Actual = systematic part + random error
Consumption, c, is function, f, of income, i, with error, e:
c = f(i) + e
Systematic part provides prediction, f(i), but actual will miss by random error, e.

1.11
The Consumption Function
c = f(i) + e
Need to define f(i) in some way.
To make consumption, c, a linear function of income, i:
f(i) = β1 + β2 i
The statistical model then becomes:
c = β1 + β2 i + e

1.12
The Econometric Model
y = β1 + β2 X2 + β3 X3 + e
• Dependent variable, y, is focus of study (predict or explain changes in dependent variable).
• Explanatory variables, X2 and X3, help us explain observed changes in the dependent variable.

1.13
Statistical Models
Controlled (experimental) vs. Uncontrolled (observational)
Controlled experiment ("pure" science) explaining mass, y:
pressure, X2, held constant when varying temperature, X3, and vice versa.
Uncontrolled experiment (econometrics) explaining consumption, y:
price, X2, and income, X3, vary at the same time.

1.14
Econometric model
• economic model: economic variables and parameters.
• statistical model: sampling process with its parameters.
• data: observed values of the variables.

1.15
The Practice of Econometrics
• Uncertainty regarding an outcome.
• Relationships suggested by economic theory.
• Assumptions and hypotheses to be specified.
• Sampling process including functional form.
• Obtaining data for the analysis.
• Estimation rule with good statistical properties.
• Fit and test model using software package.
• Analyze and evaluate implications of the results.
• Problems suggest approaches for further research.

1.16
Note: the textbook uses the following symbol to mark sections with advanced material: "Skippy"

2.1
Chapter 2
Some Basic Probability Concepts

2.2
Random Variable
random variable:
A variable whose value is unknown until it is observed.
The value of a random variable results from an experiment.
The term random variable implies the existence of some known or unknown probability distribution
defined over the set of all possible values of that variable.
In contrast, an arbitrary variable does not have a probability distribution associated with its values.

2.3
Controlled experiment values of explanatory variables are chosen with great care
in accordance with an appropriate experimental design.
Uncontrolled experiment values of explanatory variables consist of nonexperimental observations
over which the analyst has no control.

2.4
Discrete Random Variable
discrete random variable:
A discrete random variable can take only a finite number of values, that can be counted
by using the positive integers.
Example: Prize money from the following lottery is a discrete random variable:
first prize: $1,000
second prize: $50
third prize: $5.75
since it has only four (a finite number) (count: 1,2,3,4) of possible outcomes:
$0.00; $5.75; $50.00; $1,000.00

2.5
Continuous Random Variable
continuous random variable:
A continuous random variable can take any real value (not just whole numbers)
in at least one interval on the real line.
Examples: gross national product (GNP), money supply, interest rates,
price of eggs, household income, expenditure on clothing.

2.6
Dummy Variable
A discrete random variable that is restricted to two possible values (usually 0 and 1)
is called a dummy variable (also, binary or indicator variable).
Dummy variables account for qualitative differences:
gender (0=male, 1=female),
race (0=white, 1=nonwhite),
citizenship (0=U.S., 1=not U.S.),
income class (0=poor, 1=rich).

2.7
A list of all of the possible values taken by a discrete random variable,
along with their chances of occurring, is called a probability function
or probability density function (pdf).

die            x    f(x)
one dot        1    1/6
two dots       2    1/6
three dots     3    1/6
four dots      4    1/6
five dots      5    1/6
six dots       6    1/6

2.8
A discrete random variable X has pdf, f(x), which is the probability that X takes on the value x.
f(x) = P(X = x)
Therefore, 0 ≤ f(x) ≤ 1.
If X takes on the n values x1, x2, . . . , xn,
then f(x1) + f(x2) + . . . + f(xn) = 1.

2.9
Probability, f(x), for a discrete random variable, X, can be represented by height.
[Figure: bar chart of f(x) from 0.1 to 0.4 over X = 0, 1, 2, 3 -- number, X, on Dean's List of three roommates.]

2.10
A continuous random variable uses area under a curve rather than the height, f(x), to represent probability.
[Figure: density curve for per capita income, X, in the United States, with shaded areas of
0.8676 (green) and 0.1324 (red) marked relative to $34,000 and $55,000.]

2.11
Since a continuous random variable has an uncountably infinite number of values,
the probability of one occurring is zero.
P[X = a] = P[a < X < a] = 0
Probability is represented by area.
Height alone has no area.
An interval for X is needed to get an area under the curve.

2.12
The area under a curve is the integral of the equation that generates the curve:
P[a < X < b] = ∫ab f(x) dx
For continuous random variables it is the integral of f(x), and not f(x) itself,
which defines the area and, therefore, the probability.

2.13
Rules of Summation
Rule 1:  Σi=1..n xi = x1 + x2 + . . . + xn
Rule 2:  Σi=1..n a xi = a Σi=1..n xi
Rule 3:  Σi=1..n (xi + yi) = Σi=1..n xi + Σi=1..n yi
Note that summation is a linear operator, which means it operates term by term.

2.14
Rules of Summation (continued)
Rule 4:  Σi=1..n (a xi + b yi) = a Σi=1..n xi + b Σi=1..n yi
Rule 5:  x̄ = (1/n) Σi=1..n xi = (x1 + x2 + . . . + xn)/n
The definition of x̄ as given in Rule 5 implies the following important fact:
Σi=1..n (xi − x̄) = 0

2.15
Rules of Summation (continued)
Rule 6:  Σi=1..n f(xi) = f(x1) + f(x2) + . . . + f(xn)
Notation:  Σx f(xi) = Σi f(xi) = Σi=1..n f(xi)
Rule 7:  Σi=1..n Σj=1..m f(xi,yj) = Σi=1..n [ f(xi,y1) + f(xi,y2) + . . . + f(xi,ym) ]
The order of summation does not matter:
Σi=1..n Σj=1..m f(xi,yj) = Σj=1..m Σi=1..n f(xi,yj)

2.16
The Mean of a Random Variable
The mean or arithmetic average of a random variable is its mathematical expectation
or expected value, EX.

2.17
Expected Value
There are two entirely different, but mathematically equivalent, ways of determining the expected value:
1. Empirically:
The expected value of a random variable, X, is the average value of the random variable
in an infinite number of repetitions of the experiment.
In other words, draw an infinite number of samples, and average the values of X that you get.

2.18
Expected Value
2. Analytically:
The expected value of a discrete random variable, X, is determined by weighting all the possible
values of X by the corresponding probability density function values, f(x), and summing them up.
In other words:
E[X] = x1 f(x1) + x2 f(x2) + . . . + xn f(xn)

2.19
Empirical vs. Analytical
As sample size goes to infinity, the empirical and analytical methods will produce the same value.
In the empirical case, when the sample goes to infinity the values of X occur with a frequency
equal to the corresponding f(x) in the analytical expression.

2.20
Empirical (sample) mean:
x̄ = (1/n) Σi=1..n xi
where n is the number of sample observations.
Analytical mean:
E[X] = Σi=1..n xi f(xi)
where n is the number of possible values of xi.
Notice how the meaning of n changes.

2.21
The expected value of X:
EX = Σi=1..n xi f(xi)
The expected value of X-squared:
EX² = Σi=1..n xi² f(xi)
It is important to notice that f(xi) does not change!
The expected value of X-cubed:
EX³ = Σi=1..n xi³ f(xi)

2.22
EX  = 0(.1) + 1(.3) + 2(.3) + 3(.2) + 4(.1) = 1.9
EX² = 0²(.1) + 1²(.3) + 2²(.3) + 3²(.2) + 4²(.1)
    = 0 + .3 + 1.2 + 1.8 + 1.6 = 4.9
EX³ = 0³(.1) + 1³(.3) + 2³(.3) + 3³(.2) + 4³(.1)
    = 0 + .3 + 2.4 + 5.4 + 6.4 = 14.5
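The arithmetic on slides 2.21-2.22 can be checked directly from the definition EX^r = Σ xi^r f(xi). Below is a minimal Python sketch (not part of the original slides) using the pmf given above; the helper name expected_value is ours.

```python
# Expected values of X, X^2 and X^3 for the discrete pmf used on slides 2.21-2.22.
x_values = [0, 1, 2, 3, 4]
probs    = [0.1, 0.3, 0.3, 0.2, 0.1]   # f(x) for each x; sums to 1

def expected_value(g, xs, fs):
    """E[g(X)] = sum of g(x_i) * f(x_i) over all possible values x_i."""
    return sum(g(x) * f for x, f in zip(xs, fs))

EX  = expected_value(lambda x: x,    x_values, probs)   # 1.9
EX2 = expected_value(lambda x: x**2, x_values, probs)   # 4.9
EX3 = expected_value(lambda x: x**3, x_values, probs)   # 14.5

print(EX, EX2, EX3)
```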

2.23
E[g(X)] = Σi g(xi) f(xi)
g(X) = g1(X) + g2(X)
E[g(X)] = Σi [ g1(xi) + g2(xi) ] f(xi)
E[g(X)] = Σi g1(xi) f(xi) + Σi g2(xi) f(xi)
E[g(X)] = E[g1(X)] + E[g2(X)]

2.24
Adding and Subtracting Random Variables
E(X+Y) = E(X) + E(Y)
E(X−Y) = E(X) − E(Y)

2.25
Adding a constant to a variable will add a constant to its expected value:
E(X+a) = E(X) + a
Multiplying by a constant will multiply its expected value by that constant:
E(bX) = b E(X)

2.26
Variance
var(X) = average squared deviations around the mean of X.
var(X) = expected value of the squared deviations around the expected value of X.
var(X) = E[(X − EX)²]

2.27
var(X) = E[(X − EX)²]
       = E[X² − 2X EX + (EX)²]
       = E(X²) − 2 EX EX + (EX)²
       = E(X²) − 2 (EX)² + (EX)²
       = E(X²) − (EX)²
var(X) = E(X²) − (EX)²

2.28
variance of a discrete random variable, X:
var(X) = Σi=1..n (xi − EX)² f(xi)
standard deviation is the square root of the variance

2.29
calculate the variance for a discrete random variable, X:

xi   f(xi)   (xi − EX)          (xi − EX)² f(xi)
2    .1      2 − 4.3 = −2.3     5.29 (.1) = .529
3    .3      3 − 4.3 = −1.3     1.69 (.3) = .507
4    .1      4 − 4.3 = −.3      .09 (.1) = .009
5    .2      5 − 4.3 =  .7      .49 (.2) = .098
6    .3      6 − 4.3 = 1.7      2.89 (.3) = .867

Σ xi f(xi) = .2 + .9 + .4 + 1.0 + 1.8 = 4.3
Σ (xi − EX)² f(xi) = .529 + .507 + .009 + .098 + .867 = 2.01

2.30
Z = a + cX
var(Z) = var(a + cX)
       = E[(a + cX) − E(a + cX)]²
       = c² var(X)
var(a + cX) = c² var(X)
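As a check on slide 2.29, here is a short Python sketch (ours, not from the slides) that recomputes EX = 4.3 and var(X) = 2.01 from the same pmf:

```python
# Variance of the discrete random variable from slide 2.29: var(X) = E[(X - EX)^2].
x_values = [2, 3, 4, 5, 6]
probs    = [0.1, 0.3, 0.1, 0.2, 0.3]

EX   = sum(x * f for x, f in zip(x_values, probs))               # 4.3
varX = sum((x - EX) ** 2 * f for x, f in zip(x_values, probs))   # 2.01
sdX  = varX ** 0.5                                               # standard deviation

print(EX, varX, sdX)
```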

2.31
Joint pdf
A joint probability density function, f(x,y), provides the probabilities associated with
the joint occurrence of all of the possible pairs of X and Y.

2.32
Survey of College City, NY
joint pdf f(x,y):

                          college grads in household
                          Y = 1           Y = 2
vacation homes   X = 0    f(0,1) = .45    f(0,2) = .15
owned            X = 1    f(1,1) = .05    f(1,2) = .35

2.33
Calculating the expected value of functions of two random variables:
E[g(X,Y)] = Σi Σj g(xi,yj) f(xi,yj)
E(XY) = Σi Σj xi yj f(xi,yj)
E(XY) = (0)(1)(.45) + (0)(2)(.15) + (1)(1)(.05) + (1)(2)(.35) = .75

2.34
Marginal pdf
The marginal probability density functions, f(x) and f(y), for discrete random variables
can be obtained by summing f(x,y) over the values of Y to obtain f(x),
and over the values of X to obtain f(y).
f(xi) = Σj f(xi,yj)          f(yj) = Σi f(xi,yj)

2.35
                     Y = 1     Y = 2     marginal pdf for X:
           X = 0     .45       .15       .60 = f(X = 0)
           X = 1     .05       .35       .40 = f(X = 1)
marginal pdf
for Y:               .50       .50
                     f(Y = 1)  f(Y = 2)

2.36
Conditional pdf
The conditional probability density functions of X given Y=y, f(x|y), and of Y given X=x, f(y|x),
are obtained by dividing f(x,y) by f(y) to get f(x|y) and by f(x) to get f(y|x).
f(x|y) = f(x,y)/f(y)          f(y|x) = f(x,y)/f(x)

2.37
conditional pdfs:
                       Y = 1                 Y = 2
X = 0 (.45, .15):      f(Y=1|X=0) = .75      f(Y=2|X=0) = .25
                       f(X=0|Y=1) = .90      f(X=0|Y=2) = .30
X = 1 (.05, .35):      f(X=1|Y=1) = .10      f(X=1|Y=2) = .70
                       f(Y=1|X=1) = .125     f(Y=2|X=1) = .875

2.38
Independence
X and Y are independent random variables if their joint pdf, f(x,y), is the product of
their respective marginal pdfs, f(x) and f(y):
f(xi,yj) = f(xi) f(yj)
For independence this must hold for all pairs of i and j.

2.39
not independent
                Y = 1    Y = 2    marginal pdf for X:
X = 0           .45      .15      .60 = f(X = 0)
  (required:    .30      .30)         [.50 x .60 = .30]
X = 1           .05      .35      .40 = f(X = 1)
  (required:    .20      .20)         [.50 x .40 = .20]
marginal pdf
for Y:          .50      .50
                f(Y = 1) f(Y = 2)
The calculations in the boxes show the numbers required to have independence.

2.40
Covariance
The covariance between two random variables, X and Y, measures the linear association between them.
cov(X,Y) = E[(X − EX)(Y − EY)]
Note that variance is a special case of covariance:
cov(X,X) = var(X) = E[(X − EX)²]

2.41
cov(X,Y) = E[(X − EX)(Y − EY)]
         = E[XY − X EY − Y EX + EX EY]
         = E(XY) − EX EY − EY EX + EX EY
         = E(XY) − 2 EX EY + EX EY
         = E(XY) − EX EY
cov(X,Y) = E(XY) − EX EY

2.42
                Y = 1    Y = 2
X = 0           .45      .15      .60
X = 1           .05      .35      .40
                .50      .50

EX = 0(.60) + 1(.40) = .40
EY = 1(.50) + 2(.50) = 1.50
EX EY = (.40)(1.50) = .60
E(XY) = (0)(1)(.45) + (0)(2)(.15) + (1)(1)(.05) + (1)(2)(.35) = .75
covariance:
cov(X,Y) = E(XY) − EX EY = .75 − (.40)(1.50) = .75 − .60 = .15

2.43
Correlation
The correlation between two random variables X and Y is their covariance divided by
the square roots of their respective variances:
ρ(X,Y) = cov(X,Y) / √[ var(X) var(Y) ]
Correlation is a pure number falling between −1 and 1.

2.44
                Y = 1    Y = 2
X = 0           .45      .15      .60
X = 1           .05      .35      .40
                .50      .50

EX = .40
EX² = 0²(.60) + 1²(.40) = .40
var(X) = E(X²) − (EX)² = .40 − (.40)² = .24
EY = 1.50
EY² = 1²(.50) + 2²(.50) = .50 + 2.0 = 2.50
var(Y) = E(Y²) − (EY)² = 2.50 − (1.50)² = .25
cov(X,Y) = .15
ρ(X,Y) = cov(X,Y) / √[ var(X) var(Y) ] = .61
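The joint-pdf calculations on slides 2.33-2.44 can be reproduced in a few lines. The sketch below is ours; it uses the College City joint pdf from slide 2.32 and recovers EX, EY, the variances, cov(X,Y) = .15 and ρ(X,Y) ≈ .61.

```python
# Covariance and correlation for the joint pdf on slides 2.32-2.44.
# f[(x, y)] gives the joint probability of each (X, Y) pair.
f = {(0, 1): 0.45, (0, 2): 0.15, (1, 1): 0.05, (1, 2): 0.35}

def E(g):
    """E[g(X,Y)] = sum of g(x, y) * f(x, y) over all pairs."""
    return sum(g(x, y) * p for (x, y), p in f.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)        # .40, 1.50
varX   = E(lambda x, y: x**2) - EX**2                # .24
varY   = E(lambda x, y: y**2) - EY**2                # .25
covXY  = E(lambda x, y: x*y) - EX*EY                 # .75 - .60 = .15
rho    = covXY / (varX * varY) ** 0.5                # about .61

print(EX, EY, varX, varY, covXY, rho)
```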

2.45
Zero Covariance & Correlation
Independent random variables have zero covariance and, therefore, zero correlation.
The converse is not true.

2.46
Since expectation is a linear operator, it can be applied term by term.
The expected value of the weighted sum of random variables is the sum of the expectations
of the individual terms.
E[c1X + c2Y] = c1 EX + c2 EY
In general, for random variables X1, . . . , Xn:
E[c1X1 + . . . + cnXn] = c1 EX1 + . . . + cn EXn

2.47
The variance of a weighted sum of random variables is the sum of the variances, each times
the square of the weight, plus twice the covariances of all the random variables times the
products of their weights.
Weighted sum of random variables:
var(c1X + c2Y) = c1² var(X) + c2² var(Y) + 2 c1c2 cov(X,Y)
Weighted difference of random variables:
var(c1X − c2Y) = c1² var(X) + c2² var(Y) − 2 c1c2 cov(X,Y)

2.48
The Normal Distribution
Y ~ N(β, σ²)
f(y) = [ 1 / √(2π σ²) ] exp[ −(y − β)² / (2σ²) ]
[Figure: bell-shaped density f(y) centered at β.]

2.49
The Standardized Normal
Z = (y − β)/σ
Z ~ N(0,1)
f(z) = [ 1 / √(2π) ] exp[ −z² / 2 ]

2.50
Y ~ N(β, σ²)
[Figure: density f(y) with the area to the right of a shaded.]
P[Y > a] = P[ (Y − β)/σ > (a − β)/σ ] = P[ Z > (a − β)/σ ]

2.51
Y ~ N(β, σ²)
[Figure: density f(y) with the area between a and b shaded.]
P[a < Y < b] = P[ (a − β)/σ < (Y − β)/σ < (b − β)/σ ]
             = P[ (a − β)/σ < Z < (b − β)/σ ]

2.52
Linear combinations of jointly normally distributed random variables are themselves
normally distributed.
Y1 ~ N(β1, σ1²), Y2 ~ N(β2, σ2²), . . . , Yn ~ N(βn, σn²)
W = c1Y1 + c2Y2 + . . . + cnYn
W ~ N[ E(W), var(W) ]

2.53
Chi-Square
If Z1, Z2, . . . , Zm denote m independent N(0,1) random variables, and
V = Z1² + Z2² + . . . + Zm², then V ~ χ²(m).
V is chi-square with m degrees of freedom.
mean:     E[V] = E[χ²(m)] = m
variance: var[V] = var[χ²(m)] = 2m

2.54
Student-t
If Z ~ N(0,1) and V ~ χ²(m) and if Z and V are independent, then
t = Z / √(V/m) ~ t(m)
t is student-t with m degrees of freedom.
mean:     E[t] = E[t(m)] = 0     (symmetric about zero)
variance: var[t] = var[t(m)] = m / (m − 2)

2.55
F Statistic
If V1 ~ χ²(m1) and V2 ~ χ²(m2) and if V1 and V2 are independent, then
F = (V1/m1) / (V2/m2) ~ F(m1,m2)
F is an F statistic with m1 numerator degrees of freedom and
m2 denominator degrees of freedom.

3.1
Chapter 3
The Simple Linear Regression Model

3.2
Purpose of Regression Analysis
1. Estimate a relationship among economic variables, such as y = f(x).
2. Forecast or predict the value of one variable, y, based on the value of another variable, x.

3.3
Weekly Food Expenditures
y = dollars spent each week on food items.
x = consumer's weekly income.
The relationship between x and the expected value of y, given x, might be linear:
E(y|x) = β1 + β2 x

3.4
Figure 3.1a Probability distribution f(y|x=480) of food expenditures if given income x = $480.

3.5
Figure 3.1b Probability distributions of food expenditures if given income x = $480 and x = $800.

3.6
Homoskedastic Case
Figure 3.2 The Economic Model: a linear relationship between average expenditure on food and income.
E(y|x) = β1 + β2 x
β2 = ΔE(y|x) / Δx

3.7
Figure 3.3 The probability density function for yt at two levels of household income, xt
(x1 = 480 and x2 = 800), around the regression line E(y|x) = β1 + β2 x.

3.8
Heteroskedastic Case
Figure 3.3+ The variance of yt increases as household income, xt, increases.

3.9
Assumptions of the Simple Linear Regression Model - I
1. The average value of y, given x, is given by the linear regression:
   E(y) = β1 + β2 x
2. For each value of x, the values of y are distributed around their mean with variance:
   var(y) = σ²
3. The values of y are uncorrelated, having zero covariance and thus no linear relationship:
   cov(yi, yj) = 0
4. The variable x must take at least two different values, so that x ≠ c, where c is a constant.

3.10
One more assumption that is often used in practice but is not required for least squares:
5. (optional) The values of y are normally distributed about their mean for each value of x:
   y ~ N[ (β1 + β2 x), σ² ]

3.11
The Error Term
y is a random variable composed of two parts:
I. Systematic component: E(y) = β1 + β2 x
   This is the mean of y.
II. Random component: e = y − E(y) = y − β1 − β2 x
   This is called the random error.
Together E(y) and e form the model:
y = β1 + β2 x + e

3.12
Figure 3.5 The relationship among y, e and the true regression line E(y) = β1 + β2 x.

3.13
Figure 3.7a The relationship among y, ê and the fitted regression line ŷ = b1 + b2 x.

3.14
Figure 3.7b The sum of squared residuals from any other line, ŷ* = b1* + b2* x, will be larger.

3.15
Figure 3.4 Probability density functions for e and y: f(e) is centered at 0 and f(y) at β1 + β2 x.

3.16
The Error Term Assumptions
1. The value of y, for each value of x, is
   y = β1 + β2 x + e
2. The average value of the random error e is: E(e) = 0
3. The variance of the random error e is: var(e) = σ² = var(y)
4. The covariance between any pair of e's is: cov(ei, ej) = cov(yi, yj) = 0
5. x must take at least two different values so that x ≠ c, where c is a constant.
6. e is normally distributed with mean 0, var(e) = σ² (optional): e ~ N(0, σ²)

3.17
Unobservable Nature of the Error Term
1. Unspecified factors / explanatory variables, not in the model, may be in the error term.
2. Approximation error is in the error term if the relationship between y and x is not exactly
   a perfectly linear relationship.
3. Strictly unpredictable random behavior that may be unique to that observation is in the error.

3.18
Population regression values:
yt = β1 + β2 xt + et
Population regression line:
E(yt|xt) = β1 + β2 xt
Sample regression values:
yt = b1 + b2 xt + êt
Sample regression line:
ŷt = b1 + b2 xt

3.19
yt = β1 + β2 xt + et
et = yt − β1 − β2 xt
Minimize the error sum of squared deviations:
S(β1,β2) = Σt=1..T (yt − β1 − β2 xt)²     (3.3.4)

3.20
Minimize w.r.t. β1 and β2:
S(β1,β2) = Σt=1..T (yt − β1 − β2 xt)²     (3.3.4)
∂S(.)/∂β1 = −2 Σ (yt − β1 − β2 xt)
∂S(.)/∂β2 = −2 Σ xt (yt − β1 − β2 xt)
Set each of these two derivatives equal to zero and solve these two equations
for the two unknowns: β1, β2.

3.21
Minimize w.r.t. β1 and β2:
S(.) = Σt=1..T (yt − β1 − β2 xt)²
[Figure: S(.) plotted against βi is U-shaped; ∂S(.)/∂βi < 0 to the left of the minimum,
∂S(.)/∂βi = 0 at βi = bi, and ∂S(.)/∂βi > 0 to the right.]

3.22
To minimize S(.), you set the two derivatives equal to zero to get:
∂S(.)/∂β1 = −2 Σ (yt − b1 − b2 xt) = 0
∂S(.)/∂β2 = −2 Σ xt (yt − b1 − b2 xt) = 0
When these two terms are set to zero, β1 and β2 become b1 and b2 because they no longer
represent just any value of β1 and β2 but the special values that correspond to the minimum of S(.).

3.23
−2 Σ (yt − b1 − b2 xt) = 0
−2 Σ xt (yt − b1 − b2 xt) = 0

Σ yt − T b1 − b2 Σ xt = 0
Σ xt yt − b1 Σ xt − b2 Σ xt² = 0

T b1 + b2 Σ xt = Σ yt
b1 Σ xt + b2 Σ xt² = Σ xt yt

3.24
T b1 + b2 Σ xt = Σ yt
b1 Σ xt + b2 Σ xt² = Σ xt yt
Solve for b1 and b2 using the definitions of x̄ and ȳ:
b2 = [ T Σ xt yt − Σ xt Σ yt ] / [ T Σ xt² − (Σ xt)² ]
b1 = ȳ − b2 x̄

3.25
elasticities
η = percentage change in y / percentage change in x = (Δy/y) / (Δx/x) = (Δy/Δx)(x/y)
Using calculus, we can get the elasticity at a point:
η = lim(Δx→0) (Δy/Δx)(x/y) = (∂y/∂x)(x/y)
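The normal-equation solution on slide 3.24 is easy to compute directly. The following Python sketch is ours; the x, y data are made-up illustration values. The same slope could equivalently be obtained from the deviation form b2 = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)² given later on slide 4.11.

```python
# Least squares slope and intercept from the normal-equation formulas on slide 3.24:
#   b2 = (T*Sum(x*y) - Sum(x)*Sum(y)) / (T*Sum(x^2) - (Sum(x))^2),   b1 = ybar - b2*xbar
# The x, y data below are made-up illustration values, not from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]

T = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b2 = (T * sum_xy - sum_x * sum_y) / (T * sum_x2 - sum_x ** 2)   # slope
b1 = sum_y / T - b2 * (sum_x / T)                               # intercept: ybar - b2*xbar

print(b1, b2)
```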

3.26
applying elasticities
E(y) = β1 + β2 x
∂E(y)/∂x = β2
η = [ ∂E(y)/∂x ] [ x / E(y) ] = β2 x / E(y)

3.27
estimating elasticities
η̂ = (∂y/∂x)(x̄/ȳ) = b2 x̄/ȳ
ŷt = b1 + b2 xt = 4 + 1.5 xt
x̄ = 8 = average number of years of experience
ȳ = $10 = average wage rate
η̂ = b2 x̄/ȳ = 1.5 (8/10) = 1.2

3.28
Prediction
Estimated regression equation:
ŷt = 4 + 1.5 xt
xt = years of experience
ŷt = predicted wage rate
If xt = 2 years, then ŷt = $7.00 per hour.
If xt = 3 years, then ŷt = $8.50 per hour.

3.29
log-log models
ln(y) = β1 + β2 ln(x)
∂ln(y)/∂x = β2 ∂ln(x)/∂x
(1/y) ∂y/∂x = β2 (1/x)

3.30
(1/y) ∂y/∂x = β2 (1/x)
(x/y) ∂y/∂x = β2
elasticity of y with respect to x:
η = (∂y/∂x)(x/y) = β2

4.1
Chapter 4
Properties of Least Squares Estimators

4.2
Simple Linear Regression Model
yt = β1 + β2 xt + εt
yt = household weekly food expenditures
xt = household weekly income
For a given level of xt, the expected level of food expenditures will be:
E(yt|xt) = β1 + β2 xt

4.3
Assumptions of the Simple Linear Regression Model
1. yt = β1 + β2 xt + εt
2. E(εt) = 0  <=>  E(yt) = β1 + β2 xt
3. var(εt) = σ² = var(yt)
4. cov(εi, εj) = cov(yi, yj) = 0
5. xt ≠ c for every observation
6. εt ~ N(0, σ²)  <=>  yt ~ N(β1 + β2 xt, σ²)

4.4
The population parameters β1 and β2 are unknown population constants.
The formulas that produce the sample estimates b1 and b2 are called the estimators of β1 and β2.
When b1 and b2 are used to represent the formulas rather than specific values,
they are called estimators of β1 and β2, which are random variables because
they are different from sample to sample.

4.5
Estimators are Random Variables
(estimates are not)
• If the least squares estimators b1 and b2 are random variables, then what are their
  means, variances, covariances and probability distributions?
• Compare the properties of alternative estimators to the properties of the
  least squares estimators.

4.6
The Expected Values of b1 and b2
The least squares formulas (estimators) in the simple regression case:
b2 = [ T Σ xt yt − Σ xt Σ yt ] / [ T Σ xt² − (Σ xt)² ]     (3.3.8a)
b1 = ȳ − b2 x̄                                             (3.3.8b)
where ȳ = Σ yt / T and x̄ = Σ xt / T

4.7
Substitute in yt = β1 + β2 xt + εt to get:
b2 = β2 + [ T Σ xt εt − Σ xt Σ εt ] / [ T Σ xt² − (Σ xt)² ]
The mean of b2 is:
Eb2 = β2 + [ T Σ xt Eεt − Σ xt Σ Eεt ] / [ T Σ xt² − (Σ xt)² ]
Since Eεt = 0, then Eb2 = β2.

4.8
An Unbiased Estimator
The result Eb2 = β2 means that the distribution of b2 is centered at β2.
Since the distribution of b2 is centered at β2, we say that b2 is an unbiased estimator of β2.

4.9
Wrong Model Specification
The unbiasedness result on the previous slide assumes that we are using the correct model.
If the model is of the wrong form or is missing important variables, then Eεt ≠ 0
and, consequently, Eb2 ≠ β2.

4.10
Unbiased Estimator of the Intercept
In a similar manner, the estimator b1 of the intercept or constant term can be shown to be
an unbiased estimator of β1 when the model is correctly specified.
Eb1 = β1

4.11
Equivalent expressions for b2:
b2 = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)²     (4.2.6)
Expand and multiply top and bottom by T:
b2 = [ T Σ xt yt − Σ xt Σ yt ] / [ T Σ xt² − (Σ xt)² ]     (3.3.8a)

4.12
Variance of b2
Given that both yt and εt have variance σ², the variance of the estimator b2 is:
var(b2) = σ² / Σ(xt − x̄)²
b2 is a function of the yt values but var(b2) does not involve yt directly.

4.13
Variance of b1
Given b1 = ȳ − b2 x̄, the variance of the estimator b1 is:
var(b1) = σ² Σ xt² / [ T Σ(xt − x̄)² ]

4.14
Covariance of b1 and b2
cov(b1,b2) = σ² (−x̄) / Σ(xt − x̄)²
If x̄ = 0, the slope can change without affecting the variance.

4.15
What factors determine variance and covariance?
1. σ²: uncertainty about the yt values means uncertainty about b1, b2 and their relationship.
2. The more spread out the xt values are, the more confidence we have in b1, b2, etc.
3. The larger the sample size, T, the smaller the variances and covariances.
4. The variance of b1 is large when the (squared) xt values are far from zero (in either direction).
5. Changing the slope, b2, has no effect on the intercept, b1, when the sample mean is zero.
   But if the sample mean is positive, the covariance between b1 and b2 will be negative,
   and vice versa.

4.16
Gauss-Markov Theorem
Under the first five assumptions of the simple, linear regression model, the ordinary
least squares estimators b1 and b2 have the smallest variance of all linear and unbiased
estimators of β1 and β2. This means that b1 and b2 are the Best Linear Unbiased
Estimators (BLUE) of β1 and β2.

4.17
implications of Gauss-Markov
1. b1 and b2 are "best" within the class of linear and unbiased estimators.
2. "Best" means smallest variance within the class of linear/unbiased.
3. All of the first five assumptions must hold to satisfy Gauss-Markov.
4. Gauss-Markov does not require assumption six: normality.
5. Gauss-Markov is not based on the least squares principle but on b1 and b2.

4.18
G-Markov implications (continued)
6. If we are not satisfied with restricting our estimation to the class of linear and unbiased
   estimators, we should ignore the Gauss-Markov Theorem and use some nonlinear and/or biased
   estimator instead. (Note: a biased or nonlinear estimator could have smaller variance
   than those satisfying Gauss-Markov.)
7. Gauss-Markov applies to the b1 and b2 estimators and not to particular sample values
   (estimates) of b1 and b2.

4.19
Probability Distribution of Least Squares Estimators
b1 ~ N( β1 , σ² Σ xt² / [ T Σ(xt − x̄)² ] )
b2 ~ N( β2 , σ² / Σ(xt − x̄)² )

4.20
yt and εt normally distributed
The least squares estimator of β2 can be expressed as a linear combination of the yt's:
b2 = Σ wt yt    where wt = (xt − x̄) / Σ(xt − x̄)²
b1 = ȳ − b2 x̄
This means that b1 and b2 are normal since linear combinations of normals are normal.

4.21
normally distributed under the Central Limit Theorem
If the first five Gauss-Markov assumptions hold, and the sample size, T, is sufficiently large,
then the least squares estimators, b1 and b2, have a distribution that approximates the normal
distribution with greater accuracy the larger the value of the sample size, T.

4.22
Consistency
We would like our estimators, b1 and b2, to collapse onto the true population values, β1 and β2,
as the sample size, T, goes to infinity.
One way to achieve this consistency property is for the variances of b1 and b2 to go to zero
as T goes to infinity.
Since the formulas for the variances of the least squares estimators b1 and b2 show that their
variances do, in fact, go to zero, b1 and b2 are consistent estimators of β1 and β2.

4.23
Estimating the variance of the error term, σ²
êt = yt − b1 − b2 xt
σ̂² = Σt=1..T êt² / (T − 2)
σ̂² is an unbiased estimator of σ²
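Putting slides 4.12-4.14 and 4.23 together, here is a small sketch (ours) that turns residuals into σ̂² and then into estimated variances, covariance and standard errors for b1 and b2. It assumes b1 and b2 have already been computed, for example by the least squares sketch after slide 3.24.

```python
# Error-variance estimate and (estimated) sampling variances from slides 4.12-4.14 and 4.23:
#   sigma2_hat = Sum(e_hat^2)/(T-2),  var(b2) = sigma2/Sum((x-xbar)^2),
#   var(b1) = sigma2*Sum(x^2)/(T*Sum((x-xbar)^2)),  cov(b1,b2) = -xbar*sigma2/Sum((x-xbar)^2)
# b1, b2, x, y are assumed to come from an earlier OLS fit.

def ls_variances(x, y, b1, b2):
    T = len(x)
    xbar = sum(x) / T
    residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    sigma2_hat = sum(e ** 2 for e in residuals) / (T - 2)    # unbiased estimate of sigma^2
    sxx = sum((xi - xbar) ** 2 for xi in x)                  # Sum((x - xbar)^2)
    var_b2 = sigma2_hat / sxx
    var_b1 = sigma2_hat * sum(xi ** 2 for xi in x) / (T * sxx)
    cov_b1_b2 = -xbar * sigma2_hat / sxx
    se_b1, se_b2 = var_b1 ** 0.5, var_b2 ** 0.5              # standard errors
    return sigma2_hat, var_b1, var_b2, cov_b1_b2, se_b1, se_b2
```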

4.24
The Least Squares Predictor, ŷo
Given a value of the explanatory variable, xo, we would like to predict a value of the
dependent variable, yo.
The least squares predictor is:
ŷo = b1 + b2 xo     (4.7.2)

5.1
Chapter 5
Inference in the Simple Regression Model

5.2
Assumptions of the Simple Linear Regression Model
1. yt = β1 + β2 xt + εt
2. E(εt) = 0  <=>  E(yt) = β1 + β2 xt
3. var(εt) = σ² = var(yt)
4. cov(εi, εj) = cov(yi, yj) = 0
5. xt ≠ c for every observation
6. εt ~ N(0, σ²)  <=>  yt ~ N(β1 + β2 xt, σ²)

5.3
Probability Distribution of Least Squares Estimators
b1 ~ N( β1 , σ² Σ xt² / [ T Σ(xt − x̄)² ] )
b2 ~ N( β2 , σ² / Σ(xt − x̄)² )

5.4
Error Variance Estimation
Unbiased estimator of the error variance:
σ̂² = Σ êt² / (T − 2)
Transform to a chi-square distribution:
(T − 2) σ̂² / σ² ~ χ²(T−2)

5.5
We make a correct decision if:
• The null hypothesis is false and we decide to reject it.
• The null hypothesis is true and we decide not to reject it.
Our decision is incorrect if:
• The null hypothesis is true and we decide to reject it.
  This is a type I error.
• The null hypothesis is false and we decide not to reject it.
  This is a type II error.

5.6
b2 ~ N( β2 , σ² / Σ(xt − x̄)² )
Create a standardized normal random variable, Z, by subtracting the mean of b2 and
dividing by its standard deviation:
Z = (b2 − β2) / √var(b2)  ~  N(0,1)

5.7
Simple Linear Regression
yt = β1 + β2 xt + εt   where E(εt) = 0
yt ~ N(β1 + β2 xt, σ²)   since E(yt) = β1 + β2 xt
εt = yt − β1 − β2 xt
Therefore, εt ~ N(0, σ²).

5.8
Create a Chi-Square
εt ~ N(0, σ²) but we want N(0,1).
(εt /σ) ~ N(0,1)     Standard Normal
(εt /σ)² ~ χ²(1)     Chi-Square

5.9
Sum of Chi-Squares
Σt=1..T (εt /σ)² = (ε1 /σ)² + (ε2 /σ)² + . . . + (εT /σ)²
                 = χ²(1) + χ²(1) + . . . + χ²(1) = χ²(T)
Therefore, Σt=1..T (εt /σ)² ~ χ²(T)

5.10
Chi-Square degrees of freedom
Since the errors εt = yt − β1 − β2 xt are not observable, we estimate them with
the sample residuals et = yt − b1 − b2 xt.
Unlike the errors, the sample residuals are not independent since they use up two degrees
of freedom by using b1 and b2 to estimate β1 and β2.
We get only T − 2 degrees of freedom instead of T.

5.11
Student-t Distribution
t = Z / √(V/m) ~ t(m)
where Z ~ N(0,1) and V ~ χ²(m)

5.12
t = Z / √[ V / (T−2) ] ~ t(T−2)
where Z = (b2 − β2) / √var(b2)
and var(b2) = σ² / Σ(xi − x̄)²

5.13
V = (T − 2) σ̂² / σ²
t = Z / √[ V / (T−2) ]
  = [ (b2 − β2) / √var(b2) ] / √[ (T−2) σ̂² / σ² / (T−2) ]

5.14
var(b2) = σ² / Σ(xi − x̄)²
t = [ (b2 − β2) / √( σ² / Σ(xi − x̄)² ) ] / √[ (T−2) σ̂² / σ² / (T−2) ]
notice the cancellations:
t = (b2 − β2) / √( σ̂² / Σ(xi − x̄)² )

5.15
t = (b2 − β2) / √( σ̂² / Σ(xi − x̄)² ) = (b2 − β2) / √var̂(b2)
t = (b2 − β2) / se(b2)

5.16
Student's t-statistic
t = (b2 − β2) / se(b2) ~ t(T−2)
t has a Student-t distribution with T − 2 degrees of freedom.

5.17
Figure 5.1 Student-t Distribution
[Figure: density f(t) with central area (1−α) between −tc and tc and area α/2 in each tail;
the red area is the rejection region for a 2-sided test.]

5.18
probability statements
P( t < −tc ) = P( t > tc ) = α/2
P( −tc ≤ t ≤ tc ) = 1 − α
P( −tc ≤ (b2 − β2)/se(b2) ≤ tc ) = 1 − α

5.19
Confidence Intervals
Two-sided (1−α)x100% C.I. for β1:
[ b1 − tα/2 se(b1) , b1 + tα/2 se(b1) ]
Two-sided (1−α)x100% C.I. for β2:
[ b2 − tα/2 se(b2) , b2 + tα/2 se(b2) ]
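A sketch of the t-statistic and confidence-interval formulas on slides 5.16-5.19. The function below is ours; scipy is used only to look up the critical value tc, and the numbers in the example call are the food-expenditure estimates reported later on slides 6.16-6.19 (b2 = 0.1283, se(b2) = 0.0305, T = 40).

```python
# Two-sided (1-alpha)x100% confidence interval for beta2 (slide 5.19) and the t-statistic
# for H0: beta2 = 0 (slide 5.16).
from scipy import stats   # only needed for the critical value t_c

def t_stat_and_ci(b2, se_b2, T, K=2, alpha=0.05, beta2_null=0.0):
    df = T - K                                   # degrees of freedom (T-2 in simple regression)
    t_stat = (b2 - beta2_null) / se_b2           # t = (b2 - beta2)/se(b2)
    t_c = stats.t.ppf(1 - alpha / 2, df)         # critical value with alpha/2 in each tail
    ci = (b2 - t_c * se_b2, b2 + t_c * se_b2)    # [b2 - tc*se(b2), b2 + tc*se(b2)]
    reject = abs(t_stat) > t_c                   # two-sided rejection rule
    return t_stat, t_c, ci, reject

print(t_stat_and_ci(b2=0.1283, se_b2=0.0305, T=40))
```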

5.20
Student-t vs. Normal Distribution
1. Both are symmetric bell-shaped distributions.
2. Student-t distribution has fatter tails than the normal.
3. Student-t converges to the normal for infinite sample.
4. Student-t is conditional on degrees of freedom (df).
5. Normal is a good approximation of Student-t for the first few decimal places when df > 30 or so.

5.21
Hypothesis Tests
1. A null hypothesis, H0.
2. An alternative hypothesis, H1.
3. A test statistic.
4. A rejection region.

5.22
Rejection Rules
1. Two-Sided Test:
   If the value of the test statistic falls in the critical region in either tail of the
   t-distribution, then we reject the null hypothesis in favor of the alternative.
2. Left-Tail Test:
   If the value of the test statistic falls in the critical region which lies in the left tail
   of the t-distribution, then we reject the null hypothesis in favor of the alternative.
3. Right-Tail Test:
   If the value of the test statistic falls in the critical region which lies in the right tail
   of the t-distribution, then we reject the null hypothesis in favor of the alternative.

5.23
Format for Hypothesis Testing
1. Determine null and alternative hypotheses.
2. Specify the test statistic and its distribution as if the null hypothesis were true.
3. Select α and determine the rejection region.
4. Calculate the sample value of the test statistic.
5. State your conclusion.

5.24
practical vs. statistical significance in economics
Practically but not statistically significant:
When sample size is very small, a large average gap between the salaries of men and women
might not be statistically significant.
Statistically but not practically significant:
When sample size is very large, a small correlation (say, ρ = 0.00000001) between the winning
numbers in the PowerBall Lottery and the Dow-Jones Stock Market Index might be
statistically significant.

5.25
Type I and Type II errors
Type I error:
We make the mistake of rejecting the null hypothesis when it is true.
α = P(rejecting H0 when it is true).
Type II error:
We make the mistake of failing to reject the null hypothesis when it is false.
β = P(failing to reject H0 when it is false).

5.26
Prediction Intervals
A (1−α)x100% prediction interval for yo is:
ŷo ± tc se(f)
f = ŷo − yo
se(f) = √var̂(f)
var̂(f) = σ̂² [ 1 + 1/T + (xo − x̄)² / Σ(xt − x̄)² ]

6.1
Chapter 6
The Simple Linear Regression Model
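The prediction-interval formula on slide 5.26 translates directly into code. This sketch is ours and assumes b1, b2, σ̂² and the critical value tc have already been obtained from the estimated regression.

```python
# (1-alpha)x100% prediction interval for y_o at x = x_o (slide 5.26):
#   y_hat_o +/- t_c*se(f),  var_hat(f) = sigma2_hat*(1 + 1/T + (x_o - xbar)^2 / Sum((x - xbar)^2))
# b1, b2, sigma2_hat and the x data are assumed to come from an earlier least squares fit;
# t_c is the alpha/2 critical value with T-2 degrees of freedom.

def prediction_interval(x, b1, b2, sigma2_hat, x_o, t_c):
    T = len(x)
    xbar = sum(x) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    y_hat_o = b1 + b2 * x_o                                         # point prediction
    var_f = sigma2_hat * (1.0 + 1.0 / T + (x_o - xbar) ** 2 / sxx)
    se_f = var_f ** 0.5                                             # se of the forecast error
    return y_hat_o - t_c * se_f, y_hat_o + t_c * se_f
```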

6.2
Explaining Variation in yt
Predicting yt without any explanatory variables:
yt = β1 + et
Σt=1..T et² = Σt=1..T (yt − β1)²
∂(Σ et²)/∂β1 = −2 Σ (yt − b1) = 0
Σ (yt − b1) = 0
Σ yt − T b1 = 0
b1 = ȳ
Why not ȳ?

6.3
Explaining Variation in yt
yt = b1 + b2 xt + êt
Explained variation:
ŷt = b1 + b2 xt
Unexplained variation:
êt = yt − ŷt = yt − b1 − b2 xt

6.4
Explaining Variation in yt
yt = ŷt + êt   using ȳ as the baseline
yt − ȳ = ŷt − ȳ + êt
Σt=1..T (yt − ȳ)² = Σt=1..T (ŷt − ȳ)² + Σt=1..T êt²     (the cross product term drops out)
SST = SSR + SSE

6.5
Total Variation in yt
SST = total sum of squares
SST measures variation of yt around ȳ:
SST = Σt=1..T (yt − ȳ)²

6.6
Explained Variation in yt
SSR = regression sum of squares
Fitted ŷt values:  ŷt = b1 + b2 xt
SSR measures variation of ŷt around ȳ:
SSR = Σt=1..T (ŷt − ȳ)²

6.7
Unexplained Variation in yt
SSE = error sum of squares
êt = yt − ŷt = yt − b1 − b2 xt
SSE measures variation of yt around ŷt:
SSE = Σt=1..T (yt − ŷt)² = Σ êt²

6.8
Analysis of Variance Table

Table 6.1 Analysis of Variance Table
Source of Variation   DF     Sum of Squares   Mean Square
Explained             1      SSR              SSR/1
Unexplained           T-2    SSE              SSE/(T-2)  [= σ̂²]
Total                 T-1    SST

6.9
Coefficient of Determination
What proportion of the variation in yt is explained?
0 ≤ R² ≤ 1
R² = SSR / SST

6.10
Coefficient of Determination
SST = SSR + SSE
Dividing by SST:
SST/SST = SSR/SST + SSE/SST
1 = SSR/SST + SSE/SST
R² = SSR/SST = 1 − SSE/SST

6.11
Coefficient of Determination
R² is only a descriptive measure.
R² does not measure the quality of the regression model.
Focusing solely on maximizing R² is not a good idea.

6.12
Correlation Analysis
Population:
ρ = cov(X,Y) / √[ var(X) var(Y) ]
Sample:
r = cov̂(X,Y) / √[ var̂(X) var̂(Y) ]

6.13
Correlation Analysis
var̂(X) = Σt=1..T (xt − x̄)² / (T−1)
var̂(Y) = Σt=1..T (yt − ȳ)² / (T−1)
cov̂(X,Y) = Σt=1..T (xt − x̄)(yt − ȳ) / (T−1)

6.14
Correlation Analysis
Sample Correlation Coefficient:
r = Σt=1..T (xt − x̄)(yt − ȳ) / √[ Σt=1..T (xt − x̄)² Σt=1..T (yt − ȳ)² ]

6.15
Correlation Analysis and R²
For simple linear regression analysis:
r² = R²
R² is also the squared correlation between yt and ŷt, measuring "goodness of fit".

6.16
Regression Computer Output
Typical computer output of regression estimates:

Table 6.2 Computer Generated Least Squares Results
(1)         (2)                  (3)              (4)                     (5)
Variable    Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob>|T|
INTERCEPT   40.7676              22.1387          1.841                   0.0734
X           0.1283               0.0305           4.201                   0.0002

6.17
Regression Computer Output
b1 = 40.7676     b2 = 0.1283
se(b1) = √var̂(b1) = √490.12 = 22.1387
se(b2) = √var̂(b2) = √0.0009326 = 0.0305
t = b1 / se(b1) = 40.7676 / 22.1387 = 1.84
t = b2 / se(b2) = 0.1283 / 0.0305 = 4.20

6.18
Regression Computer Output
Sources of variation in the dependent variable:

Table 6.3 Analysis of Variance Table
Source        DF    Sum of Squares   Mean Square
Explained     1     25221.2229       25221.2229
Unexplained   38    54311.3314       1429.2455
Total         39    79532.5544
R-square: 0.3171

6.19
Regression Computer Output
SST = Σ(yt − ȳ)² = 79532
SSR = Σ(ŷt − ȳ)² = 25221
SSE = Σ êt² = 54311
SSE/(T−2) = σ̂² = 1429.2455
R² = SSR/SST = 1 − SSE/SST = 0.317
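The goodness-of-fit numbers on slides 6.18-6.19 follow from the ANOVA sums of squares alone, as this small check (ours) shows:

```python
# Checking the goodness-of-fit numbers on slides 6.18-6.19 from the ANOVA table:
#   sigma2_hat = SSE/(T-2)  and  R^2 = SSR/SST = 1 - SSE/SST.
SSR, SSE, SST = 25221.2229, 54311.3314, 79532.5544
T = 40                                # total df of 39 = T - 1

sigma2_hat = SSE / (T - 2)            # about 1429.25
r_squared  = SSR / SST                # about 0.3171
r_squared_alt = 1 - SSE / SST         # same number, computed the other way

print(sigma2_hat, r_squared, r_squared_alt)
```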

6.20
Reporting Regression Results
ŷt = 40.7676 + 0.1283 xt        R² = 0.317
(s.e.)  (22.1387)   (0.0305)

ŷt = 40.7676 + 0.1283 xt
(t)     (1.84)      (4.20)

6.21
Reporting Regression Results
This R² value may seem low but it is typical in studies involving cross-sectional data
analyzed at the individual or micro level.
A considerably higher R² value would be expected in studies involving time-series data
analyzed at an aggregate or macro level.

6.22
Effects of Scaling the Data
Changing the scale of x:
yt = β1 + β2 xt + et
yt = β1 + (cβ2)(xt /c) + et
yt = β1 + β2* xt* + et
where β2* = cβ2 and xt* = xt /c
The estimated coefficient and standard error change but the other statistics are unchanged.

6.23
Effects of Scaling the Data
Changing the scale of y:
yt = β1 + β2 xt + et
yt /c = (β1/c) + (β2/c) xt + et /c
yt* = β1* + β2* xt + et*
where yt* = yt /c, et* = et /c, β1* = β1/c and β2* = β2/c
All statistics are changed except for the t-statistics and R² value.

6.24
Effects of Scaling the Data
Changing the scale of x and y:
yt = β1 + β2 xt + et
yt /c = (β1/c) + (cβ2/c)(xt /c) + et /c
yt* = β1* + β2 xt* + et*
where yt* = yt /c, et* = et /c, β1* = β1/c and xt* = xt /c
No change in the R², the t-statistics or in regression results for β2, but all other stats change.

6.25
Functional Forms
The term linear in a simple regression model does not mean a linear relationship between
variables, but a model in which the parameters enter the model in a linear way.

6.26
Linear vs. Nonlinear
Linear Statistical Models:
yt = β1 + β2 xt + et            yt = β1 + β2 ln(xt) + et
ln(yt) = β1 + β2 xt + et        yt = β1 + β2 xt² + et
Nonlinear Statistical Models:
yt = β1 + β2 xt^β3 + et         yt^β3 = β1 + β2 xt + et
yt = β1 + β2 xt + exp(β3 xt) + et

6.27
Linear vs. Nonlinear
[Figure: a nonlinear relationship between food expenditure (vertical axis) and income
(horizontal axis).]

6.28
Useful Functional Forms
Look at each form and its slope and elasticity:
1. Linear
2. Reciprocal
3. Log-Log
4. Log-Linear
5. Linear-Log
6. Log-Inverse

6.29
Useful Functional Forms
Linear:    yt = β1 + β2 xt + et
slope: β2          elasticity: β2 xt /yt

6.30
Useful Functional Forms
Reciprocal:    yt = β1 + β2 (1/xt) + et
slope: −β2 (1/xt²)          elasticity: −β2 1/(xt yt)

6.31
Useful Functional Forms
Log-Log:    ln(yt) = β1 + β2 ln(xt) + et
slope: β2 (yt /xt)          elasticity: β2

6.32
Useful Functional Forms
Log-Linear:    ln(yt) = β1 + β2 xt + et
slope: β2 yt          elasticity: β2 xt

6.33
Useful Functional Forms
Linear-Log:    yt = β1 + β2 ln(xt) + et
slope: β2 (1/xt)          elasticity: β2 (1/yt)

6.34
Useful Functional Forms
Log-Inverse:    ln(yt) = β1 − β2 (1/xt) + et
slope: β2 (yt /xt²)          elasticity: β2 (1/xt)

6.35
Error Term Properties
1. E(et) = 0
2. var(et) = σ²
3. cov(ei, ej) = 0
4. et ~ N(0, σ²)

6.36
Economic Models
1. Demand Models
2. Supply Models
3. Production Functions
4. Cost Functions
5. Phillips Curve

6.37
Economic Models
1. Demand Models
* quantity demanded (yd) and price (x)
* constant elasticity
ln(ytd) = β1 + β2 ln(xt) + et

6.38
Economic Models
2. Supply Models
* quantity supplied (ys) and price (x)
* constant elasticity
ln(yts) = β1 + β2 ln(xt) + et

6.39
Economic Models
3. Production Functions
* output (y) and input (x)
* constant elasticity
Cobb-Douglas Production Function:
ln(yt) = β1 + β2 ln(xt) + et

6.40
Economic Models
4a. Cost Functions
* total cost (y) and output (x)
yt = β1 + β2 xt² + et

6.41
Economic Models
4b. Cost Functions
* average cost (y/x) and output (x)
(yt /xt) = β1/xt + β2 xt + et /xt

6.42
Economic Models
5. Phillips Curve
nonlinear in both variables and parameters
* wage rate (wt) and time (t)
%Δwt = (wt − wt−1)/wt−1 = γα + γη (1/ut)
ut = unemployment rate

7.1
Chapter 7
The Multiple Regression Model

7.2
Two Explanatory Variables
yt = β1 + β2 xt2 + β3 xt3 + et
The xt's affect yt separately:
∂yt /∂xt2 = β2          ∂yt /∂xt3 = β3
But least squares estimation of β2 now depends upon both xt2 and xt3.

7.3
Correlated Variables
yt = β1 + β2 xt2 + β3 xt3 + et
yt = output     xt2 = capital     xt3 = labor
Always 5 workers per machine.
If the number of workers per machine is never varied, it becomes impossible to tell
if the machines or the workers are responsible for changes in output.

7.4
The General Model
yt = β1 + β2 xt2 + β3 xt3 + . . . + βK xtK + et
The parameter β1 is the intercept (constant) term.
The "variable" attached to β1 is xt1 = 1.
Usually, the number of explanatory variables is said to be K−1 (ignoring xt1 = 1),
while the number of parameters is K. (Namely: β1 . . . βK).

7.5
Statistical Properties of et
1. E(et) = 0
2. var(et) = σ²
3. cov(et, es) = 0 for t ≠ s
4. et ~ N(0, σ²)

7.6
Statistical Properties of yt
1. E(yt) = β1 + β2 xt2 + . . . + βK xtK
2. var(yt) = var(et) = σ²
3. cov(yt, ys) = cov(et, es) = 0  for t ≠ s
4. yt ~ N(β1 + β2 xt2 + . . . + βK xtK, σ²)

7.7
Assumptions
1. yt = β1 + β2 xt2 + . . . + βK xtK + et
2. E(yt) = β1 + β2 xt2 + . . . + βK xtK
3. var(yt) = var(et) = σ²
4. cov(yt, ys) = cov(et, es) = 0  for t ≠ s
5. The values of xtk are not random.
6. yt ~ N(β1 + β2 xt2 + . . . + βK xtK, σ²)

7.8
Least Squares Estimation
yt = β1 + β2 xt2 + β3 xt3 + et
S ≡ S(β1, β2, β3) = Σt=1..T (yt − β1 − β2 xt2 − β3 xt3)²
Define:  yt* = yt − ȳ     xt2* = xt2 − x̄2     xt3* = xt3 − x̄3

7.9
Least Squares Estimators
b1 = ȳ − b2 x̄2 − b3 x̄3
b2 = [ (Σ yt* xt2*)(Σ xt3*²) − (Σ yt* xt3*)(Σ xt2* xt3*) ] / [ (Σ xt2*²)(Σ xt3*²) − (Σ xt2* xt3*)² ]
b3 = [ (Σ yt* xt3*)(Σ xt2*²) − (Σ yt* xt2*)(Σ xt3* xt2*) ] / [ (Σ xt2*²)(Σ xt3*²) − (Σ xt2* xt3*)² ]
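The two-regressor formulas on slide 7.9 can be evaluated directly. The sketch below is ours and the y, x2, x3 data are made-up illustration values; it first puts the data in deviation-from-mean form and then applies the b2, b3 and b1 expressions exactly as written.

```python
# Two-regressor least squares estimates from the deviation-form formulas on slide 7.9.
# The y, x2, x3 data below are made-up illustration values, not from the text.
y  = [10.0, 12.0, 15.0, 14.0, 18.0, 21.0]
x2 = [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
x3 = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0]

T = len(y)
ybar, x2bar, x3bar = sum(y)/T, sum(x2)/T, sum(x3)/T
ys  = [v - ybar  for v in y]          # y*  = y  - ybar
x2s = [v - x2bar for v in x2]         # x2* = x2 - x2bar
x3s = [v - x3bar for v in x3]         # x3* = x3 - x3bar

def cross(a, b):
    """Sum of a_t * b_t over the sample."""
    return sum(ai * bi for ai, bi in zip(a, b))

den = cross(x2s, x2s) * cross(x3s, x3s) - cross(x2s, x3s) ** 2
b2 = (cross(ys, x2s) * cross(x3s, x3s) - cross(ys, x3s) * cross(x2s, x3s)) / den
b3 = (cross(ys, x3s) * cross(x2s, x2s) - cross(ys, x2s) * cross(x3s, x2s)) / den
b1 = ybar - b2 * x2bar - b3 * x3bar

print(b1, b2, b3)
```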

7.10
Dangers of Extrapolation
Statistical models generally are good only "within the relevant range". This means that
extending them to extreme data values outside the range of the original data often leads
to poor and sometimes ridiculous results.
If height is normally distributed and the normal ranges from minus infinity to plus infinity,
pity the man minus three feet tall.

7.11
Error Variance Estimation
Unbiased estimator of the error variance:
σ̂² = Σ êt² / (T − K)
Transform to a chi-square distribution:
(T − K) σ̂² / σ² ~ χ²(T−K)

7.12
Gauss-Markov Theorem
Under the assumptions of the multiple regression model, the ordinary least squares estimators
have the smallest variance of all linear and unbiased estimators.
This means that the least squares estimators are the Best Linear Unbiased Estimators (BLUE).

7.13
Variances
yt = β1 + β2 xt2 + β3 xt3 + et
var(b2) = σ² / [ (1 − r23²) Σ(xt2 − x̄2)² ]
var(b3) = σ² / [ (1 − r23²) Σ(xt3 − x̄3)² ]
where r23 = Σ(xt2 − x̄2)(xt3 − x̄3) / √[ Σ(xt2 − x̄2)² Σ(xt3 − x̄3)² ]
When r23 = 0 these reduce to the simple regression formulas.

7.14
Variance Decomposition
The variance of an estimator is smaller when:
1. The error variance, σ², is smaller: σ² → 0.
2. The sample size, T, is larger: Σt=1..T (xt2 − x̄2)² grows with T.
3. The variable's values are more spread out: (xt2 − x̄2)² is larger.
4. The correlation is close to zero: r23² → 0.

7.15
Covariances
yt = β1 + β2 xt2 + β3 xt3 + et
cov(b2,b3) = −r23 σ² / [ (1 − r23²) √( Σ(xt2 − x̄2)² Σ(xt3 − x̄3)² ) ]
where r23 = Σ(xt2 − x̄2)(xt3 − x̄3) / √[ Σ(xt2 − x̄2)² Σ(xt3 − x̄3)² ]

7.16
Covariance Decomposition
The covariance between any two estimators is larger in absolute value when:
1. The error variance, σ², is larger.
2. The sample size, T, is smaller.
3. The values of the variables are less spread out.
4. The correlation, r23, is high.

7.17
Var-Cov Matrix
yt = β1 + β2 xt2 + β3 xt3 + et
The least squares estimators b1, b2, and b3 have covariance matrix:

                   var(b1)      cov(b1,b2)   cov(b1,b3)
cov(b1,b2,b3) =    cov(b1,b2)   var(b2)      cov(b2,b3)
                   cov(b1,b3)   cov(b2,b3)   var(b3)

7.18
Normal
yt = β1 + β2 x2t + β3 x3t + . . . + βK xKt + et
yt ~ N( (β1 + β2 x2t + β3 x3t + . . . + βK xKt), σ² )
This implies and is implied by:  et ~ N(0, σ²)
Since bk is a linear function of the yt's:
bk ~ N( βk, var(bk) )
z = (bk − βk) / √var(bk) ~ N(0,1)     for k = 1,2,...,K

7.19
Student-t
Since generally the population variance of bk, var(bk), is unknown, we estimate it with
var̂(bk), which uses σ̂² instead of σ².
t = (bk − βk) / √var̂(bk) = (bk − βk) / se(bk)
t has a Student-t distribution with df = (T − K).

7.20
Interval Estimation
P( −tc ≤ (bk − βk)/se(bk) ≤ tc ) = 1 − α
tc is the critical value for (T−K) degrees of freedom such that P(t ≥ tc) = α/2.
P( bk − tc se(bk) ≤ βk ≤ bk + tc se(bk) ) = 1 − α
Interval endpoints:  [ bk − tc se(bk) , bk + tc se(bk) ]

8.1
Chapter 8
Hypothesis Testing and Nonsample Information

8.2  Chapter 8: Overview

1. Student-t Tests
2. Goodness-of-Fit
3. F-Tests
4. ANOVA Table
5. Nonsample Information
6. Collinearity
7. Prediction

8.3  Student-t Test

    yt = β1 + β2 Xt2 + β3 Xt3 + β4 Xt4 + et

Student-t tests can be used to test any linear combination of the regression
coefficients:

    H0: β1 = 0                 H0: β2 + β3 + β4 = 1
    H0: 3β2 − 7β3 = 21         H0: β2 − β3 ≤ 5

Every such t-test has exactly T−K degrees of freedom, where K = the number of
coefficients estimated (including the intercept).

8.4  One Tail Test

    yt = β1 + β2 Xt2 + β3 Xt3 + β4 Xt4 + et

    H0: β3 ≤ 0        t = b3 / se(b3)  ~  t(T−K)
    H1: β3 > 0        df = T − K = T − 4

Reject H0 when t exceeds the critical value tc that leaves probability α in
the right tail.

8.5  Two Tail Test

    yt = β1 + β2 Xt2 + β3 Xt3 + β4 Xt4 + et

    H0: β2 = 0        t = b2 / se(b2)  ~  t(T−K)
    H1: β2 ≠ 0        df = T − K = T − 4

Reject H0 when t falls outside ±tc, with probability α/2 in each tail.

8.6  Goodness-of-Fit

Coefficient of Determination:

    R² = SSR/SST = 1 − SSE/SST = Σ(ŷt − ȳ)² / Σ(yt − ȳ)²        0 ≤ R² ≤ 1

8.7  Adjusted R-Squared

Adjusted Coefficient of Determination:

    Original:    R² = SSR/SST = 1 − SSE/SST

    Adjusted:    R̄² = 1 − [ SSE/(T−K) ] / [ SST/(T−1) ]
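A minimal Python sketch of both measures (illustrative only; the names are not from the text):

    import numpy as np

    def r_squared(y, y_hat, K):
        """R2 = 1 - SSE/SST and adjusted R2 = 1 - [SSE/(T-K)]/[SST/(T-1)]."""
        T = len(y)
        sse = ((y - y_hat)**2).sum()
        sst = ((y - y.mean())**2).sum()
        r2 = 1.0 - sse/sst
        r2_adj = 1.0 - (sse/(T - K)) / (sst/(T - 1))
        return r2, r2_adj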

8.8  Computer Output

Table 8.2  Summary of Least Squares Results

    Variable      Coefficient   Std Error   t-value   p-value
    constant        104.79        6.48       16.17     0.000
    price            −6.642       3.191      −2.081    0.042
    advertising       2.984       0.167      17.868    0.000

    t = b2 / se(b2) = −6.642 / 3.191 = −2.081

8.9  Reporting Your Results

Reporting standard errors:

    ŷt = 104.79 − 6.642 Xt2 + 2.984 Xt3
         (6.48)   (3.191)     (0.167)      (s.e.)

Reporting t-statistics:

    ŷt = 104.79 − 6.642 Xt2 + 2.984 Xt3
         (16.17)  (−2.081)    (17.868)     (t)

8.10  Single Restriction F-Test

    yt = β1 + β2 Xt2 + β3 Xt3 + β4 Xt4 + et

    H0: β2 = 0        F = [ (SSER − SSEU)/J ] / [ SSEU/(T−K) ]
    H1: β2 ≠ 0          = [ (1964.758 − 1805.168)/1 ] / [ 1805.168/(52 − 3) ]
                        = 4.33
    dfn = J = 1        dfd = T − K = 49

By definition this is the t-statistic squared:  t = −2.081,  F = t² = 4.33.

8.11  Multiple Restriction F-Test

    yt = β1 + β2 Xt2 + β3 Xt3 + β4 Xt4 + et

    H0: β2 = 0, β4 = 0        F = [ (SSER − SSEU)/J ] / [ SSEU/(T−K) ]
    H1: H0 not true           dfn = J = 2        dfd = T − K = 49

First run the restricted regression by dropping Xt2 and Xt4 to get SSER.
Next run the unrestricted regression to get SSEU.
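A tiny sketch of this calculation (not from the slides; the helper name is made up), reproducing the single-restriction example above:

    def f_statistic(sse_r, sse_u, J, T, K):
        """F = [(SSER - SSEU)/J] / [SSEU/(T-K)], with dfn = J and dfd = T-K."""
        return ((sse_r - sse_u)/J) / (sse_u/(T - K))

    # Numbers from the single-restriction example:
    # f_statistic(1964.758, 1805.168, J=1, T=52, K=3)  ->  about 4.33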

8.12  F-Tests

F-tests of this type are always right-tailed, even for left-sided or
two-sided hypotheses, because any deviation from the null will make the F
value bigger (move rightward).

    F = [ (SSER − SSEU)/J ] / [ SSEU/(T−K) ]

Reject H0 when F exceeds the critical value Fc that leaves probability α in
the right tail.

8.13  F-Test of Entire Equation

    yt = β1 + β2 Xt2 + β3 Xt3 + et

    H0: β2 = β3 = 0    (We ignore β1.  Why?)
    H1: H0 not true

    F = [ (SSER − SSEU)/J ] / [ SSEU/(T−K) ]
      = [ (13581.35 − 1805.168)/2 ] / [ 1805.168/(52 − 3) ]
      = 159.828

    dfn = J = 2,   dfd = T − K = 49,   α = 0.05,   Fc = 3.187        Reject H0!

8.14  ANOVA Table

Table 8.3  Analysis of Variance Table

    Source         DF    Sum of Squares    Mean Square    F-Value
    Explained       2       11776.18         5888.09      158.828
    Unexplained    49        1805.168          36.84
    Total          51       13581.35                      p-value: 0.0001

    R² = SSR/SST = 11776.18 / 13581.35 = 0.867

8.15  Nonsample Information

A certain production process is known to be Cobb-Douglas with constant
returns to scale:

    ln(yt) = β1 + β2 ln(Xt2) + β3 ln(Xt3) + β4 ln(Xt4) + et

    where β2 + β3 + β4 = 1,  so  β4 = (1 − β2 − β3).

Imposing the restriction gives:

    ln(yt/Xt4) = β1 + β2 ln(Xt2/Xt4) + β3 ln(Xt3/Xt4) + et

    y*t = β1 + β2 X*t2 + β3 X*t3 + et

Run least squares on the transformed model.  Interpret the coefficients the
same as in the original model.

8.16  Collinear Variables

The term "independent variable" means an explanatory variable is independent
of the error term, but not necessarily independent of other explanatory
variables.

Since economists typically have no control over the implicit "experimental
design," explanatory variables tend to move together, which often makes
sorting out their separate influences rather problematic.

8.17  Effects of Collinearity

A high degree of collinearity will produce:

1. no least squares output when collinearity is exact.
2. large standard errors and wide confidence intervals.
3. insignificant t-values even with high R² and a significant F-value.
4. estimates sensitive to deletion or addition of a few observations or
   "insignificant" variables.
5. good "within-sample" (same proportions) but poor "out-of-sample"
   (different proportions) prediction.

8.18  Identifying Collinearity

Evidence of high collinearity includes:

1. a high pairwise correlation between two explanatory variables.
2. a high R-squared when regressing one explanatory variable at a time on
   each of the remaining explanatory variables.
3. a statistically significant F-value when the t-values are statistically
   insignificant.
4. an R-squared that doesn't fall by much when dropping any of the
   explanatory variables.

8.19  Mitigating Collinearity

Since high collinearity is not a violation of any least squares assumption,
but rather a lack of adequate information in the sample:

1. collect more data with better information.
2. impose economic restrictions as appropriate.
3. impose statistical restrictions when justified.
4. if all else fails, at least point out that the poor model performance
   might be due to the collinearity problem (or it might not).

8.20  Prediction

    yt = β1 + β2 Xt2 + β3 Xt3 + et

Given a set of values for the explanatory variables, (1  X02  X03), the best
linear unbiased predictor of y is given by:

    ŷ0 = b1 + b2 X02 + b3 X03

This predictor is unbiased in the sense that the average value of the
forecast error is zero.

9.1  Chapter 9:  Extensions of the Multiple Regression Model
9.2  Topics for This Chapter

1. Intercept Dummy Variables
2. Slope Dummy Variables
3. Different Intercepts & Slopes
4. Testing Qualitative Effects
5. Are Two Regressions Equal?
6. Interaction Effects
7. Dummy Dependent Variables

9.3  Intercept Dummy Variables

Dummy variables are binary (0,1):

    yt = β1 + β2 Xt + β3 Dt + et

    yt = speed of car in miles per hour
    Xt = age of car in years
    Dt = 1 if red car, Dt = 0 otherwise

Police: red cars travel faster.        H0: β3 = 0        H1: β3 > 0

9.4

    yt = β1 + β2 Xt + β3 Dt + et

    red cars:      yt = (β1 + β3) + β2 Xt + et
    other cars:    yt = β1 + β2 Xt + et

[Figure: miles per hour plotted against age in years; the red-car line starts
at the higher intercept β1 + β3, the other-car line at β1, both with slope β2.]

9.5  Slope Dummy Variables

    yt = β1 + β2 Xt + β3 Dt Xt + et

    Stock portfolio: Dt = 1        Bond portfolio: Dt = 0
    yt = value of portfolio        Xt = years        β1 = initial investment

    stocks:    yt = β1 + (β2 + β3) Xt + et
    bonds:     yt = β1 + β2 Xt + et

[Figure: both lines start at β1; the stock line has the steeper slope β2 + β3,
the bond line has slope β2.]
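A minimal numpy sketch of how such dummy-variable columns might be built for least squares (an illustration, not part of the slides; the function name is made up):

    import numpy as np

    def dummy_design(x, d):
        """Columns for yt = b1 + b2*Xt + b3*Dt + b4*Dt*Xt + et (intercept and slope dummies)."""
        return np.column_stack([np.ones_like(x), x, d, d*x])

    # b, *_ = np.linalg.lstsq(dummy_design(x, d), y, rcond=None)
    # b[2] shifts the intercept for the D=1 group; b[3] shifts its slope.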

9.6  Different Intercepts & Slopes

    yt = β1 + β2 Xt + β3 Dt + β4 Dt Xt + et

    "miracle" seed: Dt = 1        regular seed: Dt = 0
    yt = harvest weight of corn   Xt = rainfall

    "miracle":    yt = (β1 + β3) + (β2 + β4) Xt + et
    regular:      yt = β1 + β2 Xt + et

[Figure: the "miracle" line has intercept β1 + β3 and slope β2 + β4; the
regular line has intercept β1 and slope β2.]

9.7

    yt = β1 + β2 Xt + β3 Dt + et

    For men: Dt = 1.    For women: Dt = 0.
    yt = wage rate      Xt = years of experience

    men:      yt = (β1 + β3) + β2 Xt + et
    women:    yt = β1 + β2 Xt + et

Testing for discrimination in the starting wage:  H0: β3 = 0,  H1: β3 > 0.

[Figure: parallel wage lines with slope β2; the men's line starts at β1 + β3,
the women's line at β1.]

9.8

    yt = β1 + β5 Xt + β6 Dt Xt + et

    For men: Dt = 1.    For women: Dt = 0.

    men:      yt = β1 + (β5 + β6) Xt + et
    women:    yt = β1 + β5 Xt + et

Men and women have the same starting wage, β1, but their wage rates increase
at different rates (difference = β6).  β6 > 0 means that men's wage rates are
increasing faster than women's wage rates.

9.9  An Ineffective Affirmative Action Plan

    yt = β1 + β2 Xt + β3 Dt + β4 Dt Xt + et

    For men: Dt = 1.    For women: Dt = 0.

    men:      yt = (β1 + β3) + (β2 + β4) Xt + et
    women:    yt = β1 + β2 Xt + et

Women are started at a higher wage: women get the starting wage β1, while men
get the lower starting wage β1 + β3 (β3 < 0).  But men get a faster rate of
increase in their wages, β2 + β4, which is higher than the rate of increase
for women, β2 (since β4 > 0).

9.10  Testing Qualitative Effects

1. Test for differences in intercept.
2. Test for differences in slope.
3. Test for differences in both intercept and slope.

9.11

    men: Dt = 1 ;  women: Dt = 0

    Yt = β1 + β2 Xt + β3 Dt + β4 Dt Xt + et

Intercept (testing for discrimination in the starting wage):

    H0: β3 ≤ 0  vs.  H1: β3 > 0        (b3 − 0) / √Est.Var(b3)  ~  t(n−4)

Slope (testing for discrimination in wage increases):

    H0: β4 ≤ 0  vs.  H1: β4 > 0        (b4 − 0) / √Est.Var(b4)  ~  t(n−4)

9.12  Testing H0: β3 = β4 = 0

    H0: β3 = β4 = 0        H1: otherwise

    [ (SSER − SSEU)/2 ] / [ SSEU/(T − 4) ]   ~   F(2, T−4)

    SSEU = Σ_{t=1}^{T} (yt − b1 − b2 Xt − b3 Dt − b4 Dt Xt)²
    SSER = Σ_{t=1}^{T} (yt − b1 − b2 Xt)²

9.13  Are Two Regressions Equal?   (variations of "The Chow Test")

I.  Assuming equal variances (pooling):

    men: Dt = 1 ;  women: Dt = 0

    yt = β1 + β2 Xt + β3 Dt + β4 Dt Xt + et

    H0: β3 = β4 = 0  vs.  H1: otherwise     (intercept and slope)

    yt = wage rate        Xt = years of experience

This model assumes equal wage rate variance for the two groups.
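A short numpy sketch of this pooled (equal-variance) version of the test, with made-up function names and under the assumption that y, x and d are numpy arrays:

    import numpy as np

    def chow_f(y, x, d):
        """F test of H0: b3 = b4 = 0 in yt = b1 + b2*Xt + b3*Dt + b4*Dt*Xt + et."""
        T = len(y)
        Xu = np.column_stack([np.ones(T), x, d, d*x])    # unrestricted model
        Xr = np.column_stack([np.ones(T), x])            # restricted model
        sse = lambda X: ((y - X @ np.linalg.lstsq(X, y, rcond=None)[0])**2).sum()
        sse_u, sse_r = sse(Xu), sse(Xr)
        return ((sse_r - sse_u)/2) / (sse_u/(T - 4))     # ~ F(2, T-4) under H0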

9.14  Are Two Regressions Equal?  (continued)

II.  Allowing for unequal variances (running three regressions):

    Forcing men and women to have the same β1, β2:
        Everyone:      yt  = β1 + β2 Xt  + et        gives SSER

    Allowing men and women to be different:
        Men only:      ytm = δ1 + δ2 Xtm + etm       gives SSEm
        Women only:    ytw = γ1 + γ2 Xtw + etw       gives SSEw

    F = [ (SSER − SSEU)/J ] / [ SSEU/(T−K) ]

    J = # restrictions = 2        K = # unrestricted coefficients = 4
    where SSEU = SSEm + SSEw

9.15  Interaction Variables

1. Interaction Dummies
2. Polynomial Terms (special case of continuous interaction)
3. Interaction Among Continuous Variables

9.16  Interaction Dummies

Wage Gap between Men and Women

    yt = wage rate;  Xt = experience
    For men: Mt = 1.   For women: Mt = 0.
    For black: Bt = 1.   For nonblack: Bt = 0.

No interaction, so the wage gap is assumed the same:

    yt = β1 + β2 Xt + β3 Mt + β4 Bt + et

Interaction, so the wage gap depends on race:

    yt = β1 + β2 Xt + β3 Mt + β4 Bt + β5 Mt Bt + et

9.17  Polynomial Terms

Polynomial Regression:   yt = income;  Xt = age

Linear in parameters but nonlinear in variables:

    yt = β1 + β2 Xt + β3 Xt² + β4 Xt³ + et

[Figure: income rises and then falls over the age range 20 to 90; people
retire at different ages or not at all.]

9.18  Polynomial Regression

    yt = income;  Xt = age

    yt = β1 + β2 Xt + β3 Xt² + β4 Xt³ + et

Rate at which income is changing as we age:

    ∂yt/∂Xt = β2 + 2 β3 Xt + 3 β4 Xt²

The slope changes as Xt changes.

9.19  Continuous Interaction

    Exam grade = f(sleep: Zt , study time: Bt)

    yt = β1 + β2 Zt + β3 Bt + β4 Zt Bt + et

Sleep and study time do not act independently.  More study time will be more
effective when combined with more sleep, and less effective when combined
with less sleep.

9.20  Continuous Interaction (continued)

    Exam grade = f(sleep: Zt , study time: Bt)

    yt = β1 + β2 Zt + β3 Bt + β4 Zt Bt + et

    ∂yt/∂Bt = β3 + β4 Zt        Your studying is more effective with more sleep.

    ∂yt/∂Zt = β2 + β4 Bt        Your mind sorts things out while you sleep
                                (when you have things to sort out).

9.21  Continuous Interaction (continued)

If Zt + Bt = 24 hours, then Bt = (24 − Zt):

    yt = β1 + β2 Zt + β3 (24 − Zt) + β4 Zt (24 − Zt) + et

    yt = (β1 + 24β3) + (β2 − β3 + 24β4) Zt − β4 Zt² + et

    yt = δ1 + δ2 Zt + δ3 Zt² + et

Sleep needed to maximize your exam grade (where δ2 > 0 and δ3 < 0):

    ∂yt/∂Zt = δ2 + 2 δ3 Zt = 0        Zt = −δ2 / (2 δ3)

9.22  Dummy Dependent Variables

1. Linear Probability Model
2. Probit Model
3. Logit Model

9.23  Linear Probability Model

    yi = 1 if quits job;   yi = 0 if does not quit

    yi = β1 + β2 Xi2 + β3 Xi3 + β4 Xi4 + ei

    Xi2 = total hours of work each week
    Xi3 = weekly paycheck
    Xi4 = hourly pay (Xi3 divided by Xi2)

9.24  Linear Probability Model (continued)

Read predicted values of yi off the regression line:

    ŷi = b1 + b2 Xi2 + b3 Xi3 + b4 Xi4

[Figure: observed values lie at yi = 0 and yi = 1; the fitted line plots ŷi
against total hours of work each week, Xi2.]

9.25  Linear Probability Model (continued)

Problems with the Linear Probability Model:

1. Probability estimates are sometimes less than zero or greater than one.
2. Heteroskedasticity is present in that the model generates a nonconstant
   error variance.

9.26  Probit Model

    latent variable zi :   zi = β1 + β2 Xi2 + . . .

Normal probability density function:

    f(zi) = (1/√(2π)) e^(−0.5 zi²)

Normal cumulative probability function:

    F(zi) = P[ Z ≤ zi ] = ∫_{−∞}^{zi} (1/√(2π)) e^(−0.5 u²) du

9.27  Probit Model (continued)

Since zi = β1 + β2 Xi2 + . . . , we can substitute in to get:

    pi = P[ Z ≤ β1 + β2 Xi2 ] = F(β1 + β2 Xi2)

[Figure: the probit probability curve is an S-shape between yi = 0 and
yi = 1, plotted against total hours of work each week, Xi2.]

9.28  Logit Model

pi is the probability of quitting the job.

    pi = 1 / [ 1 + e^(−(β1 + β2 Xi2 + . . .)) ]

    For β2 > 0, pi will approach 1 as Xi2 → +∞
    For β2 > 0, pi will approach 0 as Xi2 → −∞

9.29  Logit Model (continued)

[Figure: the logit probability curve is also an S-shape between yi = 0 and
yi = 1, plotted against total hours of work each week, Xi2.]
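To make the link between the index zi = β1 + β2 Xi2 + ... and the probability pi concrete, here is a minimal Python sketch of the two cumulative functions (not part of the original slides):

    import math

    def probit_prob(z):
        """Probit: p = F(z), the standard normal CDF, written with the error function."""
        return 0.5*(1.0 + math.erf(z/math.sqrt(2.0)))

    def logit_prob(z):
        """Logit: p = 1/(1 + exp(-z))."""
        return 1.0/(1.0 + math.exp(-z))

    # z = b1 + b2*Xi2 + ... ; unlike the linear probability model, both
    # functions keep the predicted probability between 0 and 1.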

9.30  Maximum Likelihood

Maximum likelihood estimation (MLE) is used to estimate Probit and Logit
functions.  The small-sample properties of MLE are not known, but in large
samples MLE is normally distributed, consistent, and asymptotically
efficient.

10.1  Chapter 10:  Heteroskedasticity

10.2  The Nature of Heteroskedasticity

Heteroskedasticity is a systematic pattern in the errors where the variances
of the errors are not constant.

Ordinary least squares assumes that all observations are equally reliable.
For efficiency (accurate estimation/prediction), reweight observations to
ensure equal error variance.

10.3  Regression Model

    yt = β1 + β2 xt + et

    zero mean:              E(et) = 0
    homoskedasticity:       var(et) = σ²
    nonautocorrelation:     cov(et, es) = 0    t ≠ s

    heteroskedasticity:     var(et) = σt²

10.4  Homoskedastic Pattern of Errors

[Figure: consumption plotted against income; the scatter around the fitted
consumption line has the same spread at every income level.]

10.5  The Homoskedastic Case

[Figure: the conditional density f(yt) has the same variance at each income
level x1, x2, x3, x4 along the consumption function.]

10.6  Heteroskedastic Pattern of Errors

[Figure: consumption plotted against income; the scatter around the fitted
consumption line fans out as income rises.]

10.7  The Heteroskedastic Case

[Figure: the conditional density f(yt) spreads out as income increases; rich
people show much more variation in consumption than poor people.]

10.8  Properties of Least Squares

1. Least squares is still linear and unbiased.
2. Least squares is not efficient.
3. The usual formulas give incorrect standard errors for least squares.
4. Confidence intervals and hypothesis tests based on the usual standard
   errors are wrong.

10.9

    yt = β1 + β2 xt + et        heteroskedasticity:  var(et) = σt²

    incorrect formula for the least squares variance:
        var(b2) = σ² / Σ(xt − x̄)²

    correct formula for the least squares variance:
        var(b2) = Σ σt² (xt − x̄)² / [ Σ(xt − x̄)² ]²

10.10  Hal White's Standard Errors

White's estimator of the least squares variance:

    est.var(b2) = Σ êt² (xt − x̄)² / [ Σ(xt − x̄)² ]²

In large samples, White's standard error (the square root of the estimated
variance) is a correct / accurate / consistent measure.  (A small sketch of
this estimator appears after the next list.)

10.11  Two Types of Heteroskedasticity

1. Proportional Heteroskedasticity (a continuous function of xt, for example).
2. Partitioned Heteroskedasticity (discrete categories/groups).
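A numpy sketch of White's variance estimator for the simple regression above (illustrative only; names are made up):

    import numpy as np

    def white_var_b2(y, x):
        """White's estimator of var(b2) for yt = b1 + b2*xt + et."""
        X = np.column_stack([np.ones_like(x), x])
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        e_hat = y - X @ b
        xd = x - x.mean()
        return (e_hat**2 * xd**2).sum() / (xd**2).sum()**2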

10.12  Proportional Heteroskedasticity

    yt = β1 + β2 xt + et

    E(et) = 0        var(et) = σt²  where  σt² = σ² xt        cov(et, es) = 0   t ≠ s

The variance is assumed to be proportional to the value of xt.

10.13  Standard Deviation Proportional to xt

    yt = β1 + β2 xt + et

    variance:              var(et) = σt² = σ² xt
    standard deviation:    σt = σ √xt

To correct for heteroskedasticity, divide the model by √xt :

    yt/√xt = β1 (1/√xt) + β2 (xt/√xt) + et/√xt
10.14

    yt/√xt = β1 (1/√xt) + β2 (xt/√xt) + et/√xt

    y*t = β1 x*t1 + β2 x*t2 + e*t

    var(e*t) = var(et/√xt) = (1/xt) var(et) = (1/xt) σ² xt = σ²

et is heteroskedastic, but e*t is homoskedastic.

10.15  Generalized Least Squares

These steps describe weighted least squares (see the sketch after this list):

1. Decide which variable is proportional to the heteroskedasticity (xt in the
   previous example).
2. Divide all terms in the original model by the square root of that variable
   (divide by √xt).
3. Run least squares on the transformed model, which has new y*t, x*t1 and
   x*t2 variables but no intercept.
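A minimal numpy sketch of those three steps, assuming y and x are numpy arrays (illustrative names only):

    import numpy as np

    def gls_proportional(y, x):
        """Weighted LS for var(et) = sigma^2 * xt: divide every term by sqrt(xt)."""
        w = np.sqrt(x)
        y_star = y / w
        X_star = np.column_stack([1.0/w, x/w])   # transformed intercept and slope columns
        b1, b2 = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
        return b1, b2                            # estimates of beta1 and beta2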

10.16  Partitioned Heteroskedasticity

    yt = β1 + β2 xt + et        t = 1, ..., 100

    yt = bushels per acre of corn
    xt = gallons of water per acre (rain or other)

    error variance of "field" corn:    var(et) = σ1²    t = 1, ..., 80
    error variance of "sweet" corn:    var(et) = σ2²    t = 81, ..., 100

10.17  Reweighting Each Group's Observations

    "field" corn:   yt = β1 + β2 xt + et        var(et) = σ1²
        yt/σ1 = β1 (1/σ1) + β2 (xt/σ1) + et/σ1        t = 1, ..., 80

    "sweet" corn:   yt = β1 + β2 xt + et        var(et) = σ2²
        yt/σ2 = β1 (1/σ2) + β2 (xt/σ2) + et/σ2        t = 81, ..., 100

10.18  Apply Generalized Least Squares

Run least squares separately on the data for each group.

    σ̂1² provides the estimator of σ1² using the 80 observations on
    "field" corn.

    σ̂2² provides the estimator of σ2² using the 20 observations on
    "sweet" corn.

10.19  Detecting Heteroskedasticity

Determine the existence and nature of heteroskedasticity:

1. Residual Plots provide information on the exact nature of
   heteroskedasticity (partitioned or proportional) to aid in correcting
   for it.
2. The Goldfeld-Quandt Test checks for the presence of heteroskedasticity.

10.20  Residual Plots

Plot the residuals against one variable at a time, after sorting the data by
that variable, to try to find a heteroskedastic pattern in the data.

[Figure: residuals êt plotted against xt, fanning out around zero as xt
increases.]

10.21  Goldfeld-Quandt Test

The Goldfeld-Quandt test can be used to detect heteroskedasticity in either
the proportional case or for comparing two groups in the discrete case.

For proportional heteroskedasticity, it is first necessary to determine which
variable, such as xt, is proportional to the error variance.  Then sort the
data from the largest to smallest values of that variable.

10.22  Goldfeld-Quandt Test (continued)

In the proportional case, drop the middle r observations where r ≈ T/6, then
run separate least squares regressions on the first T1 observations and the
last T2 observations.

    H0: σ1² = σ2²        H1: σ1² > σ2²

    Goldfeld-Quandt test statistic:    GQ = σ̂1² / σ̂2²  ~  F[T1−K1, T2−K2]

Use the F table.  Small values of GQ support H0, while large values
support H1.  (A sketch follows the next slide.)

10.23  More General Model

The structure of heteroskedasticity could be more complicated:

    σt² = σ² exp{ α1 zt1 + α2 zt2 }

zt1 and zt2 are any observable variables upon which we believe the variance
could depend.

Note: the function exp{.} ensures that σt² is positive.
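A rough numpy sketch of the proportional-case Goldfeld-Quandt statistic; splitting the two groups to equal size T1 after dropping the middle r observations is my simplifying assumption, not something the slides specify:

    import numpy as np

    def goldfeld_quandt(y, x, drop_frac=1/6):
        """GQ = sigma1_hat^2 / sigma2_hat^2 after sorting by x (largest values first)."""
        order = np.argsort(x)[::-1]
        y, x = y[order], x[order]
        T = len(y); r = int(round(T*drop_frac))
        T1 = (T - r)//2
        def s2(ys, xs):
            X = np.column_stack([np.ones_like(xs), xs])
            e = ys - X @ np.linalg.lstsq(X, ys, rcond=None)[0]
            return (e**2).sum()/(len(ys) - 2)
        return s2(y[:T1], x[:T1]) / s2(y[-T1:], x[-T1:])   # compare to F[T1-2, T2-2]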

10.24  More General Model (continued)

    σt² = σ² exp{ α1 zt1 + α2 zt2 }

    ln(σt²) = ln(σ²) + α1 zt1 + α2 zt2
    ln(σt²) = α0 + α1 zt1 + α2 zt2        where α0 = ln(σ²)

Using the least squares residuals êt, estimate:

    ln(êt²) = α0 + α1 zt1 + α2 zt2 + νt

Test H0: α1 = 0, α2 = 0 against H1: α1 ≠ 0 and/or α2 ≠ 0 with the usual F test.

11.1  Chapter 11:  Autocorrelation

11.2  The Nature of Autocorrelation

For efficiency (accurate estimation/prediction), all systematic information
needs to be incorporated into the regression model.

Autocorrelation is a systematic pattern in the errors that can be either
attracting (positive) or repelling (negative) autocorrelation.

11.3  Patterns of Autocorrelation

[Figure: three residual plots over time.  With positive autocorrelation the
residuals êt cross the zero line not often enough (attracting); with no
autocorrelation they cross the line randomly; with negative autocorrelation
they cross the line too often (repelling).]

11.4  Regression Model

    yt = β1 + β2 xt + et

    zero mean:              E(et) = 0
    homoskedasticity:       var(et) = σ²
    nonautocorrelation:     cov(et, es) = 0    t ≠ s

    autocorrelation:        cov(et, es) ≠ 0    t ≠ s

11.5  Order of Autocorrelation

    yt = β1 + β2 xt + et

    1st order:    et = ρ et−1 + νt
    2nd order:    et = ρ1 et−1 + ρ2 et−2 + νt
    3rd order:    et = ρ1 et−1 + ρ2 et−2 + ρ3 et−3 + νt

We will assume first-order autocorrelation:

    AR(1):    et = ρ et−1 + νt

11.6  First Order Autocorrelation

    yt = β1 + β2 xt + et

    et = ρ et−1 + νt        where −1 < ρ < 1

    E(νt) = 0        var(νt) = σν²        cov(νt, νs) = 0    t ≠ s

These assumptions about νt imply the following about et :

    E(et) = 0                              var(et) = σe² = σν² / (1 − ρ²)
    cov(et, et−k) = σe² ρ^k  for k > 0     corr(et, et−k) = ρ^k  for k > 0

11.7  Autocorrelation Creates Problems for Least Squares

1. The least squares estimator is still linear and unbiased, but it is not
   efficient.
2. The formulas normally used to compute the least squares standard errors
   are no longer correct, and confidence intervals and hypothesis tests using
   them will be wrong.

11.8  Generalized Least Squares

    AR(1):    et = ρ et−1 + νt

    yt = β1 + β2 xt + et

Substitute in for et :

    yt = β1 + β2 xt + ρ et−1 + νt

Now we need to get rid of et−1.

11.9

    yt = β1 + β2 xt + ρ et−1 + νt

    et = yt − β1 − β2 xt        lag the errors once:    et−1 = yt−1 − β1 − β2 xt−1

    yt = β1 + β2 xt + ρ(yt−1 − β1 − β2 xt−1) + νt

11.10

    yt = β1 + β2 xt + ρ(yt−1 − β1 − β2 xt−1) + νt
    yt = β1 + β2 xt + ρ yt−1 − ρ β1 − ρ β2 xt−1 + νt
    yt − ρ yt−1 = β1(1−ρ) + β2(xt − ρ xt−1) + νt

    y*t = β*1 + β2 x*t2 + νt

    where  y*t = yt − ρ yt−1 ,   x*t2 = (xt − ρ xt−1) ,   β*1 = β1(1−ρ)

11.11

Problems estimating this model with least squares:

1. One observation is used up in creating the transformed (lagged) variables,
   leaving only (T−1) observations for estimating the model.
2. The value of ρ is not known.  We must find some way to estimate it.

11.12  Recovering the 1st Observation

Dropping the 1st observation and applying least squares is not the best
linear unbiased estimation method.  Efficiency is lost because the variance
of the error associated with the 1st observation is not equal to that of the
other errors.

This is a special case of the heteroskedasticity problem, except that here
all errors are assumed to have equal variance except the 1st error.

11.13  Recovering the 1st Observation (continued)

The 1st observation should fit the original model as:

    y1 = β1 + β2 x1 + e1        with error variance  var(e1) = σe² = σν²/(1−ρ²)

We could include this as the 1st observation for our estimation procedure,
but we must first transform it so that it has the same error variance as the
other observations.  Note: the other observations all have error variance σν².

11.14

    y1 = β1 + β2 x1 + e1        var(e1) = σe² = σν²/(1−ρ²)

Given any constant c:  var(c e1) = c² var(e1).

If c = √(1−ρ²), then  var(√(1−ρ²) e1) = (1−ρ²) var(e1)
                                       = (1−ρ²) σe²
                                       = (1−ρ²) σν²/(1−ρ²)
                                       = σν²

The transformation ν1 = √(1−ρ²) e1 has variance σν².

11.15

Multiply through by √(1−ρ²) to get:

    √(1−ρ²) y1 = √(1−ρ²) β1 + √(1−ρ²) β2 x1 + √(1−ρ²) e1

The transformed error ν1 = √(1−ρ²) e1 has variance σν².  This transformed
first observation may now be added to the other (T−1) observations to obtain
the fully restored set of T observations.

11.16  Estimating the Unknown ρ Value

If we had values for the et's, we could estimate:

    et = ρ et−1 + νt

First, use least squares to estimate the model:

    yt = β1 + β2 xt + et

The residuals from this estimation are:

    êt = yt − b1 − b2 xt

11.17  Estimating the Unknown ρ Value (continued)

Next, estimate the following by least squares:

    êt = ρ êt−1 + ν̂t

The least squares solution is:

    ρ̂ = Σ_{t=2}^{T} êt êt−1  /  Σ_{t=2}^{T} êt−1²
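A short numpy sketch of ρ̂ and of the transformation from 11.10-11.15, under the assumption that e_hat, y and x are numpy arrays (illustrative names only):

    import numpy as np

    def estimate_rho(e_hat):
        """rho_hat = sum(e_t * e_{t-1}) / sum(e_{t-1}^2) from the LS residuals."""
        return (e_hat[1:]*e_hat[:-1]).sum() / (e_hat[:-1]**2).sum()

    def ar1_transform(y, x, rho):
        """y*_t = y_t - rho*y_{t-1}, x*_t = x_t - rho*x_{t-1}; 1st obs scaled by sqrt(1-rho^2)."""
        w = np.sqrt(1.0 - rho**2)
        y_star = np.r_[w*y[0], y[1:] - rho*y[:-1]]
        x_star = np.r_[w*x[0], x[1:] - rho*x[:-1]]
        const  = np.r_[w,      np.full(len(y)-1, 1.0 - rho)]   # transformed intercept column
        return const, x_star, y_star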

11.18  Durbin-Watson Test

    H0: ρ = 0    vs.    H1: ρ ≠ 0 ,  ρ > 0,  or  ρ < 0

The Durbin-Watson test statistic, d, is:

    d = Σ_{t=2}^{T} (êt − êt−1)²  /  Σ_{t=1}^{T} êt²

11.19  Testing for Autocorrelation

The test statistic, d, is approximately related to ρ̂ as:

    d ≈ 2(1 − ρ̂)

    When ρ̂ = 0, the Durbin-Watson statistic is d ≈ 2.
    When ρ̂ = 1, the Durbin-Watson statistic is d ≈ 0.

Tables of critical values for d are not always readily available, so it is
easier to use the p-value that most computer programs provide for d.
Reject H0 if the p-value < α, the significance level.
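A one-line numpy sketch of the statistic itself (not from the slides):

    import numpy as np

    def durbin_watson(e_hat):
        """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); roughly d = 2*(1 - rho_hat)."""
        return (np.diff(e_hat)**2).sum() / (e_hat**2).sum()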

11.20  Prediction with AR(1) Errors

When errors are autocorrelated, the previous period's error may help us
predict next period's error.  The best predictor, ŷT+1, for next period is:

    ŷT+1 = β̂1 + β̂2 xT+1 + ρ̂ ẽT

where β̂1 and β̂2 are generalized least squares estimates and ẽT is given by:

    ẽT = yT − β̂1 − β̂2 xT

11.21  Prediction with AR(1) Errors (continued)

For h periods ahead, the best predictor is:

    ŷT+h = β̂1 + β̂2 xT+h + ρ̂^h ẽT

Assuming |ρ̂| < 1, the influence of ρ̂^h ẽT diminishes the further we go into
the future (the larger h becomes).

12.1  Chapter 12:  Pooling Time-Series and Cross-Sectional Data

12.2  Pooling Time and Cross Sections

    yit = β1it + β2it x2it + β3it x3it + eit

for the ith firm in the tth time period.  If left unrestricted, this model
requires different equations for each firm in each time period.

12.3  Seemingly Unrelated Regressions

SUR models impose the restrictions:

    β1it = β1i        β2it = β2i        β3it = β3i

    yit = β1i + β2i x2it + β3i x3it + eit

Each firm gets its own coefficients, β1i, β2i and β3i, but those coefficients
are constant over time.

12.4  Two-Equation SUR Model

The investment expenditures (INV) of General Electric (G) and Westinghouse
(W) may be related to their stock market value (V) and actual capital stock
(K) as follows:

    INVGt = β1G + β2G VGt + β3G KGt + eGt
    INVWt = β1W + β2W VWt + β3W KWt + eWt

    i = G, W        t = 1, . . . , 20

12.5  Estimating Separate Equations

We make the usual error term assumptions:

    E(eGt) = 0              E(eWt) = 0
    var(eGt) = σG²          var(eWt) = σW²
    cov(eGt, eGs) = 0       cov(eWt, eWs) = 0

For now, make the assumption of no correlation between the error terms
across equations:

    cov(eGt, eWt) = 0       cov(eGt, eWs) = 0

12.6  Homoskedasticity Assumption:  σG² = σW²

The dummy variable model assumes that σG² = σW² :

    INVt = β1G + δ1 Dt + β2G Vt + δ2 Dt Vt + β3G Kt + δ3 Dt Kt + et

For Westinghouse observations Dt = 1; otherwise Dt = 0.

    β1W = β1G + δ1        β2W = β2G + δ2        β3W = β3G + δ3

12.7  Problem with OLS on Each Equation

The first assumption of the Gauss-Markov Theorem concerns the model
specification.  If the model is not fully and correctly specified, the
Gauss-Markov properties might not hold.  Any correlation of error terms
across equations must be part of the model specification.

12.8  Correlated Error Terms

Any correlation between the dependent variables of two or more equations
that is not due to their explanatory variables is by default due to
correlated error terms.

12.9

Which of the following models would be likely to produce positively
correlated errors, and which would produce negatively correlated errors?

1. Sales of Pepsi vs. sales of Coke.
   (uncontrolled factor: outdoor temperature)
2. Investments in bonds vs. investments in stocks.
   (uncontrolled factor: computer/appliance sales)
3. Movie admissions vs. golf course admissions.
   (uncontrolled factor: weather conditions)
4. Sales of butter vs. sales of bread.
   (uncontrolled factor: bagels and cream cheese)

12.10  Joint Estimation of the Equations

    INVGt = β1G + β2G VGt + β3G KGt + eGt
    INVWt = β1W + β2W VWt + β3W KWt + eWt

    cov(eGt, eWt) = σGW

12.11  Seemingly Unrelated Regressions

When the error terms of two or more equations are correlated, efficient
estimation requires the use of a Seemingly Unrelated Regressions (SUR) type
estimator to take the correlation into account.

Be sure to use the Seemingly Unrelated Regressions (SUR) procedure in your
regression software program to estimate any equations that you believe might
have correlated errors.

12.12  Separate vs. Joint Estimation

SUR will give exactly the same results as estimating each equation separately
with OLS if either or both of the following two conditions are true:

1. Every equation has exactly the same set of explanatory variables with
   exactly the same values.
2. There is no correlation between the error terms of any of the equations.

12.13  Test for Correlation

Test the null hypothesis of zero correlation:  H0: σGW = 0.

Start with the residuals êGt and êWt from each equation estimated separately:

    σ̂GW = (1/T) Σ êGt êWt
    σ̂G²  = (1/T) Σ êGt²
    σ̂W²  = (1/T) Σ êWt²

12.14  Test for Correlation (continued)

    r²GW = σ̂GW² / (σ̂G² σ̂W²)

    λ = T r²GW        λ ~ χ²(1)  asymptotically

12.15  Fixed Effects Model

    yit = β1it + β2it x2it + β3it x3it + eit

Fixed effects models impose the restrictions:

    β1it = β1i        β2it = β2        β3it = β3

For each ith cross section in the tth time period:

    yit = β1i + β2 x2it + β3 x3it + eit

Each ith cross-section has its own constant intercept β1i.

12.16  The Fixed Effects Model Using Dummy Variables

The fixed effects model is conveniently represented using dummy variables:

    D1i = 1 if North, 0 otherwise;    D2i = 1 if East, 0 otherwise;
    D3i = 1 if South, 0 otherwise;    D4i = 1 if West, 0 otherwise.

    yit = β11 D1i + β12 D2i + β13 D3i + β14 D4i + β2 x2it + β3 x3it + eit

    yit = millions of bushels of corn produced
    x2it = price of corn in dollars per bushel
    x3it = price of soybeans in dollars per bushel

Each cross-sectional unit gets its own intercept, but each cross-sectional
intercept is constant over time.
54
Copyright 1996 Lawrence C. Marsh Copyright 1996 Lawrence C. Marsh
12.17 12.18
Test for Equality of Fixed Effects Random Effects Model
Ho : β11 = β12 = β13 = β14
H1 : Ho not true
yit = β1i + β2x2it + β3x3it + eit
The Ho joint null hypothesis may be tested with F-statistic:

(SSER − SSEU) / J J
β1i = β1 + µi
F= ~ F(NT − K)
SSEU / (NT − K)

SSER is the restricted error sum of squares (one intercept) β1 is the population mean intercept.
SSEU is the unrestricted error sum of squares (four intercepts)
N is the number of cross-sectional units (N = 4) µi is an unobservable random error that
K is the number of parameters in the model (K = 6)
J is the number of restrictions being tested (J = N−1 = 3) accounts for the cross-sectional differences.
T is the number of time periods

12.19  Random Intercept Term

    β1i = β1 + µi        where i = 1, ..., N

The µi are independent of one another and of eit :

    E(µi) = 0        var(µi) = σµ²

Consequently,  E(β1i) = β1  and  var(β1i) = σµ².

12.20  Random Effects Model (continued)

    yit = β1i + β2 x2it + β3 x3it + eit

    yit = (β1 + µi) + β2 x2it + β3 x3it + eit

    yit = β1 + β2 x2it + β3 x3it + (µi + eit)

    yit = β1 + β2 x2it + β3 x3it + νit

12.21

    yit = β1 + β2 x2it + β3 x3it + νit        νit = (µi + eit)

    νit has zero mean:         E(νit) = 0
    νit is homoskedastic:      var(νit) = σµ² + σe²

The errors from the same firm in different time periods are correlated:

    cov(νit, νis) = σµ²        t ≠ s

The errors from different firms are always uncorrelated:

    cov(νit, νjs) = 0          i ≠ j

13.1  Chapter 13:  Simultaneous Equations Models

13.2  Keynesian Macro Model

Assumptions of the Simple Keynesian Model:

1. Consumption, c, is a function of income, y.
2. Total expenditures = consumption + investment.
3. Investment is assumed independent of income.

13.3  The Structural Equations

    consumption is a function of income:      c = β1 + β2 y
    income is either consumed or invested:    y = c + i

13.4  The Statistical Model

    The consumption equation:    ct = β1 + β2 yt + et
    The income identity:         yt = ct + it

13.5  The Simultaneous Nature of Simultaneous Equations

    ct = β1 + β2 yt + et
    yt = ct + it

Since yt contains et, they are correlated.  [The slide's numbered arrows
trace how et feeds into ct, then through the identity into yt, and back into
the right-hand side of the consumption equation.]

13.6  The Failure of Least Squares

The least squares estimators of the parameters in a structural simultaneous
equation are biased and inconsistent because of the correlation between the
random error and the endogenous variables on the right-hand side of the
equation.

13.7  Single vs. Simultaneous Equations

[Figure: in the single-equation case, et and yt each feed into ct only; in
the simultaneous-equations case, ct, yt, it and et are linked in both
directions.]

13.8  Deriving the Reduced Form

    ct = β1 + β2 yt + et
    yt = ct + it

    ct = β1 + β2 (ct + it) + et

    (1 − β2) ct = β1 + β2 it + et

13.9  Deriving the Reduced Form (continued)

    (1 − β2) ct = β1 + β2 it + et

    ct = β1/(1−β2) + [β2/(1−β2)] it + [1/(1−β2)] et

    ct = π11 + π21 it + νt        (the reduced form equation)

13.10  Reduced Form Equation

    ct = π11 + π21 it + νt

    π11 = β1/(1−β2)        π21 = β2/(1−β2)        νt = [1/(1−β2)] et

13.11

    yt = ct + it        where  ct = π11 + π21 it + νt

    yt = π11 + (1 + π21) it + νt

It is sometimes useful to give this equation its own reduced form parameters
as follows:

    yt = π12 + π22 it + νt

13.12

    ct = π11 + π21 it + νt
    yt = π12 + π22 it + νt

Since ct and yt are related through the identity yt = ct + it, the error
term, νt, of these two equations is the same, and it is easy to show that:

    π12 = π11 = β1/(1−β2)        π22 = (1 + π21) = 1/(1−β2)

13.13  Identification

The structural parameters are β1 and β2.  The reduced form parameters are
π11 and π21.  Once the reduced form parameters are estimated, the
identification problem is to determine whether the original structural
parameters can be expressed uniquely in terms of the reduced form parameters:

    β̂1 = π̂11 / (1 + π̂21)        β̂2 = π̂21 / (1 + π̂21)

13.14  Identification (continued)

An equation is under-identified if its structural (behavioral) parameters
cannot be expressed in terms of the reduced form parameters.

An equation is exactly identified if its structural (behavioral) parameters
can be uniquely expressed in terms of the reduced form parameters.

An equation is over-identified if there is more than one solution for
expressing its structural (behavioral) parameters in terms of the reduced
form parameters.

13.15  The Identification Problem

A system of M equations containing M endogenous variables must exclude at
least M−1 variables from a given equation in order for the parameters of
that equation to be identified and to be able to be consistently estimated.
13.16  Two Stage Least Squares

    yt1 = β1 + β2 yt2 + β3 xt1 + et1
    yt2 = α1 + α2 yt1 + α3 xt2 + et2

Problem: the right-hand endogenous variables yt2 and yt1 are correlated with
the error terms.

13.17  Two Stage Least Squares (continued)

Solution: first, derive the reduced form equations by solving the two
structural equations for the two unknowns yt1 and yt2 :

    yt1 = π11 + π21 xt1 + π31 xt2 + νt1
    yt2 = π12 + π22 xt1 + π32 xt2 + νt2

13.18  2SLS: Stage I

    yt1 = π11 + π21 xt1 + π31 xt2 + νt1
    yt2 = π12 + π22 xt1 + π32 xt2 + νt2

Use least squares to get the fitted values:

    ŷt1 = π̂11 + π̂21 xt1 + π̂31 xt2        so that  yt1 = ŷt1 + ν̂t1
    ŷt2 = π̂12 + π̂22 xt1 + π̂32 xt2        so that  yt2 = ŷt2 + ν̂t2

13.19  2SLS: Stage II

    yt1 = ŷt1 + ν̂t1    and    yt2 = ŷt2 + ν̂t2

Substitute in for yt1 and yt2 :

    yt1 = β1 + β2 (ŷt2 + ν̂t2) + β3 xt1 + et1
    yt2 = α1 + α2 (ŷt1 + ν̂t1) + α3 xt2 + et2
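A compact numpy sketch of both stages for the first equation, anticipating the least squares step described in the next slide (this is an illustration, not the text's own code; names are made up):

    import numpy as np

    def tsls(y1, y2, x1, x2):
        """Two-stage least squares for y1 = b1 + b2*y2 + b3*x1 + e1 (x1, x2 exogenous)."""
        T = len(y1)
        Z = np.column_stack([np.ones(T), x1, x2])            # reduced-form regressors
        y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]   # Stage I: fitted y2
        X = np.column_stack([np.ones(T), y2_hat, x1])        # Stage II: replace y2 by y2_hat
        b1, b2, b3 = np.linalg.lstsq(X, y1, rcond=None)[0]
        return b1, b2, b3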

13.20  2SLS: Stage II (continued)

    yt1 = β1 + β2 ŷt2 + β3 xt1 + ut1
    yt2 = α1 + α2 ŷt1 + α3 xt2 + ut2

    where  ut1 = β2 ν̂t2 + et1   and   ut2 = α2 ν̂t1 + et2

Run least squares on each of the above equations to get the 2SLS estimates:

    β̃1 , β̃2 , β̃3 , α̃1 , α̃2 and α̃3

14.1  Chapter 14:  Nonlinear Least Squares

14.2  Review of the Least Squares Principle
(minimize the sum of squared errors)

(A.)  "Regression" model with only an intercept term:

    yt = α + et        et = yt − α        SSE = Σ(yt − α)²

    ∂SSE/∂α = −2 Σ(yt − α̂) = 0        Σyt − T α̂ = 0

This yields an exact analytical solution:    α̂ = (1/T) Σ yt = ȳ

14.3  Review of Least Squares (continued)

(B.)  Regression model without an intercept term:

    yt = β xt + et        et = yt − β xt        SSE = Σ(yt − β xt)²

    ∂SSE/∂β = −2 Σ xt (yt − β̂ xt) = 0        Σ xt yt − β̂ Σ xt² = 0

This yields an exact analytical solution:    β̂ = Σ xt yt / Σ xt²

14.4  Review of Least Squares (continued)

(C.)  Regression model with both an intercept and a slope:

    yt = α + β xt + et        SSE = Σ(yt − α − β xt)²

    ∂SSE/∂α = −2 Σ(yt − α̂ − β̂ xt) = 0
    ∂SSE/∂β = −2 Σ xt (yt − α̂ − β̂ xt) = 0

This yields an exact analytical solution:

    α̂ = ȳ − β̂ x̄        β̂ = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)²

14.5  Nonlinear Least Squares

(D.)  Nonlinear regression model:

    yt = xt^β + et        SSE = Σ(yt − xt^β)²

    ∂SSE/∂β = −2 Σ xt^β ln(xt)(yt − xt^β) = 0

    Σ[ xt^β ln(xt) yt ] − Σ[ xt^(2β) ln(xt) ] = 0

PROBLEM: an exact analytical solution to this does not exist.  We must use a
numerical search algorithm to try to find the value of β that satisfies this.
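As one simple numerical search (an illustration only; the grid of candidate β values is my assumption, not the text's):

    import numpy as np

    def nls_grid_search(y, x, betas=np.linspace(0.1, 3.0, 2901)):
        """Numerical search: pick the beta minimizing SSE(beta) = sum((y - x**beta)^2)."""
        sse = [((y - x**b)**2).sum() for b in betas]
        return betas[int(np.argmin(sse))]

A finer search (or the Gauss-Newton algorithm described in the appendix below) would then refine this crude estimate.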

14.6  Find the Minimum of the Nonlinear SSE

    SSE = Σ(yt − xt^β)²

[Figure: SSE plotted as a curve in β, with the minimum at β̂.]

14.7  Conclusion

The least squares principle is still appropriate when the model is nonlinear,
but it is harder to find the solution.

14.8  Optional Appendix

Nonlinear least squares optimization methods: the Gauss-Newton Method.

14.9  The Gauss-Newton Algorithm

1. Apply the Taylor Series Expansion to the nonlinear model around some
   initial b(o).
2. Run Ordinary Least Squares (OLS) on the linear part of the Taylor Series
   to get b(m).
3. Perform a Taylor Series expansion around the new b(m) to get b(m+1).
4. Relabel b(m+1) as b(m) and rerun steps 2.-4.
5. Stop when (b(m+1) − b(m)) becomes very small.

14.10  The Gauss-Newton Method

    yt = f(Xt,b) + εt        for t = 1, . . . , n.

Do a Taylor Series expansion around the vector b = b(o) as follows:

    f(Xt,b) = f(Xt,b(o)) + f'(Xt,b(o))(b − b(o))
              + (b − b(o))T f''(Xt,b(o))(b − b(o)) + Rt

    yt = f(Xt,b(o)) + f'(Xt,b(o))(b − b(o)) + εt*

    where εt* ≡ (b − b(o))T f''(Xt,b(o))(b − b(o)) + Rt + εt

14.11  The Gauss-Newton Method (continued)

    yt = f(Xt,b(o)) + f'(Xt,b(o))(b − b(o)) + εt*
    yt − f(Xt,b(o)) = f'(Xt,b(o)) b − f'(Xt,b(o)) b(o) + εt*
    yt − f(Xt,b(o)) + f'(Xt,b(o)) b(o) = f'(Xt,b(o)) b + εt*

    yt*(o) = f'(Xt,b(o)) b + εt*        This is linear in b.

    where yt*(o) ≡ yt − f(Xt,b(o)) + f'(Xt,b(o)) b(o)

Gauss-Newton just runs OLS on this transformed, truncated Taylor series.

14.12

    yt*(o) = f'(Xt,b(o)) b + εt*   or, in matrix terms for t = 1, . . . , n:
    y*(o) = f'(X,b(o)) b + ε*

    b̂ = [ f'(X,b(o))T f'(X,b(o)) ]-1 f'(X,b(o))T y*(o)

This is analogous to linear OLS, where y = Xb + ε led to the solution
b̂ = (XTX)−1XTy, except that X is replaced with the matrix of first partial
derivatives f'(X,b(o)) and y is replaced by y*(o)
(i.e. "y" = y*(o) and "X" = f'(X,b(o))).

14.13

Recall that:    y*(o) ≡ y − f(X,b(o)) + f'(X,b(o)) b(o)

Now define:     y**(o) ≡ y − f(X,b(o))

Therefore:      y*(o) = y**(o) + f'(X,b(o)) b(o)

Now substitute in for y*(o) in the Gauss-Newton solution

    b̂ = [ f'(X,b(o))T f'(X,b(o)) ]-1 f'(X,b(o))T y*(o)

to get:

    b̂ = b(o) + [ f'(X,b(o))T f'(X,b(o)) ]-1 f'(X,b(o))T y**(o)

14.14

    b̂ = b(o) + [ f'(X,b(o))T f'(X,b(o)) ]-1 f'(X,b(o))T y**(o)

Now call this b̂ value b(1), as follows:

    b(1) = b(o) + [ f'(X,b(o))T f'(X,b(o)) ]-1 f'(X,b(o))T y**(o)

More generally, in going from iteration m to iteration (m+1) we obtain the
general expression:

    b(m+1) = b(m) + [ f'(X,b(m))T f'(X,b(m)) ]-1 f'(X,b(m))T y**(m)

14.15

Thus, the Gauss-Newton (nonlinear OLS) solution can be expressed in two
alternative, but equivalent, forms:

1. replacement form:
    b(m+1) = [ f'(X,b(m))T f'(X,b(m)) ]-1 f'(X,b(m))T y*(m)

2. updating form:
    b(m+1) = b(m) + [ f'(X,b(m))T f'(X,b(m)) ]-1 f'(X,b(m))T y**(m)

14.16

For example, consider Durbin's Method of estimating the autocorrelation
coefficient under a first-order autoregression regime:

    yt = b1 + b2 Xt2 + . . . + bK XtK + εt        for t = 1, . . . , n.

    εt = ρ εt−1 + ut   where ut satisfies the conditions
    E(ut) = 0 ,  E(ut²) = σu² ,  E(ut us) = 0 for s ≠ t.

Therefore, ut is nonautocorrelated and homoskedastic.

Durbin's Method is to set aside a copy of the equation, lag it once, multiply
by ρ and subtract the new equation from the original equation, then move the
ρyt−1 term to the right side and estimate ρ along with the b's by OLS.

14.17

    yt = b1 + b2 Xt2 + b3 Xt3 + εt        for t = 1, . . . , n,
    where εt = ρ εt−1 + ut

Lag once and multiply by ρ:

    ρ yt−1 = ρ b1 + ρ b2 Xt−1,2 + ρ b3 Xt−1,3 + ρ εt−1

Subtract from the original and move ρ yt−1 to the right side:

    yt = b1(1−ρ) + b2(Xt2 − ρXt−1,2) + b3(Xt3 − ρXt−1,3) + ρ yt−1 + ut

14.18

The structural (restricted, behavioral) equation is:

    yt = b1(1−ρ) + b2(Xt2 − ρXt−1,2) + b3(Xt3 − ρXt−1,3) + ρ yt−1 + ut

Now Durbin separates out the terms as follows:

    yt = b1(1−ρ) + b2 Xt2 − b2 ρ Xt−1,2 + b3 Xt3 − b3 ρ Xt−1,3 + ρ yt−1 + ut

The corresponding reduced form (unrestricted) equation is:

    yt = α1 + α2 Xt,2 + α3 Xt−1,2 + α4 Xt,3 + α5 Xt−1,3 + α6 yt−1 + ut

    α1 = b1(1−ρ)   α2 = b2   α3 = −b2ρ   α4 = b3   α5 = −b3ρ   α6 = ρ

14.19

Given the OLS estimates α̂1, α̂2, α̂3, α̂4, α̂5, α̂6 we can get three separate
and distinct estimates for ρ:

    ρ̂ = −α̂3/α̂2        ρ̂ = −α̂5/α̂4        ρ̂ = α̂6

These three separate estimates of ρ are in conflict!  It is difficult to know
which one to use as "the" legitimate estimate of ρ.  Durbin used the last one.

14.20

The problem with Durbin's Method is that it ignores the inherent nonlinear
restrictions implied by this structural model.  To get a single (i.e. unique)
estimate for ρ, the implied nonlinear restrictions must be incorporated
directly into the estimation process.

Consequently, the structural equation

    yt = b1(1−ρ) + b2 Xt2 − b2 ρ Xt−1,2 + b3 Xt3 − b3 ρ Xt−1,3 + ρ yt−1 + ut

should be estimated using a nonlinear method such as the Gauss-Newton
algorithm for nonlinear least squares.

14.21

    yt = b1(1−ρ) + b2 Xt2 − b2 ρ Xt−1,2 + b3 Xt3 − b3 ρ Xt−1,3 + ρ yt−1 + ut

    f'(Xt,b) = [ ∂yt/∂b1   ∂yt/∂b2   ∂yt/∂b3   ∂yt/∂ρ ]

    ∂yt/∂b1 = (1 − ρ)
    ∂yt/∂b2 = (Xt,2 − ρ Xt−1,2)
    ∂yt/∂b3 = (Xt,3 − ρ Xt−1,3)
    ∂yt/∂ρ  = ( −b1 − b2 Xt−1,2 − b3 Xt−1,3 + yt−1 )

14.22

    b(m+1) = [ f'(X,b(m))T f'(X,b(m)) ]-1 f'(X,b(m))T y*(m)

    where yt*(m) ≡ yt − f(Xt,b(m)) + f'(Xt,b(m)) b(m)

    b(m) = [ b1(m)  b2(m)  b3(m)  ρ(m) ]T

    f'(Xt,b(m)) = [ ∂yt/∂b1(m)   ∂yt/∂b2(m)   ∂yt/∂b3(m)   ∂yt/∂ρ(m) ]

    f(Xt,b) = b1(1−ρ) + b2 Xt2 − b2 ρ Xt−1,2 + b3 Xt3 − b3 ρ Xt−1,3 + ρ yt−1

Iterate until convergence.

15.1  Chapter 15:  Distributed Lag Models

15.2  The Distributed Lag Effect

[Figure: an economic action at time t has an effect at time t, at time t+1,
and at time t+2.]

15.3  Unstructured Lags

    yt = α + β0 xt + β1 xt−1 + β2 xt−2 + . . . + βn xt−n + et

    "n" unstructured lags
    no systematic structure imposed on the β's
    the β's are unrestricted

15.4  Problems with Unstructured Lags

1. n observations are lost with the n-lag setup.
2. high degree of multicollinearity among the xt−j's.
3. many degrees of freedom used for large n.
4. could get greater precision using structure.

15.5  The Arithmetic Lag Structure

Proposed by Irving Fisher (1937): the lag weights decline linearly.

Imposing the relationship  β# = (n − # + 1) γ :

    β0 = (n+1) γ,  β1 = n γ,  β2 = (n−1) γ,  β3 = (n−2) γ, ...,
    βn−2 = 3γ,  βn−1 = 2γ,  βn = γ

We only need to estimate one coefficient, γ, instead of n+1 coefficients,
β0, ..., βn.

15.6  Arithmetic Lag Structure (continued)

    yt = α + β0 xt + β1 xt−1 + β2 xt−2 + . . . + βn xt−n + et

Step 1: impose the restriction  β# = (n − # + 1) γ :

    yt = α + (n+1) γ xt + n γ xt−1 + (n−1) γ xt−2 + . . . + γ xt−n + et

Step 2: factor out the unknown coefficient γ :

    yt = α + γ [ (n+1)xt + n xt−1 + (n−1) xt−2 + . . . + xt−n ] + et

15.7  Arithmetic Lag Structure (continued)

Step 3: define zt :

    zt = [ (n+1)xt + n xt−1 + (n−1) xt−2 + . . . + xt−n ]

Step 4: decide the number of lags, n.  For n = 4:

    zt = [ 5xt + 4xt−1 + 3xt−2 + 2xt−3 + xt−4 ]

Step 5: run the least squares regression on:

    yt = α + γ zt + et

15.8  Arithmetic Lag Structure (continued)

[Figure: the lag weights βi decline linearly from β0 = (n+1)γ down to βn = γ
as i runs from 0 to n.]

15.9  Polynomial Lag Structure

Proposed by Shirley Almon (1965): the lag weights fit a polynomial.

    n = the length of the lag        p = the degree of the polynomial

    βi = γ0 + γ1 i + γ2 i² + ... + γp i^p        where i = 1, . . . , n

For example, a quadratic polynomial (p = 2 and n = 4):

    βi = γ0 + γ1 i + γ2 i²

    β0 = γ0
    β1 = γ0 + γ1 + γ2
    β2 = γ0 + 2γ1 + 4γ2
    β3 = γ0 + 3γ1 + 9γ2
    β4 = γ0 + 4γ1 + 16γ2

15.10  Polynomial Lag Structure (continued)

    yt = α + β0 xt + β1 xt−1 + β2 xt−2 + β3 xt−3 + β4 xt−4 + et

Step 1: impose the restriction  βi = γ0 + γ1 i + γ2 i² :

    yt = α + γ0 xt + (γ0 + γ1 + γ2) xt−1 + (γ0 + 2γ1 + 4γ2) xt−2
           + (γ0 + 3γ1 + 9γ2) xt−3 + (γ0 + 4γ1 + 16γ2) xt−4 + et

Step 2: factor out the unknown coefficients γ0, γ1, γ2 :

    yt = α + γ0 [xt + xt−1 + xt−2 + xt−3 + xt−4]
           + γ1 [xt−1 + 2xt−2 + 3xt−3 + 4xt−4]
           + γ2 [xt−1 + 4xt−2 + 9xt−3 + 16xt−4] + et

15.11  Polynomial Lag Structure (continued)

Step 3: define zt0, zt1 and zt2 to go with γ0, γ1, and γ2 :

    zt0 = [xt + xt−1 + xt−2 + xt−3 + xt−4]
    zt1 = [xt−1 + 2xt−2 + 3xt−3 + 4xt−4]
    zt2 = [xt−1 + 4xt−2 + 9xt−3 + 16xt−4]

15.12  Polynomial Lag Structure (continued)

Step 4: regress yt on zt0, zt1 and zt2 :

    yt = α + γ0 zt0 + γ1 zt1 + γ2 zt2 + et

Step 5: express the β̂i's in terms of γ̂0, γ̂1, and γ̂2 :

    β̂0 = γ̂0
    β̂1 = γ̂0 + γ̂1 + γ̂2
    β̂2 = γ̂0 + 2γ̂1 + 4γ̂2
    β̂3 = γ̂0 + 3γ̂1 + 9γ̂2
    β̂4 = γ̂0 + 4γ̂1 + 16γ̂2

15.13  Polynomial Lag Structure (continued)

[Figure 15.3: the estimated lag weights β̂0 through β̂4 trace out a smooth
quadratic shape over the lags i = 0, 1, 2, 3, 4.]

15.14  Geometric Lag Structure

Infinite distributed lag model:

    yt = α + β0 xt + β1 xt−1 + β2 xt−2 + . . . + et

    yt = α + Σ_{i=0}^{∞} βi xt−i + et        (15.3.1)

Geometric lag structure:

    βi = β φ^i        where |φ| < 1 and β φ^i > 0

15.15  Geometric Lag Structure (continued)

Infinite unstructured lag:

    yt = α + β0 xt + β1 xt−1 + β2 xt−2 + β3 xt−3 + . . . + et

Substitute βi = β φ^i :

    β0 = β,   β1 = β φ,   β2 = β φ²,   β3 = β φ³,  ...

Infinite geometric lag:

    yt = α + β(xt + φ xt−1 + φ² xt−2 + φ³ xt−3 + . . .) + et

15.16  Geometric Lag Structure (continued)

    yt = α + β(xt + φ xt−1 + φ² xt−2 + φ³ xt−3 + . . .) + et

    impact multiplier:                 β
    interim multiplier (3-period):     β + βφ + βφ²
    long-run multiplier:               β(1 + φ + φ² + φ³ + . . .) = β / (1 − φ)

15.17  Geometric Lag Structure (continued)

[Figure 15.5: the lag weights β0 = β, β1 = βφ, β2 = βφ², β3 = βφ³, β4 = βφ⁴
decline geometrically over the lags i = 0, 1, 2, 3, 4.]

15.18  Geometric Lag Structure (continued)

    yt = α + β(xt + φ xt−1 + φ² xt−2 + φ³ xt−3 + . . .) + et

Problem: how to estimate the infinite number of geometric lag coefficients?
Answer: use the Koyck transformation.

15.19  The Koyck Transformation

Lag everything once, multiply by φ and subtract from the original:

    yt     = α + β(xt + φ xt−1 + φ² xt−2 + φ³ xt−3 + . . .) + et
    φ yt−1 = φα + β(φ xt−1 + φ² xt−2 + φ³ xt−3 + . . .) + φ et−1

    yt − φ yt−1 = α(1−φ) + β xt + (et − φ et−1)

15.20  The Koyck Transformation (continued)

    yt − φ yt−1 = α(1−φ) + β xt + (et − φ et−1)

Solve for yt by adding φ yt−1 to both sides:

    yt = α(1−φ) + φ yt−1 + β xt + (et − φ et−1)

    yt = δ1 + δ2 yt−1 + δ3 xt + νt

15.21  The Koyck Transformation (continued)

Defining δ1 = α(1−φ), δ2 = φ, and δ3 = β, use ordinary least squares on:

    yt = δ1 + δ2 yt−1 + δ3 xt + νt

The original structural parameters can now be estimated in terms of these
reduced form parameter estimates:

    β̂ = δ̂3        φ̂ = δ̂2        α̂ = δ̂1 / (1 − δ̂2)
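A minimal numpy sketch of this estimation step and the recovery of α, β and φ (illustrative only; the function name is made up and y, x are assumed to be numpy arrays):

    import numpy as np

    def koyck_estimates(y, x):
        """OLS on yt = d1 + d2*y_{t-1} + d3*xt, then recover alpha, beta, phi."""
        Y, Ylag, X = y[1:], y[:-1], x[1:]
        Z = np.column_stack([np.ones(len(Y)), Ylag, X])
        d1, d2, d3 = np.linalg.lstsq(Z, Y, rcond=None)[0]
        beta, phi, alpha = d3, d2, d1/(1.0 - d2)
        return alpha, beta, phi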

15.22  Geometric Lag Structure

yt = ^α + ^β(xt + ^φ xt-1 + ^φ² xt-2 + ^φ³ xt-3 + . . .) + ^et

^β0 = ^β
^β1 = ^β ^φ
^β2 = ^β ^φ²
^β3 = ^β ^φ³
. . .

yt = ^α + ^β0 xt + ^β1 xt-1 + ^β2 xt-2 + ^β3 xt-3 + . . . + ^et

15.23  Durbin's h-test for autocorrelation

Estimates are inconsistent if the geometric lag model is autocorrelated, but the Durbin-Watson test is biased in favor of no autocorrelation.

h = (1 − d/2) √[ (T − 1) / (1 − (T − 1)[se(b2)]²) ]

h = Durbin's h-test statistic
T = sample size
d = Durbin-Watson test statistic
se(b2) = standard error of the estimate b2
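Given d, T and se(b2), the statistic is a one-line calculation. A small sketch (the numerical inputs are illustrative only):

```python
import numpy as np

def durbin_h(d, T, se_b2):
    """Durbin's h as defined above; not defined when (T-1)*se_b2**2 >= 1."""
    return (1 - d / 2) * np.sqrt((T - 1) / (1 - (T - 1) * se_b2**2))

# illustrative values; compare h with a standard normal critical value such as 1.645
print(round(durbin_h(d=1.80, T=100, se_b2=0.05), 3))
```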

15.24  Adaptive Expectations

yt = α + β x*t + et

yt = credit card debt
x*t = expected (anticipated) income

(x*t is not observable)

15.25  Adaptive Expectations

adjust expectations based on past realization:

x*t − x*t-1 = λ (xt-1 − x*t-1)

15.26  Adaptive Expectations

x*t − x*t-1 = λ (xt-1 − x*t-1)

rearrange to get:

x*t = λ xt-1 + (1 − λ) x*t-1

λ xt-1 = [x*t − (1 − λ) x*t-1]

15.27  Adaptive Expectations

yt = α + β x*t + et

Lag this model once and multiply by (1 − λ):

(1 − λ)yt-1 = (1 − λ)α + (1 − λ)β x*t-1 + (1 − λ)et-1

and subtract this from the original to get:

yt = αλ + (1 − λ)yt-1 + β [x*t − (1 − λ)x*t-1]
        + et − (1 − λ)et-1

15.28  Adaptive Expectations

yt = αλ + (1 − λ)yt-1 + β [x*t − (1 − λ)x*t-1]
        + et − (1 − λ)et-1

Since λ xt-1 = [x*t − (1 − λ) x*t-1], we get:

yt = αλ + (1 − λ)yt-1 + βλ xt-1 + ut

where ut = et − (1 − λ)et-1

15.29  Adaptive Expectations

yt = αλ + (1 − λ)yt-1 + βλ xt-1 + ut

Use ordinary least squares regression on:

yt = β1 + β2 yt-1 + β3 xt-1 + ut

and we get:

^λ = (1 − ^β2)

^β = ^β3 / (1 − ^β2)

^α = ^β1 / (1 − ^β2)

15.30  Partial Adjustment

y*t = α + β xt + et

inventories partially adjust, 0 < γ < 1,
towards the optimal or desired level, y*t :

yt − yt-1 = γ (y*t − yt-1)

15.31  Partial Adjustment

yt − yt-1 = γ (y*t − yt-1)
          = γ (α + β xt + et − yt-1)
          = γα + γβ xt − γ yt-1 + γ et

Solving for yt :

yt = γα + (1 − γ)yt-1 + γβ xt + γ et

15.32  Partial Adjustment

yt = γα + (1 − γ)yt-1 + γβ xt + γ et

Use ordinary least squares regression on:

yt = β1 + β2 yt-1 + β3 xt + νt

to get:

^γ = (1 − ^β2)

^β = ^β3 / (1 − ^β2)

^α = ^β1 / (1 − ^β2)

16.1  Chapter 16: Time Series Analysis

Copyright © 1997 John Wiley & Sons, Inc. All rights reserved.

16.2  Previous Chapters used Economic Models

1. economic model for dependent variable of interest.
2. statistical model consistent with the data.
3. estimation procedure for parameters using the data.
4. forecast variable of interest using estimated model.

Time Series Analysis does not use this approach.

16.3

Time Series Analysis does not generally incorporate all of the economic relationships found in economic models.

Time Series Analysis uses more statistics and less economics.

Time Series Analysis is useful for short-term forecasting only.

Long-term forecasting requires incorporating more involved behavioral economic relationships into the analysis.

16.4

Univariate Time Series Analysis can be used to relate the current values of a single economic variable to:

1. its past values
2. the values of current and past random errors

Other variables are not used in univariate time series analysis.

16.5

Three types of Univariate Time Series Analysis processes will be discussed in this chapter:

1. autoregressive (AR)
2. moving average (MA)
3. autoregressive moving average (ARMA)

16.6

Multivariate Time Series Analysis can be used to relate the current value of each of several economic variables to:

1. its past values.
2. the past values of the other forecasted variables.
3. the values of current and past random errors.

Vector autoregressive models discussed later in this chapter are multivariate time series models.

16.7  First-Order Autoregressive Processes, AR(1)

yt = δ + θ1 yt-1 + et,   t = 1, 2, ..., T.          (16.1.1)

δ is the intercept.
θ1 is a parameter generally between −1 and +1.
et is an uncorrelated random error with mean zero and variance σe².

16.8  Autoregressive Process of order p, AR(p)

yt = δ + θ1 yt-1 + θ2 yt-2 + ... + θp yt-p + et          (16.1.2)

δ is the intercept.
θi's are parameters generally between −1 and +1.
et is an uncorrelated random error with mean zero and variance σe².

16.9  Properties of the least squares estimator

AR models always have one or more lagged dependent variables on the right hand side.

Consequently, least squares is no longer a best linear unbiased estimator (BLUE), but it does have some good asymptotic properties including consistency.

16.10  AR(2) model of U.S. unemployment rates

yt = 0.5051 + 1.5537 yt-1 − 0.6515 yt-2
     (0.1267)  (0.0707)     (0.0708)

The coefficient on yt-1 is positive; the coefficient on yt-2 is negative.

Note: Q1-1948 through Q1-1978 from J.D. Cryer (1986); see unempl.dat

16.11  Choosing the lag length, p, for AR(p)

The Partial Autocorrelation Function (PAF)

The PAF is the sequence of correlations between (yt and yt-1), (yt and yt-2), (yt and yt-3), and so on, given that the effects of earlier lags on yt are held constant.

16.12  Partial Autocorrelation Function

Data simulated from this model:  yt = 0.5 yt-1 + 0.3 yt-2 + et

^θkk is the last (k-th) coefficient in a k-th order AR process.

Figure: the sample PAF plotted against k, with significance bounds at ±2/√T, shows spikes outside the bounds at the first two lags only.

This sample PAF suggests a second order process, AR(2), which is correct.

16.13  Using the AR Model for Forecasting

unemployment rate:  yT-1 = 6.63 and yT = 6.20

^yT+1 = ^δ + ^θ1 yT + ^θ2 yT-1
      = 0.5051 + (1.5537)(6.20) − (0.6515)(6.63)
      = 5.8186

^yT+2 = ^δ + ^θ1 ^yT+1 + ^θ2 yT
      = 0.5051 + (1.5537)(5.8186) − (0.6515)(6.20)
      = 5.5062

^yT+3 = ^δ + ^θ1 ^yT+2 + ^θ2 ^yT+1
      = 0.5051 + (1.5537)(5.5062) − (0.6515)(5.8186)
      = 5.2693
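The forecasts above are generated by feeding each forecast back in as a lagged value. A short sketch of that recursion using the slide's estimated AR(2) coefficients (only the loop itself is new; it mimics the hand calculation, including the rounding of each forecast before it is reused):

```python
delta, theta1, theta2 = 0.5051, 1.5537, -0.6515   # estimates from the AR(2) model above
history = [6.63, 6.20]                            # y_{T-1}, y_T

forecasts = []
for _ in range(3):
    y_next = round(delta + theta1 * history[-1] + theta2 * history[-2], 4)
    forecasts.append(y_next)
    history.append(y_next)        # feed the rounded forecast back in as a lag

print(forecasts)                  # [5.8186, 5.5062, 5.2693]
```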

16.14  Moving Average Process of order q, MA(q)

yt = µ + et + α1 et-1 + α2 et-2 + ... + αq et-q          (16.2.1)

µ is the intercept.
αi's are unknown parameters.
et is an uncorrelated random error with mean zero and variance σe².

16.15  An MA(1) process

yt = µ + et + α1 et-1          (16.2.2)

Minimize the sum of squared deviations:

S(µ, α1) = Σ(t=1 to T) e²t = Σ(t=1 to T) (yt − µ − α1 et-1)²          (16.2.3)
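Because the lagged errors et-1 are not observed, S(µ, α1) has to be minimized numerically, rebuilding the et recursively from candidate parameter values. A sketch of that idea with simulated data and scipy's general-purpose minimizer (both illustrative choices, not prescribed by the text):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
T, mu, a1 = 300, 1.0, -0.9
e = rng.normal(size=T)
y = mu + e + a1 * np.concatenate([[0.0], e[:-1]])     # MA(1): y_t = mu + e_t + a1*e_{t-1}

def sse(params):
    m, a = params
    resid = np.zeros(T)
    for t in range(T):                                # rebuild e_t recursively, pre-sample e_0 = 0
        lag = resid[t - 1] if t > 0 else 0.0
        resid[t] = y[t] - m - a * lag
    return np.sum(resid**2)                           # S(mu, alpha1) from (16.2.3)

result = minimize(sse, x0=np.array([0.0, 0.0]))
print(result.x.round(2))                              # should be near [1.0, -0.9]
```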

16.16  Stationary vs. Nonstationary

stationary:
A stationary time series is one whose mean, variance, and autocorrelation function do not change over time.

nonstationary:
A nonstationary time series is one whose mean, variance or autocorrelation function change over time.

16.17

First Differencing is often used to transform a nonstationary series into a stationary series:

yt = zt − zt-1

where zt is the original nonstationary series and yt is the new stationary series.

16.18  Choosing the lag length, q, for MA(q)

The Autocorrelation Function (AF)

The AF is the sequence of correlations between (yt and yt-1), (yt and yt-2), (yt and yt-3), and so on, without holding the effects of earlier lags on yt constant.

The PAF controlled for the effects of previous lags, but the AF does not control for such effects.

16.19  Autocorrelation Function

Data simulated from this model:  yt = et − 0.9 et-1

rkk is the last (k-th) coefficient in a k-th order MA process.

Figure: the sample AF plotted against k, with significance bounds at ±2/√T, shows a spike outside the bounds at the first lag only.

This sample AF suggests a first order process, MA(1), which is correct.

16.20  Autoregressive Moving Average ARMA(p,q)

An ARMA(1,2) has one autoregressive lag and two moving average lags:

yt = δ + θ1 yt-1 + et + α1 et-1 + α2 et-2

16.21  Integrated Processes

A time series with an upward or downward trend over time is nonstationary.

Many nonstationary time series can be made stationary by differencing them one or more times.

Such time series are called integrated processes.

16.22

The number of times a series must be differenced to make it stationary is the order of the integrated process, d.

An autocorrelation function, AF, with large, significant autocorrelations for many lags may require more than one differencing to become stationary.

Check the new AF after each differencing to determine if further differencing is needed.

16.23  Unit Root

zt = θ1 zt-1 + µ + et + α1 et-1          (16.3.2)

−1 < θ1 < 1     stationary ARMA(1,1)
θ1 = 1          nonstationary process

θ1 = 1 is called a unit root

16.24  Unit Root Tests

zt − zt-1 = (θ1 − 1)zt-1 + µ + et + α1 et-1

∆zt = θ*1 zt-1 + µ + et + α1 et-1          (16.3.3)

where ∆zt = zt − zt-1 and θ*1 = θ1 − 1

Testing θ*1 = 0 is equivalent to testing θ1 = 1.

16.25  Unit Root Tests

H0: θ*1 = 0   vs.   H1: θ*1 < 0          (16.3.4)

Computer programs typically use one of the following tests for unit roots:

Dickey-Fuller Test
Phillips-Perron Test
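A sketch of the regression behind the simplest Dickey-Fuller style test: regress ∆zt on a constant and zt-1 and examine the t-ratio on zt-1 (the moving-average error term in (16.3.3) is ignored here for brevity, the data are simulated, and the relevant critical values come from Dickey-Fuller tables rather than the usual t table; a package such as statsmodels provides both the test and its critical values):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 250
z = np.cumsum(rng.normal(size=T))        # a random walk, so theta1 = 1 (a unit root)

dz = np.diff(z)                          # delta z_t = z_t - z_{t-1}
X = np.column_stack([np.ones(T - 1), z[:-1]])
coef = np.linalg.lstsq(X, dz, rcond=None)[0]

# t-ratio on theta1* (the coefficient on z_{t-1})
resid = dz - X @ coef
s2 = resid @ resid / (len(dz) - 2)
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
print(round(coef[1] / se, 2))            # compare with Dickey-Fuller critical values
```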

16.26  Autoregressive Integrated Moving Average ARIMA(p,d,q)

An ARIMA(p,d,q) model represents an AR(p) - MA(q) process that has been differenced (integrated, I(d)) d times:

yt = δ + θ1 yt-1 + ... + θp yt-p + et + α1 et-1 + ... + αq et-q

16.27  The Box-Jenkins approach

1. Identification
   determining the values of p, d, and q.
2. Estimation
   linear or nonlinear least squares.
3. Diagnostic Checking
   does the model fit well, with no autocorrelation?
4. Forecasting
   short-term forecasts of future yt values.

16.28  Vector Autoregressive (VAR) Models

Use VAR for two or more interrelated time series:

yt = θ0 + θ1 yt-1 + ... + θp yt-p + φ1 xt-1 + ... + φp xt-p + et

xt = δ0 + δ1 yt-1 + ... + δp yt-p + α1 xt-1 + ... + αp xt-p + ut

16.29  Vector Autoregressive (VAR) Models

1. extension of AR model.
2. all variables endogenous.
3. no structural (behavioral) economic model.
4. all variables jointly determined (over time).
5. no simultaneous equations (same time).
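Since every VAR equation has the same right-hand-side variables, each equation can simply be estimated by ordinary least squares on its own. A minimal two-variable VAR(1) sketch on simulated data (all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 400
y = np.zeros(T)
x = np.zeros(T)
for t in range(1, T):                    # simulate a stable two-variable VAR(1)
    y[t] = 0.5 + 0.6*y[t-1] + 0.2*x[t-1] + rng.normal(scale=0.1)
    x[t] = 1.0 + 0.1*y[t-1] + 0.7*x[t-1] + rng.normal(scale=0.1)

Z = np.column_stack([np.ones(T - 1), y[:-1], x[:-1]])   # same regressors in both equations
coef_y = np.linalg.lstsq(Z, y[1:], rcond=None)[0]       # theta0, theta1, phi1
coef_x = np.linalg.lstsq(Z, x[1:], rcond=None)[0]       # delta0, delta1, alpha1
print(coef_y.round(2), coef_x.round(2))
```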

16.30

The random error terms in a VAR model may be correlated if they are affected by relevant factors that are not in the model, such as government actions or national/international events, etc.

Since VAR equations all have exactly the same set of explanatory variables, the usual seemingly unrelated regression estimation produces exactly the same estimates as least squares on each equation separately.

16.31  Least Squares is Consistent

Consequently, regardless of whether the VAR random error terms are correlated or not, least squares estimation of each equation separately will provide consistent regression coefficient estimates.

16.32  VAR Model Specification

To determine the length of the lag, p, use:

1. Akaike's AIC criterion
2. Schwarz's SIC criterion

These methods were discussed in Chapter 15.

16.33  Spurious Regressions

yt = β1 + β2 xt + εt

where εt = θ1 εt-1 + νt

−1 < θ1 < 1     I(0) (i.e. d = 0)
θ1 = 1          I(1) (i.e. d = 1)

If θ1 = 1, least squares estimates of β2 may appear highly significant even when the true β2 = 0.
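The spurious-regression problem is easy to reproduce by simulation: regress one independent random walk on another and the t-ratio on β2 is typically far outside ±2 even though the true β2 is zero. A small illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
y = np.cumsum(rng.normal(size=T))     # two independent random walks: true beta2 = 0
x = np.cumsum(rng.normal(size=T))

X = np.column_stack([np.ones(T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
se_b2 = np.sqrt(resid @ resid / (T - 2) * np.linalg.inv(X.T @ X)[1, 1])
print(round(b[1] / se_b2, 1))         # often far outside +/-2 even though x and y are unrelated
```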

16.34  Cointegration

yt = β1 + β2 xt + εt

If xt and yt are nonstationary I(1), we might expect that εt is also I(1).

However, if xt and yt are nonstationary I(1) but εt is stationary I(0), then xt and yt are said to be cointegrated.

16.35  Cointegrated VAR(1) Model

VAR(1) model:

yt = θ0 + θ1 yt-1 + φ1 xt-1 + et
xt = δ0 + δ1 yt-1 + α1 xt-1 + ut

If xt and yt are both I(1) and are cointegrated, use an Error Correction Model instead of VAR(1).

16.36  Error Correction Model

∆yt = yt − yt-1 and ∆xt = xt − xt-1

∆yt = θ0 + (θ1 − 1)yt-1 + φ1 xt-1 + et

∆xt = δ0 + δ1 yt-1 + (α1 − 1)xt-1 + ut

(continued)

16.37  Error Correction Model

∆yt = θ*0 + γ1(yt-1 − β1 − β2 xt-1) + et

∆xt = δ*0 + γ2(yt-1 − β1 − β2 xt-1) + ut

θ*0 = θ0 + γ1 β1          γ1 = φ1 δ1 / (α1 − 1)

δ*0 = δ0 + γ2 β1          γ2 = δ1          β2 = (1 − α1) / δ1

16.38  Estimating an Error Correction Model

Step 1:

Estimate by least squares:

yt-1 = β1 + β2 xt-1 + εt-1

to get the residuals:

^εt-1 = yt-1 − ^β1 − ^β2 xt-1

16.39  Estimating an Error Correction Model

Step 2:

Estimate by least squares:

∆yt = θ*0 + γ1 ^εt-1 + et

∆xt = δ*0 + γ2 ^εt-1 + ut
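A sketch of the two steps on simulated cointegrated data (the variable names and data-generating values are illustrative): Step 1 estimates the cointegrating regression and saves the lagged residuals, Step 2 regresses the first differences on those residuals.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 500
x = np.cumsum(rng.normal(size=T))                     # x_t is I(1)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5, size=T)     # y and x cointegrated: y - 2 - 0.8x is I(0)

# Step 1: cointegrating regression, keep the residuals
X1 = np.column_stack([np.ones(T), x])
b1, b2 = np.linalg.lstsq(X1, y, rcond=None)[0]
ehat = y - b1 - b2 * x                                # ^eps_t = y_t - ^beta1 - ^beta2 x_t
ehat_lag = ehat[:-1]                                  # ^eps_{t-1}

# Step 2: regress the first differences on a constant and the lagged residual
dy, dx = np.diff(y), np.diff(x)
X2 = np.column_stack([np.ones(T - 1), ehat_lag])
print(np.linalg.lstsq(X2, dy, rcond=None)[0].round(2))   # estimates of theta0*, gamma1
print(np.linalg.lstsq(X2, dx, rcond=None)[0].round(2))   # estimates of delta0*, gamma2
```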

16.40

Using cointegrated I(1) variables in a VAR model expressed solely in terms of first differences and lags of first differences is a misspecification.

The correct specification is to use an Error Correction Model.

17.1  Chapter 17: Guidelines for Research Project

Copyright © 1997 John Wiley & Sons, Inc. All rights reserved.

17.2  What the Book Has Covered

• Formulation: economic ====> econometric.
• Estimation: selecting the appropriate method.
• Interpretation: how the xt's impact on the yt.
• Inference: testing, intervals, prediction.

17.3  Topics for This Chapter

1. Types of Data by Source
2. Nonexperimental Data
3. Text Data vs. Electronic Data
4. Selecting a Topic
5. Writing an Abstract
6. Research Report Format

17.4  Types of Data by Source

i) Experimental Data
   from controlled experiments.

ii) Observational Data
   passively generated by society.

iii) Survey Data
   data collected through interviews.

17.5  Time vs. Cross-Section

Time Series Data
   data collected at distinct points in time
   (e.g. weekly sales, daily stock price, annual budget deficit, monthly unemployment.)

Cross Section Data
   data collected over samples of units, individuals, households, firms at a particular point in time.
   (e.g. salary, race, gender, unemployment by state.)

17.6  Micro vs. Macro

Micro Data:
   data collected on individual economic decision making units such as individuals, households or firms.

Macro Data:
   data resulting from a pooling or aggregating over individuals, households or firms at the local, state or national levels.

17.7  Flow vs. Stock

Flow Data:
   outcome measured over a period of time, such as the consumption of gasoline during the last quarter of 1997.

Stock Data:
   outcome measured at a particular point in time, such as crude oil held by Chevron in US storage tanks on April 1, 1997.

17.8  Quantitative vs. Qualitative

Quantitative Data:
   outcomes such as prices or income that may be expressed as numbers or some transformation of them (e.g. wages, trade deficit).

Qualitative Data:
   outcomes that are of an "either-or" nature (e.g. male, home owner, Methodist, bought car last year, voted in last election).

17.9  International Data

International Financial Statistics (IMF monthly).
Basic Statistics of the Community (OECD annual).
Consumer Price Indices in the European Community (OECD annual).
World Statistics (UN annual).
Yearbook of National Accounts Statistics (UN).
FAO Trade Yearbook (annual).

17.10  United States Data

Survey of Current Business (BEA monthly).
Handbook of Basic Economic Statistics (BES).
Monthly Labor Review (BLS monthly).
CPI Detailed Report (BLS, annual).
Federal Reserve Bulletin (FRB monthly).
Statistical Abstract of the US (BC annual).
Economic Report of the President (CEA annual).
Economic Indicators (CEA monthly).
Agricultural Statistics (USDA annual).
Agricultural Situation Reports (USDA monthly).

17.11  State and Local Data

State and Metropolitan Area Data Book (Commerce and BC, annual).
Census of Population and Housing (Commerce, BC, annual).
County and City Data Book (Commerce, BC, annual).

17.12 / 17.13  Citibase on CD-ROM

• Financial series: interest rates, stock market, etc.
• Business formation, investment and consumers.
• Construction of housing.
• Manufacturing, business cycles, foreign trade.
• Prices: producer and consumer price indexes.
• Industrial production.
• Capacity and productivity.
• Labor statistics: unemployment, households.
• National income and product accounts in detail.
• Forecasts and projections.
• Business cycle indicators.
• Energy consumption, petroleum production, etc.
• International data series including trade statistics.
• Population.

17.14  Resources for Economists

Resources for Economists by Bill Goffe

http://econwpa.wustl.edu/EconFAQ/EconFAQ.html

Bill Goffe provides a vast database of information about the economics profession including economic organizations, working papers and reports, and economic data series.

17.15  Internet Data Sources

A few of the items on Bill Goffe's Table of Contents:

• Shortcut to All Resources.
• Macro and Regional Data.
• Other U.S. Data.
• World and Non-U.S. Data.
• Finance and Financial Markets.
• Data Archives.
• Journal Data and Program Archives.

17.16  Useful Internet Addresses

http://seamonkey.ed.asu.edu/~behrens/teach/WWW_data.html
http://www.sims.berkeley.edu/~hal/pages/interesting.html
http://www.stls.frb.org  FED RESERVE BK - ST. LOUIS
http://www.bls.gov  BUREAU OF LABOR STATISTICS
http://nber.harvard.edu  NAT'L BUR. ECON. RESEARCH
http://www.inform.umd.edu:8080/EdRes/Topic/EconData/.www/econdata.html  UNIVERSITY OF MARYLAND
http://www.bog.frb.fed.us  FED BOARD OF GOVERNORS
http://www.webcom.com/~yardeni/economic.html

17.17  Data from Surveys

The survey process has four distinct aspects:

i) identifying the population of interest.
ii) designing and selecting the sample.
iii) collecting the information.
iv) data reduction, estimation and inference.

17.18  Controlled Experiments

Controlled experiments were done on these topics:

1. Labor force participation: negative income tax:
   guaranteed minimum income experiment.
2. National cash housing allowance experiment:
   impact on demand and supply of housing.
3. Health insurance: medical cost reduction:
   sensitivity of income groups to price change.
4. Peak-load pricing and electricity use:
   daily use pattern of residential customers.

17.19  Economic Data Problems

I. poor implicit experimental design
   (i) collinear explanatory variables.
   (ii) measurement errors.

II. inconsistent with theory specification
   (i) wrong level of aggregation.
   (ii) missing observations or variables.
   (iii) unobserved heterogeneity.

17.20  Selecting a Topic

General tips for selecting a research topic:

• "What am I interested in?"
• Well-defined, relatively simple topic.
• Ask prof for ideas and references.
• Journal of Economic Literature (ECONLIT)
• Make sure appropriate data are available.
• Avoid extremely difficult econometrics.
• Plan your work and work your plan.

17.21  Writing an Abstract

Abstract of less than 500 words should include:

(i) concise statement of the problem.
(ii) key references to available information.
(iii) description of research design including:
    (a) economic model
    (b) statistical model
    (c) data sources
    (d) estimation, testing and prediction
(iv) contribution of the work

17.22  Research Report Format
1. Statement of the Problem.
2. Review of the Literature.
3. The Economic Model.
4. The Statistical Model.
5. The Data.
6. Estimation and Inference Procedures.
7. Empirical Results and Conclusions.
8. Possible Extensions and Limitations.
9. Acknowledgments.
10. References.

