Introduction To Econometrics

EC114 Introduction to Quantitative Economics

11. Introduction to Econometrics
Marcus Chambers
Department of Economics
University of Essex
17/19 January 2012
Outline
1. The Purpose of Econometrics
2. Sample Correlations
3. Two-variable Regression
Reference: R. L. Thomas, Using Statistics in Economics,
McGraw-Hill, 2005, chapter 8 and sections 9.1–9.2.
The Purpose of Econometrics
What is Econometrics? Econometrics:
"may be defined as the social science in which the
tools of economic theory, mathematics, and statistical
inference are applied to the analysis of economic
phenomena"
(A.S. Goldberger, Econometric Theory, 1964, p.1)
"is concerned with the empirical determination of
economic laws"
(H. Theil, Principles of Econometrics, 1971, p.1)
"aims to put empirical flesh and blood on
theoretical structures"
(J. Johnston, Econometric Methods (Third Edition),
1984, p.5)
Furthermore, Econometrics:
"is the science and art of using economic theory
and statistical techniques to analyze economic data"
(J.H. Stock and M.W. Watson, Introduction to
Econometrics, 2003, p.3)
The latter authors (Stock and Watson) also state:
"Ask a half dozen econometricians what
econometrics is and you could get a half dozen
different answers" (p.3); and
"Econometrics can be a fun course for both
teacher and student" (p.xxvii)
Please remember this last point as the term progresses...
Econometrics, then, can be seen as an application of
Statistics to economic theory and data.
In this sense, during this term we will apply many of the
tools covered last term to economic problems.
For example, we will use the concepts of estimation and
inference, including t- and F-tests, to estimate economic
relationships and to test hypotheses of interest.
But why is Econometrics so important?
Economics is based on the study of relationships between
variables.
Examples include:
consumption and income (the consumption function);
quantity demanded and price (demand curve);
employment and wages (demand for labour).
Econometrics studies how to:
quantify these relationships and find values for their
parameters (i.e. estimates);
test the theories implied by the relationships;
use the relationships as a basis for predictions and
forecasts.
Example: consumer theory suggests that aggregate
consumers' expenditure (C) is a function of income (Y) and
the cost of borrowing (I):
C = C(Y, I).
It also suggests that a rise in Y leads to a rise in C, while a
rise in I leads to a fall in C, other things being equal.
However, some problems remain:
How do we actually define and measure C, Y and I?
What is the appropriate functional form? It could be linear:
$$C = \alpha + \beta Y + \gamma I, \quad \beta > 0, \; \gamma < 0,$$
or it could be nonlinear, e.g. the constant elasticity form
$$C = A Y^{\beta} I^{\gamma}, \quad \beta > 0, \; \gamma < 0,$$
or it could be something completely different!
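To make the two candidate forms concrete, here is a minimal Python sketch that evaluates both at hypothetical parameter values. The numbers and function names are illustrative assumptions, not estimates from the lecture or from Thomas. It checks that the income coefficient is a marginal effect in the linear form and an elasticity in the constant elasticity form.

```python
import math

# All parameter values here are hypothetical, chosen only for illustration.
ALPHA, BETA, GAMMA = 10.0, 0.8, -0.5   # intercept, income and borrowing-cost coefficients
A = 2.0                                 # scale factor in the constant elasticity form


def linear_consumption(y, i):
    """Linear form: C = alpha + beta*Y + gamma*I."""
    return ALPHA + BETA * y + GAMMA * i


def constant_elasticity_consumption(y, i):
    """Constant elasticity form: C = A * Y**beta * I**gamma."""
    return A * (y ** BETA) * (i ** GAMMA)


# Linear form: a one-unit rise in Y changes C by beta, whatever the level of Y.
marginal_effect = linear_consumption(101.0, 5.0) - linear_consumption(100.0, 5.0)

# Constant elasticity form: the change in log C per unit change in log Y is beta.
log_c_change = (math.log(constant_elasticity_consumption(101.0, 5.0))
                - math.log(constant_elasticity_consumption(100.0, 5.0)))
income_elasticity = log_c_change / (math.log(101.0) - math.log(100.0))

print(round(marginal_effect, 6), round(income_elasticity, 6))
```

Both quantities come out equal to the assumed income coefficient, but they answer different questions: the first is a change in units of C, the second a percentage response.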
Even when we decide on the functional form, additional
problems remain:
What are the values of $\beta$ and $\gamma$?
The theory concerns equilibrium; do the data correspond
to equilibrium points?
The theory suggests no role for prices, P. Suppose we
include prices:
$$C = \alpha + \beta Y + \gamma I + \delta P.$$
Can we test whether $\delta = 0$?
Furthermore, economic relationships are never exact or
deterministic.
There will always be unknown factors that determine the
variable of interest.
In view of this we would rewrite the linear consumption
function as
$$C = \alpha + \beta Y + \gamma I + \epsilon,$$
where $\epsilon$ is a random disturbance that may be positive or
negative.
The presence of $\epsilon$ reflects the fact that there may be other
(unknown) factors affecting C which we treat as being
random.
We therefore consider stochastic (or random) relationships
rather than deterministic relationships.
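A stochastic relationship of this kind can be sketched by simulation. The Python snippet below is only an illustration: the parameter values, the ranges for Y and I, and the choice of a normal distribution for the disturbance are all assumptions made here, not part of the lecture.

```python
import random

# Hypothetical parameter values and data ranges, for illustration only.
ALPHA, BETA, GAMMA = 10.0, 0.8, -0.5
SIGMA = 2.0  # assumed standard deviation of the disturbance

rng = random.Random(114)  # fixed seed so the sketch is reproducible

observations = []
for _ in range(1000):
    y = rng.uniform(50.0, 150.0)    # income
    i = rng.uniform(1.0, 10.0)      # cost of borrowing
    eps = rng.gauss(0.0, SIGMA)     # random disturbance (normality is an assumption)
    deterministic = ALPHA + BETA * y + GAMMA * i
    observations.append((deterministic, eps, deterministic + eps))

# Count how often observed C lies above or below its deterministic part.
n_above = sum(1 for _, eps, _ in observations if eps > 0)
n_below = sum(1 for _, eps, _ in observations if eps < 0)
print(n_above, n_below)
```

Observed consumption lies above the deterministic part in some draws and below it in others, which is all that the disturbance term is meant to capture.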
All of these aspects (and more besides) have to be
considered in an econometric analysis of the data.
Unlike the physical sciences, it is not typically possible to
conduct experiments to quantify the effects of interest in
economics.
For example, we can't hold I constant so as to isolate the
effects of Y in order to determine $\beta$; in reality all variables
change!
Fortunately, however, we can use a statistical technique
called multiple regression analysis to estimate the
parameters of interest, such as $\beta$ and $\gamma$.
Much of econometrics uses multiple regression analysis in
one form or another; it can be regarded as the
econometrician's substitute for a controlled experiment.
Sample Correlations
In the first part of the module it was shown that the
direction of the (linear) association between two variables
can be measured by the covariance.
The strength of the association is measured via the
correlation coefficient.
The formulation of these statistics differs depending on
whether they are computed for the population or the
sample.
The definitions were given in Lectures 9 and 10.
As a reminder we have:
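Population Covariance
$$\mathrm{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$$
Population Correlation
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)}\sqrt{V(Y)}} = \frac{E(XY) - E(X)E(Y)}{\sqrt{V(X)}\sqrt{V(Y)}}$$
Sample Correlation
$$R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
Remember: $-1 \le \rho \le 1$ and $-1 \le R \le 1$.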
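The sample correlation formula translates directly into code. The following is a minimal Python sketch using only the standard library. The function name `sample_correlation` and the small data sets are made up for illustration, and the function assumes neither variable is constant (otherwise the denominator is zero).

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation R between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data: an exact increasing line, an exact decreasing line, and a noisy case.
r_up = sample_correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = sample_correlation([1, 2, 3, 4], [8, 6, 4, 2])
r_noisy = sample_correlation([1, 2, 3, 4], [2, 1, 4, 3])
print(r_up, r_down, r_noisy)
```

The two exact lines give the boundary values of R, while the noisy case falls strictly between them.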
As an example, consider the well-known macroeconomic
relation
MV = PG,
where M denotes money stock, V is the velocity of
circulation, P is the price level and G denotes GDP.
Defining k = P/V we can rewrite the equation as
M = kG.
Assuming k to be a positive constant this implies that M is
proportional to G, as in the following diagram:

If we could observe M and G we could calculate the
correlation between them and test to see whether it was
positive or not.
Table 9.1 in Thomas provides data for a cross-section of 30
countries in 1985.
A scatter diagram of the data is as follows:

There is a broadly increasing relationship between M and
G but the fact that the dots do not lie on a straight line
implies that the value of k is different across countries.
We can use the data to calculate the sample correlation.
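We find that, taking X = G and Y = M:
$$\sum (X - \bar{X})(Y - \bar{Y}) = 116.60, \quad \sum (X - \bar{X})^2 = 666.86, \quad \sum (Y - \bar{Y})^2 = 26.403,$$
and hence
$$R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{116.60}{\sqrt{666.86 \times 26.403}} = 0.8787,$$
which suggests a strong positive linear relation between M
and G.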
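The arithmetic can be checked in a few lines of Python. The three sums are the values quoted in the lecture; the variable names are chosen here for illustration.

```python
import math

# Sums of cross-products and squares reported in the lecture (X = G, Y = M).
s_xy = 116.60
s_xx = 666.86
s_yy = 26.403

# Sample correlation: cross-product sum over the square root of the product of squares.
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))
```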
However, R is a sample statistic, and we are really
interested in the population correlation, $\rho$.
In particular, is R sufficiently different from 0 that we can
say that $\rho$ is also different from 0?
Put another way, can we test
$$H_0: \rho = 0 \quad \text{against} \quad H_A: \rho > 0$$
at, say, the 5% level of significance?
The answer is: yes!
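Our test will be based on the statistic
$$TS = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}} \sim t_{n-2} \quad \text{under } H_0.$$
With n = 30 countries there are n - 2 = 28 degrees of freedom.
From the t-table we find that $t^{0.05}_{28} = 1.701$ and so the test
criterion is:
reject $H_0: \rho = 0$ if TS > 1.701
and reserve judgment otherwise.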
Substituting the values:
$$TS = \frac{0.8787\sqrt{28}}{\sqrt{1 - 0.8787^2}} = 9.74.$$
Hence TS = 9.74 > 1.701 and so we reject $H_0$ at the 5%
level of significance in favour of $H_A: \rho > 0$.
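The test can be reproduced with a short Python sketch. The critical value is hard-coded from the t-table figure quoted in the lecture rather than computed, and the variable names are illustrative.

```python
import math

r = 0.8787               # sample correlation from the lecture
n = 30                   # number of countries in the sample
critical_value = 1.701   # 5% one-sided critical value of t with 28 df, from the t-table

# Test statistic for H0: rho = 0, distributed as t with n - 2 df under H0.
ts = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = ts > critical_value
print(round(ts, 2), reject_h0)
```

Because the alternative is one-sided (a positive population correlation), only large positive values of the statistic count as evidence against the null hypothesis.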
Our result implies that M and G are positively related.
However, R does not imply anything about causality, so we
can't say that M grows because G grows.
It can be the other way around, or M and G can influence
each other, or the relation exists by chance, i.e. the relation
between M and G is spurious.
For example, the sample correlation between UK beer
prices and Japanese petrol consumption from the 1950s
to the 1990s is as high as 0.93, but there is no causal
mechanism; the high correlation exists by chance, or is
spurious.
Two-variable Regression
Regression analysis differs from correlation because:
1. An a priori assumption is made about the direction of
causality between two variables; and
2. An attempt is made to quantify the linear relationship
between the variables.
So, by writing M = f(G), we are assuming that M depends
on G and not vice versa.
Therefore, M is the dependent variable (or regressand)
and G is the explanatory variable (or regressor).
For consistency of notation we shall set:
Y : dependent variable;
X : explanatory variable.
In our example Y = M and X = G.
We assume that Y and X are linked by the population
regression equation which is a linear relationship:
$$E(Y) = \alpha + \beta X.$$
In this set-up:
E(Y): the expected demand for money of a country
with GDP of X;
$\alpha$, $\beta$: unknown population parameters;
$\alpha$: intercept;
$\beta$: slope, or gradient.
The actual demand for money, Y, of a country is not always
the same as the expected demand, E(Y).
The difference between the two is referred to as a
deviation, error or disturbance, which we represent with the
symbol $\epsilon$.
We then have
$$Y = E(Y) + \epsilon.$$
Recalling that $E(Y) = \alpha + \beta X$ this implies that
$$Y = \alpha + \beta X + \epsilon,$$
i.e. Y is linearly related to X but the relationship is subject
to a random disturbance $\epsilon$.
What does the disturbance $\epsilon$ actually represent?
All variables other than GDP that influence the demand for
money (which we are assuming to be quantitatively small,
otherwise we would need to allow for them explicitly);
Random variation in Y resulting from the basic
unpredictability of economic agents.
Even if GDP was the only variable influencing the demand
for money and even if GDP was identical in all 30
countries, we would still expect some variation in the
demand for money across countries.
The random disturbance, $\epsilon$, represents all such random
factors.
Disturbances can be either positive or negative.
If $\epsilon > 0$ then Y > E(Y) so that Y is above its expected value.
Alternatively, if $\epsilon < 0$ then Y < E(Y) and Y is below its
expected value.
Extending the notation:
n: sample size (n = 30 in the current example);
i: index for observations: i = 1, ..., n;
$Y_i$: demand for money per head in country i;
$X_i$: GDP per head of country i;
$\epsilon_i$: disturbance associated with country i.
The symbols X, Y and $\epsilon$ without subscripts are a general
shorthand for the variables they represent: GDP per head,
the demand for money per head and the disturbance.
When X, Y and $\epsilon$ appear with subscripts (e.g. $Y_8$ or $X_{10}$ or
$\epsilon_{12}$) they must be interpreted as numbers referring to, in
this case, either GDP, the demand for money or the
disturbance values for particular countries.
Subscripted variables therefore satisfy:
$$E(Y_i) = \alpha + \beta X_i, \quad i = 1, 2, 3, \ldots, n,$$
$$Y_i = \alpha + \beta X_i + \epsilon_i, \quad i = 1, 2, 3, \ldots, n.$$
Problem: the population parameters, $\alpha$ and $\beta$, and hence
the population regression line, are unknown.
We therefore estimate the population parameters using the
data.
The most common way to do this is to fit a straight line to
the scatter of points in Figure 9.1.
The result is the sample regression line, written
$$\hat{Y} = a + bX,$$
where a and b are the estimates of $\alpha$ and $\beta$, respectively,
and $\hat{Y}$ is the predicted value (or fitted value) of Y.

The dependent variable Y is represented on the vertical
axis and the independent variable X on the horizontal axis.
We can compare the population and sample regression
lines:

The population and sample regression lines are different
because a and b are only estimates of $\alpha$ and $\beta$.
We can calculate the predicted value of Y for any country
in our sample using
$$\hat{Y}_i = a + bX_i, \quad i = 1, \ldots, n.$$
For example, country 15 (Japan) has GDP per head of
$X_{15} = 10.9748$, and so
$$\hat{Y}_{15} = a + (b \times 10.9748).$$
The difference between the actual value of Y and the
predicted value is known as the residual:
$$Y_i = \hat{Y}_i + e_i$$
Actual = Predicted + Residual
Important: residuals and disturbances are different
quantities:
Disturbance: $\epsilon_i = Y_i - E(Y_i) = Y_i - \alpha - \beta X_i$,
Residual: $e_i = Y_i - \hat{Y}_i = Y_i - a - bX_i$.
Disturbances are the parts of the $Y_i$ that are not explained
by the population regression; they are unobservable.
Residuals are the parts of the $Y_i$ that are not explained by
the sample regression; they can be calculated using the
formula above.
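The distinction can be illustrated with a small Python sketch. Every number below is hypothetical: in real data the population intercept and slope are unknown, so the disturbance could not actually be computed. The sketch simply pretends both lines are known in order to compare the two quantities for a single observation.

```python
# Hypothetical values throughout: in practice alpha and beta are unknown,
# which is exactly why disturbances cannot be computed from real data.
alpha, beta = 0.5, 0.20   # assumed population intercept and slope
a, b = 0.3, 0.23          # assumed sample estimates of alpha and beta

x_i = 10.0                # assumed GDP per head for some country i
y_i = 2.9                 # assumed money demand per head for the same country

expected_y = alpha + beta * x_i   # E(Y_i), on the population regression line
predicted_y = a + b * x_i         # fitted value, on the sample regression line

disturbance = y_i - expected_y    # epsilon_i = Y_i - alpha - beta * X_i
residual = y_i - predicted_y      # e_i = Y_i - a - b * X_i

print(round(disturbance, 4), round(residual, 4))
```

With these made-up numbers the observation lies above both lines, so both quantities are positive, yet they differ because the two lines differ.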
Disturbances and residuals can be depicted as follows:

In the diagram $\epsilon_i > 0$ and $e_i > 0$ because $Y_i$ lies above both
the population and sample regression lines for the
corresponding value of $X_i$.
Summary
The purpose of Econometrics.
Sample correlations.
Two-variable regression.
Next week:
Ordinary least squares (OLS) estimation; goodness-of-fit.
