Econ107 Applied Econometrics


Topic 1: An Overview of Regression Analysis
(Studenmund, Chapter 1)
I. The Nature and Scope of Econometrics.
There are lots of definitions of econometrics.
Nobel Prize Committee
Paul Samuelson, et al.: "Econometrics may be defined as quantitative analysis of actual economic phenomena."
Goldberger: "... application of economic theory, mathematics and statistical inference to the analysis of economic phenomena."
(Joke) E.E. Leamer: "There are two things you don't want to see in the making: sausage and econometric research."

II. Major Uses of Econometrics.


1. Describing economic reality
2. Testing hypotheses about economic theory
3. Forecasting future economic activity

III. Econometric Methodology: Regression Analysis


An important methodology in econometrics is regression analysis, which typically follows the steps below. We use a famous example to illustrate.

1. State the hypotheses.

Keynes in the General Theory said a $1 increase in income will lead to less than
a $1 increase in overall consumption.

We want to test the hypothesis that the marginal propensity to consume (MPC) is less than 1.


2. Specify the mathematical model of the theory.

Keynes didn't specify the exact nature of the relationship, but we might suggest a simple linear relationship:
\[ C = \beta_0 + \beta_1 DI, \qquad 0 < \beta_1 < 1 \]
where C = aggregate consumption and DI = aggregate disposable income.


3. Specify the econometric model.

This purely mathematical model is uninteresting to the econometrician. It assumes an exact or deterministic relationship between C and DI. We re-write the equation with a disturbance or error term:
\[ C = \beta_0 + \beta_1 DI + \epsilon \]
This is now an econometric model, or more precisely a linear regression model.
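
To make the difference between the deterministic and the econometric model concrete, here is a minimal Python sketch. The coefficient values, the income levels and the normally distributed disturbance with standard deviation 500 are all assumptions chosen purely for illustration; none of them comes from real data.

```python
import random

# Illustrative values only: these are assumptions, not estimates from data.
BETA0, BETA1 = 336.9, 0.820
SIGMA = 500.0  # assumed standard deviation of the disturbance term


def deterministic_c(di):
    """Mathematical model: C is an exact function of DI."""
    return BETA0 + BETA1 * di


def stochastic_c(di, rng):
    """Econometric model: the same line plus a random disturbance."""
    return deterministic_c(di) + rng.gauss(0.0, SIGMA)


rng = random.Random(107)  # fixed seed so repeated runs give the same draws
for di in [40_000, 50_000, 60_000]:
    print(di, round(deterministic_c(di), 1), round(stochastic_c(di, rng), 1))
```

The deterministic function returns the same C every time for a given DI, while repeated calls to the stochastic version scatter around that value.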

4. Obtain the data.

The only way to estimate the parameters of interest in this model is to obtain the necessary data. The data could be time series, cross-sectional or panel data.
Time series data are collected over time for the same country or other single aggregate economic unit (e.g., aggregate C and DI could be obtained for Singapore from 1950-2000). In this case, we'd normally re-write the equation with a t subscript on the variables and disturbance term to denote time:
\[ C_t = \beta_0 + \beta_1 DI_t + \epsilon_t \]

Cross-sectional data are collected for a sample of individuals, households, firms or other disaggregate economic entities at a point in time (e.g., C and DI could be obtained for a sample of 1,000 Singapore families during 2000). In this case, we'd normally re-write the equation with an i subscript on the variables and disturbance term to denote the individual:
\[ C_i = \beta_0 + \beta_1 DI_i + \epsilon_i \]

Finally, panel data contain elements of both time series and cross-sectional data (e.g., C and DI could be obtained for all countries in the OECD during the period 1950-2000). Note that we have variation across countries at any single point in time, as well as variation across time. In this case, we'd normally re-write the equation with both an i and a t subscript on the variables and disturbance term to denote country and time:
\[ C_{it} = \beta_0 + \beta_1 DI_{it} + \epsilon_{it} \]

Time series or cross-sectional data could be plotted as a scatter diagram.
[Figure: scatter diagram]

5. Estimate the parameters in the econometric model.

Now it's time to estimate the coefficients in the model. The basic idea is to come up with the line that best fits the data points. Imagine that this regression analysis yields the following consumption function:
\[ \hat{C} = 336.9 + 0.820\,DI \]
These are the estimates of the 2 coefficients. The hat on C indicates that this is an estimated consumption function or regression model.
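
One common way to define the line that "best fits" the data is ordinary least squares (OLS), which minimises the sum of squared vertical distances between the points and the line. The notes have not named an estimation method at this point, so treat OLS here as an assumption. The sketch below applies it to five made-up observations; the data are invented for illustration and are not the data behind the estimates above.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up observations of disposable income and consumption (illustrative only).
di = [100.0, 200.0, 300.0, 400.0, 500.0]
c = [120.0, 200.0, 290.0, 360.0, 450.0]

b0_hat, b1_hat = ols_simple(di, c)
print(f"C_hat = {b0_hat:.1f} + {b1_hat:.3f} DI")
```

With these invented numbers the slope works out to 0.82 and the intercept to 38, so the fitted line has the same form as the consumption function above.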
6. Test the hypothesis.

Recall that we wanted to test Keynes' hypothesis that the MPC was between zero and 1. The estimate of 0.820 looks reasonable, but we are unsure whether there is any statistical evidence that it's below 1.
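
A point estimate alone cannot settle this; we also need a measure of its precision. The following sketch shows one conventional approach, a one-sided t-test of H0: MPC >= 1 against H1: MPC < 1, under the usual classical regression assumptions. The notes give no standard error for the 0.820 estimate, so the example uses five made-up observations; the data, the 5% significance level and the helper name `ols_with_se` are all assumptions for illustration.

```python
import math


def ols_with_se(x, y):
    """Return (intercept, slope, standard error of the slope) for simple OLS."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals, then the usual variance estimate with n - 2 d.f.
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ssr / (n - 2) / s_xx)
    return b0, b1, se_b1


# Made-up data (illustrative only).
di = [100.0, 200.0, 300.0, 400.0, 500.0]
c = [120.0, 200.0, 290.0, 360.0, 450.0]
b0, b1, se_b1 = ols_with_se(di, c)

# H0: beta1 >= 1 against H1: beta1 < 1 (one-sided test).
t_stat = (b1 - 1.0) / se_b1
T_CRIT = 2.353  # one-tailed 5% critical value from a t table, 3 degrees of freedom
reject_h0 = t_stat < -T_CRIT
print(round(b1, 3), round(se_b1, 4), round(t_stat, 2), reject_h0)
```

If the t-statistic falls below the negative critical value, the data reject the hypothesis that the MPC is 1 or more in favour of an MPC below 1.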

7. Forecast or predict economic behaviour.

One of the other uses of this model is forecasting or predicting future economic behaviour. To predict C, however, we need to know future values of DI. Suppose you know that DI is going to be $65,000 (millions).
\[ \hat{C} = 336.9 + 0.820(65{,}000) = 53{,}636.9 \]
This also allows you to predict savings of $11,363.1 (millions). This is just the difference between DI and C.
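
The forecast arithmetic can be checked with a few lines of Python. The function name `predict_consumption` is a hypothetical helper introduced here; the coefficients and the DI value are the ones used in the example above.

```python
def predict_consumption(di, b0=336.9, b1=0.820):
    """Predicted consumption from the estimated consumption function."""
    return b0 + b1 * di


di_forecast = 65_000.0  # forecast disposable income, in $ millions
c_hat = predict_consumption(di_forecast)
savings = di_forecast - c_hat  # savings = DI - C

print(round(c_hat, 1), round(savings, 1))
```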

8. Use the model for policy purposes.

The model can also be used for control purposes. Suppose that C of 53.6 billion is insufficient to maintain full employment: there is not enough spending by households. The government could consider increasing DI through tax cuts to achieve a higher target. Suppose 62 billion is needed.
\[ 62{,}000 = 336.9 + 0.820\,DI \quad\Rightarrow\quad DI = 75{,}198.9 \]
Thus, taxes need to be cut by just over $10 billion from forecasted levels.
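
The policy calculation simply inverts the estimated consumption function. The sketch below reproduces it, assuming (as the example implicitly does) that each $1 of tax cut raises DI by $1; `required_income` is a hypothetical helper name.

```python
def required_income(target_c, b0=336.9, b1=0.820):
    """Solve target_c = b0 + b1 * DI for DI."""
    return (target_c - b0) / b1


di_forecast = 65_000.0  # forecast DI, in $ millions
di_needed = required_income(62_000.0)  # DI needed to reach the target C
tax_cut = di_needed - di_forecast  # assumes a $1 tax cut adds $1 to DI

print(round(di_needed, 1), round(tax_cut, 1))
```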

IV. Types of Econometrics and Names of Variables in Regression


Econometrics splits into theoretical and applied fields. We end up straddling these 2 approaches. Theoretical econometrics concerns the development of basic estimation approaches, properties of estimators, etc. It is more closely related to mathematical statistics (e.g., proofs, axioms, ...).
Applied econometrics is built on this theoretical foundation. It applies estimation techniques to various areas of economic enquiry. Examples: Where to open a new restaurant? How much to spend on advertising? Should we fix the target interest rate? How many hours to spend studying for Econ107? Academics and the private and government sectors have increasingly used econometrics.

Regression analysis is the study of the relationship between a Dependent Variable and one or more Independent or Explanatory Variables.
In the linear regression model (or true regression line or population regression function)
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_K X_{Ki} + \epsilon_i \]
\(Y_i\) is called the dependent or left-hand-side variable or regressand and is random; \(X_{ki}\) (k = 1, ..., K) is called an independent or explanatory or right-hand-side variable or regressor, and can be fixed or random; \(\epsilon_i\) is called the error or disturbance term and is random; the \(\beta\)s are called regression coefficients, and are unknown and fixed; \(\beta_0\) is the intercept coefficient; \(\beta_k\) (k = 1, ..., K) are the slope coefficients. The meaning of \(\beta_1\) is the impact of a one unit increase in \(X_1\) on Y, holding constant the other independent variables.
The estimated regression line (or sample regression function) is written as
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_K X_{Ki} \]
\(\hat{Y}_i\) is called the estimated or fitted value of \(Y_i\); \(\hat{\beta}_k\) (k = 0, ..., K) is called an estimated regression coefficient. Define \(e_i = Y_i - \hat{Y}_i\) and call \(e_i\) the residual.

When K=1, the regression model is a Simple Linear Regression (SLR) model.
When K>1, the regression model is a Multiple Linear Regression (MLR) model.
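
The definitions of fitted value, residual and slope coefficient can be illustrated with a short sketch for K=2. The estimated coefficients and the observation below are hypothetical numbers picked for easy arithmetic, and `fitted_value` is a helper name introduced only for this example.

```python
def fitted_value(coefs, x):
    """Fitted value for coefs = [b0, b1, ..., bK] and regressors x = [X1, ..., XK]."""
    return coefs[0] + sum(b * xk for b, xk in zip(coefs[1:], x))


coefs = [2.0, 0.5, -1.5]  # hypothetical estimates of beta0, beta1, beta2
x_obs = [10.0, 4.0]       # hypothetical values of X1 and X2 for one observation
y_obs = 1.8               # hypothetical actual value of Y

y_hat = fitted_value(coefs, x_obs)
residual = y_obs - y_hat  # e = Y - Y_hat

# Raise X1 by one unit while holding X2 constant: the fitted value moves by b1.
effect_x1 = fitted_value(coefs, [x_obs[0] + 1.0, x_obs[1]]) - y_hat

print(y_hat, round(residual, 2), effect_x1)
```

The last calculation is the "holding constant the other independent variables" interpretation in numerical form: only X1 changes, and the fitted value changes by exactly the first slope coefficient.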

V. Statistical vs. Deterministic Relationships


Regression analysis is concerned with a Statistical, not a Functional or
Deterministic dependence among variables. In statistical relationships, the
variables are Random or Stochastic.

VI. Regression vs. Causation


Although regression analysis deals with the dependence of one variable on other variables, it doesn't necessarily imply causation. A causal relationship must come from outside of statistics. Economic theory is supposed to provide the compelling evidence of causation.

VII. The True (or Population) Regression Function (PRF)


Suppose we have a small community of 12 families. We're interested in studying the relationship between their weekly disposable income (X) and expenditure on food (Y). We want to predict the population mean of food expenditures, given some level of family income.
The 12 families can be grouped into four income groups. Each family within a
group has the same disposable income. This is the entire population, not a sample.
Disposable    Individual Food                Average Food
Income (X)    Expenditures (Y)               Expenditures

250           78.00, 88.50, 96.00            87.50
300           77.50, 89.00, 96.50, 109.00    93.00
350           90.50, 106.50                  98.50
400           99.00, 103.00, 110.00          104.00

These data points can be plotted on a diagram, often known as a Scatter Diagram.
[Figure: scatter diagram of food expenditures (Y) against disposable income (X)]
The solid dots are the actual observations. Now the Conditional Mean or Conditional Expectation is
\[ E(Y | X = X_i) \]
The circles are the conditional means. Clearly, food expenditures on average increase with disposable income.
This can be seen even more clearly by connecting these conditional means with a straight line. This is the True (or Population) Regression Line. Note that it could also be a True (or Population) Regression Curve.
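
The conditional means can be recomputed directly from the 12 observations. The sketch below does so and then checks whether they fall on a straight line by comparing the slopes between neighbouring income groups. The variable names are illustrative; the data are exactly those in the table.

```python
# Population from the table: income level -> food expenditure of each family.
population = {
    250: [78.00, 88.50, 96.00],
    300: [77.50, 89.00, 96.50, 109.00],
    350: [90.50, 106.50],
    400: [99.00, 103.00, 110.00],
}

# Conditional mean E(Y | X = x) for each income level.
cond_mean = {x: sum(ys) / len(ys) for x, ys in population.items()}

# Slope between neighbouring conditional means; equal slopes mean a straight line.
levels = sorted(cond_mean)
slopes = [
    (cond_mean[b] - cond_mean[a]) / (b - a)
    for a, b in zip(levels, levels[1:])
]
intercept = cond_mean[levels[0]] - slopes[0] * levels[0]

print(cond_mean)
print([round(s, 4) for s in slopes], round(intercept, 4))
```

Every step of 50 in income raises the conditional mean by 5.50, so in this population the conditional means lie exactly on the line E(Y | X) = 60 + 0.11X.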

Geometrically, a population regression line or curve is simply the locus of the conditional means or expectations of the dependent variable for fixed values of the explanatory variable(s).
In general, we could write the Population Regression Function (PRF) as:
\[ E(Y | X_i) = f(X_i) \]
where f is some function of the explanatory variable.

We might anticipate that food consumption will be linearly related to disposable income. This is an initial assumption of our estimation. We could narrow this functional form to:
\[ E(Y | X_i) = \beta_0 + \beta_1 X_i \]
This is known as the linear PRF (or PR Line).

VIII. Linearity in Regression Analysis


What do we mean when we say that our regression model is linear? One possibility is that the model is nonlinear in terms of the variables:
\[ E(Y | X_i) = \beta_0 + \beta_1 X_i^2 \]

The second possibility is that the PRF is nonlinear in terms of the coefficients:
\[ E(Y | X_i) = \beta_0 + \sqrt{\beta_1}\, X_i \]

Regression functions that are nonlinear in the coefficients will not be considered in this paper, but those that are nonlinear only in the variables will be. From now on, "linear regression model" should be read as linear in terms of the parameters.
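
A model that is nonlinear in the variables but linear in the parameters can still be fitted with the ordinary linear machinery: transform the regressor first, then fit a straight line. The sketch below assumes OLS as the fitting method and uses made-up data generated exactly from Y = 1 + 2X² with no disturbance, so the fit should recover those two coefficients.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Made-up data lying exactly on Y = 1 + 2 * X**2 (illustrative only).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi ** 2 for xi in x]

# Define Z = X**2. The model Y = b0 + b1 * Z is linear in b0 and b1.
z = [xi ** 2 for xi in x]
b0_hat, b1_hat = ols_simple(z, y)

print(b0_hat, b1_hat)
```

No such relabelling works when the nonlinearity is in the coefficients themselves, which is why those models need different estimation methods.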

IX. Adding the Disturbance Term to Our PRF


The PRF tells us the 'average' food expenditures for a given level of household income. But we know that any 'particular' household is unlikely to be on this function. For this reason we rewrite the PRF as
\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]
where \(\epsilon_i\) is a random variable with mean 0. There are lots of reasons why \(\epsilon_i\) might exist:

1. Minor influences on Y are omitted.
2. The underlying theoretical equation might have a different functional form than the one chosen for the regression.
3. Some purely random variations are always there.
4. Measurement error in Y or X.
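
Because the PRF passes through the conditional means, each family's disturbance is its food expenditure minus the mean for its income group. The sketch below computes these disturbances for the 12-family population from Section VII and confirms that they average zero within every income group. The variable names are illustrative.

```python
# Population from Section VII: income level -> food expenditure of each family.
population = {
    250: [78.00, 88.50, 96.00],
    300: [77.50, 89.00, 96.50, 109.00],
    350: [90.50, 106.50],
    400: [99.00, 103.00, 110.00],
}

# Disturbance = actual Y minus the conditional mean E(Y | X) of its group.
disturbances = {}
for income, expenditures in population.items():
    group_mean = sum(expenditures) / len(expenditures)
    disturbances[income] = [y - group_mean for y in expenditures]

for income, eps in disturbances.items():
    print(income, eps, sum(eps) / len(eps))
```

Individual disturbances are both positive and negative, but their mean is zero at each income level, which is what "a random variable with mean 0" requires.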

X. The Sample (Estimated) Regression Function


Thus far, we've dealt with the entire population and the PRF, and avoided any consideration of sampling. In most cases, we will never observe the entire population. We have to infer from a sample or samples what the PRF might look like. Note that we're unlikely to know just how close we get to the truth.
Each sample we draw can be used to produce a Sample (Estimated) Regression Function (SRF), that is, the estimated regression function:
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \]

Of course, we can replace the fitted value (\(\hat{Y}_i\)) on the left-hand side with the actual value of the dependent variable (\(Y_i\)). The LHS is then no longer an estimate; it's the actual value. The RHS now includes the Residual term \(e_i\):
\[ Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i \]
This means that the actual dependent variable can be decomposed into its fitted value and the residual:
\[ Y_i = \hat{Y}_i + e_i \]

This residual, like the disturbance, can be either positive or negative. We can either overestimate the true value of \(Y_i\):
\[ Y_i - \hat{Y}_i = e_i < 0 \quad \text{if } Y_i < \hat{Y}_i \]
or underestimate it:
\[ Y_i - \hat{Y}_i = e_i > 0 \quad \text{if } Y_i > \hat{Y}_i \]
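
To see an SRF differ from the PRF, the sketch below takes an arbitrary sample of 4 of the 12 families from Section VII (one per income level), fits a line to it, and decomposes each actual value into fitted value plus residual. OLS is assumed as the fitting method, and the particular families chosen are an arbitrary illustration, not a sample used in the notes.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Arbitrary sample of 4 families: (disposable income X, food expenditure Y).
sample = [(250, 78.00), (300, 96.50), (350, 106.50), (400, 99.00)]
xs = [x for x, _ in sample]
ys = [y for _, y in sample]

b0_hat, b1_hat = ols_simple(xs, ys)
fitted = [b0_hat + b1_hat * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]

print(round(b0_hat, 2), round(b1_hat, 3))
for x, y, y_hat, e in zip(xs, ys, fitted, residuals):
    # A negative residual means the fitted value overestimates the actual Y.
    verdict = "overestimate" if e < 0 else "underestimate"
    print(x, y, round(y_hat, 2), round(e, 2), verdict)
```

This sample gives a slope of 0.146, whereas the population conditional means in the table rise by 5.50 per 50 of income, a slope of 0.11. A different sample would give yet another SRF, which is the sense in which we rarely know how close we are to the truth.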

XI. Questions for discussion: Q1.10


XII. Run the height regression (Section 1.4) using the data file provided.
Do further exploration according to Q1.4 and Q1.5.
