
ECONOMETRICS: an introduction

Edward Omey

HUB - Stormstraat 2 - 1000 Brussel


e-mail: edward.omey@hubrussel.be
web: www.edwardomey.com

Academic year 2010 - 2011

1. What is econometrics?
2. The method of least squares
3. Selection of variables
4. The basic assumptions
5. Checking the basic assumptions
6. Making predictions
7. Some references
Together with this text there is an example file called
'EconometricsExampleText.xls'.
There is also a file called
'BBA3EconometricsAndExcel',
and there is a table with critical values of the Kolmogorov-Smirnov test.
All of these are available in "Teaching material BBA3" on www.edwardomey.com.

1 What is econometrics?
1.1 Introduction
Econometrics can be seen as scientific research in which we complete the
results of economic assumptions and theories with quantitative information
that is based on real data.
In economic theory, usually there is a qualitative analysis based on some
reasonable assumptions. As an example, in economic theory one argues that
an increase in the price of a product will result in a decrease of the demand
of that product. In econometrics we want to quantify this theoretical result
to know what will be the resulting demand if we increase the price by, for
example, 1 euro.
In mathematical economics one tries to present economic theory in a
mathematical, formal way. In econometrics we want to verify these mathe-
matical models by using real data. As a result we can eventually decide that
the theory is a good or a bad representation of the real world.
Economic statisticians usually collect data and present them in various
ways. Usually these data form the basis of further statistical analysis.

1.2 Econometric approach


In general, the econometric approach can be represented in the following
table. The table consists of 5 levels and 3 main types of input.

      I (economics)                  II (data)      III (methodology)
1     economic theory                facts          mathematics & probability
2     economic model                 data           statistics
3     econometric model              worked data    econometric methods
4     operational econometric model
5     verification                   prediction     evaluation

We briefly discuss the entries of this table.

1.3 Input from economics


In economics, people develop "rules" about how people, companies and states
behave. Usually this analysis is qualitative and describes the influence of -
resp. the causal relations between - one group of variables on other groups
of variables.
Sometimes one uses economic models which give a "simple" representation
of real life.
Example. Keynes examined the relations between consumption and sav-
ings and he found that "people tend to spend more when income goes up".
A simple model is the following model: C = a + bY , 0 < b < 1, where
C = consumption and Y = income.
This formula is an example of a mathematical model. It consists of 2
variables and 1 relation. On a graph the model corresponds to a straight line
with slope b. The graph intersects the vertical axis at a.
In econometric models it is important to answer the following questions:
- how many variables are we going to include in a model?
- which variables are we going to include in a model?
- how many relations and which relations are we going to include in the
model?
Example
In studying the income Y of individual workers, economic theory leads to
a formal relation of the form
Y = f(age, sex, level of education, years of experience, knowledge
of languages, ...).
To formulate an econometric model it is necessary to know how many
variables and which variables to include, and to specify the relation
Y = f(...).

1.4 Input from data


In all econometric analysis we need data. In some cases we have to collect
raw data, and in other cases we can use data that have been collected by
others, such as statistical institutions.

1.4.1 There are different types of data


Usually one considers the following 4 types of data.
a) Qualitative - the data cannot be represented by meaningful numbers.
a1) Nominal data: we use groups or categories depending on a qualitative
property.
Examples: the colour of hair; the sex of people; the type of car, ...

a2) Ordinal data: these are nominal data which can be sorted in a mean-
ingful way
Examples: the stars of hotels; the size of shoes; the highest diploma;...
b) Quantitative - the data can be represented by meaningful numbers
b1) Interval data
b2) Ratio data
Ratio data are based on real variables that have a natural zero. The
number zero means that the characteristic is absent.
Examples: income (my income is zero); points (I obtained zero points);
revenue (I sold zero products); ...
When we deal with ratio data, the words "double" or "half" make sense.
In dealing with interval data we have a variable without a natural zero
and it doesn't make sense to use the words "double" or "half".
Typical examples are: degrees in °C or in °F; all IQ scales.

1.4.2 Dummy variables
In many econometric models we have to deal with qualitative variables. In
order to include such variables in a mathematical model it will be useful to
quantify these qualitative variables. To this end we are going to use 1 or
more dummy variables. A dummy variable is a variable that can take on
only 2 values: 1 or 0.
Example. When we have a qualitative variable with 2 categories, we use
1 dummy variable. Suppose that we have to deal with the sex of people: we
can define D = 1 if the person is female, and D = 0 if the person is
male.
Example. When we have a qualitative variable with 3 categories, we use
2 dummies. Suppose that we want to model the colours at a traffic light. We
can define
D(1) = 1 if the light is red, D(1) = 0 otherwise;
D(2) = 1 if the light is green, D(2) = 0 otherwise.
In this case we find the following possibilities:

D(1)  D(2)  result
1     0     red
0     1     green
0     0     orange
1     1     impossible
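As an illustration, the following minimal sketch (in Python; the function name is ours, purely for illustration) encodes the traffic-light example above:

# Two dummies for a qualitative variable with 3 categories;
# "orange" is the reference category (D(1) = D(2) = 0).
def encode_light(colour):
    d1 = 1 if colour == "red" else 0     # D(1)
    d2 = 1 if colour == "green" else 0   # D(2)
    return d1, d2

for colour in ("red", "green", "orange"):
    print(colour, encode_light(colour))  # (1, 0), (0, 1), (0, 0)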

1.4.3 Quality of data
Many companies and government organisations collect data and construct
(huge) databases. Some of these are freely available on the internet.
When using these data one has to be careful about their quality. In
many cases the published data suggest a higher precision than is present in
reality. One often publishes rounded numbers or (weekly, monthly, ...)
averages. Some sources always round down and others round in another way.
In collecting data one often has problems with validity: do the data
measure what we really want to measure? For one variable it is often the
case that people use several proxies. To measure - for example - poverty in
a country, one can use several variables:
- the Gini coefficient
- the proportion of people living below the poverty line
- the ratio Q(3)/Q(1)
etc.

1.4.4 Reworking data


Sometimes we have to modify the data before using them in a model.
Examples of modifications are:
- replacing nominal values by real values
- calculating weekly mean values from daily values
- calculating relative changes and/or proportions
- transforming variables to increase correlation coefficients
etc.

1.4.5 Time-series versus cross-sections


In general we can distinguish two types of analysis in econometrics.
In a time series we measure variables at different moments in time (daily,
weekly, yearly, ...).
In a cross section we measure variables at the same moment.
Example
- We study the monthly unemployment rate in Belgium and use (for
example) the monthly inflation, the monthly interest rate, the monthly
relative change in the working population, ...

- We study the 2010 unemployment rate in the different countries of
the EU and we use (for example) the mean income, the interest rate, the
population density, the ratio new companies/failed companies, etc.

1.5 Input from methodology


In our analysis we are going to need some techniques from mathematics,
probability theory and statistics.

1.5.1 Stochastic models


In econometrics we often deal with relations of the form

Y = f(X, Z, U, V, ...),

where Y is the dependent variable and X, Z, U, V, ... are independent or
explanatory variables. Because it is very plausible that we "forgot" 1 or
more variables, one usually adds an error term to the formula:

Y = f(X, Z, U, V, ε).

This error term summarizes all errors that can occur:


- measurement errors
- observation errors
- rounding of data
- variables that we "forgot" in our analysis (because we didn’t take them
into account or because we didn’t want to take them into account)
- irrelevant variables that we included in our analysis
- stochastic behaviour: in all social, biological, ... environments there is
some indeterminism present. Under the same conditions, the sales can be
different from day to day because of random decisions of people
- etc.

1.5.2 Linear models


In this course we consider only linear models. These are mathematical
relations that are linear in the parameters.
Examples
* Y = a + bX

* Y = a + bX + cX²
* Y = a + bX + cZ + d·exp(X)
* Y = a + b·log(X) + c·log(Z)
etc.

1.5.3 "Good" models


We will use the KISS principle in econometrics! We consider a model a "good"
model if
* it is simple: a model is stronger if it explains a lot by only a few variables;
* it is theoretically consistent: if from the theoretical point of view we
expect that a variable has a positive influence on Y, we expect that the
model also reflects this positive effect;
* it has predictive power: we want the predictions made by the model
to be "close" to reality;
* it has explanatory power: we want the results of the model to be close to
the data that we used to obtain the model.

1.5.4 Econometric methods


Model: In general we are going to use models of the form

Y = a + bX + cZ + dU + eV + ... + ε
  = f(X, Z, U, V, ...; a, b, c, ...; ε).

Here
* Y is the variable that we want to explain;
* X, Z, U, V, ... are the explanatory variables, the variables that we use to
explain Y;
* a, b, c, d, ... are the parameters in the model;
* ε is the (stochastic) error term.

Data. To estimate the parameters, we need data about Y, X, Z, U, V, ...
We suppose that we have n data points:
* Y_1, X_1, Z_1, U_1, V_1
* Y_2, X_2, Z_2, U_2, V_2
...
* Y_n, X_n, Z_n, U_n, V_n.

By definition, we assume that each of these data points satisfies the
"formula":

Y_i = f(X_i, Z_i, U_i, V_i, ...; a, b, c, ...; ε_i), for i = 1, 2, ..., n.

We assume that the parameters are independent of the index i but that the
error term may be different for different values of the index.

Method
In the formula

Y_i = f(X_i, Z_i, U_i, V_i, ...; a, b, c, ...; ε_i), for i = 1, 2, ..., n,

we know the values of Y_i, X_i, Z_i, ... but not the values of a, b, c, ...
and ε_i. Hoping that ε_i is "small", we delete ε_i and we approximate Y_i
by Ŷ_i:

Ŷ_i = f(X_i, Z_i, U_i, V_i, ...; a, b, c, ...), for i = 1, 2, ..., n.

The approximation error that we make will be denoted by e_i:

e_i = Y_i - Ŷ_i, for i = 1, 2, ..., n.


We want the errors to be "as small as possible". In order to do this, we can
define the "global" error in different ways (all sums run over i = 1, ..., n).
We can take for example

mean error: ē = (1/n)·Σ e_i,

or

AD = absolute deviation = Σ |e_i|,

or

SSE = sum of squared errors = Σ e_i².

1.5.5 Least squares method
A popular and attractive method in econometrics is the least squares
method (LSE). In this method we estimate the parameters by solving the
following mathematical problem:

min_{a,b,c,...} SSE.

We consider a model to be the best model if its sum of squared errors is
smaller than that of any other model.
Example. Let us consider the following data:

X Y
3 10
6 15
9 30
12 35
15 25
18 30
21 45
24 45

We consider the following 2 models:

Model 1: Ŷ = 8 + 1.6·X
Model 2: Ŷ = 8.5 + 1.5·X

Using these formulas, we find the following results:

X    Y    Model 1   errors model 1   Model 2   errors model 2
3    10   12.8      -2.8             13        -3
6    15   17.6      -2.6             17.5      -2.5
9    30   22.4       7.6             22         8
12   35   27.2       7.8             26.5       8.5
15   25   32        -7               31        -6
18   30   36.8      -6.8             35.5      -5.5
21   45   41.6       3.4             40         5
24   45   46.4      -1.4             44.5       0.5

Simple calculations show the following results:

       Model 1   Model 2
ē      -0.225     0.625
AD      39.4      39
SSE     241.96    243

If we use AD, then model 2 performs better than model 1. If we use SSE,
then model 1 performs better than model 2.
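The following short Python sketch (ours, for illustration) reproduces these error measures from the data and the two model formulas above:

# Data and the two candidate models from the text.
X = [3, 6, 9, 12, 15, 18, 21, 24]
Y = [10, 15, 30, 35, 25, 30, 45, 45]
models = {"Model 1": lambda x: 8 + 1.6 * x,
          "Model 2": lambda x: 8.5 + 1.5 * x}

for name, f in models.items():
    e = [y - f(x) for x, y in zip(X, Y)]   # errors e_i = Y_i - Yhat_i
    mean_e = sum(e) / len(e)               # mean error
    AD = sum(abs(v) for v in e)            # absolute deviation
    SSE = sum(v * v for v in e)            # sum of squared errors
    print(name, round(mean_e, 3), round(AD, 1), round(SSE, 2))
# Model 1: -0.225, 39.4, 241.96;  Model 2: 0.625, 39.0, 243.0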

2 The method of least squares
2.1 Introduction
Recall from the previous chapter that we are going to use the method of least
squares. To this end we need a model and data!
Model
In general we are going to use models of the form

Y = a + bX + cZ + dU + eV + ... + ε.

Data
To estimate the parameters, we need data about Y, X, Z, U, V, ... We
suppose that we have n data points:
* Y_1, X_1, Z_1, U_1, V_1
* Y_2, X_2, Z_2, U_2, V_2
...
* Y_n, X_n, Z_n, U_n, V_n.
By definition, we have

Y_i = a + bX_i + cZ_i + dU_i + ... + ε_i, for i = 1, 2, ..., n.

We assume that the parameters are independent of the index i but that the
error term may be different for different values of the index.

Method
We delete ε_i and find the following approximation:

Ŷ_i = a + bX_i + cZ_i + dU_i + ..., for i = 1, 2, ..., n.

The approximation error that we make is e_i:

e_i = Y_i - Ŷ_i, for i = 1, 2, ..., n.

The "global" error is given by SSE:

SSE = sum of squared errors = Σ e_i².

The least squares method (LSE): we estimate the parameters by solving
the following mathematical problem:

min_{a,b,c,...} SSE.

We consider a model to be the best model if its sum of squared errors is
smaller than that of any other model.

2.2 Example 1: the simple model Y = a + bX + ε


2.2.1 The least squares method
In the simplest model we have only 1 explanatory variable (X) and 2
parameters (a, b). Now we have:
Model: Y = a + bX + ε;
Data: Y_1, X_1; Y_2, X_2; ...; Y_n, X_n and Y_i = a + bX_i + ε_i;
Approximation: Ŷ_i = a + bX_i;
Approximation error: e_i = Y_i - Ŷ_i = Y_i - a - bX_i;
Global error: SSE = Σ e_i² = Σ (Y_i - a - bX_i)²;
LSE: we have to solve the problem

min_{a,b} SSE = min_{a,b} Σ (Y_i - a - bX_i)².

Solution (details are here for completeness)

To solve the problem, we rewrite SSE as follows: we insert X̄ and Ȳ in
the expression for SSE and we find

SSE = Σ [ (Y_i - Ȳ) - b(X_i - X̄) + (Ȳ - a - bX̄) ]²
    = Σ (Y_i - Ȳ)² + b²·Σ (X_i - X̄)² + Σ (Ȳ - a - bX̄)²
      - 2b·Σ (Y_i - Ȳ)(X_i - X̄)
      + 2·Σ (Y_i - Ȳ)(Ȳ - a - bX̄)
      - 2b·Σ (X_i - X̄)(Ȳ - a - bX̄).

For the last two terms, we find

Σ (Y_i - Ȳ)(Ȳ - a - bX̄) = (Ȳ - a - bX̄)·Σ (Y_i - Ȳ) = 0,
Σ (X_i - X̄)(Ȳ - a - bX̄) = (Ȳ - a - bX̄)·Σ (X_i - X̄) = 0.

Simplifying, we find that

SSE = Σ (Y_i - Ȳ)² + b²·Σ (X_i - X̄)² + n(Ȳ - a - bX̄)²
      - 2b·Σ (Y_i - Ȳ)(X_i - X̄).

For the other terms, we use the following notations:

V(Y) = Σ (Y_i - Ȳ)²  and  s²(Y) = V(Y)/(n - 1),
V(X) = Σ (X_i - X̄)²  and  s²(X) = V(X)/(n - 1),
V(X,Y) = Σ (Y_i - Ȳ)(X_i - X̄)  and  Cov(X,Y) = V(X,Y)/(n - 1).

We call V(Y) the variation of Y and V(X,Y) the covariation of X and
Y. It is clear that this is closely related to covariance and correlation:

Cov(X,Y) = (1/(n - 1))·Σ (Y_i - Ȳ)(X_i - X̄),
r(X,Y) = Cov(X,Y)/(s(X)·s(Y)) = V(X,Y)/√(V(X)·V(Y)).

Using these new notations, we have calculated that

SSE = V(Y) + b²·V(X) - 2b·V(X,Y) + n(Ȳ - a - bX̄)².

Note that the last term in this expression is always zero or positive, for
all values of a, b.
To omit any contribution of this term, we impose the condition that this
term is zero:

Ȳ - a - bX̄ = 0,  or equivalently  Ȳ = a + bX̄.

We can simplify SSE and find

SSE = V(Y) + b²·V(X) - 2b·V(X,Y).

This is a quadratic relation in b and from mathematics we know that
SSE is a convex parabola.
(See e.g. at the website: http://en.wikipedia.org/wiki/Parabola)
This parabola reaches its minimum at the value

b = V(X,Y)/V(X),

and the value at this minimum is given by

SSE = V(Y) - V²(X,Y)/V(X).

As a conclusion, we have solved the mathematical problem. The optimal
values of a and b will be denoted by â and b̂. We find

b̂ = V(X,Y)/V(X) = Cov(X,Y)/s²(X),
â = Ȳ - b̂·X̄,
Ŷ = â + b̂·X.

The expressions for â and b̂ are called the least squares estimates for a
and b.
The final result Ŷ = â + b̂·X is called the regression (line) of Y on X.

2.2.2 Some important remarks


In order to make the calculations, we have to assume that s²(X) ≠ 0!
If s²(X) = 0, it means that we use a variable X that is constant.

The regression line is given by Ŷ = â + b̂·X. Since Ȳ = â + b̂·X̄, it
follows that (X̄, Ȳ) is a point on the regression line.

For the errors we find

e_i = Y_i - Ŷ_i = Y_i - â - b̂·X_i.

It follows that the mean error is zero:

ē = Ȳ - â - b̂·X̄ = 0.

For the variation of the errors V(e), we find

V(e) = Σ (e_i - ē)² = Σ e_i² = SSE.

Recall that

V(e) = SSE = V(Y) - V²(X,Y)/V(X).

For the variation of Ŷ, we find that

V(Ŷ) = V(â + b̂·X) = b̂²·V(X) = V²(X,Y)/V(X).

Combining these two formulas, we find that

V(e) = V(Y) - V(Ŷ),

or

V(Y) = V(Ŷ) + V(e).

For this simple model the variation of Y can be divided into two parts:
the total variation is the sum of the variation explained by the model
(V(Ŷ)) and the variation that cannot be explained by the model (V(e)).

Variations: we use the following notations:

SST = V(Y) = the variation of Y,
SSR = V(Ŷ) = the variation of Ŷ,
SSE = V(e) = the variation of e.

The previous remark shows that SST = SSR + SSE.

2.2.3 Numerical example


We take the example of the previous section. We have the following data:

X    Y
3    10
6    15
9    30
12   35
15   25
18   30
21   45
24   45

and we consider the following model: Y = a + bX + ε.
In this example we find:
X̄ = 13.5        V(X) = 378       s²(X) = 378/8
Ȳ = 29.375      V(Y) ≈ 1121.9    s²(Y) ≈ 1121.9/8
V(X,Y) = 577.5                   Cov(X,Y) = 577.5/8
The least squares estimators are given by

b̂ = Cov(X,Y)/s²(X) = 1.5278,
â = Ȳ - b̂·X̄ = 8.75,

and the regression line of Y on X is given by

Ŷ = 8.75 + 1.5278·X.

For the model we find the following errors and estimates (we rounded the
numbers):

X    Y    Ŷ         e         e²
3    10   13.3334   -3.3334   11.11
6    15   17.9168   -2.9168    8.51
9    30   22.5002    7.4998   56.25
12   35   27.0836    7.9164   62.67
15   25   31.6670   -6.6670   44.45
18   30   36.2504   -6.2504   39.07
21   45   40.8338    4.1662   17.36
24   45   45.4172   -0.4172    0.17

One can verify that ē = 0 and that V(Ŷ) ≈ 882.3, V(e) ≈ 239.6.
Note that V(Y) ≈ 1121.9 = 882.3 + 239.6.
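A small Python sketch (ours) reproduces these estimates, using the variation and covariation as defined above:

X = [3, 6, 9, 12, 15, 18, 21, 24]
Y = [10, 15, 30, 35, 25, 30, 45, 45]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
VX  = sum((x - xbar) ** 2 for x in X)                     # V(X) = 378
VXY = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))  # V(X,Y) = 577.5

b = VXY / VX          # bhat = 577.5/378 = 1.5278
a = ybar - b * xbar   # ahat = 29.375 - 1.5278*13.5 = 8.75
print(a, b)           # regression line: Yhat = 8.75 + 1.5278 X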

2.2.4 The quality of the model
From the mathematical point of view, if s²(X) ≠ 0, we can always calculate
the regression of Y on X. In this section we briefly discuss how to measure
the quality of the model: we want to measure how well the model fits
the data.

Correlation coefficient  As a first indicator we can calculate the
correlation coefficient r(Y, Ŷ). For a perfect model we find r(Y, Ŷ) = 1. In
econometrics it is common to use r²(Y, Ŷ). We call r²(Y, Ŷ) the R² value of
the model. Alternatively, R² is also called the determination coefficient:
to what extent is Y determined by Ŷ. We always have 0 ≤ R² ≤ 1, and the
ideal model has R² = 1 or R² = 100%.
In our example we find R² = 78.6%.

Scatter plot  As a second indicator, we can make a scatter plot of Y
versus Ŷ. In the ideal situation, all points are on the first diagonal.

ANOVA  As a third indicator we can analyse the variations or variances
in the model. Recall that for our model we have

V(Y) = V(Ŷ) + V(e).

The third indicator is the ratio of what we can explain divided by what
we have to explain:

R² = V(Ŷ)/V(Y).

It is not a mistake to use the same notation R²: one can prove the following
property.

Property
For linear models with a constant term, we have

R² = V(Ŷ)/V(Y) = r²(Y, Ŷ).

2.3 Example 2: the model Y = a + bX + cZ + ε
2.3.1 The least squares method
In this model we have 2 explanatory variables (X and Z) and 3 parameters
(a, b, c). Now we have:
Model: Y = a + bX + cZ + ε;
Data: Y_1, X_1, Z_1; Y_2, X_2, Z_2; ...; Y_n, X_n, Z_n and
Y_i = a + bX_i + cZ_i + ε_i;
Approximation: Ŷ_i = a + bX_i + cZ_i;
Approximation error: e_i = Y_i - Ŷ_i = Y_i - a - bX_i - cZ_i;
Global error: SSE = Σ e_i² = Σ (Y_i - a - bX_i - cZ_i)²;
LSE: we have to solve the problem

min_{a,b,c} SSE = min_{a,b,c} Σ (Y_i - a - bX_i - cZ_i)².

Solution
From Example 1, recall that the optimal values of a and b were given by

Cov(X,Y) = b·s²(X),
Ȳ = a + bX̄.

In the new 3-parameter model one can show that the optimal values of
a, b, c are the solutions (if they exist) of the following system of equations:

Cov(X,Y) = b·s²(X) + c·Cov(X,Z),
Cov(Z,Y) = b·Cov(Z,X) + c·s²(Z),
Ȳ = a + bX̄ + cZ̄.

We are going to use EXCEL to solve this mathematical problem.

2.3.2 Multicollinearity and quasi-multicollinearity

Examples  From the mathematical point of view, the system of equations
doesn't always have a unique solution!
Example 1
Let us consider the system

5 = 6b + 10c
7 = 12b + 20c

It is clear that the second equation can be simplified and we find

5 = 6b + 10c
3.5 = 6b + 10c

and there is no solution!

Note that if we look at the coefficients of b and c in the system, we have

6    10
12   20

and we see that the second row is a multiple of the first row!
Example 2
Let us consider the system

5 = 6b + 10c
10 = 12b + 20c

In this case, we can simplify again and we find

5 = 6b + 10c
5 = 6b + 10c

or just one equation: 6b + 10c = 5. It is clear that we find many
solutions!
Note that if we look at the coefficients of b and c in the system, we have

6    10
12   20

and we see that the second row is a multiple of the first row!
Example 3
Let us consider the system

5 = 6b + 10c
10 = b + c

In this case the second equation gives b = 10 - c and, as a result, for
equation 1 we find that

5 = 6(10 - c) + 10c,

or -55 = 4c, or c = -55/4, and then b = 10 - c.
Note that if we look at the coefficients of b and c in the system, we have

6   10
1   1

and now the second row is NOT a multiple of the first row: the system has
a unique solution.

Definition of MC and QMC  Looking at the system of equations that
we have to solve, we have the following coefficients of b and c:

s²(X)      Cov(X,Z)
Cov(X,Z)   s²(Z)

We have a problem of multicollinearity (MC) if the second row is a multiple
of the first row, i.e.

u·(s²(X), Cov(X,Z)) = (Cov(X,Z), s²(Z)),

where u ≠ 0 is a constant. In this case we find that

u·s²(X) = Cov(X,Z),
u·Cov(X,Z) = s²(Z).

From these equations, we can find u. We have

u = Cov(X,Z)/s²(X),
u = s²(Z)/Cov(X,Z),

and we find that

Cov(X,Z)/s²(X) = s²(Z)/Cov(X,Z),

or

Cov²(X,Z)/(s²(X)·s²(Z)) = 1,

or

r²(X,Z) = 1.

Recall that when r²(X,Z) = 1, then r(X,Z) = ±1 and then X and Z
are perfectly linearly related: there is a linear relationship between X and
Z.

Definition
In the model Y = a + bX + cZ + ε, we have MC (multicollinearity) if
X and Z are linearly related, i.e. if r²(X,Z) = 1.
When the variables X and Z are not linearly related, but almost linearly
related, we say that we have a problem of quasi-multicollinearity (QMC).
This happens when r²(X,Z) is "close" to 1.
In econometrics, we want to avoid QMC!

We make the following agreement:

If r²(X,Z) ≤ 0.36, or if -0.60 < r(X,Z) < 0.60, we don't have
problems of QMC;
If r²(X,Z) > 0.36, or if r(X,Z) > 0.60 or r(X,Z) < -0.60, we have
a problem of QMC.

For models with 3 or more explanatory variables, we are also going to
check for QMC. We have a problem of QMC if one of the variables is highly
linearly related to one or more of the other explanatory variables.

Example  We consider the model Y = a + bX + cZ + ε, where

Y = the half-year sales volume,
X = the time,
Z = a dummy (Z = 1 in the first half year and Z = 0 in the second
half year).

Y    X    Z
4    1    1
1    2    0
6    3    1
2    4    0
11   5    1
5    6    0
11   7    1
7    8    0
15   9    1
9    10   0

We find (cf. computer seminar): s²(X) = 8.25, s²(Z) = 0.25 and r(X,Z) =
-0.17. For this model we don't have problems with QMC and we can solve
the system of equations. We find

â = -2.4, b̂ = 1.2 and ĉ = 5.8;
Ŷ = -2.4 + 1.2·X + 5.8·Z.

We call Ŷ the regression of Y on X and Z. In the next table we calculate
Ŷ and the errors e.

Y    X    Z    Ŷ      e
4    1    1    4.6   -0.6
1    2    0    0.0    1.0
6    3    1    7.0   -1.0
2    4    0    2.4   -0.4
11   5    1    9.4    1.6
5    6    0    4.8    0.2
11   7    1    11.8  -0.8
7    8    0    7.2   -0.2
15   9    1    14.2   0.8
9    10   0    9.6   -0.6

The R² of the model is given by R² = r²(Y, Ŷ) = 96%, which is rather
close to 100%.
For the variations, we find:
SST = V(Y) = 174.9;  SSR = V(Ŷ) = 168.1;  SSE = V(e) = 6.8.
Clearly SST = SSR + SSE and SSR/SST = 0.96 = R².
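For readers who want to verify this with software other than EXCEL, a short sketch (assuming numpy is available) solves the same least squares problem; it is equivalent to the system of normal equations given earlier:

import numpy as np

Y = np.array([4, 1, 6, 2, 11, 5, 11, 7, 15, 9], dtype=float)
X = np.arange(1, 11, dtype=float)                          # time
Z = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)  # half-year dummy

A = np.column_stack([np.ones_like(X), X, Z])   # columns: 1, X, Z
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # least squares fit
print(coef)   # [-2.4, 1.2, 5.8]: Yhat = -2.4 + 1.2 X + 5.8 Z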

3 Selection of variables
3.1 Introduction
When studying a variable Y, usually we are going to start with many
explanatory variables. In this section we show how to select the variables
in a suitable way. We are going to study the so-called "forward selection".
The basic ideas are the following:

- if we have to choose between 2 variables, we choose the best variable;

- it is desirable to choose variables with a high contribution;

- we want to avoid QMC;

- we want the variables with the highest marginal contribution.

3.2 Selection
We are going to explain the steps by an example.
The data and the different parts of the calculations are available
from the website www.edwardomey.com (teaching - econometrics).
The problem is the following. We want to find a model for the price Y of
cars. To this end we collected information about a number of cars.
Y: the price of a car
X(1): Power (hp)
X(2): Length (mm)
X(3): Weight (kg)
X(4): Fuel amount (l)
X(5): Boot size (l)
X(6): # doors
X(7): Diesel (gasoil)? (yes = 1)
X(8): Computer? (yes = 1)

3.2.1 Step 1: correlation coe¢ cients
As a starting point, we calculate all correlations between Y and all variables.
In our example we find the following table:

      Y     X(1)   X(2)  X(3)  X(4)  X(5)  X(6)   X(7)  X(8)
Y     1
X(1)  0.80  1
X(2)  0.68  0.58   1
X(3)  0.86  0.62   0.76  1
X(4)  0.62  0.42   0.51  0.80  1
X(5)  0.60  0.50   0.81  0.69  0.45  1
X(6)  0.24  0.25   0.70  0.23  0.13  0.58  1
X(7)  0.21  -0.16  0.14  0.39  0.33  0.17  -0.06  1
X(8)  0.60  0.54   0.49  0.52  0.31  0.45  0.29   0.06  1

We can use this correlation table for several purposes.

Sign of the correlation coefficients  From the theoretical point of view,
we expect that X(1) and Y have a positive correlation. In the table, we see
that the data confirm our theoretical expectations. We can check all the
signs to see whether or not they confirm the theoretical expectations.
If we notice that a sign is wrong, it is possible that our theoretical
motivation is wrong. It is also possible that the correlation coefficient has
the wrong sign, but that it is statistically not significant. For the t test,
see below. Finally, it is also possible that we have a problem with outliers.

Sort the variables w.r.t. Y  In the table we can see that X(3) has the
highest correlation with Y. If we select variables, X(3) will be our natural
first choice.
Sorting the correlations (in absolute value) w.r.t. Y we find the following
table:

Variable X   r(Y, X)
X(3)         0.86
X(1)         0.80
X(2)         0.68
X(4)         0.62
X(5)         0.60
X(8)         0.60
X(6)         0.24
X(7)         0.21

The table shows us the order in which variables are going to enter the
model.

Small correlations  Before studying the problem we hope that all variables
will be important in the model. Sometimes some correlation coefficients are
very small and maybe the variable is not as important as we thought.
When we see small correlation coefficients, several cases can occur:
* the correlation measures a linear relationship. Maybe the relationship
in our case is not linear but some other, nonlinear relationship. To check
this, we make an X-Y scatter plot and decide whether or not to transform
the variable(s).
* it is indeed possible that the influence of the variable is small and
perhaps we can delete the variable from the model.

t-test  To test whether or not a calculated correlation coefficient is "small",
we use a t test. We test the following hypotheses:

H0: ρ = 0  vs  Ha: ρ ≠ 0.

Using the sample correlation coefficient r we calculate the t statistic:

t(r) = r·√((n - 2)/(1 - r²)),

and under suitable conditions we have t(r) ~ t_{n-2}, the Student t
distribution with parameter n - 2. Using prob-values we can decide to reject
H0 or not.
In our example we perform the t test for r(X(6), Y) and r(X(7), Y). We
find n = 106 and

r(X(6), Y) = 0.24;  t(r) = 2.53;  prob-value (two-sided): 0.012
r(X(7), Y) = 0.21;  t(r) = 2.24;  prob-value (two-sided): 0.027

If we choose α = 5%, then we can reject H0 in both cases.
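A sketch of this t test (assuming scipy is available) for r(X(6), Y); the small difference with the reported t value comes from rounding r:

import math
from scipy import stats

n, r = 106, 0.24
t = r * math.sqrt((n - 2) / (1 - r ** 2))   # t(r) = 2.52 (text: 2.53)
p = 2 * stats.t.sf(abs(t), df=n - 2)        # two-sided prob-value = 0.013
print(t, p)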

3.2.2 Choice of the first variable
It is clear that the first variable in the model will be X(3), because X(3) has
the highest correlation w.r.t. Y. Now we construct a first model:

model 1: Y = a + bX(3) + ε.

Using EXCEL, we find the following result:

Ŷ = -7406.5 + 24.23·X(3),
R² = 0.73.

Note that the coefficient of X(3) is positive. This corresponds to the
theoretical expectations and to the positive correlation coefficient that we
found earlier.
To see if this model 1 is statistically significant, we perform an F test.
We have the following hypotheses:

H0: the model is statistically not significant
Ha: the model is statistically significant

To choose between H0 and Ha, we use R² and calculate the following
F value:

F = F(R²) = (R²/(p - 1)) / ((1 - R²)/(n - p)).

Here we use n = the sample size, and p = the number of parameters in the
model.
If R² is small (resp. not small), we find an F value that is small (resp.
not small). In order to decide whether or not the calculated F value is
sufficiently large, we calculate its prob-value by using a Fisher F
distribution F(p - 1, n - p).
For our example, we have
R² = 0.73
F = 283.6
n = 106 and p = 2
prob-value ≈ 1.76·10⁻³¹
We conclude that our model 1 is a statistically relevant model.
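A sketch of this F test (scipy assumed); the small difference with the reported F comes from using the rounded R² = 0.73:

from scipy import stats

R2, n, p = 0.73, 106, 2
F = (R2 / (p - 1)) / ((1 - R2) / (n - p))   # = 281 (text: 283.6)
prob = stats.f.sf(F, p - 1, n - p)          # prob-value = 1e-31
print(F, prob)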

3.2.3 Choice of a second variable
Taking into account QMC

- As a first candidate, we can choose X(1) as a second variable.

- To see if it is allowed to choose X(1), we have to check for QMC.
In our case we find that r(X(1), X(3)) = 0.62. Earlier we decided
that this value indicates problems with QMC.

- The next candidate is X(2), and now we find r(X(2), X(3)) = 0.76:
again QMC.

- The next candidate is X(4), and we have r(X(4), X(3)) = 0.80:
again QMC.

- For the same reason X(5) has to be kicked out of the model.

- For X(8) we find that r(X(8), X(3)) = 0.52, and this value is acceptable
from the QMC point of view.

Model 2  Now we construct a second model:

model 2: Y = a + bX(3) + cX(8) + ε.

Using EXCEL, we find the following result:

Ŷ = -3170.6 + 21.12·X(3) + 4929.33·X(8),
R² = 0.764.

Note that the coefficients of X(3) and X(8) are positive. This corresponds
to the theoretical expectations and to the positive correlation coefficients
that we found earlier.
To see if the model is relevant, we perform the F test as before. We find
R² = 0.764
F = 166.6
n = 106 and p = 3
prob-value ≈ 5.17·10⁻³³
We conclude that our model 2 is a statistically relevant model.

Marginal contribution of X(8)  Before we decide that we are going to
keep X(8) as a second variable, we have to check whether or not the
contribution of X(8) to the model is statistically significant.
We define MC(X(8)) = the marginal contribution of X(8) as the
change in the value of R²:
Model 1: R² = 0.73
Model 2: R² = 0.764
MC(X(8)) = 0.764 - 0.73 = 0.034.

To evaluate the marginal contribution, we perform an F test as follows:
F value of MC(X(8)):

F = MC(X(8)) / ((1 - R²_model2)/(n - p_model2)).

We calculate the prob-value by using an F(1, n - p_model2) distribution.
In our example we find n = 106 and p = 3, so that
F = 14.05
prob-value = 0.0003.
Since the prob-value is small (≤ 5% or ≤ 1%), the marginal contribution
of X(8) is statistically significant or relevant.
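A sketch of the marginal-contribution test (scipy assumed); again, rounding the R² values explains the small difference with the reported F = 14.05:

from scipy import stats

R2_model2, n, p = 0.764, 106, 3
MC = 0.764 - 0.73                      # MC(X(8)) = 0.034
F = MC / ((1 - R2_model2) / (n - p))   # = 14.8 (text: 14.05)
prob = stats.f.sf(F, 1, n - p)         # prob-value = 0.0002
print(F, prob)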

3.2.4 Choice of a 3rd variable: candidate 1


Taking into account QMC  The first candidate is variable X(6).
We check for first order QMC.
We find r(X(6), X(3)) = 0.23 and r(X(6), X(8)) = 0.29.
These values are acceptable.
To check for higher order QMC, we proceed as follows: we try to explain
X(6) by X(3) and X(8). If X(6) can be explained well, we have a problem
of QMC. If not, we can proceed to model 3. We estimate the parameters in
the following model:

X(6) = u + vX(3) + wX(8) + ε.

We find the following result:

X̂(6) = 3.85 + 0.00029·X(3) + 0.5629·X(8),
R² = 0.094.

It turns out that we can explain 9.4% of the variation in X(6) by X(3) and
X(8). This small number indicates that there is no problem with QMC.

We make the following agreement: if we check for QMC, we calculate
R² and then:
If R² ≤ 0.36, we don't have problems of QMC;
If R² > 0.36, we have a problem of QMC and have to reject the
candidate variable.

Model 3  Now we construct a new model:

model 3: Y = a + bX(3) + cX(8) + dX(6) + ε.

Using EXCEL, we find the following result:

Ŷ = -3538.25 + 21.09·X(3) + 4875.57·X(8) + 95.48·X(6),
R² = 0.764.

Note that the coefficients of X(3), X(8) and X(6) are positive. This
corresponds to the theoretical expectations and to the positive correlation
coefficients that we found earlier.
To see if the model is relevant, we perform the F test as before. We find
R² = 0.764
F = 110
n = 106 and p = 4
prob-value ≈ 7.43·10⁻³²
We conclude that our model 3 is a statistically relevant model.

Marginal contribution of X(6)  Before we decide that we are going to
keep X(6) as a third variable, we have to check whether or not the
contribution of X(6) to the model is statistically significant.
We find:
Model 2: R² = 0.764
Model 3: R² = 0.764
MC(X(6)) = 0.
To evaluate the marginal contribution, we perform an F test as follows:
F value of MC(X(6)):

F = MC(X(6)) / ((1 - R²_model3)/(n - p_model3)).

We calculate the prob-value by using an F(1, n - p_model3) distribution.
In our example we find n = 106 and p = 4, so that
F = 0
prob-value = 100%.
Since the prob-value is large, the marginal contribution of X(6) is
statistically irrelevant!

3.2.5 Choice of a 3rd variable: candidate 2


Taking into account QMC  The next candidate is variable X(7).
We check for first order QMC.
We find r(X(7), X(3)) = 0.39 and r(X(7), X(8)) = 0.06.
These values are acceptable.
To check for higher order QMC, we estimate the parameters in the
following model:

X(7) = u + vX(3) + wX(8) + ε.

We find the following result:

X̂(7) = -0.727 + 0.00079·X(3) - 0.266·X(8),
R² = 0.184.

It turns out that we can explain 18.4% of the variation in X(7) by X(3)
and X(8). This small number indicates that there is no problem with QMC.

New model 3  Now we construct a new model:

model 3B: Y = a + bX(3) + cX(8) + dX(7) + ε.

Using EXCEL, we find the following result:

Ŷ = -4611.34 + 22.68·X(3) + 4401.99·X(8) - 1982.21·X(7),
R² = 0.774.

Note that the coefficient of X(7) is negative. From the correlation table
we expected a positive contribution.
To see if the model is relevant, we perform the F test as before. We find
R² = 0.774
F = 116
n = 106 and p = 4
prob-value ≈ 8.55·10⁻³³
We conclude that this new model 3B is a statistically relevant model.

Marginal contribution of X(7)  Before we decide that we are going to
keep X(7) as a third variable, we have to check whether or not the
contribution of X(7) to the model is statistically significant.
We find:
Model 2: R² = 0.764
Model 3B: R² = 0.774
MC(X(7)) = 0.01.
To evaluate the marginal contribution, we perform an F test as follows:
F value of MC(X(7)):

F = MC(X(7)) / ((1 - R²_model3B)/(n - p_model3B)).

We calculate the prob-value by using an F(1, n - p_model3B) distribution.
In our example we find n = 106 and p = 4, so that
F = 4.51
prob-value = 3.6%.
Since this prob-value is smaller than 5%, the marginal contribution of
X(7) is statistically relevant at the 95% level!

4 The basic assumptions
4.1 Introduction
Up to now, we have devoted attention to the technique of selecting variables
and estimating parameters in linear models. As usual in statistics, we want
to obtain confidence statements about the parameters. To this end we have
to formulate some basic assumptions.
Recall that we consider models of the form

Y = a + bX + cZ + ε.

Taking into account our data, we have

Y_i = a + bX_i + cZ_i + ε_i, for i = 1, 2, ..., n.

The basic assumptions are assumptions about ε_i and about QMC.

4.2 The basic assumptions


4.2.1 BA1: E(ε_i) = 0, for all i
This basic assumption indicates that we make no systematic errors and that
we didn't forget important and relevant variables.
The assumption implies that we don't have outliers in the data and that
we don't have clusters in the data.

4.2.2 BA2: Var(ε_i) = σ², for all i
This assumption states that the variance of the model error term is a
constant: it is independent of Y, X, Z and independent of the index i. If the
assumption holds, we have homogeneity of the variance and we call the model
a homoscedastic model. If the assumption doesn't hold, we have a problem
of heteroscedasticity.

4.2.3 BA3: Cov(ε_i, ε_j) = 0, for all i ≠ j
This assumption states that the error terms should show no correlation and
should have no influence on each other. The assumption is automatically
satisfied if the error terms are independent.

4.2.4 BA4: ε_i ~ N(0, σ²), for all i
The normality assumption allows us to obtain confidence statements for the
parameters. Also, we are allowed to perform the F tests (cf. selection of
variables) if this basic assumption holds.

4.2.5 BA5: Assumptions about the explanatory variables

Earlier we discussed QMC. In our models we are going to avoid QMC
problems.
Also, we assume that in our models Y = a + bX + cZ + ε all randomness
is in the ε-term and not in the explanatory variables X, Z.

4.3 Statistical properties of the LS-estimates


Recall that for the simple linear model Y = a + bX + ε and the approximation
Ŷ = â + b̂X, we have found that

b̂ = V(X,Y)/V(X),
â = Ȳ - b̂X̄.

The parameters of the model are a, b and σ². The estimates for a and b
are â and b̂. We have the following important result.

Theorem 1  Suppose that all basic assumptions hold. Then we have

â ~ N(a, σ²(â)),
b̂ ~ N(b, σ²(b̂)),

where

σ²(â) = Var(â) = \overline{X²}·σ² / (n·s²(X)),
σ²(b̂) = Var(b̂) = σ² · 1/(n·s²(X)),
Cov(â, b̂) = -X̄·σ² / (n·s²(X)),

and where \overline{X²} denotes the mean of the squared X-values.
I would like to stress that we are making all calculations in EXCEL. These
formulas are important to obtain some information about the quality of the
estimators.
We see that the variance of b̂ depends on 3 elements:
- dependence on σ²: if σ² = Var(ε) is larger, then Var(b̂) is larger. This
means that if we allow larger fluctuations in the error term, there will
be larger fluctuations in the parameter estimates.
- dependence on n: if n increases, then Var(b̂) decreases. This means
that we will have better results if the sample size is bigger.
- dependence on s²(X): if s²(X) increases, then Var(b̂) decreases. This
means that if we are far away from MC and QMC, then the estimates
are better.
The previous result can be used to obtain confidence intervals for a and
for b. We find the following 95% c.i.:

a = â ± z_{2.5%}·σ(â),
b = b̂ ± z_{2.5%}·σ(b̂).

Although these formulas are correct, they are of no use! In order to use
them, we need σ²(â) and σ²(b̂), and to find these, we need σ²!
2
Theorem 2  Suppose that all basic assumptions hold. Then σ² can
be estimated by s²_e, where

s²_e = SSE/(n - p).

Moreover, σ²(â) and σ²(b̂) can be estimated by s²(â) and s²(b̂), where

s²(â) = s²_e·\overline{X²}/(n·s²(X)),   s²(b̂) = s²_e·1/(n·s²(X)).

As a compensation, in the confidence statements we have to replace the
normal distribution by a t distribution. We find the following 95% c.i.:

a = â ± t_{n-p; 2.5%}·s(â),
b = b̂ ± t_{n-p; 2.5%}·s(b̂).

In practice, EXCEL calculates all the things that we need.

4.4 Example
We have a look at model 3B of the previous section. The model was the
following:

model: Y = a + bX(3) + cX(8) + dX(7) + ε.

The full EXCEL output is given by the following 3 parts:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.879
R Square            0.774
Adjusted R Square   0.767
Standard Error      4356.7
Observations        106

ANOVA
            df    SS       MS        F       Significance F
Regression  3     6.6E09   2.21E09   116.32  8.55E-33
Residual    102   1.9E09   1.9E07
Total       105   8.6E09

            Coeff.    Stand. Err.  t-stat  P-value   Lower 95%  Upper 95%
Intercept   -4611.4   2558         -1.78   0.077     -9745      522
X(3)        22.68     1.723        13.11   1.2E-23   19.24      26.11
X(8)        4401.99   1317         3.34    0.001     1789       7014
X(7)        -1982.2   937.7        -2.11   0.037     -3842      -122
Part 1
In part 1 we get the regression statistics. Important for us is the R² value.
In our case it is R² = 77.4%.
The standard error of the model is given by s_e = 4356.7. It means that
with the model we make errors that fluctuate around 0, and the size of the
fluctuations is around s_e = 4356.7.
It is wise to compare the size of the errors with the size of Y. The mean
price of the cars in our example is given by Ȳ = 32473.6. The relative error
in our model is given by

s_e/Ȳ = 4356.7/32473.6 ≈ 0.13.

With the model we make relative errors of around 13%. As a rule of thumb,
we are glad if the relative error is less than 10%.

Part 2
Part 2 is devoted to the analysis of the variance.
In our notations, we have
(regression) SSR = 6623984149
(residual)   SSE = 1936113261
(total)      SST = 8560097410

The abbreviation "df" means degrees of freedom. When we calculate the
mean square, we find for example that

SSE/(n - p) = 1936113261/(106 - 4) = 18981502.56.

This is s²_e = 18981502.56. Calculating the square root gives s_e = 4356.7,
which we got in part 1.
The F value is the F statistic that we use to see if R² is sufficiently
large. The prob-value of the F value is given by "Significance F".

Part 3
Part 3 gives the detailed statistical analysis of the parameter estimates.
We consider b, the coefficient of X(3).
From the EXCEL output, we see that
b̂ = 22.68
s(b̂) = 1.723

Using a 95% c.i. we obtain

b = b̂ ± t_{n-p; 2.5%}·s(b̂) = 22.68 ± 1.98·1.723 = [19.24; 26.11],

which is given in the last columns of the table in part 3.


The results of part 3 also allow us to test the following type of hypotheses.
Let us consider

H0: b = 0 versus Ha: b ≠ 0,

and let us use α = 5%.
The first method to choose between H0 and Ha is based on confidence
intervals. Since the 95% c.i. is given by [19.24; 26.11], we conclude that we
reject H0.
A second method is based on calculating prob-values. The t value of the
sample result is given by

t value = (b̂ - b₀)/s(b̂) = (22.68 - 0)/1.723 ≈ 13.11.

Using the t₁₀₂ distribution, we find that

P(|t₁₀₂| > 13.11) = 1.26E-23.

This small value allows us to reject H0.

Remark. EXCEL always calculates a two-sided prob-value.
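A sketch (scipy assumed) reproducing the confidence interval and the t test for the coefficient of X(3); tiny differences with the EXCEL output come from the rounded inputs:

from scipy import stats

bhat, sb, n, p = 22.68, 1.723, 106, 4
tcrit = stats.t.ppf(0.975, df=n - p)            # = 1.98
print(bhat - tcrit * sb, bhat + tcrit * sb)     # = [19.26, 26.10]

t_value = bhat / sb                             # H0: b = 0, t = 13.16
p_value = 2 * stats.t.sf(t_value, df=n - p)     # = 1e-23
print(t_value, p_value)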

5 Checking the basic assumptions
5.1 Introduction
In view of the previous section, the basic assumptions are crucial for all
statistical properties of the estimators and the models in econometrics. In
this section we discuss how to check whether or not the basic assumptions
hold. If the basic assumptions do not hold, we have a problem with the
statistical properties and this may contaminate any conclusions we make.

5.2 Basic assumption 1


If basic assumption 1 doesn't hold, then our estimates are biased and most
of the regression output is no longer reliable.
We check this basic assumption by making a scatter plot of Y and the
calculated residuals or errors e.
We hope to see a graph without outliers and without clusters.
If we see outliers, we have to check the data again: did we make an
input error? Are the data correct? Do we have a special data point?
It is possible that we decide to delete the data line that generates the
outlier.
If we see clusters, we forgot a variable in the model. Carefully checking
the data in the cluster(s) may lead to the introduction of a new variable or
a new dummy.
In our example we find the following graph:

[Figure: BA 1 - scatter plot of the residuals e against Y.]

5.3 Basic assumption 2


If basic assumption 2 doesn't hold, then all estimates related to variances
(s²_e, s²(â), R², ...) are no longer valid and most of the statistical analysis
(confidence intervals, F value) is no longer reliable.
We check this basic assumption in several ways.

5.3.1 Scatter plots

We make a scatter plot of e² against each of the variables Y, X, Z, .... In a
time series analysis, we also make a scatter plot (i, e²_i). In each plot we
hope to see a horizontal box without any systematic pattern or trend.



In our example we find the following graphs of (Y, e²) and (X(3), e²).
Graphs of the dummies are not useful.

[Figure: BA 2 - scatter plot of e² against Y.]

[Figure: BA 2 - scatter plot of e² against X(3).]

The graphs show a more or less horizontal box. There is possibly one
outlier that we missed earlier.

5.3.2 Correlation coefficients

As a second tool we calculate the correlation coefficient between e² and each
of the variables Y, X, Z, .... For a time series we also calculate the
correlation coefficient r(e²_i, i). Under ideal circumstances, all these
correlation coefficients equal zero. We hope to find "small" correlation
coefficients.
In our example we find:
r(e², Y) = 0.02269;  r(e², X(3)) = 0.044;
r(e², X(8)) = 0.115;  r(e², X(7)) = 0.072.
These correlation coefficients seem to be small.


5.3.3 Bartlett's test

In this test, we first sort the data with respect to one of the variables and
then divide the data into two equal or almost equal parts. For each of these
parts, we calculate the regression as in the final model, and for each part we
calculate s²_e(I) and s²_e(II).
If we have homoscedasticity, then we expect s²_e(I) ≈ s²_e(II). If on the
other hand these values are "far away" from each other, then we have
heteroscedasticity.
As an example, we sort our data with respect to Y and then divide the
data into two parts.
Part I contains 53 data lines. In part I, the variable X(8) takes only the
value 0 and has to be excluded from the regression analysis.
For part I, we find:
number of parameters: 3
R² = 0.103
s_e(I) = 3396.46
observations: 53
Part II also contains 53 data lines and all variables vary. For part II,
we find:
number of parameters: 4
R² = 0.629
s_e(II) = 3641.08
observations: 53
Bartlett's test allows us to choose between the following hypotheses:

H0: σ²(I) = σ²(II)  vs  Ha: σ²(I) ≠ σ²(II).

As a statistic, we use λ, where

λ = s²_e(I)/s²_e(II).

Under H0 (and if all other basic assumptions hold) we have
λ ~ F(n(I) - p(I), n(II) - p(II)), where F denotes the F distribution,
n(I), n(II) are the sample sizes and p(I), p(II) the numbers of parameters.
In our example we have λ = (3396.46/3641.08)² = 0.87. The F distribution
that we need is F(53 - 3, 53 - 4) = F(50, 49).
Using this F distribution, we find that the prob-value of λ is given by
P(F(50, 49) ≤ 0.87) ≈ 0.31.
If we use α = 5%, we don't reject H0.
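A sketch of this calculation (scipy assumed), using the part I / part II numbers above:

from scipy import stats

se_I, n_I, p_I = 3396.46, 53, 3
se_II, n_II, p_II = 3641.08, 53, 4

lam = (se_I / se_II) ** 2                        # lambda = 0.87
prob = stats.f.cdf(lam, n_I - p_I, n_II - p_II)  # P(F(50,49) <= 0.87) = 0.31
print(lam, prob)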

We should perform Bartlett's test by sorting w.r.t. each of the variables.
If we study a time series, we also have to divide the data into two parts by
sorting w.r.t. the time index i.

5.4 Basic assumption 3


This is not a part of the study material this year.

5.5 Basic assumption 4


If basic assumption 4 doesn't hold, then all confidence statements are no
longer valid. Most of the statistical analysis (confidence intervals, F value)
is no longer reliable. We check this basic assumption in several ways.


5.5.1 Histogram of the residuals


We make a histogram of the residuals and hope it has the normal, bell-shaped
form. First we have to construct a frequency table of the errors e_i. We use
around 10 classes of equal length.
In our example we find:

[Figure: histogram of the residuals, with classes from -8500 to 11500.]

We find a normal-shaped histogram.

5.5.2 The test of Kolmogorov and Smirnov


Kolmogorov and Smirnov compared the empirical distribution function (EDF)
with the theoretical distribution function (TDF).
The TDF is, as we assume in BA4, given by ε ~ N(0, σ²). We estimate
σ² by using s²_e and use X ~ N(0, s²_e) as the TDF.
For each of the residuals e_i, we calculate

TDF(e_i) = P(X ≤ e_i).

The EDF is the empirical distribution of the errors; that is, for each of the
residuals we calculate

EDF(e_i) = (1/n)·#(errors ≤ e_i).

If BA4 holds, then we expect TDF ≈ EDF. If, on the other hand, there
is a major difference between TDF and EDF, we don't believe that BA4
holds. As a test statistic, Kolmogorov and Smirnov use the largest difference
between the EDF and the TDF:

KS = max |EDF - TDF|.

In our example we calculate all ingredients.

Part of the table with EDF and TDF is given here:

      e              EDF          TDF
1    -8459.737709    0.009433962  0.026084149
2    -8067.756171    0.018867925  0.032029298
3    -7942.166053    0.028301887  0.034155828
4    -7703.271272    0.037735849  0.038521048
9    -5471.484999    0.084905660  0.104584053
...
104   8479.141356    0.981132075  0.974184401
105   8994.578009    0.990566038  0.980514971
106   10710.656940   1            0.993021928

The graph of EDF and TDF is given in the following picture:

[Figure: EDF and TDF of the residuals.]

Clearly the two distribution functions are rather close to each other. For
the KS statistic we find

KS = max |EDF - TDF| = 0.054.

For KS we don't calculate a prob-value; for small samples we use
tables with critical values.
Such tables can be downloaded from
www.york.ac.uk/depts/maths/tables/pdf.htm
or from
www.edwardomey.com.

For large samples, we use the critical values given by

KS(α) = √( -ln(α) / (2n) ).

In our case we find KS(5%) = 1.22/√106 = 0.118.
Since the calculated value (0.054) is less than the critical value, we don't
reject BA4.
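A sketch of the whole procedure (numpy and scipy assumed): given the residuals e and the standard error s_e of the model, it computes the KS statistic and the large-sample critical value.

import numpy as np
from scipy import stats

def ks_statistic(e, se):
    e = np.sort(np.asarray(e, dtype=float))
    n = len(e)
    edf = np.arange(1, n + 1) / n             # EDF(e_i) = #(errors <= e_i)/n
    tdf = stats.norm.cdf(e, loc=0, scale=se)  # TDF under BA4: N(0, se^2)
    return np.max(np.abs(edf - tdf))

alpha, n = 0.05, 106
ks_crit = np.sqrt(-np.log(alpha) / (2 * n))   # = 0.119 (text rounds to 0.118)
print(ks_crit)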

6 Making predictions
After constructing a model, it is interesting to find information about the
quality of the predictions made by the model. To this end we have to use
NEW data. For our example, we obtained 10 new data lines:

Y      X(3)  X(8)  X(7)
22990  1530  0     0
29215  1219  0     0
35980  1860  0     1
23800  1220  0     0
17699  1180  0     0
40750  1995  0     1
31500  1708  1     0
31500  1815  0     1
24995  1360  0     0
36450  1870  1     1

For the first data line, the real price is given by Y(1) = 22990. On the
other hand, we have X(3) = 1530 and X(8) = X(7) = 0. Using the formula
of model 3B, we have the following prediction from our model:

P̂ = -4611.34 + 22.68·X(3) + 4401.99·X(8) - 1982.21·X(7)
  = -4611.34 + 22.68·1530 + 0 = 30089.06.

We can make similar calculations for the other data entries, and we find
the following table:

Y      P
22990  30089.06
29215  23035.58
35980  35591.25
23800  23058.26
17699  22151.06
40750  38653.05
31500  38528.09
31500  34570.65
24995  26233.46
36450  40220.04

At first view, some of the predictions look good and other predictions look
bad. We are going to evaluate the quality of the predictions by using several
methods.

6.1 Correlation coefficient

Under ideal circumstances, we have that P = Y. This means that there is a
perfect linear relationship and, moreover, that the linear relationship
corresponds to the first diagonal.
We calculate the correlation coefficient r(Y, P) and hope that r(Y, P) is
"close" to the number 1.
In our case, we find r(Y, P) = 0.828.
This means that Y and P show a rather strong linear relationship.

6.2 Scatter plot
To see if the points (Y, P) are close to the first diagonal, we make an XY
scatter plot. In our example we find the following graph:

[Figure: XY scatter plot of P against Y.]

From the graph we see that the points are spread around the first diagonal.

6.3 Mean absolute deviations


In this section we calculate the mean absolute deviation and the mean ab-
solute relative deviation:


MAD = (1/K)·Σ |Y_i - P_i|,
MARD = (1/K)·Σ |(Y_i - P_i)/Y_i|,

where K is the number of new data lines (here K = 10).
In the example, we find the following numbers:
Ȳ = 29487.9
P̄ = 31213.05
MAD = 3606.5
MARD = 0.134.
Our data fluctuate around the central value Ȳ = 29487.9 and we have
MAD = 3606.5. Our predictions are predictions with an absolute error of
around 3606. Related to Ȳ this is an error of around 12%.
Looking at the relative deviations, we find a relative error of around
13.4%.

6.4 Mean squared errors

Instead of taking absolute deviations, one can also look at the squared
errors:

MSE = (1/K)·Σ (Y_i - P_i)²,
RMSE = √MSE.

In our example, we find

MSE = 18807115,
RMSE = 4336.72.

We can also use squared relative errors, and we find:

MSRE = (1/K)·Σ ((Y_i - P_i)/Y_i)² = 0.0278,
RMSRE = √MSRE = 0.167.

These measures are used when we want to compare the quality of the
predictions of several models.
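The following sketch (plain Python) computes all of these measures from the (Y, P) table above:

import math

Y = [22990, 29215, 35980, 23800, 17699, 40750, 31500, 31500, 24995, 36450]
P = [30089.06, 23035.58, 35591.25, 23058.26, 22151.06,
     38653.05, 38528.09, 34570.65, 26233.46, 40220.04]
K = len(Y)

MAD   = sum(abs(y - p) for y, p in zip(Y, P)) / K          # = 3606.5
MARD  = sum(abs((y - p) / y) for y, p in zip(Y, P)) / K    # = 0.134
MSE   = sum((y - p) ** 2 for y, p in zip(Y, P)) / K        # = 18807115
RMSE  = math.sqrt(MSE)                                     # = 4336.7
MSRE  = sum(((y - p) / y) ** 2 for y, p in zip(Y, P)) / K  # = 0.0278
RMSRE = math.sqrt(MSRE)                                    # = 0.167
print(MAD, MARD, RMSE, RMSRE)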

6.5 More
More measures can be found on the following website:
http://en.wikipedia.org/wiki/Forecasting#Forecasting_accuracy

This website also refers to some interesting examples.

7 Some references
7.1 Internet
1. http://en.wikipedia.org/wiki/Econometrics
2. Econometric Theory on Wikibooks:
http://en.wikibooks.org/wiki/Econometric_Theory
3. B.E. Hansen, Econometrics:
http://www.ssc.wisc.edu/~bhansen/econometrics/
4. Econometrics Resources on internet:
http://www.oswego.edu/~kane/econometrics/
5. Empirics and Econometrics:
http://homepage.newschool.edu/het//schools/metric.htm
6. Links to Online Texts and Notes in Econometrics:
http://www.economicsnetwork.ac.uk/teaching/text/econometrics.htm/
7. Startpagina econometrie:
http://econometrie.startpagina.nl/

7.2 Books
1. A.P. Barten, Econometrische lessen. Schoonhoven: Academic Service,
Economie en Bedrijfskunde, 1989.
2. W.S. Brown, Introducing Econometrics. West Publishing Company,
1991.
3. P. Kennedy, A Guide to Econometrics, 3rd edition, MIT Press,
Cambridge, Mass., 1992.
4. D.N. Gujarati, Essentials of Econometrics. McGraw-Hill International
Edition, New York, 2006.
5. D.N. Gujarati, Basic Econometrics, 4th edition, McGraw-Hill
International Edition, New York, 2003.

6. G.S. Maddala, Econometrics, McGraw-Hill, New York, 1977.

7. E. Omey, Inleiding tot de econometrie. Den Arend, Bonheiden, 2003.

8. J. Schmidt, Econometrics, McGraw-Hill, New York, 2005.

9. M.F. Triola and L.A. Franklin, Business Statistics, Addison-Wesley,
New York, 1994.

10. R.J. Wonnacott and T.H. Wonnacott, Econometrics, 2nd edition,
Wiley, New York, 1979.
