
Lecture 2: Simple Linear Regression Model

QF06-2015
Sunil Paul

Recap
In the last lecture we discussed that the joint distribution of any two variables can be factored in two ways:
$f(X, Y) = f(Y \mid X)\, f(X)$
and
$f(X, Y) = f(X \mid Y)\, f(Y)$.
In particular,
1. If $f(X, Y)$ is bivariate normal (bvn), then $f(Y \mid X)$ and $f(X \mid Y)$ are normally distributed,
2. If $f(X, Y)$ is bivariate normal (bvn), then $E(Y \mid X)$ and $E(X \mid Y)$ are linear in X and Y, and
3. $\mathrm{var}(Y \mid X)$ and $\mathrm{var}(X \mid Y)$ are constant.
We can summarize these results as follows:
$(Y \mid X) \sim N(\alpha_1 + \alpha_2 X,\ \sigma^2_{Y \mid X})$ and $(X \mid Y) \sim N(\beta_1 + \beta_2 Y,\ \sigma^2_{X \mid Y})$, where $\alpha_2 = \sigma_{xy}/\sigma^2_x$ and $\beta_2 = \sigma_{xy}/\sigma^2_y$.
Similarly,
$\sigma^2_{y \mid x} = \sigma^2_y(1 - \rho^2) = \sigma^2_y - \sigma^2_{xy}/\sigma^2_x$ and $\sigma^2_{x \mid y} = \sigma^2_x(1 - \rho^2)$, where $\rho^2 = \sigma^2_{xy}/(\sigma^2_x\, \sigma^2_y)$.
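As a quick numerical check of these conditional-moment formulas, the short sketch below uses illustrative covariance values (assumed for this example, not taken from the lecture) and confirms that $\sigma^2_y(1 - \rho^2)$ equals $\sigma^2_y - \sigma^2_{xy}/\sigma^2_x$.

```python
import numpy as np

# Illustrative bivariate normal parameters (assumed values, not from the lecture)
var_x, var_y, cov_xy = 4.0, 9.0, 3.0

# Conditional moments of Y given X from the formulas above
alpha2 = cov_xy / var_x                      # slope of E(Y|X)
var_y_given_x = var_y - cov_xy**2 / var_x    # sigma^2_y - sigma^2_xy / sigma^2_x

rho2 = cov_xy**2 / (var_x * var_y)           # squared correlation
print(np.isclose(var_y_given_x, var_y * (1 - rho2)))  # True: both equal 6.75
```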

Regression as conditional expectation function


Consider a model of consumption (Y) as a function of income (X). Since the model involves two variables, the data generating process of these variables for the population may be governed by some complex bivariate distribution. Concentrating on the conditional distribution, economic theory would suggest that average consumption (Y) can be expressed as a function of income (X) using conditional expectations as follows:
$E(Y \mid X) = g(X)$.
Theoretically, the conditional expectation function $g(\cdot)$ is expected to be an increasing function of X (as income increases, consumption increases).
Now assume that the conditional expectation function is linear, as in the case of the bvn; then
$E(Y \mid X) = \alpha + \beta X$.
This conditional expectation function is known as the population regression function.
The specification given above describes an exact or deterministic relation between Y and X.
Similarly, the average consumption of the $i$-th household for a given income is
$E(Y \mid X_i) = \alpha + \beta X_i$.
However, the actual consumption of the $i$-th household need not be equal to $E(Y \mid X_i)$.
We can denote the discrepancy between the actual ($Y_i$) and the expected value by $u_i$ as follows:
$u_i = Y_i - E(Y \mid X_i)$.
The discrepancy can occur for many reasons.
Possible sources of the error term are:
Randomness in human behaviour,
Effect of omitted variables,
Measurement errors.
With $u_i$, the deterministic relation specified earlier becomes a stochastic one:
$Y_i = E(Y \mid X_i) + u_i$.
With linearity of the CEF we have
$Y_i = \alpha + \beta X_i + u_i$,
where $\alpha + \beta X_i$ is the deterministic component, $u_i$ is the stochastic or random component (known as the error) with a known distribution function, and $\alpha$ and $\beta$ are known as the regression coefficients or population parameters.
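To illustrate this data generating process, the following Python sketch simulates consumption from a linear population regression function; the parameter values (alpha = 20, beta = 0.8, the error standard deviation and the income range) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta = 20.0, 0.8                      # hypothetical population parameters
sigma_u = 5.0                                # hypothetical error standard deviation

income = rng.uniform(100, 500, size=200)     # X: income
u = rng.normal(0, sigma_u, size=200)         # stochastic component u_i
consumption = alpha + beta * income + u      # Y_i = alpha + beta*X_i + u_i

# E(Y|X_i) is the deterministic component; actual Y_i deviates from it by u_i
conditional_mean = alpha + beta * income
print(np.allclose(consumption - conditional_mean, u))  # True
```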

Estimation, estimators and estimates


We may not know the population distribution and do not have the luxury of obtaining population data. We therefore need to estimate the parameters of the CEF from sample data. The task of estimation is to determine a rule, method or function that specifies the unknown parameters in terms of sample observations of random variables or (in some cases) fixed quantities. In the bivariate case Y and X are the variables (here X is assumed to be fixed in repeated samples).
There are different methods to estimate the regression coefficients. They are:
Method of Moments,
Method of Least Squares,
Method of Maximum Likelihood, and so on.
These methods, rules or formulas convert sample observations into actual estimates of the population regression coefficients ($\alpha$ and $\beta$) and are known as estimators. The estimators are often denoted by $\hat{\alpha}$ and $\hat{\beta}$. The numerical values obtained using these estimators are called estimates. As far as simple regression is concerned, these estimators give identical estimates. We will start our discussion with the method of least squares.
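Since these estimators coincide in the simple regression case, a short sketch with made-up data can verify that imposing the sample analogues of the moment conditions $E(u) = 0$ and $E(Xu) = 0$ gives the same numbers as the least-squares formulas derived below.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, 50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 50)           # made-up sample

# Method of moments: sample analogues of E(u)=0 and E(Xu)=0,
# i.e. solve  sum(Y - a - b*X) = 0  and  sum(X*(Y - a - b*X)) = 0  for (a, b)
A = np.array([[len(X), X.sum()],
              [X.sum(), (X**2).sum()]])
rhs = np.array([Y.sum(), (X * Y).sum()])
a_mm, b_mm = np.linalg.solve(A, rhs)

# Least-squares formulas (derived later in these notes)
b_ols = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean())**2).sum()
a_ols = Y.mean() - b_ols * X.mean()

print(np.allclose([a_mm, b_mm], [a_ols, b_ols]))   # True: identical estimates
```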

Method of Least Squares (OLS)


OLS provides a criterion for choosing the best-fitting line given the observations on the dependent and independent variables.
The principle of least squares states that the parameters of a regression model are to be estimated in such a way that the sum of the squared deviations between the actual and the estimated values attains a minimum.
This can be stated as follows:
$\min_{\hat{\alpha},\hat{\beta}} RSS = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$,
where RSS is the Residual Sum of Squares, and $\hat{\alpha}$, $\hat{\beta}$ and $\hat{u}_i$ are estimated values of their counterparts. Each pair of values of $\hat{\alpha}$ and $\hat{\beta}$ will give a different set of values for $\hat{u}_i$ and for $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$. We can obtain the expressions for $\hat{\alpha}$ and $\hat{\beta}$ as follows:
$\dfrac{\partial RSS}{\partial \hat{\alpha}} = 0 \implies \sum 2(Y_i - \hat{\alpha} - \hat{\beta} X_i)(-1) = 0,$
i.e.
$\sum Y = n\hat{\alpha} + \hat{\beta}\sum X \quad (1),$
which implies $\sum \hat{u} = 0$.
Similarly,
$\dfrac{\partial RSS}{\partial \hat{\beta}} = 0 \implies \sum 2(Y_i - \hat{\alpha} - \hat{\beta} X_i)(-X_i) = 0,$
i.e.
$\sum YX = \hat{\alpha}\sum X + \hat{\beta}\sum X^2 \quad (2),$
which implies $\sum X\hat{u} = 0$.
Equations (1) and (2) are called normal equations.
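As a cross-check, the first-order conditions can also be obtained symbolically; the sketch below uses sympy on a tiny made-up sample and shows that solving the two normal equations yields the least-squares estimates.

```python
import sympy as sp

# Tiny illustrative sample (made up): n = 3 observations
X_data = [1, 2, 4]
Y_data = [2, 3, 7]

a, b = sp.symbols('a b')
RSS = sum((y - a - b * x)**2 for x, y in zip(X_data, Y_data))

# First-order conditions: the two normal equations
eq1 = sp.Eq(sp.diff(RSS, a), 0)   # equivalent to sum(Y) = n*a + b*sum(X)
eq2 = sp.Eq(sp.diff(RSS, b), 0)   # equivalent to sum(XY) = a*sum(X) + b*sum(X^2)

sol = sp.solve([eq1, eq2], (a, b))
print(sol)                        # {a: 0, b: 12/7} for this sample
```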


From eq. (1) we have
$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}.$
Substituting $\hat{\alpha}$ into eq. (2) we get
$\sum YX = \sum X(\bar{Y} - \hat{\beta}\bar{X}) + \hat{\beta}\sum X^2 = n\bar{X}(\bar{Y} - \hat{\beta}\bar{X}) + \hat{\beta}\sum X^2$ (using the fact $\sum X = n\bar{X}$).
Thus,
$\sum YX = n\bar{X}\bar{Y} + \hat{\beta}\left(\sum X^2 - n\bar{X}^2\right).$

Rearranging the above equation we get the expression for $\hat{\beta}$:
$\hat{\beta} = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \dfrac{\sum xy}{\sum x^2} = \dfrac{S_{xy}}{S_{xx}},$
where lowercase letters denote deviations from the sample means ($x = X - \bar{X}$, $y = Y - \bar{Y}$), and
$S_{xy} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - n\bar{X}\bar{Y},$
$S_{xx} = \sum (X - \bar{X})^2 = \sum X^2 - n\bar{X}^2,$ and
$S_{yy} = \sum (Y - \bar{Y})^2.$
Similarly, $\hat{\sigma}^2 = \dfrac{\sum \hat{u}_i^2}{n-2}$; $\sigma^2$ has to be estimated from the residuals $\hat{u}_i$.
$\hat{\alpha}$ and $\hat{\beta}$ are the OLS estimators of $\alpha$ and $\beta$.
Once we have $\hat{\alpha}$ and $\hat{\beta}$ we can estimate Y, i.e. obtain $\hat{Y}$. Using $\hat{Y}$ we can estimate the residuals $\hat{u}$.
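A direct implementation of these formulas on made-up data might look like the sketch below; it also computes $\hat{\sigma}^2$ using the $n-2$ divisor mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(100, 500, 40)                      # made-up income data
Y = 20 + 0.8 * X + rng.normal(0, 5, 40)            # made-up consumption data

n = len(X)
Sxy = ((X - X.mean()) * (Y - Y.mean())).sum()      # S_xy
Sxx = ((X - X.mean())**2).sum()                    # S_xx

beta_hat = Sxy / Sxx                               # slope estimate
alpha_hat = Y.mean() - beta_hat * X.mean()         # intercept estimate

Y_hat = alpha_hat + beta_hat * X                   # fitted values
u_hat = Y - Y_hat                                  # residuals
sigma2_hat = (u_hat**2).sum() / (n - 2)            # estimate of sigma^2

print(alpha_hat, beta_hat, sigma2_hat)
```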

Numerical properties of least squares


The normal equations show that:
1. The residuals sum to zero: $\sum \hat{u} = 0$. This also implies that the regression line passes through the mean values of X and Y (with $\sum \hat{u} = 0$ we have $\sum Y = n\hat{\alpha} + \hat{\beta}\sum X$; dividing both sides by the number of sample observations $n$ we get $\bar{Y} = \hat{\alpha} + \hat{\beta}\bar{X}$).
2. The residuals and the independent variable are uncorrelated: $\mathrm{cov}(X, \hat{u}) = 0$ [from the second normal equation we have $\sum X\hat{u} = 0$, and $\sum X\hat{u} = \sum (x + \bar{X})\hat{u} = \sum x\hat{u} + \bar{X}\sum \hat{u} = \sum x\hat{u}$ (since $\sum \hat{u} = 0$); this implies $\mathrm{cov}(X, \hat{u}) = 0$].
3. The sum of the estimated $\hat{Y}_i$'s from the sample is equal to the sum of the actual $Y_i$'s, i.e. $\sum \hat{Y} = \sum Y$ (this can be proved using the first property: we know $\hat{u}_i = Y_i - \hat{Y}_i$, hence $\sum \hat{u} = \sum Y - \sum \hat{Y}$; since $\sum \hat{u} = 0$ we have $\sum \hat{Y} = \sum Y$).
4. The LS residuals and the estimated $\hat{Y}_i$'s are uncorrelated: $\sum \hat{Y}\hat{u} = 0$ (proof: substitute $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$ into $\sum \hat{Y}\hat{u}$ and use properties 1 and 2).
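The sketch below, again with made-up data, verifies these four numerical properties for an OLS fit.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, 60)
Y = 1.0 + 2.0 * X + rng.normal(0, 1, 60)          # made-up sample

beta_hat = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean())**2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
Y_hat = alpha_hat + beta_hat * X
u_hat = Y - Y_hat

print(np.isclose(u_hat.sum(), 0))                 # 1. residuals sum to zero
print(np.isclose((X * u_hat).sum(), 0))           # 2. X and residuals uncorrelated
print(np.isclose(Y_hat.sum(), Y.sum()))           # 3. sum of fitted = sum of actual
print(np.isclose((Y_hat * u_hat).sum(), 0))       # 4. fitted values and residuals uncorrelated
```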

Decomposition of the sum of squares


The total variation in Y can be decomposed into variation due to regression and residuals.
Consider
$\hat{u}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i.$
Substituting $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$ we get
$\hat{u}_i = (Y_i - \bar{Y}) - \hat{\beta}(X_i - \bar{X}) = y_i - \hat{\beta} x_i$, where $y_i = (Y_i - \bar{Y})$ and $x_i = (X_i - \bar{X})$.
Squaring both sides and applying summation we get
$\sum \hat{u}^2 = \sum (y - \hat{\beta} x)^2 = \sum y^2 - 2\hat{\beta}\sum xy + \hat{\beta}^2 \sum x^2.$
Using $\hat{\beta} = \dfrac{\sum xy}{\sum x^2}$ we have
$\sum \hat{u}^2 = \sum y^2 - 2\dfrac{(\sum xy)^2}{\sum x^2} + \dfrac{(\sum xy)^2}{\sum x^2} = \sum y^2 - \dfrac{(\sum xy)^2}{\sum x^2}.$
Therefore
$\sum \hat{u}^2 = \sum y^2 - \hat{\beta}\sum xy,$
where $\sum \hat{u}^2 = RSS$, $\hat{\beta}\sum xy = ESS$ (Explained Sum of Squares), and $\sum y^2 = TSS$ (Total Sum of Squares).
Thus TSS = ESS + RSS.

The proportion of TSS explained by the regression is denoted by $r^2_{xy}$ (the coefficient of determination). In a simple regression it is the square of the correlation coefficient:
$r^2_{xy} = \dfrac{ESS}{TSS} = \dfrac{TSS - RSS}{TSS} = \dfrac{\hat{\beta}\sum xy}{\sum y^2}.$
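Continuing the numerical sketch with made-up data, the decomposition and $r^2_{xy}$ can be verified as follows.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, 60)
Y = 1.0 + 2.0 * X + rng.normal(0, 2, 60)          # made-up sample

x, y = X - X.mean(), Y - Y.mean()                 # deviations from sample means
beta_hat = (x * y).sum() / (x**2).sum()

TSS = (y**2).sum()                                # total sum of squares
ESS = beta_hat * (x * y).sum()                    # explained sum of squares
RSS = TSS - ESS                                   # residual sum of squares
r2 = ESS / TSS                                    # coefficient of determination

print(np.isclose(TSS, ESS + RSS))                 # TSS = ESS + RSS
print(np.isclose(r2, np.corrcoef(X, Y)[0, 1]**2)) # r^2 equals squared correlation
```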

Assumptions of the classical linear regression model (CLRM)


Since there are different estimators available for estimating the population parameters, we need to examine the characteristics of these estimators in terms of their desirable properties. The properties of the estimators depend on the way the data are generated. Depending on the validity of these assumptions, we have to decide which estimator should be used when estimating the population parameters. The assumptions regarding the data generating process are known as the assumptions of the classical linear regression model (CLRM). They are:
1. Linearity: The dependent variable Y is determined as a linear function of the independent variables and an error term.
Violations of this assumption are known as specification errors, such as wrong regressors, nonlinearity and changing parameters.
2. The expected value of the error term is zero, i.e. $E(u_i) = 0$ for all $i$.
Violation of this assumption leads to a biased intercept.
3. Common variance: $\mathrm{var}(u_i) = E(u_i^2) = \sigma^2$ for all $i$.
Violation of this assumption is known as heteroskedasticity.


4. The error terms are not correlated with each other, i.e. $E(u_i u_j) = 0$ for $i \neq j$.
Violation of this assumption is known as autocorrelation.

Statements 2 to 4 above can be summarized as $u_i \sim iid(0, \sigma^2)$, i.e. the $u_i$ are independently and identically distributed with mean zero and a constant variance.

5. Independence of $X_j$ and $u_i$, i.e. the explanatory variable X is nonstochastic (fixed in repeated samples) and hence not correlated with the error term.
Violations of this assumption include errors in measuring the independent variable, autoregression and simultaneous equations bias.

Reference:
1. Johnston, J. and J. DiNardo (1997). Econometric Methods, McGraw-Hill. (Chapter 1)
2. Maddala, G.S. (1992). Introduction to Econometrics, 2nd ed., Macmillan. (Chapter 3)
