
Chapter 3.

Estimation

Ordinary least squares (OLS) method

The two-variable population regression function is given by $Y_i = \beta_1 + \beta_2 X_i + u_i$, but we do not observe it, so we estimate it from the sample regression function $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$. This function can also be written as $Y_i = \hat{Y}_i + \hat{u}_i$.

We can rewrite the sample regression function as $\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$. In other words, the residuals are the differences between the actual and the estimated $Y_i$ values.

With $n$ observations, we might want to choose $\hat{\beta}_1$ and $\hat{\beta}_2$ such that the sum of the residuals, $\sum_{i=1}^{n} \hat{u}_i = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)$, is minimized. This turns out not to be a very good rule because some residuals are negative and some are positive (so they would cancel each other out), and all residuals carry the same weight (importance) even though some are small and some are large.

Least squares criterion: minimize

$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$

with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$.

Partial differentiation yields

$\dfrac{\partial \left( \sum \hat{u}_i^2 \right)}{\partial \hat{\beta}_1} = -2 \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)$

$\dfrac{\partial \left( \sum \hat{u}_i^2 \right)}{\partial \hat{\beta}_2} = -2 \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) X_i$
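
As a quick check, here is a minimal sketch that verifies these two partial derivatives by symbolic differentiation; the data values are hypothetical and chosen only for illustration.

```python
# Minimal sketch: verify the partial derivatives of the residual sum of squares
# with sympy, using a small hypothetical data set.
import sympy as sp

b1, b2 = sp.symbols('b1 b2')            # candidate values of beta1-hat and beta2-hat
X = [1, 2, 3, 4]                        # hypothetical X values
Y = [2.1, 3.9, 6.2, 7.8]                # hypothetical Y values

ssr = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))   # sum of squared residuals

d_b1 = sp.diff(ssr, b1)
d_b2 = sp.diff(ssr, b2)

# Each derivative should match the expression given above
assert sp.simplify(d_b1 + 2 * sum(y - b1 - b2 * x for x, y in zip(X, Y))) == 0
assert sp.simplify(d_b2 + 2 * sum((y - b1 - b2 * x) * x for x, y in zip(X, Y))) == 0

# Setting both derivatives to zero and solving gives the OLS estimates
print(sp.solve([d_b1, d_b2], (b1, b2)))
```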

Setting these two derivatives equal to zero and rearranging terms yields the normal equations

$\sum Y_i = n \hat{\beta}_1 + \hat{\beta}_2 \sum X_i$

$\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2$

These equations can be solved simultaneously to obtain

$\hat{\beta}_2 = \dfrac{\sum x_i y_i}{\sum x_i^2}$ and $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$,

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ (these are deviations from the mean values of X and Y).
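
As an illustration, here is a minimal Python sketch of these deviation-form formulas; the data values are hypothetical.

```python
# Minimal sketch of the deviation-form OLS formulas (hypothetical data).
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

x = X - X.mean()                               # x_i = X_i - Xbar
y = Y - Y.mean()                               # y_i = Y_i - Ybar

beta2_hat = (x * y).sum() / (x ** 2).sum()     # slope: sum(x_i y_i) / sum(x_i^2)
beta1_hat = Y.mean() - beta2_hat * X.mean()    # intercept: Ybar - beta2_hat * Xbar

print(beta1_hat, beta2_hat)
print(np.polyfit(X, Y, 1))                     # same least-squares line: [slope, intercept]
```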

Numerical properties of least-squares estimators:


I. OLS estimators are easy to compute because they are expressed in terms of
observable quantities
II. They are point estimators
III. Sample regression line can be easily obtained after OLS because:
1. It passes through the sample means of X and Y (i.e., Ȳ = β̂1 + β̂2 X̄)
2. The mean value of the estimated Y is equal to the mean value of the actual Y
3. The mean value of the residuals ûi is zero (see proof on page 64)
4. The residuals ûi are uncorrelated with the predicted Ŷi (see proof on page 65)
5. The residuals ûi are uncorrelated with X i
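
These properties are easy to verify numerically; below is a minimal sketch with hypothetical data.

```python
# Minimal numerical check of properties 1-5 above (hypothetical data).
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

x, y = X - X.mean(), Y - Y.mean()
beta2 = (x * y).sum() / (x ** 2).sum()
beta1 = Y.mean() - beta2 * X.mean()

Y_hat = beta1 + beta2 * X                               # fitted values
u_hat = Y - Y_hat                                       # residuals

print(np.isclose(Y.mean(), beta1 + beta2 * X.mean()))   # 1. line passes through (Xbar, Ybar)
print(np.isclose(Y_hat.mean(), Y.mean()))               # 2. mean of fitted Y equals mean of Y
print(np.isclose(u_hat.mean(), 0.0))                    # 3. residuals average to zero
print(np.isclose((u_hat * Y_hat).sum(), 0.0))           # 4. residuals uncorrelated with fitted Y
print(np.isclose((u_hat * X).sum(), 0.0))               # 5. residuals uncorrelated with X
```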

Classical Linear Regression Model (CLRM) Assumptions:

A1: Linear regression model [the regression model is linear in the parameters]

A2: X values are fixed in repeated sampling [X is nonstochastic]

A3: Zero mean value of disturbance ui

A4: Homoscedasticity [Var(ui | Xi) = σ²]

A5: No autocorrelation [correlation between any ui and u j ( i ≠ j ) is zero]

A6: Zero covariance between ui and X i [ E (ui X i ) = 0 ]

A7: The number of observations n must be greater than the number of parameters k.

A8: Variability in X values [ Var ( X ) > 0 ]

A9: The regression model is correctly specified [no specification bias]

A10: There is no perfect multicollinearity [there are no perfect linear relationships among
explanatory variables]

Variance and standard errors of least-squares estimates


$\operatorname{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_i^2}$ and $\operatorname{se}(\hat{\beta}_2) = \dfrac{\sigma}{\sqrt{\sum x_i^2}}$

$\operatorname{Var}(\hat{\beta}_1) = \dfrac{\sum X_i^2}{n \sum x_i^2} \, \sigma^2$ and $\operatorname{se}(\hat{\beta}_1) = \sqrt{\dfrac{\sum X_i^2}{n \sum x_i^2}} \, \sigma$

How do we estimate the variance of $u_i$, $\sigma^2$? We use the OLS estimator of $\sigma^2$,

$\hat{\sigma}^2 = \dfrac{\sum \hat{u}_i^2}{n - 2}$,

where $\sum \hat{u}_i^2$ is the residual sum of squares (RSS) and $n - 2$ is the number of degrees of freedom (df).

$\hat{\sigma} = \sqrt{\dfrac{\sum \hat{u}_i^2}{n - 2}}$ is the standard error of estimate, or standard error of the regression. This is the standard deviation of the Y values around the estimated regression line, and it is often used as a measure of “goodness of fit”.
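
A minimal sketch (with hypothetical data) of how the estimate of σ and the standard errors follow from these formulas:

```python
# Minimal sketch: sigma-hat and standard errors of the OLS estimates (hypothetical data).
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(Y)

x = X - X.mean()
beta2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1 = Y.mean() - beta2 * X.mean()
u_hat = Y - (beta1 + beta2 * X)                     # residuals

sigma2_hat = (u_hat ** 2).sum() / (n - 2)           # RSS / (n - 2)
se_beta2 = np.sqrt(sigma2_hat / (x ** 2).sum())
se_beta1 = np.sqrt(sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum()))

print(np.sqrt(sigma2_hat))                          # standard error of the regression
print(se_beta1, se_beta2)                           # standard errors of the estimates
```
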
Properties of Least-squares Estimators

An estimator is said to be the best linear unbiased estimator (BLUE) if:


1. It is linear
2. It is unbiased [the expected value is equal to the true value]
3. It is efficient [efficiency means that the estimator has the minimum variance within the
class of all linear unbiased estimators]

Gauss-Markov Theorem

Given the assumptions of the CLRM, the least-squares estimators, in the class of unbiased linear
estimators, have minimum variance, that is, they are BLUE.

The properties presented above are finite (small) sample properties. We will discuss large sample
properties later on.

Coefficient of Determination (R-squared)

r² (two-variable case) or R² (multiple regression) tells us how well the sample regression line fits the data.

$r^2 = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \dfrac{ESS}{TSS}$ or, alternatively, $r^2 = 1 - \dfrac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2} = 1 - \dfrac{RSS}{TSS}$,

where ESS = explained sum of squares; RSS = residual sum of squares; and TSS = total sum of squares.

r² is called the Coefficient of Determination, and it “measures the percentage of the total variation in Y explained by the regression model”. r² is a non-negative number that lies between zero and one: zero means no fit, and an r² of one means a perfect fit.

The sample correlation coefficient can be estimated as $r = \pm \sqrt{r^2} = \dfrac{\sum x_i y_i}{\sqrt{(\sum x_i^2)(\sum y_i^2)}}$.
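
A minimal sketch (hypothetical data) that computes r² both ways and the correlation coefficient:

```python
# Minimal sketch: r-squared via ESS/TSS and via 1 - RSS/TSS, plus r (hypothetical data).
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

x, y = X - X.mean(), Y - Y.mean()
beta2 = (x * y).sum() / (x ** 2).sum()
beta1 = Y.mean() - beta2 * X.mean()
Y_hat = beta1 + beta2 * X
u_hat = Y - Y_hat

TSS = ((Y - Y.mean()) ** 2).sum()        # total sum of squares
ESS = ((Y_hat - Y.mean()) ** 2).sum()    # explained sum of squares
RSS = (u_hat ** 2).sum()                 # residual sum of squares

print(ESS / TSS, 1 - RSS / TSS)          # the two expressions for r-squared agree
r = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())
print(r, np.corrcoef(X, Y)[0, 1])        # matches numpy's correlation coefficient
```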

Properties of r:
1. It can be positive or negative [depends on the sign of the numerator]
2. −1 ≤ r ≤ 1
3. It is symmetrical [i.e., you get the same value whether you calculate it between X and Y,
or between Y and X]
4. It is independent of the origin and scale
5. If X and Y are independent, then the correlation coefficient is zero [but zero correlation does not necessarily imply independence]
6. It is a measure of linear association only
7. It does not imply that there is any cause-and-effect relationship
Monte Carlo Experiment (See Example on Page 92)

A Monte Carlo experiment is essentially a computer simulation that is useful for checking the sampling properties of estimators. Assuming you know the true values of the parameters, you would choose the sample size, fix the values of the independent variables at given levels, and draw random values of the disturbance term to obtain values of the dependent variable. You can do this since you know the X’s, the betas, and u. The generated values of Y are then used with the values of X to get the parameter estimates (the estimated betas).

You would repeat this experiment 100 or 1,000 times, which generates 100 or 1,000 sets of parameter estimates. If the average values of these estimates are close to the true values, the Monte Carlo experiment suggests that your estimator is unbiased. In general, Monte Carlo experiments are useful when we want to know the statistical properties of different ways of estimating population parameters.
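
A minimal sketch of such an experiment in Python, with hypothetical “true” parameter values chosen only for illustration:

```python
# Minimal Monte Carlo sketch: fix the X's and the true betas, draw fresh disturbances
# each replication, re-estimate by OLS, and average the estimates (hypothetical values).
import numpy as np

rng = np.random.default_rng(0)
beta1_true, beta2_true, sigma = 1.0, 0.5, 2.0      # assumed "true" parameter values
X = np.arange(1.0, 21.0)                           # X values fixed in repeated sampling
n_reps = 1000

estimates = np.empty((n_reps, 2))
for rep in range(n_reps):
    u = rng.normal(0.0, sigma, size=X.size)        # draw the disturbances
    Y = beta1_true + beta2_true * X + u            # generate Y from the known PRF
    x, y = X - X.mean(), Y - Y.mean()
    b2 = (x * y).sum() / (x ** 2).sum()            # OLS slope
    b1 = Y.mean() - b2 * X.mean()                  # OLS intercept
    estimates[rep] = (b1, b2)

# Averages close to (1.0, 0.5) illustrate the unbiasedness of the OLS estimators
print(estimates.mean(axis=0))
```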
