
Basic Econometrics

Chapter 4:
Classical Normal Linear Regression
Model (CNLRM)
Iris Wang

iris.wang@kau.se
Sampling distributions
• We have studied the expected value and
variance of the OLS estimators
• In order to do inference, we need to know the
full sampling distribution of the estimator
• To make this sampling distribution tractable, we
now assume that the unobserved error term (u)
is normally distributed in the population.
➢ This is often referred to as the normality
assumption. (Assumption 10)
Assumption 10: Normality
• We continue to make the assumptions introduced
in the previous lecture (linear regression, no perfect collinearity,
zero conditional mean, homoskedasticity, …).

• And we add the following:


• Assumption 10: Normality – The population error
u is independent of the explanatory variables x1,
x2,…,xk, and is normally distributed with zero
mean and variance σ²: u ~ Normal(0, σ²)
Recap: The normal distribution
• The normal distribution is very
widely used in statistics &
econometrics (one reason is that
normality simplifies probability
calculations)
• A normal random variable is a
continuous random variable that
can take on any value.
• The shape of the probability
density function (pdf) for the
normal distribution is shown on
the right.
• The mathematical formula for
the pdf is as follows:

f(x) = (1 / (σ√(2π))) · exp( −(x − μ)² / (2σ²) )

…where μ is the mean and σ² is the variance.
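Not part of the original slides: a minimal Python implementation of this pdf, checked against scipy (the function name normal_pdf is our own):

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Normal(mu, sigma^2) random variable."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-4, 4, 9)
assert np.allclose(normal_pdf(x), norm.pdf(x))   # agrees with scipy's implementation
print(normal_pdf(0.0))                           # peak of the standard normal, ~0.3989
```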
Why are we assuming normality?
• Answer: It implies that the OLS estimator follows
a normal distribution too. And this makes it
straightforward to do inference.
• Under the CLM assumptions (1–7), conditional on the
sample values of the independent variables,

β̂j ~ Normal(βj, Var(β̂j))

The result above implies that

(β̂j − βj) / sd(β̂j) ~ Normal(0, 1)
• In words, this says that the deviation between the


estimated value and the true parameter value, divided
by the standard deviation of the estimator, is normally
distributed with mean zero and variance equal to 1.
• On p.100
• The assumptions 1–7 are called the classical
linear model (CLM) assumptions.
• One immediate implication of the CLM
assumptions is that, conditional on the
explanatory variables, the dependent variable
y has a normal distribution with constant
variance, p.101.
How can we justify the normality assumption?
• Central limit theorem (CLT): the residual u is the sum of
many different factors; and by the CLT the sum of many
independent random variables is approximately normally distributed.
• This argument is not without weaknesses (e.g. it doesn't
hold if u is not additive).
• Whether normality holds in a particular application is
an empirical matter – which can be investigated
• Sometimes using a transformation – e.g. taking the log
– yields a distribution that is closer to normal.
Example: CEO Salary and Return on Equity
• Data: CEOSAL1.SAV (available on course
website)
• Salaries expressed in thousands of USD.
• It would be interesting to look at the sample
distributions of salary on different scales.
Sample distributions of CEO salaries in levels & logs
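Not part of the original slides: a minimal matplotlib sketch of what such histograms look like, using simulated right-skewed "salaries" in place of the actual CEOSAL1 data (the lognormal parameters and the sample size of 209 are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical stand-in for CEOSAL1: salaries (thousands of USD)
# drawn from a right-skewed lognormal distribution
salary = rng.lognormal(mean=7.0, sigma=0.6, size=209)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(salary, bins=30)
ax1.set_title("Salary (levels): right-skewed")
ax2.hist(np.log(salary), bins=30)   # the log transform pulls in the long right tail
ax2.set_title("log(Salary): closer to normal")
plt.show()
```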
Basic Econometrics

Chapter 5:
Interval Estimation and Hypothesis
Testing
Iris Wang

iris.wang@kau.se
Confidence intervals
• Once we have estimated the population parameter β
and obtained the associated standard error, we can
easily construct a confidence interval (CI) for βj.
• (β̂j − βj) / se(β̂j) has a t distribution with n−k−1 degrees
of freedom (df).
• Define a 95% confidence interval for βj as

β̂j ± t0.025 · se(β̂j)

where the constant t0.025 is the 97.5th percentile in
the t distribution.
Confidence intervals

β̂j − t0.025 · se(β̂j)  (lower limit)
β̂j + t0.025 · se(β̂j)  (upper limit)

Meaning of CI: in 95 out of 100 cases, intervals like the one
above will contain the true βj.
Confidence intervals

The width of the CI is proportional to the standard error of the
estimator:
• the larger the se, the larger is the width of the CI.
• the larger the se, the greater is the uncertainty of
estimating the true value of the unknown parameter.

How is the confidence interval affected by an increase in the level
of confidence (e.g. from 95% to 99%)? Why?
Don’t forget the CLM assumptions!
• Estimates of the confidence interval will not be
reliable if the CLM assumptions do not hold.
Example:
• Data: wage1.sav
• These data were originally obtained from the 1976
Current Population Survey in the US.
• SPSS output:
Coefficientsa

Model           B        Std. Error    Beta     t        Sig.    95% CI Lower    95% CI Upper
1 (Constant)    -0.892   0.686                  -1.300   0.194   -2.239          0.456
  educ           0.541   0.053         0.405    10.143   0.000    0.436           0.645

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: wage

• Can you calculate these two CIs yourself according
to the formula?
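Here is one way to do it (our illustration, not from the slides). The sample size n = 526 for WAGE1 is an assumption, giving df = n − k − 1 = 524; small differences from the SPSS bounds come from rounding the reported B and Std. Error:

```python
from scipy.stats import t

b, se = 0.541, 0.053        # reported estimate and std. error for educ
df = 526 - 1 - 1            # n - k - 1, assuming n = 526 observations in WAGE1

for level in (0.95, 0.99):
    c = t.ppf(1 - (1 - level) / 2, df)   # t percentile, e.g. ~1.96 for 95%
    print(f"{level:.0%} CI: [{b - c * se:.3f}, {b + c * se:.3f}]")
# 95% CI: [0.437, 0.645]  (matches the SPSS output up to rounding)
# 99% CI: [0.404, 0.678]  (wider, illustrating the confidence-level question above)
```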
Hypothesis Testing
• In Chapter 3 we learned that Assumptions 1–7
(such as linear regression, no perfect collinearity, zero conditional
mean, homoskedasticity) enable us to obtain
mathematical formulas for the expected value
and variance of the OLS estimators
• To test a hypothesis, we need to know the full
sampling distribution of the estimator
1. Sampling Distribution: Illustration
• Suppose we want to make statements about a
population consisting of (say) 10 million individuals.
• The model is as follows: Y = β0 + β1*x + u
• Suppose we could draw (say) 100 samples from this
population, where each sample consists of (say) 200
observations. Further suppose we would estimate 100
different regressions (one for each sample)
• This would generate 100 different estimates of our
parameter of interest β1 – and they would form the
distribution of our estimator.
Let’s do this!
• Let’s simulate 500 samples consisting of 200
individuals. Our model is Y = β0 + β1*x + u,
where u is normally distributed (and all other
assumptions hold too).
• Since we are simulating data, we can now
choose the true parameters (this would
obviously not be the case for real empirical
applications). Let’s choose β0 = 0 and β1=0.
Here's the distribution of the 500 different estimates of β1

[Histogram of the 500 estimates: density on the vertical axis; b1 on the horizontal axis, ranging from −.2 to .2]

Mean of b1: −0.005
Std dev of b1: 0.072
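A minimal Python sketch of this simulation (our illustration; the slide does not specify the distributions of x and u, so x ~ N(0,1) and u ~ N(0,1) are assumptions that happen to reproduce numbers close to those above):

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1 = 0.0, 0.0             # true parameters chosen on the slide
n_samples, n_obs = 500, 200

b1_estimates = np.empty(n_samples)
for s in range(n_samples):
    x = rng.normal(size=n_obs)      # assumed distribution of x
    u = rng.normal(size=n_obs)      # u ~ Normal(0, 1); sigma = 1 is assumed
    y = beta0 + beta1 * x + u
    # OLS slope: sample covariance of (x, y) over sample variance of x
    b1_estimates[s] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print("Mean of b1:", b1_estimates.mean())           # close to the true value 0
print("Std dev of b1:", b1_estimates.std(ddof=1))   # ~0.07, as on the slide
```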
2. Why do we need to know the sampling
distribution of the OLS estimator?

• Recall the formula for the t statistic:

t = (β̂j − βj) / se(β̂j)
• In other words, the difference between the parameter
estimate and a given (unknown) value of the true parameter,
scaled by the standard error of the estimator, follows a
t-distribution.
• This is very good news, because we know exactly what the t
distribution looks like (statisticians have studied this
distribution for many years).
• In particular, we know exactly how to compute probabilities
using a t distribution – and this will be very useful when testing
hypotheses (more on this shortly)
• Here's the answer to the question – if we don't
know the sampling distribution of the OLS
estimator, we can't be sure that (beta_hat −
beta)/se(beta_hat) follows a t-distribution.
• In that case, this quantity could follow any
distribution – in which case there's no way of
doing the probability analysis that underlies
hypothesis testing.
Testing the null hypothesis
• In most applications, testing

H0: βj = 0

is of central interest (j corresponds to any of the k
independent variables in the model).
• Since βj measures the partial effect of xj on the
expected value of y after controlling for other factors,
the null hypothesis means that xj has no effect on
the expected value of y.
Example: Wage equation
log(wage) = β0 + β1·education + u

• The null hypothesis H0: β1 = 0 means that education
has no effect on hourly wage.
• Is this an economically interesting hypothesis?
• Now let's look at how we can carry out and interpret
such a test.
• The test statistic we use to test H0: βj = 0 is called
the t statistic or the t ratio of β̂j and is defined as

t = β̂j / se(β̂j)

• As you can see, the t statistic is easy to compute: just
divide your coefficient estimate by the standard
error.
• SPSS (and most other econometrics software) will do
this for you.
• Since the se is always positive, the t statistic always
has the same sign as the coefficient estimate.
Intuition
Two‐tailed tests
• Consider a null hypothesis like H0: βj = 0 against a
two-sided alternative like H1: βj ≠ 0.
• In words, H1 is that xj has a ceteris paribus effect on
y, which could be either positive or negative.

• Now let's decide on a significance level
➢ Significance level = probability of rejecting H0 when it is
in fact true (i.e. a mistake).
➢ Let's decide on a 5% significance level (the most common
choice): hence, we are willing to mistakenly reject H0
when it is true 5% of the time.
Two‐sided (cont’d)
• To find the critical value of t
(denoted by c), we first
specify the significance
level, say 5%.
• Since the test is two-
tailed, c is then chosen to
make the area in each tail
equal 2.5% – i.e. c is the
97.5th percentile in the t
distribution (again, with n−
k−1 degrees of freedom).
• The graph shows that, if
df=26, then c=2.06.
Econometric jargon: If H0: βj = 0 is rejected against a two-sided alternative, we may
say that "xj is statistically significant at the 5% level". Thus we conclude that the
effect of xj on y is not zero.
Testing against one‐sided alternatives
• The rule for rejecting H0 depends on:
1. The alternative hypothesis (H1)
2. The chosen significance level of the test
• Let's begin by looking at a one-sided alternative of
the form:

H1: βj > 0

• Let's assume we decide to apply a 5% significance
level, that is, α = 5%.
One‐tail test
• Under H0 (βj=0 ), the t statistic has a t distribution.
• Under H1 (βj>0), the expected value of the t statistic
is positive.
• Denote the critical value by c.
• On p.118
Rejection rule: H0 is rejected in favor of H1 at the 5%
significance level if t > c.
We've seen how to obtain the t statistic.
But how do we obtain c?
➢ To obtain c, we only need the significance level
and the degrees of freedom (df).
Example: For df = 28 and significance level 5%, c = 1.701
➢ If our t statistic is less than 1.701, we do not
reject H0
➢ But if our t statistic is higher than 1.701, we do
reject H0
A few points worth noting
• As the significance level falls, the critical value
increases. Why?
• If H0 is rejected at (say) the 5% level, it is
automatically rejected at the 1% level too.
• What is the critical value c for
o A 10% significance level with df=21?
o A 1% significance level with df=120?
• Confirm that, as the df gets large, the critical values
for the t-distribution get very close to the critical
values for the standard normal distribution.
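These critical values can be looked up in a t table or computed directly. A short scipy sketch (ours, not the slides'; one-sided critical values are assumed, matching the one-tailed example above):

```python
from scipy.stats import norm, t

# One-sided critical values: the (1 - alpha) percentile of t(df)
print(t.ppf(0.90, 21))    # 10% level, df=21  -> ~1.323
print(t.ppf(0.99, 120))   # 1% level, df=120  -> ~2.358
print(t.ppf(0.95, 28))    # the example above -> ~1.701

# As df grows, t critical values approach the standard normal ones
print(t.ppf(0.95, 10_000), norm.ppf(0.95))   # both ~1.645
```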
Example: The wage equation
(Data: WAGE1.SAV)
Model: wage = β0 + β1·educ + u
Based on the results below, test H0: β1=0 against H1: β1>0

Coefficientsa

Model           B        Std. Error    Beta     t        Sig.
1 (Constant)    -0.892   0.686                  -1.300   0.194
  educ           0.541   0.053         0.405    10.143   0.000

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: wage
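A sketch of this test in Python (our illustration; df = 524 again rests on the assumed n = 526):

```python
from scipy.stats import t

b, se = 0.541, 0.053
t_stat = b / se          # ~10.2 (SPSS reports 10.143, computed from unrounded inputs)
c = t.ppf(0.95, 524)     # one-sided 5% critical value, ~1.648
print(t_stat > c)        # True: reject H0 in favor of beta1 > 0
```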


Testing other hypotheses about βj
• Although H0: βj=0 is the most common hypothesis, we
sometimes want to test whether βj is equal to some
other given constant. Suppose the null hypothesis is

H0: βj = aj

• In this case the appropriate t statistic is:

t = (β̂j − aj) / se(β̂j)
• Now go back and test the hypothesis that the educ
coefficient in the regression above is equal to 1 (against a
two-sided alternative).
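A sketch of that exercise (our illustration, using the reported estimates and the assumed df = 524):

```python
from scipy.stats import t

b, se, a = 0.541, 0.053, 1.0
t_stat = (b - a) / se      # ~ -8.66
c = t.ppf(0.975, 524)      # two-sided 5% critical value, ~1.96
print(abs(t_stat) > c)     # True: reject H0: beta_educ = 1
```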
Computing p‐values for t tests
• You have seen how the researcher chooses the
significance level. There's no "correct"
significance level.
• In practice, the 5% level is the most common one, but
10% is also frequently used (especially for small
datasets), as is 1% (more common for large datasets).
• Given the observed value of the t statistic, what is the
smallest significance level at which the null hypothesis
would be rejected?
• This level is known as the p‐value.
• Example: Suppose t = 1.85 and df=40.
• This results in a p‐value = 0.0718.
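This p-value can be reproduced with scipy (our illustration; a two-sided test is assumed, which matches the reported 0.0718):

```python
from scipy.stats import t

t_stat, df = 1.85, 40
p_value = 2 * t.sf(t_stat, df)   # two-sided p-value: P(|T| > 1.85) under H0
print(round(p_value, 4))         # 0.0718
```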
p‐values in SPSS
• Correct interpretation: The p-value is
the probability of observing a t value
as extreme as we did if the null
hypothesis is true. ☺
• Wrong interpretation (not
uncommon): "The p-value is the
probability that the null hypothesis is
true…". ☹

SPSS output (the Sig. column is the two-sided p-value):

Coefficientsa

Model           B        Std. Error    Beta     t        Sig.
1 (Constant)    -0.892   0.686                  -1.300   0.194
  educ           0.541   0.053         0.405    10.143   0.000

a. Dependent Variable: wage
➢ Thus, small p-values are evidence
against the null hypothesis. If the p-value
is, say, 0.04, we might say there's
significance at the 5% level (actually at
the 4% level) but not at the 1% level (or
3% or 2% level).
Basic Econometrics

Chapter 6:
Extensions of the Two‐Variable
Linear Regression Model
Iris Wang

iris.wang@kau.se
Log‐linear regression models
• In many cases relationships between
economic variables may be non‐linear.
• However we can distinguish between
functional forms that are intrinsically non‐
linear and those that can be transformed into
an equation to which we can apply ordinary
least squares techniques.
Log‐linear regression models
• Of those non‐linear equations that can be
transformed, the best known is the
multiplicative power function form
(sometimes called the Cobb‐Douglas
functional form), which is transformed into a
linear format by taking logarithms.
Log‐linear regression models
Production functions
For example, suppose we have cross-section
data on firms in a particular industry with
observations both on the output (Q) of each
firm and on the inputs of labour (L) and capital
(K).
Consider the following functional form:

Q = A · L^α · K^β
Log-linear regression models

Taking logs of both sides transforms the power function into an
equation that is linear in the parameters:

ln Q = ln A + α · ln L + β · ln K + u

The parameters α and β can be estimated directly from a regression of
the variable lnQ on lnL and lnK.
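Not part of the original slides: a minimal estimation sketch on simulated data (the true values A = 2, α = 0.3, β = 0.7, the input distributions, and the use of statsmodels are all our assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
L = rng.lognormal(mean=3.0, sigma=0.5, size=n)   # simulated labour input
K = rng.lognormal(mean=4.0, sigma=0.5, size=n)   # simulated capital input
u = rng.normal(scale=0.1, size=n)
A, alpha, beta = 2.0, 0.3, 0.7                   # assumed true parameters
Q = A * L**alpha * K**beta * np.exp(u)           # Cobb-Douglas with multiplicative error

# Estimate the transformed (linear-in-logs) equation by OLS
X = sm.add_constant(np.column_stack([np.log(L), np.log(K)]))
res = sm.OLS(np.log(Q), X).fit()
print(res.params)   # [ln A, alpha, beta] -> approx [0.69, 0.3, 0.7]
```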
