
Application of Econometrics in

Managerial Economics
Dr. Manoj Kumar Dash
M.A; M.Phil; MBA; Ph.D
What is Econometrics?
Simply put, econometrics means economic
measurement.

Some formal definitions:
Econometrics is concerned with the empirical
determination/quantification of the theoretical
postulations of economics, management, political
science, etc.


Econometrics is defined as the social science in
which the tools of economic theory, mathematics,
and statistical inference are applied to the
analysis of economic phenomena (Goldberger,
1964).

Econometrics ... consists of the application of
mathematical statistics to economic data to lend
empirical support to the models constructed by
mathematical economics and to obtain numerical
results (Gerhard Tintner, 1968).

Econometrics renders a positive help in trying
to dispel the poor public image of economics
as a subject in which empty boxes are opened by
assuming the existence of can-openers to reveal
contents which any ten economists will interpret
in 11 ways.
Examples
Consumption expenditure = F (Income, Wealth
etc)
Hourly earnings = F (Education, Labour
productivity etc)
Quantity Demanded = F (Price, Income of
consumer, Prices of related commodities etc)
Election Outcome = F (Economic performance,
Populism etc)
Debt/Equity = F (Corporate tax rate, Capital
gains tax rate, Inflation rate etc)

Sales = F (Price, Advertising expenses etc)

Crop yield = F (Temperature, Rainfall, Fertilizer
use etc.)

Why Study Econometrics?
(1) To provide empirical support to theories:
Theories make statements or hypotheses that are
mostly qualitative in nature.
They do not provide numerical measure of the
proposed ideas.
Example: Law of Demand in Economics
In a given situation how to quantify this law?
Econometrics comes to the rescue.
(2) To provide empirical support to mathematical
models:
Many theories are expressed in mathematical
form/models (Common in Economics, Finance)
These theoretical models are developed without
regard to their measurability
Econometrics helps to put these models into
empirical testing (e.g. Tax competition models).
(3) For academicians:
Research in economics, finance, management,
marketing etc is becoming increasingly
quantitative [Show examples].
Hence, knowledge about econometrics is very
important to conduct empirical research.
(4) For students:
For students a command over econometrics will
be of great use in their employment.
Example: forecasting of sales, consumer
behaviour (KPOs), corporate research.

Steps in Econometric Analysis
1. Identify research issue
2. Select Variables
3. Check Data Availability
4. Specify Econometric Model
5. Data Collection
6. Model Estimation

7. Hypothesis Testing

8. Using Estimated Model for
Forecasting/Prediction

Step 1: Identify research issue
You should have a research problem to probe
(How to identify a problem?)
Example from Labour Economics: the effect of
economic conditions on people's willingness to
work (i.e. the labour force participation rate, LFPR).
Two hypotheses/arguments are postulated.

Argument 1 (Discouraged worker hypothesis): As
economic conditions worsen, the unemployed may
drop out of the labour force (falling LFPR) as they
lose hope of finding a job.


Argument 2 (Added worker hypothesis): As
economic conditions worsen, the unemployed may
join the labour force (rising LFPR) if the family's
main breadwinner loses the job.
In this context, our research objective could be
to test relative strength/validity of these two
contrasting claims.

Step 2: Select Variables
For running a regression, we have to select a
DEPENDENT variable and an INDEPENDENT
variable(s).
A model with one dependent variable and one
independent variable is called simple or two
variable regression model (TWRM).
A model with one dependent variable and more
than one independent variable is called multiple
regression model (MRM).
Example:
LFPR = f (Unemployment Rate) - TWRM
LFPR = f (Unemployment Rate; Hourly
Earnings; Family Wealth) - MRM

Generally, MRM is preferred because it
enhances credibility of research findings.

How many independent variables do we include?
Strictly, guiding principle should be underlying
theory or prior literature or nature of problem.
Even then, it may not always be possible to include
every possible variable
Why?

Non-availability of data
Inherent randomness in human behaviour, which cannot
be captured easily (Ex: Being unemployed due to
attitude problem).
Inability to properly quantify certain variables (Ex.
Interest groups)
Due to ignorance we may miss relevant variables
Anyway, ultimate purpose of econometrics is not to
capture complete reality
How to rectify this problem?
Add a term called error term which could take
care of influence of all omitted variables.
In other words, error term captures all those
forces that affect the dependent variable but are
not explicitly included in the model.
Error term will also be useful when we use proxy
independent variables (e.g. interest groups)
If the error term is small it implies that the
combined influence of omitted variables is
small/negligible.

Step 3: Check Data Availability
Before proceeding further, check data availability
for relevant variables.
Three types of data are generally available for
empirical analysis: cross-sectional data, time-series
data, and pooled/panel data.
In cross-section data, values of one or more
variables are collected for several sample units,
or entities at the same point in time (e.g. GDP
figure of countries for a given year).

Cross-Sectional data
State               Tax Revenue in 2000-01 (Rs. crore)
Andhra Pradesh 10551.92
Bihar 2934.75
Haryana 4311.48
Karnataka 9042.67
Kerala 5870.26
Maharashtra 19724.28
Orissa 2184.03
Punjab 4895.22
Rajasthan 5299.97
In time series data we observe the values of one
or more variables over a period of time (e.g. daily
stock prices or annual GDP figures).

In panel data, same cross-sectional unit (say a
firm or state or country) is surveyed over time

Thus, in panel data we have elements of both
time series and cross-sectional data.

Time Series Data
Year       Tax Revenue of Andhra Pradesh (Rs. crore)
1990-91 2647.25
1991-92 3054.96
1992-93 3388.72
1993-94 3832.93
1994-95 4227.43
1995-96 4120.44
1996-97 4881.83
1997-98 7113.55
1998-99 7961.4
1999-00 9008.6
2000-01 10551.92
Panel Data: Tax Revenues of States
Year AP KAR KER TN
1990-91 2647.25 2332.12 1340.35 3124.05
1991-92 3054.96 2900.2 1673.93 3734.11
1992-93 3388.72 3097.81 1886.97 4162.06
1993-94 3832.93 3812.34 2344.87 4801.37
1994-95 4227.43 4289.31 2799.1 5833.76
1995-96 4120.44 5273.93 3382.68 7151.2
1996-97 4881.83 5767.84 3898.5 7983.45
1997-98 7113.55 6411.87 4501.05 8682.64
1998-99 7961.4 6943.1 4649.5 9625.3
1999-00 9008.6 7744.37 5193.51 10918.93
2000-01 10551.92 9042.67 5870.26 12282.25
Step 4: Specify Econometric Model
LFPR = B1 + B2·UR + u

LFPR - labour force participation rate
UR - unemployment rate (a proxy for economic conditions; another option could be GDP)
B1 - intercept term. Gives the value of LFPR (the dependent variable) when UR (the independent variable) is zero
B2 - slope term. Measures the rate of change in LFPR for a unit change in UR. Together, B1 and B2 are known as the parameters of the regression model
u - error term [= LFPR - (B1 + B2·UR)]

This is a linear regression model - LFPR is linearly
related to UR

Our objective is to explain the behaviour of dependent
variable in relation to the explanatory variable.


Step 5: Data Collection
For empirical estimation, we need to collect data
on the variables used in the econometric model.
Data can be obtained either from primary sources
[HR] or from secondary sources [Finance,
Economics]
Step 6: Model Estimation
By estimation we mean estimating the parameters
(B
1
& B
2
) of the chosen model.
Estimation is carried out using the technique of
regression analysis.

What is regression analysis?
Regression analysis is concerned with the
study of the dependence of one variable, the
dependent variable, on one or more other
variables, the explanatory variables, with a view
to estimating and/or predicting the (population)
mean or average value of the former in terms of
the known or fixed (in repeated sampling)
values of the latter
Assume that after estimation we get the following
result:
LFPR = 50.63 - 0.45UR
B2 = -0.45. Implies that if UR goes up by 1 percentage
point, ceteris paribus, LFPR is expected to
decrease on average by about 0.45 percentage points.
The discouraged worker hypothesis finds support.
B1 = 50.63. Implies that the average value of LFPR
will be about 50.63% if the UR were zero (i.e.
full employment).
Sometimes/often, intercept term has no
particular economic meaning.

But, in the present example it has meaning
(How?)

We say "on the average" because the presence
of the error term is likely to make the relationship
somewhat imprecise.

Step 7: Hypothesis Testing
Objective here is to test certain hypotheses
suggested by theory and/or prior empirical
experience on parameters of the model.
Specifically, we are interested in verifying how
close the estimated parameter is to a presupposed
value of that parameter
(e.g. H0: B1 = 1; B2 = -0.45)
Hypothesis testing helps to verify whether the
results obtained through regression analysis
conform to the underlying theory


Step 8: Using Estimated Model for
Forecasting/Prediction
LFPR = 50.6333 - 0.4486UR
If we want to predict LFPR in some future
period for a given value of UR (say 5), we can
obtain it (How?)
Substitute the UR value (5) in the above equation.
When data on LFPR (for a given UR) for the
future period come out, we can compare the
predicted value with the actual value.

The discrepancy between the two is called the
prediction error.
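A minimal sketch of this step in Python; the realised LFPR value here is assumed purely for illustration:

```python
# Step 8 sketch: plug an assumed UR value into the estimated equation
# LFPR = 50.6333 - 0.4486*UR and compute the prediction error once the
# actual LFPR (hypothetical here) becomes known.
b1, b2 = 50.6333, -0.4486      # estimates from Step 6
ur = 5.0                       # assumed future unemployment rate
lfpr_pred = b1 + b2 * ur       # predicted LFPR = 48.39
lfpr_actual = 48.0             # hypothetical realised value, for illustration
prediction_error = lfpr_actual - lfpr_pred
print(lfpr_pred, prediction_error)
```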

Regression Vs. Correlation

Regression: estimate/predict the average value of
one variable on the basis of the fixed values of
other variables. The dependent variable is treated
as stochastic or random, while the explanatory
variables are assumed to have fixed values.

Correlation: measure the strength or degree of
linear association between two variables. Both
variables are treated as random.
Our Immediate Task
Identification of research issue, selection of
variables, checking of data availability,
specification of econometric model and data
collection are not big tasks
Hence, our focus would be on
Model Estimation
Hypothesis Testing

Criticisms
By and large, events can be explained without
econometric analysis
Data mining - results are created!
Problem with intercept term

Two-Variable Regression Model
Meaning of the term Linear

The term "linear" can be interpreted in 2 ways:
linearity in the variables and linearity in the
parameters.

Linear regression always means a regression
that is linear in the parameters. It may or may not
be linear in the explanatory variables
Example:
Y = B1 + B2X + u   (or)
Y = B1 + B2X² + u

Linear Regression Models

                                 Model linear in variables?
Model linear in parameters?      Yes        No
Yes                              LRM        LRM
No                               NLRM       NLRM
Population Regression Function (PRF)
Let us consider an example of the law of demand.
The demand schedule for commodity x
Price (X)   Quantity Demanded (Y)   No. of Consumers   Average demand
1 45,46,47,48,49,50,51 7 48
2 44,45,46,47,48 5 46
3 40,42,44,46,48 5 44
4 35,38,42,44,46,47 6 42
5 36,39,40,42,43 5 40
6 32,35,37,38,39,42,43 7 38
7 32,34,36,38,40 5 36
8 31,32,33,34,35,36,37 7 34
9 28,30,32,34,36 5 32
10 29,30,31 3 30
Total 55

[Figure 1: Population Regression Line - scatter of Quantity Demanded (Y) against Price (X), with the downward-sloping line through the conditional means]
The impression we get from the scattergram is that
demand (Y) decreases as price (X) increases, and
vice versa.
The downward sloping line is called the Population
Regression Line (PRL).
It is nothing but the locus of conditional means of
the dependent variable for the fixed values of the
explanatory variable(s)

Thus, PRL gives the average value of the
dependent variable corresponding to each value
of the independent variable.

The points on the PRL represent the expected or
population mean values of Y corresponding to
the various Xs.

The adjective "population" comes in because our
example deals with the entire population of 55 consumers.
The PRL can be expressed in the following functional
form:

E(Y/Xi) = B1 + B2Xi    (1)

where i denotes the ith subpopulation


Eq (1) gives average value of Y corresponding to
each value of X and is called Population
Regression Function (PRF) or non-stochastic
PRF.




In regression analysis our interest is in
estimating the PRF (i.e. B1 and B2) on the basis
of observations on Y and X.
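To make the idea of conditional means concrete, here is a small Python sketch that recovers the average-demand column of the table above from the individual demands:

```python
# Conditional means E(Y/X): for each price, average the quantities demanded
# by all consumers at that price (data from the demand schedule above).
demand = {
    1: [45, 46, 47, 48, 49, 50, 51],
    2: [44, 45, 46, 47, 48],
    3: [40, 42, 44, 46, 48],
    4: [35, 38, 42, 44, 46, 47],
    5: [36, 39, 40, 42, 43],
    6: [32, 35, 37, 38, 39, 42, 43],
    7: [32, 34, 36, 38, 40],
    8: [31, 32, 33, 34, 35, 36, 37],
    9: [28, 30, 32, 34, 36],
    10: [29, 30, 31],
}
for price, quantities in demand.items():
    print(price, sum(quantities) / len(quantities))  # 48, 46, 44, ..., 30
```

Joining these ten conditional means traces out exactly the PRL of Figure 1.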

Stochastic Specification of the PRF
How to explain the demands of the individual
consumer in relation to price?



The best we can do is to say that any
individual's demand is equal to the average for
that group plus or minus some quantity.
[Figure: at prices X = 2 and X = 7, individual demands (48 and 32) deviate from the conditional means (46 and 36) by amounts u]
We can express the deviation of an individual Yi
around its expected value as follows:

Yi = E(Y/Xi) + ui    (2)


In Eq. (2), ui is an unobservable random variable
called the stochastic error term, taking positive or
negative values.

Now, by substituting Eq. (1) in (2), we get

Yi = B1 + B2Xi + ui    (3)

Eq. (3) is called stochastic PRF (SPRF), whereas
Eq. (1) is called non-stochastic PRF (NPRF).

NPRF represents means of various Y values
corresponding to specified prices.

SPRF tells us how individual demands vary around
their mean values due to presence of stochastic error
term, u.





How to interpret the SPRF (Eq. 3)?

We can say that the demand of an individual consumer
(say i) corresponding to a specific price can be
expressed as the sum of the following 2 components:
(i) Systematic/deterministic component: B1 + B2Xi
(nothing but the average quantity demanded by all the
consumers at a given price level Xi)
(ii) Nonsystematic/random component: ui
(determined by factors other than price) [See the
figure above]

Sample Regression Function (SRF)
If we have data on the whole population (as in the
following table), arriving at the PRF is an
easy/straightforward exercise.

That is, find conditional means of Y
corresponding to each X and then join these
means

Unfortunately, in practice, we rarely have entire
population at our disposal.



The demand schedule for commodity x
Price (X)   Quantity Demanded (Y)   No. of Consumers   Average demand
1 45,46,47,48,49,50,51 7 48
2 44,45,46,47,48 5 46
3 40,42,44,46,48 5 44
4 35,38,42,44,46,47 6 42
5 36,39,40,42,43 5 40
6 32,35,37,38,39,42,43 7 38
7 32,34,36,38,40 5 36
8 31,32,33,34,35,36,37 7 34
9 28,30,32,34,36 5 32
10 29,30,31 3 30
Total 55
We only have a sample from the population.

The following is an example from our case
Sample 1
Y (Demand) X (Price)
49 1
45 2
44 3
39 4
38 5
37 6
34 7
33 8
30 9
29 10
Hence, our task is to estimate the PRF on the basis of
sample information

[OR]

Task is to estimate average quantity demanded in the
population as a whole corresponding to each X (price)
from sample data such as above



But, we may not be able to estimate the PRF accurately
because of sampling error
To see this clearly,
suppose another random
sample (Sample 2) is
drawn from the population
of above Table.

If we plot the data of these
two samples, and entire
population we may obtain
corresponding SRLs and
PRL as follows
Y (Demand) X (Price)

51 1
47 2
46 3
42 4
40 5
37 6
36 7
35 8
32 9
30 10
[Figure: scatter of Demand (Y) against Price (X) with the two sample regression lines, SRL1 and SRL2, and the population regression line, PRL]
Now the question is: which of the two SRLs
represents the true PRL?

If we resist the temptation of looking at the above
figure, which shows the PRL, there is no way
we can be sure that either of the SRLs represents
the true PRL.

In general, we get K different SRLs for K
different samples and all these SRLs are not
likely to be the same








Now, analogous to the PRF that underlies the PRL,
we can develop an SRF to represent the SRL as
follows:

Ŷi = b1 + b2Xi    (4)

Ŷi = estimator of E(Y/Xi)
b1 = estimator of B1
b2 = estimator of B2

where Ŷ is read as "Y-hat" or "Y-cap"

An estimator is a rule or formula that indicates
how to estimate the population parameter at
hand.

A particular numerical value obtained by the
estimator is an estimate.

Now, we can express the SRF (Eq. 4) in its
stochastic form as follows:

Yi = b1 + b2Xi + ei    (5)

where ei is the estimator of ui. It is analogous to ui
and is introduced for the same reasons as ui was
introduced in the PRF.
To sum up, our primary objective in regression
analysis is to estimate the stochastic PRF (Eq.3)
on the basis of stochastic SRF (Eq.5) because
more often than not our analysis is based on a
single sample from some population.



But, because of sampling variation, our estimate
of PRF based on SRF is only approximate.
[Figure: Demand (Y) against Price (X) showing the PRF and SRF lines, with the gaps between them marked as errors]
Granted that the SRF is only an approximation of the PRF,
the question is:

Can we find a method or a procedure that will make
this approximation as close as possible?
[OR]
How should we construct the SRF so that b1 is as close
as possible to B1 and b2 is as close as possible to B2?



This can be done by adopting the method of Ordinary
Least Squares (OLS).
What is the OLS method?
It chooses the SRF, or b1 and b2, in such a way that
the sum of the squares of the residuals is as small
as possible.

Symbolically,

Minimize Σei² = Σ(Yi - Ŷi)²

where Yi = actual Y value and Ŷi = estimated/predicted Y value

In this way, the SRF is made as close as possible to
the PRF

Why Σei² and not Σei?

Due to two reasons.

Reason 1: To give different weightage to
residuals according to the extent of their
closeness to the SRF.

Example: e1 = 10, e2 = -2, e3 = +2 and e4 = -10

By squaring the ei we can give more weightage to
the large errors (10) compared with the others (2).
Reason 2: This procedure avoids the problem of the
sign of the residuals, which can be positive as
well as negative and can therefore add to zero.

How to select b1 and b2 values to minimise Σei²?

From Eq. (5), we know that

Σei² = Σ(Yi - Ŷi)²

which is nothing but

Σei² = Σ(Yi - b1 - b2Xi)²

Hence, Σei² = f(b1, b2)

Hence, for any given set of data, choosing
different values of b1 and b2 will give different
ei's and hence different values of Σei².
An Example:

Yi    Xi    Ŷ1i      e1i       e1i²      Ŷ2i    e2i    e2i²
4     1     2.929     1.071     1.147     4      0      0
5     4     7.000    -2.000     4.000     7     -2      4
7     5     8.357    -1.357     1.841     8     -1      1
12    6     9.714     2.286     5.226     9      3      9
Sum   28    16        -         0         12.214  -     0     14

Notes:
Ŷ1i = 1.572 + 1.357Xi   (b1 = 1.572; b2 = 1.357)
Ŷ2i = 3 + 1Xi           (b1 = 3; b2 = 1)
e1i = Yi - Ŷ1i ; e2i = Yi - Ŷ2i

Now which set of estimated b (parameter)
values should we choose?

Since the b values of the 1st experiment give us a lower Σei²
than that obtained from the b values of the 2nd experiment,
we say the b's of the first experiment are the best values.
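A quick Python check of the comparison just made, computing Σe² for both trial lines:

```python
# Residual sums of squares for the two experimental lines in the table above.
data = [(1, 4), (4, 5), (5, 7), (6, 12)]   # (X_i, Y_i) pairs from the example

def rss(b1, b2):
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in data)

print(rss(1.572, 1.357))   # ~12.214  (1st experiment)
print(rss(3.0, 1.0))       # 14.0     (2nd experiment)
```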
But how do we ascertain this?
We could still choose many more values for the b's
in search of the least possible value of Σei².
However, in doing so we must be sure that we
have considered all the conceivable values of b1
and b2.
If we have infinite time and patience we can do
this exercise.
But, fortunately, the OLS method chooses b1 and b2
in such a manner that, for a given set of data,
Σei² is as small as possible.

[OR]

For a given sample, the OLS method provides us
with unique estimates of b1 and b2 that give the
smallest possible value of Σei².
How do we accomplish this?
This is a straightforward exercise in differential calculus.

The values of b1 and b2 that minimize Σei² are obtained by
solving the following two simultaneous equations:

ΣYi = n·b1 + b2·ΣXi
ΣXiYi = b1·ΣXi + b2·ΣXi²

These simultaneous equations are known as the least squares
normal equations.
In the above equations n is the sample size.
The unknowns are the b's; the knowns are quantities
involving sums, sums of squares, and sums of
cross products of the variables Y and X, which
can be obtained from the sample at hand.
Solving these equations simultaneously, we
obtain the following solutions for b1 and b2:

b1 = Ȳ - b2·X̄

b2 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²

where
Ȳ - mean of the Y variable
X̄ - mean of the X variable
(Xi - X̄), (Yi - Ȳ) - deviations from the sample mean values
b1, b2 - OLS estimators
Example:

Y     X     (Xi - X̄)   (Yi - Ȳ)   (Xi - X̄)²   (Xi - X̄)(Yi - Ȳ)
49    1     -4.5        11.2        20.25        -50.4
45    2     -3.5        7.2         12.25        -25.2
44    3     -2.5        6.2         6.25         -15.5
39    4     -1.5        1.2         2.25         -1.8
38    5     -0.5        0.2         0.25         -0.1
37    6     0.5         -0.8        0.25         -0.4
34    7     1.5         -3.8        2.25         -5.7
33    8     2.5         -4.8        6.25         -12
30    9     3.5         -7.8        12.25        -27.3
29    10    4.5         -8.8        20.25        -39.6

Mean of Y = 37.8; Mean of X = 5.5; Σ(Xi - X̄)² = 82.5; Σ(Xi - X̄)(Yi - Ȳ) = -178
Using the above formulas we obtain the estimates
of b1 and b2 as follows:

b2 = -178 / 82.5 = -2.1576

b1 = 37.8 - (-2.1576)(5.5) = 49.667

The results can be obtained using regression
packages without much effort (Demonstration)
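One possible version of that demonstration in Python, with numpy.polyfit used only as a cross-check of the deviation-form formulas:

```python
import numpy as np

Y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)  # demand
X = np.arange(1, 11, dtype=float)                                     # price

x, y = X - X.mean(), Y - Y.mean()       # deviations from sample means
b2 = (x * y).sum() / (x ** 2).sum()     # -178 / 82.5 = -2.1576
b1 = Y.mean() - b2 * X.mean()           # 49.667
print(b1, b2)
print(np.polyfit(X, Y, 1))              # [slope, intercept]: same values
```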

Properties of OLS estimators
OLS estimators b1 and b2 satisfy the BLUE
(Best Linear Unbiased Estimator) property


Linearity:
Estimators are a linear function of the dependent
and independent variables
A linear estimator is much easier to deal with
than a nonlinear estimator


Unbiasedness:
Average or expected value of the estimators is
equal to true/population value

Minimum variance/Efficiency:
It has minimum variance in the class of all such
linear unbiased estimators
The smaller the variance of b1 or b2, the closer they
will be to the true B1 or B2
Implication: a regression coefficient estimated by
OLS on average coincides with the population/true
value


Assumptions Underlying OLS
Assumption 1:
The regression model is linear in parameters (Bs)

Assumption 2:
Explanatory variables (Xs) are fixed in repeated
sampling
Implication: changes in Y are conditional on the
given values of X
Example:
The demand schedule for commodity x
Price (X) Quantity Demanded (Y)
1 45,46,47,48,49,50,51
2 44,45,46,47,48
3 40,42,44,46,48
4 35,38,42,44,46,47
5 36,39,40,42,43
6 32,35,37,38,39,42,43
7 32,34,36,38,40
8 31,32,33,34,35,36,37
9 28,30,32,34,36
10 29,30,31
Assumption 3:
The mean value of the disturbance ui is zero,
because positive ui values cancel out negative ui
values (see the figure below)
Implication: factors not explicitly included in the
model (i.e. ui) don't systematically affect the mean
of Y; (or) ui's average effect on Y is zero


[Figure: PRF with individual Y values at X1, X2, X3 scattered above (+ui) and below (-ui) their conditional means]
Assumption 4:
The variance of ui is the same for all observations
(Xs) [homoscedasticity, or equal (homo) spread
(scedasticity)]
Implication: the variation of the individual Y values
around the regression line remains the same regardless
of the values taken by the Xs; it neither increases nor
decreases as X varies
Hence all Y values corresponding to the various
Xs are equally important.

Violation of this assumption (i.e. increase in variation
around regression line of Y values as X increases) is
called heteroscedasticity

Example for homoscedasticity: Richer families on
average consume more than poorer families, but there
is no/not much variability in consumption pattern
between richer and poorer families

Example for heteroscedasticity: There is greater
variability in the consumption pattern of richer families
compared to poorer ones because, as income grows,
people have more consumption choices.



Assumption 5:
No autocorrelation between the error terms
(implications for time-series data)

Example of autocorrelation:
Yt = B1 + B2Xt + ut, where ut and ut-1 are positively
correlated. Here, Yt depends not only on Xt, but
also on ut-1, for ut-1 to some extent determines ut.

By invoking the no-autocorrelation assumption, we
consider only the effect of Xt on Yt and do not
worry about other influences that might act on Y
as a result of possible inter-correlations among
the u's.
Assumption 6:
No correlation between u and explanatory
variables (Xs)

Implication: If X and u are correlated, it is not
possible to assess their individual effects on Y

If X and u are positively correlated, X increases
when u increases and it decreases when u
decreases.
Assumption 7:
Number of observations n must be greater than the
number of parameters to be estimated or explanatory
variables.

Assumption 8:
The X values in a given sample must not all be the same
(applies to Y as well)

If they were, Xi = X̄ and hence the denominator of the
estimator b2 would be zero, making it impossible to
estimate b2 and b1.
Assumption 9:
The regression model is correctly specified

Omission of important variables or inclusion of wrong
variables undermines the validity of regression exercise

Theory should be the guiding principle in building
econometric model

If theory is not clear, we have to use some judgment in
choosing the model and interpreting the results (e.g. tax
competition)

But, data mining should be avoided.
Assumption 10:
There are no perfect linear relationships among Xs
[No multicollinearity]

Important with respect to multiple regression models

Implication: In the presence of multicollinearity, we
cannot assess the separate influence of Xs on Y.

All these assumptions pertain to PRF only and not
SRF. This means that SRF may not always duplicate
all these assumptions (Example: presence of
autocorrelation and multicollinearity problems).
Coefficient of Determination (r²)
This is a measure of the goodness of fit of the (sample)
regression line to a given set of data
[OR]
It is a summary measure that tells how well the SRF fits
the given data

r² measures the % of the total variation in Y explained by
the regression model or the X(s).

A perfect fit of regression line is rarely the case


Generally, there will be some positive and
negative errors

Our goal is to minimize the errors as far as
possible

See Figure: If all the observations were to lie on
the regression line, we would obtain a perfect
fit.

[Figure: SRF Ŷ = b1 + b2X fitted through observations at X1, ..., X4, with residuals u1, ..., u4 measuring each Yi's distance from the line]
The concept of the coefficient of determination can
be explained using the following diagram.

[Figure: for a given Xi, the total deviation (Yi - Ȳ) splits into (Ŷi - Ȳ), due to regression/explained by X, and the residual ei = (Yi - Ŷi)]

(Yi - Ȳ) = (Ŷi - Ȳ) + (Yi - Ŷi)

Ȳ - mean of the sample data
Ŷi - predicted value of Y for a given X (point on the SRF)
Yi - an individual sample observation

Numerical proof (consider the following example)

Y X
49 1
45 2
44 3
39 4
38 5
37 6
34 7
33 8
30 9
29 10
Mean of Y = 37.8
For the above data b2 = -2.1576; b1 = 49.667

Hence Ŷi = 49.667 - 2.1576Xi

For Xi = 1, Ŷi = 47.509

Now applying (Yi - Ȳ) = (Ŷi - Ȳ) + (Yi - Ŷi) for Yi = 49 we get

49 - 37.8 = (47.509 - 37.8) + (49 - 47.509)
11.2 = 9.709 + 1.491 = 11.2, i.e. LHS = RHS
By squaring the above identity and summing over the
sample (the cross-product term sums to zero under OLS),
we get

Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²

Σ(Yi - Ȳ)² - called TSS (total variation in Y)
Σ(Ŷi - Ȳ)² - called ESS (variation due to X)
Σ(Yi - Ŷi)² or Σei² - called RSS (variation due to error)
Thus, TSS = ESS + RSS

If all actual Ys lie on fitted SRF, RSS=0 and hence
ESS=TSS (Polar cases)

If X explains no variation in Y, ESS=0 and hence
RSS=TSS (Polar cases)

If ESS is relatively larger than RSS, then the chosen
SRF fits the data well

If RSS is relatively larger than ESS, then the chosen
SRF fits the data poorly






Now, r² is defined as

r² = ESS / TSS

r² = Σ(Ŷi - Ȳ)² / Σ(Yi - Ȳ)²

This is nothing but the portion of the variation in Y (TSS)
explained by X (ESS).
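A short sketch computing TSS, ESS, RSS and r² for the same sample, using the rounded estimates b1 = 49.667 and b2 = -2.1576 from above:

```python
import numpy as np

Y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
X = np.arange(1, 11, dtype=float)
Y_hat = 49.667 - 2.1576 * X

TSS = ((Y - Y.mean()) ** 2).sum()       # total variation in Y
ESS = ((Y_hat - Y.mean()) ** 2).sum()   # variation explained by X
RSS = ((Y - Y_hat) ** 2).sum()          # residual variation
print(TSS, ESS + RSS)                   # equal up to rounding of b1, b2
print(ESS / TSS)                        # r^2 ~ 0.98: a very good fit
```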
Properties of r²
It is a nonnegative quantity (Why?)

Its limits are 0 ≤ r² ≤ 1

An r² of 1 means a perfect fit, that is, Ŷi = Yi for
each i.

An r² of zero means that there is no relationship
between Y and X

Example: In a regression with quantity demanded as
the dependent variable and price as the independent
variable, an r² value of 0.975 implies that the price
variable explains about 97.5% of the variation in
quantity demanded. In this case, we can say that the
sample regression gives an excellent fit.
Coefficient of correlation (r)
It measures the degree of linear association
between Y and X
It is nothing but r = ±√r²
In practice r is of little importance; the more
meaningful quantity is r² (Why?)
r is also called the simple correlation coefficient or
the correlation coefficient of zero order
Interpretation of r: r12 means the correlation
between variable 1 (say Y) and variable 2 (say X2)
Standard Error (SE) of Regression Coefficients
We know that the least-squares estimates (b1 and b2)
are estimated using sample data

But since the data are likely to change from sample
to sample, the estimates will change as well

Therefore, what is needed is some measure of the
reliability of the OLS estimators

The precision of an estimate or regression
coefficient is measured by its SE
The SE is the standard deviation (positive square root
of the variance) of the sampling distribution (SD) of the
estimator (say b2)
The SD of an estimator is the distribution of the set of
values of the estimator obtained from all possible
samples of the same size from a given population.
Thus, the SE of an estimator is the amount it varies
across samples.

The SEs of the OLS estimates can be obtained as
follows:

var(b2) = σ² / Σxi²
se(b2) = σ / √(Σxi²)

var(b1) = σ²·ΣXi² / (n·Σxi²)
se(b1) = σ·√( ΣXi² / (n·Σxi²) )

σ̂² = Σûi² / (n - 2)

where var = variance, se = standard error, σ² is the
homoscedastic variance of ui (Assumption 4), and
lowercase xi denotes deviations from the sample mean.
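A sketch of these SE formulas applied to the two-variable demand example, with σ² estimated by Σûi²/(n − 2):

```python
import numpy as np

Y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
X = np.arange(1, 11, dtype=float)
n = len(Y)

x = X - X.mean()                                  # deviations of X
b2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
resid = Y - (b1 + b2 * X)
sigma2 = (resid ** 2).sum() / (n - 2)             # homoscedastic variance est.

se_b2 = np.sqrt(sigma2 / (x ** 2).sum())
se_b1 = np.sqrt(sigma2 * (X ** 2).sum() / (n * (x ** 2).sum()))
print(se_b1, se_b2)
```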
Here the variance of b2 is inversely proportional to
Σxi².

That is, given σ², the larger the variation in the X
values, the smaller the variance of b2 and hence the
greater the precision with which b2 can be estimated.

In short, if there is substantial variation in the Xs (recall
Assumption 8), b2 can be measured more accurately
than when the Xs do not vary substantially.
Hence, what is a big SE of regression
coefficients and what is a small SE depends on
the context (i.e. variation in Xs)
A more standardized statistic, which also gives a
measure of the goodness of fit of the estimated
equation, is R²
SEs of regression coefficients can be used for
hypothesis testing and constructing confidence
intervals (discussed later)
Standard Error of Regression/Residuals
The SE of the regression is the standard deviation
(positive square root of the variance) of the individual Y
values about the estimated regression line, i.e. of the
error term

If the SE of the residuals is high, the deviations will also
be high and hence the fit will be poor

The SE of the residuals can be obtained using the
following formula (lowercase letters again denoting
deviations from sample means):

σ̂² = ( Σyi² - b2²·Σxi² ) / (n - 2)
Multiple Regression Analysis
Meaning
In the single/two-variable model there is only one
explanatory variable
In practice, most problems can't be explained by
this model
Example: apart from price, demand is a function
of many other factors
Hence, we use multiple regression models, which
contain more than one X

What does the model look like?

PRF for cross-sectional data: Yi = B1 + B2X2i + B3X3i + Ui
PRF for time-series data: Yt = B1 + B2X2t + B3X3t + Ut

Any individual Y value can be expressed as the
sum of 2 components:
Deterministic component: E(Yi) = B1 + B2X2i + B3X3i
Random component: Ui
Assumptions
All the assumptions of two-variable
model are applicable in the case of
multiple regression as well
Interpreting the Multiple Regression Eq.
It gives the conditional mean value of Y,
conditional upon the given/fixed values of the Xs

Symbolically:

E(Y | X2i, X3i) = B1 + B2X2i + B3X3i

Thus, what we obtain is the mean value of Y for
the given values of the Xs
Partial Regression Coefficients (PRCs)
In multiple regression B2 & B3 are called PRCs
B2 measures the change in the mean value of Y per
unit change in X2, holding the value of X3 constant
[OR]
B2 gives the direct or net effect of a unit change
in X2 on E(Y), net of any effect that X3 may have
on mean Y
A similar explanation applies to B3 as well
To estimate the parameters of the model we use the OLS
method
Let the SRF corresponding to the PRF described above
be:

Yi = b1 + b2X2i + b3X3i + ei

where b1, b2 & b3 are estimators of the unknown
population coefficients B1, B2 & B3 respectively, and
ei is the sample counterpart of the error term Ui
Estimating PRCs
The OLS principle chooses the values of the unknown
parameters in such a way that the RSS (Σei²) is as
small as possible

Symbolically, minimize:

Σei² = Σ( Yi - b1 - b2X2i - b3X3i )²

Minimization involves differentiating this with respect
to the unknowns, setting the resulting expressions to
zero, and solving them simultaneously
This procedure generates the following formulas for
arriving at numerical values of the OLS estimators
b1, b2 & b3:

b1 = Ȳ - b2·X̄2 - b3·X̄3

b2 = [ (Σyix2i)(Σx3i²) - (Σyix3i)(Σx2ix3i) ] / [ (Σx2i²)(Σx3i²) - (Σx2ix3i)² ]

b3 = [ (Σyix3i)(Σx2i²) - (Σyix2i)(Σx2ix3i) ] / [ (Σx2i²)(Σx3i²) - (Σx2ix3i)² ]

In these formulas, lowercase letters denote
deviations from sample mean values
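A minimal sketch of the same OLS principle with two regressors; the data here are made up purely for illustration, and numpy's least-squares solver stands in for the hand formulas:

```python
import numpy as np

Y  = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 30.0])   # illustrative data
X2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

Z = np.column_stack([np.ones_like(Y), X2, X3])   # columns: intercept, X2, X3
b, *_ = np.linalg.lstsq(Z, Y, rcond=None)        # b = (b1, b2, b3)
print(b)
```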

Properties of OLS estimators
The BLUE property continues to hold here as
well
Multiple coefficient of determination (R²)
Explains the proportion of the variation in Y explained
by the Xs jointly
Conceptually, R² is akin to r²
As in the two-variable case, R² is defined as

R² = ESS / TSS = ( b2Σyix2i + b3Σyix3i ) / Σyi²

R² lies between 0 and 1
If R² = 1, the fitted regression line explains
100% of the variation in Y.
If R² = 0, the model does not explain any of the
variation in Y
The fit of the regression model is said to be
better, the closer R² is to 1
By and large, as the number of Xs increases, the
R² value increases (Why? See below)

R² and Adjusted R²

One aspect of R² is: as the no. of Xs increases,
R² almost invariably increases
Why? From the R² discussion, we know

1 = ESS/TSS + RSS/TSS
R² = 1 - RSS/TSS
R² = 1 - Σei² / Σyi²

Here the numerator Σei² depends on the no. of Xs,
but the denominator Σyi² does not

Hence, as the Xs increase, Σei² is likely to decrease
(or at least it will not increase); hence R²
increases
Increasing R²

Is it desirable to increase R² by adding more Xs?
We should adopt a cautious approach here
Why?
(i) With more Xs, R² gives an overly optimistic
picture of the regression fit
(ii) R² does not take into account d.f.
(iii) We need a measure of goodness of fit that
is adjusted for the no. of Xs added to the model

Such a measure is known as R̄² [adjusted R²],
which is defined as

R̄² = 1 - [ Σei² / (n - k) ] / [ Σyi² / (n - 1) ]

Here k is the no. of parameters in the model
including the intercept term

The term "adjusted" means adjusted for the d.f.
associated with the sums of squares entering into
the above identity
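A one-function sketch of this formula, with the RSS, TSS, n and k values chosen purely for illustration:

```python
def adjusted_r2(rss, tss, n, k):
    """Adjusted R^2 = 1 - [RSS/(n-k)] / [TSS/(n-1)]."""
    return 1 - (rss / (n - k)) / (tss / (n - 1))

print(adjusted_r2(rss=20.0, tss=100.0, n=30, k=3))   # ~0.785
print(1 - 20.0 / 100.0)                              # unadjusted R^2 = 0.80
```

Note how the adjusted value sits below the unadjusted one: the d.f. correction is the penalty for adding regressors.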
Features of Adjusted R²
If k > 1, R̄² ≤ R²; i.e. as the Xs increase, R̄² becomes
increasingly less than R², or increases less
than the unadjusted R²

This means that a penalty is involved in adding
more Xs to a regression model

R̄² can be negative, but R² cannot (Why?)
What do we do in practice?
In practice, mainly R² is used to measure
goodness of fit

R̄² is used in deciding on the inclusion of a new
variable

If the inclusion of a new variable increases R̄², it is
retained in the model

When does R̄² increase? If the |t| value of the
coefficient of the added variable is larger than 1
Why is maximizing R̄² opposed?
Our objective is not to obtain a high R̄²

The researcher should be more concerned about the
logical or theoretical relevance of the Xs to Y
and their statistical significance

If this process produces a high R̄², well and
good
Hypothesis Testing
The procedure is the same as in the two-variable case

We can adopt both the confidence interval
approach and the test of significance approach

Testing under CIA (the confidence interval approach)
We construct a confidence interval and see
whether the hypothesized value of the population
parameter (B1 / B2 / B3) lies inside this interval.

If it lies inside, we do not reject H0
If it lies outside, we can reject H0

The remaining procedure is the same as in the case
of the two-variable model
The test (t) statistic we use for this purpose is

t = (b1 - B1) / se(b1)   (for testing B1)

t = (b2 - B2) / se(b2)   (for testing B2)

t = (b3 - B3) / se(b3)   (for testing B3)
Testing under ToSA (the test of significance approach)

Step 1: Set H0 and H1 separately for each partial
regression coefficient

Examples: H0: B2 = 0 and H1: B2 ≠ 0
H0: B3 = 0 and H1: B3 ≠ 0

Step 2: Compute the test (t) statistic from sample
data (see above)






Step 3: Choose the level of significance (α), i.e. the
probability of committing a Type 1 error
(0.01 or 0.05 or 0.10)

Step 4: Find the probability of obtaining the computed
test (t) statistic for the given d.f.

Note: d.f. = (n - k), where n is the no. of
observations and k the no. of Xs including the
intercept term

Step 5: If this probability is less than the
prechosen α, reject H0. Otherwise, accept H0

(OR)

After Step 3, use the following rules to accept or
reject H0:
Null Hypothesis (H0)   Alternative Hypothesis (H1)   Critical region: reject H0 if
Bx = 0                 Bx > 0 [one-tailed]           computed t > t(α, d.f.), i.e. the table t value at the α level of significance and the given d.f.
Bx = 0                 Bx < 0 [one-tailed]           computed t < -t(α, d.f.)
Bx = 0                 Bx ≠ 0 [two-tailed]           computed |t| > t(α/2, d.f.), i.e. the table t value at the α/2 level of significance and the given d.f.
A summary of the t test
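A sketch of the two-tailed test for H0: B2 = 0 in Python; the standard error and sample sizes here are assumed for illustration only:

```python
from scipy import stats

b2, se_b2 = -2.1576, 0.12     # estimate and an assumed standard error
n, k = 10, 2                  # observations; parameters incl. intercept
t = (b2 - 0.0) / se_b2        # test statistic under H0: B2 = 0
p = 2 * stats.t.sf(abs(t), df=n - k)   # two-tailed p value
print(t, p)                   # reject H0 if p < chosen alpha
```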
ANOVA or F test - Relevance

This is a complementary way of hypothesis
testing

Commonly used to test a joint H0 in multiple
regression models

But it can be used in the two-variable regression model
as well

What is a joint H0?

H0: B2 = B3 = 0

Means that B2 and B3 are jointly equal to zero,
(or) the Xs have no influence on Y

A test of a joint H0 is called a test of the overall
significance of the estimated regression line
How to construct the F test statistic?
From the R² discussion, we know that
TSS = ESS + RSS, or

Σyi² = b2Σyix2i + b3Σyix3i + Σei²

The d.f. associated with the components of this
identity are:

TSS: n - 1, because we lose 1 d.f. in computing the
sample mean Ȳ

ESS: 2 (k - 1), because ESS is a function of b2
and b3 (where k is the no. of Xs including the
intercept term)

RSS: n - 3 (n - k), because in computing RSS we
need to estimate B1, B2 and B3

In the case of the two-variable model the corresponding
d.f. are:
TSS: n - 1
ESS: 1
RSS: n - 2

In general, in a regression model with k
explanatory variables (incl. the intercept), the d.f. are
as follows:

TSS: n - 1 (always)
ESS: k - 1
RSS: n - k
Source of variation        Sum of squares (SS)        d.f.     Mean sum of squares (MSS) = SS/d.f.
Due to regression (ESS)    b2Σyix2i + b3Σyix3i        2        (b2Σyix2i + b3Σyix3i) / 2
Due to residuals (RSS)     Σei²                       n - 3    Σei² / (n - 3)
Total (TSS)                Σyi²                       n - 1

Now, by arranging the sums of squares and their d.f.,
we get the ANOVA table above.

Now, define the F statistic as

F = (ESS / d.f.) / (RSS / d.f.) = [ ESS / (k - 1) ] / [ RSS / (n - k) ]

F = [ (b2Σyix2i + b3Σyix3i) / 2 ] / [ Σei² / (n - 3) ]
Using the F-ratio for hypothesis testing

Set H0: B2 = B3 = 0 & H1: not all Bs are
simultaneously zero

Calculate the F ratio using the formula

We reject H0 if the F value computed from the formula
exceeds the critical/table F value at the α level of
significance for the given d.f. in the numerator and
denominator

Otherwise, we do not reject H0

Alternatively, if the p value of the computed F ratio
is sufficiently low, we reject H0

Intuitive Reasoning

In F = (ESS / d.f.) / (RSS / d.f.):

the numerator measures the variance of Y explained by
the Xs

the denominator measures the variance of Y not
explained by the Xs

If the numerator > the denominator, F > 1

An increasingly large F is evidence against H0
Relationship between F and R²

The null B2 = B3 = 0 is the same as saying that H0:
R² = 0 (why?)

Thus, the F test is also a test of the significance of R²
(i.e. whether R² is different from zero)

The relationship between the F ratio and R² is as
follows:

F = [ R² / (k - 1) ] / [ (1 - R²) / (n - k) ]

Here, when R² = 0, F = 0

The larger R² is, the greater the F value will be

One advantage of this formula is the ease of
computation of the F value. All we need to know is
the R² value
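A tiny sketch of this computation, with the R², n and k values chosen purely for illustration:

```python
def f_from_r2(r2, n, k):
    """F = [R^2/(k-1)] / [(1-R^2)/(n-k)]."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(f_from_r2(r2=0.60, n=30, k=3))   # 20.25; compare with the table F value
```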
Testing the significance of R² using the F test

Substitute the R² value in F = [ R² / (k - 1) ] / [ (1 - R²) / (n - k) ]
and compute the F ratio

We reject H0: R² = 0 if the F value computed from the formula
exceeds the critical/table F value at the α level of
significance for the given d.f. in the numerator and
denominator

Otherwise, we do not reject the null
Usefulness of this statistic

In cross-sectional data involving several
observations, one generally obtains a low R²

This is due to the diversity of the cross-sectional
units

Here, the statistical significance of the R² value can
be verified using F = [ R² / (k - 1) ] / [ (1 - R²) / (n - k) ]
