Managerial Economics
Dr. Manoj Kumar Dash
M.A; M.Phil; MBA; Ph.D
What is Econometrics?
Simply put, econometrics means economic
measurement.
Some formal definitions:
Econometrics is concerned with the empirical
determination/quantification of theoretical
postulations in economics, management, political
science, etc.
Econometrics is defined as the social science in
which the tools of economic theory, mathematics,
and statistical inference are applied to the
analysis of economic phenomena (Goldberger,
1964).
Econometrics ... consists of the application of
mathematical statistics to economic data to lend
empirical support to the models constructed by
mathematical economics and to obtain numerical
results (Gerhard Tintner, 1968).
Econometrics renders a positive help in trying
to dispel the poor public image of economics
as a subject in which empty boxes are opened by
assuming the existence of can-openers to reveal
contents which any ten economists will interpret
in 11 ways.
Examples
Consumption expenditure = F (Income, Wealth, etc.)
Hourly earnings = F (Education, Labour productivity, etc.)
Quantity demanded = F (Price, Income of consumer, Prices of related commodities, etc.)
Election outcome = F (Economic performance, Populism, etc.)
Debt/Equity = F (Corporate tax rate, Capital gains tax rate, Inflation rate, etc.)
Sales = F (Price, Advertising expenses, etc.)
Crop yield = F (Temperature, Rainfall, Fertilizer use, etc.)
Why Study Econometrics?
(1) To provide empirical support to theories:
Theories make statements or hypotheses that are mostly qualitative in nature.
They do not provide a numerical measure of the proposed ideas.
Example: Law of Demand in economics
In a given situation, how do we quantify this law?
Econometrics comes to the rescue.
(2) To provide empirical support to mathematical models:
Many theories are expressed in mathematical form/models (common in economics and finance).
These theoretical models are developed without regard to their measurability.
Econometrics helps to subject these models to empirical testing (e.g. tax competition models).
(3) For academicians:
Research in economics, finance, management, marketing, etc. is becoming increasingly quantitative [Show examples].
Hence, knowledge of econometrics is very important for conducting empirical research.
(4) For students:
A command over econometrics will be of great use in employment.
Examples: sales forecasting, consumer behaviour analysis (KPOs), corporate research.
Steps in Econometric Analysis
1. Identify research issue
2. Select Variables
3. Check Data Availability
4. Specify Econometric Model
5. Data Collection
6. Model Estimation
7. Hypothesis Testing
8. Using Estimated Model for Forecasting/Prediction
Step 1: Identify research issue
You should have a research problem to probe
(How to identify a problem?)
Example from labour economics: the effect of economic conditions on people's willingness to work, measured by the labour force participation rate (LFPR).
Two hypotheses/arguments are postulated.
Argument 1 (Discouraged worker hypothesis): As economic conditions worsen, the unemployed may drop out of the labour force (falling LFPR) as they lose hope of finding a job.
Argument 2 (Added worker hypothesis): As economic conditions worsen, secondary workers (e.g. other family members) may join the labour force (rising LFPR) if the main breadwinner in the family loses their job.
In this context, our research objective could be to test the relative strength/validity of these two contrasting claims.
Step 2: Select Variables
For running a regression, we have to select a DEPENDENT variable and one or more INDEPENDENT variables.
A model with one dependent variable and one independent variable is called a simple or two-variable regression model (TWRM).
A model with one dependent variable and more than one independent variable is called a multiple regression model (MRM).
Example:
LFPR = f (Unemployment Rate) - TWRM
LFPR = f (Unemployment Rate; Hourly Earnings; Family Wealth) - MRM
Generally, an MRM is preferred because it enhances the credibility of research findings.
How many independent variables do we include?
Strictly, the guiding principle should be the underlying theory, prior literature or the nature of the problem.
Even then, it may not always be possible to include every possible variable.
Why?
Non-availability of data
Inherent randomness in human behaviour, which cannot be captured easily (Ex: being unemployed due to an attitude problem)
Inability to properly quantify certain variables (Ex: interest groups)
Due to ignorance, we may miss relevant variables
Anyway, the ultimate purpose of econometrics is not to capture complete reality.
How to rectify this problem?
Add a term called the error term, which takes care of the influence of all omitted variables.
In other words, the error term captures all those forces that affect the dependent variable but are not explicitly included in the model.
The error term will also be useful when we use proxy independent variables (e.g. for interest groups).
If the error term is small, it implies that the combined influence of omitted variables is small/negligible.
Step 3: Check Data Availability
Before proceeding further, check data availability for the relevant variables.
Three types of data are generally available for empirical analysis: cross-sectional, time-series, and pooled/panel data.
In cross-sectional data, values of one or more variables are collected for several sample units, or entities, at the same point in time (e.g. GDP figures of countries for a given year).
Cross-Sectional Data
States' Tax Revenue in 2000-01 (in crores)
State            Tax Revenue
Andhra Pradesh   10551.92
Bihar             2934.75
Haryana           4311.48
Karnataka         9042.67
Kerala            5870.26
Maharashtra      19724.28
Orissa            2184.03
Punjab            4895.22
Rajasthan         5299.97
In time-series data, we observe the values of one or more variables over a period of time (e.g. daily stock prices or annual GDP figures).
In panel data, the same cross-sectional unit (say a firm, state or country) is surveyed over time.
Thus, panel data has elements of both time-series and cross-sectional data.
Time-Series Data
Year      Tax Revenue of Andhra Pradesh (in crores)
1990-91    2647.25
1991-92    3054.96
1992-93    3388.72
1993-94    3832.93
1994-95    4227.43
1995-96    4120.44
1996-97    4881.83
1997-98    7113.55
1998-99    7961.40
1999-00    9008.60
2000-01   10551.92
Panel Data: Tax Revenues of States (in crores)
(AP = Andhra Pradesh, KAR = Karnataka, KER = Kerala, TN = Tamil Nadu)
Year       AP         KAR       KER       TN
1990-91    2647.25    2332.12   1340.35    3124.05
1991-92    3054.96    2900.20   1673.93    3734.11
1992-93    3388.72    3097.81   1886.97    4162.06
1993-94    3832.93    3812.34   2344.87    4801.37
1994-95    4227.43    4289.31   2799.10    5833.76
1995-96    4120.44    5273.93   3382.68    7151.20
1996-97    4881.83    5767.84   3898.50    7983.45
1997-98    7113.55    6411.87   4501.05    8682.64
1998-99    7961.40    6943.10   4649.50    9625.30
1999-00    9008.60    7744.37   5193.51   10918.93
2000-01   10551.92    9042.67   5870.26   12282.25
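The relationship between the three data types can be made concrete with a small sketch. It assumes Python and stores a subset of the panel table as a dictionary keyed by (state, year); that storage choice and the helper names `cross_section` and `time_series` are illustrative, not from the notes. Fixing the year yields cross-sectional data; fixing the state yields time-series data.

```python
# Panel data stored as {(state, year): tax revenue in crores}.
# Subset of the panel table above: two states, last three years.
panel = {
    ("AP", "1998-99"): 7961.4,
    ("AP", "1999-00"): 9008.6,
    ("AP", "2000-01"): 10551.92,
    ("KAR", "1998-99"): 6943.1,
    ("KAR", "1999-00"): 7744.37,
    ("KAR", "2000-01"): 9042.67,
}


def cross_section(data, year):
    """Fix the year, vary the state: cross-sectional data."""
    return {state: value for (state, yr), value in data.items() if yr == year}


def time_series(data, state):
    """Fix the state, vary the year: time-series data."""
    return {yr: value for (st, yr), value in data.items() if st == state}


print(cross_section(panel, "2000-01"))  # {'AP': 10551.92, 'KAR': 9042.67}
print(time_series(panel, "AP"))  # {'1998-99': 7961.4, '1999-00': 9008.6, '2000-01': 10551.92}
```

Every panel observation belongs to exactly one cross-section and one time series, which is why panel data is said to combine both.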
Step 4: Specify Econometric Model
$LFPR = B_1 + B_2 UR + u$
LFPR: labour force participation rate
UR: unemployment rate (a proxy for economic conditions; another option could be GDP)
$B_1$: intercept term. Gives the value of LFPR (dependent variable) when UR (independent variable) is zero.
$B_2$: slope term. Measures the rate of change in LFPR for a unit change in UR.
Together, $B_1$ and $B_2$ are known as the parameters of the regression model.
$u$: error term, $u = LFPR - (B_1 + B_2 UR)$
This is a linear regression model: LFPR is linearly related to UR.
Our objective is to explain the behaviour of the dependent variable in relation to the explanatory variable.
Step 5: Data Collection
For empirical estimation, we need to collect data
on the variables used in the econometric model.
Data can be obtained either from primary sources (common in HR research) or from secondary sources (common in finance and economics).
Step 6: Model Estimation
By estimation we mean estimating the parameters ($B_1$ and $B_2$) of the chosen model.
Estimation is carried out using the technique of
regression analysis.
What is regression analysis?
Regression analysis is concerned with the
study of the dependence of one variable, the
dependent variable, on one or more other
variables, the explanatory variables, with a view
to estimating and/or predicting the (population)
mean or average value of the former in terms of
the known or fixed (in repeated sampling)
values of the latter
Assume that after estimation we get the following
result:
LFPR = 50.63 - 0.45UR
$b_2 = -0.45$: if UR goes up by 1 percentage point, ceteris paribus, LFPR is expected to decrease, on average, by about 0.45 percentage points.
The discouraged worker hypothesis finds support.
$b_1 = 50.63$: the average value of LFPR would be about 50.63% if UR were zero (i.e. full employment).
Sometimes/often, intercept term has no
particular economic meaning.
But, in the present example it has meaning
(How?)
We say on the average because the presence
of error term is likely to make the relationship
somewhat imprecise.
Step 7: Hypothesis Testing
Objective here is to test certain hypotheses
suggested by theory and/or prior empirical
experience on parameters of the model.
Specifically, we are interested in verifying how close the estimated parameter is to a presupposed value of that parameter (e.g. $H_0: B_2 = B_2^{*}$ against $H_1: B_2 \neq B_2^{*}$, where $B_2^{*}$ is the value suggested by theory).
Hypothesis testing helps to verify whether the
results obtained through regression analysis
conform to the underlying theory
Step 8: Using Estimated Model for
Forecasting/Prediction
LFPR = 50.6333 - 0.4486UR
If we want to predict LFPR in some future period for a given value of UR (say 5), how do we obtain it?
Substitute the UR value (5) in the above equation: LFPR = 50.6333 - 0.4486 × 5 ≈ 48.39.
When data on LFPR (for the given UR) for the future period are released, we can compare the predicted value with the actual value.
The discrepancy between the two is called the prediction error.
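The forecasting step can be written as two small functions. This is a minimal Python sketch; the names `predict_lfpr` and `prediction_error` are hypothetical, and the sign convention (actual minus predicted) is an assumption, since the notes only speak of "the discrepancy between the two".

```python
# Estimated equation from Step 8: LFPR = 50.6333 - 0.4486 UR
B1_HAT = 50.6333   # estimated intercept
B2_HAT = -0.4486   # estimated slope


def predict_lfpr(ur: float) -> float:
    """Point forecast of LFPR (%) for a given unemployment rate (%)."""
    return B1_HAT + B2_HAT * ur


def prediction_error(actual: float, predicted: float) -> float:
    """Actual minus predicted (sign convention assumed here)."""
    return actual - predicted


forecast = predict_lfpr(5)
print(f"Predicted LFPR at UR = 5: {forecast:.4f}")  # 48.3903
```

Once the actual LFPR for that period is published, `prediction_error(actual, forecast)` gives the prediction error; no actual value is given in the notes.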
Regression vs. Correlation
Regression: estimates/predicts the average value of one variable on the basis of the fixed values of other variables.
Correlation: measures the strength or degree of linear association between two variables.
Regression: the dependent variable is treated as stochastic or random; the explanatory variables are assumed to have fixed values.
Correlation: both variables are treated as random.
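The asymmetry in this comparison can be checked numerically. The sketch below assumes Python with NumPy (our choice; the notes name no software) and uses Sample 1 from the demand example later in these notes. The correlation coefficient is the same whichever variable comes first, but regressing Y on X and X on Y gives two different slopes, whose product equals r².

```python
import numpy as np

# Sample 1 from the demand example: price X = 1..10, demand Y
x = np.arange(1, 11, dtype=float)
y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)

r = np.corrcoef(x, y)[0, 1]            # correlation: symmetric in x and y
slope_y_on_x = np.polyfit(x, y, 1)[0]  # regression of Y on X
slope_x_on_y = np.polyfit(y, x, 1)[0]  # regression of X on Y (a different line)

print(f"r = {r:.4f}")
print(f"slope of Y on X = {slope_y_on_x:.4f}, slope of X on Y = {slope_x_on_y:.4f}")
```

`np.polyfit(..., 1)` returns the slope first and the intercept second, hence the `[0]`.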
Our Immediate Task
Identification of the research issue, selection of variables, checking of data availability, specification of the econometric model and data collection are relatively straightforward tasks.
Hence, our focus would be on
Model Estimation
Hypothesis Testing
Criticisms
By and large, events can be explained without econometric analysis
Data mining: results can be manufactured!
Problems with the intercept term
Two-Variable Regression Model
Meaning of the term Linear
The term linear can be interpreted in two ways: linearity in the variables and linearity in the parameters.
Linear regression always means a regression that is linear in the parameters; it may or may not be linear in the explanatory variables.
Example:
$Y = B_1 + B_2 X + u$ (or)
$Y = B_1 + B_2 X^2 + u$
Linear Regression Models
                                Model linear in variables?
Model linear in parameters?     Yes       No
Yes                             LRM       LRM
No                              NLRM      NLRM
LRM = linear regression model; NLRM = nonlinear regression model
Population Regression Function (PRF)
Let us consider an example of the law of demand.
The demand schedule for commodity x
Price (X)  Quantity Demanded (Y)     No. of Consumers  Average Demand
1          45,46,47,48,49,50,51      7                 48
2          44,45,46,47,48            5                 46
3          40,42,44,46,48            5                 44
4          35,38,42,44,46,47         6                 42
5          36,39,40,42,43            5                 40
6          32,35,37,38,39,42,43      7                 38
7          32,34,36,38,40            5                 36
8          31,32,33,34,35,36,37      7                 34
9          28,30,32,34,36            5                 32
10         29,30,31                  3                 30
Total                                55
Figure 1: Population Regression Line (Quantity Demanded against Price)
The impression we get from the scattergram is that demand (Y) decreases as price (X) increases, and vice versa.
The downward-sloping line is called the Population Regression Line (PRL).
It is nothing but the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s).
Thus, the PRL gives the average value of the dependent variable corresponding to each value of the independent variable.
Each point on the PRL represents the expected or population mean value of Y corresponding to the various Xs.
The adjective population is used because our example deals with the entire population of 55 consumers.
The PRL can be expressed in the following functional form:
$E(Y \mid X_i) = B_1 + B_2 X_i$   (1)
where $i$ denotes the $i$th subpopulation.
Eq. (1) gives the average value of Y corresponding to each value of X and is called the Population Regression Function (PRF) or non-stochastic PRF.
In regression analysis, our interest is in estimating the PRF (i.e. $B_1$ and $B_2$) on the basis of observations on Y and X.
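Because the demand schedule covers the whole population, the PRF can be computed directly. This Python sketch (the dictionary layout is our own choice) finds the conditional mean of Y at each price and checks that all ten means lie exactly on one straight line, which for this table turns out to be $E(Y \mid X_i) = 50 - 2X_i$, i.e. $B_1 = 50$ and $B_2 = -2$.

```python
from statistics import mean

# Demand schedule for commodity x: price -> quantities demanded by each consumer
demand = {
    1: [45, 46, 47, 48, 49, 50, 51],
    2: [44, 45, 46, 47, 48],
    3: [40, 42, 44, 46, 48],
    4: [35, 38, 42, 44, 46, 47],
    5: [36, 39, 40, 42, 43],
    6: [32, 35, 37, 38, 39, 42, 43],
    7: [32, 34, 36, 38, 40],
    8: [31, 32, 33, 34, 35, 36, 37],
    9: [28, 30, 32, 34, 36],
    10: [29, 30, 31],
}

# Conditional means E(Y | X_i): one average per price (subpopulation)
cond_means = {price: mean(qs) for price, qs in demand.items()}
n_consumers = sum(len(qs) for qs in demand.values())

# Slope and intercept of the line through the first and last conditional means
B2 = (cond_means[10] - cond_means[1]) / (10 - 1)
B1 = cond_means[1] - B2 * 1

# Every conditional mean lies on that line, so it is the PRL
on_line = all(abs(m - (B1 + B2 * p)) < 1e-12 for p, m in cond_means.items())
print(n_consumers, B1, B2, on_line)  # 55 50.0 -2.0 True
```

In practice the conditional means would not lie exactly on a line; this textbook population was evidently constructed so that they do.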
Stochastic Specification of the PRF
How do we explain the demand of an individual consumer in relation to price?
The best we can do is to say that any individual's demand is equal to the average for that group plus or minus some quantity.
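[Figure: individual quantities demanded (Y) at each price (X), scattered above and below the average demand for that price; each deviation from the average is marked u]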
We can express the deviation of an individual $Y_i$ around its expected value as follows:
$Y_i = E(Y \mid X_i) + u_i$   (2)
In Eq. (2), $u_i$ is an unobservable random variable, called the stochastic error term, taking positive or negative values.
Now, by substituting Eq. (1) in (2), we get
$Y_i = B_1 + B_2 X_i + u_i$   (3)
Eq. (3) is called the stochastic PRF (SPRF), whereas Eq. (1) is called the non-stochastic PRF (NPRF).
The NPRF represents the means of the various Y values corresponding to specified prices.
The SPRF tells us how individual demands vary around their mean values due to the presence of the stochastic error term, u.
How do we interpret the SPRF (Eq. 3)?
We can say that the demand of an individual consumer (say i) corresponding to a specific price can be expressed as the sum of the following two components:
(i) Systematic/deterministic component: $B_1 + B_2 X_i$ (nothing but the average quantity demanded by all the consumers at a given price level $X_i$)
(ii) Nonsystematic/random component: $u_i$ (determined by factors other than price) [See Figure]
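The two-component split can be computed for every consumer in the demand schedule. This Python sketch assumes the PRF values implied by the table's conditional means, $B_1 = 50$ and $B_2 = -2$; the helper name `systematic` is hypothetical. Each demand equals the systematic part plus its $u_i$, and within every price group the $u_i$ sum to zero.

```python
# Demand schedule for commodity x: price -> quantities demanded by each consumer
demand = {
    1: [45, 46, 47, 48, 49, 50, 51],
    2: [44, 45, 46, 47, 48],
    3: [40, 42, 44, 46, 48],
    4: [35, 38, 42, 44, 46, 47],
    5: [36, 39, 40, 42, 43],
    6: [32, 35, 37, 38, 39, 42, 43],
    7: [32, 34, 36, 38, 40],
    8: [31, 32, 33, 34, 35, 36, 37],
    9: [28, 30, 32, 34, 36],
    10: [29, 30, 31],
}


def systematic(price):
    """Deterministic part B1 + B2*X_i, with B1 = 50 and B2 = -2 for this population."""
    return 50 - 2 * price


# Random component u_i = Y_i - (B1 + B2*X_i) for every consumer
u = {price: [q - systematic(price) for q in qs] for price, qs in demand.items()}

# Each demand is exactly systematic + random
assert all(q == systematic(p) + e for p, qs in demand.items() for q, e in zip(qs, u[p]))
print(u[4])                                    # [-7, -4, 0, 2, 4, 5]
print(all(sum(es) == 0 for es in u.values()))  # True
```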
Sample Regression Function (SRF)
If we have data on the whole population (like the demand schedule above), arriving at the PRF is a straightforward exercise.
That is, find the conditional means of Y corresponding to each X and then join these means.
Unfortunately, in practice, we rarely have the entire population at our disposal.
We usually have only a sample from the population.
The following is an example from our case:
Sample 1
Y (Demand)  X (Price)
49          1
45          2
44          3
39          4
38          5
37          6
34          7
33          8
30          9
29          10
Hence, our task is to estimate the PRF on the basis of sample information
[OR]
The task is to estimate the average quantity demanded in the population as a whole corresponding to each X (price) from sample data such as the above.
But we may not be able to estimate the PRF accurately because of sampling error.
To see this clearly, suppose another random sample (Sample 2) is drawn from the population of the above table.
If we plot the data of these two samples and of the entire population, we may obtain the corresponding SRLs and the PRL as follows.
Sample 2
Y (Demand)  X (Price)
51          1
47          2
46          3
42          4
40          5
37          6
36          7
35          8
32          9
30          10
[Figure: sample regression lines SRL1 (Sample 1) and SRL2 (Sample 2) together with the PRL; Demand (Y) against Price (X)]
Now the question is: which of the two SRLs represents the true PRL?
If we resist the temptation of looking at the figure above, which shows the PRL, there is no way we can be sure that either of the SRLs represents the true PRL.
In general, we get K different SRLs for K different samples, and these SRLs are not likely to be the same.
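Sampling variation is easy to see by fitting a line to each sample. This sketch assumes Python with NumPy (`np.polyfit` with degree 1 performs a least-squares line fit; the helper `fit_line` is our own). The two SRLs come out as roughly $\hat{Y} = 49.667 - 2.1576X$ and $\hat{Y} = 51.933 - 2.2424X$, and neither coincides with the PRL $E(Y \mid X) = 50 - 2X$ implied by the demand schedule.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)  # price 1..10 in both samples
sample1 = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
sample2 = np.array([51, 47, 46, 42, 40, 37, 36, 35, 32, 30], dtype=float)


def fit_line(x, y):
    """Least-squares line through (x, y); returns (intercept, slope)."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope


for name, y in [("SRL1", sample1), ("SRL2", sample2)]:
    b1, b2 = fit_line(x, y)
    print(f"{name}: Y-hat = {b1:.3f} + ({b2:.4f})X")
print("PRL:  E(Y|X) = 50 + (-2)X")
```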
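Now, analogous to the PRF that underlies the PRL, we can develop the SRF to represent the SRL as follows:
$\hat{Y}_i = b_1 + b_2 X_i$   (4)
where
$\hat{Y}_i$ = estimator of $E(Y \mid X_i)$
$b_1$ = estimator of $B_1$
$b_2$ = estimator of $B_2$
$\hat{Y}$ is read as Y-hat or Y-cap.
The residual $e_i = Y_i - \hat{Y}_i$ is the sample counterpart of $u_i$, and the residual sum of squares is
$\sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2$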
An Example:
Y_i   X_i   Ŷ1i     e1i      e1i²     Ŷ2i   e2i   e2i²
4     1     2.929    1.071   1.147    4      0     0
5     4     7.000   -2.000   4.000    7     -2     4
7     5     8.357   -1.357   1.841    8     -1     1
12    6     9.714    2.286   5.226    9      3     9
Sum:  28    16               0       12.214        0     14
Notes:
Experiment 1: $\hat{Y}_{1i} = 1.572 + 1.357 X_i$ ($b_1 = 1.572$; $b_2 = 1.357$), with residuals $e_{1i} = Y_i - \hat{Y}_{1i}$
Experiment 2: $\hat{Y}_{2i} = 3 + 1 X_i$ ($b_1 = 3$; $b_2 = 1$), with residuals $e_{2i} = Y_i - \hat{Y}_{2i}$
Now which set of estimated b (parameter) values should we choose?
Since the b values of the 1st experiment give us a lower $\sum e_i^2$ (12.214) than that obtained from the b values of the 2nd experiment (14), we say the bs of the first experiment are the best values.
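The comparison of the two experiments amounts to evaluating the residual sum of squares for each candidate line. A minimal Python sketch on the four-observation example (the helper names `rss` and `residuals` are hypothetical):

```python
# Small example from the table: four observations
X = [1, 4, 5, 6]
Y = [4, 5, 7, 12]


def residuals(b1, b2):
    """e_i = Y_i - (b1 + b2*X_i) for each observation."""
    return [y - (b1 + b2 * x) for x, y in zip(X, Y)]


def rss(b1, b2):
    """Residual sum of squares, the sum of e_i^2, for the line b1 + b2*X."""
    return sum(e ** 2 for e in residuals(b1, b2))


rss_1 = rss(1.572, 1.357)  # experiment 1
rss_2 = rss(3, 1)          # experiment 2
print(round(rss_1, 3), rss_2)  # 12.214 14
```

Both sets of residuals sum to (about) zero, so $\sum e_i$ alone cannot separate the two lines; the sum of squares can.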
But how do we ascertain this?
We could still try many more values for the bs, looking for the pair that gives the least possible value of $\sum e_i^2$.
However, in doing so we must be sure that we have considered all the conceivable values of $b_1$ and $b_2$.
Only with infinite time and patience could we do this exercise.
But, fortunately, the OLS method chooses $b_1$ and $b_2$ in such a manner that, for a given set of data, $\sum e_i^2$ is as small as possible.
[OR]
For a given sample, the OLS method provides us with unique estimates of $b_1$ and $b_2$ that give the smallest possible value of $\sum e_i^2$.
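The "infinite time and patience" approach can be imitated with a finite grid search and compared with the closed-form OLS answer. This sketch assumes Python with NumPy; the grid ranges and step sizes are arbitrary choices of ours. No grid point beats OLS, and the best grid point lands next to it.

```python
import numpy as np

# Small example: four observations
X = np.array([1, 4, 5, 6], dtype=float)
Y = np.array([4, 5, 7, 12], dtype=float)


def rss(b1, b2):
    """Sum of squared residuals for the line b1 + b2*X."""
    return ((Y - b1 - b2 * X) ** 2).sum()


# OLS estimates from the deviation formulas
x, y = X - X.mean(), Y - Y.mean()
b2_ols = (x * y).sum() / (x ** 2).sum()
b1_ols = Y.mean() - b2_ols * X.mean()

# Brute force: evaluate RSS on a grid of candidate (b1, b2) pairs
b1_grid = np.linspace(-2.0, 6.0, 401)   # step 0.02
b2_grid = np.linspace(0.0, 3.0, 301)    # step 0.01
B1, B2 = np.meshgrid(b1_grid, b2_grid, indexing="ij")
grid_rss = ((Y - B1[..., None] - B2[..., None] * X) ** 2).sum(axis=-1)
i, j = np.unravel_index(grid_rss.argmin(), grid_rss.shape)

print(f"OLS:  b1 = {b1_ols:.4f}, b2 = {b2_ols:.4f}, RSS = {rss(b1_ols, b2_ols):.4f}")
print(f"Grid: b1 = {b1_grid[i]:.4f}, b2 = {b2_grid[j]:.4f}, RSS = {grid_rss[i, j]:.4f}")
```

The grid only approximates the minimum, and its cost grows with the grid's fineness and the number of parameters; OLS gets the exact minimizer in one step.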
How do we accomplish this?
This is a straightforward exercise in differential calculus.
The values of $b_1$ and $b_2$ that minimize $\sum e_i^2$ are obtained by solving the following two simultaneous equations:
$\sum Y_i = n b_1 + b_2 \sum X_i$
$\sum Y_i X_i = b_1 \sum X_i + b_2 \sum X_i^2$
These simultaneous equations are known as the least squares normal equations.
In the above equations, n is the sample size.
The unknowns are the bs. The knowns are quantities involving sums, sums of squares, and sums of cross products of the variables Y and X.
The knowns can be obtained from the sample at hand.
Solving these equations simultaneously, we obtain the following solutions for $b_1$ and $b_2$:
$b_1 = \bar{Y} - b_2 \bar{X}$
$b_2 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
where
$\bar{Y}$ - mean of the Y variable
$\bar{X}$ - mean of the X variable
$(X_i - \bar{X})$, $(Y_i - \bar{Y})$ - deviations from sample mean values
$b_1$, $b_2$ - OLS estimators
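The normal equations form a 2×2 linear system, so they can be solved directly and checked against the closed-form solution. A sketch assuming Python with NumPy (`np.linalg.solve`), applied to Sample 1:

```python
import numpy as np

# Sample 1: demand (Y) at prices (X) 1..10
Y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
X = np.arange(1, 11, dtype=float)
n = len(Y)

# Normal equations written as A @ [b1, b2] = c:
#   sum(Y)   = n*b1      + b2*sum(X)
#   sum(X*Y) = b1*sum(X) + b2*sum(X**2)
A = np.array([[n, X.sum()],
              [X.sum(), (X ** 2).sum()]])
c = np.array([Y.sum(), (X * Y).sum()])
b1_ne, b2_ne = np.linalg.solve(A, c)

# Closed-form solution in deviation form
b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()

print(f"normal equations: b1 = {b1_ne:.3f}, b2 = {b2_ne:.4f}")
print(f"closed form:      b1 = {b1:.3f}, b2 = {b2:.4f}")
```

The system has a unique solution only when the determinant $n\sum X_i^2 - (\sum X_i)^2 = n\sum (X_i - \bar{X})^2$ is nonzero, i.e. when the X values are not all the same.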
Example:
Y    X    (X_i - X̄)  (Y_i - Ȳ)  (X_i - X̄)²  (X_i - X̄)(Y_i - Ȳ)
49   1    -4.5        11.2       20.25       -50.4
45   2    -3.5         7.2       12.25       -25.2
44   3    -2.5         6.2        6.25       -15.5
39   4    -1.5         1.2        2.25        -1.8
38   5    -0.5         0.2        0.25        -0.1
37   6     0.5        -0.8        0.25        -0.4
34   7     1.5        -3.8        2.25        -5.7
33   8     2.5        -4.8        6.25       -12.0
30   9     3.5        -7.8       12.25       -27.3
29   10    4.5        -8.8       20.25       -39.6
Ȳ = 37.8, X̄ = 5.5                   Sum:      82.5       -178.0
Using the above formulas, we obtain the estimates of $b_1$ and $b_2$ as follows:
$b_2 = -178 / 82.5 = -2.1576$
$b_1 = 37.8 - (-2.1576)(5.5) = 49.667$
The results can be obtained using regression packages without much effort (Demonstration)
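The worked table can be reproduced column by column, and the result cross-checked against a generic least-squares routine. The notes do not say which regression package the demonstration uses; NumPy's `np.linalg.lstsq` on the design matrix [1, X] stands in for one here as an assumption.

```python
import numpy as np

Y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
X = np.arange(1, 11, dtype=float)

# Columns of the worked table
dev_x = X - X.mean()               # (X_i - X-bar)
dev_y = Y - Y.mean()               # (Y_i - Y-bar)
sum_dx2 = (dev_x ** 2).sum()       # 82.5
sum_dxdy = (dev_x * dev_y).sum()   # about -178

b2 = sum_dxdy / sum_dx2
b1 = Y.mean() - b2 * X.mean()

# Same estimates from a generic least-squares solver on the design matrix [1, X]
design = np.column_stack([np.ones_like(X), X])
(b1_ls, b2_ls), *_ = np.linalg.lstsq(design, Y, rcond=None)

print(f"by hand: b1 = {b1:.3f}, b2 = {b2:.4f}")
print(f"lstsq:   b1 = {b1_ls:.3f}, b2 = {b2_ls:.4f}")
```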
Properties of OLS estimators
OLS estimators $b_1$ and $b_2$ satisfy the BLUE (Best Linear Unbiased Estimator) property.
Linearity:
The estimators are linear functions of the dependent variable Y (the weights depend only on the fixed X values).
A linear estimator is much easier to deal with than a nonlinear estimator.
Unbiasedness:
The average or expected value of the estimators is equal to the true/population value.
Minimum variance/Efficiency:
They have the minimum variance in the class of all linear unbiased estimators.
The smaller the variance of $b_1$ or $b_2$, the closer they are likely to be to the true $B_1$ or $B_2$.
Implication: the regression coefficient estimated by OLS, on average, coincides with the population/true value.
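Unbiasedness can be illustrated with the demand schedule as the population. This Python sketch keeps the prices 1 to 10 fixed (as in repeated sampling), draws one consumer at random at each price, re-estimates $b_2$, and averages over many samples; the seed and number of replications are arbitrary choices. Because $b_2$ is linear in Y, its exact expectation is the slope through the conditional means, $B_2 = -2$, and the simulated average lands close to it.

```python
import random

demand = {
    1: [45, 46, 47, 48, 49, 50, 51],
    2: [44, 45, 46, 47, 48],
    3: [40, 42, 44, 46, 48],
    4: [35, 38, 42, 44, 46, 47],
    5: [36, 39, 40, 42, 43],
    6: [32, 35, 37, 38, 39, 42, 43],
    7: [32, 34, 36, 38, 40],
    8: [31, 32, 33, 34, 35, 36, 37],
    9: [28, 30, 32, 34, 36],
    10: [29, 30, 31],
}

prices = sorted(demand)  # X fixed in repeated sampling: 1..10
x_bar = sum(prices) / len(prices)
sxx = sum((p - x_bar) ** 2 for p in prices)


def ols_slope(ys):
    """OLS slope b2 for Y values observed at the fixed prices 1..10."""
    y_bar = sum(ys) / len(ys)
    return sum((p - x_bar) * (y - y_bar) for p, y in zip(prices, ys)) / sxx


# Because b2 is linear in Y, E(b2) equals the slope computed from the conditional means
exact_b2 = ols_slope([sum(demand[p]) / len(demand[p]) for p in prices])

# Repeated sampling: one randomly chosen consumer at each price, then re-estimate b2
rng = random.Random(42)
draws = [ols_slope([rng.choice(demand[p]) for p in prices]) for _ in range(20_000)]
avg_b2 = sum(draws) / len(draws)
print(f"E(b2) = {exact_b2:.4f}, average over samples = {avg_b2:.4f}")
```

Individual draws scatter around $-2$; unbiasedness is a statement about their average, not about any single sample.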
Assumptions Underlying OLS
Assumption 1:
The regression model is linear in parameters (Bs)
Assumption 2:
Explanatory variables (Xs) are fixed in repeated
sampling
Implication: changes in Y are conditional on the given values of X
Example:
The demand schedule for commodity x
Price (X) Quantity Demanded (Y)
1 45,46,47,48,49,50,51
2 44,45,46,47,48
3 40,42,44,46,48
4 35,38,42,44,46,47
5 36,39,40,42,43
6 32,35,37,38,39,42,43
7 32,34,36,38,40
8 31,32,33,34,35,36,37
9 28,30,32,34,36
10 29,30,31
Assumption 3:
Mean value of the disturbance $u_i$ is zero: $E(u_i \mid X_i) = 0$
Because positive $u_i$ values cancel out negative $u_i$ values (see figure)
Implication: factors not explicitly included in the model (i.e. $u_i$) don't systematically affect the mean of Y (or) the average effect of $u_i$ on Y is zero
[Figure: values of Y scattered above (+u_i) and below (-u_i) the PRF at X_1, X_2 and X_3, around the conditional mean]
Assumption 4:
The variance of $u_i$ is the same for all observations (Xs): $\operatorname{var}(u_i \mid X_i) = \sigma^2$ [Homoscedasticity, or equal (homo) spread (scedasticity)]
Implication: the variation of individual Y values around the regression line remains the same regardless of the values taken by the Xs; it neither increases nor decreases as X varies.
Hence, all Y values corresponding to the various Xs are equally important.
Violation of this assumption (e.g. an increase in the variation of Y values around the regression line as X increases) is called heteroscedasticity.
Example of homoscedasticity: richer families on average consume more than poorer families, but the variability of consumption around the average is about the same for richer and poorer families.
Example of heteroscedasticity: there is greater variability in the consumption pattern of richer families compared to poorer ones, because as income grows people have more consumption choices.
Assumption 5:
No autocorrelation between error terms (important for time-series data)
Example of autocorrelation:
$Y_t = B_1 + B_2 X_t + u_t$, where $u_t$ and $u_{t-1}$ are positively correlated. Here, $Y_t$ depends not only on $X_t$ but also on $u_{t-1}$, since $u_{t-1}$ to some extent determines $u_t$.
By invoking the no-autocorrelation assumption, we consider only the effect of $X_t$ on $Y_t$ and need not worry about other influences that might act on Y as a result of possible intercorrelations among the u's.
Assumption 6:
No correlation between u and explanatory
variables (Xs)
Implication: If X and u are correlated, it is not
possible to assess their individual effects on Y
If X and u are positively correlated, X tends to increase when u increases and to decrease when u decreases.
Assumption 7:
Number of observations n must be greater than the
number of parameters to be estimated or explanatory
variables.
Assumption 8:
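X values in a given sample must not all be the same (Y values should not all be the same either; otherwise there is no variation in Y to explain)
If they were, $X_i = \bar{X}$ for every i, and hence the denominator of the estimator $b_2$, $\sum (X_i - \bar{X})^2$, would be zero.
This makes it impossible to estimate $b_2$, and hence $b_1 = \bar{Y} - b_2 \bar{X}$.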
Assumption 9:
The regression model is correctly specified
Omission of important variables or inclusion of wrong
variables undermines the validity of regression exercise
Theory should be the guiding principle in building
econometric model
If theory is not clear, we have to use some judgment in
choosing the model and interpreting the results (e.g. tax
competition)
But, data mining should be avoided.
Assumption 10:
There are no perfect linear relationships among Xs
[No multicollinearity]
Important with respect to multiple regression models
Implication: In the presence of multicollinearity, we
cannot assess the separate influence of Xs on Y.
All these assumptions pertain to the PRF only and not the SRF. This means that sample data may not always satisfy all these assumptions (Example: presence of autocorrelation and multicollinearity problems).
Coefficient of Determination ($r^2$)
This is a measure of the goodness of fit of the (sample) regression line to a given set of data
[OR]
It is a summary measure that tells how well the SRF fits the given data.
$r^2$ measures the % of total variation in Y explained by the regression model or X(s).
A perfect fit of regression line is rarely the case
Generally, there will be some positive and
negative errors
Our goal is to minimize the errors as far as
possible
See Figure: If all the observations were to lie on
the regression line, we would obtain a perfect
fit.
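[Figure: SRF $\hat{Y}_i = b_1 + b_2 X_i$ fitted to observations at $X_1$, $X_2$, $X_3$ and $X_4$; the vertical distances of the points from the line are the residuals, marked $u_1$ to $u_4$]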
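The concept of the coefficient of determination can be explained using the following diagram:
[Diagram: at an observation $X_i$, the SRF splits the total deviation of $Y_i$ from $\bar{Y}$ into an explained part and a residual part]
Total = $(Y_i - \bar{Y})$
$(\hat{Y}_i - \bar{Y})$ = due to regression/explained by X (Why?)
$e_i = (Y_i - \hat{Y}_i)$ = due to residual
$\bar{Y}$ - mean of the sample data
$\hat{Y}_i$ - predicted value of Y for a given X (point on the SRF)
$Y_i$ - an individual sample observation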
Numerical proof (consider the following example):
$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$
Y    X
49   1
45   2
44   3
39   4
38   5
37   6
34   7
33   8
30   9
29   10
$\bar{Y} = 37.8$
For the above data, $b_2 = -2.1576$; $b_1 = 49.667$
Hence $\hat{Y}_i = 49.667 - 2.1576 X_i$
For $X_i = 1$, $\hat{Y}_i = 47.509$
Now, applying this for $Y_i = 49$, we get
49 - 37.8 = (47.509 - 37.8) + (49 - 47.509)
11.2 = 9.709 + 1.491 = 11.2, i.e. LHS = RHS
By squaring the above identity and summing over all observations (the cross-product term sums to zero for OLS residuals), we get
$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$
$\sum (Y_i - \bar{Y})^2$ - called TSS (total variation in Y)
$\sum (\hat{Y}_i - \bar{Y})^2$ - called ESS (variation due to X)
$\sum (Y_i - \hat{Y}_i)^2$ or $\sum e_i^2$ - called RSS (variation due to error)
Thus, TSS = ESS + RSS
If all actual Ys lie on fitted SRF, RSS=0 and hence
ESS=TSS (Polar cases)
If X explains no variation in Y, ESS=0 and hence
RSS=TSS (Polar cases)
If ESS is relatively larger than RSS, then the chosen
SRF fits the data well
If RSS is relatively larger than ESS, then the chosen
SRF fits the data poorly
Now, $r^2$ is defined as
$r^2 = \dfrac{ESS}{TSS} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$
This is nothing but the portion of the variation in Y (TSS) explained by X (ESS).
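The decomposition and $r^2$ for the sample data can be computed in a few lines. This sketch assumes Python with NumPy; for these data it gives TSS = 393.6 and $r^2 \approx 0.9757$, and the first observation reproduces the numerical proof above.

```python
import numpy as np

Y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
X = np.arange(1, 11, dtype=float)

x, y = X - X.mean(), Y - Y.mean()
b2 = (x * y).sum() / (x ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X                     # points on the SRF

tss = ((Y - Y.mean()) ** 2).sum()       # total variation in Y
ess = ((Y_hat - Y.mean()) ** 2).sum()   # variation explained by X
rss = ((Y - Y_hat) ** 2).sum()          # residual variation
r2 = ess / tss

# First observation: 11.2 = 9.709 + 1.491
total_dev = Y[0] - Y.mean()
split = (Y_hat[0] - Y.mean(), Y[0] - Y_hat[0])
print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, RSS = {rss:.4f}, r^2 = {r2:.4f}")
```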
Properties of $r^2$
It is a nonnegative quantity (Why?)
Its limits are $0 \le r^2 \le 1$
An $r^2$ of 1 means a perfect fit, that is, $\hat{Y}_i = Y_i$ for each i.
An $r^2$ of zero means that there is no linear relationship between Y and X.
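Variances and standard errors of the OLS estimators (with $x_i = X_i - \bar{X}$):
$\operatorname{var}(b_2) = \dfrac{\sigma^2}{\sum x_i^2}$,   $se(b_2) = \dfrac{\sigma}{\sqrt{\sum x_i^2}}$
$\operatorname{var}(b_1) = \dfrac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2$,   $se(b_1) = \sqrt{\dfrac{\sum X_i^2}{n \sum x_i^2}}\,\sigma$
where $\sigma^2$, the variance of $u_i$, is estimated by $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$
Here the variance of $b_2$ is inversely proportional to $\sum x_i^2$.
That is, given $\sigma^2$, the larger the variation in the X values, the smaller the variance of $b_2$ and hence the greater the precision with which $b_2$ can be estimated.
In short, if there is substantial variation in the Xs (recall Assumption 8), $b_2$ can be measured more accurately than when the Xs do not vary substantially.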
Hence, what counts as a big or a small SE of a regression coefficient depends on the context (i.e. the variation in the Xs).
A more standardized statistic, which also gives a measure of the goodness of fit of the estimated equation, is $R^2$.
SEs of regression coefficients can be used for hypothesis testing and for constructing confidence intervals (discussed later).
Standard Error of Regression/Residuals
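The SE of regression is the standard deviation (positive square root of the variance) of individual Y values about the estimated regression line, i.e. of the error term.
If the SE of residuals is high, the deviations will also be large, and hence the fit will be poor.
The SE of residuals can be obtained using the following formula:
$\hat{\sigma}^2 = \dfrac{\sum y_i^2 - b_2^2 \sum x_i^2}{n-2}$, with $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$
where $y_i = Y_i - \bar{Y}$ and $x_i = X_i - \bar{X}$.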
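The variance formulas above can be applied to the demand sample to get $\hat{\sigma}$, $se(b_1)$ and $se(b_2)$. A sketch assuming Python with NumPy; it also checks that the deviation formula for $\hat{\sigma}^2$ agrees with $\sum e_i^2/(n-2)$. For these data it gives $\hat{\sigma} \approx 1.093$, $se(b_1) \approx 0.746$ and $se(b_2) \approx 0.120$.

```python
import numpy as np

Y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
X = np.arange(1, 11, dtype=float)
n = len(Y)

x, y = X - X.mean(), Y - Y.mean()
sxx = (x ** 2).sum()
b2 = (x * y).sum() / sxx
b1 = Y.mean() - b2 * X.mean()
e = Y - (b1 + b2 * X)                                     # residuals

sigma2_hat = (e ** 2).sum() / (n - 2)                     # RSS / (n - 2)
sigma2_alt = ((y ** 2).sum() - b2 ** 2 * sxx) / (n - 2)   # deviation form in the text
sigma_hat = np.sqrt(sigma2_hat)                           # SE of regression

se_b2 = np.sqrt(sigma2_hat / sxx)
se_b1 = np.sqrt((X ** 2).sum() / (n * sxx) * sigma2_hat)
print(f"sigma-hat = {sigma_hat:.4f}, se(b1) = {se_b1:.4f}, se(b2) = {se_b2:.4f}")
```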
Multiple Regression Analysis
Meaning
In the single/two-variable model there is only one explanatory variable.
In practice, most problems cannot be explained by this model.
Example: apart from price, demand is a function of many other factors.
Hence, we use multiple regression models, which contain more than one X.
What does the model look like?
PRF for cross-sectional data: $Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + U_i$
PRF for time-series data: $Y_t = B_1 + B_2 X_{2t} + B_3 X_{3t} + U_t$
Any individual Y value can be expressed as the sum of two components:
Deterministic component: $E(Y_i) = B_1 + B_2 X_{2i} + B_3 X_{3i}$
Random component: $U_i$
Assumptions
All the assumptions of two-variable
model are applicable in the case of
multiple regression as well
Interpreting the Multiple Regression Equation
It gives the mean value of Y conditional upon the given/fixed values of the Xs.
Symbolically: $E(Y_i \mid X_{2i}, X_{3i}) = B_1 + B_2 X_{2i} + B_3 X_{3i}$
Thus, what we obtain is the mean value of Y for the given values of the Xs.
Partial Regression Coefficients (PRCs)
In multiple regression, $B_2$ and $B_3$ are called PRCs.
$B_2$ measures the change in the mean value of Y per unit change in $X_2$, holding the value of $X_3$ constant
[OR]
$B_2$ gives the direct or net effect of a unit change in $X_2$ on E(Y), net of any effect that $X_3$ may have on mean Y.
A similar explanation applies to $B_3$.
To estimate the parameters of the model, we use the OLS method.
Let the SRF corresponding to the PRF described above be:
$Y_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + e_i$
where $b_1$, $b_2$ and $b_3$ are estimators of the unknown population coefficients $B_1$, $B_2$ and $B_3$ respectively, and $e_i$ is the sample counterpart of the error term $U_i$.
Estimating PRCs
The OLS principle chooses the values of the unknown parameters in such a way that the RSS is as small as possible.
Symbolically: $\min \sum e_i^2 = \sum (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i})^2$
Minimization of this involves differentiation with respect to the unknowns, setting the resulting expressions to zero, and solving them simultaneously.
This procedure generates the following formulas for arriving at numerical values of the OLS estimators $b_1$, $b_2$ and $b_3$:
$b_1 = \bar{Y} - b_2 \bar{X}_2 - b_3 \bar{X}_3$
$b_2 = \dfrac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$
$b_3 = \dfrac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$
In these formulas, lowercase letters denote deviations from sample mean values.
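The deviation-form formulas translate directly into code. The notes give no three-variable data set, so the numbers below are hypothetical, made up only to exercise the formulas. The sketch (Python with NumPy) cross-checks the hand formulas against `np.linalg.lstsq` on the design matrix [1, X2, X3].

```python
import numpy as np

# Hypothetical data, made up only to exercise the formulas (not from the text)
Y = np.array([10, 12, 15, 14, 18, 20], dtype=float)
X2 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X3 = np.array([3, 1, 4, 2, 6, 5], dtype=float)

# Lowercase = deviations from sample means
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
s22, s33, s23 = (x2 ** 2).sum(), (x3 ** 2).sum(), (x2 * x3).sum()
sy2, sy3 = (y * x2).sum(), (y * x3).sum()

den = s22 * s33 - s23 ** 2   # would be zero under perfect collinearity (Assumption 10)
b2 = (sy2 * s33 - sy3 * s23) / den
b3 = (sy3 * s22 - sy2 * s23) / den
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()

# Cross-check with a generic least-squares solver on the design matrix [1, X2, X3]
design = np.column_stack([np.ones_like(Y), X2, X3])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print("formulas:", np.round([b1, b2, b3], 4))
print("lstsq:   ", np.round(coef, 4))
```

The shared denominator shows why Assumption 10 matters: if $X_2$ and $X_3$ were perfectly linearly related, it would be zero and the partial coefficients could not be separated.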
Properties of OLS estimators
The BLUE property continues to hold here as
well
Multiple coefficient of determination ($R^2$)
It gives the proportion of the variation in Y explained by the Xs jointly.
Conceptually, $R^2$ is akin to $r^2$.
As in the two-variable case, $R^2$ is defined as
$R^2 = \dfrac{ESS}{TSS} = \dfrac{b_2 \sum y_i x_{2i} + b_3 \sum y_i x_{3i}}{\sum y_i^2}$
$R^2$ lies between 0 and 1.
If $R^2 = 1$, the fitted regression line explains 100% of the variation in Y.
If $R^2 = 0$, the model does not explain any of the variation in Y.
The closer $R^2$ is to 1, the better the fit of the regression model.
By and large, as the number of Xs increases, the $R^2$ value increases (Why? See below)
$R^2$ and Adjusted $R^2$
One aspect of $R^2$ is that, as the number of Xs increases, $R^2$ almost invariably increases.
Why? From the above, we know
$\dfrac{ESS}{TSS} + \dfrac{RSS}{TSS} = 1$, i.e. $R^2 + \dfrac{RSS}{TSS} = 1$, so
$R^2 = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum e_i^2}{\sum y_i^2}$
Here $\sum e_i^2$ depends on the number of Xs, but the denominator $\sum y_i^2$ does not.
Hence, as the Xs increase, $\sum e_i^2$ is likely to decrease (or at least it will not increase). Hence $R^2$ increases (or at least does not decrease).
Increasing $R^2$
Is it desirable to increase $R^2$ by adding more Xs?
We should adopt a cautious approach here.
Why?
(i) With more Xs, $R^2$ gives an overly optimistic picture of the regression fit.
(ii) $R^2$ does not take into account degrees of freedom (d.f.).
(iii) We need a measure of goodness of fit that is adjusted for the number of Xs in the model.
Such a measure is known as the adjusted $R^2$ ($\bar{R}^2$), which is defined as
$\bar{R}^2 = 1 - \dfrac{\sum e_i^2 / (n-k)}{\sum y_i^2 / (n-1)}$
Here k is the number of parameters in the model, including the intercept term.
The term adjusted means adjusted for the d.f. associated with the sums of squares entering into the above identity.
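A minimal Python sketch of the two measures (function names are our own). For the two-variable demand sample ($n = 10$, $k = 2$), TSS = 393.6 and ESS = $178^2/82.5$ follow from the worked example, giving $R^2 \approx 0.9757$ and $\bar{R}^2 \approx 0.9727$. The second call uses hypothetical sums of squares to show that $\bar{R}^2$ can turn negative when the fit is weak and k is large relative to n.

```python
def r_squared(rss, tss):
    """R^2 = 1 - RSS/TSS."""
    return 1 - rss / tss


def adjusted_r_squared(rss, tss, n, k):
    """Adjusted R^2; k = number of parameters, including the intercept."""
    return 1 - (rss / (n - k)) / (tss / (n - 1))


# Demand sample (n = 10, k = 2): TSS = 393.6, ESS = 178**2 / 82.5, RSS = TSS - ESS
tss = 393.6
rss = tss - 178 ** 2 / 82.5
print(round(r_squared(rss, tss), 4), round(adjusted_r_squared(rss, tss, 10, 2), 4))

# Hypothetical weak fit with several regressors: adjusted R^2 becomes negative
print(round(adjusted_r_squared(95.0, 100.0, 10, 4), 4))
```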
Features of Adjusted $R^2$
If k > 1, $\bar{R}^2 \le R^2$; i.e. as the Xs increase, $\bar{R}^2$ increases by less than the unadjusted $R^2$.
This means that a penalty is involved in adding more Xs into a regression model.
$\bar{R}^2$ can be negative, but $R^2$ cannot (Why?)
What do we do in practice?
In practice, $R^2$ is mainly used to measure goodness of fit.
$\bar{R}^2$ is used in deciding whether to include a new variable.
If the inclusion of a new variable increases $\bar{R}^2$, it is retained in the model.
When does $\bar{R}^2$ increase? When the absolute t value of the coefficient of the added variable is larger than 1.
Why is maximizing $\bar{R}^2$ not the goal?
Our objective is not to obtain a high $\bar{R}^2$.
The researcher should be more concerned about the logical or theoretical relevance of the Xs to Y and their statistical significance.
If this process produces a high $\bar{R}^2$, well and good.
Hypothesis Testing
The procedure is the same as in the two-variable case.
We can adopt both the confidence interval approach (CIA) and the test of significance approach (ToSA).
Testing under CIA
We construct a confidence interval and see whether the hypothesized value of the population parameter ($B_1$/$B_2$/$B_3$) lies inside this interval.
If it lies inside, we do not reject $H_0$.
If it lies outside, we can reject $H_0$.
The remaining procedure is the same as in the case of the two-variable model.
The test (t) statistic we use for this purpose is
$t = \dfrac{b_1 - B_1}{se(b_1)}$ (for testing $B_1$)
$t = \dfrac{b_2 - B_2}{se(b_2)}$ (for testing $B_2$)
$t = \dfrac{b_3 - B_3}{se(b_3)}$ (for testing $B_3$)
Testing under ToSA
Step 1: Set $H_0$ and $H_1$ separately for each partial regression coefficient.
Examples: $H_0: B_2 = 0$ and $H_1: B_2 \neq 0$
$H_0: B_3 = 0$ and $H_1: B_3 \neq 0$
Step 2: Compute a test (t) statistic from the sample data (see above).
Step 3: Choose the level of significance ($\alpha$), i.e. the probability of committing a Type I error (0.01 or 0.05 or 0.10).
Step 4: Find the probability of obtaining the computed test (t) statistic for the relevant d.f.
Note: d.f. is (n - k), where n is the number of observations and k is the number of parameters, including the intercept term.
Step 5: If this probability is less than the prechosen $\alpha$, reject $H_0$. Otherwise, do not reject $H_0$.
(OR)
After Step 3, use the following rules to reject or not reject $H_0$:
$H_0: B_x = 0$ against $H_1: B_x > 0$ [one-tailed]: reject $H_0$ if the calculated test statistic $t > t_{\alpha, d.f.}$ (i.e. the table t value at $\alpha$ level of significance and the given degrees of freedom)
$H_0: B_x = 0$ against $H_1: B_x < 0$ [one-tailed]: reject $H_0$ if the calculated test statistic $t < -t_{\alpha, d.f.}$ (i.e. the table t value at $\alpha$ level of significance and the given degrees of freedom)
$H_0: B_x = 0$ against $H_1: B_x \neq 0$ [two-tailed]: reject $H_0$ if the calculated absolute value of the test statistic $|t| > t_{\alpha/2, d.f.}$ (i.e. the table t value at $\alpha/2$ level of significance and the given degrees of freedom)
A summary of the t test
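Both testing approaches can be applied to the slope of the two-variable demand sample. This Python sketch tests $H_0: B_2 = 0$ against $H_1: B_2 \neq 0$ at $\alpha = 0.05$ with $n - k = 8$ d.f.; the critical value 2.306 is the standard t-table entry for $t_{0.025, 8}$, hard-coded rather than computed. The resulting t is about $-17.9$, so $H_0$ is rejected, and the 95% interval, roughly $(-2.435, -1.880)$, contains the population slope $-2$ implied by the demand schedule.

```python
import math

Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = list(range(1, 11))
n, k = len(Y), 2

x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar
rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
se_b2 = math.sqrt(rss / (n - k) / sxx)

T_CRIT = 2.306  # table t value: alpha/2 = 0.025, d.f. = 8

# ToSA: H0: B2 = 0 against H1: B2 != 0 (two-tailed)
t_stat = (b2 - 0) / se_b2
reject_h0 = abs(t_stat) > T_CRIT

# CIA: 95% confidence interval for B2
ci_low, ci_high = b2 - T_CRIT * se_b2, b2 + T_CRIT * se_b2
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```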
ANOVA or F Test: Relevance
This is a complementary way of hypothesis testing.
It is commonly used to test a joint $H_0$ in multiple regression models.
But it can be used in the two-variable regression model as well.
What is a joint $H_0$?
$H_0: B_2 = B_3 = 0$
This means that $B_2$ and $B_3$ are jointly equal to zero (or) the Xs have no influence on Y.
A test of a joint $H_0$ is called a test of the overall significance of the estimated regression.
How do we construct the F test statistic?
From the $R^2$ discussion, we know that
TSS = ESS + RSS (or) $\sum y_i^2 = b_2 \sum y_i x_{2i} + b_3 \sum y_i x_{3i} + \sum e_i^2$
The d.f. associated with the components of this identity are:
TSS = n - 1, because we lose 1 d.f. in computing the sample mean
ESS = 2 (k - 1), because ESS is a function of $b_2$ and $b_3$ (where k is the number of parameters, including the intercept)
RSS = n - 3 (n - k), because in computing RSS we need to estimate $B_1$, $B_2$ and $B_3$
In the case of the two-variable model, the corresponding d.f. are:
TSS = n - 1
ESS = 1
RSS = n - 2
In general, in a regression model with k parameters (including the intercept), the d.f. are as follows:
TSS = n - 1 (always)
ESS = k - 1
RSS = n - k
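Source of variation / Sum of squares (SS) / d.f. / Mean sum of squares (MSS) = SS/d.f.
Due to regression (ESS): SS = $b_2 \sum y_i x_{2i} + b_3 \sum y_i x_{3i}$; d.f. = 2; MSS = $(b_2 \sum y_i x_{2i} + b_3 \sum y_i x_{3i})/2$
Due to residuals (RSS): SS = $\sum e_i^2$; d.f. = n - 3; MSS = $\sum e_i^2/(n-3)$
Total (TSS): SS = $\sum y_i^2$; d.f. = n - 1
$F = \dfrac{(b_2 \sum y_i x_{2i} + b_3 \sum y_i x_{3i})/2}{\sum e_i^2/(n-3)}$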
Using the F ratio for hypothesis testing
Set $H_0: B_2 = B_3 = 0$ and $H_1$: not all Bs are simultaneously zero.
Calculate the F ratio using the formula above.
We reject $H_0$ if the F value computed from the formula exceeds the critical/table F value at the $\alpha$ level of significance and the given d.f. in the numerator and denominator.
Otherwise, we do not reject $H_0$.
Alternatively, if the p value of the computed F ratio is sufficiently low, we reject $H_0$.
Intuitive Reasoning
In $F = \dfrac{ESS/d.f.}{RSS/d.f.}$:
The numerator measures the variance of Y explained by the Xs.
The denominator measures the variance of Y not explained by the Xs.
If numerator > denominator, F > 1.
An increasingly large F is evidence against $H_0$.
Relationship between F and $R^2$
The null $B_2 = B_3 = 0$ is the same as saying $H_0: R^2 = 0$ (why?)
Thus, the F test is also a test of the significance of $R^2$ (i.e. whether $R^2$ is different from zero).
The relationship between the F ratio and $R^2$ is as follows:
$F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$
Here, when $R^2 = 0$, F = 0.
The larger $R^2$ is, the greater the F value will be.
One advantage of this formula is the ease of computation of the F value: all we need to know is the $R^2$ value (along with n and k).
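The $R^2$ form of F can be checked against the ANOVA form on the two-variable demand sample, where F should also equal the square of the t statistic for $H_0: B_2 = 0$. This is a Python sketch; the function name `f_from_r2` is our own, and the critical value 5.32 is the standard F-table entry for $\alpha = 0.05$ with (1, 8) d.f., hard-coded. TSS = 393.6 and ESS = $178^2/82.5$ come from the worked example.

```python
def f_from_r2(r2, n, k):
    """F = [R^2/(k-1)] / [(1-R^2)/(n-k)]; k includes the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Demand sample (n = 10, k = 2): TSS = 393.6, ESS = 178**2 / 82.5
tss = 393.6
ess = 178 ** 2 / 82.5
rss = tss - ess
n, k = 10, 2
r2 = ess / tss

f_r2 = f_from_r2(r2, n, k)                    # via R^2
f_anova = (ess / (k - 1)) / (rss / (n - k))   # via (ESS/d.f.) / (RSS/d.f.)

# Two-variable model: F equals the square of the t statistic for H0: B2 = 0
b2 = -178 / 82.5
se_b2 = (rss / (n - k) / 82.5) ** 0.5
t_stat = b2 / se_b2

F_CRIT = 5.32  # table F value: alpha = 0.05, d.f. (1, 8)
print(f"F = {f_r2:.2f}, t^2 = {t_stat ** 2:.2f}, reject H0: {f_r2 > F_CRIT}")
```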
Testing the significance of $R^2$ using the F test
Substitute the $R^2$ value in $F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$ and compute the F ratio.
We reject $H_0: R^2 = 0$ if the F value computed from the formula exceeds the critical/table F value at the $\alpha$ level of significance and the given d.f. in the numerator and denominator.
Otherwise, we do not reject the null.
Usefulness of this statistic
In cross-sectional data involving many observations, one generally obtains a low $R^2$.
This is due to the diversity of the cross-sectional units.
Here, the statistical significance of the $R^2$ value can be verified using $F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$.