Prof U Dinesh Kumar, IIMB
All Rights Reserved, Indian Institute of Management Bangalore
Interesting Hypotheses
What is Regression?
Regression is a tool for finding whether an association exists
between a dependent variable (Y) and one or
more independent variables (X1, X2, ..., Xn) in a study.
The relationship can be linear or non-linear.
$Y = \beta_0 + \beta_1 X$
Statistical relationship is not an exact relationship.
$Y = \beta_0 + \beta_1 X + \varepsilon$
Nomenclature in Regression
A dependent variable (response variable) measures an outcome
of a study (also called outcome variable).
An independent variable (explanatory variable) explains
changes in a response variable.
Regression studies often set values of the explanatory variable to see how
they affect the response variable (that is, to predict the response variable).
Regression Nomenclature
Dependent Variable, also called: Explained Variable, Regressand, Predictand, Endogenous Variable, Controlled Variable, Target Variable, Response Variable
Independent Variable, also called: Explanatory Variable, Regressor, Predictor, Exogenous Variable, Control Variable, Stimulus Variable
Where is it used?
Types of Regression
Regression Models
Simple Regression (one independent variable): Linear or Non-linear
Multiple Regression (more than one independent variable): Linear or Non-linear
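Types of Regression
Simple linear regression
$Y = \beta_0 + \beta_1 X_1 + \varepsilon$
Multiple linear regression
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$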
Nonlinear regression
Y is a nonlinear function of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ and the regressors $X_1, X_2$
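$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \dots + \beta_k x_k + \varepsilon$
Both models are linear regressions: each is linear in the parameters $\beta_j$, even though the second contains an interaction term and a squared term.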
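Because a model with interaction and squared terms is still linear in the coefficients, it can be fitted with ordinary least squares once those terms are added as extra columns. The sketch below illustrates this under stated assumptions: the data are hypothetical and noise-free, generated from made-up coefficients, and the helper names `solve_linear_system` and `fit_least_squares` are illustrative, not from the slides.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y for the coefficients b."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical coefficients for [1, x1, x2, x1*x2, x2^2] (assumption).
true_beta = [1.0, 2.0, -1.0, 0.5, 0.25]
points = [(x1, x2) for x1 in (0.0, 1.0, 2.0, 3.0) for x2 in (0.0, 1.0, 2.0)]

# The interaction and squared terms are just additional columns.
design = [[1.0, x1, x2, x1 * x2, x2 ** 2] for x1, x2 in points]
y = [sum(b * v for b, v in zip(true_beta, row)) for row in design]

beta_hat = fit_least_squares(design, y)
print([round(b, 6) for b in beta_hat])
```

With noise-free data the fitted coefficients should match the generating ones up to rounding, which is what makes the example easy to check.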
Estimate Regression Parameters
Does the model satisfy the diagnostic tests?
NO: revise the model and estimate again
YES: STOP
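Functional Form
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$Y_i$: Dependent (Response) Variable (e.g., Treatment Cost)
$X_i$: Independent (Explanatory) Variable (e.g., Body weight)
$\beta_0$: Population Y-intercept
$\beta_1$: Population Slope
$\varepsilon_i$: Random Error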
Model Assumptions
For all values of X, the variance of $\varepsilon_i$ is constant (homoscedasticity).
Estimation of Parameters
Population: Unknown Relationship $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Random Sample: drawn from the population and used to estimate the unknown parameters
Observed value: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\varepsilon_i$ = Random error
$E(Y \mid X_i) = \beta_0 + \beta_1 X_i$
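LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \dots + \hat{\varepsilon}_n^2$
Observed point: $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$
Fitted line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
[Figure: scatter plot of Y against X with the fitted line and the residuals $\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3, \hat{\varepsilon}_4$]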
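To make "LS minimizes the sum of squared residuals" concrete, the sketch below evaluates the sum of squared residuals for many candidate lines and keeps the smallest. The five data points and the grid of candidate intercepts and slopes are hypothetical, and a grid search is used only for illustration; it is not how least-squares estimates are computed in practice.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the candidate line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample (assumption, not data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Candidate intercepts 0.0..4.0 and slopes 0.0..1.5, both in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(0, 41) for j in range(0, 16)]

# Keep the candidate line with the smallest sum of squared residuals.
best_b0, best_b1 = min(candidates, key=lambda c: sse(c[0], c[1], xs, ys))
best_sse = sse(best_b0, best_b1, xs, ys)

print(best_b0, best_b1, round(best_sse, 4))
```

Any other line on the grid gives a larger sum of squared residuals for this sample, which is the sense in which the least-squares line is the best fit.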
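$SSE = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j x_{ij} \right)^2$
$\frac{\partial SSE}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j x_{ij} \right) = 0$
and
$\frac{\partial SSE}{\partial \hat{\beta}_j} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \sum_{m=1}^{k} \hat{\beta}_m x_{im} \right) x_{ij} = 0 \quad \forall j$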
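As a worked special case, with a single regressor (k = 1, writing $x_i$ for $x_{i1}$) the two first-order conditions reduce to a pair of linear equations in $\hat{\beta}_0$ and $\hat{\beta}_1$. The steps below are a sketch of that reduction, with $\bar{x}$ and $\bar{y}$ denoting the sample means.

```latex
\begin{aligned}
\sum_{i=1}^{n} y_i &= n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i
  &&\Rightarrow\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\
\sum_{i=1}^{n} x_i y_i &= \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2
  &&\Rightarrow\quad \sum_{i=1}^{n} x_i y_i = (\bar{y} - \hat{\beta}_1 \bar{x})\, n\bar{x} + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}
\end{aligned}
```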
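Coefficient Equations
Prediction Equation:
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
Sample Slope:
$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{\sum_i x_i y_i - n \bar{x} \bar{y}}{\sum_i x_i^2 - n (\bar{x})^2}$
Sample Y-intercept:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$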
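The slope and intercept formulas can be checked numerically. The sketch below computes the sample slope with both the deviation form and the computational form, then the intercept; the data are a hypothetical five-point sample and the function name `fit_simple_regression` is illustrative.

```python
def fit_simple_regression(xs, ys):
    """Return (intercept, slope, slope_alt) for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Deviation form: sum of cross-deviations over sum of squared deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx

    # Computational form: raw sums corrected by n times the means.
    num = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    den = sum(x * x for x in xs) - n * x_bar ** 2
    slope_alt = num / den

    intercept = y_bar - slope * x_bar
    return intercept, slope, slope_alt


# Hypothetical sample (assumption, not data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1, b1_alt = fit_simple_regression(xs, ys)
print(round(b0, 4), round(b1, 4), round(b1_alt, 4))
```

The two slope expressions are algebraically identical, so any difference between `b1` and `b1_alt` is floating-point rounding only.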
Coefficients Interpretation
$y = \beta_0 + \beta_1 x + \varepsilon$, with $E(\varepsilon) = 0$
Fitting a model
Minimize SSE
Is x really related to y?
Is $\beta_1$ statistically significant?
Using a model
Observed value of Y
Predicted Value of Y with the
model (with the knowledge of
explanatory variables)
Model Validation
Variation in Y
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Variation in $Y_i$ = Systematic Variation + Random Variation
or
Variation in $Y_i$ = Explained Variation + Unexplained Variation
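Variation in Y
$Y_i - \bar{Y}$: Total variation
$\hat{Y}_i - \bar{Y}$: Explained variation
$Y_i - \hat{Y}_i$: Unexplained variation
$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
SST = SSR + SSE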
TOTAL SUM OF SQUARES (SST):
$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
How much error is there in predicting Y without the knowledge of X?
SUM OF SQUARES ERROR (SSE):
$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
How much error is there in predicting Y with the knowledge of X?
Coefficient of determination
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
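The sums of squares and the coefficient of determination can be computed directly from a fitted line. The sketch below uses a hypothetical five-point sample to check that SST = SSR + SSE and that both expressions for R-squared agree; the function name `sums_of_squares` is illustrative.

```python
def sums_of_squares(xs, ys):
    """Fit y = b0 + b1 * x by least squares and return (SST, SSR, SSE)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    return sst, ssr, sse


# Hypothetical sample (assumption, not data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

sst, ssr, sse = sums_of_squares(xs, ys)
r_squared = ssr / sst
r_squared_alt = 1 - sse / sst

print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r_squared, 4))
```

For this sample the model explains 60% of the variation in Y, since SSR is 3.6 out of an SST of 6.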
Spurious Regression
The data show the number of Facebook users (in millions) and the number of people
who died of helium poisoning in the UK between 2004 and 2012.
Year    Facebook users (millions)    Helium poisoning deaths
2004    -                            -
2005    -                            -
2006    12                           -
2007    58                           -
2008    145                          11
2009    360                          21
2010    608                          31
2011    845                          40
2012    1056                         51
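A regression on these two series shows why a high R-squared alone does not establish a meaningful relationship. The sketch below is an illustration under one stated assumption: it uses only the five years for which both values appear in the data above (2008 to 2012).

```python
# Rows with both values present (2008 to 2012); earlier years are omitted
# because at least one of the two values is missing.
facebook_users = [145.0, 360.0, 608.0, 845.0, 1056.0]   # millions
helium_deaths = [11.0, 21.0, 31.0, 40.0, 51.0]

n = len(facebook_users)
x_bar = sum(facebook_users) / n
y_bar = sum(helium_deaths) / n

# Least-squares slope and intercept for deaths regressed on Facebook users.
s_xy = sum((x - x_bar) * (y - y_bar)
           for x, y in zip(facebook_users, helium_deaths))
s_xx = sum((x - x_bar) ** 2 for x in facebook_users)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Coefficient of determination, R^2 = 1 - SSE / SST.
fitted = [intercept + slope * x for x in facebook_users]
sse = sum((y - f) ** 2 for y, f in zip(helium_deaths, fitted))
sst = sum((y - y_bar) ** 2 for y in helium_deaths)
r_squared = 1 - sse / sst

print(round(slope, 4), round(intercept, 4), round(r_squared, 4))
```

The fit is almost perfect, yet nobody would argue that Facebook usage causes helium poisoning; both series simply grew over the same years.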
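Standard error of estimate:
$S_e = \sqrt{\frac{\sum_i (Y_i - \hat{Y}_i)^2}{n - 2}}$
Standard error of the slope:
$S_e(\hat{\beta}_1) = \frac{S_e}{\sqrt{SS_x}}$, where $SS_x = \sum_i (X_i - \bar{X})^2$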
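These standard errors feed the significance test for the slope. The sketch below computes the standard error of estimate, the standard error of the slope, and the resulting t statistic on a hypothetical five-point sample; the t statistic, slope divided by its standard error, is the usual test statistic for a zero slope and is added here as an assumption about how the quantities are used.

```python
import math

# Hypothetical sample (assumption, not data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares estimates.
ss_x = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
intercept = y_bar - slope * x_bar

# Standard error of estimate: residual variation with n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: slope = 0.
se_slope = s_e / math.sqrt(ss_x)
t_stat = slope / se_slope

print(round(s_e, 4), round(se_slope, 4), round(t_stat, 4))
```

The t statistic would then be compared with a critical value from the t distribution with n - 2 degrees of freedom.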