Prepared By
Siddharth Abbi
Post Graduate Diploma in Management
Guru Nanak Institute of Management
Supervised By
Dr. Shipra Jain
Professor of Quantitative Techniques
Guru Nanak Institute of Management
Regression analysis has long been a handy tool for analysts investigating the
relationship between variables. Usually, the analyst seeks to ascertain the causal effect of one
variable upon another, for example the effect of a price change upon demand, or the effect of
changes in the money supply upon the inflation rate. This article aims to explore the linear
regression model that establishes a relationship between two variables and to list its
limitations in order to present both sides of the coin.
The earliest form of regression was the method of least squares, published by Legendre in 1805
and by Gauss in 1809. Both applied it to determine, from astronomical observations, the orbits of
bodies around the Sun. The term "regression" was coined by Francis Galton in the nineteenth
century. For Galton, regression had a biological meaning: he used it to describe the phenomenon
that the heights of descendants of tall ancestors tend to regress down towards a normal average.
His work was later extended by Udny Yule and Karl Pearson, who assumed the joint distribution of
the response and explanatory variables to be Gaussian. This assumption was weakened by
R. A. Fisher in his works of 1922 and 1925, where he assumed only the conditional distribution of
the response variable to be Gaussian. In the 1950s and 1960s, electromechanical desk calculators
were used to calculate regressions, and it sometimes took up to 24 hours to get the result from
one regression. But what exactly does a regression analysis show?
Regression analysis is a statistical process for estimating the relationship between a dependent
variable and one or more independent variables. It helps in understanding how the value of the
dependent variable changes when one of the independent variables is varied, keeping the other
independent variables constant. The estimation target is a function of the independent variables
called the regression function.
A linear regression is one which shows a relationship between two variables. It is constructed
by fitting a line through a scatter plot of paired observations of the two. The diagram below
illustrates an example of a linear regression line drawn through a series of (X, Y) observations:
In a linear regression, the independent and dependent variables are plotted on the X axis and Y axis
respectively. The choice of these variables depends on the analyst. Regression analysis is commonly
used to analyze investment returns, where the market index is the independent variable and the
financial asset is the dependent variable. In short, regression analysis formulates a hypothesis that
the movement in one variable (Y) depends on the movement in the other (X).
The regression equation is given by:
Y = a + bX + ε
Where, Y = dependent variable
X = independent variable
a = intercept of regression line
b = slope
ε = error term
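As a rough illustration of how the intercept a and slope b in the equation above might be estimated, the sketch below applies the standard ordinary least-squares formulas to a small set of (X, Y) pairs. The data values and the function name fit_line are assumptions made up for this example; they do not come from the article.

```python
def fit_line(xs, ys):
    """Estimate intercept a and slope b of Y = a + bX by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Hypothetical observations, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.8, 2.1, 2.8, 3.1, 3.7]

a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")
```

The error term ε is whatever is left over for each observation once a + bX has been subtracted from the observed Y.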
The slope "b" indicates the change in Y for every unit change in X. For example, if b = 0.74,
it means that when X changes by 100, Y will change by 74. The intercept "a" indicates the value
of Y at the point where X = 0. The error term "ε" indicates how well a linear regression model is
working.
There are a number of assumptions behind the working of the linear regression model. These are
stated below:
i. The dependent variable is linearly related to the coefficients of the model and the model
is correctly specified.
ii. The independent variable(s) is/are uncorrelated with the equation error term.
iii. The mean of the error term is zero.
iv. The error term has a constant variance. This is also known as the "homoskedasticity
assumption". When the regression model is heteroskedastic, the model may not be useful
in predicting values of the dependent variable.
v. The error terms are uncorrelated with each other: there is no autocorrelation or serial
correlation.
vi. There is no perfect multicollinearity: no independent variable has a perfect linear
relationship with any of the other independent variables.
vii. The error term is normally distributed. This allows hypothesis-testing methods to be
applied to linear regression models.
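Two of the assumptions above, (iii) a zero-mean error term and (v) no serial correlation, can be inspected informally on the residuals of a fitted model. The sketch below computes the residual mean and the lag-1 autocorrelation. The residual values and the helper names are hypothetical, and this is only a descriptive check, not a formal statistical test.

```python
def mean(values):
    return sum(values) / len(values)


def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one immediately before it."""
    centre = mean(residuals)
    numerator = sum(
        (residuals[t] - centre) * (residuals[t - 1] - centre)
        for t in range(1, len(residuals))
    )
    denominator = sum((e - centre) ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals in time order, invented for illustration only.
residuals = [0.4, -0.3, 0.1, -0.2, 0.3, -0.3]

residual_mean = mean(residuals)
rho = lag1_autocorrelation(residuals)
print(f"mean of residuals: {residual_mean:.3f}")
print(f"lag-1 autocorrelation: {rho:.3f}")
```

A residual mean far from zero or a lag-1 autocorrelation far from zero would suggest that the corresponding assumption deserves a closer look.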
Standard Error of Estimate
As stated above, the Standard Error of Estimate (SEE) measures the working of the regression
model. It compares the actual values of Y to the predicted values. Let us take a regression
equation Y = 1.2 + 0.5X to study the calculation of SEE for a period of five years.
The actual and predicted values are given below:
Year   Value of X   Predicted value of Y   Actual value of Y
1      4.2          3.3                    3.0
2      3.5          2.95                   1.65
3      0.8          1.6                    1.8
4      7.4          4.9                    5.5
5      0.3          1.35                   1.1

Now, let us find the residual and squared residual values:
Year   Value of X   Predicted value of Y   Actual value of Y   Residual (predicted - actual)   Squared residual
1      4.2          3.3                    3.0                 0.3                            0.09
2      3.5          2.95                   1.65                1.3                            1.69
3      0.8          1.6                    1.8                 -0.2                           0.04
4      7.4          4.9                    5.5                 -0.6                           0.36
5      0.3          1.35                   1.1                 0.25                           0.0625
To find the standard error, we take the sum of all the squared residual values, divide it by
(n - 2), and then take its square root. In the above example, the sum of squared residual values is
0.09 + 1.69 + 0.04 + 0.36 + 0.0625 = 2.2425. Dividing it by 3 (5 - 2) and taking the square root,
we get SEE = (2.2425/3)^(1/2) ≈ 0.86 %.
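The calculation above can be reproduced in a few lines. The sketch below uses the regression equation Y = 1.2 + 0.5X and the five yearly observations from the worked example. The variable names are chosen for illustration, and the residual is taken as predicted minus actual, as in the example.

```python
import math

# Coefficients of the worked example: Y = 1.2 + 0.5X.
a, b = 1.2, 0.5

# Five yearly observations from the worked example.
xs = [4.2, 3.5, 0.8, 7.4, 0.3]
actual = [3.0, 1.65, 1.8, 5.5, 1.1]

predicted = [a + b * x for x in xs]
# Residual taken as predicted minus actual; the sign vanishes once squared.
residuals = [p - y for p, y in zip(predicted, actual)]
squared_residuals = [e ** 2 for e in residuals]

n = len(xs)
sum_squared = sum(squared_residuals)
# Divide by n - 2 because two parameters (a and b) are estimated.
see = math.sqrt(sum_squared / (n - 2))

print(f"sum of squared residuals = {sum_squared:.4f}")
print(f"SEE = {see:.2f}")
```

A smaller SEE means the actual values lie closer to the regression line.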
Coefficient of Determination
It tells us the proportion of the change in the dependent variable Y that is explained by changes
in the independent variable X. It is therefore also known as explained variation. The correlation
coefficient is denoted by "r", whereas the coefficient of determination is denoted by "R² or
R-squared". For example, if r = 0.56, then R² = (0.56)² = 0.314 or 31.4 %. It implies that 31.4 % of
the change in Y is explained by X, whereas the remaining 100 % - 31.4 % = 68.6 % of the change in
Y is unexplained, i.e., due to factors other than X.
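The arithmetic in this example can be checked with a short sketch. It relies on the fact that, in a regression with a single independent variable, R² equals the square of the correlation coefficient r. The variable names are illustrative only.

```python
# Correlation coefficient from the example above.
r = 0.56

# With one independent variable, R-squared is the square of r.
r_squared = r ** 2
unexplained = 1 - r_squared

print(f"R-squared = {r_squared:.3f} ({r_squared * 100:.1f} % explained)")
print(f"Unexplained share = {unexplained * 100:.1f} %")
```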
Some of the limitations of the linear regression model are stated below:
i. Relationships between variables tend to change over time due to
changes in the economy, which results in parameter instability.
ii. In an efficient market, public dissemination of the relationship can limit the effectiveness
of that relationship in future periods.
iii. The assumptions stated earlier often prove unrealistic in the real world.
There has been immense development in the field of statistics. It used to take around 24 hours
to get a result from a regression model, but now it is just a matter of seconds, with the help
of advanced techniques and developments in IT systems and software such as the
IBM® SPSS® Regression software. Keeping in mind the various limitations of the
linear regression model, models need to be developed in the near future to counter parameter
instability and public dissemination of the relationship. The assumptions also need to be more
realistic in nature in order to provide better and more accurate results. With the growing
complexity in the working of the economy and markets, analysts have also moved from linear
regression models to multiple regression models, general linear models, heteroscedastic models,
hierarchical linear models and so on.
