
LINEAR REGRESSION ANALYSIS

A STATISTICAL RELATIONSHIP BETWEEN TWO VARIABLES
Prepared By
Siddharth Abbi
Post Graduate Diploma in Management
Guru Nanak Institute of Management
Supervised By
Dr. Shipra Jain
Professor of Quantitative Techniques
Guru Nanak Institute of Management
Abstract
Regression analysis has long been a handy tool for analysts investigating the relationship
between variables. Usually, the analyst seeks to ascertain the causal effect of one variable
upon another, for example the effect of a price change upon demand, or the effect of changes
in the money supply upon the inflation rate. This article aims to explore the linear regression
model that establishes a relationship between two variables and to list its limitations in
order to present both sides of the coin.
Introduction
The earliest form of regression was the method of least squares, published by Legendre in 1805
and by Gauss in 1809, which was used to determine the orbits of bodies around the Sun from
astronomical observations. The term “regression” was coined by Francis Galton in the nineteenth
century. For Galton, regression had a biological meaning: he used it to describe the phenomenon
that the heights of descendants of tall ancestors tend to regress down towards a normal average.
His work was later extended by Udny Yule and Karl Pearson, who assumed the joint distribution of
the response and explanatory variables to be Gaussian. This assumption was weakened by R.A. Fisher
in his works of 1922 and 1925, where he assumed the conditional distribution of the response
variable to be Gaussian. In the 1950s and 1960s, electromechanical desk calculators were used to
calculate regressions, and it sometimes took up to 24 hours to get the result from one regression.
But what exactly does a regression analysis show?
Regression analysis is a statistical process for estimating the relationship between a dependent
variable and one or more independent variables. It helps in understanding how the value of the
dependent variable changes when one of the independent variables is varied while the other
independent variables are held constant. The estimation target is a function of the independent
variables called the regression function.
A simple linear regression shows the relationship between two variables. It is constructed by
fitting a line through a scatter plot of paired observations of those variables. The diagram below
illustrates an example of a linear regression line drawn through a series of (X, Y) observations:
In a linear regression, the independent and dependent variables are plotted on the X axis and
Y axis respectively. The choice of these variables depends on the analyst. Regression analysis is
commonly used to analyze investment returns, where the market index is the independent variable
and the financial asset is the dependent variable. In short, regression analysis formulates a
hypothesis that the movement in one variable (Y) depends on the movement in the other (X).
The regression equation is given by:
Y = a + bX + ε
Where, Y = dependent variable
X = independent variable
a = intercept of regression line
b = slope
ε = error term
The slope “b” indicates the change in Y for every unit change in X. For example, if b = 0.74,
it means that when X changes by 100, Y will change by 74. The intercept “a” indicates the value
of Y at the point where X = 0. The error term “ε” is the part of Y that the line does not explain;
the smaller the errors, the better the linear regression model is working.
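As a minimal sketch of how "a" and "b" might be estimated in practice, the snippet below applies the standard closed-form least-squares formulas: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept is mean(Y) - b * mean(X). The function name fit_line and the sample data are illustrative assumptions and do not come from the article.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical observations lying exactly on Y = 1 + 2X (assumed data)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")

# Interpreting a slope of 0.74: a 100-unit change in X implies this change in Y
print(f"change in Y = {0.74 * 100:.0f}")
```

On real data the points will not lie exactly on a line, and the fitted line is the one that minimizes the sum of squared vertical distances to the observations.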
Assumptions
There are a number of assumptions behind the working of the linear regression model. These are
stated below:
i. The dependent variable is linearly related to the coefficients of the model and the model
is correctly specified.
ii. The independent variable(s) is/are uncorrelated with the equation error term.
iii. The mean of the error term is zero.
iv. The error term has a constant variance. This is also known as the “homoskedasticity
assumption”. When the error term is heteroskedastic, the model may not be useful
in predicting values of the dependent variable.
v. The error terms are uncorrelated with each other, i.e., there is no autocorrelation or serial
correlation.
vi. There is no perfect multicollinearity. No independent variable has a perfect linear
relationship with any of the other independent variables.
vii. The error term is normally distributed. This allows hypothesis-testing methods to be
applied to linear regression models.
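Some of these assumptions can be probed numerically once residuals are available. The sketch below is one possible check, not a procedure described in the article: it computes the mean of the residuals (assumption iii) and the Durbin-Watson statistic, a common diagnostic for first-order serial correlation (assumption v), where values near 2 suggest no autocorrelation. The function name durbin_watson is an assumption, and the residuals are the five values from the worked example in the Standard Error of Estimate section, used purely for illustration.

```python
from statistics import mean


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    squared_sum = sum(e ** 2 for e in residuals)
    if squared_sum == 0:
        raise ValueError("residuals are all zero")
    diff_sum = sum((curr - prev) ** 2
                   for prev, curr in zip(residuals, residuals[1:]))
    return diff_sum / squared_sum


# Illustrative residuals (assumed input, in time order)
residuals = [0.3, 1.3, -0.2, -0.6, 0.25]

mean_residual = mean(residuals)
dw = durbin_watson(residuals)
print(f"mean residual  = {mean_residual:.2f}")
print(f"Durbin-Watson  = {dw:.2f}")
```

Five observations are far too few to draw conclusions; the point is only to show what such a check looks like.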
Standard Error of Estimate
The Standard Error of Estimate (SEE) measures how well the regression model is working. It
compares the actual values of Y to the predicted values. Let us take a regression
equation Y = 1.2 + 0.5X to study the calculation of SEE for a period of five years.
The actual and predicted values are given below:
Year   Value of X   Predicted value of Y   Actual value of Y
1      4.2          3.3                    3.0
2      3.5          2.95                   1.65
3      0.8          1.6                    1.8
4      7.4          4.9                    5.5
5      0.3          1.35                   1.1

Now, let us find the residual and squared residual values:
Year   Value of X   Predicted value   Actual value   Residual value   Squared residual
                    of Y (Y1)         of Y (Y2)      (Y1 - Y2)        value
1      4.2          3.3               3.0             0.3             0.09
2      3.5          2.95              1.65            1.3             1.69
3      0.8          1.6               1.8            -0.2             0.04
4      7.4          4.9               5.5            -0.6             0.36
5      0.3          1.35              1.1             0.25            0.0625
To find the standard error, we take the sum of all the squared residual values, divide it by
(n - 2), and then take its square root. In the above example, the sum of squared residual values is
0.09 + 1.69 + 0.04 + 0.36 + 0.0625 = 2.2425. Now, dividing it by 3 (5 - 2), we get
SEE = (2.2425/3)^(1/2) = 0.86 %.
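The calculation above can be reproduced with a short Python sketch. It uses the five X values and actual Y values from the table and the equation Y = 1.2 + 0.5X; the function name standard_error_of_estimate and its n_params argument are illustrative assumptions. Residuals follow the table's convention of predicted minus actual, which gives the same squares as the more common actual minus predicted.

```python
from math import sqrt


def standard_error_of_estimate(actual, predicted, n_params=2):
    """Return (sum of squared residuals, SEE).

    SEE = sqrt(SSE / (n - n_params)); n_params = 2 because the line
    has two estimated coefficients (intercept and slope).
    """
    n = len(actual)
    if n != len(predicted) or n <= n_params:
        raise ValueError("need matching series with more than n_params points")
    # Residual = predicted - actual, as in the table above
    residuals = [y_hat - y for y, y_hat in zip(actual, predicted)]
    sse = sum(e ** 2 for e in residuals)
    return sse, sqrt(sse / (n - n_params))


xs = [4.2, 3.5, 0.8, 7.4, 0.3]        # values of X from the table
actual = [3.0, 1.65, 1.8, 5.5, 1.1]   # actual values of Y from the table
predicted = [1.2 + 0.5 * x for x in xs]

sse, see = standard_error_of_estimate(actual, predicted)
print(f"sum of squared residuals = {sse:.4f}")
print(f"SEE = {see:.2f}")
```

The printed values should agree with the hand calculation: a sum of squared residuals of 2.2425 and an SEE of about 0.86.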
Coefficient of Determination
It tells us the proportion of the variation in the dependent variable Y that is explained by
changes in the independent variable X. It is therefore also known as explained variation. The
correlation coefficient is denoted by “r”, whereas the coefficient of determination is denoted by
“R² or R-squared”. For example, if r = 0.56, then R² = (0.56)² = 0.314 or 31.4 %. It implies that
31.4 % of the change in Y is explained by X, whereas the remaining 100 % - 31.4 % = 68.6 % of the
change in Y is unexplained, i.e., due to factors other than X.
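The arithmetic in this example can be sketched in a few lines of Python. Squaring r to obtain R² holds for a simple linear regression with one independent variable, which is the case discussed here; the function name coefficient_of_determination is an illustrative assumption.

```python
def coefficient_of_determination(r):
    """Return R-squared for a simple linear regression, given correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


r = 0.56  # correlation coefficient from the example in the text
r_squared = coefficient_of_determination(r)
unexplained = 1.0 - r_squared

print(f"R-squared             = {r_squared:.4f}")
print(f"explained variation   = {r_squared:.1%}")
print(f"unexplained variation = {unexplained:.1%}")
```

The unrounded value of R² is 0.3136, which the text rounds to 0.314.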
Limitations
Some of the limitations of the linear regression model are stated below:
i. Relationships between variables tend to change over time due to changes in the economy,
which results in parameter instability.
ii. In an efficient market, public dissemination of the relationship can limit the effectiveness
of that relationship in future periods.
iii. The assumptions stated earlier often prove unrealistic in the real world.
Conclusion
There has been immense development in the field of statistics. It used to take up to 24 hours
to get a result from a regression model, but now it is just a matter of seconds to get the
result with the help of advanced techniques and developments in IT systems and software
such as the IBM® SPSS® Regression software. Keeping in mind the various limitations of the
linear regression model, models need to be developed in the near future to counter the parameter
instability and public dissemination of the relationship. The assumptions also need to be more
realistic in nature in order to provide better and more accurate results. With the growing
complexity in the working of the economy and markets, analysts have also moved from linear
regression models to multiple regression models, general linear models, heteroscedastic models,
hierarchical linear models and so on.