$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Where
- $y$ is the dependent (response) variable.
- $\{x_1, x_2, \ldots, x_k\}$ are the independent (explanatory) variables.
- $\alpha$ is the real intercept (constant).
- $\beta_j$ is the coefficient (loading) of the $j$th independent variable.
- $\{\varepsilon\}$ is a set of independent, identically and normally distributed errors (residuals): $\varepsilon \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$.
In practice, the true underlying model is unknown. However, with finite sample data and an OLS or other procedure, we can estimate the values of the coefficients (aka loadings) for the different input (explanatory) variables.
Let's assume we have a sample data set with $N$ observations, i.e. $(x_{1,i}, x_{2,i}, \ldots, x_{k,i}, y_i)$. Using an OLS method, we arrive at the following regression model:

$$y = \hat{\alpha} + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k + u$$
MLR Forecast Error Tutorial, Spider Financial Corp, 2014
Where
- $\hat{\beta}_j$ is the OLS estimate for the $j$th coefficient (loading).
- $\hat{\alpha}$ is the OLS estimate of the intercept.
- $\{u\}$ are the regression residuals. The residuals are homoscedastic (i.e. stable variance) and uncorrelated with any of the input variables:

$$E[u] = 0, \qquad E[u^2] = s^2, \qquad E[u\,x_i] = 0 \quad (1 \le i \le k)$$
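The residual properties above can be checked numerically. The following is a minimal sketch, assuming NumPy and a synthetic data set; the sample size, coefficients, and noise level are illustrative assumptions, not values from this tutorial.

```python
import numpy as np

# Synthetic data: every number below is an illustrative assumption.
rng = np.random.default_rng(0)
N, k = 200, 3
X = rng.normal(size=(N, k))                     # explanatory variables
alpha, beta = 1.5, np.array([2.0, -1.0, 0.5])   # "true" model (unknown in practice)
y = alpha + X @ beta + rng.normal(scale=0.3, size=N)

# OLS fit: prepend a column of ones so the first coefficient is the intercept.
A = np.column_stack([np.ones(N), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

# Residuals and the sample analogues of the three properties.
u = y - A @ coef
mean_u = u.mean()          # sample analogue of E[u] = 0
mean_u2 = np.mean(u ** 2)  # sample analogue of E[u^2] = s^2
cross = X.T @ u / N        # sample analogue of E[u x_i] = 0, one entry per input

print(alpha_hat, beta_hat)
print(mean_u, mean_u2, cross)
```

With an intercept in the model, the first and third properties hold in-sample up to floating-point error, because OLS makes the residuals orthogonal to every column of the design matrix.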
Forecast
In practice, the true regression model is hidden or unknown, so we fall back on the estimated regression model to perform a forecast.
Mathematically, the conditional forecast can be expressed as follows:

$$\hat{y} = E[Y \mid x_1, x_2, \ldots, x_k] = \hat{\alpha} + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k$$
As a result, the errors in the forecast originate from two distinct sources:
1. Residuals ($\{\varepsilon\}$ or $\{u\}$)
2. Errors in the estimated coefficient values (i.e. using $\hat{\beta}_j$ instead of $\beta_j$)
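To make the two sources concrete, the sketch below simulates a model whose true coefficients are known (which is never the case in practice) and splits one forecast error into its residual part and its coefficient-estimation part. All values, and the names `residual_part` and `coef_part`, are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 100, 2
alpha, beta, sigma = 0.5, np.array([1.0, -2.0]), 0.4   # assumed "true" model

# Sample data and OLS fit (a column of ones carries the intercept).
X = rng.normal(size=(N, k))
y = alpha + X @ beta + rng.normal(scale=sigma, size=N)
A = np.column_stack([np.ones(N), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

# A new target point, its realized value, and the conditional forecast.
x_m = np.array([1.0, 2.0])
eps_m = rng.normal(scale=sigma)
y_m = alpha + x_m @ beta + eps_m
y_hat_m = alpha_hat + x_m @ beta_hat

# Source 1: the residual. Source 2: errors in the estimated coefficients.
residual_part = eps_m
coef_part = (alpha - alpha_hat) + x_m @ (beta - beta_hat)
forecast_error = y_m - y_hat_m

print(forecast_error, residual_part, coef_part)
```

The forecast error equals the sum of the two parts exactly, which is why both have to appear in the forecast variance.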
Using an OLS procedure, the estimated value of any one $\hat{\beta}_j$ is normally distributed. Nevertheless, the errors in the values of the whole set of parameters $\{\hat{\beta}_j\}_{1 \le j \le k}$ are correlated. So, we can ignore the covariance terms when we examine the statistical significance of one coefficient, but we will need to factor in their overall (aggregate) effect for the forecast error.
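The correlation between coefficient estimates can be seen in a small Monte Carlo experiment. This is only a sketch under assumed settings: two explanatory variables with correlation 0.8, 50 observations per sample, and 500 repeated samples. None of these values come from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(2)
N, reps, rho = 50, 500, 0.8                 # assumed experiment settings
cov_x = np.array([[1.0, rho], [rho, 1.0]])  # correlated explanatory variables
alpha, beta = 1.0, np.array([2.0, -1.0])    # assumed "true" model

# Re-draw the sample many times and store the slope estimates of each fit.
estimates = np.empty((reps, 2))
for r in range(reps):
    X = rng.multivariate_normal(np.zeros(2), cov_x, size=N)
    y = alpha + X @ beta + rng.normal(size=N)
    A = np.column_stack([np.ones(N), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    estimates[r] = coef[1:]

# Correlation between the errors of the two slope estimates across samples.
corr_b1_b2 = np.corrcoef(estimates[:, 0], estimates[:, 1])[0, 1]
print(corr_b1_b2)
```

In this setup the two estimates move in opposite directions: when one slope is overestimated, the other tends to be underestimated. A forecast uses both at once, so their joint behavior matters.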
As a result, the forecast variance (aka error squared) can be expressed as follows:

$$\mathrm{Var}[\,y_m - \hat{y}_m \mid x_{1,m}, x_{2,m}, \ldots, x_{k,m}\,] = \sigma^2 \left( 1 + \frac{1}{N} + \frac{\sum_{j=1}^{k} (x_{j,m} - \bar{x}_j)^2}{\sum_{i=1}^{N} \sum_{j=1}^{k} (x_{j,i} - \bar{x}_j)^2} \right)$$
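However, the variance of residuals ($\sigma^2$) in the true model is unknown, so we use the variance of the error terms ($\hat{\sigma}^2$) of the estimated regression model:

$$\hat{\sigma}^2 = E[u^2] = E\big[(y - \hat{\alpha} - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2 - \cdots - \hat{\beta}_k x_k)^2\big] = \frac{\sum_{i=1}^{N} u_i^2}{N - k - 1} = \frac{SSE}{N - k - 1}$$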
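The residual-variance estimate above can be computed directly from the fitted residuals. The helper name `residual_variance` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def residual_variance(X, y):
    """Return (SSE / (N - k - 1), residuals) for an OLS fit with intercept."""
    N, k = X.shape
    A = np.column_stack([np.ones(N), X])      # column of ones = intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    u = y - A @ coef                          # regression residuals
    sse = float(u @ u)                        # sum of squared errors
    return sse / (N - k - 1), u

# Synthetic check: the noise standard deviation is 0.5, so sigma^2 = 0.25.
rng = np.random.default_rng(3)
N, k, sigma = 500, 2, 0.5
X = rng.normal(size=(N, k))
y = 1.0 + X @ np.array([0.7, -0.3]) + rng.normal(scale=sigma, size=N)

sigma2_hat, u = residual_variance(X, y)
print(sigma2_hat)
```

The divisor is $N - k - 1$ rather than $N$ because the fit estimates $k$ coefficients plus the intercept from the same data.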
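Overall, the MLR forecast error squared is expressed as follows:

$$\mathrm{Var}[\,y_m - \hat{y}_m \mid x_{1,m}, x_{2,m}, \ldots, x_{k,m}\,] = \frac{SSE}{N - k - 1} \left( 1 + \frac{1}{N} + \frac{\sum_{j=1}^{k} (x_{j,m} - \bar{x}_j)^2}{\sum_{i=1}^{N} \sum_{j=1}^{k} (x_{j,i} - \bar{x}_j)^2} \right)$$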
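The expression above translates into a few lines of NumPy. The sketch below is a literal transcription of the formula as written in this tutorial; the function name `forecast_variance` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def forecast_variance(X, y, x_m):
    """Forecast variance at target point x_m, following the formula above."""
    N, k = X.shape
    A = np.column_stack([np.ones(N), X])         # column of ones = intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    u = y - A @ coef
    s2 = (u @ u) / (N - k - 1)                   # SSE / (N - k - 1)
    x_bar = X.mean(axis=0)                       # center of the sample data
    num = np.sum((x_m - x_bar) ** 2)             # squared distance of the target
    den = np.sum((X - x_bar) ** 2)               # double sum over i and j
    return s2 * (1.0 + 1.0 / N + num / den)

# Synthetic data (illustrative assumptions only).
rng = np.random.default_rng(4)
N, k = 120, 2
X = rng.normal(size=(N, k))
y = 2.0 + X @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=N)

x_bar = X.mean(axis=0)
v_center = forecast_variance(X, y, x_bar)        # target at the sample center
v_far = forecast_variance(X, y, x_bar + 3.0)     # target far from the center
print(v_center, v_far)
```

Taking the square root of the returned value gives a forecast standard error in the units of $y$.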
Now, let's take a close look at the formula above and try to explain the different terms:
1. $\hat{\sigma}^2$ is the estimated variance of the true regression model residuals. This value is constant and independent of the X value(s) of the target data point.
2. $\hat{\sigma}^2 / N$ is the error in the estimated intercept (aka constant). This value is constant and independent of the X values of the target data point.
3. The last term is proportional to the squared (Euclidean) distance of the target data point from the center of the sample data set. This term is zero at the sample data center point $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k)$.
In effect, the forecast variance is higher for data points $(x_{1,m}, x_{2,m}, \ldots, x_{k,m})$ that are further from the center of the input sample data set (i.e. $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k)$).
As a result, the forecast error is smallest at the sample data center point $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k)$.
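The behavior of the distance term can be illustrated by moving a target point away from the sample center in fixed steps. This is a sketch on assumed synthetic data; the helper `distance_term`, the direction, and the step sizes are hypothetical choices.

```python
import numpy as np

def distance_term(X, x_m):
    """Last term of the forecast variance: squared distance ratio."""
    x_bar = X.mean(axis=0)
    return np.sum((x_m - x_bar) ** 2) / np.sum((X - x_bar) ** 2)

# Synthetic sample with two explanatory variables (illustrative values).
rng = np.random.default_rng(5)
X = rng.normal(loc=[10.0, -2.0], scale=[1.0, 3.0], size=(80, 2))
x_bar = X.mean(axis=0)

# Walk away from the center along a fixed unit direction.
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)
steps = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
terms = np.array([distance_term(X, x_bar + t * direction) for t in steps])

for t, term in zip(steps, terms):
    print(f"distance {t:4.1f} -> term {term:.6f}")
```

The term is zero at the center and grows with the square of the distance, so doubling the distance from the center quadruples this component of the forecast variance.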