
definition - Why square the difference instead of taking the absolute value in standard deviation? - Cross Validated

21/7/2015


Why square the difference instead of taking the absolute value in standard deviation?

In the definition of standard deviation, why do we have to square the difference from the mean to get the mean (E) and take the square root back at the end? Can't we just simply take the absolute value of the difference instead and get the expected value (mean) of those, and wouldn't that also show the variation of the data? The number is going to be different from the square method (the absolute-value method will be smaller), but it should still show the spread of data. Anybody know why we take this square approach as a standard?

The definition of standard deviation:

$\sigma = \sqrt{E\left[(X - \mu)^2\right]}$.

Can't we just take the absolute value instead and still be a good measurement?

$\sigma = E\left[|X - \mu|\right]$

standard-deviation  definition

edited Jul 28 '11 at 16:42
mbq

asked Jul 19 '10 at 21:04
c4il

11 In a way, the measurement you proposed is widely used in the case of error (model quality) analysis; then it is called MAE, "mean absolute error". - mbq Jul 19 '10 at 21:30

2 In accepting an answer it seems important to me that we pay attention to whether the answer is circular. The normal distribution is based on these measurements of variance from squared error terms, but that isn't in and of itself a justification for using (X-M)^2 over |X-M|. - rpierce Jul 20 '10 at 7:59

Do you think the term standard means this is THE standard today? Isn't it like asking why principal components are "principal" and not secondary? - robin girard Jul 23 '10 at 21:44

12 Every answer offered so far is circular. They focus on ease of mathematical calculations (which is nice but by no means fundamental) or on properties of the Gaussian (Normal) distribution and OLS. Around 1800 Gauss started with least squares and variance and from those derived the Normal distribution: there's the circularity. A truly fundamental reason that has not been invoked in any answer yet is the unique role played by the variance in the Central Limit Theorem. Another is the importance in decision theory of minimizing quadratic loss. - whuber Sep 13 '13 at 15:28

1 +1 @whuber: Thanks for pointing this out, which was bothering me as well. Now, though, have to go and read up on the Central Limit Theorem! Oh well. :) - Sabuncu Feb 11 '14 at 21:55

20 Answers

If the goal of the standard deviation is to summarise the spread of a symmetrical data set (i.e. in general how far each datum is from the mean), then we need a good method of defining how to measure that spread.

The benefits of squaring include:
- Squaring always gives a positive value, so the sum will not be zero.
- Squaring emphasizes larger differences, a feature that turns out to be both good and bad (think of the effect outliers have).

Squaring however does have a problem as a measure of spread, and that is that the units are all squared, whereas we might prefer the spread to be in the same units as the original data (think of squared pounds or squared dollars or squared apples). Hence the square root allows us to return to the original units.

I suppose you could say that absolute difference assigns equal weight to the spread of data whereas squaring emphasises the extremes. Technically though, as others have pointed out, squaring makes the algebra much easier to work with and offers properties that the absolute method does not (for example, the variance is equal to the expected value of the square of the distribution minus the square of the mean of the distribution).

It's important to note however that there's no reason you couldn't take the absolute difference if that is your preference on how you wish to view 'spread' (sort of how some people see 5% as some magical threshold for p-values, when in fact it's situation dependent). Indeed, there are in fact several competing methods for measuring spread.

My view is to use the squared values because I like to think of how it relates to the Pythagorean Theorem of Statistics: c = sqrt(a^2 + b^2) ... this also helps me remember that when working with independent random variables, variances add, standard deviations don't. But that's just my personal subjective preference.

A much more in-depth analysis can be read here.

edited Jul 20 '10 at 14:56
answered Jul 19 '10 at 22:31
Tony Breyal

http://stats.stackexchange.com/questions/118/whysquarethedifferenceinsteadoftakingtheabsolutevalueinstandarddevia
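The two algebraic properties mentioned in the answer above (variance as the mean of squares minus the squared mean, and variances adding for independent variables) can be checked numerically. The following is only a rough sketch: the simulated data, the seed, and all variable names are illustrative assumptions, not part of the original answer.

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

# Two independent simulated samples (illustrative data, not from the thread).
a = [random.gauss(0.0, 3.0) for _ in range(50_000)]
b = [random.gauss(0.0, 4.0) for _ in range(50_000)]
total = [x + y for x, y in zip(a, b)]

# Identity: Var(X) = E[X^2] - (E[X])^2 (population form).
mean_a = statistics.fmean(a)
var_identity = statistics.fmean([x * x for x in a]) - mean_a ** 2
var_direct = statistics.pvariance(a)

# Variances add, so standard deviations combine like c = sqrt(a^2 + b^2).
sd_a = statistics.pstdev(a)
sd_b = statistics.pstdev(b)
sd_total = statistics.pstdev(total)
sd_pythagoras = math.hypot(sd_a, sd_b)

print(var_identity, var_direct)
print(sd_total, sd_pythagoras, sd_a + sd_b)
```

With these settings the standard deviation of the sum should land near 5 (the hypotenuse of 3 and 4) rather than near 7 (the plain sum), up to sampling noise.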

31 "Squaring always gives a positive value, so the sum will not be zero." And so do absolute values. - robin girard Jul 22 '10 at 9:54

15 @robin girard: That is correct, hence why I preceded that point with "The benefits of squaring include". I wasn't implying anything about absolute values in that statement. I take your point though, I'll consider removing/rephrasing it if others feel it is unclear. - Tony Breyal Jul 22 '10 at 13:19

8 Much of the field of robust statistics is an attempt to deal with the excessive sensitivity to outliers that is a consequence of choosing the variance as a measure of data spread (technically scale or dispersion). en.wikipedia.org/wiki/Robust_statistics - Thylacoleo Aug 13 '10 at 5:15

Thank you for the link to that analysis. - Jack Aidley Jan 23 '13 at 14:03

The article linked to in the answer is a godsend. - traggatmot Mar 19 at 7:27

The squared difference has nicer mathematical properties: it's continuously differentiable (nice when you want to minimize it), it's a sufficient statistic for the Gaussian distribution, and it's (a version of) the L2 norm, which comes in handy for proving convergence and so on.

The mean absolute deviation (the absolute value notation you suggest) is also used as a measure of dispersion, but it's not as "well-behaved" as the squared error.

answered Jul 19 '10 at 21:14
Rich

You said "it's continuously differentiable (nice when you want to minimize it)". Do you mean that the absolute value is difficult to optimize? - robin girard Jul 23 '10 at 21:40

16 @robin: while the absolute value function is continuous everywhere, its first derivative is not (at x=0). This makes analytical optimization more difficult. - Vince Jul 23 '10 at 23:59

1 Yeah, finding quantiles in general (which includes optimizing absolute values) tends to churn up linear-programming-type problems, which, while they're certainly tractable numerically, can get fiddly. They typically don't have an analytical closed-form solution, and are a bit slower and a bit more difficult to implement than least-squares-type solutions. - Rich Jul 24 '10 at 2:55

I do not agree with this. First, theoretically, the problem may be of a different nature (because of the discontinuity) but not necessarily harder (for example the median is easily shown to be arginf_m E[|Y-m|]). Second, practically, using an L1 norm (absolute value) rather than an L2 norm makes it piecewise linear and hence at least not more difficult. Quantile regression and its multiple variants is an example of that. - robin girard Jul 24 '10 at 6:01

11 Yes, but finding the actual number you want, rather than just a descriptor of it, is easier under squared error loss. Consider the 1-dimension case: you can express the minimizer of the squared error by the mean: O(n) operations and closed form. You can express the value of the absolute error minimizer by the median, but there's not a closed-form solution that tells you what the median value is; it requires a sort to find, which is something like O(n log n). Least squares solutions tend to be a simple plug-and-chug type operation, absolute value solutions usually require more work to find. - Rich Jul 24 '10 at 9:10
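The one-dimensional case discussed in these comments can be illustrated with a brute-force search: the centre that minimizes the sum of squared errors coincides with the mean, and the centre that minimizes the sum of absolute errors coincides with the median. This is a minimal sketch under stated assumptions; the data values and the grid resolution are made up for illustration.

```python
import statistics

data = [1.0, 2.0, 4.0, 7.0, 20.0]  # illustrative values only

def sum_squared(m):
    """Total squared error when m is used as the centre."""
    return sum((x - m) ** 2 for x in data)

def sum_absolute(m):
    """Total absolute error when m is used as the centre."""
    return sum(abs(x - m) for x in data)

# Brute-force search over a fine grid of candidate centres (0.00 to 25.00).
grid = [i / 100 for i in range(0, 2501)]
best_l2 = min(grid, key=sum_squared)
best_l1 = min(grid, key=sum_absolute)

print(best_l2, statistics.fmean(data))   # squared loss picks the mean
print(best_l1, statistics.median(data))  # absolute loss picks the median
```

The grid search is only there to avoid assuming the result; in practice the mean is a single pass over the data, while the median needs a sort or a selection algorithm.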

One way you can think of this is that standard deviation is similar to a "distance from the mean".

Compare this to distances in euclidean space: this gives you the true distance, where what you suggested (which, btw, is the absolute deviation) is more like a manhattan distance calculation.

answered Jul 19 '10 at 21:14
Reed Copsey

11 Nice analogy of euclidean space! - c4il Jul 19 '10 at 21:38

Yeah. Great analogy. - Daniel Rodriguez Oct 31 '11 at 4:10

Except that in one dimension the $l_1$ and $l_2$ norm are the same thing, aren't they? - naught101 Mar 29 '12 at 5:20

@naught101: It's not one dimension, but rather $n$ dimensions where $n$ is the number of samples. The standard deviation and the absolute deviation are (scaled) $l_2$ and $l_1$ distances respectively, between the two points $(x_1, x_2, \ldots, x_n)$ and $(\mu, \mu, \ldots, \mu)$ where $\mu$ is the mean. - ShreevatsaR Nov 16 '12 at 7:21

This should be modified as minimum distance from the mean. It's essentially a Pythagorean equation. - John Nov 21 '14 at 16:40

The reason that we calculate standard deviation instead of absolute error is that we are assuming error to be normally distributed. It's a part of the model.

Suppose you were measuring very small lengths with a ruler; then standard deviation is a bad metric for error because you know you will never accidentally measure a negative length. A better metric would be one to help fit a Gamma distribution to your measurements:

$\log(E(x)) - E(\log(x))$

Like the standard deviation, this is also non-negative and differentiable, but it is a better error statistic for this problem.

edited May 14 '14 at 11:47
answered Aug 10 '10 at 22:34
Neil G

1 I like your answer. The sd is not always the best statistic. - RockScience Nov 25 '10 at 3:03

1 Great counter-example as to when the standard deviation is not the best way to think of fluctuation sizes. - Hbar May 13 '14 at 2:49

Squaring the difference from the mean has a couple of reasons.
- Variance is defined as the 2nd moment of the deviation (the R.V. here is $(x - \mu)$) and thus the square, as moments are simply the expectations of higher powers of the random variable.
- Having a square as opposed to the absolute value function gives a nice continuous and differentiable function (absolute value is not differentiable at 0), which makes it the natural choice, especially in the context of estimation and regression analysis.
- The squared formulation also naturally falls out of parameters of the Normal Distribution.

answered Jul 19 '10 at 21:15
KungPaoChicken

The answer that best satisfied me is that it falls out naturally from the generalization of a sample to n-dimensional euclidean space. It's certainly debatable whether that's something that should be done, but in any case:

Assume your $n$ measurements $X_i$ are each an axis in $\mathbb{R}^n$. Then your data $x_i$ define a point $\mathbf{x}$ in that space. Now you might notice that the data are all very similar to each other, so you can represent them with a single location parameter $\mu$ that is constrained to lie on the line defined by $X_i = \mu$. Projecting your datapoint onto this line gets you $\hat\mu = \bar{x}$, and the distance from the projected point $\hat\mu\mathbf{1}$ to the actual datapoint is $\sqrt{n-1}\,\hat\sigma = \|\mathbf{x} - \hat\mu\mathbf{1}\|$.

This approach also gets you a geometric interpretation for correlation, $\hat\rho = \cos\angle(\tilde{\mathbf{x}}, \tilde{\mathbf{y}})$.

answered Nov 24 '10 at 20:49
sesqu
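The geometric picture in this answer can be checked on a small example. The sketch below assumes that the estimated standard deviation is the usual n-1 sample standard deviation and that the vectors in the correlation formula are the mean-centred data vectors; the data and the helper names (`centre`, `dot`, `norm`) are hypothetical.

```python
import math
import statistics

x = [2.0, 4.0, 4.0, 5.0, 9.0]  # illustrative data
y = [1.0, 3.0, 2.0, 6.0, 8.0]
n = len(x)

def centre(v):
    """Subtract the mean: the residual after projecting onto the line X_i = mu."""
    m = statistics.fmean(v)
    return [vi - m for vi in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

xc, yc = centre(x), centre(y)

# Euclidean distance from the data point x to its projection (mean, ..., mean).
distance = norm(xc)
scaled_sd = math.sqrt(n - 1) * statistics.stdev(x)

# Cosine of the angle between the centred vectors.
cos_angle = dot(xc, yc) / (norm(xc) * norm(yc))

# Pearson correlation from the usual covariance formula, for comparison.
pearson = (dot(xc, yc) / (n - 1)) / (statistics.stdev(x) * statistics.stdev(y))

print(distance, scaled_sd)
print(cos_angle, pearson)
```

Both pairs of printed numbers should agree up to floating-point rounding, which is all the identity claims.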

This is correct and appealing. However, in the end it appears only to rephrase the question without actually answering it: namely, why should we use the Euclidean (L2) distance? - whuber Nov 24 '10 at 21:07

That is indeed an excellent question, left unanswered. I used to feel strongly that the use of L2 is unfounded. After having studied a little statistics, I saw the analytic niceties, and since then have revised my viewpoint into "if it really matters, you're probably in deep water already, and if not, easy is nice". I don't know measure theory yet, and worry that analysis rules there too, but I've noticed some new interest in combinatorics, so perhaps new niceties have been/will be found. - sesqu Nov 24 '10 at 21:39

14 @sesqu Standard deviations did not become commonplace until Gauss in 1809 derived his eponymous deviation using squared error, rather than absolute error, as a starting point. However, what pushed them over the top (I believe) was Galton's regression theory (at which you hint) and the ability of ANOVA to decompose sums of squares, which amounts to a restatement of the Pythagorean Theorem, a relationship enjoyed only by the L2 norm. Thus the SD became a natural omnibus measure of spread advocated in Fisher's 1925 "Statistical Methods for Research Workers" and here we are, 85 years later. - whuber Nov 24 '10 at 21:56

10 (+1) Continuing in @whuber's vein, I would bet that had Student published a paper in 1908 entitled, "Probable Error of the Mean - Hey, Guys, Check Out That MAE in the Denominator!" then statistics would have an entirely different face by now. Of course, he didn't publish a paper like that, and of course he couldn't have, because the MAE doesn't boast all the nice properties that S^2 has. One of them (related to Student) is its independence of the mean (in the normal case), which of course is a restatement of orthogonality, which gets us right back to L2 and the inner product. - G. Jay Kerns Nov 25 '10 at 3:38

Yet another reason (in addition to the excellent ones above) comes from Fisher himself, who showed that the standard deviation is more "efficient" than the absolute deviation. Here, efficient has to do with how much a statistic will fluctuate in value on different samplings from a population. If your population is normally distributed, the standard deviation of various samples from that population will, on average, tend to give you values that are pretty similar to each other, whereas the absolute deviation will give you numbers that spread out a bit more. Now, obviously this is in ideal circumstances, but this reason convinced a lot of people (along with the math being cleaner), so most people worked with standard deviations.

answered Jul 27 '10 at 1:51
Eric Suh

3 Your argument depends on the data being normally distributed. If we assume the population to have a "double exponential" distribution, then the absolute deviation is more efficient (in fact it is a sufficient statistic for the scale). - probabilityislogic Jul 16 '11 at 5:08

3 Yes, as I stated, "if your population is normally distributed." - Eric Suh Sep 8 '11 at 19:49
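The efficiency claim in the answer and the double-exponential caveat in the comment can both be explored with a small simulation. This is a rough sketch rather than Fisher's derivation: the sample size, replicate count, seed, and the use of the coefficient of variation as the measure of how much each statistic fluctuates are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary seed for repeatability

def mean_abs_dev(sample):
    """Mean absolute deviation about the sample mean."""
    m = statistics.fmean(sample)
    return statistics.fmean([abs(v - m) for v in sample])

def coeff_var(values):
    """Relative fluctuation of a statistic across repeated samples."""
    return statistics.stdev(values) / statistics.fmean(values)

def spreads(draw, n=20, reps=5000):
    """Apply both statistics to the same repeated samples from `draw`."""
    sds, mads = [], []
    for _ in range(reps):
        sample = [draw() for _ in range(n)]
        sds.append(statistics.stdev(sample))
        mads.append(mean_abs_dev(sample))
    return coeff_var(sds), coeff_var(mads)

def normal():
    return random.gauss(0.0, 1.0)

def laplace():
    # The difference of two unit exponentials is double-exponential (Laplace).
    return random.expovariate(1.0) - random.expovariate(1.0)

cv_sd_normal, cv_mad_normal = spreads(normal)
cv_sd_laplace, cv_mad_laplace = spreads(laplace)

print("normal :", cv_sd_normal, cv_mad_normal)
print("laplace:", cv_sd_laplace, cv_mad_laplace)
```

Under these assumptions the standard deviation should fluctuate relatively less for normal data and the mean absolute deviation relatively less for double-exponential data, which is the pattern the answer and the comment describe.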

Just so people know, there is a Math Overflow question on the same topic.

Why is it so cool to square numbers in terms of finding the standard deviation

The take-away message is that using the square root of the variance leads to easier maths. A similar response is given by Rich and Reed above.

answered Jul 26 '10 at 22:22
Robby McKilliam

There are many reasons; probably the main one is that it works well as a parameter of the normal distribution.

edited Apr 27 '13 at 14:09
answered Jul 19 '10 at 21:11
mbq

4 I agree. Standard deviation is the right way to measure dispersion if you assume a normal distribution. And a lot of distributions and real data are approximately normal. - Łukasz Lew Jul 20 '10 at 14:40

2 I don't think you should say "natural parameter": the natural parameters of the normal distribution are mean and mean times precision. (en.wikipedia.org/wiki/Natural_parameter) - Neil G Mar 12 '12 at 7:40

@NeilG Good point; I was thinking about the "casual" meaning here. I'll think about some better word. - mbq Mar 12 '12 at 10:41

I think the contrast between using absolute deviations and squared deviations becomes clearer once you move beyond a single variable and think about linear regression. There's a nice discussion at http://en.wikipedia.org/wiki/Least_absolute_deviations, particularly the section "Contrasting Least Squares with Least Absolute Deviations", which links to some student exercises with a neat set of applets at http://www.math.wpi.edu/Course_Materials/SAS/lablets/7.3/73_choices.html.

To summarise, least absolute deviations is more robust to outliers than ordinary least squares, but it can be unstable (a small change in even a single datum can give a big change in the fitted line) and doesn't always have a unique solution: there can be a whole range of fitted lines. Also, least absolute deviations requires iterative methods, while ordinary least squares has a simple closed-form solution, though that's not such a big deal now as it was in the days of Gauss and Legendre, of course.

answered Aug 12 '10 at 12:00
onestop

The "unique solution" argument is quite weak; it really means there is more than one value well supported by the data. Additionally, penalisation of the coefficients, such as L2, will resolve the uniqueness problem, and the stability problem to a degree as well. - probabilityislogic Jul 4 '14 at 11:13

In many ways, the use of standard deviation to summarize dispersion is jumping to a conclusion. You could say that SD implicitly assumes a symmetric distribution because of its equal treatment of distance below the mean as of distance above the mean. The SD is surprisingly difficult to interpret to non-statisticians. One could argue that Gini's mean difference has broader application and is significantly more interpretable. It does not require one to declare their choice of a measure of central tendency as the use of SD does for the mean. Gini's mean difference is the average absolute difference between any two different observations. Besides being robust and easy to interpret, it happens to be 0.98 as efficient as SD if the distribution were actually Gaussian.

answered May 14 '14 at 12:55
Frank Harrell

2 Just to add to @Frank's suggestion on Gini, there's a nice paper here: projecteuclid.org/download/pdf_1/euclid.ss/1028905831 It goes over various measures of dispersion and also gives an informative historical perspective. - Thomas Speidel May 14 '14 at 17:06

1 I like these ideas too, but there's a less well known parallel definition of the variance (and thus the SD) that makes no reference to means as location parameters. The variance is half the mean square over all the pairwise differences between values, just as the Gini mean difference is based on the absolute values of all the pairwise differences. - Nick Cox Oct 21 '14 at 23:46
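The parallel between Gini's mean difference and the pairwise form of the variance can be made concrete on a small data set. This sketch assumes "all the pairwise differences" means all distinct pairs of observations and that the variance in question is the n-1 sample variance; the data values are illustrative only.

```python
import itertools
import statistics

data = [3.0, 7.0, 8.0, 12.0, 20.0]  # illustrative values only

pairs = list(itertools.combinations(data, 2))  # all distinct pairs

# Gini's mean difference: average absolute difference over distinct pairs.
gini_md = statistics.fmean([abs(a - b) for a, b in pairs])

# Half the mean squared pairwise difference: no mean is ever computed.
half_mean_sq = 0.5 * statistics.fmean([(a - b) ** 2 for a, b in pairs])

# The usual (n - 1) sample variance, for comparison.
sample_var = statistics.variance(data)

print(gini_md)
print(half_mean_sq, sample_var)
```

For this data the last two numbers are both 41.5, so the variance can be defined without naming a location parameter, exactly as the comment says.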

Because squares can allow use of many other mathematical operations or functions more easily than absolute values.

Example: squares can be integrated, differentiated, can be used in trigonometric, logarithmic and other functions, with ease.

answered Jul 27 '10 at 0:24
user369

1 I wonder if there is a self-fulfilling prophecy here. We get - probabilityislogic Mar 13 '12 at 12:04

Naturally you can describe dispersion of a distribution in any way meaningful (absolute deviation, quantiles, etc.).

One nice fact is that the variance is the second central moment, and every distribution is uniquely described by its moments if they exist. Another nice fact is that the variance is much more tractable mathematically than any comparable metric. Another fact is that the variance is one of two parameters of the normal distribution for the usual parametrization, and the normal distribution only has 2 non-zero central moments which are those two very parameters. Even for non-normal distributions it can be helpful to think in a normal framework.

As I see it, the reason the standard deviation exists as such is that in applications the square root of the variance regularly appears (such as to standardize a random variable), which necessitated a name for it.

answered Jul 27 '10 at 4:04
arik

If I recall correctly, isn't the log-normal distribution not uniquely defined by its moments? - probabilityislogic Apr 10 '14 at 13:38

Variances are additive: for independent random variables $X_1, \ldots, X_n$,

$\operatorname{var}(X_1 + \cdots + X_n) = \operatorname{var}(X_1) + \cdots + \operatorname{var}(X_n).$

Notice what this makes possible: Say I toss a fair coin 900 times. What's the probability that the number of heads I get is between 440 and 455 inclusive? Just find the expected number of heads ($450$), and the variance of the number of heads ($225 = 15^2$), then find the probability that a normal (or Gaussian) random variable with expectation $450$ and standard deviation $15$ is between $439.5$ and $455.5$. Abraham de Moivre did this with coin tosses in the 18th century, thereby first showing that the bell-shaped curve is worth something.

answered Sep 18 '12 at 1:41
Michael Hardy

Are mean absolute deviations not additive in the same way as variances? - rpierce Feb 9 '13 at 23:30

2 No, they're not. - Michael Hardy Feb 10 '13 at 18:14
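The coin-toss calculation in the answer above can be reproduced and compared against the exact binomial probability. This is a minimal sketch using only the Python standard library; the helper `normal_cdf` and the variable names are illustrative choices, not something the answer specifies.

```python
import math

n, lo, hi = 900, 440, 455

# Exact binomial probability of 440..455 heads for a fair coin.
exact = sum(math.comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

# Each toss has variance 1/4; variances of independent tosses add.
mean = n * 0.5
sd = math.sqrt(n * 0.25)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Normal approximation with the continuity correction (439.5 to 455.5).
approx = normal_cdf((hi + 0.5 - mean) / sd) - normal_cdf((lo - 0.5 - mean) / sd)

print(exact, approx)
```

Both values should come out at roughly 0.40, which shows how far the additivity of variance plus the normal curve gets you without summing binomial terms.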


It depends on what you are talking about when you say "spread of the data". To me this could mean two things:
1. The width of a sampling distribution
2. The accuracy of a given estimate

For point 1) there is no particular reason to use the standard deviation as a measure of spread, except for when you have a normal sampling distribution. The measure $E(|X - \mu|)$ is a more appropriate measure in the case of a Laplace sampling distribution. My guess is that the standard deviation gets used here because of intuition carried over from point 2). Probably also due to the success of least squares modelling in general, for which the standard deviation is the appropriate measure. Probably also because calculating $E(X^2)$ is generally easier than calculating $E(|X|)$ for most distributions.

Now, for point 2) there is a very good reason for using the variance/standard deviation as the measure of spread, in one particular, but very common case. You can see it in the Laplace approximation to a posterior. With data $D$ and prior information $I$, write the posterior for a parameter $\theta$ as:

$$p(\theta \mid DI) = \frac{\exp\left(h(\theta)\right)}{\int \exp\left(h(t)\right)\,dt}, \qquad h(\theta) \equiv \log\left[p(\theta \mid I)\,p(D \mid \theta I)\right]$$

I have used $t$ as a dummy variable to indicate that the denominator does not depend on $\theta$. If the posterior has a single well-rounded maximum (i.e. not too close to a "boundary"), we can Taylor expand the log probability about its maximum $\theta_{\max}$. If we take the first two terms of the Taylor expansion we get (using prime for differentiation):

$$h(\theta) \approx h(\theta_{\max}) + (\theta_{\max} - \theta)\,h'(\theta_{\max}) + \frac{1}{2}(\theta_{\max} - \theta)^2\,h''(\theta_{\max})$$

But we have here that because $\theta_{\max}$ is a "well-rounded" maximum, $h'(\theta_{\max}) = 0$, so we have:

$$h(\theta) \approx h(\theta_{\max}) + \frac{1}{2}(\theta_{\max} - \theta)^2\,h''(\theta_{\max})$$

If we plug in this approximation we get:

$$p(\theta \mid DI) \approx \frac{\exp\left(h(\theta_{\max}) + \frac{1}{2}(\theta_{\max} - \theta)^2\,h''(\theta_{\max})\right)}{\int \exp\left(h(\theta_{\max}) + \frac{1}{2}(\theta_{\max} - t)^2\,h''(\theta_{\max})\right)dt} = \frac{\exp\left(\frac{1}{2}(\theta_{\max} - \theta)^2\,h''(\theta_{\max})\right)}{\int \exp\left(\frac{1}{2}(\theta_{\max} - t)^2\,h''(\theta_{\max})\right)dt}$$

Which, but for notation, is a normal distribution, with mean equal to $E(\theta \mid DI) \approx \theta_{\max}$, and variance equal to

$$V(\theta \mid DI) \approx \left[-h''(\theta_{\max})\right]^{-1}$$

($-h''(\theta_{\max})$ is always positive because we have a well-rounded maximum). So this means that in "regular problems" (which is most of them), the variance is the fundamental quantity which determines the accuracy of estimates for $\theta$. So for estimates based on a large amount of data, the standard deviation makes a lot of sense theoretically: it tells you basically everything you need to know. Essentially the same argument applies (with the same conditions required) in the multi-dimensional case with $h''(\theta)_{jk} = \frac{\partial^2 h(\theta)}{\partial\theta_j\,\partial\theta_k}$ being a Hessian matrix. The diagonal entries are also essentially variances here too.

The frequentist using the method of maximum likelihood will come to essentially the same conclusion because the MLE tends to be a weighted combination of the data, and for large samples the Central Limit Theorem applies and you basically get the same result if we take $p(\theta \mid I) = 1$ but with $\theta$ and $\theta_{\max}$ interchanged:

$$p(\theta_{\max} \mid \theta) \approx N\left(\theta, \left[-h''(\theta_{\max})\right]^{-1}\right)$$

(see if you can guess which paradigm I prefer :P ). So either way, in parameter estimation the standard deviation is an important theoretical measure of spread.

edited Jul 4 '14 at 14:29
Michael Hardy

answered Jul 16 '11 at 14:37
probabilityislogic
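The Laplace approximation described in this answer can be tried on a case where the exact posterior is known. The sketch below assumes a binomial likelihood with a flat prior, so the exact posterior is a Beta distribution; the counts, the finite-difference step, and all names are hypothetical choices made for illustration.

```python
import math

n, k = 1000, 300  # hypothetical data: k successes in n Bernoulli trials

def h(theta):
    """Log of prior times likelihood, with a flat prior p(theta | I) = 1."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

theta_max = k / n  # maximiser of h, known analytically for this model

# Second derivative of h at the maximum, by central finite differences.
eps = 1e-4
h2 = (h(theta_max + eps) - 2.0 * h(theta_max) + h(theta_max - eps)) / eps ** 2

# Laplace approximation: posterior variance is about 1 / (-h''(theta_max)).
laplace_var = 1.0 / (-h2)

# The exact posterior is Beta(k + 1, n - k + 1); compare its variance.
a, b = k + 1, n - k + 1
exact_var = a * b / ((a + b) ** 2 * (a + b + 1))

print(laplace_var, exact_var)
```

With this much data the two variances differ by well under one percent, in line with the answer's point that the curvature at the maximum carries nearly all the information about accuracy in "regular" problems.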

Estimating the standard deviation of a distribution requires choosing a distance. Any of the following distances can be used:

$$d_n\left((X_i)_{i=1,\ldots,I}, \mu\right) = \left(\sum_{i=1}^{I} |X_i - \mu|^n\right)^{1/n}$$

We usually use the natural euclidean distance ($n = 2$), which is the one everybody uses in daily life. The distance that you propose is the one with $n = 1$.

Both are good candidates but they are different.

One could decide to use $n = 3$ as well.

I am not sure that you will like my answer; my point, contrary to others, is not to demonstrate that $n = 2$ is better. I think that if you want to estimate the standard deviation of a distribution, you can absolutely use a different distance.

edited Jul 31 '14 at 17:00
Michael Hardy

answered Nov 25 '10 at 3:01
RockScience

Did you mean $n = 1$ instead of the (undefined) $n = 0$? - whuber Jan 5 '11 at 3:25

Yes indeed, thx - RockScience Jan 5 '11 at 3:40
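The family of distances in this answer is easy to compute for several exponents at once. The sketch below is an illustration only: it writes the exponent as `p` (the answer calls it n) to avoid confusion with the sample size, and it additionally divides by the sample size raised to 1/p so the results are on a per-observation scale, which is an assumption not stated in the answer.

```python
def lp_deviation(values, p):
    """(sum |x_i - mean|^p)^(1/p): distance from the data to (mu, ..., mu)."""
    mu = sum(values) / len(values)
    return sum(abs(v - mu) ** p for v in values) ** (1.0 / p)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only
size = len(data)

# Dividing by size^(1/p) turns each distance into a per-observation scale:
# p = 1 gives the mean absolute deviation, p = 2 the population SD.
scaled = {p: lp_deviation(data, p) / size ** (1.0 / p) for p in (1, 2, 3)}

for p, value in scaled.items():
    print(p, value)
```

For this data the p = 1 value is 1.5 and the p = 2 value is 2.0; larger exponents give larger values because they weight the extreme observations more heavily.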

My guess is this: Most populations (distributions) tend to congregate around the mean. The farther a value is from the mean, the rarer it is. In order to adequately express how "out of line" a value is, it is necessary to take into account both its distance from the mean and its (normally speaking) rareness of occurrence. Squaring the difference from the mean does this, as compared to values which have smaller deviations. Once all the variances are averaged, then it is OK to take the square root, which returns the units to their original dimensions.

answered Sep 13 '13 at 2:24
Samuel Berry

2 This doesn't explain why you couldn't just take the absolute value of the difference. That seems conceptually simpler to most stats 101 students, & it would "take into account both its distance from the mean and its (normally speaking) rareness of occurrence". - gung Sep 13 '13 at 2:35

I think the absolute value of the difference would only express the difference from the mean and would not take into account the fact that large differences are doubly disruptive to a normal distribution. - Samuel Berry Sep 13 '13 at 2:44

1 Why is "doubly disruptive" important and not, say, "triply disruptive" or "quadruply disruptive"? It looks like this answer merely replaces the original question with an equivalent question. - whuber Sep 13 '13 at 15:19

"Why square the difference" instead of "taking absolute value"? To answer very exactly, there is literature that gives the reasons it was adopted and the case for why most of those reasons do not hold. "Can't we simply take the absolute value...?" I am aware of literature in which the answer is yes, it is being done, and doing so is argued to be advantageous.

Author Gorard states, first, that using squares was previously adopted for reasons of simplicity of calculation but that those original reasons no longer hold. Gorard states, second, that OLS was adopted because Fisher found that results in samples of analyses that used OLS had smaller deviations than those that used absolute differences (roughly stated). Thus, it would seem that OLS may have benefits in some ideal circumstances; however, Gorard proceeds to note that there is some consensus (and he claims Fisher agreed) that under real-world conditions (imperfect measurement of observations, non-uniform distributions, studies of a population without inference from a sample), using squares is worse than absolute differences.

Gorard's response to your question "Can't we simply take the absolute value of the difference instead and get the expected value (mean) of those?" is yes. Another advantage is that using differences produces measures (measures of errors and variation) that are related to the ways we experience those ideas in life. Gorard says imagine people who split the restaurant bill evenly and some might intuitively notice that that method is unfair. Nobody there will square the errors; the differences are the point.

Finally, using absolute differences, he notes, treats each observation equally, whereas by contrast squaring the differences gives observations predicted poorly greater weight than observations predicted well, which is like allowing certain observations to be included in the study multiple times. In summary, his general thrust is that there are today not many winning reasons to use squares and that by contrast using absolute differences has advantages.

References:
Gorard, S. (2005). Revisiting a 90-year-old debate: the advantages of the mean deviation, British Journal of Educational Studies, 53, 4, pp. 417-430.
Gorard, S. (2013). The possible advantages of the mean absolute deviation effect size, Social Research Update, 65:1.

edited Jul 14 '14 at 2:57
gung

answered Jul 14 '14 at 2:13
Jen

When adding independent random variables, their variances add, for all distributions. Variance (and therefore standard deviation) is a useful measure for almost all distributions, and is in no way limited to gaussian (aka "normal") distributions. That favors using it as our error measure. Lack of uniqueness is a serious problem with absolute differences, as there are often an infinite number of equal-measure "fits", and yet clearly the "one in the middle" is most realistically favored. Also, even with today's computers, computational efficiency matters. I work with large data sets, and CPU time is important. However, there is no single absolute "best" measure of residuals, as pointed out by some previous answers. Different circumstances sometimes call for different measures.

answered Oct 21 '14 at 23:27
Eric L. Michelsen

1 I remain unconvinced that variances are very useful for asymmetric distributions. - Frank Harrell Oct 22 '14 at 12:58

Squaring amplifies larger deviations.

If your sample has values that are all over the chart then to bring the 68.2% within the first standard deviation your standard deviation needs to be a little wider. If your data tended to all fall around the mean then $\sigma$ can be tighter.

Some say that it is to simplify calculations. Using the positive square root of the square would have solved that, so that argument doesn't float.

$|x| = \sqrt{x^2}$

So if algebraic simplicity was the goal then it would have looked like this:

$\sigma = \operatorname{E}\left[\sqrt{(x - \mu)^2}\right]$

which yields the same results as $\operatorname{E}\left[|x - \mu|\right]$.

Obviously squaring this also has the effect of amplifying outlying errors (doh!).

edited Jul 28 '14 at 22:46
Alexis

answered Jul 28 '14 at 20:57
Preston Thayne

Based on a flag I just processed, I suspect the downvoter did not completely understand how this answer responds to the question. I believe I see the connection (but you might nevertheless consider making some edits to help other readers appreciate your points better). Your first paragraph, though, strikes me as being somewhat of a circular argument: the 68.2% value is derived from properties of the standard deviation, so how does invoking that number help justify using the SD instead of some other $L^p$ norm of deviations from the mean as a way to quantify the spread of a distribution? - whuber Jul 28 '14 at 21:20

The first paragraph was the reason for my downvote. - Alexis Jul 28 '14 at 22:45

2 @Preston Thayne: Since the standard deviation is not the expected value of sqrt((x-mu)^2), your formula is misleading. In addition, just because squaring has the effect of amplifying larger deviations does not mean that this is the reason for preferring the variance over the MAD. If anything, that is a neutral property since oftentimes we want something more robust like the MAD. Lastly, the fact that the variance is more mathematically tractable than the MAD is a much deeper issue mathematically than you've conveyed in this post. - Steve S Jul 29 '14 at 2:18

protected by whuber Oct 22 '14 at 3:46
Thank you for your interest in this question. Because it has attracted low-quality answers, posting an answer now requires 10 reputation on this site.