
ECO 72 INTRODUCTION TO ECONOMIC STATISTICS

Topic 3: Measures of Dispersion

Dispersion

Measures of central tendency look at measuring a typical observation.
This section measures how dispersed or spread out the data are.
Helps us answer the question: How typical is typical?
For example, if most observations are close to the mean, then the mean is typical; otherwise not.

Knowing the mean is not enough

Sample of five people's wages in South Dakota: $5, $7, $11, $12, and $15.

Sample of five people's wages in North Dakota: $9, $9.50, $10, $9.50, and $12.

Both samples have a mean wage of $10.
But the wages in South Dakota are much more dispersed.
How do we measure this?

Measures of Dispersion Available

Range
Mean Absolute Deviation
Variance / Standard Deviation
Interquartile Range
Each of these measures, like the measures of central tendency, has strengths and weaknesses.

Range: definition and advantage

The range is the highest value in the sample minus the lowest value.

Example above: North Dakota, $12 - $9 = $3; South Dakota, $15 - $5 = $10.

Advantage: Easy to calculate, intuitive.

Range (cont'd)

Problem: Huge data sets include outliers (observations that are unusual and extreme in value).

Top income earner in a data set may be Bill Gates.
Bottom income earner may be a street vendor making $1 per hour.
The range will be the difference between their incomes. Does this tell us much?
Moreover, outliers are often created by errors.

Range Usage

One use of range is quality control, where absolute minima may need to be set for safety standards.

Example: Engineers test five airbags to find out how long they take to inflate. Results are 0.7, 0.8, 0.85, 0.95, and 0.8 seconds.
Range is 0.95 - 0.7 = 0.25 seconds.
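
The range calculations on these slides can be reproduced with a minimal Python sketch. The function name `value_range` and the variable names are illustrative choices, not part of the course material.

```python
def value_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)

# Data taken from the slides
south_dakota = [5, 7, 11, 12, 15]
north_dakota = [9, 9.50, 10, 9.50, 12]
airbag_seconds = [0.7, 0.8, 0.85, 0.95, 0.8]

print(value_range(south_dakota))               # 10
print(value_range(north_dakota))               # 3
# Rounded to two decimals to hide floating-point noise
print(round(value_range(airbag_seconds), 2))   # 0.25
```

Only two observations (the maximum and the minimum) enter the calculation, which is why a single outlier can change the range so much.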

Mean Absolute Deviation

Is calculated as the average distance from the mean.
Given a sample of size N with mean X̄,

\[ \mathrm{MAD} = \frac{\sum_{i=1}^{N} |X_i - \bar{X}|}{N} \]

Mean Absolute Deviation Example

Wages in South Dakota (Mean $10)

Wage    |X_i - X̄|
5       |5 - 10| = 5
7       |7 - 10| = 3
11      |11 - 10| = 1
12      |12 - 10| = 2
15      |15 - 10| = 5

Avg: 10   Avg: 3.2 (MAD)

Mean Absolute Deviation Example

Wages in North Dakota (Mean $10)

Wage    |X_i - X̄|
9       |9 - 10| = 1.0
9.50    |9.5 - 10| = 0.5
10      |10 - 10| = 0.0
9.50    |9.5 - 10| = 0.5
12      |12 - 10| = 2.0

Avg: 10   Avg: 0.8 (MAD)

So the North Dakota sample has a smaller MAD, as we might hope.
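
The two worked MAD examples can be checked with a short Python sketch. The function name `mean_absolute_deviation` is an illustrative choice; it divides by the number of observations, as in the tables above.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

# Wage samples from the slides (both have mean $10)
south_dakota = [5, 7, 11, 12, 15]
north_dakota = [9, 9.50, 10, 9.50, 12]

print(mean_absolute_deviation(south_dakota))  # 3.2
print(mean_absolute_deviation(north_dakota))  # 0.8
```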

Mean Absolute Deviation Advantages

Uses information from all observations.
Not as affected by outliers as the range.

Every observation affects the MAD equally.

Relatively intuitive (compared to what's coming...).

Mean Absolute Deviation Disadvantage

Absolute value turns out to have a difficult property:

[Figure: graph of |X|]

It's not smooth at zero. Since zero in this case is the sample mean, the MAD can move around strangely if our estimated mean changes.

Variance

Is calculated as the average squared distance from the mean.
Given a population of size N with mean X̄,

\[ \mathrm{MAD} = \frac{\sum_{i=1}^{N} |X_i - \bar{X}|}{N} \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N} \]

Sample vs. Population Variance

Notice that sample and population variance are calculated differently!

Population Variance:
\[ \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N} \]

Sample Variance:
\[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

In a sample, we don't know the mean exactly, so we use up one degree of freedom calculating it. This is like losing an observation in our sample.
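
To make the difference between the two divisors concrete, here is a minimal Python sketch. The helper `variance` and its `sample` flag are illustrative names; the last two lines cross-check against Python's standard `statistics` module, whose `variance` divides by n - 1 and whose `pvariance` divides by N.

```python
import statistics

def variance(values, sample=True):
    """Average squared distance from the mean.

    sample=True divides by n - 1 (sample variance);
    sample=False divides by N (population variance).
    """
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1) if sample else sum_of_squares / n

# Wage samples from the slides
south_dakota = [5, 7, 11, 12, 15]
north_dakota = [9, 9.50, 10, 9.50, 12]

print(variance(south_dakota))                # 16.0  (64 / 4)
print(variance(south_dakota, sample=False))  # 12.8  (64 / 5)
print(variance(north_dakota))                # 1.375 (5.5 / 4)

# Cross-check with the standard library
print(statistics.variance(south_dakota) == variance(south_dakota))
print(statistics.pvariance(south_dakota) == variance(south_dakota, sample=False))
```

With only five observations the choice of divisor matters a lot (16 versus 12.8); in large samples the two versions are nearly identical.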

Variance Example

Wages in South Dakota (Mean $10)

Wage    (X_i - X̄)²
5       (5 - 10)² = 25
7       (7 - 10)² = 9
11      (11 - 10)² = 1
12      (12 - 10)² = 4
15      (15 - 10)² = 25

Avg: 10   Sum: 64

Sample variance: 64/(5 - 1) = 16

Variance Example

Wages in North Dakota (Mean $10)

Wage    (X_i - X̄)²
9       (9 - 10)² = 1
9.50    (9.5 - 10)² = 0.25
10      (10 - 10)² = 0
9.50    (9.5 - 10)² = 0.25
12      (12 - 10)² = 4

Avg: 10   Sum: 5.5

Sample variance: 5.5/(5 - 1) = 1.375

So the North Dakota sample has a smaller variance, too.

Variance Advantage

Variance has an advantage over MAD:

[Figure: graphs of X² and |X| for X from -2 to 2]

The function X² is smooth at zero. This means that a slightly misestimated mean will not have serious consequences. (This is a difficult point to explain fully.)
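
One way to see the smoothness point, offered as a sketch rather than a full explanation: suppose the mean is misestimated by a small amount ε, and write d_i = X_i - X̄ for the deviations from the true sample mean. Both ε and d_i are symbols introduced here for illustration.

```latex
% Squared deviations: the first-order term cancels because \sum_i d_i = 0
\sum_{i=1}^{N} (d_i - \varepsilon)^2
  = \sum_{i=1}^{N} d_i^2 - 2\varepsilon \sum_{i=1}^{N} d_i + N\varepsilon^2
  = \sum_{i=1}^{N} d_i^2 + N\varepsilon^2

% Slopes of the two functions at zero
\frac{d}{dX} X^2 = 2X \quad (= 0 \text{ at } X = 0), \qquad
\frac{d}{dX} |X| =
  \begin{cases} -1 & X < 0 \\ +1 & X > 0 \end{cases}
  \quad (\text{undefined at } X = 0)
```

The sum of squares changes only by Nε², which is tiny when ε is small. An absolute deviation close to zero, by contrast, changes by an amount proportional to |ε|, and its slope flips sign as the deviation crosses zero.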

Disadvantages of Variance

It's a bit harder to calculate and not as intuitive as MAD, let alone Range.
It's slightly more affected by outliers than MAD.

Standard Deviation

The variance is on the scale of the variable squared, not on the scale of the variable itself.
To get a statistic that is on the scale of the original variable, we take the (positive) square root of the variance. This is called the standard deviation.
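
As a quick check of the square-root step, the sketch below uses `statistics.stdev` from Python's standard library, which returns the square root of the sample variance (the n - 1 version). The variable names are illustrative.

```python
import math
import statistics

# Wage samples from the slides
south_dakota = [5, 7, 11, 12, 15]
north_dakota = [9, 9.50, 10, 9.50, 12]

# Standard deviation = positive square root of the sample variance
print(math.sqrt(statistics.variance(south_dakota)))  # 4.0
print(statistics.stdev(south_dakota))                # 4.0
print(round(statistics.stdev(north_dakota), 3))      # 1.173
```

The results are in dollars, like the wages themselves, while the variances (16 and 1.375) are in dollars squared.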

MAD, Variance, and Standard Deviation Compared

                Range   MAD   Variance   Std. Dev.
SOUTH DAKOTA    10      3.2   16         4
NORTH DAKOTA    3       0.8   1.375      1.173

The standard deviation and the MAD are usually of the same order of magnitude.

The Rule of Thumb

We've seen that a larger standard deviation means more dispersed data.
But what does a standard deviation mean?

The standard deviation is 1.173 (North Dakota).
The standard deviation is 4 (South Dakota).
Do the actual numbers matter?

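The Rule of Thumb

The typical case:

About 68% of observations lie between μ - σ and μ + σ.
About 95% of observations lie between μ - 2σ and μ + 2σ.

Example: North Dakota. μ = 10, σ = 1.17

About 68% of the sample lies between 10 - 1.17 and 10 + 1.17.
About 95% of the sample lies between 10 - 2.34 and 10 + 2.34.

[Figure: bell curve with 68% of observations between μ - σ = 8.83 and μ + σ = 11.17, centered on μ = 10]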
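
The rule of thumb can be compared with what actually happens in the North Dakota sample. This is a minimal sketch: the function name `share_within` is illustrative, and it uses the sample mean and sample standard deviation (n - 1 divisor) in place of μ and σ.

```python
import statistics

def share_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    low, high = mean - k * sd, mean + k * sd
    return sum(low <= x <= high for x in values) / len(values)

north_dakota = [9, 9.50, 10, 9.50, 12]

print(share_within(north_dakota, 1))  # 0.8 (only the $12 wage falls outside)
print(share_within(north_dakota, 2))  # 1.0
```

With five observations the share can only move in steps of 20%, so 80% and 100% are about as close to 68% and 95% as this sample allows. The rule is a rough guide, not an exact statement about any particular sample.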

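Chebyshev's Theorem

A use of the standard deviation: it says that at least a fraction 1 - (1/k)² of the data lies within k standard deviations of the mean (for k > 1).

[Figure: distribution with at least fraction 1 - 1/k² of the data between μ - kσ and μ + kσ, and at most fraction 1/k² in the two tails combined]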

Chebyshev's Theorem (cont'd)

Chebyshev's theorem is usually considered conservative: typically a lot more than fraction 1 - 1/k² of the data lies in this range.

Chebyshev's Theorem

Example: South Dakota wages. μ = 10, σ = 4.
Consider the case of k = 2 standard deviations. This would be the range 10 - 2·4 to 10 + 2·4, i.e., 2 to 18.
Chebyshev guarantees that at least fraction 1 - 1/k² = 1 - 1/2² = 3/4 of the data lies in this range. In fact all of it does.

[Figure: at least fraction 1 - 1/k² = 3/4 of the data between μ - kσ = 2 and μ + kσ = 18, with μ = 10 and kσ = 2·4 = 8]
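
The South Dakota example can be reproduced with a short Python sketch. The function names `chebyshev_lower_bound` and `share_in_range` are illustrative, and μ = 10 and σ = 4 are taken directly from the slide.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def share_in_range(values, low, high):
    """Fraction of observations between low and high, inclusive."""
    return sum(low <= x <= high for x in values) / len(values)

south_dakota = [5, 7, 11, 12, 15]
mu, sigma, k = 10, 4, 2           # values from the slide
low, high = mu - k * sigma, mu + k * sigma

print(low, high)                                # 2 18
print(chebyshev_lower_bound(k))                 # 0.75
print(share_in_range(south_dakota, low, high))  # 1.0
```

The guaranteed share (0.75) is well below the actual share (1.0), which illustrates why the theorem is described as conservative.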

Interquartile Range

1. Write the data out in order
2. Break it into four parts, each with an equal number of observations
3. Pick the top number in the first part, and the top number in the third part
4. Subtract the former from the latter

Interquartile Range Example

Ages of 16 students:

23 21 23 19 23 20 17 21 24 37 21 20 19 22 18 24

In order, this is:

17 18 19 19 20 20 21 21 21 22 23 23 23 24 24 37

Broken into four parts of 4 observations each:

First (bottom) Quartile:  17 18 19 19
Second Quartile:          20 20 21 21
Third Quartile:           21 22 23 23
Fourth (top) Quartile:    23 24 24 37

INTERQUARTILE RANGE (top of the third quartile minus top of the first quartile): 23 - 19 = 4
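
The four-step procedure from these slides can be sketched in Python as follows. The function name `slide_iqr` is illustrative, and the sketch assumes the number of observations is divisible by four, as in the 16-student example.

```python
def slide_iqr(values):
    """Interquartile range using the four-step method from the slides."""
    ordered = sorted(values)                   # 1. write the data out in order
    n = len(ordered)
    if n % 4 != 0:
        raise ValueError("this sketch assumes n is divisible by 4")
    part = n // 4                              # 2. four parts of equal size
    top_of_first = ordered[part - 1]           # 3. top number in the first part...
    top_of_third = ordered[3 * part - 1]       #    ...and in the third part
    return top_of_third - top_of_first         # 4. subtract the former from the latter

ages = [23, 21, 23, 19, 23, 20, 17, 21, 24, 37, 21, 20, 19, 22, 18, 24]
print(slide_iqr(ages))  # 4
```

Statistical software often defines quartiles with interpolation between neighboring observations, so a library routine may report a slightly different value than this hand method.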

IQR: Advantages and Disadvantages

The interquartile range is the least affected by outliers of all of the measures above.
It uses less data than the variance or the MAD and may therefore not reflect changes in the distribution in the bottom or top quartile.

Skewness

The skewness looks at the cube of the difference from the mean:

\[ \text{Skewness} = \frac{N}{(N-1)(N-2)} \sum_{i=1}^{N} \left( \frac{X_i - \bar{X}}{s} \right)^3, \qquad s = \text{standard deviation} \]

It is used to measure how symmetric the data are around the mean. Zero skewness means the data are symmetric; skewness can be positive or negative.
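
As an illustration, the sketch below applies the idea to the 16 student ages from the interquartile range example. The function name `adjusted_skewness` is illustrative, and it assumes one common version of the formula: N/((N-1)(N-2)) times the sum of cubed standardized deviations, with the sample standard deviation (n - 1 divisor) used for standardizing. That choice of divisor is an assumption, not something confirmed by the slides.

```python
import statistics

def adjusted_skewness(values):
    """Skewness as N/((N-1)(N-2)) times the sum of cubed standardized deviations.

    Assumption: deviations are standardized by the sample standard deviation.
    """
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

ages = [23, 21, 23, 19, 23, 20, 17, 21, 24, 37, 21, 20, 19, 22, 18, 24]

mean_age = statistics.mean(ages)      # 22
median_age = statistics.median(ages)  # 21.0
cubed_sum = sum((x - mean_age) ** 3 for x in ages)

print(mean_age, median_age)         # the mean is above the median
print(cubed_sum)                    # 3132, dominated by the 37-year-old
print(adjusted_skewness(ages) > 0)  # True: positive skew (right tail)
```

Cubing keeps the sign of each deviation, so the single large positive deviation (37 - 22 = 15, cubed to 3375) outweighs all the negative ones combined.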

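Positive vs. Negative Skewness

Negative Skew: Left Tail, Median > Mean

Positive Skew: Right Tail, Median < Mean

[Figure: a negatively skewed distribution with the mean to the left of the median, and a positively skewed distribution with the mean to the right of the median]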
