
a guide to

Probability and Statistics in
Microsoft Excel

Resources to support the learning of mathematics,
statistics and OR in higher education.

www.mathstore.ac.uk
The Statistical Education through Problem Solving (STEPS)
glossary
www.stats.gla.ac/steps/glossary
Probability and Statistics in Microsoft Excel
Excel provides more than 100 functions relating to probability and statistics. It also has a facility for
constructing a wide range of charts and graphs for displaying data. This leaflet provides a quick reference
guide to assist you in harnessing Excel's statistical capability. Except where indicated, the features included
here are available in Excel Versions 4.0 and above. Almost all the instructions here also apply to the
spreadsheet facility in OpenOffice (http://openoffice.orgsuite.com/); any slight variations in commands
should be obvious to the user.

Excel is not designed for statistical computing. If you require statistical analysis beyond data validation and
manipulation, tabulation, presentation and calculation of summary statistics, you are advised to use a
dedicated statistical package such as Minitab or SPSS.
Excel has an optional Analysis ToolPak add-in that includes macros for carrying out many
elementary statistical analyses. The instructions for installing this add-in vary with the version of Excel;
use the Help facility in Excel for further information. This add-in is not used in this leaflet.

There are two reasons why this add-in should be used with care:
- Unlike other spreadsheet functionality, which ensures that calculations update automatically in the
  light of changes elsewhere in the workbook, the output from the add-in is not dynamically linked to the
  source data. Hence if any of the data change, the add-in must be run again to obtain updated output.
- Output from the add-in can be misleading (see http://support.microsoft.com/kb/829252 for example).

There are other commercially available add-ins that make use of Excel's familiar user interface but
supplement its statistical functionality. Examples include:
Analyse-it   http://www.analyseit.com/
RExcel       http://rcom.univie.ac.at/
Unistat      http://www.unistat.com/
XLSTAT       http://www.xlstat.com/en/home/
StatTools    http://www.palisade.com/stattools/

Using this leaflet
Suppose you have a sample of three data values, 10.4, 11.2 and 16.4, that you have entered into cells A2:A4 on a
worksheet. In Excel a function, e.g. SUM, can be applied to these data in one of four ways:
=SUM(10.4,11.2,16.4)
=SUM(A2,A3,A4)
=SUM(A2:A4)
=SUM(x)    where x is the name attached to range A2:A4.
In this leaflet, for simplicity, we have chosen to refer to named
ranges. To name a range, simply highlight the range of cells, click in
the Name Box on the far left of the Formula Bar, type in the required
name, e.g. x, then press Enter. In Excel 2007 names can be managed
via Formulas > Name Manager.

If you prefer not to use names then in what follows simply replace the
name of the range, e.g. x, by the range address, e.g. A2:A4.

Descriptive Statistics
Assuming a sample of data in range x:
Sample total, Σx                  =SUM(x)
Sample size, n                    =COUNT(x)
Sample mean, Σx/n                 =AVERAGE(x)
Sample variance, s²               =VAR(x)
Sample standard deviation, s      =STDEV(x)
Mean squared deviation            =VARP(x)
Root mean squared deviation       =STDEVP(x)
Corrected sum of squares, Sxx     =DEVSQ(x)
Raw sum of squares, Σx²           =SUMSQ(x)
Minimum value                     =MIN(x)
Maximum value                     =MAX(x)
Range                             =MAX(x)-MIN(x)
Lower Quartile, Q1*               =QUARTILE(x,1)
Median, Q2                        =MEDIAN(x)
Upper Quartile, Q3*               =QUARTILE(x,3)
Interquartile range, IQR          =QUARTILE(x,3)-QUARTILE(x,1)
Kth Percentile                    =PERCENTILE(x,K%)    where K is a number between 0 and 100
Mode                              =MODE(x)

*Note: There are several different definitions for the upper and lower quartiles, so the values calculated by
Excel may not agree with your textbook or other statistical calculation tools.
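
As a cross-check outside Excel, the following minimal Python sketch (Python is not part of this leaflet; it is used here only for illustration) recomputes a few of the quantities above for the three-value sample from "Using this leaflet". It also shows why quartiles can disagree between tools: the two interpolation conventions offered by Python's statistics module give different answers for the same data. The "inclusive" convention is assumed here to correspond to the interpolation used by QUARTILE.

```python
import statistics

x = [10.4, 11.2, 16.4]  # the sample used in "Using this leaflet"

n = len(x)                                 # COUNT(x)
total = sum(x)                             # SUM(x)
mean = statistics.mean(x)                  # AVERAGE(x)
devsq = sum((v - mean) ** 2 for v in x)    # DEVSQ(x)
var_sample = statistics.variance(x)        # VAR(x): divides by n - 1
var_pop = statistics.pvariance(x)          # VARP(x): divides by n

# Two common quartile conventions; each returns [Q1, Q2, Q3].
q_inclusive = statistics.quantiles(x, n=4, method="inclusive")
q_exclusive = statistics.quantiles(x, n=4, method="exclusive")

print(total, mean, var_sample, var_pop)
print(q_inclusive, q_exclusive)
```

With so few data values the two conventions differ noticeably in Q1 and Q3, while the median is the same under both.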

Boxplot    See http://www.coventry.ac.uk/ec/~nhunt/boxplot.htm

Grouped Frequency Data
Assuming a frequency distribution with class midpoints stored in range x and frequencies in range f:
Sample size, n                    =SUM(f)
Sample total, Σfx                 =SUMPRODUCT(f,x)
Sample mean, Σfx/n                =SUMPRODUCT(f,x)/SUM(f)
Corrected sum of squares, Sxx     =SUMPRODUCT(f,x,x)-SUMPRODUCT(f,x)^2/SUM(f)
Sample variance, s²               =(SUMPRODUCT(f,x,x)-SUMPRODUCT(f,x)^2/SUM(f))/(SUM(f)-1)
Sample standard deviation, s      =SQRT(Sample variance)
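
The grouped-data formulas can be checked with a short Python sketch. The midpoints and frequencies below are hypothetical values chosen for illustration; the comments show which worksheet expression each line mirrors.

```python
import math

# Hypothetical grouped data: class midpoints and their frequencies.
x = [5.0, 15.0, 25.0, 35.0]
f = [2, 5, 8, 5]

n = sum(f)                                            # SUM(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, x))         # SUMPRODUCT(f,x)
sum_fxx = sum(fi * xi * xi for fi, xi in zip(f, x))   # SUMPRODUCT(f,x,x)

mean = sum_fx / n                    # sample mean
sxx = sum_fxx - sum_fx ** 2 / n      # corrected sum of squares
variance = sxx / (n - 1)             # sample variance
std_dev = math.sqrt(variance)        # sample standard deviation

print(n, mean, sxx, variance, std_dev)
```

The corrected sum of squares obtained this way equals the one you would get by writing out each midpoint as many times as its frequency and applying DEVSQ.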

Graphical Representations
Excel offers a wide range of chart types for displaying data. Many of these are over-elaborate. In
particular, 3-D effects can be misleading and should be avoided.
In Excel 2007 to construct a chart for your data:
1. Select the range containing your data, including any row or column labels.
2. On the main ribbon, click on the Insert tab.
3. Under the Charts group of icons, select the chart type required, then the preferred chart subtype.
4. Under Chart Tools on the main ribbon, use the Design, Layout and Format tabs to customise the chart.
In earlier versions of Excel, select the data range and then Insert > Chart to invoke the Chart Wizard.

Permutations and Combinations
Number of different combinations of m objects selected from n objects:
nCm    =COMBIN(n,m)

Number of different permutations of m objects selected from n objects:
nPm    =PERMUT(n,m)
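
For comparison, Python's math module (version 3.8 or later) offers the same two counts. The values of n and m below are arbitrary illustrative choices.

```python
import math

n, m = 10, 3  # illustrative values

combinations = math.comb(n, m)   # counterpart of COMBIN(n,m): n! / (m! (n-m)!)
permutations = math.perm(n, m)   # counterpart of PERMUT(n,m): n! / (n-m)!

# Each combination of m objects can be arranged in m! different orders.
print(combinations, permutations, combinations * math.factorial(m))
```

The last printed value equals the number of permutations, which shows how the two counts are related.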

Standard Probability Distributions
Assuming a random variable X and constants a and b:

Binomial    Bin(n,p)

P(X=a)      =BINOMDIST(a,n,p,FALSE)

P(X≤a)      =BINOMDIST(a,n,p,TRUE)

Geometric   Geom(p)    where X is the number of trials up to and including the first success

P(X=a)      =BINOMDIST(1,a,p,FALSE)/a

P(X≤a)      =1-BINOMDIST(0,a,p,FALSE)
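
The geometric formulas rely on binomial identities that are easy to verify numerically. The sketch below defines its own helper functions, binom_pmf and geom_pmf (names introduced here for illustration, not Excel functions), and assumes X counts trials up to and including the first success.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p); mirrors BINOMDIST(k,n,p,FALSE)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def geom_pmf(a, p):
    """P(first success occurs on trial a), for a = 1, 2, ..."""
    return p * (1 - p) ** (a - 1)

p, a = 0.3, 4  # illustrative values

pmf_via_binomial = binom_pmf(1, a, p) / a       # BINOMDIST(1,a,p,FALSE)/a
cdf_via_binomial = 1 - binom_pmf(0, a, p)       # 1-BINOMDIST(0,a,p,FALSE)
cdf_direct = sum(geom_pmf(k, p) for k in range(1, a + 1))

print(pmf_via_binomial, geom_pmf(a, p))
print(cdf_via_binomial, cdf_direct)
```

Each printed pair should agree up to floating-point rounding.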



Poisson     Po(λ)

P(X=a)      =POISSON(a,lambda,FALSE)

P(X≤a)      =POISSON(a,lambda,TRUE)

Pascal      Pasc(n,p)    where X is the number of trials needed to obtain n successes

P(X=a)      =NEGBINOMDIST(a-n,n,p)

P(X≤a)      =BETADIST(p,n,a-n+1)/BETADIST(1,n,a-n+1)



Normal      N(μ, σ²)

f(a)        =NORMDIST(a,mu,sigma,FALSE)

P(X≤a)      =NORMDIST(a,mu,sigma,TRUE)

P(a≤X≤b)    =NORMDIST(b,mu,sigma,TRUE)-NORMDIST(a,mu,sigma,TRUE)

P(X≥b)      =1-NORMDIST(b,mu,sigma,TRUE)
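
The same four normal-distribution quantities can be obtained in Python with statistics.NormalDist. The parameter values are hypothetical; the interval a to b was chosen as one standard deviation either side of the mean so that the result is a familiar figure.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical mean and standard deviation
a, b = 85.0, 115.0        # one standard deviation either side of the mean
dist = NormalDist(mu, sigma)

density_at_a = dist.pdf(a)               # NORMDIST(a,mu,sigma,FALSE)
p_below_a = dist.cdf(a)                  # NORMDIST(a,mu,sigma,TRUE)
p_between = dist.cdf(b) - dist.cdf(a)    # P(a <= X <= b)
p_above_b = 1 - dist.cdf(b)              # P(X >= b)

print(density_at_a, p_below_a, p_between, p_above_b)
```

The three probabilities partition the real line, so they sum to 1, and p_between is roughly 0.68.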



Exponential Expon(θ)

f(a)        =EXPONDIST(a,theta,FALSE)

P(X≤a)      =EXPONDIST(a,theta,TRUE)

P(a≤X≤b)    =EXP(-a*theta)-EXP(-b*theta)

P(X≥b)      =EXP(-b*theta)



Gamma       Ga(α,β)

f(a)        =GAMMADIST(a,alpha,beta,FALSE)

P(X≤a)      =GAMMADIST(a,alpha,beta,TRUE)

P(a≤X≤b)    =GAMMADIST(b,alpha,beta,TRUE)-GAMMADIST(a,alpha,beta,TRUE)

P(X≥b)      =1-GAMMADIST(b,alpha,beta,TRUE)

Test Statistics for Popular Significance Tests

One-sample test of a mean
Assuming a sample of data in range x, drawn from a population with mean μ and standard deviation σ:
H0: μ = μ0    H1: μ ≠ μ0
Test statistic, z     =(AVERAGE(x)-mu0)/(sigma/SQRT(COUNT(x)))       assuming σ known
Test statistic, t     =(AVERAGE(x)-mu0)/(STDEV(x)/SQRT(COUNT(x)))    assuming σ unknown

One-sample test of a variance
Assuming a sample of data in range x, drawn from a population with mean μ and standard deviation σ:
H0: σ² = σ0²    H1: σ² > σ0²
Test statistic, χ²    =DEVSQ(x)/sigma0^2

Two-sample test of difference between means
Assuming two samples of data in ranges x and y, drawn from populations with means μ1 and μ2 and equal
variances:
H0: μ1-μ2 = c    H1: μ1-μ2 ≠ c
Estimate the unknown common standard deviation by the pooled estimate:
s                     =SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)-2))
Test statistic, t     =(AVERAGE(x)-AVERAGE(y)-c)/(s*SQRT(1/COUNT(x)+1/COUNT(y)))
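
The pooled two-sample calculation is sketched below in Python as a cross-check. The function name pooled_t and the second sample y are hypothetical, introduced only for illustration; x is the sample from "Using this leaflet".

```python
import math
import statistics

def pooled_t(x, y, c=0.0):
    """Return (pooled standard deviation, t statistic) for H0: mu1 - mu2 = c."""
    nx, ny = len(x), len(y)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    devsq_x = sum((v - mean_x) ** 2 for v in x)   # DEVSQ(x)
    devsq_y = sum((v - mean_y) ** 2 for v in y)   # DEVSQ(y)
    s = math.sqrt((devsq_x + devsq_y) / (nx + ny - 2))
    t = (mean_x - mean_y - c) / (s * math.sqrt(1 / nx + 1 / ny))
    return s, t

x = [10.4, 11.2, 16.4]        # sample from "Using this leaflet"
y = [9.1, 12.3, 10.8, 11.5]   # hypothetical second sample

s, t = pooled_t(x, y)
print(s, t)
```

The statistic t would then be compared with a t distribution on COUNT(x)+COUNT(y)-2 degrees of freedom.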

Two-sample test of ratio of variances
Assuming two samples of data in ranges x and y, drawn from populations with variances σ1² and σ2²:
H0: σ1² = σ2²    H1: σ1² > σ2²
Test statistic, F     =VAR(x)/VAR(y)

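Chi-squared test of association
Assuming a two-way contingency table of observed frequencies.
H0: row factor independent of column factor
H1: some association between row and column factors
The suggested layout below for a 4x2 table, with observed frequencies in C3:D6 and expected frequencies in
G3:H6, can easily be modified for tables of other sizes.

A1:  =SUM(C3:D6)                            grand total
A3:  =SUM(C3:D3)                            row totals, copy down to A6
C1:  =SUM(C3:C6)                            column totals, copy across to D1
G3:  =$A3*C$1/$A$1                          expected frequencies, copy into G3:H6
C8:  =CHITEST(C3:D6,G3:H6)                  P-value
C9:  =(COUNT(A3:A6)-1)*(COUNT(C1:D1)-1)     degrees of freedom
C10: =CHIINV(C8,C9)                         chi-squared test statistic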
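
The worksheet layout computes expected frequencies as row total times column total divided by grand total. A minimal Python sketch of the same arithmetic follows, using a hypothetical 4x2 table of observed counts. It computes the chi-squared statistic directly; the P-value is left out because the Python standard library has no chi-squared distribution function.

```python
# Hypothetical 4x2 table of observed frequencies (rows x columns).
observed = [
    [20, 30],
    [25, 25],
    [30, 20],
    [35, 15],
]

row_totals = [sum(row) for row in observed]          # like A3:A6
col_totals = [sum(col) for col in zip(*observed)]    # like C1:D1
grand_total = sum(row_totals)                        # like A1

# Expected frequency = row total * column total / grand total (like G3:H6).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(chi_squared, degrees_of_freedom)
```

The expected frequencies always reproduce the observed row and column totals, which is a useful check on the worksheet formulas.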

Critical Values and P-values for Statistical Tests
There are two approaches to conducting significance tests. Some analysts like to compare the test statistic
with the critical value for a given significance level; others prefer to calculate the P-value corresponding to
the test statistic. Excel can be used for either method.
Assuming significance level α (typically α = 5% or 0.05):

Two-tailed z test
Upper-tail critical value         =NORMSINV(1-alpha/2)
P-value for given z               =2*(1-NORMSDIST(ABS(z)))

Two-tailed t test with v degrees of freedom
Upper-tail critical value         =TINV(alpha,v)
P-value for given t               =TDIST(ABS(t),v,2)

One-tailed χ² test with v degrees of freedom
Upper-tail critical value         =CHIINV(alpha,v)
P-value for given chisquared      =CHIDIST(chisquared,v)

One-tailed F test with v1 degrees of freedom in
the numerator and v2 in the denominator
Upper-tail critical value         =FINV(alpha,v1,v2)
P-value for given F               =FDIST(F,v1,v2)
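
The two approaches always lead to the same decision. The sketch below illustrates this for the two-tailed z test using Python's statistics.NormalDist; the observed value of z is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1
alpha = 0.05
z = 2.1                     # hypothetical observed test statistic

critical = std_normal.inv_cdf(1 - alpha / 2)    # NORMSINV(1-alpha/2)
p_value = 2 * (1 - std_normal.cdf(abs(z)))      # 2*(1-NORMSDIST(ABS(z)))

reject_by_critical_value = abs(z) > critical
reject_by_p_value = p_value < alpha

print(critical, p_value, reject_by_critical_value, reject_by_p_value)
```

Here the critical value is about 1.96 and the P-value about 0.036, so both approaches reject H0 at the 5% level.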

Confidence Limits
Assuming degree of confidence 100(1-α)% (e.g. for 95% confidence α = 0.05):

One-sample statistics, with data in range x
For μ (σ known)      Lower limit =AVERAGE(x)-NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x))
                     or          =AVERAGE(x)-CONFIDENCE(alpha,sigma,COUNT(x))
                     Upper limit =AVERAGE(x)+NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x))
                     or          =AVERAGE(x)+CONFIDENCE(alpha,sigma,COUNT(x))

For μ (σ unknown)    Lower limit =AVERAGE(x)-TINV(alpha,COUNT(x)-1)*STDEV(x)/SQRT(COUNT(x))

                     Upper limit =AVERAGE(x)+TINV(alpha,COUNT(x)-1)*STDEV(x)/SQRT(COUNT(x))

For σ²               Lower limit =DEVSQ(x)/CHIINV(alpha/2,COUNT(x)-1)

                     Upper limit =DEVSQ(x)/CHIINV(1-alpha/2,COUNT(x)-1)
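
As an illustration of the known-σ case, the following Python sketch computes the half-width that CONFIDENCE(alpha,sigma,COUNT(x)) represents and the resulting limits. The value of sigma is hypothetical; x is the sample from "Using this leaflet".

```python
import math
from statistics import NormalDist, mean

x = [10.4, 11.2, 16.4]   # sample from "Using this leaflet"
sigma = 3.0              # hypothetical known population standard deviation
alpha = 0.05             # 95% confidence

# Half-width of the interval, the quantity CONFIDENCE(alpha,sigma,COUNT(x)) gives.
half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(len(x))

lower_limit = mean(x) - half_width
upper_limit = mean(x) + half_width

print(lower_limit, upper_limit)
```

The interval is symmetric about the sample mean, and its width shrinks in proportion to the square root of the sample size.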

Two-sample statistics, with data for the first sample in range x, and the second sample in range y
For μx-μy (σx known, σy known)
Lower limit
=AVERAGE(x)-AVERAGE(y)-NORMSINV(1-alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))
Upper limit
=AVERAGE(x)-AVERAGE(y)+NORMSINV(1-alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))

For μx-μy (σx and σy unknown but assumed equal)

Estimate the unknown common standard deviation by the pooled estimate:
s =SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)-2))
Lower limit
=AVERAGE(x)-AVERAGE(y)-TINV(alpha,COUNT(x)+COUNT(y)-2)*s*SQRT(1/COUNT(x)+1/COUNT(y))
Upper limit
=AVERAGE(x)-AVERAGE(y)+TINV(alpha,COUNT(x)+COUNT(y)-2)*s*SQRT(1/COUNT(x)+1/COUNT(y))

For σx²/σy²

Lower limit =VAR(x)/VAR(y)/FINV(alpha/2,COUNT(x)-1,COUNT(y)-1)
Upper limit =VAR(x)/VAR(y)/FINV(1-alpha/2,COUNT(x)-1,COUNT(y)-1)

Simple Linear Regression
In Excel Versions 5 and above, a regression line (or trendline) can be added to a scatterplot by right-clicking
on one of the plotted points and selecting Add Trendline from the shortcut menu. Both linear and a variety
of non-linear models may be fitted to the data. The equation of the fitted model may be displayed,
together with the value of the coefficient of determination, R². There are also options to extrapolate the
trendline in either direction, or to force the trendline to have a specific intercept.

The trendline approach is purely graphical. To calculate predictions, regression functions must be used.

Assuming a sample of values of the independent variable in range x, and corresponding values of the
dependent variable in range y:
Least squares estimate of intercept, a     =INTERCEPT(y,x)
Least squares estimate of slope, b         =SLOPE(y,x)
Sxy                                        =SUMPRODUCT(x,y)-COUNT(x)*AVERAGE(x)*AVERAGE(y)
Sxx                                        =DEVSQ(x)
Syy                                        =DEVSQ(y)
Sample covariance, Cov(x,y)                =COVAR(x,y)*COUNT(x)/(COUNT(x)-1)
Estimate of σ, s                           =STEYX(y,x)
Prediction of y at x=x0, ŷ = a+bx0         =FORECAST(x0,y,x)

Estimated standard error of individual predicted y at x=x0
=STEYX(y,x)*SQRT(1+1/COUNT(x)+(x0-AVERAGE(x))^2/DEVSQ(x))
Estimated standard error of mean predicted y at x=x0
=STEYX(y,x)*SQRT(1/COUNT(x)+(x0-AVERAGE(x))^2/DEVSQ(x))
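
The regression quantities above can all be built from Sxx, Syy and Sxy. The Python sketch below does this from first principles on hypothetical paired data; the helper names predict, se_individual and se_mean are introduced here for illustration only.

```python
import math

# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)                           # DEVSQ(x)
syy = sum((yi - mean_y) ** 2 for yi in y)                           # DEVSQ(y)
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y    # Sxy

b = sxy / sxx                              # slope, as SLOPE(y,x)
a = mean_y - b * mean_x                    # intercept, as INTERCEPT(y,x)
s = math.sqrt((syy - b * sxy) / (n - 2))   # residual std. error, as STEYX(y,x)

def predict(x0):
    """Fitted value at x0, as FORECAST(x0,y,x)."""
    return a + b * x0

def se_individual(x0):
    """Standard error of an individual predicted y at x0."""
    return s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)

def se_mean(x0):
    """Standard error of the mean predicted y at x0."""
    return s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)

print(a, b, s, predict(6.0), se_individual(6.0), se_mean(6.0))
```

The individual standard error always exceeds the mean standard error, because it includes the scatter of a single observation about the line.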

Correlation
Assuming two samples of paired data in ranges x and y:
Pearson product-moment
correlation coefficient, r             =CORREL(x,y)

Rank Correlation
Assuming two samples of paired data in ranges x and y with no ties:
Rank of ith value in range x           =RANK(INDEX(x,i),x,1)

Assuming two samples of paired data in ranges x and y with some tied values:
Rank of ith value in range x           =(RANK(INDEX(x,i),x,1)-RANK(INDEX(x,i),x,0)+COUNT(x)+1)/2

Assuming that the ranges rx and ry contain the ranks of the data in x and y respectively:
Spearman rank correlation coefficient, rS   =CORREL(rx,ry)
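
The tie-adjusted rank formula combines an ascending and a descending rank to give each tied value the average of the ranks it occupies. The Python sketch below reproduces that arithmetic; the function names average_ranks and pearson, and the data, are hypothetical and used only for illustration.

```python
import math

def average_ranks(values):
    """Average ranks using (ascending rank - descending rank + n + 1) / 2."""
    n = len(values)
    ranks = []
    for v in values:
        ascending = 1 + sum(1 for w in values if w < v)    # like RANK(v,values,1)
        descending = 1 + sum(1 for w in values if w > v)   # like RANK(v,values,0)
        ranks.append((ascending - descending + n + 1) / 2)
    return ranks

def pearson(u, v):
    """Pearson product-moment correlation, as CORREL(u,v)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

x = [3, 1, 4, 1, 5]              # hypothetical data with one tie
y = [2.0, 0.5, 3.1, 0.7, 4.4]    # hypothetical paired values

rx, ry = average_ranks(x), average_ranks(y)
spearman = pearson(rx, ry)
print(rx, ry, spearman)
```

The two tied values in x each receive rank 1.5, the average of ranks 1 and 2.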


In the example above:
D2: =RANK(B2,$B$2:$B$7,1)               copy down to D7
E2: =RANK(B2,$B$2:$B$7,0)               copy down to E7
F2: =(D2-E2+COUNT($B$2:$B$7)+1)/2       copy down to F7
F9: =CORREL(C2:C7,F2:F7)                adjusted for ties

Time Series
The examples below refer to three years of observed quarterly data.
Forecasts are made for a further four quarters (one extra year).
Level only


Simple moving average, period 5
C4:  =AVERAGE(B2:B6)                          copy down to C11
C14: =C$11                                    copy down to C17
Centred moving average, period 4
D4:  =(AVERAGE(B2:B5)+AVERAGE(B3:B6))/2       copy down to D11
D14: =D$11                                    copy down to D17
Exponentially weighted moving average
E2:  =B2                                      initial level estimate
E3:  =$G$2*B3+(1-$G$2)*E2                     copy down to E13
E14: =E$13                                    copy down to E17

The chart was drawn by highlighting B1:B17 and E1:E17 then using Insert > Charts > Line > 2-D Line.
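
The exponentially weighted moving average recursion in column E can be written as a short Python function. This is a sketch under stated assumptions: the function name ewma, the smoothing constant 0.3 and the twelve quarterly observations are all hypothetical.

```python
def ewma(observations, alpha):
    """Smoothed levels: the first level is the first observation,
    then each level = alpha * observation + (1 - alpha) * previous level."""
    levels = [observations[0]]
    for obs in observations[1:]:
        levels.append(alpha * obs + (1 - alpha) * levels[-1])
    return levels

# Hypothetical three years of quarterly data.
data = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, 17.0, 13.0, 16.0]

levels = ewma(data, alpha=0.3)
forecasts = [levels[-1]] * 4   # level-only model: a flat forecast for four quarters

print(levels[-1], forecasts)
```

A smoothing constant near 1 makes the level follow the data closely, while a value near 0 gives heavy smoothing.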

Level and constant trend


C2: =FORECAST(A2,$B$2:$B$13,$A$2:$A$13)       copy down to C17

Level and changing trend


C2:  =B2                                      initial level estimate
C3:  =$F$2*B3+(1-$F$2)*(C2+D2)                copy down to C13
D2:  =B3-B2                                   initial trend estimate
D3:  =$G$2*(C3-C2)+(1-$G$2)*D2                copy down to D13
E3:  =C2+D2                                   copy down to E13
E14: =C$13+(A14-A$13)*D$13                    copy down to E17
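
The level and trend recursions in columns C and D can be sketched in Python as follows. The function name holt_forecast and its parameter names are hypothetical, and the sketch assumes that column A holds consecutive period numbers, so that each forecast step adds one multiple of the final trend to the final level.

```python
def holt_forecast(observations, alpha, gamma, horizon):
    """Forecasts from a level plus changing-trend smoothing model.

    alpha smooths the level and gamma smooths the trend; both are
    assumed to lie between 0 and 1.
    """
    level = observations[0]                      # initial level estimate
    trend = observations[1] - observations[0]    # initial trend estimate
    for obs in observations[1:]:
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    # h steps ahead: final level plus h times final trend.
    return [level + h * trend for h in range(1, horizon + 1)]

# Perfectly linear data should be extrapolated along the same line.
print(holt_forecast([2.0, 4.0, 6.0, 8.0], alpha=0.4, gamma=0.3, horizon=4))
```

For the linear series shown, the forecasts continue the sequence as approximately 10, 12, 14 and 16.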

Level, changing trend and seasonality


C5:  =AVERAGE(B2:B5)                          initial level estimate
C6:  =G$2*B6/E2+(1-G$2)*(C5+D5)               copy down to C13
D5:  =(AVERAGE(B6:B9)-C5)/4                   initial trend estimate
D6:  =H$2*(C6-C5)+(1-H$2)*D5                  copy down to D13
E2:  =B2/C$5                                  copy down to E5, initial seasonal estimates
E6:  =I$2*B6/C6+(1-I$2)*E2                    copy down to E13
F6:  =(C5+D5)*E2                              copy down to F13
F14: =(C$13+(A14-A$13)*D$13)*E10              copy down to F17

Version 9, 9 June 2009
