
a guide to

Probability and Statistics in Microsoft Excel

Resources to support the learning of mathematics, statistics and OR in higher education. www.mathstore.ac.uk
The Statistical Education through Problem Solving (STEPS) glossary: www.stats.gla.ac/steps/glossary

Probability and Statistics in Microsoft Excel
Excel provides more than 100 functions relating to probability and statistics. It also has a facility for constructing a wide range of charts and graphs for displaying data. This leaflet provides a quick reference guide to assist you in harnessing Excel's statistical capability. Except where indicated, the features included here are available in Excel Versions 4.0 and above. Almost all the instructions here also apply to the spreadsheet facility in OpenOffice (http://openoffice.orgsuite.com/); any slight variations in commands should be obvious to the user.

Excel is not designed for statistical computing. If you require statistical analysis beyond data validation and manipulation, tabulation, presentation and calculation of summary statistics, you are advised to use a bespoke statistical package such as Minitab or SPSS.

Excel has an optional Analysis ToolPak add-in facility that includes macros for carrying out many elementary statistical analyses. The instructions for installing this add-in vary with the version of Excel; use the Help facility in Excel for further information. This add-in facility is not used in this leaflet. There are two reasons why this add-in should be used with care:

- Unlike other spreadsheet functionality, which ensures that calculations automatically update in the light of changes elsewhere in the workbook, the output from the add-in is not dynamically linked to the source data. Hence if any of the data change, the add-in must be run again to obtain updated output.
- Output from the add-in can be misleading (see http://support.microsoft.com/kb/829252 for example).

There are other commercially available add-ins that make use of Excel's familiar user interface but supplement its statistical functionality. Examples include:

Analyse-it: http://www.analyseit.com/
RExcel: http://rcom.univie.ac.at/
Unistat: http://www.unistat.com/
XLSTAT: http://www.xlstat.com/en/home/
StatTools: http://www.palisade.com/stattools/

Using this leaflet
Suppose you have a sample of three data values, 10.4, 11.2 and 16.4, that you have entered into cells A2:A4 on a worksheet. In Excel a function, e.g. SUM, can be applied to these data in one of four ways:

=SUM(10.4,11.2,16.4)
=SUM(A2,A3,A4)
=SUM(A2:A4)
=SUM(x), where x is the name attached to range A2:A4.

In this leaflet, for simplicity, we have chosen to refer to named ranges. To name a range, simply highlight the range of cells, click in the Name Box on the far left of the Formula Bar, type in the required name, e.g. x, then press Enter. In Excel 2007 names can be managed via Formulas > Name Manager.

If you prefer not to use names, then in what follows simply replace the name of the range, e.g. x, by the range address, e.g. A2:A4.

Descriptive Statistics
Assuming a sample of data in range x:

Sample total, $\sum x$: =SUM(x)
Sample size, $n$: =COUNT(x)
Sample mean, $\bar{x} = \sum x / n$: =AVERAGE(x)
Sample variance, $s^2$: =VAR(x)
Sample standard deviation, $s$: =STDEV(x)
Mean squared deviation: =VARP(x)
Root mean squared deviation: =STDEVP(x)
Corrected sum of squares, $S_{xx}$: =DEVSQ(x)
Raw sum of squares, $\sum x^2$: =SUMSQ(x)
Minimum value: =MIN(x)
Maximum value: =MAX(x)
Range: =MAX(x)-MIN(x)
Lower quartile, $Q_1$*: =QUARTILE(x,1)
Median, $Q_2$: =MEDIAN(x)
Upper quartile, $Q_3$*: =QUARTILE(x,3)
Interquartile range, IQR: =QUARTILE(x,3)-QUARTILE(x,1)
$K$th percentile: =PERCENTILE(x,K%), where $K$ is a number between 0 and 100
Mode: =MODE(x)

*Note: There are several different definitions for the upper and lower quartiles, so the values calculated by Excel may not agree with your textbook or other statistical calculation tools.

Boxplot: see http://www.coventry.ac.uk/ec/~nhunt/boxplot.htm
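As a cross-check outside Excel, the Python sketch below recomputes most of these summaries for the three-value sample used earlier (10.4, 11.2, 16.4). The helper names `excel_percentile` and `describe` are hypothetical. `excel_percentile` assumes the interpolation rule used by Excel's QUARTILE and PERCENTILE (linear interpolation at position $(n-1)p$ in the sorted data, counting from zero); as the note above warns, other tools may use a different rule. MODE is left out.

```python
import math

def excel_percentile(data, p):
    """Approximate PERCENTILE(x, p) for 0 <= p <= 1: interpolate linearly
    between sorted values at zero-based position (n - 1) * p."""
    s = sorted(data)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def describe(x):
    n = len(x)                                # COUNT(x)
    total = sum(x)                            # SUM(x)
    mean = total / n                          # AVERAGE(x)
    sxx = sum((v - mean) ** 2 for v in x)     # DEVSQ(x)
    return {
        "total": total,
        "n": n,
        "mean": mean,
        "var": sxx / (n - 1),                 # VAR(x)
        "stdev": math.sqrt(sxx / (n - 1)),    # STDEV(x)
        "varp": sxx / n,                      # VARP(x)
        "stdevp": math.sqrt(sxx / n),         # STDEVP(x)
        "devsq": sxx,                         # DEVSQ(x)
        "sumsq": sum(v * v for v in x),       # SUMSQ(x)
        "range": max(x) - min(x),             # MAX(x)-MIN(x)
        "q1": excel_percentile(x, 0.25),      # QUARTILE(x,1)
        "median": excel_percentile(x, 0.50),  # MEDIAN(x)
        "q3": excel_percentile(x, 0.75),      # QUARTILE(x,3)
    }

stats = describe([10.4, 11.2, 16.4])
print(stats["mean"], stats["q1"], stats["q3"])
```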

Grouped Frequency Data
Assuming a frequency distribution with class midpoints stored in range x and frequencies in range f:

Sample size, $n$: =SUM(f)
Sample total, $\sum fx$: =SUMPRODUCT(f,x)
Sample mean, $\sum fx / n$: =SUMPRODUCT(f,x)/SUM(f)
Corrected sum of squares, $S_{xx}$: =SUMPRODUCT(f,x,x)-SUMPRODUCT(f,x)^2/SUM(f)
Sample variance, $s^2$: =(SUMPRODUCT(f,x,x)-SUMPRODUCT(f,x)^2/SUM(f))/(SUM(f)-1)
Sample standard deviation, $s$: =SQRT(sample variance)
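A minimal Python sketch of the same SUMPRODUCT-based calculations, assuming the midpoints and frequencies are held in two equal-length lists; the function name `grouped_summary` and the example table are hypothetical.

```python
import math

def grouped_summary(x, f):
    """Summary statistics for class midpoints x with frequencies f,
    mirroring the SUM/SUMPRODUCT formulas above."""
    n = sum(f)                                           # SUM(f)
    sum_fx = sum(fi * xi for fi, xi in zip(f, x))        # SUMPRODUCT(f,x)
    sum_fxx = sum(fi * xi * xi for fi, xi in zip(f, x))  # SUMPRODUCT(f,x,x)
    sxx = sum_fxx - sum_fx ** 2 / n                      # corrected sum of squares
    var = sxx / (n - 1)                                  # sample variance
    return {"n": n, "total": sum_fx, "mean": sum_fx / n,
            "sxx": sxx, "var": var, "sd": math.sqrt(var)}

# Hypothetical table: midpoints 1, 2, 3 occurring 2, 3 and 5 times.
g = grouped_summary([1, 2, 3], [2, 3, 5])
print(g)
```

The shortcut $\sum fx^2 - (\sum fx)^2/n$ matches the worksheet formula, but it can lose precision when the midpoints are large relative to their spread; subtracting the mean before squaring avoids that.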

Graphical Representations
Excel offers a wide range of chart types for displaying data. Many of these are over-elaborate. In particular, 3-D effects can be misleading and should be avoided.

In Excel 2007, to construct a chart for your data:
1. Select the range containing your data, including any row or column labels.
2. On the main ribbon, click on the Insert tab.
3. Under the Charts group of icons, select the chart type required, then the preferred chart sub-type.
4. Under Chart Tools on the main ribbon, use the Design, Layout and Format tabs to customise the chart.

In earlier versions of Excel, select the data range and then Insert > Chart to invoke the Chart Wizard.

Permutations and Combinations
Number of different combinations of m objects selected from n objects, $^{n}C_{m}$: =COMBIN(n,m)
Number of different permutations of m objects selected from n objects, $^{n}P_{m}$: =PERMUT(n,m)
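The two counts follow directly from factorials, $^{n}C_{m} = n!/(m!(n-m)!)$ and $^{n}P_{m} = n!/(n-m)!$. The Python sketch below (hypothetical helper names `combin` and `permut`) evaluates both; Python 3.8 and later also offer `math.comb` and `math.perm` for the same counts.

```python
import math

def combin(n, m):
    """Combinations of m objects from n, like COMBIN(n,m)."""
    return math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

def permut(n, m):
    """Permutations of m objects from n, like PERMUT(n,m)."""
    return math.factorial(n) // math.factorial(n - m)

print(combin(5, 2), permut(5, 2))  # prints: 10 20
```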

Standard Probability Distributions
Assuming a random variable $X$ and constants $a$ and $b$:

Binomial, $\mathrm{Bin}(n, p)$
$P(X = a)$: =BINOMDIST(a,n,p,FALSE)
$P(X \le a)$: =BINOMDIST(a,n,p,TRUE)

Geometric, $\mathrm{Geom}(p)$
$P(X = a)$: =BINOMDIST(1,a,p,FALSE)/a
$P(X \le a)$: =1-BINOMDIST(0,a,p,FALSE)
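The geometric formulas rely on two binomial identities: BINOMDIST(1,a,p,FALSE) equals $a\,p(1-p)^{a-1}$, so dividing by $a$ leaves the geometric probability $p(1-p)^{a-1}$, and BINOMDIST(0,a,p,FALSE) equals $(1-p)^a$, the probability of no success in the first $a$ trials. The Python sketch below checks both numerically. The helpers `binomdist`, `geom_pmf` and `geom_cdf` are hypothetical, and $X$ is taken to count trials up to and including the first success, which is the convention these formulas imply.

```python
import math

def binomdist(k, n, p, cumulative):
    """Binomial probability in the style of BINOMDIST(k,n,p,cumulative)."""
    def pmf(j):
        return math.comb(n, j) * p ** j * (1 - p) ** (n - j)
    return sum(pmf(j) for j in range(k + 1)) if cumulative else pmf(k)

def geom_pmf(a, p):
    """P(X = a) where X counts trials up to and including the first success."""
    return p * (1 - p) ** (a - 1)

def geom_cdf(a, p):
    """P(X <= a): at least one success in the first a trials."""
    return 1 - (1 - p) ** a

p, a = 0.3, 4
print(binomdist(1, a, p, False) / a, geom_pmf(a, p))   # leaflet shortcut vs direct
print(1 - binomdist(0, a, p, False), geom_cdf(a, p))
```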

Poisson, $\mathrm{Po}(\lambda)$
$P(X = a)$: =POISSON(a,lambda,FALSE)
$P(X \le a)$: =POISSON(a,lambda,TRUE)

Pascal, $\mathrm{Pasc}(n, p)$
$P(X = a)$: =NEGBINOMDIST(a-n,n,p)
$P(X \le a)$: =BETADIST(p,n,a-n+1)/BETADIST(1,n,a-n+1)
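Both Pascal formulas can be checked from first principles. NEGBINOMDIST(a-n,n,p) gives $\binom{a-1}{n-1}p^n(1-p)^{a-n}$, the probability that the $n$th success occurs on trial $a$. $P(X \le a)$ equals the probability of at least $n$ successes in $a$ trials, which is the regularized incomplete beta function $I_p(n, a-n+1)$ that BETADIST(p,n,a-n+1) evaluates (BETADIST(1,n,a-n+1) in the denominator equals 1). The Python sketch below, with hypothetical helper names, compares the two routes and also computes Poisson probabilities in the style of POISSON.

```python
import math

def poisson(k, lam, cumulative):
    """Poisson probability in the style of POISSON(k,lambda,cumulative)."""
    def pmf(j):
        return math.exp(-lam) * lam ** j / math.factorial(j)
    return sum(pmf(j) for j in range(k + 1)) if cumulative else pmf(k)

def pascal_pmf(a, n, p):
    """P(X = a), X = trials needed for the nth success; same value as
    NEGBINOMDIST(a-n,n,p) = C(a-1, n-1) p^n (1-p)^(a-n)."""
    if a < n:
        return 0.0
    return math.comb(a - 1, n - 1) * p ** n * (1 - p) ** (a - n)

def pascal_cdf(a, n, p):
    """P(X <= a) = P(at least n successes in a trials), the quantity
    the leaflet obtains from BETADIST(p,n,a-n+1)."""
    return sum(math.comb(a, j) * p ** j * (1 - p) ** (a - j)
               for j in range(n, a + 1))

print(poisson(2, 1.5, False), poisson(2, 1.5, True))
print(pascal_cdf(6, 3, 0.4), sum(pascal_pmf(k, 3, 0.4) for k in range(3, 7)))
```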

Normal, $N(\mu, \sigma^2)$
$f(a)$: =NORMDIST(a,mu,sigma,FALSE)
$P(X \le a)$: =NORMDIST(a,mu,sigma,TRUE)
$P(a \le X \le b)$: =NORMDIST(b,mu,sigma,TRUE)-NORMDIST(a,mu,sigma,TRUE)
$P(X \ge b)$: =1-NORMDIST(b,mu,sigma,TRUE)

Exponential, $\mathrm{Expon}(\theta)$
$f(a)$: =EXPONDIST(a,theta,FALSE)
$P(X \le a)$: =EXPONDIST(a,theta,TRUE)
$P(a \le X \le b)$: =EXP(-a*theta)-EXP(-b*theta)
$P(X \ge b)$: =EXP(-b*theta)

Gamma, $\mathrm{Ga}(\alpha, \beta)$
$f(a)$: =GAMMADIST(a,alpha,beta,FALSE)
$P(X \le a)$: =GAMMADIST(a,alpha,beta,TRUE)
$P(a \le X \le b)$: =GAMMADIST(b,alpha,beta,TRUE)-GAMMADIST(a,alpha,beta,TRUE)
$P(X \ge b)$: =1-GAMMADIST(b,alpha,beta,TRUE)
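Python's standard library (version 3.8 and later) includes `statistics.NormalDist`, whose `pdf` and `cdf` methods play the roles of NORMDIST with FALSE and TRUE; the exponential probabilities need only `math.exp`. The parameter values below are illustrative and `expon_cdf` is a hypothetical helper. The standard library has no gamma distribution, so GAMMADIST is not reproduced here.

```python
import math
from statistics import NormalDist

X = NormalDist(mu=100.0, sigma=15.0)   # illustrative N(100, 15^2)
a, b = 85.0, 115.0

density = X.pdf(a)                 # NORMDIST(a,mu,sigma,FALSE)
below_a = X.cdf(a)                 # NORMDIST(a,mu,sigma,TRUE)
between = X.cdf(b) - X.cdf(a)      # P(a <= X <= b)
above_b = 1 - X.cdf(b)             # P(X >= b)

def expon_cdf(x, theta):
    """P(X <= x) for an exponential with rate theta, like EXPONDIST(x,theta,TRUE)."""
    return 1 - math.exp(-theta * x) if x >= 0 else 0.0

theta, lo, hi = 0.5, 1.0, 3.0
# Same quantity as =EXP(-a*theta)-EXP(-b*theta) with a = lo and b = hi
expon_between = expon_cdf(hi, theta) - expon_cdf(lo, theta)
print(between, expon_between)
```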

Test Statistics for Popular Significance Tests

One-sample test of a mean
Assuming a sample of data in range x, drawn from a population with mean $\mu$ and standard deviation $\sigma$:
$H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
Test statistic, $z$: =(AVERAGE(x)-mu0)/(sigma/SQRT(COUNT(x)))   (assuming $\sigma$ known)
Test statistic, $t$: =(AVERAGE(x)-mu0)/(STDEV(x)/SQRT(COUNT(x)))   (assuming $\sigma$ unknown)

One-sample test of a variance
Assuming a sample of data in range x, drawn from a population with mean $\mu$ and standard deviation $\sigma$:
$H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 > \sigma_0^2$
Test statistic, $\chi^2$: =DEVSQ(x)/sigma0^2

Two-sample test of difference between means
Assuming two samples of data in ranges x and y, drawn from populations with means $\mu_1$ and $\mu_2$ and equal variances:
$H_0: \mu_1 - \mu_2 = c$, $H_1: \mu_1 - \mu_2 \ne c$
Estimate the unknown common standard deviation by the pooled estimate:
$s$: =SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)-2))
Test statistic, $t$: =(AVERAGE(x)-AVERAGE(y)-c)/(s*SQRT(1/COUNT(x)+1/COUNT(y)))

Two-sample test of ratio of variances
Assuming two samples of data in ranges x and y, drawn from populations with variances $\sigma_1^2$ and $\sigma_2^2$:
$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$
Test statistic, $F$: =VAR(x)/VAR(y)

Chi-squared test of association
Assuming a two-way contingency table of observed frequencies:
$H_0$: row factor independent of column factor
$H_1$: some association between row and column factors
The suggested layout below for a 4 x 2 table can easily be modified for tables of other sizes.

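A1: =SUM(C3:D6)   (grand total)
A3: =SUM(C3:D3)   (copy down to A6; row totals)
C1: =SUM(C3:C6)   (copy across to D1; column totals)
G3: =$A3*C$1/$A$1   (copy into G3:H6; expected frequencies)
C8: =CHITEST(C3:D6,G3:H6)   (P-value)
C9: =(COUNT(A3:A6)-1)*(COUNT(C1:D1)-1)   (degrees of freedom)
C10: =CHIINV(C8,C9)   (chi-squared test statistic)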
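The same calculation can be written out directly. In the Python sketch below, expected frequencies are row total × column total / grand total (as in G3:H6), the statistic is $\sum (O-E)^2/E$ (the value C10 recovers from the P-value), and the degrees of freedom are (rows - 1)(columns - 1). The P-value, which CHITEST returns, is computed from a series for the regularized incomplete gamma function. The names `chi2_sf` and `chi2_association` and the 4 x 2 table are hypothetical, and the series is only a sketch that loses relative accuracy for very small P-values.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-squared probability 1 - P(df/2, x/2), using the series
    P(s, z) = z^s e^(-z) / Gamma(s) * sum_k z^k / (s (s+1) ... (s+k))."""
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = total = 1 / s
    k = 0
    while term > total * 1e-16 and k < 10_000:
        k += 1
        term *= z / (s + k)
        total += term
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)

def chi2_association(observed):
    """Expected counts, statistic, df and P-value for a two-way table."""
    rows = [sum(r) for r in observed]                         # A3:A6
    cols = [sum(c) for c in zip(*observed)]                   # C1:D1
    grand = sum(rows)                                         # A1
    expected = [[r * c / grand for c in cols] for r in rows]  # G3:H6
    stat = sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))                   # value in C10
    df = (len(rows) - 1) * (len(cols) - 1)                    # C9
    return expected, stat, df, chi2_sf(stat, df)              # last item ~ C8

observed = [[10, 20], [15, 15], [20, 10], [5, 25]]   # hypothetical 4 x 2 counts
expected, stat, df, p = chi2_association(observed)
print(stat, df, p)
```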

Critical Values and P-values for Statistical Tests
There are two approaches to conducting significance tests. Some analysts like to compare the test statistic with the critical value for a given significance level; others prefer to calculate the P-value corresponding to the test statistic. Excel can be used for either method.

Assuming significance level $\alpha$ (typically $\alpha$ = 5% or 0.05):

Two-tailed z-test
Upper-tail critical value: =NORMSINV(1-alpha/2)
P-value for given z: =2*(1-NORMSDIST(ABS(z)))

Two-tailed t-test with v degrees of freedom
Upper-tail critical value: =TINV(alpha,v)
P-value for given t: =TDIST(ABS(t),v,2)

One-tailed $\chi^2$-test with v degrees of freedom
Upper-tail critical value: =CHIINV(alpha,v)
P-value for given chi-squared: =CHIDIST(chisquared,v)

One-tailed F-test with v1 degrees of freedom in the numerator and v2 in the denominator
Upper-tail critical value: =FINV(alpha,v1,v2)
P-value for given F: =FDIST(F,v1,v2)
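To illustrate that both approaches lead to the same decision, the Python sketch below runs a one-sample z-test end to end with `statistics.NormalDist`: the statistic from the one-sample test of a mean, the critical value NORMSINV(1-alpha/2) and the P-value 2*(1-NORMSDIST(ABS(z))). The values of mu0 and sigma and the helper names are hypothetical. Quantiles of the t, chi-squared and F distributions (TINV, CHIINV, FINV) are not in the Python standard library; the third-party SciPy package provides them in `scipy.stats`.

```python
import math
from statistics import NormalDist, mean

Z = NormalDist()   # standard normal distribution

def z_critical(alpha):
    """Upper-tail critical value for a two-tailed z-test: NORMSINV(1-alpha/2)."""
    return Z.inv_cdf(1 - alpha / 2)

def z_p_value(z):
    """Two-tailed P-value: 2*(1-NORMSDIST(ABS(z)))."""
    return 2 * (1 - Z.cdf(abs(z)))

def z_statistic(x, mu0, sigma):
    """One-sample z: (AVERAGE(x)-mu0)/(sigma/SQRT(COUNT(x))), sigma known."""
    return (mean(x) - mu0) / (sigma / math.sqrt(len(x)))

x = [10.4, 11.2, 16.4]                     # sample from "Using this leaflet"
z = z_statistic(x, mu0=10.0, sigma=2.0)    # mu0 and sigma are hypothetical
alpha = 0.05
# Rejecting when |z| exceeds the critical value is equivalent to P-value < alpha.
print(z, z_critical(alpha), z_p_value(z), abs(z) > z_critical(alpha))
```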

Confidence Limits
Assuming degree of confidence $100(1-\alpha)\%$ (e.g. for 95% confidence $\alpha = 0.05$):

One-sample statistics, with data in range x

For $\mu$ ($\sigma$ known)
Lower limit: =AVERAGE(x)-NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x)) or =AVERAGE(x)-CONFIDENCE(alpha,sigma,COUNT(x))
Upper limit: =AVERAGE(x)+NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x)) or =AVERAGE(x)+CONFIDENCE(alpha,sigma,COUNT(x))

For $\mu$ ($\sigma$ unknown)
Lower limit: =AVERAGE(x)-TINV(alpha,COUNT(x)-1)*STDEV(x)/SQRT(COUNT(x))
Upper limit: =AVERAGE(x)+TINV(alpha,COUNT(x)-1)*STDEV(x)/SQRT(COUNT(x))

For $\sigma^2$
Lower limit: =DEVSQ(x)/CHIINV(alpha/2,COUNT(x)-1)
Upper limit: =DEVSQ(x)/CHIINV(1-alpha/2,COUNT(x)-1)
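A Python sketch of the one-sample limits follows. The normal quantile comes from `statistics.NormalDist`; because the standard library has no t or chi-squared quantile functions, those critical values are passed in as arguments (for instance, the values TINV and CHIINV return in the worksheet). All function names are hypothetical.

```python
import math
import statistics as st

def mean_ci_known_sigma(x, sigma, alpha=0.05):
    """AVERAGE(x) -/+ NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x))."""
    half = st.NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(len(x))
    m = st.mean(x)
    return m - half, m + half

def mean_ci_unknown_sigma(x, t_crit):
    """AVERAGE(x) -/+ t_crit*STDEV(x)/SQRT(COUNT(x)); t_crit is supplied by
    the caller, e.g. the value of TINV(alpha, COUNT(x)-1)."""
    half = t_crit * st.stdev(x) / math.sqrt(len(x))
    m = st.mean(x)
    return m - half, m + half

def variance_ci(x, chi2_upper, chi2_lower):
    """DEVSQ(x)/chi2_upper to DEVSQ(x)/chi2_lower, where the arguments stand
    for CHIINV(alpha/2, n-1) and CHIINV(1-alpha/2, n-1)."""
    m = st.mean(x)
    sxx = sum((v - m) ** 2 for v in x)
    return sxx / chi2_upper, sxx / chi2_lower

x = [10.4, 11.2, 16.4]
print(mean_ci_known_sigma(x, sigma=2.0))
```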

Two-sample statistics, with data for the first sample in range x, and the second sample in range y

For $\mu_x - \mu_y$ ($\sigma_x$ known, $\sigma_y$ known)
Lower limit: =AVERAGE(x)-AVERAGE(y)-NORMSINV(1-alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))
Upper limit: =AVERAGE(x)-AVERAGE(y)+NORMSINV(1-alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))

For $\mu_x - \mu_y$ ($\sigma_x$ and $\sigma_y$ unknown but assumed equal)
Estimate the unknown common standard deviation by the pooled estimate:
$s$: =SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)-2))
Lower limit: =AVERAGE(x)-AVERAGE(y)-TINV(alpha,COUNT(x)+COUNT(y)-2)*s*SQRT(1/COUNT(x)+1/COUNT(y))
Upper limit: =AVERAGE(x)-AVERAGE(y)+TINV(alpha,COUNT(x)+COUNT(y)-2)*s*SQRT(1/COUNT(x)+1/COUNT(y))

For $\sigma_x^2 / \sigma_y^2$
Lower limit: =VAR(x)/VAR(y)/FINV(alpha/2,COUNT(x)-1,COUNT(y)-1)
Upper limit: =VAR(x)/VAR(y)/FINV(1-alpha/2,COUNT(x)-1,COUNT(y)-1)
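The pooled-variance and variance-ratio limits can be sketched the same way in Python, again passing the t and F critical values in as arguments rather than computing them (function names hypothetical). The variance-ratio interval uses the ratio of sample variances, VAR(x)/VAR(y), because $(s_x^2/\sigma_x^2)/(s_y^2/\sigma_y^2)$ follows an F distribution with COUNT(x)-1 and COUNT(y)-1 degrees of freedom.

```python
import math
import statistics as st

def devsq(v):
    """Corrected sum of squares, like DEVSQ."""
    m = st.mean(v)
    return sum((t - m) ** 2 for t in v)

def diff_ci_pooled(x, y, t_crit):
    """Difference of means with equal variances assumed; t_crit stands for
    TINV(alpha, COUNT(x)+COUNT(y)-2)."""
    nx, ny = len(x), len(y)
    s = math.sqrt((devsq(x) + devsq(y)) / (nx + ny - 2))   # pooled estimate
    d = st.mean(x) - st.mean(y)
    half = t_crit * s * math.sqrt(1 / nx + 1 / ny)
    return d - half, d + half

def var_ratio_ci(x, y, f_upper, f_lower):
    """Interval for sigma_x^2/sigma_y^2; f_upper and f_lower stand for
    FINV(alpha/2, nx-1, ny-1) and FINV(1-alpha/2, nx-1, ny-1)."""
    ratio = st.variance(x) / st.variance(y)                # VAR(x)/VAR(y)
    return ratio / f_upper, ratio / f_lower
```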

Simple Linear Regression
In Excel Versions 5 and above, a regression line (or trendline) can be added to a scatter plot by right-clicking on one of the plotted points and selecting Add Trendline from the shortcut menu. Both linear and a variety of non-linear models may be fitted to the data. The equation of the fitted model may be displayed, together with the value of the coefficient of determination, $R^2$. There are also options to extrapolate the trendline in either direction, or to force the trendline to have a specific intercept.

The trendline approach is purely graphical. To calculate predictions, regression functions must be used.

Assuming a sample of values of the independent variable in range x, and corresponding values of the dependent variable in range y:

Least squares estimate of intercept, $a$: =INTERCEPT(y,x)
Least squares estimate of slope, $b$: =SLOPE(y,x)
$S_{xy}$: =SUMPRODUCT(x,y)-COUNT(x)*AVERAGE(x)*AVERAGE(y)
$S_{xx}$: =DEVSQ(x)
$S_{yy}$: =DEVSQ(y)
Sample covariance, $\mathrm{Cov}(x,y)$: =COVAR(x,y)*COUNT(x)/(COUNT(x)-1)
Estimate of $\sigma$, $s$: =STEYX(y,x)
Prediction of y at $x = x_0$, $\hat{y} = a + b x_0$: =FORECAST(x0,y,x)
Estimated standard error of individual predicted y at $x = x_0$: =STEYX(y,x)*SQRT(1+1/COUNT(x)+(x0-AVERAGE(x))^2/DEVSQ(x))
Estimated standard error of mean predicted y at $x = x_0$: =STEYX(y,x)*SQRT(1/COUNT(x)+(x0-AVERAGE(x))^2/DEVSQ(x))
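The Python sketch below reproduces these quantities from their definitions for a small hypothetical data set: the least squares slope $b = S_{xy}/S_{xx}$, the intercept $a = \bar{y} - b\bar{x}$, and $s = \sqrt{\sum (y - a - bx)^2/(n-2)}$, which is what STEYX returns. The helper name `simple_regression` is hypothetical.

```python
import math

def simple_regression(x, y):
    """Least squares fit of y = a + b*x, returning the quantities listed above."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                        # DEVSQ(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar   # Sxy
    b = sxy / sxx                                                  # SLOPE(y,x)
    a = ybar - b * xbar                                            # INTERCEPT(y,x)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                                   # STEYX(y,x)
    return a, b, s, n, xbar, sxx, sxy

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
a, b, s, n, xbar, sxx, sxy = simple_regression(x, y)

x0 = 6
y_hat = a + b * x0                                          # FORECAST(x0,y,x)
se_individual = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
cov = sxy / (n - 1)          # same as COVAR(x,y)*COUNT(x)/(COUNT(x)-1)
print(a, b, y_hat)
```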

Correlation
Assuming two samples of paired data in ranges x and y:
Pearson product-moment correlation coefficient, $r$: =CORREL(x,y)
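CORREL computes $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$. A minimal Python version (hypothetical name `correl`), applied to hypothetical paired data:

```python
import math

def correl(x, y):
    """Pearson r = Sxy / sqrt(Sxx * Syy), the value CORREL(x,y) returns."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = correl([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r)
```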

Rank Correlation
Assuming two samples of paired data in ranges x and y with no ties:
Rank of $i$th value in range x: =RANK(INDEX(x,i),x,1)

Assuming two samples of paired data in ranges x and y with some tied values:
Rank of $i$th value in range x: =(RANK(INDEX(x,i),x,1)-RANK(INDEX(x,i),x,0)+COUNT(x)+1)/2

Assuming that the ranges rx and ry contain the ranks of the data in x and y respectively:
Spearman rank correlation coefficient, $r_S$: =CORREL(rx,ry)

In the example above:
D2: =RANK(B2,$B$2:$B$7,1)   (copy down to D7)
E2: =RANK(B2,$B$2:$B$7,0)   (copy down to E7)
F2: =(D2-E2+COUNT($B$2:$B$7)+1)/2   (copy down to F7)
F9: =CORREL(C2:C7,F2:F7)   (adjusted for ties)
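The tie adjustment works because, for a value shared by $t$ observations whose smallest ascending rank is $k$, the ascending rank is $k$ and the descending rank is $n - k - t + 2$, so $(k - (n-k-t+2) + n + 1)/2 = k + (t-1)/2$, the average of the tied ranks. The Python sketch below (all names hypothetical) mimics RANK in both orders, applies the adjustment and computes Spearman's coefficient as the Pearson correlation of the ranks.

```python
import math

def rank(v, data, ascending):
    """Like RANK(v, data, order): 1 + number of values strictly below v
    (ascending, order=1) or strictly above v (descending, order=0)."""
    if ascending:
        return 1 + sum(d < v for d in data)
    return 1 + sum(d > v for d in data)

def tie_adjusted_rank(v, data):
    """(RANK(v,x,1) - RANK(v,x,0) + COUNT(x) + 1)/2, the average rank of ties."""
    return (rank(v, data, True) - rank(v, data, False) + len(data) + 1) / 2

def correl(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    rx = [tie_adjusted_rank(v, x) for v in x]
    ry = [tie_adjusted_rank(v, y) for v in y]
    return correl(rx, ry)                      # CORREL(rx,ry)

print([tie_adjusted_rank(v, [3, 1, 4, 1, 5]) for v in [3, 1, 4, 1, 5]])
```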

Time Series
The examples below refer to three years of observed quarterly data. Forecasts are made for a further four quarters (one extra year).

Level only

Simple moving average, period 5
C4: =AVERAGE(B2:B6)   (copy down to C11)
C14: =C$11   (copy down to C17)

Centred moving average, period 4
D4: =(AVERAGE(B2:B5)+AVERAGE(B3:B6))/2   (copy down to D11)
D14: =D$11   (copy down to D17)
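A Python sketch of the two moving averages above, assuming the 12 quarterly observations in B2:B13 are held in a list `y`, so worksheet row r corresponds to list index r - 2. The function names and data are hypothetical; each function returns a dictionary keyed by list index, and the level-only forecast for the next four quarters repeats the last average, as in C14:C17 and D14:D17.

```python
def simple_ma(y, period=5):
    """Odd-period moving average centred on each index (C4 = AVERAGE(B2:B6))."""
    k = period // 2
    return {i: sum(y[i - k:i + k + 1]) / period for i in range(k, len(y) - k)}

def centred_ma4(y):
    """Centred 4-point average: mean of two overlapping 4-point averages
    (D4 = (AVERAGE(B2:B5)+AVERAGE(B3:B6))/2)."""
    return {i: (sum(y[i - 2:i + 2]) / 4 + sum(y[i - 1:i + 3]) / 4) / 2
            for i in range(2, len(y) - 2)}

y = list(range(1, 13))               # hypothetical, steadily rising data for B2:B13
sma = simple_ma(y)
cma = centred_ma4(y)
sma_forecast = [sma[max(sma)]] * 4   # C14:C17 = C$11
cma_forecast = [cma[max(cma)]] * 4   # D14:D17 = D$11
print(sma_forecast, cma_forecast)
```

With rising data like this, a level-only forecast lags behind the series, which is the situation the trend models below address.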

Exponentially weighted moving average
E2: =B2   (initial level estimate)
E3: =$G$2*B3+(1-$G$2)*E2   (copy down to E13)
E14: =E$13   (copy down to E17)
The chart was drawn by highlighting B1:B17 and E1:E17, then using Insert > Charts > Line > 2-D Line.

Level and constant trend

C2: =FORECAST(A2,$B$2:$B$13,$A$2:$A$13)   (copy down to C17)
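FORECAST fits a least squares line of the observations on the time index and evaluates it at each requested time, so copying C2 down gives fitted values for the observed quarters and forecasts for the extra four. A Python equivalent, assuming column A holds the time index 1, 2, ..., 16 (an assumption, since the layout is not shown here), with a hypothetical function name and data:

```python
def linear_trend_forecast(t, y, t_new):
    """Least squares line y = a + b*t through (t, y), evaluated at each t_new,
    mirroring =FORECAST(A2,$B$2:$B$13,$A$2:$A$13) copied down."""
    n = len(t)
    tbar, ybar = sum(t) / n, sum(y) / n
    stt = sum((ti - tbar) ** 2 for ti in t)
    sty = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    b = sty / stt
    a = ybar - b * tbar
    return [a + b * tn for tn in t_new]

t = list(range(1, 13))                     # assumed time index in A2:A13
y = [3 + 2 * ti for ti in t]               # hypothetical, exactly linear data
fitted = linear_trend_forecast(t, y, list(range(1, 17)))   # C2:C17
print(fitted[-4:])
```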

Level and changing trend

C2: =B2   (initial level estimate)
C3: =$F$2*B3+(1-$F$2)*(C2+D2)   (copy down to C13)
D2: =B3-B2   (initial trend estimate)
D3: =$G$2*(C3-C2)+(1-$G$2)*D2   (copy down to D13)
E3: =C2+D2   (copy down to E13)
E14: =C$13+(A14-A$13)*D$13   (copy down to E17)
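These formulas implement Holt's linear exponential smoothing. A Python sketch of the same recurrences follows, assuming (as the formulas suggest, though the layout is not shown) that F2 and G2 hold the level and trend smoothing constants and that the time index in column A rises by 1 per quarter; the function name and data are hypothetical.

```python
def holt(y, alpha, beta, horizon=4):
    """Holt's linear exponential smoothing, following the worksheet:
    C2 = B2 and D2 = B3-B2 start the level and trend; each later row updates
    level = alpha*obs + (1-alpha)*(previous level + previous trend)
    trend = beta*(level change) + (1-beta)*previous trend."""
    level, trend = [y[0]], [y[1] - y[0]]
    one_step = [None]                      # E2 is empty; E3 = C2+D2, and so on
    for obs in y[1:]:
        prev = level[-1] + trend[-1]
        one_step.append(prev)
        new_level = alpha * obs + (1 - alpha) * prev                            # C3 down
        trend.append(beta * (new_level - level[-1]) + (1 - beta) * trend[-1])  # D3 down
        level.append(new_level)
    # E14:E17: last level plus h times last trend, h = 1, 2, ...
    future = [level[-1] + h * trend[-1] for h in range(1, horizon + 1)]
    return level, trend, one_step, future

y = [3 + 2 * t for t in range(12)]         # hypothetical data with a constant trend
level, trend, one_step, future = holt(y, alpha=0.5, beta=0.5)
print(future)
```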

Level, changing trend and seasonality

C5: =AVERAGE(B2:B5)   (initial level estimate)
C6: =G$2*B6/E2+(1-G$2)*(C5+D5)   (copy down to C13)
D5: =(AVERAGE(B6:B9)-C5)/4   (initial trend estimate)
D6: =H$2*(C6-C5)+(1-H$2)*D5   (copy down to D13)
E2: =B2/C$5   (copy down to E5; initial seasonal estimates)
E6: =I$2*B6/C6+(1-I$2)*E2   (copy down to E13)
F6: =(C5+D5)*E2   (copy down to F13)
F14: =(C$13+(A14-A$13)*D$13)*E10   (copy down to F17)

Version 9, 9 June 2009
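For reference, the seasonal worksheet above corresponds to multiplicative Holt-Winters smoothing. The Python sketch below follows the same initialisation and update formulas, assuming G2, H2 and I2 hold the level, trend and seasonal smoothing constants and that there are 12 quarterly observations; the function name and data are hypothetical, and the forecast horizon is limited to one season so that only estimated seasonal indices are needed.

```python
def holt_winters(y, alpha, beta, gamma, m=4, horizon=4):
    """Multiplicative Holt-Winters smoothing laid out as in the worksheet.
    y[0], y[1], ... correspond to B2, B3, ...; m is the season length.
    horizon must not exceed m, so each forecast uses an estimated seasonal index."""
    n = len(y)
    level = {m - 1: sum(y[:m]) / m}                                # C5
    trend = {m - 1: (sum(y[m:2 * m]) / m - level[m - 1]) / m}        # D5
    season = {i: y[i] / level[m - 1] for i in range(m)}             # E2:E5
    one_step = {}
    for t in range(m, n):
        prev = level[t - 1] + trend[t - 1]
        one_step[t] = prev * season[t - m]                                       # F6 down
        level[t] = alpha * y[t] / season[t - m] + (1 - alpha) * prev             # C6 down
        trend[t] = beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]  # D6 down
        season[t] = gamma * y[t] / level[t] + (1 - gamma) * season[t - m]        # E6 down
    last = n - 1
    future = [(level[last] + h * trend[last]) * season[last + h - m]
              for h in range(1, horizon + 1)]                                    # F14:F17
    return level, trend, season, one_step, future

y = [10, 20, 30, 40] * 3        # hypothetical, purely seasonal quarterly data
level, trend, season, one_step, future = holt_winters(y, 0.3, 0.2, 0.1)
print(future)
```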
