Math 1040
Instructor Ping Yu
Skittles Term Project
For this project I was able to use many of the concepts I learned this semester. The
project includes organizing and analyzing data and drawing conclusions using confidence
intervals and hypothesis tests. To gather the data, each student bought one 2.17-ounce bag
of Skittles, separated the candies by color, and recorded the counts.
Data Collection
I purchased one 2.17-ounce bag of Original Skittles and recorded the following data:
Number of red candies
Number of orange candies
Number of yellow candies
Number of green candies
Number of purple candies
13
19
10
12
Class Data
red
orange
yellow
green
purple
12
15
13
12
14
17
14
10
12
17
10
18
13
10
13
13
15
11
11
17
14
13
15
10
13
10
14
10
19
11
16
11
12
10
11
17
12
11
11
12
10
10
12
10
11
13
13
12
15
13
14
20
11
14
15
14
16
10
16
16
16
18
10
12
17
13
21
10
18
15
12
13
11
19
14
10
17
20
10
13
14
10
21
26
12
22
12
15
11
11
12
23
12
13
14
13
24
18
20
12
25
13
10
10
15
14
26
13
19
10
12
27
14
13
14
10
28
15
10
14
16
29
10
14
15
12
11
30
12
12
11
14
13
Total: red 366, orange 343, yellow 314, green 355, purple 346
Organizing and Displaying Categorical Data: Colors
The data and the graphs reflected what I expected. I used a pie chart and a Pareto chart
to compare the class data with my data. Even though the numbers in the charts were very
close, orange was the most common color in my bag at 31.148%, while red was the most common
color in the class data at 21.230%. My charts showed that the color counts in my bag varied,
whereas in the class graphs the colors all appeared in about the same amount.
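As a rough check on the percentages quoted above, the short Python sketch below recomputes each color's share from the class totals in the data table. The variable names are my own, and the figures for my bag (19 orange candies out of 61) are assumptions read off my data row and the later discussion of the number of candies per bag.

```python
# Class totals per color, copied from the "total" row of the class data.
class_totals = {"red": 366, "orange": 343, "yellow": 314, "green": 355, "purple": 346}

grand_total = sum(class_totals.values())

# Share of each color as a percentage of all candies counted by the class.
class_percent = {color: 100 * count / grand_total for color, count in class_totals.items()}

# Assumption: my bag held 19 orange candies out of 61 in total.
my_orange_percent = 100 * 19 / 61

for color, pct in class_percent.items():
    print(f"{color:>6}: {pct:.3f}%")
print(f"my orange share: {my_orange_percent:.3f}%")
```

The class shares all land between roughly 18% and 21%, which is why the class charts look so even compared with a single bag.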
Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Column   n    Mean   Standard Deviation   Min   Q1   Median   Q3   Max
Total    29   59.4   2.39                 53    58   60       61   64
The distribution in my graph is skewed to the left. The graph partly matches my
expectations, because I expected every bag to contain about the same number of candies.
However, the graph shows that the number of Skittles per bag favors the high end rather
than the low end. I expected the data to be balanced on both sides of the median, but the
graph clearly shows a slight skew to the left. We only had 29 samples, so with more samples
the graph might even out a bit. My bag had 61 Skittles, which is close to the median of the
class data. For that reason I believe the class data aligns with what I got from my bag of
Skittles.
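The left skew can also be seen in the summary statistics alone, without the graph. The sketch below is only an informal check using the values from the summary table; comparing the mean with the median and the two tails is a rule of thumb, not a formal test, and the variable names are my own.

```python
# Five-number summary and mean from the class summary table.
minimum, q1, median, q3, maximum = 53, 58, 60, 61, 64
mean = 59.4

# A mean below the median is one common sign of a left (negative) skew.
mean_below_median = mean < median

# Compare how far the data stretch below Q1 versus above Q3.
lower_tail = q1 - minimum   # distance from the minimum to Q1
upper_tail = maximum - q3   # distance from Q3 to the maximum

print("mean < median:", mean_below_median)
print("lower tail:", lower_tail, "upper tail:", upper_tail)
```

Both signs point the same way: the mean sits below the median, and the low end stretches further from the quartiles than the high end does.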
Reflection
In statistics we work with different types of data, so we need to learn the difference
between quantitative data and categorical data. Quantitative data are numbers representing
counts or measurements. An example of quantitative data is the total number of Skittles in
a normal-sized bag. A histogram and a boxplot make sense for quantitative data because we
are comparing numerical values. However, a pie chart or a Pareto chart would not make sense,
because we are not comparing counts across labels of some sort. As for calculations, we can
calculate the mean and standard deviation for quantitative data, but it would not be useful
to calculate the percentage of each of our data values. Categorical data consist of names or
labels that are not numbers representing counts or measurements. An example would be the
color of each candy in the bag, such as purple. Pie charts and Pareto charts are great for
representing this sort of data because we can compare the categories with one another. On
the other hand, a histogram and a boxplot would not be very useful because we are not
comparing numbers to each other. We can calculate the percentage of a certain category for
categorical data, but it would not make much sense to calculate the standard deviation of
categorical data.
Confidence Interval Estimates
A confidence interval (abbreviated CI) is a range of values used to estimate the true
value of a population parameter. The estimate starts from the best point estimate of the
parameter, such as the sample proportion for a population proportion. Yet a point estimate
is a single value that gives us no indication of how good that best estimate is.
Statisticians developed the confidence interval, which consists of a range of values
instead of a single value, to give us a better sense of how good the estimate is.
I had to construct 3 different confidence interval estimates: a 99% confidence interval
for the true proportion of yellow candies, a 95% confidence interval for the true mean
number of candies per bag, and a 98% confidence interval for the standard deviation of the
number of candies per bag.
For the 99% confidence interval for the true proportion of yellow candies I got
0.158 < p < 0.206. This means that I am 99% confident that the true proportion of yellow
candies is between 0.158 and 0.206.
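The proportion interval can be reproduced from the class totals. The sketch below assumes the usual normal-approximation interval for a proportion, with 314 yellow candies out of 1724 in total (the sum of the color totals); the variable names are my own. It should print the same limits as above to three decimal places.

```python
from math import sqrt
from statistics import NormalDist

yellow = 314   # yellow candies in the class data
n = 1724       # all candies in the class data (sum of the color totals)
confidence = 0.99

p_hat = yellow / n
# Two-sided critical value z_{alpha/2} from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"{lower:.3f} < p < {upper:.3f}")
```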
The answer I got for the 95% confidence interval for the true mean number of candies per
bag was 58.48 < μ < 60.32. This means that I can be 95% confident that the true mean number
of candies per bag is between 58.48 and 60.32, that is, 59.4 plus or minus 0.92.
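The interval for the mean can be sketched from the summary statistics. This assumes a t interval with 28 degrees of freedom, with the critical value 2.048 taken from a standard t table rather than computed; the variable names are my own. With the rounded values from the summary table the margin comes out near 0.91, slightly below the 0.92 reported above, presumably because the original calculation used unrounded values.

```python
from math import sqrt

n = 29          # number of bags in the class sample
x_bar = 59.4    # sample mean from the summary table
s = 2.39        # sample standard deviation from the summary table

# Assumption: critical value t_{0.025} with 28 degrees of freedom,
# read from a standard t table rather than computed.
t_crit = 2.048

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.2f} < mu < {upper:.2f} (margin {margin:.2f})")
```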
The answer for the 98% confidence interval for the standard deviation of the number of
candies per bag was 1.8 < σ < 3.4. This means that I can be 98% confident that the true
standard deviation of the number of candies per bag is between 1.8 and 3.4.
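The interval for the standard deviation follows the chi-square formula. The sketch below assumes 28 degrees of freedom and uses critical values read from a standard chi-square table (48.278 and 13.565) rather than computing them; the variable names are my own.

```python
from math import sqrt

n = 29      # number of bags
s = 2.39    # sample standard deviation
df = n - 1

# Assumption: chi-square critical values for 28 degrees of freedom and a
# 98% confidence level, read from a standard chi-square table.
chi2_right = 48.278   # area 0.01 to the right
chi2_left = 13.565    # area 0.99 to the right

# The larger critical value gives the lower limit and vice versa.
lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)

print(f"{lower:.1f} < sigma < {upper:.1f}")
```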
Hypothesis Tests
In statistics, a hypothesis is a claim or statement about a property of a population. A
hypothesis test is a procedure for testing a claim about a property of a population. The
general components of a formal hypothesis test are the following: identify the null
hypothesis and alternative hypothesis from a given claim and express both in symbolic form;
calculate the value of the test statistic from the claim and the sample data; choose the
sampling distribution that is relevant; either find the P-value or identify the critical
values; and state the conclusion about the claim in simple, nontechnical terms.
I had to do two hypothesis tests for my data set. The first used a 0.05 significance
level to test the claim that 20% of all Skittles candies are red. The second used a 0.01
significance level to test the claim that the mean number of candies in a bag of Skittles
is 55. For the first test I failed to reject the null hypothesis: there was not sufficient
evidence to warrant rejection of the claim that 20% of all Skittles candies are red. For
the second test I rejected the null hypothesis: there was sufficient evidence to warrant
rejection of the claim that the mean number of candies in a bag is 55.
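Both tests can be sketched from the class totals and the summary statistics. This assumes two-tailed tests, a z test for the proportion and a t test for the mean, with the t critical value 2.763 (28 degrees of freedom) taken from a standard t table; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

# Test 1: H0: p = 0.20 versus H1: p != 0.20, significance level 0.05.
red, n_candies = 366, 1724
p0 = 0.20
p_hat = red / n_candies
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_candies)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))   # two-tailed P-value
reject_proportion = p_value < 0.05

# Test 2: H0: mu = 55 versus H1: mu != 55, significance level 0.01.
n_bags, x_bar, s = 29, 59.4, 2.39
mu0 = 55
t_stat = (x_bar - mu0) / (s / sqrt(n_bags))
# Assumption: two-tailed critical value t_{0.005} with 28 degrees of
# freedom, taken from a standard t table.
t_crit = 2.763
reject_mean = abs(t_stat) > t_crit

print(f"proportion test: z = {z_stat:.2f}, P = {p_value:.3f}, reject H0: {reject_proportion}")
print(f"mean test: t = {t_stat:.2f}, critical value = {t_crit}, reject H0: {reject_mean}")
```

The first test statistic is small and its P-value is well above 0.05, so the 20% claim is not rejected. The second test statistic is far beyond the critical value, so a mean of 55 is rejected.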
Reflection
There are 3 conditions for constructing a confidence interval estimate of a population
proportion p. These 3 conditions are:
The sample is a simple random sample.
The conditions for the binomial distribution are satisfied.
There are at least 5 successes and at least 5 failures.
There are 2 conditions for constructing a confidence interval to estimate a population
standard deviation or variance, which are:
The sample is a simple random sample.
The population must have normally distributed values.
Finally, there are 2 conditions for constructing a confidence interval to estimate a
population mean with σ not known, which are:
The sample is a simple random sample.
Either the population is normally distributed or n > 30.
These conditions also apply to the hypothesis tests.
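The countable conditions above can be checked against the class data. This is only a sketch with variable names of my own; whether the sample is a simple random sample and whether the population is normally distributed cannot be verified by arithmetic and are left out.

```python
# Counts from the class data.
yellow, n_candies = 314, 1724   # successes and trials for the yellow proportion
n_bags = 29                     # sample size for the mean and standard deviation

# Proportion interval: at least 5 successes and at least 5 failures.
successes = yellow
failures = n_candies - yellow
proportion_counts_ok = successes >= 5 and failures >= 5

# Mean interval: n > 30 would be enough on its own; otherwise the
# population needs to be (approximately) normally distributed.
large_sample = n_bags > 30
needs_normality = not large_sample

print("successes:", successes, "failures:", failures, "ok:", proportion_counts_ok)
print("n > 30:", large_sample, "relies on normality:", needs_normality)
```

With 29 bags the n > 30 condition is not met, so the interval and test for the mean rely on the number of candies per bag being roughly normally distributed.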
There are a few mistakes that might have happened when the data were being gathered.
For example, someone could have counted their Skittles incorrectly, which might have
skewed the data. Another problem would be if someone forgot to get a bag of Skittles. This
is a problem because it would give us a smaller sample size, and thus the calculations
would yield a less accurate result.
There are many ways that the sampling method could be improved. We could have had
everyone take a picture of their sample so that we could be sure they counted their
Skittles correctly. Another thing we could have done is buy more bags of Skittles and count
the Skittles in each bag separately. This would give us more samples, which would make our
results more accurate.
In conclusion, there are a couple of things that we can conclude from this research. I
concluded that, across the class data, the colors of Skittles appear in about the same
amounts. No color was noticeably more common than another. Another thing that I could
conclude was that the mean we got from the class data falls within the confidence interval
that I calculated for the true mean.