9/18/15
AP Statistics
The Data Exploration Project Statistical Report
While completing homework for school one day, I started to think about how many hours
of homework I do on average every night. Eventually, I became curious about how long other
people spend doing homework at night in other grades and throughout Austin High in general.
So, I decided to use this topic as my data for the Data Exploration Project.
The population is students at Austin High School in grades 9-12, and each value is one
student's estimate of the average time spent on homework per night. The time is
measured in hours because that was the most convenient unit for the
subjects to use when coming up with their own average.
In order to collect the data that I have used for this project, I used a
convenience sample, asking different students in my classes or in the hallways at Austin
High about their nightly homework load. I attempted to limit bias in my questioning by
simply asking, "On average, how many hours of homework do you do each night?" The
question should contribute little bias because it is simple and straightforward. After the
subject responded to my question, I recorded the answer in a notebook. I collected the
data this way because I thought that conducting a convenience sample at school would cover a
lot of people's answers in a short amount of time, while still being reasonably varied, because I
tried to approach a wide variety of people. This was the best way of collecting data for me
because it was the fastest and most reliable way to talk to people in the time frame of the
project.
I collected data from a sample of 30 students at Austin High School. In order to
analyze this data, I used my TI-84 calculator. I pressed the
STAT key and selected Edit, where I was able to enter all the data I had collected into
two lists: one for the grade the student was in, and one for their
estimate of time spent on homework. From there, I pressed the STAT key again and this time
selected 1-Var Stats, where the calculator showed me many useful statistics, such as the
5-number summary. This includes the minimum x value (minX), the value of the first quartile
(Q1), the median (Med), the third quartile (Q3), and the maximum x value (maxX). The values I
was given are as follows:
MinX: 1
Q1: 1
Med: 2
Q3: 2
MaxX: 4
Other values given to me by the calculator on this screen were the mean, or average, of all the
data, which was 2.02 hours per night, and the standard deviation, which represents how far the
values typically fall from the mean and was 0.859. The variance is 0.737, because it is just the
standard deviation squared. I calculated the range of the data by using the 5-number summary
because it provides the minimum x value and the maximum x value. Using these values, you
can say that the data range from 1 to 4 hours of homework per night, a spread of 3 hours.
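The same summary could be reproduced outside the calculator. The sketch below is only an illustration: the `hours` list is a made-up sample (not the 30 responses collected for this report), the function name `five_number_summary` is hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is my understanding of how the TI-84 computes them. The calculator reports both a sample and a population standard deviation; the sketch uses the sample version as an assumption, since the report does not say which one was read off.

```python
from statistics import mean, median, stdev, variance

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data; the middle value is left out when the count is odd.
    Assumes at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical sample for illustration only, NOT the report's data.
hours = [1, 1, 1.5, 2, 2, 2, 2.5, 3, 4]

summary = five_number_summary(hours)
data_range = summary[4] - summary[0]   # max minus min

print("five-number summary:", summary)
print("range:", data_range)
print("mean:", mean(hours))
print("sample standard deviation:", stdev(hours))
print("variance (standard deviation squared):", variance(hours))
```

Squaring the printed standard deviation gives the printed variance, which is the same relationship used for the 0.859 and 0.737 values above.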
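There are multiple different ways to find outliers in a set of data, and I chose the 1.5 x
IQR method. In this method, you take the IQR, which is the interquartile range, or the
value of Q3 minus the value of Q1, which is 2 - 1, which equals 1. From there, you multiply the
IQR by 1.5, and then add that to the Q3 value and subtract that from the Q1 value. In this case,
because the IQR was 1, and 1.5 * 1 = 1.5, all you have to do is add 1.5 to 2 (the Q3 value), which is
3.5, and values over 3.5 are considered outliers. You also have to subtract 1.5 from 1 (the Q1
value), which is -0.5, to check for outliers on the low side; no value can fall below that, because
time cannot be negative. In my data set, there were 2 outliers, both 4s. They
qualify as outliers because they are above 3.5. My work for finding the outliers is shown below:
IQR = Q3 - Q1 = 2 - 1 = 1
1.5 * IQR = 1.5 * 1 = 1.5
Upper cutoff = Q3 + 1.5 = 2 + 1.5 = 3.5
Lower cutoff = Q1 - 1.5 = 1 - 1.5 = -0.5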
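The 1.5 x IQR rule can also be written as a short function. This is a minimal sketch, assuming the Q1 and Q3 values quoted in this report; the function names `iqr_fences` and `outliers` are hypothetical, and the short list passed to `outliers` is an illustrative handful of values, not the full data set.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs for the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outliers(values, q1, q3):
    """Return the values that fall outside the 1.5 x IQR cutoffs."""
    low, high = iqr_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Q1 = 1 and Q3 = 2 are the quartiles reported for the original data.
low, high = iqr_fences(1, 2)
print(low, high)                         # -0.5 3.5

# Illustrative values only (not the 30 collected responses).
print(outliers([1, 2, 3, 4, 4], 1, 2))   # [4, 4]
```

Note that the lower cutoff is measured from Q1 and the upper cutoff from Q3, not from the median.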
Next I created some visual representations of my data. I have included here a
histogram, boxplot, and a stemplot. For the stemplot, I used a divided stemplot because all of
my values are single-digit numbers, so a traditional stemplot would not provide any usable
information.
Histogram:
Stemplot:
Boxplot:
After completing the last steps, I added 100 to each number in my data. From there, I
went back and recalculated the statistics that I had already found for my original data, following
the same steps. The 5-number summary for my new modified data was as follows:
MinX: 101
Q1: 101
Med: 102
Q3: 102
MaxX: 104
The mean given was 102.02, and the standard deviation given was 0.859. All of the new values
compare closely to the original values. The data now range from 101 to 104, and the second
set's mean and median are the same as the first set's, just +100. The standard deviation and
variance are the same across both calculations.
In order to find outliers for this second calculation, I used the 1.5 x IQR method again. By
adding (1.5 * 1) to Q3 (102) and subtracting it from Q1 (101), I found cutoffs of 103.5 and 99.5,
so the 104 values are the only outliers.
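Why adding a constant moves the center but not the spread can be checked with a few lines of code. This is a rough sketch: the `summary` values are the 5-number summary reported for the original data, while the `hours` list is a made-up sample used only to illustrate the mean and standard deviation, since the raw responses are not listed in this report.

```python
from statistics import mean, stdev

# Five-number summary reported for the original data.
summary = {"min": 1, "q1": 1, "median": 2, "q3": 2, "max": 4}
shifted_summary = {name: value + 100 for name, value in summary.items()}

# Hypothetical raw values for illustration only, NOT the report's data.
hours = [1, 1, 1.5, 2, 2, 2, 2.5, 3, 4]
shifted = [h + 100 for h in hours]

print(shifted_summary)

# Every value moves up by 100, so the mean moves up by 100 as well.
print("change in mean:", mean(shifted) - mean(hours))

# Distances between values are unchanged, so the spread is unchanged.
print("change in standard deviation:", stdev(shifted) - stdev(hours))
print("IQR before and after:",
      summary["q3"] - summary["q1"],
      shifted_summary["q3"] - shifted_summary["q1"])
```

Because the IQR does not change, the outlier cutoffs simply shift by 100 along with the data, and the same data points are flagged.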
After finding these new calculations, I recreated the three visual representations for the
new values. They are again a stemplot, a histogram, and a boxplot.
Histogram:
Stemplot:
Boxplot:
Next, I modified my data yet again by increasing each number in the original data by
50%. Once completed, I recalculated the 5-number summary using the calculator. The
values I recalculated are as follows:
MinX: 1.5
Q1: 1.5
Median: 3
Q3: 3
MaxX: 6
Mean: 3.03
Standard Deviation: 1.287
These new calculations show a change in the mean, median, and standard deviation that is
much different from the other modification to the data: each is about 1.5 times its original
value. The data range from 1.5 to 6 in this set. In
order to find outliers, I again used the 1.5 x IQR method. Here the IQR is 3 - 1.5 = 1.5, and
1.5 * 1.5 = 2.25. From there, I added 2.25 to Q3 and subtracted it from Q1, which gives cutoffs
of 5.25 and -0.75. The two values of 6 are above 5.25, so they are outliers; they are the same two
data points that were outliers in the original data. I then completed the visual representations
of a stemplot, boxplot, and histogram again
for this new data. They are displayed below:
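Increasing every value by 50% is the same as multiplying by 1.5, and that stretches the spread as well as the center. The sketch below illustrates this under stated assumptions: `summary` holds the 5-number summary reported for the original data, `hours` is a made-up sample (not the collected responses), and `rescale` is a hypothetical helper name.

```python
from statistics import mean, stdev

def rescale(values, factor):
    """Multiply every value by the same factor."""
    return [v * factor for v in values]

# Five-number summary reported for the original data.
summary = [1, 1, 2, 2, 4]
scaled_summary = rescale(summary, 1.5)
print(scaled_summary)

# Hypothetical raw values for illustration only, NOT the report's data.
hours = [1, 1, 1.5, 2, 2, 2, 2.5, 3, 4]
scaled = rescale(hours, 1.5)

# Both the center and the spread are multiplied by the same factor.
mean_ratio = mean(scaled) / mean(hours)
sd_ratio = stdev(scaled) / stdev(hours)
print("mean ratio:", mean_ratio)
print("standard deviation ratio:", sd_ratio)
```

This is the key contrast with adding 100: adding a constant leaves the standard deviation alone, while multiplying by a constant scales it.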
Histogram:
Stemplot:
Boxplot:
At this point, I had to find the percent of values that are more than 5 units above the mean of my original
data. Because the mean of my original data was 2.02 hours, all you have to do is add 5 units to
that, which is 2.02 + 5 = 7.02 hours, and anything above that would count toward the required
percentage. Because my data does not have that large of a spread, there are no values more
than 5 units above my mean, so my percentage is 0%. In general, to calculate this, I
would take the number of values that meet this criterion, divide it by the total number
of values (30), and then multiply that answer by 100 in order to get the percentage. Next, I had
to find the percent that is between 3 units below the mean and 2 units above the mean. This
means any values between about -1 hour and 4 hours, because the mean is still about 2. All of my data
points fall into this range, so the percentage would be 30/30, or 100%. In order to find the
values in the top 10% of my data, I set up a proportion, with one side having 1/10, representing
10%, and the other x/30, and then solved for x as you can see below:
1/10 = x/30
10x = 30
x = 3
After solving for x, I found that the top 3 values of my 30 total values would represent the top
10%. In this case, the top 10% for the most homework would be the two 4-hour values and one
3-hour value, because they are the 3 highest values in my data set of 30, making them the top
10%.
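The percentage and top 10% steps follow the same pattern each time: count the values that meet a condition, divide by the total, and multiply by 100. The sketch below shows that pattern. The function names are hypothetical, the 2.02 mean is the one reported for the original data, and the `hours` list is a small made-up sample rather than the 30 collected responses.

```python
def percent_above(values, cutoff):
    """Percent of values strictly greater than the cutoff."""
    count = sum(1 for v in values if v > cutoff)
    return 100 * count / len(values)

def percent_between(values, low, high):
    """Percent of values from low to high, endpoints included."""
    count = sum(1 for v in values if low <= v <= high)
    return 100 * count / len(values)

def top_fraction(values, fraction):
    """Return the largest values making up the given fraction of the data."""
    count = round(len(values) * fraction)
    return sorted(values, reverse=True)[:count]

# Hypothetical sample of 10 values for illustration only.
hours = [1, 1, 1.5, 2, 2, 2, 2.5, 3, 4, 4]
reported_mean = 2.02   # mean reported for the original data

print(percent_above(hours, reported_mean + 5))                       # 0.0
print(percent_between(hours, reported_mean - 3, reported_mean + 2))  # 100.0
print(top_fraction(hours, 0.10))                                     # [4]

# With 30 values, 10% of the data is 3 values, matching x = 3.
print(round(30 * 0.10))                                              # 3
```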
In conclusion, I have found that on average, homework at Austin High School varies
between each grade, and that junior year has a significantly higher amount of homework than
the other three years. Senior year and sophomore year have comparable loads of homework,
while freshman year sits alone with the least amount of homework. While this may not be
entirely representative of Austin High, due to my convenience sampling, I believe that generally
speaking, the results are accurate.