
Spencer Christian

9/18/15
AP Statistics
The Data Exploration Project Statistical Report

While completing homework for school one day, I started to think about how many hours of homework I do on average every night. Eventually, I became curious about how long other people spend doing homework at night in other grades and throughout Austin High in general. So, I decided to use this topic as the basis for my data in the Data Exploration Project. The population is students at Austin High School in grades 9-12, and each value is a student's estimate of the average time he or she spends on homework per night. The time is measured in hours because that was the most convenient unit for the subjects to use when coming up with their own averages.
In order to collect the data that I have used for this project, I used a convenience sample, trying to keep it as random as possible, by asking different students in my classes or in the hallways at Austin High about their nightly homework load. I attempted to eliminate bias in my questioning by simply asking, "On average, how many hours of homework do you do each night?" The question does not contribute bias because it is simple and straightforward. After each subject responded to my question, I recorded the answer in a notebook. I collected this type of data because I thought that conducting a convenience sample at school would cover a lot of people's opinions in a short amount of time, while still being somewhat random, because I tried to approach a wide variety of people. This was the best way of collecting data for me because it was the fastest and most practical way to talk to people in the time frame of the project.
I collected data from a sample of 30 students at Austin High School. In order to analyze this data, I used my TI-84 calculator. I pressed the STAT key and then selected EDIT, where I was able to enter all the data I had collected into two lists: one for the grade the student was in, and one for the student's estimate of time spent on homework. From there, I pressed the STAT key again and this time selected 1-Var Stats from the CALC menu, where the calculator showed me many useful statistics, such as the 5-number summary. This includes the minimum x value (minX), the first quartile (Q1), the median (Med), the third quartile (Q3), and the maximum x value (maxX). The values I was given are as follows:

MinX: 1

Q1: 1

Med: 2

Q3: 2

MaxX: 4

Other values the calculator gave me on this screen were the mean, or average, of all the data, which was 2.02 hours per night, and the standard deviation, which measures how far the values typically lie from the mean, which was .859. The variance is .737, because it is just the standard deviation squared. I calculated the range of the data using the 5-number summary, because it provides the minimum and maximum x values. The data run from 1 to 4 hours of homework per night, so the range is 4 - 1 = 3 hours.
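The 1-Var Stats output described above can be approximated in Python with the standard `statistics` module. This is a minimal sketch, not the calculator's own algorithm: it assumes the quartile convention usually attributed to the TI-84 (Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out when the number of values is odd), and it reports the sample standard deviation (Sx). The function name `one_var_stats` is hypothetical, and because the report's 30 responses are not listed, the example uses made-up values.

```python
import statistics

def one_var_stats(data):
    """Rough analogue of the TI-84 1-Var Stats summary (a sketch, not the calculator's code)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Assumed TI-84 convention: quartiles are medians of the lower and upper halves,
    # leaving out the overall median when n is odd.
    lower = xs[:half]
    upper = xs[half + (n % 2):]
    sx = statistics.stdev(xs)  # sample standard deviation (Sx)
    return {
        "mean": statistics.mean(xs),
        "Sx": sx,
        "variance": sx ** 2,  # variance is the standard deviation squared
        "minX": xs[0],
        "Q1": statistics.median(lower),
        "Med": statistics.median(xs),
        "Q3": statistics.median(upper),
        "maxX": xs[-1],
        "range": xs[-1] - xs[0],  # range = maxX - minX
    }

# Hypothetical example data (NOT the report's survey responses)
sample = [1, 1, 2, 2, 2, 3, 4]
stats = one_var_stats(sample)
print(stats["Q1"], stats["Med"], stats["Q3"], stats["range"])  # prints: 1 2 3 3
```

Note that the range is a single number (maximum minus minimum), not the interval itself.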
There are multiple ways to find outliers in a set of data, and I chose the 1.5 × IQR method. In this method, you take the IQR, or interquartile range, which is the value of Q3 minus the value of Q1; here that is 2 - 1 = 1. From there, you multiply the IQR by 1.5, and then add that to the Q3 value and subtract it from the Q1 value. In this case, because the IQR was 1 and 1.5 × 1 = 1.5, all you have to do is add 1.5 to 2 (the Q3 value), which gives 3.5, and values over 3.5 are considered outliers. You also have to subtract 1.5 from 1 (the Q1 value), which gives -0.5, to check for outliers at the low end; no values fall below -0.5. In my data set, there were 2 outliers, both 4s. They qualify as outliers because they are above 3.5. My work for finding the outliers is shown below:
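The 1.5 × IQR rule can also be written as two small functions. This is an illustrative sketch rather than the author's worked solution; the names `iqr_fences` and `iqr_outliers` are hypothetical, and the only inputs taken from the report are Q1 = 1 and Q3 = 2.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(data, q1, q3, k=1.5):
    """List the values that fall strictly outside the fences."""
    low, high = iqr_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]

# Quartiles from the report's original data: Q1 = 1, Q3 = 2
print(iqr_fences(1, 2))  # prints: (-0.5, 3.5)
```

Calling `iqr_outliers` with the full list of responses and these quartiles would flag every value above 3.5 or below -0.5, which for this data set means the two 4-hour responses.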

Next, I created some visual representations of my data. I have included here a histogram, a boxplot, and a stemplot. For the stemplot, I used a divided (split) stemplot because all of my values are single-digit numbers, so a traditional stemplot would not provide any usable information.

Histogram:

Stemplot:

Boxplot:

After completing the last steps, I added 100 to each number in my data. From there, I went back and recalculated the statistics I had already found for my original data, following the same steps. The 5-number summary for my new modified data was as follows:

MinX: 101

Q1: 101

Med: 102

Q3: 102

MaxX: 104

The mean given was 102.02, and the standard deviation given was .859. All of the new values compare predictably to the original values. The data now run from 101 to 104, so the range is still 3 hours. The mean and median of the second set are the same as those of the first set, just + 100, while the standard deviation and variance are the same for both sets. In order to find outliers for this second set, I used the 1.5 × IQR method again. The IQR is still 102 - 101 = 1, so by adding 1.5 × 1 to Q3 (102), which gives 103.5, and subtracting it from Q1 (101), which gives 99.5, I found that the 104 values are the only outliers.
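Adding the same constant to every value shifts the measures of center (mean, median, quartiles) by that constant but leaves the measures of spread (range, IQR, standard deviation) unchanged. The sketch below checks this on a made-up list; `compare_shift` is a hypothetical helper and the values are not the report's data. `math.isclose` is used because floating-point subtraction may not give exactly 100.

```python
import math
import statistics

def compare_shift(data, c):
    """Compare summary statistics before and after adding c to every value."""
    shifted = [x + c for x in data]
    return {
        "mean_change": statistics.mean(shifted) - statistics.mean(data),
        "median_change": statistics.median(shifted) - statistics.median(data),
        "same_stdev": math.isclose(statistics.stdev(shifted), statistics.stdev(data)),
        "same_range": (max(shifted) - min(shifted)) == (max(data) - min(data)),
    }

original = [1, 1, 2, 2, 3, 4]  # hypothetical values, not the survey responses
print(compare_shift(original, 100))
```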
After finding these new calculations, I recreated the three visual representations for the new values. They are again a stemplot, a histogram, and a boxplot.

Histogram:

Stemplot:

Boxplot:

Next, I modified my data yet again by increasing each number in the original data by 50%. Once that was done, I recalculated the 5-number summary using the calculator. The values I recalculated are as follows:

MinX: 1.5

Q1: 1.5

Median: 3

Q3: 3

MaxX: 6

Mean: 3.03

Standard Deviation: 1.287

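These new calculations show a change in the mean, median, and standard deviation that is much different from the other modification to the data: instead of simply shifting, each of them was multiplied by 1.5 (for example, 2.02 × 1.5 = 3.03). The data now run from 1.5 to 6, so the range is 6 - 1.5 = 4.5. In order to calculate outliers, I again used the 1.5 × IQR method. The IQR is 3 - 1.5 = 1.5, so 1.5 × 1.5 = 2.25. From there, I added 2.25 to Q3 and subtracted it from Q1. The upper fence is 3 + 2.25 = 5.25 and the lower fence is 1.5 - 2.25 = -0.75, so the two 6 values (the original 4s) are again outliers, and there are no low outliers. I then completed the visual representations of a stemplot, boxplot, and histogram again for this new data. They are displayed below: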

Histogram:

Stemplot:

Boxplot:
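Multiplying every value by 1.5 (a 50% increase) also multiplies the quartiles, and therefore the IQR and both outlier fences, by 1.5, so the fences for the modified data can be double-checked directly from the original 5-number summary. A minimal sketch, assuming the 1.5 × IQR rule used throughout the report; `scale_summary` and `iqr_fences` are hypothetical helper names.

```python
def scale_summary(summary, factor):
    """Multiply each 5-number-summary value by factor (1.5 for a 50% increase)."""
    return {name: value * factor for name, value in summary.items()}

def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# 5-number summary of the original data, as reported
original = {"minX": 1, "Q1": 1, "Med": 2, "Q3": 2, "maxX": 4}
scaled = scale_summary(original, 1.5)
low, high = iqr_fences(scaled["Q1"], scaled["Q3"])

print(scaled)  # prints: {'minX': 1.5, 'Q1': 1.5, 'Med': 3.0, 'Q3': 3.0, 'maxX': 6.0}
print(low, high)  # prints: -0.75 5.25
print(scaled["maxX"] > high)  # True: the scaled maximum lies above the upper fence
```

Because the original fences were -0.5 and 3.5, the scaled fences are simply those values times 1.5, so a value that was an outlier before the scaling remains one afterward.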

At this point, I had to find the percent of the data that is more than 5 units above the mean of my original data. Because the mean of my original data was 2.02 hours, all you have to do is add 5 units to that, which is 2.02 + 5 = 7.02 hours, and anything above that would count toward the required percentage. Because my data does not have that large of a spread, there are no values more than 5 units above my mean, so my percentage is 0%. To calculate this in general, I would take the number of values that meet this criterion, divide it by the total number of values (30), and then multiply that answer by 100 to get the percentage. Next, I had to find the percent of the data between 3 units below the mean and 2 units above the mean. This means any values between 2.02 - 3 = -0.98 hours and 2.02 + 2 = 4.02 hours. All of my data points lie between 1 and 4 hours, so they all fall into this category, and the percentage is 30/30, or 100%. In order to find the values in the top 10% of my data, I set up a proportion, with one side being 1/10, representing 10%, and the other x/30, and then solved for x: 1/10 = x/30, so x = 30/10 = 3.

After solving for x, I found that the top 3 values of my 30 total values represent the top 10%. In this case, the top 10% for the most homework would be the two 4-hour values and one 3-hour value, because they are the 3 highest values in my data set of 30.
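The percentage and top-10% calculations above follow one pattern: count the values that meet a condition, divide by the total, and multiply by 100. The sketch below assumes inclusive bounds for "between" and uses a hypothetical list of 10 values, since the 30 responses are not listed in the report; only the mean of 2.02 and the sample size of 30 come from the report, and the helper names are hypothetical.

```python
def percent_where(data, condition):
    """Percent of values meeting a condition: (count / total) * 100."""
    count = sum(1 for x in data if condition(x))
    return count / len(data) * 100

def top_count(n, percent):
    """Solve percent/100 = x/n for x, keeping whole values only."""
    return n * percent // 100

def top_values(data, percent):
    """The largest values making up the top `percent` of the data."""
    k = top_count(len(data), percent)
    return sorted(data, reverse=True)[:k]

MEAN = 2.02  # mean of the original data, as reported
# Hypothetical illustration only; NOT the report's 30 responses
data = [1, 1, 1, 2, 2, 2, 2, 3, 4, 4]

print(percent_where(data, lambda x: x > MEAN + 5))  # prints: 0.0
print(percent_where(data, lambda x: MEAN - 3 <= x <= MEAN + 2))  # prints: 100.0
print(top_count(30, 10))  # prints: 3, matching 1/10 = x/30
print(top_values(data, 10))  # prints: [4]
```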
Inconclusion,Ihavefoundthatonaverage,homeworkatAustinHighSchoolvaries
betweeneachgrade,andthatjunioryearhasasignificantlyhigheramountofhomeworkthan
theotherthreeyears.Senioryearandsophomoreyearhavecomparableloadsofhomework,
whilefreshmanyearsitsalonewiththeleastamountofhomework.Whilethismaynotbe
entirelyrepresentativeofAustinHigh,duetomyconveniencesampling,Ibelievethatgenerally
speaking,theresultsareaccurate.
