6/9/2015

Cleaning data in Stata | data.library.utoronto.ca

Cleaning data in Stata
Table of Contents
Some useful tips before you get started
Creating a number of smaller subsets based on research criteria
Dropping observations
Dropping variables
Transforming variables
Dealing with outliers
Creating new variables
Moving variables
Labelling variables
Renaming variables
A few last words

Cleaning data is a rather broad term that applies to the preliminary manipulations on a dataset prior to analysis. It will very often be the first assignment of a research assistant and is the tedious part of any research project that makes us wish we HAD a research assistant. Stata is a good tool for cleaning and manipulating data, regardless of the software you intend to use for analysis. Your first pass at a dataset may involve any or all of the following:
Creating a number of smaller subsets based on research criteria
Dropping observations
Dropping variables
Transforming variables
Dealing with outliers
Creating new variables
Moving variables
Labeling variables
Renaming variables
Whether this is your first time cleaning data or you are a seasoned data monkey, you might find some useful tips by reading more.
Some useful tips before you get started[1]
Use the Stata help file. Stata has a built-in feature that allows you to access the user manual as well as help files on any given command. Simply type help in the command window, followed by the name of the command you need help with and press the Enter key:

http://data.library.utoronto.ca/cleaningdatastata

Write a do-file. Never clean a dataset by blindly entering commands (or worse, clicking buttons). You want to write the commands in a do-file, and then run it. This way, if you make a mistake, you will not have ruined your entire dataset and you will not need to start again from scratch. This is general advice that applies to any work you do on Stata. Working from do-files lets other people see what you did if you ever need advice, it makes your work reproducible and it allows you to correct small mistakes somewhat painlessly.
To start a do-file, click on the icon that looks like a notepad on the top left corner of your Stata viewer[2].

In the preliminary stages of your work, you may feel that a do-file is more of a hindrance than a help. For example, if you are not so familiar with a command, you may prefer to try it first. One simple way to do that and still have discipline about writing do-files is to write your do-file in stages, writing only a few commands before executing them, correcting mistakes as you go. In order to execute a number of commands rather than the whole do-file, simply highlight the ones you want to execute, and click on the Execute Selection (do) icon on the top of your do-file editor, at the far right.

As you become more proficient with programming in Stata, you won't need to try out commands anymore, and you'll discover the joy of writing a do-file and having it run without a glitch. To run a whole do-file, do not highlight any part of it and click on the Execute Selection (do) icon.
You may wonder about the commands clear, set more off and set mem 15000 in the screenshot example. These three commands are administrative commands that are quite useful to have at the beginning of a do-file. The first, clear, is used to clear any previous dataset you may have been working on. The command set more off tells Stata not to pause or display the --more-- message. Finally, the command set mem 15000 increases the memory available to Stata from your computer; here we will need it as the size of the dataset we downloaded from <odesi>[3] is larger than the 10 MB allocated to data by default.
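The screenshot is not reproduced here, so the following is only a sketch of what such an opening block might look like. The set mem line is left commented out on purpose: Stata 12 and later manage memory automatically, so it only matters in older versions.

```stata
* Administrative commands at the top of a do-file (layout is an assumption)
clear                 // remove any dataset currently in memory
set more off          // do not pause output with the --more-- message
* set mem 15000       // Stata 11 and earlier only: request roughly 15 MB for data
```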
One last comment about do-files: if you double-click a saved do-file, it will not open for editing, but rather Stata will run that do-file, which can be a bit annoying. To reopen a do-file from a folder without executing the commands in it, right-click on it and select edit rather than open.
Always keep a log. Again, this is a general rule of thumb on Stata. Keeping a log means you can go back and look at what you did without having to do it again. Starting a log is just a matter of adding a command at the top of your do-file that tells Stata to log, as well as where you want the log to be saved:
log using "whatever path you want:\pick a name for your log.smcl"[4], replace[5]
Note how logs are saved under the .smcl extension.
Do not forget to close your log before starting a new one. The last command on your do-file[6] will usually be log close.
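As a minimal sketch, the open and close steps could bracket a do-file as shown below. The file name cleaning_log.smcl is hypothetical and is written to the current working directory. The capture log close line is an addition not taken from the guide: it closes any log left open by an interrupted run and is silently skipped when none is open.

```stata
* Close a log left open by an earlier, interrupted run (no error if none is open)
capture log close

* Start the log; replace overwrites the previous log of the same name
log using "cleaning_log.smcl", replace

* ... cleaning commands go here ...
display "commands between log using and log close are recorded"

* Last command of the do-file
log close
```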
Save as you go. Computers crash, power goes out, stuff happens. Save your do-files every few minutes as you write them. Saving a do-file is done the same way as saving any text editor document: either click on the diskette icon, or press CTRL+S:

You should also save your dataset as you modify it, but make sure to keep one version of the original dataset, in case you need to start over. The command to save a dataset on Stata is save, followed by the path where you want the dataset to be saved, and the [optional] option replace.
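A short sketch of that habit, using the auto dataset that ships with Stata as a stand-in for your own data. Both file names are hypothetical and are written to the current working directory.

```stata
* Load an example dataset bundled with Stata
sysuse auto, clear

* Keep an untouched copy before changing anything
save "auto_original_copy.dta", replace

* Modify the data, then save under a different name
drop if foreign == 1
save "auto_domestic.dta", replace
```

Saving the modified data under a new name means the original copy is never overwritten, even with the replace option.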

Note how the extension for Stata data is .dta, and also note how the new dataset has a different name from the original[7].
Become familiar with your dataset. Datasets come with codebooks. You should know what each variable is, how it's coded, how missing values are identified. A good practice is to actually look at the data, so that you understand the structure of the information. To do so, you can click on Data in the top left corner of your viewer and select Data editor, then Data editor (browse). A new window will open and you can see your data.

You can also use the command browse, either by typing it directly in the command window, or from a do-file:
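Because browse opens an interactive window, it is hard to show in text. As a complement, and not something the guide itself covers, the commands below print similar information to the Results window. The sketch uses Stata's bundled auto dataset as a stand-in for your own data.

```stata
sysuse auto, clear

* Variable names, storage types and labels
describe

* Codes, ranges and missing-value counts for selected variables
codebook foreign rep78

* The first five rows, as they would appear in the data browser
list in 1/5
```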

One of the distinguishing features of <odesi> is that when you download a dataset, it comes with labels. Variable labels are descriptions of variables, and value labels are used to describe the way variables are coded. Basically, the value label sits on top of the code, so that when you browse, you see what the code means rather than what it is. To make this clearer, let's look at the data with no labels. Look, for example, at the GEOPRV variable.
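One way to see the codes underneath the labels is the nolabel option. GEOPRV belongs to the guide's CCHS extract, so this sketch substitutes the labelled variable foreign from Stata's bundled auto dataset.

```stata
sysuse auto, clear

* With labels: shows Domestic / Foreign
tabulate foreign

* Without labels: shows the underlying numeric codes
tabulate foreign, nolabel

* The mapping between codes and labels (the value label is named origin)
label list origin
```

In the data editor, browse foreign, nolabel gives the same unlabelled view.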

Creating a number of smaller subsets based on research criteria
There are many reasons why you may want a smaller subset of your data but the main one is that the bigger the dataset, the harder it is for Stata to manage, which slows down your system. Your goal is to make your dataset as small as possible, while keeping all the relevant information. Your research agenda determines what your final dataset will contain.
Let's say you have data on the health habits of Canadians aged 12 and up, but your research question is specific to women of reproductive age living in Ontario[8]. You clearly don't need to keep the men in your dataset, and you won't need to keep the residents of provinces other than Ontario. Furthermore, you can probably drop women under 15 and over 55 years old. Now, let's look at how you would do that.
Dropping observations
To drop observations, you need to combine one of two Stata commands (keep or drop) with the if qualifier.
Make sure you have saved your original dataset before you get started.
The keep command should be used with caution (or avoided altogether) because it will drop all but what you specifically keep. This can be a problem if you are not 100% certain of what you want to keep.
The drop command will drop from your dataset what you specifically ask Stata to drop.
The if qualifier restricts the scope of the command to those observations for which the value of an expression is true. The syntax for using this qualifier is quite simple:
command if exp
Where command in this case would be drop, and exp is the expression that needs to be true for the drop command to apply[9].

Using the example of women of reproductive age in Ontario, the first highlighted line drops men, the second line drops any observation not in Ontario, while the last line drops observations in age groups older or younger than our subset of interest.
You have to be careful with logical operators; notice the syntax in the third line. A common mistake is to ask Stata to drop if DHHGAGE>10 & DHHGAGE<2. There are no individuals in the dataset who are older than 55 AND younger than 15. We want to drop if older than 55 OR younger than 15.
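The highlighted lines are not reproduced here. The sketch below shows what three such commands might look like on a small made-up dataset. The variable name DHH_SEX, its coding (1 = male) and the Ontario code 35 are assumptions for illustration; only the DHHGAGE cut-offs come from the text above. Check your codebook before reusing any of these values.

```stata
* Toy data standing in for the CCHS extract (all values invented)
clear
input byte(DHH_SEX GEOPRV DHHGAGE)
1 35 5
2 35 5
2 24 6
2 35 1
2 35 12
2 35 10
end

drop if DHH_SEX == 1                  // drop men (assumed coding: 1 = male)
drop if GEOPRV != 35                  // drop anyone outside Ontario (assumed code 35)
drop if DHHGAGE > 10 | DHHGAGE < 2    // OR, not AND: too old or too young

* Note: Stata treats missing values as larger than any number,
* so DHHGAGE > 10 is also true for observations with a missing age group.
count
```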
Here is a list of operators in expressions. You would mostly use logical and relational operators in conjunction with if:
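The original table is an image that did not survive. As a reminder taken from Stata's help operator page: the arithmetic operators are + - * / and ^, the relational operators are > < >= <= == and != (also written ~=), and the logical operators are & (and), | (or) and ! (not, also written ~). The sketch below tries a few of them with count on invented data; the variable names are hypothetical.

```stata
* Invented data; sex, prov and agegrp are placeholder names
clear
input byte(sex prov agegrp)
1 35 5
2 35 5
2 24 6
2 35 12
end

count if sex == 2 & prov == 35        // & : both conditions must hold
count if agegrp > 10 | agegrp < 2     // | : either condition may hold
count if !(prov == 35)                // ! : negates the expression
count if prov != 35                   // != : not equal, same result as the line above
count if agegrp >= 5 & agegrp <= 6    // a closed range
```

Note the double equals sign for comparisons; a single = is reserved for assignment.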

Dropping variables
Another way in which you may need to make your dataset smaller is by dropping variables that are not useful to your research. It may be that the information contained in a given variable is duplicated (i.e. another variable provides the same info), or maybe all the observations for a variable are missing, or a variable just happens to be in your dataset but is irrelevant to your research. Dropping variables is very straightforward: simply use the drop command.
Looking at the data from CCHS, the variable SLP_01 (Number of hours spent sleeping per night) is coded as .a (NOT APPLICABLE) for each observation in the dataset.

Clearly we will not learn anything from that variable, so we can drop it. The syntax for dropping a variable is simple:
drop varlist

Where varlist is the list of variables you would like to drop. It's easy to drop a number of variables at a time this way. Here I am dropping all the variables that were coded as Not Applicable for more than 95% of observations[10]:
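The actual list of dropped variables was in a screenshot and is not recoverable, so the sketch below uses invented data. SLP_01 is the variable named above; SLP_02 is a hypothetical second variable added only to show several names in one drop command.

```stata
* Toy data: two variables that are entirely "not applicable" (.a)
clear
input byte(caseid SLP_01 SLP_02 HWTGBMI)
1 .a .a 22
2 .a .a 25
3 .a .a 31
end

* Check how many observations actually carry information
count if !missing(SLP_01)

* Drop several variables in a single command
drop SLP_01 SLP_02
describe
```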

Transforming variables
Sometimes variables are not coded the way you want them to be. In this section we will look at two transformations you may need to do on some variables before using them: recode and destring.
The recode command changes the values of numeric variables according to the rules specified. In the CCHS dataset, many variables have missing values coded as .a or .d. This is convenient because it will not affect calculations you might do using the data (for example if you calculate an average). However, many datasets use 999 as a missing value code, and that might be problematic. We might want to recode these as . in order to not have them affect any calculations we plan on doing with the data. The syntax for this command is:
recode varlist (old value(s) = new value)[11]
Let's recode the height and BMI variables from the CCHS data (for the sake of illustration, since it's really not necessary in this case):
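The command in the screenshot is not recoverable. The following sketch shows the general pattern on invented data in which 999 stands for a missing value; the variable names follow the CCHS naming used elsewhere in this guide.

```stata
* Toy data in which 999 is used as a missing-value code
clear
input float(HWTGHTM HWTGBMI)
1.65 22.4
999  999
1.72 999
end

* Turn every 999 into Stata's system missing value
recode HWTGHTM HWTGBMI (999 = .)

* The means are no longer distorted by the 999 codes
summarize HWTGHTM HWTGBMI
```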

The destring command allows you to convert data saved in the string format (i.e. alphanumeric) into a numerical format. The CCHS dataset does not contain any string variables. In order to see what a string variable looks like, we can use the converse command, tostring, to create a string variable. We will then convert that variable back to a numerical format.

A string variable shows up in red in the data editor:

Although it may look the same as the variable CIH_2, Stata cannot do any calculations on the string variable (since its format is telling Stata that it is made of letters or other symbols). Let's destring it:

Notice the use of the options generate and replace. When we created the fake string variable, we used generate because we wanted a new separate variable. Now, when we destring, we are replacing the string variable by its numerical counterpart. How you choose to do this in your own dataset depends on how you plan to use the variables. Will you still have any use for the string variable? If so, generate a new one when you destring. Do you just want that variable to not be in string format? Then replace it with the new one.
Here, we can see that our variable string is now completely identical to the variable CIH_2:
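The round trip described in this section can be sketched as follows on invented data. The name CIH_2_str is a hypothetical choice for the temporary string variable; the name used in the original screenshots may differ.

```stata
* Toy numeric variable standing in for CIH_2
clear
input byte CIH_2
1
2
3
2
end

* Create a string copy; generate() leaves the original untouched
tostring CIH_2, generate(CIH_2_str)
describe CIH_2 CIH_2_str

* Convert the string copy back to numeric, overwriting it in place
destring CIH_2_str, replace
describe CIH_2 CIH_2_str
```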

(We can drop that variable now.)
Dealing with outliers
Outliers deserve their own section because there is often confusion as to what exactly constitutes an outlier. An outlier is NOT an observation with an unusual but possible value for a variable[12]; rare events do occur. The outliers you should be concerned about are the ones that come from coding error. How do you tell which is which? Common sense goes a long way here.
First, look at your data using the data editor (browse). Outliers tend to jump out at you. If you have a small dataset, you can also tabulate each of your variables:
tab1 varlist[13]
Tabulating a variable will give you a list of all the values that variable takes in the dataset. Outliers will be the extreme values. Look at the order of magnitude. Are these values believable?
If the dataset is very big, however, it may not be practical to stare at all the values a variable can take. In fact, Stata will not tabulate if there are too many different values.
You can look at your data in a scatterplot:

In the CCHS dataset, caseid is the individual id, while hwtghtm is the height in meters. The graph tells us there are no outliers in this dataset:
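The graph itself is not reproduced here. A sketch of the two checks described so far, on invented heights, might look like this. tab1 produces a separate one-way table for each variable listed, and scatter plots the first variable against the second.

```stata
* Invented heights; caseid and hwtghtm follow the names used in the text
clear
input long caseid float hwtghtm
1 1.55
2 1.62
3 1.62
4 1.70
5 1.75
6 1.80
end

* One-way frequency table: extreme values sit at the top or bottom
tab1 hwtghtm

* Height against id: an outlier would sit far from the cloud of points
scatter hwtghtm caseid
```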

Another way to look for outliers is to summarize the observations for a variable, using the detailed option:

The result window will show the main percentiles of the distribution (including the median, 50%), the first four moments, as well as the four smallest and four largest observations:
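The output shown in the original is an image. As a sketch on invented data, the command and two of the results it stores could be used like this; r(p50) and r(max) are results that summarize saves when the detail option is given.

```stata
* Invented heights in meters
clear
input float hwtghtm
1.52
1.60
1.65
1.70
1.78
end

summarize hwtghtm, detail

* Stored results can be reused without reading them off the screen
display "median = " r(p50)
display "largest value = " r(max)
```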

Clearly, there are no outliers. Let's imagine for a moment that the 99th percentile of the height distribution includes an observation with 5.2 m entered as the height. Is it plausible that there really was a 5.2 m woman recorded in this dataset? Look at the order of magnitude by which this observation would differ from the second largest. It's almost 50 standard deviations bigger...
What should you do with such an observation? There are a number of solutions but none is perfect:
Drop it from your dataset (drop if hwtghtm>1.803)
Use the if qualifier to exclude it when generating statistics that use the height variable (command if hwtghtm<=1.803)
Ignore it if the height variable is not actually that important in your research and the rest of the variables for this observation are coded just fine
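The first two options can be sketched on invented data as follows. The cut-off of 2.5 m is a hypothetical threshold chosen for this toy example, not the value used above. One caution applies to both: Stata treats a missing value as larger than any number, so a bare "greater than" test in a drop command also removes observations with a missing height unless you exclude them explicitly.

```stata
* Invented heights: one impossible value and one missing value
clear
input float hwtghtm
1.55
1.62
1.70
5.2
.
end

* Option 2: leave the data alone, exclude the value from a statistic
summarize hwtghtm if hwtghtm < 2.5

* Option 1: drop the impossible value but keep the missing height
drop if hwtghtm > 2.5 & !missing(hwtghtm)
```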
Creating new variables
There are two main commands you need to know to generate new variables: gen is for the basics, while egen allows you to get pretty fancy. You can combine these with qualifiers such as if or in as well as prefixes such as by and bysort[14].
For example, say you want to create a variable that tells you whether the women in the dataset have a live-in partner. While there is no surefire way to establish that, we will approximate it by assuming that women who indicated their marital status as married or common-law actually live with their spouse or common-law partner:

The first line creates the variable livein and assigns it a value of 1 if the value of the marital status variable (dhhgms) is either 1 (married) or 2 (common-law). The second line replaces the missing value code by 0, making the livein variable binary.
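The two lines are not reproduced in this copy. Based on the description, they might look like the sketch below, run here on invented marital-status codes; codes 3 and 4 are placeholders for the other categories.

```stata
* Invented data: 1 = married, 2 = common-law, 3 and 4 = other statuses
clear
input byte dhhgms
1
2
3
4
end

* Line 1: flag women who are married or in a common-law relationship
gen livein = 1 if dhhgms == 1 | dhhgms == 2

* Line 2: everyone not flagged gets 0, which makes the variable binary
replace livein = 0 if livein == .
```

If dhhgms itself can be missing, the second line also assigns 0 to those observations, so you may prefer to leave livein missing in that case.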
Now, let's say you would like to create a categorical variable that tells you, by age group, if a woman is below or above average in terms of body mass index (BMI).

The first line of command creates a variable (meanbmi) which takes on a unique value for each age group, the average BMI for that age group. The prefix bysort is a combination of by and sort; you could equivalently break it into two commands:
sort DHHGAGE
by DHHGAGE: egen meanbmi=mean(HWTGBMI)
The sort part of the command organizes the observations according to the variable DHHGAGE, from smallest to largest, a step required before doing any action by the variable. It's usually easier to just use bysort.
The second and third lines (starting with gen) create a binary variable which equals 0 if an observation has a BMI lower than the average for her age group, and 1 if her BMI is above her age group average.
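A sketch of these steps on invented data follows. The exact commands in the original screenshot are not recoverable; this version adds a !missing() guard to the last line, because Stata treats a missing BMI as larger than any number and would otherwise code it as above average.

```stata
* Invented data: two age groups, one woman with a missing BMI (.d)
clear
input byte DHHGAGE float HWTGBMI
3 20
3 24
4 22
4 30
4 .d
end

* Average BMI within each age group (missing values are ignored by mean())
bysort DHHGAGE: egen meanbmi = mean(HWTGBMI)

* 0 = below the group average, 1 = above it, missing stays missing
gen bmicat = 0 if HWTGBMI < meanbmi
replace bmicat = 1 if HWTGBMI > meanbmi & !missing(HWTGBMI)
```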
Moving variables
Now that you have created these new variables, it would be nice to make sure that the rules by which you generated them were correct. Ideally, you would like to look at livein (the new variable based on marital status) and dhhgms (the marital status variable). However, it's hard to compare two variables unless they are side by side. You can use the order command to move a variable (i.e. move a column of your dataset).
When you create a variable, by default it becomes the last column of your dataset. You can move it next to another variable instead:

Now if we look at our dataset, we can compare the new variable to the old and make sure that we coded it properly:

Similarly, since our two new variables pertaining to BMI are now the last columns, let's move the original BMI variable to the end of the dataset:
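Both moves described in this section can be sketched with the after() and last options of order, which are available in Stata 11 and later. The data and the rule used to build bmicat below are invented placeholders; only the order commands matter here.

```stata
* Invented data with the new variables created at the end
clear
input byte(caseid dhhgms hwtgbmi geoprv)
1 1 22 35
2 3 27 35
end
gen byte livein = inlist(dhhgms, 1, 2)
gen byte bmicat = hwtgbmi > 25          // placeholder rule for illustration

* Put livein right beside the variable it was built from
order livein, after(dhhgms)

* Send the original BMI variable to the end, next to bmicat
order hwtgbmi, last

list
```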

It is now easy to glance at our new variables:

Do you notice the problem on line 8? The variable bmicat should not be coded 1 if the original BMI variable is coded as a missing value. We can fix this with a quick replace:
replace bmicat=. if hwtgbmi==.d
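The problem arises because Stata treats every missing value, including .a and .d, as larger than any number, so a test such as "BMI greater than the mean" is true for them. The sketch below reproduces the issue on invented data and uses missing(), which catches all missing codes at once rather than .d alone.

```stata
* Invented data: two valid BMIs and two different missing codes
clear
input float(hwtgbmi meanbmi)
20 22
30 22
.d 22
.a 22
end

gen bmicat = 0 if hwtgbmi < meanbmi
replace bmicat = 1 if hwtgbmi > meanbmi   // also true for .d and .a

* Reset bmicat wherever the original BMI is missing, whatever the code
replace bmicat = . if missing(hwtgbmi)
```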
Labelling variables
Whenever you create a new variable, it is a good idea to label it. Why? Having your variables labeled makes it easy for you or anyone else using your dataset to quickly see what each variable represents. You should think of your work as something that people should be able to reproduce. Labeling your variables is a small task that makes it much easier for others to use your data[15].
The syntax for labeling variables is as follows:
label variable varname "label"
In our previous example, the command would look like this:

Note that you can abbreviate this command to lab var:
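The label text used in the original screenshots is not recoverable; the wording below is a hypothetical example. The last two commands go one step beyond this section and attach value labels to the codes, using a label set named yesno that is also an invented name.

```stata
* Invented data for the livein variable created earlier
clear
input byte livein
1
0
end

* Full form
label variable livein "Lives with spouse or common-law partner"

* Abbreviated form, same effect
lab var livein "Lives with spouse or common-law partner"

* Optional: describe the codes themselves with a value label
label define yesno 0 "No" 1 "Yes"
label values livein yesno

describe livein
```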

Renaming variables
You may find that you work faster if your variables have names that you recognize at first glance. In most cases this is by no means a necessary task in cleaning data, but if you use data from another country, for example, you may find that the variable names are in a foreign language, making it very hard to remember. The syntax is as easy as can be:
rename oldname newname
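For instance, on a one-variable toy dataset (the new name marital_status is a hypothetical choice):

```stata
clear
input byte dhhgms
1
2
end

* Give the marital status variable a name that is easier to recognize
rename dhhgms marital_status
describe
```

Any later command in the do-file must then use the new name, since the old one no longer exists.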

Let's see the final do-file
Your do-file may be slightly different from this but it should result in the same final dataset:

Let's try running it in one go to see if it works. Do not highlight any command and click on Execute (Do). Note that whenever Stata encounters the command browse, a data editor will pop up on your screen. Have a look at your data, then close the data editor in order for Stata to continue running the do-file.
Let's also take the time to open our log to see what it looks like and how it could be useful.
Finally, let's look at our final dataset and make sure it contains all the right variables, in the right format.
A few last words
This concludes our workshop but it's only the beginning for you. Learning to use statistical software involves a lot of trial and error, angry googling, and desperately trying to find someone who knows how to write a loop. Listed below are a few excellent resources to further your working knowledge of Stata:
UCLA: http://www.ats.ucla.edu/stat/stata/default.htm

Princeton: http://data.princeton.edu/stata/default.html
http://www.princeton.edu/~otorres/Stata/statnotes

LSE: http://personal.lse.ac.uk/lembcke/ecStata/2009/MResStataNotesJan2009PartA.pdf
http://personal.lse.ac.uk/lembcke/ecStata/2009/MResStataNotesFeb2009PartB.pdf

University of North Carolina at Chapel Hill: http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial

Stata: http://www.stata.com/support/faqs/

[1] There is an assumption here that you already have a dataset. If you do not and you need assistance assembling data, please visit the data library (THIS COMMENT NEEDS TO REFERENCE THE GUIDE ON HOW TO DOWNLOAD A DATASET FROM SDA)
[2] You can use other text editors to create and manage do-files. For example, Smultron is open-source software that works well with Stata.
[3] You can see the size of a dataset by right-clicking on it, then selecting properties.
[4] You should create a folder in an easy-to-remember location (desktop works well) for your Stata work. Then check its properties by right-clicking on it, and copy the location. That's your path.
[5] , replace is optional here but rather useful if you want to keep just one log per do-file. If you don't have the , replace option, you will need to modify the name of the log every time you run the do-file.
[6] However, if a do-file is interrupted because of an error and a log is open, you will need to close it before running the same do-file again, because one of the first commands of the do-file is to start a log, which will result in an error message unless the previous log is closed. Simply type the command log close in the command window, or highlight it and execute from your do-file.
[7] Note to users of this guide: this command would typically be located towards the end of the do-file. I have created a screenshot here with a new do-file only to show one command alone. All the examples in this guide that similarly use a new do-file with only one command were done that way to save space. The goal of this workshop is to learn to create a cleaning do-file, in which commands are listed one after the other. I trust that users can understand the commands well enough by the end of the workshop to assemble them in the order that is logical for the purpose of their own task.
[8] The examples in this guide were created using a customized subset of the Canadian Community Health Survey (CCHS), annual component, 2007-2008, available through the Data Liberation Initiative (DLI) and downloaded using SDA@CHASS.
[9] See the Stata help files on expressions and operators: type help exp and help operator in the command screen.
[10] There is no rule of thumb at play here; I simply picked a list of variables that contained little useful information. Sometimes, the fact that only a small number of observations contain information IS informative, in and of itself. Do not drop variables that tell you something important.
[11] Note that you can also use this command to make groups. The CCHS dataset already has age by age group but if you had a variable for actual age, you could generate an age group variable using recode. See the Stata help sheet (help recode) for more options.
[12] Admittedly, these are indeed outliers, just not the type we want to do anything about. Leave those alone. Dealing with true events in any way is likely to do more harm than good as you would truncate your dataset, potentially creating bias in your analysis later.
[13] You replace varlist with the list of the variables you want tabulated, as in the drop example.
[14] All of these commands, qualifiers and prefixes have Stata help files. Have a look at them for a more in-depth presentation.
[15] Knowing how to label variables can also be useful if the data was not provided to you with a dictionary file; you can then use the questionnaire to build labels for all your variables of interest, just as a dictionary file would do.