
URBP 204A

Fall 2009

Greg Newmark

SPSS and Data Cleaning Tutorial


This tutorial has two main purposes. The first is to expand your skills with SPSS and the second is to introduce you to the challenges of data cleaning.

First things first: launch SPSS, open the file 2009 FWBT Survey Team Results.sav in the Newmark folder on the ) drive, and save it under a new name on your desktop. This is the raw compilation of everyone's data.

You should be in the 'Data View' page (you will know this because there is a 'Data View' tab in yellow at the bottom left side of the page). If you are in the 'Variable View', click the 'Data View' tab. The data is arranged with individual cases in numbered rows and the variables in columns.

Now, look at the toolbar for an icon that looks like a price tag. When you put the cursor over it, the heading "Value Labels" will appear. Click this icon and all of the labels that I programmed in will appear. This will make it a lot easier to understand what you are looking at.

Now, just for fun, right-click on one of the variable names, then select 'Sort Ascending' or 'Sort Descending.' This feature sorts all the cases based on their individual values for that variable. This will become very useful soon.

Now, click on the 'Variable View' tab. This shows you all of the variables, their code names, and what they really stand for. Pick a variable and click on its 'Values' cell. A small box with an ellipsis will appear. Click that box. This shows how I coded in the labels for each value. Now, click the 'Data View' tab again and we will get down to the business of cleaning the data.

Data Cleaning


Even the best-planned survey will have strange stuff in the data. Often this is due to transcription error (the problems of copying information from one place to the next, like the game 'telephone' but without talking). A critical activity upon first getting the data from a survey is to clean it.

Frequency (Analyze > Descriptive Statistics > Frequencies)

The simplest way to scan your data for weird stuff is to check the Frequencies for each variable. This lets you quickly see if anything unexpected entered the data set.

When you go to Frequencies, just select the variable (or variables) that you want to look at and push the blue arrow button to move them into the box on the right. When you have all of the variables that you want to use in the box, click OK.
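For readers who also work outside SPSS, the same kind of check can be sketched in plain Python. This is only an illustration and not part of the SPSS workflow: the `responses` list is made-up toy data, and the function names `frequency_table` and `rare_values` and the 5 percent cutoff are assumptions chosen for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent, valid_percent).

    None marks a missing answer: it counts toward the total used for
    'percent' but is left out of 'valid_percent'.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    for value in sorted(counts):
        n = counts[value]
        rows.append((value, n,
                     round(100 * n / total, 1),
                     round(100 * n / len(valid), 1)))
    return rows

def rare_values(values, cutoff_percent=5.0):
    """Values whose valid percent is below the cutoff: candidates for a closer look."""
    return [value for value, _, _, valid_pct in frequency_table(values)
            if valid_pct < cutoff_percent]

# Hypothetical toy data: codes 1-3 are expected, 33 is a likely typing slip.
responses = [1] * 10 + [2] * 8 + [3] * 6 + [33] + [None] * 2

for row in frequency_table(responses):
    print(row)
print("Look more closely at:", rare_values(responses))
```

A low count does not prove an error. It only tells you which values deserve a second look.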


One tip-off of problems is if the frequencies are particularly low for a certain response.

Census Tract

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid     5014.00          217      45.1            45.9                 45.9
          5015.01          109      22.7            23.0                 68.9
          5015.02           50      10.4            10.6                 79.5
          5036.01           94      19.5            19.9                 99.4
          5051.01            3        .6              .6                100.0
          Total            473      98.3           100.0
Missing   System             8       1.7
Total                      481     100.0

This was the initial frequency table for our data. Can you spot the problem? One census tract only has three surveys from it. That is strange given how many surveys are from the other census tracts. If you look up the yellow sheet that we passed out at the survey (or the file 204A Team Survey Areas.xls), it turns out that we did not assign Census Tract 5051.01. So how did it get there? My assumption is that someone accidentally switched the 1 and the 5 when they were entering data, but how can we make sure without going and finding those raw survey forms?

Crosstabs (Analyze > Descriptive Statistics > Crosstabs)

A second useful tool for finding anomalies in the data is Crosstabs. Basically, this tool lets you make a matrix of the frequencies of two variables. When in Crosstabs, click on 'Block Number' and then click on the arrow to put it into the rows box. Then click on 'Census Tract' and click on the arrow to put it into the columns box. Then click OK.

The resulting table shows that the three records for the mysterious 5051.01 tract are all in a block number that appears for both census tracts 5015.01 and 5015.02, but not the other tracts in our survey. This makes me feel pretty confident that it was just a transcription error. I am hoping that the 01 at the end meant that they should have been coded as 5015.01 and not 5015.02, and I am making a decision to change all three cases marked 5051.01 to 5015.01 by hand in the 'Data View.' Try this and then save your data. Repeat the Frequency sorting and see what you find.

Now, take the frequencies for other variables and see if you can find any problems. If you think you have found something, use the Crosstab function to see if it can help shed light on what happened and how we might fix it. When you find an error, or something weird, put your hand up and Greg will call on you and we will discuss how to approach the issue. Greg will enter the class decision into the cleaned data set, which you will use for the Term Project.

Data Recoding

Sometimes the way that data is coded in a survey is not so useful for what you might be interested in. A very important tool that SPSS offers is the ability to recode data. For example, say you are interested in the ages of the population. If you took the frequencies of the raw data, this is what you would get:

Age of the respondent in categories

                                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid     18 to 34 Years                    158      32.8            33.4                 33.4
          35 to 54 Years                    189      39.3            40.0                 73.4
          55 to 74 Years                     95      19.8            20.1                 93.4
          75 Years or Older                  25       5.2             5.3                 98.7
          Can not choose / Refused            6       1.2             1.3                100.0
          Total                             473      98.3           100.0
Missing   System                              8       1.7
Total                                       481     100.0

Remember that you can always recode ordinal variables into nominal variables. So say I was interested in how older people are different from younger people. I might choose to combine categories so that I had one that went from 18 to 54 and another that went from 55 and up.


Recode into Different Variables (Transform > Recode into Different Variables)

Select the variable you want to recode (in our case it is 'Age of Respondent') and click the blue arrow. Now on the right side of the box, enter a name for the new variable, say 'AGEGROUP,' and, for good measure, give it a label, say 'Two Age Groups.' Then click 'Change.' Now click 'Old and New Values' and behold a new window. Now I can look at my survey to see which values I want to recode. This is what the survey question said.

2. What is your age?
   1 = 18 to 34 years
   2 = 35 to 54 years
   3 = 55 to 74 years
   4 = 75 years or older
   7 = Can not choose/Refused

Now, I want to recode the 1 and 2 as a new number, say 0, and the 3 and 4 as a new number, say 1. So for 'Old Value' type 1 and for 'New Value' type 0, then click 'Add.' Then do it again. For 'Old Value' type 2 and for 'New Value' type 0, then click 'Add.' You get the picture. Do the same thing for 3 and 4 but give them the new value of 1.

Now, what do we do with the annoying people that would not share their age? We can dump them. For 'Old Value' type 7 and for 'New Value' click the 'System Missing' radio button, then click 'Add.' Your screen should now look like this.

Click 'Continue' and then click 'OK.' Now if you go to the Variable View and scroll to the bottom, you will see the new variable AGEGROUP. Now, before we forget what we did, click on the 'Value' cell (which currently says 'None'). An ellipsis will appear. Click it. Now the 'Value Labels' dialogue box will open. For 'Value' type in 0 and for 'Label' type in 'Young (18 - 54)' and then press 'Add.' Repeat for 'Value' 1, which we can label 'Old (55+)'. Press 'Add' and your screen should show this:

Click 'OK' and then take the frequencies of this new variable. You should have this:

Two Age Groups

                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Young (18 - 54)          347      72.1            74.3                 74.3
          Old (55+)                120      24.9            25.7                100.0
          Total                    467      97.1           100.0
Missing   System                    14       2.9
Total                              481     100.0

Now, why did we recode this as 0 and 1? Well, that is a convenient way to code a binary nominal variable so that we can use it as a kind of interval variable. There is even a name for this approach: a 'Dummy Variable.' Dummy variables are when you have only two possibilities and you code one option as a 1 (having that trait) and one option as a 0 (not having that trait). The useful part of this is that if you take the mean you get the proportion of respondents who have the trait (multiply by 100 for the percentage). Therefore, you can use this approach to make T-tests and ANOVA tests.
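The recoding and the dummy-variable property can be checked with a short Python sketch. It is an illustration rather than anything SPSS runs: the respondent list is rebuilt from the counts in the age frequency table earlier in this tutorial, and the names `age_counts`, `RECODE` and `agegroup` are assumptions made for the example.

```python
# Counts per original age code, taken from the age frequency table in this tutorial.
# None stands in for SPSS's system-missing value.
age_counts = {1: 158, 2: 189, 3: 95, 4: 25, 7: 6, None: 8}
ages = [code for code, n in age_counts.items() for _ in range(n)]

# Old value -> new value: 1 and 2 become 0 (young), 3 and 4 become 1 (old).
# Any code not listed (7 = refused, or already missing) becomes missing.
RECODE = {1: 0, 2: 0, 3: 1, 4: 1}
agegroup = [RECODE.get(code) for code in ages]

valid = [g for g in agegroup if g is not None]
young = valid.count(0)
old = valid.count(1)
missing = len(agegroup) - len(valid)
mean = sum(valid) / len(valid)

print(young, old, missing)                # counts for the two groups and the missing cases
print(round(100 * mean, 1))               # mean of the dummy, as a percentage
print(round(100 * old / len(valid), 1))   # valid percent of the old group
```

The last two printed numbers agree, and both match the Valid Percent reported for 'Old (55+)' in the Two Age Groups table: the mean of a 0/1 variable is the share of valid cases coded 1.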

Select Cases (Data > Select Cases)


This last tool lets you limit your statistical consideration to certain groups. This could be very useful if you are only interested in, say, immigrants to the US or people who have over a high school education. Basically, you can set the program to filter out unwanted cases. Here's how to filter out all people who do not have a high school degree.

Go to Select Cases and then click on the radio button by 'If condition is satisfied,' then select 'Education Level' and click the blue arrow (on my computer the arrow is black). Now, we know that we coded high school graduates with a 3. So I will select all the cases that are greater than or equal to three. My screen (in SPSS 13) looks like this:

(You can do fancy filters with this feature, particularly if you use the ampersand command to connect multiple conditions. Experiment around a bit if you want to look at a specific group. For example, how could you focus on men born in Mexico who have been in the States for less than two years?)

Now I click 'Continue' and my screen looks like this:


Notice that my unselected cases are not erased, just filtered. If you want to do all of your work on a specific population, it might make sense to click on the 'Deleted' radio button to clear them out of your sample. Now, when you go back to the 'Data View' you will see a diagonal line through all the unselected data row numbers on the left.

When you do your analyses, these cases will not be included. You can return to Select Cases and click on the 'All Cases' radio button if you want to bring those cases back into consideration.
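As a rough analogy in Python, filtering keeps the full data set intact and hands the analysis a selected subset. Everything in this sketch is hypothetical except the rule from the text that education codes of 3 or more mean at least a high school degree: the records, the field names `educ`, `sex`, `birthplace` and `years_in_us`, and the function names are all assumptions made for illustration.

```python
# Hypothetical records; only the "educ >= 3" rule comes from the tutorial.
cases = [
    {"id": 1, "educ": 2, "sex": "M", "birthplace": "Mexico", "years_in_us": 1},
    {"id": 2, "educ": 3, "sex": "F", "birthplace": "US", "years_in_us": None},
    {"id": 3, "educ": 4, "sex": "M", "birthplace": "Mexico", "years_in_us": 1},
    {"id": 4, "educ": None, "sex": "M", "birthplace": "Vietnam", "years_in_us": 12},
]

def select_cases(records, condition):
    """Return the records that satisfy the condition; the original list is left untouched."""
    return [r for r in records if condition(r)]

def high_school_or_more(r):
    # A missing education value cannot satisfy the condition, so the case is left out.
    return r["educ"] is not None and r["educ"] >= 3

def recent_male_arrival_from_mexico(r):
    # Several conditions joined with "and", the counterpart of "&" in the SPSS dialog.
    return (r["sex"] == "M" and r["birthplace"] == "Mexico"
            and r["years_in_us"] is not None and r["years_in_us"] < 2)

selected = select_cases(cases, high_school_or_more)
recent = select_cases(cases, recent_male_arrival_from_mexico)

print([r["id"] for r in selected])
print([r["id"] for r in recent])
print(len(cases))  # the full list is unchanged: cases are filtered, not erased
```

Going back to all cases is simply a matter of using `cases` again, which mirrors clicking the 'All Cases' radio button.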
