
Prepared for KEYS 2.0 Data Analysis and Interpretation, May 8-11, 2007. Culled by Jacques Nacson from various Internet Websites.

Definition of Statistical Terms


Standard Deviation
WHAT IS STANDARD DEVIATION? The standard deviation is the most frequently calculated measure of variability or dispersion in a set of data points. Its value represents, roughly, the average distance of a set of scores from the mean or average score.

Standard deviation and the normal curve


Knowing the standard deviation helps create a more accurate picture of the distribution along the normal curve. A smaller standard deviation represents a data set where scores are very close to the mean score (a smaller range). A data set with a larger standard deviation has scores with more variance (a larger range). For example, if the average score on a test was 80 and the standard deviation was 2, the scores would be more clustered around the mean than if the standard deviation was 10.

Figure 1. The normal curve. Each standard deviation marks a constant interval from the mean.

Calculating the standard deviation


The figure below displays the formula for calculating the standard deviation. (It is much easier than it looks!)


S = standard deviation
Σ = sum of
X = individual score
M = mean of all scores
n = sample size (number of scores)
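The formula figure itself did not survive in this copy. Based on the symbol definitions above and the n-1 denominator discussed in the text, it is presumably the usual sample standard deviation, which can be written as follows (a reconstruction, not the original figure):

```latex
S = \sqrt{\frac{\sum (X - M)^2}{n - 1}}
```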

The best method to calculate the standard deviation by hand is to create an organized chart to perform the necessary calculations. It is necessary first to compute the mean.

X          M      (X-M)    (X-M)²
1          3       -2        4
2          3       -1        1
3          3        0        0
4          3        1        1
5          3        2        4
Total 15            0       10

Take notice of several key points regarding the calculation of the standard deviation. First, the grand total of the score minus the mean (third column) should ALWAYS equal zero. This is a good cross-check to ensure that the mean has been correctly calculated. Second, the purpose of squaring the deviations is to eliminate the negative values so that their grand total does not equal zero. Finally, the denominator is n-1 because the standard deviation is being calculated for a sample. Should the standard deviation be calculated for a population, the denominator would simply be n.

Completing the calculation. Divide the total squared deviations by n-1. That leaves 10/4. Take the square root of 2.5. The standard deviation equals 1.58.
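The hand calculation above can be checked with a short script. This is a minimal sketch that reuses the five scores from the chart; the variable names are illustrative only, and the standard library's `statistics.stdev` is used as an independent cross-check of the n-1 formula.

```python
import math
import statistics

# The five scores from the worked chart above.
scores = [1, 2, 3, 4, 5]

n = len(scores)
mean = sum(scores) / n                              # M = 3.0
deviations = [x - mean for x in scores]             # the (X-M) column
squared_deviations = [d ** 2 for d in deviations]   # the (X-M) squared column

# Cross-check from the text: the deviations must sum to zero.
total_deviation = sum(deviations)

# Sample standard deviation: divide by n-1, then take the square root.
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))

# Population standard deviation: divide by n instead.
population_sd = math.sqrt(sum(squared_deviations) / n)

print("sum of deviations:", total_deviation)
print("sample SD:", round(sample_sd, 2))
print("population SD:", round(population_sd, 2))
print("statistics.stdev:", round(statistics.stdev(scores), 2))
```

Dividing by n instead of n-1 gives a slightly smaller value, which is why the distinction between sample and population matters most for small data sets.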

Kathleen Aarlo, SDSU Educational Technology


Descriptive Statistics: True Mean, Confidence Intervals and Levels of Significance
Probably the most often used descriptive statistic is the mean or average score in a set of data. The mean is a particularly informative measure of the "central tendency" of the variable (set of scores) if it is reported along with its confidence intervals (related to the variability among the scores). Usually we are interested in statistics (such as the mean) from our sample only when they can help us infer information about the population.

The confidence intervals for the mean give us a range of values around the mean where we expect to find the "true" (population) mean (with a given level of certainty). For example, if the mean in a sample is 23, and the lower and upper limits of the p=.05 confidence interval are 19 and 27 respectively (about 2 standard errors of the mean on either side of the mean under the "bell" curve), then you can conclude that there is a 95% probability that the population mean is greater than 19 and lower than 27. If you set the p-level to a smaller value, then the interval would become wider, thereby increasing the "certainty" of the estimate, and vice versa. This concept also helps in understanding researchers when they refer to "levels of significance" between two or more means. As we all know from the weather forecast, the more "vague" the prediction (i.e., the wider the confidence interval), the more likely it will materialize.

Note that the width of the confidence interval depends on the sample size and on the variation of data values. The larger the sample size, the more reliable the mean. As the variation increases, the mean becomes less reliable. The reporting of polling results is another example of a sample statistic that is meaningful in terms of inference to the population only when the confidence intervals are defined. Once again, the larger sample will yield a more reliable mean score since the sampling variation will be smaller. A smaller polling sample will yield a less reliable mean due to the larger sampling variation.
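The relationship between the confidence level and the width of the interval can be illustrated with a small sketch. The scores below are hypothetical (chosen so that the sample mean is 23, as in the example), and the function name `mean_confidence_interval` is an illustrative helper, not something from the KEYS analysis. It assumes the normal ("bell" curve) approximation; for small samples a t-distribution would usually be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) limits for the mean, using a normal approximation."""
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)   # SD of the sample mean
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m + z * standard_error

# Hypothetical scores with a sample mean of 23.
scores = [19, 21, 22, 23, 23, 24, 25, 27]

ci95 = mean_confidence_interval(scores, confidence=0.95)   # p = .05
ci99 = mean_confidence_interval(scores, confidence=0.99)   # p = .01

print("mean:", mean(scores))
print("95% interval:", ci95)
print("99% interval:", ci99)
```

Setting the p-level to a smaller value (99% instead of 95% confidence) produces the wider of the two intervals, matching the description above.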


Correlation Analysis
Correlation is a measure of the relation between two or more variables. Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a perfect negative correlation while a value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation.

The most widely-used type of correlation coefficient is Pearson r, also called linear or product-moment correlation. Pearson correlation determines the extent to which values of the two variables are "proportional" to each other. The value of correlation (i.e., correlation coefficient) does not depend on the specific measurement units used; for example, the correlation between height and weight will be identical regardless of whether inches and pounds, or centimeters and kilograms are used as measurement units. Proportional means linearly related; that is, the correlation is high if it can be "summarized" by a straight line (sloped upwards or downwards).
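The claim that Pearson r does not depend on measurement units can be verified directly. The sketch below uses made-up heights and weights; `pearson_r` is an illustrative helper written from the standard product-moment definition.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 140, 155, 165, 190]

# The same people measured in centimeters and kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)

print("r (inches, pounds):", r_imperial)
print("r (cm, kg):        ", r_metric)
```

The two coefficients agree up to floating-point rounding, because rescaling a variable by a positive constant changes the numerator and denominator of r by the same factor.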



This line is called the regression line or least squares line (related to Regression Analysis). As mentioned above, the correlation coefficient (r) represents the linear relationship between two variables. If the correlation coefficient is squared, then the resulting r² value (called the coefficient of determination) will represent the proportion of common variation in the two variables (i.e., the "strength" or "magnitude" of the relationship).

Regardless of the "strength or magnitude" of a correlation, it is risky and inappropriate to infer a causal or cause-effect relationship between the two variables. Sometimes a correlation may be "spurious"; that is, a correlation that is due mostly to the influences of "other" variables. For example, there is a correlation between the total amount of losses in a fire and the number of firemen that were putting out the fire. If we were to infer a causal relationship, we would conclude that fewer firemen would result in lower losses. However, there is a third variable (the initial size of the fire) that influences both the amount of losses and the number of firemen. If you "control" for this variable (e.g., consider only fires of a fixed size), then the correlation will either disappear or perhaps even change its sign.

The main problem with spurious correlations is that we typically do not know what the "hidden" agent is. However, in cases when we do know, researchers can use partial correlations that control for (or partial out) the influence of specified variables. In the KEYS research the effects of SES were accounted for in examining the correlations between the indicators and measures of student achievement.
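The fire example can be made concrete with a first-order partial correlation. The sketch below uses the standard formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). All numbers are invented for illustration and were constructed so that fire size drives both other variables; none of this data comes from the KEYS research.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """Correlation between xs and ys after partialling out zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data for six fires.
fire_size = [1, 2, 3, 4, 5, 6]       # the "hidden" third variable
firemen = [3, 3, 6, 8, 9, 13]        # number of firemen sent
losses = [11, 18, 31, 39, 52, 59]    # total losses

raw_r = pearson_r(firemen, losses)
controlled_r = partial_r(firemen, losses, fire_size)

print("firemen vs. losses:", round(raw_r, 3))
print("controlling for fire size:", round(controlled_r, 3))
```

In this constructed data set the raw correlation is strong, but it essentially vanishes once fire size is partialled out, which is the "disappear" case described above.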


Factor Analysis
The purpose of factor analysis is to discover patterns of relationships among many variables. In particular, it seeks to discover if the observed variables can be explained largely or entirely in terms of a much smaller number of variables called factors. It is a statistical procedure, involving correlation analysis, used to uncover relationships among many variables. This allows numerous inter-correlated variables to be condensed into fewer dimensions, called factors, or indicators as in KEYS 2.0.

Many statistical methods are used to study the relation between independent and dependent variables. Factor analysis is different; it is used to study the patterns of relationship among many dependent variables, with the goal of discovering something about the nature of the independent variables that affect them, even though those independent variables were not measured directly. Thus answers obtained by factor analysis are necessarily more hypothetical and tentative than is true when independent variables are observed directly. The inferred independent variables are called factors.

A typical factor analysis suggests answers to four major questions:
1. How many different factors are needed to explain the pattern of relationships among these variables?
2. What is the nature of those factors?
3. How well do the hypothesized factors explain the observed data?
4. How much purely random or unique variance does each observed variable include?


Regression Analysis
The most common type of regression analysis is linear regression. There are two kinds of linear regression: 1) simple linear regression, and 2) multiple linear regression (sometimes loosely called multivariate linear regression). Simple linear regression is when you have one dependent variable (also known as an outcome, or response variable) and one independent variable (also known as a predictor or explanatory variable). Multiple linear regression is when you have one dependent variable and two or more independent variables.

One purpose of linear regression analysis is to predict a dependent variable. Suppose you have a data set consisting of the gender, height and age of children between the ages of 5 and 10 years. In simple linear regression, your goal might be to predict the height of a child, given his or her age. In multiple linear regression, you might want to predict the height of a child given age and gender. In the KEYS 2.0 analysis, after the factor analysis that helped us identify the 42 indicators (factors), we ran a series of regression analyses to determine the extent to which each indicator was correlated to two different measures of student achievement.

For those of you who would like to know a bit more about regression analysis, read on. The linear regression model is a mathematical equation for a line. The parameters of the equation are estimated using mathematical formulas that are applied to the data set of gender, height and age of the children ages 5-10. In other words, the linear regression model is "fitted" to the sample data. This can be visualized as a scatter plot with a line running through it. The regression analysis procedure finds the line that best fits the data.

The regression analysis procedure tests the null hypothesis that the slope parameter of the independent variable is 0 versus the alternative hypothesis that the slope parameter is different from 0. If the p-value for the test is less than 0.05 (level of significance), the null hypothesis is rejected and it is concluded that there is a statistically significant association between the dependent variable and the independent variable. In that case, the model may be used to make predictions of the dependent variable. Also, the slope parameter can be interpreted as "the amount of change in the average of the dependent variable for a one-unit increase in the independent variable." Using the example above, suppose the slope parameter for age was 3.5, and assume height was measured in inches. The interpretation of the slope for age is: the average height of a child is expected to increase by 3.5 inches for each additional year of age.
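The fitting step can be sketched with the ordinary least squares formulas for a single predictor. The ages and heights below are hypothetical values constructed so that the fitted slope comes out at the 3.5 inches per year used in the example; they are not real growth data, and the sketch does not perform the significance test described above.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least squares line ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sample: age in years, height in inches.
ages = [5, 6, 7, 8, 9, 10]
heights = [43.5, 46.0, 50.0, 53.5, 56.5, 61.0]

intercept, slope = fit_simple_linear_regression(ages, heights)

# Use the fitted line to predict the average height at age 8.
predicted_height_at_8 = intercept + slope * 8

print("slope (inches per year):", slope)
print("intercept:", intercept)
print("predicted height at age 8:", predicted_height_at_8)
```

The slope is read exactly as in the text: each additional year of age is associated with an expected increase of 3.5 inches in average height within this sample.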


Frequency tables
Frequency or one-way tables represent the simplest method for analyzing categorical data. They are often used as one of the exploratory procedures to review how different categories of values are distributed in the sample. For example, in a survey of parents interested in participating in a school event, we could summarize the respondents' interest in a frequency table as follows:
STATISTICA BASIC STATS    School Event: Interest in Participating

Category                          Count   Cumulative Count   Percent   Cumulative Percent
ALWAYS: Always interested           39           39            39.0           39.0
USUALLY: Usually interested         16           55            16.0           55.0
SOMETIMS: Sometimes interested      26           81            26.0           81.0
NEVER: Never interested             19          100            19.0          100.0
Missing                              0          100             0.0          100.0

The table above shows the number, proportion, and cumulative proportion of respondents who characterized their interest in participating in a school event as either: (1) Always interested, (2) Usually interested, (3) Sometimes interested, or (4) Never interested. In practically every research project (including action research conducted by school staff), a first "look" at the data usually includes frequency tables. For example, if we were to survey school parents, frequency tables can show the number of males and females who participated in the survey, the number of respondents from particular ethnic and racial backgrounds, and so on. Responses on some labeled attitude measurement scales (e.g., interest in volunteering in some school activity) can also be nicely summarized via the frequency table. Customarily, if a data set includes any categorical data, then one of the first steps in the data analysis is to compute a frequency table for those categorical variables.
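A frequency table of this kind is straightforward to compute. The sketch below rebuilds the counts, cumulative counts, and percentages from a list of raw responses; the response list is reconstructed from the counts in the table (39, 16, 26, 19) purely for illustration, and the category labels are simplified.

```python
from collections import Counter
from itertools import accumulate

# Raw survey responses, reconstructed from the counts in the table.
responses = (["ALWAYS"] * 39 + ["USUALLY"] * 16 +
             ["SOMETIMES"] * 26 + ["NEVER"] * 19)

# Keep the categories in their natural order, not alphabetical order.
categories = ["ALWAYS", "USUALLY", "SOMETIMES", "NEVER"]

counts = Counter(responses)
total = len(responses)

count_list = [counts[c] for c in categories]
cumulative_counts = list(accumulate(count_list))

# One row per category: (category, count, cumulative count, percent, cumulative percent)
rows = [
    (c, k, cum, 100 * k / total, 100 * cum / total)
    for c, k, cum in zip(categories, count_list, cumulative_counts)
]

print(f"{'Category':<10}{'Count':>7}{'Cum.':>7}{'Pct':>8}{'Cum. pct':>10}")
for category, count, cum, pct, cum_pct in rows:
    print(f"{category:<10}{count:>7}{cum:>7}{pct:>8.1f}{cum_pct:>10.1f}")
```

Because the sample happens to contain exactly 100 respondents, the counts and percentages coincide here; with any other sample size the two columns would differ.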
