Lec 3 - Continuous Probabilities

Continuous Probabilities: Normal Distribution, Confidence Intervals for the Mean, and Sample Size
The Normal Distribution
Normal (Gaussian) distribution: a symmetric distribution, shaped like a bell, that is completely described by its mean and standard deviation.
There are an infinite number of normal curves.
To be useful, the normal curve is standardized to a mean of 0 and a standard deviation of 1.
This is called a standard normal curve.
To use the standard normal curve, data must first be converted to z scores.
Z score: a transformation that expresses data in terms of standard deviations from the mean.
$$
z = \frac{x - \mu}{\sigma} \quad \text{(for a population)} \qquad \text{or} \qquad z = \frac{x - \bar{x}}{s} \quad \text{(for a sample)}
$$
For example: We have a sample that has a mean of 8 and a standard deviation of 2.53. What is the z score of an observation from this data set that has a value of 13?
$$
z = \frac{13 - 8}{2.53} = 1.976 \quad \text{or} \quad 1.98
$$
Therefore, a value of 13 in this data set is 1.98 standard deviations from the mean.
We can use the z table to find out the probability of picking a number >= 13 from this data set.
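The z score arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `z_score` is an illustrative choice and does not come from the lecture.

```python
def z_score(x, mean, sd):
    """Express x as a number of standard deviations from the mean."""
    return (x - mean) / sd

# Values from the worked example: mean 8, standard deviation 2.53, observation 13.
z = z_score(13, 8, 2.53)
print(round(z, 3), round(z, 2))
```

Rounding to two decimals gives the value that is looked up in the z table.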
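[Figure: normal curve split at the value 13 (z = 1.98); the total area under the curve is 1.0 or 100%. Probabilities are read from the Standard Normal Probabilities table, Table A.2.]
Probability of < 13: p = (1 - 0.0239) = 0.9761, a 97.6% chance of picking a value < 13.
Probability of >= 13: p = 0.0239, a 2.4% chance of picking a value >= 13.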
Probability density functions (e.g. normal distribution) are used to determine the probabilities that an event will and will not occur.
So for picking a value >= 13:
97.6% chance that it will not occur.
2.4% chance that it will occur.
So if it is improbable that an event will occur (and a 2.4% chance IS improbable), and it DOES occur, that is of interest.
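The table lookup can be approximated in Python with `statistics.NormalDist` from the standard library (Python 3.8 or later), which stands in here for Table A.2. This is a sketch that assumes z = 1.98 from the example; the variable names are illustrative.

```python
from statistics import NormalDist

z = 1.98  # z score of the value 13 from the worked example

standard_normal = NormalDist(mu=0, sigma=1)
p_below = standard_normal.cdf(z)   # probability of picking a value < 13
p_at_or_above = 1 - p_below        # probability of picking a value >= 13

print(round(p_below, 4), round(p_at_or_above, 4))
```

The two probabilities always sum to 1, which is why the table only needs to list one tail.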
Confidence Intervals about the Mean
Any time a large number of independent, identically distributed variables are summed, the sum will have an approximately normal distribution.
Independent means that one observation does not influence the value of another observation.
Identically distributed means that each observation is from the same frequency distribution.
So if we take many samples and compute many means, the average of those means will be close to the true mean.
Experimental Data Set:
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9
For these data the true mean is 5 and the true SD is 2.
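$$
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{130}{26} = 5
$$
$$
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{(1 - 5)^2 + (2 - 5)^2 + \cdots + (9 - 5)^2}{26 - 1}} = \sqrt{\frac{100}{25}} = 2
$$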
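The mean and standard deviation of the experimental data set can be reproduced directly. The sketch below follows the two formulas step by step and then cross-checks the result against `statistics.stdev`, which also divides by n - 1; the variable names are illustrative.

```python
import math
import statistics

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,
        6, 6, 6, 6, 7, 7, 7, 8, 8, 9]

n = len(data)
mean = sum(data) / n                              # sum of x_i divided by n
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
sd = math.sqrt(sum_sq_dev / (n - 1))              # divide by n - 1, then take the root

# The standard library gives the same n - 1 result.
assert math.isclose(sd, statistics.stdev(data))
print(n, mean, sum_sq_dev, sd)
```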
If we take a sample of these data, the mean of that sample should be close to the true mean (5).
Sample Data: 8, 7, 6, 6, 6, 5, 5, 4, 3, 3, 3, 1
$$
\sum x_i = 57, \qquad n = 12
$$
$$
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{57}{12} = 4.75
$$
If we continue to take samples from this data set, we will have a data set of sample means.
The data set of sample means should be approximately normally distributed.
As the number of sample means increases, the mean of the sample means will approach the true mean.
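The behaviour described above can be simulated. The sketch below repeatedly draws random samples from the experimental data set and averages their means. The sample size of 12, the seed, and the function name `mean_of_sample_means` are arbitrary choices made for this illustration.

```python
import random
import statistics

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,
        6, 6, 6, 6, 7, 7, 7, 8, 8, 9]

rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)

def mean_of_sample_means(num_samples, sample_size=12):
    """Draw num_samples random samples (without replacement) and average their means."""
    sample_means = [statistics.mean(rng.sample(data, sample_size))
                    for _ in range(num_samples)]
    return statistics.mean(sample_means)

# More sample means should bring the average closer to the true mean of 5.
for num_samples in (10, 100, 1000, 5000):
    print(num_samples, round(mean_of_sample_means(num_samples), 3))
```

Individual runs with few samples can still land some distance from 5; the convergence is a tendency, not a guarantee for any single run.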
Time for the numbered chit experiment.
Based on the normality of random sample means we can construct confidence intervals about a sample mean.
In a normal distribution, 95% of the data fall within 1.96 (approx. 2) standard deviations of the mean.
This implies that 95% of the time the sample mean lies within + or - 1.96 standard errors (standard deviations of the sample mean, $s/\sqrt{n}$) of the true mean.
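We can calculate this range using the equation:
$$
pr\left[\bar{x} - z_{\alpha}\left(\frac{s}{\sqrt{n}}\right) \le \mu \le \bar{x} + z_{\alpha}\left(\frac{s}{\sqrt{n}}\right)\right] = 1 - \alpha
$$
where $z_{\alpha}$ is taken from Table A.2.
By convention we use either 95% (z = 1.96) or 99% (z = 2.58).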
So for the example data set the 95% confidence intervals would be:
$$
pr\left[5 - 1.96\left(\frac{2}{\sqrt{26}}\right) \le \mu \le 5 + 1.96\left(\frac{2}{\sqrt{26}}\right)\right] = 95\%
$$
$$
pr\left[4.23 \le \mu \le 5.77\right] = 95\%
$$
However, the normal distribution can only be used when the sample size is large.
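The normal-based interval for the example data set can be computed as follows. This is a sketch: the function name is illustrative, and `NormalDist().inv_cdf` from the Python standard library is used in place of a lookup in Table A.2.

```python
import math
from statistics import NormalDist

def normal_confidence_interval(mean, sd, n, confidence=0.95):
    """Confidence interval for the mean using the standard normal z value."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)       # z times the standard error s / sqrt(n)
    return mean - half_width, mean + half_width

# Example data set: mean 5, SD 2, n = 26.
low, high = normal_confidence_interval(mean=5, sd=2, n=26)
print(round(low, 2), round(high, 2))  # 4.23 5.77
```

Passing `confidence=0.99` uses the other conventional level, with a z value of about 2.58.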
For smallish sample sizes we use the t distribution.
T distribution: a symmetric distribution, with heavier tails than the normal distribution, that is completely described by its mean and standard deviation for k degrees of freedom or df (we will discuss this term in more detail later).
The df for confidence intervals is n - 1. So for our example the df = 26 - 1 = 25.
Use a 2-tailed probability of 0.05 (1 - 0.95).
Again, we use the 2-tailed values since we are calculating confidence intervals that lie above and below the mean.
Therefore the calculation for the t distribution is:
$$
pr\left[5 - 2.060\left(\frac{2}{\sqrt{26}}\right) \le \mu \le 5 + 2.060\left(\frac{2}{\sqrt{26}}\right)\right] = 95\%
$$
$$
pr\left[4.19 \le \mu \le 5.81\right] = 95\%
$$
Comparing the two intervals:
$$
\begin{aligned}
pr\left[4.23 \le \mu \le 5.77\right] &\quad \text{for the normal distribution} \\
pr\left[4.19 \le \mu \le 5.81\right] &\quad \text{for the t distribution}
\end{aligned}
$$
Note that the t distribution is more conservative (wider) than the normal distribution for small sample sizes.
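The comparison between the two intervals can be sketched in Python. The standard library has no t distribution, so the critical values are hard-coded from the lecture (1.96 for the normal, 2.060 for t with df = 25). Both intervals here use the standard error s/√n with n = 26, and the function name is illustrative.

```python
import math

def confidence_interval(mean, sd, n, critical_value):
    """Interval mean +/- critical_value * sd / sqrt(n)."""
    half_width = critical_value * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

Z_95 = 1.96        # standard normal, 95%, value quoted in the lecture
T_95_DF25 = 2.060  # t table, 2-tailed 0.05, df = 25, value quoted in the lecture

z_low, z_high = confidence_interval(5, 2, 26, Z_95)
t_low, t_high = confidence_interval(5, 2, 26, T_95_DF25)

print("normal:", round(z_low, 2), round(z_high, 2))
print("t:     ", round(t_low, 2), round(t_high, 2))
print("t interval is wider:", (t_high - t_low) > (z_high - z_low))
```

Because the t critical value is always larger than the corresponding z value for finite df, the t interval is wider for the same data.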
In SPSS
SPSS allows you to calculate any confidence interval but defaults to 95% intervals.
SPSS uses the equation:
$$
\bar{x} + g_{(\alpha/2;\,0.05,\,d)}\, sd \le p \le \bar{x} + g_{(1 - \alpha/2;\,0.5,\,d)}\, sd
$$
where $g_{(\alpha/2;\,0.05,\,d)}$ is from Odeh and Owen (1980, Table 1),
which is equivalent to the t distribution confidence intervals. So checking your work with SPSS is only good when calculating the t distribution confidence intervals.
Sample Size and Test Power
Sample Size and Estimating Population Parameters
The question often arises concerning how many samples are needed, or what is the minimum sample size?
Many books state that 30 samples are the minimum to confidently perform a statistical analysis.
However, the minimum number of samples is related to the concept of precision and minimum detectable difference:
If your measurements are in °F, smaller sample sizes may only allow one to detect differences with a precision of 5°, while large sample sizes may allow for detection to less than 0.5°.
The power to detect differences is not a linear function of sample size.
Also note that in this case after about 20 samples the power to detect a difference increases very slowly.
Gathering more than 20 samples in this case is probably a waste of time.
Source: Mimna, 2008
The minimum sample size required to detect a difference at a specific precision level can be estimated using the equation:
$$
\delta \ge \sqrt{\frac{2 s_p^2}{n}}\,\left(t_{\alpha,\nu} + t_{\beta(1),\nu}\right)
$$
where $\delta$ is the detectable difference, $s_p^2$ is the sample variance, $n$ is the sample size, and $t_{\alpha,\nu}$ and $t_{\beta(1),\nu}$ are the precision parameters taken from the t distribution table.
If this equation is solved several times for various sample sizes then a sample size function curve can be created.
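Squaring both sides of the relationship between the detectable difference δ, the variance, and n, and then solving for n, gives the same relationship in sample-size form. This rearrangement is a sketch that treats the two t values as fixed. In practice they depend on ν, and therefore on n, so the sample-size form is usually solved iteratively.

```latex
\[
\delta^2 \ge \frac{2 s_p^2}{n}\left(t_{\alpha,\nu} + t_{\beta(1),\nu}\right)^2
\quad\Longrightarrow\quad
n \ge \frac{2 s_p^2}{\delta^2}\left(t_{\alpha,\nu} + t_{\beta(1),\nu}\right)^2
\]
```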
For example, we are interested in the mean age difference between males and females in New Jersey counties. We want to be 90% sure of detecting a difference at a significance level (more on this later) of 0.05. The $s^2$ for the entire data set is 5.522.
Average Male Age    Average Female Age
36.6                37.5
33.5                36.0
37.7                40.3
36.7                38.4
36.9                38.7
33.1                36.1
32.3                35.0
38.2                39.3
36.3                38.0
35.3                37.9
34.5                36.8
36.6                38.8
34.6                37.3
35.8                38.4
39.0                42.9
34.4                37.1
35.2                37.0
36.8                39.1
35.9                37.9
34.1                37.4
40.7                43.8
From this table we choose values bounding our 90% confidence level (1 - 0.90 or 0.10) at our sample size (42). Since we are calculating upper and lower intervals, use the 2-tailed probabilities.
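$$
\begin{aligned}
n = 5: \quad & \delta \ge \sqrt{\frac{2(5.522)}{5}}\,(2.018 + 1.302) = 4.93 \text{ years} \\
n = 10: \quad & \delta \ge \sqrt{\frac{2(5.522)}{10}}\,(2.018 + 1.302) = 3.49 \text{ years} \\
n = 15: \quad & \delta \ge \sqrt{\frac{2(5.522)}{15}}\,(2.018 + 1.302) = 2.85 \text{ years} \\
n = 20: \quad & \delta \ge \sqrt{\frac{2(5.522)}{20}}\,(2.018 + 1.302) = 2.47 \text{ years} \\
n = 50: \quad & \delta \ge \sqrt{\frac{2(5.522)}{50}}\,(2.018 + 1.302) = 1.56 \text{ years} \\
n = 100: \quad & \delta \ge \sqrt{\frac{2(5.522)}{100}}\,(2.018 + 1.302) = 1.1 \text{ years}
\end{aligned}
$$
The values 2.018 and 1.302 are from the t table.
Notice that all we are changing is the sample size, and that the rate of change in the minimum detectable difference decreases.
So with a sample of 5 we can only detect differences of > 4.93 years.
With a sample of 100 we can now detect differences down to > 1.1 years.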
So if we need to detect a difference of at least 2 years, we'd have to have a sample size of about 32.
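The repeated calculation behind the sample size curve can be scripted. The sketch below evaluates the minimum detectable difference for several sample sizes using the variance (5.522) and the two t table values (2.018 and 1.302) quoted in the lecture. The function name is illustrative, and the t values are held fixed while n changes, as in the lecture example.

```python
import math

def min_detectable_difference(variance, n, t_alpha, t_beta):
    """delta = sqrt(2 * variance / n) * (t_alpha + t_beta)."""
    return math.sqrt(2 * variance / n) * (t_alpha + t_beta)

VARIANCE = 5.522   # s^2 for the entire data set, from the lecture
T_ALPHA = 2.018    # t table value used in the lecture
T_BETA = 1.302     # t table value used in the lecture

for n in (5, 10, 15, 20, 50, 100):
    delta = min_detectable_difference(VARIANCE, n, T_ALPHA, T_BETA)
    print(f"n = {n:3d}: delta >= {delta:.2f} years")
```

Plotting delta against n for a finer grid of sample sizes would trace out the sample size function curve.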
Note that the curves for the two examples are different; there is no hard and fast minimum sample size number.
[Figure panels: Mimna (2008); Marr (example)]
As the confidence limits are relaxed, the minimum sample size decreases. So in this case, if we wanted a minimum detectable value of 1.5, at 50% confidence we would need a sample of 22, at 75% confidence we would need a sample of 40, and at 95% confidence we would need a sample of 72.
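The direction of this effect can be illustrated with a rough calculation. The sketch below makes several assumptions that are not from the lecture. It swaps the t table values for standard normal quantiles (a large-sample approximation). It treats "confidence" as the probability of detecting the difference. It also reuses the variance 5.522 from the New Jersey example as a placeholder, since the variance behind the plotted curve is not given. The resulting sample sizes therefore will not match the values read from the figure; only the trend is meaningful.

```python
import math
from statistics import NormalDist

def approx_min_sample_size(variance, delta, alpha=0.05, power=0.90):
    """Large-sample approximation: n >= 2 * variance * (z_alpha + z_beta)^2 / delta^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed significance level
    z_beta = z.inv_cdf(power)           # one-tailed; equals 0 when power is 0.5
    return math.ceil(2 * variance * (z_alpha + z_beta) ** 2 / delta ** 2)

# Placeholder variance and a detectable difference of 1.5 (assumed inputs).
for power in (0.50, 0.75, 0.95):
    print(power, approx_min_sample_size(variance=5.522, delta=1.5, power=power))
```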
Minimum sample size is a function of:
The precision (in measured units, e.g. years, °F, etc.).
The confidence level required.
The variance of the data set.
Ultimately though, the minimum sample size should be based on our research precision requirements.
