Chapter Thirteen

Inference about Comparing Two Populations

Inference about the Difference between Two Means

In Chapter 9 (Sampling Distributions) we studied the problem of comparing the means of two populations.

For example, suppose we want to compare the average income for college graduates and college dropouts. Let
$X_1$ = Income for a college graduate
$X_2$ = Income for a college dropout

And let $E(X_1) = \mu_1$ and $E(X_2) = \mu_2$.

Suppose the parameter of interest to us is $\mu_1 - \mu_2$.

Suppose we have an iid sample for $X_1$: $X_{1,i}$, $i = 1, \ldots, n_1$ (the sample size for $X_1$ is denoted by $n_1$).

And we have a separate iid sample for $X_2$: $X_{2,i}$, $i = 1, \ldots, n_2$ (the sample size for $X_2$ is denoted by $n_2$).

The sample sizes can be different. That is, we may have $n_1 \neq n_2$.

Finally, assume that both samples are independent of each other.

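The corresponding sample means are:

$$\bar{X}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1,i} \quad \text{and} \quad \bar{X}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} X_{2,i}$$

Then, the statistic of interest is the difference between these means. That is, $\bar{X}_1 - \bar{X}_2$.

What is the expected value of $\bar{X}_1 - \bar{X}_2$? We already know that $E(\bar{X}_1) = \mu_1$ and $E(\bar{X}_2) = \mu_2$. Therefore,

$$E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$$
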
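Next, what is the variance of $\bar{X}_1 - \bar{X}_2$?

Independence between both samples means that $Cov(\bar{X}_1, \bar{X}_2) = 0$.

Next, recall from our previous lectures that, if $Z_1$ and $Z_2$ are two random variables with zero covariance, then

$$V(a_1 Z_1 + a_2 Z_2) = a_1^2 V(Z_1) + a_2^2 V(Z_2)$$

Therefore, denoting $V(X_1) = \sigma_1^2$ and $V(X_2) = \sigma_2^2$,

$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
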
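In summary,

$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 \quad \text{and} \quad V(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

What about the sampling distribution of $\bar{X}_1 - \bar{X}_2$? The same type of results for a single mean extend to this case:

Case 1. If $X_1$ and $X_2$ are both Normally distributed, then $\bar{X}_1 - \bar{X}_2$ is exactly Normally distributed as

$$N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

Case 2. Otherwise, by the Central Limit Theorem, the above distribution holds approximately, and this approximation is more accurate if $n_1$ and $n_2$ are relatively large.
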
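The mean and variance formulas for $\bar{X}_1 - \bar{X}_2$ can be checked with a small simulation. The sketch below is only an illustration: the population values (means 10 and 7, standard deviations 2 and 3, sample sizes 20 and 30) are hypothetical and do not come from the notes, and the function names are made up for this example.

```python
import random
import statistics

def diff_of_means_moments(mu1, mu2, var1, var2, n1, n2):
    """Theoretical mean and variance of Xbar1 - Xbar2 for independent iid samples."""
    expected = mu1 - mu2
    variance = var1 / n1 + var2 / n2
    return expected, variance

def simulate_diff_of_means(mu1, mu2, sd1, sd2, n1, n2, reps=5000, seed=0):
    """Draw many pairs of independent Normal samples and return the
    simulated mean and variance of Xbar1 - Xbar2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean([rng.gauss(mu1, sd1) for _ in range(n1)])
        xbar2 = statistics.fmean([rng.gauss(mu2, sd2) for _ in range(n2)])
        diffs.append(xbar1 - xbar2)
    return statistics.fmean(diffs), statistics.variance(diffs)

# Hypothetical population values, chosen only for illustration.
theory = diff_of_means_moments(mu1=10.0, mu2=7.0, var1=4.0, var2=9.0, n1=20, n2=30)
simulated = simulate_diff_of_means(10.0, 7.0, 2.0, 3.0, 20, 30)
print("theory   :", theory)
print("simulated:", simulated)
```

With these inputs the theoretical values are a mean of 3 and a variance of 4/20 + 9/30 = 0.5; the simulated values should land close to them, up to sampling noise.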
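Thus, if both $\sigma_1^2$ and $\sigma_2^2$ were known, all inference on $\mu_1 - \mu_2$ would be based on the statistic

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

which would be either exactly distributed as a Standard Normal (if $X_1$ and $X_2$ are Normally distributed), or approximately distributed as a Standard Normal (by virtue of the Central Limit Theorem).
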
Here we focus on the more realistic case where both $\sigma_1^2$ and $\sigma_2^2$ are unknown.

In this setting, the construction of the test statistic and its distribution depend on two possible cases:

Case I: $\sigma_1^2 = \sigma_2^2$
Case II: $\sigma_1^2 \neq \sigma_2^2$

We examine each case separately.

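Case I: Inference for $\mu_1 - \mu_2$ when $\sigma_1^2 = \sigma_2^2$

If we maintain that $\sigma_1^2 = \sigma_2^2$, inference on $\mu_1 - \mu_2$ is based on the following statistic:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$s_p^2$ is called the pooled variance estimator. It is valid if $\sigma_1^2 = \sigma_2^2$.

If both $X_1$ and $X_2$ are Normally distributed, then the statistic $t$ described above is distributed as a Student t random variable with $\nu = n_1 + n_2 - 2$ degrees of freedom.
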
As we have done previously, we will maintain this distribution as the approximate distribution for $t$ even if $X$ is not Normally distributed, keeping in mind that it would hold only approximately, and that the accuracy of this approximation depends on how much the distribution of $X$ differs from Normal, and on the sample size.

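From here, if $\sigma_1^2 = \sigma_2^2$, a Confidence Interval for $\mu_1 - \mu_2$ with $1 - \alpha$ coverage probability is given by

$$\left[(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2,\nu}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},\ \ (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2,\nu}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\right]$$

where $\nu = n_1 + n_2 - 2$.
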
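The Case I formulas (pooled variance, $t$ statistic, and confidence interval) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names are invented here, and the critical value $t_{\alpha/2,\nu}$ is passed in by hand because it is read from a table. The inputs are the summary statistics of the MBA salary example that appears later in these notes, with $t_{.05,48} = 1.676$ used for a 90% interval.

```python
import math

def pooled_variance(s1, s2, n1, n2):
    """Pooled variance estimator s_p^2 (valid when sigma1^2 = sigma2^2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_statistic(xbar1, xbar2, s1, s2, n1, n2, diff0=0.0):
    """Return (t, degrees of freedom) for H0: mu1 - mu2 = diff0."""
    sp2 = pooled_variance(s1, s2, n1, n2)
    std_err = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (xbar1 - xbar2 - diff0) / std_err, n1 + n2 - 2

def pooled_confidence_interval(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """CI for mu1 - mu2; t_crit is t_{alpha/2, nu}, read from a t table."""
    sp2 = pooled_variance(s1, s2, n1, n2)
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Summary statistics from the MBA salary example later in the notes.
t_stat, dof = pooled_t_statistic(65624, 60423, 18985, 16193, 25, 25)
low, high = pooled_confidence_interval(65624, 60423, 18985, 16193, 25, 25,
                                       t_crit=1.676)  # t_{.05,48}, 90% CI
print(round(t_stat, 2), dof)
print(round(low), round(high))
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies; with a statistics library available, it could be computed instead of looked up.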
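Hypothesis testing is done as before:

Fix a significance level $\alpha$ and let $\nu = n_1 + n_2 - 2$. Our rejection rules are:

Reject $H_0: \mu_1 - \mu_2 = \mu_1^* - \mu_2^*$ in favor of $H_1: \mu_1 - \mu_2 > \mu_1^* - \mu_2^*$ if $t > t_{\alpha,\nu}$

Reject $H_0: \mu_1 - \mu_2 = \mu_1^* - \mu_2^*$ in favor of $H_1: \mu_1 - \mu_2 < \mu_1^* - \mu_2^*$ if $t < -t_{\alpha,\nu}$

Reject $H_0: \mu_1 - \mu_2 = \mu_1^* - \mu_2^*$ in favor of $H_1: \mu_1 - \mu_2 \neq \mu_1^* - \mu_2^*$ if $|t| > t_{\alpha/2,\nu}$
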
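P-values are also obtained as previously.

Let $T$ be a t random variable with degrees of freedom given by $\nu = n_1 + n_2 - 2$, and let $t$ be the value obtained for our test statistic in the data observed. Then:

If $H_0: \mu_1 - \mu_2 = \mu_1^* - \mu_2^*$ vs. $H_1: \mu_1 - \mu_2 > \mu_1^* - \mu_2^*$: p-value $= P(T > t)$

If $H_0: \mu_1 - \mu_2 = \mu_1^* - \mu_2^*$ vs. $H_1: \mu_1 - \mu_2 < \mu_1^* - \mu_2^*$: p-value $= P(T < t)$

If $H_0: \mu_1 - \mu_2 = \mu_1^* - \mu_2^*$ vs. $H_1: \mu_1 - \mu_2 \neq \mu_1^* - \mu_2^*$: p-value $= 2\,P(T > |t|)$
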
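Case II: Inference for $\mu_1 - \mu_2$ when $\sigma_1^2 \neq \sigma_2^2$

If $\sigma_1^2 \neq \sigma_2^2$, we employ a different formula for our test statistic. Let

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Now, even if $X$ is Normally distributed, $t$ will not be exactly Student t distributed. However, it is approximately distributed as a Student t with degrees of freedom given by:

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
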
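From here, if $\sigma_1^2 \neq \sigma_2^2$, a Confidence Interval for $\mu_1 - \mu_2$ with $1 - \alpha$ coverage probability is given by

$$\left[(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ \ (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right]$$

where

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
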
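Hypothesis testing is as in the case $\sigma_1^2 = \sigma_2^2$, keeping in mind that now the test statistic is given by

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

and the number of degrees of freedom is

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
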
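The Case II statistic and its approximate degrees of freedom (the formula usually called the Welch-Satterthwaite approximation, a name the notes themselves do not use) are easy to get wrong by hand, so a short sketch may help. The function names are invented for this illustration; the inputs $s_1 = 18$, $n_1 = 50$, $s_2 = 7$, $n_2 = 45$ are the ones used in a worked example later in these notes, where the result is 64.8.

```python
import math

def unequal_variance_t(xbar1, xbar2, s1, s2, n1, n2, diff0=0.0):
    """t statistic for H0: mu1 - mu2 = diff0 when sigma1^2 != sigma2^2."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (xbar1 - xbar2 - diff0) / math.sqrt(v1 + v2)

def unequal_variance_dof(s1, s2, n1, n2):
    """Approximate degrees of freedom for the unequal-variance t statistic."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Inputs taken from the n1 = 50, n2 = 45 example later in the notes.
nu = unequal_variance_dof(18, 7, 50, 45)
print(round(nu, 1))
```

The degrees of freedom are generally not an integer; the notes round to the nearest integer before looking up the critical value.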
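How do we proceed in practice?

OK, so it all depends on whether $\sigma_1^2 = \sigma_2^2$ OR $\sigma_1^2 \neq \sigma_2^2$.

Therefore, before deciding whether we should use the formulas for Case I ($\sigma_1^2 = \sigma_2^2$) or Case II ($\sigma_1^2 \neq \sigma_2^2$), we should first test the hypothesis:

$H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \neq \sigma_2^2$

We can re-express this as:

$H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$
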
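Testing $H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$

Later on (Section 13.4) we will describe how to do inference for the ratio $\sigma_1^2/\sigma_2^2$. Right now, we focus on the problem of testing:

$H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$

For this, we use as a test statistic:

$$F = \frac{s_1^2}{s_2^2}$$

Under the null hypothesis, the statistic $F$ has an F distribution with $\nu_1 = n_1 - 1$ degrees of freedom in the numerator and $\nu_2 = n_2 - 1$ degrees of freedom in the denominator.
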
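Intuitively (as we have reasoned in previous cases), we should reject $H_0$ in favor of $H_1$ if either $F > c_U$ or $F < c_L$.

The critical values $c_U$ and $c_L$ are obtained to satisfy a target significance level $\alpha$. We should therefore have:

$$c_U = F_{\alpha/2,\nu_1,\nu_2} \quad \text{and} \quad c_L = F_{1-\alpha/2,\nu_1,\nu_2}$$

Therefore, we should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < F_{1-\alpha/2,\nu_1,\nu_2}$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.
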
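Regarding the critical values, recall from our analysis of the F distribution that

$$F_{1-A,\nu_1,\nu_2} = \frac{1}{F_{A,\nu_2,\nu_1}}$$

It follows then that

$$F_{1-\alpha/2,\nu_1,\nu_2} = \frac{1}{F_{\alpha/2,\nu_2,\nu_1}}$$

The above formula is useful when we use Table 6 in Appendix B.
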
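Therefore, we should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}}$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

Since I will give you Table 6 to work with, this is the formula you must use in the Final Exam.
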
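The rejection rule for the equal-variance test can be written as a small helper. This is a sketch under the assumption that the two upper critical values, $F_{\alpha/2,\nu_1,\nu_2}$ and $F_{\alpha/2,\nu_2,\nu_1}$, are looked up in Table 6 or in software and passed in by hand; the function names are invented for this illustration. The two calls use summary values that appear in worked examples later in these notes.

```python
def f_statistic(s1, s2):
    """Test statistic F = s1^2 / s2^2 for H0: sigma1^2 / sigma2^2 = 1."""
    return s1**2 / s2**2

def reject_equal_variances(s1, s2, f_upper_12, f_upper_21):
    """Two-sided F test decision.

    f_upper_12 is F_{alpha/2, nu1, nu2} and f_upper_21 is F_{alpha/2, nu2, nu1},
    both supplied by the user from a table or software.
    """
    f = f_statistic(s1, s2)
    return f > f_upper_12 or f < 1.0 / f_upper_21

# s1 = 18, s2 = 16, n1 = n2 = 12, alpha = 5%: F_{.025,11,11} = 3.47
print(reject_equal_variances(18, 16, 3.47, 3.47))

# s1 = 31.98, s2 = 25.99, n1 = n2 = 100, alpha = 5%: F_{.025,99,99} = 1.486
print(reject_equal_variances(31.98, 25.99, 1.486, 1.486))
```

The first call does not reject equal variances, so Case I applies; the second rejects, so Case II applies.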
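Therefore, doing inference on $\mu_1 - \mu_2$ is a two-step procedure:

STEP 1: Test $H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

STEP 2:
a) If $H_0$ is rejected in Step 1, then proceed doing inference on $\mu_1 - \mu_2$ under the assumption that $\sigma_1^2 \neq \sigma_2^2$.
b) If $H_0$ cannot be rejected in Step 1, then proceed doing inference on $\mu_1 - \mu_2$ under the assumption that $\sigma_1^2 = \sigma_2^2$.
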
Example: In two random samples of size 12 each, drawn from two Normal populations, we found the following statistics:

$\bar{X}_1 = \ldots$, $\bar{X}_2 = \ldots$, $s_1 = 18$ and $s_2 = 16$

Test, at significance level $\alpha = 5\%$, whether we can infer that the population means differ.

Answer: Since they are only asking whether the population means differ, this means that we need to test

$H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 \neq 0$

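STEP 1. We first need to find out whether $\sigma_1^2 = \sigma_2^2$ OR $\sigma_1^2 \neq \sigma_2^2$. To this end, we test:

$H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

Our test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{18^2}{16^2} \approx 1.27$$

We should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}}$$

where $\nu_1 = n_1 - 1 = 11$ and $\nu_2 = n_2 - 1 = 11$.
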
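Going to Table 6, we have:

$$F_{\alpha/2,\nu_1,\nu_2} = F_{.025,11,11} = 3.47$$

Since $n_1 = n_2$, we also have:

$$F_{\alpha/2,\nu_2,\nu_1} = F_{.025,11,11} = 3.47$$

We should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > 3.47 \quad \text{or} \quad F < \frac{1}{3.47}$$

Since $1/3.47 < F < 3.47$, we fail to reject $H_0: \sigma_1^2/\sigma_2^2 = 1$.

STEP 2. From our results in Step 1, we test

$H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 \neq 0$

under the assumption that $\sigma_1^2 = \sigma_2^2$.
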
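Therefore, our test statistic is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11 \cdot 18^2 + 11 \cdot 16^2}{22} = 290$$

This yields: $t = \ldots$
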
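The rejection rule is: Reject $H_0: \mu_1 - \mu_2 = 0$ in favor of $H_1: \mu_1 - \mu_2 \neq 0$ if

$$|t| > t_{\alpha/2,\nu}, \quad \text{where } \nu = n_1 + n_2 - 2 = 22$$

Going to Table 4, we have: $t_{.025,22} = 2.074$.

Therefore, we fail to reject $H_0: \mu_1 - \mu_2 = 0$.

Let $T$ have a Student t distribution with 22 degrees of freedom. The p-value of the test is given by:

p-value $= 2\,P(T > |t|) = \ldots$

There is little to no evidence in favor of $H_1: \mu_1 - \mu_2 \neq 0$.
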
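Example: In two random samples, drawn from two Normal populations, we found the following statistics:

$\bar{X}_1 = \ldots$, $\bar{X}_2 = \ldots$, $s_1 = 18$ and $s_2 = 7$, with $n_1 = 50$ and $n_2 = 45$.

Estimate with 90% confidence the difference between the two population means.

STEP 1. Once again, we first need to test

$H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.
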
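Our test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{18^2}{7^2} \approx 6.61$$

We should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}}$$

where $\nu_1 = n_1 - 1 = 49$ and $\nu_2 = n_2 - 1 = 44$.

Since our target confidence level is 90%, it implies that the significance level for the first-step test is $\alpha = 10\%$.
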
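Using Excel (the number of degrees of freedom is outside the range of Table 6), we obtain the critical values

$$F_{\alpha/2,\nu_1,\nu_2} = F_{.05,49,44} \quad \text{and} \quad F_{\alpha/2,\nu_2,\nu_1} = F_{.05,44,49}$$

(the functions in Excel are =F.INV(.95,49,44) and =F.INV(.95,44,49)).

We reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{.05,49,44} \quad \text{or} \quad F < \frac{1}{F_{.05,44,49}}$$

Since our test statistic is $F \approx 6.61$, we reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.
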
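STEP 2. From our results in Step 1, we test

$H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 \neq 0$

under the assumption that $\sigma_1^2 \neq \sigma_2^2$.

Therefore, we must use:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

and treat it as a Student t random variable with number of degrees of freedom:

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\dfrac{18^2}{50} + \dfrac{7^2}{45}\right)^2}{\dfrac{(18^2/50)^2}{50 - 1} + \dfrac{(7^2/45)^2}{45 - 1}} = 64.8 \approx 65$$
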
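From here, if $\sigma_1^2 \neq \sigma_2^2$, a Confidence Interval for $\mu_1 - \mu_2$ with $1 - \alpha$ coverage probability is given by

$$\left[(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ \ (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right]$$

Going to Table 4 in Appendix B, we look up $t_{\alpha/2,\nu} = t_{.05,65}$.

Thus, our 90% Confidence Interval for $\mu_1 - \mu_2$ is: $[\ldots,\ \ldots]$

Note that this confidence interval includes $\mu_1 - \mu_2 = 0$.
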
Example: Are major league baseball games taking longer on average than 5 years ago? To address this question, a statistician recorded the amount of time (in minutes) to complete a random sample of games 5 years ago and this year. Can we conclude that games take longer to complete this year than 5 years ago? Use an $\alpha = 1\%$ significance level.

5 Years Ago   This Year
169           153
160           182
174           162
161           190
187           163
172           189
177           171
187           197
153           159
169           180
161           197
194           178

Denote:
$\mu_1$ = average length of games five years ago.
$\mu_2$ = average length of games this year.

From the wording of the problem, the hypothesis test problem is:

Test $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 < 0$

We have $n_1 = n_2 = 12$.
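Computing the sample statistics from the data above, we also have:

$\bar{X}_1 = 172$, $\bar{X}_2 = 176.75$, $s_1 = 12.53$ and $s_2 = 15.09$
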
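STEP 1. Once again, we first need to test

$H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

Our test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{12.53^2}{15.09^2} \approx 0.69$$

We should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}}$$

where $\nu_1 = n_1 - 1 = 11$ and $\nu_2 = n_2 - 1 = 11$.
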
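From Table 6 we have

$$F_{\alpha/2,\nu_1,\nu_2} = F_{.005,11,11} = 5.32$$

So, we should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{\alpha/2,\nu_1,\nu_2} = 5.32 \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}} = \frac{1}{5.32}$$

Since $1/5.32 < F < 5.32$, we fail to reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ and therefore we should proceed assuming that $\sigma_1^2 = \sigma_2^2$.
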
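Therefore, our test statistic is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11 \cdot 12.53^2 + 11 \cdot 15.09^2}{22} \approx 192.4$$

This yields:

$$t = \frac{172 - 176.75}{\sqrt{192.4\left(\dfrac{1}{12} + \dfrac{1}{12}\right)}} \approx -0.84$$
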
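Since our one-sided test is:

$H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 < 0$,

the rejection rule is:

Reject $H_0: \mu_1 - \mu_2 = 0$ in favor of $H_1: \mu_1 - \mu_2 < 0$ if $t < -t_{\alpha,\nu}$, where $\nu = n_1 + n_2 - 2 = 22$.

Using Table 4 we obtain:

$$-t_{\alpha,\nu} = -t_{.01,22} = -2.508$$

Since the observed $t$ is greater than $-2.508$, we fail to reject $H_0: \mu_1 - \mu_2 = 0$. There is not enough evidence in the data to infer, at a 1% significance level, that the average game length is longer now than 5 years ago.
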
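Because the raw game lengths are listed in the notes, the whole two-step procedure for the baseball example can be reproduced directly from the data. The sketch below uses only the Python standard library; the critical values $F_{.005,11,11} = 5.32$ and $t_{.01,22} = 2.508$ are the table values quoted in the notes, and the variable names are chosen for this illustration.

```python
import math
import statistics

five_years_ago = [169, 160, 174, 161, 187, 172, 177, 187, 153, 169, 161, 194]
this_year = [153, 182, 162, 190, 163, 189, 171, 197, 159, 180, 197, 178]

n1, n2 = len(five_years_ago), len(this_year)
xbar1, xbar2 = statistics.mean(five_years_ago), statistics.mean(this_year)
s1, s2 = statistics.stdev(five_years_ago), statistics.stdev(this_year)

# STEP 1: two-sided F test of equal variances at alpha = 1%.
f_stat = s1**2 / s2**2
f_crit = 5.32  # F_{.005,11,11} from Table 6 (same for both orderings since n1 = n2)
variances_differ = f_stat > f_crit or f_stat < 1 / f_crit

# STEP 2: equal variances not rejected, so use the pooled t statistic.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_stat = (xbar1 - xbar2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = 2.508  # t_{.01,22} from Table 4
reject_h0 = t_stat < -t_crit  # one-sided test, H1: mu1 - mu2 < 0

print(xbar1, xbar2, round(s1, 2), round(s2, 2))
print(round(f_stat, 2), variances_differ)
print(round(t_stat, 2), reject_h0)
```

Step 2 here hard-codes the pooled formula because Step 1 does not reject equal variances for these data; a general routine would branch on `variances_differ`.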
Example: In a study of credit card use, random samples were drawn of cardholders who applied for the credit card (call these type 1 customers), and credit cardholders who were contacted by telemarketers or by mail (call these type 2 customers). The total purchases made by each over the last month were recorded. Can we conclude from the data that differences exist on average between the two types of customers?

Let:
$\mu_1$ = average monthly purchases by type 1 customers.
$\mu_2$ = average monthly purchases by type 2 customers.

We will test $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 \neq 0$.

Let us use a 5% significance level.

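In the data we have $\bar{X}_1 = 130.92$, $\bar{X}_2 = \ldots$, $s_1 = 31.98$, $s_2 = 25.99$, $n_1 = n_2 = 100$.

STEP 1. As before, we first test whether both variances are equal or not:

$H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

Our test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{31.98^2}{25.99^2} \approx 1.51$$

We should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}}$$

where $\nu_1 = \nu_2 = 99$.
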
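This is outside the range of degrees of freedom in Table 6. We can use Excel once again and obtain:

$$F_{\alpha/2,\nu_1,\nu_2} = F_{.025,99,99} = \text{F.INV}(.975,99,99) = 1.486$$

This is the same as $F_{\alpha/2,\nu_2,\nu_1}$. Therefore we should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ if either

$$F > 1.486 \quad \text{or} \quad F < \frac{1}{1.486}$$

Since $F > 1.486$, we reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$, and we proceed with our test for $H_0: \mu_1 - \mu_2 = 0$ assuming that $\sigma_1^2 \neq \sigma_2^2$.
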
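Therefore, our test statistic is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

which, under the null hypothesis, has an approximate Student t distribution with degrees of freedom:

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\dfrac{31.98^2}{100} + \dfrac{25.99^2}{100}\right)^2}{\dfrac{(31.98^2/100)^2}{99} + \dfrac{(25.99^2/100)^2}{99}} = 190.05 \approx 190$$

The value of our test statistic is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{31.98^2}{100} + \dfrac{25.99^2}{100}}} = 1.16$$
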
For this two-sided test, we reject $H_0: \mu_1 - \mu_2 = 0$ in favor of $H_1: \mu_1 - \mu_2 \neq 0$ if

$$|t| > t_{\alpha/2,\nu}, \quad \text{where } \nu = 190$$

Using Table 4, we get $t_{.025,190} = 1.973$.

Since $|t| = 1.16$, we fail to reject $H_0$. There is not enough evidence in the data to infer, at a 5% significance level, that differences exist, on average, between the two types of customers.

Experimental Data vs. Observational Data

So far, we have simply compared the difference in means for two independent samples. The data we have used so far is assumed to be observational data, which means that it was generated outside the control of the researcher.

In economics, sometimes we are interested in isolating the effect of some categorical variable (for instance, whether or not an individual finished college), often referred to as the treatment, on some outcome variable (for instance, income).

In order to truly isolate the effect of the variable of interest, we should ideally keep everything else constant. This is the goal of matched experimental data, which is put together by the researcher in order to isolate and identify the effect of the treatment of interest on an outcome variable.

To achieve this, we match pairs of observations from two different samples, where the matching is based on categories that ideally keep everything else constant and allow us to isolate the effect of the variable of interest.

The resulting data is called a matched experiment, which can result from matching observational data, or matching data from a controlled experiment (the latter being rare in Economics, more common in Natural Science disciplines such as Medicine, Biology, etc.).

As a consequence, the matched pairs of observations are no longer independent. The parameter of interest now is the average change in the outcome variable due to the categorical variable we want to isolate.

Example: Suppose we want to investigate whether, all else equal, finance MBA majors have higher salaries than marketing MBA majors. That is, we want to isolate the effect of this category (marketing vs. finance) on salaries.

Ideally, we should match pairs of individuals who are identical to each other in all respects except that one is a marketing MBA major and the other is a finance MBA major.

This ideal matching is often infeasible in practice. Suppose instead that we match pairs in terms of their GPAs (an aggregate measure supposed to capture all other relevant determinants of salary).

Once we match pairs of observations, we will have effectively merged two independent samples into a single sample.

In Example 13.5, this is done by splitting GPA into 25 categories, then matching pairs of individuals into each category.

Let $X_{D,i} = X_{1,i} - X_{2,i}$ denote the difference in salary between the finance ($X_1$) and the marketing ($X_2$) majors for the $i$-th pair in our pair-matched sample.

The parameter of interest is:

$$\mu_D = E(X_{1,i} - X_{2,i}) = \mu_1 - \mu_2$$

The fundamental difference with respect to our previous setting is that $X_{1,i}$ and $X_{2,i}$ are not independent of each other, since they were specifically matched to each other.

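We now have a single sample, where the variable of interest is $X_{1,i} - X_{2,i}$. We can proceed as we did in Chapter 12, where we studied inference on the population mean for a single population.

The parameter of interest is: $\mu_D = E(X_{1,i} - X_{2,i})$.

Let $n_D$ denote the size of our pairwise matched sample. The corresponding sample mean can be denoted as

$$\bar{X}_D = \frac{1}{n_D}\sum_{i=1}^{n_D}\left(X_{1,i} - X_{2,i}\right)$$
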
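The sample standard deviation would then be:

$$S_D = \sqrt{\frac{1}{n_D - 1}\sum_{i=1}^{n_D}\left(X_{1,i} - X_{2,i} - \bar{X}_D\right)^2}$$

Then, our relevant statistic (both for testing and Confidence Interval construction) would be the same as in Chapter 12. Namely:

$$t = \frac{\bar{X}_D - \mu_D}{S_D/\sqrt{n_D}}$$

which would be approximately distributed as a Student t with $n_D - 1$ degrees of freedom.
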
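For instance, if we wanted to test the assertion that, all else equal, finance majors earn more than marketing majors, we would test:

$H_0: \mu_D = 0$ against $H_1: \mu_D > 0$

Our test statistic is then:

$$t = \frac{\bar{X}_D}{S_D/\sqrt{n_D}}$$

And we would reject $H_0$ in favor of $H_1$ if $t > t_{\alpha,\,n_D - 1}$.
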
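A Confidence Interval for $\mu_D$ with $100(1 - \alpha)\%$ coverage probability would be constructed as:

$$\left[\bar{X}_D - t_{\alpha/2,\,n_D - 1}\frac{S_D}{\sqrt{n_D}},\ \ \bar{X}_D + t_{\alpha/2,\,n_D - 1}\frac{S_D}{\sqrt{n_D}}\right]$$
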
Example 13.5: Comparing the mean salary of finance vs. marketing MBA majors.

Let
$\mu_1$ = Average salary of Finance MBA majors.
$\mu_2$ = Average salary of Marketing MBA majors.

Our goal is to find out whether we can infer that $\mu_1 > \mu_2$.

So our goal is to test: $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 > 0$.

Let us use a significance level of 5%.

We will contrast the results obtained from observational data vs. the results obtained from pairwise matched experimental data.

A) Testing using two independent samples.

Suppose 50 recent MBA graduates are randomly sampled, half of whom are Finance majors and half of whom are Marketing majors (so $n_1 = n_2 = 25$).

So, what we have is two independent samples, one for $X_1$ (Finance major salaries) and another one for $X_2$ (Marketing major salaries).

Finance:
61228 51836 20620 73356 84186 79782 29523 80645 76125 62531 77073 86705
70286 63196 64358 47915 86792 75155 65948 29392 96382 80644 51389 61955 63573

Marketing:
73361 36956 63627 71069 40203 97097 49442 75188 59854 79816 51943 35272
60631 63567 69423 68421 56276 47510 58925 78704 62553 81931 30867 49091 48843

In the data we have $\bar{X}_1 = 65{,}624$, $\bar{X}_2 = 60{,}423$, $s_1 = 18{,}985$, $s_2 = 16{,}193$, $n_1 = n_2 = 25$.

STEP 1. As before, we first test whether both variances are equal or not: $H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

Our test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{18{,}985^2}{16{,}193^2} = 1.374$$

We should reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ in favor of $H_1: \sigma_1^2/\sigma_2^2 \neq 1$ if either

$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}}$$

where $\nu_1 = \nu_2 = 25 - 1 = 24$.

Using Excel we find $F_{.025,24,24} = 2.269$. Thus, we fail to reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ and we proceed with our test $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 > 0$ assuming $\sigma_1^2 = \sigma_2^2$.

Therefore, our test statistic is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = 311{,}330{,}926$$

This yields:

$$t = \frac{65{,}624 - 60{,}423}{\sqrt{311{,}330{,}926 \cdot \dfrac{2}{25}}} = 1.04$$

The degrees of freedom are $\nu = n_1 + n_2 - 2 = 48$. The rejection rule calls for rejecting $H_0$ if $t > t_{.05,48} = 1.676$. Since $t = 1.04$, we fail to reject $H_0$. Using these two independent samples, there is not enough evidence in the data to infer that finance major salaries are higher than marketing major salaries.

B) Testing using matched experimental data.

By using two independent samples, we are not really isolating the effect of major on average salaries.

Suppose in the data collected above we also observe each individual's GPA. And suppose we match pairs of observations from the finance and marketing samples according to their GPAs.

We do this by creating 25 categories of GPA (according to GPA's numerical value).

From here, we match pairs of observations from the two samples and create a single sample.

The matched data looks like this:

Group   Finance (X1)   Marketing (X2)   X1 - X2
1       95171          89329            5842
2       88009          92705            -4696
3       98089          99205            -1116
4       106322         99003            7319
5       74566          74825            -259
6       87089          77038            10051
7       88664          78272            10392
8       71200          59462            11738
9       69367          51555            17812
10      82618          81591            1027
11      69131          68110            1021
12      58187          54970            3217
13      64718          68675            -3957
14      67716          54110            13606
15      49296          46467            2829
16      56625          53559            3066
17      63728          46793            16935
18      55425          39984            15441
19      37898          30137            7761
20      56244          61965            -5721
21      51071          47438            3633
22      31235          29662            1573
23      32477          33710            -1233
24      35274          31989            3285
25      45835          38788            7047

We have $n_D = 25$,

$$\bar{X}_D = \frac{1}{n_D}\sum_{i=1}^{n_D}\left(X_{1,i} - X_{2,i}\right) = 5{,}064.52$$

$$S_D = \sqrt{\frac{1}{n_D - 1}\sum_{i=1}^{n_D}\left(X_{1,i} - X_{2,i} - \bar{X}_D\right)^2} = 6{,}647$$

$$t = \frac{\bar{X}_D}{S_D/\sqrt{n_D}} = 3.81$$

The statistic $t$ is distributed as a Student t with $n_D - 1 = 24$ degrees of freedom. The critical value is then $t_{.05,24} = 1.711$, and the rejection rule calls for rejecting $H_0$ if $t > 1.711$. Since we have $t = 3.81$, we now reject $H_0$ in favor of $H_1$. Using this matched experiment data, we can infer that salaries of MBA finance majors are, on average, higher than those of MBA marketing majors.

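The matched-pairs calculation can be reproduced from the table with the Python standard library. This sketch simply applies the single-sample formulas to the 25 differences; the critical value $t_{.05,24} = 1.711$ is the table value quoted in the notes, and the variable names are chosen for this illustration.

```python
import math
import statistics

finance = [95171, 88009, 98089, 106322, 74566, 87089, 88664, 71200, 69367,
           82618, 69131, 58187, 64718, 67716, 49296, 56625, 63728, 55425,
           37898, 56244, 51071, 31235, 32477, 35274, 45835]
marketing = [89329, 92705, 99205, 99003, 74825, 77038, 78272, 59462, 51555,
             81591, 68110, 54970, 68675, 54110, 46467, 53559, 46793, 39984,
             30137, 61965, 47438, 29662, 33710, 31989, 38788]

# One difference per matched pair (finance minus marketing).
diffs = [f - m for f, m in zip(finance, marketing)]

n_d = len(diffs)
xbar_d = statistics.mean(diffs)
s_d = statistics.stdev(diffs)            # sample standard deviation (n_D - 1)
t_stat = xbar_d / (s_d / math.sqrt(n_d))  # H0: mu_D = 0

t_crit = 1.711  # t_{.05,24} from Table 4
reject_h0 = t_stat > t_crit

print(n_d, round(xbar_d, 2), round(s_d), round(t_stat, 2), reject_h0)
```

The standard error here is `s_d / sqrt(n_d)`, roughly 1,329, which is the quantity compared against the independent-samples standard error in the discussion that follows.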
Why did we obtain such noticeably different results from the two samples vs. the matched sample?

Intuitively, when we match pairs of observations, we are keeping constant other factors that determine salary.

This is reflected in the fact that the standard deviation in the denominator of our test statistic is noticeably smaller for the matched experiment data (1,329) as compared with the independent sample data (4,991).

Since the numerators are similar in both test statistics (5,065 vs. 5,201), this smaller standard deviation translates into a larger test statistic for the matched experiment data, leading us to reject $H_0$ in that case.

Inference for the Ratio of Two Variances

We have already discussed the problem of testing

$H_0: \sigma_1^2/\sigma_2^2 = 1$ against $H_1: \sigma_1^2/\sigma_2^2 \neq 1$

As we mentioned, the statistic we use is

$$F = \frac{s_1^2}{s_2^2}$$

Technically, the test statistic is

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

The statistic $F$ is approximately distributed as an F random variable with $\nu_1 = n_1 - 1$ degrees of freedom in the numerator and $\nu_2 = n_2 - 1$ degrees of freedom in the denominator.

We know how to do the hypothesis test, so it only remains to describe how to construct a confidence interval for the ratio of variances $\sigma_1^2/\sigma_2^2$.

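Simple algebraic manipulation and the distribution result stated above show that we can construct a Confidence Interval for $\sigma_1^2/\sigma_2^2$ with $100(1 - \alpha)\%$ coverage probability as:

$$\left[\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}},\ \ \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\nu_2,\nu_1}\right]$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$. The upper limit uses the identity $1/F_{1-\alpha/2,\nu_1,\nu_2} = F_{\alpha/2,\nu_2,\nu_1}$.
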
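As a quick illustration of the interval for $\sigma_1^2/\sigma_2^2$, the sketch below takes the two upper critical values as inputs (they come from a table or software) and returns the limits. The function name is invented for this example; the inputs are the MBA salary statistics from Example 13.5 ($s_1 = 18{,}985$, $s_2 = 16{,}193$) with $F_{.025,24,24} = 2.269$, which gives a 95% interval.

```python
def variance_ratio_ci(s1, s2, f_upper_12, f_upper_21):
    """Confidence interval for sigma1^2 / sigma2^2.

    f_upper_12 is F_{alpha/2, nu1, nu2} and f_upper_21 is F_{alpha/2, nu2, nu1},
    both supplied by the user from a table or software.
    """
    ratio = s1**2 / s2**2
    return ratio / f_upper_12, ratio * f_upper_21

# n1 = n2 = 25, so both critical values equal F_{.025,24,24} = 2.269.
low, high = variance_ratio_ci(18985, 16193, 2.269, 2.269)
print(round(low, 3), round(high, 3))
```

The interval contains 1, which agrees with the earlier test failing to reject $H_0: \sigma_1^2/\sigma_2^2 = 1$ at the 5% level.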
Inference about the Difference between Two Population Proportions

This is the last topic covered in Chapter 13.

Even though proportions are a special case of means, their special nature simplifies the analysis in some ways.

Let $p_1$ and $p_2$ be the proportions of interest, which we want to compare for two populations.

For instance, we may want to:
a) Compare the proportion of consumers who like brand 1 ($p_1$) against the proportion of consumers who like brand 2 ($p_2$).
b) Compare the proportion of the population who approved of the Government's performance six months ago ($p_1$) against the proportion who approve of the government today ($p_2$).
c) Compare the proportion of patients who survive 5 years among those who took a new experimental medication ($p_1$) against those who took the existing medication ($p_2$).

As before, we have two independent samples, one from population 1 and the other from population 2. The sample sizes are $n_1$ and $n_2$ respectively. Let
$x_1$ = Number of successes in sample 1.
$x_2$ = Number of successes in sample 2.

The estimated proportions for samples 1 and 2 are therefore,

$$\hat{P}_1 = \frac{x_1}{n_1} \quad \text{and} \quad \hat{P}_2 = \frac{x_2}{n_2}$$

The aspect that distinguishes this problem from the general problem of doing inference on $\mu_1 - \mu_2$ is the relationship that exists, in Binomial random variables, between the expectation and the variance.

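Let
$d_{1,i}$ = 1 if success in the $i$-th observation of Sample 1, and 0 if failure.
$d_{2,i}$ = 1 if success in the $i$-th observation of Sample 2, and 0 if failure.

From our previous study of Binomial random variables, we know that

$$E(d_{1,i}) = p_1, \quad \sigma_1^2 = p_1(1 - p_1)$$
$$E(d_{2,i}) = p_2, \quad \sigma_2^2 = p_2(1 - p_2)$$

And therefore, if we hypothesize that $p_1 = p_2$, we are automatically also hypothesizing that $\sigma_1^2 = \sigma_2^2$.
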
Thus, whenever the null hypothesis is $H_0: p_1 = p_2$, we proceed also by assuming that $\sigma_1^2 = \sigma_2^2$.

This was not the case for the general problem of testing $H_0: \mu_1 = \mu_2$, where we had to first test whether $\sigma_1^2 = \sigma_2^2$.

This makes testing $H_0: p_1 = p_2$ a simpler, one-step procedure, as opposed to the two-step procedure we used to test $H_0: \mu_1 = \mu_2$.

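Testing $p_1 = p_2$

Since $p_1 = p_2$ implies $\sigma_1^2 = \sigma_2^2$, we proceed analogously to when we test $\mu_1 = \mu_2$ under the assumption that $\sigma_1^2 = \sigma_2^2$.

That is, we use a pooled variance estimator. Let

$$\hat{P} = \frac{x_1 + x_2}{n_1 + n_2}$$

Note that $\hat{P}$ would be the estimator we would use if we pooled both samples together under the assumption that $p_1 = p_2$.

The pooled variance estimator is $\hat{P}(1 - \hat{P})$.
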
When we tested $H_0: \mu_1 - \mu_2 = 0$ under the assumption $\sigma_1^2 = \sigma_2^2$, we used the statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $s_p^2$ was our pooled variance estimator.

Analogously, when we test $H_0: p_1 - p_2 = 0$ we use

$$z = \frac{(\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

If $H_0: p_1 - p_2 = 0$ is true, $z$ is approximately a Standard Normal random variable.

As a rule of thumb, recall that whenever we do inference on proportions, the Standard Normal (and not the Student t) is the approximate distribution we use.

Rejection rules and p-values are the same as for any test based on Standard Normal critical values.

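The pooled $z$ test for $H_0: p_1 - p_2 = 0$ needs only the success counts and sample sizes. The sketch below uses `statistics.NormalDist` from the Python standard library for the Standard Normal tail probability. The counts (60 successes out of 100 vs. 50 out of 100) are hypothetical, chosen only to illustrate the arithmetic, and the function names are invented here.

```python
import math
from statistics import NormalDist

def pooled_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # pooled estimator under p1 = p2
    std_err = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_err

def two_sided_p_value(z):
    """p-value for H1: p1 - p2 != 0 under the Standard Normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts, for illustration only.
z = pooled_proportion_z(60, 100, 50, 100)
p_value = two_sided_p_value(z)
print(round(z, 3), round(p_value, 3))
```

For a one-sided alternative, the p-value would be the corresponding single tail, `1 - NormalDist().cdf(z)` or `NormalDist().cdf(z)`.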
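Testing $H_0: p_1 - p_2 = D$ where $D \neq 0$

The previous formulas are valid only if the null hypothesis is $H_0: p_1 - p_2 = 0$.

If our null hypothesis is $H_0: p_1 - p_2 = D$ where $D \neq 0$, we use a more general expression for the test statistic:

$$z = \frac{(\hat{P}_1 - \hat{P}_2) - D}{\sqrt{\dfrac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \dfrac{\hat{P}_2(1 - \hat{P}_2)}{n_2}}}$$

As before, under the null hypothesis $H_0: p_1 - p_2 = D$, the test statistic $z$ is approximately distributed as a Standard Normal.

Once again, rejection rules and p-values are the same as for any test based on Standard Normal critical values.

Constructing Confidence Intervals for $p_1 - p_2$

To construct a Confidence Interval for $p_1 - p_2$, we use the general standard deviation estimator

$$\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{n_2}}$$

Relying on the Normal approximation, a Confidence Interval for $p_1 - p_2$ with $100(1 - \alpha)\%$ coverage probability is given by:

$$(\hat{P}_1 - \hat{P}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{n_2}}$$
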
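The unpooled standard deviation estimator serves both the general test ($D \neq 0$) and the confidence interval. The sketch below computes both, obtaining $z_{\alpha/2}$ from `statistics.NormalDist().inv_cdf`. The counts (60 out of 100 vs. 50 out of 100) and the hypothesized difference $D = 0.05$ are hypothetical values chosen for illustration, and the function names are invented here.

```python
import math
from statistics import NormalDist

def unpooled_std_err(x1, n1, x2, n2):
    """General (unpooled) standard deviation estimator for P1_hat - P2_hat."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    return math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

def general_proportion_z(x1, n1, x2, n2, d0):
    """z statistic for H0: p1 - p2 = d0, with d0 != 0."""
    return (x1 / n1 - x2 / n2 - d0) / unpooled_std_err(x1, n1, x2, n2)

def proportion_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation confidence interval for p1 - p2."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    diff = x1 / n1 - x2 / n2
    margin = z_crit * unpooled_std_err(x1, n1, x2, n2)
    return diff - margin, diff + margin

# Hypothetical counts and hypothesized difference, for illustration only.
z = general_proportion_z(60, 100, 50, 100, d0=0.05)
low, high = proportion_diff_ci(60, 100, 50, 100, confidence=0.95)
print(round(z, 3), round(low, 4), round(high, 4))
```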
