z-statistic - Wald test for logistic regression - Cross Validated
8/14/2015
http://stats.stackexchange.com/questions/60074/waldtestforlogisticregression

Wald test for logistic regression
As far as I understand, the Wald test in the context of logistic regression is used to determine whether a certain predictor variable $X$ is significant or not. It tests the null hypothesis that the corresponding coefficient is zero.

The test consists of dividing the value of the coefficient by its standard error $\sigma$.

What I am confused about is that $X/\sigma$ is also known as the z-score and indicates how likely it is that a given observation comes from the normal distribution (with mean zero).

Tags: logistic, z-statistic
asked May 26 '13 at 19:13 by user695652, edited May 26 '13 at 20:40 by mbq

1 Answer

The estimates of the coefficients and the intercept in logistic regression (and any GLM) are found via maximum-likelihood estimation (MLE). These estimates are denoted with a hat over the parameters, something like $\hat{\theta}$. Our parameter of interest is denoted $\theta_0$, and this is usually 0 because we want to test whether the coefficient differs from 0 or not. From the asymptotic theory of MLE, we know that, when $\theta_0$ is the true value, the difference between $\hat{\theta}$ and $\theta_0$ is approximately normally distributed with mean 0 (details can be found in any mathematical statistics book such as Larry Wasserman's All of Statistics). Recall that standard errors are simply standard deviations of statistics (Sokal and Rohlf write in their book Biometry: "a statistic is any one of many computed or estimated statistical quantities", e.g. the mean, median, standard deviation, correlation coefficient, regression coefficient, ...). Dividing a normally distributed variable with mean 0 and standard deviation $\sigma$ by $\sigma$ yields a variable with the standard normal distribution, which has mean 0 and standard deviation 1. The Wald statistic is defined as (e.g. Wasserman (2006): All of Statistics, pages 153, 214-215):
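
$$W = \frac{\hat{\theta} - \theta_0}{\widehat{\mathrm{se}}(\hat{\theta})} \sim N(0, 1)$$

or

$$W^2 = \frac{(\hat{\theta} - \theta_0)^2}{\widehat{\mathrm{Var}}(\hat{\theta})} \sim \chi^2_1$$
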
The second form arises from the fact that the square of a standard normal variable has the $\chi^2_1$ distribution, i.e. the chi-squared distribution with 1 degree of freedom (the sum of two independent squared standard normal variables would have a $\chi^2_2$ distribution with 2 degrees of freedom, and so on).
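
As a quick numerical check of the two forms, the sketch below takes the gpa estimate and standard error from the logistic regression output further down (0.804038 and 0.331819), forms the Wald statistic for $\theta_0 = 0$, and computes the two-sided p-value once from the standard normal distribution and once from $\chi^2_1$. Both routes should give the same number, because $W^2 \sim \chi^2_1$ whenever $W \sim N(0, 1)$. The function name `wald_test` is a hypothetical helper chosen for illustration, not part of any package.

```r
# Hypothetical illustration helper: Wald test of H0: theta = theta0
wald_test <- function(estimate, se, theta0 = 0) {
  w <- (estimate - theta0) / se                          # W ~ N(0, 1) under H0
  p_normal <- 2 * pnorm(-abs(w))                         # two-sided p-value from N(0, 1)
  p_chisq  <- pchisq(w^2, df = 1, lower.tail = FALSE)    # p-value from W^2 ~ chi^2_1
  c(W = w, W2 = w^2, p_normal = p_normal, p_chisq = p_chisq)
}

# gpa coefficient and its standard error, taken from the glm output in this post
res <- wald_test(estimate = 0.804038, se = 0.331819)
print(round(res, 6))
```

Squaring discards the sign of $W$, which is why the one-tailed $\chi^2_1$ probability matches the two-tailed normal probability.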

Because the parameter of interest is usually 0 (i.e. $\theta_0 = 0$), the Wald statistic simplifies to

$$W = \frac{\hat{\theta}}{\widehat{\mathrm{se}}(\hat{\theta})} \sim N(0, 1)$$

which is what you described: the estimate of the coefficient divided by its standard error.
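
To see that the `z value` column R reports for a binomial `glm` is exactly this ratio, the sketch below fits a small logistic regression to the built-in `mtcars` data (the model `vs ~ mpg` is an arbitrary choice for illustration, not from the original post) and recomputes the z-values and two-sided p-values by hand from the estimates and standard errors.

```r
# Logistic regression on a built-in data set (model chosen only for illustration)
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)
tab <- coef(summary(fit))   # columns: Estimate, Std. Error, z value, Pr(>|z|)

# Wald statistic with theta0 = 0: estimate divided by its standard error
z_manual <- tab[, "Estimate"] / tab[, "Std. Error"]
# Two-sided p-value from the standard normal distribution
p_manual <- 2 * pnorm(-abs(z_manual))

print(cbind(z_manual, z_glm = tab[, "z value"],
            p_manual, p_glm = tab[, "Pr(>|z|)"]))
```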
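
Whether a z-value or a t-value is reported depends on how the standard errors of the coefficients were obtained. In normal linear regression (ordinary least squares), the variance-covariance matrix of the coefficient estimates is

$$\mathrm{Var}(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}$$

where $\sigma^2$ is the variance of the residuals (which is unknown and has to be estimated from the data) and $X$ is the design matrix. The standard errors of the coefficients are the square roots of the diagonal elements of the variance-covariance matrix. Because we don't know $\sigma^2$, we have to replace it by its estimate $\hat{\sigma}^2 = s^2$ (the residual sum of squares divided by the residual degrees of freedom $n - p$), so:

$$\widehat{\mathrm{se}}(\hat{\beta}_j) = \sqrt{s^2 (X'X)^{-1}_{jj}}$$

Now that's the point: because we have to estimate the variance of the residuals to calculate the standard errors of the coefficients, we need to use a t-value and the t-distribution.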
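
The following sketch rebuilds the OLS standard errors from $\sqrt{s^2 (X'X)^{-1}_{jj}}$ for the `swiss` regression used further down and computes the p-values from the t-distribution with $n - p$ degrees of freedom. Comparing against `summary()` is only a sanity check; the variable names are illustrative.

```r
fit <- lm(Fertility ~ ., data = swiss)
X <- model.matrix(fit)                        # design matrix (with intercept column)
n <- nrow(X); p <- ncol(X)

# Estimate the residual variance: s^2 = RSS / (n - p)
s2 <- sum(residuals(fit)^2) / (n - p)

# se(beta_j) = sqrt(s^2 * [(X'X)^{-1}]_jj)
se_manual <- sqrt(s2 * diag(solve(crossprod(X))))

# t statistics and two-sided p-values from the t-distribution with n - p df
t_manual <- coef(fit) / se_manual
p_manual <- 2 * pt(-abs(t_manual), df = n - p)

tab <- coef(summary(fit))
print(cbind(se_manual, se_lm = tab[, "Std. Error"], t_manual, p_manual))
```

`solve(crossprod(X))` is used here for readability; `lm` itself works from a QR decomposition, which is numerically more stable but gives the same standard errors for a well-conditioned design.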

In logistic (and Poisson) regression, the variance of the response, and hence of the residuals, is determined by the mean. If $Y \sim \mathrm{Bin}(n, p)$, the mean is $E(Y) = np$ and the variance is $\mathrm{Var}(Y) = np(1-p)$, so the variance and the mean are related (likewise, if $Y \sim \mathrm{Poisson}(\lambda)$, then $E(Y) = \mathrm{Var}(Y) = \lambda$). In logistic and Poisson regression, but not in regression with Gaussian errors, we know the expected variance and don't have to estimate it separately. The dispersion parameter $\phi$ indicates whether we have more or less than the expected variance. If $\phi = 1$, we observe the expected amount of variance, whereas $\phi < 1$ means that we have less than the expected variance (called underdispersion) and $\phi > 1$ means that we have extra variance beyond the expected (called overdispersion). The dispersion parameter in logistic and Poisson regression is fixed at 1, which means that we can use the z-score. In other regression types, such as normal linear regression, we have to estimate the residual variance, and thus a t-value is used for calculating the p-values. In R, look at these two examples:

Logistic regression

```r
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
my.mod <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(my.mod)
```

```
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *
gpa          0.804038   0.331819   2.423 0.015388 *
rank2       -0.675443   0.316490  -2.134 0.032829 *
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)
```

Note that the dispersion parameter is fixed at 1, and thus we get z-values.
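
For comparison, the sketch below fits the same model twice, once with `family = binomial` and once with `family = quasibinomial`, in which R estimates the dispersion parameter instead of fixing it at 1. The estimate is the Pearson $\chi^2$ statistic divided by the residual degrees of freedom, the standard errors are scaled by its square root, and `summary()` then reports t-values. The built-in `mtcars` model `vs ~ mpg` is an assumption made only so that the code runs without downloading data.

```r
# Same model, once with the dispersion fixed at 1 and once with it estimated
fit_bin <- glm(vs ~ mpg, data = mtcars, family = binomial)
fit_qb  <- glm(vs ~ mpg, data = mtcars, family = quasibinomial)

s_bin <- summary(fit_bin)
s_qb  <- summary(fit_qb)
print(s_bin$dispersion)   # fixed at 1 for the binomial family
print(s_qb$dispersion)    # estimated from the data

# Moment estimate of the dispersion: Pearson chi-square / residual df
phi_hat <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)
print(phi_hat)

# Coefficients are identical; standard errors are scaled by sqrt(dispersion)
se_bin <- coef(s_bin)[, "Std. Error"]
se_qb  <- coef(s_qb)[, "Std. Error"]
print(cbind(se_bin, se_qb, ratio = se_qb / se_bin))
print(colnames(coef(s_qb)))   # the quasi fit reports t values
```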

Normal linear regression (OLS)

```r
summary(lm(Fertility ~ ., data = swiss))
```

```
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
Agriculture      -0.17211    0.07030  -2.448  0.01873 *
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05 ***
Catholic          0.10412    0.03526   2.953  0.00519 **
Infant.Mortality  1.07705    0.38172   2.822  0.00734 **
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.165 on 41 degrees of freedom
```

Here, we have to estimate the residual variance (its square root is reported as "Residual standard error"), and hence we use t-values instead of z-values. Of course, in large samples the t-distribution approximates the normal distribution and the difference doesn't matter.
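
A short sketch of that last point: the two-sided 5% critical value of the t-distribution shrinks towards the normal value as the degrees of freedom grow, so t- and z-based p-values for the same statistic converge. The Education t-value of -4.758 with 41 degrees of freedom is taken from the output above; the other degrees-of-freedom values are arbitrary.

```r
# Two-sided 5% critical values of t for increasing degrees of freedom
dfs <- c(5, 41, 100, 1000, 1e5)
crit <- data.frame(df = dfs, t_crit = qt(0.975, df = dfs))
print(crit)
print(qnorm(0.975))   # 1.959964

# Same statistic, p-value from t (41 df) versus from N(0, 1)
t_edu <- -4.758
p_t <- 2 * pt(-abs(t_edu), df = 41)
p_z <- 2 * pnorm(-abs(t_edu))
print(c(p_t = p_t, p_z = p_z))
```

With only 41 degrees of freedom the normal approximation gives a noticeably smaller p-value, which is why `lm` uses the t-distribution.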

Another related post can be found here.

answered May 26 '13 at 21:09 by COOLSerdash, edited Jun 12 '13 at 15:44

@Sander W. van der Laan Thanks for your comment. Yes, I believe that's correct. If you perform a logistic regression, the Wald statistic will be the z-value. - COOLSerdash 2 days ago