
z-statistic - Wald test for logistic regression - Cross Validated

8/14/2015


Wald test for logistic regression
As far as I understand, the Wald test in the context of logistic regression is used to determine whether a certain predictor variable $X$ is significant or not. It rejects the null hypothesis of the corresponding coefficient being zero.
The test consists of dividing the value of the coefficient by its standard error $\sigma$.
What I am confused about is that $X/\sigma$ is also known as the z-score and indicates how likely it is that a given observation comes from the normal distribution (with mean zero).
Tags: logistic, z-statistic

asked May 26 '13 at 19:13 by user695652
edited May 26 '13 at 20:40 by mbq

1 Answer

The estimates of the coefficients and the intercepts in logistic regression (and any GLM) are found via maximum-likelihood estimation (MLE). These estimates are denoted with a hat over the parameters, something like $\hat{\theta}$. Our parameter of interest is denoted $\theta_0$ and this is usually 0, as we want to test whether the coefficient differs from 0 or not. From the asymptotic theory of MLE, we know that, under the null hypothesis $\theta = \theta_0$, the difference between $\hat{\theta}$ and $\theta_0$ will be approximately normally distributed with mean 0 (details can be found in any mathematical statistics book such as Larry Wasserman's All of Statistics). Recall that standard errors are nothing other than standard deviations of statistics (Sokal and Rohlf write in their book Biometry: "a statistic is any one of many computed or estimated statistical quantities", e.g. the mean, median, standard deviation, correlation coefficient, regression coefficient, ...). Dividing a normally distributed quantity with mean 0 and standard deviation $\sigma$ by its standard deviation yields the standard normal distribution with mean 0 and standard deviation 1. The Wald statistic is defined as (e.g. Wasserman (2006): All of Statistics, pages 153, 214-215):
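$$W = \frac{\hat{\theta} - \theta_0}{\operatorname{se}(\hat{\theta})} \sim N(0, 1)$$

or

$$W^2 = \frac{(\hat{\theta} - \theta_0)^2}{\operatorname{Var}(\hat{\theta})} \sim \chi^2_1$$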

The second form arises from the fact that the square of a standard normal random variable follows the $\chi^2_1$ distribution with 1 degree of freedom (the sum of two independent squared standard normal variables would follow a $\chi^2_2$ distribution with 2 degrees of freedom, and so on).
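A minimal sketch of why the two forms are interchangeable, assuming only Python's standard library (the helper names `p_from_z` and `p_from_chisq1` are illustrative): the two-sided normal p-value of $W$ equals the upper-tail $\chi^2_1$ p-value of $W^2$, and a seeded simulation shows squared standard normals behaving like $\chi^2_1$ (mean 1, about 5% above $1.96^2$) and sums of two like $\chi^2_2$ (mean 2).

```python
import math
import random

def p_from_z(w):
    """Two-sided p-value P(|Z| >= |w|) for Z ~ N(0, 1)."""
    return math.erfc(abs(w) / math.sqrt(2))

def p_from_chisq1(w2):
    """Upper-tail p-value P(X >= w2) for X ~ chi-square(1).

    Uses P(Z^2 >= w2) = P(|Z| >= sqrt(w2)) with Z ~ N(0, 1).
    """
    return math.erfc(math.sqrt(w2 / 2))

# The z form and the squared (chi-square) form give the same p-value
w = 2.07
print(p_from_z(w), p_from_chisq1(w ** 2))

# Monte Carlo check with a fixed seed
rng = random.Random(1)
sq = [rng.gauss(0, 1) ** 2 for _ in range(100_000)]
mean_sq = sum(sq) / len(sq)                      # chi-square(1) has mean 1
tail = sum(v >= 3.841 for v in sq) / len(sq)     # about 5% exceed 1.96^2 = 3.841
pairs = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(100_000)]
mean_two = sum(pairs) / len(pairs)               # chi-square(2) has mean 2
print(round(mean_sq, 3), round(tail, 4), round(mean_two, 3))
```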
Because the parameter of interest is usually 0 (i.e. $\theta_0 = 0$), the Wald statistic simplifies to

$$W = \frac{\hat{\theta}}{\operatorname{se}(\hat{\theta})} \sim N(0, 1)$$

which is what you described: the estimate of the coefficient divided by its standard error.
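As a rough sketch of the statistic in code, assuming Python and a hypothetical helper `wald_z`: the optional `theta0` argument gives the general form, and with the default `theta0 = 0` it reduces to estimate divided by standard error. Plugging in the gre row of the logistic output below (estimate 0.002264, standard error 0.001094) gives $W \approx 2.07$ and $p \approx 0.0385$, matching the reported z value and Pr(>|z|) up to rounding.

```python
import math

def wald_z(estimate, se, theta0=0.0):
    """Return the Wald statistic W = (estimate - theta0) / se and its two-sided p-value.

    The p-value uses the standard normal reference distribution:
    P(|Z| >= |W|) = erfc(|W| / sqrt(2)).
    """
    w = (estimate - theta0) / se
    p = math.erfc(abs(w) / math.sqrt(2))
    return w, p

# With theta0 = 0 this is just estimate / se (gre row of the logistic output)
w, p = wald_z(0.002264, 0.001094)
print(f"W = {w:.3f}, p = {p:.4f}")
```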

When is a z-value and when is a t-value used?


The choice between a z-value and a t-value depends on how the standard errors of the coefficients have been calculated. Because the Wald statistic is asymptotically distributed as a standard normal distribution, we can use the z-score to calculate the p-value. When we, in addition to the coefficients, also have to estimate the residual variance, a t-value is used instead of the z-value.
In ordinary least squares (OLS, normal linear regression), the variance-covariance matrix of the coefficients is $\operatorname{Var}[\hat{\beta} \mid X] = \sigma^2 (X'X)^{-1}$, where $\sigma^2$ is the variance of the residuals (which is unknown and has to be estimated from the data) and $X$ is the design matrix. The standard errors of the coefficients are the square roots of the diagonal elements of the variance-covariance matrix. Because we don't know $\sigma^2$, we have to replace it by its estimate $\hat{\sigma}^2 = s^2$, so:

$$\operatorname{se}(\hat{\beta}_j) = \sqrt{s^2 \left[(X'X)^{-1}\right]_{jj}}$$

Now that's the point: because we have to estimate the variance of the residuals to calculate the standard errors of the coefficients, we need to use a t-value and the t-distribution.
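To make the OLS formula concrete, here is a small sketch in plain Python for a one-predictor model, where $(X'X)^{-1}$ is a 2×2 matrix that can be inverted by hand. The data are made up for illustration, and $s^2 = \text{RSS}/(n - k)$ with $k$ coefficients is the usual unbiased residual-variance estimate that replaces the unknown $\sigma^2$.

```python
import math

# Toy data, made up purely for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n, k = len(x), 2  # k = number of coefficients (intercept and slope)

# X'X for the design matrix X = [1, x], inverted as a 2x2 matrix
sx, sxx = sum(x), sum(v * v for v in x)
xtx = [[n, sx], [sx, sxx]]
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
xtx_inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]

# beta_hat = (X'X)^{-1} X'y
xty = [sum(y), sum(a * b for a, b in zip(x, y))]
beta = [xtx_inv[0][0] * xty[0] + xtx_inv[0][1] * xty[1],
        xtx_inv[1][0] * xty[0] + xtx_inv[1][1] * xty[1]]

# s^2 = RSS / (n - k) replaces the unknown sigma^2
resid = [yi - (beta[0] + beta[1] * xi) for xi, yi in zip(x, y)]
s2 = sum(r * r for r in resid) / (n - k)

# se(beta_j) = sqrt(s^2 * [(X'X)^{-1}]_jj); t_j = beta_j / se_j is referred to t with n - k df
se = [math.sqrt(s2 * xtx_inv[j][j]) for j in range(k)]
t = [beta[j] / se[j] for j in range(k)]
print(beta, se, t)
```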
In logistic (and Poisson) regression, the variance of the residuals is related to the mean. If $Y \sim \operatorname{Bin}(n, p)$, the mean is $E(Y) = np$ and the variance is $\operatorname{Var}(Y) = np(1 - p)$, so the variance and the mean are related (for a Poisson variable, $E(Y) = \operatorname{Var}(Y) = \lambda$). In logistic and Poisson regression, but not in regression with Gaussian errors, we know the expected variance and don't have to estimate it separately. The dispersion parameter $\phi$ indicates whether we have more or less than the expected variance. If $\phi = 1$, we observe the expected amount of variance, whereas $\phi < 1$ means that we have less than the expected variance (called underdispersion) and $\phi > 1$ means that we have extra variance beyond the expected (called overdispersion). The dispersion parameter in logistic and Poisson regression is fixed at 1, which means that we can use the z-score. In other regression types such as normal linear regression, we have to estimate the residual variance and thus a t-value is used for calculating the p-values. In R, look at these two examples:
Logisticregression
```r
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
my.mod <- glm(admit ~ gre + gpa + rank, data=mydata, family="binomial")
summary(my.mod)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *
gpa          0.804038   0.331819   2.423 0.015388 *
rank2       -0.675443   0.316490  -2.134 0.032829 *
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)
```

Note that the dispersion parameter is fixed at 1 and thus we get z-values.
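To illustrate what the dispersion parameter measures, the sketch below simulates binomial counts and computes a Pearson-type dispersion estimate, the average of $(y - np)^2 / (np(1 - p))$. This moment estimator, with $p$ treated as known, is an assumption chosen for illustration; a fitted model would divide the Pearson statistic by the residual degrees of freedom instead. Ordinary binomial data give an estimate near 1, while a hypothetical beta-binomial setup (each group drawing its own success probability) gives a clearly larger value, i.e. overdispersion, which a logistic model with $\phi$ fixed at 1 would not account for.

```python
import random

def pearson_dispersion(counts, n, p):
    """Average of (y - n*p)^2 / (n*p*(1 - p)) over groups, with p treated as known.

    A fitted model would instead divide the Pearson statistic by the residual df.
    """
    expected_var = n * p * (1 - p)  # binomial variance is fixed by the mean
    return sum((y - n * p) ** 2 / expected_var for y in counts) / len(counts)

rng = random.Random(42)
n, p, groups = 20, 0.3, 2000

def binomial_draw(prob):
    """One Bin(n, prob) count as a sum of n Bernoulli trials."""
    return sum(rng.random() < prob for _ in range(n))

# Plain binomial data: variance matches n*p*(1 - p), so the estimate should be near 1
plain = [binomial_draw(p) for _ in range(groups)]

# Hypothetical overdispersed data: each group draws its own success probability
# from Beta(3, 7) (mean 0.3), adding between-group variation
over = [binomial_draw(rng.betavariate(3, 7)) for _ in range(groups)]

phi_plain = pearson_dispersion(plain, n, p)
phi_over = pearson_dispersion(over, n, p)
print(round(phi_plain, 2), round(phi_over, 2))
```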

Normallinearregression(OLS)
```r
summary(lm(Fertility~., data=swiss))

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
Agriculture      -0.17211    0.07030  -2.448  0.01873 *
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05 ***
Catholic          0.10412    0.03526   2.953  0.00519 **
Infant.Mortality  1.07705    0.38172   2.822  0.00734 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.165 on 41 degrees of freedom
```

Here, we have to estimate the residual variance (denoted as "Residual standard error") and hence we use t-values instead of z-values. Of course, in large samples, the t-distribution approximates the normal distribution and the difference doesn't matter.
Another related post can be found here.
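The large-sample remark can be checked numerically. The sketch below, plain Python with hypothetical helper names, integrates the Student-t density with Simpson's rule as a simple stand-in for a proper t CDF such as R's `pt`. For the Agriculture row ($|t| = 2.448$ on 41 residual degrees of freedom) the t-based p-value reproduces the reported 0.01873 up to rounding, the normal-based one is somewhat smaller, and with 10,000 degrees of freedom the two nearly coincide.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def p_from_t(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * area

def p_from_z(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

t_agri = 2.448  # |t| for Agriculture in the swiss output, 41 residual df
print(round(p_from_t(t_agri, 41), 5), round(p_from_z(t_agri), 5))
# With many residual degrees of freedom, the t and z p-values nearly coincide
print(round(p_from_t(t_agri, 10_000), 5), round(p_from_z(t_agri), 5))
```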
answered May 26 '13 at 21:09 by COOLSerdash
edited Jun 12 '13 at 15:44

Thank you very much for this nice post which answers all my questions. - user695652 May 26 '13 at 21:41
So, practically, regarding the first part of your excellent answer: if for some reason I had as output the odds ratio and the Wald statistic, I could then calculate the standard error from these as SE = (1/Wald statistic)*ln(OR). Is this correct? Thanks! - Sander W. van der Laan Aug 10 at 20:50


@SanderW.vanderLaan Thanks for your comment. Yes, I believe that's correct. If you perform a logistic regression, the Wald statistic will be the z-value. - COOLSerdash 2 days ago
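A minimal sketch of that back-calculation, assuming the reported Wald statistic is the z form (coefficient divided by its SE) rather than the squared chi-square form: since $z = \hat{\beta}/\operatorname{se}(\hat{\beta})$ and $\hat{\beta} = \ln(\text{OR})$, we get $\operatorname{se} = \ln(\text{OR})/z$. Using the gpa row of the logistic output (estimate 0.804038, z value 2.423) recovers roughly the reported standard error of 0.331819; the small discrepancy comes from z being rounded. The helper name is hypothetical.

```python
import math

def se_from_or_and_wald(odds_ratio, wald_z):
    """Standard error of a log-odds coefficient, recovered as |ln(OR) / z|.

    Assumes wald_z is the z-form Wald statistic (coefficient / SE), not its
    square; abs() guards against a z value reported without its sign.
    """
    return abs(math.log(odds_ratio) / wald_z)

# gpa row of the logistic output: OR = exp(0.804038), z = 2.423
se = se_from_or_and_wald(math.exp(0.804038), 2.423)
print(round(se, 4))  # close to the reported 0.331819; z is rounded to 3 decimals
```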

http://stats.stackexchange.com/questions/60074/waldtestforlogisticregression
