Stat 111 Homework 8 Solutions, Spring 2015

Problem 1. A logistic regression model is often used to predict p, the probability of success for a
Bernoulli random variable Y, from a known and measurable predictor variable X (X can be quantitative
or binary). The logistic regression model assumes that the independent response variables
$Y_1, \ldots, Y_n$ are generated from the following distribution:

$$Y_i \mid X_i \sim \text{Bern}\left(p = \frac{e^{\beta_0+\beta_1 X_i}}{1+e^{\beta_0+\beta_1 X_i}} = \frac{1}{1+e^{-(\beta_0+\beta_1 X_i)}}\right)$$

(a) Why is the functional choice $p = \exp(x)/[1 + \exp(x)]$ a good one for modeling p (here x is just
some mathematical expression)?
Regardless of the value of x, the function's range is limited to (0, 1), which gives the appropriate
bounds for p. It is also symmetric about p = 0.5, the middle of the support of p.
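A quick numerical check of both properties can be sketched in R. The helper name `expit` and the grid of x values are illustrative choices, not part of the assignment.

```r
# Logistic (inverse-logit) function as written in part (a); the name expit is hypothetical
expit <- function(x) exp(x) / (1 + exp(x))

x <- c(-10, -2, 0, 2, 10)   # arbitrary test points
p <- expit(x)
print(round(p, 5))

# Range: every value lies strictly between 0 and 1
all(p > 0 & p < 1)

# Symmetry about p = 0.5: expit(-x) equals 1 - expit(x)
all.equal(expit(-x), 1 - expit(x))
```

The symmetry check is the identity expit(-x) = 1 - expit(x), which is what "symmetric about p = 0.5" means for this curve.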

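(b) Write down the formula for the log-likelihood function, $l(\beta_0, \beta_1)$.

$$L(\beta_0, \beta_1) = \prod_{i=1}^{n} \left(\frac{1}{1+e^{-(\beta_0+\beta_1 X_i)}}\right)^{Y_i} \left(\frac{1}{1+e^{\beta_0+\beta_1 X_i}}\right)^{1-Y_i}$$

$$\begin{aligned}
l(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left[ -Y_i \log\left(1+e^{-(\beta_0+\beta_1 X_i)}\right) - (1-Y_i)\log\left(1+e^{\beta_0+\beta_1 X_i}\right) \right] \\
&= \sum_{i=1}^{n} Y_i \left[\log\left(1+e^{\beta_0+\beta_1 X_i}\right) - \log\left(1+e^{-(\beta_0+\beta_1 X_i)}\right)\right] - \sum_{i=1}^{n} \log\left(1+e^{\beta_0+\beta_1 X_i}\right) \\
&= \beta_0 \sum_{i=1}^{n} Y_i + \beta_1 \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} \log\left(1+e^{\beta_0+\beta_1 X_i}\right)
\end{aligned}$$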

(c) There are no closed-form solutions for the maximum likelihood estimators of $\beta_0$ and $\beta_1$.
In 1 or 2 sentences, explain how these could be calculated numerically (presumably in R).
Use R's optim() function, inputting the X and Y vectors and maximizing the log-likelihood function with
respect to $\beta_0$ and $\beta_1$.
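As a minimal sketch of this approach, the following runs optim() on simulated data. The data, the true parameter values, the seed, and the function name `loglik_sketch` are all assumptions made for illustration.

```r
set.seed(111)                              # arbitrary seed for reproducibility
n <- 500
x <- rbinom(n, 1, 0.5)                     # simulated binary predictor
y <- rbinom(n, 1, plogis(-1 + 0.8 * x))    # assumed true beta0 = -1, beta1 = 0.8

# Simplified log-likelihood from part (b); par = c(beta0, beta1)
loglik_sketch <- function(par, x, y) {
  eta <- par[1] + par[2] * x
  sum(y * eta) - sum(log(1 + exp(eta)))
}

# optim() minimizes by default; fnscale = -1 turns the problem into a maximization
fit <- optim(par = c(0, 0), fn = loglik_sketch, x = x, y = y,
             control = list(fnscale = -1))
fit$par

# Cross-check against R's built-in logistic regression
coef(glm(y ~ x, family = binomial))
```

The two sets of estimates should agree to a few decimal places; optim()'s default Nelder-Mead search is derivative-free, so small differences are expected.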

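(d) Find the score equations, $U(\beta_0)$ and $U(\beta_1)$.

$$U(\beta_0) = \sum_{i=1}^{n} \left[ Y_i \left(\frac{e^{-\beta_0-\beta_1 X_i}}{1+e^{-\beta_0-\beta_1 X_i}}\right) - (1-Y_i)\left(\frac{e^{\beta_0+\beta_1 X_i}}{1+e^{\beta_0+\beta_1 X_i}}\right) \right] = \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} \frac{e^{\beta_0+\beta_1 X_i}}{1+e^{\beta_0+\beta_1 X_i}}$$

$$U(\beta_1) = \sum_{i=1}^{n} \left[ X_i Y_i \left(\frac{e^{-\beta_0-\beta_1 X_i}}{1+e^{-\beta_0-\beta_1 X_i}}\right) - X_i(1-Y_i)\left(\frac{e^{\beta_0+\beta_1 X_i}}{1+e^{\beta_0+\beta_1 X_i}}\right) \right] = \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \left(\frac{e^{\beta_0+\beta_1 X_i}}{1+e^{\beta_0+\beta_1 X_i}}\right)$$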

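(e) Calculate the 2x2 expected Fisher information matrix for this model.

$$I_{11} = \sum_{i=1}^{n} \frac{e^{\beta_0+\beta_1 X_i}}{\left(1+e^{\beta_0+\beta_1 X_i}\right)^2}, \quad
I_{22} = \sum_{i=1}^{n} X_i^2 \frac{e^{\beta_0+\beta_1 X_i}}{\left(1+e^{\beta_0+\beta_1 X_i}\right)^2}, \quad
I_{21} = I_{12} = \sum_{i=1}^{n} X_i \frac{e^{\beta_0+\beta_1 X_i}}{\left(1+e^{\beta_0+\beta_1 X_i}\right)^2}$$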
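A short derivation sketch of where these entries come from, writing $p_i$ for the success probability of observation i (a shorthand introduced here, not used in the original solution). Because the second derivatives of the log-likelihood do not involve $Y_i$, the expected and observed information coincide for this model.

```latex
\begin{aligned}
p_i &= \frac{e^{\beta_0+\beta_1 X_i}}{1+e^{\beta_0+\beta_1 X_i}}, \qquad
\frac{\partial p_i}{\partial \beta_0} = p_i(1-p_i), \qquad
\frac{\partial p_i}{\partial \beta_1} = X_i\, p_i(1-p_i) \\[4pt]
U(\beta_0) &= \sum_{i=1}^{n} (Y_i - p_i), \qquad
U(\beta_1) = \sum_{i=1}^{n} X_i (Y_i - p_i) \\[4pt]
I_{11} &= -\frac{\partial U(\beta_0)}{\partial \beta_0} = \sum_{i=1}^{n} p_i(1-p_i), \qquad
I_{12} = -\frac{\partial U(\beta_0)}{\partial \beta_1} = \sum_{i=1}^{n} X_i\, p_i(1-p_i), \qquad
I_{22} = -\frac{\partial U(\beta_1)}{\partial \beta_1} = \sum_{i=1}^{n} X_i^2\, p_i(1-p_i) \\[4pt]
p_i(1-p_i) &= \frac{e^{\beta_0+\beta_1 X_i}}{\left(1+e^{\beta_0+\beta_1 X_i}\right)^2}
\end{aligned}
```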

(f) Write 3 separate functions in R:

1. logistic.loglik: evaluates the log-likelihood function of the logistic model given 3 arguments:
the values of the parameters, par=c(beta0,beta1), the outcome variable, y, and the predictor
variable, x.
2. logistic.score: evaluates the score functions given the same 3 arguments above, and returns
a vector, score = c(u1,u2).
3. logistic.fisher: evaluates the expected Fisher information given the same 3 arguments above,
and returns a matrix, fisher = matrix(c(I11,I21,I12,I22),nrow=2,ncol=2), where I11 is
the entry in the first row and first column of the matrix, etc.

logistic.loglik=function(theta,x,y,b1.null=NA){
  # theta is a vector c(b0,b1); if b1.null is supplied, beta1 is held fixed at that value
  b0=theta[1]
  b1=theta[2]
  if(!is.na(b1.null)){ b1=b1.null }
  # equivalent unsimplified form:
  # loglik = -sum(y*log(1+exp(-b0-b1*x))) + sum((y-1)*log(1+exp(b0+b1*x)))
  loglik = b0*sum(y)+b1*sum(x*y)-sum(log(1+exp(b0+b1*x)))
  return(loglik)
}

logistic.score=function(theta,x,y,b1.null=NA){
  # theta is a vector c(b0,b1); returns c(U(beta0), U(beta1))
  b0=theta[1]
  b1=theta[2]
  if(!is.na(b1.null)){ b1=b1.null }
  # equivalent unsimplified forms:
  # u1 = sum(y*exp(-b0-b1*x)/(1+exp(-b0-b1*x))) + sum((y-1)*exp(b0+b1*x)/(1+exp(b0+b1*x)))
  # u2 = sum(y*x*exp(-b0-b1*x)/(1+exp(-b0-b1*x))) + sum((y-1)*x*exp(b0+b1*x)/(1+exp(b0+b1*x)))
  u1 = sum(y)-sum(exp(b0+b1*x)/(1+exp(b0+b1*x)))
  u2 = sum(x*y)-sum(x*exp(b0+b1*x)/(1+exp(b0+b1*x)))
  score=c(u1,u2)
  return(score)
}

logistic.fisher=function(theta,x,y,b1.null=NA){
  # theta is a vector c(b0,b1); y is unused but kept so all three functions share one signature
  b0=theta[1]
  b1=theta[2]
  if(!is.na(b1.null)){ b1=b1.null }
  I11 = sum(exp(b0+b1*x)/(1+exp(b0+b1*x))^2)
  I21 = I12 = sum(x*exp(b0+b1*x)/(1+exp(b0+b1*x))^2)   # off-diagonal entries are equal
  I22 = sum(x^2*exp(b0+b1*x)/(1+exp(b0+b1*x))^2)
  fisher=matrix(c(I11,I21,I12,I22),nrow=2,ncol=2)
  return(fisher)
}
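The score and Fisher information can also be used directly to find the MLE by Fisher scoring, i.e. repeatedly updating theta by adding the inverse information times the score. The sketch below is not part of the assigned solution: it uses compact stand-in functions (`score_sketch`, `fisher_sketch`, both hypothetical names) and simulated data with assumed true parameters.

```r
# Compact versions of the score and Fisher information from parts (d) and (e)
score_sketch <- function(theta, x, y) {
  p <- plogis(theta[1] + theta[2] * x)    # plogis(eta) = exp(eta) / (1 + exp(eta))
  c(sum(y - p), sum(x * (y - p)))
}
fisher_sketch <- function(theta, x) {
  p <- plogis(theta[1] + theta[2] * x)
  w <- p * (1 - p)                        # equals exp(eta) / (1 + exp(eta))^2
  matrix(c(sum(w), sum(x * w), sum(x * w), sum(x^2 * w)), nrow = 2)
}

# Simulated data (assumed true values: beta0 = -1, beta1 = 0.8)
set.seed(111)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(-1 + 0.8 * x))

# Fisher scoring: theta <- theta + I(theta)^{-1} U(theta)
theta <- c(0, 0)
for (iter in 1:25) {
  step <- solve(fisher_sketch(theta, x), score_sketch(theta, x, y))
  theta <- theta + step
  if (max(abs(step)) < 1e-10) break       # stop once the update is negligible
}
theta
coef(glm(y ~ x, family = binomial))       # glm() should give nearly the same values
```

This typically converges in a handful of iterations, which is one reason derivative-based updates are often preferred to a generic search when the score and information are available in closed form.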

Problem 2. The dataset gsscrack.csv contains several variables from the General Social Survey from
2012. Two variables of interest for this problem are:

crack: a Bernoulli/binary variable. 1 if respondent has smoked crack, 0 otherwise.

female: a Bernoulli/binary variable. 1 if female, 0 if male.

We'd like to determine whether a man or a woman is more likely to use crack.
**Be sure to include any R commands and output that you used to perform the calculations for any
parts of this problem.
(a) Use R's optim command and your logistic.loglik function above to calculate the maximum
likelihood estimates of $\beta_0$ and $\beta_1$.

*Helpful hint: in the optim command, you can provide an additional optional argument: hessian =
TRUE. The results will then include the numeric estimate of the hessian matrix (the matrix of second-
order partial derivatives for the log-likelihood function) evaluated at the MLEs. This will be useful to
double-check some answers below.

> loglik=optim(par=c(0,0),fn=logistic.loglik,x=crackdata$female,
+ y=crackdata$crack,control=list(fnscale=-1),hessian=T)
> mle=loglik$par #MLE
> mle
[1] -2.4041148 -0.5918556

(b) Use your estimates of $\beta_0$ and $\beta_1$ to estimate the probability of a woman having smoked crack.
What is the estimate for a man?
The estimated probability for women is:

$$\hat{p}_f = \frac{e^{\hat\beta_0+\hat\beta_1}}{1+e^{\hat\beta_0+\hat\beta_1}} = \frac{e^{-2.404-0.592}}{1+e^{-2.404-0.592}} = 0.0476$$

The estimated probability for men is:

$$\hat{p}_m = \frac{e^{\hat\beta_0}}{1+e^{\hat\beta_0}} = \frac{e^{-2.404}}{1+e^{-2.404}} = 0.0829$$
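The same arithmetic can be sketched in R with plogis(), which computes exp(x)/(1 + exp(x)). The estimates are copied from the optim() output in part (a); the variable names are illustrative.

```r
mle <- c(-2.4041148, -0.5918556)       # optim() estimates of beta0 and beta1 from part (a)

p_female <- plogis(mle[1] + mle[2])    # female = 1
p_male   <- plogis(mle[1])             # female = 0
round(c(female = p_female, male = p_male), 4)
```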

(c) Write a set of hypotheses in terms of model parameters (for the logistic regression model described
in problem #1) to determine whether the probability of smoking crack is different for men and women.

$H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$

(d) Perform an asymptotic likelihood ratio test for the hypotheses above. Be sure to include an
estimate for the test statistic and the p-value. *Note: you may have to modify your function to
include a beta1.null argument like in HW #6.

> logliknull=optim(par=c(0,0),fn=logistic.loglik,x=crackdata$female,y=crackdata$crack,b1.null=0,
+ control=list(fnscale=-1))
> chisqstat=-2*(logliknull$value-loglik$value) #chi-square test statistic
> logliknull$value
[1] -232.8913
> loglik$value
[1] -230.409

> chisqstat
[1] 4.964711
> 1-pchisq(chisqstat,df=1) #p-value
[1] 0.02586963
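As a sanity check, the test statistic can be recomputed from the two printed log-likelihood values alone. This is only a sketch using the rounded values shown above, so the last digits differ slightly from the transcript.

```r
ll_null <- -232.8913   # maximized log-likelihood with beta1 fixed at 0
ll_full <- -230.409    # maximized log-likelihood with beta1 free

lrt <- -2 * (ll_null - ll_full)                       # likelihood ratio chi-square statistic
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)    # one constrained parameter, so df = 1
c(statistic = lrt, p.value = p_value)
```

Using `lower.tail = FALSE` is equivalent to `1 - pchisq(...)` but avoids loss of precision for very small p-values.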

(e) Perform a score test for the hypotheses above. Be sure to include an estimate for the test statistic
and the p-value.

> theta0=c(logliknull$par[1],0)
> Unull=logistic.score(theta0,x=crackdata$female,y=crackdata$crack)
> Inull=logistic.fisher(theta0,x=crackdata$female,y=crackdata$crack)
> scorestat=t(Unull)%*%solve(Inull)%*%Unull #score test statistic
> scorestat
[,1]
[1,] 4.906536
> 1-pchisq(scorestat,df=1) #p-value
[,1]
[1,] 0.02675525

(f) Perform a Wald test for the hypotheses above. Be sure to include an estimate for the test statistic
and the p-value.

> Imle=logistic.fisher(mle,x=crackdata$female,y=crackdata$crack)
> Waldstat=t(mle-theta0)%*%Imle%*%(mle-theta0) #wald test statistic
> Waldstat
[,1]
[1,] 4.885426
> 1-pchisq(Waldstat,df=1) #p-value
[,1]
[1,] 0.02708435

(g) Calculate a 95% confidence interval for $\beta_1$. You can use any of the asymptotic likelihood-based
methods (likelihood ratio, score, or Wald approach).

> #Wald confidence interval - need to invert the whole Fisher information matrix
> c(mle[2]-1.96*sqrt(solve(Imle)[2,2]),mle[2]+1.96*sqrt(solve(Imle)[2,2]))
[1] -1.12107836 -0.06263291
> #an approximation that ignores beta_0 altogether
> c(mle[2]-1.96/sqrt(Imle[2,2]),mle[2]+1.96/sqrt(Imle[2,2]))
[1] -1.0106816 -0.1730297

(h) In 2-3 sentences, summarize the results of the tests above to determine whether smoking crack is
related to sex.
The LRT, score test, and Wald test all suggest rejecting the null hypothesis. We have sufficient
evidence to believe that the proportion of women who have smoked crack differs from that of men; in fact,
it is smaller.

(i) For the data and model above, calculate the p-value for the likelihood ratio test for two scenarios:
1) $H_0: \beta_1 \ge 0$ vs. $H_A: \beta_1 < 0$ and 2) $H_0: \beta_1 \le 0$ vs. $H_A: \beta_1 > 0$.

1) The LRT $\chi^2$ statistic would still be 4.96, and the test is just the one-sided version of the test
in part (d). So the p-value is half of 0.02587, which is about 0.0129.
2) The LRT statistic equals 1 (the $\chi^2$ statistic will be zero) because the MLE $\hat\beta_1 = -0.591$
is inside the null space. The LRT statistic is always less than or equal to (as extreme as)
1, so the p-value equals 1. Alternatively, we could take the unthinking approach and find
$\min P(\hat\beta_1 \ge -0.591 \mid H_0) \approx 1 - (0.02587/2) = 0.9871$, which is not 100% correct.

(j) Double-check your answers with R's canned logistic regression model. The function call in R would
be something like: model = glm(y ~ x, family = binomial), and then summary(model) can be used
to view the complete results. Are the estimates the same as your calculations? How about the Wald
test statistic (R performs the Wald test and calls it a z-test in the table of coefficients)?
Call:
glm(formula = crack ~ female, family = binomial, data = crackdata)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.4047     0.1651 -14.565   <2e-16 ***
female       -0.5910     0.2700  -2.189   0.0286 *
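Because both variables are binary, the fit depends on the data only through the four cell counts, so the glm() call can be reproduced without the CSV file. The sketch below rebuilds an equivalent dataset from the counts stated in Problems 3 and 4 (483 women with 23 "yes", 483 men with 40 "yes"); the row order is arbitrary and the object names are illustrative.

```r
# Rebuild a dataset with the same cell counts as the crack/female variables
female <- rep(c(1, 0), each = 483)
crack  <- c(rep(c(1, 0), times = c(23, 460)),   # women: 23 yes, 460 no
            rep(c(1, 0), times = c(40, 443)))   # men:   40 yes, 443 no

model <- glm(crack ~ female, family = binomial)
coef(summary(model))    # estimates, standard errors, z values, p-values
```

The estimates agree with the optim() results to about three decimal places. Squaring the z value for female gives roughly 4.79, close to but not identical to the 4.885 reported in part (f), which was computed from a quadratic form in both parameters.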

Problem 3. The above hypotheses could also be tested using the two-sample z-test (Normal approximation
to the Binomial) based on the two sample proportions, which estimate the two population
proportions: $p_1$, the proportion of women who have smoked crack, and $p_2$, the proportion of men who
have smoked crack. Let $X_1 \sim \text{Bin}(n_1 = 483, p_1)$ and $X_2 \sim \text{Bin}(n_2 = 483, p_2)$, and $\hat{p}_j = X_j/n_j$. We
can assume that all observations are independent.
(a) What are the asymptotic sampling distributions of $\hat{p}_1$ and $\hat{p}_2$?
Using the Normal approximation to the Binomial:

$$\hat{p}_1 \,\dot\sim\, N\left(p_1, \frac{p_1(1-p_1)}{n_1}\right), \qquad \hat{p}_2 \,\dot\sim\, N\left(p_2, \frac{p_2(1-p_2)}{n_2}\right)$$

(b) Calculate the asymptotic sampling distribution of $D = \hat{p}_1 - \hat{p}_2$. If the true population proportions
are equal (call the common proportion p, so that $p_1 - p_2 = 0$), how does the variance simplify?
Linear combinations of independent Normal r.v.s are also Normal, and based on properties of means and
variances we can determine:

$$\hat{p}_1 - \hat{p}_2 \,\dot\sim\, N\left(p_1 - p_2, \; \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)$$

And if the proportions are equal, this simplifies to:

$$\hat{p}_1 - \hat{p}_2 \,\dot\sim\, N\left(0, \; p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$
(c) Calculate $\hat{p}_1$ and $\hat{p}_2$ directly from the data.
> mean(crack[female==1])
[1] 0.04761905
> mean(crack[female==0])
[1] 0.08281573

Note: these match the estimates from the logistic regression model perfectly, which is a good thing.
(d) We'd like to test whether the proportion of people who have smoked crack is the same in the two
groups (the sexes). Perform an appropriate asymptotic hypothesis test based on the sample statistic
D to make this determination (remember: do all of your calculations assuming the null hypothesis to
be true). Be sure to state your hypotheses, the test statistic, the critical region/value, the p-value,
and the conclusion in context of the problem. Note: there is no need to apply a continuity
correction here.
> p0=mean(crack) #pooled proportion
> var0=p0*(1-p0)*(1/n1+1/n2)
> Dhat=mean(crack[female==1])-mean(crack[female==0])
> Dhat/sqrt(var0)
[1] -2.215253
> 2*(1-pnorm(abs(Dhat),0,sqrt(var0)))
[1] 0.0267427
$H_0: p_1 = p_2$ vs. $H_A: p_1 \neq p_2$

$$Z = \frac{\hat{D} - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -2.215$$

The critical region is |Z| > 1.96. p-value = 0.0267


Since the p-value is less than 0.05 we can reject the null hypothesis. There is evidence that men and
women use crack at different rates (women use it less often).
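The pooled z-test can be reproduced from the four counts alone (23 of 483 women, 40 of 483 men). The sketch below also compares it with R's built-in prop.test() without continuity correction, whose chi-square statistic is the square of this z statistic; variable names are illustrative.

```r
x <- c(23, 40)        # number who have smoked crack: women, men
n <- c(483, 483)      # group sizes

phat <- x / n
p0   <- sum(x) / sum(n)                          # pooled proportion under H0
z    <- (phat[1] - phat[2]) / sqrt(p0 * (1 - p0) * sum(1 / n))
pval <- 2 * pnorm(-abs(z))                       # two-sided p-value
c(z = z, p.value = pval)

# Built-in equivalent: the reported X-squared equals z^2
chisq <- unname(prop.test(x, n, correct = FALSE)$statistic)
c(z.squared = z^2, prop.test = chisq)
```

Note that z squared (about 4.907) coincides with the score test statistic from Problem 2(e), which is the expected relationship between the pooled two-proportion z-test and the score test in this model.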

Problem 4. Let's instead take a Bayesian approach to the example above. We'd like to compare two
models: one where we set the priors to be $p_1 \sim \text{Unif}(0, 1)$ and independently $p_2 \sim \text{Unif}(0, 1)$ [call this
Model 1]; and one where we set $p_1 = p_2 = p$ and we set the prior on $p \sim \text{Unif}(0, 1)$ [Model 2]. Use the
crack dataset to answer the following problems. You can assume the sample sizes are fixed: $n_1 = 483$
and $n_2 = 483$.
(a) In general, what is the likelihood, $L(p_1, p_2) = f(X_1, X_2 \mid p_1, p_2)$, for the data $X_1$ and $X_2$ (do not use
asymptotic results here)? You can ignore the multiplicative constant/coefficient.

$$L(p_1, p_2) \propto p_1^{X_1}(1-p_1)^{n_1 - X_1}\, p_2^{X_2}(1-p_2)^{n_2 - X_2}$$
(b) Calculate the probability of the data we see given Model 1; that is, given Model 1 is true, calculate
the marginal probability, $P(X_1 = 23, X_2 = 40)$. *Note: this double integral simplifies since $p_1$ and $p_2$
are independent.

$$\begin{aligned}
P(X_1 = 23, X_2 = 40 \mid M_1) &= \int_0^1 \int_0^1 P(X_1 = 23, X_2 = 40 \mid p_1, p_2)\, f(p_1, p_2)\, dp_1\, dp_2 \\
&= \int_0^1 P(X_1 = 23 \mid p_1) \cdot 1\, dp_1 \int_0^1 P(X_2 = 40 \mid p_2) \cdot 1\, dp_2 \\
&= \int_0^1 \binom{483}{23} p_1^{23}(1-p_1)^{460}\, dp_1 \int_0^1 \binom{483}{40} p_2^{40}(1-p_2)^{443}\, dp_2 \\
&= \binom{483}{23}\binom{483}{40} \int_0^1 p_1^{23}(1-p_1)^{460}\, dp_1 \int_0^1 p_2^{40}(1-p_2)^{443}\, dp_2 \\
&= \binom{483}{23}\binom{483}{40} \left(\frac{23!\,460!}{484!}\right)\left(\frac{40!\,443!}{484!}\right) = \left(\frac{1}{484}\right)^2 = 4.27 \times 10^{-6}
\end{aligned}$$

These integrals have a closed-form solution since the functional form of each integrand matches that of
a Beta distribution.
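The closed form can be checked numerically in R. Each integral is a Beta function, e.g. the integral of p^23 (1-p)^460 over (0, 1) is B(24, 461), so the sketch below works on the log scale with lchoose() and lbeta() to avoid overflow in the factorials.

```r
# log P(X1 = 23, X2 = 40 | Model 1) = log[ choose(483,23) B(24,461) choose(483,40) B(41,444) ]
log_m1 <- lchoose(483, 23) + lbeta(24, 461) +
          lchoose(483, 40) + lbeta(41, 444)
m1 <- exp(log_m1)

c(numeric = m1, closed.form = 1 / 484^2)   # both about 4.27e-06
```

Each factor equals 1/484 regardless of the observed count, reflecting that under a uniform prior every value of X from 0 to 483 is equally likely a priori.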
(c) Calculate the probability of the data we see given Model 2.

$$P(X_1 = 23, X_2 = 40 \mid M_2) = \int_0^1 \binom{483}{23}\binom{483}{40} p^{63}(1-p)^{903}\, dp = \binom{483}{23}\binom{483}{40} \frac{63!\,903!}{967!} = 9.29 \times 10^{-6}$$

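(d) Calculate the Bayes factor comparing Model 1 to Model 2.

$$BF = \frac{P(X_1 = 23, X_2 = 40 \mid M_2)}{P(X_1 = 23, X_2 = 40 \mid M_1)} = \frac{9.29 \times 10^{-6}}{\left(\frac{1}{484}\right)^2} = 2.17$$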
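A sketch of the same ratio in R, computed on the log scale. The Model 2 integral of p^63 (1-p)^903 is the Beta function B(64, 904), and the Model 1 marginal is (1/484)^2; the name `bf_21` is illustrative and makes explicit that Model 2 is in the numerator.

```r
log_m1 <- -2 * log(484)                                          # Model 1: (1/484)^2
log_m2 <- lchoose(483, 23) + lchoose(483, 40) + lbeta(64, 904)   # Model 2: common p

bf_21 <- exp(log_m2 - log_m1)    # Bayes factor of Model 2 relative to Model 1
c(m1 = exp(log_m1), m2 = exp(log_m2), BF = bf_21)
```

Values above 1 favor Model 2 (the common-proportion model); the reciprocal gives the Bayes factor in the other direction.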

(e) Interpret the Bayes factor. Is there evidence that the model to predict crack use should include
sex?
Since the BF is 2.17, which is greater than 1, the model that combines the two groups (Model 2) is
actually preferred, though only by a little. Sex is not needed in a model to predict crack usage.

(f) Calculate the BIC (Bayesian Information Criterion) for each of the two models (you will need to
evaluate two likelihood functions first). Interpret the results.
Recall the definition of BIC:

$$\text{BIC} = -2\log(L(\hat\theta)) + k(\log(n) + \log(2\pi))$$

Using this for our models and data (note: we are ignoring the binomial coefficients, which should
technically be included but do not affect the comparison between models, since they are the same in
both models and cancel out):

$$\begin{aligned}
\text{BIC}_1 &= -2\left(X_1\log(\hat{p}_1) + (n_1 - X_1)\log(1-\hat{p}_1) + X_2\log(\hat{p}_2) + (n_2 - X_2)\log(1-\hat{p}_2)\right) + k(\log(n) + \log(2\pi)) \\
&= -2\left(23\log(0.0476) + 460\log(0.9524) + 40\log(0.0828) + 443\log(0.9172)\right) + 2(\log(966) + \log(2\pi)) \\
&= 478.24 \\
\text{BIC}_2 &= -2\left(X_1\log(\hat{p}) + (n_1 - X_1)\log(1-\hat{p}) + X_2\log(\hat{p}) + (n_2 - X_2)\log(1-\hat{p})\right) + k(\log(n) + \log(2\pi)) \\
&= -2\left(63\log(0.0652) + 903\log(0.9348)\right) + 1(\log(966) + \log(2\pi)) \\
&= 474.49
\end{aligned}$$

With this definition a lower BIC is better. BIC2 is smaller by about 3.7, so the smaller model (Model 2,
with a common p) is preferred, which agrees with the Bayes factor in part (e).
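The BIC arithmetic can be sketched in R from the counts, with $n_2 - X_2 = 483 - 40 = 443$ men who have not smoked crack. This assumes the penalty term is k(log n + log 2π), which is the form that reproduces the 474.49 reported for BIC2; the helper name `bic_sketch` is hypothetical, and binomial coefficients are omitted as in the text.

```r
# BIC with penalty k * (log(n) + log(2*pi)); an assumed form matching the reported BIC2
bic_sketch <- function(loglik, k, n) -2 * loglik + k * (log(n) + log(2 * pi))

x1 <- 23; n1 <- 483    # women
x2 <- 40; n2 <- 483    # men
p1 <- x1 / n1
p2 <- x2 / n2
p  <- (x1 + x2) / (n1 + n2)    # pooled estimate for Model 2

# Maximized log-likelihoods, binomial coefficients omitted
ll1 <- x1 * log(p1) + (n1 - x1) * log(1 - p1) +
       x2 * log(p2) + (n2 - x2) * log(1 - p2)
ll2 <- (x1 + x2) * log(p) + (n1 + n2 - x1 - x2) * log(1 - p)

bic1 <- bic_sketch(ll1, k = 2, n = n1 + n2)
bic2 <- bic_sketch(ll2, k = 1, n = n1 + n2)
round(c(BIC1 = bic1, BIC2 = bic2), 2)
```

The two maximized log-likelihoods (about -230.41 and -232.89) match the `loglik$value` and `logliknull$value` outputs from Problem 2(d), which is a useful consistency check.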
