HW 4

Aaron Vincent
17 April 2014
Statistics 516

Carriers of Streptococcus Pyogenes and Tonsil Size

Part 1:
Answer:
Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.9606     0.1904 -15.546   <2e-16 ***
SizeNormal      -0.3035     0.3015  -1.007   0.3141
SizeVery Large   0.5440     0.2857   1.904   0.0569 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  7.3209e+00  on 2  degrees of freedom
Residual deviance: -1.9984e-15  on 0  degrees of freedom
AIC: 20.851

Number of Fisher Scoring iterations: 3
R Input:
> model1=glm(cbind(Carrier, Non.carrier)~Size, family=binomial, data=Tonsils)
> summary(model1)
R Output:
Call:
glm(formula = cbind(Carrier, Non.carrier) ~ Size, family = binomial,
    data = Tonsils)

Deviance Residuals:
[1]  0  0  0

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.9606     0.1904 -15.546   <2e-16 ***
SizeNormal      -0.3035     0.3015  -1.007   0.3141
SizeVery Large   0.5440     0.2857   1.904   0.0569 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  7.3209e+00  on 2  degrees of freedom
Residual deviance: -1.9984e-15  on 0  degrees of freedom
AIC: 20.851

Number of Fisher Scoring iterations: 3
Part 2:
Answer:

Analysis of Deviance Table

Model 1: cbind(Carrier, Non.carrier) ~ 1
Model 2: cbind(Carrier, Non.carrier) ~ Size
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1         2     7.3209
2         0     0.0000  2   7.3209  0.02572 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R Input:
> model1=glm(cbind(Carrier, Non.carrier)~Size, family=binomial, data=Tonsils)
> model1.5=glm(cbind(Carrier, Non.carrier)~1, family=binomial, data=Tonsils)
> anova(model1.5, model1, test="LRT")
R Output:
Analysis of Deviance Table

Model 1: cbind(Carrier, Non.carrier) ~ 1
Model 2: cbind(Carrier, Non.carrier) ~ Size
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1         2     7.3209
2         0     0.0000  2   7.3209  0.02572 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
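As a quick check on the table above, the likelihood ratio p-value can be recomputed by hand from the deviance difference and its degrees of freedom. This is only a sketch using the rounded values printed in the output; the variable names are illustrative and not part of the original analysis.

```r
# Deviance difference and degrees of freedom copied from the anova() table.
dev_diff <- 7.3209
df_diff  <- 2

# The upper-tail chi-square probability is the likelihood ratio test p-value.
p_lrt <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
print(round(p_lrt, 5))
```

The result should agree with the Pr(>Chi) column up to rounding.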
Part 3:
Answer:

LOG-ODDS:
Normal:     -3.264151   CI: (-3.722312, -2.805990)
Large:      -2.960641   CI: (-3.333902, -2.587380)
Very Large: -2.416658   CI: (-2.834200, -1.999116)

ODDS:
Normal:     0.03822938  CI: (0.02417801, 0.06044688)
Large:      0.05178571  CI: (0.03565371, 0.07521686)
Very Large: 0.08921933  CI: (0.05876555, 0.13545503)

PROBABILITY:
Normal:     0.03682171  CI: (0.02360723, 0.05700134)
Large:      0.04923599  CI: (0.03442629, 0.06995505)
Very Large: 0.08191123  CI: (0.05550381, 0.11929577)

R Input:
> source("http://dl.dropboxusercontent.com/u/10884844/Rcode/contrastfix.R")
> install.packages("contrast")
> library(contrast)
> cont=contrast(model1, list(Size=levels(Tonsils$Size)), cnames = levels(Tonsils$Size))
> contrastfix(cont, wald=TRUE)
> contrastfix(cont, wald=TRUE, exp=TRUE)
> Normal1=plogis(c(-3.264151, -3.722312, -2.805990))
> Large1=plogis(c(-2.960641, -3.333902, -2.587380))
> Very.Large1=plogis(c(-2.416658,-2.834200, -1.999116))
R Output:
> contrastfix(cont, wald=TRUE)
glm model parameter contrast

            Contrast      S.E.     Lower     Upper      t df Pr(>|t|)
Large      -2.960641 0.1904428 -3.333902 -2.587380 -15.55 NA        0
Normal     -3.264151 0.2337598 -3.722312 -2.805990 -13.96 NA        0
Very Large -2.416658 0.2130355 -2.834200 -1.999116 -11.34 NA        0
> contrastfix(cont, wald=TRUE, exp=TRUE)
glm model parameter contrast

             Contrast      S.E.      Lower      Upper      t df Pr(>|t|)
Large      0.05178571 0.1904428 0.03565371 0.07521686 -15.55 NA        0
Normal     0.03822938 0.2337598 0.02417801 0.06044688 -13.96 NA        0
Very Large 0.08921933 0.2130355 0.05876555 0.13545503 -11.34 NA        0
> Normal1
[1] 0.03682171 0.02360723 0.05700134
> Large1
[1] 0.04923599 0.03442629 0.06995505
> Very.Large1
[1] 0.08191123 0.05550381 0.11929577
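The log-odds, odds, and probabilities above are linked by simple transformations, which the following sketch works through using the rounded coefficients from Part 1. It assumes, as the coefficient names suggest, that Large is the baseline level of Size; the variable names are illustrative only.

```r
# Coefficients copied (rounded) from summary(model1).
# Assumption: Large is the baseline level, so the intercept is its log-odds.
b0  <- -2.9606   # (Intercept)
bN  <- -0.3035   # SizeNormal
bVL <-  0.5440   # SizeVery Large

# Log-odds for each tonsil size.
log_odds <- c(Normal = b0 + bN, Large = b0, "Very Large" = b0 + bVL)

# Odds are exp(log-odds); probabilities are odds / (1 + odds), i.e. plogis().
odds <- exp(log_odds)
prob <- plogis(log_odds)

print(round(cbind(log_odds, odds, prob), 4))
```

Because the coefficients are rounded to four decimals, the results match the contrastfix output only to about three decimal places.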
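Part 4:
Answer:

Large vs. Normal:

  Contrast      S.E.     Lower    Upper    t df Pr(>|t|)
1 1.354605 0.3015164 0.7501732 2.446042 1.01 NA   0.1571

This means that the estimated odds of being a Streptococcus carrier are about 1.35 times as high with large tonsils as with normal tonsils. The confidence interval (0.75, 2.45) includes 1, so this difference is not statistically significant.

Very Large vs. Normal:

  Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
1  2.33379 0.3162717 1.255599 4.337832 2.68 NA   0.0037

This means that the estimated odds of being a Streptococcus carrier are about 2.33 times as high with very large tonsils as with normal tonsils. The confidence interval (1.26, 4.34) excludes 1.

Very Large vs. Large:

  Contrast      S.E.     Lower    Upper   t df Pr(>|t|)
1 1.722856 0.2857492 0.9840538 3.016332 1.9 NA   0.0285

This means that the estimated odds of being a Streptococcus carrier are about 1.72 times as high with very large tonsils as with large tonsils. The confidence interval (0.98, 3.02) narrowly includes 1.

The p-values reported here are half the two-sided p-values for the same comparisons in the Part 1 summary (0.3141 and 0.0569), so they appear to be one-sided.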
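The Large vs. Normal odds ratio and its Wald interval can be reproduced approximately from the Part 1 coefficient table. The sketch below assumes a normal-based 95% interval on the log scale, which is consistent with the printed limits; the variable names are illustrative.

```r
# SizeNormal compares Normal to Large, so the sign is flipped
# to get the log odds ratio for Large vs. Normal.
est <- 0.3035      # -(SizeNormal estimate), rounded
se  <- 0.3015164   # standard error from the contrast output

# Wald interval on the log scale, then exponentiate.
z          <- qnorm(0.975)
odds_ratio <- exp(est)
ci         <- exp(est + c(-1, 1) * z * se)

print(round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 4))
```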
R Input:
cont2=contrast(model1, list(Size="Large"), list(Size="Normal"))
contrastfix(cont2, wald=TRUE, exp=TRUE)
cont3=contrast(model1, list(Size="Very Large"), list(Size="Large"))
contrastfix(cont3, wald=TRUE, exp=TRUE)
cont4=contrast(model1, list(Size="Very Large"), list(Size="Normal"))
contrastfix(cont4, wald=TRUE, exp=TRUE)
R Output:
> contrastfix(cont2, wald=TRUE, exp=TRUE)
glm model parameter contrast

  Contrast      S.E.     Lower    Upper    t df Pr(>|t|)
1 1.354605 0.3015164 0.7501732 2.446042 1.01 NA   0.1571
> contrastfix(cont3, wald=TRUE, exp=TRUE)
glm model parameter contrast

  Contrast      S.E.     Lower    Upper   t df Pr(>|t|)
1 1.722856 0.2857492 0.9840538 3.016332 1.9 NA   0.0285
> contrastfix(cont4, wald=TRUE, exp=TRUE)
glm model parameter contrast

  Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
1  2.33379 0.3162717 1.255599 4.337832 2.68 NA   0.0037
Moth Coloration and Natural Selection

Part 1:
Answer:

Parameter estimates:
(Intercept)          -1.128987
Distance              0.018502
Morphlight            0.411257
Distance:Morphlight  -0.027789

Standard errors:
(Intercept)           0.197906
Distance              0.005645
Morphlight            0.274490
Distance:Morphlight   0.008085
R Input:
# case2102 is assumed to be loaded already (for example from the Sleuth3 package)
library(ggplot2)
modela=glm(cbind(Removed, Placed-Removed)~Distance+Morph+Distance:Morph, data=case2102, family=binomial)
summary(modela)
# grid of distances for each morph, used to draw the fitted curves
newdata1 <- expand.grid(Distance = seq(0, 52, length = 100), Morph = c("light","dark"))
newdata1$yhat <- predict(modela, newdata1, type = "response")
plot1 <- ggplot(case2102, aes(x = Distance, y =Removed/Placed, color = Morph))
plot1 <- plot1 + geom_point() + geom_line(aes(y = yhat), data = newdata1)
plot1 <- plot1 + ylab("Observed/Predicted Proportion")
plot(plot1)
R Output:
> summary(modela)

Call:
glm(formula = cbind(Removed, Placed - Removed) ~ Distance + Morph +
    Distance:Morph, family = binomial, data = case2102)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.21183  -0.39883   0.01155   0.68292   1.31242

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         -1.128987   0.197906  -5.705 1.17e-08 ***
Distance             0.018502   0.005645   3.277 0.001048 **
Morphlight           0.411257   0.274490   1.498 0.134066
Distance:Morphlight -0.027789   0.008085  -3.437 0.000588 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 35.385  on 13  degrees of freedom
Residual deviance: 13.230  on 10  degrees of freedom
AIC: 83.904

Number of Fisher Scoring iterations: 4

> newdata1 <- expand.grid(Distance = seq(0, 52, length = 100), Morph = c("light","dark"))
> newdata1$yhat <- predict(modela, newdata1, type = "response")
> plot1 <- ggplot(case2102, aes(x = Distance, y =Removed/Placed, color = Morph))
> plot1 <- plot1 + geom_point() + geom_line(aes(y = yhat), data = newdata1)
> plot1 <- plot1 + ylab("Observed/Predicted Proportion")
> plot(plot1)
Part 2: Answer:
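glm model parameter contrast

       Contrast        S.E.     Lower    Upper     t df Pr(>|t|)
light 0.9907562 0.005788365 0.9795796 1.002060 -1.60 NA   0.0543
dark  1.0186745 0.005645336 1.0074653 1.030008  3.28 NA   0.0005

This means that for every 1 unit increase in distance, the odds of a light moth being removed are multiplied by about 0.991 (a decrease of about 0.9%), and the odds of a dark moth being removed are multiplied by about 1.019 (an increase of about 1.9%). The interval for light moths (0.980, 1.002) includes 1, while the interval for dark moths (1.007, 1.030) does not.
R Input: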
source("http://dl.dropboxusercontent.com/u/10884844/Rcode/contrastfix.R")
install.packages("contrast")
library(contrast)
conta=contrast(modela, list(Morph=c("light", "dark"), Distance=2), list(Morph=c("light", "dark"), Distance=1), cnames=c("light", "dark"))
contrastfix(conta, wald=TRUE, exp=TRUE)
R Output:
> conta=contrast(modela, list(Morph=c("light", "dark"), Distance=2), list(Morph=c("light", "dark"), Distance=1), cnames=c("light", "dark"))
> contrastfix(conta, wald=TRUE, exp=TRUE) #so for every 1 unit increase in distance the odds of light being removed are multiplied by about 0.991 and the odds of dark by about 1.019
glm model parameter contrast

       Contrast        S.E.     Lower    Upper     t df Pr(>|t|)
light 0.9907562 0.005788365 0.9795796 1.002060 -1.60 NA   0.0543
dark  1.0186745 0.005645336 1.0074653 1.030008  3.28 NA   0.0005
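The two per-unit odds ratios can also be read directly off the coefficients from Part 1, since dark is the baseline morph and the interaction term adjusts the distance slope for light moths. The following is a small sketch with illustrative variable names, using the rounded estimates printed in the summary.

```r
# Estimates copied from summary(modela).
b_dist  <-  0.018502   # Distance: slope on the log-odds scale for dark moths
b_inter <- -0.027789   # Distance:Morphlight: change in slope for light moths

# Multiplicative change in the odds of removal per 1 unit of distance.
or_dark  <- exp(b_dist)
or_light <- exp(b_dist + b_inter)

# The same effects expressed as percent change in the odds.
pct_change <- 100 * (c(light = or_light, dark = or_dark) - 1)
print(round(pct_change, 2))
```

An odds ratio below 1 corresponds to a percent decrease in the odds, and one above 1 to a percent increase.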
Part 3:
Answer:

Contrast      S.E.     Lower    Upper   t df Pr(>|t|)
1.508713 0.2744898 0.8809688 2.583764 1.5 NA    0.067

This means that at a distance of 0 units, the estimated odds of being removed for a light morph are about 1.5 times the odds for a dark morph. The confidence interval (0.88, 2.58) includes 1.

Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
2.659651 0.2195914 1.729451 4.090169 4.45 NA        0

This means that at a distance of 50 units, the estimated odds of being removed for a dark morph are about 2.66 times the odds for a light morph. The confidence interval (1.73, 4.09) excludes 1.
R Input:
contb=contrast(modela, list(Morph="light", Distance=0), list(Morph= "dark", Distance=0))
contrastfix(contb, wald=TRUE, exp=TRUE)
contc=contrast(modela, list(Morph="dark", Distance=50), list(Morph= "light", Distance=50))
contrastfix(contc, wald=TRUE, exp=TRUE)
R Output:
> contrastfix(contb, wald=TRUE, exp=TRUE)
glm model parameter contrast

Contrast      S.E.     Lower    Upper   t df Pr(>|t|)
1.508713 0.2744898 0.8809688 2.583764 1.5 NA    0.067
> contrastfix(contc, wald=TRUE, exp=TRUE)
glm model parameter contrast

Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
2.659651 0.2195914 1.729451 4.090169 4.45 NA        0
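Because of the interaction, the light vs. dark odds ratio depends on distance. As a rough cross-check of the two contrasts above, it can be written as a function of distance using the Part 1 coefficients. The function name is hypothetical and the coefficients are the rounded values from the summary.

```r
# Estimates copied from summary(modela).
b_morph <-  0.411257   # Morphlight: light vs. dark log odds ratio at distance 0
b_inter <- -0.027789   # Distance:Morphlight: change in that log odds ratio per unit

# Odds ratio of removal, light vs. dark, at a given distance.
or_light_vs_dark <- function(distance) exp(b_morph + b_inter * distance)

at_0  <- or_light_vs_dark(0)        # light vs. dark at distance 0
at_50 <- 1 / or_light_vs_dark(50)   # dark vs. light at distance 50 (reciprocal)
print(round(c(light_vs_dark_0 = at_0, dark_vs_light_50 = at_50), 3))
```

Taking the reciprocal swaps which morph is in the numerator, which is why the second contrast is reported as dark vs. light.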
Part 4:
Answer:
There is little evidence of overdispersion: only one point is greater than 2 in absolute value on the residual vs. predicted plot, and, in the summary statistics of the model, the residual deviance (13.230) is not large relative to its residual degrees of freedom (10).
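One common informal check for overdispersion in a binomial model compares the residual deviance with its residual degrees of freedom. The sketch below applies that check to the values printed in summary(modela); the variable names are illustrative, and the chi-square comparison is only approximate.

```r
# Values copied from summary(modela).
resid_dev <- 13.230
resid_df  <- 10

# A ratio near 1 suggests the binomial variance assumption is reasonable.
ratio <- resid_dev / resid_df

# Approximate goodness-of-fit p-value; a small value would suggest overdispersion.
p_gof <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)

print(round(c(ratio = ratio, p_gof = p_gof), 3))
```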
R Input:
plot(predict(modela, type = "response"), rstudent(modela))
R Output:

> plot(predict(modela, type = "response"), rstudent(modela))
Snow Geese Overdispersion

Part 1:
Answer:

Parameter estimates:
(Intercept)         -0.45550
observerobs2        -3.09079
photo                0.80007
observerobs2:photo   0.30457

Standard errors:
(Intercept)          3.83921
observerobs2         5.64216
photo                0.06317
observerobs2:photo   0.09580
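Using the quasipoisson family gives the same parameter estimates as the Poisson model but higher standard error estimates, because the standard errors are scaled by the square root of the estimated dispersion parameter (10.69). This makes evaluations of the model more trustworthy by giving wider confidence intervals and more conservative test statistics and p-values. Additionally, the quasipoisson model appears to have adequately dealt with the overdispersion because only a few points are greater than 2 in absolute value on the residual vs. predicted plot. The residual deviance (859.79 on 86 degrees of freedom) is still large relative to its degrees of freedom, but the quasipoisson family accounts for this through the dispersion parameter rather than by changing the deviance.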
R Input:
model3.1 <- glm(count ~ observer + photo + observer:photo, data = snowgeese.long, family = quasipoisson(link = identity))
summary(model3.1)
summary(model)
plot(predict(model3.1, type = "response"), rstudent(model3.1))
plot(predict(model, type = "response"), rstudent(model))
R Output:
> summary(model3.1)

Call:
glm(formula = count ~ observer + photo + observer:photo,
    family = quasipoisson(link = identity), data = snowgeese.long)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-7.4540  -1.5993  -0.5203   0.7914  12.2823

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        -0.45550    3.83921  -0.119  0.90584
observerobs2       -3.09079    5.64216  -0.548  0.58525
photo               0.80007    0.06317  12.666  < 2e-16 ***
observerobs2:photo  0.30457    0.09580   3.179  0.00205 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 10.69254)

    Null deviance: 7049.97  on 89  degrees of freedom
Residual deviance:  859.79  on 86  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

> plot(predict(model3.1, type = "response"), rstudent(model3.1))
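The quasi-Poisson standard errors are the Poisson standard errors multiplied by the square root of the estimated dispersion. Since the Poisson fit itself is not shown here, the sketch below works backwards from the printed quasi-Poisson output to the standard errors a plain Poisson fit would imply; the variable names are illustrative.

```r
# Values copied from summary(model3.1).
dispersion <- 10.69254
se_quasi <- c("(Intercept)"        = 3.83921,
              "observerobs2"       = 5.64216,
              "photo"              = 0.06317,
              "observerobs2:photo" = 0.09580)

# Inflation factor applied to the Poisson standard errors.
inflation <- sqrt(dispersion)

# Standard errors implied for a Poisson fit with the same estimates.
se_poisson_implied <- se_quasi / inflation

print(round(inflation, 3))
print(round(se_poisson_implied, 5))
```

With a dispersion near 10.7, every standard error is a bit more than three times what the Poisson model would report.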
Part 2:
Answer:

Parameter estimates:
(Intercept)          1.42894
observerobs2        -2.94485
photo                0.75447
observerobs2:photo   0.29679

Standard errors:
(Intercept)          2.25435
observerobs2         3.36972
photo                0.05827
observerobs2:photo   0.09396
Using a negative binomial model gives different parameter estimates and higher standard error estimates than the Poisson model, making evaluations of the model more trustworthy by giving wider confidence intervals and more conservative test statistics and p-values. Additionally, the negative binomial model appears to have adequately dealt with the overdispersion because only a few points are greater than 2 in absolute value on the residual vs. predicted plot and, in the summary statistics of the model, the residual deviance (84.989) is close to its residual degrees of freedom (86).
R Input:
library(MASS)
model3.2 <- glm.nb(count ~ observer + photo + observer:photo, data = snowgeese.long, link = identity)
summary(model3.2)
plot(predict(model3.2, type = "response"), rstudent(model3.2))
plot(predict(model, type = "response"), rstudent(model))
R Output:
> summary(model3.2)

Call:
glm.nb(formula = count ~ observer + photo + observer:photo,
    data = snowgeese.long, link = identity, init.theta = 11.0702114)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0805  -0.6863  -0.2228   0.3886   2.9868

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         1.42894    2.25435   0.634  0.52617
observerobs2       -2.94485    3.36972  -0.874  0.38217
photo               0.75447    0.05827  12.947  < 2e-16 ***
observerobs2:photo  0.29679    0.09396   3.159  0.00158 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(11.0702) family taken to be 1)

    Null deviance: 793.411  on 89  degrees of freedom
Residual deviance:  84.989  on 86  degrees of freedom
AIC: 779.36

Number of Fisher Scoring iterations: 1

              Theta:  11.07
          Std. Err.:  1.92

 2 x log-likelihood:  -769.363
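The negative binomial model fitted by glm.nb allows the variance to grow faster than the mean, with variance mu + mu^2 / theta. The sketch below shows what the estimated theta implies at a few mean counts, and the residual deviance to degrees of freedom ratio from the output. The mean values are purely illustrative and are not taken from the data.

```r
# Theta copied from summary(model3.2).
theta <- 11.07

# Negative binomial variance function used by MASS::glm.nb.
nb_variance <- function(mu) mu + mu^2 / theta

# Illustrative mean counts (assumed values, not from the snow geese data).
mu <- c(10, 50, 100)
variance_table <- data.frame(mu = mu,
                             poisson_var = mu,
                             nb_var = nb_variance(mu))
print(variance_table)

# Residual deviance relative to its degrees of freedom, from the output.
resid_ratio <- 84.989 / 86
print(round(resid_ratio, 3))
```

The gap between the Poisson and negative binomial variances widens as the mean grows, which is the extra variability the Poisson model could not capture.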
