
Aaron Vincent 17 April 2014 Statistics 516

Carriers of Streptococcus Pyogenes and Tonsil Size

Part 1: Answer:

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.9606     0.1904 -15.546   <2e-16 ***
SizeNormal      -0.3035     0.3015  -1.007   0.3141
SizeVery Large   0.5440     0.2857   1.904   0.0569 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  7.3209e+00  on 2  degrees of freedom
Residual deviance: -1.9984e-15  on 0  degrees of freedom
AIC: 20.851

Number of Fisher Scoring iterations: 3

R Input:
> model1=glm(cbind(Carrier, Non.carrier)~Size, family=binomial, data=Tonsils)
> summary(model1)

R Output:
Call:
glm(formula = cbind(Carrier, Non.carrier) ~ Size, family = binomial,
    data = Tonsils)

Deviance Residuals:
[1]  0  0  0

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.9606     0.1904 -15.546   <2e-16 ***
SizeNormal      -0.3035     0.3015  -1.007   0.3141
SizeVery Large   0.5440     0.2857   1.904   0.0569 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  7.3209e+00  on 2  degrees of freedom
Residual deviance: -1.9984e-15  on 0  degrees of freedom
AIC: 20.851

Number of Fisher Scoring iterations: 3

Part 2: Answer:

Analysis of Deviance Table

Model 1: cbind(Carrier, Non.carrier) ~ 1
Model 2: cbind(Carrier, Non.carrier) ~ Size
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1         2     7.3209
2         0     0.0000  2   7.3209  0.02572 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R Input:
> model1=glm(cbind(Carrier, Non.carrier)~Size, family=binomial, data=Tonsils)
> model1.5=glm(cbind(Carrier, Non.carrier)~1, family=binomial, data=Tonsils)
> anova(model1.5, model1, test="LRT")

R Output:
Analysis of Deviance Table

Model 1: cbind(Carrier, Non.carrier) ~ 1
Model 2: cbind(Carrier, Non.carrier) ~ Size
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1         2     7.3209
2         0     0.0000  2   7.3209  0.02572 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
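The p-value in the table can be checked by hand from the two deviances. The sketch below is only an illustration using the numbers printed above (the variable names are made up for this example); it refers the drop in deviance to a chi-squared distribution with degrees of freedom equal to the difference in residual degrees of freedom.

```r
# Deviances copied from the analysis of deviance table above.
null_dev  <- 7.3209   # intercept-only model, 2 residual df
resid_dev <- 0        # model with Size (saturated here), 0 residual df

lrt_stat <- null_dev - resid_dev   # likelihood ratio statistic
lrt_df   <- 2 - 0                  # difference in residual df

# Upper-tail chi-squared probability; should agree with Pr(>Chi) above.
p_value <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)
print(round(p_value, 5))
```

The result should match the reported value of about 0.0257 up to rounding of the deviance.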

Part 3: Answer:

LOG-ODDS:
Normal: -3.264151 CI: (-3.722312, -2.805990)
Large: -2.960641 CI: (-3.333902, -2.587380)
Very Large: -2.416658 CI: (-2.834200, -1.999116)

ODDS:
Normal: 0.03822938 CI: (0.02417801, 0.06044688)
Large: 0.05178571 CI: (0.03565371, 0.07521686)
Very Large: 0.08921933 CI: (0.05876555, 0.13545503)

PROBABILITY:
Normal: 0.03682171 CI: (0.02360723, 0.05700134)
Large: 0.04923599 CI: (0.03442629, 0.06995505)
Very Large: 0.08191123 CI: (0.05550381, 0.11929577)

R Input:
> source("http://dl.dropboxusercontent.com/u/10884844/Rcode/contrastfix.R")
> install.packages("contrast")
> library(contrast)
> cont=contrast(model1, list(Size=levels(Tonsils$Size)), cnames = levels(Tonsils$Size))
> contrastfix(cont, wald=TRUE)
> contrastfix(cont, wald=TRUE, exp=TRUE)
> Normal1=plogis(c(-3.264151, -3.722312, -2.805990))
> Large1=plogis(c(-2.960641, -3.333902, -2.587380))
> Very.Large1=plogis(c(-2.416658,-2.834200, -1.999116))

R Output:
> contrastfix(cont, wald=TRUE)
glm model parameter contrast

            Contrast      S.E.     Lower     Upper      t df Pr(>|t|)
Large      -2.960641 0.1904428 -3.333902 -2.587380 -15.55 NA        0
Normal     -3.264151 0.2337598 -3.722312 -2.805990 -13.96 NA        0
Very Large -2.416658 0.2130355 -2.834200 -1.999116 -11.34 NA        0

> contrastfix(cont, wald=TRUE, exp=TRUE)
glm model parameter contrast

             Contrast      S.E.      Lower      Upper      t df Pr(>|t|)
Large      0.05178571 0.1904428 0.03565371 0.07521686 -15.55 NA        0
Normal     0.03822938 0.2337598 0.02417801 0.06044688 -13.96 NA        0
Very Large 0.08921933 0.2130355 0.05876555 0.13545503 -11.34 NA        0

> Normal1
[1] 0.03682171 0.02360723 0.05700134
> Large1
[1] 0.04923599 0.03442629 0.06995505
> Very.Large1
[1] 0.08191123 0.05550381 0.11929577
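The intervals above are consistent with 95% Wald intervals built on the log-odds scale and then transformed. The following sketch, which uses only the estimates and standard errors printed by contrastfix (the object names are illustrative, not part of the original analysis), rebuilds all three scales in one pass instead of typing the limits into plogis by hand.

```r
# Log-odds estimates and standard errors copied from the output above.
est <- c(Normal = -3.264151, Large = -2.960641, `Very Large` = -2.416658)
se  <- c(Normal = 0.2337598, Large = 0.1904428, `Very Large` = 0.2130355)

z <- qnorm(0.975)  # critical value for a 95% Wald interval

# Interval on the log-odds scale: estimate +/- z * SE.
log_odds <- cbind(estimate = est, lower = est - z * se, upper = est + z * se)

# Both transformations are monotone, so they can be applied to the limits.
odds <- exp(log_odds)     # odds = exp(log-odds)
prob <- plogis(log_odds)  # probability = exp(x) / (1 + exp(x))

print(round(log_odds, 6))
print(round(odds, 8))
print(round(prob, 8))
```

Because the limits are transformed rather than recomputed, the intervals for the odds and probabilities are not symmetric around their estimates.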

Part 4: Answer:

Large vs. Normal:
  Contrast      S.E.     Lower    Upper    t df Pr(>|t|)
1 1.354605 0.3015164 0.7501732 2.446042 1.01 NA   0.1571
This means that the estimated odds of being a carrier of Streptococcus are 1.35 times as high with large tonsils as with normal tonsils. The confidence interval (0.75, 2.45) includes 1, so this difference is not clearly distinguishable from no difference.

Very Large vs. Normal:
  Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
1  2.33379 0.3162717 1.255599 4.337832 2.68 NA   0.0037
This means that the estimated odds of being a carrier of Streptococcus are 2.33 times as high with very large tonsils as with normal tonsils. The confidence interval (1.26, 4.34) excludes 1.

Very Large vs. Large:
  Contrast      S.E.     Lower    Upper   t df Pr(>|t|)
1 1.722856 0.2857492 0.9840538 3.016332 1.9 NA   0.0285
This means that the estimated odds of being a carrier of Streptococcus are 1.72 times as high with very large tonsils as with large tonsils. The confidence interval (0.98, 3.02) narrowly includes 1.

R Input:
cont2=contrast(model1, list(Size="Large"), list(Size="Normal"))
contrastfix(cont2, wald=TRUE, exp=TRUE)
cont3=contrast(model1, list(Size="Very Large"), list(Size="Large"))
contrastfix(cont3, wald=TRUE, exp=TRUE)
cont4=contrast(model1, list(Size="Very Large"), list(Size="Normal"))
contrastfix(cont4, wald=TRUE, exp=TRUE)

R Output:
> contrastfix(cont2, wald=TRUE, exp=TRUE)
glm model parameter contrast

  Contrast      S.E.     Lower    Upper    t df Pr(>|t|)
1 1.354605 0.3015164 0.7501732 2.446042 1.01 NA   0.1571

> contrastfix(cont3, wald=TRUE, exp=TRUE)
glm model parameter contrast

  Contrast      S.E.     Lower    Upper   t df Pr(>|t|)
1 1.722856 0.2857492 0.9840538 3.016332 1.9 NA   0.0285

> contrastfix(cont4, wald=TRUE, exp=TRUE)
glm model parameter contrast

  Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
1  2.33379 0.3162717 1.255599 4.337832 2.68 NA   0.0037
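The three odds ratios can also be recovered directly from the coefficients in Part 1, since "Large" is the reference level of Size. This is a rough check under that reading of the summary table; the variable names are invented for the example and the coefficients are the rounded values from the printed output, so the last digits may differ slightly.

```r
# Coefficients copied (rounded) from summary(model1); reference level is Large.
b_normal     <- -0.3035  # log odds ratio, Normal vs. Large
b_very_large <-  0.5440  # log odds ratio, Very Large vs. Large

# Odds ratios are exponentiated differences of log-odds.
or_large_vs_normal      <- exp(-b_normal)               # reverse the comparison
or_very_large_vs_large  <- exp(b_very_large)
or_very_large_vs_normal <- exp(b_very_large - b_normal)

print(c(large_vs_normal      = or_large_vs_normal,
        very_large_vs_large  = or_very_large_vs_large,
        very_large_vs_normal = or_very_large_vs_normal))
```

These values should agree with the contrastfix output (about 1.35, 1.72 and 2.33) up to rounding of the coefficients.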

Moth Coloration and Natural Selection

Part 1: Answer:

Parameter estimates:
(Intercept) -1.128987
Distance 0.018502
Morphlight 0.411257
Distance:Morphlight -0.027789

Standard errors:
(Intercept) 0.197906
Distance 0.005645
Morphlight 0.274490
Distance:Morphlight 0.008085

R Input:
library(ggplot2)  # needed for ggplot() below
summary(model)
modela=glm(cbind(Removed, Placed-Removed)~Distance+Morph+Distance:Morph, data=case2102, family=binomial)
summary(modela)
newdata1 <- expand.grid(Distance = seq(0, 52, length = 100), Morph = c("light","dark"))
newdata1$yhat <- predict(modela, newdata1, type = "response")
plot1 <- ggplot(case2102, aes(x = Distance, y =Removed/Placed, color = Morph))
plot1 <- plot1 + geom_point() + geom_line(aes(y = yhat), data = newdata1)
plot1 <- plot1 + ylab("Observed/Predicted Proportion")
plot(plot1)

R Output:
> summary(modela)

Call:
glm(formula = cbind(Removed, Placed - Removed) ~ Distance + Morph +
    Distance:Morph, family = binomial, data = case2102)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.21183  -0.39883   0.01155   0.68292   1.31242

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         -1.128987   0.197906  -5.705 1.17e-08 ***
Distance             0.018502   0.005645   3.277 0.001048 **
Morphlight           0.411257   0.274490   1.498 0.134066
Distance:Morphlight -0.027789   0.008085  -3.437 0.000588 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 35.385  on 13  degrees of freedom
Residual deviance: 13.230  on 10  degrees of freedom
AIC: 83.904

Number of Fisher Scoring iterations: 4

> newdata1 <- expand.grid(Distance = seq(0, 52, length = 100), Morph = c("light","dark"))
> newdata1$yhat <- predict(modela, newdata1, type = "response")
> plot1 <- ggplot(case2102, aes(x = Distance, y =Removed/Placed, color = Morph))
> plot1 <- plot1 + geom_point() + geom_line(aes(y = yhat), data = newdata1)
> plot1 <- plot1 + ylab("Observed/Predicted Proportion")
> plot(plot1)

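Part 2: Answer:

glm model parameter contrast

       Contrast        S.E.     Lower    Upper     t df Pr(>|t|)
light 0.9907562 0.005788365 0.9795796 1.002060 -1.60 NA   0.0543
dark  1.0186745 0.005645336 1.0074653 1.030008  3.28 NA   0.0005

This means that for every 1 unit increase in distance, the odds of a light moth being removed are multiplied by about 0.991 (a decrease of roughly 0.9%), while the odds of a dark moth being removed are multiplied by about 1.019 (an increase of roughly 1.9%). The interval for light moths (0.980, 1.002) includes 1; the interval for dark moths (1.007, 1.030) does not.

R Input: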
source("http://dl.dropboxusercontent.com/u/10884844/Rcode/contrastfix.R")
install.packages("contrast")
library(contrast)
conta=contrast(modela, list(Morph=c("light", "dark"), Distance=2), list(Morph=c("light", "dark"), Distance=1), cnames=c("light", "dark"))
contrastfix(conta, wald=TRUE, exp=TRUE)

R Output:
> conta=contrast(modela, list(Morph=c("light", "dark"), Distance=2), list(Morph=c("light", "dark"), Distance=1), cnames=c("light", "dark"))
> contrastfix(conta, wald=TRUE, exp=TRUE) #so for every 1 unit increase in distance the odds of light being removed are multiplied by 0.991 and the odds of dark by 1.019
glm model parameter contrast

       Contrast        S.E.     Lower    Upper     t df Pr(>|t|)
light 0.9907562 0.005788365 0.9795796 1.002060 -1.60 NA   0.0543
dark  1.0186745 0.005645336 1.0074653 1.030008  3.28 NA   0.0005
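The per-unit odds ratios follow from the Distance slope for each morph: dark is the reference level, so its slope is the Distance coefficient, and the light slope adds the interaction term. A small sketch using the printed coefficients (variable names are illustrative only):

```r
# Coefficients copied from summary(modela); dark is the reference morph.
b_distance    <-  0.018502  # slope for dark moths (log-odds per unit distance)
b_interaction <- -0.027789  # extra slope for light moths

# Multiplicative change in the odds of removal per 1 unit of distance.
or_dark  <- exp(b_distance)
or_light <- exp(b_distance + b_interaction)

# The same effects expressed as percent change in the odds.
pct_change <- 100 * (c(light = or_light, dark = or_dark) - 1)

print(c(light = or_light, dark = or_dark))
print(round(pct_change, 2))
```

Reading the result as a percent change in odds avoids confusing a multiplier just below 1 with a large drop.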

Part 3: Answer:

 Contrast      S.E.     Lower    Upper   t df Pr(>|t|)
 1.508713 0.2744898 0.8809688 2.583764 1.5 NA    0.067

This means that at a distance of 0 units the estimated odds of being removed are about 1.5 times as high for a light morph as for a dark morph, although the confidence interval (0.88, 2.58) includes 1.

 Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
 2.659651 0.2195914 1.729451 4.090169 4.45 NA        0

This means that at a distance of 50 units the estimated odds of being removed are about 2.66 times as high for a dark morph as for a light morph.

R Input:
contb=contrast(modela, list(Morph="light", Distance=0), list(Morph= "dark", Distance=0))
contrastfix(contb, wald=TRUE, exp=TRUE)
contc=contrast(modela, list(Morph="dark", Distance=50), list(Morph= "light", Distance=50))
contrastfix(contc, wald=TRUE, exp=TRUE)

R Output:
> contrastfix(contb, wald=TRUE, exp=TRUE)
glm model parameter contrast

 Contrast      S.E.     Lower    Upper   t df Pr(>|t|)
 1.508713 0.2744898 0.8809688 2.583764 1.5 NA    0.067

> contrastfix(contc, wald=TRUE, exp=TRUE)
glm model parameter contrast

 Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
 2.659651 0.2195914 1.729451 4.090169 4.45 NA        0
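Because of the interaction, the light-versus-dark odds ratio changes with distance. The sketch below writes it as a function of distance using the printed coefficients; the function name and the crossover calculation are additions for illustration, not part of the original analysis.

```r
# Coefficients copied from summary(modela); dark is the reference morph.
b_morph       <-  0.411257  # light vs. dark log odds ratio at distance 0
b_interaction <- -0.027789  # change in that log odds ratio per unit distance

# Odds ratio (light vs. dark) for removal at a given distance.
or_light_vs_dark <- function(distance) {
  exp(b_morph + b_interaction * distance)
}

or_at_0      <- or_light_vs_dark(0)        # light vs. dark at distance 0
or_dark_50   <- 1 / or_light_vs_dark(50)   # dark vs. light at distance 50

# Distance at which the two morphs have equal estimated odds (log OR = 0).
crossover <- -b_morph / b_interaction

print(c(light_vs_dark_at_0 = or_at_0, dark_vs_light_at_50 = or_dark_50))
print(crossover)
```

The crossover falls at roughly 15 distance units, which is where the two fitted curves in the Part 1 plot would be expected to cross.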

Part 4: Answer: There is no strong evidence of overdispersion: only one point is greater than 2 in magnitude on the residual vs. predicted plot and, in the summary of the model, the residual deviance (13.230) is close to its residual degrees of freedom (10) rather than far above it.

R Input:
plot(predict(modela, type = "response"), rstudent(modela))

R Output:



> plot(predict(modela, type = "response"), rstudent(modela))
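A common numerical companion to the residual plot is to compare the residual deviance with its degrees of freedom. The sketch below uses the two numbers printed in summary(modela); the chi-squared comparison is only a rough guide and is offered here as an assumption about how one might check it, not as the method used in the original answer.

```r
# Values copied from summary(modela).
resid_dev <- 13.230
resid_df  <- 10

# A ratio near 1 is what a well-fitting binomial model would give;
# values well above 1 suggest overdispersion.
dev_ratio <- resid_dev / resid_df

# Approximate goodness-of-fit p-value from the chi-squared distribution.
p_gof <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)

print(c(ratio = dev_ratio, p_value = p_gof))
```

With the fitted model object available, `deviance(modela) / df.residual(modela)` gives the same ratio without retyping the numbers.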

Snow Geese Overdispersion

Part 1: Answer:

Parameter estimates:
(Intercept) -0.45550
observerobs2 -3.09079
photo 0.80007
observerobs2:photo 0.30457

Standard errors:
(Intercept) 3.83921
observerobs2 5.64216
photo 0.06317
observerobs2:photo 0.09580

Fitting with the quasipoisson family gives the same parameter estimates as the Poisson model but larger standard errors: each is scaled by the square root of the estimated dispersion parameter (10.69, so the standard errors are about 3.27 times larger). This gives wider confidence intervals and more conservative test statistics and p-values, which are more trustworthy when the counts are overdispersed. The quasi-Poisson model appears to have dealt adequately with the overdispersion, because only a few points are greater than 2 in magnitude on the residual vs. predicted plot. The residual deviance itself is unchanged (859.79 on 86 degrees of freedom); it is the estimated dispersion, not a comparison with the null deviance, that accounts for the extra variation.

R Input:
model3.1 <- glm(count ~ observer + photo + observer:photo, data = snowgeese.long, family = quasipoisson(link = identity))
summary(model3.1)
summary(model)
plot(predict(model3.1, type = "response"), rstudent(model3.1))
plot(predict(model, type = "response"), rstudent(model))

R Output:
> summary(model3.1)

Call:
glm(formula = count ~ observer + photo + observer:photo, family = quasipoisson(link = identity),
    data = snowgeese.long)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-7.4540  -1.5993  -0.5203   0.7914  12.2823

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        -0.45550    3.83921  -0.119  0.90584
observerobs2       -3.09079    5.64216  -0.548  0.58525
photo               0.80007    0.06317  12.666  < 2e-16 ***
observerobs2:photo  0.30457    0.09580   3.179  0.00205 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 10.69254)

    Null deviance: 7049.97  on 89  degrees of freedom
Residual deviance:  859.79  on 86  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

> plot(predict(model3.1, type = "response"), rstudent(model3.1))
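The link between the dispersion estimate and the standard errors can be made explicit. In a quasi-Poisson fit the standard errors are the Poisson ones multiplied by the square root of the dispersion, so dividing the printed values by that factor gives back what the plain Poisson model would report. The sketch uses only numbers from the output above; the object names are invented for the example.

```r
# Values copied from summary(model3.1).
dispersion <- 10.69254
quasi_se <- c(`(Intercept)`        = 3.83921,
              observerobs2         = 5.64216,
              photo                = 0.06317,
              `observerobs2:photo` = 0.09580)

# Standard errors are inflated by sqrt(dispersion) relative to Poisson.
inflation <- sqrt(dispersion)

# Implied standard errors under the plain Poisson model.
poisson_se <- quasi_se / inflation

print(inflation)
print(round(cbind(quasi = quasi_se, poisson = poisson_se), 5))
```

The point estimates are untouched by this scaling, which is why the quasi-Poisson and Poisson coefficient columns agree.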

Part 2: Answer:

Parameter estimates:
(Intercept) 1.42894
observerobs2 -2.94485
photo 0.75447
observerobs2:photo 0.29679

Standard errors:
(Intercept) 2.25435
observerobs2 3.36972
photo 0.05827
observerobs2:photo 0.09396

Using a negative binomial model gives different parameter estimates and larger standard errors than the Poisson model, which gives wider confidence intervals and more conservative test statistics and p-values. The negative binomial distribution appears to have dealt adequately with the overdispersion, because only a few points are greater than 2 in magnitude on the residual vs. predicted plot and, in the summary of the model, the residual deviance (84.989) is close to its residual degrees of freedom (86).


R Input:
library(MASS)
model3.2 <- glm.nb(count ~ observer + photo + observer:photo, data = snowgeese.long, link = identity)
summary(model3.2)
plot(predict(model3.2, type = "response"), rstudent(model3.2))
plot(predict(model, type = "response"), rstudent(model))

R Output:
> summary(model3.2)

Call:
glm.nb(formula = count ~ observer + photo + observer:photo, data = snowgeese.long,
    link = identity, init.theta = 11.0702114)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0805  -0.6863  -0.2228   0.3886   2.9868

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         1.42894    2.25435   0.634  0.52617
observerobs2       -2.94485    3.36972  -0.874  0.38217
photo               0.75447    0.05827  12.947  < 2e-16 ***
observerobs2:photo  0.29679    0.09396   3.159  0.00158 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(11.0702) family taken to be 1)

    Null deviance: 793.411  on 89  degrees of freedom
Residual deviance:  84.989  on 86  degrees of freedom
AIC: 779.36

Number of Fisher Scoring iterations: 1

              Theta:  11.07
          Std. Err.:  1.92

 2 x log-likelihood:  -769.363
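The two overdispersion fixes assume different variance functions, which is one reason their estimates and standard errors differ. Quasi-Poisson takes the variance to be proportional to the mean, while the negative binomial fitted by glm.nb takes it to be mu + mu^2/theta. The sketch below compares the two using the printed dispersion and theta; the mean values are hypothetical and chosen only for illustration.

```r
# Values copied from the two model summaries above.
dispersion <- 10.69254  # quasi-Poisson dispersion, summary(model3.1)
theta      <- 11.07     # negative binomial theta, summary(model3.2)

# Hypothetical mean counts, for illustration only.
mu <- c(10, 50, 100, 200)

var_quasi <- dispersion * mu      # variance proportional to the mean
var_nb    <- mu + mu^2 / theta    # variance grows quadratically in the mean

variance_table <- data.frame(mu, var_quasi, var_nb)
print(variance_table)

# Residual deviance relative to residual df for the negative binomial fit.
nb_dev_ratio <- 84.989 / 86
print(nb_dev_ratio)
```

Under these values the negative binomial implies less variance than quasi-Poisson for small counts and more for large counts, so the two models weight the observations differently.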
