Part 2 Project-Basic Inferential Stat

Part 2 project-Basic Inferential Data Analysis
Nilrey Jim Cornites

November 7, 2016
Load the necessary libraries
library(ggplot2)
library(datasets)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
This is part 2 of the Statistical Inference Course project. We will show the basics of inferential data analysis.
We will be analyzing the ToothGrowth data in the R datasets package.
Load the ToothGrowth data and perform some basic exploratory data analyses
data("ToothGrowth")
We will perform some basic exploratory data analyses, such as inspecting the dimensions of the data and
plotting the observations.
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ggplot(ToothGrowth, aes(x = supp, y = len)) + geom_boxplot() +
labs(title="Boxplot of Tooth Length by Supplement Type",x="supplement type", y="tooth length")
[Figure: Boxplot of Tooth Length by Supplement Type; x-axis: supplement type (OJ, VC); y-axis: tooth length]
We see that the data has 60 observations of 3 variables: len, the length of the tooth; supp, the
supplement type; and dose, the dose in mg/day. Looking at the boxplot, the median tooth length for
vitamin C (VC) is lower than for orange juice (OJ), but the VC lengths are more variable.
We also observe that dose is stored as a numeric variable. Since we want to compare tooth length by dose as
well as by supp, we need to convert dose to a factor.
ToothGrowth$dose<-as.factor(ToothGrowth$dose)
ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_boxplot() +
labs(title="Boxplot of Tooth Length by Dose Type",x="dose type", y="tooth length")
[Figure: Boxplot of Tooth Length by Dose Type; x-axis: dose type; y-axis: tooth length]
We see in the boxplot that there is a difference between doses: as the dose increases, the tooth length also increases.
Basic summary of the data.

We can use the summary function in R to get basic statistics for tooth length, along with the counts per supp and dose level.
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
Overall, tooth length has a mean of 18.81 and ranges from 4.20 to 33.90. The supp variable is a
two-level factor, while dose is a three-level factor. We will use the summary function again to summarize the
length per group (mirroring the boxplots above).
tapply(ToothGrowth$len, ToothGrowth$supp, summary)
## $OJ
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    8.20   15.52   22.70   20.66   25.72   30.90
##
## $VC
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4.20   11.20   16.50   16.96   23.10   33.90
tapply(ToothGrowth$len, ToothGrowth$dose, summary)

## $`0.5`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   4.200   7.225   9.850  10.600  12.250  21.500
##
## $`1`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   13.60   16.25   19.25   19.74   23.38   27.30
##
## $`2`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   18.50   23.52   25.95   26.10   27.83   33.90
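The summaries above look at supp and dose one at a time. As a minimal sketch, assuming we also want the mean length for each supp and dose combination, base R's aggregate function can compute all six cell means in one call. The names tg and cell_means are illustrative and not part of the original analysis.

```r
# Work on a private copy so the session's ToothGrowth object is untouched.
tg <- datasets::ToothGrowth

# Mean tooth length for every supp x dose combination (2 x 3 = 6 cells).
cell_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)

print(cell_means)
```

Each cell holds 10 observations, so the six cell means average back to the overall mean length.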
Let's also explore the variability of each group.

tapply(ToothGrowth$len, ToothGrowth$supp, var)
##       OJ       VC
## 43.63344 68.32723
tapply(ToothGrowth$len, ToothGrowth$dose, var)
##      0.5        1        2
## 20.24787 19.49608 14.24421
Among the supplement types, VC is more variable than OJ. Among the doses, the variances at 0.5 and 1.0 are
nearly equal, while the 2.0 dose has the lowest variance.
Now, to check whether supp and dose have an effect on the length of the tooth, we will use t-tests. (We could use
ANOVA, but the project instructions are to use the tests that have been discussed.)
Confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
We begin by testing the difference in mean tooth length by supp.
t.test(ToothGrowth$len~ToothGrowth$supp, alternative="two.sided", var.equal=FALSE)
##
##  Welch Two Sample t-test
##
## data:  ToothGrowth$len by ToothGrowth$supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC
##         20.66333         16.96333
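To make the numbers in this output less opaque, here is a minimal sketch that rebuilds the Welch statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95% confidence interval by hand from the group means and variances. All variable names (tg, oj, vc, se2_oj and so on) are illustrative and not part of the original analysis.

```r
# Work on a private copy so the session's ToothGrowth object is untouched.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error contributed by each group: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the mean difference
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

These values should agree with what t.test reports for the same comparison, since var.equal=FALSE selects this Welch form of the test.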
Now let's run t-tests on each pair of doses: (0.5, 1.0), (0.5, 2.0) and (1.0, 2.0).
## first pair (0.5,1.0)
t.test(subset(ToothGrowth, dose==0.5)$len,subset(ToothGrowth, dose==1.0)$len,
alternative="two.sided", var.equal=FALSE)
##
##  Welch Two Sample t-test
##
## data:  subset(ToothGrowth, dose == 0.5)$len and subset(ToothGrowth, dose == 1)$len
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y
##    10.605    19.735
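## second pair (0.5,2.0)
t.test(subset(ToothGrowth, dose==0.5)$len,subset(ToothGrowth, dose==2.0)$len,
alternative="two.sided", var.equal=FALSE)
##
##  Welch Two Sample t-test
##
## data:  subset(ToothGrowth, dose == 0.5)$len and subset(ToothGrowth, dose == 2)$len
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean of x mean of y
##    10.605    26.100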
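## third pair (1.0,2.0)
t.test(subset(ToothGrowth, dose==1.0)$len,subset(ToothGrowth, dose==2.0)$len,
alternative="two.sided", var.equal=FALSE)
##
##  Welch Two Sample t-test
##
## data:  subset(ToothGrowth, dose == 1)$len and subset(ToothGrowth, dose == 2)$len
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
##    19.735    26.100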
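The three pairwise calls follow the same pattern, so they can also be generated in a loop. The following is a minimal sketch, assuming we only need the p-value from each Welch test; the names tg, dose_pairs and p_values are illustrative and not part of the original analysis.

```r
# Work on a private copy so the session's ToothGrowth object is untouched.
tg <- datasets::ToothGrowth

# Treat dose as numeric here, whether or not it was converted to a factor.
dose <- as.numeric(as.character(tg$dose))
dose_levels <- sort(unique(dose))

# All unordered pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(dose_levels, 2, simplify = FALSE)

# Welch two-sample t-test for each pair, keeping only the p-value
p_values <- sapply(dose_pairs, function(pr) {
  x <- tg$len[dose == pr[1]]
  y <- tg$len[dose == pr[2]]
  t.test(x, y, alternative = "two.sided", var.equal = FALSE)$p.value
})
names(p_values) <- sapply(dose_pairs, paste, collapse = " vs ")

print(p_values)
```

Keeping the p-values in one named vector makes it easy to compare them against a corrected significance level afterwards.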
For dose, we will use the Bonferroni correction since we run more than one test. We will reject the null hypothesis
that there is no difference in mean tooth length between two doses if the p-value is less than alpha/m, where
alpha = 0.05 is the conventional level of significance and m = 3 is the number of tests, that is, if the
p-value is less than 0.05/3 = 0.0166667.
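The decision rule can be sketched in a few lines of R. This assumes the three p-values are copied by hand from the t-test outputs reported above; the names p_raw, threshold and p_bonf are illustrative. It shows two equivalent views of the correction: comparing each raw p-value with alpha/m, or multiplying the p-values by m with p.adjust and comparing with alpha.

```r
# p-values copied from the three Welch t-test outputs reported above
p_raw <- c("0.5 vs 1" = 1.268e-07, "0.5 vs 2" = 4.398e-14, "1 vs 2" = 1.906e-05)

alpha <- 0.05
m <- length(p_raw)          # number of tests
threshold <- alpha / m      # Bonferroni-corrected significance level

# View 1: compare raw p-values with alpha / m
reject_by_threshold <- p_raw < threshold

# View 2: inflate the p-values by m (capped at 1) and compare with alpha
p_bonf <- p.adjust(p_raw, method = "bonferroni")
reject_by_adjusted <- p_bonf < alpha

print(threshold)
print(reject_by_threshold)
print(p_bonf)
```

Both views lead to the same decisions here.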
Conclusions and the Assumptions

In using the t.test function (since n is small), we assume that the variances of the groups are not equal and
use two-sided tests with alpha = 0.05.
1. Between the supplement types OJ and VC, the p-value is 0.06, which is greater than alpha = 0.05, so we fail
to reject the null hypothesis that the mean of OJ equals the mean of VC.
2. For dose, we reject the null hypotheses, since all three p-values are less than alpha/m = 0.0166667, and
conclude that the mean tooth lengths at the three dose levels differ significantly from each other.
