
Part 2 Project: Basic Inferential Data Analysis

Nilrey Jim Cornites


November 7, 2016
Load the necessary libraries
library(ggplot2)
library(datasets)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
This is part 2 of the Statistical Inference course project. We will show the basics of inferential data analysis
by analyzing the ToothGrowth data in the R datasets package.

Load the ToothGrowth data and perform some basic exploratory data analyses
data("ToothGrowth")
We will perform some basic exploratory data analyses, such as inspecting the dimensions of the data and
plotting the observations.
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ggplot(ToothGrowth, aes(x = supp, y = len)) + geom_boxplot() +
labs(title="Boxplot of Tooth Length by Supplement Type",x="supplement type", y="tooth length")

[Figure: Boxplot of Tooth Length by Supplement Type. x-axis: supplement type (OJ, VC); y-axis: tooth length]
We see that the data has 60 observations of 3 variables: len, the length of the tooth; supp, the supplement
type; and dose, the dose in mg/day. Looking at the boxplot, the median tooth length for vitamin C (VC) is
lower than for orange juice (OJ), but the VC lengths are more variable.
We also observe that dose is numeric. Since we want to compare tooth length by dose as well as by supp, we
convert dose to a factor.
ToothGrowth$dose<-as.factor(ToothGrowth$dose)
ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_boxplot() +
labs(title="Boxplot of Tooth Length by Dose Type",x="dose type", y="tooth length")

[Figure: Boxplot of Tooth Length by Dose Type. x-axis: dose type; y-axis: tooth length]
We see in the boxplot that there is a difference between doses: as the dose increases, the tooth length also
increases.

Basic summary of the data.


We can use the summary function in R to get basic statistics for tooth length, supp, and dose.
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90

Overall, tooth length has a mean of 18.81 and ranges from 4.20 to 33.90. The supp variable is a two-level
factor, while dose is a three-level factor. We use the summary function again to summarize tooth length
per group (similar to the boxplot results above).
tapply(ToothGrowth$len, ToothGrowth$supp, summary)
## $OJ
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    8.20   15.52   22.70   20.66   25.72   30.90
##
## $VC
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4.20   11.20   16.50   16.96   23.10   33.90
tapply(ToothGrowth$len, ToothGrowth$dose, summary)


## $`0.5`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   4.200   7.225   9.850  10.600  12.250  21.500
##
## $`1`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   13.60   16.25   19.25   19.74   23.38   27.30
##
## $`2`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   18.50   23.52   25.95   26.10   27.83   33.90

Let's also explore the variability of each group.


tapply(ToothGrowth$len, ToothGrowth$supp, var)
##       OJ       VC
## 43.63344 68.32723
tapply(ToothGrowth$len, ToothGrowth$dose, var)
##      0.5        1        2
## 20.24787 19.49608 14.24421
We see that for supp, VC is more variable than OJ. For dose, the variances at 0.5 and 1.0 are very close,
while the 2.0 dose has the lowest variability.
To confirm that supp and dose have an effect on tooth length, we will use t tests. (ANOVA would also work,
but the project instructions are to use the tests that have been discussed.)
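The per-group summaries above look at supp and dose one at a time. As a minimal sketch (not part of the original analysis), the dplyr package loaded at the start can summarize every supp/dose combination in one step. The names tg and cell_stats are hypothetical and introduced only for this illustration.

```r
library(datasets)
library(dplyr)

# Work on a copy so the original data frame is left untouched (tg is a hypothetical name).
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Group size, mean, and variance of tooth length for each supp/dose cell.
cell_stats <- tg %>%
  group_by(supp, dose) %>%
  summarise(n = n(), mean_len = mean(len), var_len = var(len)) %>%
  ungroup()

print(cell_stats)
```

Each row of cell_stats describes one of the six supp/dose cells, which makes it easy to see whether the dose trend holds within each supplement type.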

Confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
We begin by testing the difference in mean tooth length by supp.
t.test(ToothGrowth$len~ToothGrowth$supp, alternative="two.sided", var.equal=FALSE)
##
##  Welch Two Sample t-test
##
## data:  ToothGrowth$len by ToothGrowth$supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC
##         20.66333         16.96333
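To make the Welch test output less of a black box, the following sketch recomputes the t statistic, the Welch-Satterthwaite degrees of freedom, the p-value, and the 95 percent confidence interval by hand for the supp comparison. The variable names (oj, vc, se2_oj, and so on) are hypothetical and used only for this illustration.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic for the difference in means (OJ minus VC)
mean_diff <- mean(oj) - mean(vc)
t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95 percent confidence interval
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

These values should agree with what t.test reports for the same comparison, up to rounding.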

Now let's run t tests on each pair of doses: (0.5, 1.0), (0.5, 2.0), and (1.0, 2.0).
## first pair (0.5,1.0)
t.test(subset(ToothGrowth, dose==0.5)$len,subset(ToothGrowth, dose==1.0)$len,
alternative="two.sided", var.equal=FALSE)
##
##  Welch Two Sample t-test
##
## data:  subset(ToothGrowth, dose == 0.5)$len and subset(ToothGrowth, dose == 1)$len
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y
##    10.605    19.735

## second pair
t.test(subset(ToothGrowth, dose==0.5)$len,subset(ToothGrowth, dose==2.0)$len,
alternative="two.sided", var.equal=FALSE)
##
##  Welch Two Sample t-test
##
## data:  subset(ToothGrowth, dose == 0.5)$len and subset(ToothGrowth, dose == 2)$len
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean of x mean of y
##    10.605    26.100

## third pair
t.test(subset(ToothGrowth, dose==1.0)$len,subset(ToothGrowth, dose==2.0)$len,
alternative="two.sided", var.equal=FALSE)
##
##  Welch Two Sample t-test
##
## data:  subset(ToothGrowth, dose == 1)$len and subset(ToothGrowth, dose == 2)$len
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
##    19.735    26.100
For dose, we apply the Bonferroni correction because we run more than one test. With the conventional
significance level alpha = 0.05 and m = 3 tests, we reject the null hypothesis of no difference in mean tooth
length between two doses only if the p-value is less than alpha/m = 0.05/3 = 0.0166667.
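The Bonferroni rule can be sketched in a few lines of R. This is an illustration under stated assumptions rather than the code used for the results above: it reruns the three Welch tests in a loop over dose pairs and compares each p-value with alpha/m. The names dose_num, pairs, p_values, and reject are hypothetical. The built-in p.adjust function gives the equivalent decision by multiplying each p-value by m instead of dividing alpha.

```r
library(datasets)

alpha <- 0.05

# Works whether dose is still numeric or has already been converted to a factor.
dose_num <- as.numeric(as.character(ToothGrowth$dose))
len <- ToothGrowth$len

# Each column of pairs holds one pair of dose levels.
pairs <- combn(sort(unique(dose_num)), 2)

# Welch two-sample t test for every pair; keep only the p-value.
p_values <- apply(pairs, 2, function(pr) {
  t.test(len[dose_num == pr[1]], len[dose_num == pr[2]],
         alternative = "two.sided", var.equal = FALSE)$p.value
})
names(p_values) <- apply(pairs, 2, paste, collapse = " vs ")

# Bonferroni: compare each raw p-value with alpha / m.
m <- length(p_values)
threshold <- alpha / m
reject <- p_values < threshold

# Equivalent view: Bonferroni-adjusted p-values compared with alpha itself.
p_adj <- p.adjust(p_values, method = "bonferroni")

print(data.frame(p_value = p_values, p_adjusted = p_adj, reject = reject))
```

Comparing raw p-values with alpha/m and comparing adjusted p-values with alpha lead to the same decisions; the adjusted form is often easier to report.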

Conclusions and the Assumptions


Because the sample sizes are small, we use t tests via the t.test function. We assume that the variances of
the groups are not equal, use two-sided tests, and set alpha to 0.05.
1. Between supplement types OJ and VC, the p-value is 0.06, which is greater than our alpha of 0.05, so we
fail to reject the null hypothesis that the mean tooth length for OJ equals that for VC.
2. For dose, we reject the null hypothesis for every pair, since all p-values are less than alpha/m =
0.0166667, and conclude that the three dose levels differ significantly from each other in mean tooth length.