The following is a summary of R commands that we will be using in SDS 302. Please refer to
the pre-labs for examples of their usage, including the appropriate arguments of the commands.
Create a vector
newdata <- c(1,3,5,7,9) creates a vector containing the numbers 1, 3, 5, 7 and 9
credit <- Karloff$Credit.Hours creates a vector of credit hours for all students in Karloff
full_temp <- Karloff$Temp[Karloff$Full.Time == "Yes"] creates a vector of temp for full-time students only
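The vector commands above can be tried on a small made-up data frame. This is only a sketch: the Karloff stand-in below and all of its values are invented for illustration and are not the course dataset.

```r
# Hypothetical stand-in for the Karloff dataset (values invented for illustration)
Karloff <- data.frame(
  Credit.Hours = c(15, 9, 12, 6, 18),
  Temp         = c(98.6, 98.2, 99.1, 97.9, 98.4),
  Full.Time    = c("Yes", "No", "Yes", "No", "Yes")
)

# A vector typed in by hand
newdata <- c(1, 3, 5, 7, 9)

# A vector pulled from one column of the data frame
credit <- Karloff$Credit.Hours

# Text values must be quoted; an unquoted Yes is read as an object name
full_temp <- Karloff$Temp[Karloff$Full.Time == "Yes"]
full_temp   # 98.6 99.1 98.4
```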
Create a dataframe
males <- Karloff[Karloff$Sex=="M",] creates dataframe with just the males from Karloff
fulltime <- Karloff[Karloff$Credit.Hours >=12,] creates dataframe with students with 12+ hrs
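A minimal sketch of row subsetting, again using an invented stand-in for Karloff. The condition goes before the comma (rows); leaving the part after the comma empty keeps every column.

```r
# Hypothetical stand-in for the Karloff dataset (values invented for illustration)
Karloff <- data.frame(
  Sex          = c("M", "F", "M", "F", "M"),
  Credit.Hours = c(15, 9, 12, 6, 18)
)

# Keep rows where Sex is "M"; all columns are kept
males <- Karloff[Karloff$Sex == "M", ]

# Keep rows with 12 or more credit hours
fulltime <- Karloff[Karloff$Credit.Hours >= 12, ]

nrow(males)      # 3
nrow(fulltime)   # 3
```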
Descriptive statistics
fivenum(x) five-number summary (min, lower hinge, median, upper hinge, max) of variable x
mean(x) sample mean of variable x
sd(x) sample standard deviation of variable x
cor(x,y) correlation between variable x and variable y
pnorm(0.375) area under the standard normal curve at or below z = 0.375
1 - pnorm(0.375) area under the standard normal curve above z = 0.375
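The descriptive commands can be checked on two short made-up vectors (the numbers are illustrative only):

```r
# Illustrative data, not from the course dataset
x <- c(2, 4, 4, 5, 7, 9, 11)
y <- c(1, 3, 2, 5, 6, 8, 10)

fivenum(x)   # 2 4 5 8 11
mean(x)      # 6
sd(x)        # sqrt(10), about 3.16
cor(x, y)    # close to 1 here, since y rises with x

# Areas under the standard normal curve
below <- pnorm(0.375)       # about 0.646
above <- 1 - pnorm(0.375)   # about 0.354
```

The two areas always add up to 1, which is a quick check that the right tail was requested.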
Histogram options
hist(x) histogram of variable x
hist(x, main = "title", xlab = "x axis label") with title and axis labels (text in quotes)
hist(x, breaks=15) with about 15 bins (R treats the number as a suggestion)
hist(x, breaks=seq(1.5,5.25,.25)) with bins of size .25 that range from 1.5 to 5.25
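A short sketch of the histogram options on made-up values that fall between 1.5 and 5.25. Passing a sequence to breaks fixes the bin edges exactly, while a single number is only a suggestion to R.

```r
# Illustrative values only
x <- c(1.6, 2.1, 2.4, 2.9, 3.0, 3.3, 3.3, 3.7, 4.0, 4.8)

# Title and axis label are character strings, so they go in quotes
hist(x, main = "Distribution of x", xlab = "x value")

# A single number asks for roughly that many bins
hist(x, breaks = 15)

# Explicit bin edges: width 0.25, from 1.5 to 5.25
h <- hist(x, breaks = seq(1.5, 5.25, 0.25))
length(h$counts)   # 15 bins
sum(h$counts)      # 10 observations, all inside the range
```

Every value of x must lie inside the range covered by the breaks, otherwise hist() stops with an error.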
Displays
plot(x, main = "title", xlab = "x axis label") barplot of categorical (factor) variable x
barplot(table(x,y), legend = T, beside = T, main = "title") side by side barplot of variables x, y
boxplot(x) boxplot of variable x
boxplot(x~y) side-by-side boxplots of variable x by category y
plot(x,y, main = "title", xlab = "x axis label", ylab = "y axis label") scatterplot of y against x
abline(lm(y~x)) generates line-of-best-fit on scatterplot
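The display commands can be sketched with invented data. The variable names and values below are assumptions for illustration. The example writes legend.text in full; legend is the usual shorthand for the same barplot argument.

```r
# Illustrative data only
sex   <- factor(c("M", "F", "M", "F", "M", "F"))
class <- factor(c("Jr", "Jr", "Sr", "Sr", "Sr", "Jr"))
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)

# plot() on a factor draws a bar plot of the category counts
plot(sex, main = "Students by sex", xlab = "Sex")

# Side-by-side bars for two categorical variables
barplot(table(sex, class), legend.text = TRUE, beside = TRUE,
        main = "Sex by class")

# One box for y, then one box per category
boxplot(y)
boxplot(y ~ sex)

# Scatterplot, then the least-squares line drawn on top of it
plot(x, y, main = "y against x", xlab = "x", ylab = "y")
fit <- lm(y ~ x)
abline(fit)
coef(fit)   # intercept and slope of the fitted line
```

abline() adds to the most recent plot, so it has to come after the plot(x, y, ...) call.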
Modeling
linFit(x,y) fit a linear model to predict y from x
expFit(x,y) fit an exponential model to predict y from x
logisticFit(x,y) fit a logistic model to predict y from x
tripleFit(x,y) fit all three models simultaneously
Predict values
expFitPred(x,y,95) use exponential model to predict value of y when x=95
logisticFitPred(x,y,95) use logistic model to predict value of y when x=95
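The fitting and prediction functions above are not part of base R, so they need the course package loaded. As a rough base-R analogue only (not the course functions, and the output format will differ), a linear fit can be done with lm(), and an exponential fit by regressing log(y) on x. The data below are made up so that y doubles every 10 units of x.

```r
# Illustrative data: y doubles each time x goes up by 10
x <- c(10, 20, 30, 40, 50)
y <- c(3, 6, 12, 24, 48)

# Linear model: y = a + b*x
lin_model <- lm(y ~ x)

# Exponential model via logs: log(y) = a + b*x, so y = exp(a) * exp(b*x)
exp_model <- lm(log(y) ~ x)

# Predict y at x = 60 and undo the log
pred_log <- predict(exp_model, newdata = data.frame(x = 60))
pred_y   <- exp(pred_log)
pred_y   # about 96, one more doubling after 48
```

This swaps in a log-transform regression, which is one common way to fit an exponential curve; the course functions may use a different fitting method.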
Random sampling
sample(x, size=10) draw random sample of size 10 from variable x
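A minimal sampling sketch. The seed value is arbitrary and is only there so the same sample comes back each run. By default sample() draws without replacement.

```r
set.seed(302)   # arbitrary seed, for reproducibility only

x <- 1:100                   # illustrative population
s <- sample(x, size = 10)    # 10 values, no repeats

length(s)          # 10
anyDuplicated(s)   # 0, since sampling is without replacement
```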
t-tests
t.test(x, mu=100) run one-sample t-test where the null says the population mean = 100
t.test(x, mu=100, alternative = "greater") run one-tailed t-test where alternative says mean > 100
t.test(x1, x2) run two-sample t-test
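The three t-test calls can be sketched on invented samples. The result of t.test() is a list, so individual pieces such as the p-value can be pulled out by name.

```r
# Illustrative samples only
x  <- c(102, 98, 110, 105, 99, 107, 103, 101)
x2 <- c(95, 100, 97, 92, 99, 96, 98, 94)

# Two-sided one-sample test of mean = 100
two_sided <- t.test(x, mu = 100)

# One-sided test; the alternative is given as a quoted string
upper <- t.test(x, mu = 100, alternative = "greater")

# Two-sample test (Welch test by default, which does not assume equal variances)
two_sample <- t.test(x, x2)

two_sided$p.value
upper$p.value        # half the two-sided p-value here, because mean(x) > 100
two_sample$p.value
```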
Chi Square
chisq.test(table(x,y), correct = F) test of independence for variable x and variable y
chisq.test(table(x), p=c(.25,.25,.25,.25), correct = F) goodness of fit test where expected
values for variable x are equally distributed with 25% in each category
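Both chi-square tests can be sketched on made-up categorical data. The counts below were chosen so the expected counts are large enough for the chi-square approximation.

```r
# Illustrative data for a test of independence (80 invented students)
sex  <- rep(c("M", "F"), times = c(40, 40))
full <- c(rep(c("Yes", "No"), times = c(30, 10)),   # males
          rep(c("Yes", "No"), times = c(20, 20)))   # females

ind <- chisq.test(table(sex, full), correct = FALSE)
ind$statistic   # 5.333 on 1 degree of freedom
ind$expected    # expected counts under independence

# Illustrative data for goodness of fit: 4 categories, 24 observations
grp <- rep(c("A", "B", "C", "D"), times = c(10, 6, 4, 4))

gof <- chisq.test(table(grp), p = c(.25, .25, .25, .25))
gof$statistic   # 4 on 3 degrees of freedom
```

The correct argument controls the continuity correction, which R only applies to 2-by-2 tables, so it has no effect on the goodness-of-fit call.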
ANOVA
aggregate(Pulse~Class,Karloff,mean) mean of pulse for every class in the Karloff dataset
aggregate(Pulse~Class,Karloff,sd) sd of pulse for every class in the Karloff dataset
model <- aov(Karloff$Pulse~Karloff$Class) compare pulse rates of different classes
summary(model) shows the summary table from the ANOVA test
TukeyHSD(model) shows the results of the post-hoc TukeyHSD test
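The ANOVA steps can be sketched on an invented stand-in for Karloff. The data are made up, and the sketch passes the data frame through the data argument, which fits the same model as writing Karloff$Pulse ~ Karloff$Class.

```r
# Hypothetical stand-in for the Karloff dataset (values invented for illustration)
Karloff <- data.frame(
  Pulse = c(70, 72, 68, 80, 82, 78, 90, 92, 88),
  Class = factor(rep(c("Freshman", "Junior", "Senior"), each = 3))
)

# Group means and standard deviations
means <- aggregate(Pulse ~ Class, Karloff, mean)   # 70, 80, 90
sds   <- aggregate(Pulse ~ Class, Karloff, sd)     # 2 in each group

# One-way ANOVA, summary table, and post-hoc comparisons
model <- aov(Pulse ~ Class, data = Karloff)
summary(model)    # F value is 75 for these data
TukeyHSD(model)   # pairwise differences between classes
```

TukeyHSD() is normally read only after the ANOVA table shows a significant overall difference.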