
Intro stats with mosaic
Use for graphics, statistics, inference, and modeling operations.

Essential R syntax
Names in R are case sensitive (for example, HELPrct and helprct are different names).
Function and arguments
rflip(10)
Optional arguments
rflip(10, prob = 0.8)
Assignment
x <- rflip(10, prob = 0.8)
Getting help on any function
help(mean)
Loading packages
library(mosaic)

Arithmetic operations
+ - * /   basic operations
^   exponentiation
( )   grouping
sqrt(x)   square root
abs(x)   absolute value
log10(x)   logarithm, base 10
log(x)   natural logarithm, base e
exp(x)   exponential function, e^x
factorial(k)   factorial, k!

Logical operators
==   is equal to (note the double equals sign)
!=   is not equal to
<   is less than
<=   is less than or equal to
>   is greater than
>=   is greater than or equal to
&   A & B is TRUE if both A and B are TRUE
|   A | B is TRUE if one or both of A and B are TRUE
%in%   includes; for example, "C" %in% c("A", "B") is FALSE

Formula interface (lattice version)
goal(y ~ x, data = mydata)
Read as “Calculate goal for y using mydata, broken down by x” (or “modeled by x”).
mean(age ~ sex, data = HELPrct)
For graphics:
goal(y ~ x | z, groups = w, data = mydata)
y : y-axis variable (optional)
x : x-axis variable (required)
z : panel-by variable (optional)
w : color-by variable (optional)
bwplot(wage ~ sex, data = CPS85)
xyplot(wage ~ educ | sex, data = CPS85)
xyplot(wage ~ educ, groups = sex, data = CPS85, auto.key = TRUE)

Examining data
Print short summary of all variables
inspect(HELPrct)
Number of rows and columns
dim(HELPrct)
nrow(HELPrct)
ncol(HELPrct)
Print first rows or last rows
head(KidsFeet)
tail(KidsFeet, 10)
Names of variables
names(HELPrct)

One categorical variable
Counts by category
tally(~ sex, data = HELPrct)
Percentages by category
tally(~ sex, format = "percent", data = HELPrct)
bargraph(~ sex, type = "percent", data = HELPrct)
Tests and confidence intervals
Exact test
result1 <- binom.test(~ (homeless == "homeless"), data = HELPrct)
Approximate test (large samples)
result2 <- prop.test(~ (homeless == "homeless"), data = HELPrct)
Extract confidence intervals and p-values
confint(result1)
pval(result2)

One quantitative variable
Make output more readable
options(digits = 3)
Compute summary statistics
mean(~ cesd, data = HELPrct)
Other summary statistics work similarly
median() iqr() max() min() fivenum() sd() var() sum()
Table of summary statistics
favstats(~ cesd, data = HELPrct)
Summary statistics by group
favstats(cesd ~ sex, data = HELPrct)
Quantiles
quantile(~ cesd, data = HELPrct, probs = c(0.25, 0.5, 0.8))
Histogram
histogram(~ cesd, width = 5, center = 2.5, data = HELPrct)
Normal probability plot
qqmath(~ cesd, dist = "qnorm", data = HELPrct)
Density plot
densityplot(~ cesd, data = HELPrct)
Dot plot
dotPlot(~ cesd, data = HELPrct)
One-sample t-test
result <- t.test(~ cesd, mu = 34, data = HELPrct)
Extract confidence intervals and p-values
confint(result)
pval(result)
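The arithmetic and logical operators above work element by element on whole vectors. The short base-R sketch below needs no packages. The vector `scores` is a hypothetical example invented for illustration and does not appear on the sheet.

```r
# Arithmetic operators from the table (base R, no packages needed)
2^3            # exponentiation: 8
sqrt(16)       # 4
log10(1000)    # 3
factorial(5)   # 5! = 120

# Hypothetical vector, used only for illustration
scores <- c(12, 34, 50, 7)

# Comparisons work element by element and return logical vectors
above_30 <- scores > 30                    # FALSE TRUE TRUE FALSE
in_range <- scores >= 10 & scores <= 40    # both must hold: TRUE TRUE FALSE FALSE
either   <- scores < 10 | scores > 45      # one or both hold: FALSE FALSE TRUE TRUE

# TRUE counts as 1, so sum() counts the matches
sum(above_30)                              # 2

# %in% checks whether each left-hand value appears in the right-hand set
"C" %in% c("A", "B")                       # FALSE
c("A", "C") %in% c("A", "B")               # TRUE FALSE
```

Because a logical vector sums to the number of TRUE values, `sum(x > 30)` is a quick way to count the cases that meet a condition.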

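The formula versions of binom.test(), prop.test(), and t.test() above come from mosaic. Base R's own versions take counts or a numeric vector directly and store their results in the `conf.int` and `p.value` components. In the sketch below, the counts 45 of 100 are hypothetical. The t-test uses R's built-in `sleep` data set, not the sheet's HELPrct data.

```r
# Hypothetical counts, for illustration only: 45 successes in 100 trials
x <- 45
n <- 100

exact  <- binom.test(x, n, p = 0.5)                  # exact binomial test
approx <- prop.test(x, n, p = 0.5, correct = FALSE)  # large-sample approximation

exact$conf.int      # confidence interval for the proportion
approx$p.value      # p-value of the approximate test

# One-sample t-test on the built-in 'sleep' data, testing mu = 0
res <- t.test(sleep$extra, mu = 0)
res$conf.int        # confidence interval for the mean
res$p.value
```

On the sheet, mosaic's confint() and pval() pull out these same pieces.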
RStudio® is a trademark of RStudio, Inc. • CC BY Michael Laviolette • statman54@gmail.com • Adapted from A Student’s Guide to R by NJ Horton, R Pruim & DT Kaplan • Updated: 02/18
Two categorical variables
Contingency table with margins
tally(~ substance + sex, margins = TRUE, data = HELPrct)
Percentages by column
tally(~ sex | substance, format = "percent", data = HELPrct)
Mosaic plot
mosaicplot(~ substance + sex, color = TRUE, data = HELPrct)
Chi-square test
xchisq.test(~ substance + sex, data = HELPrct, correct = FALSE)

Distributions
Normal distribution function
pnorm(13, mean = 10, sd = 2)
Normal distribution function with graph
xpnorm(1.645, mean = 0, sd = 1)
Normal distribution quantiles
qnorm(0.95) # defaults: mean = 0, sd = 1
Normal distribution quantiles with graph
xqnorm(0.85, mean = 10, sd = 2)
Binomial density function (“size” means n)
dbinom(5, size = 8, prob = 0.65)
Binomial distribution function
pbinom(5, size = 8, prob = 0.65)
Central portion of distribution
cdist("norm", 0.95)
cdist("t", c(0.90, 0.99), df = 5)
Plotting distributions
plotDist("binom", size = 8, prob = 0.65, xlim = c(-1, 9))
plotDist("norm", mean = 10, sd = 2)

Two quantitative variables
Correlation coefficient
cor(cesd ~ mcs, data = HELPrct)
Scatterplot with regression line and smooth
xyplot(cesd ~ mcs, type = c("p", "r", "smooth"), data = HELPrct)
Simple linear regression
cesdmodel <- lm(cesd ~ mcs, data = HELPrct)
msummary(cesdmodel)
Prediction
lmfunction <- makeFun(cesdmodel)
lmfunction(mcs = 35)
Extract useful quantities
anova(cesdmodel)
coef(cesdmodel)
confint(cesdmodel)
rsquared(cesdmodel)
Diagnostics: plot residuals
histogram(~ resid(cesdmodel), density = TRUE)
qqmath(~ resid(cesdmodel))
Diagnostics: plot residuals vs. fitted values
xyplot(resid(cesdmodel) ~ fitted(cesdmodel), type = c("p", "smooth", "r"))

Categorical response, quantitative predictor
Logistic regression
logit_mod <- glm(homeless ~ age + female, family = binomial, data = HELPrct)
msummary(logit_mod)
Odds ratios and confidence intervals
exp(coef(logit_mod))
exp(confint(logit_mod))

Data management (from the dplyr package)
Drop or reorder variables
select()
Create new variables from existing ones
mutate()
Retain specific rows from data
filter()
Sort data rows
arrange()
Compute summary statistics by group
group_by()
summarize()
Merge data tables
left_join()
inner_join()

Importing data
Import file from computer or URL
MustangPrice <- read.file("C:/MustangPrice.csv") # NOTE: R uses forward slashes in file paths, even on Windows!
Dome <- read.file("http://www.mosaic-web.org/go/datasets/Dome.csv")

Randomization and simulation
Fix random number sequence
set.seed(42)
Tossing coins
rflip(10) # default prob is 0.5
Do something repeatedly
do(5) * rflip(10, prob = 0.75)
Draw a simple random sample
sample(LETTERS, 10)
deal(Cards, 5) # poker hand
Resample with replacement
Small <- sample(KidsFeet, 10)
resample(Small)
Random permutation (shuffling)
shuffle(Cards)
Random values from distributions
rbinom(5, size = 10, prob = 0.7)
rnorm(5, mean = 10, sd = 2)

Quantitative response, categorical predictor
Two-level predictor: two-sample t-test
Numeric summaries
favstats(~ cesd | sex, data = HELPrct)
Comparative normal probability plot
qqmath(~ cesd | sex, data = HELPrct, layout = c(1, 2)) # also bwplot
Dotplot for smaller samples
xyplot(sex ~ length, alpha = 0.6, cex = 1.4, data = KidsFeet)
Two-sample t-test and confidence interval
result <- t.test(cesd ~ sex, var.equal = FALSE, data = HELPrct)
confint(result)
More than two levels: analysis of variance
Numeric summaries
favstats(cesd ~ substance, data = HELPrct)
Graphic summaries
bwplot(cesd ~ substance, pch = "|", data = HELPrct)
Fit and summarize model
modsubstance <- lm(cesd ~ substance, data = HELPrct)
anova(modsubstance)
Which differences are significant?
pairwise <- TukeyHSD(modsubstance)
mplot(pairwise)
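The distribution functions in the Distributions section fit together in a few simple ways, which this base-R sketch checks numerically. pnorm() standardizes its argument, qnorm() undoes pnorm(), and pbinom() is the running total of dbinom(). We assume that cdist("norm", 0.95) reports the endpoints of the central 95% region. If so, the `central` line below should give the same numbers using base R alone.

```r
# Standardizing: pnorm(13, mean = 10, sd = 2) is P(Z <= (13 - 10) / 2) = P(Z <= 1.5)
p     <- pnorm(13, mean = 10, sd = 2)
p_std <- pnorm(1.5)                          # same value, about 0.933

# qnorm() inverts pnorm(); defaults are mean = 0, sd = 1
q    <- qnorm(0.95)                          # about 1.645
back <- pnorm(q)                             # recovers 0.95

# Central 95% of N(0, 1): leave 2.5% in each tail
central <- qnorm(c(0.025, 0.975))            # about -1.96 and 1.96

# The binomial distribution function is the running sum of the density
cum         <- pbinom(5, size = 8, prob = 0.65)
cum_by_hand <- sum(dbinom(0:5, size = 8, prob = 0.65))
```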

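Several of the regression helpers under Two quantitative variables have close base-R counterparts. The sketch below uses R's built-in `cars` data (stopping distance versus speed) in place of HELPrct. It reads R^2 from summary(), where the sheet uses mosaic's rsquared(), and it predicts with predict(), where the sheet uses makeFun(). It also confirms a standard fact: with one predictor, R^2 equals the squared correlation coefficient.

```r
# Built-in 'cars' data: stopping distance (dist) versus speed
fit <- lm(dist ~ speed, data = cars)

b  <- coef(fit)              # intercept and slope
ci <- confint(fit)           # 95% confidence intervals, one row per coefficient
r2 <- summary(fit)$r.squared

# With a single predictor, R^2 is the squared correlation coefficient
r2_from_cor <- cor(cars$speed, cars$dist)^2

# Prediction at speed = 15, the base-R counterpart of lmfunction(mcs = 35)
pred         <- predict(fit, newdata = data.frame(speed = 15))
pred_by_hand <- b[["(Intercept)"]] + b[["speed"]] * 15
```

The hand calculation shows that a prediction from a fitted line is just the intercept plus the slope times the new x value.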
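The analysis-of-variance steps under More than two levels can be reproduced in base R. The sketch below uses the built-in `PlantGrowth` data (plant weight for a control and two treatment groups) in place of HELPrct. Base R documents TukeyHSD() for aov() fits. The sheet calls it on an lm() fit, which we assume relies on a method that mosaic supplies. The sketch also checks that aov() and anova(lm()) report the same F statistic.

```r
# Built-in PlantGrowth data: plant weight for a control and two treatment groups
mod_aov <- aov(weight ~ group, data = PlantGrowth)
mod_lm  <- lm(weight ~ group, data = PlantGrowth)

# The same F statistic comes out of summary(aov(...)) and anova(lm(...))
f_aov <- summary(mod_aov)[[1]][["F value"]][1]
f_lm  <- anova(mod_lm)[["F value"]][1]

# Tukey's honest significant differences for every pair of groups
tk <- TukeyHSD(mod_aov)
tk$group        # columns: diff, lwr, upr, p adj (one row per pair)
```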

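The Randomization and simulation entries rest on two ideas. The first is that set.seed() makes a random sequence reproducible. The second is that resampling means drawing with or without replacement. The base-R sketch below shows both. It uses replicate() and sample() as rough analogues of mosaic's do(), resample(), and shuffle(), and it codes coin flips as 0/1 draws from rbinom(). These stand-ins are our assumption, not mosaic's implementation.

```r
# Same seed, same "random" numbers: 10 fair coin flips coded 1 = heads
set.seed(42)
flips1 <- rbinom(10, size = 1, prob = 0.5)
set.seed(42)
flips2 <- rbinom(10, size = 1, prob = 0.5)
same <- identical(flips1, flips2)              # TRUE

# Rough analogue of do(5) * rflip(10, prob = 0.75): heads in each of 5 runs
set.seed(42)
heads <- replicate(5, sum(rbinom(10, size = 1, prob = 0.75)))

# Drawing with replacement (like resample()) vs. a permutation (like shuffle())
x    <- 1:10
boot <- sample(x, replace = TRUE)              # values may repeat
perm <- sample(x)                              # each value exactly once
```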