Intro stats with mosaic (lattice version)

Essential R syntax
Names in R are case sensitive
Function and arguments
    rflip(10)
Optional arguments
    rflip(10, prob = 0.8)
Assignment
    x <- rflip(10, prob = 0.8)
Getting help on any function
    help(mean)
Loading packages
    library(mosaic)

Arithmetic operations
    + - * /        basic operations
    ^              exponentiation
    ( )            grouping
    sqrt(x)        square root
    abs(x)         absolute value
    log10(x)       logarithm, base 10
    log(x)         natural logarithm, base e
    exp(x)         exponential function e^x
    factorial(k)   factorial, k!

Logical operators
    ==      is equal to (note double equal sign)
    !=      is not equal to
    <       is less than
    <=      is less than or equal to
    >       is greater than
    >=      is greater than or equal to
    &       A & B is TRUE if both A and B are TRUE
    |       A | B is TRUE if one or both of A and B are TRUE
    %in%    includes; for example "C" %in% c("A", "B") is FALSE

Formula interface
Use for graphics, statistics, inference, and modeling operations.
    goal(y ~ x, data = mydata)
Read as “Calculate goal for y using mydata, ‘broken down by’ x or ‘modeled by’ x.”
    mean(age ~ sex, data = HELPrct)
For graphics:
    goal(y ~ x | z, groups = w,
         data = mydata)
    y : y-axis variable (optional)
    x : x-axis variable (required)
    z : panel-by variable (optional)
    w : color-by variable (optional)
    bwplot(wage ~ sex, data = CPS85)
    xyplot(wage ~ educ | sex,
           data = CPS85)
    xyplot(wage ~ educ,
           groups = sex, data = CPS85,
           auto.key = TRUE)

Examining data
Print short summary of all variables
    inspect(HELPrct)
Number of rows and columns
    dim(HELPrct)
    nrow(HELPrct)
    ncol(HELPrct)
Print first rows or last rows
    head(KidsFeet)
    tail(KidsFeet, 10)
Names of variables
    names(HELPrct)

One categorical variable
Counts by category
    tally(~ sex, data = HELPrct)
Percentages by category
    tally(~ sex, format = "percent", data = HELPrct)
    bargraph(~ sex, type = "percent", data = HELPrct)

Tests and confidence intervals
Exact test
    result1 <- binom.test(~ (homeless == "homeless"),
                          data = HELPrct)
Approximate test (large samples)
    result2 <- prop.test(~ (homeless == "homeless"),
                         data = HELPrct)
Extract confidence intervals and p-values
    confint(result1)
    pval(result2)

One quantitative variable
Make output more readable
    options(digits = 3)
Compute summary statistics
    mean(~ cesd, data = HELPrct)
Other summary statistics work similarly
    median()  iqr()  max()  min()
    fivenum() sd()   var()  sum()
Table of summary statistics
    favstats(~ cesd, data = HELPrct)
Summary statistics by group
    favstats(cesd ~ sex, data = HELPrct)
Quantiles
    quantile(~ cesd, data = HELPrct,
             prob = c(0.25, 0.5, 0.8))
Histogram
    histogram(~ cesd, width = 5,
              center = 2.5, data = HELPrct)
Normal probability plot
    qqmath(~ cesd, dist = "qnorm",
           data = HELPrct)
Density plot
    densityplot(~ cesd, data = HELPrct)
Dot plot
    dotPlot(~ cesd, data = HELPrct)
One-sample t-test
    result <- t.test(~ cesd, mu = 34, data = HELPrct)
Extract confidence intervals and p-values
    confint(result)
    pval(result)
RStudio® is a trademark of RStudio, Inc. • CC BY Michael maviolette • statman54@gmail.com Adapted from A Student’s Guide to R by NJ Horton, R Pruim & DT Kaplan • Updated: 02/18
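The formula template goal(y ~ x, data = mydata) can be tried on a small data frame where the answers are easy to check by hand. The sketch below is only an illustration: the data frame toy and its columns score and group are made up for this example and are not part of the mosaic datasets used elsewhere on this sheet.

```r
library(mosaic)  # also makes the lattice plotting functions available

# Hypothetical toy data, invented for this sketch
toy <- data.frame(
  score = c(10, 12, 14, 20, 22, 24),
  group = factor(c("a", "a", "a", "b", "b", "b"))
)

# One variable: nothing on the left-hand side of ~
overall <- mean(~ score, data = toy)

# "Broken down by": response on the left, grouping variable on the right
by_group <- mean(score ~ group, data = toy)

# The same formula shape drives a summary table and a plot
favstats(score ~ group, data = toy)
bwplot(score ~ group, data = toy)

overall   # 17
by_group  # a = 12, b = 22
```

Swapping mean() for median(), sd(), or favstats() leaves the formula and data arguments unchanged, which is the point of the shared interface.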
Two categorical variables
Contingency table with margins
    tally(~ substance + sex,
          margins = TRUE,
          data = HELPrct)
Percentages by column
    tally(~ sex | substance,
          format = "percent",
          data = HELPrct)
Mosaic plot
    mosaicplot(~ substance + sex,
               color = TRUE, data = HELPrct)
Chi-square test
    xchisq.test(~ substance + sex,
                data = HELPrct,
                correct = FALSE)

Distributions
Normal distribution function
    pnorm(13, mean = 10, sd = 2)
Normal distribution function with graph
    xpnorm(1.645, mean = 0, sd = 1)
Normal distribution quantiles
    qnorm(0.95)  # mean = 0, sd = 1
Normal distribution quantiles with graph
    xqnorm(0.85, mean = 10, sd = 2)
Binomial density function (“size” means n)
    dbinom(5, size = 8, prob = 0.65)
Binomial distribution function
    pbinom(5, size = 8, prob = 0.65)
Central portion of distribution
    cdist("norm", 0.95)
    cdist("t", c(0.90, 0.99), df = 5)
Plotting distributions
    plotDist("binom", size = 8,
             prob = 0.65, xlim = c(-1, 9))
    plotDist("norm", mean = 10,
             sd = 2)

Two quantitative variables
Correlation coefficient
    cor(cesd ~ mcs, data = HELPrct)
Scatterplot with regression line and smooth
    xyplot(cesd ~ mcs,
           type = c("p", "r", "smooth"),
           data = HELPrct)
Simple linear regression
    cesdmodel <- lm(cesd ~ mcs,
                    data = HELPrct)
    msummary(cesdmodel)
Prediction
    lmfunction <- makeFun(cesdmodel)
    lmfunction(mcs = 35)
Extract useful quantities
    anova(cesdmodel)
    coef(cesdmodel)
    confint(cesdmodel)
    rsquared(cesdmodel)
Diagnostics; plot residuals
    histogram(~ resid(cesdmodel),
              density = TRUE)
    qqmath(~ resid(cesdmodel))
Diagnostics; plot residuals vs. fitted
    xyplot(resid(cesdmodel) ~
             fitted(cesdmodel),
           type = c("p", "smooth", "r"))

Categorical response, quantitative predictor
Logistic regression
    logit_mod <-
      glm(homeless ~ age + female,
          family = binomial, data = HELPrct)
    msummary(logit_mod)
Odds ratios and confidence intervals
    exp(coef(logit_mod))
    exp(confint(logit_mod))

Data management
From dplyr package
Drop or reorder variables
    select()
Create new variables from existing ones
    mutate()
Retain specific rows from data
    filter()
Sort data rows
    arrange()
Compute summary statistics by group
    group_by()
    summarize()
Merge data tables
    left_join()
    inner_join()

Importing data
Import file from computer or URL
    MustangPrice <-
      read.file("C:/MustangPrice.csv")
    # NOTE: R uses forward slashes!
    Dome <-
      read.file("http://www.mosaic-web.org/go/datasets/Dome.csv")

Randomization and simulation
Fix random number sequence
    set.seed(42)
Tossing coins
    rflip(10)  # default prob is 0.5
Do something repeatedly
    do(5) * rflip(10, prob = 0.75)
Draw a simple random sample
    sample(LETTERS, 10)
    deal(Cards, 5)  # poker hand
Resample with replacement
    Small <- sample(KidsFeet, 10)
    resample(Small)
Random permutation (shuffling)
    shuffle(Cards)
Random values from distributions
    rbinom(5, size = 10, prob = 0.7)
    rnorm(5, mean = 10, sd = 2)

Quantitative response, categorical predictor

Two-level predictor: two-sample t test
Numeric summaries
    favstats(~ cesd | sex,
             data = HELPrct)
Comparative normal probability plot
    qqmath(~ cesd | sex, data = HELPrct,
           layout = c(1, 2))  # also bwplot
Dotplot for smaller samples
    xyplot(sex ~ length, alpha = 0.6,
           cex = 1.4, data = KidsFeet)
Two-sample t-test and confidence interval
    result <- t.test(cesd ~ sex,
                     var.equal = FALSE, data = HELPrct)
    confint(result)

More than two levels: Analysis of variance
Numeric summaries
    favstats(cesd ~ substance,
             data = HELPrct)
Graphic summaries
    bwplot(cesd ~ substance, pch = "|",
           data = HELPrct)
Fit and summarize model
    modsubstance <- lm(cesd ~ substance,
                       data = HELPrct)
    anova(modsubstance)
Which differences are significant?
    pairwise <- TukeyHSD(modsubstance)
    mplot(pairwise)
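The randomization tools listed above (set.seed(), do(), shuffle()) can be combined with a two-group comparison to sketch a permutation test. This is one possible arrangement, not a procedure prescribed by the sheet. It assumes the mosaic helper diffmean(), which is not listed on this sheet, and that do() stores its results in a column named diffmean. The data frame toy is made up for the example.

```r
library(mosaic)

set.seed(42)  # fix the random number sequence so the run is repeatable

# Hypothetical toy data, invented for this sketch
toy <- data.frame(
  score = c(10, 12, 14, 20, 22, 24),
  group = factor(c("a", "a", "a", "b", "b", "b"))
)

# Observed difference in group means (second level minus first)
observed <- diffmean(score ~ group, data = toy)

# Shuffle the group labels 200 times, recomputing the difference each time
null_dist <- do(200) * diffmean(score ~ shuffle(group), data = toy)

# Share of shuffled differences at least as extreme as the observed one
p_value <- mean(abs(null_dist$diffmean) >= abs(observed))
p_value
```

Because shuffling breaks any link between score and group, the shuffled differences approximate what would be seen if the grouping had no effect. With real data, more repetitions than 200 would normally be used.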