ggplot2

a layer-based introduction
NYC R Meetup
December 3rd, 2009

Harlan D. Harris
harlan@harris.name

ggplot's philosophy
● Graphics are (should be!) created by combining
a specification with data. (Wilkinson, 2005)
● The specification is not the name of the visual
form (bar graph, scatterplot, histogram).
● The specification is a collection of rules that
together describe how to build a graph: a
Grammar of Graphics

December 3, 2009 Harlan D. Harris 2


graphics as grammar
[Figure: a data table (columns date, ct, sz, z) and the bar charts generated from it by the specification below]

x=date
y=ct/sz
me bars
group by z
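
The specification on this slide can be sketched in ggplot2 roughly as follows. The data values are invented for illustration, and expressing "group by z" as fill color with dodged bars is one possible reading of the slide, not necessarily the figure that was shown.

```r
library(ggplot2)

# Hypothetical data using the slide's column names: date, ct, sz, z
d <- data.frame(
  date = rep(c("Row 1", "Row 2", "Row 3", "Row 4"), times = 2),
  ct   = c(9, 4, 12, 6, 3, 8, 5, 10),
  sz   = c(3, 2, 4, 3, 1, 4, 5, 2),
  z    = rep(c("a", "b"), each = 4)
)

# Specification: x=date, y=ct/sz, drawn as bars, grouped by z
p <- ggplot(d, aes(x = date, y = ct / sz, fill = z)) +
  geom_bar(stat = "identity", position = "dodge")
```

Calling print(p) draws the graph; until then p is only the specification plus the data.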

advantages
● Flexible
● can define new graph types by changing
specifications
● can combine many forms into single graphs
● Smart
● compact: rules have useful defaults
● graphs always have meaning
● Reusable
● can plug new data into old specification
● can explore many types of plots from a set of data
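
A minimal sketch of the "reusable" point: the specification is wrapped in a function so that new data can be plugged into it. The function name scatter_spec and the columns x and y are hypothetical and only serve the illustration.

```r
library(ggplot2)

# A specification wrapped in a function, so it can be applied to any
# data frame that has the (hypothetical) columns x and y
scatter_spec <- function(data) {
  ggplot(data, aes(x = x, y = y)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
}

old_data <- data.frame(x = 1:10, y = (1:10) * 2 + 1)
new_data <- data.frame(x = 1:20, y = sqrt(1:20))

p_old <- scatter_spec(old_data)
p_new <- scatter_spec(new_data)  # same specification, new data
```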

ggplot2
● Hadley Wickham (Rice Univ.)
● also: reshape, plyr, etc.
● Extends & implements
The Grammar of Graphics (Wilkinson, 1999, 2005)
● Focus on layers; based on grid
● Specification as R objects constructed by functions
● Large library of components with good defaults
● ggplot2: Elegant Graphics for Data Analysis
(Wickham, 2009)

my gripes
● Specification is hierarchical structure;
grammar is left-to-right R expression
● Can't see the structure (usefully)
● Abuses both notation and R semantics
● Deep Magic with lazy evaluation, proto objects
● Existing tutorials lead to conceptual confusion
and require relearning of fundamentals
● Start with the structure, not with the shortcuts

goal

data to plot

ggplot likes “long” data
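
As a rough illustration of what "long" means here, the sketch below converts a small wide table (one column per condition) into one row per observation using base R's reshape(). The values are invented; the column names Parameter, Errors and Condition follow the plotting code later in the deck. The reshape package mentioned earlier offers melt() for the same job.

```r
# Hypothetical wide data: one column per condition
wide <- data.frame(
  Parameter = 1:3,
  Model     = c(10, 8, 6),
  Empirical = c(11, 7, 6)
)

# Long format: one row per (Parameter, Condition) pair
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Model", "Empirical"),  # columns to stack
  v.names   = "Errors",                 # name of the stacked value column
  timevar   = "Condition",              # name of the new key column
  times     = c("Model", "Empirical"),  # key values, one per stacked column
  idvar     = "Parameter"
)
```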

will plot model vs. empirical

simplest plot
aes = “aesthetics” = “create mapping”

structure
(you don't need to know this!)
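# current ggplot2 releases require stat and position to be given explicitly in layer()
ggplot(data=d.long.EI, mapping=aes(x=Parameter, y=Errors,
                                   color=Condition)) +
  layer(geom="line", stat="identity", position="identity")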

ggplot
  fields: data, layers, mapping, scales, coords, facets, options
  data: (copy)
  mapping: x=Param., y=Errs, color=Cond.

layer[1]
  fields: data, mapping, geom, stat, geom_params, stat_params
  geom: line
  stat: identity

● structure(p), str(p)

add empirical data and chance
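
A minimal sketch of this step, written with the geom_*() shortcuts that the deck introduces later, for brevity. The data frames and their values are hypothetical stand-ins for the talk's data; only the column names and the chance level of 64 come from the slides.

```r
library(ggplot2)

# Hypothetical long-format model data using the deck's column names
d.long.EI <- data.frame(
  Parameter = rep(1:5, times = 2),
  Errors    = c(70, 62, 55, 50, 48, 68, 60, 57, 49, 47),
  Condition = rep(c("A", "B"), each = 5)
)

# Hypothetical empirical observations
d.emp <- data.frame(
  Parameter = c(2, 4),
  Errors    = c(61, 52),
  Condition = c("A", "B")
)

p <- ggplot(d.long.EI, aes(x = Parameter, y = Errors, color = Condition)) +
  geom_line() +                            # model curves
  geom_point(data = d.emp, size = 3) +     # empirical points, from their own data frame
  geom_hline(yintercept = 64,              # chance level as a dashed black line
             color = "black", linetype = 2)
```

Each added layer can carry its own data and parameters while inheriting the plot-level mapping.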

structure so far
(you don't need to know this!)
ggplot
  fields: data, layers, mapping, scales, coords, facets, options
  data: (copy)
  mapping: x=Param., y=Errs, color=Cond.

layers (each with fields data, mapping, geom, stat, geom_params, stat_params)
  line layer: geom=line, stat=identity
  point layer: data=(U), geom=point, stat=identity, size=3
  hline layer: data=(K), mapping yint=Errs, geom=hline, stat=hline, size=2
  hline layer: yint=[64], geom=hline, stat=hline, color="black", linetype=2, size=.5

scales

coordinates & scales
● coordinates affect display of axes
● cartesian, polar, map, etc.
● scales affect data mapping
● colors, shapes, lines
● source of confusion
● set axis ticks/breaks and labels with
scale_x_continuous() or scale_y_discrete(), but
● set the visible axis range (a zoom that keeps all the data) with
coord_cartesian(xlim=c(1,10))
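
The distinction can be sketched as follows, with invented data. The scale call only changes where ticks are placed, while the coord call changes which part of the plot is shown.

```r
library(ggplot2)

d <- data.frame(x = 1:20, y = (1:20)^2)  # hypothetical data
base <- ggplot(d, aes(x = x, y = y)) + geom_line()

# Scale: controls where the ticks go (and, via labels=, how they read)
p_ticks <- base + scale_x_continuous(breaks = c(1, 5, 10))

# Coord: zooms the view to x in [1, 10]; the plot object still holds all rows
p_zoom <- base + coord_cartesian(xlim = c(1, 10))
```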

options

shortcuts
● All those layer() calls are tedious!
● geom_*() creates a layer with a specific geom
(and various defaults, including a stat)
● stat_*() creates a layer with a specific stat
(and various defaults, including a geom)
● qplot() creates a ggplot and a layer
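
A small sketch of how the shortcuts relate to layer(), using invented data. It leaves qplot() out and only compares an explicit layer() call with the geom_*() and stat_*() forms.

```r
library(ggplot2)

d <- data.frame(x = 1:10, y = (1:10)^2)  # hypothetical data

# Explicit layer() call: geom, stat and position all spelled out
p_long <- ggplot(d, aes(x = x, y = y)) +
  layer(geom = "line", stat = "identity", position = "identity")

# geom_line() shortcut: fixes the geom and supplies a default stat
p_short <- ggplot(d, aes(x = x, y = y)) + geom_line()

# stat_bin() shortcut: fixes the stat and supplies a default geom (bars)
p_stat <- ggplot(d, aes(x = y)) + stat_bin(bins = 5)
```

Inspecting p_short$layers[[1]] and p_long$layers[[1]] shows the same geom and stat in both.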

quick note on stats
● stat="identity"
● stat="smooth" with method="lm"
● fit y=f(x) with lm(), generate new data to be plotted
by geom_line(), CIs with geom_ribbon()
● stat="smooth"
● fit y=f(x) with loess()
● stat="summary"
● y=f(x) with arbitrary f()
● stat="bin"
● histograms
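
The stats listed above might be used as in the sketch below. The data are simulated for illustration, and the linear fit is requested through stat_smooth(method = "lm"), which is how current ggplot2 releases expose it.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rep(1:10, each = 5))  # hypothetical data
d$y <- 2 * d$x + rnorm(nrow(d))

p_lm <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +                               # stat = "identity": data plotted as-is
  stat_smooth(method = "lm", formula = y ~ x)  # linear fit with a confidence band

p_mean <- ggplot(d, aes(x = x, y = y)) +
  stat_summary(fun = mean, geom = "line")      # y = f(x) with f = mean

p_hist <- ggplot(d, aes(x = y)) +
  stat_bin(bins = 10)                          # histogram
```

In each case the stat computes a new data set (fitted values, means, bin counts) and a geom then draws it.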
simplest faceted plot
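
A minimal faceted plot might look like the sketch below. The data values are invented, the column names follow the deck's earlier code, and facet_wrap() is one of several ways to facet, so this is not necessarily the call shown on the slide.

```r
library(ggplot2)

# Hypothetical long-format data using the deck's column names
d <- data.frame(
  Parameter = rep(1:5, times = 2),
  Errors    = c(70, 62, 55, 50, 48, 68, 60, 57, 49, 47),
  Condition = rep(c("A", "B"), each = 5)
)

# One panel per Condition, each drawn from the same specification
p <- ggplot(d, aes(x = Parameter, y = Errors)) +
  geom_line() +
  facet_wrap(~ Condition)
```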

everything else (+alpha)

other things I find useful
● scale_x_continuous(breaks=seq(1,9,2),
labels=c("one", "", "five", "", "nine"))
● geom_text(aes(x=.., y=.., label=..))
● annotate(geom="text", x=14, y=19, label="outlier!")
● geom_density()
● stat_summary(fun.data="mean_cl_boot",
geom="crossbar")
● geom_jitter(position=position_jitter(width=.5))

takehomes
● a ggplot graph is generated by a specification +
data
● ggplot specifications are a core object plus
layers
● mappings among data, x/y, scales, and other
attributes are fundamental
● geom and stat shortcuts allow smart/compact
construction of graphs
● ggplot encourages good graphs, with facets,
good use of color, no chartjunk

thanks!

resources
● Wickham, H. (2009) ggplot2: Elegant Graphics
for Data Analysis. Springer.
● http://had.co.nz/ggplot2/
● http://groups.google.com/group/ggplot2
● http://stackoverflow.com/questions/tagged/r
