BEGINNERS
guide to
FROM IDG
Table of Contents
Create choropleth maps in R
• Syntax 1: By equation
• Syntax 4: mapply()
• Syntax 5: dplyr
• dplyr basics
More R Resources
fy <- c(2010,2011,2012,2010,2011,2012,2010,2011,2012)
company <- c("Apple","Apple","Apple","Google","Google",
"Google","Microsoft","Microsoft","Microsoft")
revenue <- c(65225,108249,156508,29321,37905,50175,62484,
69943,73723)
profit <- c(14013,25922,41733,8505,9737,10737,18760,23150,
16978)
companiesData <- data.frame(fy, company, revenue, profit)
The code above will create a data frame like the one below,
stored in a variable named "companiesData":
fy    company    revenue  profit
2010  Apple        65225   14013
2011  Apple       108249   25922
2012  Apple       156508   41733
2010  Google       29321    8505
2011  Google       37905    9737
2012  Google       50175   10737
2010  Microsoft    62484   18760
2011  Microsoft    69943   23150
2012  Microsoft    73723   16978
str(companiesData)
'data.frame': 9 obs. of 4 variables:
 $ fy     : num 2010 2011 2012 2010 2011 ...
 $ company: Factor w/ 3 levels "Apple","Google",..: 1 1 1 2 2 2 3 3 3
 $ revenue: num 65225 108249 156508 29321 37905 ...
 $ profit : num 14013 25922 41733 8505 9737 ...
fy    company    revenue  profit  margin
2010  Apple        65225   14013  21.48409
2011  Apple       108249   25922  23.94664
2012  Apple       156508   41733  26.66509
2010  Google       29321    8505  29.00651
2011  Google       37905    9737  25.68790
2012  Google       50175   10737  21.39910
2010  Microsoft    62484   18760  30.02369
2011  Microsoft    69943   23150  33.09838
2012  Microsoft    73723   16978  23.02945
fy    company    revenue  profit  margin
2010  Apple        65225   14013    21.5
2011  Apple       108249   25922    23.9
2012  Apple       156508   41733    26.7
2010  Google       29321    8505    29.0
2011  Google       37905    9737    25.7
2012  Google       50175   10737    21.4
2010  Microsoft    62484   18760    30.0
2011  Microsoft    69943   23150    33.1
2012  Microsoft    73723   16978    23.0
That's fine for a function like sum, where you take each
number and do the same thing to it. But let's go back to our
earlier example of calculating a profit margin for each row.
In that case, we need to pass profit and revenue in a certain
order (it's profit divided by revenue, not the other way
around) and then multiply by 100.
How can we pass multiple items to apply() in a certain
order for use in an anonymous function(x)? By referring
to the items in our anonymous function as x[1] for the first
one, x[2] for the second, etc., such as:
companiesData$margin <- apply(companiesData[,
c('revenue', 'profit')], 1,
function(x) { (x[2]/x[1]) * 100 } )
That line of code above creates an anonymous function
that uses the second item (in this case profit, since it's
listed second in companiesData[,c('revenue', 'profit')]) and
divides it by the first item in each row, revenue. This will work
because there are only two items here, revenue and profit;
remember, we told apply() to use only those columns.
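As a small runnable check of the positional approach, the sketch below rebuilds the example data frame and computes the margin twice: once with x[1] and x[2], and once by column name. The name-based variant is an addition for illustration (it relies on apply() passing each row as a vector named after the selected columns); byPosition and byName are names made up for this example.

```r
# Rebuild the example data so this block runs on its own
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978)
)

# Positional: x[1] is revenue and x[2] is profit, following the column order given
byPosition <- apply(companiesData[, c("revenue", "profit")], 1,
                    function(x) (x[2] / x[1]) * 100)

# By name: each row arrives as a named vector, so the order no longer matters
byName <- apply(companiesData[, c("revenue", "profit")], 1,
                function(x) (x["profit"] / x["revenue"]) * 100)
```

Both vectors hold the same nine margins; the name-based form is a little longer but does not break if the columns are listed in a different order.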
Syntax 4: mapply()
This, and the simpler sapply(), also can apply a function to
some but not necessarily all columns in a data frame,
without having to worry about numbering each item like
x[1] and x[2] above. The mapply() format to create a new
column in a data frame is:
dataFrame$newColumn <- mapply(someFunction,
dataFrame$column1, dataFrame$column2,
dataFrame$column3)
The code above would apply the function someFunction()
to the data in column1, column2 and column3 of each row
of the data frame.
Note that the first argument of mapply() here is the name
of a function, not an equation or formula. So if we want
(profit/revenue) * 100 as our result, we could first write our
own function to do this calculation and then use it with
mapply().
Here's how to create a named function, profitMargin(),
that takes two variables (in this case we're calling them
netIncome and revenue, just within the function) and
returns the first variable divided by the second variable
times 100, rounded to one decimal place:
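A minimal sketch of a function matching that description, followed by the mapply() call that would use it. The function body is an assumption based only on the wording above, and the data frame is rebuilt so the block runs on its own.

```r
# Sketch: first argument divided by the second, times 100, rounded to 1 decimal
profitMargin <- function(netIncome, revenue) {
  round((netIncome / revenue) * 100, 1)
}

# Rebuild the example data
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978)
)

# mapply() walks the two columns in parallel: profit is passed as netIncome,
# revenue as revenue, one row at a time
companiesData$margin <- mapply(profitMargin,
                               companiesData$profit, companiesData$revenue)
```

Because the arguments are matched by position, the order of the columns after the function name decides which one is the numerator.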
company    bestMargin
Apple      26.7
Google     29.0
Microsoft  33.1
fy    company    revenue  profit  margin  bestMargin
2010  Apple        65225   14013    21.5        26.7
2011  Apple       108249   25922    23.9        26.7
2012  Apple       156508   41733    26.7        26.7
2010  Google       29321    8505    29.0        29.0
2011  Google       37905    9737    25.7        29.0
2012  Google       50175   10737    21.4        29.0
2010  Microsoft    62484   18760    30.0        33.1
2011  Microsoft    69943   23150    33.1        33.1
2012  Microsoft    73723   16978    23.0        33.1
Note that this result shows the profit margin for each
company and year in the margin column along with the
bestMargin repeated for each company and year. The only
way to tell which year has the best margin is to compare the
two columns to see where they're equal.
ddply() lets you apply more than one function at a time, for
example:
myResults <- ddply(companiesData, 'company',
transform, highestMargin = max(margin),
lowestMargin = min(margin))
This gets you:
fy    company    revenue  profit  margin  highestMargin  lowestMargin
2010  Apple        65225   14013    21.5           26.7          21.5
2011  Apple       108249   25922    23.9           26.7          21.5
2012  Apple       156508   41733    26.7           26.7          21.5
2010  Google       29321    8505    29.0           29.0          21.4
2011  Google       37905    9737    25.7           29.0          21.4
2012  Google       50175   10737    21.4           29.0          21.4
2010  Microsoft    62484   18760    30.0           33.1          23.0
2011  Microsoft    69943   23150    33.1           33.1          23.0
2012  Microsoft    73723   16978    23.0           33.1          23.0
fy    company    revenue  profit  margin
2012  Apple       156508   41733    26.7
2010  Google       29321    8505    29.0
2011  Microsoft    69943   23150    33.1
That may look a bit daunting, but really it's not so bad once
you break it down. Let's take it step by step.
The ddply(companiesData, 'company', function(x)) portion
should look familiar by now: companiesData is the original
data frame and function(x) says that an anonymous
vDates      vDates.bymonth
2013-06-01  2013-06-01
2013-07-08  2013-07-01
2013-09-01  2013-09-01
2013-09-15  2013-09-01
The new column gives the starting date for each month,
making it easy to then slice by month.
Ph.D. student Mollie Taylor's blog post "Plot Weekly or
Monthly Totals in R" introduced me to this shortcut, which
isn't apparent if you simply read the cut() help file. If you
ever work with analyzing and plotting date-based data, this
short and extremely useful post is definitely worth a read.
Her downloadable code is available as a GitHub gist.
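The month-start column in the table above can be reproduced with a short sketch like the one below, assuming the shortcut in question is cut() with breaks = "month" applied to a Date vector. The dates are the four shown in the table; dfDates is a name made up for this example.

```r
# The four dates from the table, as Date objects
vDates <- as.Date(c("2013-06-01", "2013-07-08", "2013-09-01", "2013-09-15"))

# breaks = "month" bins each date by calendar month and labels the bin
# with the first day of that month; the result is a factor
vDates.bymonth <- cut(vDates, breaks = "month")

dfDates <- data.frame(vDates, vDates.bymonth)
```

Note that the result is a factor, and months with no dates (August here) still appear among its levels; wrap it in as.Date(as.character(...)) if you need real dates again.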
   fy    company    revenue  profit  margin
8  2011  Microsoft    69943   23150    33.1
7  2010  Microsoft    62484   18760    30.0
4  2010  Google       29321    8505    29.0
3  2012  Apple       156508   41733    26.7
5  2011  Google       37905    9737    25.7
2  2011  Apple       108249   25922    23.9
9  2012  Microsoft    73723   16978    23.0
1  2010  Apple        65225   14013    21.5
6  2012  Google       50175   10737    21.4
Note how you can see the original row numbers reordered
at the far left.
If you'd like to sort one column ascending and another
column descending, just put a minus sign before the one
that's descending. This is one way to sort this data first by
year (ascending) and then by profit margin (descending) to
see which company had the top profit margin by year:
companiesData[order(companiesData$fy,
-companiesData$margin),]
If you don't want to keep typing the name of the data frame
followed by the dollar sign for each of the column names,
R's with() function takes the name of a data frame as the
first argument and then lets you leave it off in subsequent
arguments in one command:
companiesOrdered <- companiesData[with(companiesData,
order(fy, -margin)),]
While this does save typing, it can make your code somewhat
less readable, especially for less experienced R users.
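To see that the two spellings really are interchangeable, here is a self-contained sketch that rebuilds the data, sorts it both ways and then picks out the top margin for each year. The names sortedDollar, sortedWith and topByYear are made up for this example.

```r
# Rebuild the example data, including the rounded margin column
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978)
)
companiesData$margin <- round(companiesData$profit / companiesData$revenue * 100, 1)

# Year ascending, margin descending, spelled out with $
sortedDollar <- companiesData[order(companiesData$fy, -companiesData$margin), ]

# The same sort, using with() so the data frame name appears only once
sortedWith <- companiesData[with(companiesData, order(fy, -margin)), ]

# After this sort, the first row of each year is that year's best margin
topByYear <- sortedWith[!duplicated(sortedWith$fy), c("fy", "company", "margin")]
```

The minus-sign trick only works on numeric columns; for a character or factor column you would need a different approach, such as the package functions described next.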
Packages offer some more elegant sorting options. The
doBy package features orderBy() using the syntax
orderBy(~columnName + secondColumnName,
data=dataFrameName)
The ~ at the beginning just means "by" (as in "order by
this"). If you want to order by descending, just put a minus
sign after the tilde and before the column name. This also
orders the data frame:
companiesOrdered <- orderBy(~-margin, companiesData)
Both plyr and dplyr have an arrange() function with the syntax
arrange(dataFrameName, columnName, secondColumnName)
To sort descending, use desc(columnName):
companiesOrdered <- arrange(companiesData,
desc(margin))
fy    company    revenue  profit  margin
2010  Apple        65225   14013    21.5
2011  Apple       108249   25922    23.9
2012  Apple       156508   41733    26.7
2010  Google       29321    8505    29.0
2011  Google       37905    9737    25.7
2012  Google       50175   10737    21.4
2010  Microsoft    62484   18760    30.0
2011  Microsoft    69943   23150    33.1
2012  Microsoft    73723   16978    23.0
    fy    company    variable     value
1   2010  Apple      revenue    65225.0
2   2011  Apple      revenue   108249.0
3   2012  Apple      revenue   156508.0
4   2010  Google     revenue    29321.0
5   2011  Google     revenue    37905.0
6   2012  Google     revenue    50175.0
7   2010  Microsoft  revenue    62484.0
8   2011  Microsoft  revenue    69943.0
9   2012  Microsoft  revenue    73723.0
10  2010  Apple      profit     14013.0
11  2011  Apple      profit     25922.0
12  2012  Apple      profit     41733.0
13  2010  Google     profit      8505.0
14  2011  Google     profit      9737.0
15  2012  Google     profit     10737.0
16  2010  Microsoft  profit     18760.0
17  2011  Microsoft  profit     23150.0
18  2012  Microsoft  profit     16978.0
19  2010  Apple      margin        21.5
20  2011  Apple      margin        23.9
21  2012  Apple      margin        26.7
22  2010  Google     margin        29.0
23  2011  Google     margin        25.7
24  2012  Google     margin        21.4
25  2010  Microsoft  margin        30.0
26  2011  Microsoft  margin        33.1
27  2012  Microsoft  margin        23.0
fy    company  revenue  profit  margin
2010  Apple      65225   14013  21.48409
fy    company  financialCategory  value
2010  Apple    revenue            65225
2010  Apple    profit             14013
2010  Apple    margin             21.5
    fy    company    financialCategory    amount
1   2010  Apple      revenue             65225.0
2   2011  Apple      revenue            108249.0
3   2012  Apple      revenue            156508.0
4   2010  Google     revenue             29321.0
5   2011  Google     revenue             37905.0
6   2012  Google     revenue             50175.0
7   2010  Microsoft  revenue             62484.0
8   2011  Microsoft  revenue             69943.0
9   2012  Microsoft  revenue             73723.0
10  2010  Apple      profit              14013.0
11  2011  Apple      profit              25922.0
12  2012  Apple      profit              41733.0
13  2010  Google     profit               8505.0
14  2011  Google     profit               9737.0
15  2012  Google     profit              10737.0
16  2010  Microsoft  profit              18760.0
17  2011  Microsoft  profit              23150.0
18  2012  Microsoft  profit              16978.0
19  2010  Apple      margin                 21.5
20  2011  Apple      margin                 23.9
21  2012  Apple      margin                 26.7
22  2010  Google     margin                 29.0
23  2011  Google     margin                 25.7
24  2012  Google     margin                 21.4
25  2010  Microsoft  margin                 30.0
26  2011  Microsoft  margin                 33.1
27  2012  Microsoft  margin                 23.0
fy    company  revenue  profit  margin
2010  Apple      65225   14013    21.5
dplyr basics
The goal of dplyr is to offer fairly easy, rational data manipulation. Creator Hadley Wickham talks about just a handful
of basic, core things you want to do when manipulating data:
• To choose only certain observations or rows by 1 or more
criteria: filter()
• To choose only certain variables or columns: select()
• To sort: arrange()
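The three verbs listed above can be tried on the small companies data frame used earlier. This is a minimal sketch that assumes the dplyr package is installed; apple, slim and byProfit are names made up for this example.

```r
library(dplyr)

# Rebuild the example data so this block runs on its own
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978)
)

# filter(): keep only the rows that meet a condition
apple <- filter(companiesData, company == "Apple")

# select(): keep only certain columns
slim <- select(companiesData, fy, company, profit)

# arrange(): sort rows, here by profit from highest to lowest
byProfit <- arrange(companiesData, desc(profit))
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what makes them easy to chain together.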
library(dplyr)
ga <- data.table::fread("GAontime.csv")
# You can turn this into a dplyr class tbl_df object with as_tibble()
# (tbl_df() is deprecated in current dplyr releases)
ga <- as_tibble(ga)
# Now see what happens if you just type the variable name
ga
# Look at the structure:
str(ga)
# There's also a dplyr-specific function glimpse() with a
# slightly better format
glimpse(ga)
# Let's just get Hartsfield data. We want to filter for
# either ORIGIN or DEST being Hartsfield with code ATL
atlanta <- filter(ga, ORIGIN == "ATL" | DEST == "ATL")
Now there are all sorts of questions we can answer with this
data.
What's the average, median and longest delay for flights to a
specific place by carrier? I'll use Boston's Logan Airport:
bosdelays1 <- atlanta %>%
filter(DEST == "BOS") %>%
group_by(CARRIER) %>%
summarise(
avgdelay = mean(DEP_DELAY, na.rm = TRUE),
mediandelay = median(DEP_DELAY, na.rm = TRUE),
maxdelay = max(DEP_DELAY, na.rm = TRUE)
)
bosdelays1
ggplot2 101
There's a reason ggplot2 is one of the most popular add-on
packages for R: It's a powerful, flexible and well-thought-out
platform to create data visualizations you can customize to
your heart's content.
But it also can be a bit overwhelming. While I find the logic
of plot layers to be intuitive, some of the syntax can be a
bit of a challenge. Unless you do a lot of work in ggplot2,
I'm not sure how easy it is to remember that, for example,
the simple task of "make my graph title bold" requires the
rather wordy theme(plot.title = element_text(face =
"bold")).
What follows is a short, highly simplified guide to
visualizing data with ggplot2 along with a table of
commands for a lot of basic, useful tasks.
There's a visualization philosophy behind ggplot2 called
the Grammar of Graphics (that's where the "gg" in ggplot2
comes from) to describe various components of a graphic.
Here I'll focus primarily on what code you need to build a
few basic visualizations layer by layer.
Layer 1 defines which variables are going to do what. And
that's all. It's mapping things like what data frame variable
holds your data and which column will be on your x and y
axes.
Here's an important point about the first layer: When you
use a property like color or size as an aesthetic property
(aes) in this first layer, you are not setting a specific color
or a specific size. You are saying something like "I want
the color of my points to change based on the values of
this column" and NOT "Make the colors of my points the
specific color light blue." Picking your color(s) comes later.
A first layer might look something like this:
myplot <- ggplot(mydf, aes(x=colname1,
y=colname2, color=colname3))
That says: "Create a plot using data in mydf and use the
following aesthetics: Set the x axis to colname1 values
in mydf, set the y axis to colname2 values in mydf and use
different colors depending on the values in mydf colname3."
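For a concrete version you can run, here is a sketch that uses R's built-in mtcars data in place of mydf; the columns wt, mpg and cyl stand in for colname1, colname2 and colname3 and were picked only for illustration. Note that column names inside aes() are written without quotes.

```r
library(ggplot2)

# First layer only: which data frame to use and which columns map to
# the x axis, the y axis and color. Nothing is drawn yet.
myplot <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl)))

# Adding a geom layer is what turns the mapping into an actual chart,
# in this case a scatterplot
myscatter <- myplot + geom_point()
```

Printing myplot on its own shows an empty panel with axes, while printing myscatter shows the points colored by number of cylinders.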
What layer 1 doesn't do is say what kind of visualization
you want: scatterplot, bar graph, histogram, etc. For that
PLOT TYPE
FORMAT
NOTE
Any
ggplot(data=mydf,
aes(x=myxcolname,
y=myycolname))
Create basic
scatterplot
Scatterplot
+ geom_point()
Scatterplot, points
on line graph and
others
+ geom_point(size=mynumber)
Solve scatterplot
Scatterplot
issue of too many
points exactly on top
of each other
+ geom_point(position = "jitter")
Scatterplot, points
on line graph and
others
+ geom_point(shape=mynumber)
Scatterplot, points
on line graph and
others
+ geom_point(aes(shape=mycategory))
+ scale_shape_manual(values=myshapevector)
mycategory needs to be a
categorical variable. See chart of
available shapes.
Line graph
+ geom_line()
Line graph
+ geom_line(aes(color=mycategory))
+ geom_mychoice(color="mycolor")
ggplot(mydf, aes(x=myxcolname,
y=myycolname,
color=mygroupingcol)) +
geom_mychoice()
Any
Scatterplot
+ geom_point(aes(color=mygroupingvariable)) +
scale_color_gradient(low="mylowcolor", high="myhighcolor")
+ geom_point(aes(color=mygroupingvariable)) +
scale_color_brewer(type="seq", palette="mypalettechoice")
+ geom_line(size=mysizenumber)
+ geom_line(color="mycolor")
Bar
+ geom_bar(stat="identity")
Bar
+ geom_bar()
ggplot(data = mydf,
aes(x=reorder(myxcolname,
-myycolname), y=myycolname)) +
geom_mychoice()
Bar
ggplot(mydf, aes(x=myxcolname,
y=myycolname,
fill=mygroupcolname))+
geom_bar(stat="identity",
position="dodge")
Without position="dodge", a
stacked barchart is created
+ geom_mychoice(fill="mycolor")
+ geom_mychoice(color="mycolor")
Bar
ggplot(mydf, aes(x=myxcolname,
y=myycolname, fill=myxcolname))
+ geom_bar(stat="identity")
+ scale_fill_manual(values=c("mycolor1",
"mycolor2", "mycolor3"))
Customize colors in
a bar graph where
colors have been
defined to change
by a category - use
RColorBrewer
Bar
+ scale_fill_brewer(palette="mycolorbrewerpalettename")
Create basic
histogram
Histogram
ggplot(data=mydf,
aes(x=myxcolname)) +
geom_histogram()
Histogram
+ geom_histogram(binwidth=mynumber)
Histogram
+ geom_histogram(fill="mycolor")
Any
+ geom_hline(yintercept=mynumber)
Set color with color argument, width with size arg and
type with linetype, such as
geom_hline(yintercept=100, color="red", size=2,
linetype="dashed").
Any
+ geom_vline(xintercept=mynumber)
Scatterplot
+ stat_smooth(method=lm, se=FALSE)
Scatterplot
+ stat_smooth(method=lm, level=0.95)
Any
+ theme_mychoice()
Any
+ theme(plot.title = element_text(size = myinteger))
Change headline
color
Any
+ theme(plot.title = element_text(color = "mycolor"))
Any
+ theme(plot.title = element_text(face = "bold"))
Any
Any
Any
+ scale_x_discrete(labels=myvectoroflabels)
+ theme(plot.title = element_text(size = rel(myinteger))) sets
the headline size relative to the plot's base font.
Any
Any
+ ylim(mymin, mymax)
Any
+ theme(axis.text.x = element_text(angle=myrotationAngle,
hjust=myOptionalTweak, vjust=myOptionalTweak2))
+ theme(axis.title.y = element_text(angle = 0))
Any
+ theme(legend.position = "none")
Change order of
legend items
Any
Any
+ theme(legend.title =
element_text(size=mypointsize))
Change legend
labels size
Any
+ theme(legend.text =
element_text(size=mypointsize))
+ facet_grid(mycolname1 ~
mycolname2)
Any
Any
+ annotate("text", x=myxposition,
y=myyposition, label="My text")
directlabels package
must be installed
and loaded.
Line graph
direct.label(myplot,
list(last.points, hjust
= 0.7, vjust = 1))
directlabels package
must be installed
and loaded. first.
points is another
option to label at
start of line instead
of end.
Save plot
Any
ggsave(filename="myname.ext")
CATEGORY
DESCRIPTION
SAMPLE USE
AUTHOR
devtools
install_github("rstudio/leaflet")
Hadley Wickham
& others
installr
misc
updateR()
readxl
data import
Fast way to read Excel files in R, without dependencies such
as Java. CRAN.
read_excel("myspreadsheet.xls", sheet = 1)
googlesheets
Jennifer Bryan
RMySQL
data import
myresults <- dbSendQuery(con,
"SELECT * FROM mytable")
readr
data import
Hadley Wickham
rio
Thomas J. Leeper
& others
psych
data analysis
William Revelle
sqldf
data wrangling,
data analysis
G. Grothendieck
import("myfile")
sqldf("select * from
mydf where mycol
> 4")
Hadley Wickham
jsonlite
myjson <- toJSON(mydf, pretty=TRUE)
mydf2 <- fromJSON(myjson)
XML
quantmod
rvest
dplyr
data wrangling,
data analysis
plyr
data wrangling
Hadley Wickham
reshape2
data wrangling
Hadley Wickham
tidyr
data wrangling
Hadley Wickham
Duncan Temple
Lang
Hadley Wickham
data.table
data wrangling,
data analysis
Useful tutorial
stringr
data wrangling
lubridate
data wrangling
Hadley Wickham
mdy("05/06/2015")
+ months(1)
More examples in the package vignette
Garrett Grolemund, Hadley Wickham & others
zoo
data wrangling,
data analysis
editR
data display
Simon Garnier
knitr
data display
listviewer
jsonedit(mylist)
Kent Russell
DT
data display
datatable(mydf)
RStudio
editR("path/to/
myfile.Rmd")
ggplot2
data visualization
qplot(factor
(myfactor), data
=mydf, geom="bar",
fill=factor(myfactor))
See my searchable
ggplot2 cheat sheet
and
time-saving code
snippets.
Hadley Wickham
dygraphs
data visualization
Create HTML/JavaScript graphs of time series - one-line
command if your data is an xts object. CRAN.
dygraph
(myxtsobject)
googleVis
data visualization
JJ Allaire &
RStudio
plot(Column)
Numerous examples
here
Markus Gesmann
& others
metricsgraphics
data visualization
RColorBrewer
data visualization
Erich Neuwirth
plotly
data visualization
rOpenSci project
leaflet
mapping
See my tutorial
RStudio
choroplethr
mapping
Bob Rudis
state_choropleth(df_pop_state)
Free email course by
pkg author
Ari Lamstein
tmap
mapping
fitbitScraper
misc
cookie <- login(email="", password="")
df <- get_daily_data(cookie, what="steps",
"2015-01-01", "2015-05-18")
Corey Nissen
rga
Web analytics
RSiteCatalyst
Web analytics
roxygen2
package
development
on writing R
packages
Hadley Wickham
& others
shiny
data visualization
RStudio
openxlsx
misc
write.xlsx(mydf,
"myfile.xlsx")
Alexander Walker
gmodels
data wrangling,
data analysis
CrossTable
(myxvector,
myyvector, prop.t=
FALSE, prop.chisq
= FALSE)
Gregory R. Warnes
Martijn Tennekes
Randy Zwitch
car
data wrangling
recode(x, "1:3='Low';
4:7='Mid';
8:hi='High'")
rcdimple
data visualization
foreach
data wrangling
foreach(i=1:3) %do%
sqrt(i)
Revolution
Analytics, Steve
Weston
downloader
data acquisition
download("https://
url.com/filename",
"myfilename.zip",
mode = "wb")
NA
scales
data wrangling
comma(mynumvec)
Hadley Wickham
plotly
data visualization
Kent Russell
plot_ly(d, x = carat,
y = price, text =
paste("Clarity: ",
clarity), mode =
"markers", color =
carat, size = carat)
leaflet(nhmap) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(stroke=FALSE,
smoothFactor = 0.2,
fillOpacity = .8,
popup=nhpopup,
color= ~clintonPalette(nhmap@data$ClintonPct)
)
addPolygons(stroke=TRUE,
weight=1,
smoothFactor = 0.2,
fillOpacity = .75,
popup=scpopup,
color= ~trumpPalette(scmap@data$`Donald J TrumpPct`),
group="Trump"
) %>%
addPolygons(stroke=TRUE,
weight=1,
smoothFactor = 0.2,
fillOpacity = .75,
popup=scpopup,
color= ~rubioPalette(scmap@data$`Marco RubioPct`),
group="Rubio"
) %>%
addPolygons(stroke=TRUE,
weight=1,
smoothFactor = 0.2,
fillOpacity = .75,
popup=scpopup,
color= ~cruzPalette(scmap@data$`Ted CruzPct`),
group="Cruz"
) %>%
addPolygons(stroke=TRUE,
weight=1,
smoothFactor = 0.2,
fillOpacity = .75,
popup=scpopup,
color= ~edPalette(scmap@data$PctCollegeDegree),
group="College degs"
) %>%
addLayersControl(
baseGroups=c("Winners", "Trump", "Rubio", "Cruz",
"College degs"),
position = "bottomleft",
options = layersControlOptions(collapsed = FALSE)
)
Interactive map with multiple layers. Click on the radio buttons at the bottom left to change which layer displays.
addLayersControl can have two types of groups:
baseGroups, like used above, which allow only one
layer to be viewed at a time; and overlayGroups, where
multiple layers can be viewed at once and each turned off
individually.
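As a contrast to the baseGroups map, here is a minimal sketch of the overlayGroups variant, assuming the leaflet package is installed. The two small data frames of points are invented purely for illustration; they are not part of the election data used in this chapter.

```r
library(leaflet)

# Hypothetical points near Atlanta, made up for this example
cafes <- data.frame(lng = c(-84.39, -84.38), lat = c(33.76, 33.75))
parks <- data.frame(lng = c(-84.37, -84.40), lat = c(33.77, 33.74))

mymap <- leaflet() %>%
  addTiles() %>%
  # The group argument ties each layer to a name in the layers control
  addCircleMarkers(data = cafes, lng = ~lng, lat = ~lat,
                   color = "brown", group = "Cafes") %>%
  addCircleMarkers(data = parks, lng = ~lng, lat = ~lat,
                   color = "green", group = "Parks") %>%
  # overlayGroups gives checkboxes, so both layers can be shown at once
  addLayersControl(
    overlayGroups = c("Cafes", "Parks"),
    position = "bottomleft",
    options = layersControlOptions(collapsed = FALSE)
  )
```

Swapping overlayGroups for baseGroups in the last call would turn the checkboxes into radio buttons, showing only one of the two layers at a time.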
# install.packages("htmlwidgets")
library(htmlwidgets)
saveWidget(widget=scGOPmap2, file="scGOPprimary_withdependencies.html",
selfcontained=FALSE, libdir = "js")
This should get you started on creating your own
choropleth maps with R.
download.file("https://opendata.socrata.com/api/views/ddymzvjk/rows.csv?accessType=DOWNLOAD",
destfile="starbucks.csv", method="curl")
Here's code to read in the data and make the map:
starbucks <- read.csv("starbucks.csv",
stringsAsFactors = FALSE)
str(starbucks)
atlanta <- subset(starbucks, City == "Atlanta" & State ==
"GA")
leaflet() %>% addTiles() %>%
setView(-84.3847, 33.7613, zoom = 16) %>%
addMarkers(data = atlanta, lat = ~Latitude, lng = ~Longitude,
popup = atlanta$Name) %>%
addPopups(-84.3847, 33.7616,
"Data journalists at work, <b>NICAR 2015</b>")
Whichever way you find it, save that value in a variable so you
don't have to keep typing it. You can use a command like:
id <- "1234567"
(Replace the number with your actual ID, and make sure to
put it between quote marks.) This stores your profile ID as
the variable id.
Step 3: Extract data
Now we're ready to start pulling some data using the ga
instance we just created. The getData method will actually
extract data from your Google Analytics account that you
can then store in another new R variable. If you want to see
all available methods for your ga object, run:
ga$getRefClass()
You can query the Google API for metrics and dimensions.
Metrics are things like page views, visits and organic
searches; dimensions include information like traffic
sources and visitor type. (See Google's Dimensions & Metrics
Reference for full details.)
In addition, you can focus your query by criteria like visits
from search, visits with conversions (assuming you've set
that up in Google Analytics beforehand) and even visits just
from tablets, by including segments in a query. Finally, you
can also create your own filters to narrow your results.
More R Resources
For more resources to help improve your R skills, see
Computerworld's 60+ R resources.