
R Graphics

Hukum Chandra
ICAR-National Fellow & Principal Scientist
Email: hchandra@iasri.res.in

ICAR-Indian Agricultural Statistics Research Institute


Library Avenue, PUSA, New Delhi, India
www.iasri.res.in
Outline
Introduction to Graphics in R
Examples of commonly used graphics functions
Common options for customizing graphs
High-Level Plot Functions
Low-Level Plot Functions
Saving Graphs

Simple Graphics
Generating proper graphics is one of the most important aspects of presenting and analysing data

Graphical features of data can be viewed very effectively using R

R is capable of creating high-quality graphics

Graphs are typically created using a series of high-level and low-level plotting commands

High-level functions create new plots and low-level functions add information to an existing plot

Customize graphs (line style, symbols, color, etc.) by specifying graphical parameters
Specify graphic options using the par() function
Graphic Parameters
The function par() is used to set or query graphical parameters

It offers about 70 settings and allows you to adjust almost any feature of a graph

Graphical parameters are reset to their defaults with each new graphics device

Most elements of par() can also be given as additional arguments to a plotting command; however, some, such as mfrow and mfcol, can only be set by a call to par() (see ?par for the others)
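A minimal sketch of the usual pattern: par() called with settings returns the previous values invisibly, so they can be saved and restored afterwards. The panel titles are made up for illustration.

```r
# Query a single parameter: the current multi-figure layout (1 x 1 by default)
par("mfrow")

# Setting parameters returns the previous values invisibly,
# so they can be saved and restored afterwards
op <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(1:10, main = "Left panel")
plot(10:1, main = "Right panel")
par(op)  # restore the settings that were in force before

# Parameters such as col and pch can instead be passed to a single plot call,
# in which case they affect only that plot
plot(1:10, col = "blue", pch = 16)
```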

High-Level Plot Functions

Low-Level Plot Functions

Scatterplot and Line Graphs
Scatterplots are useful for studying dependencies between variables.

The plot() function is used for producing scatterplots and line graphs

See ?plot

Using the plot command


x <- seq(0,10,0.2)
y <- sqrt(x)
plot(x,y); grid()
As one might guess, the last command adds a grid to the plot.

plot(x,y); grid()
plot(x,y, type="b", col="blue", lwd=1, lty=4, pch=5, main="My plot",
xlab="x axis", ylab="y axis")
grid(col="red")

Common arguments for plot()
type: one-character string giving the plot type, e.g. "p" points, "l" lines, "b" both, "h" vertical lines, "n" nothing
xlim: x-axis limits, c(x1, x2)
ylim: y-axis limits, c(y1, y2)
main: main title for the plot
sub: subtitle for the plot
xlab: x-axis label
ylab: y-axis label
col: color for lines and points, either a character string or a number that indexes palette()
pch: number referencing a plotting symbol, or a single character
cex: number giving the character expansion (size) of the plot symbols
lty: number referencing a line type
lwd: line width

plot(x, y, type="b", col="blue", lwd=1, lty=4, pch=5, main="My plot", xlab="x axis", ylab="y axis")
grid(col="red")
text(8, 2, "this is my example plot")
abline(h=1, v=4, col=c("darkred","green"), lty=c(1,4), lwd=c(4,6))

reg.lm <- lm(y ~ x) # regress y on x so the line matches the y-versus-x axes
abline(reg.lm, col="red", lwd=6) # To add the regression line

There is a wealth of plotting parameters you can set
plot(x,y)
plot(x,y, pch=16) # pch=16 plots solid (filled) circles

x1 <- seq(1,5,0.1)
lines(x1, .5*x1) # lines() adds line segments joining the (x, y) points to the current plot

## EXAMPLES with Yield Data #########################

data2 <- read.csv("yielddata.csv", header=TRUE)
plot(data2$Fert, data2$Yield)
grid()

plot(data2$Fert, data2$Yield, type="p", col="blue", lwd=1, lty=4, pch=1, main="My plot for yield versus fertiliser", xlab="Fertiliser", ylab="Yield")
grid(col="red")

plot(data2$Fert, data2$Yield, type="p", col="green", lwd=1, lty=4, pch=9, main="My plot", xlab="x axis", ylab="y axis")
text(250000, 30000, "this is my example plot")
abline(h=20000, v=200000, col=c("darkred","green"), lty=c(1,4), lwd=c(1,2))

reg.lm <- lm(Yield ~ Fert, data=data2) # regress yield on fertiliser
abline(reg.lm, col="red", lwd=6) # To add the regression line

dx <- rnorm(20, 5, 5) ## generate 20 random numbers from a normal distribution with mean 5, sd 5
dy <- rchisq(20, 5) ## generate 20 random numbers from a chi-squared distribution with 5 df (mean 5)
plot(dx, dy, pch=1)
fit <- lm(dy ~ dx) # regress dy on dx to match the axes of the plot
abline(fit, col="red", lwd=4)
text(10, 4, "Fitted line")

See ?plot
See ?points

x <- rnorm(50) ;y <- rnorm(50)
group <- rbinom(50, size=1, prob=.5)

# Basic Scatterplot
plot(x, y)
plot(x, y, xlab="X values", ylab="Y values", main="Simple Y vs X", pch=15, col="red")
# Distinguish between two separate groups
plot(x, y, xlab="X values", ylab="Y values", main="Grouped data Y vs X",
pch=ifelse(group==1, 5, 19), col=ifelse(group==1, "red", "blue"))

plot(x, y, xlab="X", ylab="Y", main="Y vs X", type="n")


points(x[group==1], y[group==1], pch=5, col="red")
points(x[group==0], y[group==0], pch=19, col="blue")

plot(x, y, xlab="X", ylab="Y", main="Y vs X", type="n")


points(cbind(x,y)[group==1,], pch=5, col="red")
points(cbind(x,y)[group==0,], pch=19, col="blue")
Line Graphs
# Basic Line Graphs
plot(sort(x), sort(y), type="l", lty=2, lwd=2, col="blue")

plot(x, y, type="n")
lines(sort(x), sort(y), type="b")
lines(cbind(sort(x),sort(y)), type="l", lty=1, col="blue")

plot(sort(x), type="n")
lines(sort(x), type="b", pch=8, col="red")
lines(sort(y), type="l", lty=6, col="blue")

Histogram and Density Plot
Histograms are used to study the distribution of continuous data; the R command is hist.
hist: function to plot a histogram

# Basic Histogram
u <- rnorm(100) ## generate 100 random numbers from the standard normal distribution
hist(u) # default histogram

hist(u, density=20) #with shading

The sequence of commands below plots two histograms in one window

par(mfrow=c(1,2)); hist(u); hist(u, density=50)

par(mfrow=c(a,b)) splits the window into a rows and b columns of plots, filled row by row.


# with a suggested number of bins
par(mfrow=c(1,2)); hist(u, density=5, breaks=20); hist(u, density=20, breaks=20)

A single number for breaks is only a suggestion; R picks "pretty" break points close to it. Read the help file for hist: help(hist)
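To see which breaks and counts hist() actually uses, call it with plot=FALSE, which returns the computed histogram without drawing it. A small sketch; the vector v is made-up illustrative data:

```r
# Small illustrative data set (made up for this example)
v <- c(0.5, 1.2, 1.8, 2.4, 2.9, 3.1, 3.7, 4.6)

# Explicit break points give bins (0,1], (1,2], (2,3], (3,4], (4,5]
h <- hist(v, breaks = 0:5, plot = FALSE)
h$breaks   # 0 1 2 3 4 5
h$counts   # 1 2 2 2 1

# A single number is only a suggestion: the number of bins used need not be 3
h2 <- hist(v, breaks = 3, plot = FALSE)
length(h2$breaks) - 1
```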


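# Probabilities (densities) instead of frequencies, also setting the y-axis limits
# The breaks must cover the whole range of u, otherwise hist() stops with an error
hist(u, density=20, breaks=seq(floor(min(u)), ceiling(max(u))), ylim=c(0,.5), prob=TRUE)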

hist(u, freq=FALSE, ylim=c(0,0.8))
curve(dnorm(x), col=2, lty=2, lwd=2, add=TRUE)

The freq=FALSE argument to hist ensures that the histogram is in terms of densities rather than absolute counts, so it is on the same scale as the overlaid density curve
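As a quick check of what the density scale means: each bar height is count / (n * bin width), so the bar areas sum to 1. A sketch with simulated data (the seed is arbitrary):

```r
set.seed(1)               # arbitrary seed, for reproducibility
u <- rnorm(100)
h <- hist(u, plot = FALSE) # densities are computed even when nothing is drawn

# density = count / (n * bin width), so the total bar area is 1
area <- sum(h$density * diff(h$breaks))
area
```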

# Overlay a colored normal curve, with an x-axis label and y-axis limits
# The observed mean and standard deviation are used for the normal curve
m <- mean(u); std <- sd(u)
hist(u, density=20, breaks=20, prob=TRUE, xlab="x-variable", col="red",
ylim=c(0, 0.7), main="normal curve over histogram")
curve(dnorm(x, mean=m, sd=std), col="darkblue", lwd=2, add=TRUE)

hist(u, density=10, breaks=20, col="red", prob=TRUE, xlab="x-variable", ylim=c(0,0.8),
main="Density curve over histogram")
lines(density(u),col = "blue")

# Kernel Density Plot


u<- rnorm(100)
d <- density(u) # returns the density data
plot(d)

Boxplots
Boxplots are another useful tool for studying data. They show the median, the quartiles and possible outliers.
The R command is boxplot, which we use on the same variable as the histogram:

# Basic boxplot
boxplot(u, xlab="my variable", boxwex=.4)
boxplot(u, xlab="my variable", boxwex=.6, col="blue", border="red", lty=2, lwd=2)

## create data: three variables

u1 <- rnorm(100) ## 100 random numbers from the standard normal distribution
u2 <- rchisq(100,5) ## 100 random numbers from a chi-squared distribution with 5 df (mean 5)
u3 <- rnorm(100,5,1) ## 100 random numbers from a normal distribution with mean 5, sd 1

boxplot(u1,u2,u3, boxwex=.4)
boxplot(u1,u2,u3, boxwex=c(.2,.4,.6),col=c("red","blue","green"))

variablename <- c("low","medium","high")
boxplot(u1,u2,u3, names=variablename, boxwex=c(.2,.4,.6), col=c("red","blue","green"), ylim=c(-5, 20), xlab="variable status")
boxplot(u1,u2,u3, names=variablename, boxwex=c(.2,.4,.6), col=c("red","blue","green"), ylim=c(-5, 20), xlab="variable status", notch=TRUE)

## try: with plot=FALSE nothing is drawn and the summary statistics are returned
boxplot(u, xlab="my variable", pars=list(boxwex=0.5, staplewex=.5, outwex=0.5), plot=FALSE)
boxplot(u, xlab="my variable", pars=list(boxwex=0.5, staplewex=.5, outwex=0.5), plot=TRUE)
?boxplot
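With plot=FALSE, boxplot() returns the statistics it would draw. A sketch on a made-up vector w with one extreme value; by default, points more than 1.5 times the interquartile range beyond the box are reported as outliers:

```r
# Made-up data: the values 1 to 9 plus one unusually large value
w <- c(1:9, 50)
b <- boxplot(w, plot = FALSE)

# lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats[, 1]  # 1.0 3.0 5.5 8.0 9.0
# points beyond the whiskers, drawn individually as outliers
b$out         # 50
```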

Barchart (or barplot)
The R command is barplot
MPCE <- c(400, 300, 600, 550, 425)

Suppose the data in MPCE are the average MPCE of some states, and the state names are to be assigned to their values. The following command is required:
names(MPCE) <- c("UP","MP","Punjab","TN","WB")

This assigns the names of the states. The double quotation marks mean that the names are character strings, not numbers.

barplot(MPCE, names=names(MPCE), ylab="MPCE (Rs)", col="blue")
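barplot() also invisibly returns the x positions of the bar midpoints, which is handy for writing values on the bars with text(). A sketch reusing the MPCE example (redefined so it runs on its own); the ylim and pos=3 placement are illustrative choices:

```r
MPCE <- c(400, 300, 600, 550, 425)
names(MPCE) <- c("UP", "MP", "Punjab", "TN", "WB")

# barplot() returns the bar midpoints (a one-column matrix for a vector input)
mp <- barplot(MPCE, ylab = "MPCE (Rs)", col = "blue", ylim = c(0, 700))
# Write each value just above its bar
text(x = mp, y = MPCE, labels = MPCE, pos = 3)

as.vector(mp)  # 0.7 1.9 3.1 4.3 5.5 with the default width = 1 and space = 0.2
```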

barplot(MPCE, names=names(MPCE),ylab="MPCE (Rs)", col = c("blue","red","gray","orange","black"))

barplot(MPCE, space=2,names=names(MPCE),xlab="States", ylab="MPCE (Rs)", col =
c("blue","red","gray","orange","black"))

?barplot

You can plot more than one curve on a single plot and label them with a legend:
range <- seq(-10,10, by = 0.001)
norm1 <- dnorm(range, mean=0, sd=1)
norm2 <- dnorm(range, mean=1, sd=2)
plot(range,norm1, type="l", lty=1, col="red", main="Two Normal Distributions",
xlab="Range", ylab="Probability Density")
points(range, norm2, type="l", lty=2,col="blue")
legend(x=-10,y=0.4,legend= c("N(0,1)", "N(1,2)"), lty=c(1,2),col=c("red","blue"))

curve()
The function curve() draws a curve corresponding to a given function
If the function is written out within curve(), it must be an expression in x
To use a function of several arguments, pass x as the argument you wish to plot over

# Plot a 5th order polynomial


curve(3*x^5-5*x^3+2*x, from=-1.25, to=1.25, lwd=2, col="blue")
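curve() also accepts the name of a one-argument function, and it invisibly returns the evaluated points as a list with components x and y. A sketch using the same polynomial wrapped in a function f (a name chosen here for illustration):

```r
# The 5th order polynomial from above, as a named function of one argument
f <- function(x) 3 * x^5 - 5 * x^3 + 2 * x

# n sets how many points curve() evaluates between from and to
res <- curve(f, from = -1.25, to = 1.25, n = 101, lwd = 2, col = "blue")
abline(h = 0, lty = 3)  # reference line at y = 0

# The returned list holds the points that were plotted
length(res$x)  # 101
```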

# Plot the gamma density
curve(dgamma(x, shape=2, scale=1), from=0, to=7, lwd=2, col="red")

# Plot multiple curves, notice that the first curve determines the x-axis
curve(dnorm, from=-3, to=5, lwd=2, col="red")
curve(dnorm(x, mean=2), lwd=2, col="blue", add=TRUE)

# Add vertical lines at the means


lines(c(0, 0), c(0, dnorm(0)), lty=2, col="red")
lines(c(2, 2), c(0, dnorm(2, mean=2)), lty=2, col="blue")

# Clean out the workspace
rm(list=ls())

# List objects in the workspace
ls()

# File paths are relative to the working directory
# Get or set the working directory
getwd()
setwd("path/to/folder") # replace with the folder you want to use
# e.g. setwd("C:/Documents and Settings/ Desktop")

Saving Graphs
Graphs can be saved in several different formats, such as PDF, JPEG and BMP, by using pdf(), jpeg() and bmp(), respectively

Graphs are saved to the current working directory unless a full path is given

In the R GUI on Windows, a graph can also be saved by choosing File -> Save as

Graphics devices are also available for BMP, JPEG, PNG and TIFF bitmap files.
png(file="My Histogram.png", width=400, height=350) # Start graphics device
par(mar=c(5,4,2,2)+0.1) # margin size c(bottom, left, top, right)
m <- mean(u); std <- sd(u)
hist(u, density=20, breaks=20, prob=TRUE, xlab="x-variable", col="red", ylim=c(0, 0.7))
curve(dnorm(x, mean=m, sd=std), col="darkblue", lwd=2, add=TRUE)
dev.off() # Stop graphics device

# The other devices are used in the same way, e.g.
#bmp(filename="plot.bmp", width=480, height=480)
#jpeg(filename="plot.jpg", width=480, height=480)
# Create a single pdf of figures, with one graph on each page
#pdf("C:/SavingExample.pdf", width=7, height=5)
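Forgetting dev.off() leaves the file incomplete, so one common pattern is to wrap the device in a function that always closes it. A minimal sketch; save_plot() is a hypothetical helper (not part of R), and the file goes to tempdir() so nothing is written to the working directory:

```r
# Hypothetical helper: open a PDF device, run the plotting code, always close it
save_plot <- function(file, plot_fun, width = 7, height = 5) {
  pdf(file, width = width, height = height)
  on.exit(dev.off())   # runs even if plot_fun() throws an error
  plot_fun()
  invisible(file)
}

u <- rnorm(100)
out <- file.path(tempdir(), "my_histogram.pdf")
save_plot(out, function() {
  hist(u, freq = FALSE, col = "red", main = "Histogram of u")
  curve(dnorm(x, mean = mean(u), sd = sd(u)), add = TRUE, lwd = 2)
})
file.exists(out)  # TRUE
```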

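# Create multiple pdfs of figures, with one pdf per figure
# With onefile=FALSE the default file name is "Rplot%03d.pdf", giving Rplot001.pdf, Rplot002.pdf, ...
pdf(width=7, height=5, onefile=FALSE)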


x <- rnorm(100)
hist(x, main="Histogram of X")
plot(x, main="Scatterplot of X")
dev.off() # Stop graphics device
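A sketch of the same idea with an explicit file-name pattern, where %03d is replaced by the page number. The name figure%03d.pdf and the use of tempdir() are illustrative choices:

```r
# One file per page: %03d in the file name becomes 001, 002, ...
pattern <- file.path(tempdir(), "figure%03d.pdf")
pdf(file = pattern, width = 7, height = 5, onefile = FALSE)
x <- rnorm(100)
hist(x, main = "Histogram of X")     # page 1 -> figure001.pdf
plot(x, main = "Scatterplot of X")   # page 2 -> figure002.pdf
dev.off()

file.exists(file.path(tempdir(), c("figure001.pdf", "figure002.pdf")))
```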

Packages

Packages are collections of R functions, data and compiled code in a well-defined format. The directory where packages are stored is called the library

The R distribution comes with base packages (e.g. stats, grid) and recommended packages (e.g. boot, nlme, foreign, MASS, spatial)

The packages included by default implement standard statistical functionality, for example linear models and classical tests

Packages not included in the base distribution can be downloaded and installed directly from the R prompt

Once installed, they have to be loaded into the session to be used

At the time these slides were prepared, the CRAN package repository had 4348 packages
library() # To see all installed packages

See help("INSTALL") or help("install.packages") in R for information on how to install packages from this repository

Adding Packages
Choose Install Packages from the Packages menu
Select a CRAN mirror
Select a package (e.g. car)
Then use the library(package) function to load it for use (e.g. library(car))
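The same steps can be done from the prompt with install.packages() and library(). A minimal sketch; load_or_install() is a hypothetical helper name, and the install step needs an internet connection and a CRAN mirror:

```r
# Hypothetical helper: install a CRAN package only if it is missing, then attach it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from the selected CRAN mirror
  }
  library(pkg, character.only = TRUE)
  invisible(pkg %in% .packages())  # TRUE if the package is now attached
}

load_or_install("stats")   # ships with R, so nothing is downloaded
# load_or_install("car")   # would download car from CRAN on first use
```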

Load R Packages

Alternative way: first download the package file from the CRAN site, then install it from the local drive
Package car (Companion to Applied Regression)

library(car)
Before starting with the use of any package it is advisable to go through its
documentation.

http://cran.r-project.org/web/packages/car/index.html

http://cran.r-project.org/web/packages/car/car.pdf

Creating Your Own Package
We may want to share our code with other people, or simply make it easier
to use ourselves. There are two popular ways of starting a new package:

Load all functions and data sets you want in the package into a clean
R session, and run package.skeleton(). The objects are sorted into
data and functions, skeleton help files are created for them using
prompt() and a DESCRIPTION file is created. The function then prints
out a list of things for you to do next

Create it manually, which is usually faster for experienced developers
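A minimal sketch of the package.skeleton() route. The function sq_plot() and the package name mypkg are hypothetical, and the skeleton is written to tempdir():

```r
# Hypothetical example function to be packaged
sq_plot <- function(n = 10) plot(seq_len(n), seq_len(n)^2, type = "b")

pkg_dir <- tempdir()
package.skeleton(name = "mypkg", list = "sq_plot",
                 environment = environment(), path = pkg_dir, force = TRUE)

# The skeleton contains DESCRIPTION, NAMESPACE, R/ and man/ (plus Read-and-delete-me)
list.files(file.path(pkg_dir, "mypkg"))
```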

Structure of a package
The extracted sources of an R package are simply a directory somewhere on your hard drive. The directory has the same name as the package and the following contents:

A file named DESCRIPTION with a description of the package, its author and license conditions, in a structured text format that is readable by computers and by people
A NAMESPACE file declaring which functions the package exports and imports
A man/ subdirectory of documentation files
An R/ subdirectory of R code
A data/ subdirectory of datasets

Less commonly it contains

A src/ subdirectory of C, Fortran or C++ source
An exec/ subdirectory for other executables (e.g. Perl or Java)
Simple Scatterplot
?mtcars
mtcars
attach(mtcars)
plot(wt, mpg, main="Scatterplot Example", xlab="Car Weight", ylab="Miles Per Gallon", pch=19)

# Add fit lines

# regression line (y~x)


abline(lm(mpg~wt), col="red")

# lowess line: a locally weighted (local linear) regression fit of mpg on wt


lines(lowess(wt,mpg), col="blue")

The scatterplot() function in the car package offers many enhanced features, including fit lines, marginal box plots, conditioning on a factor, and interactive point identification

# Enhanced scatterplot of mpg vs. weight by number of car cylinders
# Load package car
library(car)
scatterplot(mpg ~ wt | cyl, data=mtcars, xlab="Weight of Car", ylab="Miles Per Gallon", main="Enhanced Scatter Plot", labels=row.names(mtcars))

Scatterplot Matrices

# Basic Scatterplot Matrix


pairs(~mpg+disp+drat+wt,data=mtcars, main="Simple Scatterplot Matrix")

The car package can condition the scatterplot matrix on a factor, and optionally
include lowess and linear best fit lines, and boxplot, densities, or histograms in
the principal diagonal, as well as rug plots in the margins of the cells.

# Scatterplot Matrices from the car Package


library(car)
scatterplotMatrix(~mpg+disp+drat+wt|cyl, data=mtcars, main="Three Cylinder Options")

The gclus package provides options to rearrange the variables so that
those with higher correlations are closer to the principal diagonal. It can
also color code the cells to reflect the size of the correlations.
# Scatterplot Matrices from the gclus Package

library(gclus)
dta <- mtcars[c(1,3,5,6)] # get data
dta.r <- abs(cor(dta)) # get correlations
dta.col <- dmat.color(dta.r) # get colors

# reorder variables so those with highest correlation are closest to the diagonal
dta.o <- order.single(dta.r)
cpairs(dta, dta.o, panel.colors=dta.col, gap=.5, main="Variables Ordered and Colored by Correlation")

High Density Scatterplots
When there are many data points and significant overlap, scatterplots
become less useful
There are several approaches that can be used when this occurs
The hexbin(x, y) function in the hexbin package provides bivariate binning into hexagonal cells

# High Density Scatterplot with Binning
# Load hexbin package
library(hexbin)
x <- rnorm(1000)
y <- rnorm(1000)
bin<-hexbin(x, y, xbins=50)
plot(bin, main="Hexagonal Binning")

Another option for a scatterplot with significant point overlap is the
sunflowerplot.

See help(sunflowerplot) for details
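A short sketch of a sunflower plot on rounded (hence heavily repeated) simulated data; sunflowerplot() invisibly returns the unique points and how many observations fall on each. The data and seed are illustrative:

```r
# Rounding produces many exactly repeated points
set.seed(42)
x <- round(rnorm(200), 1)
y <- round(x + rnorm(200, sd = 0.5))

# Each repeated (x, y) location is drawn with one "petal" per observation
sf <- sunflowerplot(x, y, main = "Sunflower plot")
sum(sf$number)  # 200: every observation is counted once
```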


# High Density Scatterplot with Color Transparency
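A minimal sketch of the transparency approach, assuming a graphics device that supports semi-transparent colors (pdf() does, as do most on-screen devices): points are drawn with a low alpha value, so regions where many points overlap appear darker. The particular color and alpha value are illustrative:

```r
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)

# Dark green with alpha = 50 out of 255 (about 20% opaque)
tcol <- rgb(0, 100, 0, 50, maxColorValue = 255)
tcol  # "#00640032"
plot(x, y, main = "High Density Scatterplot with Transparency", col = tcol, pch = 16)
```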

3D Scatterplots
# 3D Scatterplot
# Load package scatterplot3d
library(scatterplot3d)
attach(mtcars)

scatterplot3d(wt, disp, mpg, color="red", col.axis="blue", pch=16, col.grid="lightblue", main="3D Scatterplot")

# 3D Scatterplot with Coloring and Vertical Drop Lines

library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt, disp, mpg, pch=16, highlight.3d=TRUE, type="h", col.axis="blue", main="3D Scatterplot")

#3D Scatterplot with Coloring and Vertical Lines and Regression Plane

library(scatterplot3d)
attach(mtcars)
s3d <-scatterplot3d(wt,disp,mpg, pch=16, highlight.3d=TRUE,
type="h", main="3D Scatterplot")
fit <- lm(mpg ~ wt+disp)
s3d$plane3d(fit)

Spinning 3D Scatterplots

You can also create an interactive 3D scatterplot using the plot3d(x, y, z) function in the rgl package
It creates a spinning 3D scatterplot that can be rotated with the mouse
The first three arguments are the x, y, and z numeric vectors representing points
col= and size= control the color and size of the points respectively

# Load package rgl
library(rgl)

You can perform a similar function with scatter3d(x, y, z), provided by the car package and also available from the menus of the Rcmdr package.
