R Programming

Overview
1. The Basics
2. R Data Structures
3. Data Input/Output
4. In-Built Functions
5. Data Visualization
What R does and does not do
What R does:
• data handling and storage: numeric, textual
• matrix algebra
• hash tables and regular expressions
• high-level data analytic and statistical functions
• classes ("OO")
• graphics
• programming language: loops, branching, subroutines
What R does not do:
• is not a database, but connects to DBMSs
• has no graphical user interfaces, but connects to Java, TclTk
• language interpreter can be very slow, but allows calling your own C/C++ code
• no spreadsheet view of data, but connects to Excel/MsOffice
• no professional / commercial support
R and statistics
• Packaging: a crucial infrastructure to efficiently produce, load
and keep consistent software libraries from (many) different
sources / authors
• Statistics: most packages deal with statistics and data analysis
• State of the art: many statistical researchers provide their
methods as R packages
R as a Calculator
> 1550+2000
[1] 3550
or several calculations on the same line
> 2+3; 5*9; 6-6
[1] 5
[1] 45
[1] 0
> log2(32)
[1] 5
> sqrt(2)
[1] 1.414214
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> plot(sin(seq(0, 2*pi, length=100)))
[Figure: sin(seq(0, 2 * pi, length = 100)) plotted against Index (1 to 100), y axis from -1.0 to 1.0]
Variables

> i = 81
> sqrt(i)    # numeric
[1] 9

> prov = "All that Glitters are not Gold"    # character string
> sub("Glitters", "Glisters", prov)
[1] "All that Glisters are not Gold"

> 1>2    # logical
[1] FALSE
Object orientation

primitive (or: atomic) data types in R are:

• numeric (integer, double, complex)


• character
• logical
• function
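One way to check these types at the console is with class() and typeof(). The values below are made up purely for illustration and do not come from the slides.

```r
# Illustrative values only (assumptions, not from the original deck).
class(3.14)      # "numeric"
typeof(3.14)     # "double"
typeof(2L)       # "integer" (the L suffix requests an integer)
class(1+2i)      # "complex"
class("text")    # "character"
class(TRUE)      # "logical"
class(sqrt)      # "function"
```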
Numbers in R: NaN and NA

• NaN (not a number)
• NA (missing value)
– Basic handling of missing values
> x
[1] 1 2 3 4 5 6 7 8 NA
> mean(x)
[1] NA
> mean(x,na.rm=TRUE)
[1] 4.5
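The difference between the two values can be sketched as follows. The vector v is a hypothetical example: is.na() reports both NA and NaN, while is.nan() reports only NaN.

```r
# Hypothetical vector for illustration; 0/0 evaluates to NaN.
v <- c(1, NA, 0/0)

is.na(v)               # FALSE TRUE TRUE  (TRUE for both NA and NaN)
is.nan(v)              # FALSE FALSE TRUE (TRUE only for NaN)
sum(v)                 # not a usable number: missing values propagate
sum(v, na.rm = TRUE)   # 1, because na.rm = TRUE drops NA and NaN
```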
Objects in R
• Objects in R obtain values by assignment.
• This is conventionally done with the gets arrow, <-. The equal sign, = (used in
several examples here), also assigns at the top level, but <- is the preferred style.
• Objects can be of different kinds.
R Data Structures

• Vector
• Matrix
• Array
• Factor
• Data Frame
• List
Vectors
• vector: an ordered collection of data of the same type

> a = c(1,2,3)
> a*2
[1] 2 4 6

• In R, a single number is the special case of a vector with 1 element.
• Other vector types: character strings, logical
Vectors
• Create a vector
> x <- 1:10
• Give the elements some names
> names(x) <- c("first","second","third","fourth","fifth")

• Select elements based on another vector
> i <- c(1,5)
> x[i]
first fifth
    1     5
> x[-c(i,8)]
second  third fourth   <NA>   <NA>   <NA>   <NA>
     2      3      4      6      7      9     10
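Besides positive and negative integer indices, vectors can also be indexed by a logical vector or by name. The following is a minimal sketch; the names taken from letters are an assumption chosen for illustration.

```r
x <- 1:10
names(x) <- letters[1:10]   # hypothetical names "a" to "j"

x[x > 7]             # logical index: keeps the elements 8, 9, 10
x[c(TRUE, FALSE)]    # the logical index is recycled: elements 1, 3, 5, 7, 9
x[c("a", "j")]       # index by name: elements 1 and 10
x[-(1:8)]            # negative index: drops the first eight elements
```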
Matrices

• matrix: a rectangular table of data of the same type
• array: a 3-, 4-, ... dimensional generalization of a matrix
• example: the red and green foreground and background
values for 20000 spots on 120 chips: a 4 x 20000 x 120 (3D)
array.
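A scaled-down version of such an array could be built as below. The dimensions (4 values x 5 spots x 3 chips) and the name chips are assumptions chosen to keep the example small.

```r
# Hypothetical, scaled-down stand-in for the 4 x 20000 x 120 array.
chips <- array(0, dim = c(4, 5, 3))

dim(chips)              # 4 5 3
chips[1, 2, 3] <- 7.5   # value 1 of spot 2 on chip 3
chips[, , 3]            # all 4 x 5 values for chip 3, returned as a matrix
```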
Matrices
• Create an array
> x <- array(1:10, dim = c(2, 5))
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> attributes(x)
$dim
[1] 2 5
> dim(x)
[1] 2 5
Matrices
• Set column or row names
> colnames(x) <- c("col1", "col2", "col3", "col4", "col5")
> x
     col1 col2 col3 col4 col5
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> colnames(x)[1] <- "column1"
> x
     column1 col2 col3 col4 col5
[1,]       1    3    5    7    9
[2,]       2    4    6    8   10
Matrices
• Set row and column names using dimnames
> dimnames(x) <- list(c("first", "second"), colnames(x))
> x
       column1 col2 col3 col4 col5
first        1    3    5    7    9
second       2    4    6    8   10
• Naming the dimensions themselves (a NULL entry drops that dimension's names)
> dimnames(x) <- list(my.rows = c("first", "second"), my.cols = NULL)
> x
        my.cols
my.rows  [,1] [,2] [,3] [,4] [,5]
  first     1    3    5    7    9
  second    2    4    6    8   10
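Once row and column names are set, they can be used for indexing in place of numeric positions. This is a minimal sketch; the column names generated with paste0() are an assumption for illustration.

```r
x <- array(1:10, dim = c(2, 5))
# Hypothetical names: rows "first"/"second", columns "col1" to "col5".
dimnames(x) <- list(c("first", "second"), paste0("col", 1:5))

x["second", "col3"]       # 6: a single element
x["first", ]              # the whole first row, as a named vector
x[, c("col1", "col5")]    # a 2 x 2 sub-matrix
```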
Lists
• vector: an ordered collection of data of the same type.
> a = c(7,5,1)
> a[2]
[1] 5

• list: an ordered collection of data of arbitrary types.
> doe = list(name="john",age=28,married=FALSE)
> doe$name
[1] "john"
> doe$age
[1] 28

• Typically, vector elements are accessed by their index (an integer),
list elements by their name (a character string). But both types
support both access methods.
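The two access methods can be sketched for both types as follows. The element names given to the vector a are an assumption added for illustration.

```r
# Vector with (hypothetical) element names.
a <- c(first = 7, second = 5, third = 1)
a[2]            # by index
a["second"]     # by name, same element

# List from the example above.
doe <- list(name = "john", age = 28, married = FALSE)
doe$age         # by name
doe[["age"]]    # by name, using double brackets
doe[[2]]        # by index, same element
doe[2]          # single brackets return a list of length 1, not the value
```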
Data frames
• data frame: represents the typical data table that
researchers come up with, much like a spreadsheet.

• It is a rectangular table with rows and columns; data within
each column has the same type (e.g. number, text, logical), but
different columns may have different types.

Example:
> a
      localisation tumorsize progress
XX348     proximal       6.3    FALSE
XX234       distal       8.0     TRUE
XX987     proximal      10.0    FALSE
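The slides do not show how this table was created. One possible way to build it by hand, as a sketch using data.frame(), is:

```r
# Rebuild the example table; the construction itself is an assumption,
# only the values come from the slide.
a <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  row.names    = c("XX348", "XX234", "XX987")
)

sapply(a, class)           # one type per column
a["XX234", "tumorsize"]    # 8: a single cell, addressed by row and column name
a[a$progress, ]            # only the rows where progress is TRUE
```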
Factors

• A character string can contain arbitrary text. Sometimes it is useful
to use a limited vocabulary, with a small number of allowed
words. A factor is a variable that can only take such a limited
number of values, which are called levels.
• Example: a family of two girls (1) and four boys (0)
> kids = factor(c(1,0,1,0,0,0),levels=c(0,1), labels=c("boy","girl"))
> kids
[1] girl boy  girl boy  boy  boy
Levels: boy girl
> class(kids)
[1] "factor"
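A few functions that are commonly used to inspect a factor, sketched on the same kids example:

```r
kids <- factor(c(1, 0, 1, 0, 0, 0), levels = c(0, 1), labels = c("boy", "girl"))

levels(kids)        # "boy" "girl"
table(kids)         # counts per level: boy 4, girl 2
as.integer(kids)    # internal integer codes: 2 1 2 1 1 1
```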
Data Input/Output
Directory management
• dir() list files in directory
• setwd(path) set working directory
• getwd() get working directory
• ?files File and Directory Manipulation

Standard ASCII Format


• read.csv read comma-delimited file
• write.csv write comma-delimited file
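Reading and Writing
# Read a comma-delimited file whose first row holds the column names
> sets <- read.csv("Sets_All.csv", header = TRUE)
# Derive an ordered factor and a factor that keeps NA as a level
> sets$Ordered.Year <- ordered(sets$Year)
> sets$SpotCd.Fac <- factor(sets$SpotCd, exclude = NULL)
# Keep only the rows where Sp1Cd equals 2
> spotted.sets <- sets[sets$Sp1Cd == 2, ]
# Write the subset back out, without row names
> write.csv(spotted.sets, file = "spotted.txt", row.names = FALSE)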
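Since Sets_All.csv is not available here, the same read, subset, and write cycle can be tried on a small made-up table. The data values and the temporary file path below are assumptions; only the column names mimic the example above.

```r
# Hypothetical data standing in for Sets_All.csv.
sets <- data.frame(Year = c(2001, 2003, 2002), Sp1Cd = c(2, 1, 2))

path <- tempfile(fileext = ".csv")               # temporary file, removed below
write.csv(sets, file = path, row.names = FALSE)

sets2 <- read.csv(path, header = TRUE)           # read the file back in
spotted.sets <- sets2[sets2$Sp1Cd == 2, ]        # keep rows where Sp1Cd is 2
nrow(spotted.sets)                               # 2

unlink(path)                                     # clean up
```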


Data Visualization
• plot() is the main graphing function
• Automatically produces simple plots for
vectors, functions or data frames
Sample Data Set
Plotting a Vector
• plot(v) will plot the elements of the vector v
against their index
# Plot height for each observation
> plot(dataset$Height)
# Plot values against their ranks
> plot(sort(dataset$Height))
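The sample data set itself is not reproduced in this text. To try the two calls, a stand-in can be simulated; the values below are random and purely hypothetical, only the column name Height follows the slides.

```r
# Simulated stand-in for the sample data set (assumption, not the real data).
set.seed(1)
dataset <- data.frame(Height = rnorm(50, mean = 165, sd = 8))

plot(dataset$Height)          # heights in observation order, against their index
plot(sort(dataset$Height))    # heights in increasing order, against their rank
```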
Common Parameters for plot()
• Specifying labels:
– main: provides a title
– xlab: label for the x axis
– ylab: label for the y axis
• Specifying range limits
– xlim: 2-element vector giving the range for the x axis
– ylim: 2-element vector giving the range for the y axis
• Example
– plot(sort(dataset$Height), ylim = c(120,200), ylab = "Height (in cm)",
xlab = "Rank", main = "Distribution of Heights")
Plotting Two Vectors
• plot() can pair elements from 2 vectors to
produce x-y coordinates
• plot() and pairs() can also produce composite
plots that pair all the variables in a data frame.
• Example
– plot(dataset$Hip, dataset$Waist, xlab = "Hip",
ylab = "Waist", main = "Circumference (in cm)",
pch = 2, col = "blue")
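Both the two-vector plot and the composite plot can be sketched on simulated data. The values, and the assumed relationship between Hip and Waist, are hypothetical; only the column names follow the slides.

```r
# Simulated stand-in data (assumption, not the real data set).
set.seed(2)
hip <- rnorm(40, mean = 100, sd = 8)
dataset <- data.frame(
  Hip    = hip,
  Waist  = 0.8 * hip + rnorm(40, sd = 5),
  Height = rnorm(40, mean = 165, sd = 8)
)

plot(dataset$Hip, dataset$Waist, xlab = "Hip", ylab = "Waist")  # x-y scatterplot
plot(Waist ~ Hip, data = dataset)   # same scatterplot via the formula interface
pairs(dataset)                      # scatterplot matrix of all column pairs
```

Calling plot(dataset) on an all-numeric data frame like this one produces the same scatterplot matrix as pairs(dataset).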
Histograms
• Generated by the hist() function
• The parameter breaks is key
– Specifies the number of categories to plot
or
– Specifies the breakpoints for each category
• The xlab, ylab, xlim, ylim options work as
expected
• Example
– hist(dataset$bp.sys, col = "lightblue", xlab = "Systolic
Blood Pressure", main = "Blood Pressure")
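The two uses of breaks can be compared on simulated values. The vector bp below is hypothetical and merely stands in for dataset$bp.sys. Note that a single number is treated by hist() as a suggestion, while a vector of breakpoints is used exactly.

```r
# Simulated blood pressure values (assumption, not the real data).
set.seed(3)
bp <- rnorm(200, mean = 125, sd = 15)

# breaks as a number: a suggested count; R picks "pretty" breakpoints near it
h1 <- hist(bp, breaks = 10, col = "lightblue", xlab = "Systolic Blood Pressure")

# breaks as a vector: explicit breakpoints every 10 units, covering all values
lo <- floor(min(bp) / 10) * 10
hi <- ceiling(max(bp) / 10) * 10
h2 <- hist(bp, breaks = seq(lo, hi, by = 10), col = "lightblue",
           xlab = "Systolic Blood Pressure")

h2$breaks   # the breakpoints actually used
h2$counts   # number of observations per category
```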
