You are on page 1of 18

Representing data in R

1/22/13 3:28 PM

Representing data in R
Jeffrey Leek, Assistant Professor of Biostatistics Johns Hopkins Bloomberg School of Public Health

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 1 of 18

Representing data in R

1/22/13 3:28 PM

Important data types in R


Classes Character, Numeric, Integer, Logical Objects Vectors, Matrices, Data frames, Lists, Factors, Missing values Operations Subsetting, Logical subsetting For more information: Data Types

2/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 2 of 18

Representing data in R

1/22/13 3:28 PM

Character
firstName = "jeff" class(firstName)

## [1] "character"

firstName

## [1] "jeff"

3/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 3 of 18

Representing data in R

1/22/13 3:28 PM

Numeric
heightCM = 188.2 class(heightCM)

## [1] "numeric"

heightCM

## [1] 188.2

4/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 4 of 18

Representing data in R

1/22/13 3:28 PM

Integer
numberSons = 1L class(numberSons)

## [1] "integer"

numberSons

## [1] 1

5/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 5 of 18

Representing data in R

1/22/13 3:28 PM

Logical
teachingCoursera = TRUE class(teachingCoursera)

## [1] "logical"

teachingCoursera

## [1] TRUE

6/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 6 of 18

Representing data in R

1/22/13 3:28 PM

Vectors
A set of values with the same class
heights = c(188.2, 181.3, 193.4) heights

## [1] 188.2 181.3 193.4

firstNames = c("jeff", "roger", "andrew", "brian") firstNames

## [1] "jeff"

"roger"

"andrew" "brian"

7/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 7 of 18

Representing data in R

1/22/13 3:28 PM

Lists
A vector of values of possibly different classes
vector1 = c(188.2, 181.3, 193.4) vector2 = c("jeff", "roger", "andrew", "brian") myList = list(heights = vector1, firstNames = vector2) myList

## ## ## ## ##

$heights [1] 188.2 181.3 193.4 $firstNames [1] "jeff"

"roger"

"andrew" "brian"

8/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 8 of 18

Representing data in R

1/22/13 3:28 PM

Matrices
Vectors with multiple dimensions
myMatrix = matrix(c(1, 2, 3, 4), byrow = T, nrow = 2) myMatrix

## [,1] [,2] ## [1,] 1 2 ## [2,] 3 4

9/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 9 of 18

Representing data in R

1/22/13 3:28 PM

Data frames
Multiple vectors of possibly different classes, of the same length
vector1 = c(188.2, 181.3, 193.4) vector2 = c("jeff", "roger", "andrew", "brian") myDataFrame = data.frame(heights = vector1, firstNames = vector2)

## Error: arguments imply differing number of rows: 3, 4

myDataFrame

## Error: object 'myDataFrame' not found

10/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 10 of 18

Representing data in R

1/22/13 3:28 PM

Data frames
vector1 = c(188.2, 181.3, 193.4, 192.3) vector2 = c("jeff", "roger", "andrew", "brian") myDataFrame = data.frame(heights = vector1, firstNames = vector2) myDataFrame

## ## ## ## ##

1 2 3 4

heights firstNames 188.2 jeff 181.3 roger 193.4 andrew 192.3 brian

11/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 11 of 18

Representing data in R

1/22/13 3:28 PM

Factors
Qualitative variables that can be included in models
smoker = c("yes", "no", "yes", "yes") smokerFactor = as.factor(smoker) smokerFactor

## [1] yes no yes yes ## Levels: no yes

12/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 12 of 18

Representing data in R

1/22/13 3:28 PM

Missing values
In R they are usually coded NA
vector1 = c(188.2, 181.3, 193.4, NA) vector1

## [1] 188.2 181.3 193.4

NA

is.na(vector1)

## [1] FALSE FALSE FALSE

TRUE

13/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 13 of 18

Representing data in R

1/22/13 3:28 PM

Subsetting
vector1 = c(188.2, 181.3, 193.4, 192.3) vector2 = c("jeff", "roger", "andrew", "brian") myDataFrame = data.frame(heights = vector1, firstNames = vector2) vector1[1]

## [1] 188.2

vector1[c(1, 2, 4)]

## [1] 188.2 181.3 192.3

14/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 14 of 18

Representing data in R

1/22/13 3:28 PM

Subsetting
myDataFrame[1, 1:2] ## heights firstNames ## 1 188.2 jeff

myDataFrame$firstNames

## [1] jeff roger andrew brian ## Levels: andrew brian jeff roger

15/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 15 of 18

Representing data in R

1/22/13 3:28 PM

Logical subsetting
myDataFrame[myDataFrame$firstNames == "jeff", ] ## heights firstNames ## 1 188.2 jeff

myDataFrame[heights < 190, ]

## heights firstNames ## 1 188.2 jeff ## 2 181.3 roger ## 4 192.3 brian

16/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 16 of 18

Representing data in R

1/22/13 3:28 PM

Variable naming conventions


Variable names should be short, but descriptive. Here are some common styles Camel caps
myHeightCM = 188

Underscore
my_height_cm = 188

Dot separated
my.height.cm = 188

17/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 17 of 18

Representing data in R

1/22/13 3:28 PM

Style guides
http://4dpiecharts.com/r-code-style-guide/ http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html http://wiki.fhcrc.org/bioc/Coding_Standards

18/18

file:///Users/jtleek/Dropbox/Jeff/teaching/2013/coursera/week1/005representingDataR/index.html#16

Page 18 of 18

You might also like