The R Book - 2010kaiser - Notes - 16092015

R Book - 2010kaiser_Notes
################################################################################
##############################################################
Chapter1#Getting Started
The > symbol is called the prompt and is R's way of saying "What now?". This is
where you type in your commands. When working, you will sometimes see a + symbol
at the left-hand side of the screen instead of >. This means that the last
command you typed is incomplete. If you have made a mistake, press the Esc key
and the command line prompt > will reappear. Then use the Up arrow key to
retrieve your last command, at which point you can correct the mistake using the
Left and Right arrow keys.
Getting Help in R
?read.table
Sometimes you cannot remember the precise name of the function, but you know the
subject on which you want help. In that case use help.search:
help.search("data input")
Other useful functions are find and apropos. The find function tells you what
package something is in:
find("lowess")
[1] "package:stats"
apropos returns a character vector giving the names of all objects in the search
list that match your (potentially partial) enquiry:
apropos("lm")
[1] ".__C__anova.glm" ".__C__anova.glm.null" ".__C__glm"
[4] ".__C__glm.null" ".__C__lm" ".__C__mlm"
[7] "anova.glm" "anova.glmlist" "anova.lm"
Online Help
There is a tremendous amount of information about R on the web, but your first
port of call is likely to be CRAN at
http://cran.r-project.org/
Worked Examples of Functions
To see a worked example just type the function name (linear models, lm, in this
case)
example(lm)
Libraries in R
To use one of the libraries (listed in Table 1.1), simply type library() with
the name of the library inside the brackets. Thus, to load the spatial library
type
library(spatial)
Contents of Libraries
It is easy to use the help function to discover the contents of library
packages. Here is how you find out about the contents of the spatial library:
library(help=spatial)
Information on package "spatial"
Package: spatial
Description: Functions for kriging and point pattern analysis
This is followed by a list of all the functions and data sets. Once the library
is loaded, you can also view the full list of its contents using objects with
search(). Here are the contents of the spatial library:
objects(grep("spatial",search()))
Installing Packages and Libraries
The base package does not contain some of the libraries referred to in this
book, but downloading these is very simple. Run the R program, then from the
command line use the install.packages function to download the libraries you
want. You will be asked to highlight the mirror nearest to you for fast
downloading (e.g. London), then everything else is automatic.
install.packages("lme4")
install.packages("tree")
Command Line versus Scripts
Click on File then click on New script. At this point R will open a window
entitled Untitled - R Editor. You can type and edit in this window; when you
want to execute a line or group of lines, highlight them and press Ctrl+R (the
Control key and R together). The lines are automatically transferred to the
command window and executed. By pressing Ctrl+S you can save the contents of the
R Editor window in a file that you will have to name. It will be given a .R file
extension automatically.
Data Editor
Data can be edited manually by opening a dataframe in the editor. Changes are
saved automatically.
Suppose you want to edit the bacteria dataframe, which is part of the MASS
library:
library(MASS)
attach(bacteria)
fix(bacteria)
Changing the Look of the R Screen
The R GUI Configuration Editor under Edit/GUI preferences is used to change the
look of the screen.
Disappearing Graphics
To stop multiple graphs whizzing by, use
par(ask=TRUE)
You can pause execution to mimic a slide show effect using the Sys.sleep
function.
Good Housekeeping
To see what variables you have created in the current session, type
objects()
To see which libraries and dataframes are attached:
search()
Tidying Up
It is good practice to remove (rm) any variable names you have created and to
detach any dataframes you have attached:
rm(x,y,z)
detach(worms)
The detach command does not make the dataframe called worms disappear; it just
means that the variables within worms, such as Slope and Area, are no longer
accessible directly by name. To get rid of everything, including all the
dataframes, type rm(list=ls()), but be careful!
################################################################################
##############################################################
Chapter2#Essentials of the R Language

Screen prompt
> log(42/7.3)
[1] 1.749795
Two or more expressions can be placed on a single line so long as they are
separated by semi-colons:
2+3; 5*7; 3-7
[1] 5
[1] 35
[1] -4
Built-in Functions
log(10)
[1] 2.302585
exp(1)
[1] 2.718282
log10(6)
[1] 0.7781513
log(9,3)
[1] 2
R knows the value of pi:
pi
[1] 3.141593
sin(pi/2)
[1] 1
cos(pi/2)
[1] 6.123032e-017
The result may not be exactly zero, so be careful when making comparisons.
1.2e3 means 1200 because the e3 means move the decimal point 3 places to the
right.
1.2e-2 means 0.012 because the e-2 means move the decimal point 2 places to the
left.
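The near-zero value of cos(pi/2) shows why floating-point results should be
compared within a tolerance rather than with ==. A minimal sketch follows; the
tolerance 1e-12 is an arbitrary choice made for illustration.

```r
# cos(pi/2) is tiny but not exactly zero, because pi is stored to finite precision
x <- cos(pi/2)

exact   <- (x == 0)                  # exact comparison: FALSE
approx  <- isTRUE(all.equal(x, 0))   # comparison within all.equal's default tolerance: TRUE
by_hand <- abs(x) < 1e-12            # explicit tolerance (arbitrary, for illustration): TRUE

c(exact = exact, approx = approx, by_hand = by_hand)
```

all.equal returns a descriptive character string rather than FALSE when the
values differ, which is why it is wrapped in isTRUE.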
Modulo and Integer Quotients
To obtain the integer quotient:
119 %/% 13
[1] 9
To obtain the remainder:
119 %% 13
[1] 2
15421 %% 7 == 0
[1] TRUE
floor(5.7)
[1] 5
ceiling(5.7)
[1] 6
Infinity and Things that Are Not a Number (NaN)
Calculations can lead to answers that are plus infinity, represented in R by
Inf, or minus infinity, which is represented as -Inf:
3/0
[1] Inf
-12/0
[1] -Inf
Calculations involving infinity can be evaluated: for instance,
exp(-Inf)
[1] 0
0/Inf
[1] 0
(0:3)^Inf
[1] 0 1 Inf Inf
Other calculations, however, lead to quantities that are not numbers. These are
represented in R by NaN ("not a number"). Here are some of the classic cases:
0/0
[1] NaN
Inf-Inf
[1] NaN
Inf/Inf
[1] NaN
You need to understand clearly the distinction between NaN and NA (this stands
for "not available" and is the missing-value symbol in R; see below). The
function is.nan is provided to check specifically for NaN, and is.na also
returns TRUE for NaN.
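The difference between the two tests can be seen side by side. This is a small
sketch; the vector v is made up for illustration.

```r
# One ordinary number, one missing value, one NaN and one infinity
v <- c(1, NA, 0/0, Inf)

na_flags     <- is.na(v)      # FALSE  TRUE  TRUE FALSE  (NA and NaN both count as missing)
nan_flags    <- is.nan(v)     # FALSE FALSE  TRUE FALSE  (only the NaN)
finite_flags <- is.finite(v)  #  TRUE FALSE FALSE FALSE  (NA, NaN and Inf are all not finite)
```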
is.finite(10)
[1] TRUE
is.infinite(10)
[1] FALSE
is.infinite(Inf)
[1] TRUE
Missing values NA(Page 21)
x<-c(1:8,NA)
mean(x)
[1] NA
mean(x,na.rm=T)
[1] 4.5
vmv<-c(1:6,NA,NA,9:12)
vmv
[1] 1 2 3 4 5 6 NA NA 9 10 11 12
Making an index of the missing values in an array could use the seq function,
seq(along=vmv)[is.na(vmv)]
[1] 7 8
but the result is achieved more simply using which like this:
which(is.na(vmv))
[1] 7 8
If the missing values are genuine counts of zero, you might want to edit the NA
to 0. Use the is.na function to generate subscripts for this
vmv[is.na(vmv)]<- 0
vmv
[1] 1 2 3 4 5 6 0 0 9 10 11 12
or use the ifelse function like this
vmv<-c(1:6,NA,NA,9:12)
ifelse(is.na(vmv),0,vmv)
[1] 1 2 3 4 5 6 0 0 9 10 11 12
Assignment
To create a scalar constant x with value 5 we type
x<-5
and not x = 5. Notice that there is a potential ambiguity if you get the spacing
wrong. Compare x<-5, "x gets 5", with x < -5, which is a logical question asking
"is x less than minus 5?" and producing the answer TRUE or FALSE.
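The effect of the spacing can be checked directly. A short sketch, with values
chosen only for illustration:

```r
x <- 5          # assignment: x gets 5
x<-3            # still an assignment: x is now 3
after_assign <- x

is_less <- x < -3   # comparison: is x less than minus 3? FALSE, and x is unchanged
after_compare <- x
```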
Creating a Vector
Indexing starts from 1. Vectors are variables with one or more values of the
same type: logical, integer, real, complex, string (or character) or raw. A
variable with a single value (say 4.3) is often known as a scalar, but in R a
scalar is a vector of length 1.
Even if you declare x<-5, x is a vector of unit length. You can check this:
x[1] returns 5 and length(x) returns 1.
Here the vector y gets the sequence of integer values 10 to 16 using : (colon),
the sequence-generating operator:
y <- 10:16
You can also build the same vector with the concatenation function c:
y <- c(10, 11, 12, 13, 14, 15, 16)
y[3]
[1] 12
You can also enter the numbers from the keyboard one at a time using scan
(Page 23):
y <- scan()
Once you have finished entering the numbers, just press Enter when prompted for
the next number.
In order to refer to a vector by name within an R session, you need to attach
the dataframe containing the vector (p. 18). Alternatively, you can refer to the
dataframe name and the vector name within it, using the element name operator $
like this: dataframe$y
When vectors are created by calculation from other vectors, the new vector will
be as long as the longest vector used in the calculation (with the shorter
variable(s) recycled as necessary): here A is of length 10 and B is of length 3:
A<-1:10
B<-c(2,4,8)
A*B ----- Raghu
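The recycling in A*B can be followed by hand. This sketch uses the same A and B;
the name AB is introduced here only to hold the result.

```r
A <- 1:10
B <- c(2, 4, 8)

# B is recycled as 2 4 8 2 4 8 2 4 8 2. R still returns a result, but warns
# that the longer length (10) is not a multiple of the shorter length (3).
AB <- A * B
AB
# [1]  2  8 24  8 20 48 14 32 72 20
```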
Vector Functions
cor(x,y)     correlation between vectors x and y
sort(x)      a sorted version of x
rank(x)      vector of the ranks of the values in x
order(x)     an integer vector containing the permutation to sort x into
             ascending order
quantile(x)  vector containing the minimum, lower quartile, median, upper
             quartile, and maximum of x
Summary Information from Vectors by Groups
Raghu (page 23 to 28 skipped)
Working with Vectors and Logical Subscripts(Page 28)
x<-0:10
sum(x)
[1] 55
sum(x<5)
[1] 5
sum(x) adds up the values of the xs, and sum(x<5) counts the number of cases
that pass the logical condition "x is less than 5". This works because of
coercion (p. 25): logical TRUE has been coerced to numeric 1 and logical FALSE
has been coerced to numeric 0. To add up the values of the cases that pass the
condition, use the logical vector as a subscript:
sum(x[x<5])
[1] 10
x<5
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
[10] FALSE FALSE
You can imagine FALSE as being numeric 0 and TRUE as being numeric 1. Then the
vector of subscripts [x<5] is five 1s followed by six 0s:
1*(x<5)
[1] 1 1 1 1 1 0 0 0 0 0 0
Now imagine multiplying the values of x by the values of the logical vector
x*(x<5)
[1] 0 1 2 3 4 0 0 0 0 0 0
sum(x*(x<5))
[1] 10
This produces the same answer as sum(x[x<5]), but is rather less elegant.
Suppose we want to work out the sum of the three largest values in a vector.
y<-c(8,3,5,7,6,6,8,9,2,3,9,4,10,4,11)
sort(y)
[1] 2 3 3 4 4 5 6 6 7 8 8 9 9 10 11
rev(sort(y))
[1] 11 10 9 9 8 8 7 6 6 5 4 4 3 3 2
rev(sort(y))[2]
[1] 10
rev(sort(y))[1:3]
[1] 11 10 9
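Wrapping the last expression in sum completes the calculation. The sketch below
shows this along with two equivalent spellings; the variable names are
introduced here for illustration.

```r
y <- c(8,3,5,7,6,6,8,9,2,3,9,4,10,4,11)

top3_rev  <- sum(rev(sort(y))[1:3])               # 11 + 10 + 9 = 30
top3_desc <- sum(sort(y, decreasing = TRUE)[1:3]) # same result, sorting downwards directly
top3_tail <- sum(tail(sort(y), 3))                # same result, taking the last three values
```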
Finding Closest Values
which(abs(xv-108)==min(abs(xv-108)))
[1] 332
i.e. the closest value to 108 is at location 332; xv[332] gives the value
itself.
Thus, we can write a function to return the closest value to a specified value
(sv):
closest<-function(xv,sv){
xv[which(abs(xv-sv)==min(abs(xv-sv)))] }
and run it like this:
closest(xv,108)
[1] 108.0076
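The vector xv is not defined in these notes, so here is a self-contained sketch
of the same function on a short made-up vector (the values are hypothetical).

```r
closest <- function(xv, sv) {
  # keep every element whose distance from sv equals the smallest distance
  xv[which(abs(xv - sv) == min(abs(xv - sv)))]
}

xv_demo <- c(101.2, 107.5, 108.3, 112.9)  # hypothetical data for illustration
nearest <- closest(xv_demo, 108)          # distances are 6.8, 0.5, 0.3, 4.9, so 108.3
nearest
```

If two elements are equally close, this version returns both of them;
xv[which.min(abs(xv - sv))] would return only the first.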
Trimming Vectors Using Negative Subscripts
Use negative subscripts to drop terms from a vector.
Suppose we wanted a new vector, z, to contain everything but the first element
of x:
x<- c(5,8,6,7,1,5,3)
z <- x[-1]
z
[1] 8 6 7 1 5 3
Logical Arithmetic
x<-0:6
x<4
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE
all(x>0)
[1] FALSE
any(x<0)
[1] FALSE
rep(9,5)
[1] 9 9 9 9 9
rep(1:4, 2)
[1] 1 2 3 4 1 2 3 4
rep(1:4, each = 2)
[1] 1 1 2 2 3 3 4 4
rep(1:4, each = 2, times = 3)
[1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
Generate Factor Levels(Page 35 skipped)

Generating Regular Sequences of Numbers
10:18
[1] 10 11 12 13 14 15 16 17 18
18:10
[1] 18 17 16 15 14 13 12 11 10
-0.5:8.5
[1] -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5
When the interval is not 1.0 you need to use the seq function.
seq(0,1.5,0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4
seq(1.5,0,-0.2)
[1] 1.5 1.3 1.1 0.9 0.7 0.5 0.3 0.1
The sample Function
This function shuffles the contents of a vector into a random sequence while
maintaining all the numerical values intact. It is extremely useful for
randomization.
y
[1] 8 3 5 7 6 6 8 9 2 3 9 4 10 4 11
Here are two samples of y (sampling without replacement):
sample(y)
[1] 8 8 9 9 2 10 6 7 3 11 5 4 6 3 4
sample(y)
[1] 9 3 9 8 8 6 5 11 4 6 4 7 3 2 10
Sampling with replacement:
sample(y,replace=T)
[1] 9 6 11 2 9 4 6 8 8 4 4 4 3 9 3
In this next case, there are two 10s and only one 9:
sample(y,replace=T)
[1] 3 7 10 6 8 2 5 11 4 6 3 9 10 7 4
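Because sample is random, the outputs above will differ from run to run. A
minimal sketch of making a shuffle repeatable with set.seed follows; the seed
value 42 is an arbitrary choice, not something taken from the book.

```r
y <- c(8,3,5,7,6,6,8,9,2,3,9,4,10,4,11)

set.seed(42)      # arbitrary seed, chosen for illustration
s1 <- sample(y)   # shuffle without replacement
set.seed(42)      # reset to the same seed
s2 <- sample(y)   # the same shuffle again

same_shuffle <- identical(s1, s2)             # TRUE: same seed, same order
same_values  <- identical(sort(s1), sort(y))  # TRUE: a shuffle keeps every value
```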
Matrices
There are several ways of making a matrix. You can create one directly like
this:
X<-matrix(c(1,0,0,0,1,0,0,0,1),nrow=3)
X
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
By default, the numbers are entered columnwise.
class(X)
[1] "matrix"
attributes(X)
$dim
[1] 3 3
To indicate that the elements are to be entered row-wise, use byrow=T:
vector<-c(1,2,3,4,4,3,2,1)
V<-matrix(vector,byrow=T,nrow=2)
V
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    4    3    2    1
Another way to convert a vector into a matrix is by providing the vector object
with two dimensions (rows and columns) using the dim function like this:
dim(vector)<-c(4,2)
We can check that vector has now become a matrix:
is.matrix(vector)
[1] TRUE
To get the transpose of vector (which is now a 4 x 2 matrix):
vector<-t(vector)
Calculations on rows or columns of the matrix
In the examples below, X is a matrix with four rows and five columns:
X
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    2    5    3
[2,]    1    1    3    1    3
[3,]    3    1    0    2    2
[4,]    1    0    2    1    0
mean(X[,5])
[1] 2
var(X[4,])
[1] 0.7
i.e. variance of fourth row
rowSums(X)
[1] 11 9 8 4
colSums(X)
[1] 6 2 7 9 8
rowMeans(X)
[1] 2.2 1.8 1.6 0.8
colMeans(X)
[1] 1.50 0.50 1.75 2.25 2.00
These functions are built for speed, and blur some of the subtleties of dealing
with NA or NaN. If such subtlety is an issue, then use apply instead (p. 68).
Remember that columns are margin no. 2 (rows are margin no. 1):
apply(X,2,mean)
[1] 1.50 0.50 1.75 2.25 2.00
Suppose that we want to shuffle the elements of each column of a matrix
independently. We apply the function sample to each column (margin no. 2) like
this:
apply(X,2,sample)
Adding rows and columns to the matrix
Suppose we have been asked to add a row at the bottom showing the column means,
and a column at the right showing the row variances:
X<-rbind(X,apply(X,2,mean))
X<-cbind(X,apply(X,1,var))
When a matrix with a single row or column is created by a subscripting
operation, for example row <- mat[2,], it is by default turned into a vector.
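If the matrix structure needs to be kept, the subscript accepts a drop argument.
A short sketch, where mat is a made-up 2 x 3 matrix:

```r
mat <- matrix(1:6, nrow = 2)   # hypothetical 2 x 3 matrix

row_vec <- mat[2, ]                 # default: dimensions are dropped, result is a vector
row_mat <- mat[2, , drop = FALSE]   # drop = FALSE keeps a 1 x 3 matrix

is.matrix(row_vec)   # FALSE
dim(row_mat)         # 1 3
```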
The sweep function(raghu dropped)
Arrays
Arrays are numeric objects with dimension attributes. We start with the numbers
1 to 25 in a vector called array:
array<-1:25
is.matrix(array)
[1] FALSE
dim(array)
NULL
The vector is not a matrix and it has no (NULL) dimensional attributes. We give
the object dimensions like this (say, with five rows and five columns):
dim(array)<-c(5,5)
Now it does have dimensions and it is a matrix:
dim(array)
[1] 5 5
is.matrix(array)
[1] TRUE
When we look at array (elements are entered column-wise by default) it is
presented as a two-dimensional table (but note that it is not a table object):
is.table(array)
[1] FALSE
Thus a vector is a one-dimensional array that lacks any dim attributes. A matrix
is a two-dimensional array.
Character Strings(Page 50)
In R, character strings are defined by double quotation marks:
a<-"abc"
b<-"123"
Numbers can be characters (as in b, above), but characters cannot be numbers.
as.numeric(a)
[1] NA
Warning message:
NAs introduced by coercion
as.numeric(b)
[1] 123
pets<-c("cat","dog","gerbil","terrapin")
class(pets)
[1] "character"
is.factor(pets)
[1] FALSE
length(pets)
[1] 4
nchar(pets)
[1] 3 3 6 7
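If the vector of characters called pets is made part of a dataframe, R coerces
the character variables to factors (this was the default before R 4.0.0; since
then stringsAsFactors defaults to FALSE and the check below returns FALSE unless
you set stringsAsFactors=TRUE):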
df<-data.frame(pets)
is.factor(df$pets)
[1] TRUE
Concatenation produces a vector of two strings.
a<-"abc"
b<-"123"
c(a,b)
[1] "abc" "123"
The R function to concatenate two strings into one is paste():
paste(a,b,sep="")
[1] "abc123"
If the sep argument is not provided, a single space is used by default.
If one of the arguments to paste is a vector, each of the elements of the vector
is pasted to the specified character string to produce an object of the same
length as the vector:
d<-c(a,b,"new")
e<-paste(d,"a longer phrase containing blanks")
e
[1] "abc a longer phrase containing blanks"
[2] "123 a longer phrase containing blanks"
[3] "new a longer phrase containing blanks"
Extracting parts of strings
We begin by defining a phrase:
phrase<-"the quick brown fox jumps over the lazy dog"
substr(phrase,1,1)
[1] "t"
substr(phrase,1,4)
[1] "the "
toupper(phrase)
[1] "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"
tolower(toupper(phrase))
[1] "the quick brown fox jumps over the lazy dog"
Writing functions in R
We have a built-in function mean() to provide us with the mean of a set of
values. Let us create a function replicating mean():
arithmetic.mean<-function(x) sum(x)/length(x)
y<-c(3,3,4,5,5)
arithmetic.mean(y)
[1] 4
Loops and Repeats
for (i in 1:5) { print(i^2) }
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
The braces {} are optional when the loop body is a single line of code.
The ifelse function
z <- ifelse (y < 0, -1, 1)
data<-read.table("c:\\temp\\worms.txt",header=T)
attach(data)
To convert the continuous variable Area into a two-level categorical variable:
ifelse(Area>median(Area),"big","small")
[1] "big" "big" "small" "small" "big" "big" "big" "small" "small"
[10] "small" "small" "big" "big" "small" "big" "big" "small" "big"
[19] "small" "small"
Evaluating Functions with apply, sapply and lapply(Page 75)
apply and sapply
The apply function is used for applying functions to the rows or columns of
matrices or dataframes. For example:
(X<-matrix(1:24,nrow=4))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    9   13   17   21
[2,]    2    6   10   14   18   22
[3,]    3    7   11   15   19   23
[4,]    4    8   12   16   20   24
Putting brackets around the command makes R print the matrix without an extra
command.
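With this matrix, apply can be used on either margin. The following sketch uses
sum and a small anonymous function; the result names are introduced here for
illustration.

```r
X <- matrix(1:24, nrow = 4)

row_totals <- apply(X, 1, sum)   # margin 1 = rows:    66 72 78 84
col_totals <- apply(X, 2, sum)   # margin 2 = columns: 10 26 42 58 74 90

# Any function of a vector can be supplied, including one written on the spot
row_ranges <- apply(X, 1, function(r) max(r) - min(r))   # 20 20 20 20
```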
sapply() ---- raghu dint do
Lists and lapply
We start by creating a list object that consists of three parts: character,
numeric and logical
a<-c("a","b","c","d")
b<-c(1,2,3,4,4,3,2,1)
c<-c(T,T,F)
list.object<-list(a,b,c)
class(list.object)
[1] "list"
To see the contents of the list we just type its name:
list.object
[[1]]
[1] "a" "b" "c" "d"
[[2]]
[1] 1 2 3 4 4 3 2 1
[[3]]
[1] TRUE TRUE FALSE
The function lapply applies a specified function to each of the elements of a
list in turn:
lapply(list.object,length)
[[1]]
[1] 4
[[2]]
[1] 8
[[3]]
[1] 3
This shows that list.object has three vectors with 4, 8 and 3 elements
respectively (their types are not known yet):
lapply(list.object,class)
[[1]]
[1] "character"
[[2]]
[1] "numeric"
[[3]]
[1] "logical"
Applying numeric functions to lists will only work for objects of class numeric
or objects (like logical values) that can be coerced into numbers. Here is what
happens when we use lapply to apply the function mean to list.object:
lapply(list.object,mean)
[[1]]
[1] NA
[[2]]
[1] 2.5
[[3]]
[1] 0.6666667
The first vector could not be coerced to a number.
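The sapply function, which these notes skip, does the same job as lapply but
simplifies the result to a vector where it can. A brief sketch using the same
list.object:

```r
list.object <- list(c("a", "b", "c", "d"),
                    c(1, 2, 3, 4, 4, 3, 2, 1),
                    c(TRUE, TRUE, FALSE))

lens  <- sapply(list.object, length)   # a plain integer vector: 4 8 3
types <- sapply(list.object, class)    # "character" "numeric" "logical"
```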
################################################################################
##############################################################
Chapter3#Data Input(Page 104)
Saving the File from Excel
Much the simplest way is to save all your dataframes from Excel as tab-delimited
text files.
data<-read.table("c:\\temp\\regression.txt",header=T)
After saving as a .txt file, ensure that there are no spaces within the variable
names. If there are, replace the spaces with a dot '.', otherwise R will report
errors: it will treat the variable name "House Ownership" as two different
variables, so the number of names no longer matches the number of columns.
Always use a double backslash \\ rather than \ in the file path definition.
Missing values cannot appear as blank spaces. If the data are in Excel, replace
the missing values with NA.
If variable values contain spaces, it is better to use a different separator,
say sep=",", as in the case of csv files.
map<-read.table("c:\\temp\\bowens.csv",header=T,sep=",")
but it is quicker and easier to use read.csv in this case (and there is no need
for header=T):
map<-read.csv("c:\\temp\\bowens.csv")
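The point about spaces in values can be tried without any existing file. This
sketch writes a small made-up csv file to a temporary location and reads it
back; the file contents and the name map_demo are hypothetical.

```r
# Write a tiny csv file whose first column contains spaces in its values
tmp <- tempfile(fileext = ".csv")
writeLines(c("Field.Name,Area",
             "Nashs Field,3.6",
             "Silwood Bottom,5.1"), tmp)

map_demo <- read.csv(tmp)   # header = TRUE and sep = "," are the defaults
names(map_demo)             # "Field.Name" "Area"
map_demo$Area               # 3.6 5.1

unlink(tmp)                 # remove the temporary file
```

Because the separator is a comma, the space inside "Nashs Field" does not split
the value into two columns.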
Setting the Working Directory
setwd("c:\\temp")
Built-in Data Files
There are many built-in data sets within the base package of R. You can see
their names by typing
data()
You can read the documentation for a particular data set with the usual query:
?lynx
Chapter4#Dataframes
The values in the body of a matrix can only be numbers; those in a dataframe
can also be numbers, but they can also be text, dates or logical values. A
dataframe is similar to a SAS dataset.
attach(worms) ----makes the variables in the dataframe accessible by name
names(worms) ----shows the variable names
To see the contents of the dataframe, just type its name:
worms ----shows the dataframe
You can summarize a dataframe using summary(worms)

"by" and "aggregate" are used to summarize a dataframe on the basis of factor
levels. E.g., finding the mean by vegetation type:
by(worms,Vegetation,mean)
It will provide means of all numeric variables for each vegetation type.
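In recent versions of R, mean() no longer accepts a whole dataframe, so the by
call above may return NA with warnings instead of the means. A sketch of the
same summary with aggregate follows; worms_demo is a small made-up stand-in for
the worms dataframe, which is not included in these notes.

```r
# Hypothetical stand-in for the worms dataframe (values invented for illustration)
worms_demo <- data.frame(
  Vegetation   = c("Grassland", "Grassland", "Arable", "Arable", "Meadow"),
  Area         = c(3.6, 5.1, 2.8, 2.4, 3.8),
  Worm.density = c(4, 7, 2, 5, 6)
)

# Mean of each numeric variable within each vegetation type
veg_means <- aggregate(cbind(Area, Worm.density) ~ Vegetation,
                       data = worms_demo, FUN = mean)
veg_means
```

The formula interface names the numeric columns explicitly, so text columns
never reach mean().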
Subscripts and Indices
worms[3,5]
[1] 4.3
----- value from third row and fifth column
To extract the values of the 7th column from rows 14 to 19:
worms[14:19,7]
[1] 0 6 8 4 5 1
worms[1:5,2:3]
Area Slope
1 3.6 11
2 5.1 2
3 2.8 3
4 2.4 5
5 3.8 0
To get the entire third row :
worms[3,]
To get the entire third column:
worms[,3]
class(worms[3,])
[1] "data.frame"
class(worms[,3])
[1] "integer"
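A single row is returned as a dataframe because it can hold values of different
types; a single column is returned as a vector of one type (here integer).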
To extract all the rows for Field Name and the Soil pH (columns 1 and 5):
worms[,c(1,5)]
Sorting Dataframes
To sort by the Slope column:
worms[order(Slope),] -----------ascending order
worms[rev(order(Slope)),] -----------descending order
The original row numbers are retained in the leftmost column, so they appear in
a haphazard order after sorting.
worms[order(Vegetation,Worm.density),]
The rows are sorted first by Vegetation and, within each vegetation type, by
Worm.density: ties in the first sort are broken by the second. A third sort
variable can be added in the same way.
If you want only the 3rd, 4th, 5th and 7th columns after sorting, shown in the
order 4th, 7th, 5th, 3rd, there are two ways:
worms[order(Vegetation,Worm.density),c(4,7,5,3)]
worms[order(Vegetation,Worm.density),c("Vegetation", "Worm.density", "Soil.pH",
"Slope")]
Using Logical Conditions to Select Rows from the Dataframe
To select only those rows where the value in the Damp column is logical TRUE:
worms[Damp == T,]
Use the double equals sign == to test for equality, and write T (or TRUE)
without quotes: "T" is a character string, whereas T is logical.
Selecting rows based on two conditions(& operator)
worms[Worm.density > median(Worm.density) & Soil.pH < 5.2,]
If we want to extract all the numeric columns
worms[,sapply(worms,is.numeric)]
If we want to extract the columns that were factors:
worms[,sapply(worms,is.factor)]
To drop rows from row 6 to row 15 :
worms[-(6:15),]
To find all the rows that are not grasslands (! operator):
worms[!(Vegetation=="Grassland"),]
If you wish to drop the rows using negative sign instead of NOT operator :
worms[-which(Damp==F),]
or
worms[Damp==T,]
Omitting Rows Containing Missing Values, NA
na.omit(data)
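The above command drops all rows that contain at least one NA.
Alternatively, you can use the na.exclude function, which differs from na.omit
only in how the residuals and predictions of a fitted model are padded with NA.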
The function to test for the presence of missing values across a dataframe is
complete.cases:
complete.cases(data)
data[complete.cases(data),]
It is well worth checking the individual variables separately, because it is
possible that one (or a few) variable(s) contributes most of the missing values.
Use summary to count the missing values for each variable in the dataframe, or
use apply with the function is.na to sum up the missing values in each variable:
apply(apply(data,2,is.na),2,sum)
Field.Name Area Slope Vegetation Soil.pH Damp Worm.density
         0    1     1          0       1    1            1
When sorting in descending order, using the minus sign only works for numerical
variables. For factor levels you can use the rank function to make the levels
numeric like this:
worms[order(-rank(Vegetation),-Worm.density),]
grep() returns the subscripts (a number or set of numbers) indicating which
character strings within a vector of character strings match a pattern. Here we
look for the variable names that contain an upper-case S:
names(worms)
[1] "Field.Name" "Area" "Slope" "Vegetation"
[5] "Soil.pH" "Damp" "Worm.density"
grep("S",names(worms))
[1] 3 5
Finally, we can use these numbers as subscripts [,c(3,5)] to select columns 3
and 5:
worms[,grep("S",names(worms))]
Row numbers can be suppressed when printing, e.g. print(worms, row.names=FALSE).
A table can be converted into a dataframe with as.data.frame.
Eliminating Duplicate Rows from a Dataframe
unique(dups)
To view the rows that are duplicates in a dataframe (if any) use the duplicated
function:
dups[duplicated(dups),]
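The dups dataframe is not shown in these notes, so here is a sketch on a small
made-up dataframe (dups_demo and its values are hypothetical).

```r
# Rows 2 and 3 are identical; rows 4 and 5 share an id but differ in value
dups_demo <- data.frame(id    = c(1, 2, 2, 3, 3),
                        value = c("a", "b", "b", "c", "d"))

no_dups   <- unique(dups_demo)                  # keeps 4 rows: the repeat of (2, "b") is dropped
only_dups <- dups_demo[duplicated(dups_demo), ] # shows just the repeated row (original row 3)
```

A row counts as a duplicate only when every column matches an earlier row, which
is why both rows with id 3 are kept.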
Selecting Variables on the Basis of their Attributes
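To find which columns of a dataframe called nums are numeric (the result is a
logical vector that can be used as a column subscript):
sapply(nums,is.numeric)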
Merging Two Dataframes
