Using R for statistical analyses - Introduction
This page is intended to be a help in getting to grips with the powerful statistical program called R. It is not intended as a course in statistics (see here for details about those). If you have an analysis to perform I hope that you will be able to find the commands you need here and copy/paste them into R to get going.
I run training courses in data management, visualisation and analysis using Excel and R: The Statistical Programming Environment. From 2013 courses will be held at The Field Studies Council Field Centre at Slapton Ley in Devon. Alternatively I can come to you and provide the training at your workplace. See details on my Courses Page.
On this page learn how to create data files, read them into R and generally get ready to perform analyses. Also find out about getting further help and documentation.
What is R?
R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T Bell Labs, beginning in the mid-1970s. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.
R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.
Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that the user can copy and paste into R to perform a variety of statistical analyses.
Data files
You are going to need some data to perform your analyses on. You can type your data into R directly but it is usually much better to use a separate program to hold the information. A spreadsheet is an invaluable tool for this as you can manipulate the data quite easily. R can read plain text files in various formats (e.g. tab delimited, space delimited, comma delimited) and most spreadsheets can save data in these ways. The most useful is comma delimited (.CSV), which R can handle quite easily.
The layout of the data file will depend upon the analysis you are going to run:

In this case we have multiple variables arranged in columns. The rows are the replicates. This sort of arrangement is useful for analysis of variance and multiple regression. However, it can also be used for comparing just two factors (you don't need to use all the information) as in a t-test.
Multiple variables
Count  Site  Sward  Temp  Grass%
12     a     23     18    44
17     b     11     21    75
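As a minimal sketch, the layout above can be tried out in R without a separate file by passing the same comma-delimited text to read.csv() through its text argument. The object names csv.text and sward.data are only illustrations and are not part of the original page.

```r
# The two rows of the table above, written as comma-delimited text.
csv.text <- "Count,Site,Sward,Temp,Grass%
12,a,23,18,44
17,b,11,21,75"

# check.names = FALSE keeps the heading Grass% exactly as written;
# by default R would alter it to make a syntactically valid name.
sward.data <- read.csv(text = csv.text, check.names = FALSE)

print(sward.data)
str(sward.data)   # one row per replicate, one column per variable
```

Each column can then be picked out by its heading, for example sward.data$Temp.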
In this case we have headings on both columns and rows. We have the same information as above and a bit extra. The data may be used for the same kinds of analysis as before but could also be used for tests of association (e.g. Chi-squared) or for ordination.
Rows and columns
       Site a  Site b  Site c  Site d
Spp 1  23      17      9       11
Spp 2  47      19      22      15
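A sketch of how a file with both row and column headings might look once it is in R, again using the text argument so that no file is needed. It assumes the top-left cell is empty, as it typically is when such a table is saved from a spreadsheet; spp.data is a made-up object name.

```r
# The table above as comma-delimited text; the first field of the
# heading row is empty because the corner cell has no label.
csv.text <- ",Site a,Site b,Site c,Site d
Spp 1,23,17,9,11
Spp 2,47,19,22,15"

# row.names = 1 tells R that the first column holds the row headings.
# check.names = FALSE keeps the space in headings such as "Site a".
spp.data <- read.csv(text = csv.text, row.names = 1, check.names = FALSE)

print(spp.data)
rownames(spp.data)            # the species labels
spp.data["Spp 2", "Site c"]   # a single cell, picked out by its two labels
```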
In this instance we have two columns (samples) but the number of replicates is different. R reads the file as a rectangular frame and blank cells are recorded as NA. This may have to be taken account of in some analyses but for now we can assume it is not a problem.
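The following sketch shows what a blank cell looks like after reading. The sample names and values (mown, unmown) are invented purely for illustration; they are not data from this page.

```r
# Hypothetical samples: 'mown' has four replicates and 'unmown' only three,
# so the last cell of the second column is left blank.
csv.text <- "mown,unmown
12,8
15,9
17,7
14,"

samples <- read.csv(text = csv.text)

print(samples)          # the blank cell is shown as NA
is.na(samples$unmown)   # TRUE marks the missing value

mean(samples$unmown)                 # NA, because one value is missing
mean(samples$unmown, na.rm = TRUE)   # mean of the three recorded values
```

Many R functions accept na.rm = TRUE in this way, which is one means of taking the missing values into account.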
Inputting data
The next step is to get your data into R. If you have saved your data in a .CSV file then you can use the read.csv(filename) command to get the information. You need to tell R where to store the data and to do this you assign it a name. All names must have at least one letter (otherwise it is a number of course!). You can use a period (e.g. test.1) or an underscore (e.g. test_1) but no other punctuation marks. R is case sensitive so the variable test is different from Test or teSt.
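A short sketch of the naming rules in action. The values assigned are arbitrary, and make.names() is used here only as a convenient way to see how R would repair names that break the rules.

```r
# Three different objects: R treats the case of each letter as significant.
test <- 1
Test <- 2
teSt <- 3
c(test, Test, teSt)

# A period is allowed inside a name.
test.1 <- "a period is fine"

# No object called TEST has been created, so this reports FALSE
# (assuming nothing else in the session uses that name).
exists("TEST")

# make.names() shows how R would alter names that are not valid.
make.names(c("test.1", "1test", "my data"))
```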
What you need to do is to copy the appropriate command into the clipboard. Then paste into R at the > prompt. You can then edit the command as you like and when ready press the enter key.
Reading data files
This command reads a .CSV file into R. You need to specify the exact filename, in quotes, including the path if the file is not in the current working directory.
variable = read.csv(filename)

This command reads a .CSV file but the file.choose() part opens up an explorer type window that allows you to select a file from your computer. By default R will take the first row as the variable names.
variable = read.csv(file.choose())
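This reads a .CSV file and allows you to select the file, with the header set explicitly. If you change to header=FALSE then the first row will be treated like the rest of the data and not as a label. (T and F also work as abbreviations of TRUE and FALSE, but the full words are safer because T and F can be overwritten.)
variable = read.csv(file.choose(), header=TRUE)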
In this case you can tell R that a specified column contains row names. This is likely to be the first so edit the # to 1.
variable = read.csv(file.choose(), row.names=#)
To get a file into R with basic columns of data and their labels use:
variable = read.csv(file.choose(), header=TRUE)
To get a file into R with column headings and row headings use:
variable = read.csv(file.choose(), row.names=1)
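Because file.choose() needs someone to pick a file, the sketch below writes a small temporary file instead so that the effect of the header setting can be seen by running the lines as they stand. The file contents and the object names (csv.file, with.header, no.header) are illustrative assumptions.

```r
# Write a tiny comma-delimited file to a temporary location.
csv.file <- tempfile(fileext = ".csv")
writeLines(c("Count,Site", "12,a", "17,b"), csv.file)

# First row taken as the column labels.
with.header <- read.csv(csv.file, header = TRUE)

# First row treated as data; R supplies the labels V1, V2 itself.
no.header <- read.csv(csv.file, header = FALSE)

names(with.header)
nrow(with.header)   # 2 rows of data
names(no.header)
nrow(no.header)     # 3 rows, because the headings are counted as data

# Tidy up the temporary file.
unlink(csv.file)
```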
N.B. There are occasions when R won't like your data file. Check the file carefully. In some cases the addition of an extra linefeed at the end will sort out the issue. To do this open the file in a text editor or word processor and make sure that non-printing characters are displayed. Add the extra carriage return at the end of the last line and save the file as plain text.
What data are loaded?
To see what data, variables etc. are loaded in R you can type a simple command:
> ls()
This lists the variables in memory.
In Windows you can list all the "objects" in memory from the Misc menu on the GUI toolbar.
On a Mac you can do something similar using the Workspace menu. The Mac version also has a "workspace browser". This shows all the variables and their properties (you can also view the items).
In both operating systems you can save the current workspace to a file (you can also read in a previously saved workspace). This will save any data and variables currently in memory (on Windows use the File menu and on the Mac use Workspace).
You can also get a list of the variables for each dataset by typing:
> names(dataset)
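A quick sketch of the two commands together, typed without the > prompt so that the lines can be pasted directly. The objects counts and sites are invented for the illustration.

```r
# Create two objects so that there is something to list.
counts <- c(12, 17)
sites <- data.frame(Site = c("a", "b"), Temp = c(18, 21))

ls()           # names of the objects currently in memory
names(sites)   # names of the variables (columns) inside one data set
```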
Removing data sets
To remove a variable you can type a simple command:
> rm(variable)
This will remove the variable (in this case called variable) from the memory. Variables that you have made available with attach(data) do not show up in the listing from ls(). The opposite of attach(data) is detach(data); use it to release the attached variables if and when the data are removed with rm(data), because rm(data) on its own does not detach them.
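The sequence can be sketched as follows, using a small made-up data frame called sites. It assumes a fresh session in which no other object is named Temp or sites.

```r
sites <- data.frame(Site = c("a", "b"), Temp = c(18, 21))

attach(sites)      # the columns can now be used by name
Temp

"Temp" %in% ls()   # FALSE: attached variables are not listed by ls()
search()           # but 'sites' appears on the search path

detach(sites)      # the opposite of attach()
rm(sites)          # now remove the data frame itself

exists("sites")    # FALSE once the data are detached and removed
```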
In Windows you can remove all the "objects" in memory from the Misc menu on the GUI toolbar.
On a Mac you can do something similar using the Workspace menu.
This should be used with caution!
The Mac version also has a "workspace browser". This shows all the variables and their properties as well as allowing you to remove them.

Documents
There are plenty of sources of help and information regarding R. Most are to be found on the R-Project website. Look under the 'Documentation' section. In the manuals section the "Introduction to R" document is a good start (available as HTML or a PDF). Also very good are:
Using R for Data Analysis and Graphics - Introduction, Examples and Commentary by John Maindonald [PDF]
Simple R by John Verzani [PDF]
These are available via the 'Contributed Documentation' section.
Courses
From 2009 I am going to be running a series of short courses in data analyses for conservation biologists. Some of these courses are based on use of R. The courses all run at the Preston Montford Field Centre in Shropshire, UK. More information can be found here or at the FSC website.
Help within R
The help system within R is comprehensive. There are several ways to access help:
Click on the 'Help' menu. There are a number of options available (depending upon your OS) but the main documentation is in the form of HTML.
If you want help on a specific command you can enter a search directly from the keyboard:
help(keyword)
A shortcut is to type:
?keyword
This is fine if you know the command you want. If you are not sure of the command you can try the following:
apropos("part.word")
You type in a part.word and R will list all commands that contain that string of letters. For example:
apropos("rank")
[1] "count.rank" "dsignrank" "psignrank" "qsignrank" "rsignrank" "rank"
This shows that there are actually 6 commands containing "rank"; we can now type help() for any of those to get more detail.
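A short sketch of the same idea using a different search string. The list returned by apropos() depends on the R version and on which packages are loaded, so the result on your machine may well differ from any listing shown on this page; the object name hits is just for illustration.

```r
# Open the help page for a command whose name is known.
help("read.csv")

# Search for every command whose name contains the letters "csv".
hits <- apropos("csv")
print(hits)

# Check whether a particular command is among the matches.
"read.csv" %in% hits
```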
If you run the HTML help you will see a heading entitled "Packages". This will list the packages that you have installed with R. The basic package is 'base' and comes with another called 'stats'. These two form the core of R. If you navigate to one of those you can browse through all the commands available.
R comes with a number of data sets. Many of the help topics come with examples. Usually these involve data sets that are already included. This allows you to copy and paste the commands into the console and see what happens.
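As a sketch of how to explore these, the lines below list the built-in data sets, look at one of them, and run the examples from a help page in one step. The choice of the cars data set and of the mean() help page is arbitrary.

```r
# List the data sets supplied with the packages currently loaded.
data()

# Look at the first few rows of one of them.
head(cars)

# Run all the examples from the help page for mean(), rather than
# copying and pasting them one at a time.
example(mean)
```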
