R Help Sheet 1: Getting data into R

Summary

How to read your data into R and check for errors.

Functions

read.table, the “gets” arrow <-, attach, detach, summary, names, fix

The R environment

In R, your data is saved by the program, but you only see it if you specifically ask for it to be listed,
and even then it is displayed in an awkward way and is very hard to modify. It is easiest to
enter data in Excel and then import it into R.

There are two parts to R:

1. The R Console. This is where all your results and calculations will appear. The whole lot can
be saved as a “workspace” (File -> Save Workspace), which will save both your commands
(red text) and what R returned (blue text).
2. The R Editor. This provides a neater way of saving just the useful parts of the text you
entered, without any of what R returned. You can create a new R editor document using File
-> New script in the R console window (MAC: cmd N), and save it using File -> Save as… in
the editor window. Text written in the Editor can be sent directly to R using Ctrl + r (for
“run”) (MAC: cmd ↵). Although throwing away the information from the R console might
sound like a bad idea, you can re-run the code at any time to get it back, so as long as you
have your Editor files there’s no need to save R workspace documents unless you really want
to. This means that when you close R and it asks whether you want to save the workspace image,
you can feel free to click “No”, as long as you saved your code as an editor file. Since we’ll be
using the R editor in the practical sessions, we recommend you also use the editor when trying out
the examples from these help sheets.

Note on opening R Editor files:

R Editor files often save without a specific R extension. This means that when you want to re-open
them, you need to find the correct folder and then use the drop down menu to select the “All files”
file type (rather than the default “R files”), otherwise the script won’t appear.

Getting your data ready for R in Excel

BIO2426 Analysis of Biological Data Page 1 of 3


Before you can start to explore your data, you need to load it into the R workspace. If it’s already a
txt file, this is easy. If it isn’t, the easiest thing to do is to save it as a text file in Excel, using File ->
Save as… and then choosing Text (Tab delimited) from the drop down list below the file name.

Excel will give you various dire warnings; just click ‘OK’ or ‘Yes’ for these.

There are some rules about how to organise data so that R can understand it:

1. Each variable (things like the name of the individual or their height, or any other
characteristic you’re measuring) should be in a single column.
2. Represent missing data by the symbol NA.
3. Each column should be the same length. If they are not then you need to make them the
same length by adding NAs until they're all even.
4. There should be no spaces anywhere in your data. The most common problem with data is
that somewhere in it there are spaces that haven’t been spotted. Variable names or
category names should not contain spaces; if you want to separate two parts of a variable
name then use a full stop (e.g. horn length can be replaced by horn.length). Don't start
variable names with numbers or with symbols like $ or %.

Remember when deciding on your variable names that R is case sensitive; e.g. abundance is treated
differently to Abundance. You may find it easier to never use capitals.
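
As a rough way of checking candidate variable names against the rules above, the sketch below compares each name with what R's make.names function would turn it into. The names in candidate_names are invented for illustration, and this is only one possible check, not something the original sheet prescribes.

```r
# Hypothetical variable names: some follow the rules above, some break them
candidate_names <- c("horn.length", "horn length", "2nd.measure", "Abundance")

# make.names() converts a string into a syntactically valid R name;
# a name that comes back unchanged needs no fixing
is_valid <- candidate_names == make.names(candidate_names)
data.frame(candidate_names, is_valid)

# R is case sensitive, so these two names are treated as different
same_name <- identical("abundance", "Abundance")
same_name
```

Names flagged as not valid here (a space, or a leading number) are the ones worth renaming in Excel before saving the text file.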

Make up your own dataset with two or three columns of numerical and categorical variables, a few
rows long, and save it as a text file. Then try doing the following:

Loading your data into R: setting the “working directory”

Before loading in the data, you need to tell R where to find it (i.e. set the working directory). This is done
by clicking on File -> Change dir… (MAC: cmd D), selecting the folder that contains the text document you want
to work with, and clicking OK. I suggest you create a folder directly on the main drive (usually the C
drive) of your PC called Rdata and always use this. If you create a folder in 'My Documents' it often
causes problems because this is a virtual location.
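
The working directory can also be set by typing commands rather than using the menu. The following is a minimal sketch in which tempdir() merely stands in for your own data folder; on a PC set up as suggested above you would probably write something like setwd("C:/Rdata") instead, which is an assumption about your folder layout.

```r
# Remember where R is currently looking, so we can go back afterwards
old_dir <- getwd()

# Stand-in for your own data folder (assumption: replace with e.g. "C:/Rdata")
data_dir <- tempdir()

setwd(data_dir)   # tell R to look in this folder
getwd()           # confirm which folder R is now using
list.files()      # list the files R can see there

# Return to the original folder
setwd(old_dir)
```

Running getwd() and list.files() after changing folder is a quick way to confirm that your text file is where R expects it to be.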

Reading your data into R

Once you’ve set the working directory, you can use read.table(filename, header=T) to load your data
into R as a dataframe (see help sheet 2 for help with R jargon). The file name must be in plain (straight) quotation
marks and include “.txt” at the end, e.g. read.table("data.txt", header=T). Setting header=T tells R that the
first row of the file holds the variable names.

Entering the read.table command by itself will simply print your data in R; to be able to recall it,
you’ll need to use the “gets” arrow to give it an object name. For example:

mydata <- read.table("data.txt", header=T)

will assign your data to a new object called mydata. From this point, you can do several things:

• Typing mydata (or whatever you’ve called your dataframe) into the command window
displays your data in full.
• fix(mydata) brings it up in an Excel-style viewing window. You can also edit your data in this
window, but this isn’t really recommended as the changes won’t be saved for next time;
better to go back to your original Excel document and make the changes there.
• names(mydata) returns the names of all the variables in your dataframe.
• summary(mydata) returns a useful summary of all the variables in your dataframe; this can
be handy for checking for errors.
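
To try the whole sequence without an Excel file to hand, the sketch below writes a small made-up tab-delimited file and then reads and checks it. The file contents and column names (id, horn.length, sex) are invented for illustration, and the file is placed in R's temporary folder rather than your working directory.

```r
# Write a small made-up tab-delimited text file (\t is a tab)
data_file <- file.path(tempdir(), "data.txt")
writeLines(c(
  "id\thorn.length\tsex",
  "a1\t12.5\tmale",
  "a2\tNA\tfemale",
  "a3\t10.1\tfemale"
), data_file)

# Read it in; header = TRUE is the long form of header=T
mydata <- read.table(data_file, header = TRUE)

mydata           # display the data in full
names(mydata)    # the variable names
summary(mydata)  # per-column summary, including a count of NA's
```

The missing horn length entered as NA is read as a missing value, so horn.length is still treated as a numerical column.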

Attaching your data

Using read.table will get your data into R, but to be able to call each column directly by the name it has in your
Excel spreadsheet, you’ll need to attach the dataframe, using attach(mydata). Once this is done, you can access
any column from the dataframe separately by typing in the name of the column.

If you’re working with several dataframes, it’s a good idea to use detach(mydata) to detach a
dataframe when you’re done with it – otherwise, there can be confusion between variables with the
same name in different dataframes, leading to error messages.
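
A short sketch of attaching and detaching, using a made-up dataframe (the column names and values are invented for illustration). The last line shows the dollar-sign form, which is a standard alternative not covered in this sheet: it reaches a column without attaching anything.

```r
# A made-up dataframe standing in for one loaded with read.table
mydata <- data.frame(
  horn.length = c(12.5, NA, 10.1),
  sex = c("male", "female", "female")
)

attach(mydata)                                  # columns can now be called by name
mean_attached <- mean(horn.length, na.rm = TRUE)
detach(mydata)                                  # tidy up when you are done

# Alternative that works without attach: dataframe$column
mean_dollar <- mean(mydata$horn.length, na.rm = TRUE)

mean_attached
mean_dollar
```

Both approaches give the same answer; the dollar-sign form avoids any confusion between dataframes that share column names.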

Before you start working with your data, it’s often a good idea to view it manually or plot it (see R-
Help_9) to check that everything’s as it should be and avoid problems later on.
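
As a rough illustration of the kind of manual check meant here, the sketch below uses a made-up dataframe containing two deliberate mistakes: a category typed with a capital letter and an implausibly large measurement. The data and column names are invented; head, table and range are standard R functions.

```r
# Made-up data with two deliberate errors: "Female" and 125
mydata <- data.frame(
  horn.length = c(12.5, 125, 10.1, 11.8),
  sex = c("male", "female", "Female", "female")
)

head(mydata, 2)          # look at the first two rows

# Count each category: a stray "Female" shows up as its own group
sex_counts <- table(mydata$sex)
sex_counts

# Smallest and largest value: 125 stands out as a likely typing error
length_range <- range(mydata$horn.length, na.rm = TRUE)
length_range
```

Errors found this way are best corrected in the original Excel document, which is then saved as a text file and read in again.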
