12. Two Way Tables


Contents

Creating a Table from Data
Creating a Table Directly
Tools For Working With Tables
Graphical Views of Tables

Here we look at some examples of how to work with two way tables. We assume that you can enter data
and understand the different data types.

12.1. Creating a Table from Data


We first look at how to create a table from raw data. Here we use a fictitious data set, smoker.csv. This
data set was created only to be used as an example, and the numbers were created to match an example
from a textbook, p. 629 of the 4th edition of Moore and McCabe's Introduction to the Practice of
Statistics. You should look at the data set in a spreadsheet to see how it is entered. The information is
ordered in a way to make it easier to figure out what information is in the data.

The idea is that 356 people have been polled on their smoking status (Smoke) and their socioeconomic
status (SES). For each person it was determined whether they are current smokers, former
smokers, or have never smoked. Also, for each person their socioeconomic status was determined (low,
middle, or high). The data file contains only two columns, and when it is read R interprets them both as
factors. (In R 4.0.0 and later, text columns are read as character vectors by default, so the
stringsAsFactors=TRUE argument is needed to get factors.)

> smokerData <- read.csv(file='smoker.csv',sep=',',header=T,stringsAsFactors=TRUE)
> summary(smokerData)
     Smoke         SES
 current:116   High  :211
 former :141   Low   : 93
 never  : 99   Middle: 52

You can create a two way table of occurrences using the table command and the two columns in the data
frame:

> smoke <- table(smokerData$Smoke,smokerData$SES)
> smoke

          High Low Middle
  current   51  43     22
  former    92  28     21
  never     68  22      9

In this example, there are 51 people who are current smokers and are in the high SES. Note that it is
assumed that the two lists given in the table command are both factors. (More information on this is
available in the chapter on data types.)
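
If smoker.csv is not at hand, a data frame with the same counts can be rebuilt in memory. The following is only a sketch: the construction with rep is an assumption about how to reproduce the counts, not a description of how the original file was made. It also shows xtabs, a formula-based alternative to table.

```r
# Counts copied from the table above. The row order of the real
# smoker.csv file may differ; this reconstruction is an assumption.
counts <- c(51, 43, 22,   # current: High, Low, Middle
            92, 28, 21,   # former:  High, Low, Middle
            68, 22,  9)   # never:   High, Low, Middle
status <- rep(c("current", "former", "never"), each = 3)
ses    <- rep(c("High", "Low", "Middle"), times = 3)

# One row per person, with both columns stored as factors.
smokerData <- data.frame(Smoke = factor(rep(status, times = counts)),
                         SES   = factor(rep(ses, times = counts)))

# table() and xtabs() both count the co-occurrences of the two factors.
smoke  <- table(smokerData$Smoke, smokerData$SES)
smoke2 <- xtabs(~ Smoke + SES, data = smokerData)
smoke
```

The xtabs version keeps the variable names (Smoke and SES) as labels on the two dimensions, which can make printed output easier to read.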

12.2. Creating a Table Directly


Sometimes you are given data that has already been summarized as a table of counts and would like to
create a table object from it. Here we examine how to create the table directly. Unfortunately, this is not
as direct a method as might be desired. Here we create a matrix of numbers, specify the row and column
names, and then convert it to a table.

In the example below we will create a table identical to the one given above. In that example we have 3
columns, and the numbers are specified by going across each row from top to bottom. We need to specify
the data, the number of columns, and that the matrix is filled by row:

> smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
> colnames(smoke) <- c("High","Low","Middle")
> rownames(smoke) <- c("current","former","never")
> smoke <- as.table(smoke)
> smoke
        High Low Middle
current   51  43     22
former    92  28     21
never     68  22      9
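
The three naming steps can also be folded into the call to matrix through its dimnames argument. This is a minimal sketch of the same construction, not a different method:

```r
# Build the matrix row by row and attach the row and column names
# in one step, then convert it to a table.
smoke <- as.table(matrix(c(51, 43, 22,
                           92, 28, 21,
                           68, 22,  9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(c("current", "former", "never"),
                                         c("High", "Low", "Middle"))))

# Cells can be looked up by name: former smokers in the high SES group.
smoke["former", "High"]
```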

12.3. Tools For Working With Tables


Here we look at some of the commands available to help look at the information in a table in different
ways. We assume that the data has been entered using one of the methods above, and that the table is
called smoke. First, there are a couple of ways to get graphical views of the data:

> barplot(smoke,legend=T,beside=T,main='Smoking Status by SES')
> plot(smoke,main="Smoking Status By Socioeconomic Status")

There are a number of ways to get the marginal distributions using the margin.table command. If you just
give the command the table it calculates the total number of observations. You can also calculate the
marginal distributions across the rows or columns based on the one optional argument:

> margin.table(smoke)
[1] 356
> margin.table(smoke,1)

current  former   never
    116     141      99
> margin.table(smoke,2)

  High    Low Middle
   211     93     52

Combining these commands you can get the proportions:

> smoke/margin.table(smoke)

              High        Low     Middle
current 0.14325843 0.12078652 0.06179775
former  0.25842697 0.07865169 0.05898876
never   0.19101124 0.06179775 0.02528090
> margin.table(smoke,1)/margin.table(smoke)

  current    former     never
0.3258427 0.3960674 0.2780899
> margin.table(smoke,2)/margin.table(smoke)

     High       Low    Middle
0.5926966 0.2612360 0.1460674
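
As a cross-check, the margins are simply the row and column sums, and addmargins can append them to the table itself. This is a small sketch; the dimension names Smoke and SES are labels chosen here to match the columns of the data file, and the table is rebuilt so that the block runs on its own.

```r
# Rebuild the table from the counts above.
smoke <- as.table(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(Smoke = c("current", "former", "never"),
                                         SES   = c("High", "Low", "Middle"))))

# Margin 1 gives the row totals and margin 2 the column totals.
rowTotals <- margin.table(smoke, 1)
colTotals <- margin.table(smoke, 2)
all(as.vector(rowTotals) == rowSums(smoke))
all(as.vector(colTotals) == colSums(smoke))

# addmargins() appends the totals as an extra row and column named "Sum".
withTotals <- addmargins(smoke)
withTotals
```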

That is a little cumbersome. Fortunately, there is a better way to get the proportions using the prop.table
command. With no optional argument it divides by the grand total. With the optional argument you can
specify proportions with respect to the row totals (1) or the column totals (2):

> prop.table(smoke)

              High        Low     Middle
current 0.14325843 0.12078652 0.06179775
former  0.25842697 0.07865169 0.05898876
never   0.19101124 0.06179775 0.02528090
> prop.table(smoke,1)

             High       Low    Middle
current 0.4396552 0.3706897 0.1896552
former  0.6524823 0.1985816 0.1489362
never   0.6868687 0.2222222 0.0909091
> prop.table(smoke,2)

             High       Low    Middle
current 0.2417062 0.4623656 0.4230769
former  0.4360190 0.3010753 0.4038462
never   0.3222749 0.2365591 0.1730769
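
One way to keep the two margins straight is to check which sums come out to one. The sketch below rebuilds the table so that it runs on its own. The sweep call is shown only to illustrate what the division does, and is not required in practice.

```r
# Rebuild the table from the counts above.
smoke <- as.table(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(c("current", "former", "never"),
                                         c("High", "Low", "Middle"))))

byRow <- prop.table(smoke, 1)   # SES distribution within each smoking status
byCol <- prop.table(smoke, 2)   # smoking distribution within each SES group

rowSums(byRow)   # every row of byRow sums to 1
colSums(byCol)   # every column of byCol sums to 1

# The same row proportions computed by hand: divide each row by its total.
byRowManual <- sweep(smoke, 1, rowSums(smoke), "/")
all.equal(as.vector(byRow), as.vector(byRowManual))

# Percentages are often easier to read than raw proportions.
round(100 * byCol, 1)
```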

If you want to do a chi-squared test to determine if the proportions are different, there is an easy way to
do this. If we want to test at the 95% confidence level we need only look at a summary of the table:

> summary(smoke)
Number of cases in table: 356
Number of factors: 2
Test for independence of all factors:
        Chisq = 18.51, df = 4, p-value = 0.0009808

Since the p-value is less than 5% we can reject the null hypothesis at the 95% confidence level and can say
that the proportions vary.

Of course, there is a hard way to do this. This is not for the faint of heart and involves some linear algebra
which we will not describe. If you wish to calculate the table of expected values then you need to multiply
the vectors of the margins and divide by the total number of observations:

> expected <- as.array(margin.table(smoke,1)) %*% t(as.array(margin.table(smoke,2))) / margin.table(smoke)
> expected

            High      Low   Middle
current 68.75281 30.30337 16.94382
former  83.57022 36.83427 20.59551
never   58.67697 25.86236 14.46067

(The t function takes the transpose of the array.)

The result is in this array and can be directly compared to the existing table. We need the square of the
difference between the two tables divided by the expected values. The sum of all these values is the
Chi-squared statistic:

> chi <- sum((expected - as.array(smoke))^2/expected)
> chi
[1] 18.50974

We can then get the p-value for this statistic:

> 1-pchisq(chi,df=4)
[1] 0.0009808236
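
The hand calculation can be cross-checked against chisq.test, which returns the statistic, the degrees of freedom, the p-value, and the expected counts in one object. The sketch below rebuilds the table so that it runs on its own. It also uses outer as a shorter way to form the product of the margins, which is a substitution for the matrix product used above, not the same command.

```r
# Rebuild the table from the counts above.
smoke <- as.table(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(c("current", "former", "never"),
                                         c("High", "Low", "Middle"))))

# Built-in test of independence for the two-way table.
test <- chisq.test(smoke)
test$statistic   # chi-squared statistic, about 18.51
test$parameter   # degrees of freedom: (3 - 1) * (3 - 1) = 4
test$p.value     # about 0.00098
test$expected    # table of expected counts

# The same quantities by hand: expected count = row total * column total / n.
expected <- outer(rowSums(smoke), colSums(smoke)) / sum(smoke)
chi <- sum((smoke - expected)^2 / expected)

# Upper tail probability, equivalent to 1 - pchisq(chi, df = 4).
pchisq(chi, df = 4, lower.tail = FALSE)
```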

12.4. Graphical Views of Tables


The plot command will automatically produce a mosaic plot if its primary argument is a table.
Alternatively, you can call the mosaicplot command directly.

> smokerData <- read.csv(file='smoker.csv',sep=',',header=T)
> smoke <- table(smokerData$Smoke,smokerData$SES)
> mosaicplot(smoke)
> help(mosaicplot)

The mosaicplot command takes many of the same arguments for annotating a plot:

> mosaicplot(smoke,main="Smokers",xlab="Status",ylab="Economic Class")

If you wish to switch which side (horizontal versus vertical) is used to determine the primary proportion
then you can use the sort option. This can be used to switch whether the width or height is used for the
first proportional length:

> mosaicplot(smoke,main="Smokers",xlab="Status",ylab="Economic Class")
> mosaicplot(smoke,sort=c(2,1))

Finally, if you wish to switch which side is used for the vertical and horizontal axes you can use the dir option:

> mosaicplot(smoke,main="Smokers",xlab="Status",ylab="Economic Class")
> mosaicplot(smoke,dir=c("v","h"))
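
Two further variations may be useful. The sketch below rebuilds the table so that it runs without smoker.csv; the dimension names Smoke and SES are labels chosen here. Transposing the table with t is another way to swap which factor drives the first split, and the color and shade arguments of mosaicplot change how the tiles are filled.

```r
# Rebuild the table from the counts used throughout this chapter.
smoke <- as.table(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(Smoke = c("current", "former", "never"),
                                         SES   = c("High", "Low", "Middle"))))

# Fill the tiles with a default palette instead of leaving them blank.
mosaicplot(smoke, main = "Smokers", color = TRUE)

# Transpose so that SES is split first, and shade each tile by how far
# its count is from what independence of the two factors would give.
bySES <- t(smoke)
mosaicplot(bySES, main = "Smokers", shade = TRUE)
```

With shade=TRUE the plot carries some of the same information as the chi-squared test: strongly shaded tiles mark the cells that differ most from their expected counts.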