
Fall 2005

STATISTICS 579
R Tutorial : Vectors, Matrices, and Arrays

1. Creating Matrices:
Recall that a vector is an R object; matrix, array, and data frame are examples of other classes
of R objects. The R function matrix(), as used below, creates a 3 × 2 matrix object m from
three arguments: a data vector, the number of rows, and the number of columns, in that order.
> m=matrix(c(1.2,3.5,4.7,1.8,-6.4,5.3),3,2)
> m
[,1] [,2]
[1,] 1.2 1.8
[2,] 3.5 -6.4
[3,] 4.7 5.3
The arguments to an R function may be specified as named arguments, i.e., in the form
name=value, or simply by giving the values in the same order as in the function definition,
i.e., as positional arguments. For example, using the named form, the arguments to matrix()
can be given in a different order from above to obtain the same result:
> matrix(c(1.2,3.5,4.7,1.8,-6.4,5.3),ncol=2,nrow=3)
[,1] [,2]
[1,] 1.2 1.8
[2,] 3.5 -6.4
[3,] 4.7 5.3
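A minimal base-R sketch confirming that the positional and named calls above build the same matrix, and showing the byrow argument, which fills the matrix row by row instead of the default column by column (the object names x, m_pos, m_named, and m_byrow are illustrative):

```r
x <- c(1.2, 3.5, 4.7, 1.8, -6.4, 5.3)

# Positional arguments: data, nrow, ncol (the order matrix() declares them)
m_pos <- matrix(x, 3, 2)
# Named arguments may be supplied in any order
m_named <- matrix(x, ncol = 2, nrow = 3)
identical(m_pos, m_named)   # TRUE

# byrow = TRUE fills row by row instead of column by column
m_byrow <- matrix(x, nrow = 3, ncol = 2, byrow = TRUE)
m_byrow[1, ]   # 1.2 3.5
m_pos[1, ]     # 1.2 1.8
```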
Also recall that data read from an external file with scan() may be passed directly to matrix()
to create a data matrix, as we did earlier:
> insulin1=matrix(scan("insulin.data"),ncol=3,byrow=T)
Read 24 items
The functions dim() and dimnames() may be used to query or assign the dimension and
dimension-name attributes, respectively, of a matrix object, as shown below. Similarly, the
elements of a vector may be assigned names using the names() function.
> dim(m)
[1] 3 2
> dimnames(m)
NULL
> dimnames(m)=list(paste("Row",1:3),paste("Col",c(1,2)))
> m
Col 1 Col 2
Row 1 1.2 1.8
Row 2 3.5 -6.4
Row 3 4.7 5.3
> dimnames(m)
[[1]]
[1] "Row 1" "Row 2" "Row 3"

[[2]]
[1] "Col 1" "Col 2"

> h
[1] 15.1 11.3 7.0 9.0
> names(h)=c("APE","BOX","CAT","DOG")
> h
APE BOX CAT DOG
15.1 11.3 7.0 9.0
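Because the dimension is just an attribute, assigning to dim() turns a plain vector into a matrix whose elements are read in column order; rownames() and colnames() set one component of dimnames() at a time. The named vector h above could also be created in one step by naming the elements inside c(). A short sketch (the vector v is hypothetical illustration data):

```r
v <- 1:6
dim(v) <- c(2, 3)          # v is now a 2 x 3 matrix, filled column by column
is.matrix(v)               # TRUE
v[2, 3]                    # 6

rownames(v) <- c("a", "b") # sets the row component of dimnames(v)
colnames(v) <- c("x", "y", "z")
dimnames(v)[[2]]           # "x" "y" "z"

# Same named vector as h above, with the names given at creation
h <- c(APE = 15.1, BOX = 11.3, CAT = 7.0, DOG = 9.0)
names(h)
```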

There are several functions that help perform complex matrix operations. Two of these, row()
and col(), are introduced below, although their uses will be illustrated in examples that appear
later. The row() function, applied to a matrix, returns a matrix of integers giving the row
number of each element of the matrix; the returned matrix has the same dimensions as its
argument. Similarly, the col() function returns a matrix of column numbers.
> row(m)
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
> col(m)
[,1] [,2]
[1,] 1 2
[2,] 1 2
[3,] 1 2

Note carefully that these would be the same for any 3 × 2 matrix.
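As a small illustration of how row() and col() are typically combined (an example added here, not taken from the original; the matrix x is hypothetical), comparing the two matrices gives logical masks for the diagonal and the triangles of a square matrix:

```r
x <- matrix(1:9, nrow = 3)

# Elements on the main diagonal: row index equals column index
x[row(x) == col(x)]          # 1 5 9, same as diag(x)

# Strictly lower-triangular elements, in column order
x[row(x) > col(x)]           # 2 3 6, same as x[lower.tri(x)]

# Set everything above the diagonal to zero
x[row(x) < col(x)] <- 0L
x
```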

2. Matrix Operations:
A variety of operators (e.g., %*%) and functions (e.g., rbind(), solve(), diag(), etc.) are
available to extract information from matrix objects or to perform computations involving
matrix operations. The functions rbind() and cbind() append rows or columns, respectively,
to matrices:
> rm(m)
> mdata=c(1.2,3.5,4.7,1.8,-6.4,5.3)
> m=matrix(mdata,ncol=3,byrow=T);m
[,1] [,2] [,3]
[1,] 1.2 3.5 4.7
[2,] 1.8 -6.4 5.3
> m1=rbind(1:3,m);m1
[,1] [,2] [,3]
[1,] 1.0 2.0 3.0
[2,] 1.2 3.5 4.7
[3,] 1.8 -6.4 5.3
> m2=cbind(2,m1);m2
[,1] [,2] [,3] [,4]
[1,] 2 1.0 2.0 3.0
[2,] 2 1.2 3.5 4.7
[3,] 2 1.8 -6.4 5.3

The operator %*% performs matrix multiplication and requires its two operands to be
conformable, i.e., the number of columns of the first must equal the number of rows of the
second. The function t() transposes a matrix, and the function solve() may be used either to
find the inverse of a matrix or to solve a system of linear equations, as illustrated below:

> mm=m2%*%t(m2)
> mm
[,1] [,2] [,3]
[1,] 18.0 26.30 8.90
[2,] 26.3 39.78 8.67
[3,] 8.9 8.67 76.29
> m
[,1] [,2] [,3]
[1,] 1.2 3.5 4.7
[2,] 1.8 -6.4 5.3
> m%*%c(1,-1)
Error in m %*% c(1, -1) : non-conformable arguments
> solve(mm)
[,1] [,2] [,3]
[1,] 2.09544227 -1.3659267 -0.08922338
[2,] -1.36592672 0.9161643 0.05523140
[3,] -0.08922338 0.0552314 0.01723990
> solve(mm,c(10.3,24.5,36.7))
[1] -15.156647 10.403973 1.066873
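The previous command solves the linear system whose coefficient matrix is mm:
$$
\begin{aligned}
18.0\,x_1 + 26.30\,x_2 + 8.90\,x_3 &= 10.3\\
26.3\,x_1 + 39.78\,x_2 + 8.67\,x_3 &= 24.5\\
8.9\,x_1 + 8.67\,x_2 + 76.29\,x_3 &= 36.7
\end{aligned}
$$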
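A quick numerical check of the two uses of solve(), rebuilding mm from the commands above so the block runs on its own: the inverse times mm should be the identity up to rounding, and solve(mm, b) should agree with solve(mm) %*% b. Solving the system directly is generally preferred over forming the inverse first, since it does less work and tends to be more accurate. The names b, mm_inv, and x are illustrative.

```r
m  <- matrix(c(1.2, 3.5, 4.7, 1.8, -6.4, 5.3), ncol = 3, byrow = TRUE)
m2 <- cbind(2, rbind(1:3, m))
mm <- m2 %*% t(m2)            # same 3 x 3 matrix as above
b  <- c(10.3, 24.5, 36.7)

mm_inv <- solve(mm)           # inverse of mm
x      <- solve(mm, b)        # solves mm %*% x = b directly

all.equal(mm_inv %*% mm, diag(3))        # TRUE (up to rounding)
all.equal(as.vector(mm %*% x), b)        # TRUE
all.equal(x, as.vector(mm_inv %*% b))    # TRUE
```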
Several other linear algebra functions of interest are chol(), qr(), backsolve(), forwardsolve(),
and ginv() (the last is provided by the MASS package rather than base R). These are useful
for performing many statistical computations and will be discussed in other courses. As an
example, chol() computes the Cholesky factorization of a symmetric positive-definite matrix
$X$ into the form $X = R'R$, where $R$ is an upper triangular matrix. As an application,
this factorization can be used to generate random vectors from the $p$-variate multivariate
Normal distribution $y \sim N(\mu, \Sigma)$ using the relationship $y = \mu + R'z$, where
$z \sim N(0, I)$ and $\Sigma = R'R$; then $\mathrm{Cov}(y) = R' I R = \Sigma$. It is easy to
generate a sample from the $p$-variate multivariate Normal distribution $N(0, I)$: just
generate a random sample of size $p$ from the univariate standard Normal distribution.
> R=chol(mm)
> R
[,1] [,2] [,3]
[1,] 4.242641 6.198969 2.097750
[2,] 0.000000 1.163090 -3.726186
[3,] 0.000000 0.000000 7.616100
> t(R)%*%R
[,1] [,2] [,3]
[1,] 18.0 26.30 8.90
[2,] 26.3 39.78 8.67
[3,] 8.9 8.67 76.29
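A minimal sketch of the multivariate Normal generator described above, taking Σ = mm and a hypothetical mean vector mu; the seed and the number of draws n are also arbitrary choices. Because chol() returns the upper-triangular R with Σ = R'R, the transformation uses t(R); each column of z is one draw from N(0, I).

```r
set.seed(579)                          # arbitrary seed, for reproducibility
Sigma <- matrix(c(18.0, 26.30,  8.90,
                  26.3, 39.78,  8.67,
                   8.9,  8.67, 76.29), nrow = 3, byrow = TRUE)  # mm from above
mu <- c(1, -2, 0.5)                    # hypothetical mean vector
p  <- length(mu)
n  <- 100000                           # number of draws

R <- chol(Sigma)                       # upper triangular, Sigma = t(R) %*% R
z <- matrix(rnorm(p * n), nrow = p)    # each column is a draw from N(0, I)
y <- mu + t(R) %*% z                   # each column is a draw from N(mu, Sigma)

rowMeans(y)                            # close to mu
cov(t(y))                              # close to Sigma
```

Adding the length-p vector mu to the p × n matrix works column by column because R recycles mu down each column.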
The function det() calculates the determinant, and eigen() operates on a square matrix and
returns a list with two components: values, containing the eigenvalues, and vectors, a matrix
whose columns are the corresponding eigenvectors.
> det(mm)
[1] 1412.421

> eigen(mm)
$values
[1] 82.2943241 51.4420374 0.3336385

$vectors
[,1] [,2] [,3]
[1,] 0.2668772 -0.4810885 0.83506315
[2,] 0.3483415 -0.7597540 -0.54902828
[3,] 0.8985737 0.4374103 -0.03517792
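The eigen decomposition can be checked against the other results: the eigenvalues should sum to the trace of mm and multiply to det(mm), and since mm is symmetric, V diag(λ) V' should reconstruct mm. A base-R sketch that rebuilds mm so it runs on its own (the names e, lambda, and V are illustrative):

```r
mm <- matrix(c(18.0, 26.30,  8.90,
               26.3, 39.78,  8.67,
                8.9,  8.67, 76.29), nrow = 3, byrow = TRUE)
e <- eigen(mm)
lambda <- e$values        # eigenvalues, in decreasing order
V      <- e$vectors       # columns are the corresponding unit eigenvectors

all.equal(sum(lambda), sum(diag(mm)))                        # trace
all.equal(prod(lambda), det(mm))                             # determinant
all.equal(V %*% diag(lambda) %*% t(V), mm)                   # spectral decomposition
all.equal(as.vector(mm %*% V[, 1]), lambda[1] * V[, 1])      # first eigenpair
```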

The function diag() performs several operations depending on whether its argument is a
scalar, a vector, or a matrix. If the argument is a scalar, diag() returns an identity matrix
of that dimension; if the argument is a vector, it returns a diagonal matrix with the elements
of the vector as its diagonal elements. If the argument is a matrix, diag() returns a vector
containing the diagonal elements of the matrix.
> diag(4)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
> diag(h)
[,1] [,2] [,3] [,4]
[1,] 15.1 0.0 0 0
[2,] 0.0 11.3 0 0
[3,] 0.0 0.0 7 0
[4,] 0.0 0.0 0 9
> diag(mm)
[1] 18.00 39.78 76.29
> m
[,1] [,2] [,3]
[1,] 1.2 3.5 4.7
[2,] 1.8 -6.4 5.3
> diag(m)
[1] 1.2 -6.4
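One caution about the scalar case: when called with a single number and nothing else, diag() treats that number as a size, so diag(5) is a 5 × 5 identity matrix rather than a 1 × 1 matrix holding 5. Supplying nrow removes the ambiguity, and diag() can also appear on the left of an assignment to replace a diagonal. A short sketch (the matrix x is illustrative):

```r
dim(diag(5))              # 5 5: the scalar is read as a dimension
diag(5, nrow = 1)         # 1 x 1 matrix containing 5
diag(2, 3)                # 3 x 3 matrix with 2 on the diagonal

x <- matrix(0, 2, 2)
diag(x) <- c(7, 8)        # replace the diagonal in place
x
```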

3. Subscripting Vectors:
The elements of a vector can be extracted using an index vector enclosed in square brackets;
the index vector is then said to be used as a subscript. The use of subscripts to reference or
extract elements of vectors is illustrated below:
> hh
[1] 15.1 11.3 7.0 9.0 0.0 0.0 0.0 15.1 11.3 7.0 9.0
> hh[1:5]
[1] 15.1 11.3 7.0 9.0 0.0
> hh[c(1,5,8)]
[1] 15.1 0.0 15.1
> hh[-c(1,5,8)]
[1] 11.3 7.0 9.0 0.0 0.0 11.3 7.0 9.0
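The vector hh is not defined in this section, so the sketch below recreates it from the printed values in order to run on its own. It also shows three further subscripting rules: an index past the end gives NA, a named vector (h from Section 1) can be subscripted by name, and positive and negative indices cannot be mixed. The name res is illustrative.

```r
hh <- c(15.1, 11.3, 7.0, 9.0, 0.0, 0.0, 0.0, 15.1, 11.3, 7.0, 9.0)  # values printed above
hh[c(1, 5, 8)]          # 15.1 0.0 15.1
hh[-c(1, 5, 8)]         # everything except elements 1, 5 and 8
hh[20]                  # NA: index beyond the length of the vector

h <- c(APE = 15.1, BOX = 11.3, CAT = 7.0, DOG = 9.0)
h[c("CAT", "APE")]      # subscript by name: CAT 7.0, APE 15.1

# Mixing positive and negative indices is an error; try() captures it
res <- try(hh[c(-1, 2)], silent = TRUE)
inherits(res, "try-error")   # TRUE
```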

The use of negative subscripts causes all values except those specified in the index vector
to be extracted. Using logical values as indices is perhaps the most useful of all operations
involving subscripts. If an index vector of TRUE and FALSE values is used as a subscript, the
elements of the vector for which the subscript is TRUE are extracted. Such index vectors are
usually created by comparing the vector to a scalar using a comparison operator.
For example:
> hh>0
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
> hh[hh>0]
[1] 15.1 11.3 7.0 9.0 15.1 11.3 7.0 9.0
> attach(chickwts)
> weight
[1] 179 160 136 227 217 168 108 124 143 140 309 229 181 141 260 203 148 169 213
[20] 257 244 271 243 230 248 327 329 250 193 271 316 267 199 171 158 248 423 340
[39] 392 339 341 226 320 295 334 322 297 318 325 257 303 315 380 153 263 242 206
[58] 344 258 368 390 379 260 404 318 352 359 216 222 283 332
> wtmean=mean(weight)
> wtsd=sd(weight)
> outsiders=sum(weight<wtmean-2*wtsd|weight>wtmean+2*wtsd)
> outsiders
[1] 1
> weight[weight<wtmean-2*wtsd|weight>wtmean+2*wtsd]
[1] 423
> index=weight<wtmean-2*wtsd|weight>wtmean+2*wtsd
> index
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> weight[index]
[1] 423
> (1:length(weight))[index]
[1] 37
> seq(along=weight)[index]
[1] 37
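The function which() returns the positions of the TRUE elements of a logical vector directly, a shorter equivalent of the (1:length(weight))[index] and seq(along=weight)[index] idioms above; seq_along() is the dedicated form of seq(along=). A sketch that reads chickwts$weight instead of calling attach() (the name lims is illustrative):

```r
weight <- chickwts$weight              # avoids attach(chickwts)
lims   <- mean(weight) + c(-2, 2) * sd(weight)
index  <- weight < lims[1] | weight > lims[2]

which(index)                           # 37, as above
seq_along(weight)[index]               # same result
weight[which(index)]                   # 423
```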
> precip
Mobile Juneau Phoenix Little Rock
67.0 54.7 7.0 48.5
Los Angeles Sacramento
14.0 17.2
...
Cheyenne San Juan
14.6 59.2

> seq(along=precip)[min(precip)==precip]
[1] 3
> names(precip)[seq(along=precip)[min(precip)==precip]]
[1] "Phoenix"
> names(precip)[min(precip)==precip]
[1] "Phoenix"
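The functions which.min() and which.max() return the position of the first minimum or maximum and keep the element's name, so the Phoenix lookup above can be written more compactly. A sketch on the built-in precip vector:

```r
which.min(precip)              # named index of the smallest value
names(which.min(precip))       # "Phoenix"
precip[which.min(precip)]      # the value itself, with its name
names(which.max(precip))       # the wettest city in the data set
```

Unlike the min(precip)==precip comparison, which.min() reports only the first position when several elements tie for the minimum.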

4. Subscripting Matrices:
Subscripts or index vectors can be used to extract or replace elements, entire rows or columns,
and submatrices of matrix objects:
> mdata=c(1.2,3.5,4.7,1.8,-6.4,5.4,-1.9,2.7,3.4,-2.0,7.2,4.5)
> m=matrix(mdata,3,4,byrow=T)
> m
[,1] [,2] [,3] [,4]
[1,] 1.2 3.5 4.7 1.8
[2,] -6.4 5.4 -1.9 2.7
[3,] 3.4 -2.0 7.2 4.5
> m[,2]
[1] 3.5 5.4 -2.0
> m[2,3]
[1] -1.9
> m[2,2:3]
[1] 5.4 -1.9
> m[2:3,c(1,3)]
[,1] [,2]
[1,] -6.4 -1.9
[2,] 3.4 7.2
> m[2,2]=5.5
> m
[,1] [,2] [,3] [,4]
[1,] 1.2 3.5 4.7 1.8
[2,] -6.4 5.5 -1.9 2.7
[3,] 3.4 -2.0 7.2 4.5
Just as with a vector, comparing a matrix to a scalar creates a matrix of logical values
(TRUE or FALSE). This matrix can be used as a subscript to index the elements of the matrix
that correspond to the TRUE values. These elements may be extracted, or changed to new
values in place:
> m<0
[,1] [,2] [,3] [,4]
[1,] FALSE FALSE FALSE FALSE
[2,] TRUE FALSE TRUE FALSE
[3,] FALSE TRUE FALSE FALSE
> m[m<0]
[1] -6.4 -2.0 -1.9
> row(m)[m<0]
[1] 2 3 2
> col(m)[m<0]
[1] 1 2 3
> m[m<0]=0
> m
[,1] [,2] [,3] [,4]
[1,] 1.2 3.5 4.7 1.8
[2,] 0.0 5.5 0.0 2.7
[3,] 3.4 0.0 7.2 4.5
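With arr.ind = TRUE, which() returns the row and column positions of the TRUE elements as a two-column matrix, combining the row(m)[m<0] and col(m)[m<0] steps above into one call. The sketch uses m as it stood before the negative entries were set to 0 (the name idx is illustrative):

```r
m <- matrix(c( 1.2,  3.5,  4.7, 1.8,
              -6.4,  5.5, -1.9, 2.7,
               3.4, -2.0,  7.2, 4.5), nrow = 3, byrow = TRUE)
idx <- which(m < 0, arr.ind = TRUE)
idx          # columns "row" and "col": (2,1), (3,2), (2,3)
m[idx]       # a two-column index matrix can itself be used as a subscript
```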
One very useful way of extracting information from a large matrix is to use the row or column
names of the matrix, together with logical expressions, as subscripts. This is illustrated using
the R built-in data set state.x77, which is a matrix with 50 rows and 8 columns.
R contains several interrelated “state” data sets, all of which are loaded using the
data(state) command.
> help("state")
> data(state)
> colnames(state.x77)
[1] "Population" "Income" "Illiteracy" "Life Exp" "Murder"
[6] "HS Grad" "Frost" "Area"

The technique used previously to extract a subset of a vector may be extended to extract the
rows of a matrix that meet a specified condition. The first expression below extracts the rows
of the matrix state.x77 (which correspond to states) for which the value in the column named
Area (i.e., column 8) is greater than 80000. The second expression prints only the column
named Income for this subset of rows.
> state.x77[state.x77[,"Area"]>80000,]
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Idaho 813 4119 0.6 71.87 5.3 59.5 126 82677
Kansas 2280 4669 0.6 72.58 4.5 59.9 114 81787
Montana 746 4347 0.6 70.56 5.0 59.2 155 145587
Nevada 590 5149 0.5 69.03 11.5 65.2 188 109889
New Mexico 1144 3601 2.2 70.32 9.7 55.2 120 121412
Oregon 2284 4660 0.6 72.13 4.2 60.0 44 96184
Texas 12237 4188 2.2 70.90 12.2 47.4 35 262134
Utah 1203 4022 0.6 72.90 4.5 67.3 137 82096
Wyoming 376 4566 0.6 70.29 6.9 62.9 173 97203

> state.x77[state.x77[,"Area"]>80000,"Income"]
Alaska Arizona California Colorado Idaho Kansas Montana
6315 4530 5114 4884 4119 4669 4347
Nevada New Mexico Oregon Texas Utah Wyoming
5149 3601 4660 4188 4022 4566

As might be expected, R provides a function named subset() that performs this type of
operation on vectors, matrices, and data frames:
> subset(state.x77,state.x77[,"Area"]>80000)
> help(Orange)
> Orange
> subset(Orange,circumference<80)
Tree age circumference
1 1 118 30
2 1 484 58
8 2 118 33
. . . .
. . . .
29 5 118 30
30 5 484 49
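For matrices, subset() also accepts a select argument naming the columns to keep, so the two bracket expressions on state.x77 can be reproduced as follows. In this sketch, select = Income refers to the column by name, and the names big and inc are illustrative:

```r
big <- subset(state.x77, state.x77[, "Area"] > 80000)                  # all columns
inc <- subset(state.x77, state.x77[, "Area"] > 80000, select = Income)
dim(inc)       # 13 1: the default drop = FALSE keeps a one-column matrix
identical(inc[, "Income"],
          state.x77[state.x77[, "Area"] > 80000, "Income"])            # TRUE

# The same idea for a data frame (Orange), keeping two columns
subset(Orange, circumference < 80, select = c(Tree, age))
```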

5. Creating Arrays:
An array is a generalization of a matrix and may have one, two, three, or more dimensions.
Thus, the dim attribute of an array may have more than two elements. The function array()
may be used to create arrays. Below, we create an array a3 with 4 tiers (or faces, or slices)
of 2 × 3 matrices. In this example note how the two sets of indices correspond:
a3[1,1,1] <----- a[1]
a3[2,1,1] <----- a[2]
a3[1,2,1] <----- a[3]
a3[2,2,1] <----- a[4]
...........
a3[1,1,2] <----- a[7]
...........
a3[2,3,4] <----- a[24]

That is, the first index of a3 varies fastest and the last index varies slowest.
> a=seq(1,24)
> a3=array(a,dim=c(2,3,4))
> a3
, , 1

[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

, , 2

[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12

, , 3

[,1] [,2] [,3]
[1,] 13 15 17
[2,] 14 16 18

, , 4

[,1] [,2] [,3]
[1,] 19 21 23
[2,] 20 22 24
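The correspondence between a3 and a follows a general rule: for an array with dim = c(d1, d2, d3), element [i, j, k] is stored at position i + d1*(j - 1) + d1*d2*(k - 1) of the underlying vector. A short check of that rule for a3; the helper pos() and the grid g are illustrative names, not R functions:

```r
a  <- seq(1, 24)
a3 <- array(a, dim = c(2, 3, 4))
d  <- dim(a3)

# Position of a3[i, j, k] in the underlying vector (first index varies fastest)
pos <- function(i, j, k) i + d[1] * (j - 1) + d[1] * d[2] * (k - 1)

pos(1, 1, 2)                 # 7
pos(2, 3, 4)                 # 24

# Check every element: a 3-column index matrix subscripts the array directly
g <- expand.grid(i = 1:2, j = 1:3, k = 1:4)
all(a3[as.matrix(g)] == a[pos(g$i, g$j, g$k)])   # TRUE
```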

The following is a well-known multivariate data set, “iris3”, which is available in R as a
three-dimensional array:

> data(iris3)
> iris3
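iris3 is a 50 × 4 × 3 array: 50 flowers, 4 measurements, and 3 species. Calling apply() with MARGIN = c(2, 3) averages over the first index, giving a 4 × 3 matrix of measurement means by species. A sketch (the name means is illustrative):

```r
dim(iris3)                             # 50 4 3
dimnames(iris3)[[3]]                   # the three species
means <- apply(iris3, c(2, 3), mean)   # average over the 50 flowers
round(means, 3)
iris3[1:5, , 1]                        # first five flowers of the first species
```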
