
Lecture 5: Matrix Algebra


In Song Kim†

September 7, 2011

1 Matrix Algebra
1.1 Definition
• Matrix: A matrix is an array of mn real numbers arranged in m rows by n columns.
 
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

Vectors are special cases of matrices; a column vector of length k is a k × 1 matrix, while
a row vector of the same length is a 1 × k matrix. You can also think of larger matrices as
being made up of a collection of row or column vectors. For example, writing each column of
A as a vector,

$$A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}$$

1.2 Properties of Matrices

• Matrix Addition: Let A and B be two m × n matrices. Then

$$A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{bmatrix}$$

Matrices A and B must be the same size; matrices of the same size are said to be conformable for addition.

Example 1
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$
$$A + B = \begin{bmatrix} 1 & 3 & 3 \\ 5 & 6 & 7 \end{bmatrix}$$

Please do not distribute without permission.

† Ph.D. candidate, Department of Politics, Princeton University, Princeton NJ 08544. Email: insong@princeton.edu

• Scalar Multiplication: Given the scalar k, the scalar multiple kA is

$$kA = k \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} ka_{11} & ka_{12} & \cdots & ka_{1n} \\ ka_{21} & ka_{22} & \cdots & ka_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ ka_{m1} & ka_{m2} & \cdots & ka_{mn} \end{bmatrix}$$

Example 2
$$k = 3, \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
$$kA = \begin{bmatrix} 3 & 6 & 9 \\ 12 & 15 & 18 \end{bmatrix}$$

• Matrix Multiplication: If A is an m × k matrix and B is a k × n matrix, then their product
C = AB is the m × n matrix where

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ik}b_{kj}$$


   
Example 3

1. $$\begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} aA + bC & aB + bD \\ cA + dC & cB + dD \\ eA + fC & eB + fD \end{bmatrix}$$

2. $$\begin{bmatrix} 1 & 2 & -1 \\ 3 & 1 & 4 \end{bmatrix} \begin{bmatrix} -2 & 5 \\ 4 & -3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 1(-2) + 2(4) - 1(2) & 1(5) + 2(-3) - 1(1) \\ 3(-2) + 1(4) + 4(2) & 3(5) + 1(-3) + 4(1) \end{bmatrix} = \begin{bmatrix} 4 & -2 \\ 6 & 16 \end{bmatrix}$$

The number of columns of the first matrix must equal the number of rows of the second matrix;
if so, the matrices are conformable for multiplication. The sizes of the matrices (including the
resulting product) must be

$$(m \times k)(k \times n) = (m \times n)$$

NOTE HOW THE INSIDE DIMENSIONS ARE THE SAME!!
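A quick numerical check of the conformability rule, again a sketch assuming numpy, using the matrices of Example 3(2):

```python
import numpy as np

A = np.array([[1, 2, -1],
              [3, 1, 4]])   # 2 x 3
B = np.array([[-2, 5],
              [4, -3],
              [2, 1]])      # 3 x 2

print(A @ B)  # (2 x 3)(3 x 2) = (2 x 2): [[4 -2], [6 16]]
# A @ A would raise a ValueError: the inside dimensions (3 and 2) differ.
```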

 
Example 4
$$\begin{bmatrix} 1 & 0 & 3 \\ 2 & -1 & -2 \end{bmatrix} \begin{bmatrix} -2 & 4 & 2 \\ 1 & 0 & 0 \\ -1 & 1 & -1 \end{bmatrix}$$

  
Example 5
$$\begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}$$

 
Example 6
$$\begin{bmatrix} 3 & 2 & 0 \\ 1 & 2 & 1 \\ 3 & 4 & 1 \\ 0 & 5 & 2 \end{bmatrix} \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 2 & 1 & 0 \\ 2 & 1 & 1 & 1 \end{bmatrix}$$

1.3 Matrix Algebra Laws
• Laws of Matrix Algebra:

1. Associative: (A + B) + C = A + (B + C); (AB)C = A(BC)
2. Commutative: A + B = B + A
3. Distributive: A(B + C) = AB + AC; (A + B)C = AC + BC

• The commutative law for multiplication does not hold – the order of multiplication matters:

$$AB \neq BA$$

Have the class calculate the following, which is the reverse of Example 5 above:

$$\begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$$

Example 7
$$A = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$$
$$AB = \begin{bmatrix} 5 & 4 \\ 0 & 1 \end{bmatrix}, \qquad BA = \begin{bmatrix} 1 & 8 \\ 0 & 5 \end{bmatrix}$$

2 Transpose
Transpose: The transpose of the m × n matrix A is the n × m matrix A^T (sometimes written
A′) obtained by interchanging the rows and columns of A.

• Examples:

1. $$A = \begin{bmatrix} 4 & -1 & 2 \\ 1 & 5 & -2 \end{bmatrix}, \qquad A^T = \begin{bmatrix} 4 & 1 \\ -1 & 5 \\ 2 & -2 \end{bmatrix}$$

2. $$B = \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}, \qquad B^T = \begin{bmatrix} 2 & -3 & 1 \end{bmatrix}$$

3. As class exercise: $$A = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 3 & 4 & 4 & 2 \\ 2 & 2 & 2 & 1 \end{bmatrix}, \qquad A^T = \begin{bmatrix} 1 & 3 & 2 \\ 2 & 4 & 2 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$

• The following rules apply for transposed matrices:

1. (A + B)^T = A^T + B^T
2. (A^T)^T = A
3. (sA)^T = sA^T
4. (AB)^T = B^T A^T

Example 8
Example of (AB)^T = B^T A^T:

$$A = \begin{bmatrix} 1 & 3 & 2 \\ 2 & -1 & 3 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 \\ 2 & 2 \\ 3 & -1 \end{bmatrix}$$

$$(AB)^T = \left( \begin{bmatrix} 1 & 3 & 2 \\ 2 & -1 & 3 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 2 & 2 \\ 3 & -1 \end{bmatrix} \right)^T = \begin{bmatrix} 12 & 7 \\ 5 & -3 \end{bmatrix}$$

$$B^T A^T = \begin{bmatrix} 0 & 2 & 3 \\ 1 & 2 & -1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & -1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 12 & 7 \\ 5 & -3 \end{bmatrix}$$

Proposition 2.1
$$(AB)^T = B^T A^T$$

It is important to understand that Ax combines the columns of A while x^T A^T combines the
rows of A^T (show example). Many times in statistics we will make use of this sort of fact. If we
have a mathematical object x^T A^T that is multiplied by other terms, we might want to express
it as Ax in order to simplify it with some other part of the equation.
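A sketch (numpy assumed) checking Example 8's identity (AB)^T = B^T A^T; .T is numpy's transpose:

```python
import numpy as np

A = np.array([[1, 3, 2],
              [2, -1, 3]])
B = np.array([[0, 1],
              [2, 2],
              [3, -1]])

print((A @ B).T)                             # [[12 7], [5 -3]]
print(B.T @ A.T)                             # the same matrix
print(np.array_equal((A @ B).T, B.T @ A.T))  # True
```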

Example 9 Calculate A + 2B^T, AB, and BA using the following matrices.

$$A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 1 & 3 \end{bmatrix}, \qquad B = \begin{bmatrix} -2 & 5 \\ 4 & -3 \\ 2 & 1 \end{bmatrix}$$

3 Square Matrices
• Square matrices have the same number of rows and columns; a k × k square matrix is referred
to as a matrix of order k.

• The diagonal of a square matrix is the vector of matrix elements that have the same subscripts.
If A is a square matrix of order k, then its diagonal is [a_{11}, a_{22}, ..., a_{kk}]′.

• Trace: The trace of a square matrix A is the sum of the diagonal elements:

$$\operatorname{tr}(A) = a_{11} + a_{22} + \cdots + a_{kk}$$

Properties of the trace operator: If A and B are square matrices of order k, then

1. tr(A + B) = tr(A) + tr(B)
2. tr(A^T) = tr(A)
3. tr(sA) = s tr(A)
4. tr(AB) = tr(BA)

 
Example 10
$$A = \begin{bmatrix} 1 & 9 & 0 \\ 2 & 3 & 9 \\ 4 & 5 & -3 \end{bmatrix}, \qquad \operatorname{tr}(A) = 1 + 3 + (-3) = 1$$
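A sketch (numpy assumed) of the trace properties, using Example 10's matrix and an arbitrary second matrix of the same order:

```python
import numpy as np

A = np.array([[1, 9, 0],
              [2, 3, 9],
              [4, 5, -3]])
B = np.arange(9).reshape(3, 3)   # any square matrix of order 3 works here

print(np.trace(A))                         # 1 + 3 + (-3) = 1
print(np.trace(A.T) == np.trace(A))        # True: tr(A^T) = tr(A)
print(np.trace(A @ B) == np.trace(B @ A))  # True: tr(AB) = tr(BA)
```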

• Here are some important special cases of square matrices:

1. Symmetric Matrix: A matrix A is symmetric if A = A^T; this implies that a_{ij} = a_{ji} for all i and j.
Examples:
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} = A^T, \qquad B = \begin{bmatrix} 4 & 2 & -1 \\ 2 & 1 & 3 \\ -1 & 3 & 1 \end{bmatrix} = B^T$$
2. Diagonal Matrix: A matrix A is diagonal if all of its non-diagonal entries are zero; formally, if a_{ij} = 0 for all i ≠ j.
Examples:
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
3. Triangular Matrix: A matrix is triangular in one of two cases. If all entries below the diagonal are zero (a_{ij} = 0 for all i > j), it is upper triangular. Conversely, if all entries above the diagonal are zero (a_{ij} = 0 for all i < j), it is lower triangular.
Examples:
$$A_{LT} = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 2 & 0 \\ -3 & 2 & 5 \end{bmatrix}, \qquad A_{UT} = \begin{bmatrix} 1 & 7 & -4 \\ 0 & 3 & 9 \\ 0 & 0 & -3 \end{bmatrix}$$
4. Identity Matrix: The n × n identity matrix I_n is the matrix whose diagonal elements are 1 and all off-diagonal elements are 0. Examples:
$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

4 The Inverse of a Matrix

From elementary algebra, we define the inverse of a as 1/a, such that a · (1/a) = 1. THIS IS DIFFERENT
FROM THE INVERSE OF A FUNCTION!! A similar property is important in matrix algebra.

Definition 1 Identity Matrix: An n × n matrix with all values off the diagonal equal to 0 and all
on-diagonal values equal to 1.

Example 11
$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Definition 2 Inverse Matrix: An n × n matrix A is invertible if there exists an n × n matrix
A^{-1} such that
$$AA^{-1} = A^{-1}A = I_n$$
A^{-1} is the inverse of A. If there is no such A^{-1}, then A is noninvertible (another term for
this is "singular").

Example 12 Let
$$A = \begin{bmatrix} 2 & 3 \\ 2 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} -1 & \frac{3}{2} \\ 1 & -1 \end{bmatrix}$$
Since
$$AB = BA = I_n$$
we conclude that B is the inverse, A^{-1}, of A and that A is nonsingular.

Theorem 4.1 (Uniqueness of Inverse) The inverse of a matrix, if it exists, is unique. We
denote the unique inverse of A by A^{-1}.

Theorem 4.2 (Properties of Inverse) Let A and B be nonsingular n × n matrices.

1. AB is nonsingular and (AB)^{-1} = B^{-1}A^{-1}.

2. A^{-1} is nonsingular and (A^{-1})^{-1} = A.

3. (A^T)^{-1} = (A^{-1})^T.

One application of inverting a matrix is to solve a system of linear equations. In fact, matrices
can be motivated in terms of linear equations. Consider a set of m linear equations of the form

$$\begin{aligned} y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ &\;\;\vdots \\ y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{aligned}$$

Then, its matrix representation is Y = AX where

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \qquad X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$

We call A a coefficient matrix. With this notation, we can see that A^{-1} (provided that A is
nonsingular) solves this system, since we obtain X = A^{-1}Y by premultiplying the equation by A^{-1}.

Example 13 Confirm that
$$\begin{bmatrix} 2 & 3 \\ 2 & 2 \end{bmatrix}^{-1} = \begin{bmatrix} -1 & \frac{3}{2} \\ 1 & -1 \end{bmatrix}$$
and solve the following system of linear equations by using the inverse of the matrix:
$$\begin{aligned} 2x_1 + 3x_2 &= 1 \\ 2x_1 + 2x_2 &= 2 \end{aligned}$$
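A numerical sketch of Example 13 (numpy assumed); np.linalg.solve solves the system directly, which is equivalent to premultiplying by the inverse:

```python
import numpy as np

A = np.array([[2, 3],
              [2, 2]])
y = np.array([1, 2])

print(np.linalg.inv(A))       # [[-1.  1.5], [ 1. -1. ]]
print(np.linalg.inv(A) @ y)   # x = A^{-1} y = [ 2. -1.]
print(np.linalg.solve(A, y))  # same solution, computed more stably
```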
Class examples:

Show that B is the inverse of A:
$$A = \begin{bmatrix} -1 & 2 \\ -1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & -2 \\ 1 & -1 \end{bmatrix}$$

Find the inverse of the matrix
$$A = \begin{bmatrix} 1 & 4 \\ -1 & -3 \end{bmatrix}$$
 
• It turns out that there is a shortcut for 2 × 2 matrices that are invertible: if
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$
then
$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
Show this.
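The shortcut is easy to code. A minimal sketch (the helper name inv2x2 is mine, not from the notes), applied to the class example above:

```python
import numpy as np

def inv2x2(A):
    """Invert a 2x2 matrix with the 1/(ad - bc) shortcut."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is singular")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1, 4],
              [-1, -3]])
print(inv2x2(A))      # [[-3. -4.], [ 1.  1.]]
print(inv2x2(A) @ A)  # the 2x2 identity
```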

5 Finding the inverse of a matrix

• We know that if B is the inverse of A, then

$$AB = BA = I_n$$

Looking only at the first and last parts of this,

$$AB = I_n$$

Solving for B is equivalent to solving n linear systems, where each column of B is solved for
the corresponding column of I_n. In performing Gauss-Jordan elimination for each individual
system, the same row operations will be performed on A regardless of the column of B and
I_n. Hence, we can solve the systems simultaneously by augmenting A with I_n and performing
Gauss-Jordan elimination on A. Note that for the square matrix A, Gauss-Jordan elimination
should result in A becoming row equivalent to I_n. Therefore, if Gauss-Jordan elimination on
[A|I_n] results in [I_n|B], then B is the inverse of A. Otherwise, A is singular.

To summarize: to calculate the inverse of A,

1. Form the augmented matrix [A|I_n].
2. Using elementary row operations, transform the augmented matrix to reduced row echelon form.
3. The result of step 2 is an augmented matrix [C|B].
(a) If C = I_n, then B = A^{-1}.
(b) If C ≠ I_n, then C has a row of zeros; A is singular and A^{-1} does not exist.
 
Exercise 1 Find the inverse of
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{bmatrix}$$
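A sketch of the Gauss-Jordan procedure in numpy (my own illustrative implementation, with partial pivoting added for numerical stability), applied to Exercise 1:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Row-reduce [A | I]; if A reduces to I, the right block is A^{-1}."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("A is singular")
        M[[col, pivot]] = M[[pivot, col]]              # swap rows
        M[col] /= M[col, col]                          # scale pivot row to 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]         # zero out the column
    return M[:, n:]

A = np.array([[1, 1, 1],
              [0, 2, 3],
              [5, 5, 1]])
print(gauss_jordan_inverse(A))
print(np.allclose(gauss_jordan_inverse(A) @ A, np.eye(3)))  # True
```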

6 Properties of matrices
In our discussion of systems of equations we said that there were three possible cases: one solution
(intersecting lines or planes), no solutions (parallel lines or planes), or infinite solutions (identical
lines or planes).
In matrix algebra we can identify which case a system falls into by examining certain mathematical
properties of its matrix. There are three key pieces of information:

1. the number of equations m,

2. the number of unknowns n,

3. the rank of the matrix representing the linear system.

Definition 3 Rank: The rank of a matrix is the number of nonzero rows in its row echelon form.
The rank corresponds to the maximum number of linearly independent row or column vectors in
the matrix.

Properties of rank: For an i × j matrix A, rank(A) ≤ min(i, j); rank(AB) ≤ min(rank A, rank B).

For example, the matrix
$$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 0 \end{bmatrix}$$
is in row echelon form with two nonzero rows, so its rank is 2.

• Existence of Solutions:

1. One solution: A necessary condition for a system to have a unique solution is that there
are exactly as many equations as unknowns.

2. Infinite solutions: If a system has a solution and has more unknowns than equations,
then it has infinitely many solutions.
3. No solution: In RREF there is a row whose coefficients are all 0 but whose constant term is nonzero.

ASIDE: You will often hear of a "degrees of freedom" problem in statistical analyses.
At the extreme, this is a problem where you have more independent variables than observations.
In this case, there does not exist a solution and you will not get anything out of your computer
software that performs the estimation. More generally, consider the equation 10 + 3 + 5 + x + y = 7.
This has one degree of freedom: once we fix x, y is no longer "free to vary".

• Find the rank and solution to each system of equations.

1.
x − 3y = −3
2x + y = 8

2.
x + 2y + 3z = 6
2x − 3y + 2z = 14
3x + y − z = −2

3.
x + 2y − 3z = −4
2x + y − 3z = 4

4.
x + 2y + 3z + 4w = 5
x + 3y + 5z + 7w = 11
x − z − 2w = −6
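numpy can report the rank directly, which is a quick way to check your hand reductions. A sketch for systems 2 and 3 above (numpy assumed):

```python
import numpy as np

# System 2: three equations, three unknowns
A2 = np.array([[1, 2, 3],
               [2, -3, 2],
               [3, 1, -1]])
b2 = np.array([6, 14, -2])
print(np.linalg.matrix_rank(A2))  # 3 = number of unknowns: unique solution
print(np.linalg.solve(A2, b2))    # [ 1. -2.  3.]

# System 3: two equations, three unknowns
A3 = np.array([[1, 2, -3],
               [2, 1, -3]])
print(np.linalg.matrix_rank(A3))  # 2 < 3 unknowns: infinitely many solutions
```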

7 Solving and Simplifying Linear Systems with Inverses


In elementary algebra we often multiplied by inverses to simplify equations. We will also do this
often with matrix algebra.

• Matrix representation of a linear system:

$$Ax = b$$

If A is an n × n matrix, then Ax = b is a system of n equations in n unknowns.

If A is invertible, then A^{-1} exists. To solve this system, we can left-multiply each side by A^{-1}
(must be left multiplication on both sides!!):

$$\begin{aligned} A^{-1}(Ax) &= A^{-1}b \\ (A^{-1}A)x &= A^{-1}b \\ I_n x &= A^{-1}b \\ x &= A^{-1}b \end{aligned}$$

Hence, given A and b, and given that A is nonsingular, x = A^{-1}b is the unique solution to this
system.

• Notice also that the requirements for A to be nonsingular correspond to the requirements for
a linear system to have a unique solution: rank(A) = rows(A) = columns(A).

8 Determinants
Finding the inverse of a particular matrix can be a pain, though there are some shortcuts you could
use that we won't cover here. Alternatively, we can make use of another property, the determinant.
If the determinant is non-zero then the matrix is invertible.

• For a 1 × 1 matrix A = [a], the determinant is simply a. We want the determinant to equal
zero exactly when the inverse does not exist, and the inverse of a, namely 1/a, does not exist
when a = 0. So we define

$$|a| = a$$

• For a 2 × 2 matrix
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$
the determinant is defined as
$$\det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} = a_{11}|a_{22}| - a_{12}|a_{21}|$$

Notice that with matrices the vertical bars mean the determinant, not the absolute value.

9
8.1 Extension to larger matrices
Things get a bit more cumbersome once we start dealing with larger matrices. In particular:

• 1. Determinant of a 3 × 3 matrix:

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$

2. Determinant of an n × n matrix: Let A_{ij} be the (n − 1) × (n − 1) submatrix of A obtained
by deleting row i and column j. Let the (i, j)th minor of A be

$$M_{ij} = |A_{ij}|$$

Then for any n × n matrix A,

$$|A| = a_{11}M_{11} - a_{12}M_{12} + \cdots + (-1)^{n+1}a_{1n}M_{1n}$$

• Example: Does the following matrix have an inverse?

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{bmatrix}$$

1. Calculate its determinant.

$$\begin{aligned} |A| &= 1\begin{vmatrix} 2 & 3 \\ 5 & 1 \end{vmatrix} - 1\begin{vmatrix} 0 & 3 \\ 5 & 1 \end{vmatrix} + 1\begin{vmatrix} 0 & 2 \\ 5 & 5 \end{vmatrix} \\ &= 1(2 - 15) - 1(0 - 15) + 1(0 - 10) \\ &= -13 + 15 - 10 \\ &= -8 \end{aligned}$$

2. Since |A| ≠ 0, we conclude that A has an inverse.

There is nothing complicated here, just keep track of all the addition and subtraction, etc.

Definition 4 Let A be an n × n matrix.

1. Let M_{ij} be the (n − 1) × (n − 1) submatrix of A obtained by deleting the ith row and jth
column of A. Then, |M_{ij}| is called the (i, j)th minor of A.

2. The cofactor C_{ij} of A is defined as C_{ij} = (−1)^{i+j}|M_{ij}|.

Now, the following theorem gives us a new method to compute determinants.

Theorem 8.1 (Cofactor Expansion) Let A be an n × n matrix. Then, for any row i,

$$|A| = \sum_{j=1}^{n} a_{ij}C_{ij},$$

where a_{ij} is the (i, j)th element of A.
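Cofactor expansion translates directly into a recursive function. A sketch (numpy assumed; det_cofactor is my illustrative name) that expands along the first row, as in the 3 × 3 formula above:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # M_{1j}
        total += (-1) ** j * A[0, j] * det_cofactor(minor)     # a_{1j} C_{1j}
    return total

A = np.array([[1, 1, 1],
              [0, 2, 3],
              [5, 5, 1]])
print(det_cofactor(A))   # -8, matching the hand calculation above
print(np.linalg.det(A))  # same value, up to floating point
```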

8.2 Special matrices and their determinants
• Triangular or Diagonal Matrices: For any upper-triangular, lower-triangular, or diagonal
matrix, the determinant is just the product of the diagonal terms.

Suppose we have the following square matrix in row echelon form (i.e., upper triangular):

$$R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{bmatrix}$$

Then

$$|R| = r_{11}\begin{vmatrix} r_{22} & r_{23} \\ 0 & r_{33} \end{vmatrix} = r_{11}r_{22}r_{33}$$

Notice how the 0 terms let us cancel things out. Think about this in the diagonalized case!!

8.3 Properties of determinants

• Properties of Determinants:

1. |A| = |A^T|
2. Interchanging rows changes the sign of the determinant.
3. If two rows of A are exactly the same, then det A = 0.
4. If a row of A has all 0's, then det A = 0.
5. If c ≠ 0 is some constant and A is n × n, then |cA| = c^n|A|.
6. Performing elementary row operations on a matrix does not change the determinant
other than by a scalar factor.
7. A square matrix is nonsingular iff its determinant is ≠ 0.
8. |AB| = |A||B|
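A sketch (numpy assumed) spot-checking a few of these properties on small matrices:

```python
import numpy as np

A = np.array([[1., 3.],
              [-1., 2.]])
B = np.array([[2., 1.],
              [1., 1.]])

print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True: |AB| = |A||B|
print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))  # True: |A| = |A^T|
print(np.linalg.det(3 * A))  # c^n |A| = 3^2 * 5 = 45, illustrating property 5
```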

8.4 Bringing it together: Relationship between rank, inverse, determinant, and solutions

Consider the following matrix equation that uses a non-square matrix with more rows than
columns, i.e., we have more observations than we have variables:

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} X = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \qquad \text{where } X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

We want to find X = (x_1, x_2)^T to make this work. This matrix could be reduced using elementary
row operations to zero out the last row. Thus, we would get a new matrix where b_3 would
have been transformed to something else by the application of the row operations that zero out
a_{31} and a_{32}. Let's call this new value of b_3, s + b_3, where s is just some scalar.

   
Then we have:

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ 0 & 0 \end{bmatrix} X = \begin{bmatrix} b_1 \\ b_2 \\ s + b_3 \end{bmatrix}$$

Performing matrix multiplication we obtain the following system of equations:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1 \\ a_{21}x_1 + a_{22}x_2 &= b_2 \\ 0x_1 + 0x_2 &= s + b_3 \end{aligned}$$

Now notice that no matter what we put into the equation for x_1 and x_2, we could never satisfy the
third equation (unless s + b_3 = 0, which means that no information at all is in the row
and it should have just been dropped from the matrix in the first place). Thus we have no solutions
to this system of equations.

Consider now the case where we have more variables than observations, i.e., m < n:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} X = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \qquad \text{where } X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
x3
Matrix multiplication gives us:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2 \end{aligned}$$

Elementary row operations will allow us to zero out two of the entries in the matrix, and in doing
so will put new coefficients on the a terms and add something to the b terms. So let's just use
new variables to represent these transformed variables:

$$\begin{aligned} c_{11}x_1 + 0x_2 + c_{13}x_3 &= d_1 \\ 0x_1 + c_{22}x_2 + c_{23}x_3 &= d_2 \end{aligned}$$

Now notice that there is nothing we could do to zero out an entry in the third column without
losing one of the zeros we just obtained. Thus the best we can do is to rewrite things as:

$$\begin{aligned} c_{11}x_1 &= d_1 - c_{13}x_3 \\ c_{22}x_2 &= d_2 - c_{23}x_3 \end{aligned}$$

We can then manipulate these to give us:

$$x_1 = \frac{d_1}{c_{11}} - \frac{c_{13}}{c_{11}}x_3, \qquad x_2 = \frac{d_2}{c_{22}} - \frac{c_{23}}{c_{22}}x_3$$

And here we see why we have an infinite number of solutions. Consider the first equation,
x_1 = d_1/c_{11} − (c_{13}/c_{11})x_3. We have the ratio c_{13}/c_{11} in this expression. Let's say the
value of this ratio that we needed to solve the equation was c_{13}/c_{11} = 5. But we can get a 5
from this ratio in an infinite number of ways, e.g., c_{11} = 2 and c_{13} = 10 gives 10/2 = 5. Thus
there are an infinite number of values for c_{11} and c_{13} that would solve our system of equations.

Assuming that our matrix A is in reduced echelon form: when m > n we have no solution and
our rank is less than the number of rows, i.e., rank(A) < rows(A). When m < n we have an
infinite number of solutions and rank(A) < columns(A). Only when m = n do we have a single
solution, with rank(A) = rows(A) = columns(A). Furthermore, only when rank(A) = rows(A) =
columns(A) is the matrix invertible.
