
umap

Modules and Monographs in


Undergraduate Mathematics
and its Applications Project

ELEMENTS OF
THE THEORY
OF GENERALIZED
INVERSES FOR
MATRICES
Randall E. Cline
Mathematics Department
University of Tennessee
Knoxville, Tennessee 37916

The Project acknowledges Robert M. Thrall,


Chairman of the UMAP Monograph Editorial
Board, for his help in the development and
review of this monograph.

Modules and Monographs in Undergraduate Mathematics


and its Applications Project
The goal of UMAP is to develop, through a community
of users and developers, a system of instructional modules
and monographs in undergraduate mathematics which may
be used to supplement existing courses and from which
complete courses may eventually be built.
The Project is guided by a National Steering Committee
of mathematicians, scientists, and educators. UMAP is funded
by a grant from the National Science Foundation to Education
Development Center, Inc., a publicly supported, nonprofit
corporation engaged in educational research in the U.S. and
abroad.
The Project acknowledges the help of the Monograph
Editorial Board in the development and review of this
monograph. Members of the Monograph Editorial Board
include: Robert M. Thrall, Chairman, of Rice University;
Clayton Aucoin, Clemson University; James C. Frauenthal,
SUNY at Stony Brook; Helen Marcus-Roberts, Montclair
State College; Ben Noble, University of Wisconsin; Paul C.
Rosenbloom, Columbia University. Ex-officio members:
Michael Anbar, SUNY at Buffalo; G. Robert Boynton,
University of Iowa; Kenneth R. Rebman, California State
University; Carroll O. Wilde, Naval Postgraduate School;
Douglas A. Zahn, Florida State University.
The Project wishes to thank Thomas N.E. Greville of
the University of Wisconsin at Madison for his review of
this manuscript and Edwina R. Michener, UMAP Editorial
Consultant, for her assistance.
Project administrative staff: Ross L. Finney, Director;
Solomon Garfunkel, Associate Director/Consortium Coordinator;
Felicia DeMay, Associate Director for Administration; Barbara
Kelczewski, Coordinator for Materials Production.

ISBN-13: 978-0-8176-3013-3
DOI: 10.1007/978-1-4684-6717-8

e-ISBN-13: 978-1-4684-6717-8

Copyright 1979 by Education Development Center, Inc. All rights reserved.


No part of this publication may be
reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without
the prior written permission of the publisher.

TABLE OF CONTENTS

Preface

1. INTRODUCTION
   1.1 Preliminary Remarks
   1.2 Matrix Notation and Terminology
   1.3 A Rationale for Generalized Inverses

2. SYSTEMS OF EQUATIONS AND THE MOORE-PENROSE INVERSE OF A MATRIX
   2.1 Zero, One or Many Solutions of Ax = b
   2.2 Full Rank Factorizations and the Moore-Penrose Inverse
   2.3 Some Geometric Illustrations
   2.4 Miscellaneous Exercises

3. MORE ON MOORE-PENROSE INVERSES
   3.1 Basic Properties of A^+
   3.2 Applications with Matrices of Special Structure
   3.3 Miscellaneous Exercises

4. DRAZIN INVERSES
   4.1 The Drazin Inverse of a Square Matrix
   4.2 An Extension to Rectangular Matrices
   4.3 Expressions Relating Ad and A^+
   4.4 Miscellaneous Exercises

5. OTHER GENERALIZED INVERSES
   5.1 Inverses That Are Not Unique

APPENDIX 1: Hints for Certain Exercises
APPENDIX 2: Selected References

Index to Principal Definitions

Preface

The purpose of this monograph is to provide a concise introduction to the theory of generalized inverses of matrices that is accessible to undergraduate mathematics majors. Although results from this active area of research have appeared in a number of excellent graduate level textbooks since 1971, material for use at the undergraduate level remains fragmented. The basic ideas are so fundamental, however, that they can be used to unify various topics that an undergraduate has seen but perhaps not related.

Material in this monograph was first assembled by the author as lecture notes for the senior seminar in mathematics at the University of Tennessee. In this seminar one meeting per week was for a lecture on the subject matter, and another meeting was to permit students to present solutions to exercises. Two major problems were encountered the first quarter the seminar was given. These were that some of the students had had only the required one-quarter course in matrix theory and were not sufficiently familiar with eigenvalues, eigenvectors and related concepts, and that many of the exercises required fortitude. At the suggestion of the UMAP Editor, the approach in the present monograph is (1) to develop the material in terms of full rank factorizations and to relegate all discussions using eigenvalues and eigenvectors to exercises, and (2) to include an appendix of hints for exercises. In addition, it was suggested that the order of presentation be modified to provide some motivation for considering generalized inverses before developing the algebraic theory. This has been accomplished by introducing the Moore-Penrose inverse of a matrix and immediately exploring its use in characterizing particular solutions to systems of equations before establishing many of its algebraic properties.

To prepare a monograph of limited length for use at the undergraduate level precludes giving extensive references to original sources. Most of the material can be found in texts such as Ben-Israel and Greville [2] or Rao and Mitra [11].

Every career is always influenced by colleagues. The author wishes to express his appreciation particularly to T.N.E. Greville, L.D. Pyle and R.M. Thrall for continuing encouragement and availability for consultation.

Randall E. Cline
Knoxville, Tennessee
September 1978

1
Introduction

1.1 Preliminary Remarks

The material in this monograph requires a knowledge of basic matrix theory available in excellent textbooks such as Halmos [7], Noble [9] or Strang [12]. Fundamental definitions and concepts are used without the detailed discussion which would be included in a self-contained work. Therefore, it may be helpful to have a standard linear algebra textbook for reference if needed.

Many examples and exercises are included to illustrate and complement the topics discussed in the text. It is recommended that every exercise be attempted. Although perhaps not always successful, the challenge of distinguishing among what can be assumed, what is known and what must be shown is an integral part of the development of the nebulous concept called mathematical maturity.

1.2 Matrix Notation and Terminology

Throughout subsequent sections capital Latin letters denote matrices and small Latin letters denote column vectors. Unless otherwise stated, all matrices (and thus vectors--being matrices having a single column) are assumed to have complex numbers as elements. Also, sizes of matrices are assumed to be arbitrary, subject to conformability in sums and products. For example, writing A+B tacitly assumes A and B have the same size, whereas AB implies that A is m by n and B is n by p for some m, n and p. (Note, however, that even with AB defined, BA is defined if and only if m = p.) The special symbols I and 0 are used to denote the n by n identity matrix and the m by n null matrix, respectively, with sizes determined by the context when no subscripts are used. If it is important to emphasize size we will write I_n or 0_mn.

For any A = (a_ij), the conjugate transpose of A is written as A^H. Thus A^H = (a̅_ji), where a̅_ji denotes the conjugate of the complex scalar a_ji, and if x is a column vector with components x_1, ..., x_n, then x^H is the row vector [x̅_1, ..., x̅_n]. Consequently, for a real matrix (vector) the superscript "H" denotes transpose.

Given vectors x and y, we write the inner product of x and y as

        (x,y) = Σ_{i=1}^{n} x_i y̅_i.

Since only Euclidean norms will be considered, we write ||x|| without a subscript to mean

        ||x|| = +√(x,x) = +√(Σ_{i=1}^{n} |x_i|²).

To conclude this section it is noted that there are certain concepts in the previously cited textbooks which are either used implicitly or discussed in a manner that does not emphasize their importance for present purposes. Although sometimes slightly redundant, the decision to include such topics was based upon the desire to stress fundamental understanding.
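The notation above can be tried out numerically; the short NumPy sketch below, with arbitrarily chosen vectors, forms a conjugate transpose, the inner product (x,y) and the Euclidean norm.

    import numpy as np

    x = np.array([[1.0], [2 + 1j], [-3j]])          # a column vector with complex elements
    y = np.array([[2.0], [-1j], [1.0]])

    xH = x.conj().T                                  # the conjugate transpose x^H, a row vector
    inner_xy = (y.conj().T @ x).item()               # (x,y) = sum_i x_i * conj(y_i)
    norm_x = np.sqrt((x.conj().T @ x).item().real)   # ||x|| = +sqrt((x,x))

    print(xH)
    print(inner_xy, norm_x, np.linalg.norm(x))       # np.linalg.norm agrees with ||x||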
Exercises

1.1 Let x be any m-tuple and y be any n-tuple.

    a. Form xy^H and yx^H.
    b. Suppose m = n and that neither x nor y is the zero vector. Prove that xy^H is Hermitian if and only if y = ax for some real scalar a.

1.2 Let A be any m by n matrix with rows w_1^H, ..., w_m^H and columns x_1, ..., x_n, and let B be any n by p matrix with rows y_1^H, ..., y_n^H and columns z_1, ..., z_p.

    a. Prove that the product AB can be written as

            AB = [(z_1,w_1)  ...  (z_p,w_1)]
                 [   ...             ...   ]
                 [(z_1,w_m)  ...  (z_p,w_m)]

       and also as

            AB = Σ_{i=1}^{n} x_i y_i^H.

    b. Prove that A = 0 if and only if either A^H A = 0 or AA^H = 0.
    c. Show that BA^H A = CA^H A for any matrices A, B and C implies BA^H = CA^H.
*1.3 Let A be any normal matrix with eigenvalues λ_1, ..., λ_n and orthonormal eigenvectors x_1, ..., x_n.

    a. Show that A can be written as

            A = Σ_{i=1}^{n} λ_i x_i x_i^H.

    b. If E_i = x_i x_i^H, i = 1, ..., n, show that E_i is Hermitian and idempotent, and that E_i E_j = E_j E_i = 0 for all i ≠ j.
    c. Use the expression for A in 1.3a and the result of 1.3b to conclude that A is Hermitian if and only if all eigenvalues λ_i are real.

*Exercises or portions of exercises designated by an asterisk assume a knowledge of eigenvalues and eigenvectors.

1.3 A Rationale for Generalized Inverses

Given a square matrix, A, the existence of a matrix, X, such that AX = I is but one of many equivalent necessary and sufficient conditions that A is nonsingular. (See Exercise 1.4.) In this case X = A^{-1} is the unique two-sided inverse of A, and x = A^{-1}b is the unique solution of the linear algebraic system of equations Ax = b for every right-hand side b. Loosely speaking, the theory of generalized inverses of matrices is concerned with extending the concept of an inverse of a square nonsingular matrix to singular matrices and, more generally, to rectangular matrices by considering various sets of equations which A and X may be required to satisfy. For this purpose we will use combinations of the following five fundamental equations:

(1.1)        A^k X A = A^k,  for some positive integer k,
(1.2)        X A X = X,
(1.3)        (A X)^H = A X,
(1.4)        (X A)^H = X A,
(1.5)        A X = X A.

(It should be noted that (1.1) with k > 1 and (1.5) implicitly assume A and X are square matrices, whereas (1.1) with k = 1, (1.2), (1.3), and (1.4) require only that X has the size of A^H. Also, observe that all of the equations clearly hold when A is square and nonsingular, and X = A^{-1}.) Given A and subsets of Equations (1.1)-(1.5), it is logical to ask whether a solution X exists, is it unique, how can it be constructed and what properties does it have? These are the basic questions to be explored in subsequent chapters.
In Chapter 2 we establish the existence and uniqueness
of a particular generalized inverse of any matrix A (to be
called the Moore-Penrose inverse of A), and show how this
inverse can be used to characterize the minimal norm or least
squares solutions to systems of equations Ax = b when A has
full row rank or full column rank. This inverse is then
further explored in Chapter 3 where many of its properties
are derived and certain applications discussed. In Chapter 4
we consider another unique generalized inverse of square
matrices A (called the Drazin inverse of A), and relate this
inverse to Moore-Penrose inverses. The concluding chapter is
to provide a brief introduction to the theory of generalized
inverses that are not unique.
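As a quick numerical illustration of the remark that all five equations hold for a square nonsingular matrix, the following NumPy sketch (the matrix is arbitrary) checks each of (1.1)-(1.5) with X = A^{-1}.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))        # almost surely nonsingular
    X = np.linalg.inv(A)

    checks = {
        "(1.1), k=2: A^2 X A = A^2": np.allclose(A @ A @ X @ A, A @ A),
        "(1.2): X A X = X":          np.allclose(X @ A @ X, X),
        "(1.3): (AX)^H = AX":        np.allclose((A @ X).conj().T, A @ X),
        "(1.4): (XA)^H = XA":        np.allclose((X @ A).conj().T, X @ A),
        "(1.5): AX = XA":            np.allclose(A @ X, X @ A),
    }
    print(checks)                          # every entry should be True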
Exercises

1.4 For any A, let N(A) denote the null space of A, that is,

        N(A) = {z | Az = 0}.

    a. If A is an n by n matrix, show that the following conditions are equivalent:
       (i)   A is nonsingular,
       (ii)  N(A) contains only the null vector,
       (iii) rank(A) = n,
       (iv)  A has a right inverse,
       (v)   Ax = b has a unique solution for every right-hand side b.
    b. What other equivalent statements can be added to this list?

1.5 Let

        A_4 = [1  2  1  1]
              [1  1  2  1]
              [1  1  1  2]
              [1  1  1  1] .

    a. Show that

            X = [-1  -1  -1   4]
                [ 1   0   0  -1]
                [ 0   1   0  -1]
                [ 0   0   1  -1]

       is A_4^{-1}.
    b. If A_5 is the five by five matrix obtained by extending A_4 in the obvious manner, that is, A_5 = (a_ij) where

            a_ij = { 2, if i = j-1,
                   { 1, otherwise,

       form A_5^{-1}. More generally, given A_n of this form for any n ≥ 2, what is A_n^{-1}?
    c. Prove that A_n is unimodular for all n ≥ 2, that is, |det A_n| = 1.
    d. Show that any system of equations A_n x = b with b integral has an integral solution x.

1.6 Let x be any vector with ||x|| = 1 and let k be any real number.

    a. Show that A = I + k x x^H is nonsingular for all k ≠ -1.
    b. Given the forty by forty matrix A = (a_ij) with

            a_ij = { 2, if i = j,
                   { 1, otherwise,

       construct A^{-1}.
    c. Show that A in 1.6a is an involution when k = -2.
    d. Show that A is idempotent when k = -1.
    *e. Show that A has one eigenvalue equal to 1+k and all other eigenvalues equal to unity. (Hint: Consider x and any vector y orthogonal to x.)
    *f. Construct an orthonormal set of eigenvectors for A in 1.6b.

1.7 Given the following pairs of matrices, show that A and X satisfy (1.1) with k = 1, (1.2), (1.3), and (1.4).

    a.  A = [ 1  -1]             X = 1/10 [ 1   0  -2]
            [ 0   0] ,                    [-1   0   2] ;
            [-2   2]

b.

A:

c.

1.8

A:

[~

-1 \ '

[~

:J

21
2

X : 1/51

~615

lit
-6

J
X : 1/50G

-7
3

!] .

Show that the matrices

A:

r-,'
l~

-5

-6

-It
2

[-'

, X 1/2 ;

-5

-It

It

satisfy (1.1) with k = 2, (1.2) and (1.5).


!] ;

2
Systems of Equations and the Moore-Penrose Inverse of a Matrix

2.1 Zero, One or Many Solutions of Ax = b

Given a linear algebraic system of m equations in n unknowns written as Ax = b, a standard method to determine the number of solutions is to first reduce the augmented matrix [A,b] to row echelon form. The number of solutions is then characterized by relations among the number of unknowns, rank(A) and rank([A,b]). In particular, Ax = b is a consistent system of equations, that is, there exists at least one solution, if and only if rank(A) = rank([A,b]). Moreover, a consistent system of equations Ax = b has a unique solution if and only if rank(A) = n. On the other hand, Ax = b has no exact solution when rank(A) < rank([A,b]). It is the purpose of this chapter to show how the Moore-Penrose inverse of A can be used to distinguish among these three cases and to provide alternative forms of representations which are frequently employed in each case.

For any matrix, A, let CS(A) denote the column space of A, that is,

        CS(A) = {y | y = Ax, for some vector x}.

Then Ax = b a consistent system of equations implies b ∈ CS(A) and conversely (which is simply another way of saying that A and [A,b] have the same rank). Now by definition,

        rank(A) = dimension(CS(A)),

and if

        N(A) = {z | Az = 0}

denotes the null space of A (cf. Exercise 1.4), then we have the well-known relation that

        rank(A) + dimension(N(A)) = n.

Given a consistent system of equations Ax = b with A m by n of rank r, it follows, therefore, that if r = n, then A has full column rank and x = A^L b is the unique solution, where A^L is any left inverse of A. The problem in this case is thus to construct A^L b.

However, when r < n, so that N(A) consists of more than only the zero vector, then for any solution, x_1, of Ax = b, any vector z ∈ N(A) and any scalar a, x_2 = x_1 + az is also a solution of Ax = b. Conversely, if x_1 and x_2 are any pair of solutions of Ax = b, and if z = x_1 - x_2, then Az = Ax_1 - Ax_2 = b - b = 0 so that z ∈ N(A). Hence all solutions to Ax = b in this case can be written as

(2.1)        x = x_1 + Σ_{i=1}^{n-r} a_i z_i,

where x_1 is any particular solution, z_1, ..., z_{n-r} are any set of vectors which form a basis of N(A) and a_1, ..., a_{n-r} are arbitrary scalars. Often the problem now is simply to characterize all solutions. More frequently, it is to determine those solutions which satisfy one or more additional conditions as, for example, in linear programming where we wish to construct a nonnegative solution of Ax = b which also maximizes (c,x) where c is some given vector and A, b and c have real elements.

Given an inconsistent system of equations Ax = b, that is, where rank(A) < rank([A,b]) so that there is no exact solution, a frequently used procedure is to construct a vector x̂, say, which is a "best approximate" solution by some criterion. Perhaps the most generally used criterion is that of least squares in which it is required to determine x̂ to minimize ||Ax - b|| or, equivalently, to minimize ||Ax - b||². In this case, if A has full column rank, then x̂ = (A^H A)^{-1} A^H b is the least squares solution (see Exercise 2.7).
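The rank tests described above, and the least squares formula for the full column rank case, can be sketched in NumPy as follows; the matrix and right-hand side are illustrative only.

    import numpy as np

    def classify(A, b):
        """Classify Ax = b as having a unique solution, infinitely many, or none."""
        n = A.shape[1]
        rA = np.linalg.matrix_rank(A)
        rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
        if rA < rAb:
            return "no exact solution"
        return "unique solution" if rA == n else "infinitely many solutions"

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0],
                  [1.0, 2.0]])                 # full column rank, 3 equations, 2 unknowns
    b = np.array([1.0, 2.0, 4.0])              # not in the column space of A

    print(classify(A, b))                      # -> "no exact solution"
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)  # least squares solution (A^H A)^{-1} A^H b
    print(x_hat, np.linalg.lstsq(A, b, rcond=None)[0])   # agrees with numpy's least squares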

Exercises

2.1

Given the following matrices, A_i, and vectors, b_i, determine which of the sets of equations A_i x = b_i have a unique solution, infinitely many solutions or no exact solution, and construct the unique solutions when they exist.
(i)
AI

1-\
L4

13

(i i i)
A

3
(v)

2.2

l~

A, - [;

-3

3
-2
0

7] ,

bl

r-']

(; ; )

A2

l-64 ;

:1 b,' r: 1'
3J

-:1'

I
3

['

(;,)

A4

[' i]' b,{l

3
I

b,{J

For any partitioned matrix A

(v i)
A6

[2
I

3
2 0

4]I ,b6 [112]


O
=

[B,R] with B nonsingular, prove that

columns of the matrix

        [-B^{-1}R]
        [    I   ]

form a basis of N(A).


2.3

_:J b[:j,
4'

Construct a basis for N(A_6) in Exercise 2.1.



2.4 Apply the Gram-Schmidt process to the basis in Exercise 2.3 to construct an orthonormal basis of N(A_6).

2.5

Show that if zl and z2 are any vectors which form an orthonormal


basis of N(A 6 ) and if all solutions of A6x

= b6

are written as

where
xl

IT [0

-1

2) ,
for every solution such that IIxl1 2 = 25124.

then
2.6 Show that A, A^H A and AA^H have the same rank for every matrix A.

2.7 a. Given any system of equations Ax = b with A m by n and rank(A) = n, show by use of calculus that the least squares solution, x̂, has the form

            x̂ = (A^H A)^{-1} A^H b.

    b. Construct the least squares solution of Ax = b. Suppose m = n?

2.2 Full Rank Factorizations and the Moore-Penrose Inverse

Given any matrix A (not necessarily square), it follows at once that if X is any matrix such that A and X satisfy (1.1) with k = 1, (1.2), (1.3) and (1.4), then X is unique. For if

(2.2)        AXA = A,  XAX = X,  (AX)^H = AX,  (XA)^H = XA,

and if A and Y also satisfy these equations, then

        X = XAX = X(AX)^H = XX^H A^H = XX^H(AYA)^H = XX^H A^H(AY)^H = XAY
          = (XA)^H Y = A^H X^H Y = (AYA)^H X^H Y = (YA)^H A^H X^H Y = YAXAY = YAY = Y.

Now if A has full row rank, then with X any right inverse of A, AX = I is Hermitian and the first two equations in (2.2) hold. Dually, if A has full column rank and X is any left inverse of A, XA = I is Hermitian and again the first two equations in (2.2) hold. As shown in the following lemma, there is a choice of X in both cases so that all four conditions hold.

LEMMA 1: Let A be any matrix with full row rank or full column rank. If A has full row rank, then X = A^H(AA^H)^{-1} is the unique right inverse of A with XA Hermitian. If A has full column rank, then X = (A^H A)^{-1}A^H is the unique left inverse of A with AX Hermitian.

Proof: If A is any matrix with full row rank, AA^H is nonsingular by Exercise 2.6. Now X = A^H(AA^H)^{-1} is a right inverse of A, and

        (XA)^H = (A^H(AA^H)^{-1}A)^H = A^H(AA^H)^{-1}A = XA.

Thus A and X satisfy the four equations in (2.2), and X is unique. The dual relationship when A has full column rank follows in an analogous manner with A^H A nonsingular.

It should be noted that X = A^{-1} in (2.2) when A is square and nonsingular, and that both forms for X in Lemma 1 reduce to A^{-1} in this case. More generally, we will see in Theorem 4 that the unique X in (2.2) exists for every matrix A. Such an X is called the Moore-Penrose inverse of A and is written A^+. Thus we have from Lemma 1 the special cases:

(2.3)        A^+ = { A^H(AA^H)^{-1},  if A has full row rank,
                   { (A^H A)^{-1}A^H, if A has full column rank.

Example 2.1

If

        A = [1  1  0]
            [0  1  1] ,

then

        AA^H = [2  1]          (AA^H)^{-1} = 1/3 [ 2  -1]
               [1  2] ,                          [-1   2] ,

and so

        A^+ = A^H(AA^H)^{-1} = 1/3 [ 2  -1]
                                   [ 1   1]
                                   [-1   2] .

Example 2.2

If y is the column vector

        y = [  1 ]
            [2+i ]
            [-3i ] ,

then

        y^+ = 1/15 [1   2-i   3i].

Note in general that the Moore-Penrose inverse of any nonzero row or column vector is simply the conjugate transpose of the vector multiplied by the reciprocal of the square of its length.
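A minimal NumPy sketch of the two special cases in (2.3), using the full row rank matrix of Example 2.1; numpy.linalg.pinv is used only as an independent check.

    import numpy as np

    def pinv_full_row_rank(A):
        # A^+ = A^H (A A^H)^{-1} when A has full row rank
        return A.conj().T @ np.linalg.inv(A @ A.conj().T)

    def pinv_full_col_rank(A):
        # A^+ = (A^H A)^{-1} A^H when A has full column rank
        return np.linalg.inv(A.conj().T @ A) @ A.conj().T

    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])            # full row rank (Example 2.1)
    Ap = pinv_full_row_rank(A)

    print(np.round(3 * Ap))                    # 3 A^+ = [[2,-1],[1,1],[-1,2]]
    print(np.allclose(Ap, np.linalg.pinv(A)))  # True: agrees with the Moore-Penrose inverse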
Let us next consider the geometry of solving systems of equations in terms of A^+ for the special cases in (2.3). Given any system of equations Ax = b and any matrix, X, such that AXA = A, it follows at once that the system is consistent if and only if

(2.4)        AXb = b.

For if (2.4) holds, then x = Xb is a solution. Conversely, if Ax = b is consistent, multiplying each side on the left by AX gives Ax = AXAx = AXb, so that (2.4) follows. Suppose now that Ax = b is a system of m equations in n unknowns where A has full row rank. Then Ax = b is always consistent since (2.4) holds with X any right inverse of A, and we have from (2.1) that all solutions can be written as

(2.5)        x = Xb + Zy

with Z any matrix with n-m columns which form a basis of N(A) and y an arbitrary vector. Taking X to be the right inverse A^+ in this case gives Theorem 2.

THEOREM 2: For any system of equations Ax = b where A has full row rank, x = A^+ b is the unique solution with ||x||² minimal.

Proof: With X = A^+ in (2.5),

        (x,x) = (A^+ b + Zy, A^+ b + Zy) = (A^+ b, A^+ b) + (Zy, Zy) = ||A^+ b||² + ||Zy||²

since

        (A^+ b, Zy) = y^H Z^H A^H(AA^H)^{-1} b = 0.

Thus

        ||x||² ≥ ||A^+ b||²,

where equality holds if and only if Zy = 0.


Example 2.3

If A is the matrix in Example 2.1 and

        b = [-5]
            [ 4] ,

then

        x = A^+ b = 1/3 [-14]
                        [ -1]
                        [ 13]

is the minimal norm solution of Ax = b with ||x||² = 122/3.

It was noted in Section 2.1 that the least squares solution of an inconsistent system of equations Ax = b when A has full column rank is x = (A^H A)^{-1}A^H b. From (2.3) we have, therefore, that x = A^+ b is the least squares solution in this case. Although this result can be established by use of calculus (Exercise 2.7), the following derivation in terms of norms is more direct.

THEOREM 3: For any system of equations Ax = b where A has full column rank, x = A^+ b is the unique vector with ||b - Ax||² minimal.

Proof: If A is square or if m > n and Ax = b is consistent, then with A^+ = (A^H A)^{-1}A^H a left inverse of A and AA^+ b = b, the vector x = A^+ b is the unique solution with ||b - Ax||² = 0. On the other hand, if m > n and Ax = b is inconsistent,

        ||b - Ax||² = ||(I - AA^+)b - A(x - A^+ b)||² = ||b - AA^+ b||² + ||A(x - A^+ b)||²

since A^H(I - AA^+) = 0. Hence ||b - Ax||² ≥ ||b - AA^+ b||² where equality holds if and only if ||A(x - A^+ b)||² = 0. But A with full column rank implies ||Ay||² > 0 for any vector y ≠ 0, in particular for y = x - A^+ b.
Example 2.4
If

then

1[3

IT

-4

3
7

-2]

-1 '

AA+b

and

is the least squares solution of Ax


minimal.

b with I Ib - Ax I I 2

144
11

Having established A^+ for the special cases in Lemma 1, it remains to establish existence for the general case of an arbitrary matrix A. For this purpose we first require a definition.

DEFINITION 1: Any product EFG with E m by r, F r by r and G r by n is called a full rank factorization if each of the matrices E, F and G has rank r.

The importance of Definition 1 is that any nonnull matrix can be expressed in terms of full rank factorizations, and that the Moore-Penrose inverse of such a product is the product of the corresponding inverses in reverse order.

To construct a full rank factorization of a nonnull matrix, let A be any m by n matrix with rank r. Designate columns of A as a_1, ..., a_n. Then A with rank r implies that there exists at least one set of r columns of A which are linearly independent. Let J = {j_1, ..., j_r} be any set of indices for which a_{j_1}, ..., a_{j_r} are linearly independent, and let E be the m by r matrix

        E = [a_{j_1}, ..., a_{j_r}].

If r = n, then A = E is a trivial full rank factorization (with F = G = I). Suppose r < n. Then for every column a_j, j ∉ J, there is a column vector y_j, say, such that a_j = E y_j. Now form the r by n matrix, G, with columns g_1, ..., g_n as follows: Let

        g_j = { e_i, if j = j_i ∈ J,
              { y_j, if j ∉ J,

where e_i, i = 1, ..., r, denote unit vectors. For this matrix G we then have

        A = EG.

Moreover, since the columns e_1, ..., e_r of G form an r by r identity matrix, rank(G) = r, and with rank(E) = r by construction, A = EG is a full rank factorization (with F = I).

That a full rank factorization A = EFG is not unique is apparent by observing that if M and N are any nonsingular matrices, then A = EM(M^{-1}FN)N^{-1}G is also a full rank factorization. The following example illustrates four full rank factorizations of a given matrix, A, where F = I in each case.

Example 2.5
Let

-5

-1
6
-15

-1

-1

1 -1

Then


[4 6]
1

-1

-5 -15

4/5
-1/5

3/5

7/5

-3

-2/5

- 3/5

-7]
3

Using full rank factorizations, the existence of the Moore-Penrose inverse of any matrix follows at once. The following theorem, stated in the form rediscovered by Penrose [10] but originally established by Moore [8], is fundamental to the theory of generalized inverses of matrices.

THEOREM 4: For any matrix, A, the four equations

        AXA = A,  XAX = X,  (AX)^H = AX,  (XA)^H = XA

have a unique solution X = A^+. If A = 0_mn is the m by n null matrix, A^+ = 0_nm. If A is not the null matrix, then for any full rank factorization EFG of A, A^+ = G^+ F^{-1} E^+.

Proof: Uniqueness in every case follows from the remarks after (2.2). If A = 0_mn, then XAX = X implies X = A^+ = 0_nm. If A is not the null matrix, then for any full rank factorization EFG it follows by definition that E has full column rank, F is nonsingular and G has full row rank. Thus E^+ = (E^H E)^{-1}E^H and G^+ = G^H(GG^H)^{-1}, by (2.3), with E^+ a left inverse of E and G^+ a right inverse of G. Then if X = G^+ F^{-1} E^+, XA = G^+ G and AX = EE^+ are Hermitian, by Lemma 1. Moreover, AXA = A and XAX = X, so that X = A^+.

It should be noted that although the existence of a full rank factorization A = EG has been established for any nonnull matrix A, this does not provide a systematic computational procedure for constructing a factorization. Such a procedure will be developed in Exercise 3.3, however, after we have considered the relationship between A^+ and the Moore-Penrose inverse of matrices obtained by permuting rows or columns or both rows and columns of A. Observe, moreover, that if Ax = b is any system of equations with A = EG a full rank factorization, and if y = Gx, then y = E^+ b is the least squares solution to Ey = b, by Theorem 3. Now the system of equations Gx = E^+ b is always consistent and has minimal norm solution x = G^+ E^+ b, by Theorem 2. Consequently we can combine the results of Theorems 2 and 3 by saying that x = A^+ b is the least squares solution of Ax = b with ||x||² minimal. Although of mathematical interest (see, for example, Exercises 3.12 and 3.13), most practical applications of least squares require that problems be formulated in such a way that the matrix A has full column rank.
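Although, as noted above, the construction of a full rank factorization given in the text is not yet a systematic computational procedure, the idea can be sketched in NumPy: independent columns are selected for E, G is obtained by solving EG = A, and Theorem 4 (with F = I) gives A^+ = G^+ E^+. The greedy column selection by repeated rank tests below is only for illustration.

    import numpy as np

    def full_rank_factorization(A, tol=1e-10):
        """Return E, G with A = E G, E of full column rank, G of full row rank (F = I)."""
        A = np.asarray(A, dtype=float)
        cols, r = [], 0
        for j in range(A.shape[1]):                       # greedily pick independent columns
            if np.linalg.matrix_rank(A[:, cols + [j]], tol) > r:
                cols.append(j)
                r += 1
        E = A[:, cols]                                    # m by r, rank r
        G = np.linalg.lstsq(E, A, rcond=None)[0]          # r by n, solves E G = A
        return E, G

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [1.0, 1.0, 2.0]])                       # rank 2
    E, G = full_rank_factorization(A)
    A_plus = np.linalg.pinv(G) @ np.linalg.pinv(E)        # Theorem 4 with F = I: A^+ = G^+ E^+
    print(np.allclose(E @ G, A), np.allclose(A_plus, np.linalg.pinv(A)))   # True True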
Exercises
2.8

Show that xl

2.10 Let u be the column vector with n elements each equal to unity. Show that

        [I,u]^+ = 1/(n+1) [ (n+1)I - uu^H ]
                          [      u^H      ] .

2.11

a.

Given any real numbers bl, ... ,b n , show that all solutions to
the equations
1, .. ,n,

can be written as
X.

b. - _1_

L b.

n+l i=l

Cl,

1, ... ,n,

and
1

= -1

L b.I

n+ i=l

where
b.

Cl

Cl,

is arbitrary.

For what choice of


that

L x.

i=l

Cl

can we impose the additional condition

01


c.

Show that when the condition in 2. lIb is imposed, xn+l


becomes simply the mean of b l , ... ,b n , that is,
1 n

= -

L b ..

ni=l

d.

Show that the problem of solving the equations in 2.11a,


subject to the conditions in 2. lIb, can be formulated
equivalently as a system of equations Ax = b with the n+l
by n+l matrix A Hermitian and nonsingular.

2.12 Given any real numbers b_1, ..., b_n, show that the mean is the least squares solution to the equations

        x = b_i,    i = 1, ..., n.

2.13

If Ax = b is any system of equations with A

uv H a matrix of rank

one, show that

is the least squares solution with minimal norm.


2.14 Let Ax = b be any consistent system of equations and let z_1, ..., z_{n-r} be any set of vectors which form an orthonormal basis of N(A), where rank(A) = r. Show that if x is any solution of Ax = b,

        A^+ b = x - Σ_{i=1}^{n-r} a_i z_i

     with a_i = (x, z_i), i = 1, ..., n-r.

2.15 (Continuation): Let A be any m by n matrix with full row rank, and let Z be any n by n-m matrix whose columns form an orthonormal basis of N(A). Prove that if X is any right inverse of A,

        A^+ = (I - ZZ^H)X.

2.16 Use the results of Exercises 2.4 and 2.15 to construct A_6^+ starting with the right inverse
X

2 -3]2
[-10
0
o

0

2.3 Some Geometric Illustrations

In this section we illustrate the geometry of Theorems 2 and 3 with some diagrams.

Consider a single equation in three real variables of the form

(2.6)        a_11 x_1 + a_12 x_2 + a_13 x_3 = b_1.

Then it is well known that the set of all vectors x^H = [x_1, x_2, x_3] which satisfy (2.6) is a plane P_1(b_1), as shown in Figure 1.

Figure 1. The plane P_1(b_1) and solution x̂.

Now the plane P_1(0) is parallel to P_1(b_1), and consists of all solutions z^H = [z_1, z_2, z_3] to the homogeneous equation

(2.7)        a_11 z_1 + a_12 z_2 + a_13 z_3 = 0.

Then if b_1 ≠ 0, all solutions x ∈ P_1(b_1) can be written as x = x̂ + z for some z ∈ P_1(0), and conversely, as shown in Figure 2. (Clearly, this is the geometric interpretation of (2.1) for a single equation in three unknowns with two vectors required to span P_1(0).) If we now let a_1^H = [a_11, a_12, a_13], so that (2.6) can be written as a_1^H x = b_1, Theorem 2 implies that the solution of the form x̂ = a_1^{H+} b_1 is the point on P_1(b_1) with minimal distance from the origin. Also, since the vector x̂ is perpendicular to the planes P_1(0), P_1(b_1), ||x̂|| is the distance between P_1(0) and P_1(b_1). The representation of any solution x as x = a_1^{H+} b_1 + z with z ∈ P_1(0), corresponding to (2.1) in this case, is illustrated in Figure 3.

Figure 3. The representation x = a_1^{H+} b_1 + z.

Suppose next that we consider (2.6) and a second equation

(2.8)        a_21 x_1 + a_22 x_2 + a_23 x_3 = b_2.

Let the plane of solutions of (2.8) be designated as P_2(b_2). Then it follows that either the planes P_1(b_1) and P_2(b_2) coincide, or they are parallel and distinct, or they intersect in a straight line. In the first case, when P_1(b_1) and P_2(b_2) coincide, the equation in (2.8) is a multiple of (2.6) and any point satisfying one equation also satisfies the other. On the other hand, when P_1(b_1) and P_2(b_2) are parallel and distinct, there is no exact solution. Finally, when P_1(b_1) and P_2(b_2) intersect in a straight line l_12, say, that is, l_12 = P_1(b_1) ∩ P_2(b_2), then any point on l_12 satisfies both (2.6) and (2.8). Observe, moreover, that with

        A = [a_1^H]          b = [b_1]
            [a_2^H] ,            [b_2] ,

the point on l_12 with minimal distance from the origin is x̂ = A^+ b. This last case is illustrated in Figure 4, where l_12 is a "translation" of the subspace N(A) of the form P_1(0) ∩ P_2(0).

The extension to three or more equations is now obvious: Given a third equation

(2.9)        a_31 x_1 + a_32 x_2 + a_33 x_3 = b_3,

let P_3(b_3) be the associated plane of solutions. Then assuming the planes P_1(b_1) and P_2(b_2) do not coincide or are not parallel and distinct, that is, they intersect in the line l_12 as shown in Figure 4, the existence of a vector x^H = (x_1, x_2, x_3) satisfying (2.6), (2.8) and (2.9) is determined by the conditions that either P_3(b_3) contains the line l_12, or P_3(b_3) and l_12 are parallel and distinct, or P_3(b_3) and l_12 intersect in a single point. (The reader is urged to construct figures to illustrate these cases. An illustration of three different planes containing the same line may also be found in Figure 5.) For m ≥ 4 equations, similar considerations as to the intersections of planes P_k(b_k) and lines l_ij = P_i(b_i) ∩ P_j(b_j) again hold, but diagrams become exceedingly difficult to visualize.

For any system of equations Ax = b let y = AA^+ b, that is, y is the perpendicular projection of b onto CS(A), the column space of A. Then it follows from (2.4) that Ax = y is always a consistent system of equations, and from Theorem 2 that A^+ y = A^+(AA^+)b = A^+ b is the minimal norm solution. Moreover, we have from Theorem 3 that if Ax = b is inconsistent, then

        ||b - y||² = ||(I - AA^+)b||²

is minimal. Thus, the minimal norm solution A^+ y of Ax = y also minimizes ||y - b||².

Consider an inconsistent system of, say, three equations in two unknowns, Ax = b, and suppose rank(A) = 2. Let y = AA^+ b have components y_1, y_2, y_3, and let a_1, a_2, a_3 designate rows of A. Now if P_i(y_i) is the plane of all solutions of a_i^H x = y_i, i = 1, 2, 3, then the set of all solutions of Ax = y is the line l_12 = P_1(y_1) ∩ P_2(y_2) as shown in Figure 5a, A^+ y = A^+ b is the point on l_12 of minimal norm and ||y - b||² is minimal, as shown in Figure 5b.

Figure 5. (a) Solutions of Ax = y where y = AA^+ b. (b) The vectors b, y and (I - AA^+)b.

To conclude this section we remark that since

        b = AA^+ b + (I - AA^+)b

is an orthogonal decomposition of any vector b with

        ||b||² = ||AA^+ b||² + ||(I - AA^+)b||²,

then the ratio

(2.10)        ||AA^+ b||² / ||(I - AA^+)b||²  ≥  0

provides a measure of inconsistency of the system Ax = b. In particular, a zero ratio implies b is orthogonal to CS(A), whereas large values imply that b is nearly contained in CS(A), that is, ||(I - AA^+)b||² is relatively small. (For statistical applications [1] [4] [5], the values ||b||², ||AA^+ b||² and ||(I - AA^+)b||² are frequently referred to as TSS [total sum of squares], SSR [sum of squares due to regression] and SSE [sum of squares due to error], respectively. Under certain general assumptions, particular multiples of this ratio can be shown to have distributions which can be used in tests of significance.) Although the statistical theory of linear regression models is not germane to the present considerations, formation of the ratio in (2.10) can provide insight into the inconsistency of a system of equations Ax = b. (See Exercise 2.18.)
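A short numerical sketch of the orthogonal decomposition and the ratio in (2.10); the data are arbitrary.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([1.0, 1.0, 3.0])

    P = A @ np.linalg.pinv(A)          # projection A A^+ onto CS(A)
    y = P @ b                          # perpendicular projection of b onto CS(A)
    SSR = np.dot(y, y)                 # ||A A^+ b||^2
    SSE = np.dot(b - y, b - y)         # ||(I - A A^+) b||^2
    TSS = np.dot(b, b)                 # ||b||^2

    print(np.isclose(TSS, SSR + SSE))  # True: orthogonal decomposition
    print(SSR / SSE)                   # the measure of inconsistency in (2.10)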

Exercises

2.17

Use the techniques of sol id analytic geometry to prove that the


1 ines 12 = PI (bl)n P2 (b 2 ), b l F 0 and b2 F 0 and Il2 = PI (o)n PZ(O)
are parallel. In addition, show by similar methods that if
H

Pi(b i ) = {xi a i x = b i , a i

= (ail,aiZ,ai3)}'

then

2.18 Given any points (x_i, y_i), i = 0, 1, ..., n, in the (x,y) plane with x_0, ..., x_n distinct, it is well known that there is a unique interpolating polynomial p_n(x) of degree ≤ n (that is, p_n(x_i) = y_i for all i = 0, ..., n), and if

        p_n(x) = a_0 + a_1 x + ... + a_n x^n,

then a_0, ..., a_n can be determined by solving the system of equations Aa = y where

        A = [1  x_0  ...  x_0^n]        a = [a_0]        y = [y_0]
            [1  x_1  ...  x_1^n]            [a_1]            [y_1]
            [        ...       ]            [...]            [...]
            [1  x_n  ...  x_n^n] ,          [a_n] ,          [y_n] .

Now any matrix, A, with this form is called a Vandermonde matrix, and it can be shown that

        det(A) = Π_{i<j} (x_j - x_i).

Thus, with x_0, ..., x_n distinct, A is nonsingular, and if A_k denotes the submatrix consisting of the first k columns of A, k = 1, 2, ..., n, then A_k has full column rank for every k.

For k ≤ n, the least squares polynomial approximation of degree k to the points (x_i, y_i), i = 0, ..., n, is defined to be that polynomial

        p_k(x) = a_0 + a_1 x + ... + a_k x^k

which minimizes

        Σ_{i=0}^{n} [y_i - p_k(x_i)]².

    a. Show that the coefficients a_0, ..., a_k of the least squares polynomial approximation of degree k are elements of the vector a, where

            a = A_{k+1}^+ y.

b.

c.

Show that with TSS

L y.

i=O

then SSR

Given the data


-I

-5

-4

-3

10

construct the best I inear, quadratic and cubic least squares


approximations.

For each case determine SSR and SSE.

What

conclusions can you draw from the data available?
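The computation in part (a) of Exercise 2.18 can be sketched in NumPy as below; the data points are made up for illustration and are not those of part (c).

    import numpy as np

    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y = np.array([ 1.0,  0.5, 0.2, 0.9, 3.1])
    n = len(x) - 1

    A = np.vander(x, n + 1, increasing=True)      # Vandermonde matrix [1, x_i, ..., x_i^n]
    k = 2                                         # degree of the least squares polynomial
    a = np.linalg.pinv(A[:, :k + 1]) @ y          # coefficients a = A_{k+1}^+ y

    p = A[:, :k + 1] @ a                          # fitted values p_k(x_i)
    SSR, SSE = np.dot(p, p), np.dot(y - p, y - p)
    print(a)
    print(SSR, SSE, np.dot(y, y))                 # TSS = SSR + SSE (up to roundoff)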

2.4 Miscellaneous Exercises

2.19 Let A, Z_1 and Z_2 be any matrices.

    a. Prove that a solution, X, to the equations XAX = X, AX = Z_1 and XA = Z_2, if it exists, is unique.
    b. For what choices of Z_1 and Z_2 is X a generalized inverse of A?

2.20 Verify the following steps in the original Penrose proof of the existence of X in (2.2):

    a. The equations XAX = X and (AX)^H = AX are equivalent to the single equation XX^H A^H = X. Dually, AXA = A and (XA)^H = XA are equivalent to the single equation XAA^H = A^H.
    b. If there exists a matrix B satisfying BA^H AA^H = A^H, then X = BA^H is a solution of the equations XX^H A^H = X and XAA^H = A^H.
    c. The matrices A^H A, (A^H A)², (A^H A)³, ..., are not all linearly independent, so that there exist scalars d_1, ..., d_k, not all zero, for which

            d_1 A^H A + d_2 (A^H A)² + ... + d_k (A^H A)^k = 0.

       (Note that if A has n columns, k ≤ n + 1. Why?)
    d. Let d_s be the first nonzero scalar in the matrix polynomial in 2.20c, and let

            B = -(1/d_s)[d_{s+1} I + d_{s+2}(A^H A) + ... + d_k (A^H A)^{k-s-1}].

       Then B(A^H A)^{s+1} = (A^H A)^s.
    e. The matrix B also satisfies BA^H AA^H = A^H.

2.21 Let A and X be any matrices such that AXA = A. Show that if Ax = b is a consistent system of equations, then all solutions can be written as

        x = Xb + (I - XA)y,

     where y is arbitrary. (Note, in particular, that this expression is equivalent to the form for x in (2.5) since columns of I - XA form a basis for N(A). Why?)

2.22 (Continuation): Prove, more generally, that AWC = B is a consistent system of equations if and only if AA^+ BC^+ C = B, in which case all solutions can be written as

        W = A^+ BC^+ + Y - A^+ AYCC^+,

     where Y is arbitrary.

3
More on Moore-Penrose Inverses

3.1 Basic Properties of A^+

The various properties of A^+ discussed in this section are fundamental to the theory of Moore-Penrose inverses. In many cases, proofs simply require verification that the defining equations in (2.2) are satisfied for A and some particular matrix X. Having illustrated this proof technique in a number of cases, we will leave the remaining similar arguments as exercises.
LEMMA 5: Let A be any m by n matrix. Then

(a) A m by n implies A^+ n by m;
(b) A = 0_mn implies A^+ = 0_nm;
(c) A^{++} = A;
(d) A^{H+} = A^{+H};
(e) A^+ = (A^H A)^+ A^H = A^H(AA^H)^+;
(f) (A^H A)^+ = A^+ A^{H+};
(g) (aA)^+ = a^+ A^+ for any scalar a, where

        a^+ = { 1/a, if a ≠ 0,
              { 0,   if a = 0;

(h) if U and V are unitary matrices, (UAV)^+ = V^H A^+ U^H;
(i) if A = Σ_i A_i where A_i^H A_j = 0 and A_i A_j^H = 0 whenever i ≠ j, then A^+ = Σ_i A_i^+;
(j) if A is normal, A^+ A = AA^+;
(k) A, A^+, A^+ A and AA^+ all have rank equal to trace(A^+ A).

Proof: Properties (a) and (b) have been noted previously in Section 1.3 and Theorem 4, respectively. The relations in (c) and (d) follow by observing that there is complete duality in the roles of A and X in the defining equations.

To establish the first expression for A^+ in (e), let X = (A^H A)^+ A^H. Then XA = (A^H A)^+ A^H A is Hermitian, and also AX = A(A^H A)^+ A^H is Hermitian, by use of (d). Moreover, XAX = X and

        AXA = A(A^H A)^+ A^H A = A^{H+} A^H A(A^H A)^+ A^H A = A^{H+} A^H A = A.

The second expression in (e) follows by a similar type of argument, as do the expressions in (g) and (h).

To prove (f) we have A^{H+} = A(A^H A)^+ by (d) and (e). Then

        A^+ A^{H+} = (A^H A)^+ A^H A(A^H A)^+ = (A^H A)^+.

To prove (i), observe first that A_i^H A_j = 0 implies A_i^+ A_j = 0 and also A_j^+ A_i = 0, since

        A_i^+ A_j = A_i^+ A_i^{+H} A_i^H A_j = 0.

Now we can again show that A and Σ_i A_i^+ satisfy the defining equations.

That (j) holds follows by use of (e) to write

        A^+ A = (A^H A)^+ A^H A = (AA^H)^+ AA^H = A^{H+} A^H = (AA^+)^H = AA^+.

To show that A, A^+, A^+ A and AA^+ all have the same rank, we can apply the fact that the rank of a product of matrices never exceeds the rank of any factor to the equations AA^+ A = A and A^+ AA^+ = A^+. Then rank(A) = trace(A^+ A) holds since rank(E) = trace(E) for any idempotent matrix E [7].

Observe in Lemma 5(e) that these expressions for A^+ reduce to the expressions in (2.3) whenever A has full row rank or full column rank. Moreover, observe that the relationship (EG)^+ = G^+ E^+ which holds for full rank factorizations EG, by Theorem 4, also holds for A^H A where A is any matrix, by Lemma 5(f). The following example shows, however, that the relation (BA)^+ = A^+ B^+ need not hold for arbitrary matrices A and B.
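Several of the identities in Lemma 5 are easy to spot-check numerically. The sketch below verifies (e), (f) and (h) for a random complex matrix, with unitary factors obtained from QR factorizations.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
    Ap = np.linalg.pinv(A)
    AH = A.conj().T

    # (e)  A^+ = (A^H A)^+ A^H = A^H (A A^H)^+
    print(np.allclose(Ap, np.linalg.pinv(AH @ A) @ AH),
          np.allclose(Ap, AH @ np.linalg.pinv(A @ AH)))

    # (f)  (A^H A)^+ = A^+ A^{H+}
    print(np.allclose(np.linalg.pinv(AH @ A), Ap @ np.linalg.pinv(AH)))

    # (h)  (U A V)^+ = V^H A^+ U^H for unitary U, V
    U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
    V, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
    print(np.allclose(np.linalg.pinv(U @ A @ V), V.conj().T @ Ap @ U.conj().T))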
Example 3.1
Let

[~

1-1].
1 -1

Then

[~

BA

~J

(BA) + ,

since BA is Hermitian and idempotent.


A+

(AHA)-lAH

and
B+

1/2 [ 2
-2

-!] [~

ll'[: :]

BH(BBH)-l

-1 -1

so that
A+B +

r-11

-1]
1

Also, we have
1
1

~]

[ 2 - 2]
-2

1/2 [ 2
-2

1/2

~]

[2o -']1 ,
o -1

(BA)+.

Let A be any m by n matrix with columns a_1, ..., a_n, and let Q designate the permutation matrix obtained by permuting columns of I_n in any order {j_1, ..., j_n}. Then

        AQ = [a_{j_1}, ..., a_{j_n}].

In a similar manner, if w_1^H, ..., w_m^H designate the rows of A, and if P is the permutation matrix obtained by permuting rows of I_m in any order {i_1, ..., i_m}, then

        PA = [w_{i_1}^H]
             [   ...   ]
             [w_{i_m}^H] .

Combining these observations it follows, therefore, that if Ã is any m by n matrix formed by permuting rows of A or columns of A or both rows and also columns of A in any manner, then Ã = PAQ for some permutation matrices P and Q. Moreover, since P and Q are unitary matrices,

        Ã^+ = (PAQ)^+ = Q^H A^+ P^H,

by Lemma 5(h), and thus Ã^+ can be obtained by permuting rows and/or columns of A^+.
Example 3.2
Construct B+ if

Go
1

B =

Since B in this case can be written as


B

PAQ

[~ ~]

Go
1

where A is the matrix in Example 2.1, then with P and Q


Hermitian

~]

i[ ~ -~].
-1

(It should be noted that B can be written alternately as

[~

o
1

:1[: : :]

and so we have also B+ = QlHA+.)


Further applications of full rank factorizations and of permuted matrices PAQ in the computation of A^+ will be illustrated in the exercises at the end of this section. We turn now to a somewhat different method for computing A^+. This procedure essentially provides a method for constructing the Moore-Penrose inverse of any matrix with k columns, given that the Moore-Penrose inverse of the submatrix consisting of the first k-1 columns is known.

For any k ≥ 2, let A_k denote the matrix with k columns, a_1, ..., a_k. Then A_k can be written in partitioned form as A_k = [A_{k-1}, a_k]. Assuming A_{k-1}^+ is known, A_k^+ can be formed using the formulas in Theorem 6.
THEOREM 6: For any matrix A_k = [A_{k-1}, a_k], let

        c_k = (I - A_{k-1} A_{k-1}^+) a_k

and let

        γ_k = a_k^H A_{k-1}^{H+} A_{k-1}^+ a_k.

Then

(3.1)        A_k^+ = [A_{k-1}^+ - A_{k-1}^+ a_k b_k]
                     [            b_k             ]

where

        b_k = { c_k^+,                                        if c_k ≠ 0,
              { (1 + γ_k)^{-1} a_k^H A_{k-1}^{H+} A_{k-1}^+,  if c_k = 0.

Proof:

Since c k is a column vector, the two cases c k

0 are exhaustive.

and c k

Let X designate the right-hand side of (3.1).

Then to

establish the representation for Ak+ requires only that we


show the defining equations in (2.2) are satisfied by Ak
and X for the two forms of b k .
Forming AkX and XAk gives
(3. 2)
by definition of c k ' and
Ak_l+ak(I-bkak)]
(3.3)

bka k
Continuing, using (3.2) gives
(3.4)

[A k - l + ckbkA k _ l , Ak_lAk_l+ak + ckbka k ]

and

(30 5)

since
Suppose now that c k

0 and b k

c k . Then

in (3.2) is Hermitian. Also, with c k c k = 1, and since


A+ k-lck = 0 lmp 1 les c k HA k-l H+ = 0 so t h at c +A k-l = 0 an d
k
0

thus c k a k

1, then

XA

in (3.3) is Hermitian.

:]
Moreover,

in (3.4), and

x
in (3.5).

Having shown that the defining equations hold,

f O.

then X = Ak+ in (3.1) when c k

Suppose c k = 0 and b k = (l+Yk)

in (3.2) is Hermitian.

-1

H
H+
+
a k Ak - l
Ak - l

Then

In this case, with

a nonnegative real number and

we have also

in (3.3) Hermitian.
since c k

Furthermore, with bkAk-1Ak-l

= 0 implies Ak-1A k - l a k = a k ,

in (3.4) and XAkX = X in (3.5). Thus, when c k = 0 it has been


shown again that Ak and X satisfy the defining equations for
the given form for b k .

That the formulas in Theorem 6 can be used not only directly to construct A_k^+, assuming A_{k-1}^+ is known, but also recursively to form A^+ for any matrix A is easily seen: Let A be any matrix with n columns a_1, ..., a_n, and for k = 1, ..., n, let A_k designate the submatrix consisting of the first k columns of A. Now A_1^+ = a_1^+ follows directly from Lemma 5(b) or (e), and if n ≥ 2, A_2^+, ..., A_n^+ = A^+ can be formed sequentially using Theorem 6.
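A direct NumPy transcription of this column-by-column recursion might look as follows. It is only a sketch (a crude tolerance decides whether c_k = 0), with numpy.linalg.pinv used as an independent check.

    import numpy as np

    def pinv_greville(A, tol=1e-12):
        """Build A^+ column by column using the formulas of Theorem 6."""
        A = np.asarray(A, dtype=complex)
        a1 = A[:, :1]
        if np.allclose(a1, 0):
            Ap = np.zeros((1, A.shape[0]), dtype=complex)        # A_1^+ = 0 when a_1 = 0
        else:
            Ap = a1.conj().T / (a1.conj().T @ a1)                # A_1^+ = a_1^+
        for k in range(1, A.shape[1]):
            ak = A[:, k:k+1]
            ck = ak - A[:, :k] @ (Ap @ ak)                       # c_k = (I - A_{k-1} A_{k-1}^+) a_k
            if np.linalg.norm(ck) > tol:
                bk = ck.conj().T / (ck.conj().T @ ck)            # b_k = c_k^+
            else:
                gk = (ak.conj().T @ Ap.conj().T @ Ap @ ak).real.item()   # gamma_k
                bk = (Ap @ ak).conj().T @ Ap / (1.0 + gk)        # b_k when c_k = 0
            Ap = np.vstack([Ap - (Ap @ ak) @ bk, bk])            # formula (3.1)
        return Ap

    A = np.array([[1.0, 1.0, 2.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 2.0, 3.0]])                              # rank 2: third column = first + second
    print(np.allclose(pinv_greville(A), np.linalg.pinv(A)))      # True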
Example 3.3
Let

[~~ -~l

-1

-3

Then with Al = aI'


Al

1/5[2

-1],

and

Hence
1/29[3

10

6]

and so

[1/5 [2

-1]

1/29[3
1/145 [

55

-10

15

50

1/145 [3
10

10

6]

11
- 35]
= 1/29 [ 3
30

Continuing,
1/29 [ 29]
-58

and

o.
Thus, with Y3

5 and

-2

10

H H+ +
a 3 A2 A 2

+
H +
(A 2 a 3 ) A2

1/29[5

-22

-19] ,

we have
b3

1/174[5

-22

-19]

so that

A3

1/29

[113

-2
10

:]

1/174[5

[61
1/174 2:

-22

5
-10

-22
44

-19]
38

-19]

-23]
-2 .

10
16
-22

1/174[

-19

As will be indicated in Exercise 3.7, there is a converse of Theorem 6 which can be used to construct A_{k-1}^+, given [A_{k-1}, a_k]^+. Combining Theorem 6 and its converse thus provides a technique for constructing the Moore-Penrose inverse of a matrix, Ã, say, starting from any matrix, A, of the same size with A^+ known. (For practical purposes, however, A and Ã should differ in a small number of columns.)


Exercises

3.1 Let A be any matrix with columns a_1, ..., a_n, and let A^+ have rows w_1^H, ..., w_n^H. Prove that if K denotes any subset of the indices 1, ..., n such that a_i = 0 for i ∈ K, then w_i^H = 0 for all i ∈ K.

3.2 Let A be any m by n matrix with rank r, 0 < r < min(m,n).

    a. Prove that there exist permutation matrices, P and Q, such that Ã = PAQ has the partitioned form

            Ã = [W  X]
                [Y  Z]

       with W r by r and nonsingular.
    b. Show that Z = YW^{-1}X.
    c. Construct Ã^+.

3.3

(Continuation):

A matrix [U,Vl is called upper trapezoidal if U

is upper triangular and nonsingular; a matrix, B, is called lower


trapezoidal if BH is upper trapezoidal.
a.

Show that any matrix, A, in Exercise 3.2 has a full rank


factorization

A=

trapezoidal.

(Such a factori zation

EG with E lower trapezoidal and G upper

trapezoidal decomposition of
b.

A=

EG is called a

A. )

Construct a trapezoidal decomposition of some matrix,

A,

obtained from
2

-1

Hint:
A

[;

Start with
0

,.

...

:] [:

-1

* *
0
*

:]
"

where the asterisk denotes elements yet to be determined,


and proceed to construct a P and Q, if necessary, so that
the elements e 22 and g22 of PE and GQ, respectively, are
both nonzero. (Note that if this is not possible, the
factorization is complete.)

Now compute the remaining ele-

ments in the second column of PE and the second row of GQ,


and continue.
c.
Let A

Compute A+.
[B,Rl be any mby n upper trapezoidal matrix wi th n

and let zl be any nonnull vector in N(A).

m+ 2

Show that Gaussian

el imination, together with a permutation matrix, Q, can be used to


reduce the matrix

to an upper trapezoidal matrix S, say.


any vector such that SZ2

Prove now that if z2 is

= 0, then QHz2~N(A) and (zl ,Q Hz2) = o.



3.5

Apply the procedure in Exercise 3.4 to construct an orthonormal


bas is for N(A) in Exercise 3.3.

3.6

Show that the two forms for [Ak_I,a k ) + in Theorem 6 can be written
in a single expression as

[Ak_I,a k)

[Ak-I-Adkk-H/akdkHj

where

d H
k

3.7

Let [Ak_I,a k)

be partitioned as

[Ak_I,a k)+ =

['H]
d H
k

wi th d kH a row vector

'1

f d k Ha .,.
..J. I ,
k

1.

b.

Construct A+ if
0

A [:

Hint:
3.8

*3.9

:].

-I

See Exercise 3.3c.

For any product AB let B_1 = A^+ AB and A_1 = AB_1 B_1^+. Then (AB)^+ = (A_1 B_1)^+ = B_1^+ A_1^+. Why does this expression reduce to Theorem 4 when AB is a full rank factorization?
Theorem 4 when AB is a full rank factorization?
a. Use Exercise 1.3 to prove that if A is any normal matrix,

        A^+ = Σ' λ_i^{-1} x_i x_i^H,

   where Σ' indicates that the sum is taken over indices with eigenvalues λ_i ≠ 0.

b. Prove that if A is normal, (A^n)^+ = (A^+)^n.

c.

If \

Xl

2i-2. i = 1.2.3. and

~l

=...!..[
/2 -1

x2

1/3[~1
2

x3

=_1
312

[-~l
1

construct the Moore-Penrose inverse of the matrix. A. for


which AX i = Aix i

3.2 Applications with Matrices of Special Structure

For many applications of mathematics it is required to solve systems of equations Ax = b in which A or b or both A and b have some special structure resulting from the physical considerations of the particular problem. In some cases this special structure is such that we can obtain information concerning the set of all solutions. For example, the explicit form for all solutions of the equations

        x_i + x_{n+1} = b_i,    i = 1, ..., n,

given in Exercise 2.11, was obtained using the Moore-Penrose inverse of the matrix [I,u] from Exercise 2.10 where u is the n-tuple with each element equal to unity. In this section we introduce the concept of the Kronecker product of matrices which can be used to characterize all solutions of certain classes of problems that occur in the design of experiments and in linear programming.
DEFINITION 2: For any m by n matrix, P, and s by t matrix, Q = (q_kl), the Kronecker product of P and Q is the ms by nt matrix, P × Q, of the form

        P × Q = [q_11 P   ...   q_1t P]
                [  ...             ...]
                [q_s1 P   ...   q_st P] .

It should be noted in Definition 2 that if P = (p_ij) and Q = (q_kl), then P × Q is obtained by replacing each element q_kl by the matrix q_kl P, whereas Q × P is obtained by replacing each element p_ij by the matrix p_ij Q. Consequently P × Q and Q × P differ only in the order in which rows and columns appear, and there exist permutation matrices R and S, say, such that Q × P = R[P × Q]S. (We remark also that some authors, for example, Thrall and Tornheim [13], define the Kronecker product of P and Q alternately as P × Q = (p_ij Q), that is, our Q × P. In view of the discussion in Section 3.1 of the Moore-Penrose inverses of matrices A and Ã, where Ã is obtained by permuting rows of A, columns of A or both, each of the following results obtained using the form for P × Q in Definition 2 has a corresponding dual if the alternate definition is employed.)
Examj2le 3.4
If

2'

[: oJ,

[~

2i

-1
-3
3

-:]

then

pXQ

[~

[~

4
i
12
3i

4
0

4
12
i
3i

-lJ

and

Q xP

-1
3
-3

2i

-;1

Given any Kronecker product P × Q, it follows from Definition 2 that

(3.6)        [P × Q]^H = (q̅_kl P^H) = P^H × Q^H.

Also, for any matrices R and S = (s_kl) with the products PR and QS defined, the product [P × Q][R × S] is defined, and we have by use of block multiplication that the (k,l) block of [P × Q][R × S] is

        Σ_j q_kj s_jl PR.

Therefore,

(3.7)        [P × Q][R × S] = PR × QS.

The following lemma can be established simply by combining the relationships in (3.6) and (3.7) to show that the defining equations in (2.2) are satisfied.

LEMMA 7: For any matrices P and Q, [P × Q]^+ = P^+ × Q^+.
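Because Definition 2 places P inside the blocks, the P × Q of this monograph corresponds to numpy.kron(Q, P) rather than numpy.kron(P, Q). With that caveat, Lemma 7 can be spot-checked as follows for arbitrary matrices.

    import numpy as np

    def kprod(P, Q):
        # The Kronecker product of Definition 2: the (k,l) block is q_kl * P.
        return np.kron(Q, P)

    rng = np.random.default_rng(2)
    P = rng.standard_normal((3, 2))
    Q = rng.standard_normal((2, 4))

    lhs = np.linalg.pinv(kprod(P, Q))                     # [P x Q]^+
    rhs = kprod(np.linalg.pinv(P), np.linalg.pinv(Q))     # P^+ x Q^+
    print(np.allclose(lhs, rhs))                          # True (Lemma 7)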

ExamEle 3.5
To construct A+ i f

[1

-1

-3

12

;]

4 '

-4

observe that A = P X Q, where


P =

"3

-~J,

Q =

[~ ~] .

Then we have
Q-l

Q+

-~] ,

-l/Z [ 4
-3

1:]

ir [22-9

p+

17

-8

so that

p+

XQ+

88

16

-44

-8

-36

60

18

-30

68
1
lZ2 -66

-3Z

-34

16

-12

ZZ

27

-45

-9

15

51

24

17

-8

ExamEle 3.6
To construct A+ if

observe first that if u. denotes the vector with all i


1

elements each equal to unity, then A can be written in


partitioned form as
(3.8)

A =

["3

u3

13

u3

13

0
u

13
0

II

Whereupon, the first two columns of A in (3.8) can be


written as the Kronecker product [u 3 ,1 3 l
that permuting columns of A to form
A

["3

u3

13

u3

13

13

u3

XU Z '

I'J

the last four columns of A become [u 3 , 1 3 l



Next observe

XI 2

Therefore

A can be wyitten as

and thus
(3.9)
Permuting rows of [I,u)+ in Exercise 2.10 now gives

[u i ' Ii)

so that

1
i+l

(3.9) becomes

(3.10)

Substituting numerical values into (3.10), a suitable permutation of rows of A+ yields A+


Matrices, A, as in Example 3.6, with elements zero or one occur frequently in the statistical design of experiments, and the technique of introducing Kronecker products can often be used to construct A^+, and thus all solutions of systems of equations Ax = b by use of Exercise 2.21. The additional restrictions on Ax = b to obtain solutions with particular properties can then be formulated in terms of conditions on N(A) or, equivalently, I - A^+A. (See Exercise 3.11.)

That Kronecker products can be combined with forms for Moore-Penrose inverses of partitioned matrices to construct A^+ for other classes of structured matrices is shown by the representation in Theorem 8.
THEOREM 8: Let W be any m by n matrix, and for any positive integer p let G_p = (pI_n + W^H W)^{-1}. Then

(3.11)        [I_n × u_p^H]+
              [ W × I_p   ]    =  [G_p × u_p,  W^+ × I_p - G_p W^+ × u_p u_p^H].

Proof: Observe first that with
Observe first that with

then
(3.12)

Also, observe that with (pI +WHW)W+ = pW+ + WH, the relation
n

(3.13)

together with the fact that G

is Hermitian, implies

II'G IV
P

is Hermitian.
Let

l
rI

XU H1
:XIP ,
p

and let
+

X= [G p XU p ,W XI p -GW
Xuu
p
p p J.
Then it follows from (3.12) and (3.7) that
XA=G Xuu

P P

+WWXI

-GWWXuu

G (I - W+W) Xu u H + W+W X I

-(1

is Hermitian.

P p

-W W) Xu u

Also, with upHu p



+ W WX I
p,

P P

Xu H

A.

AXA

-W (I - W W) Xu u
P
n
P p

+ WW W X Ip

Continuing, we have

XA(G

1
+
+
-(I -W W)G Xpu +W WG XU = GpXu p '
P
p n
p
p
p

Xu)
p

leI

XA(W+XI)
p

-w+W)w+Xu u H+W+WW+XI

P p

W+XI

p'

and

1
--0
- W+W) Gp W+
p n

XA (G W+ Xu u H)

P P

lienee, XAX = X.

AX

WG

Finally,

pG

H +
+
H
X pu u
+W WG W Xu u
p p
p
p p

forming AX gives

W+ X u p H - Gp W+ X p u PH

: u

WW+ X I

- WG W+ Xu u H

P P -

Now
W+ Xu H - G W+ X pu H

G WH Xu H

p'

by (3.13), which, with Gp and WGpW+ Hermitian,


CAX)H = AX.

implies

Having shown that A and X satisfy the equations in (2.2),


then X

A+ which establishes

(3.11) .

ExamEle 3.7

I f p = 3, then for any m by n matrix, W,

lI

xu 3

W X I3

li


and

['n XX 3
U

W 13

G3

W+-G W+
3

+
-1,3 W

G3

-G W+
3

W+-G W+
3

-G 3W+

G3

-G 11'+
3

-G 11'+
3

W+-G 3W+J

~G3W'1

I'

where G3 = (3I+W HW)-1.


Suppose now that we let T = T(p,W) denote the matrix in Theorem 8 which is completely determined by p and the submatrix W, that is,

        T = T(p,W) = [I_n × u_p^H]
                     [ W × I_p   ] .

Now given a system of equations Tx = b with W m by n of rank r, 0 < r ≤ n, and p any positive integer, it follows that if we partition x and b as

        x = [x^(1)]          b = [b^(0)]
            [ ... ]              [b^(1)]
            [x^(p)] ,            [ ... ]
                                 [b^(p)] ,

with x^(1), ..., x^(p) and b^(0) n-tuples and b^(1), ..., b^(p) m-tuples, then x is a solution if and only if

(3.14)        Σ_{j=1}^{p} x^(j) = b^(0)

and

(3.15)        W x^(j) = b^(j),    j = 1, ..., p.

In other words, each x^(j) must be a solution of m equations in n unknowns, subject to the condition that the sum of the solutions is equal to b^(0). These characterizations are further explored for the general case of an arbitrary matrix, W, in Exercises 3.16 and 3.17 and for an important special case in Exercises 3.18 and 3.19.
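The structure of T(p,W) and the representation (3.11) can also be checked numerically. Again the book's Kronecker product A × B corresponds to numpy.kron(B, A) below; W, m, n and p are arbitrary.

    import numpy as np

    def kprod(P, Q):
        return np.kron(Q, P)       # Definition 2: blocks q_kl * P

    rng = np.random.default_rng(3)
    m, n, p = 2, 3, 4
    W = rng.standard_normal((m, n))
    u = np.ones((p, 1))

    T = np.vstack([kprod(np.eye(n), u.T),          # I_n x u_p^H : p copies of I_n side by side
                   kprod(W, np.eye(p))])           # W x I_p     : block diagonal copies of W

    G = np.linalg.inv(p * np.eye(n) + W.T @ W)     # G_p = (p I_n + W^H W)^{-1}
    Wp = np.linalg.pinv(W)
    Tp = np.hstack([kprod(G, u),
                    kprod(Wp, np.eye(p)) - kprod(G @ Wp, u @ u.T)])   # right side of (3.11)

    print(np.allclose(Tp, np.linalg.pinv(T)))      # True: Theorem 8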
Exercises
3.10

Complete the numerical construction of A+ in Example 3.6 and


verify that A and A+ satisfy the defining equations in (2.2).

3.11 Matrices of the form [u_p, I_p] and, more generally, Kronecker products such as A in Example 3.6 in which at least one of the matrices has this form occur frequently in statistical design of experiments [1] [4]. For example, suppose it is required to examine the effect of p different fertilizers on soy bean yield. One approach to this problem is to divide a field into pq subsections (called plots), randomly assign each of the p types of fertilizer to q plots, and measure the yield from each. Neglecting other factors which may affect yield, a model for this experiment has the form

(3.16)        y_ij = m + t_i + e_ij,

where y_ij is the yield of the jth plot to which fertilizer i has been applied, m is an estimate of an overall "main" effect, t_i is an estimate of the effect of the particular fertilizer treatment and e_ij is the experimental error associated with the particular plot. The question now is to determine m and t_1, ..., t_p to minimize the sum of squares of experimental error, that is,

        Σ_{i=1}^{p} Σ_{j=1}^{q} e_ij².

    a.

If y and e denote the vectors


y

(y 1 1 ' ... ,y p 1 ,y 12 ' ... ,y p2 ' ... ,y I q , ... ,y pq)

and

show that data for the model in (3.16) can be represented as


(3.1

where x

y =

Ax + e,

(m,t l , ... ,tp)

and


b.

c.

Show that rIA}

x=

p, and construct the minimal norm solution

A+y, to (3. 17).

For statistical appl ications it is also assumed that


p

It.o.

i=1

x from

Starting wi th

3.llb above, construct that solution X,

say, for which this additional condition holds.


3.12

(Continuation):

Given the experimental situation described in

Exercise 3. II, it is sometimes assumed that there is another


effect, cal led a block effect, present.
models is assumed:

In this case one of two

First, if there is no interaction between the

treatment and block effect, then


(3.18)
whereas if there is an assumed interaction between the treatment
and block effect, then
(3.19)

Y'l j' = m + t. + b. + (tb) .. + e .. ,


I

Ij

Ij

where Yij' m and ti have the same meaning as in (3.16),


b.,j=I, ... ,q, designate block effects, and (tb} .. ,i=I, ... ,p and
j

I j

j=1 , ... ,q, designate the effect of the interaction between treatment i and block j.
a.

Using the notation for Y and e from Exercise 3. I I, show that


the data for the model in (3.18) can be represented as

Y = Alx + e,
where now

and

AI =
b.

Show that the data for the model in (3.19) can be represented
as

where
x = (m, t I ' ... , t P , b I ' ... ,bq , ( t b) II ' ... , ( t b) I q , ... , ( t b) pi' ... , ( t b) pq)

c.

Use the procedure of Example 3.6 to construct A2


the solution

x = A/Y.

and thus

(For statistical appl ications the

model in 0.19) is not meaningful unless there is more than


one observation for each pair of indices i and j, that is, a
model of the form

where k = 1, ... ,r.

In this case the unique solution is obtained

by assuming
p

L t.

i=l

q
=

L b.

j=l J

for all i and j.

and also

L(tb) ..
i=l
IJ

L (tb) ..

j=l

Note, in addi tion, that the construction of

Al+ in 3 12a above is somewhat more complicated, but can be


formed using related techniques.

The particular solution used

for statistical appl ications in this case assumes that


p

L t.

i=l
*3.13

L b.

j=l J

O. )

Show that if P and Q are any square matrices with x an eigenvector


of P corresponding to eigenvalue A and y an eigenvector of Q corresponding to eigenvalue fl, then xXy is an eigenvector of pXQ
corresponding to eigenvalue Afl.

3.14  a.  Show that for any p and W,

              I_np - T^+T = (I_n - W^+W) ⊗ (I_p - u_p u_p^+).

      *b. Construct a complete orthonormal set of eigenvectors for I_np - T^+T.

3.15  Prove directly that for any m by n matrix W of rank r and any p,

          rank(T) = n + r(p-1).

3.16  For any system of equations Tx = b with x and b partitioned to give (3.14) and (3.15), let X and B denote the matrices

          X = [x(1), ..., x(p)],    B = [b(1), ..., b(p)].

      a.  Prove that Tx = b if and only if there exists a matrix X such that

      (3.20)    X u_p = b(0)

          and

      (3.21)    WX = B.

      b.  Prove that a necessary condition for a solution, X, to (3.20) and (3.21) to exist is WW^+B = B and Wb(0) = B u_p.

3.17  (Continuation): For any eigenvector z_i ⊗ y_j of I_np - T^+T, where y_j has components y_j1, ..., y_jp, let Z_ij denote the matrix

          Z_ij = [y_j1 z_i, ..., y_jp z_i].

      Prove that z_i ⊗ y_j corresponds to eigenvalue 1 if and only if

      (3.22)    Z_ij u_p = 0

          and

      (3.23)    W Z_ij = 0.

3.18  The transportation problem in linear programming is an example of a problem in which it is required to solve a system of equations Tx = b. This famous problem can be stated as follows: Consider a company with n plants which produce a_1, ..., a_n units, respectively, of a given product in some time period. This company has p distributors which require b_1, ..., b_p units, respectively, of the product in the same time period, where

          Σ_{i=1}^{n} a_i = Σ_{j=1}^{p} b_j .

If there is a unit cost c_ij for shipping from plant i to distributor j, i = 1, ..., n and j = 1, ..., p, then how should the shipments be allocated in order to minimize total transportation cost? This problem can be illustrated in a schematic form (called a tableau) as shown in Figure 6, where O_1, ..., O_n designate origins of shipment (plants), D_1, ..., D_p designate destinations (distributors) and, for each i and j, x_ij denotes the number of units to be shipped from O_i to D_j.

[Figure 6. The transportation problem tableau: an n by p array with entry x_ij in row i and column j, row sums a_1, ..., a_n, column sums b_1, ..., b_p, and Σ_i a_i = Σ_j b_j.]

The problem now is to determine the x_ij, i = 1, ..., n and j = 1, ..., p, to minimize the total shipping cost

(3.24)    Σ_{i=1}^{n} Σ_{j=1}^{p} c_ij x_ij ,

subject to the conditions that

(3.25)    Σ_{j=1}^{p} x_ij = a_i ,    i = 1, ..., n,

and

(3.26)    Σ_{i=1}^{n} x_ij = b_j ,    j = 1, ..., p.

Also, we must have x_ij ≥ 0 for all i and j, and, assuming fractional units cannot be manufactured or shipped, all a_i, b_j and x_ij integers. (This last requirement that the x_ij are integers follows automatically when the a_i and b_j are [6].)

a.  Show that if the x_ij in Figure 6 are elements of an n by p matrix X, and if b(0) = (a_1, ..., a_n)^H, then the conditions in (3.25) can be written as X u_p = b(0) and the conditions in (3.26) become u_n^H X = (b_1, ..., b_p). Therefore, (3.25) and (3.26) together imply that any set of numbers x_ij which satisfy the row and column requirements of the tableau is a solution of Tx = b, where T = T(p, u_n^H) and

          b = (a_1, ..., a_n, b_1, ..., b_p).

b.  Prove that if T = T(p, u_n^H), then

          T^+ = [ [I_n - (1/(n+p)) u_n u_n^H] ⊗ u_p^{+H} ,  u_n^{+H} ⊗ [I_p - (1/(n+p)) u_p u_p^H] ].

    Moreover, show that if x_ij is the element in row i and column j of the tableau form of x = T^+b, then

          x_ij = (1/p) a_i + (1/n) b_j - (1/(np)) Σ_{i=1}^{n} a_i

    for i = 1, ..., n and j = 1, ..., p.
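The closed form in part b can be verified numerically. In the sketch below (Python with NumPy), the data are hypothetical and T is assembled directly from the conditions (3.25) and (3.26) rather than from the Kronecker expression, so the minimal norm property of T^+b does not depend on any particular stacking convention.

```python
import numpy as np

# Hypothetical balanced data: n = 3 origins, p = 4 destinations.
a = np.array([20., 30., 50.])            # row (origin) requirements
b = np.array([10., 25., 35., 30.])       # column (destination) requirements
n, p = a.size, b.size

# Build T row by row from (3.25) and (3.26), with x stored row-major as
# (x_11, ..., x_1p, x_21, ..., x_np).
T = np.zeros((n + p, n * p))
for i in range(n):
    T[i, i * p:(i + 1) * p] = 1.0        # row sums
for j in range(p):
    T[n + j, j::p] = 1.0                 # column sums
rhs = np.concatenate([a, b])

x = np.linalg.pinv(T) @ rhs              # minimal norm solution x = T^+ b
X = x.reshape(n, p)
closed_form = a[:, None] / p + b[None, :] / n - a.sum() / (n * p)
print(np.allclose(X, closed_form))       # True
```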

*c. Show that rank(T) = n + p - 1 when W = u_n^H, and thus that rank(I_np - T^+T) = (n-1)(p-1). Also, construct a complete orthonormal set of eigenvectors of I_np - T^+T, and show that z_i ⊗ y_j is an eigenvector corresponding to eigenvalue λ_i μ_j = 1 if and only if all row sums and column sums in the tableau form are zero.

d.  The vector g = (I_np - T^+T)c is called the gradient of the inner product

          (c,x) = Σ_{i=1}^{n} Σ_{j=1}^{p} c_ij x_ij

    in (3.24). Show that the elements g_ij in the tableau form of g can be written as

          g_ij = c_ij - (1/n) Σ_{i=1}^{n} c_ij - (1/p) Σ_{j=1}^{p} c_ij + (1/(np)) Σ_{i=1}^{n} Σ_{j=1}^{p} c_ij

    for i = 1, ..., n and j = 1, ..., p.
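A quick numerical check of part d (Python with NumPy; the cost tableau is a hypothetical example): the formula is a double centering of c, and the resulting tableau has zero row and column sums, consistent with part c.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 3, 4
C = rng.integers(1, 10, size=(n, p)).astype(float)     # hypothetical costs c_ij

# Double centering of C, as in part d.
G = C - C.mean(axis=0) - C.mean(axis=1, keepdims=True) + C.mean()
print(np.allclose(G.sum(axis=0), 0), np.allclose(G.sum(axis=1), 0))   # True True
```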


3.19  (Continuation): The transportation problem has been generalized in a number of different ways, and one of these extensions follows directly using matrices of the form T = T(p,W). Suppose that we are given q transportation problems, each with n origins and p destinations, and let a_ik, b_jk, c_ijk and x_ijk be the row sums, column sums, costs and variables, respectively, associated with the kth tableau, k = 1, ..., q. A "three-dimensional" transportation problem is now obtained by adding the conditions that

(3.27)    Σ_{k=1}^{q} x_ijk = d_ij

for i = 1, ..., n and j = 1, ..., p, where d_11, ..., d_np are given positive integers. (The choice of nomenclature "three-dimensional" is apparent by noting that if the tableaus are stacked to form a parallelopiped with q layers, each with np cells, then (3.27) simply imposes np conditions that must be satisfied when the x_ijk are summed in the vertical direction as shown in Figure 7, where only the row, column and vertical sum requirements are indicated.)

[Figure 7. The parallelopiped requirements for the tableau of a three-dimensional transportation problem.]

a.  Show that the conditions

          Σ_{i=1}^{n} a_ik = Σ_{j=1}^{p} b_jk ,    k = 1, ..., q,

          Σ_{k=1}^{q} a_ik = Σ_{j=1}^{p} d_ij ,    i = 1, ..., n,

          Σ_{k=1}^{q} b_jk = Σ_{i=1}^{n} d_ij ,    j = 1, ..., p,

    are necessary in order for a three-dimensional transportation problem to have a solution.

b.  Show that the conditions which the x_ijk must satisfy if there is a solution can be written as Tx = b, where T = T(q,W) with W = T(p, u_n^H), the matrix for the "two-dimensional" transportation problem in Exercise 3.18, and a suitable vector b.

c.  Express I_npq - T^+T and T^+ = [U, V] in Kronecker-product form, showing in particular that the factor

          I_n - ((n + 2p + q)/((n+p)(n+p+q))) u_n u_n^H

    appears in V.

3.3  Miscellaneous Exercises
3.20  Prove that a necessary and sufficient condition that the equations AX = C, XB = D have a common solution is that each equation has a solution and that AD = CB, in which case X = A^+C + DB^+ - A^+ADB^+ is a particular solution.

3.21  Prove Lemma 7.

3.22  Prove that ... for any matrix B, and that ... .

4
Drazin Inverses

4.1  The Drazin Inverse of a Square Matrix

In this section we consider another type of generalized inverse for square complex matrices. The inverse in Theorem 9, due to Drazin [3], has a variety of applications.

THEOREM 9: For any square matrix, A, there is a unique matrix X such that

(4.1)    A^k = A^{k+1}X, for some positive integer k,

(4.2)    AX^2 = X,

(4.3)    AX = XA.

Proof: Observe first that if A = 0 is the null matrix, then A and X = 0 satisfy (4.1), (4.2) and (4.3). Suppose A ≠ 0 is any n by n matrix. Then there exist scalars d_1, ..., d_t, not all zero, such that

          Σ_{i=1}^{t} d_i A^i = 0,

where t ≤ n^2 + 1 since the A^i can be viewed as vectors with n^2 elements. Let d_k be the first nonzero coefficient. Then we can write

(4.4)    A^k = A^{k+1}U,

where

          U = -(1/d_k) Σ_{i=k+1}^{t} d_i A^{i-k-1}.

Since U is a polynomial in A, U and A commute. Also, multiplying both sides of (4.4) by AU gives

          A^k = A^{k+1}U = A^{k+2}U^2 = A^{k+3}U^3 = ... ,

and thus

(4.5)    A^k = A^{k+m}U^m

for all m ≥ 1. Let X = A^kU^{k+1}. Then for this choice of X,

          A^{k+1}X = A^{2k+1}U^{k+1} = A^k

and

          AX^2 = A^{2k+1}U^{2k+2} = A^kU^{k+1} = X,

by use of (4.5). Also, X and A commute since U and A commute. Thus the conditions (4.1), (4.2) and (4.3) hold for this X.

To show that X is unique, suppose that Y is also a solution to (4.1), (4.2) and (4.3), where X corresponds to an exponent k_1 and Y corresponds to an exponent k_2 in (4.1). Let k = maximum(k_1, k_2). Repeated use of (4.2) and (4.3) gives X = X^{k+1}A^k and XA = X^{k+1}A^{k+1}, and similarly Y = A^kY^{k+1} and AY = A^{k+1}Y^{k+1}. Then, using (4.1) and (4.5),

          X = X^{k+1}A^k = X^{k+1}A^{k+1}Y = XAY = XA^{k+1}Y^{k+1} = A^kY^{k+1} = Y,

to establish uniqueness.

We will call the unique matrix X in Theorem 9 the Drazin inverse of A and write X alternately as X = A_d. Also, we will call the smallest k such that (4.1) holds the index of A. That A_d is a generalized inverse of A is apparent by noting that (4.1) holds with k = 1 when X = A^{-1} exists and also (4.2) and (4.3) hold. Observe, moreover, that in general (4.1) can be rewritten as

(4.6)    AXA^k = A^k,

and (4.2) becomes XAX = X, by use of (4.3), so that the defining equations in Theorem 9 can be viewed as an alternative to those used for A^+ in which AXA = A is replaced by (4.6), (1.2) remains unchanged, and (1.3) and (1.4) are replaced by the condition in (4.3) that A and X commute. (Various relationships between A_d and A^+ will be explored in the exercises at the end of this section and in Section 4.3.)

As will be discussed following the proof of Lemma 10, full rank factorizations of A can be used effectively in the construction of A_d.
LEMMA 10: For any factorization A = BC,  A_d = B[(CB)_d]^2 C.

Proof: Observe first that for any square matrix A of index k and positive integers m and n, we have A_d^m A^n = A_d^{m-n} if m > n, and A^{m+n}A_d^n = A^m if m ≥ k.

Let k denote the larger of the index of BC and the index of CB, and let X = B[(CB)_d]^2 C. Then

          AX = BC B[(CB)_d]^2 C = B(CB)_d C = B[(CB)_d]^2 (CB) C = XA,

          AX^2 = B(CB)_d C B[(CB)_d]^2 C = B[(CB)_d]^2 C = X,

and

          A^{k+2}X = B(CB)^{k+2}[(CB)_d]^2 C = B(CB)^k C = A^{k+1},

so that (4.1), (4.2) and (4.3) hold for X, and X = A_d by the uniqueness in Theorem 9.

Suppose now that A = B_1C_1 is a full rank factorization where rank(A) = r_1. Forming the r_1 by r_1 matrix C_1B_1, then either C_1B_1 is nonsingular, or C_1B_1 = 0, or rank(C_1B_1) = r_2 where 0 < r_2 < r_1. In the first case, with C_1B_1 nonsingular, (C_1B_1)_d = (C_1B_1)^{-1}, so that A_d = B_1(C_1B_1)^{-2}C_1, by Lemma 10. On the other hand, if C_1B_1 = 0 then (C_1B_1)_d = 0 and thus A_d = 0, by again using Lemma 10. Finally, if rank(C_1B_1) = r_2 where 0 < r_2 < r_1, then for any full rank factorization C_1B_1 = B_2C_2 we have

          (C_1B_1)_d = B_2[(C_2B_2)_d]^2 C_2,

so that A_d in Lemma 10 becomes A_d = B_1B_2[(C_2B_2)_d]^3 C_2C_1. The same argument now applies to C_2B_2; that is, either C_2B_2 is nonsingular and

          A_d = B_1B_2(C_2B_2)^{-3}C_2C_1,

or C_2B_2 = 0 and thus A_d = 0, or rank(C_2B_2) = r_3 where 0 < r_3 < r_2, and C_2B_2 = B_3C_3 is a full rank factorization to which Lemma 10 can be applied. Continuing in this manner with

          C_{m-1}B_{m-1} = B_mC_m,

then either B_mC_m = 0 for some index m, and so A_d = 0, or rank(C_mB_m) = rank(B_mC_m) > 0 for some index m, in which case C_mB_m is nonsingular and thus

(4.7)    A_d = B_1B_2 ... B_m (C_mB_m)^{-(m+1)} C_mC_{m-1} ... C_1

in Lemma 10. Observe, moreover, that with A = B_1C_1,

(4.8)    A^2 = B_1C_1B_1C_1 = B_1B_2C_2C_1,  ...,  A^m = B_1B_2 ... B_mC_mC_{m-1} ... C_1,

we have either A^m = A^{m+1} = 0, and A_d = 0, or that A_d has the form in (4.7), where, since each B_i has full column rank and each C_i has full row rank, rank(A^m) = rank(C_mB_m) = rank(A^{m+1}). Therefore, in both cases we have rank(A^m) = rank(A^{m+1}). Furthermore, it follows in both cases that (4.1) holds for k = m and does not hold for any k < m. That is to say, k in (4.1) is the smallest positive integer such that A^k and A^{k+1} have the same rank.
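The discussion above is, in effect, an algorithm. The following minimal sketch (Python with NumPy) recurses on Lemma 10; the SVD-based full rank factorization and the numerical rank tolerance are implementation choices made for this sketch, not part of the text.

```python
import numpy as np

def full_rank_factorization(A, tol=1e-12):
    # A = B C with B of full column rank and C of full row rank,
    # obtained here from a singular value decomposition (one possible choice).
    U, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > tol * s[0])) if s.size and s[0] > 0 else 0
    return U[:, :r] * s[:r], Vh[:r, :], r

def drazin(A, tol=1e-12):
    # Recursive use of Lemma 10: A_d = B [(CB)_d]^2 C, terminating when
    # CB is nonsingular (index reached) or A = 0 (so that A_d = 0).
    A = np.asarray(A, dtype=float)
    B, C, r = full_rank_factorization(A, tol)
    if r == 0:
        return np.zeros_like(A)
    CB = C @ B
    M = np.linalg.inv(CB) if np.linalg.matrix_rank(CB, tol) == r else drazin(CB, tol)
    return B @ M @ M @ C

# Small idempotent example (A^2 = A, so A_d = A).
A = np.array([[1., 1.], [0., 0.]])
X = drazin(A)
print(np.allclose(A @ A @ X, A),        # (4.1) with k = 1
      np.allclose(A @ X @ X, X),        # (4.2)
      np.allclose(A @ X, X @ A))        # (4.3)
```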
Example 4.1

If A is a singular matrix written as a full rank factorization A = B_1C_1 in which C_1B_1 is nonsingular, then A has index one, and

          A_d = B_1(C_1B_1)^{-2}C_1.

Example 4.2

If A is a matrix with full rank factorization A = B_1C_1 for which, continuing as above, C_1B_1 = B_2C_2 and C_2B_2 = B_3C_3 are full rank factorizations with C_3B_3 = 7, then A has index three and A_d becomes

          A_d = B_1B_2B_3(C_3B_3)^{-4}C_3C_2C_1 = (1/2401) B_1B_2B_3 C_3C_2C_1.
For the special case of matrices with index one we have

(4.9)    AA_dA = A,    A_dAA_d = A_d,    AA_d = A_dA,

so that

(4.10)    (A_d)_d = A,

by the duality in the roles of A and A_d. Conversely, if (4.10) holds, then the first and last relations in (4.9) follow from the defining relations in (4.2) and (4.3) applied to (A_d)_d and A_d, and the second relation in (4.9) is simply (4.2) for A_d and A. Consequently, (4.10) holds if and only if A has index one. In this special case the Drazin inverse of A is frequently called the group inverse of A, and is designated alternately as A#. Thus X = A#, when it exists, is the unique solution of AXA = A, XAX = X and AX = XA, and it follows from Lemma 10 that for any full rank factorization A = BC, A# = B(CB)^{-2}C.
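As a small illustration of the closing remark (Python with NumPy; the factorization method and the test matrix are assumptions of this sketch), A# can be computed from any full rank factorization A = BC as B(CB)^{-2}C when A has index one.

```python
import numpy as np

def group_inverse(A, tol=1e-12):
    # A# = B (CB)^{-2} C for a full rank factorization A = BC,
    # valid when A has index one (CB nonsingular).
    U, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > tol * s[0]))
    B, C = U[:, :r] * s[:r], Vh[:r, :]
    CB = C @ B
    return B @ np.linalg.matrix_power(np.linalg.inv(CB), 2) @ C

A = np.array([[2., 0.], [1., 0.]])       # rank one, index one (hypothetical example)
G = group_inverse(A)
print(np.allclose(A @ G @ A, A), np.allclose(G @ A @ G, G), np.allclose(A @ G, G @ A))
```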

Exercises

4.1  Compute A_d for the matrices A_1, A_2 and A_3.

4.2  Given any matrices B and C of the same size where B has full column rank, we will say that C is alias to B if B^+ = (C^HB)^+C^H.

     a.  Prove that if C is alias to B, then C^HB is nonsingular.

     b.  Show that the set of all matrices alias to B forms an equivalence class.

     c.  Prove that A_d = A^+ if and only if C is alias to B for any full rank factorization A = BC^H.

     d.  Note, in particular, that A_d = A^+ when A is Hermitian. Prove this fact directly and also by using the result in 4.2c above.

4.3  Prove that (A^H)_d = (A_d)^H and that ((A_d)_d)_d = A_d for any matrix A.

4.4  Prove that A_d = 0 for any nilpotent matrix A.

4.5  Prove that (P ⊗ Q)_d = P_d ⊗ Q_d for any square matrices P and Q. What is the index of P ⊗ Q?

4.2  An Extension to Rectangular Matrices

The Drazin inverse of a matrix, A, as defined in Theorem 9, exists only if A is square, and an obvious question is how this definition can be extended to rectangular matrices. One approach to this problem is to observe that if B is m by n with m > n, say, then B can be augmented by m - n columns of zeroes to form a square matrix A. Now forming A_d, we might then take those columns of A_d which correspond to the locations of columns of B in A as a definition of the "Drazin inverse" of B. As shown in the following example, however, the difficulty in this approach is that there are (m choose m-n) such matrices A, obtained by considering all possible arrangements of the n columns of B (taken without any permutations) and the m - n columns of zeroes, and that A_d can be different in each case.
Example 4.3

If B is the 3 by 2 matrix shown, and A_1, A_2 and A_3 are the 3 by 3 matrices obtained by adjoining a column of zeroes to B in each of the three possible column positions, then the Drazin inverses (A_1)_d, (A_2)_d and (A_3)_d are all different. They are obtained by applying Lemma 10 to the factorizations A_i = BC_i, where each C_i is I_2 with a column of zeroes inserted in the corresponding position.

Observe in Example 4.3 that the nonzero columns of each matrix (A_i)_d correspond to the product B[(C_iB)_d]^2. Consequently, using the nonzero columns of (A_i)_d to define the "Drazin inverse" of B implies that the resulting matrix is a function of C_i. That such matrices are uniquely determined by a set of defining equations and are special cases of a class of generalized inverses that can be constructed for any matrix B will be apparent from Theorem 11.
THEOREM 11: For any m by n matrix B and any n by m matrix W, there is a unique matrix X such that

(4.11)    (BW)^k = (BW)^{k+1}XW, for some positive integer k,

(4.12)    XWBWX = X,

(4.13)    BWX = XWB.

Proof: Let X = B[(WB)_d]^2. Then with XW = B[(WB)_d]^2 W = (BW)_d, by Lemma 10, (4.11) holds with k the index of BW. Also,

          XWBWX = (BW)_d(BW)B[(WB)_d]^2 = B(WB)_d(WB)[(WB)_d]^2 = B[(WB)_d]^2 = X

and

          BWX = B(WB)[(WB)_d]^2 = B(WB)_d = B[(WB)_d]^2(WB) = XWB,

so that (4.12) and (4.13) hold.

To show that X is unique we can proceed as in the proof of Theorem 9. Thus, suppose X_1 and X_2 are solutions of (4.11), (4.12) and (4.13) corresponding to positive integers k_1 and k_2, respectively, in (4.11). Then with k = maximum(k_1, k_2), it follows that

          X_1 = X_1WBWX_1 = BWX_1WX_1 = (BW)^2(X_1W)^2X_1 = ... = (BW)^k(X_1W)^kX_1
              = (BW)^{k+1}X_2W(X_1W)^kX_1 = X_2(WB)^{k+1}W(X_1W)^kX_1
              = X_2WBW(BW)^k(X_1W)^kX_1 = X_2WBWX_1.

Continuing in a similar manner with

          X_2 = X_2(WX_2)^{k+1}(WB)^{k+1},

then

          X_2WBWX_1 = X_2(WX_2)^{k+1}(WB)^{k+1}WBWX_1 = X_2(WX_2)^{k+1}W(BW)^{k+1}X_1WB
                    = X_2(WX_2)^{k+1}(WB)^{k+1} = X_2.

Therefore X_1 = X_2, and the solution to (4.11), (4.12) and (4.13) is unique.

The unique matrix, X, in Theorem 11 will be called the W-weighted Drazin inverse of B and will be written alternately as X = B_{d,W}.

The choice of nomenclature W-weighted Drazin inverse of B is easily seen by noting that with B_{d,W} = B[(WB)_d]^2, then B_{d,W} = B_d when B is square and W is the identity matrix. Also, observe more generally that with B and B_{d,W} of the same size and with W and WBW the size of B^H, the relation BWB_{d,W} = B_{d,W}WB in (4.13) can be viewed as a generalized commutativity condition, and B_{d,W}WBWB_{d,W} = B_{d,W} in (4.12) is analogous to (4.2) when written in the form XAX = X.
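A brief computational sketch of Theorem 11 (Python with NumPy; the matrices B and W below are arbitrary choices, and the routine reuses the drazin() function from the sketch following Lemma 10): X = B[(WB)_d]^2 is formed and the defining equations are checked.

```python
import numpy as np

def w_weighted_drazin(B, W):
    # X = B [(WB)_d]^2, the W-weighted Drazin inverse of B (Theorem 11).
    return B @ np.linalg.matrix_power(drazin(W @ B), 2)

B = np.array([[1., 0.], [0., 1.], [1., 1.]])   # hypothetical 3 by 2 matrix
W = np.array([[1., 0., 1.], [0., 1., 0.]])     # hypothetical 2 by 3 matrix
X = w_weighted_drazin(B, W)

BW = B @ W
print(np.allclose(BW, BW @ BW @ X @ W))        # (4.11) with k = 1 here
print(np.allclose(X @ W @ B @ W @ X, X))       # (4.12)
print(np.allclose(B @ W @ X, X @ W @ B))       # (4.13)
```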
Example 4.4

If B is the matrix in Example 4.3 and W_1 and W_2 are two different 2 by 3 matrices, then the corresponding weighted inverses B_{d,W_1} and B_{d,W_2} are again different.

Exercises

4.6  Verify that B and B_{d,W} satisfy the defining equations in Theorem 11 for W = C_1, C_2, C_3 in Example 4.3 and for W = W_1, W_2 in Example 4.4.

4.7  Prove that E = E_d for any idempotent matrix E, and thus that B_{d,W} = B^2B_d when B is square and W = B_d. (Consequently, B_{d,W} = B when W = B_d and B has index one.)

4.8  Show that if W^H is any matrix alias to B, then B_{d,W} = W^+(WB)^{-1}.

4.9  Prove that B_{d,W} = B^{H+}B^+B^{H+} for any matrix B when W = B^H. (Note that this result follows at once from Lemma 5(f) and Exercise 4.8 if B has full column rank, whereas Lemma 5(f) and Exercise 4.2d can be used for the general case.)

4.3  Expressions Relating A_d and A^+

It is an immediate consequence of Exercise 4.9 that if W = B^H, so that B_{d,W} = B^{H+}B^+B^{H+}, then

          B^+ = W B_{d,W} W.

Thus, using W-weighted Drazin inverses with W = B^H, B_{d,W} and B^+ are related directly in terms of products of matrices, which implies that B_{d,W} and B^+ have the same rank. In contrast, for any square matrix, A, we have

          rank(A_d) = rank(A^k) = rank(A^{k+1}),

with k the index of A, whereas rank(A^+) = rank(A). Therefore, rank(A_d) ≤ rank(A^+), with equality holding if and only if A has a group inverse. The following result can be used to give a general expression for the Drazin inverse of a matrix, A, in terms of powers of A and a Moore-Penrose inverse.

THEOREM 12: For any square matrix A with index k,

(4.14)    A_d = A^kYA^k

for any matrix Y such that

(4.15)    A^{2k+1}YA^{2k+1} = A^{2k+1}.

Proof: Starting with the right-hand side of (4.14) we have

          A^kYA^k = A_d^{k+1}A^{2k+1}YA^{2k+1}A_d^{k+1} = A_d^{k+1}A^{2k+1}A_d^{k+1} = A_d^{k+1}A^k = A_d.

Observe in (4.15) that one obvious choice of Y is (A^{2k+1})^+, and it then follows that A^ℓ, (A^ℓ)^+ and A_d have the same rank for every positive integer ℓ ≥ k.
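This choice can be tried numerically. The sketch below (Python with NumPy; the index computation and the test matrix are assumptions made for illustration) computes A_d = A^k (A^{2k+1})^+ A^k and checks the defining equations of Theorem 9.

```python
import numpy as np

def index_of(A, tol=1e-12):
    # Smallest positive k with rank(A^k) = rank(A^(k+1)).
    k, Ak = 1, A.copy()
    while np.linalg.matrix_rank(Ak, tol) != np.linalg.matrix_rank(Ak @ A, tol):
        Ak, k = Ak @ A, k + 1
    return k

def drazin_from_pinv(A):
    # A_d = A^k Y A^k with Y = (A^(2k+1))^+, the choice suggested by (4.15).
    A = np.asarray(A, dtype=float)
    k = index_of(A)
    Ak = np.linalg.matrix_power(A, k)
    Y = np.linalg.pinv(np.linalg.matrix_power(A, 2 * k + 1))
    return Ak @ Y @ Ak

A = np.array([[1., 1.], [0., 0.]])       # same hypothetical idempotent example as before
X = drazin_from_pinv(A)
print(np.allclose(A @ A @ X, A), np.allclose(A @ X @ X, X), np.allclose(A @ X, X @ A))
```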

In this case, various relationships among A^ℓ, (A^ℓ)^+ and A_d can be established. For example, it can be shown that for any ℓ ≥ k, there is a unique matrix X satisfying

(4.16)    A^ℓXA^ℓ = A^ℓ,    XA^ℓX = X,

and

(4.17)    (XA^ℓ)^H = XA^ℓ,    A^ℓX = AA_d.

Dually, there is a unique matrix X satisfying (4.16) and

(4.18)    (A^ℓX)^H = A^ℓX,    XA^ℓ = AA_d.

The unique solutions of (4.16) and (4.17) and of (4.16) and (4.18) are called the left and right power inverses of A^ℓ, respectively, and are designated as (A^ℓ)_L and (A^ℓ)_R. Moreover, it can be shown (Exercise 4.11) that

(4.19)    (A^ℓ)_L = (A^ℓ)^+AA_d    and    (A^ℓ)_R = AA_d(A^ℓ)^+,

and (Exercise 4.14) that (A^ℓ)_L and (A^ℓ)_R can be computed using full rank factorizations.
Exercises

4.10  Show that if A^ℓ and W satisfy A^ℓWA^ℓ = A^ℓ and (WA^ℓ)^H = WA^ℓ for any positive integer ℓ, then WA^ℓ = (A^ℓ)^+A^ℓ, and conversely. What is the dual form of this result for A^ℓW?

4.11  Prove that (A^ℓ)_L in (4.19) is the unique solution to (4.16) and (4.17).

4.12  Prove that for every ℓ ≥ k, (A^ℓ)^+ = (A^ℓ)_L A^ℓ (A^ℓ)_R and A_d = (A^ℓ)_R A^{2ℓ-1} (A^ℓ)_L.

4.13  Use a sequence of full rank factorizations

          A = B_1C_1,    A^2 = B_1B_2C_2C_1,    ...,

      to show that A^ℓ(A^ℓ)^+ = A^k(A^k)^+ and (A^ℓ)^+A^ℓ = (A^k)^+A^k for all ℓ ≥ k.

4.14  (Continuation): Show how (A^ℓ)_L and (A^ℓ)_R can be expressed in terms of these full rank factorizations.

4.15  Construct (A^2)_L and (A^2)_R for the matrix A in Example 4.1.

4.16  Prove that if Ax = b is a consistent system of equations and if A has index one, then the general solution of A^n x = b, n = 1, 2, ..., can be written as x = A_R^n b + (I - A_LA)y where y is arbitrary. (Note that this expression reduces to x = A^{-n}b when A is nonsingular. The terminology "power inverse" of A was chosen since we use powers of A_R in a similar manner to obtain a particular solution of A^n x = b.)

4.4  Miscellaneous Exercises

4.17  Let B and W be any matrices, m by n and n by m, respectively, and let p be any positive integer.

      a.  Show that there is a unique matrix X satisfying a first set of three equations, among them BWX = XWB; a unique matrix X satisfying a second set of three equations, among them WX = WB[(WB)_d]^p; and that the unique X which satisfies both sets of equations is X = B[(WB)_d]^p.

      b.  Show that if p ≥ 1, q ≥ -1 and r ≥ 0 are integers such that q + 2r + 2 = p, and if (WB)^q = (WB)_d when q = -1, then

               B[(WB)_d]^p = B(WB)^q [ ((WB)^rW)(B(WB)^q) ]_d^2 .

          (Consequently, the unique X in 4.17a is the (WB)^rW-weighted Drazin inverse of B(WB)^q.)

4.18  Prove that if A and B are any matrices such that AA_d = BB_d and A_d^2 ... , then ... B_d ... .

5
Other Generalized Inverses

5.1  Inverses That Are Not Unique

Given matrices A and X, subsets of the relations in (1.1) to (1.5) other than those used to define A^+ and A_d provide additional types of generalized inverses. Although not unique, some of these generalized inverses exhibit the essential properties of A^+ required in various applications. For example, observe that only the condition AXA = A was needed to characterize consistent systems of equations Ax = b by the relation AXb = b in (2.4). Moreover, if A and X also satisfy (XA)^H = XA, then XA = A^+A, by Exercise 4.10, and with A^+b a particular solution of Ax = b, the general solution in Exercise 2.21 can be written as

          x = Xb + (I - XA)y,

with the orthogonal decomposition

          x = A^+b + (I - A^+A)y.

(Note that this is an extension of the special case of matrices with full row rank used in the proof of Theorem 2.) In this section we consider relationships among certain of these generalized inverses in terms of full rank factorizations, and illustrate the construction of such inverses with numerical examples.

For any A and X such that AXA = A, rank(X) ≥ rank(A), whereas XAX = X implies rank(X) ≤ rank(A). The following lemma characterizes solutions of AXA = A and XAX = X in terms of group inverses.

LEMMA 13: For any full rank factorizations A = BC and X = YZ, AXA = A and XAX = X if and only if AX = (BZ)#BZ and XA = (YC)#YC.

Proof: If A = BC and X = YZ are full rank factorizations where B is m by r, C is r by n, Y is n by s and Z is s by m, then AXA = A implies

(5.1)    CYZB = I_r,

and XAX = X implies

(5.2)    ZBCY = I_s.

Consequently, with r = s, ZB = (CY)^{-1}, so that

          AX = BCYZ = B(ZB)^{-1}Z = (BZ)#BZ

and

          XA = YZBC = Y(CY)^{-1}C = (YC)#YC,

by Lemma 10.

Conversely, since Z and C have full row rank, (BZ)#BZB = B and (YC)#YCY = Y. Hence AX = (BZ)#BZ gives AXA = A, and XA = (YC)#YC gives XAX = X.
It should be noted that the relation in (5.1) is both necessary and sufficient to have AXA = A, and does not require that YZ is a full rank factorization. Dually, (5.2) is both necessary and sufficient to have XAX = X, and BC need not be a full rank factorization. Observe, moreover, that given any matrix, A, with full rank factorization A = BC, then for any choice of Y such that CY has full column rank, taking Z = (CY)_LB_L, with (CY)_L any left inverse of CY and B_L any left inverse of B, gives a matrix X = YZ such that (5.2) holds. Therefore, we can always construct matrices, X, of any given rank not exceeding the rank of A with XAX = X. On the other hand, given full rank factorizations A = BC and X = YZ such that AXA = A and XAX = X, then for any matrix U with full column rank satisfying CU = 0, and for any matrix V with UVA defined, we have

(5.3)    A(X + UV)A = A.

Now

(5.4)    X + UV = [Y, U] [Z; V],

where the first matrix on the right-hand side has full column rank (Exercise 5.6). Thus, for any choice of V such that the second matrix on the right-hand side of (5.4) has full row rank, (5.3) holds and rank(X + UV) > rank(A).
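The construction X = YZ with Z = (CY)_LB_L can be sketched numerically. The version below (Python with NumPy) takes the particular left inverses (CY)^+ and B^+, reuses the full_rank_factorization helper from the sketch after Lemma 10, and uses a hypothetical A and Y; it is an illustration of the recipe, not a prescription.

```python
import numpy as np

def two_inverse(A, Y, tol=1e-12):
    # With A = BC a full rank factorization and CY of full column rank,
    # Z = (CY)^+ B^+ is one admissible product of left inverses, and
    # X = YZ then satisfies XAX = X with rank(X) = rank(Y).
    B, C, _ = full_rank_factorization(A, tol)
    Z = np.linalg.pinv(C @ Y) @ np.linalg.pinv(B)
    return Y @ Z

A = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 3., 1.]])                   # hypothetical rank-2 matrix
Y = np.array([[1.], [0.], [1.]])               # prescribe rank one
X = two_inverse(A, Y)
print(np.allclose(X @ A @ X, X), np.linalg.matrix_rank(X))   # True 1
```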
The following example illustrates the construction of matrices, X, of prescribed rank such that A and X satisfy at least one of the conditions AXA = A and XAX = X.

Example 5.1

Let A be the matrix from Example 4.1 with full rank factorization A = BC. Then rank(A) = 2, and X_0 = 0 satisfies X_0AX_0 = X_0 trivially. To construct a matrix, X_1, of rank one such that X_1AX_1 = X_1, note first that B has a left inverse

(5.5)    B_L .

Now if y is a vector with Cy ≠ 0 and z^H = (Cy)^+B_L, then X_1 = yz^H satisfies X_1AX_1 = X_1.

To next construct a matrix X_2 of rank two such that X_2AX_2 = X_2 (and thus AX_2A = A), let Y be a 3 by 2 matrix for which CY is nonsingular. Then, with B_L the left inverse of B in (5.5), Z = (CY)^{-1}B_L gives X_2 = YZ.

Finally, to construct a matrix, X_3, of rank three such that AX_3A = A, let X_3 = X_2 + uv^H, where u ∈ N(C) and v is chosen so that X_3 is nonsingular, with det X_3 = -5.


That the procedure in Example 5.1 can be extended to
construct matrices X of given rank satisfying (AX)H = AX and
at least one of the conditions AXA

A and XAX

X is

apparent by observing that CY with full column rank implies


BCY has full column rank.

Hence, taking Z = (BCY)+,

holds and AX = BCY(BCY)+ is Hermitian.

(S.Z)
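A corresponding sketch (Python with NumPy; the matrices are the same hypothetical ones used before Example 5.1): with Z = (AY)^+ = (BCY)^+, the matrix X = YZ satisfies XAX = X with AX Hermitian.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 3., 1.]])                    # hypothetical rank-2 matrix as above
Y = np.array([[1., 0.], [0., 1.], [0., 0.]])    # CY nonsingular for this choice
X = Y @ np.linalg.pinv(A @ Y)                   # Z = (AY)^+ = (BCY)^+
AX = A @ X
print(np.allclose(X @ A @ X, X), np.allclose(AX, AX.T))   # True True
```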

In the following example we indicate matrices Z in Example 5.1 so that the resulting matrices X_i satisfy (AX_i)^H = AX_i, i = 1, 2, 3.

Example 5.2

Given the matrix A and the full rank factorization A = BC in Example 5.1, again take the vector y used there and set z = (Ay)^+. Then X_1 = yz satisfies X_1AX_1 = X_1 with AX_1 Hermitian and rank(X_1) = 1.

Continuing, if we again use the matrix Y of Example 5.1 and set Z = (AY)^+, then X_2 = YZ satisfies X_2AX_2 = X_2 with AX_2 Hermitian and rank(X_2) = 2.

Now taking u^H = [2  -3  -3] as in the previous example and v^H = (1/110)[0  1  1] gives X_3 = X_2 + uv^H with AX_3A = A, AX_3 Hermitian and det X_3 = -10.

Given any full rank factorization A = BC, first choosing a matrix Z so that ZB (and thus ZBC) has full row rank provides a completely dual procedure to that in Example 5.1, in which Y = C_R(ZB)_R with C_R any right inverse of C and (ZB)_R any right inverse of ZB. Taking Y = (ZBC)^+ then gives matrices analogous to those in Example 5.2 in which we now have (X_iA)^H = X_iA, i = 1, 2, 3.

We conclude this brief introduction to generalized inverses that are not unique by observing that the question of representing all solutions of particular subsets of equations such as AXA = A or XAX = X and AX or XA Hermitian has not been considered. Also, although obvious properties of matrices A and X satisfying AXA = A with AX and XA Hermitian are included in the exercises, the more difficult question when AXA = A is replaced by the nonlinear relation XAX = X is only treated superficially. The interested reader is urged to consult [2] for a detailed discussion of these topics.
Exercises

5.1  Show that any two of the conditions AXA = A, XAX = X, rank(X) = rank(A) imply the third.

5.2  Show that X = A^+ if AXA = A, XAX = X, (AX)^H = AX and (XA)^H = XA.

5.3  Let A = BC and X = YZ where Y and Z^H have full column rank.

     a.  Show that XAX = X and (AX)^H = AX if and only if BCY = Z^+. Dually, show that XAX = X and (XA)^H = XA if and only if ZBC = Y^+.

     b.  Given the matrix A, construct a matrix X_1 of rank one such that X_1AX_1 = X_1 and (AX_1)^H = AX_1. Also, construct a matrix X_2 of rank two such that X_2AX_2 = X_2 and (X_2A)^H = X_2A.

5.4  Let A = BC and X = YZ where A is square and B and C^H have full column rank.

     a.  Show that if AXA = A and XA = AX, then B = YZBCB and C = CBCYZ, where (CB)^{-1} exists.

     b.  Why is it not of interest to consider also the special cases when (ZB)^{-1} or (CY)^{-1} exist?

     c.  What equations must Y and Z satisfy if XAX = X and AX = XA?

5.5  Verify that the inverses constructed in Examples 5.1 and 5.2 satisfy the required properties.

5.6  Prove that if W = [Y, U] is any matrix with CY nonsingular and the columns of U in N(C) linearly independent, then W has full column rank.

5.7  Prove that if A = BC is any full rank factorization of a square matrix, then CB = I if and only if A is idempotent.

5.8  Show that if A = BC is any full rank factorization and Y is any matrix such that CY is nonsingular, then A^+ = ((AY)^+A)^+(AY)^+.

Appendix 1:
Hints for Certain Exercises

Chapter 1

Ex. 1.1b:  xy^H = yx^H implies y = ax.

Ex. 1.3a:  If P is the matrix with columns x_1, ..., x_n, and Λ is the diagonal matrix with diagonal elements λ_i, i = 1, ..., n, then AP = PΛ. Hence A = (PΛ)P^H since P is unitary.  1.3c:  If A is Hermitian, λ_iE_i = λ̄_iE_i, i = 1, ..., n.

Ex. 1.5b:  If u_k is the column vector with k elements each equal to unity, then A_n^{-1} can be expressed in terms of u_{n-1} for all n ≥ 2.  1.5c:  Subtract the last row of A from each of the preceding rows and expand the determinant using cofactors of the first column.  1.5d:  Let X = I + dxx^H and determine d so that AX = I.

Ex. 1.6a:  Write A in terms of xx^H where x = (1/√40)u_40; A^{-1} has all integral elements.  1.6b:  Ax = (1+k)x and Ay = y.  1.6f:  For any n ≥ 2 the vectors (1, -1, 0, ..., 0), (1, 1, -2, 0, ..., 0), ..., (1, 1, ..., 1, -(n-1)) are orthogonal.

Ex. 1.7b:  Form XA first.

Chapter 2

Ex. 2.2:  AZ = 0, and Za = 0 implies a = 0.

Ex. 2.5:  x_1 is orthogonal to every vector z ∈ N(A).

Ex. 2.6:  Let A be m by n with rank r, so that dim N(A) = n - r. Now assume rank(A^HA) = k < r, and let z_1, ..., z_{n-k} denote any basis of N(A^HA). Then ||Az_i||^2 = (z_i, A^HAz_i) = (Az_i, Az_i) implies Az_i = 0, i = 1, ..., n-k. Hence dim N(A) ≥ n-k > n-r, a contradiction.

Ex. 2.10:  Use Exercise 2.9 and apply Exercise 1.6a.

Ex. 2.11a:  Use Exercise 2.10 to obtain A^+b and Exercise 2.2 to form z ∈ N(A).  2.11d:  In this case A = uv^H is a full rank factorization.

Ex. 2.13:  Use Theorem 4 and the remarks in the final paragraph of Section 2.2.

Ex. 2.14:  x = A^+b + Σ_{i=1}^{n-r} a_i z_i is an orthogonal decomposition of any vector x. Now take the inner product of x with any vector z_j.

Ex. 2.15:  For any i = 1, ..., m, column i of A^+ is the minimal norm solution of Ax = e_i.

Ex. 2.20e:  Use Exercise 1.2c and its dual, that BAA^H = CAA^H if and only if BA = CA.

Chapter 3

Ex. 3.2b:  A = ... is a full rank factorization.

Ex. 3.7a:  d_k a_k = 1 if and only if c_k ≠ 0.

Ex. 3.11b:  A^+ = (1/(q(p+1))) [ u_p^H ; (p+1)I_p - u_p u_p^H ] ⊗ u_q^H. Now if x = A^+y is written in terms of components as x^H = (m, t_1, ..., t_p), then

          m = (1/(q(p+1))) Σ_{i=1}^{p} Σ_{j=1}^{q} y_ij

and t_i = (1/q) Σ_{j=1}^{q} y_ij - (1/(q(p+1))) Σ_{i=1}^{p} Σ_{j=1}^{q} y_ij, i = 1, ..., p.

3.11c:  With dim N(A) = 1 and z = (1, -u_p) ∈ N(A), all solutions of Ax = y can be written in terms of components as m̃ = m - a and t̃_i = t_i + a, i = 1, ..., p, where a is arbitrary. If Σ_{i=1}^{p} t̃_i = 0, then

          m̃ = (1/(pq)) Σ_{i=1}^{p} Σ_{j=1}^{q} y_ij    and    t̃_i = (1/q) Σ_{j=1}^{q} y_ij - (1/(pq)) Σ_{i=1}^{p} Σ_{j=1}^{q} y_ij.

Ex. 3.14a:  I_np = I_n ⊗ I_p and (1/p) u_p^H = u_p^+.  3.14b:  Let z_1, ..., z_n be any complete orthonormal set of eigenvectors of I_n - W^+W, where z_1, ..., z_r correspond to eigenvalue λ = 1 and z_{r+1}, ..., z_n correspond to eigenvalue λ = 0. Combine these vectors with those in the hint for Exercise 1.6f.

Ex. 3.15:  Use Gauss elimination to reduce T to block triangular form.

Ex. 3.17:  z_i ⊗ y_j corresponds to eigenvalue one if and only if T(z_i ⊗ y_j) = 0.

Ex. 3.20:  AX = C and XB = D consistent implies AA^+C = C and DB^+B = D.

Chapter 4

Ex. 4.2a:  (B^HB)^{-1}B^H = (C^HB)^+C^H.  4.2b:  The relation is reflexive, by Lemma 5(e). If C is alias to B, then C = BB^+C = B^{+H}(B^HC) is a full rank factorization, and the relation is symmetric since (B^{+H})^+ = B^H. Transitivity follows by a similar type of argument.

Ex. 4.11:  Use an argument similar to the one employed to establish uniqueness in (2.2).

Ex. 4.16:  If A = BC is a full rank factorization and A has index one, then, by Exercise 4.14, A_LA = C^+C = A^+A, and A_R^n can be formed for all n ≥ 1.

Ex. 4.17:  Show first that X = B[(WB)_d]^p satisfies all six equations. Then show that the first set of three equations implies the second set, and that the second set implies X has the given form.

Ex. 4.18:  If A_d^2 ... .

Chapter 5

Ex. 5.2:  Use both the direct and dual form of Exercise 4.10 with ℓ = 1.

Ex. 5.3a:  Applying Exercise 4.10 to XAX = X and (AX)^H = AX gives AX = X^+X = Z^+Z.

Ex. 5.4a:  B has full column rank.  5.4b:  Then X = A#.

Ex. 5.8:  AY = B(CY) is a full rank factorization.

Appendix 2:
Selected References

1.  Anderson, R.L. and Bancroft, T.A. 1952. Statistical Theory in Research. New York: McGraw-Hill.

2.  Ben-Israel, A. and Greville, T.N.E. 1974. Generalized Inverses: Theory and Applications. New York: Wiley.

3.  Drazin, M.P. 1958. Pseudo-inverses in associative rings and semi-groups. Amer. Math. Monthly 65:506-513.

4.  Graybill, F.A. 1961. An Introduction to Linear Statistical Models, Volume I. New York: McGraw-Hill.

5.  Greville, T.N.E. 1961. Note on fitting of functions of several independent variables. SIAM J. Appl. Math. 9:109-115 (Erratum, p. 317).

6.  Hadley, G. 1962. Linear Programming. Massachusetts: Addison-Wesley.

7.  Halmos, P.R. 1958. Finite-Dimensional Vector Spaces. New Jersey: Van Nostrand.

8.  Moore, E.H. 1920. On the reciprocal of the general algebraic matrix (abstract). Bull. Amer. Math. Soc. 26:394-395.

9.  Noble, B. 1969. Applied Linear Algebra. New Jersey: Prentice-Hall.

10. Penrose, R. 1955. A generalized inverse for matrices. Proc. Camb. Phil. Soc. 51:406-413.

11. Rao, C.R. and Mitra, S.K. 1971. Generalized Inverse of Matrices and its Applications. New York: Wiley.

12. Strang, G. 1976. Linear Algebra and Its Applications. New York: Academic Press.

13. Thrall, R.M. and Tornheim, L. 1957. Vector Spaces and Matrices. New York: Wiley.

Index to Principal Definitions

                                               Page

alias . . . . . . . . . . . . . . . . . . . .   63
Drazin inverse  . . . . . . . . . . . . . . .   59
full rank factorization . . . . . . . . . . .   16
fundamental equations . . . . . . . . . . . .
group inverse . . . . . . . . . . . . . . . .   63
index . . . . . . . . . . . . . . . . . . . .   59
Kronecker product . . . . . . . . . . . . . .   40
Moore-Penrose inverse . . . . . . . . . . . .   13
power inverse . . . . . . . . . . . . . . . .   69
trapezoidal decomposition . . . . . . . . . .   38
W-weighted Drazin inverse . . . . . . . . . .   66
