Downloaded 04/19/16 to 132.239.1.231. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

The Moore–Penrose or generalized inverse

1. Basic definitions
Equations of the form
Ax = b,    A ∈ C^{m×n}, x ∈ C^n, b ∈ C^m    (1)

occur in many pure and applied problems. If A ∈ C^{n×n} and is invertible,
then the system of equations (1) is, in principle, easy to solve. The unique
solution is x = A^{-1}b. If A is an arbitrary matrix in C^{m×n}, then it becomes
more difficult to solve (1). There may be none, one, or an infinite number
of solutions depending on whether b ∈ R(A) and whether n - rank(A) > 0.
One would like to be able to find a matrix (or matrices) C, such that
solutions of (1) are of the form Cb. But if b ∉ R(A), then (1) has no solution.
This will eventually require us to modify our concept of what a solution of
(1) is. However, as the applications will illustrate, this is not as unnatural
as it sounds. But for now we retain the standard definition of solution.
To motivate our first definition of the generalized inverse, consider the
functional equation
y = f(x),    x ∈ D ⊆ ℝ,    (2)

where f is a real-valued function with domain D. One procedure for solving
(2) is to restrict the domain of f to a smaller set D′ so that the restriction
f̂ = f|_{D′} is one to one. Then an inverse function f̂^{-1} from R(f̂) to D′ is
defined by f̂^{-1}(y) = x if x ∈ D′ and f(x) = y. Thus f̂^{-1}(y) is a solution of (2)
for y ∈ R(f̂). This is how the arcsec, arcsin, and other inverse functions are
normally defined.
The same procedure can be used in trying to solve equation (1). As
usual, we let A also denote the linear function from C^n into C^m defined by
x ↦ Ax for x ∈ C^n. To make A a one to one linear transformation it must be
restricted to a subspace complementary to N(A). An obvious one is
N(A)⊥ = R(A*). This suggests the following definition of the generalized
inverse.
Definition 1.1.1 Functional definition of the generalized inverse If
A ∈ C^{m×n}, define the linear transformation A† : C^m → C^n by A†x = 0 if
x ∈ R(A)⊥ and A†x = (A|_{R(A*)})^{-1}x if x ∈ R(A). The matrix of A† is denoted A†
and is called the generalized inverse of A.
It is easy to check that AA†x = 0 if x ∈ R(A)⊥ and AA†x = x if x ∈ R(A).
Similarly, A†Ax = 0 if x ∈ N(A) = R(A*)⊥ and A†Ax = x if x ∈ R(A*) = R(A†).
Thus AA† is the orthogonal projector of C^m onto R(A), while A†A is the
orthogonal projector of C^n onto R(A*) = R(A†). This suggests a second
definition of the generalized inverse due to E. H. Moore.

Definition 1.1.2 Moore definition of the generalized inverse If
A ∈ C^{m×n}, then the generalized inverse of A is defined to be the unique
matrix A† such that
(a) AA† = P_{R(A)}, and
(b) A†A = P_{R(A†)}.
Moore's definition was given in 1935 and then more or less forgotten.
This is possibly due to the fact that it was not expressed in the form of
Definition 2 but rather in a more cumbersome (no pun intended) notation.
An algebraic form of Moore's definition was given in 1955 by Penrose who
was apparently unaware of Moore's work.

Definition 1.1.3 Penrose definition of the generalized inverse If
A ∈ C^{m×n}, then A† is the unique matrix in C^{n×m} such that
(i) AA†A = A,
(ii) A†AA† = A†,
(iii) (AA†)* = AA†,
(iv) (A†A)* = A†A.
The first important fact to be established is the equivalence of the
definitions.

Theorem 1.1.1 The functional, Moore, and Penrose definitions of the
generalized inverse are equivalent.

Proof We have already noted that if A† satisfies Definition 1, then it
satisfies equations (a) and (b). If a matrix A† satisfies (a) and (b), then it
immediately satisfies (iii) and (iv). Furthermore, (i) follows from (a) by
observing that AA†A = P_{R(A)}A = A. (ii) follows from (b) in a similar
manner. Since Definition 1 was constructive and the A† it constructs
satisfies (a), (b) and (i)-(iv), the question of existence in Definitions 2 and 3
is already taken care of. There are then two things remaining to be proven.
One is that a solution of equations (i)-(iv) is a solution of (a) and (b). The
second is that a solution of (a) and (b) or of (i)-(iv) is unique.
Suppose then that A† is a matrix satisfying (i)-(iv). Multiplying (ii) on
the left by A gives (AA†)² = AA†. This and (iii) show that AA† is an
orthogonal projector. We must show that it has range equal to the range
of A. Using (i) and the fact that R(BC) ⊆ R(B) for matrices B and C, we get
R(A) = R(AA†A) ⊆ R(AA†) ⊆ R(A), so that R(A) = R(AA†).
Thus AA† = P_{R(A)} as desired. The proof that A†A = P_{R(A†)} is similar and is
left to the reader as an exercise. One way to show uniqueness is to show
that if A† satisfies (a) and (b), or (i)-(iv), then it satisfies Definition 1.
Suppose then that A† is a matrix satisfying (i)-(iv), (a), and (b). If x ∈ R(A)⊥,
then by (a), AA†x = 0. Thus by (ii), A†x = A†AA†x = A†0 = 0. If x ∈ R(A),
then there exists y ∈ R(A*) such that Ay = x. But A†x = A†Ay = y. The last
equality follows by observing that taking the adjoint of both sides of
(i) gives A†AA* = A*, so that R(A*) ⊆ R(A†). But y = (A|_{R(A*)})^{-1}x. Thus
A† satisfies Definition 1. ∎
As this proof illustrates, equations (i) and (ii) are, in effect, cancellation
laws. While we cannot say that AB = AC implies B = C, we can say that if
A†AB = A†AC, then AB = AC. This type of cancellation will frequently
appear in proofs and the exercises.
For obvious reasons, the generalized inverse is often referred to as the
Moore–Penrose inverse. Note also that if A ∈ C^{n×n} and A is invertible,
then A^{-1} = A†, so that the generalized inverse lives up to its name.
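Readers who wish to experiment numerically can verify the four Penrose conditions with NumPy, whose `numpy.linalg.pinv` computes the generalized inverse (by way of the singular value decomposition). A minimal check:

```python
import numpy as np

# A rectangular, rank-deficient matrix (rank 2).
A = np.array([[1., 1., 2.],
              [0., 2., 2.],
              [1., 0., 1.],
              [1., 0., 1.]])
Adag = np.linalg.pinv(A)  # the generalized inverse A†

# The four Penrose conditions (i)-(iv).
assert np.allclose(A @ Adag @ A, A)                   # (i)   AA†A = A
assert np.allclose(Adag @ A @ Adag, Adag)             # (ii)  A†AA† = A†
assert np.allclose((A @ Adag).conj().T, A @ Adag)     # (iii) (AA†)* = AA†
assert np.allclose((Adag @ A).conj().T, Adag @ A)     # (iv)  (A†A)* = A†A
```

The same checks pass for any matrix, which is a useful sanity test when experimenting with the computational methods of Section 3.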

2. Basic properties of the generalized inverse


Before proceeding to establish some of what is true about generalized
inverses, the reader should be warned about certain things that are not
true.
While it is true that R(A*) = R(A†) if A† is the generalized inverse,
condition (b) in Definition 2 cannot be replaced by A†A = P_{R(A*)}.

Example 1.2.1 Let A = [2 0; 0 0] and X = [1/2 0; 0 1]. Since

AX = XA = [1 0; 0 0] = P_{R(A)} = P_{R(A*)},

X satisfies AX = P_{R(A)} and XA = P_{R(A*)}. But A† = [1/2 0; 0 0], and hence
X ≠ A†. Note that XA ≠ P_{R(X)}, and thus XAX ≠ X. If XA = P_{R(A*)},
AX = P_{R(A)}, and in addition XAX = X, then X = A†. The proof of this
last statement is left to the exercises.
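The non-uniqueness is easy to reproduce numerically; a minimal sketch using the matrices of Example 1:

```python
import numpy as np

A = np.array([[2., 0.],
              [0., 0.]])
X = np.array([[0.5, 0.],
              [0., 1.]])
P = np.array([[1., 0.],
              [0., 0.]])        # P_R(A) = P_R(A*)

assert np.allclose(A @ X, P)    # AX = P_R(A)
assert np.allclose(X @ A, P)    # XA = P_R(A*)

Adag = np.linalg.pinv(A)
assert np.allclose(Adag, [[0.5, 0.], [0., 0.]])  # A† = [1/2 0; 0 0]
assert not np.allclose(X, Adag)                  # yet X ≠ A†
assert not np.allclose(X @ A @ X, X)             # because XAX ≠ X
```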
In computations involving inverses one frequently uses (AB)^{-1} =
B^{-1}A^{-1} if A and B are invertible. This fails to hold for generalized
inverses even if AB = BA.

Fact 1.2.1 If A ∈ C^{m×n}, B ∈ C^{n×p}, then (AB)† is not necessarily the same
as B†A†. Furthermore, (A†)² is not necessarily equal to (A²)†.

Example 1.2.2 Let A = [1 1; 0 0]. Then A† = (1/2)[1 0; 1 0]. Now
A² = A while (A†)² = (1/2)A†. Thus (A†)²A² = (1/2)A†A, which is not a
projector. Thus (A†)² ≠ (A²)†.
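Both failures named in Fact 1 can be confirmed numerically on the matrix of Example 2 (taking B = A for the reverse order law):

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 0.]])
Adag = np.linalg.pinv(A)

assert np.allclose(Adag, [[0.5, 0.], [0.5, 0.]])            # A† = (1/2)[1 0; 1 0]
assert np.allclose(A @ A, A)                                # A² = A, so (A²)† = A†
assert np.allclose(Adag @ Adag, 0.5 * Adag)                 # (A†)² = (1/2)A†
assert not np.allclose(Adag @ Adag, np.linalg.pinv(A @ A))  # (A†)² ≠ (A²)†

# With B = A, the reverse order law fails as well:
B = A
assert not np.allclose(np.linalg.pinv(A @ B),
                       np.linalg.pinv(B) @ np.linalg.pinv(A))
```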

Ways of calculating A† will be given shortly. The generalized inverses in
Examples 1 and 2 can be found directly from Definition 1 without too
much difficulty.
Example 2 illustrates another way in which the properties of the
generalized inverse differ from those of the inverse. If A is invertible, then
λ ∈ σ(A) if and only if 1/λ ∈ σ(A^{-1}). If A = [1 1; 0 0] as in Example 2, then
σ(A) = {1, 0} while σ(A†) = {1/2, 0}.
If A is similar to a matrix C, then A and C have the same eigenvalues,
the same Jordan form, and the same characteristic polynomial. None of
these is necessarily preserved by taking the generalized inverse.

Example 1.2.3 Let

A = [  1 1 -1
       2 0 -2
      -1 1  1 ].

Then A = BJB^{-1}, where

B = [ 1 0 1          J = [ 0 1 0
      0 1 1                0 0 0
      1 0 0 ]   and        0 0 2 ].

The characteristic polynomial of A and J is λ²(λ - 2) with elementary
divisors λ² and λ - 2. Now J† = [0 0 0; 1 0 0; 0 0 1/2], and the characteristic
polynomial of J† is λ²(λ - 1/2) with elementary divisors λ², (λ - 1/2). An
easy computation gives

A† = (1/12) [  1  2 -1
               6  0  6
              -1 -2  1 ].

But A† has characteristic polynomial λ²(λ - 1/6), so that A† and J† do not
even have the same eigenvalues, although A and J are similar.
Thus, if A and C are similar, then about the only thing that one can
always say about A† and C† is that they have the same rank.
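The computations in this example are easy to confirm numerically; eigenvalues are compared only loosely, since a defective zero eigenvalue is sensitive to roundoff:

```python
import numpy as np

A = np.array([[ 1., 1., -1.],
              [ 2., 0., -2.],
              [-1., 1.,  1.]])
Adag = np.linalg.pinv(A)

# A† = (1/12)[1 2 -1; 6 0 6; -1 -2 1]
M = np.array([[ 1.,  2., -1.],
              [ 6.,  0.,  6.],
              [-1., -2.,  1.]]) / 12.0
assert np.allclose(Adag, M)

# A has eigenvalues {2, 0, 0}; A† has {1/6, 0, 0}, not {1/2, 0, 0}.
assert np.allclose(sorted(np.abs(np.linalg.eigvals(A))), [0., 0., 2.], atol=1e-5)
assert np.allclose(sorted(np.abs(np.linalg.eigvals(Adag))), [0., 0., 1/6], atol=1e-5)
```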
A type of inverse that behaves better with respect to similarity is
discussed in Chapter VII. Since the generalized inverse does not have all
the properties of the inverse, it becomes important to know what properties
it does have and which identities it does satisfy. There are, of course, an
arbitrarily large number of true statements about generalized inverses. The
next theorem lists some of the more basic properties.

Theorem 1.2.1 Suppose that A ∈ C^{m×n}. Then
(P1) (A†)† = A
(P2) (A†)* = (A*)†
(P3) If λ ∈ C, then (λA)† = λ†A†, where λ† = 1/λ if λ ≠ 0 and λ† = 0
if λ = 0.
(P4) A* = A*AA† = A†AA*
(P5) (A*A)† = A†A*†


(P6) A† = (A*A)†A* = A*(AA*)†
(P7) (UAV)† = V*A†U*, where U, V are unitary matrices.

Proof We will discuss the properties in the order given. A look at
Definition 2 and a moment's thought show that (P1) is true. We leave
(P2) and (P3) to the exercises. (P4) follows by taking the adjoints of both
A = (AA†)A and A = A(A†A). (P5), since it claims that something is a
generalized inverse, can be checked by using one of the definitions.
Definition 2 is the quickest: A†A*†A*A = A†(A*†A*)A = A†(AA†)*A =
A†AA†A = A†A = P_{R(A†)} = P_{R((A*A)†)}, and similarly A*AA†A*† =
A*(AA†)A*† = A*(AA†)*A*† = A*(A*†A*)A*† = (A*A*†)(A*A*†) =
A*A*† = P_{R(A*)} = P_{R(A*A)}. Thus (A*A)† = A†A*† by Definition 2.
(P6) follows from (P5). (P7) is left as an exercise. ∎

Proposition 1.2.1 If A = diag(A₁, A₂, ..., A_k) is block diagonal, then
A† = diag(A₁†, A₂†, ..., A_k†).
The proof of Proposition 1 is left as an exercise.
As the proof of Theorem 1 illustrates, it is frequently helpful to know the
ranges and null spaces of expressions involving A†. For ease of reference
we now list several of these basic geometric properties. (P8) and (P9) have
been done already. The rest are left as an exercise.

Theorem 1.2.2 If A ∈ C^{m×n}, then
(P8) R(A) = R(AA†) = R(AA*)
(P9) R(A†) = R(A*) = R(A†A) = R(A*A)
(P10) R(I - AA†) = N(AA†) = N(A*) = N(A†) = R(A)⊥
(P11) R(I - A†A) = N(A†A) = N(A) = R(A*)⊥
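These identities can be spot-checked numerically. A sketch (our own helper `same_range` compares ranges via ranks) on a randomly generated rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank 2
Adag = np.linalg.pinv(A)
Astar = A.conj().T

def same_range(X, Y):
    """R(X) = R(Y) iff rank [X Y] = rank X = rank Y."""
    r = np.linalg.matrix_rank
    return r(np.hstack([X, Y])) == r(X) == r(Y)

# (P8) and (P9)
assert same_range(A, A @ Adag) and same_range(A, A @ Astar)
assert same_range(Adag, Astar) and same_range(Adag, Adag @ A) \
       and same_range(Adag, Astar @ A)
# (P10): R(I - AA†) ⊆ N(A*);  (P11): R(I - A†A) ⊆ N(A)
assert np.allclose(Astar @ (np.eye(5) - A @ Adag), 0)
assert np.allclose(A @ (np.eye(4) - Adag @ A), 0)
```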

3. Computation of A†
In learning any type of mathematics, the working out of examples is useful,
if not essential. The calculation of A† from A can be difficult, and will be
discussed more fully later. For the present we will give two methods which
will enable the reader to begin to calculate the generalized inverses of
small matrices. The first method is worth knowing because using it should
help give a feeling for what A† is. The method consists of `constructing' A†
according to Definition 1.
Example 1.3.1 Let

A = [ 1 1 2
      0 2 2
      1 0 1
      1 0 1 ].

Then R(A*) is spanned by the rows of A, namely (1, 1, 2)*, (0, 2, 2)*,
(1, 0, 1)*, and (1, 0, 1)*. A basis for R(A*) is v₁ = (1, 0, 1)*, v₂ = (0, 1, 1)*.
Now Av₁ = (3, 2, 2, 2)* and Av₂ = (3, 4, 1, 1)*. Thus A†(3, 2, 2, 2)* =
(1, 0, 1)* and A†(3, 4, 1, 1)* = (0, 1, 1)*. We now must calculate a basis for
R(A)⊥ = N(A*). Solving the system A*x = 0, we get that

x = x₃(-1, 1/2, 1, 0)* + x₄(-1, 1/2, 0, 1)*,    x₃, x₄ ∈ C.

Then A†(-1, 1/2, 1, 0)* = A†(-1, 1/2, 0, 1)* = 0 ∈ C³. Combining all of this
gives A†M = [v₁ | v₂ | 0 | 0], where

M = [ 3 3  -1  -1                               [ 1 0 0 0
      2 4 1/2 1/2     and   [v₁ | v₂ | 0 | 0] =   0 1 0 0
      2 1   1   0                                 1 1 0 0 ],
      2 1   0   1 ]

so that A† = [v₁ | v₂ | 0 | 0]M^{-1}.
The indicated inverse can always be taken since the columns of M form a
basis for R(A) ⊕ R(A)⊥ = C⁴ and hence are a linearly independent set of
four vectors in C⁴.
Below is a formal statement of the method described in Example 1.
Below is a formal statement of the method described in Example 1.

Theorem 1.3.1 Let A ∈ C^{m×n} have rank r. If {v₁, v₂, ..., v_r} is a basis for
R(A*) and {w₁, w₂, ..., w_{m-r}} is a basis for N(A*), then

A† = [v₁ | v₂ | ... | v_r | 0 | ... | 0][Av₁ | Av₂ | ... | Av_r | w₁ | w₂ | ... | w_{m-r}]^{-1}.

Proof By using Definition 1.1.1 it is clear that

A†[Av₁ | ... | Av_r | w₁ | ... | w_{m-r}] = [v₁ | ... | v_r | 0 | ... | 0].

Furthermore, {Av₁, Av₂, ..., Av_r} must be a basis for R(A). Since R(A)⊥ =
N(A*), it follows that the matrix [Av₁ | ... | Av_r | w₁ | ... | w_{m-r}] must be
non-singular. The desired result is now immediate. ∎
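Theorem 1 translates directly into a few lines of code. The sketch below builds A† for the matrix of Example 1, taking the required bases from the singular value decomposition rather than by hand (an implementation convenience of this sketch, not part of the theorem, which works for any bases):

```python
import numpy as np

A = np.array([[1., 1., 2.],
              [0., 2., 2.],
              [1., 0., 1.],
              [1., 0., 1.]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Columns of V span R(A*); trailing columns of U span N(A*).
U, s, Vt = np.linalg.svd(A)
V = Vt[:r].T                  # v_1, ..., v_r      (basis of R(A*))
W = U[:, r:]                  # w_1, ..., w_{m-r}  (basis of N(A*))

left = np.hstack([V, np.zeros((n, m - r))])   # [v_1 ... v_r 0 ... 0]
right = np.hstack([A @ V, W])                 # [Av_1 ... Av_r w_1 ... w_{m-r}]
Adag = left @ np.linalg.inv(right)            # Theorem 1.3.1

assert np.allclose(Adag, np.linalg.pinv(A))
```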

The second method involves a formula which is sometimes useful. It
depends on the following fact:

Proposition 1.3.1 If A ∈ C^{m×n} and r = rank(A) > 0, then there exist
B ∈ C^{m×r}, C ∈ C^{r×n} such that A = BC and r = rank(A) = rank(B) = rank(C).

The proof of Proposition 1 is not difficult and is left as an exercise. It
means that the next result can, in theory, always be used to calculate A†.
See Chapter 12 for a caution on taking (CC*)^{-1} or (B*B)^{-1} if C or B have
small singular values.

Theorem 1.3.2 If A = BC, where A ∈ C^{m×n}, B ∈ C^{m×r}, C ∈ C^{r×n}, and
r = rank(A) = rank(B) = rank(C), then A† = C*(CC*)^{-1}(B*B)^{-1}B*.

Proof Notice that B*B and CC* are rank r matrices in C^{r×r}, so that it
makes sense to take their inverses. Let X = C*(CC*)^{-1}(B*B)^{-1}B*. We will
show X satisfies Definition 3. This choice is made on the grounds that the
more complicated an expression is, the more difficult it becomes geometrically
to work with it, and Definition 3 is algebraic. Now AX =
BCC*(CC*)^{-1}(B*B)^{-1}B* = B(B*B)^{-1}B*, so (AX)* = AX. Also XA =
C*(CC*)^{-1}(B*B)^{-1}B*BC = C*(CC*)^{-1}C, so (XA)* = XA. Thus (iii) and
(iv) hold. To check (i) and (ii), use XA = C*(CC*)^{-1}C to get that A(XA) =
BCC*(CC*)^{-1}C = BC = A. And (XA)X = C*(CC*)^{-1}CC*(CC*)^{-1}
× (B*B)^{-1}B* = C*(CC*)^{-1}(B*B)^{-1}B* = X. Thus X = A† by Definition 3. ∎
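Theorem 2's formula is a one-liner in code. A sketch (the helper name `pinv_from_factors` is ours), checked against an independent computation on the 3×3 matrix of the earlier Jordan-form example:

```python
import numpy as np

def pinv_from_factors(B, C):
    """A† = C*(CC*)^{-1}(B*B)^{-1}B* for a full rank factorization A = BC."""
    Bs, Cs = B.conj().T, C.conj().T
    return Cs @ np.linalg.inv(C @ Cs) @ np.linalg.inv(Bs @ B) @ Bs

# A = BC with B the distinguished columns of A and C from the echelon form.
B = np.array([[ 1., 1.],
              [ 2., 0.],
              [-1., 1.]])
C = np.array([[1., 0., -1.],
              [0., 1.,  0.]])
A = B @ C   # = [1 1 -1; 2 0 -2; -1 1 1]
assert np.allclose(pinv_from_factors(B, C), np.linalg.pinv(A))
```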

Example 1.3.2 Let A = [1 1 2; 2 2 4]. Then r = rank(A) = 1, and A = BC
where B ∈ C^{2×1} and C ∈ C^{1×3}. In fact, a little thought shows that
A = [1; 2][1 1 2]. Then B*B = [5] and CC* = [6]. Thus

A† = (1/30) [ 1 2
              1 2
              2 4 ].
Example 1 is typical as the next result shows.

Theorem 1.3.3 If A ∈ C^{m×n} and rank(A) = 1, then A† = (1/α)A*, where
α = Tr(A*A) = Σ_{i,j} |a_{ij}|².
The proof is left to the exercises.
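Theorem 3 is easy to confirm on the rank-one matrix of Example 2:

```python
import numpy as np

A = np.array([[1., 1., 2.],
              [2., 2., 4.]])          # rank 1
alpha = np.trace(A.conj().T @ A)     # = sum of |a_ij|^2 = 30
assert np.isclose(alpha, 30.0)
assert np.allclose(np.linalg.pinv(A), A.conj().T / alpha)   # A† = (1/α)A*
```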
The method of computing A† described in Example 1.3.1 and the method
of Theorem 1.3.2 may both be executed by reducing A by elementary row
operations.

Definition 1.3.1 A matrix E ∈ C^{m×n} which has rank r is said to be in row
echelon form if E is of the form

E = [ C; O_{(m-r)×n} ],    (1)

where the elements c_{ij} of C (∈ C^{r×n}) satisfy the following conditions:
(i) c_{ij} = 0 when i > j.
(ii) The first non-zero entry in each row of C is 1.
(iii) If c_{ij} = 1 is the first non-zero entry of the ith row, then the jth column
of C is the unit vector e_i whose only non-zero entry is in the ith position.

For example, the matrix

    1 2 0  3 5 0 1
    0 0 1 -2 4 0 0
E = 0 0 0  0 0 1 3    (2)
    0 0 0  0 0 0 0
    0 0 0  0 0 0 0

is in row echelon form. Below we state some facts about the row echelon
form, the proofs of which may be found in [65].
For A ∈ C^{m×n} such that rank(A) = r:
(E1) A can always be row reduced to row echelon form by elementary
row operations (i.e. there always exists a non-singular matrix P ∈ C^{m×m}
such that PA = E_A, where E_A is in row echelon form).
(E2) For a given A, the row echelon form E_A obtained by row reducing
A is unique.
(E3) If E_A is the row echelon form for A and the unit vectors in E_A
appear in columns i₁, i₂, ..., and i_r, then the corresponding columns of A
are a basis for R(A). This particular basis is called the set of distinguished
columns of A. The remaining columns are called the undistinguished columns
of A. (For example, if A is a matrix such that its row echelon form is given
by (2), then the first, third, and sixth columns of A are the distinguished
columns.)
(E4) If E_A is the row echelon form (1) for A, then N(A) = N(E_A) = N(C).
(E5) If (1) is the row echelon form for A, and if B ∈ C^{m×r} is the matrix
made up of the distinguished columns of A (in the same order as they are
in A), then A = BC, where C is obtained from the row echelon form. This
is a full rank factorization such as was described in Proposition 1.
Very closely related to the row echelon form is the hermite echelon
form. However, the hermite echelon form is defined only for square
matrices.

Definition 1.3.2 A matrix H ∈ C^{n×n} is said to be in hermite echelon form
if its elements h_{ij} satisfy the following conditions:
(i) H is upper triangular (i.e. h_{ij} = 0 when i > j).
(ii) Each h_{ii} is either 0 or 1.
(iii) If h_{ii} = 0, then h_{ik} = 0 for every k, 1 ≤ k ≤ n.
(iv) If h_{ii} = 1, then h_{ki} = 0 for every k ≠ i.

For example, the matrix


1 2 0 3 5 0 1
0 0 0 0 0 0 0
0 0 1 —2 4 0 0
H= 0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 1 3
0 0 0 0 0 0 0
is in hermite form. Below are some facts about the hermite form, the proofs
of which may be found in [65].
For A ∈ C^{n×n}:
(H1) A can always be row reduced to a hermite form. If A is reduced to
its row echelon form, then a permutation of the rows can always be
performed to obtain a hermite form.
(H2) For a given matrix A, the hermite form H_A obtained by row
reducing A is unique.
(H3) H_A² = H_A (i.e. H_A is a projection).
(H4) N(A) = N(H_A) = R(I - H_A), and a basis for N(A) is the set of
non-zero columns of I - H_A.
We can now present the methods of Theorems 1 and 2 as algorithms.

Algorithm 1.3.1 To obtain the generalized inverse of a square matrix
A ∈ C^{n×n}:
(I) Row reduce A* to its hermite form H_{A*}.
(II) Select the distinguished columns of A*. Label these columns
v₁, v₂, ..., v_r and place them as columns in a matrix L.
(III) Form the matrix AL.
(IV) Form I - H_{A*} and select the non-zero columns from this matrix.
Label these columns w₁, w₂, ..., w_{n-r}.
(V) Place the columns of AL and the w_i's as columns in a matrix
M = [AL | w₁ | w₂ | ... | w_{n-r}] and compute M^{-1}. (Actually only the first r
rows of M^{-1} are needed.)
(VI) Place the first r rows of M^{-1} (in the same order as they appear in
M^{-1}) in a matrix called R.
(VII) Compute A† as A† = LR.
Although Algorithm 1 is stated for square matrices, it is easy to use it
for non-square matrices. Add zero rows or zero columns to construct a
square matrix and use the facts that

[A | O]† = [A†; O]    and    [A; O]† = [A† | O],

where [A | O] denotes A padded with zero columns and [A; O] denotes A
padded with zero rows.
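Algorithm 1 mechanizes readily. The sketch below (function names are ours; real matrices assumed for simplicity) row reduces A* to hermite form and then follows steps (I)-(VII):

```python
import numpy as np

def hermite_form(A, tol=1e-10):
    """Hermite echelon form of a square real matrix: compute the row echelon
    form, then place each non-zero row so its leading 1 lies on the diagonal."""
    n = A.shape[0]
    E = A.astype(float).copy()
    row, pivots = 0, []
    for col in range(n):
        if row >= n:
            break
        p = row + np.argmax(np.abs(E[row:, col]))
        if abs(E[p, col]) < tol:
            continue
        E[[row, p]] = E[[p, row]]
        E[row] /= E[row, col]
        for i in range(n):
            if i != row:
                E[i] -= E[i, col] * E[row]
        pivots.append(col)
        row += 1
    H = np.zeros_like(E)
    for k, col in enumerate(pivots):   # leading 1 of row k belongs on row `col`
        H[col] = E[k]
    return H

def pinv_alg1(A, tol=1e-10):
    """Algorithm 1.3.1 for a square real matrix A."""
    n = A.shape[0]
    HAs = hermite_form(A.T)                                # (I)
    dist = [j for j in range(n) if abs(HAs[j, j] - 1) < tol]
    L = A.T[:, dist]                                       # (II)
    K = np.eye(n) - HAs                                    # (IV)
    W = K[:, [j for j in range(n) if np.linalg.norm(K[:, j]) > tol]]
    M = np.hstack([A @ L, W])                              # (III), (V)
    R = np.linalg.inv(M)[:len(dist), :]                    # (VI)
    return L @ R                                           # (VII)

A = np.array([[1., 2., 1., 4.],
              [2., 4., 0., 6.],
              [0., 0., 1., 1.],
              [0., 0., 0., 1.]])
assert np.allclose(pinv_alg1(A), np.linalg.pinv(A))
```

The test matrix is the one worked by hand in Example 3 below.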

Algorithm 1.3.2 To obtain a full rank factorization and the generalized
inverse of any A ∈ C^{m×n}:

(I) Reduce A to its row echelon form E_A.
(II) Select the distinguished columns of A and place them as the columns
in a matrix B in the same order as they appear in A.
(III) Select the non-zero rows from E_A and place them as rows in a
matrix C in the same order as they appear in E_A.
(IV) Compute (CC*)^{-1} and (B*B)^{-1}.
(V) Compute A† as A† = C*(CC*)^{-1}(B*B)^{-1}B*.
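Algorithm 2 is equally mechanical. A sketch (again with our own function names, real matrices assumed): the row echelon form gives the full rank factorization of (E5), and Theorem 1.3.2 does the rest.

```python
import numpy as np

def row_echelon(A, tol=1e-10):
    """Row echelon form E_A and the (0-indexed) distinguished columns."""
    E = A.astype(float).copy()
    m, n = E.shape
    pivots, row = [], 0
    for col in range(n):
        if row >= m:
            break
        p = row + np.argmax(np.abs(E[row:, col]))
        if abs(E[p, col]) < tol:
            continue
        E[[row, p]] = E[[p, row]]
        E[row] /= E[row, col]
        for i in range(m):
            if i != row:
                E[i] -= E[i, col] * E[row]
        pivots.append(col)
        row += 1
    return E, pivots

def pinv_alg2(A):
    """Algorithm 1.3.2 for a real matrix A."""
    E, pivots = row_echelon(A)
    r = len(pivots)
    B = A[:, pivots].astype(float)   # (II) distinguished columns of A
    C = E[:r, :]                     # (III) non-zero rows of E_A
    # (IV), (V): A† = C*(CC*)^{-1}(B*B)^{-1}B*  (real case, * is transpose)
    return C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T

A = np.array([[1., 2., 1., 4., 1.],
              [2., 4., 0., 6., 6.],
              [1., 2., 0., 3., 3.],
              [2., 4., 0., 6., 6.]])
assert np.allclose(pinv_alg2(A), np.linalg.pinv(A))
```

The test matrix is the one worked by hand in Example 4 below.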

Example 1.3.3 We will use Algorithm 1 to find A† where

A = [ 1 2 1 4
      2 4 0 6
      0 0 1 1
      0 0 0 1 ].

(I) Using elementary row operations on A*, we get that its hermite
echelon form is

H_{A*} = [ 1 0    1 0
           0 1 -1/2 0
           0 0    0 0
           0 0    0 1 ].

(II) The first, second and fourth columns of A* are distinguished. Thus

L = [ 1 2 0
      2 4 0
      1 0 0
      4 6 1 ].

(III) Then

AL = [ 22 34 4
       34 56 6
        5  6 1
        4  6 1 ].

(IV) The non-zero column of I - H_{A*} is w₁ = (-1, 1/2, 1, 0)*.
(V) Putting AL and w₁ into a matrix M and then computing M^{-1} gives

M = [ 22 34 4  -1
      34 56 6 1/2          M^{-1} = (1/90) [  40 -20  50 -90
       5  6 1   1 ,                          -19  14 -26  18
       4  6 1   0 ]                          -46  -4 -44 342
                                             -40  20  40   0 ].

(VI) The first three rows of M^{-1} give R as

R = (1/90) [  40 -20  50 -90
             -19  14 -26  18
             -46  -4 -44 342 ].

(VII) Thus

A† = LR = (1/45) [  1   4  -1 -27
                    2   8  -2 -54
                   20 -10  25 -45
                    0   0   0  45 ].
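The final matrix can be checked against an independent computation of A†:

```python
import numpy as np

A = np.array([[1., 2., 1., 4.],
              [2., 4., 0., 6.],
              [0., 0., 1., 1.],
              [0., 0., 0., 1.]])
expected = np.array([[ 1.,   4.,  -1., -27.],
                     [ 2.,   8.,  -2., -54.],
                     [20., -10.,  25., -45.],
                     [ 0.,   0.,   0.,  45.]]) / 45.0
assert np.allclose(np.linalg.pinv(A), expected)
```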

Example 1.3.4 We will now use Algorithm 2 to find A† where

A = [ 1 2 1 4 1
      2 4 0 6 6
      1 2 0 3 3
      2 4 0 6 6 ].

(I) Using elementary row operations we reduce A to its row echelon
form

E_A = [ 1 2 0 3  3
        0 0 1 1 -2
        0 0 0 0  0
        0 0 0 0  0 ].

(II) The first and third columns are distinguished. Thus

B = [ 1 1
      2 0
      1 0
      2 0 ].

(III) The matrix C is made up of the non-zero rows of E_A, so that

C = [ 1 2 0 3  3
      0 0 1 1 -2 ].

(IV) Now CC* = [23 -3; -3 6] and B*B = [10 1; 1 1]. Calculating
(CC*)^{-1} and (B*B)^{-1}, we get (CC*)^{-1} = (1/129)[6 3; 3 23] and
(B*B)^{-1} = (1/9)[1 -1; -1 10].
(V) Substituting the results of steps (II), (III) and (IV) into the formula
for A† gives

A† = C*(CC*)^{-1}(B*B)^{-1}B* = (1/1161) [   27   6   3   6
                                            54  12   6  12
                                           207 -40 -20 -40
                                           288 -22 -11 -22
                                          -333  98  49  98 ].
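Again the result agrees with an independent computation:

```python
import numpy as np

A = np.array([[1., 2., 1., 4., 1.],
              [2., 4., 0., 6., 6.],
              [1., 2., 0., 3., 3.],
              [2., 4., 0., 6., 6.]])
expected = np.array([[  27.,   6.,   3.,   6.],
                     [  54.,  12.,   6.,  12.],
                     [ 207., -40., -20., -40.],
                     [ 288., -22., -11., -22.],
                     [-333.,  98.,  49.,  98.]]) / 1161.0
assert np.allclose(np.linalg.pinv(A), expected)
```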

Theorem 2 is a good illustration of one difficulty with learning from a


text in this area. Often the hard part is to come up with the right formula.
To verify it is easy. This is not an uncommon phenomenon. In differential
equations it is frequently easy to verify that a given function is a solution.
The hard part is to show one exists and then to find it. In the study of
generalized inverses, existence is usually taken care of early. There remains
then the problem of finding the right formula. For these reasons, we urge
the reader to try and derive his own theorems as we go. For example,
can you come up with an alternative formula to that of Theorem 2? The
resulting formula should, of course, only involve A† on one side. Ideally,
it would not even involve B and C. Then ask yourself, can I do better by
imposing special conditions on B and C or A? Under what conditions does
the formula simplify? The reader who approaches each problem, theorem
and exercise in this manner will not only learn the material better, but will
be a better mathematician for it.

4. Generalized inverse of a product


As pointed out in Section 2, one of the major shortcomings of the
Moore–Penrose inverse is that the `reverse order law' does not always
hold; that is, (AB)† is not always B†A†. This immediately suggests two
questions. What is (AB)†? When does (AB)† = B†A†?
The question, `What is (AB)†?' has a lot of useless, or non-answer,
answers. For example, (AB)† = (AB)†AB(AB)† is a non-answer. It merely
restates condition (ii) of the Penrose definition of (AB)†. The decision as to
whether or not an answer is an answer is subjective and comes with
experience. Even then, professional mathematicians may differ on how
good an answer is depending on how they happen to view the problem
and mathematics.
The authors feel that a really good answer to the question, `What is
(AB)†?' does not, and probably will not, exist. However, an answer should:
(A) have some sort of intuitive justification if possible;
(B) suggest at least a partial answer to the other question, `When does
(AB)† = B†A†?'
Theorem 1.4.1 is, to our knowledge, the best answer available.
We shall now attack the problem of determining a formula for (AB)†.
The first problem is to come up with a theorem to prove. One way to come
up with a conjecture would be to perform algebraic manipulations on
(AB)† using the Penrose conditions. Another, and the one we now follow,
is to draw a picture and make an educated guess. If that guess does not
work, then make another.
Figure 1.1 is, in a sense, not very realistic. However, the authors find it a
convenient way to visualize the actions of linear transformations. The
vertical lines stand for C^p, C^n, and C^m. A sub-interval is a subspace. The
rest of the interval is a (possibly orthogonal) complementary subspace. A

[Fig. 1.1: three vertical lines representing C^p, C^n, and C^m, with shaded
bands for the actions of B, A, and AB]

shaded band represents the one to one mapping of one subspace onto
another. It is assumed that A ∈ C^{m×n} and B ∈ C^{n×p}. In the figure: (a, c) =
R(B*), (a′, c′) = R(B), (b′, d′) = R(A*), and (b″, d″) = R(A). The total shaded
band from C^p to C^n represents the action of B. The part that is
shaded more darkly represents P_{R(A*)}B = A†AB. The total shaded area
from C^n to C^m is A. The darker portion is AP_{R(B)} = ABB†. The one to one
portion of the mapping AB from C^p to C^m may be viewed as the v-shaped
dark band from C^p to C^m. Thus to `undo' AB, one would trace the dark
band backwards. That is, (AB)† = (P_{R(A*)}B)†(AP_{R(B)})†.
There are only two things wrong with this conjecture. For one,

AB = A(A†A)(BB†)B    (1)

and not AB = A(BB†)(A†A)B. Secondly, the drawing lies a bit in that the
interval (b′, c′) is actually standing for two slightly skewed subspaces and
not just one. However, the factorization (1) does not help, and Fig. 1.1 does
seem to portray what is happening. We are led then to try and prove the
following theorem due to Cline and Greville.

Theorem 1.4.1 If A ∈ C^{m×n} and B ∈ C^{n×p}, then (AB)† = (P_{R(A*)}B)†(AP_{R(B)})†.

Proof Assuming that one has stumbled across this formula and is
wondering if it is correct, the most reasonable thing to do is see if the
formula for (AB)† satisfies any of the definitions of the generalized inverse.
Let X = (P_{R(A*)}B)†(AP_{R(B)})† = (A†AB)†(ABB†)†. We will proceed as follows.
We will assume that X satisfies condition (i) of the Penrose definition.
We will then try and manipulate condition (i) to get a formula that can be
verified independently of condition (i). If our manipulations are reversible,
we will have shown condition (i) holds. Suppose then that ABXAB = AB,
or equivalently,

AB(A†AB)†(ABB†)†AB = AB.    (2)

Multiply (2) on the left by A† and on the right by B†. Then (2) becomes

A†AB(A†AB)†(ABB†)†ABB† = A†ABB†,    (3)

or, P_{R(A†AB)}P_{R(BB†A*)} = P_{R(A*)}P_{R(B)}. Equivalently,

P_{A†AR(B)}P_{BB†R(A*)} = P_{R(A*)}P_{R(B)}.    (4)

To see that (4) is true, we will show that if

E₁ = P_{A†AR(B)}P_{BB†R(A*)},    E₂ = P_{R(A*)}P_{R(B)},    (5)

then E₁u = E₂u for all u ∈ C^n. Suppose first that u ∈ R(B)⊥. Then E₁u =
E₂u = 0. Suppose then that u ∈ R(B). Now to find E₁u we will need to
calculate P_{BB†R(A*)}u. But BB†R(A*) is a subspace of R(B). Let u = u₁ ⊕ u₂,
where u₁ ∈ BB†R(A*) and u₂ ∈ [BB†R(A*)]⊥ ∩ R(B). Then

E₁u = P_{A†AR(B)}P_{BB†R(A*)}u    (6)
    = P_{A†AR(B)}u₁ = A†Au₁.    (7)

Equality (7) follows since u₁ ∈ R(B) and the projection of R(B) onto A†AR(B)
is accomplished by A†A. Now E₂u = P_{R(A*)}P_{R(B)}u = P_{R(A*)}u = A†Au, so
(4) will now follow provided that A†Au = A†Au₁, that is, if u₂ ∈ N(A) =
R(A*)⊥. Suppose then that v ∈ R(A*). By definition, u₂ ⊥ BB†v. Thus
0 = (BB†v, u₂) = (v, BB†u₂) = (v, u₂). Hence u₂ ∈ R(A*)⊥ as desired. (4) is now
established. But (4) is equivalent to (3). Multiply (3) on the left by A and
the right by B. This gives (2). Because of the particular nature of X it turns
out to be easier to use ABXAB = AB to show that X satisfies the Moore
definition of (AB)†. Multiply on the right by (AB)†. Then we have
ABXP_{R(AB)} = P_{R(AB)}. But N(X) ⊇ N((ABB†)†) = R(ABB†)⊥ = R(AB)⊥.
Thus XP_{R(AB)} = X, and hence

ABX = P_{R(AB)}.    (8)

Now multiply ABXAB = AB on the left by (AB)† to get P_{R(B*A*)}XAB =
P_{R((AB)†)}. But R(X) ⊆ R((A†AB)†) = R((A†AB)*) = R(B*A*A*†) = R(B*A*),
and hence P_{R(B*A*)}X = X. Thus

XAB = P_{R((AB)†)}.    (9)

Equations (8) and (9) show that Theorem 1 holds by the Moore definition. ∎
Comment It is possible that some readers might have some misgivings
about equality (7). An easy way to see that P_{A†AR(B)}u₁ = A†Au₁ is as
follows. Suppose that u₁ ∈ R(B). Then A†Au₁ ∈ A†AR(B). But
A(u₁ - A†Au₁) = Au₁ - Au₁ = 0. Thus u₁ - A†Au₁ ∈ R(A*)⊥ ⊆ (A†AR(B))⊥.
Hence u₁ = A†Au₁ ⊕ (u₁ - A†Au₁), where A†Au₁ ∈ A†AR(B) and
(u₁ - A†Au₁) ∈ (A†AR(B))⊥. Thus P_{A†AR(B)}u₁ = A†Au₁.
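Theorem 1 and the failure of the naive reverse order law can both be checked numerically. A sketch, using the matrix of Example 1.2.2 (with B = A) and then randomly generated rank-deficient factors:

```python
import numpy as np

pinv = np.linalg.pinv

A = np.array([[1., 1.],
              [0., 0.]])
B = A.copy()
# The naive reverse order law fails here...
assert not np.allclose(pinv(A @ B), pinv(B) @ pinv(A))
# ...but the Cline-Greville formula (AB)† = (A†AB)†(ABB†)† holds:
lhs = pinv(A @ B)
rhs = pinv(pinv(A) @ A @ B) @ pinv(A @ B @ pinv(B))
assert np.allclose(lhs, rhs)

# It also holds for arbitrary (here random, rank-deficient) factors:
rng = np.random.default_rng(0)
A2 = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
B2 = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))
assert np.allclose(pinv(A2 @ B2),
                   pinv(pinv(A2) @ A2 @ B2) @ pinv(A2 @ B2 @ pinv(B2)))
```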
The formula for (AB)† simplifies if either P_{R(A*)} = I or P_{R(B)} = I.

Corollary 1.4.1 Suppose that A ∈ C^{m×n} and B ∈ C^{n×p}.
(i) If rank(A) = n, then (AB)† = B†(AP_{R(B)})†.
(ii) If rank(B) = n, then (AB)† = (P_{R(A*)}B)†A†.

Part (i) (or (ii)) of Corollary 1 is, of course, valid if A (or B) is invertible.

Corollary 1.4.2 Suppose that A ∈ C^{m×n}, B ∈ C^{n×p}, and rank(A) = rank(B)
= n. Then (AB)† = B†A†, and A† = (A*A)^{-1}A* while B† = B*(BB*)^{-1}.

The formulas for A† and B† in Corollary 2 come from Theorem 3.2
and the factorings A = AIₙ and B = IₙB. In fact, Corollary 1 can be
derived from Theorem 3.2 also.
The assumptions of Corollary 2 are much more stringent than are
necessary to guarantee that (AB)† = B†A†.

Example 1.4.1 Let A = [0 1 0; 0 0 1; 0 0 0] and B = [a 0 0; 0 b 0; 0 0 c],
where a, b, c ∈ C. It is easy to see that A† = [0 0 0; 1 0 0; 0 1 0] and
B† = [a† 0 0; 0 b† 0; 0 0 c†]. Now AB = [0 b 0; 0 0 c; 0 0 0] and
(AB)† = [0 0 0; b† 0 0; 0 c† 0]. On the other hand, B†A† = [0 0 0; b† 0 0; 0 c† 0],
so that (AB)† = B†A†. Notice that BA = [0 a 0; 0 0 b; 0 0 0].
By varying the values of a, b and c, one can see that (AB)† = B†A† is
possible without
(i) AB = BA (a ≠ b)
(ii) rank(A) = rank(B) (a = b = 0, c = 1)
(iii) rank(AB) = rank(BA) (a = b ≠ 0, c = 0).
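A numerical confirmation of (i)-(iii) (the function name is ours):

```python
import numpy as np

def check(a, b, c):
    """Example 1.4.1: does (AB)† = B†A† for this choice of a, b, c?"""
    A = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [0., 0., 0.]])
    B = np.diag([float(a), float(b), float(c)])
    return np.allclose(np.linalg.pinv(A @ B),
                       np.linalg.pinv(B) @ np.linalg.pinv(A))

assert check(1, 2, 3)   # (AB)† = B†A† although AB ≠ BA
assert check(0, 0, 1)   # ... although rank(A) ≠ rank(B)
assert check(1, 1, 0)   # ... although rank(AB) ≠ rank(BA)
```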
The list can be continued, but the point has been made. The question
remains, when is (AB)† = B†A†? Consider Example 1 again. What was
it about that A and B that made (AB)† = B†A†? The only thing that A
and B seem to have in common are some invariant subspaces. The
subspaces R(A), R(A*), N(A), and N(A*) are all invariant for both A and B.
A statement about invariant subspaces is also a statement about projectors.
A possible method of attack has suggested itself. We will assume that
(AB)† = B†A†. From this we will try to derive statements about projectors.
In Example 1, A*A and B were simultaneously diagonalizable, so we
should see if A*A works in. Finally, we should check to see if our conditions
are necessary as well as sufficient.
Assume then that A ∈ C^{m×n}, B ∈ C^{n×p}, and

(AB)† = B†A†.    (10)

Theorem 1 gives another formula for (AB)†. Substitute that into (10) to get
(A†AB)†(ABB†)† = B†A†. To change this into a projector equation,

multiply on the left by (AtAB) and on the right by (ABBt), to give


z (11)
PR(A t AB) PR( BBfA*) = (P R(A*) PR(B) )
By equation (4), (11) can be rewritten as PR(A*)PR(B) = (PR(A•)PR(B))z and
hence PR(A•)PR(B) is a projector. But the product of two hermitian projectors
is a projector if and only if the two hermitian projectors commute. Thus
(recall that [X, Y] = XY — YX)
[P R(A , ) ,P R(B) ] = 0, or equivalently, [AtA,BBt] = 0, if (AB)t = BtAt.
(12)
Is (12) enough to guarantee (10)? We continue to see if we can get any
additional conditions. Example 1 suggested that an A*A term might be
useful. If (10) holds, then ABBtAt is hermitian. But then A*(ABBtAt)A is
hermitian. Thus
A*(ABBt)AtA = AtABBtA*A. (13)
Using (12) and the fact that AtAA* = A*, (13) becomes A*ABBt =
BBtA*A, or
[A*A, BBt] = 0. (14)
Condition (14) is a stronger condition than (12) since it involves A*A and
not just PR(A*) = AtA.
In Theorem 1 there was a certain symmetry in the formula. It seems
unreasonable that, in conditions for (10), either A or B should be the more
important. We return to equation (10) to try to derive a formula like
(14) but with the roles of A and B `reversed'. Now BtAtAB is hermitian
since we are assuming (10) holds. Thus BBtAtABB* is hermitian. Proceeding
as before, we get that
[BB*, AtA] = 0. (15)
Continued manipulation fails to produce any conditions not implied by
(14) and (15). We are led then to attempt to prove the following theorem.
Theorem 1.4.2 Suppose that A∈C^{m×n} and B∈C^{n×p}. Then the following
statements are equivalent:
(i) (AB)t = BtAt.
(ii) BB*AtA and A*ABBt are hermitian.
(iii) R(A*) is an invariant subspace of BB* and R(B) is an invariant
subspace of A*A.
(iv) PN(A)BB*PR(A*) = 0 and PN(B*)A*APR(B) = 0.
(v) AtABB*A* = BB*A* and BBtA*AB = A*AB.
Proof Statement (ii) is equivalent to equations (14) and (15), so that
(i) implies (ii). We will first show that (ii)–(v) are all equivalent. Since
BB* and A*A are hermitian, all of their invariant subspaces are reducing.
Thus (ii) and (iii) are equivalent. Now observe that if C is a matrix and M a
subspace, then CPM = PMCPM + (I − PM)CPM. This
says that M is an invariant subspace of C if and only if (I − PM)CPM = 0. Thus
(iii) and (iv) are equivalent. Since (v) is written algebraically, we will show
it is equivalent to (ii). Assume then that BB*AtA is hermitian.
Then BB*AtA = AtABB*. Thus BB*(AtA)A* = AtABB*A*, or
BB*A* = AtABB*A*. (16)
Similarly if A *ABBt is hermitian, then

BBtA*AB = A*AB (17)
so that (ii) implies (v). Assume now that (16) and (17) hold. Multiply (17) on
the right by Bt and (16) on the right by (A*)t. The new equations are
precisely statement (iii) which is equivalent to (ii). Thus (ii)—(v) are all
equivalent. Suppose now that (ii)—(v) hold. We want to prove (i). Observe
that BtAt = Bt(BBt)(AtA)At = Bt(AtA)(BBt)At, while by Theorem 1
(AB)t = (AtAB)t(ABBt)t. Theorem 2 will be proven if we can show that
(AtAB)t = Bt(AtA) and (18)
(ABBt)t = BBtAt. (19)
To see if (18) holds, check the Penrose equations. Let X = Bt(AtA). Then
AtABX = AtABBtAtA = AtAAtABBt = AtABBt = PR(AtABBt) = PR(AtAB).
Thus Penrose conditions (i) and (iii) are satisfied. Now X(AtAB) =
Bt(AtA)(AtA)B = Bt(AtA)B. Thus X(AtAB)X = Bt(AtA)BBt(AtA) =
BtAtA = X and Penrose condition (ii) is satisfied. There remains then only
to show that X(AtAB) is hermitian. But AtABB* is hermitian by assumption
(ii). Thus Bt(AtABB*)(Bt)* is hermitian and Bt(AtA)BB*(B*)t =
Bt(AtA)B = X(AtAB) as desired. The proof of (19) is similar and left to
the exercises. •
It is worth noticing that conditions (ii)–(v) of Theorem 1.4.2 make
Fig. 1.1 correct. The interval (b', c') would stand for one subspace
R(AtABBt) rather than two skewed subspaces, R(AtABBt) and R(BBtAtA).
A reader might think that perhaps there is a weaker-appearing set of
conditions than (ii)–(v) that would imply (AB)t = BtAt. The next example
shows that even with relatively simple matrices the full statement of the
conditions is needed.
Example 1.4.2 Let
    [0 0 0]        [1 0 0]
A = [0 1 1],   B = [0 1 0].
    [0 1 0]        [0 0 0]
Then B = B* = Bt = B^2 and
     [0 0  0]
At = [0 0  1].
     [0 1 −1]
Now BB*AtA is hermitian, so that [BB*, AtA] = 0 and [BBt, AtA] = 0.
However,
         [0 0 0]
A*ABBt = [0 2 0],
         [0 1 0]
which is not hermitian, so that (AB)t ≠ BtAt.
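The claims of Example 1.4.2 can be verified mechanically. The following sketch (using NumPy's `pinv`; an addition to the text) checks each one:

```python
import numpy as np

# the matrices of Example 1.4.2
A = np.array([[0., 0., 0.],
              [0., 1., 1.],
              [0., 1., 0.]])
B = np.diag([1., 1., 0.])       # B = B* = Bt = B^2

p = np.linalg.pinv
H = B @ B.T @ p(A) @ A          # BB*AtA
assert np.allclose(H, H.T)      # hermitian, so [BB*, AtA] = [BBt, AtA] = 0

K = A.T @ A @ B @ p(B)          # A*ABBt
assert not np.allclose(K, K.T)  # not hermitian

# and indeed the reverse order law fails
assert not np.allclose(p(A @ B), p(B) @ p(A))
```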


An easy-to-verify condition that implies (AB)t = BtAt is the following.

Corollary 1.4.3 If A*ABB* = BB*A*A, then (AB)t = BtAt .

The proof of Corollary 2 is left to the exercises.


Corollary 2 has an advantage over conditions (ii)—(iv) of Theorem 2 in
that one does not have to calculate At, Bt, or any projectors to verify it.
It has the disadvantage that it is only sufficient and not necessary. Notice
that the A, B in Example 1 satisfy [A*A, BB*] = 0 while those in Example 2
do not.
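Corollary 1.4.3 is convenient precisely because its hypothesis uses only A and B. A quick numerical illustration (a NumPy sketch with diagonal test matrices chosen for this note, not taken from the text):

```python
import numpy as np

# diagonal matrices commute, so A*ABB* = BB*A*A holds trivially
A = np.diag([1., 2., 0.])
B = np.diag([3., 0., 5.])
assert np.allclose(A.T @ A @ B @ B.T, B @ B.T @ A.T @ A)

# Corollary 1.4.3 then gives the reverse order law
p = np.linalg.pinv
assert np.allclose(p(A @ B), p(B) @ p(A))
```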
There is another approach to the problem of determining when (AB)t =
BtAt. It is to try to define a different kind of inverse of A, call it A~, so
that (AB)~ = B~A~. This approach will not be discussed.

5. Exercises
1. Each of the following is an alternative set of equations whose unique
solution X is the generalized inverse of A. For each definition show
that it is equivalent to one of the three given in the text.
(a) AX = PR(A), N(X*) = N(A).
(b) AX = PR(A), XA = PR(A*), XAX = X.
(c) XAA* = A*, XX*A* = X.
(d) XAx = x if x∈R(A*); Xx = 0 if x∈N(A*).
(e) XA = PR(A*), N(X) = N(A*).
Comment: Many of these have appeared as definitions or theorems in the
literature. Notice the connection between Example 2.1, Exercise 1(b), and
Exercise 3 below.
2. Derive a set of conditions equivalent to those given in Definition 1.2 or
Definition 1.3. Show they are equivalent. Can you derive others?
3. Suppose that A∈C^{m×n}. Prove that a matrix X satisfies AX = PR(A),
XA = PR(A*) if and only if X satisfies (i), (iii), and (iv) of Definition 1.3.
Such an X is called a (1, 3, 4)-inverse of A and will be discussed later.
Observe that it cannot be unique if it is not At since trivially At is
also a (1, 3, 4)-inverse.
4. Calculate At from Theorem 3.1 when
    A = [1 2 1]
        [1 2 1]
and when
        [1 3]
    A = [1 1].
        [1 1]
Hint: For the second matrix see Example 6.
5. Show that if rank (A) = 1, then At = (1/k)A*, where
k = Σ_{i=1}^{m} Σ_{j=1}^{n} |a_{ij}|^2.
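The rank-one formula of Exercise 5 can be checked directly (a NumPy sketch, not part of the text; the outer-product construction is just a convenient way to make a rank-1 matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
# every rank-1 matrix is an outer product u v*
A = np.outer(rng.standard_normal(4), rng.standard_normal(3))

k = np.sum(np.abs(A) ** 2)  # k = sum of |a_ij|^2, the squared Frobenius norm
assert np.allclose(np.linalg.pinv(A), A.conj().T / k)
```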
6. If A∈C^{m×n} and rank (A) = n, notice that in Theorem 3.2, C may be
chosen as an especially simple invertible matrix. Derive the formula
for At under the assumption that rank (A) = n. Do the same for the case
when rank (A) = m.
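The full-rank formulas asked for in Exercise 6 are the familiar ones, At = (A*A)^{-1}A* when rank (A) = n and At = A*(AA*)^{-1} when rank (A) = m. A numerical check (a NumPy sketch added here, stating the expected answers rather than deriving them):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))   # full column rank (rank n = 3) almost surely

p, inv = np.linalg.pinv, np.linalg.inv
# rank(A) = n: At = (A*A)^(-1) A*
assert np.allclose(p(A), inv(A.T @ A) @ A.T)
# rank(A) = m, illustrated on A* (full row rank): At = A*(AA*)^(-1)
assert np.allclose(p(A.T), A @ inv(A.T @ A))
```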
7. Let
        [1  0]
        [1 −1]
    A = [0  0].
        [0 −1]
        [0  1]
Calculate At from Definition 1.1.
8. Verify that (At)* = (A*)t.


9. Verify that (λA)t = λtAt, λ∈C, where λt = 1/λ if λ ≠ 0 and 0t = 0.
10. If A∈C^{m×n}, and U, V are unitary matrices, verify that (UAV)t =
V*AtU*.
11. Derive an explicit formula for At that involves no (t)'s by using the
singular value decomposition.
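One way to carry out Exercise 11: if A = UΣV* is a singular value decomposition, then At = VΣtU*, where Σt inverts the non-zero singular values and leaves the zeros alone. A NumPy sketch of this construction (an addition to the text, real case):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank 2

# A = U diag(s) V*; invert only the non-zero singular values
U, s, Vh = np.linalg.svd(A, full_matrices=False)
tol = 1e-12 * s.max()
s_dagger = np.array([1.0 / x if x > tol else 0.0 for x in s])
A_dagger = Vh.T @ np.diag(s_dagger) @ U.T   # V Sigma-dagger U* (real case)

assert np.allclose(A_dagger, np.linalg.pinv(A))
```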
*12. Verify that At = ∫_0^∞ e^{−A*At}A* dt.
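The integral in Exercise 12 can be evaluated through the eigendecomposition A*A = QΛQ*: each term with eigenvalue λ > 0 integrates to 1/λ, while eigenvectors with λ = 0 span N(A) and annihilate the trailing A*. A numerical sketch of this evaluation (added here; the test matrix is an arbitrary rank-deficient choice):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank 2

# e^{-A*A t} = Q e^{-Lambda t} Q*; integrating e^{-lam t} over [0, inf)
# gives 1/lam for lam > 0, and lam = 0 eigenvectors lie in N(A),
# so they are killed by the trailing A*
lam, Q = np.linalg.eigh(A.T @ A)
tol = 1e-12 * lam.max()
inv_lam = np.array([1.0 / x if x > tol else 0.0 for x in lam])
integral = Q @ np.diag(inv_lam) @ Q.T @ A.T   # equals (A*A)t A*

assert np.allclose(integral, np.linalg.pinv(A))
```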

*13. Verify that At = (1/2πi) ∮_C z^{−1}(zI − A*A)^{−1}A* dz, where C is a closed contour
containing the non-zero eigenvalues of A*A, but not containing the
zero eigenvalue of A*A in or on it.
14. Prove Proposition 1.1.
Exercises 15-21 are all drawn from the literature. They were originally
done by Schwerdtfeger, Baskett and Katz, Greville, and Erdelyi. Some
follow almost immediately from Theorem 4.1. Others require more work.
15. Prove that if A∈C^{m×n} and B∈C^{n×m} such that rank (A) = rank (B) and if
the eigenvectors corresponding to non-zero eigenvalues of the two
matrices A*A and BB* span the same space, then (AB)t = BtAt.
16. Prove that if A∈C^{n×n}, B∈C^{n×n} and AAt = AtA, BBt = BtB, (AB)tAB =
AB(AB)t, and rank (A) = rank (B) = rank (AB), then (AB)t = BtAt.
17. Assume that A∈C^{m×n}, B∈C^{n×p}. Show that the following statements
are equivalent.
(i) (AB)t = BtAt.
(ii) AtABB*A*ABBt = BB*A*A.
(iii) AtAB = B(AB)tAB and BBtA* = A*AB(AB)t.
(iv) (AtABBt)* = (AtABBt)t and the two matrices ABBtAt and
BtAtAB are both hermitian.
18. Show that if [A, PR(B)] = 0, [At, PR(B)] = 0, [B, PR(A)] = 0, and
[Bt, PR(A)] = 0, then (AB)t = BtAt.
19. Prove that if A* = At and B* = Bt and if any of the conditions of
Exercise 18 hold, then (AB)t = BtAt = B*A*.
20. Prove that if A* = At and the third and fourth conditions of Exercise
18 hold, then (AB)t = BtAt = BtA*.
21. Prove that if B* = Bt and the first and second conditions of Exercise 18
hold, then (AB)t = BtAt.
22. Prove that if [A*A, BB*] = 0, then (AB)t = BtAt.

23. Verify that the product of two hermitian matrices A and B is hermitian
if and only if [A, B] = 0.
24. Suppose that P, Q are hermitian projectors. Show that PQ is a
projector if and only if [P, Q] = 0.
25. Assuming (ii)–(v) of Theorem 2, show that (APR(B))t = PR(B)At
without directly using the fact that (AB)t = BtAt.
26. Write an expression for (PAQ)t when P and Q are non-singular.
27. Derive necessary and sufficient conditions on P and A, P non-singular,
for (P^{-1}AP)t to equal P^{-1}AtP.
28. Prove that if A∈C^{m×n} and the entries of A are rational numbers, then
the entries of At are rational.
