If we set
$$f(x) = \tfrac{1}{2}\|Ax - b\|_2^2,$$
then the first-order necessary condition for a point $x$ to solve LLS is that $\nabla f(x) = 0$. In order to use this fact, we need to compute an expression for the gradient of $f$.
For $i = 1, 2, \dots, m$, set $f_i(x) = \big(\sum_{j=1}^n A_{ij}x_j - b_i\big)^2$, so that $f(x) = \tfrac{1}{2}\sum_{i=1}^m f_i(x)$. Observe that for $i \in \{1, 2, \dots, m\}$ and a given $j_0 \in \{1, 2, \dots, n\}$,
$$\frac{\partial}{\partial x_{j_0}} f_i(x) = \frac{\partial}{\partial x_{j_0}}\Big(\sum_{j=1}^n A_{ij}x_j - b_i\Big)^2 = 2A_{ij_0}\Big(\sum_{j=1}^n A_{ij}x_j - b_i\Big).$$
Consequently,
$$\frac{\partial f}{\partial x_{j_0}}(x) = \sum_{i=1}^m A_{ij_0}\Big(\sum_{j=1}^n A_{ij}x_j - b_i\Big) = A_{\cdot j_0}^T(Ax - b),$$
where $A_{\cdot j_0}$ denotes the $j_0$th column of $A$,
and so
$$\nabla f(x) =
\begin{bmatrix}
\frac{\partial f}{\partial x_1}(x) \\ \frac{\partial f}{\partial x_2}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x)
\end{bmatrix}
=
\begin{bmatrix}
A_{\cdot 1}^T(Ax - b) \\ A_{\cdot 2}^T(Ax - b) \\ \vdots \\ A_{\cdot n}^T(Ax - b)
\end{bmatrix}
= A^T(Ax - b).$$
Therefore, if $x \in \mathbb{R}^n$ solves LLS, then it must be the case that $0 = \nabla f(x) = A^T(Ax - b)$, or equivalently,
$$A^T A x = A^T b.$$
This system of equations is called the normal equations for the linear least-squares problem LLS.
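As a quick numerical sanity check, the sketch below (using NumPy; the matrix $A$ and vector $b$ are arbitrary illustrative data, not from the text) solves the normal equations directly and confirms that the gradient $A^T(Ax - b)$ vanishes at the solution.

```python
import numpy as np

# Arbitrary illustrative data: 5 equations, 3 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

# Solve the normal equations  A^T A x = A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)

# At the solution, the gradient  A^T (A x - b)  of f must vanish.
grad = A.T @ (A @ x - b)
print(np.allclose(grad, 0))  # True
```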
We must now address the question of whether there exists a solution to the normal equations. For this we make use of the following lemma.
Lemma 1. For every matrix $A \in \mathbb{R}^{m \times n}$ we have
$$\mathrm{Null}(A^T A) = \mathrm{Null}(A) \quad\text{and}\quad \mathrm{Ran}(A^T A) = \mathrm{Ran}(A^T).$$
Proof. Note that if $x \in \mathrm{Null}(A)$, then $Ax = 0$ and so $A^T A x = 0$, that is, $x \in \mathrm{Null}(A^T A)$. Therefore, $\mathrm{Null}(A) \subset \mathrm{Null}(A^T A)$. Conversely, if $x \in \mathrm{Null}(A^T A)$, then
$$A^T A x = 0 \;\implies\; x^T A^T A x = 0 \;\implies\; (Ax)^T(Ax) = 0 \;\implies\; \|Ax\|_2^2 = 0 \;\implies\; Ax = 0,$$
or equivalently, $x \in \mathrm{Null}(A)$. Therefore, $\mathrm{Null}(A^T A) \subset \mathrm{Null}(A)$, and so $\mathrm{Null}(A^T A) = \mathrm{Null}(A)$.
Since $\mathrm{Null}(A^T A) = \mathrm{Null}(A)$, the Fundamental Theorem of the Alternative tells us that
$$\mathrm{Ran}(A^T A) = \mathrm{Ran}\big((A^T A)^T\big) = \mathrm{Null}(A^T A)^\perp = \mathrm{Null}(A)^\perp = \mathrm{Ran}(A^T),$$
which proves the lemma.
Let us now examine the existence of solutions to the normal equations in light of this lemma. The normal equations are $A^T A x = A^T b$. By definition, $A^T b \in \mathrm{Ran}(A^T)$. The lemma tells us that $\mathrm{Ran}(A^T) = \mathrm{Ran}(A^T A)$. Hence, there must exist an $x$ such that $A^T A x = A^T b$; that is, the normal equations are always consistent, regardless of the choice of matrix $A \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$.
Theorem 2. The normal equations are consistent for all $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.
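Theorem 2 is easy to probe numerically. In the sketch below (NumPy; the rank-deficient matrix is an arbitrary illustrative choice), $A^T A$ is singular, yet a least-squares solution still satisfies the normal equations.

```python
import numpy as np

# Deliberately rank-deficient: the third column repeats the first,
# so Null(A) != {0} and A^T A is singular.
A = np.array([[1., 0., 1.],
              [2., 1., 2.],
              [3., 2., 3.],
              [4., 1., 4.]])
b = np.array([1., 2., 0., 5.])

# lstsq returns a least-squares solution even though A^T A is singular ...
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# ... and that solution still satisfies the normal equations
# (up to floating-point error): the system is consistent.
print(np.allclose(A.T @ A @ x, A.T @ b))  # True
```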
This tells us that the linear least-squares problem LLS always has a critical point. But it does not tell us when these critical points are global solutions to LLS. In this regard, we have the following surprising result.
Theorem 3. Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then every solution $\bar{x}$ to $A^T A x = A^T b$ satisfies
$$(1)\qquad \|A\bar{x} - b\|_2 \le \|Ax - b\|_2 \quad \forall\, x \in \mathbb{R}^n,$$
that is, $\bar{x}$ is a global solution to LLS.
Proof. Given $u, v \in \mathbb{R}^m$, we have
$$(2)\qquad \|u + v\|_2^2 = (u + v)^T(u + v) = u^T u + 2u^T v + v^T v = \|u\|_2^2 + 2u^T v + \|v\|_2^2.$$
Consequently, for every $x \in \mathbb{R}^n$,
$$\begin{aligned}
\|Ax - b\|_2^2 &= \|(Ax - A\bar{x}) + (A\bar{x} - b)\|_2^2 \\
&= \|A(x - \bar{x})\|_2^2 + 2\big(A(x - \bar{x})\big)^T(A\bar{x} - b) + \|A\bar{x} - b\|_2^2 && \text{(by (2))} \\
&\ge 2(x - \bar{x})^T A^T(A\bar{x} - b) + \|A\bar{x} - b\|_2^2 && \text{(since } \|A(x - \bar{x})\|_2^2 \ge 0\text{)} \\
&= \|A\bar{x} - b\|_2^2 && \text{(since } A^T(A\bar{x} - b) = 0\text{),}
\end{aligned}$$
or equivalently, (1) holds.
So far we know that the normal equations are consistent and that every solution to the normal equations solves the linear least-squares problem. That is, a solution to the linear least-squares problem always exists. We now address the question of when the solution is unique. This is equivalent to asking when the normal equations have a unique solution. From our study of nonsingular matrices, we know this occurs precisely when the matrix $A^T A$ is nonsingular or, equivalently, invertible, in which case the unique solution is given by $\bar{x} = (A^T A)^{-1} A^T b$. Note that $A^T A$ is invertible if and only if $\mathrm{Null}(A^T A) = \{0\}$. But, by Lemma 1, this is equivalent to $\mathrm{Null}(A) = \{0\}$.
Theorem 4. The linear least-squares problem LLS has a unique solution if and only if $\mathrm{Null}(A) = \{0\}$.
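When $\mathrm{Null}(A) = \{0\}$, the closed form $\bar{x} = (A^T A)^{-1} A^T b$ can be checked against a library least-squares solver. The sketch below uses NumPy with arbitrary random data; forming $(A^T A)^{-1}$ explicitly is for illustration only, since in practice a QR factorization or `lstsq` is numerically preferable.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # full column rank with probability 1
b = rng.standard_normal(6)

# Closed-form unique solution from Theorem 4 (illustration only).
x_closed = np.linalg.inv(A.T @ A) @ A.T @ b

# Reference solution from NumPy's least-squares driver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_closed, x_lstsq))  # True
```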
so that $P$ is an orthogonal projection. Moreover, since we know that $\mathrm{Ran}(QQ^T) = \mathrm{Ran}(Q) = S$, $P$ is necessarily the orthogonal projector onto $S$.
Let us now return to the problem (5). We show that $\bar{z}$ solves this problem if and only if $\bar{z} = P_S b$, where $P_S$ is the orthogonal projection onto $S$. To see this, let $P := P_S$ and $z \in S$, so that $z = Pz$. Then
$$\begin{aligned}
\|z - b\|_2^2 &= \|Pz - Pb - (I - P)b\|_2^2 \\
&= \|P(z - b) - (I - P)b\|_2^2 \\
&= \|P(z - b)\|_2^2 - 2(z - b)^T P(I - P)b + \|(I - P)b\|_2^2 \\
&= \|P(z - b)\|_2^2 + \|(I - P)b\|_2^2 \\
&\ge \|(I - P)b\|_2^2 \\
&= \|\bar{z} - b\|_2^2 \qquad \text{(since } \bar{z} - b = Pb - b = -(I - P)b\text{),}
\end{aligned}$$
which shows that $\|\bar{z} - b\|_2 \le \|z - b\|_2$ for all $z \in S$.
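This projection characterization can be illustrated numerically: with $P = QQ^T$ for an orthonormal basis $Q$ of $S$, the point $\bar{z} = Pb$ is at least as close to $b$ as any sampled point of $S$. (A NumPy sketch; the subspace and data are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(2)
# Orthonormal basis Q for a 2-dimensional subspace S of R^4.
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))
P = Q @ Q.T                      # orthogonal projector onto S
b = rng.standard_normal(4)
z_bar = P @ b                    # candidate nearest point in S

# Compare against many other points of S = Ran(Q).
others = Q @ rng.standard_normal((2, 1000))
dists = np.linalg.norm(others - b[:, None], axis=0)
print(np.all(np.linalg.norm(z_bar - b) <= dists))  # True
```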
Let us now consider the linear least-squares problem LLS when $m \gg n$ and $\mathrm{Null}(A) = \{0\}$. In this case, we have shown that $\bar{x} = (A^T A)^{-1} A^T b$ solves LLS, and $\bar{z} = P_S b$ solves (5), where $P_S$ is the orthogonal projector onto $S = \mathrm{Ran}(A)$. Hence, by (4),
$$P_S b = \bar{z} = A\bar{x} = A(A^T A)^{-1} A^T b.$$
Since this is true for all possible choices of the vector $b$, we have
$$(6)\qquad P_S = A(A^T A)^{-1} A^T.$$
That is, the matrix $A(A^T A)^{-1} A^T$ is the orthogonal projector onto the range of $A$. One can also check this directly by showing that the matrix $M = A(A^T A)^{-1} A^T$ satisfies $M^2 = M$, $M^T = M$, and $\mathrm{Ran}(M) = \mathrm{Ran}(A)$.
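The three properties $M^2 = M$, $M^T = M$, and $\mathrm{Ran}(M) = \mathrm{Ran}(A)$ are easy to verify numerically; a NumPy sketch with an arbitrary full-column-rank $A$ follows, where $MA = A$ certifies that $M$ fixes $\mathrm{Ran}(A)$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 3))          # m >> n, Null(A) = {0}

# The claimed orthogonal projector onto Ran(A), from (6).
M = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(M @ M, M))    # idempotent: M^2 = M
print(np.allclose(M.T, M))      # symmetric:  M^T = M
print(np.allclose(M @ A, A))    # M fixes Ran(A), so Ran(M) = Ran(A)
```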
where $S$ is the null space of $A$. This problem is of the form (3). Consequently, the solution is given by $\bar{z} = P_S x_0$, where $P_S$ is now the orthogonal projection onto $S := \mathrm{Null}(A)$.
In this context, note that $(I - P_S)$ is the orthogonal projector onto $\mathrm{Null}(A)^\perp = \mathrm{Ran}(A^T)$. Recall that the formula (6) shows that if $M \in \mathbb{R}^{k \times s}$ is such that $\mathrm{Null}(M) = \{0\}$, then the orthogonal projector onto $R := \mathrm{Ran}(M)$ is given by
$$(9)\qquad P_R = M(M^T M)^{-1} M^T.$$
In our case, $M = A^T$ and $M^T M = AA^T$. Our working assumption (7) implies that
$$\mathrm{Null}(M) = \mathrm{Null}(A^T) = \mathrm{Ran}(A)^\perp = (\mathbb{R}^m)^\perp = \{0\},$$
and consequently, by (9), the orthogonal projector onto $R = \mathrm{Ran}(A^T) = S^\perp$ is given by
$$P_{S^\perp} = A^T(AA^T)^{-1} A\,.$$
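A numerical sketch of this projector (NumPy, with an arbitrary full-row-rank $A$): for $P = A^T(AA^T)^{-1}A$, the complementary part $(I - P)x_0$ of any vector $x_0$ lies in $\mathrm{Null}(A)$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 7))          # full row rank: Ran(A) = R^3

# Projector onto Ran(A^T) = Null(A)^perp.
P = A.T @ np.linalg.inv(A @ A.T) @ A

x0 = rng.standard_normal(7)
# A (I - P) x0 = A x0 - A x0 = 0, so (I - P) x0 is in Null(A).
print(np.allclose(A @ (np.eye(7) - P) @ x0, 0))  # True
```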