
Singular Value Decomposition
Why and How

Royi Avital
Signal Processing Algorithms Department, Electromagnetic Area, Missile Division, Rafael
June, 2011
Outline

1. Definitions and Notations
   Notations
   Definitions
   Introduction
2. Singular Value Decomposition Theorem
   SVD Theorem
   Proof of the SVD Theorem
   SVD Properties
   SVD Example
3. Applications
   Order Reduction
   Solving Linear Equation System
   Total Least Squares
   Principal Component Analysis
Notations
A capital letter stands for a matrix: $A \in \mathbb{C}^{m \times n}$, $A \in \mathbb{R}^{m \times n}$.
A small letter stands for a column vector: $a \in \mathbb{C}^{m \times 1}$, $a \in \mathbb{R}^{m \times 1}$.
Referring to a row of a matrix: $A_i$ is the $i$-th row of a matrix.
Referring to a column of a matrix: $A_j$ is the $j$-th column of a matrix.
Definitions
Unless written otherwise, the complex field is the default.
Conjugate operator: $\bar{A}$.
Transpose operator: $A^T$, with $\left(A^T\right)_{ij} = A_{ji}$.
Complex conjugate transpose operator: $A^H$, with $\left(A^H\right)_{ij} = \overline{A_{ji}}$.

Range space and null space of an operator:
Let $L : X \to Y$ be an operator (linear or otherwise). The range space $\mathcal{R}(L) \subseteq Y$ is
$$\mathcal{R}(L) = \{ y = Lx : x \in X \}$$
The null space $\mathcal{N}(L) \subseteq X$ is
$$\mathcal{N}(L) = \{ x \in X : Lx = 0 \}$$
Introduction
Each linear operator $A : \mathbb{C}^n \to \mathbb{C}^m$ defines spaces as follows. The following properties hold:
$$\mathcal{R}(A) \perp \mathcal{N}\left(A^H\right), \quad \mathcal{R}\left(A^H\right) \perp \mathcal{N}(A)$$
$$\operatorname{rank}(A) = \dim(\mathcal{R}(A)) = \dim\left(\mathcal{R}\left(A^H\right)\right) = \operatorname{rank}\left(A^H\right)$$
The action of a linear operator $A \in \mathbb{C}^{m \times n}$ satisfies the following properties:
$$\operatorname{rank}(A) = \operatorname{rank}\left(AA^H\right) = \operatorname{rank}\left(A^H A\right) = \operatorname{rank}\left(A^H\right)$$
$$\mathcal{R}(A) = \mathcal{R}\left(AA^H\right), \quad \mathcal{R}\left(A^H\right) = \mathcal{R}\left(A^H A\right)$$
SVD Theorem
Theorem (SVD):
Every matrix $A \in \mathbb{C}^{m \times n}$ can be factored as $A = U \Sigma V^H$, where $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ are unitary, and $\Sigma \in \mathbb{C}^{m \times n}$ has the form $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p)$, $p = \min(m, n)$.
Corollary (i):
The columns of $U$ are the eigenvectors of $AA^H$ (Left Eigenvectors):
$$AA^H = U \Sigma V^H \left(U \Sigma V^H\right)^H = U \Sigma V^H V \Sigma^H U^H = U \Sigma \Sigma^H U^H$$
The columns of $V$ are the eigenvectors of $A^H A$ (Right Eigenvectors):
$$A^H A = \left(U \Sigma V^H\right)^H U \Sigma V^H = V \Sigma^H U^H U \Sigma V^H = V \Sigma^H \Sigma V^H$$
Corollary (ii):
The $p$ singular values on the diagonal of $\Sigma$ are the square roots of the nonzero eigenvalues of both $AA^H$ and $A^H A$.
The SVD is unique up to permutations of $(u_i, \sigma_i, v_i)$ as long as $\sigma_i \neq \sigma_j$ for $i \neq j$. If the algebraic multiplicity of a certain eigenvalue $\lambda$ of $A^H A$ / $AA^H$ is larger than 1, then there is freedom in choosing the vectors which span the null space of $A^H A - \lambda I$ / $AA^H - \lambda I$.
Proof of the SVD Theorem
Theorem: $A = U \Sigma V^H$.
In order to prove the SVD theorem, two propositions are used.

Proposition I:
For any $A \in \mathbb{C}^{m \times n}$, both $A^H A$ and $AA^H$ are Hermitian matrices.

Proof.
For $C = A^H A$,
$$C_{ij} = A_i^H A_j = \left(\left(A_i^H A_j\right)^H\right)^H = \left(A_j^H A_i\right)^H = C_{ji}^H = \overline{C_{ji}}$$
(the last step holds since $C_{ij}$ is a scalar). The same argument applies to $AA^H$. $\blacksquare$
Proposition II (Spectral Decomposition):
A Hermitian matrix $A \in \mathbb{C}^{n \times n}$ (i.e., $A_{ij} = \overline{A_{ji}}$) can be diagonalized by a unitary matrix $U \in \mathbb{C}^{n \times n}$, s.t. $U^H A U = \Lambda$.

Proof.
The spectral decomposition is a result of a few properties of Hermitian matrices:
For Hermitian matrices, the eigenvectors of distinct eigenvalues are orthogonal.
Schur's Lemma: for any $A \in \mathbb{C}^{n \times n}$ there exists a unitary $U \in \mathbb{C}^{n \times n}$ s.t. $U^H A U = T$, where $T \in \mathbb{C}^{n \times n}$ is an upper triangular matrix.
When $A$ has $n$ distinct eigenvalues, Proposition II is immediate. Otherwise it can be shown that if $A$ is Hermitian then $T$ is Hermitian, and since $T$ is upper triangular it must be a diagonal matrix. $\blacksquare$
Proof.
Let
$$A^H A V = V \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$$
be the spectral decomposition of $A^H A$, where the columns of $V = [v_1, v_2, \ldots, v_n]$ are eigenvectors and
$$\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_r > 0, \quad \lambda_{r+1} = \lambda_{r+2} = \ldots = \lambda_n = 0, \quad r \leq p.$$
For $1 \leq i \leq r$, let $\sigma_i = \sqrt{\lambda_i}$ and
$$u_i = \frac{A v_i}{\sigma_i}$$
Proof (continued).
Notice that
$$\langle u_i, u_j \rangle = \delta_{ij}$$
The set $\{u_i\}$, $i = 1, 2, \ldots, r$, can be extended using the Gram-Schmidt procedure to form an orthonormal basis for $\mathbb{C}^m$. Let
$$U = [u_1, u_2, \ldots, u_m]$$
Then the $u_i$ are the eigenvectors of $AA^H$.
Proof (continued).
This is clear for the nonzero eigenvalues of $AA^H$. For the zero eigenvalues, the eigenvectors must come from the null space of $AA^H$: by construction, the eigenvectors with zero eigenvalues are orthogonal to the eigenvectors with nonzero eigenvalues, which lie in the range of $AA^H$, hence they must be in the null space of $AA^H$.
Proof (continued).
Examining the elements of $U^H A V$: for $i \leq r$, the $(i, j)$ element of $U^H A V$ is
$$u_i^H A v_j = \frac{1}{\sigma_i} v_i^H A^H A v_j = \frac{\lambda_j}{\sigma_i} v_i^H v_j = \sigma_j \delta_{ij}$$
For $i > r$ we get
$$AA^H u_i = 0$$
Thus $A^H u_i \in \mathcal{N}(A)$, and also $A^H u_i \in \mathcal{R}\left(A^H\right)$ as a linear combination of the columns of $A^H$. Yet $\mathcal{R}\left(A^H\right) \perp \mathcal{N}(A)$, hence $A^H u_i = 0$.
Proof (continued).
Since $A^H u_i = 0$ we get $u_i^H A v_j = \left(v_j^H A^H u_i\right)^H = 0$. Thus $U^H A V = \Sigma$, where $\Sigma$ is diagonal (along the main diagonal). $\blacksquare$
Alternative Proof.
Notice that $A^H A$ and $AA^H$ share the same nonzero eigenvalues (this can be proved independently of the SVD).
Let $AA^H u_i = \sigma_i^2 u_i$ for $i = 1, 2, \ldots, m$. By the Spectral Theorem:
$$U = [u_1, u_2, \ldots, u_m], \quad U \in \mathbb{C}^{m \times m}, \quad UU^H = U^H U = I_m$$
Thus $\left\| A^H u_i \right\| = \sigma_i$ for $i = 1, 2, \ldots, m$.
Alternative Proof (continued).
Let $A^H A v_i = \sigma_i^2 v_i$ for $i = 1, 2, \ldots, n$. By the Spectral Theorem:
$$V = [v_1, v_2, \ldots, v_n], \quad V \in \mathbb{C}^{n \times n}, \quad VV^H = V^H V = I_n$$
Utilizing the above for the nonzero $\sigma_i^2$:
$$AA^H u_i = \sigma_i^2 u_i \;\Rightarrow\; A^H A \underbrace{A^H u_i}_{z_i} = \sigma_i^2 \underbrace{A^H u_i}_{z_i}$$
Meaning $z_i$ and $\sigma_i^2$ are eigenvectors and eigenvalues of $A^H A$.
Alternative Proof (continued).
Examining the $z_i$ yields:
$$z_j^H z_i = u_j^H A A^H u_i = \sigma_i^2 u_j^H u_i$$
so the $z_i$ are orthogonal with $\|z_i\| = \sigma_i \|u_i\|$. Define
$$v_i = \frac{z_i}{\sigma_i} = \frac{A^H u_i}{\sigma_i}$$
Consider the following $m$ equations for $i = 1, 2, \ldots, m$:
$$A v_i = \frac{A A^H u_i}{\sigma_i} \text{ (or zero)} = \sigma_i u_i \text{ (or zero)}$$
Alternative Proof (continued).
These equations can be written as:
$$AV = U \Sigma \;\Rightarrow\; A = U \Sigma V^H$$
where $U$ and $V$ are as defined above, and $\Sigma$ is an $m \times n$ matrix with the top left $n \times n$ block in diagonal form, with $\sigma_i$ on the diagonal and zeros at the bottom. $\blacksquare$
SVD Properties
It is often convenient to break the matrices in the SVD into two parts, corresponding to the nonzero singular values and the zero singular values. Let
$$\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}$$
where
$$\Sigma_1 = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r) \in \mathbb{R}^{r \times r}, \quad \sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r,$$
$$\Sigma_2 = \operatorname{diag}(\sigma_{r+1}, \sigma_{r+2}, \ldots, \sigma_p) = \operatorname{diag}(0, 0, \ldots, 0) \in \mathbb{R}^{(m-r) \times (n-r)}$$
Then the SVD can be written as
$$A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1^H \\ V_2^H \end{bmatrix} = U_1 \Sigma_1 V_1^H$$
where $U_1 \in \mathbb{C}^{m \times r}$, $U_2 \in \mathbb{C}^{m \times (m-r)}$, $V_1 \in \mathbb{C}^{n \times r}$ and $V_2 \in \mathbb{C}^{n \times (n-r)}$.
The SVD can also be written as
$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^H$$
The SVD can also be used to compute two matrix norms.
Hilbert-Schmidt / Frobenius norm:
$$\|A\|_F^2 = \sum_{i,j} |A_{ij}|^2 = \sum_{i=1}^{r} \sigma_i^2$$
$l_2$ norm:
$$\|A\|_2 = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \max(\sigma(A)) = \sigma_1$$
which implies
$$\operatorname*{argmax}_{x \neq 0} \frac{\|Ax\|}{\|x\|} = v_1, \quad \operatorname*{argmax}_{x \neq 0} \frac{\left\|x^H A\right\|}{\|x\|} = u_1$$
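These two identities are easy to check numerically. A minimal MATLAB sketch (the random test matrix is only for illustration):

    A = randn(4, 3);              % any matrix will do
    s = svd(A);                   % singular values, sorted descending
    norm(A, 'fro')^2 - sum(s.^2)  % ~0: Frobenius norm from singular values
    norm(A, 2) - s(1)             % ~0: the l2 norm equals sigma_1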
The rank of a matrix is the number of nonzero singular values along the main diagonal of $\Sigma$. Using the notation used before,
$$\operatorname{rank}(A) = r$$
The SVD is a numerically stable way of computing the rank of a matrix.
The range (column space) of a matrix is
$$\mathcal{R}(A) = \{ b \in \mathbb{C}^m : b = Ax \} = \left\{ b \in \mathbb{C}^m : b = U \Sigma V^H x \right\} = \{ b \in \mathbb{C}^m : b = U \Sigma y \} = \{ b \in \mathbb{C}^m : b = U_1 y \} = \operatorname{span}(U_1)$$
The range of a matrix is spanned by the orthogonal set of vectors in $U_1$, the first $r$ columns of $U$.
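A sketch of this numerical rank computation in MATLAB for a given matrix A (the tolerance rule below mirrors the default documented for MATLAB's own rank function):

    s = svd(A);
    tol = max(size(A)) * eps(max(s)); % singular values below this count as zero
    r = sum(s > tol)                  % numerical rank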
Generally, the other fundamental spaces of a matrix $A$ can also be determined from the SVD:
$$\mathcal{R}(A) = \operatorname{span}(U_1) = \mathcal{R}\left(AA^H\right), \quad \mathcal{N}(A) = \operatorname{span}(V_2)$$
$$\mathcal{R}\left(A^H\right) = \operatorname{span}(V_1) = \mathcal{R}\left(A^H A\right), \quad \mathcal{N}\left(A^H\right) = \operatorname{span}(U_2)$$
The SVD thus provides an explicit orthogonal basis and a computable dimensionality for each of the fundamental spaces of a matrix.
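A short MATLAB sketch extracting orthonormal bases for the four spaces, with r computed as above:

    [U, S, V] = svd(A);
    U1 = U(:, 1:r);     % basis for the range of A
    U2 = U(:, r+1:end); % basis for the null space of A^H
    V1 = V(:, 1:r);     % basis for the range of A^H (row space)
    V2 = V(:, r+1:end); % basis for the null space of A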
Since the SVD decomposes a given matrix into two unitary matrices and a diagonal matrix, every matrix can be described as a rotation, a scaling and another rotation. This intuition follows from the properties of unitary matrices, which essentially rotate whatever they multiply. This property is further examined when dealing with linear equations.
SVD Example
Finding the SVD of a matrix (numerically) using the MATLAB command [U, S, V] = svd(A). Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 6 & 5 & 4 \end{bmatrix}$$
Then $A = U \Sigma V^H$, where (the signs of each pair $(u_i, v_i)$ are determined only jointly; the choice below is one valid convention)
$$U = \begin{bmatrix} 0.355 & -0.934 \\ 0.934 & 0.355 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 9.362 & 0 & 0 \\ 0 & 1.831 & 0 \end{bmatrix}$$
$$V = \begin{bmatrix} 0.637 & 0.653 & -0.408 \\ 0.575 & -0.051 & 0.816 \\ 0.513 & -0.754 & -0.408 \end{bmatrix}$$
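The factorization is easy to verify in MATLAB:

    A = [1 2 3; 6 5 4];
    [U, S, V] = svd(A);
    norm(A - U * S * V')  % ~1e-15, i.e. zero up to rounding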
Let $A$ be a diagonal matrix:
$$A = \begin{bmatrix} 2 & 0 \\ 0 & -4 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$
In this case, the $U$ and $V$ matrices just shuffle the columns around and change the signs to make the singular values positive.
Let $A$ be a square symmetric matrix:
$$A = \begin{bmatrix} 5 & 6 & 2 \\ 6 & 1 & 4 \\ 2 & 4 & 7 \end{bmatrix} = U \Sigma V^H$$
where
$$U = V = \begin{bmatrix} 0.592 & 0.616 & 0.518 \\ 0.526 & 0.191 & 0.828 \\ 0.610 & 0.763 & 0.211 \end{bmatrix} \text{ (up to signs)}, \quad \Sigma = \begin{bmatrix} 12.391 & 0 & 0 \\ 0 & 4.383 & 0 \\ 0 & 0 & 3.774 \end{bmatrix}$$
In this case, the SVD is essentially the regular eigendecomposition: $\sigma_i = |\lambda_i|$, and for a negative eigenvalue the corresponding columns of $U$ and $V$ differ by a sign (for a symmetric positive semidefinite matrix, $U = V$ exactly).
Order Reduction
The SVD of a matrix can be used to determine how near (in the sense of the $l_2$ norm) the matrix is to a matrix of lower rank. It can also be used to find the nearest matrix of a given lower rank.

Theorem:
Let $A$ be an $m \times n$ matrix with $\operatorname{rank}(A) = r$ and let $A = U \Sigma V^H$. Let $k < r$ and let
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^H = U \Sigma_k V^H$$
where
$$\Sigma_k = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$$
Then $\|A - A_k\|_2 = \sigma_{k+1}$, and $A_k$ is the nearest matrix of rank $k$ to $A$ (in the sense of the $l_2$ norm / Frobenius norm):
$$\min_{\operatorname{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2$$
Proof.
Since $A - A_k = U \operatorname{diag}(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_r, 0, \ldots, 0) V^H$, it follows that $\|A - A_k\|_2 = \sigma_{k+1}$.
The second part of the proof is a proof by inequality. By the definition of the matrix norm, for any unit vector $z$ the following holds:
$$\|A - B\|_2^2 \geq \|(A - B) z\|_2^2$$
Let $B$ be a rank-$k$ matrix of size $m \times n$. Then there exist vectors $x_1, x_2, \ldots, x_{n-k}$ that span $\mathcal{N}(B)$, where $x_i \in \mathbb{C}^n$. Consider the vectors $v_1, v_2, \ldots, v_{k+1}$ from the matrix $V$ of the SVD, where $v_i \in \mathbb{C}^n$.
Proof (continued).
The intersection $\operatorname{span}(x_1, \ldots, x_{n-k}) \cap \operatorname{span}(v_1, \ldots, v_{k+1}) \subseteq \mathbb{C}^n$ cannot be zero, since there is a total of $n + 1$ vectors. Let $z$ be a vector from this intersection, normalized s.t. $\|z\|_2 = 1$. Then:
$$\|A - B\|_2^2 \geq \|(A - B) z\|_2^2 = \|Az\|_2^2$$
Since $z \in \operatorname{span}(v_1, v_2, \ldots, v_{k+1})$,
$$Az = \sum_{i=1}^{k+1} \sigma_i \left(v_i^H z\right) u_i$$
Now
$$\|A - B\|_2^2 \geq \|Az\|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 \left|v_i^H z\right|^2 \geq \sigma_{k+1}^2$$
The lower bound is achieved by $B = \sum_{i=1}^{k} \sigma_i u_i v_i^H$, with $z = v_{k+1}$. $\blacksquare$
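A minimal MATLAB sketch of this truncated SVD (the matrix A and the target rank k come from the application):

    [U, S, V] = svd(A);
    Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)'; % best rank-k approximation
    norm(A - Ak, 2)                             % equals S(k+1, k+1)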
Applications of Order Reduction:
Image Compression
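A sketch of how such compression might look in MATLAB (the file name and the kept rank are placeholders):

    X = double(imread('image.png'));           % grayscale image as a matrix
    [U, S, V] = svd(X);
    k = 50;                                     % number of kept singular values
    Xk = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)'; % compressed image
    % storage drops from m*n values to k*(m + n + 1)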
Applications of Order Reduction:
Noise Reduction
Basic assumption: the noise is mainly pronounced in the small singular values.
[Figure: Noiseless Matrix; Noisy Matrix - Std 1; Noisy Matrix - Std 6; Noisy Matrix - Std 11]
[Figures: Analyzing the effect of noise on the singular values]
[Figure: Ground Truth; Added Noise Std 6; Reconstruction using 140 Singular Values]
Solving Linear Equation System
Consider the solution of the equation $Ax = b$.
If $b \in \mathcal{R}(A)$ there is at least one solution:
If $\dim(\mathcal{N}(A)) = 0$ there is a unique solution, $x_r \in \mathcal{R}\left(A^H\right)$, s.t. $A x_r = b$.
If $\dim(\mathcal{N}(A)) \geq 1$, the columns of $A$ are not independent and there are infinitely many solutions: any vector of the form $x = x_r + x_n$, where $x_r \in \mathcal{R}\left(A^H\right)$ is the solution from the previous case and $x_n \in \mathcal{N}(A)$, satisfies $A(x_r + x_n) = b$.
Which solution should be chosen? Usually the solution with the minimum norm, $x_r$.
If $b \notin \mathcal{R}(A)$ there is no solution. Usually one searches for $\hat{x}$ s.t. $\|A\hat{x} - b\|_2$ is minimized.
Assume $\hat{x} = \arg\min_x \|Ax - b\|_2$. By definition, $\hat{b} = A\hat{x} \in \mathcal{R}(A)$. Meaning, the search is for $\hat{b}$ s.t. $\left\|b - \hat{b}\right\|_2$ is minimized.
According to the Projection Theorem, only one vector $\hat{b}$ exists s.t. $\left\|b - \hat{b}\right\|_2$ is minimized. This vector is the projection of $b$ onto $\mathcal{R}(A)$:
$$\hat{b} = \operatorname{Proj}_{\mathcal{R}(A)}(b)$$
Moreover,
$$\hat{x} = \arg\min_x \|Ax - b\|_2 \iff \left(A^H A\right) \hat{x} = A^H b$$
Intuitively, the procedure is as follows: $A\hat{x}$ is the projection of $b$ onto the column space $\mathcal{R}(A)$, so the residual $b - A\hat{x}$ is orthogonal to every column of $A$, namely $A^H \left(b - A\hat{x}\right) = 0$, which is exactly the equation above.
The equation $\left(A^H A\right)\hat{x} = A^H b$ is called the Normal Equations. If the columns of $A$ are independent, then $A^H A$ is invertible and $\hat{x}$ can be calculated as:
$$\hat{x} = \left(A^H A\right)^{-1} A^H b$$
This is the Least Squares solution using the pseudo-inverse of $A$:
$$A^{\dagger} = \left(A^H A\right)^{-1} A^H$$
Yet, if the columns of $A$ are linearly dependent, the pseudo-inverse of $A$ can't be calculated directly. If $A$ has dependent columns, then the nullspace of $A$ is not trivial and there is no unique solution. The problem becomes selecting one solution out of the infinite number of possible solutions. As mentioned, the commonly accepted approach is to select the solution with the smallest norm (length). This problem can be solved using the SVD and the definition of the generalized pseudo-inverse of a matrix.
Definition:
The pseudo-inverse of a matrix $A = U \Sigma V^H$, denoted $A^{\dagger}$, is given by
$$A^{\dagger} = V \Sigma^{\dagger} U^H$$
where $\Sigma^{\dagger}$ is obtained by transposing $\Sigma$ and inverting all its nonzero entries.

Proposition III:
Let $A = U \Sigma V^H$ and $x^{\dagger} = A^{\dagger} b = V \Sigma^{\dagger} U^H b$. Then $A^H A x^{\dagger} = A^H b$.
Namely, the solution given by the pseudo-inverse matrix calculated using the SVD satisfies the Normal Equations. This definition of the pseudo-inverse exists for any matrix.
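A small MATLAB sketch of this definition (MATLAB's built-in pinv does the same job with a default tolerance):

    [U, S, V] = svd(A);
    s = diag(S);
    r = sum(s > tol);                   % tol as in the rank computation above
    Sdag = zeros(size(A'));             % Sigma transposed in shape
    Sdag(1:r, 1:r) = diag(1 ./ s(1:r)); % invert the nonzero singular values
    Adag = V * Sdag * U';               % the pseudo-inverse
    x = Adag * b;                       % minimum-norm least-squares solution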
Proof (of Proposition III).
It is sufficient to show that $A^H \left(A x^{\dagger} - b\right) = 0$.
$$A x^{\dagger} - b = \left(U \Sigma V^H\right) V \Sigma^{\dagger} U^H b - b = \left(U \Sigma \Sigma^{\dagger} U^H - I\right) b = \left(U \left(\Sigma \Sigma^{\dagger} - I\right) U^H\right) b$$
Proof (continued).
Thus,
$$A^H \left(A x^{\dagger} - b\right) = V \Sigma^H U^H U \left(\Sigma \Sigma^{\dagger} - I\right) U^H b = V \Sigma^H \left(\Sigma \Sigma^{\dagger} - I\right) U^H b$$
One should observe that
$$\Sigma^H = \begin{bmatrix} \Sigma_r^H & 0_{r \times (m-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (m-r)} \end{bmatrix}$$
where $\Sigma_r$ is the $r \times r$ submatrix of nonzero diagonal entries in $\Sigma$, and
$$\Sigma \Sigma^{\dagger} - I = -\begin{bmatrix} 0_{r \times r} & 0_{r \times (m-r)} \\ 0_{(m-r) \times r} & I_{(m-r) \times (m-r)} \end{bmatrix}$$
Hence the multiplication $\Sigma^H \left(\Sigma \Sigma^{\dagger} - I\right)$ yields the zero matrix. $\blacksquare$
Proposition IV:
The vector $\hat{x} = A^{\dagger} b$ is the shortest Least Squares solution to $Ax = b$, namely,
$$\|\hat{x}\|_2 = \min \left\{ \|x\|_2 : \|Ax - b\|_2 \text{ is minimal} \right\}$$

Proof.
Using the fact that both $U$ and $V$ are unitary:
$$\min_{\|x\|_2} \left( \min_x \|Ax - b\|_2 \right) = \min_{\|x\|_2} \left( \min_x \left\|U \Sigma V^H x - b\right\|_2 \right) = \min_{\left\|V^H x\right\|_2} \left( \min_x \left\|\Sigma V^H x - U^H b\right\|_2 \right) = \min_{\|y\|_2} \left( \min_y \left\|\Sigma y - U^H b\right\|_2 \right)$$
Proof (continued).
Observe $\min_{\|y\|_2} \left( \min_y \left\|\Sigma y - U^H b\right\|_2 \right)$. Since $\Sigma$ is diagonal (along the main diagonal at least), there is only one shortest Least Squares solution in $y$,
$$\hat{y} = \Sigma^{\dagger} U^H b$$
Thus,
$$\hat{x} = V \hat{y} = V \Sigma^{\dagger} U^H b$$
attains the minimum norm. $\blacksquare$
As written previously, any solution which satisfies the Normal Equations is a Least Squares solution:
$$\hat{x} = \arg\min_x \|Ax - b\|_2 \iff \left(A^H A\right)\hat{x} = A^H b$$
Yet one should observe that $\hat{x} \in \mathcal{R}\left(A^H\right)$, namely, the solution lies in the row space of $A$; hence its norm is minimal among all solutions. In short, the pseudo-inverse simultaneously minimizes the norm of the error as well as the norm of the solution.
Example I
Examining the following linear system $Ax = b$, where
$$A = \begin{bmatrix} 8 & 10 & 3 & 30 \\ 9 & 6 & 6 & 18 \\ 1 & 1 & 10 & 3 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix}, \quad b = \begin{bmatrix} 217 \\ 147 \\ 51 \end{bmatrix}$$
Obviously, $A^{-1}$ can't be calculated. Moreover, since $\operatorname{rank}(A) = 3$ while $A$ has 4 columns, $\left(A^H A\right)^{-1}$ does not exist either. Yet the pseudo-inverse using the SVD does exist.
Using the SVD approach, $A = U \Sigma V^H$, hence $A^{\dagger} = V \Sigma^{\dagger} U^H$. Using MATLAB to calculate the SVD yields:
$$\Sigma = \begin{bmatrix} 39.378 & 0 & 0 & 0 \\ 0 & 10.002 & 0 & 0 \\ 0 & 0 & 3.203 & 0 \end{bmatrix}, \quad \Sigma^{\dagger} = \begin{bmatrix} 0.025 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.312 \\ 0 & 0 & 0 \end{bmatrix}$$
Calculating $\hat{x}$ yields:
$$\hat{x} = V \Sigma^{\dagger} U^H b = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix} = x$$
The SVD cancelled the 4th column, which is dependent on the 2nd column of $A$. Since $b \in \mathcal{R}(A)$, the exact solution could be calculated.
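A quick MATLAB reproduction of this example, using pinv (which implements the SVD pseudo-inverse):

    A = [8 10 3 30; 9 6 6 18; 1 1 10 3];
    b = [217; 147; 51];
    x = pinv(A) * b   % returns [1; 2; 3; 6]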
Example II
In this case
$$A = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix}, \quad b = \begin{bmatrix} 5 \\ 4 \\ 3 \end{bmatrix}$$
Obviously, $b \notin \mathcal{R}(A)$. Neither $A^{-1}$ nor $\left(A^H A\right)^{-1}$ exist. Using the SVD pseudo-inverse:
$$\hat{x} = V \Sigma^{\dagger} U^H b = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix}$$
Examining the solution using the SVD: first, since $\operatorname{rank}(A) = 2$, the column space is spanned by the first 2 columns of $U$. The projection of $b$ onto the column space of $A$ is given by
$$\hat{b} = \operatorname{Proj}_{\mathcal{R}(A)}(b) = \sum_{i=1}^{2} \left(U_i^H b\right) U_i = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$$
Now consider the updated linear system $A\hat{x} = \hat{b}$, which has an infinite number of solutions. One can calculate that
$$\mathcal{N}(A) = \operatorname{span}\left( \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right)$$
Hence
$$\hat{x} = \begin{bmatrix} \hat{b}_1 / A_{1,1} \\ \hat{b}_2 / A_{2,2} \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} = \hat{x}_r + \hat{x}_n$$
where $s, t \in \mathbb{R}$.
The target is the solution with the minimum norm. Since $\hat{x}_r \perp \hat{x}_n$, the norm of this solution satisfies
$$\|\hat{x}\|_2^2 = \|\hat{x}_r\|_2^2 + \|\hat{x}_n\|_2^2$$
The minimum-norm solution is obtained by taking $\hat{x}_n = 0$. This results in the pseudo-inverse solution as above.
Numerically Sensitive Problems
Systems of equations that are poorly conditioned are sensitive to small changes in values. Since, practically speaking, there are always inaccuracies in measured data, the solution to these equations may be almost meaningless. The SVD can help with the solution of ill-conditioned equations by identifying the direction of sensitivity and discarding that portion of the problem. The procedure is illustrated by the following example.
Example III
Examining the following system of equations $Ax = b$:
$$\begin{bmatrix} 1 + 3\epsilon & 1 - 3\epsilon \\ 3 - \epsilon & 3 + \epsilon \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$
The SVD of $A$ is
$$A = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} 2\sqrt{5} & 0 \\ 0 & 2\epsilon\sqrt{5} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
from which the exact inverse of $A$ is
$$A^{-1} = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{2\sqrt{5}} & 0 \\ 0 & \frac{1}{2\epsilon\sqrt{5}} \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 1 + \frac{3}{\epsilon} & 3 - \frac{1}{\epsilon} \\ 1 - \frac{3}{\epsilon} & 3 + \frac{1}{\epsilon} \end{bmatrix}$$
Easily, one can convince oneself that for small $\epsilon$ the matrix $A^{-1}$ has large entries, which makes $x = A^{-1} b$ unstable.
Observe that the entry $\frac{1}{2\epsilon\sqrt{5}}$ multiplies the column $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. This is the sensitive direction: as $b$ changes slightly, the solution changes in a direction mostly along the sensitive direction.
If $\epsilon$ is small, $\sigma_2 = 2\epsilon\sqrt{5}$ may be set to zero to approximate $A$:
$$A \approx \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} 2\sqrt{5} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
The pseudo-inverse is
$$A^{\dagger} = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{2\sqrt{5}} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 1 & 3 \\ 1 & 3 \end{bmatrix}$$
In this case the multiplier of the sensitive direction vector is zero, so no motion in the sensitive direction occurs. Any Least Squares solution to the equation $Ax = b$ is of the form $\hat{x} = A^{\dagger} b$, so that $\hat{x} = c \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ for some $c \in \mathbb{R}$, meaning perpendicular to the sensitive direction.
As this example illustrates, the SVD identifies the stable and unstable directions of the problem and, by zeroing small singular values, eliminates the unstable directions.
The SVD can be used both to illustrate poor conditioning and to provide a cure for the ailment. For the equation $Ax = b$ with solution $x = A^{-1} b$, writing the solution using the SVD:
$$x = A^{-1} b = \left(U \Sigma V^H\right)^{-1} b = \sum_{i=1}^{r} \frac{v_i u_i^H b}{\sigma_i}$$
If the singular value $\sigma_i$ is small, then a small change in $b$, or a small change in either $U$ or $V$, may be amplified into a large change in the solution $x$. A small singular value corresponds to a matrix which is nearly singular and thus more difficult to invert accurately.
Another point of view: consider the equation
$$A x_0 = b_0 \;\Rightarrow\; x_0 = A^{-1} b_0$$
Let $b = b_0 + \delta b$, where $\delta b$ is the error, noise, etc. Therefore
$$Ax = b_0 + \delta b \;\Rightarrow\; x = A^{-1} b_0 + A^{-1} \delta b = x_0 + \delta x$$
We investigate how small or large the error in the answer is for a given amount of error. Note that
$$\delta x = A^{-1} \delta b \;\Rightarrow\; \|\delta x\| \leq \left\|A^{-1}\right\| \|\delta b\|$$
Or, since
$$\left\|A^{-1}\right\| = \sigma_{\max}\left(A^{-1}\right) = \frac{1}{\sigma_{\min}(A)}$$
the following holds:
$$\|\delta x\| \leq \frac{\|\delta b\|}{\sigma_{\min}(A)}$$
However, recalling that $x_0 = A^{-1} b_0$, therefore
$$\|x_0\| \geq \sigma_{\min}\left(A^{-1}\right) \|b_0\| = \frac{\|b_0\|}{\sigma_{\max}(A)}$$
Combining the equations yields
$$\frac{\|\delta x\|}{\|x_0\|} \leq \frac{\|\delta b\|}{\|b_0\|} \cdot \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$$
The last fraction, $\frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$, is called the Condition Number of $A$. This number is indicative of the magnification of error in the linear equation of interest. In most problems, a matrix with a very large condition number is called ill conditioned and will result in severe numerical difficulties.
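In MATLAB this ratio is easy to inspect; cond(A) returns exactly this quantity for the $l_2$ norm:

    s = svd(A);
    kappa = s(1) / s(end)   % condition number, same as cond(A)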
The solution to those numerical difficulties using the SVD is basically rank reduction:
1. Compute the SVD of $A$.
2. Examine the singular values of $A$ and zero out any that are "small" to obtain a new approximate matrix.
3. Compute the solution by $\hat{x} = V \Sigma^{\dagger} U^H b$.
Determining which singular values are "small" is problem dependent and requires some judgement; see the sketch below.
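A minimal MATLAB sketch of this regularized solve (the threshold tol is a placeholder the practitioner must choose):

    [U, S, V] = svd(A);
    s = diag(S);
    sInv = zeros(size(s));
    keep = s > tol;               % zero out the "small" singular values
    sInv(keep) = 1 ./ s(keep);
    p = numel(s);
    x = V(:, 1:p) * (sInv .* (U(:, 1:p)' * b));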
Total Least Squares
In the classic Least Squares problem, the solution minimizing $\|Ax - b\|_2$ is sought. The hidden assumption is that the matrix $A$ is correct; any error in the problem is in $b$. The Least Squares problem finds a vector $\hat{x}$ s.t.
$$\|A\hat{x} - b\|_2 = \min$$
which is accomplished by finding some perturbation $r$ of the right hand side, of minimum norm,
$$Ax = b + r$$
s.t. $(b + r) \in \mathcal{R}(A)$. In the Total Least Squares (TLS) problem, both the right and the left side of the equation are assumed to have errors. The solution of the perturbed equation
$$(A + E) x = b + r$$
is sought s.t. $(b + r) \in \mathcal{R}(A + E)$ and the norm of the perturbations is minimized.
Intuitively, the right hand side is "bent" toward the left hand side while the left hand side is "bent" toward the right hand side.
Let $A$ be an $m \times n$ matrix. To find the solution of the TLS problem, one may observe the homogeneous form
$$\begin{bmatrix} A + E \mid b + r \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0 \;\Rightarrow\; \left( \begin{bmatrix} A \mid b \end{bmatrix} + \begin{bmatrix} E \mid r \end{bmatrix} \right) \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$$
Let $C = [A \mid b] \in \mathbb{C}^{m \times (n+1)}$ and let $\Delta = [E \mid r]$ be the perturbation of the data. In order for the homogeneous form to have a solution, the vector $\begin{bmatrix} x \\ -1 \end{bmatrix}$ must lie in the null space of $C + \Delta$, and in order for the solution not to be trivial, the perturbation must be such that $C + \Delta$ is rank deficient.
Analyzing the TLS problem using the SVD: we bring $(A + E)x = (b + r)$ into the form
$$\begin{bmatrix} A + E \mid b + r \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$$
Let $[A + E \mid b + r] = U \Sigma V^H$ be the SVD of the above form. If $\sigma_{n+1} \neq 0$ then $\operatorname{rank}([A + E \mid b + r]) = n + 1$, which means the row space of $[A + E \mid b + r]$ is all of $\mathbb{C}^{n+1}$; hence there is no nonzero vector in the orthogonal complement of the row space, and the set of equations is incompatible. To obtain a solution, the rank of $[A + E \mid b + r]$ must be reduced to $n$.
As shown before, the best approximation of rank $n$ in both the Frobenius and the $l_2$ norm is given by the SVD:
$$\left[\hat{A} \mid \hat{b}\right] = U \hat{\Sigma} V^H, \quad \hat{\Sigma} = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n, 0)$$
The minimal TLS correction is given by
$$\sigma_{n+1} = \min_{\operatorname{rank}\left(\left[\hat{A} \mid \hat{b}\right]\right) = n} \left\| [A \mid b] - \left[\hat{A} \mid \hat{b}\right] \right\|_F$$
attained for
$$[E \mid r] = -\sigma_{n+1} u_{n+1} v_{n+1}^H$$
Note that the TLS correction matrix has rank one.
It is clear that the approximate set $\left[\hat{A} \mid \hat{b}\right] \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$ is compatible, and the solution is given by the only vector, $v_{n+1}$, that belongs to $\mathcal{N}\left(\left[\hat{A} \mid \hat{b}\right]\right)$.
The TLS solution is obtained by scaling $v_{n+1}$ until its last component equals $-1$:
$$\begin{bmatrix} x \\ -1 \end{bmatrix} = -\frac{1}{V_{n+1,n+1}} v_{n+1}$$
For simplicity it is assumed that $V_{n+1,n+1} \neq 0$ and $\sigma_n > \sigma_{n+1}$, hence the solution exists and is unique. Otherwise, the solution might not exist or might not be unique (any superposition of a few columns of $V$). For a complete analysis of the existence and uniqueness of the solution see [].
A basic TLS algorithm: given $Ax \approx b$, where $A \in \mathbb{C}^{m \times n}$ and $b \in \mathbb{C}^m$, the TLS solution can be obtained as follows:
Compute the SVD $[A \mid b] = U \Sigma V^H$.
If $V_{n+1,n+1} \neq 0$, the TLS solution is
$$x_{TLS} = -\frac{1}{V_{n+1,n+1}} v_{n+1}(1:n)$$
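A direct MATLAB transcription of this algorithm:

    C = [A, b];                   % m x (n+1) augmented matrix
    [~, ~, V] = svd(C);
    v = V(:, end);                % right singular vector of sigma_{n+1}
    xTLS = -v(1:end-1) / v(end);  % valid when v(end) ~= 0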
The geometric properties of the solution can be described as follows: the TLS solution minimizes the distance between the vector $b$ and the plane defined by the solution $x_{TLS}$.
Let $C = U \Sigma V^H$. From the definition of the $l_2$ norm of a matrix,
$$\frac{\|Cv\|_2}{\|v\|_2} \geq \sigma_{n+1}$$
where $\|v\|_2 \neq 0$. Equality holds if and only if $v \in S_c$, where $S_c = \operatorname{span}\{v_i\}$ and the $v_i$ are the columns of $V$ which satisfy $u_i^H C v_i = \sigma_{n+1}$.
The TLS problem amounts to finding a vector $x$ s.t.
$$\frac{\left\| [A \mid b] \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2} = \sigma_{n+1}$$
By squaring everywhere,
$$\min_x \frac{\left\| [A \mid b] \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2} = \min_x \sum_{i=1}^{m} \frac{\left| A_i^H x - b_i \right|^2}{x^H x + 1}$$
(here $A_i$ denotes the $i$-th row of $A$). The quantity $\frac{\left| A_i^H x - b_i \right|^2}{x^H x + 1}$ is the square of the distance from the point $\begin{bmatrix} A_i^H \\ b_i \end{bmatrix} \in \mathbb{C}^{n+1}$ to the nearest point on the hyperplane $P$ defined by
$$P = \left\{ \begin{bmatrix} a \\ b \end{bmatrix} : a \in \mathbb{C}^n,\; b \in \mathbb{C},\; b = x^H a \right\}$$
So the TLS problem amounts to finding the closest hyperplane to the set of points
$$\begin{bmatrix} A_1^H \\ b_1 \end{bmatrix}, \begin{bmatrix} A_2^H \\ b_2 \end{bmatrix}, \ldots, \begin{bmatrix} A_m^H \\ b_m \end{bmatrix}$$
The minimum distance property can be shown as follows. Let $P$ be the plane orthogonal to the normal vector $n \in \mathbb{C}^{n+1}$, s.t.
$$P = \left\{ r \in \mathbb{C}^{n+1} : r^H n = 0 \right\}$$
and let $n$ have the form $n = \begin{bmatrix} x \\ -1 \end{bmatrix}$. Let $p = \begin{bmatrix} A_m^H \\ b_m \end{bmatrix}$ be a point in $\mathbb{C}^{n+1}$. Finding the point $q \in \mathbb{C}^{n+1}$ which belongs to the plane $P$ and is closest to $p$ is a constrained optimization problem: minimize $\|p - q\|$ subject to $n^H q = 0$. With a Lagrange multiplier $\lambda$, the minimization function is
$$J(q) = \|p - q\|^2 + 2\lambda n^H q = p^H p - 2 p^H q + 2\lambda n^H q + q^H q = (q - p + \lambda n)^H (q - p + \lambda n) + 2\lambda p^H n - \lambda^2 n^H n$$
This is clearly minimized when $q = p - \lambda n$.
Determining $\lambda$ by the constraint:
$$n^H q = n^H p - \lambda n^H n = 0 \;\Rightarrow\; \lambda = \frac{n^H p}{n^H n}$$
Inserting the results into the minimization function yields
$$J(q) = 2\lambda p^H n - \lambda^2 n^H n = \frac{2\, n^H p \, p^H n}{n^H n} - \frac{n^H p \, p^H n \, n^H n}{n^H n \, n^H n} = \frac{\left( n^H p \right)^2}{n^H n} = \frac{\left( x^H A_m^H - b_m \right)^2}{x^H x + 1}$$
Alternative solution using the Projection Theorem: the distance from the point $p$ to the plane $P$ can be found from the length of the projection of $p$ onto $n$, which yields
$$d_{\min}^2(p, P) = \frac{\langle p, n \rangle^2}{\|n\|^2} = \frac{\left( \begin{bmatrix} x^H & -1 \end{bmatrix} \begin{bmatrix} A_m^H \\ b_m \end{bmatrix} \right)^2}{x^H x + 1}$$
Principal Component Analysis
Principal Component Analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called Principal Components. The number of Principal Components is less than or equal to the number of original variables. The transformation is defined in such a way that the first component has a variance as high as possible (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (uncorrelated with) the preceding components.
PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first Principal Component), the second greatest variance on the second coordinate, and so on.
Assume we are given a collection of data column vectors $a_1, a_2, \ldots, a_m \in \mathbb{R}^n$. The projection of the data onto an $r$-dimensional subspace $U \subseteq \mathbb{R}^n$, $r \leq n$, which is spanned by the orthogonal basis $u_1, u_2, \ldots, u_r$, is given by
$$\hat{a}_i = f_{i1} u_1 + f_{i2} u_2 + \ldots + f_{ir} u_r, \quad i = 1 : m$$
for some coefficients $f_{ij}$. Note that $f_{ij} = a_i^H u_j$, the projection of $a_i$ along the direction of $u_j$. By the Projection Theorem, this projection is the closest, in the $l_2$ norm sense, to the given data $a_i$.
The search is for the orthogonal basis $u_1, u_2, \ldots, u_r$. Formulating the constraint of maximizing the variance along the direction of $u_1$ yields (with the data vectors $a_i$ as the columns of $A$):
$$\max_{\|w\| = 1} \sum_{i=1}^{m} \left| a_i^H w \right|^2 = \left\| A^H w \right\|^2 = \left( A^H w \right)^H \left( A^H w \right) = w^H A A^H w$$
Using the SVD of $A = U \Sigma V^H$, we have $AA^H = U \Sigma \Sigma^H U^H$. Observing,
$$\frac{w^H A A^H w}{w^H w} = \frac{\left( U^H w \right)^H \Sigma \Sigma^H \left( U^H w \right)}{\left( U^H w \right)^H \left( U^H w \right)}$$
Notice that there are only $r$ nonzero entries in $\Sigma$, by the properties of the SVD. Defining $x = U^H w$ yields
$$\frac{w^H A A^H w}{w^H w} = \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_m^2}$$
Now we have
$$\max_{w \neq 0} \frac{w^H A A^H w}{w^H w} = \max_{x \neq 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_m^2}$$
Assuming $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r$, then
$$\max_{x \neq 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_m^2} = \sigma_1^2 = \lambda_1$$
which is the largest eigenvalue of $AA^H$. The vector $x$ which attains the maximum is $x_1 = 1$ and $x_i = 0$ for $i = 2 : m$, which corresponds to $w = Ux = u_1$.
The first Principal Component is indeed achieved by the first eigenvector $u_1$ of $AA^H$.
Calculating the second Principal Component, under the constraint of being orthogonal to the first while maximizing the projection:
$$\max_{\|w\| = 1,\; w^H u_1 = 0} \sum_{i=1}^{m} \left| a_i^H w \right|^2 = \max_{w \neq 0,\; w^H u_1 = 0} \frac{w^H \left( A A^H \right) w}{w^H w}$$
Using the definitions from above yields
$$\max_{x \neq 0,\; x^H U^H u_1 = 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_m^2} = \max_{x \neq 0,\; x_1 = 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_m^2} = \sigma_2^2 = \lambda_2$$
which is the second largest eigenvalue of $AA^H$. The vector $x$ which attains the maximum is $x_2 = 1$ and $x_i = 0$ for $i = 1, 3 : m$. This corresponds to $w = Ux = u_2$, the second eigenvector, $u_2$, of $AA^H$.
Continuing this pattern, $u_i$ is the $i$-th Principal Component. The set of orthogonal vectors which spans the subspace the data is projected onto, and which maximizes the variance of the data, consists of the first $r$ columns of the orthogonal matrix $U$ from the SVD. Observing the SVD yields the result immediately:
$$A = U \Sigma V^H \;\Rightarrow\; Y = U^H A = \Sigma V^H$$
Observing the scatter matrix of $Y$:
$$C_Y = Y Y^H = U^H A \left( U^H A \right)^H = U^H A A^H U = U^H C_X U$$
Since the matrix $U$ is the eigenvector matrix of $C_X = X X^H$, by the Diagonalization Theorem $C_Y$ is diagonal. Another look yields
$$Y Y^H = \Sigma V^H \left( \Sigma V^H \right)^H = \Sigma V^H V \Sigma^H = \Sigma \Sigma^H = \operatorname{diag}\left( \sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2 \right)$$
Namely, the scatter matrix, and hence the covariance matrix, of $Y$ is diagonal. Moreover, the constraint on the variance holds.
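A compact MATLAB sketch of PCA via the SVD (assuming, as above, that the data vectors are the columns of A; the centering step is standard PCA practice):

    Am = A - mean(A, 2);          % remove the mean of each variable
    [U, S, ~] = svd(Am, 'econ');
    Y = U' * Am;                  % principal component scores
    vars = diag(S).^2;            % scatter captured along each component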
Summary
The SVD is a decomposition which can be applied to any matrix.
The SVD exposes fundamental properties of a linear operator, such as the fundamental spaces, the Frobenius norm and the $l_2$ norm.
The SVD can be utilized in many applications, such as solving linear systems (Least Squares, Total Least Squares) and order reduction (compression, noise reduction, Principal Component Analysis).
To Be Continued
Regularizing Linear Equation Systems.