Valerio Lucarini
ABSTRACT
Total Least Squares (TLS) is an extension of the usual Least Squares (LS) method: it allows dealing also with uncertainties on the sensitivity matrix. In this paper the TLS method is analyzed through a systematic use of the SVD decomposition technique, which gives a clear understanding of the structure of the problem and provides a solution expressed in closed form in the cases where a solution exists. We discuss its relations with the LS problem and give the expression for the parameter governing the stability of the solutions. At the end we present two algorithms for computing x_TLS, the solution of the problem.

INTRODUCTION
In the LS problem we are given a vector of data b ∈ ℜ^{m×1} and a sensitivity matrix A ∈ ℜ^{m×n} (rank(A) = r), and we seek the x ∈ ℜ^{n×1} that minimizes the 2-norm of the residual, ‖Ax − b‖_2, where ‖·‖_2 is the usual Euclidean norm. We can generalize the problem by introducing a weighting matrix D ∈ ℜ^{m×m} (rank(D) = m), D = diag(d_1,..,d_m), d_i > 0 ∀i, which takes into account the various uncertainties associated with the quantities b_i, so that essentially 1/d_i is the standard deviation of the i-th measurement. The solution x_LS of the problem is unique if r = n, while if r < n there is a unique solution x_LS with minimum Euclidean norm. The solution can always be expressed as:
x_LS = (DA)^+ Db ,
Equation 1

where ^+ denotes the pseudoinverse¹, or, when r = n, as:

x_LS = (A^T D^2 A)^{-1} A^T D^2 b .
Equation 2
We observe that scaling the weighting matrix D as D → λD the solution does not change: this is the mathematical expression of the physical fact that in the computation of the best fit it is only the ratios of the uncertainties of the measurements d_i with respect to one of them (say d_1) that determine the result.
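The weighted LS solution of equations (1) and (2), and the invariance under rescaling of D, can be sketched numerically. This is only an illustration: the sizes, the random data and all variable names are invented, and numpy is assumed.

```python
import numpy as np

# Invented problem data for illustration
rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.normal(size=(m, n))          # sensitivity matrix, full rank a.s.
b = rng.normal(size=m)               # data vector
d = rng.uniform(0.5, 2.0, size=m)    # 1/d_i ~ std dev of i-th measurement
D = np.diag(d)

# Equation 1: x_LS = (DA)^+ (Db), valid for any rank
x_pinv = np.linalg.pinv(D @ A) @ (D @ b)

# Equation 2: x_LS = (A^T D^2 A)^{-1} A^T D^2 b, valid when rank(A) = n
D2 = D @ D
x_normal = np.linalg.solve(A.T @ D2 @ A, A.T @ D2 @ b)
assert np.allclose(x_pinv, x_normal)

# Scaling D -> lam*D leaves the solution unchanged: only ratios d_i/d_1 matter
lam = 7.3
x_scaled = np.linalg.pinv(lam * D @ A) @ (lam * D @ b)
assert np.allclose(x_pinv, x_scaled)
```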
This ordinary LS problem can be restated as follows:

Minimize over r ∈ ℜ^m  ‖Dr‖_2  subject to (b + r) ∈ Range(A).
In the case of the TLS problem we want to take into account also the perturbations to the sensitivity matrix; the TLS problem is defined as:

Minimize over E ∈ ℜ^{m×n}, r ∈ ℜ^m  ‖D(E|r)T‖_F  subject to (b + r) ∈ Range(A + E),

where:
• ‖B‖_F = (Σ_{i,j} B_{i,j}^2)^{1/2} is the Frobenius norm,
• (E|r) denotes the m×(n+1) matrix having E as its first n columns and r as its last column,
• T ∈ ℜ^{(n+1)×(n+1)}, T = diag(t_1,..,t_{n+1}) with t_i > 0 ∀i, is the matrix that weights the uncertainties of the different columns of (E|r); in particular the ratio t_i/t_{n+1} expresses how tolerant we are of uncertainties on the matrix A. If this ratio is large, the matrix E has to have small values so that ‖D(E|r)T‖_F can be small; therefore we expect that if it goes to infinity we recover the ordinary LS solution.

¹ See Appendix A
SOLUTION
By the very definition of the TLS problem, we want to find a vector x such that:

(A + E)x = b + r ;
Equation 3

this constraint can be rewritten as:

(D(A|b)T + D(E|r)T) T^{-1} [x ; −1] = 0 ,
Equation 4

where [x ; −1] ∈ ℜ^{(n+1)×1} denotes the vector x with −1 appended as last component. To satisfy equation (4) with the smallest possible perturbation ‖D(E|r)T‖_F, we consider the SVD of the matrix C = D(A|b)T:

C = UΣV^T,
U = (u_1 | .. | u_m),
V = (v_1 | .. | v_{n+1}),
Σ = diag(σ_1,..,σ_{n+1}),  σ_1 ≥ σ_2 ≥ .. ≥ σ_{k+1} = .. = σ_{n+1} .
Equation 5
We take into account the right singular vectors [v_{k+1},..,v_{n+1}] having the minimum singular value. If there exists a linear combination of these vectors

v = Σ_{i=k+1}^{n+1} λ_i v_i = [y ; α] ,
Equation 6

whose last component α is different from zero, the solution is

x_TLS = −(1/(α t_{n+1})) T_1 y ,
Equation 7

where T_1 = diag(t_1,..,t_n) is the upper-left n×n block of T.
The minimum-norm perturbation achieving this is the rank-one matrix

D(E|r)T = −D(A|b)T vv^T ,
Equation 8

so that the corrected matrix is D(A+E|b+r)T = D(A|b)T (I − vv^T). Therefore the perturbation terms E and r are obtained respectively as the first n columns and the last column of the matrix −(A|b)T vv^T T^{-1}. We stress the fact that the fundamental point in this solution is that α must be different from zero. If a linear combination of the [v_{k+1},..,v_{n+1}] having non-vanishing last element cannot be found, the problem has no solution.
We observe that, as previously inferred from physical reasons, the solution involves only the ratio between the matrix T_1 and the element t_{n+1}.
If k = n in the last expression of equation (5), that is, if there is a unique right singular vector with minimum singular value, then if the solution exists it is obviously unique. If k < n, if the solution exists it is not unique. Analogously with the case of the LS problem with rank-deficient matrix A, in this case it is possible to obtain a solution with smallest norm. The norm under which such a solution is the smallest is the T^{-1} norm, defined in the following way: ‖x‖_{T^{-1}} ≡ ‖T_1^{-1} x‖_2. The y giving such an x_TLS through equation (7) can be found by constructing an orthogonal matrix Q such that:

(v_{k+1} | .. | v_{n+1}) Q = [ W  y ]
                             [ 0  α ] ,
Equation 9

where the bottom row is (0,..,0, α); the last column [y ; α] is then the vector v of equation (6).
It can be shown that if at least one of the vectors [vk+1,..,vn+1] has a non vanishing (n+1)th
component, such a construction exists and is unique2.
A sufficient condition guaranteeing us that a solution exists is that the right singular
vector of the matrix D(A|b)T having smallest singular value is unique and has a non
vanishing (n+1)th component.
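The SVD construction of the solution can be sketched numerically. This is only a sketch under invented data (sizes, weights, random generator seed and the tolerance are all assumptions), and it takes the generic case in which the smallest singular value is simple and its right singular vector has a non-vanishing last component.

```python
import numpy as np

# Invented weighted TLS problem for illustration
rng = np.random.default_rng(1)
m, n = 10, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
d = rng.uniform(0.5, 2.0, size=m)
t = rng.uniform(0.5, 2.0, size=n + 1)     # weights t_1,..,t_{n+1}
D, T = np.diag(d), np.diag(t)
T1, t_np1 = np.diag(t[:n]), t[n]

# SVD of C = D(A|b)T, equation (5)
C = D @ np.column_stack([A, b]) @ T
U, s, Vt = np.linalg.svd(C)
v = Vt[-1]                 # right singular vector of the smallest sigma_{n+1}
y, alpha = v[:n], v[n]
assert abs(alpha) > 1e-12  # sufficient condition: last component non-vanishing

# Equation (7): x_TLS = -(1/(alpha * t_{n+1})) T1 y
x_tls = -(T1 @ y) / (alpha * t_np1)

# Rank-one correction D(A+E|b+r)T = D(A|b)T (I - v v^T); then (A+E)x = b+r
Cc = C - np.outer(C @ v, v)
AB = np.linalg.inv(D) @ Cc @ np.linalg.inv(T)   # = (A+E | b+r)
assert np.allclose(AB[:, :n] @ x_tls, AB[:, n])
```

The final assertion checks equation (3): the corrected system is solved exactly by x_TLS.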
We define Â = DAT_1 and b̂ = Db, and indicate with (σ̂_1,..,σ̂_n) the n singular values of the matrix Â, ordered by magnitude in the usual way. The interlace property³ says that:

σ_1 ≥ σ̂_1 ≥ σ_2 ≥ ... ≥ σ̂_n ≥ σ_{n+1} ,
Equation 10

where as usual the σ_i's indicate the singular values of the matrix D(A|b)T.

² See Appendix B
³ See Appendix C
If σ̂_n > σ_{n+1}, we have that σ_{n+1} is a non-repeated singular value (therefore if the solution exists it is unique) and that the right singular vector v_{n+1} of the matrix D(A|b)T has a non-vanishing (n+1)-th component: otherwise, as can be proved, v_{n+1} (deprived of its last, vanishing component) would be a right singular vector of DAT_1 with related singular value σ_{n+1}, thus contradicting the fact that by hypothesis the smallest singular value of DAT_1 is σ̂_n > σ_{n+1}.
Therefore the condition σ̂_n > σ_{n+1}, together with the interlace property, guarantees that v_{n+1} is the only one among the right singular vectors of D(A|b)T having σ_{n+1} as related singular value (that is, k = n) and that v_{n+1} has a non-vanishing (n+1)-th component.
Under the condition σ̂_n > σ_{n+1} it is possible to find the following closed-form expression for the solution:

x_TLS = T_1 (Â^T Â − σ_{n+1}^2 I)^{-1} Â^T b̂ = T_1 (T_1 A^T D^2 A T_1 − σ_{n+1}^2 I)^{-1} T_1 A^T D^2 b =
      = (A^T D^2 A − σ_{n+1}^2 T_1^{-2})^{-1} A^T D^2 b .
Equation 11
The solution obtained looks like the LS solution except for the term σ_{n+1}^2 T_1^{-2} between the parentheses. If t_{n+1} = 0 we have that σ_{n+1} = 0 (the range of the matrix D(A|b)T loses one dimension and so one singular value has to be zero) and so we obtain exactly the LS solution, as claimed at the beginning.
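The closed-form solution can be cross-checked against the direct SVD construction of equation (7). A sketch with invented data and sizes (all names are assumptions), valid in the generic case σ̂_n > σ_{n+1}:

```python
import numpy as np

# Invented weighted TLS problem for illustration
rng = np.random.default_rng(2)
m, n = 12, 4
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
d = rng.uniform(0.5, 2.0, size=m)
t = rng.uniform(0.5, 2.0, size=n + 1)
D, T = np.diag(d), np.diag(t)
T1, t_np1 = np.diag(t[:n]), t[n]

C = D @ np.column_stack([A, b]) @ T
sigma_np1 = np.linalg.svd(C, compute_uv=False)[-1]   # smallest singular value

# Equation (11): x_TLS = (A^T D^2 A - sigma_{n+1}^2 T1^{-2})^{-1} A^T D^2 b
D2 = D @ D
T1inv2 = np.diag(1.0 / t[:n] ** 2)
x_closed = np.linalg.solve(A.T @ D2 @ A - sigma_np1**2 * T1inv2, A.T @ D2 @ b)

# Direct SVD construction via equation (7)
v = np.linalg.svd(C)[2][-1]
x_svd = -(T1 @ v[:n]) / (v[n] * t_np1)
assert np.allclose(x_closed, x_svd)
```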
Recalling that the solution to the constrained LS problem:

Minimize over x ∈ ℜ^n  ‖D(Ax − b)‖_2^2 + µ ‖x‖_{T^{-1}}^2

is given by the usual Ridge Regression expression:

x_LS(µ) = (A^T D^2 A + µ T_1^{-2})^{-1} A^T D^2 b ,
Equation 12
we conclude that the TLS solution is equivalent to a Ridge Regression with parameter µ = −σ_{n+1}^2; therefore TLS is a sort of deregularizing procedure, since it includes a negative damping in the inversion of the problem. Therefore the TLS solution is always less stable than the usual LS one. The parameter regulating the instability of the TLS solution is (σ̂_n − σ_{n+1}), essentially because it gives a measure of the distance between the set of problems that can surely be solved (σ̂_n > σ_{n+1}) and the set of problems that may have no solution (σ̂_n = σ_{n+1}). This difference is also a rough measure of how ill-conditioned the matrix to be inverted in equation (11) is.
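This stability claim can be made quantitative: in the variables z = T_1^{-1} x the matrix to be inverted is Â^T Â − σ_{n+1}^2 I, whose smallest eigenvalue is exactly σ̂_n^2 − σ_{n+1}^2. A numerical sketch with invented data (names and sizes are assumptions):

```python
import numpy as np

# Invented weighted TLS problem for illustration
rng = np.random.default_rng(3)
m, n = 15, 4
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
d = rng.uniform(0.5, 2.0, size=m)
t = rng.uniform(0.5, 2.0, size=n + 1)
D = np.diag(d)
T1 = np.diag(t[:n])

Ahat = D @ A @ T1
C = D @ np.column_stack([A, b]) @ np.diag(t)
sigma_np1 = np.linalg.svd(C, compute_uv=False)[-1]
sighat = np.linalg.svd(Ahat, compute_uv=False)

gap = sighat[-1] ** 2 - sigma_np1 ** 2      # = sigmahat_n^2 - sigma_{n+1}^2
eigs = np.linalg.eigvalsh(Ahat.T @ Ahat - sigma_np1 ** 2 * np.eye(n))
assert gap > 0                # interlace property: sigmahat_n >= sigma_{n+1}
assert np.isclose(eigs[0], gap)   # smallest eigenvalue equals the gap
```

When the gap shrinks, the inversion in equation (11) becomes nearly singular, which is the instability discussed above.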
We present a survey of other results that define the properties of the solution x_TLS. The vector x_TLS minimizes the functional

Ψ(x) = Σ_{i=1}^m d_i^2 (a_i^T x − b_i)^2 / (x^T T_1^{-2} x + t_{n+1}^{-2}) ,  with min_x Ψ(x) = σ_{n+1}^2 ,
Equation 13

where a_i^T is the i-th row of A. When t_{n+1} → 0 the function Ψ(x) becomes proportional to the function whose minimum is obtained with the solution of the D-weighted LS problem.
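The variational characterization of x_TLS (the minimum of the Rayleigh-type functional Ψ equals σ_{n+1}^2) can be checked numerically. A sketch under invented data; the function name `psi` and all sizes are assumptions:

```python
import numpy as np

# Invented weighted TLS problem for illustration
rng = np.random.default_rng(4)
m, n = 10, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
d = rng.uniform(0.5, 2.0, size=m)
t = rng.uniform(0.5, 2.0, size=n + 1)
D, T1, t_np1 = np.diag(d), np.diag(t[:n]), t[n]

C = D @ np.column_stack([A, b]) @ np.diag(t)
U, s, Vt = np.linalg.svd(C)
v = Vt[-1]
x_tls = -(T1 @ v[:n]) / (v[n] * t_np1)

def psi(x):
    num = np.sum((d * (A @ x - b)) ** 2)               # ||D(Ax-b)||^2
    den = x @ np.diag(1.0 / t[:n] ** 2) @ x + t_np1 ** -2
    return num / den

# The minimum of Psi is attained at x_TLS and equals sigma_{n+1}^2
assert np.isclose(psi(x_tls), s[-1] ** 2)
for _ in range(5):   # perturbing x_TLS can only increase Psi
    assert psi(x_tls + 0.1 * rng.normal(size=n)) >= psi(x_tls) - 1e-9
```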
The singular value σ_{n+1} satisfies the sum rule

σ_{n+1}^2 ( t_{n+1}^{-2} + Σ_{i=1}^n (û_i^T b̂)^2 / (σ̂_i^2 − σ_{n+1}^2) ) = ρ_LS^2 ,
Equation 14

where the û_i's are the left singular vectors of Â and ρ_LS^2 = min_x ‖D(Ax − b)‖_2^2 = ‖D(Ax_LS − b)‖_2^2, x_LS being the solution to the D-weighted LS problem. Moreover, the following bound holds:

‖x_TLS − x_LS‖_{T^{-1}} ≤ t_{n+1} ‖b̂‖_2 ρ_LS / (σ̂_n^2 − σ_{n+1}^2) .
Equation 15
As we see the difference between the TLS and LS fits goes to zero linearly with tn+1.
ALGORITHMS
The first algorithm follows directly from the construction given above: compute the SVD of D(A|b)T and obtain x_TLS from equations (7) and (9). The second algorithm uses the result contained in equation (11) and exploits the sum rule (14); it also provides us with the value of σ̂_n − σ_{n+1}, which gives a quantitative evaluation of the stability of the solution:
1. Compute the SVD Â = DAT_1 = ÛΣ̂V̂^T, obtaining the singular values σ̂_i and the left singular vectors û_i;
2. compute x_LS and ρ_LS^2 = min_x ‖D(Ax − b)‖_2^2 = ‖D(Ax_LS − b)‖_2^2 ;
3. find σ such that σ^2 ( 1/t_{n+1}^2 + Σ_{i=1}^n (û_i^T b̂)^2 / (σ̂_i^2 − σ^2) ) = ρ_LS^2 ;
   a) if σ ≥ σ̂_n there is no solution → END;
4. compute x_TLS = (A^T D^2 A − σ^2 T_1^{-2})^{-1} A^T D^2 b → END.
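The steps above can be sketched as follows. This is only an illustration under invented data; a plain bisection stands in for a proper root finder, and the secular function is increasing on (0, σ̂_n), from −ρ_LS^2 at σ = 0 to +∞, so the root is bracketed.

```python
import numpy as np

# Invented weighted TLS problem for illustration
rng = np.random.default_rng(5)
m, n = 12, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
d = rng.uniform(0.5, 2.0, size=m)
t = rng.uniform(0.5, 2.0, size=n + 1)
D, T1, t_np1 = np.diag(d), np.diag(t[:n]), t[n]
D2 = D @ D

# Step 1: SVD of Ahat = D A T1
Ahat, bhat = D @ A @ T1, D @ b
Uh, sh, Vht = np.linalg.svd(Ahat, full_matrices=False)

# Step 2: weighted LS solution and residual rho_LS^2
x_ls = np.linalg.solve(A.T @ D2 @ A, A.T @ D2 @ b)
rho2 = np.sum((d * (A @ x_ls - b)) ** 2)

# Step 3: solve the secular equation by bisection on (0, sighat_n)
c2 = (Uh.T @ bhat) ** 2
def f(sig):
    return sig**2 * (1.0 / t_np1**2 + np.sum(c2 / (sh**2 - sig**2))) - rho2
lo, hi = 0.0, sh[-1] * (1 - 1e-12)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
sigma = 0.5 * (lo + hi)

# Step 4: x_TLS from equation (11)
x_tls = np.linalg.solve(A.T @ D2 @ A - sigma**2 * np.diag(1 / t[:n] ** 2),
                        A.T @ D2 @ b)

# Cross-checks: sigma equals the smallest singular value of D(A|b)T,
# and x_tls agrees with the direct SVD construction
C = D @ np.column_stack([A, b]) @ np.diag(t)
assert np.isclose(sigma, np.linalg.svd(C, compute_uv=False)[-1])
v = np.linalg.svd(C)[2][-1]
x_check = -(T1 @ v[:n]) / (v[n] * t_np1)
assert np.allclose(x_tls, x_check)
```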
CONCLUSIONS
The TLS problem is an extension of the LS problem to the domain of uncertainty of the sensitivity matrix. The TLS solutions are less stable than the usual LS ones, the profound reason being that while the LS problem always has a solution, the TLS problem does not. The reason why TLS is a suitable method for the analysis of linear problems (and nonlinear ones, with an iterative approach) is that it gives the possibility of exploring the validity of the model used, giving a measure of the change in the sensitivity matrix needed to obtain the best solution. Therefore TLS can give the scientist hints for recognizing mistakes in the model, while in the case of LS one is forced to work with the model as given. TLS is therefore more suitable, since it is conceptually more correct, in the cases where the model used is extremely empirical and its a priori scientific motivations are weak.
BIBLIOGRAPHY
G. H. Golub, C. F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, 1996), Chapters 5, 8, 12 and references therein, particularly:
G. H. Golub, C. F. Van Loan, SIAM J. Numer. Anal., Vol. 17, No. 6 (1980),
upon which most of this work has been based.
APPENDIX A
If a matrix B ∈ ℜ^{m×n}, m ≥ n, has rank r, computing the SVD decomposition of B (the SVD decomposition can be defined also for rectangular matrices) we will find that n − r of its singular values are zero:

B = UΣV^T ,  Σ = diag(σ_1,..,σ_r, 0,..,0) ,  σ_1 ≥ .. ≥ σ_r > 0 .
A. 1

The matrix B is not invertible, but it is possible to define B^+, the pseudoinverse of B, defined as:

B^+ = VΣ^+U^T ,  Σ^+ = diag(1/σ_1,..,1/σ_r, 0,..,0) .
A. 2

It satisfies:

B^+B = V diag(1,..,1, 0,..,0) V^T ∈ ℜ^{n×n}   (r ones, n − r zeros),
A. 3

so that B^+B acts as the identity on Range(B^T). When r = n, B^+ = (B^T B)^{-1} B^T, which shows the equivalence of equations (1) and (2).
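The pseudoinverse construction can be sketched on an invented rank-deficient matrix (sizes and names are assumptions; numpy's built-in `pinv` is used only as a cross-check):

```python
import numpy as np

# Invented rank-deficient matrix: 5x3 of rank 2 by construction
rng = np.random.default_rng(6)
B = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 3))

U, s, Vt = np.linalg.svd(B)
r = int(np.sum(s > 1e-10 * s[0]))            # numerical rank
assert r == 2

# B^+ = V Sigma^+ U^T with Sigma^+ = diag(1/s_1,..,1/s_r, 0,..,0)   (A.2)
s_plus = np.zeros_like(s)
s_plus[:r] = 1.0 / s[:r]
B_plus = Vt.T @ np.diag(s_plus) @ U[:, : len(s)].T
assert np.allclose(B_plus, np.linalg.pinv(B))

# B^+ B = V diag(1,..,1,0,..,0) V^T with r ones                      (A.3)
P = Vt.T @ np.diag([1.0] * r + [0.0] * (3 - r)) @ Vt
assert np.allclose(B_plus @ B, P)
```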
APPENDIX B
Let us consider the matrix V = (v_{k+1} | .. | v_{n+1}) of equation (9) and take its transpose C = V^T, whose columns c_1,..,c_{n+1} ∈ ℜ^{n+1−k} collect, for each j, the j-th components of the vectors v_i:

Q^T V^T = Q^T C = Q^T (c_1 | .. | c_{n+1}) .
B. 1

We take s = c_{n+1} + sign((c_{n+1})_{n+1−k}) ‖c_{n+1}‖_2 e_{n+1−k} (where e_{n+1−k} is the last coordinate direction) and form the Householder reflection matrix Q = I − 2ss^T/s^Ts. Applying this matrix to the left of the matrix V^T = C we obtain:

Q^T C = (I − 2ss^T/s^Ts)(c_1 | .. | c_{n+1}) = [ W^T  0 ]
                                               [ y^T  α ] ,
B. 2

that is, Q^T annihilates all but the last element of the last column, with |α| = ‖c_{n+1}‖_2. The last element is different from zero if and only if at least one of the vectors (v_{k+1},..,v_{n+1}) has a non-vanishing (n+1)-th component, since c_{n+1} collects precisely these components.
Taking the transpose of B. 2 we obtain equation (9).
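The Householder construction can be sketched numerically. A sketch only: the block of right singular vectors is simulated here by an invented random set of orthonormal columns, and all sizes and names are assumptions.

```python
import numpy as np

# Simulate V = (v_{k+1}|..|v_{n+1}): p orthonormal columns in R^{n+1}
rng = np.random.default_rng(7)
p, dim = 3, 6                                    # p = n+1-k, dim = n+1
V = np.linalg.qr(rng.normal(size=(dim, p)))[0]   # orthonormal columns

z = V[-1, :].copy()               # last components of the v_i's (= c_{n+1})
s = z.copy()
s[-1] += np.sign(z[-1]) * np.linalg.norm(z)      # reflect z onto the last axis
Q = np.eye(p) - 2.0 * np.outer(s, s) / (s @ s)   # Householder reflection

VQ = V @ Q
alpha = VQ[-1, -1]
assert np.allclose(VQ[-1, :-1], 0.0)             # bottom row is (0,..,0,alpha)
assert np.isclose(abs(alpha), np.linalg.norm(z)) # |alpha| = ||c_{n+1}||
assert np.allclose(Q @ Q.T, np.eye(p))           # Q is orthogonal
```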
APPENDIX C
Let F be a symmetric (n+1)×(n+1) matrix and let F′ be its n×n upper left submatrix. The following important property holds:

λ_1 ≥ λ′_1 ≥ λ_2 ≥ λ′_2 ≥ ... ≥ λ′_n ≥ λ_{n+1} ,
C. 1

where the λ's are the eigenvalues of F and the λ′'s are the eigenvalues of F′, both ordered in decreasing order.
We recall that the squares of the singular values of a matrix B are the eigenvalues of B^T B. In our case let B = D(A|b)T:
(D(A|b)T)^T (D(A|b)T) = (DAT_1 | Dbt_{n+1})^T (DAT_1 | Dbt_{n+1}) = [ T_1A^TD^2AT_1       t_{n+1}T_1A^TD^2b ]
                                                                    [ t_{n+1}b^TD^2AT_1   t_{n+1}^2 b^TD^2b ] .
C. 2

We observe that the upper left block is nothing but (DAT_1)^T(DAT_1), so that its eigenvalues are interlaced with those of (D(A|b)T)^T(D(A|b)T). But the eigenvalues of (DAT_1)^T(DAT_1) are nothing but the squares of the singular values of DAT_1, and the eigenvalues of (D(A|b)T)^T(D(A|b)T) are nothing but the squares of the singular values of D(A|b)T; this proves the interlace property of equation (10).
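The eigenvalue interlace property C. 1 can be checked numerically on an invented symmetric matrix (sizes and names are assumptions; eigenvalues are ordered algebraically in decreasing order):

```python
import numpy as np

# Invented symmetric (n+1)x(n+1) matrix and its upper-left n x n submatrix
rng = np.random.default_rng(8)
n = 5
G = rng.normal(size=(n + 1, n + 1))
F = (G + G.T) / 2.0               # symmetric F
Fp = F[:n, :n]                    # submatrix F'

lam = np.sort(np.linalg.eigvalsh(F))[::-1]    # lam_1 >= .. >= lam_{n+1}
lamp = np.sort(np.linalg.eigvalsh(Fp))[::-1]  # lam'_1 >= .. >= lam'_n

# C. 1: lam_i >= lam'_i >= lam_{i+1} for every i
assert np.all(lam[:-1] >= lamp - 1e-12)
assert np.all(lamp >= lam[1:] - 1e-12)
```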