
TOTAL LEAST SQUARES

Valerio Lucarini

ABSTRACT

Total Least Squares (TLS) is an extension of the usual Least Squares (LS) method: it also allows
dealing with uncertainties on the sensitivity matrix. In this paper the TLS method is
analyzed through a systematic use of the SVD decomposition technique, which gives a clear
understanding of the nature of the problem and provides a solution expressed in closed
form in the cases where a solution exists. We discuss its relation to the LS problem
and give the expression for the parameter governing the stability of the solutions. Finally,
we present two algorithms for computing x_TLS, the solution of the problem.

INTRODUCTION – LEAST SQUARES PROBLEM



In the LS problem we are given a vector of data b ∈ ℜ^{m×1} and a sensitivity matrix A ∈ ℜ^{m×n}
(rank(A) = r), and we seek the x ∈ ℜ^{n×1} that minimizes the 2-norm of the residual, ‖Ax − b‖₂,
where ‖·‖₂ is the usual Euclidean norm. We can generalize the problem by introducing a
weighting matrix D ∈ ℜ^{m×m} (rank(D) = m), D = diag(d₁, .., d_m) with d_i > 0 for all i, which takes
into account the uncertainties associated with the quantities b_i: essentially, 1/d_i
is the standard deviation of the i-th measurement. The solution x_LS of the problem is unique
if r = n, while if r < n there is a unique solution x_LS with minimum Euclidean norm. The
solution can always be expressed as:
x_{LS} = (DA)^{+} D b

Equation 1

where ⁺ denotes the Moore-Penrose pseudoinverse¹. If rank(A) = n, we have that:

x_{LS} = \left( A^T D^2 A \right)^{-1} A^T D^2 b .

Equation 2

We observe that under the scaling D → λD the solution does not change: this is the
mathematical expression of the physical fact that, in the computation of the best fit,
it is only the ratio of the uncertainties of the measurements d_i with respect to one of
them (say d₁) that determines the result.
This ordinary LS problem can be restated as follows:

Minimize over r ∈ ℜ^m:  ‖Dr‖₂
with (b + r) ∈ Range(A).

The two statements of the problem are equivalent: if ‖Dr‖₂ is minimum and b + r = Ax
for some x, then the problem previously defined is solved and x_LS = x.


The vector r can be interpreted as a perturbation which, added to the data, gives a vector
belonging to the image of the sensitivity matrix A.
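
As an illustration, here is a minimal NumPy sketch of the weighted LS solution of Equations 1 and 2; the variable names and the random test data are ours:

```python
import numpy as np

# Hypothetical test problem: m = 50 measurements, n = 3 parameters.
rng = np.random.default_rng(0)
m, n = 50, 3
A = rng.normal(size=(m, n))             # sensitivity matrix
x_true = np.array([1.0, -2.0, 0.5])
d = rng.uniform(0.5, 2.0, size=m)       # 1/d_i = std of the i-th measurement
b = A @ x_true + rng.normal(size=m) / d
D = np.diag(d)

# Equation 1: x_LS = (DA)^+ Db, valid for any rank of A
x_ls = np.linalg.pinv(D @ A) @ (D @ b)

# Equation 2, valid when rank(A) = n (D**2 squares the diagonal of D)
x_ls2 = np.linalg.solve(A.T @ D**2 @ A, A.T @ D**2 @ b)
assert np.allclose(x_ls, x_ls2)
```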

THE TOTAL LEAST SQUARES PROBLEM

In the case of the TLS problem we want to take into account also the perturbations to the
sensitivity matrix; the TLS problem is defined as:

Minimize over r ∈ ℜ^m and E ∈ ℜ^{m×n}:  ‖D(E | r)T‖_F
with (b + r) ∈ Range(A + E),

¹ See Appendix A
where:

• ‖B‖_F² = Σ_{i,j} B_{i,j}² defines the Frobenius norm,
• (E | r) denotes the m×(n+1) matrix having E as its first n columns and r as its last column,
• T ∈ ℜ^{(n+1)×(n+1)}, T = diag(t₁, .., t_{n+1}) with t_i > 0 for all i, is the matrix that weights the
uncertainties of the model.

The matrix E plays the role of a perturbation to the given matrix A.


Since the solution of the problem does not change if T → λT, the only physically relevant
quantity is the ratio between the t_i's. In particular, observing that it is
T₁ = diag(t₁, .., t_n) ∈ ℜ^{n×n} which weights the matrix E, the ratio T₁/t_{n+1} determines how
tolerant we are of uncertainties on the matrix A. If this ratio is large, the matrix E must
have small entries in order for ‖D(E | r)T‖_F to be small; therefore we expect that, as the
ratio goes to infinity, we recover the usual LS problem.

SOLUTION

By the very definition of the TLS problem, we want to find a vector x such that:

(A + E) x = b + r ;

Equation 3

this relation can be written as:

\left( D(A | b)T + D(E | r)T \right) T^{-1} \begin{pmatrix} x \\ -1 \end{pmatrix} = 0 ;

Equation 4

with the previously defined block-matrix notation.


Let’s perform the SVD decomposition of the matrix C≡D(A|b)T:

C = U \Sigma V^T, \quad U = (u_1 | .. | u_m), \quad V = (v_1 | .. | v_{n+1}),
\Sigma = diag(\sigma_1, .., \sigma_{n+1}), \quad \sigma_1 \ge \sigma_2 \ge ... \ge \sigma_k > \sigma_{k+1} = ... = \sigma_{n+1}

Equation 5

We take into account the right singular vectors [v_{k+1}, .., v_{n+1}] associated with the minimum
singular value. If there exists a linear combination of these vectors

v = \sum_{i=k+1}^{n+1} \lambda_i v_i = \begin{pmatrix} y \\ \alpha \end{pmatrix} ;

Equation 6

such that y ∈ ℜ^n, α ∈ ℜ, α ≠ 0, the vector

x_{TLS} = -\frac{1}{\alpha t_{n+1}} T_1 y

Equation 7

is the solution to equation (4) when (choosing the λ_i so that ‖v‖₂ = 1)

D(E | r)T = -D(A | b)T \, v v^T .

Equation 8

Therefore the perturbation terms E and r are obtained respectively as the first n columns
and the last column of the matrix −(A | b)T vv^T T^{-1}. We stress that the fundamental
point of this solution is that α must be different from zero: if no linear combination of
[v_{k+1}, .., v_{n+1}] with a non-vanishing last element can be found, the problem has no solution.
We observe that, as previously inferred on physical grounds, the solution involves the
ratio between the matrix T₁ and the element t_{n+1}.
If k = n in the last expression of equation (5), that is, if the right singular vector with
minimum singular value is unique, then the solution, if it exists, is obviously unique.
If k < n, the solution, if it exists, is not unique. Analogously to the LS problem with a
rank-deficient matrix A, in this case it is possible to obtain a solution with smallest
norm. The norm under which such a solution is smallest is the T₁^{-1}-norm, defined as
‖x‖_{T₁^{-1}} ≡ ‖T₁^{-1} x‖₂. The y giving such an x_TLS through equation (7) can be
found by constructing the orthogonal Householder matrix Q ∈ ℜ^{(n-k+1)×(n-k+1)} such that:

(v_{k+1} | .. | v_{n+1}) \, Q = \begin{pmatrix} W & y \\ 0 & \alpha \end{pmatrix}

Equation 9

It can be shown that if at least one of the vectors [v_{k+1}, .., v_{n+1}] has a non-vanishing (n+1)-th
component, such a construction exists and is unique².
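
A minimal NumPy sketch of this SVD-based solution, under the assumption k = n (unique smallest singular value); the function name and error handling are ours:

```python
import numpy as np

def tls_svd(A, b, D, T):
    """TLS solution via Equations 5-7, assuming k = n."""
    m, n = A.shape
    C = D @ np.column_stack([A, b]) @ T          # C = D(A|b)T
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                                   # right singular vector of sigma_{n+1}
    y, alpha = v[:n], v[n]
    if np.isclose(alpha, 0.0):
        raise ValueError("alpha = 0: the TLS problem has no solution")
    return -(T[:n, :n] @ y) / (alpha * T[n, n])  # Equation 7
```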

TLS AS A CONSTRAINED LS PROBLEM

A sufficient condition guaranteeing that a solution exists is that the right singular
vector of the matrix D(A|b)T having smallest singular value is unique and has a non-vanishing
(n+1)-th component.

We define Â = DAT₁ and b̂ = Db, and indicate with (σ̂₁, .., σ̂_n) the n singular values of the
matrix Â, ordered by magnitude in the usual way. The interlace property³ says that:

² See Appendix B
³ See Appendix C
\sigma_1 \ge \hat{\sigma}_1 \ge \sigma_2 \ge ... \ge \hat{\sigma}_n \ge \sigma_{n+1}

Equation 10

where, as usual, the σ_i's indicate the singular values of the matrix D(A|b)T.
If σ̂_n > σ_{n+1}, we have that σ_{n+1} is a non-repeated singular value (therefore, if the solution
exists, it is unique) and that the right singular vector v_{n+1} of the matrix D(A|b)T has a
non-vanishing (n+1)-th component; otherwise, as can be proved, v_{n+1} would be a right singular
vector of DAT₁ with singular value σ_{n+1}, contradicting the hypothesis that σ̂_n > σ_{n+1}
and that σ̂_n is the smallest singular value of DAT₁.

Therefore the condition σ̂_n > σ_{n+1}, together with the interlace property, guarantees that
v_{n+1} is the only right singular vector of D(A|b)T having σ_{n+1} as singular value (that is,
k = n) and that v_{n+1} has a non-vanishing (n+1)-th component.
Under the condition σ̂_n > σ_{n+1} it is possible to find the following expression for the
solution to the TLS problem:

x_{TLS} = T_1 \left( \hat{A}^T \hat{A} - \sigma_{n+1}^2 I \right)^{-1} \hat{A}^T \hat{b} = T_1 \left( T_1 A^T D^2 A T_1 - \sigma_{n+1}^2 I \right)^{-1} T_1 A^T D^2 b = \left( A^T D^2 A - \sigma_{n+1}^2 T_1^{-2} \right)^{-1} A^T D^2 b

Equation 11

The solution obtained looks like the LS solution except for the term σ²_{n+1} T₁^{-2} between the
parentheses. If t_{n+1} = 0, we have that σ_{n+1} = 0 (the range of the matrix D(A|b)T loses one
dimension, so one singular value has to be zero) and we obtain exactly the LS solution, as
claimed at the beginning.
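
A sketch of this closed-form solution, including the solvability check σ̂_n > σ_{n+1}; the names are ours, and T₁ is assumed diagonal, as in the problem definition:

```python
import numpy as np

def tls_closed_form(A, b, D, T):
    """TLS via Equation 11, valid when sigma_hat_n > sigma_{n+1}."""
    m, n = A.shape
    T1 = T[:n, :n]
    sig = np.linalg.svd(D @ np.column_stack([A, b]) @ T, compute_uv=False)
    sig_hat = np.linalg.svd(D @ A @ T1, compute_uv=False)
    if not sig_hat[-1] > sig[-1]:
        raise ValueError("sigma_hat_n <= sigma_{n+1}: solution may not exist")
    T1_inv2 = np.diag(1.0 / np.diag(T1)**2)   # T1^{-2}, T1 diagonal
    ADD = A.T @ D @ D                          # A^T D^2
    return np.linalg.solve(ADD @ A - sig[-1]**2 * T1_inv2, ADD @ b)
```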
Recalling that the solution to the constrained LS problem:

Minimize over x ∈ ℜ^n:  ‖D(Ax − b)‖₂² + μ ‖T₁^{-1} x‖₂²

is given by the usual Ridge Regression expression:

x_{LS}(\mu) = \left( A^T D^2 A + \mu T_1^{-2} \right)^{-1} A^T D^2 b ;

Equation 12

we conclude that the TLS solution is equivalent to a Ridge Regression with parameter
μ = −σ²_{n+1}; TLS is therefore a sort of deregularizing procedure, since it introduces a negative
damping in the inversion of the problem, and the TLS solution is always less stable than
the usual LS one. The parameter regulating the instability of the TLS solution is
(σ̂_n − σ_{n+1}), essentially because it measures the distance between the set of problems
that can surely be solved (σ̂_n > σ_{n+1}) and the set of problems that may have no solution
(σ̂_n = σ_{n+1}). This difference is also a rough measure of how ill-conditioned the matrix
inversion in equation (11) is.

OTHER PROPERTIES OF THE SOLUTION xTLS

We present a survey of other results that define the properties of the solution xTLS:

• xTLS minimizes the function Ψ(x):


\Psi(x) = \sum_{i=1}^{m} d_i^2 \, \frac{\left( a_i^T x - b_i \right)^2}{x^T T_1^{-2} x + t_{n+1}^{-2}}

Equation 13

where a_i^T is the i-th row of A. When t_{n+1} → 0, the function Ψ(x) becomes proportional to the
function whose minimum is attained at the solution of the D-weighted LS problem.

• the following relation holds:

\sigma_{n+1}^2 \left( \frac{1}{t_{n+1}^2} + \sum_{i=1}^{n} \frac{(\hat{u}_i^T \hat{b})^2}{\hat{\sigma}_i^2 - \sigma_{n+1}^2} \right) = \rho_{LS}^2

Equation 14

where û_i is the i-th column of Û (the matrix of left singular vectors of Â = DAT₁) and
ρ_LS² = min_x ‖D(Ax − b)‖₂² = ‖D(A x_LS − b)‖₂², x_LS being the solution to the LS problem with
weighting matrix D.

• The following inequality relates the LS and TLS fits:

\| x_{TLS} - x_{LS} \|_{T_1^{-1}} \le \frac{t_{n+1} \, \| \hat{b} \|_2 \, \rho_{LS}}{\hat{\sigma}_n^2 - \sigma_{n+1}^2}

Equation 15

As we see, the difference between the TLS and LS fits goes to zero linearly with t_{n+1}
(both properties are spot-checked numerically in the sketch below).
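
The following sketch spot-checks Equations 13 and 15 on random data; the setup and names are ours, and x_TLS is computed with the closed form of Equation 11:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 30, 3
A, b = rng.normal(size=(m, n)), rng.normal(size=m)
D = np.diag(rng.uniform(0.5, 2.0, size=m))
T = np.diag([1.0, 1.0, 1.0, 0.3])                  # t_{n+1} = 0.3
T1, t_last = T[:n, :n], T[n, n]
T1_inv2 = np.diag(1.0 / np.diag(T1)**2)

sig = np.linalg.svd(D @ np.column_stack([A, b]) @ T, compute_uv=False)
sig_hat = np.linalg.svd(D @ A @ T1, compute_uv=False)
x_ls = np.linalg.pinv(D @ A) @ (D @ b)
x_tls = np.linalg.solve(A.T @ D @ D @ A - sig[-1]**2 * T1_inv2,
                        A.T @ D @ D @ b)

# Equation 13: Psi does not decrease in a neighborhood of x_TLS
def psi(x):
    return np.sum((D @ (A @ x - b))**2) / (x @ T1_inv2 @ x + 1.0 / t_last**2)
assert all(psi(x_tls + 1e-3 * rng.normal(size=n)) >= psi(x_tls)
           for _ in range(100))

# Equation 15: the T1^{-1}-norm of the difference obeys the bound
lhs = np.linalg.norm(np.linalg.inv(T1) @ (x_tls - x_ls))
rhs = t_last * np.linalg.norm(D @ b) * np.linalg.norm(D @ (A @ x_ls - b)) \
      / (sig_hat[-1]**2 - sig[-1]**2)
assert lhs <= rhs
```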

ALGORITHMS

We present two algorithms to compute the solution of the TLS problem.


The first algorithm uses the first expression of the solution we have obtained (equation (7))
and uses a Householder reflection to compute the vector y. At the step described by
equation (9) it uses all the right singular vectors whose singular values differ from the
smallest one by less than a given amount ε_MACH, which can be chosen to be the machine
precision:

1. Compute the SVD of D(A|b)T = UΣV^T;

2. Find p such that σ_p ≥ σ_{n+1} + ε_MACH ≥ σ_{p+1} ≥ .. ≥ σ_{n+1};

3. Compute

(v_{p+1} | .. | v_{n+1}) \left( I - 2\frac{ss^T}{s^T s} \right) = \begin{pmatrix} W & y \\ 0 & \alpha \end{pmatrix}

(see Appendix B for the expression of the vector s);
   a) if α = 0 the solution doesn't exist → END;
   b) if α ≠ 0 go to the next step;

4. Compute x_{TLS} = -\frac{1}{\alpha t_{n+1}} T_1 y.

END.
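
A sketch of this first algorithm; the function name and tolerance handling are ours, and the Householder vector s follows Appendix B:

```python
import numpy as np

def tls_householder(A, b, D, T, eps=np.finfo(float).eps):
    m, n = A.shape                               # assumes m >= n + 1
    # Step 1: SVD of D(A|b)T
    _, sig, Vt = np.linalg.svd(D @ np.column_stack([A, b]) @ T)
    # Step 2: select every right singular vector whose singular value is
    # within eps of the smallest one
    V = Vt[sig <= sig[-1] + eps].T               # block (v_{p+1} | .. | v_{n+1})
    # Step 3: Householder reflection built from the (n+1)-th components
    c = V[-1, :]                                 # c_{n+1} of Appendix B
    q = len(c)
    s = np.linalg.norm(c) * np.eye(q)[-1] - c
    Q = np.eye(q) if np.allclose(s, 0) else np.eye(q) - 2 * np.outer(s, s) / (s @ s)
    VQ = V @ Q                                   # last row becomes (0, .., 0, alpha)
    y, alpha = VQ[:n, -1], VQ[n, -1]
    if np.isclose(alpha, 0.0):
        raise ValueError("alpha = 0: the TLS problem has no solution")
    # Step 4
    return -(T[:n, :n] @ y) / (alpha * T[n, n])
```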

The second algorithm uses the result contained in equation (11) and exploits the sum rule
(14); it also provides the value of σ̂_n − σ_{n+1}, which gives a quantitative evaluation
of the instability of the problem being treated:

1. Compute the SVD of the matrix DAT₁ = Û Σ̂ V̂^T;

2. Solve the LS problem "minimize ‖D(Ax − b)‖₂" and obtain
ρ_LS² = min_x ‖D(Ax − b)‖₂² = ‖D(A x_LS − b)‖₂²;

3. Find σ such that

\sigma^2 \left( \frac{1}{t_{n+1}^2} + \sum_{i=1}^{n} \frac{(\hat{u}_i^T \hat{b})^2}{\hat{\sigma}_i^2 - \sigma^2} \right) = \rho_{LS}^2 ;

   a) if σ ≥ σ̂_n there is no solution → END;
   b) if σ < σ̂_n go to the next step;

4. Compute x_{TLS} = \left( A^T D^2 A - \sigma^2 T_1^{-2} \right)^{-1} A^T D^2 b.

END.
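
A sketch of this second algorithm; solving the secular equation of step 3 with SciPy's brentq root finder on (0, σ̂_n) is an implementation choice of ours:

```python
import numpy as np
from scipy.optimize import brentq

def tls_secular(A, b, D, T):
    m, n = A.shape
    T1, t_last = T[:n, :n], T[n, n]
    bhat = D @ b
    # Step 1: SVD of DAT1
    U_hat, sig_hat, _ = np.linalg.svd(D @ A @ T1)
    # Step 2: weighted LS residual rho_LS^2
    x_ls = np.linalg.pinv(D @ A) @ bhat
    rho2 = np.sum((D @ (A @ x_ls - b))**2)
    # Step 3: secular equation; f(0) < 0 and f -> +inf as sigma -> sigma_hat_n
    ub2 = (U_hat[:, :n].T @ bhat)**2              # (u_hat_i^T b_hat)^2
    def f(s):
        return s**2 * (1 / t_last**2 + np.sum(ub2 / (sig_hat**2 - s**2))) - rho2
    try:
        sigma = brentq(f, 0.0, sig_hat[-1] * (1 - 1e-12))
    except ValueError:
        raise ValueError("no root below sigma_hat_n: no TLS solution")
    # Step 4: Equation 11
    T1_inv2 = np.diag(1.0 / np.diag(T1)**2)       # T1 is diagonal
    ADD = A.T @ D @ D
    return np.linalg.solve(ADD @ A - sigma**2 * T1_inv2, ADD @ b)
```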

CONCLUSIONS

The TLS problem is an extension of the LS problem to the domain of uncertainty of the
sensitivity matrix. The TLS solutions are less stable than the usual LS ones, the profound
reason being that while the LS problem always has a solution, the TLS problem does not.
The reason why TLS is a suitable method for the analysis of linear problems (and
nonlinear ones, with an iterative approach) is that it makes it possible to explore the validity
of the model used, giving a measure of the change in the sensitivity matrix needed to
obtain the best solution. Therefore TLS can give the scientist hints for recognizing
mistakes in the model, whereas with LS one is forced to work with the model as given. TLS is
thus more suitable, since conceptually more correct, in cases where the model
used is extremely empirical and its a priori scientific motivations are weak.

BIBLIOGRAPHY
G. H. Golub, C. F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore,
1996), Chapters 5, 8, 12 and references therein, particularly:
G. H. Golub, C. F. Van Loan, "An Analysis of the Total Least Squares Problem", SIAM J. Numer. Anal., Vol. 17, No. 6 (1980),
upon which most of this work is based.
APPENDIX A

If a matrix B ∈ ℜ^{m×n}, m ≥ n, has rank r, then computing the SVD decomposition of B (the SVD
can be defined also for rectangular matrices) we find that n − r of its
singular values are zero:

B = U \Sigma V^T, \quad \Sigma = diag(\sigma_1, \sigma_2, .., \sigma_r, 0, .., 0), \quad \Sigma \in ℜ^{m×n}

A. 1

The matrix B is not invertible, but it is possible to define B⁺, the pseudoinverse of B:

B^{+} = V \Sigma^{+} U^T, \quad \Sigma^{+} \equiv diag(1/\sigma_1, 1/\sigma_2, .., 1/\sigma_r, 0, .., 0), \quad \Sigma^{+} \in ℜ^{n×m}

A. 2

with the property that B⁺B = V(Σ⁺Σ)V^T and BB⁺ = U(ΣΣ⁺)U^T are the orthogonal projectors
onto the row space and the column space of B respectively, where

\Sigma^{+}\Sigma = diag(\underbrace{1, .., 1}_{r\ \text{times}}, \underbrace{0, .., 0}_{n-r\ \text{times}}) \in ℜ^{n×n}

A. 3

If rank(B) = n, we have that B^{+} = (B^T B)^{-1} B^T; if rank(B) = n = m, we have that B^{+} = B^{-1}.
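
A small numerical illustration of A. 2; the function name and tolerance are ours:

```python
import numpy as np

def pinv_svd(B, tol=1e-12):
    """Pseudoinverse built from the SVD, inverting only nonzero singular values."""
    U, sig, Vt = np.linalg.svd(B, full_matrices=False)
    sig_plus = np.array([1.0 / s if s > tol * sig[0] else 0.0 for s in sig])
    return Vt.T @ np.diag(sig_plus) @ U.T

B = np.random.default_rng(1).normal(size=(6, 4))
assert np.allclose(pinv_svd(B), np.linalg.pinv(B))
# Full column rank: B+ = (B^T B)^{-1} B^T
assert np.allclose(pinv_svd(B), np.linalg.solve(B.T @ B, B.T))
```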
APPENDIX B

Let's consider the product VQ, where V = (v_{k+1} | .. | v_{n+1}), and take its transpose:

Q^T V^T = Q^T C = Q^T (c_1 | .. | c_{n+1})

B. 1

where c_i is the vector of the i-th components of the vectors (v_{k+1}, .., v_{n+1}).


We construct the vector s = ‖c_{n+1}‖₂ e − c_{n+1}, where e is the unit vector along the last
coordinate direction of ℜ^{n−k+1}, and form the Householder reflection matrix
Q = I − 2ss^T/s^Ts. Applying this matrix on the left of the matrix V^T = C we obtain:

Q^T C = \left( I - 2\frac{ss^T}{s^T s} \right) (c_1 | .. | c_{n+1}) = \begin{pmatrix} W^T & 0 \\ y^T & \|c_{n+1}\|_2 \end{pmatrix} ,

B. 2

that is, Q annihilates the first n − k elements of the last column. The last element is
different from zero if and only if at least one of the vectors (v_{k+1}, .., v_{n+1}) has a
non-vanishing (n+1)-th component.
Taking the transpose of B. 2 we obtain equation (9), with α = ‖c_{n+1}‖₂.
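
A quick numerical check of this construction; the random test data are ours:

```python
import numpy as np

rng = np.random.default_rng(2)
V = np.linalg.qr(rng.normal(size=(5, 3)))[0]   # orthonormal columns, standing in for (v_{k+1}|..|v_{n+1})
c = V[-1, :]                                   # (n+1)-th components of the columns of V
s = np.linalg.norm(c) * np.eye(3)[-1] - c
Q = np.eye(3) - 2 * np.outer(s, s) / (s @ s)   # Householder reflection
VQ = V @ Q
assert np.allclose(VQ[-1, :-1], 0.0)           # last row is (0, .., 0, alpha), as in Equation 9
alpha = VQ[-1, -1]                             # equals ||c_{n+1}||_2
```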
APPENDIX C

Let F be a symmetric (n+1)×(n+1) matrix and let F′ be its n×n upper-left submatrix. The
following important property holds:

\lambda_1 \ge \lambda'_1 \ge \lambda_2 \ge \lambda'_2 \ge ... \ge \lambda'_n \ge \lambda_{n+1}

C. 1

where the λ's are the eigenvalues of F and the λ′'s are the eigenvalues of F′, both ordered
in the usual decreasing way.
We recall that the squares of the singular values of a matrix B are the eigenvalues of
B^TB. In our case let B = D(A|b)T:

\left( D(A|b)T \right)^T \left( D(A|b)T \right) = (DAT_1 \,|\, Dbt_{n+1})^T (DAT_1 \,|\, Dbt_{n+1}) = \begin{pmatrix} T_1 A^T D^2 A T_1 & T_1 A^T D^2 b \, t_{n+1} \\ t_{n+1} b^T D^2 A T_1 & t_{n+1}^2 \, b^T D^2 b \end{pmatrix}

C. 2

We observe that the upper-left block is nothing but (DAT₁)^T(DAT₁), so that its eigenvalues
are interlaced with those of (D(A|b)T)^T(D(A|b)T). But the eigenvalues of (DAT₁)^T(DAT₁)
are the squares of the singular values of DAT₁, and the eigenvalues of (D(A|b)T)^T(D(A|b)T)
are the squares of the singular values of D(A|b)T; therefore the interlacing property
extends to the singular values of the two matrices DAT₁ and D(A|b)T.
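
A numerical illustration of this interlacing of singular values (random data, names ours):

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.normal(size=(8, 5))                           # plays the role of D(A|b)T
sig = np.linalg.svd(C, compute_uv=False)              # sigma_1 >= .. >= sigma_{n+1}
sig_hat = np.linalg.svd(C[:, :-1], compute_uv=False)  # first n columns, as DAT1
# sigma_i >= sigma_hat_i >= sigma_{i+1} for every i (Equation 10),
# checked here up to floating-point rounding
tol = 1e-12
assert np.all(sig[:-1] >= sig_hat - tol) and np.all(sig_hat >= sig[1:] - tol)
```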
