
Tutorial

Laura Rebollo-Neira
September 9, 2011

This tutorial is intended to be a pedagogical self-contained introduction to the techniques


developed within the EPSRC funded project: “Highly nonlinear approximations for sparse
signal representation”.

List of Symbols
The following notations and symbols will be used without defining them explicitly:

∪ : union
∩ : intersection
⊆ : subset of
⊂ : proper subset of
∈ : belong(s)
N : set of all positive integers
Z : set of all integers
R : field of all real numbers
C : field of all complex numbers
F : field of real or complex numbers
:= : is defined by
=⇒ : imply (implies)
⇐⇒ : if and only if
→ : maps to

The Kronecker symbol is given by

δ_{ij} = 1 if i = j, and δ_{ij} = 0 otherwise.

The characteristic function χ_S of a set S is defined as

χ_S(x) = 1 if x ∈ S, and χ_S(x) = 0 otherwise.

For n ∈ N the factorial n! is defined as n! = n(n − 1) · · · 2 · 1. The absolute value of a number


a ∈ F is indicated as |a|. If a ∈ R,

|a| = a if a ≥ 0, and |a| = −a if a < 0.

If a ∈ C its complex conjugate is denoted by ā and |a|² = aā.

Elementary Definitions
Note: Bold face is used when a term is defined. Italics are used to emphasize a
term or statement.

Vector Space
A vector space over a field F is a set V together with two operations: vector addition, denoted
v + w ∈ V for v, w ∈ V, and scalar multiplication, denoted av ∈ V for a ∈ F and v ∈ V, such
that the following axioms are satisfied:

1. v + w = w + v, v, w ∈ V.

2. u + (v + w) = (u + v) + w, u, v, w ∈ V.

3. There exists an element 0 ∈ V, called the zero vector, such that v + 0 = v, v ∈ V.

4. For each v ∈ V there exists an element ṽ ∈ V, called the additive inverse of v, such that v + ṽ = 0.

5. a(bv) = (ab)v, a, b ∈ F and v ∈ V.

6. a(v + w) = av + aw, a ∈ F and v, w ∈ V.

7. (a + b)v = av + bv, a, b ∈ F and v ∈ V.

8. 1v = v, v ∈ V, where 1 denotes the multiplicative identity in F.

The elements of a vector space are called vectors.

Subspaces - Direct sum


A subset S of a vector space V is a subspace of V if it is a vector space with respect to
the vector space operations on V. A subspace which is a proper subset of the whole space
is called a proper subspace. Two subspaces V1 and V2 are complementary or disjoint if
V1 ∩ V2 = {0}.
The sum of two subspaces V1 and V2 is the subspace V = V1 +V2 of elements v = v1 +v2 , v1 ∈
V1 , v2 ∈ V2 . If the subspaces V1 and V2 are complementary V = V1 + V2 is called direct sum
and indicated as V = V1 ⊕V2 . This implies that each element v ∈ V has a unique decomposition
v = v1 + v2 , v1 ∈ V1 , v2 ∈ V2 .
For the sets V_1 and V_2, the set {v ∈ V_1 : v ∉ V_2} is denoted by V_1 \ V_2.
Linear operators and linear functionals
Let V_1 and V_2 be vector spaces. A mapping Â : V_1 → V_2 is a linear operator if

Â(v + w) = Âv + Âw, Â(av) = aÂv,

for all v, w ∈ V1 and a ∈ F . V1 is called the domain of  and V2 its codomain or image. If
the codomain of a linear operator is a scalar field, the operator is called a linear functional
on V1 . The set of all linear functionals on V1 is called the dual space of V1 .
The adjoint of an operator Â : V_1 → V_2 is the unique operator Â* satisfying

⟨Âg_1, g_2⟩ = ⟨g_1, Â*g_2⟩.

If Â* = Â the operator is self-adjoint or Hermitian.


An operator Â : V_1 → V_2 has an inverse if there exists Â⁻¹ : V_2 → V_1 such that

Â⁻¹Â = Î_{V_1} and ÂÂ⁻¹ = Î_{V_2},

where Î_{V_1} and Î_{V_2} denote the identity operators on V_1 and V_2, respectively. By a generalised
inverse we shall mean an operator Â† satisfying the following conditions:

Â Â† Â = Â,
Â† Â Â† = Â†.

The unique generalised inverse satisfying, in addition,

(Â Â†)* = Â Â†,
(Â† Â)* = Â† Â,

is known as the Moore-Penrose pseudoinverse.
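As a quick numerical illustration (a sketch of ours, not one of the tutorial's routines), these conditions can be checked with NumPy, whose pinv computes the Moore-Penrose pseudoinverse via the SVD:

import numpy as np

# Verify the generalised-inverse and Moore-Penrose conditions for a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
A_dag = np.linalg.pinv(A)

# Generalised inverse conditions
assert np.allclose(A @ A_dag @ A, A)
assert np.allclose(A_dag @ A @ A_dag, A_dag)
# Moore-Penrose conditions: both products are Hermitian
assert np.allclose((A @ A_dag).conj().T, A @ A_dag)
assert np.allclose((A_dag @ A).conj().T, A_dag @ A)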


Frames and bases for a finite dimensional vector space
If v1 , . . . , vn are some elements of a vector space V, by a linear combination of v1 , . . . , vn
we mean an element in V of the form a1 v1 + · · · + an vn , with ai ∈ F, i = 1, . . . , n.
Let S be a subset of elements of V. The set of all linear combinations of elements of S is
called the span of S and is denoted by span S.
A subset S = {v_i}_{i=1}^{n} of V is said to be linearly independent if and only if

a_1 v_1 + · · · + a_n v_n = 0 =⇒ a_i = 0, i = 1, . . . , n.

A subset is said to be linearly dependent if it is not linearly independent.


S is said to be a basis of V if it is linearly independent and span S = V. The dimension
of a finite dimensional vector space V is the number of elements in a basis for V. The number
of elements in a set is termed the cardinality of the set.
If the number of elements spanning a finite dimensional vector space V is larger than the
number of elements of a basis for the same space, the set is called a redundant frame. In
other words a redundant frame (henceforth called simply a frame) for a finite dimensional vector
space V is a linearly dependent set of vectors spanning V.
Let {v_i}_{i=1}^{n} be a spanning set for V. Then every v ∈ V can be expressed as

v = a_1 v_1 + · · · + a_n v_n, with a_i ∈ F, i = 1, . . . , n.

If the spanning set {v_i}_{i=1}^{n} is a basis for V the numbers a_i ∈ F, i = 1, . . . , n, in the above
decomposition are unique. In the case of a redundant frame, however, these numbers are no
longer unique. For further discussion about redundant frames in finite dimension see [1].
Normed vector space
A norm ‖ · ‖ on a vector space V is a function from V to R such that for every v, w ∈ V
and a ∈ F the following three properties are fulfilled:

1. ‖v‖ ≥ 0, and ‖v‖ = 0 ⇐⇒ v = 0.

2. ‖av‖ = |a| ‖v‖.

3. ‖v + w‖ ≤ ‖v‖ + ‖w‖.

A vector space V together with a norm is called a normed vector space.

Inner product space
An inner product on a vector space V is a map from V × V to F which satisfies the following
axioms:

1. ⟨v, v⟩ ≥ 0, v ∈ V, and ⟨v, v⟩ = 0 ⇐⇒ v = 0.

2. ⟨v + w, z⟩ = ⟨v, z⟩ + ⟨w, z⟩, v, w, z ∈ V.

3. ⟨v, az⟩ = a⟨v, z⟩, v, z ∈ V and a ∈ F.

4. ⟨v, w⟩ = \overline{⟨w, v⟩}, v, w ∈ V, the bar indicating complex conjugation.

A vector space V together with an inner product h·, ·i is called an inner product space.

Orthogonality
Two vectors v and w in an inner product space are said to be orthogonal if ⟨v, w⟩ = 0. If,
in addition, ‖v‖ = ‖w‖ = 1 they are orthonormal.
Two subspaces V_1 and V_2 are orthogonal if ⟨v_1, v_2⟩ = 0 for all v_1 ∈ V_1 and v_2 ∈ V_2. The
sum of two orthogonal subspaces V_1 and V_2 is termed an orthogonal sum and will be indicated
as V = V_1 ⊕⊥ V_2. The subspace V_2 is called the orthogonal complement of V_1 in V. Equiv-
alently, V_1 is the orthogonal complement of V_2 in V.

The spaces Rn , L2 [a, b] and C k [a, b]


The Euclidean space R^n is an inner product space with inner product defined by

⟨x, y⟩ = x_1 y_1 + · · · + x_n y_n,

with x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n). The norm ‖x‖ is induced by the inner product:

‖x‖ = ⟨x, x⟩^{1/2} = (x_1 x_1 + · · · + x_n x_n)^{1/2} = (|x_1|² + · · · + |x_n|²)^{1/2}.

The space L²[a, b] is an inner product space of functions on [a, b] with inner product defined
by

⟨f, g⟩ = ∫_a^b f(t) \overline{g(t)} dt

and norm

‖f‖ = ⟨f, f⟩^{1/2} = ( ∫_a^b |f(t)|² dt )^{1/2}.
The space C^k[a, b] is the space of functions on [a, b] having continuous derivatives up to
order k ∈ N. The space of continuous functions on [a, b] is denoted as C^0[a, b].

Signal Representation, Reconstruction, and Projection


Regardless of its informational content, in this tutorial we consider that a signal is an element
of an inner product space H with norm induced by the inner product, ‖ · ‖ = ⟨·, ·⟩^{1/2}. Moreover,
we assume that all the signals of interest belong to some finite dimensional subspace V of
H spanned by the set {v_i ∈ H}_{i=1}^{M}. Hence, a signal f can be expressed by a finite linear
superposition

f = ∑_{i=1}^{M} c_i v_i,

where the coefficients c_i, i = 1, . . . , M, are in F.


We call measurement or sampling the process of transforming a signal into a number.
Hence a measure or sample is a functional. Because we restrict considerations to linear
measures, the associated functional is linear and can be expressed as

m = ⟨w, f⟩ for some w ∈ H.

We refer to the vector w as the measurement vector.


Considering M measurements m_i, i = 1, . . . , M, each of which is obtained by a measurement
vector w_i, we have a numerical representation of f as given by

m_i = ⟨w_i, f⟩, i = 1, . . . , M.

Now we want to answer the question as to whether it is possible to reconstruct f ∈ V from
these measurements. More precisely, we wish to find the requirements we need to impose upon
the measurement vectors w_i, i = 1, . . . , M, so as to use the concomitant measures ⟨w_i, f⟩, i =
1, . . . , M, as coefficients for the signal reconstruction, i.e., we wish to have

f = ∑_{i=1}^{M} c_i v_i = ∑_{i=1}^{M} ⟨w_i, f⟩ v_i.    (1)

By denoting

Ê = ∑_{i=1}^{M} v_i ⟨w_i, ·⟩,    (2)

where the operation ⟨w_i, ·⟩ indicates that Ê acts by taking inner products, (1) is written as

f = Êf.

As will be discussed next, the above equation tells us that the measurement vectors wi , i =
1, . . . , M , should be such that the operator Ê is a projector onto V.

Projectors
An operator Ê : H → V is a projector if it is idempotent, i.e.,

Ê 2 = Ê.

As a consequence, the projection is onto R(Ê), the range of the operator, and along N (Ê), the
null space of the operator. Let us recall that

R(Ê) = {f, such that f = Êg, g ∈ H}.

Thus, if f ∈ R(Ê),
Êf = Ê 2 g = Êg = f.

This implies that Ê behaves like the identity operator for all f ∈ R(Ê), regardless of N(Ê),
which is defined as

N(Ê) = {g, such that Êg = 0, g ∈ H}.
It is clear then that to reconstruct a signal f ∈ V by means of (1) the involved measurement
vectors w_i, i = 1, . . . , M, which we shall henceforth also call duals, should give rise to an
operator of the form (2), which must be a projector onto V. Notice that the required operator
is not unique, because there exist many projectors onto V having different N (Ê). Thus, for
reconstructing signals in the range of the projector its null space can be chosen arbitrarily.
However, the null space becomes extremely important when the projector acts on signals outside
its range. A popular projector is the orthogonal one.

Oblique and Orthogonal Projector


When N(Ê) happens to be equal to R(Ê)⊥, which indicates the orthogonal complement of
R(Ê), the projector is called an orthogonal projector onto R(Ê). This is the case if and only
if the projector is self-adjoint.
A projector which is not orthogonal is called an oblique projector, and we need two
subscripts to represent it: one subscript to indicate the range of the projector and another to
represent the subspace along which the projection is performed. Hence the projector onto V
along W⊥ is indicated as Ê_{VW⊥}.
The particular case Ê_{VV⊥} corresponds to an orthogonal projector and we use the special
notation P̂_V to indicate such a projector. When a projector onto V is used for signal processing,
W ⊥ can be chosen according to the processing task. The examples below illustrate two different
situations.

Example 1
Let us assume that the signal processing task is to approximate a signal f ∈ H by a signal
f_V ∈ V. In this case, one normally would choose f_V = P̂_V f, because this is the unique signal in
V minimizing the distance ‖f − f_V‖. Indeed, let us take another signal g in V and write
f − g = (f − P̂_V f) + (P̂_V f − g). Since f − P̂_V f is orthogonal to every signal in V we have

‖f − g‖² = ‖f − P̂_V f‖² + ‖P̂_V f − g‖².

Hence ‖f − g‖ is minimized if g = P̂_V f.


Remark 1. Any other projection would yield a distance ‖f − Ê_{VW⊥} f‖ which satisfies [2]

‖f − P̂_V f‖ ≤ ‖f − Ê_{VW⊥} f‖ ≤ (1/cos θ) ‖f − P̂_V f‖,

where θ is the minimum angle between the subspaces V and W. The equality is attained for
V = W, which corresponds to the orthogonal projection.

Example 2
Assume that the signal f to be analyzed here is the superposition of two signals, f = f1 +f2 ,
each component being produced by a different phenomenon we want to discriminate. Let us
assume further that f1 ∈ V and f2 ∈ W ⊥ with V and W ⊥ disjoint subspaces. Thus, we can
obtain, f1 say, from f , by an oblique projector onto V and along W ⊥ . The projector will map
to zero the component f2 to produce

f1 = ÊVW ⊥ f.

Construction of Oblique Projectors for signal splitting


Oblique projectors in the context of signal reconstruction were introduced in [2] and further an-
alyzed in [3]. The application to signal splitting, also termed structured noise filtering, amongst
a number of other applications, is discussed in [4]. For advanced theoretical studies of oblique
projector operators in infinite dimensional spaces see [5, 6]. We restrict our consideration to
numerical constructions in finite dimension, with the aim of addressing the problem of signal
splitting when the problem is ill posed.
Given V and W ⊥ disjoint, i.e., such that V ∩ W ⊥ = {0}, in order to provide a prescription
for constructing ÊVW ⊥ we proceed as follows. Firstly we define S as the direct sum of V and
W ⊥ , which we express as
S = V ⊕ W ⊥.
Let W = (W ⊥ )⊥ be the orthogonal complement of W ⊥ in S. Thus we have

S = V ⊕ W ⊥ = W ⊕⊥ W ⊥ .

The operations ⊕ and ⊕⊥ are termed direct and orthogonal sum, respectively.
Considering that {v_i}_{i=1}^{M} is a spanning set for V, a spanning set for W is obtained as

u_i = v_i − P̂_{W⊥} v_i = P̂_W v_i, i = 1, . . . , M.

Denoting by {e_i}_{i=1}^{M} the standard orthonormal basis in F^M, i.e., the basis with Euclidean
inner products ⟨e_i, e_j⟩ = δ_{ij}, we define the operators V̂ : F^M → V and Û : F^M → W as

V̂ = ∑_{i=1}^{M} v_i ⟨e_i, ·⟩,  Û = ∑_{i=1}^{M} u_i ⟨e_i, ·⟩.

Thus the adjoint operators V̂* and Û* are

V̂* = ∑_{i=1}^{M} e_i ⟨v_i, ·⟩,  Û* = ∑_{i=1}^{M} e_i ⟨u_i, ·⟩.

It follows that P̂_W V̂ = Û and Û* P̂_W = Û*; hence, Ĝ : F^M → F^M defined as

Ĝ = Û* V̂ = Û* Û

is self-adjoint and its matrix representation, G, has elements

g_{i,j} = ⟨u_i, v_j⟩ = ⟨P̂_W u_i, v_j⟩ = ⟨u_i, P̂_W v_j⟩ = ⟨u_i, u_j⟩, i, j = 1, . . . , M.

From now on we restrict our signal space to be S, since we would like to build the oblique
projector Ê_{VW⊥} onto V and along W⊥ having the form

Ê_{VW⊥} = ∑_{i=1}^{M} v_i ⟨w_i, ·⟩.    (3)

Clearly, for the operator to map to zero every vector in W⊥ the dual vectors {w_i}_{i=1}^{M} must span
W = (W⊥)⊥ = span{u_i}_{i=1}^{M}. This entails that for each w_i there exists a set of coefficients
{b_{i,j}}_{j=1}^{M} such that

w_i = ∑_{j=1}^{M} b_{i,j} u_j,    (4)

which guarantees that every wi is orthogonal to all vectors in W ⊥ and therefore W ⊥ is included
in the null space of ÊVW ⊥ . Moreover, since every signal, g say, in S can be written as g =
gV + gW ⊥ with gV ∈ V and gW ⊥ ∈ W ⊥ , the fact that ÊVW ⊥ g = 0 implies gV = 0. Hence,
g = gW ⊥ , which implies that the null space of ÊVW ⊥ restricted to S is W ⊥ .
In order for Ê_{VW⊥} to be a projector it is necessary that Ê_{VW⊥}² = Ê_{VW⊥}. As will be shown
in the next proposition, if the coefficients b_{i,j} are the matrix elements of a generalised inverse
of the matrix G this condition is fulfilled.
Proposition 1. If the coefficients b_{i,j} in (4) are the matrix elements of a generalised inverse
of the matrix G, which has elements g_{i,j} = ⟨v_i, u_j⟩, i, j = 1, . . . , M, the operator in (3) is a
projector.
Proof. For the measurement vectors in (4) to yield a projector of the form (3), the corresponding
operator should be idempotent, i.e.,

∑_{n=1}^{M} ∑_{m=1}^{M} ∑_{i=1}^{M} ∑_{j=1}^{M} v_i b_{i,j} ⟨u_j, v_n⟩ b_{n,m} ⟨u_m, ·⟩ = ∑_{i=1}^{M} ∑_{j=1}^{M} v_i b_{i,j} ⟨u_j, ·⟩.    (5)

Defining

B̂ = ∑_{i=1}^{M} ∑_{j=1}^{M} e_i b_{i,j} ⟨e_j, ·⟩    (6)

and using the operators V̂ and Û*, as given above, the left hand side in (5) can be expressed as

V̂ B̂* Û* V̂ B̂* Û*    (7)

and the right hand side as

V̂ B̂* Û*.    (8)
Assuming that B̂* is a generalised inverse of (Û* V̂), indicated as B̂* = (Û* V̂)†, it satisfies
(cf. the conditions defining a generalised inverse, given earlier)

B̂* (Û* V̂) B̂* = B̂*,    (9)

and therefore, from (7), the right hand side of (5) follows. Since B̂* = (Û* V̂)† = Ĝ† and
Ĝ* = Ĝ, we have B̂ = Ĝ†. Hence, if the elements b_{i,j} determining B̂ in (6) are the matrix
elements of a generalised inverse of the matrix representation of Ĝ, the corresponding vectors
{w_i}_{i=1}^{M} obtained by (4) yield an operator of the form (3), which is an oblique projector.

Property 1. Let Ê_{VW⊥} be the oblique projector onto V along W⊥ and P̂_W the orthogonal
projector onto W = (W⊥)⊥. Then the following relation holds:

P̂_W Ê_{VW⊥} = P̂_W.

Proof. Ê_{VW⊥} given in (3) can be recast, in terms of the operators V̂ and Û*, as

Ê_{VW⊥} = V̂ (Û* V̂)† Û*.

Applying P̂_W to both sides of the equation we obtain

P̂_W Ê_{VW⊥} = P̂_W V̂ (Û* V̂)† Û* = Û (Û* V̂)† Û* = Û (Û* Û)† Û*,

which is a well known form for the orthogonal projector onto R(Û) = W.
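As a concrete illustration, the construction used in the proof can be assembled in a few lines of NumPy (a minimal sketch of ours, not the tutorial's ObliProj.m routine; random spanning sets are assumed for V and W⊥):

import numpy as np

rng = np.random.default_rng(1)
n, M, K = 20, 5, 4                      # ambient dimension, dim V, dim W_perp
V = rng.standard_normal((n, M))         # columns span V
Y = rng.standard_normal((n, K))         # columns span W_perp

# Orthogonal projector onto W_perp via an orthonormal basis (QR)
Q, _ = np.linalg.qr(Y)
P_Wperp = Q @ Q.T

# u_i = v_i - P_{W_perp} v_i spans W = (W_perp)^perp within S = V + W_perp
U = V - P_Wperp @ V

# Oblique projector onto V along W_perp: E = V (U* V)^dagger U*
E = V @ np.linalg.pinv(U.T @ V) @ U.T

# Sanity checks: idempotent, identity on V, zero on W_perp
assert np.allclose(E @ E, E)
assert np.allclose(E @ V, V)
assert np.allclose(E @ Y, np.zeros_like(Y), atol=1e-8)

The checks hold whenever V ∩ W⊥ = {0}, which is the generic situation for random spanning sets with M + K ≤ n.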

Remark 2. Notice that the operative steps for constructing an oblique projector are equivalent
to those for constructing an orthogonal one. The difference is that in general the spaces
span{v_i}_{i=1}^{M} = V and span{w_i}_{i=1}^{M} = W are different. For the special case u_i = v_i, i =
1, . . . , M, both sets of vectors span V and we have an orthogonal projector onto V along V⊥.

Example 3
Suppose that the chirp signal in the first graph of Fig. 1 is corrupted by impulsive noise
belonging to the subspace

W⊥ = span{y_j(t) = e^{−100000(t−0.05j)²}, t ∈ [0, 1]}_{j=1}^{200}.

The chirp after being corrupted by a realization of the noise consisting of 95 pulses taken
randomly from elements of W ⊥ is plotted in the second graph of Fig. 1.
Consider that the signal subspace is well represented by V given by

V = span{v_{i+1}(t) = cos(πit), t ∈ [0, 1]}_{i=0}^{M−1=99}.

In order to eliminate the impulsive noise from the chirp we have to compute the measurement
vectors {w_i}_{i=1}^{100}, here functions of t ∈ [0, 1], determining the appropriate projector. For this we
first need a representation of P̂_{W⊥}, which is obtained simply by transforming the set {y_j}_{j=1}^{200}
into an orthonormal set {o_j}_{j=1}^{200} to have

P̂_{W⊥} = ∑_{j=1}^{200} o_j ⟨o_j, ·⟩.

The function for constructing an orthogonal projector in a number of different ways is OrthProj.m.
With P̂_{W⊥} we construct the vectors

u_{i+1}(t) = cos(πit) − ∑_{j=1}^{200} o_j(t) ⟨o_j(t), cos(πit)⟩, i = 0, . . . , 99, t ∈ [0, 1].

Figure 1: Chirp signal (first graph). Chirp corrupted by 95 randomly taken pulses (middle
graph). Chirp denoised by oblique projection (last graph).

The inner products involved in the above equations and in the elements, g_{i,j}, of the matrix G,

g_{i+1,j+1} = ∫_0^1 u_{i+1}(t) cos(πjt) dt, i, j = 0, . . . , 99,

are computed numerically. This matrix has an inverse, which is used to obtain the functions
{w_i(t), t ∈ [0, 1]}_{i=1}^{100} giving rise to the required oblique projector. The chirp filtered by such a
projector is depicted in the last graph of Fig. 1.
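A discretized sketch of this example is easy to run (ours, not the tutorial's code): we sample t on a grid so that functions become vectors and the integrals become sums. The chirp form below and the pulse centres 0.005j (chosen so that all 200 pulses fall inside [0, 1]) are assumptions for illustration.

import numpy as np

t = np.linspace(0, 1, 2000)
j = np.arange(1, 201)
i = np.arange(0, 100)

Y = np.exp(-100000 * (t[:, None] - 0.005 * j[None, :]) ** 2)  # spans W_perp
V = np.cos(np.pi * t[:, None] * i[None, :])                   # spans V

f = np.cos(2 * np.pi * 10 * t ** 2)                 # a chirp (assumed form)
rng = np.random.default_rng(0)
noise = Y[:, rng.choice(200, size=95, replace=False)].sum(axis=1)

Q, _ = np.linalg.qr(Y)                  # orthonormal set {o_j} for W_perp
U = V - Q @ (Q.T @ V)                   # u_{i+1} = cos(pi i t) - P_{W_perp} cos(pi i t)
E = V @ np.linalg.pinv(U.T @ V) @ U.T   # oblique projector onto V along W_perp

denoised = E @ (f + noise)              # the pulses are mapped to zero
print(np.max(np.abs(denoised - E @ f))) # close to machine precision: noise annihilated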

Example 4
Here we deal with an image of a poster in memory of the Spanish Second Republic shown
in the first picture of Fig. 2. This image is an array of 609 × 450 pixels that we have processed
row by row. Each row is a vector Ii ∈ R450 , i = 1, . . . , 609. The image is affected by structured
noise produced when random noise passes through a channel characterized by a given matrix
A having 160 columns and 450 rows. The model for each row Ioi ∈ R609 of the noisy image is

Ioi = Ii + Ahi , i = 1, . . . , 450,

where each hi is a random vector in R160 . The image plus noise is represented in the middle
graph of Fig. 2. In order to denoised the image we consider that every row Ii ∈ R450 is well
represented in a subspace V spanned by discrete cosines. More precisely, we consider Ii ∈ V for
i = 1, . . . , 450, where
{ ( ) }290
π(2j − 1)(i − 1)
V = span xi = cos , j = 1, . . . , 609 .
2L i=1

The space of the noise is spanned by the 160 vectors in R450 corresponding to the columns of
the given matrix A. The image, after being filtered row by row by the oblique projector onto
V and along the space of the noise, is depicted in the last graph of Fig. 2.
Possible constructions of oblique projector
Notice that the oblique projector onto V is independent of the selection of the spanning set for
W. The possibility of using different spanning sets yields a number of different ways of com-
puting ÊVW ⊥ , all of them theoretically equivalent, but not necessarily numerically equivalent
when the problem is ill posed.

Figure 2: Image of a poster in memory of the Spanish Second Republic (first picture). Image
plus structured noise (middle picture). The image obtained from the middle picture by an
oblique projection (last picture).

Given the sets {v_i}_{i=1}^{M} and {u_i}_{i=1}^{M} we have considered the following theoretically equivalent
ways of computing the vectors {w_i}_{i=1}^{M}:

i) w_i = ∑_{j=1}^{M} g̃_{i,j} u_j, where g̃_{i,j} is the (i, j)-th element of the inverse of the matrix G having
elements g_{i,j} = ⟨u_i, v_j⟩, i, j = 1, . . . , M.

ii) Vectors {w_i}_{i=1}^{M} are as in i) but the matrix elements of G are computed as g_{i,j} =
⟨u_i, u_j⟩, i, j = 1, . . . , M.

iii) Orthonormalising {u_i}_{i=1}^{M} to obtain {q_i}_{i=1}^{M′}, M′ ≤ M, the vectors {w_i}_{i=1}^{M} are then computed
as

w_i = ∑_{j=1}^{M′} g̃_{i,j} q_j,

with g_{i,j} = ⟨q_i, v_j⟩, i, j = 1, . . . , M.

iv) Same as in iii) but with g_{i,j} = ⟨q_i, u_j⟩, i, j = 1, . . . , M.

Moreover, considering that ψ_n ∈ F^M, n = 1, . . . , M, are the eigenvectors of Ĝ, and assuming that
there exist N nonzero eigenvalues which we order in descending fashion as λ_n, n =
1, . . . , N, we can express the matrix elements of the Moore-Penrose pseudoinverse of G as

g†_{i,j} = ∑_{n=1}^{N} (1/λ_n) ψ_n(i) ψ_n*(j),    (10)

with ψ_n(i) the i-th component of ψ_n. Moreover, the orthonormal vectors

ξ_n = Ŵ ψ_n / σ_n, σ_n = √λ_n, n = 1, . . . , N,    (11)

are singular vectors of Ŵ, which satisfies Ŵ* ξ_n = σ_n ψ_n, as is immediate to verify. By defining
now the vectors η_n, n = 1, . . . , N, as

η_n = V̂ ψ_n / σ_n, n = 1, . . . , N,    (12)

the projector Ê_{VW⊥} in (3) is recast as

Ê_{VW⊥} = ∑_{n=1}^{N} η_n ⟨ξ_n, ·⟩.    (13)

Conversely, the representation (3) of Ê_{VW⊥} arises from (13), since

w_i = ∑_{n=1}^{N} (1/σ_n) ξ_n ψ_n*(i), i = 1, . . . , M.    (14)

Proposition 2. The vectors ξ_n ∈ W, n = 1, . . . , N, and η_n ∈ V, n = 1, . . . , N, given in (11)
and (12) are biorthogonal to each other and span W and V, respectively.

The proof of this proposition can be found in [7], Appendix A.
All the different numerical computations for an oblique projector discussed above can be
realized with the routine ObliProj.m.
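Numerically, equations (10)-(14) suggest building the duals from the eigendecomposition of G. The following sketch (ours, not the ObliProj.m routine) does this under the assumption that the operator written Ŵ in (11) acts like Û, so that Ŵ*Ŵ = Ĝ:

import numpy as np

def oblique_duals(U, tol=1e-10):
    """Columns of the returned matrix are the duals w_i of eq. (14)."""
    G = U.conj().T @ U
    lam, Psi = np.linalg.eigh(G)             # eigenvalues in ascending order
    keep = lam > tol * lam.max()             # keep the N nonzero eigenvalues
    lam, Psi = lam[keep], Psi[:, keep]
    sigma = np.sqrt(lam)                     # singular values, eq. (11)
    Xi = (U @ Psi) / sigma                   # xi_n = U psi_n / sigma_n
    # eq. (14): w_i = sum_n (1/sigma_n) xi_n psi_n*(i), written as a matrix product
    return (Xi / sigma) @ Psi.conj().T

# usage: with V holding the v_i as columns, E = V @ oblique_duals(U).conj().T
# reproduces the projector of eq. (3)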

The need for nonlinear approaches

Even when the subspaces V and W⊥ are ‘theoretically’ complementary, in practice, due to the
calculations being performed in finite precision arithmetic, the inaccuracy in the computation
of the corresponding projector may cause the failure to correctly separate signals in V and W⊥.
The next example illustrates the situation.

Example 5
Let V be the cardinal cubic spline space with distance 0.01 between consecutive knots, on the
interval [0, 1]. This is a subspace of dimension M = 103, which we span using a B-spline basis

B = {B_i(x), x ∈ [0, 1]}_{i=1}^{103}.

The functions Bi (x) in V are obtained by translations of a prototype function and the restriction
to the interval [0, 1]. A few of such functions are plotted in the left graph of Fig. 3.
Taking, randomly, 30 B-splines {B_{ℓ_i}}_{i=1}^{30} from B we simulate a signal by a weighted super-
position of such functions, i.e.,

f_V(x) = ∑_{i=1}^{30} c_{ℓ_i} B_{ℓ_i}(x), x ∈ [0, 1],    (15)

with the coefficients c_{ℓ_i} randomly chosen from [0, 1].


We simulate a background by considering that it belongs to the subspace W⊥ spanned by the
set of functions

Y = {y_j(x) = (x + 0.01j)^{−0.01j}, x ∈ [0, 1]}_{j=1}^{50}.


Figure 3: Left graph: cubic B-spline functions, in the range x ∈ [0.1, 0.3], from the set spanning
the space of the signal response. Right graph: three of the functions spanning the space of the
background.

A few functions from this set are plotted in the right graph of Fig. 3 (normalized to unity on
[0, 1]). The background, g(x), is generated by the linear combination

g(x) = ∑_{j=1}^{50} j⁴ e^{−0.05j} y_j(x).    (16)

To simulate the data we have perturbed the superposition of (15) and (16) by ‘very small’
Gaussian errors (of variance up to 0.00001% of the value of each data point). The simulated data
are plotted in the left graph of Fig. 4.
This example is very illustrative of how sensitive to errors the oblique projection is. The
subspaces we are dealing with are disjoint: the last five singular values of the operator Ŵ* (c.f. (11))
are

0.3277, 0.3276, 1.0488 × 10⁻⁴, 6.9356 × 10⁻⁸, 2.3367 × 10⁻¹⁰,
while the first is σ1 = 1.4493. The smallest singular value cannot be considered a numerical
representation of zero, when the calculations are being carried out in double precision arith-
metic. Hence, one can assert that the condition V ∩ W ⊥ = {0} is fulfilled. However, due to
the three small singular values the oblique projector along W ⊥ onto the whole subspace V is
very unstable, which causes the failure to correctly separate signals in V from the background.
The result of applying the oblique projector onto the signal of the left graph is represented
by the broken line in the right graph. As can be observed, the projection does not yield the
required signal, which is represented by the continuous dark line in the same graph. Now,
since the spectrum of singular values has a clear jump (the last three singular values are far
from the previous ones) it might seem that one could regularize the projection by truncation
of singular values. Nevertheless, such a methodology turns out to be not appropriate for the
present problem, as it does not yield the correct separation. Propositions 3 below analyzes the
effect that regularization by truncation of singular values produces in the resulting projection.
Proposition 3. Truncation of the expansion (13) to consider up to r terms produces an oblique
projector onto Ṽ_r = span{η_i}_{i=1}^{r} and along W̃⊥ = W̃_r⊥ + W̃_0 + Ṽ_0, with W̃_r = span{ξ_i}_{i=1}^{r},
W̃_0 = span{ξ_i}_{i=r+1}^{N} and Ṽ_0 = span{η_i}_{i=r+1}^{N}.


Figure 4: Left graph: signal plus background. Right graph: the dark continuous line cor-
responds to the signal to be discriminated from the one in the left graph. The broken line
corresponds to the approximation resulting from the oblique projection. The three closed lines
correspond to the approximations obtained by truncation of one, two, and three singular values.

The proof of this proposition can be found in [7], Appendix B. For complete studies of a
projector when V ∩ W⊥ ≠ {0} see [8] and [9].
This example illustrates, very clearly, the need for nonlinear approaches: We know that a
unique and stable solution does exist, since the signal which is to be discriminated from the
background actually belongs to a subspace of the given spline space, and the construction of
the oblique projector onto such a subspace is well posed. However, the lack of knowledge
about the subspace prevents the separation of the signal components by a linear operation.
A possibility to tackle the problem is to transform it into that of finding the subspace to
which the sought signal component belongs. In this way the problem can be addressed by
nonlinear techniques.

Recursive updating/downdating of oblique projectors


Here we provide the equations for updating and downdating oblique projectors in order to
account for the following situations:
Let us consider that the oblique projector Ê_{V_k W⊥} onto the subspace V_k = span{v_i}_{i=1}^{k} along
a given subspace W⊥ is known. If the subspace V_k is enlarged to V_{k+1} by the inclusion of
one element, i.e., V_{k+1} = span{v_i}_{i=1}^{k+1}, we wish to construct Ê_{V_{k+1} W⊥} from the availability of
Ê_{V_k W⊥}. On the other hand, if the subspace V_k = span{v_i}_{i=1}^{k} is reduced by the elimination
of one element, say the j-th one, we wish to construct the corresponding oblique projector
Ê_{V_{k\j} W⊥} from the knowledge of Ê_{V_k W⊥}. The subspace W⊥ is assumed to be fixed. Its orthogonal
complement W_k in H_k = V_k ⊕ W⊥ changes with the index k to satisfy H_k = W_k ⊕⊥ W⊥.

Updating the oblique projector Ê_{V_k W⊥} to Ê_{V_{k+1} W⊥}
We assume that Ê_{V_k W⊥} is known and write it in the explicit form

Ê_{V_k W⊥} = ∑_{i=1}^{k} v_i ⟨w_i^k, ·⟩.    (17)

In order to inductively construct the duals w_i^{k+1}, i = 1, . . . , k + 1, we have to discriminate two
possibilities:

i) V_{k+1} = span{v_i}_{i=1}^{k+1} = span{v_i}_{i=1}^{k} = V_k, i.e., v_{k+1} ∈ V_k.

ii) V_{k+1} = span{v_i}_{i=1}^{k+1} ⊃ span{v_i}_{i=1}^{k} = V_k, i.e., v_{k+1} ∉ V_k.

Case i)

Proposition 4. Let v_{k+1} ∈ V_k and the vectors w_i^k in (17) be given. For an arbitrary vector
y_{k+1} ∈ H, the dual vectors w_i^{k+1} computed as

w_i^{k+1} = w_i^k − ⟨u_{k+1}, w_i^k⟩ y_{k+1}    (18)

for i = 1, . . . , k, together with w_{k+1}^{k+1} = y_{k+1}, produce the identical oblique projector as the dual vectors
w_i^k, i = 1, . . . , k.

Case ii)

Proposition 5. Let the vector v_{k+1} ∉ V_k and the vectors w_i^k, i = 1, . . . , k, in (17) be given. Then the
dual vectors w_i^{k+1} computed as

w_i^{k+1} = w_i^k − w_{k+1}^{k+1} ⟨u_{k+1}, w_i^k⟩,    (19)

where w_{k+1}^{k+1} = γ_{k+1}/‖γ_{k+1}‖² with γ_{k+1} = u_{k+1} − P̂_{W_k} u_{k+1}, provide us with the oblique projector Ê_{V_{k+1} W⊥}.

The proofs of these propositions are given in [10]. The codes for updating the dual vectors are
FrInsert.m and FrInsertBlock.m.
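A minimal sketch of the update (19) (ours, not the FrInsert.m routine), assuming the duals w_i^k are stored as the columns of a matrix W and an orthonormal basis Q_k of W_k is maintained alongside:

import numpy as np

def insert_vector(W, Qk, u_new):
    """Update the duals when u_{k+1} = u_new enlarges V_k (case ii)."""
    gamma = u_new - Qk @ (Qk.conj().T @ u_new)   # gamma_{k+1} = u_{k+1} - P_{W_k} u_{k+1}
    w_new = gamma / np.linalg.norm(gamma) ** 2   # w_{k+1}^{k+1}
    # w_i^{k+1} = w_i^k - w_{k+1}^{k+1} <u_{k+1}, w_i^k>
    W = W - np.outer(w_new, u_new.conj() @ W)
    Qk = np.column_stack([Qk, gamma / np.linalg.norm(gamma)])  # extend basis of W_k
    return np.column_stack([W, w_new]), Qk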

Property 2. If the vectors {v_i}_{i=1}^{k} are linearly independent they are also biorthogonal to the dual
vectors arising inductively from the recursive equation (19).

The proof of this property is in [10].

Remark 3. If the vectors {v_i}_{i=1}^{k} are not linearly independent the oblique projector Ê_{V_k W⊥} is not
unique. Indeed, if {w_i^k}_{i=1}^{k} are dual vectors giving rise to Ê_{V_k W⊥}, then one can construct infinitely
many duals as

ỹ_i = w_i^k + y_i − ∑_{j=1}^{k} y_j ⟨v_j, w_i^k⟩, i = 1, . . . , k,    (20)

where y_i, i = 1, . . . , k, are arbitrary vectors in H (see [10]).

Downdating the oblique projector Ê_{V_k W⊥} to Ê_{V_{k\j} W⊥}
Let us suppose that by the elimination of the element j the subspace V_k is reduced to V_{k\j} =
span{v_i}_{i=1, i≠j}^{k}. In order to give the equations for adapting the corresponding dual vectors gener-
ating the oblique projector Ê_{V_{k\j} W⊥} we need to consider two situations:

i) V_{k\j} = span{v_i}_{i=1, i≠j}^{k} = span{v_i}_{i=1}^{k} = V_k, i.e., v_j ∈ V_{k\j}.

ii) V_{k\j} = span{v_i}_{i=1, i≠j}^{k} ⊂ span{v_i}_{i=1}^{k} = V_k, i.e., v_j ∉ V_{k\j}.

Case i)
Proposition 6. Let Ê_{V_k W⊥} be given by (17) and let us assume that removing the vector v_j from
the spanning set of V_k leaves the identical subspace, i.e., v_j ∈ V_{k\j}. Hence, if the remaining
dual vectors are modified as follows:

w_i^{k\j} = w_i^k + ⟨u_j, w_i^k⟩ w_j^k / (1 − ⟨u_j, w_j^k⟩),    (21)

the corresponding oblique projector does not change, i.e., Ê_{V_{k\j} W⊥} = Ê_{V_k W⊥}.

Case ii)
Proposition 7. Let Ê_{V_k W⊥} be given by (17) and let us assume that the vector v_j to be removed
from the spanning set of V_k is not in V_{k\j}. In order to produce the oblique projector Ê_{V_{k\j} W⊥} the
appropriate modification of the dual vectors can be achieved by means of the following equation:

w_i^{k\j} = w_i^k − w_j^k ⟨w_j^k, w_i^k⟩ / ‖w_j^k‖².    (22)

The proofs of these propositions are given in [10]. The code for downdating the vectors is
FrDelete.m.
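A corresponding sketch of the downdating formula (22) (ours, not the FrDelete.m routine), again with the duals stored as the columns of W:

import numpy as np

def delete_vector(W, j):
    """Downdate the duals when v_j leaves the span (case ii)."""
    wj = W[:, j]
    coeffs = (wj.conj() @ W) / np.linalg.norm(wj) ** 2   # <w_j, w_i> / ||w_j||^2
    W = W - np.outer(wj, coeffs)                         # eq. (22)
    return np.delete(W, j, axis=1)                       # drop the (now zero) j-th dual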

Signals discrimination by subspace selection


We discuss now the possibility of extracting the signal f_V from a mixture f = f_V + f_{W⊥} when
the subspaces V and W⊥ are not well separated and the oblique projector onto V along W⊥
fails to yield the right signal separation. For this we introduce the following hypothesis on the
class of signals to be considered:
We assume that the signal of interest is K-sparse in a spanning set for V.
This implies that given {v_i}_{i=1}^{M} there exists a subset of elements, characterized by the set of
indices J of cardinality K < M, spanning the subspace Ṽ = span{v_ℓ}_{ℓ∈J}, and such that
f_V = Ê_{ṼW⊥} f. Thus, the hypothesis generates an, in general, intractable problem, because
the number of possible subspaces spanned by K vectors out of M is the combinatorial number
(M choose K) = M!/((M − K)! K!).
The techniques developed within the project aim at reducing complexity by making the
search for the right subspace signal dependent.

Given a signal f, and assuming that the subspaces W⊥ and V = span{v_i}_{i=1}^{M} are known, the
goal is to find {v_ℓ}_{ℓ∈J} ⊂ {v_i}_{i=1}^{M} spanning Ṽ and such that Ê_{ṼW⊥} f = Ê_{VW⊥} f. The cardinality of
the subset of indices J should be such that the construction of Ê_{ṼW⊥} is well posed. This assumption
characterizes the class of signals that can be handled by the proposed approaches.
Under the stated hypothesis, if the subspace Ṽ were known, one would have

Ê_{VW⊥} f = Ê_{ṼW⊥} f = ∑_{ℓ∈J} v_ℓ ⟨w_ℓ, f⟩.    (23)

However, if the computation of Ê_{VW⊥} is an ill posed problem, which is the situation we are
considering, Ê_{VW⊥} f is not available. In order to look for the subset of indices J yielding Ṽ
one may proceed as follows: Applying P̂_W to every term of (23) and using the properties
P̂_W Ê_{VW⊥} = P̂_W and P̂_W Ê_{ṼW⊥} = P̂_{W̃}, where W̃ = span{u_ℓ}_{ℓ∈J}, (23) becomes

P̂_W f = P̂_{W̃} f = ∑_{ℓ∈J} u_ℓ ⟨w_ℓ, f⟩.    (24)

Since W⊥ is given and P̂_W f = f − P̂_{W⊥} f, the left hand side of (24) is available and we can
search for the set {u_ℓ}_{ℓ∈J} ⊂ {u_i}_{i=1}^{M} in a stepwise manner by adaptive pursuit approaches.

Oblique Matching Pursuit (OBMP)


The criterion we use for the forward recursive selection of the set {v_ℓ}_{ℓ∈J} ⊂ {v_i}_{i=1}^{M} yielding
the right signal separation is in line with the consistency principle introduced in [2] and extended
in [3]. Furthermore, it happens to coincide with the Optimized Orthogonal Matching Pursuit
(OOMP) [12] approach applied to find the sparse representation of the projected signal P̂_W f
using the dictionary {u_i}_{i=1}^{M}.
Fixing P̂_{W_k}, at iteration k + 1 we select the index ℓ_{k+1} such that ‖P̂_W f − P̂_{W_{k+1}} f‖² is
minimized.
Proposition 8. Let J denote the set of indices {1, . . . , M} and J_k = {ℓ_1, . . . , ℓ_k} the set of indices
that have been previously chosen to determine W_k = span{u_{ℓ_i}}_{i=1}^{k}. The index ℓ_{k+1} corresponding
to the atom u_{ℓ_{k+1}} for which ‖P̂_W f − P̂_{W_{k+1}} f‖² is minimal is determined as

ℓ_{k+1} = arg max_{n∈J\J_k} |⟨γ_n, f⟩| / ‖γ_n‖, ‖γ_n‖ ≠ 0,    (25)

with γ_n = u_n − P̂_{W_k} u_n.

Proof. It readily follows since P̂_{W_{k+1}} f = P̂_{W_k} f + γ_n ⟨γ_n, f⟩/‖γ_n‖² and hence

‖P̂_W f − P̂_{W_{k+1}} f‖² = ‖P̂_W f‖² − ‖P̂_{W_k} f‖² − |⟨γ_n, f⟩|²/‖γ_n‖².

Because P̂_W f and P̂_{W_k} f are fixed, ‖P̂_W f − P̂_{W_{k+1}} f‖² is minimized if |⟨γ_n, f⟩|/‖γ_n‖, ‖γ_n‖ ≠ 0,
is maximal over all n ∈ J \ J_k.
The original OBMP selection criterion proposed in [11] selects the index ℓ_{k+1} as the maxi-
mizer over n ∈ J \ J_k of

|⟨γ_n, f⟩| / ‖γ_n‖², ‖γ_n‖ ≠ 0.

This condition was proposed in [11] based on the consistency principle, which states that the
reconstruction of a signal should be self consistent in the sense that, if the approximation is
measured with the same vectors, the same measures should be obtained (see [2, 3]). Accordingly,
the above OBMP criterion was derived in [11] in order to select the measurement vector w_{k+1}^{k+1}
producing the maximum consistency error Δ = |⟨w_{k+1}^{k+1}, f − Ê_{V_k W⊥} f⟩| with regard to a new
measurement w_{k+1}^{k+1}. However, since the measurement vectors are not normalized to unity, it
is sensible to consider the consistency error relative to the corresponding vector norm ‖w_{k+1}^{k+1}‖,
and select the index so as to maximize over k + 1 ∈ J \ J_k the relative consistency error

Δ̃ = |⟨w_{k+1}^{k+1}, f − Ê_{V_k W⊥} f⟩| / ‖w_{k+1}^{k+1}‖, ‖w_{k+1}^{k+1}‖ ≠ 0.    (26)

In order to cancel this error, the new approximation is constructed accounting for the concomi-
tant measurement vector.

Property 3. The index ℓ_{k+1} satisfying (25) maximizes over k + 1 ∈ J \ J_k the relative consis-
tency error (26).

Proof. Since for every vector w_{k+1}^{k+1} given in (19) ⟨w_{k+1}^{k+1}, Ê_{V_k W⊥} f⟩ = 0 and ‖w_{k+1}^{k+1}‖ = ‖γ_{k+1}‖⁻¹, we
have

Δ̃ = |⟨w_{k+1}^{k+1}, f⟩| / ‖w_{k+1}^{k+1}‖ = |⟨γ_{k+1}, f⟩| / ‖γ_{k+1}‖.

Hence, maximization of Δ̃ over k + 1 ∈ J \ J_k is equivalent to (25).
It is clear at this point that the forward selection of indices prescribed by (25)
is equivalent to selecting the indices by applying OOMP [12] on the projected signal P̂_W f using
the dictionary {u_i}_{i=1}^{M}. The routine implementing the pursuit strategy for subspace selection
according to criterion (25) is OBMP.m. An example of application is given in .
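The selection rule (25) is easy to prototype. The sketch below (ours, not the OBMP.m routine) assumes the vectors u_i are stored as the columns of U and that PWf holds the projected signal P̂_W f = f − P̂_{W⊥} f:

import numpy as np

def obmp_select(U, PWf, n_atoms, tol=1e-9):
    """Greedy subspace selection by criterion (25); returns the chosen indices."""
    n, M = U.shape
    Q = np.zeros((n, 0))                 # orthonormal basis of W_k, grown stepwise
    chosen = []
    for _ in range(n_atoms):
        Gamma = U - Q @ (Q.conj().T @ U)             # gamma_n = u_n - P_{W_k} u_n
        norms = np.linalg.norm(Gamma, axis=0)
        score = np.abs(Gamma.conj().T @ PWf)         # |<gamma_n, f>|
        score = np.where(norms > tol, score / np.maximum(norms, tol), -1.0)
        l = int(np.argmax(score))                    # maximize |<gamma_n,f>|/||gamma_n||
        chosen.append(l)
        Q = np.column_stack([Q, Gamma[:, l] / norms[l]])
    return chosen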

Implementing corrections
Let us discuss now the possibility of correcting bad moves in the forward selection, which
is especially necessary when dealing with ill posed problems. Indeed, assume we are trying to
approximate a signal which is K-sparse in a given dictionary, and the search for the right atoms
becomes ill posed after the iteration r, say, with r > K. The r-value just indicates that it is
not possible to continue with the forward selection, because the computations would become
inaccurate and unstable. Hence, if the right solution has not yet been found, one needs to implement
a strategy accounting for the fact that it is not feasible to compute more than r measurement
vectors. A possibility is provided by the swapping-based refinement of the OOMP approach
introduced in [13]. It consists of interchanging already selected atoms with nonselected ones.
Consider that at iteration r the correct subspace has not appeared yet and the selected
indices are labeled by the r indices ℓ_1, . . . , ℓ_r. In order to choose the index of the atom that
minimizes the norm of the residual error in passing from the approximation P̂_{W_r} f to the approximation
P̂_{W_{r\j}} f we should fix the index of the atom to be deleted, ℓ_j say, as the one for which the quantity

|c_i^r| / ‖w_i^r‖, i = 1, . . . , r,    (27)

is minimized [13, 14].
The process of eliminating one atom from the atomic decomposition is called a backward step,
while the process of adding one atom is called a forward step. The forward selection criterion to
choose the atom replacing the one eliminated in the previous step is accomplished by finding
the index ℓ_i, i = 1, . . . , r, for which the functional

e_n = |⟨ν_n, f⟩| / ‖ν_n‖, with ν_n = u_n − P̂_{W_{r\j}} u_n, ‖ν_n‖ ≠ 0,    (28)

is maximized. In our framework, using (22), the projector P̂_{W_{r\j}} is computed as

P̂_{W_{r\j}} = P̂_{W_r} − w_j^r ⟨w_j^r, ·⟩ / ‖w_j^r‖².

The swapping of pairs of atoms is repeated until the swapping operation, if carried out, would
not decrease the approximation error. The implementation details for an effective realization of
this process are given in [13]. Moreover, the process has been extended to include the swapping
of more than a pair of atoms. This possibility is of assistance in the application of signal
splitting, see [15]. A number of codes for implementing corrections to pursuit strategies can be
found in Pursuits.

Sparse representation by minimization of the q−norm like quantity - Handling the ill posed case
The problem of finding the sparsest representation of a signal for a given dictionary is equivalent
to minimization of the zero norm ‖c‖₀ (or counting measure) which is defined as

‖c‖₀ = ∑_{i=1}^{M} |c_i|⁰,

where c_i are the coefficients of the atomic decomposition

f_V = ∑_{i=1}^{M} c_i v_i.    (29)

Thus, ‖c‖₀ is equal to the number of nonzero coefficients in (29). However, since minimiza-
tion of ‖c‖₀ is numerically intractable, the minimization of ∑_{i=1}^{M} |c_i|^q, for 0 < q ≤ 1, has been
considered [16]. Because the minimization of ∑_{i=1}^{M} |c_i|^q, 0 < q < 1, does not lead to a convex
optimization problem, the most popular norm to minimize, when a sparse solution is required,
is the 1-norm ∑_{i=1}^{M} |c_i|. Minimization of the 1-norm is considered the best convex approx-
imant to the minimizer of ‖c‖₀ [17, 18]. In the context of the signal splitting already stated,
we are not particularly concerned about convexity, so we have considered the minimization
of ∑_{i=1}^{M} |c_i|^q, 0 < q ≤ 1, allowing for uncertainty in the available data [7]. This was imple-
mented by a recursive process for incorporating constraints, which is equivalent to the procedure
introduced in [19] and applied in [20].

Managing the constraints
Without loss of generality we assume here that the measurements on the signal in hand are given
by the values the signal takes at the sampling points x_j, j = 1, . . . , N. Thus, the measures are
obtained from f_W as f_W(x_j), j = 1, . . . , N, and the functionals u_i(x_j), j = 1, . . . , N, from the vectors
u_i, i = 1, . . . , M. Since the values f_W(x_j), j = 1, . . . , N, arise from observations or experiments
they are usually affected by errors; therefore we use the notation f_W^o(x_j), j = 1, . . . , N, for the
observed data and request that the model given by the r.h.s. of (29) satisfies the restriction

∑_{j=1}^{N} (f_W^o(x_j) − f_W(x_j))² ≤ δ,    (30)
with δ accounting for the data's error. Nevertheless, rather than using this restriction directly
as a constraint of the minimization, we handle the available information using an idea introduced
much earlier, in [19], and applied in [20], for transforming the constraint (30) into linear equality
constraints. Replacing f_W(x_j) by (29), the condition of minimal square distance ∑_{j=1}^{N} (f_W^o(x_j) −
f_W(x_j))² leads to the so called normal equations:

⟨u_n, f_W^o⟩ = ∑_{i=1}^{M} c_i ⟨u_n, u_i⟩, n = 1, . . . , M.    (31)

Of course, since we are concerned with ill posed problems we cannot use all these equations to
find the coefficients c_i, i = 1, . . . , M. However, as proposed in [19], we could use ‘some’ of these
equations as constraints of our optimization process, the number of such equations being the
one necessary to reach the condition (30). We have then transformed the original problem into
that of minimizing ∑_{i=1}^{M} |c_i|^q, 0 < q ≤ 1, subject to a number of equations selected from (31),
say the ℓ_n-th ones, n = 1, . . . , r. In line with [19] we select the subset of equations (31) in an
iterative fashion. We start from the initial estimation c_i = C, i = 1, . . . , M, where the constant
C is determined by minimizing the distance between the model and the data. Thus,

C = ∑_{n=1}^{M} ⟨u_n, f_W^o⟩ / ∑_{i=1}^{M} ∑_{n=1}^{M} ⟨u_i, u_n⟩.    (32)
With this initial estimation we ‘predict’ the normal equations (31) and select as our first
constraint the one worst predicted by the initial solution; let this equation be the ℓ_1-th one. We
then minimize ∑_{i=1}^{M} |c_i|^q subject to the constraint

⟨u_{ℓ_1}, f_W^o⟩ = ∑_{i=1}^{M} c_i ⟨u_{ℓ_1}, u_i⟩,    (33)

and indicate the resulting coefficients as c_i^{(1)}, i = 1, . . . , M. With these coefficients we predict
the equations (31) again and select the worst predicted one as a new constraint to obtain c_i^{(2)}, i = 1, . . . , M,
and so on. The iterative process is stopped when the condition (30) is reached.
The numerical example discussed next has been solved by recourse to the method for min-
imization of the (q-norm)^q published in [16]. Such an iterative method, called FOCal Under-
determined System Solver (FOCUSS) in that publication, is straightforward to implement.
It evolves by computing pseudoinverse matrices which, under the given hypothesis of our
problem, and within our recursive strategy for feeding the constraints, are guaranteed to be
numerically stable (for a detailed explanation of the method see [16]). The routine implementing
the proposed strategy is ALQMin.m.
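For orientation, a bare-bones version of the FOCUSS iteration of [16] reads as follows (a sketch of ours, without the recursive feeding of constraints or any regularization; A and b stand for the selected constraint equations, i.e., the rows ⟨u_{ℓ_n}, u_i⟩ and values ⟨u_{ℓ_n}, f_W^o⟩):

import numpy as np

def focuss(A, b, q=0.8, n_iter=30):
    """Minimize sum |c_i|^q subject to A c = b by reweighted minimum-norm steps."""
    M = A.shape[1]
    c = np.ones(M)                              # flat initial estimate, as in (32)
    for _ in range(n_iter):
        W = np.diag(np.abs(c) ** (1 - q / 2))   # reweighting matrix
        c = W @ np.linalg.pinv(A @ W) @ b       # minimum-norm solution in weighted space
    return c                                    # small entries are driven towards zero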

Numerical Simulation
We test the proposed approaches, first on the simulation of Example 5 and then extending that
simulation to consider a more realistic level of uncertainty in the data. Let us remark that the
signal is meant to represent an emission spectrum consisting of the superposition of spectral
lines (modeled by B-spline functions of support 0.04) which are centered at the positions (n −
1)Δ, n = 0, . . . , 102, with Δ = 0.01. Since the errors in the data in Example 5 are not significant,
both OBMP and the procedure outlined in the previous section accurately recover the spectrum
from the background, with any positive value of the q-parameter less than or equal to one. The
result (coinciding with the theoretical one) is shown in the top right graph of Fig. 5.
Now we transform the example into a more realistic situation by adding larger errors to the
data. In this case, the data set is perturbed by Gaussian errors of variance up to 1% of each
data point. Such a piece of data is plotted in the middle left graph of Fig. 5 and the spectrum
extracted by the q−norm like approach (for q = 0.8) is represented by the broken line in the
middle right graph of Fig. 5. The corresponding OBMP approximation is plotted in the first graph
of Fig. 6 and is slightly superior.
Finally we increase the data's error up to 3% of each data point (bottom left graph of
Fig. 5) and, in spite of the perceived significant distortion of the signal, we could still recover
a spectrum which, as shown by the broken line in the bottom right graph of Fig. 5, is a fairly
good approximation of the true one (continuous line). The OBMP approach is again superior,
as can be observed in the second graph of Fig. 6. Experiments for different realizations of the
errors (with the same variance) have produced results essentially equivalent. The same is true
for other realizations of the signal.

Adaptive non-uniform B-spline dictionaries


In this work, with Zhiqiang Xu [21], we consider the sparse representation matter for the large
class of signals which are amenable to satisfactory approximation in spline spaces [22]. Given
a signal, we have the double goal of a) finding a spline space for approximating the signal
and b) constructing those dictionaries for the space which are capable of providing a sparse
representation of such a signal. In order to achieve both aims we first discuss the
construction of dictionaries of B-spline functions for non-uniform partitions.

Background and notations


We refer to the fundamental books [23, 25] for a complete treatment of splines. Here we simply
introduce the notation and basic definitions which are needed in the present context.

Definition 1. Given a finite closed interval [c, d] we define a partition of [c, d] as the finite
set of points

Δ := {x_i}_{i=0}^{N+1}, N ∈ N, such that c = x_0 < x_1 < · · · < x_N < x_{N+1} = d.    (34)

We further define the N + 1 subintervals I_i, i = 0, . . . , N, as I_i = [x_i, x_{i+1}), i = 0, . . . , N − 1, and
I_N = [x_N, x_{N+1}].


Figure 5: Top left graph: signal plus background. Top right graph: recovered signal. Middle left
graph: signal distorted up to 1%. Middle right graph: q-norm like approach approximation
(broken line). Bottom graphs: same description as in the previous graphs but with the data distorted
up to 3%.

Definition 2. Let Π_m be the space of polynomials of degree smaller than or equal to m ∈ N_0 =
N ∪ {0}, i.e.,

Π_m = {p : p(x) = ∑_{i=0}^{m} a_i x^i, x ∈ R},

and define

S_m(Δ) = {f ∈ C^{m−2}[c, d] : f|_{I_i} ∈ Π_{m−1}, i = 0, . . . , N},    (35)


Figure 6: Same description as the middle right and bottom right graphs of Fig. 5 but applying the
OBMP method discussed earlier.

where f|_{I_i} indicates the restriction of the function f to the interval I_i.

The construction of nonuniform dictionaries arises from the following result [21].

Theorem 1. Suppose that Δ_j, j = 1, . . . , n, are partitions of [c, d]. Then

S_m(Δ_1) + · · · + S_m(Δ_n) = S_m(∪_{j=1}^{n} Δ_j).

Building B-spline dictionaries


Let us start by recalling that an extended partition with single inner knots associated with
S_m(Δ) is a set Δ̃ = {y_i}_{i=1}^{2m+N} such that

y_{m+i} = x_i, i = 1, . . . , N, x_1 < · · · < x_N,

and the first and last m points y_1 ≤ · · · ≤ y_m ≤ c and d ≤ y_{m+N+1} ≤ · · · ≤ y_{2m+N} can be
arbitrarily chosen. With each fixed extended partition Δ̃ there is associated a unique B-spline
basis for S_m(Δ), which we denote as {B_{m,j}}_{j=1}^{m+N}. The B-spline B_{m,j} can be defined by the
recursive formulae [23]:

B_{1,j}(x) = 1 if y_j ≤ x < y_{j+1}, and B_{1,j}(x) = 0 otherwise,

B_{m,j}(x) = ((x − y_j)/(y_{j+m−1} − y_j)) B_{m−1,j}(x) + ((y_{j+m} − x)/(y_{j+m} − y_{j+1})) B_{m−1,j+1}(x).
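For reference, the recursion translates directly into code. The sketch below (ours, with 0-based knot indexing) evaluates B_{m,j} on a grid of points:

import numpy as np

def bspline(y, m, j, x):
    """Evaluate B_{m,j} at the points x, for a non-decreasing knot sequence y."""
    if m == 1:
        return ((y[j] <= x) & (x < y[j + 1])).astype(float)
    out = np.zeros_like(x, dtype=float)
    if y[j + m - 1] > y[j]:          # skip zero-width intervals (repeated knots)
        out += (x - y[j]) / (y[j + m - 1] - y[j]) * bspline(y, m - 1, j, x)
    if y[j + m] > y[j + 1]:
        out += (y[j + m] - x) / (y[j + m] - y[j + 1]) * bspline(y, m - 1, j + 1, x)
    return out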

The following theorem paves the way for the construction of dictionaries for S_m(Δ). We use
the symbol # to indicate the cardinality of a set.

Theorem 2. Let Δ_j, j = 1, . . . , n, be partitions of [c, d] and Δ = ∪_{j=1}^{n} Δ_j. We denote the
B-spline basis for S_m(Δ_j) as {B_{m,k}^{(j)} : k = 1, . . . , m + #Δ_j}. Accordingly, a dictionary, D_m(Δ :
∪_{j=1}^{n} Δ_j), for S_m(Δ) can be constructed as

D_m(Δ : ∪_{j=1}^{n} Δ_j) := ∪_{j=1}^{n} {B_{m,k}^{(j)} : k = 1, . . . , m + #Δ_j},

so as to satisfy

span{D_m(Δ : ∪_{j=1}^{n} Δ_j)} = S_m(Δ).

When n = 1, D_m(Δ : Δ_1) reduces to the B-spline basis of S_m(Δ).

Remark 4. Note that the number of functions in the above defined dictionary is equal to
n·m + ∑_{j=1}^{n} #Δ_j, which is larger than dim S_m(Δ) = m + #Δ. Hence, excluding the trivial
case n = 1, the dictionary constitutes a redundant dictionary for S_m(Δ).

According to Theorem 2, to build a dictionary for S_m(Δ) we need to choose n subpartitions
Δ_j ⊂ Δ such that ∪_{j=1}^{n} Δ_j = Δ. This gives a great deal of freedom for the actual construc-
tion of a non-uniform B-spline dictionary. Fig. 7 shows some examples which are produced by
generating a random partition of [0, 4] with 6 interior knots. From an arbitrary partition

Δ := {0 = x_0 < x_1 < · · · < x_6 < x_7 = 4},

we generate two subpartitions as

Δ_1 := {0 = x_0 < x_1 < x_3 < x_5 < x_7 = 4}, Δ_2 := {0 = x_0 < x_2 < x_4 < x_6 < x_7 = 4},

and join together the B-spline basis for Δ_1 (light lines in the right graphs of Fig. 7) and that for Δ_2
(dark lines in the same graphs).

Application to sparse signal representation


Given a signal, f say, we address now the issue of determining a partition Δ and sub-partitions
Δ_j, j = 1, . . . , n, such that: a) ∪_{j=1}^{n} Δ_j = Δ and b) the partitions are suitable for generating a
sparse representation of the signal in hand. As a first step we propose to tailor the partition
to the signal f by setting Δ taking into account the critical points of the curvature function of
the signal, i.e.,

T := {t : ( f″/(1 + f′²)^{3/2} )′(t) = 0}.

Usually the entries in T are chosen as the initial knots of Δ. In order to obtain more knots
we apply subdivision between consecutive knots in T, thereby obtaining a partition Δ with
the desired number of knots. An algorithm implementing such a procedure can be found
in [21]. According to Theorem 2, in order to build a dictionary for S_m(Δ) we need to choose
n subpartitions Δ_j ⊂ Δ such that ∪_{j=1}^{n} Δ_j = Δ. As an example we suggest a simple method
for producing n subpartitions Δ_j ⊂ Δ, which is used in the numerical simulations of the next
section. Considering the partition Δ = {x_0, x_1, . . . , x_{N+1}} such that c = x_0 < x_1 < · · · <
x_{N+1} = d, for each integer j in [1, n] we set

Δ_j := {c, d} ∪ {x_k : k ∈ [1, N] and k mod n = j − 1},

e.g. if N = 10 and n = 3, we have Δ_1 = {c, x_3, x_6, x_9, d}, Δ_2 = {c, x_1, x_4, x_7, x_10, d} and
Δ_3 = {c, x_2, x_5, x_8, d}. The codes for creating a partition adapted to a given signal are Produce-
Partition.m and FinalProducePartition.m, and the code for creating the dictionaries for
the space is CutDic.m.
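The subpartition rule itself is straightforward to implement; a small sketch (ours, not the ProducePartition.m or CutDic.m routines):

def subpartitions(knots, n):
    """knots = [c, x_1, ..., x_N, d]; returns the n subpartitions Delta_j."""
    c, d = knots[0], knots[-1]
    inner = knots[1:-1]
    # knot x_k goes to Delta_j when k mod n == j - 1; endpoints always kept
    return [
        [c] + [x for k, x in enumerate(inner, start=1) if k % n == j - 1] + [d]
        for j in range(1, n + 1)
    ]

# e.g. with N = 10 inner knots and n = 3 this reproduces the example above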


Figure 7: Examples of bases (graphs on the left) and the corresponding dictionaries (right
graphs) for a random partition. The top graphs correspond to linear B-splines (m = 2). The
bottom graphs involve cubic B-splines (m = 4).

Numerical examples
We produce here an example illustrating the potentiality of the proposed nonuniform dictio-
naries for achieving sparse representations by nonlinear techniques. The signals we consider
are the chirp signal f_1 and the seismic signal f_2 plotted in Fig. 8.
We deal with the chirp signal f_1 on the interval [0, 8], discretizing it into L = 2049
samples and applying Algorithm 1 to produce the partition. The resulting number of knots
is 1162, which is enough to approximate the signal, by a cubic B-spline basis for the space,
within the precision tol_1 = 0.01‖f_1‖. A dictionary D_4(Δ : ∪_{j=1}^{10} Δ_j) for the identical space is
constructed by considering 10 subpartitions, which yields N_1 = 1200 functions.
The signal f_2 is a piece of L = 513 data points. A partition of cardinality 511 is obtained as Δ =
T(f_2, 8), and the dictionary of cubic splines we have used arises by considering 3 subpartitions,
which yields a dictionary D_4(Δ : ∪_{j=1}^{3} Δ_j) of cardinality N_2 = 521.
Denoting by α_n^i, n = 1, . . . , N_i, the atoms of the i-th dictionary, we look now for the subsets
of indices Γ_i, i = 1, 2, of cardinality K_i, i = 1, 2, providing us with a sparse representation of the
signals.

Figure 8: Chirp signal f1 (left). Seismic signal f2 (right).

In other words, we are interested in the approximations

f_i^{K_i} = ∑_{n∈Γ_i} c_n^i α_n^i, i = 1, 2,

such that ‖f_i^{K_i} − f_i‖ ≤ tol_i, i = 1, 2, and the values K_i, i = 1, 2, are satisfactorily small for the
approximation to be considered sparse. Since the problem of finding the sparsest solution is
approximation to be considered sparse. Since the problem of finding the sparsest solution is
intractable, for all the signals we look for a satisfactory sparse representation using the same
greedy strategy, which evolves by selecting atoms through stepwise minimization of the residual
error as follows.

i) The atoms are selected one by one according to the Optimized Orthogonal Matching
Pursuit (OOMP) method [12] until the above defined tolerance for the norm of the residual
error is reached.

ii) The previous approximation is improved, without greatly increasing the computational
cost, by a ‘swapping refinement’ which at each step interchanges one atom of the atomic
decomposition with a dictionary atom, provided that the operation decreases the norm
of the residual error [13].

iii) A Backward-Optimized Orthogonal Matching Pursuit (BOOMP) method [14] is applied
to disregard some coefficients of the atomic decomposition, in order to produce an ap-
proximation up to the error of stage i). The last two steps are repeated until no further
swapping is possible. The routine implementing these steps is OOMPFinalRefi.m.

The described technique is applied to all the non-orthogonal dictionaries we have considered for
comparison with the proposed approach. The results are shown in Table 1. In the first column
we place the dictionaries to be compared. These are: 1) the spline basis for the space adapted
to the corresponding signal; 2) the dictionary for the identical space consisting of functions
of larger support; 3) the orthogonal cosine basis used by the discrete cosine transform (dct);
4) the semi-orthogonal cardinal Chui-Wang spline wavelet basis [26]; and 5) the Chui-Wang
cardinal spline dictionary for the same space [27]. Notice that whilst the non-uniform spline
Table 1: Comparison of sparsity performance achieved by selecting atoms from the non-uniform
bases and dictionaries for the adapted spline space (1st and 2nd rows), dct (3rd row), and cardinal
wavelet basis and dictionary (4th and 5th rows).

Dictionaries                          | K1 (signal f1) | K2 (signal f2)
Non-uniform spline basis              | 1097           | 322
Non-uniform spline dictionary         | 173            | 129
Discrete cosine transform             | 263            | 208
Cardinal Chui-Wang wavelet basis      | 246            | 201
Cardinal Chui-Wang wavelet dictionary | 174            | 112

space is adapted to the corresponding signal, only the dictionary for the space achieves the
sparse representation. Moreover, the performance is superior to that of the Chui-Wang spline
wavelet basis [26] and similar to that of the cardinal Chui-Wang dictionary, which is known to render
a very good representation for these signals [27]. However, whilst the Chui-Wang cardinal
spline wavelet dictionaries introduced in [27] are significantly redundant with respect to the
corresponding basis (about twice as large), the non-uniform B-spline dictionaries introduced
here contain only a few more functions than the basis. Nevertheless, as the examples indicate, the
improvement in the sparseness of the approximation a dictionary may yield with respect to the
B-spline basis is enormous.

Application to filtering low frequency noise from a seismic signal


A common interference with broadband seismic signals is produced by long waves, generated by
known or unknown sources, called infragravity waves [28–30]. Such an interference is referred
to as low frequency noise, because it falls in a frequency range of up to 0.05 Hz. Thus, in [31]
we consider that the model of the subspace of this type of structured noise, on a signal given
by L = 403 samples, is

W⊥ = span{ e^{ı 2πn(i−1)/L}, i = 1, . . . , L }_{n=−21}^{21}.    (36)

The particular realization of the noise we have simulated is plotted in the left graph of Fig 9.
However, the success of correct filtering does not depend on the actual form of the noise (as
long as it belongs to the subspace given in (36)) because the approach we consider guarantees
the suppression of the whole subspace W ⊥ . The seismic signal to be discriminated from the
noisy one is a piece of the signal in the right graph of Fig 8. The middle graph in Fig 9
depicts the signal f which is formed by adding both components. Constructing P̂_W⊥ we obtain
f_W = f − P̂_W⊥ f and use the routine ProducePartition.m to find a partition adapted to this
projection. In this case, to succeed in modelling a signal subspace complementary to W⊥,
we needed a basis for the non-uniform spline space and a regularized version of the FOCUSS
algorithm [32]. The result is plotted in the right graph of Fig 9, and the implementation
details can be found in [31]; a minimal sketch of the noise-suppression step is given below.
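This is a minimal MATLAB sketch with assumed variable names, which suppresses W⊥ by
subtracting the orthogonal projection of the noisy signal onto it; the actual filtering
in [31] involves the oblique projection onto the adapted spline space along W⊥.

% Minimal sketch (assumed names): suppress the subspace W_perp of (36).
L = 403;
t = (1:L)';                          % sample index i = 1, ..., L
n = -21:21;                          % frequency indices spanning W_perp
B = exp(1i * 2*pi * (t-1) * n / L);  % L x 43 matrix whose columns span W_perp
fW = f - B * (B \ f);                % fW = f - P_{W_perp} f
fW = real(fW);                       % the remaining component is real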


Figure 9: Simulated low frequency noise (left); signal plus noise (middle); approximation
recovered from the middle graph, as explained in [31].

Sparsity and ‘something else’


We present here a ‘bonus’ of sparse representations, by pointing out that they can be used for
embedding information. Indeed, since a sparse representation entails a projection onto a
subspace of lower dimensionality, it generates a null space. This feature suggests the possibility
of using the created space for storing data. In particular, in a work with James Bowley [35],
we discuss an application involving the null space yielded by the sparse representation of an
image for storing part of the image itself. We term this application ‘Image Folding’.
Consider that by an appropriate technique one finds a sparse representation of an image. Let
{v_{ℓ_i}}_{i=1}^{K} be the K dictionary atoms rendering such a representation and V_K the space they
span. The sparsity property of the representation implies that V_K is a subspace considerably
smaller than the image space R^N ⊗ R^N. We can then construct a complementary subspace
W⊥, such that R^N ⊗ R^N = V_K ⊕ W⊥, and compute the dual vectors {w_i^K}_{i=1}^{K} yielding the
oblique projection onto V_K along W⊥. Thus, the coefficients of the sparse representation can
be calculated as

c_i = ⟨w_i^K, I⟩, i = 1, . . . , K.    (37)
Now, if we take a vector F ∈ W⊥ and add it to the image, forming the vector G = I + F to
replace I in (37), then, since F is orthogonal to the duals {w_i^K}_{i=1}^{K}, we still have

⟨w_i^K, G⟩ = ⟨w_i^K, I + F⟩ = ⟨w_i^K, I⟩ = c_i, i = 1, . . . , K.    (38)


This suggests the possibility of using the sparse representation of an image to endow the image
with additional information, stored in the vector F ∈ W⊥. In order to do this, we apply the
scheme proposed earlier for hiding messages by means of redundant representations [34], which
in this case operates as described below.

Embedding Scheme
We can embed H numbers h_i, i = 1, . . . , H, into a vector F ∈ W⊥ as follows.

• Take an orthonormal basis z_i, i = 1, . . . , H, for W⊥ and form the vector F as the linear
combination

F = Σ_{i=1}^{H} h_i z_i.

• Add F to I^K to obtain G = I^K + F.
Information Retrieval
Given G, retrieve the numbers h_i, i = 1, . . . , H, as follows.

Figure 10: The small picture at the top is the folded image. The left picture below is the
unfolded image without knowledge of the private key that initializes the permutation. The next
is the unfolded picture when the correct key is used.

• Use G to compute the coefficients of the sparse representation of I^K as in (38). Use these
coefficients to reconstruct the image Ĩ^K = Σ_{i=1}^{K} c_i v_{ℓ_i}.

• Obtain F from the given G and the reconstructed Ĩ^K as F = G − Ĩ^K. Use F and the
orthonormal basis z_i, i = 1, . . . , H, to retrieve the embedded numbers h_i, i = 1, . . . , H, as

h_i = ⟨z_i, F⟩, i = 1, . . . , H.
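A minimal MATLAB sketch of the two procedures follows. All variable names are assumptions
for illustration: V holds the selected atoms v_{ℓ_i} as columns, W the duals w_i^K, Z the
orthonormal basis z_i of W⊥, and Ik the vectorized sparse image I^K.

% Embedding (illustrative sketch; V, W, Z, Ik, h are assumed names).
F = Z * h;              % F = sum_i h_i z_i, a vector in W_perp
G = Ik + F;             % folded image G = I^K + F

% Retrieval: by (38) the coefficients computed from G coincide with
% those of I^K, so the sparse image can be reconstructed first.
c = W' * G;             % c_i = <w_i^K, G>, cf. (37) and (38)
Ik_rec = V * c;         % reconstructed image, tilde(I)^K = sum_i c_i v_{l_i}
F_rec = G - Ik_rec;     % expose the embedded component F = G - tilde(I)^K
h_rec = Z' * F_rec;     % h_i = <z_i, F>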

One can encrypt the embedding procedure simply by randomly controlling the order of the
orthonormal basis z_i, i = 1, . . . , H, or by applying some random rotation to the basis,
requiring a key for generating it. An example is given in the next section.

Application to Image Folding


In order to apply the embedding scheme to fold an image, we simply process the image by
dividing it into, say, q blocks I_q of N_q ⊗ N_q pixels each. We find the representation of each
block with a combination of discrete cosine and spline based subdictionaries, constructed as
explained in [35]. With these dictionaries the image of Fig 10 has a highly sparse representation
at PSNR = 40 dB (the compression ratio is approximately 11), so that we are able to store the
whole image in the few pixels shown in the top picture of Fig 10. The next picture is the
unfolded image without using the security key, and the right picture is the one obtained with
the correct key. For further implementation details see [33]. An example of how to call the
available codes for sparse image approximation with cosine and spline based dictionaries can
be found here.
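By way of example, the block division could be realized in MATLAB as below; the block size
Nq = 16 is an assumption for illustration, not necessarily the value used in [35].

% Divide an N x N image I into (N/Nq)^2 blocks of Nq x Nq pixels,
% assuming Nq divides N; each block is then folded independently.
Nq = 16;
[N, ~] = size(I);
blocks = mat2cell(I, repmat(Nq, N/Nq, 1), repmat(Nq, 1, N/Nq));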

References
[1] O. Christensen, Frames and Bases: An Introductory Course, Birkhäuser (2008).

[2] M. Unser, A. Aldroubi, A general sampling theory for nonideal acquisition devices, IEEE
Trans. Signal Processing 42 (1994) 2915–2925.

[3] Y. Eldar, Sampling with arbitrary sampling and reconstruction spaces and oblique dual
frame vectors, Journal of Fourier Analysis and Applications 9 (2003) 77–96.

[4] R. Behrens, L. Scharf, Signal processing applications of oblique projection operators, IEEE
Transactions on Signal Processing 42 (1994) 1413–1424.

[5] G. Corach, A. Maestripieri, D. Stojanoff, A Classification of Projectors, Banach Center
Publ. 67 (2005) 145–160.

[6] G. Corach, A. Maestripieri, D. Stojanoff, Projections in operator ranges, Proc. Amer.
Math. Soc. 134 (2005), no. 3, 765–788.

[7] L. Rebollo-Neira, A. Plastino, Nonlinear non-extensive approach for identification of
structured information, Physica A, in press (2009).

[8] A. Hirabayashi, M. Unser, Consistent sampling and signal recovery, IEEE Transactions on
Signal Processing 55 (2007) 4104–4115.

[9] A. Cerny, Characterization of the oblique projector U(VU)†V with application to
constrained least squares, Linear Algebra and Its Applications (2009).

[10] L. Rebollo-Neira, Constructive updating/downdating of oblique projectors: a generaliza-
tion of the Gram–Schmidt process, Journal of Physics A: Mathematical and Theoretical 40
(2007) 6381–6394.

[11] L. Rebollo-Neira, Oblique matching pursuit, IEEE Signal Processing Letters 14 (10) (2007)
703–706.

[12] L. Rebollo-Neira, D. Lowe, Optimized orthogonal matching pursuit approach, IEEE Signal
Processing Letters 9 (2002) 137–140.

[13] M. Andrle, L. Rebollo-Neira, A swapping-based refinement of orthogonal matching pursuit
strategies, Signal Processing 86 (2006) 480–495.

[14] M. Andrle, L. Rebollo-Neira, E. Sagianos, Backward-optimized orthogonal matching pur-
suit approach, IEEE Signal Proc. Let. 11 (2004) 705–708.

[15] L. Rebollo-Neira, Measurements design and phenomena discrimination, J. Phys. A: Math.
Theor. 42 (2009) 165210.

[16] B.D. Rao and K. Kreutz-Delgado, An Affine Scaling Methodology for Best Basis Selection,
IEEE Trans. Sig. Proc. 47 (1999) 187–200.

[17] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal
on Scientific Computing 20 (1998) 33–61.

[18] S.S. Chen and D.L. Donoho and M.A. Saunders, Atomic Decomposition by Basis Pursuit,
SIAM Rev. 43 (2001) 129–156.
[19] L. Rebollo-Neira, A. Constantinides, A. Plastino, F. Zyserman, A. Alvarez, R. Bonetto,
H. Viturro, Statistical inference, state distribution, and noisy data, Physica A 198 (1993)
514–537
[20] L. Rebollo-Neira, A. G. Constantinides, A. Plastino, A. Alvarez, Bonetto, M. Iñiguez
Rodriguez, Statistical analysis of a mixed-layer x-ray diffraction peak, Journal of Physics
D, 30, 17 (1997) 2462–2469.
[21] L. Rebollo-Neira, Z. Xu, Adaptive non-uniform B-spline dictionaries on a compact interval,
arXiv:0908.0691v1 [math.FA]
[22] M. Unser, Splines. A perfect fit for signal and image processing, IEEE Signal Processing
Magazine, 22–38 (1999).
[23] L. L. Schumaker, Spline Functions: Basic Theory, Wiley, New-York, 1981.
[24] C. K. Chui, Multivariate splines, SIAM, Philadelphia, 1988.
[25] C. de Boor, A Practical Guide to Splines, Springer, New York, 2001.
[26] C. Chui, J. Wang, On compactly supported spline wavelets and a duality principle, Trans.
Amer. Math. Soc. 330 (1992) 903–915.
[27] M. Andrle, L. Rebollo-Neira, From cardinal spline wavelet bases to highly coherent dictio-
naries, Journal of Physics A 41 (2008) 172001.
[28] R. D. Kosian, N. V. B. Pykhov, B. L. Edge, Coastal processes in tideless seas, ASCE
Publications, 2000.
[29] D. Dolenc, B. Romanowicz, B. Uhrhammer, P. McGill, D. Neuhauser, D. Stakes,
Identifying and removing noise from the Monterey ocean bottom broadband seismic
station (MOBB) data, Geochem. Geophys. Geosyst., 8 (2007) Q02005,
doi:10.1029/2006GC001403.
[30] W. C. Crawford, S. C. Webb, Identifying and removing tilt noise from low-frequency (< 0.1
Hz) seafloor vertical seismic data, Bull. Seism. Soc. Am., 90, 952-963, 2000.
[31] Z. Xu, L. Rebollo-Neira, A. Plastino, Subspace modeling for structured noise suppression,
arXiv:0908.1020v1 [math-ph] (2009).
[32] B.D. Rao, K. Engan, S. F. Cotter, J. Palmer, K. Kreutz-Delgado, Subset selection in noise
based on diversity measure minimization, IEEE Transactions on Signal Processing, 51, 3
(2003) 760– 770, 10.1109/TSP.2002.808076.
[33] J. Bowley, L. Rebollo-Neira, Sparsity and ‘something else’, (2009).
[34] J. Miotke, L. Rebollo-Neira, Oversampling of Fourier Coefficients for Hiding Messages,
Applied and Computational Harmonic Analysis, 16, 3 (2004) 203–207.
[35] J. Bowley, L. Rebollo-Neira, Sparse image representation by discrete cosine/spline based
dictionaries, (2009).

