International Journal of Computer, Electrical, Automation, Control and Information Engineering Vol:2, No:10, 2008
Abstract: We present a new algorithm for nonlinear dimensionality reduction that consistently uses global information, and that enables understanding the intrinsic geometry of non-convex manifolds. Compared to methods that consider only local information, our method appears to be more robust to noise. Unlike most methods that incorporate global information, the proposed approach automatically handles non-convexity of the data manifold. We demonstrate the performance of our algorithm and compare it to state-of-the-art methods on synthetic as well as real data.

Keywords: Dimensionality reduction, manifold learning, multidimensional scaling, geodesic distance, boundary detection.

International Science Index, Computer and Information Engineering Vol:2, No:10, 2008 waset.org/Publication/13076

Manuscript received September 10, 2006; revised October 1, 2006.
G. Rosman is with the Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel (email: rosman@cs.technion.ac.il).
A. M. Bronstein is with the Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel (email: bron@cs.technion.ac.il).
M. M. Bronstein is with the Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel (email: mbron@cs.technion.ac.il).
R. Kimmel is with the Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel (email: ron@cs.technion.ac.il).

I. INTRODUCTION

Nonlinear dimensionality reduction (NLDR) algorithms explain a given data set of high dimensionality in terms of a small number of variables or coordinates. Such methods are used in various pattern recognition problems, including pathology tissue analysis [1], motion understanding [2], lip reading [3], speech recognition [4], enhancement of MRI images [5], and face recognition [6].

Most NLDR algorithms map the data to a coordinate system of given dimensionality that represents the given data while minimizing some error measure. Unlike classical dimensionality reduction methods such as principal component analysis (PCA) [7], the map is non-linear.

The data is usually assumed to arise from a manifold $\mathcal{M}$, embedded into a high-dimensional Euclidean space $\mathbb{R}^M$. The manifold $\mathcal{M}$ is assumed to have a low intrinsic dimension $m$ ($m \ll M$), i.e., it has a parametrization in a subset $C$ of $\mathbb{R}^m$, represented by the smooth bijective map $\varphi : C \subset \mathbb{R}^m \to \mathcal{M}$. The geodesic distances $\delta : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$, defined as the lengths of the shortest paths on $\mathcal{M}$ (called geodesics), represent the intrinsic structure of the data. The goal of NLDR is, given $\mathcal{M}$, to recover the parametrization in $\mathbb{R}^m$.

The intrinsic dimension $m$ is usually assumed to be known a priori. We denote the data samples by $z_i$, $i \in I$, each a point on $\mathcal{M}$, where $I$ is a set of continuous indices.

In the discrete setting, the data is represented as a graph whose vertices $z_1, \ldots, z_N$ are finite samples of the manifold, and the connectivity matrix $A = (a_{ij})$, where $a_{ij} = 1$ if $z_i$ and $z_j$ are neighbors and zero otherwise. Hereinafter, we write $\mathcal{M}$ referring to both the discrete and the continuous manifold. NLDR algorithms usually approximate local (short) distances on the data manifold by the Euclidean distances in the embedding space, $\delta(z_i, z_j) = \|z_i - z_j\|_2$, for $i, j$ such that $a_{ij} = 1$. The geodesic distances are approximated as graph distances, which can be expressed as a sum of local distances. The NLDR problem can be formulated as finding a set of coordinates $\{x_1, \ldots, x_N\} = \varphi^{-1}(\{z_1, \ldots, z_N\})$ in $\mathbb{R}^m$ that describe the data.

Most NLDR methods minimize criteria that consider the relationship of each point and its nearest neighbors. For example, the locally linear embedding (LLE) algorithm [8] attempts to express each point as a linear combination of its neighbors. The deviation of each point from this linear combination is summed over the manifold and used as a penalty function. The coordinates that minimize the penalty are then computed by solving an eigenvalue problem.

The Laplacian eigenmaps algorithm [9] uses as intrinsic coordinate functions the minimal eigenfunctions of the Laplace-Beltrami operator. This is done by constructing the Laplacian matrix of the proximity graph, finding its smallest $m$ non-zero eigenvectors and using them as the coordinates of the data points. Diffusion maps have been recently proposed as an extension of Laplacian eigenmaps, able to compensate for non-uniform sampling of the manifold [1].

The Hessian eigenmaps algorithm [10] computes coordinate functions that minimize the Frobenius norm of the Hessian, summed over the manifold. The algorithm expresses, for each coordinate function, the sum of the quadratic components at each point. The minimization results in an eigenvalue problem, whose minimal vectors provide the desired coordinate vectors, similarly to Laplacian eigenmaps.

The semidefinite embedding algorithm [11] takes a different approach, trying to maximize the variance of the data set in its new coordinates, while preserving short distances. This is done by solving a semidefinite programming (SDP) problem, with the preservation of local distances imposed as constraints. Solving the resulting SDP problem, however, still involves high computational cost. Attempts to lower the complexity have been made in [12].

Unlike local methods, the Isomap algorithm [13], [27] tries to preserve a global invariant: the geodesic distances on the data manifold. While the geodesics may change dramatically even in case of small noise, for well-sampled manifolds, their lengths (i.e., the geodesic distances) hardly change even in the presence of a high level of noise. This property may be useful
International Scholarly and Scientific Research & Innovation 2(10) 2008 3534 scholar.waset.org/1999.4/13076
World Academy of Science, Engineering and Technology
in analysis of noisy data, in which local methods often fail. A multidimensional scaling (MDS) algorithm is used to find a set of coordinates whose Euclidean distances approximate the geodesic distances. The least squares MDS (LSMDS) algorithm, for example, minimizes the stress [14],

$$X^* = \operatorname*{argmin}_{X \in \mathbb{R}^{N \times m}} \sum_{i<j} w_{ij} \left( d_{ij}(X) - \delta_{ij} \right)^2.$$

Here $X = (x_{ij})$ is an $N \times m$ matrix whose rows are the coordinate vectors in the low-dimensional Euclidean space $\mathbb{R}^m$, $\delta_{ij} = \delta(z_i, z_j)$ and $d_{ij}(X) = \|x_i - x_j\|_2$ is the Euclidean distance between points $x_i$ and $x_j$ in $\mathbb{R}^m$.

The underlying assumption of Isomap is that $\mathcal{M}$ is isometric to $C \subset \mathbb{R}^m$ with the induced metric $d_C$, that is, $\delta(z_i, z_j) = d_{\mathbb{R}^m}(x_i, x_j)$ for all $i, j = 1, \ldots, N$. If $C$ is convex, the restricted metric $d_{\mathbb{R}^m}|_C$ coincides with the induced metric $d_C$ and Isomap succeeds in recovering the parametrization of $\mathcal{M}$. Otherwise, C

et al. [16] proved that the graph distances converge to the true geodesic distances, i.e., that the discretization is consistent.

The Isomap algorithm assumes that the parametrization $C$ of $\mathcal{M}$ is a convex subset of $\mathbb{R}^m$, and relies on the isometry assumption to find the map from $\mathcal{M}$ to the metric space $(C, d_C)$ by means of MDS (the stress in the solution will be zero). MDS can be used because $d_C = d_{\mathbb{R}^m}|_C$ due to the convexity assumption. In the case when $C$ is non-convex, this is not necessarily true, as there may exist pairs of points for which $d_C \ne d_{\mathbb{R}^m}|_C$. We call such pairs inconsistent. An example of such a pair is shown in Figure 1. We denote the set of all consistent pairs by

$$P = \{(i, j) : d_C(x_i, x_j) = d_{\mathbb{R}^m}|_C(x_i, x_j)\} \subseteq I \times I.$$

In the TCIE algorithm, steps 2 and 3 are used to find a subset $\bar{P} \subseteq P$ of pairs of points that will be consistently used in the MDS problem, using criteria (1) and (2), first proposed in [17] for matching of partially-missing shapes. In
Fig. 1. Example of two inconsistent points $z_1, z_2 \in \mathcal{M}$, and the geodesic connecting them. Also shown are the two images of these points under the isometry $\varphi^{-1}$, a geodesic connecting them in $C$, and the line connecting them in $\mathbb{R}^m$.

originating from $x_i$ to $x_j$ intersects the boundary $\partial C$ at points $x'$ and $x''$). Consequently,

for $i = 1, \ldots, N$ do
    Find the set $\mathcal{N}(i)$ of the $K$ nearest neighbors of the point $i$.
    Apply MDS to the $K \times K$ matrix $\Delta_K = (\delta_{kl})_{k,l \in \mathcal{N}(i)}$ and obtain a set of local coordinates $x_1, \ldots, x_K \in \mathbb{R}^m$.
    for $j, k \in \mathcal{N}(i)$ such that $\frac{\langle x_j - x_i,\, x_k - x_i \rangle}{\|x_j - x_i\| \, \|x_k - x_i\|} \approx -1$ do
        Mark the pair $(j, k)$ as valid.
        if $\left| \left\{ x : \frac{\langle x - x_i,\, v_l \rangle}{\|x - x_i\|} \approx 1 \right\} \right| \le a \, |\mathcal{N}(i)|$ for all $l = 1, \ldots, m - 1$ then
            Label the pair $(j, k)$ as satisfied (here $v_l$ denotes the $l$th vector of an orthonormal basis of the subspace of $\mathbb{R}^m$ orthogonal to $x_j - x_k$).
        end
    end
[14],

$$X^{(k+1)} = V^{\dagger} B(X^{(k)}) X^{(k)},$$

where $V^{\dagger}$ denotes the matrix pseudoinverse of $V = (v_{ij})$,

$$v_{ij} = \begin{cases} -w_{ij} & i \ne j \\ -\sum_{k \ne i} v_{ik} & i = j, \end{cases}$$

and $B(X)$ is an $N \times N$ matrix dependent on $X$ with elements

$$b_{ij}(X) = \begin{cases} -w_{ij} \, \delta_{ij} \, d_{ij}^{-1}(X) & i \ne j \text{ and } d_{ij}(X) \ne 0 \\ 0 & i \ne j \text{ and } d_{ij}(X) = 0 \\ -\sum_{k \ne i} b_{ik} & i = j. \end{cases}$$

The SMACOF iteration produces a monotonically non-increasing sequence of stress values, and can be shown to be equivalent to a scaled steepest descent iteration with constant step size [20].

C. Vector extrapolation

To speed up the convergence of the SMACOF iterations, we employ vector extrapolation. These methods use a sequence of solutions at subsequent iterations of the optimization algorithm and extrapolate the limit solution of the sequence. While these algorithms were derived assuming a linear iterative scheme, in practice, they work well also for nonlinear schemes, like some processes in computational fluid dynamics [21]. For further details, we refer the reader to [22], [23], [24].

The main idea of vector extrapolation is, given a sequence of solutions $X^{(k)}$ from iterations $k = 0, 1, \ldots$, to approximate the limit $\lim_{k \to \infty} X^{(k)}$, which must coincide with the optimal solution $X^*$. The extrapolation $\hat{X}^{(k)}$ is constructed as an affine combination of previous iterates,

$$\hat{X}^{(k)} = \sum_{j=0}^{K} \gamma_j X^{(k+j)}; \qquad \sum_{j=0}^{K} \gamma_j = 1.$$

The coefficients $\gamma_j$ are determined in different ways. In the reduced rank extrapolation (RRE) method, $\gamma_j$ are obtained by the solution of the minimization problem,

$$\min_{\gamma_0, \ldots, \gamma_K} \left\| \sum_{j=0}^{K} \gamma_j \, \Delta X^{(k+j)} \right\|, \quad \text{s.t.} \quad \sum_{j=0}^{K} \gamma_j = 1,$$

where $\Delta X^{(k)} = X^{(k+1)} - X^{(k)}$. In the minimal polynomial extrapolation (MPE) method,

$$\gamma_j = \frac{c_j}{\sum_{i=0}^{K} c_i}, \quad j = 0, 1, \ldots, K,$$

where $c_i$ arise from the solution of the minimization problem,

$$\min_{c_0, \ldots, c_{K-1}} \left\| \sum_{j=0}^{K} c_j \, \Delta X^{(k+j)} \right\|, \quad c_K = 1,$$

which in turn can be formulated as a linear system [24].

D. Multiresolution optimization

Another way to accelerate the solution of the MDS problem is using multiresolution (MR) methods [20]. The main idea is successively approximating the solution by solving the MDS problem at different resolution levels. At each level, we work with a grid consisting of points with indices $\Omega_L \subset \Omega_{L-1} \subset \ldots \subset \Omega_0 = \{1, \ldots, N\}$, such that $|\Omega_l| = N_l$. At the $l$th level, the data is represented as an $N_l \times N_l$ matrix $\Delta_l$, obtained by extracting the rows and columns of $\Delta_0 = \Delta$ corresponding to the indices $\Omega_l$. The solution $X_l^*$ of the MDS problem on the $l$th level is transferred to the next level $l - 1$ using an interpolation operator $P_l^{l-1}$, which can be represented as an $N_{l-1} \times N_l$ matrix.

Construct the hierarchy of grids $\Omega_0, \ldots, \Omega_L$ and interpolation operators $P_1^0, \ldots, P_L^{L-1}$.
Start with some initial $X_L^{(0)}$ at the coarsest grid, and $l = L$.
while $l \ge 0$ do
    Solve the $l$th level MDS problem
    $$X_l^* = \operatorname*{argmin}_{X_l \in \mathbb{R}^{N_l \times m}} \sum_{i,j \in \Omega_l} w_{ij} \left( d_{ij}(X_l) - \delta_{ij} \right)^2$$
    using SMACOF iterations initialized with $X_l^{(0)}$.
    Interpolate the solution to the next resolution level, $X_{l-1}^{(0)} = P_l^{l-1}(X_l^*)$.
    $l \leftarrow l - 1$
end

We use a modification of the farthest point sampling (FPS) [25] strategy to construct the grids, in which we add more points from the boundaries, to allow correct interpolation of the fine grid using the coarse grid elements. We use linear interpolation with weights determined using a least squares fitting problem with regularization made to ensure all available nearest neighbors are used.

The multiresolution scheme can be combined with vector extrapolation by employing MPE or RRE methods at each resolution level. In our experiments we used the RRE method, although in practice, for the SMACOF algorithm, both the MPE and the RRE algorithms gave comparable results, giving us a three-fold speedup. A comparison of the convergence with and without vector extrapolation and multiresolution methods is shown in Figure 2. The stress values shown are taken from the problem shown in Figure 4.

E. Initialization

Since the stress function is non-convex, convex optimization methods may converge to local minima. In order to avoid local convergence, we initialized the LSMDS problem with the classical scaling result [26]. Although such an initialization does not guarantee global convergence in theory, in practice, we converge to the global minimum.
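As a minimal illustration of the SMACOF iteration $X^{(k+1)} = V^{\dagger} B(X^{(k)}) X^{(k)}$, the following Python sketch evaluates the weighted stress and applies the update a fixed number of times. The use of NumPy, the function names `pairwise_distances`, `stress` and `smacof`, the fixed iteration count, and the requirement that `W` be symmetric and nonnegative are assumptions made for this example, not details taken from the experiments reported here.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances d_ij(X) between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(X, delta, W):
    """Weighted LSMDS stress: sum over i<j of w_ij (d_ij(X) - delta_ij)^2."""
    D = pairwise_distances(X)
    iu = np.triu_indices(len(X), k=1)
    return float((W[iu] * (D[iu] - delta[iu]) ** 2).sum())

def smacof(delta, W, X0, n_iter=200):
    """Apply X <- V^+ B(X) X n_iter times (illustrative, no stopping test)."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    # v_ij = -w_ij for i != j, v_ii = sum_{k != i} w_ik
    V = np.diag(W.sum(axis=1)) - W
    V_pinv = np.linalg.pinv(V)
    X = np.array(X0, dtype=float)
    for _ in range(n_iter):
        D = pairwise_distances(X)
        ratio = np.zeros_like(D)
        nonzero = D > 0
        ratio[nonzero] = delta[nonzero] / D[nonzero]
        # b_ij = -w_ij delta_ij / d_ij(X) off the diagonal, 0 where d_ij(X) = 0
        B = -W * ratio
        np.fill_diagonal(B, 0.0)
        # b_ii = -sum_{k != i} b_ik
        np.fill_diagonal(B, -B.sum(axis=1))
        X = V_pinv @ B @ X
    return X
```

With a 0/1 weight matrix, pairs with $w_{ij} = 0$ drop out of both the stress and the update, which is how a restricted set of pairs could be passed to this sketch.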
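The RRE step of Section C can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in our experiments: the norm is taken to be the Frobenius norm, the constraint is eliminated by substituting $\gamma_0 = 1 - \sum_{j \ge 1} \gamma_j$, and the remaining unconstrained least-squares problem is solved with `numpy.linalg.lstsq`. The function name `rre_extrapolate` is hypothetical.

```python
import numpy as np

def rre_extrapolate(iterates):
    """RRE from K+2 consecutive iterates X^(k), ..., X^(k+K+1).

    Returns the extrapolated point and the coefficients gamma_0..gamma_K.
    """
    if len(iterates) < 3:
        raise ValueError("need at least three consecutive iterates")
    X = np.stack([np.asarray(x, dtype=float).ravel() for x in iterates])
    # Columns of U are the differences Delta X^(k+j), j = 0..K.
    U = np.diff(X, axis=0).T
    # With gamma_0 = 1 - sum_{j>=1} gamma_j the objective becomes
    # || u_0 + M g ||, where the columns of M are u_j - u_0.
    M = U[:, 1:] - U[:, :1]
    g = np.linalg.lstsq(M, -U[:, 0], rcond=None)[0]
    gamma = np.concatenate(([1.0 - g.sum()], g))
    # Affine combination of the first K+1 iterates.
    x_hat = gamma @ X[:-1]
    return x_hat.reshape(np.shape(iterates[0])), gamma
```

For a linear fixed-point iteration in $\mathbb{R}^n$, taking $K = n$ reproduces the fixed point up to rounding, which is a convenient sanity check. SMACOF is nonlinear, so there the extrapolated point is only an approximation, which would typically serve as the starting point for further iterations.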
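The grid hierarchy of Section D can be illustrated with plain farthest point sampling on a precomputed distance matrix. Note that this sketch deliberately omits the boundary-point modification mentioned above and shows only the standard greedy strategy of [25]; the function names `farthest_point_sampling` and `grid_hierarchy` and the choice of starting index are assumptions for the example.

```python
import numpy as np

def farthest_point_sampling(dist, n_samples, first=0):
    """Greedy FPS on an N x N distance matrix; returns the chosen indices."""
    chosen = [first]
    # Distance from every point to the current set of chosen points.
    min_dist = np.array(dist[first], dtype=float)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return chosen

def grid_hierarchy(dist, sizes):
    """Nested index sets of the given sizes, finest grid first.

    Prefixes of one FPS ordering are nested by construction, so a
    coarser grid is always a subset of every finer one.
    """
    order = farthest_point_sampling(dist, max(sizes))
    return [sorted(order[:n]) for n in sizes]
```

Because every coarse grid is a prefix of the same ordering, the inclusion $\Omega_L \subset \ldots \subset \Omega_0$ holds without any extra bookkeeping.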
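The classical scaling initialization of Section E admits a short sketch. The version below is the textbook double-centering construction, given as an assumption about what such an initialization could look like, not as the exact procedure of [26]; the function name `classical_scaling` and the clipping of negative eigenvalues are choices made for the example.

```python
import numpy as np

def classical_scaling(delta, m):
    """Embed an N x N distance matrix into R^m by classical scaling."""
    n = delta.shape[0]
    # Centering matrix J = I - (1/n) 1 1^T.
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centered matrix of squared distances (a Gram matrix
    # when delta is exactly Euclidean).
    G = -0.5 * J @ (delta ** 2) @ J
    eigval, eigvec = np.linalg.eigh(G)
    # Keep the m largest eigenvalues; clip negatives, which appear
    # when delta is not Euclidean (e.g. geodesic distances).
    top = np.argsort(eigval)[::-1][:m]
    lam = np.clip(eigval[top], 0.0, None)
    return eigvec[:, top] * np.sqrt(lam)
```

For exactly Euclidean distances the result reproduces the input distances; for geodesic distances it only provides a starting configuration for the stress minimization.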
[Convergence plot: normalized stress versus time (sec) and complexity (MFLOPs, $\times 10^4$) for SMACOF, MR, RRE, and RRE+MR.]

Fig. 4. Left to right, top to bottom: Embedding of the Swiss roll (without noise), produced by LLE, Laplacian eigenmaps, Hessian LLE, diffusion maps, Isomap, and our algorithm. Detected boundary points are shown as red pluses.

Fig. 3. Left: Swiss hole surface without noise. Right: A Swiss hole contaminated with additive Gaussian noise with $\sigma = 0.015$ and $\sigma = 0.05$, and the spiral surface. The detected boundary points are shown in red.
Fig. 8. The intrinsic coordinates of the image manifold of the eyes area with
[12] K. Q. Weinberger, B. D. Packer, and L. K. Saul, "Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization," in Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, Barbados, January 2005.
[13] E. L. Schwartz, A. Shaw, and E. Wolfson, "A numerical solution to the generalized mapmaker's problem: Flattening nonconvex polyhedral surfaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 1005-1008, November 1989.
[14] I. Borg and P. Groenen, Modern multidimensional scaling: Theory and applications. New York: Springer Verlag, 1997.
[15] C. Grimes and D. L. Donoho, "When does Isomap recover the natural parameterization of families of articulated images?" Department of Statistics, Stanford University, Stanford, CA 94305-4065, Tech. Rep. 2002-27, 2002.
[16] M. Bernstein, V. de Silva, J. C. Langford, and J. B. Tenenbaum, "Graph approximations to geodesics on embedded manifolds," Stanford University, Technical Report, January 2001.
[17] A. M. Bronstein, M. M. Bronstein, and R. Kimmel, "Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching," Proceedings of the National Academy of Sciences, vol. 103, no. 5, pp. 1168-1172, January 2006.
[18] G. Guy and G. Medioni, "Inference of surfaces, 3D curves, and junctions from sparse, noisy, 3D data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 11, pp. 1265-1277, November 1997.
[19] T. E. Boult and J. R. Kender, "Visual surface reconstruction using sparse depth data," in Computer Vision and Pattern Recognition, vol. 86, 1986, pp. 68-76.
[20] M. M. Bronstein, A. M. Bronstein, and R. Kimmel, "Multigrid multidimensional scaling," Numerical Linear Algebra with Applications (NLAA), vol. 13, no. 2-3, pp. 149-171, March-April 2006.
[21] A. Sidi, "Efficient implementation of minimal polynomial and reduced rank extrapolation methods," J. Comput. Appl. Math., vol. 36, no. 3, pp. 305-337, 1991.
[22] S. Cabay and L. Jackson, "Polynomial extrapolation method for finding limits and antilimits of vector sequences," SIAM Journal on Numerical Analysis, vol. 13, no. 5, pp. 734-752, October 1976.
[23] R. P. Eddy, "Extrapolating to the limit of a vector sequence," in Information Linkage between Applied Mathematics and Industry, P. C. Wang, Ed. Academic Press, 1979, pp. 387-396.
[24] D. A. Smith, W. F. Ford, and A. Sidi, "Extrapolation methods for vector sequences," SIAM Review, vol. 29, no. 2, pp. 199-233, June 1987.
[25] Y. Eldar, M. Lindenbaum, M. Porat, and Y. Zeevi, "The farthest point strategy for progressive image sampling," IEEE Transactions on Image Processing, vol. 6, no. 9, pp. 1305-1315, September 1997. [Online]. Available: citeseer.ist.psu.edu/eldar97farthest.html
[26] A. Kearsley, R. Tapia, and M. W. Trosset, "The solution of the metric stress and sstress problems in multidimensional scaling using Newton's method," Computational Statistics, vol. 13, no. 3, pp. 369-396, 1998.
[27] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319-2323, December 2000.