INTRODUCTION
ALGORITHM
Applications
Geometry
Conclusion
INTRODUCTION
    Organizing high-dimensional data
    Motivating example: text database
ALGORITHM
    Hierarchical clustering and trees
    Iterative principle
    Multiscale affinity
SMOOTHNESS, APPROXIMATION, AND DECAY
    How good is a tree?
Applications
    Genomic data
    Compression
    Matrix completion
Geometry
    Example: trigonometric functions
Conclusion
Given a measure of similarity, you can always build a tree. K-means approach: choose a set of centroids more than a fixed distance apart; assign each data point to its nearest centroid; the resulting clusters form the nodes of the tree. To go up a level, take the centroid of each cluster and cluster those centroids in the same way.
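The tree-building steps above can be sketched roughly as follows. This is a minimal illustration, not the talk's implementation: the greedy centroid selection, the `radius` and `growth` parameters, and the use of cluster means as next-level points are all assumptions made for the example.

```python
import numpy as np

def pick_centroids(points, radius):
    """Greedily select indices of points that are pairwise more than `radius` apart."""
    chosen = []
    for i, p in enumerate(points):
        if all(np.linalg.norm(p - points[j]) > radius for j in chosen):
            chosen.append(i)
    return chosen

def build_tree(points, radius, growth=2.0):
    """Return one label array per tree level, finest level first.

    levels[0] assigns each data point to a cluster; levels[1] assigns each
    of those clusters to a coarser cluster, and so on up to a single root.
    `radius` and `growth` are hypothetical tuning parameters.
    """
    if radius <= 0 or growth <= 1:
        raise ValueError("radius must be positive and growth must exceed 1")
    current = np.asarray(points, dtype=float)
    levels = []
    while len(current) > 1:
        centers = current[pick_centroids(current, radius)]
        radius *= growth  # use a coarser scale on the next pass
        if len(centers) == len(current):
            continue  # nothing merged at this scale; retry with the larger radius
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(current[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        levels.append(labels)
        # The mean of each cluster becomes a point at the next level up.
        current = np.array([current[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return levels

points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
levels = build_tree(points, radius=1.0)
```

Here the two tight pairs of points merge first, and the two resulting clusters merge into the root once the radius has grown past the distance between them.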
ITERATIVE PRINCIPLE
First build K-means trees on the observations and on the features. The initial similarity is just Euclidean distance: the squared distance between two observations is the sum of squares of the differences between their features. Now build a new similarity, which we'll define based on the tree structure. Update the trees iteratively: build a new tree of features based on the similarity defined using the observation tree, then a new tree of observations based on the similarity defined using the feature tree. Reshuffle rows and columns so that nearby rows are in the same cluster on the row tree and nearby columns are in the same cluster on the column tree.
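Two of the ingredients named above, the initial squared Euclidean distance and the final reshuffling, are simple enough to sketch directly. The function names and the flat `row_labels` / `col_labels` arrays (one cluster label per row or column) are assumptions for illustration only.

```python
import numpy as np

def squared_distances(M):
    """Pairwise squared Euclidean distances between the rows of M."""
    sq_norms = np.sum(M * M, axis=1)
    d = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (M @ M.T)
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off

def reshuffle(M, row_labels, col_labels):
    """Reorder rows and columns so members of the same cluster sit together."""
    row_order = np.argsort(row_labels, kind="stable")
    col_order = np.argsort(col_labels, kind="stable")
    return M[row_order][:, col_order]

M = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
D = squared_distances(M)
R = reshuffle(M, row_labels=np.array([1, 0, 1]), col_labels=np.array([1, 0]))
```

Applying `squared_distances` to `M.T` gives the corresponding distances between features.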
TREE
MULTISCALE AFFINITY
Let $X_k^l$ be the $k$th node of a tree, at generation $l$. Let the affinity between $f$ and $g$ be given by
$$\rho(f, g) = e^{-\frac{1}{\epsilon} \sum_x (f(x) - g(x))^2}$$
(Another possible choice is $\rho = \operatorname{cov}(f, g) / (\sigma(f)\,\sigma(g))$, the correlation/inner product between two functions.) $\rho$ is an affinity between two functions; it's close to 1 if they're similar, and decreases to 0 as they become more different.
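Both affinity choices can be written in a few lines. This is a sketch under assumptions: the scale parameter `eps` in the Gaussian-type affinity and the function names are illustrative, and functions are represented as NumPy arrays of sampled values.

```python
import numpy as np

def gaussian_affinity(f, g, eps=1.0):
    """exp(-(1/eps) * sum_x (f(x) - g(x))^2); `eps` is an assumed scale parameter."""
    return float(np.exp(-np.sum((f - g) ** 2) / eps))

def correlation_affinity(f, g):
    """Pearson correlation cov(f, g) / (std(f) * std(g))."""
    return float(np.corrcoef(f, g)[0, 1])

f = np.array([0.0, 1.0, 2.0, 3.0])
g = f + 0.1          # a slightly perturbed copy of f
h = f[::-1].copy()   # f reversed, very different from f
```

The Gaussian-type affinity is exactly 1 for identical inputs, stays near 1 for `f` and `g`, and is nearly 0 for `f` and `h`. Note that the correlation ranges over [-1, 1] rather than [0, 1], so it behaves like an affinity only when negative values are handled separately.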
MULTISCALE AFFINITY
Define the multiscale affinity of two columns $M_{y_1}$, $M_{y_2}$ with respect to a row tree $T_x$ by
$$\rho_{T_x}(M_{y_1}, M_{y_2}) = \sum_{k,l} \rho\left(M_{y_1}|_{X_k^l},\; M_{y_2}|_{X_k^l}\right)$$
On each folder $X_k^l$ in the partition tree, we take the affinity $\rho$ between the columns, but just restricted to that folder.
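A minimal sketch of the folder-by-folder sum might look like this. It assumes an unweighted sum over folders, a Gaussian-type affinity with an assumed scale `eps`, and a row tree stored as a list of label arrays, one per generation, each giving the folder of every row at that generation. None of these representation choices come from the slides.

```python
import numpy as np

def gaussian_affinity(f, g, eps=1.0):
    """Gaussian-type affinity between two sampled functions (eps is assumed)."""
    return float(np.exp(-np.sum((f - g) ** 2) / eps))

def multiscale_affinity(col_a, col_b, row_tree, eps=1.0):
    """Sum the affinity of two columns restricted to every folder of the row tree."""
    total = 0.0
    for labels in row_tree:              # one label array per generation
        for k in np.unique(labels):      # each folder at this generation
            folder = labels == k
            total += gaussian_affinity(col_a[folder], col_b[folder], eps)
    return total

# Two generations: two folders of two rows each, then the root folder.
row_tree = [np.array([0, 0, 1, 1]), np.array([0, 0, 0, 0])]
a = np.array([1.0, 1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 2.0, 2.0])
```

The columns `a` and `b` agree on the first folder and differ on the second, so only the first folder contributes substantially to their multiscale affinity.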
MULTISCALE AFFINITY
Define the similarity between the rows using the multiscale affinity defined with respect to the existing tree on the columns. In the database example, this means that more weight is given to the similarity of two documents in clusters of related words; it incorporates contextual information. Likewise, the similarity of two words is given more weight if they occur in similar documents.
HOW GOOD IS A TREE?
After we've iterated this algorithm, how do we know if we have a good pair of trees and a good reorganization? For similar rows and similar columns to be arranged together, we want to be able to say that the values of the matrix vary smoothly. As a general principle, when you expand a function in a basis (Fourier, wavelets, etc.), faster coefficient decay corresponds to greater smoothness of the function. Cluster trees induce a wavelet-like basis, called a tensor Haar-like basis, that consists of functions on the matrix that take constant values on the rectangles defined by pairs of clusters in the rows and columns.
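The link between ordering and coefficient decay can be illustrated in one dimension. The sketch below uses the classical Haar basis on a balanced binary tree as a stand-in for the tensor Haar-like basis on data-driven trees; the test signal, the seed, and the use of the l1 norm of the coefficients as a sparsity measure are assumptions for the example.

```python
import numpy as np

def haar_coefficients(x):
    """Orthonormal Haar transform of a vector whose length is a power of two."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while len(x) > 1:
        # Differences of neighbours are the detail coefficients at this scale.
        details.append((x[0::2] - x[1::2]) / np.sqrt(2.0))
        # Scaled sums are passed up to the next, coarser scale.
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    return np.concatenate([x] + details[::-1])

rng = np.random.default_rng(0)
ordered = np.linspace(0.0, 1.0, 64)    # a smooth, well-organized signal
shuffled = rng.permutation(ordered)    # same values, organization destroyed

l1_ordered = np.abs(haar_coefficients(ordered)).sum()
l1_shuffled = np.abs(haar_coefficients(shuffled)).sum()
```

Both signals have the same energy, but the ordered one concentrates it in far fewer large coefficients, so its l1 norm is smaller. A good pair of trees should have the same effect on the reorganized matrix.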
GENOMIC DATA
Warning: rampant speculation! What if we put individuals on the rows and genes on the columns, and applied the algorithm? The row tree approximates a family tree. The column tree gives similarities between genes, which should approximate their location on the sequence. This is a possible new technique in computational phylogeny; it doesn't require global optimizations.
COMPRESSION
To compress a matrix, store both partition trees and only the coefficients of the tensor Haar functions defined on rectangles bigger than $c\,\epsilon^2$, where $\epsilon$ is the desired error. This allows storage of matrices in much less space.
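As a rough illustration of the idea, the sketch below keeps only the tensor Haar coefficients whose supporting rectangle has area above a threshold. It assumes fixed dyadic trees on rows and columns rather than the learned partition trees, and `area_threshold` is a hypothetical stand-in for the error-dependent cutoff.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar basis vectors (as rows) for n a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        m = H.shape[0]
        coarse = np.kron(H, [1.0, 1.0])           # stretch existing functions
        fine = np.kron(np.eye(m), [1.0, -1.0])    # add the finest-scale differences
        H = np.vstack([coarse, fine]) / np.sqrt(2.0)
    return H

def compress(A, area_threshold):
    """Zero out tensor Haar coefficients supported on small rectangles."""
    Hr, Hc = haar_matrix(A.shape[0]), haar_matrix(A.shape[1])
    coeffs = Hr @ A @ Hc.T
    # Support size of each 1D basis function is its number of nonzero entries.
    area = np.outer((Hr != 0).sum(axis=1), (Hc != 0).sum(axis=1))
    keep = area > area_threshold
    approx = Hr.T @ np.where(keep, coeffs, 0.0) @ Hc
    return approx, int(keep.sum())

x = np.linspace(0.0, 1.0, 8)
A = np.outer(x, x)                      # a smooth 8 x 8 test matrix
exact, kept_all = compress(A, 0)        # keeps every coefficient
approx, kept = compress(A, 8)           # drops the small rectangles
error = np.linalg.norm(A - approx)
```

With a threshold of 0 every coefficient is kept and the matrix is reproduced exactly; raising the threshold to 8 keeps half of the 64 coefficients at the cost of some approximation error.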
MATRIX COMPLETION
In a matrix with missing values, if we have the trees, we can estimate the missing values by averaging over local clusters. For example, if you have a matrix of users and clicks/purchases/preferences, reorganizing it by the dual clustering algorithm and extrapolating new values can give a new algorithm for collaborative filtering.
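Averaging on local clusters can be sketched with a single level of each tree. The example assumes missing entries are marked with NaN and that `row_labels` and `col_labels` give one cluster label per row and column; a fuller version would presumably fall back to coarser folders when a block has no observed entries.

```python
import numpy as np

def complete(M, row_labels, col_labels):
    """Fill NaN entries with the mean of the observed entries in the same block."""
    filled = M.copy()
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            block = np.ix_(row_labels == r, col_labels == c)
            values = filled[block]            # copy of this row-cluster x column-cluster block
            observed = ~np.isnan(M[block])
            if not observed.any():
                continue                      # nothing to average; leave the gaps
            values[~observed] = M[block][observed].mean()
            filled[block] = values
    return filled

M = np.array([[1.0, np.nan, 5.0, 5.0],
              [1.0, 1.0, np.nan, 5.0]])
filled = complete(M, row_labels=np.array([0, 0]), col_labels=np.array([0, 0, 1, 1]))
```

Each missing entry is replaced by the average of its block, so the gap in the first column cluster is filled with 1 and the gap in the second with 5.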
If the data matrix is $A(m, n) = \sin(2\pi m n / T)$, then the rows are sines of different frequencies, and the columns are the different points where the functions are sampled. Shuffling the rows and columns at random, and then reorganizing them by the dual clustering algorithm, recovers the sines in order.
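The test matrix for this experiment is easy to construct. The sketch below builds it and applies the random shuffle; the sizes (8 frequencies, T = 64 sample points) and the seed are assumptions, and the reorganization step itself is not implemented here. The known permutations are inverted only to confirm that the shuffle loses no information.

```python
import numpy as np

T = 64
m = np.arange(1, 9)[:, None]     # rows: frequencies 1..8 (assumed range)
n = np.arange(T)[None, :]        # columns: sample points 0..T-1
A = np.sin(2.0 * np.pi * m * n / T)

rng = np.random.default_rng(0)
row_perm = rng.permutation(A.shape[0])
col_perm = rng.permutation(A.shape[1])
shuffled = A[row_perm][:, col_perm]

# Undo the shuffle using the inverse permutations.
restored = shuffled[np.argsort(row_perm)][:, np.argsort(col_perm)]
```

The dual clustering algorithm would be handed only `shuffled`, without `row_perm` or `col_perm`, and asked to find an ordering in which neighbouring rows and columns are similar.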
One can actually prove that the multiscale affinity between two trigonometric functions is an increasing function in the difference between their frequencies. The geometry of trigonometric functions in the multiscale affinity norm gives us back the circle. There's a relationship between the geometry of the surface the functions are defined on, and the geometry of relationships between the functions. We might ask: is this true for other functions on other surfaces?
On arbitrary surfaces, the Laplacian is defined as $\Delta f = \operatorname{div}(\operatorname{grad} f)$. An eigenfunction of the Laplacian is a function having the property that $\Delta f = \lambda f$. On the circle, trigonometric functions are the eigenfunctions of the Laplacian.
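The statement about the circle can be checked numerically. The sketch below uses the standard second-difference approximation of the Laplacian on N equally spaced points of the circle; N and the frequency k are arbitrary choices for the example. On this grid a sampled sine is an exact eigenvector, with an eigenvalue close to the continuous value of -k^2.

```python
import numpy as np

N, k = 64, 3
h = 2.0 * np.pi / N                  # grid spacing on the circle
theta = h * np.arange(N)
f = np.sin(k * theta)

# Periodic second difference: (f[j+1] - 2 f[j] + f[j-1]) / h^2.
laplacian_f = (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / h ** 2

# Eigenvalue of the discrete operator for frequency k.
eigenvalue = (2.0 * np.cos(k * h) - 2.0) / h ** 2
```

As N grows, `eigenvalue` approaches -k^2, matching the fact that the second derivative of sin(k theta) is -k^2 sin(k theta).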
Every square-integrable function on a compact smooth manifold can be expanded in terms of eigenfunctions of the Laplacian, just as every periodic function can be expanded in a trigonometric series (the Fourier expansion). For some surfaces the Laplacian eigenfunctions have closed-form solutions: on the disc they are given by Bessel functions, and on the sphere by the spherical harmonics. Efficiently or sparsely expanding a function on a surface in Laplacian eigenfunctions is valuable for the same reason that fast or sparse Fourier expansion is useful for functions defined on an interval.
Eigenvalues alone don't determine geometry. But perhaps including relationships between eigenfunctions will.
COUNTEREXAMPLE: INCOHERENCE
It turns out that you can't always reconstruct the geometry of a surface from the multiscale affinities between its Laplacian eigenfunctions. Indeed, on the sphere, if you choose a basis of Laplacian eigenfunctions at random, one can prove that with high probability their multiscale affinity is close to zero. In other words, instead of getting a geometric relationship back, we get a clump. We can't distinguish eigenfunctions by looking at their multiscale affinity.
The notion of incoherence comes from the noiselet basis, an orthonormal basis for $L^2$ functions on the line such that the inner product of a noiselet with a Haar function is independent of the choice of noiselet. In other words, if you try to measure a Haar function with noiselets, all the measurements look the same; the two bases are totally uncorrelated. This property of incoherence means that noiselets form a good basis for compressed sensing: if you measure a function with many different noiselets, you can reconstruct the function accurately.
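Constructing noiselets is beyond a short example, so the sketch below swaps in a different, better-known incoherent pair: the discrete Fourier basis and the standard (spike) basis. It is an analogy rather than the noiselet/Haar construction, and the dimension N is an arbitrary choice. It shows the same flatness property: every inner product between a vector of one basis and a vector of the other has the same magnitude.

```python
import numpy as np

N = 16
spikes = np.eye(N)                               # standard basis vectors as rows
fourier = np.fft.fft(np.eye(N)) / np.sqrt(N)     # unit-norm Fourier vectors as rows

# Magnitudes of all inner products between the two bases.
inner = np.abs(fourier.conj() @ spikes.T)

# Mutual coherence: sqrt(N) times the largest inner product; 1 is the minimum possible.
coherence = np.sqrt(N) * inner.max()
```

Every entry of `inner` equals 1/sqrt(N), so measuring a spike against any Fourier vector gives a response of the same size, which is the sense in which all the measurements look the same.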
COMPRESSED SENSING
If taking a small number of observations against an incoherent basis allows us to reconstruct the original function, perhaps taking measurements against a random spherical eigenfunction basis will allow us to reconstruct functions on the sphere. (Applications: 3D photographic views, protein reconstruction, etc.) Theoretical question: which bases of Laplacian eigenfunctions in general are coherent with respect to the geometry of the surface? Which are incoherent?
CONCLUSION
We can organize large, high-dimensional datasets by creating a multiscale folder structure on observations and features. Iteratively improving observation and feature hierarchical trees gives us an organization that simultaneously clusters similar observations and similar features. This procedure has broad applications for machine learning and data science, and makes use of richer information than conventional statistical techniques.