
Introduction

Algorithm

Smoothness, approximation, and decay

Applications

Geometry

Conclusion

Dual geometries of high-dimensional datasets


Sarah Constantin Yale University

August 16, 2012

Outline

Introduction
    Organizing high-dimensional data
    Motivating example: text database
Algorithm
    Hierarchical clustering and trees
    Iterative principle
    Multiscale affinity
Smoothness, approximation, and decay
    How good is a tree?
Applications
    Genomic data
    Compression
    Matrix completion
Geometry
    Example: trigonometric functions
Conclusion

Organizing high-dimensional data

We often need to find clusters or patterns in high-dimensional data: many observations, each with many features. What if we could simultaneously organize the features and the observations, putting similar variables together and similar observations together? The idea is to take advantage of the dual relationship, the symmetry between features and observations. The algorithm presented here is the work of my advisor Ronald Coifman in a paper with Matan Gavish; I'll also be introducing some of the open problems I'm currently working on.

Motivating example: text database

Consider a document database represented as a matrix. Every row is a document; every column is a word. The (i, j)th entry is the number of times the jth word appears in the ith document. The dual clustering algorithm reshuffles the rows and columns of the matrix, so that similar documents go together and similar words go together. This builds up a tree hierarchy of words and a tree hierarchy of documents, grouped by context. Even very similar documents share only a small set of highly correlated words, but this relationship may drown in overall noise; contextual similarity is more powerful than TF-IDF.
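As a small illustration of the matrix described above, here is a sketch that builds a document-by-word count matrix. The function name `document_term_matrix`, the whitespace tokenizer, and the three toy documents are assumptions made for this example, not part of the original talk.

```python
from collections import Counter

import numpy as np


def document_term_matrix(documents):
    """Return (matrix, vocabulary); matrix[i, j] counts word j in document i."""
    # Naive tokenization (lowercase, split on whitespace) is an assumption here.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    column = {word: j for j, word in enumerate(vocabulary)}
    matrix = np.zeros((len(documents), len(vocabulary)), dtype=int)
    for i, tokens in enumerate(tokenized):
        for word, count in Counter(tokens).items():
            matrix[i, column[word]] = count
    return matrix, vocabulary


# Hypothetical toy corpus: rows are documents, columns are words.
docs = ["gene expression in yeast", "yeast gene networks", "galaxy cluster survey"]
M, vocab = document_term_matrix(docs)
```

Rows 0 and 1 share the columns for "gene" and "yeast", which is the kind of small overlap the dual clustering is meant to pick up.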

[Figure: Science News database]

Hierarchical clustering and trees

Given a measure of similarity, you can always build a tree. One way is a K-means-style approach: choose a set of centroids more than a fixed distance apart; assign each data point to its nearest centroid; the resulting clusters form the nodes of the tree. To go to a higher level, take the centroids of the clusters and cluster those in the same way.
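The bottom-up construction just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the paper: the names `cluster_level` and `build_tree`, the greedy seed selection, the use of cluster means as next-level points, and the `growth` factor that widens the radius at each level are all choices made for this example. It assumes `radius > 0`.

```python
import numpy as np


def cluster_level(points, radius):
    """Pick seeds more than `radius` apart, then assign points to the nearest seed."""
    seeds = []
    for i in range(len(points)):
        if all(np.linalg.norm(points[i] - points[s]) > radius for s in seeds):
            seeds.append(i)
    # Distances from every point to every seed, shape (n_points, n_seeds).
    dists = np.linalg.norm(points[:, None, :] - points[seeds][None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Cluster means become the points of the next, coarser level.
    means = np.array([points[labels == k].mean(axis=0) for k in range(len(seeds))])
    return means, labels


def build_tree(points, radius, growth=2.0):
    """Return one label array per level, finest first (labels index the level below)."""
    current = np.asarray(points, dtype=float)
    levels = []
    while len(current) > 1:
        means, labels = cluster_level(current, radius)
        if len(means) < len(current):  # keep only levels where something merged
            levels.append(labels)
            current = means
        radius *= growth  # assumed: widen the radius to reach coarser scales
    return levels


data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
levels = build_tree(data, radius=1.0)
```

On this toy input the finest level groups the two tight pairs, and the coarsest level merges everything into a single root.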

Hierarchical clustering example

[Figure: genetic microarray data.]

Iterative principle

First build K-means trees on the observations and on the features. The initial similarity is just Euclidean distance: the squared distance between two observations is the sum of squares of the differences between their features. Now build a new similarity, which we'll define based on the tree structure. Update the trees iteratively: build a new tree of features based on a similarity defined using the observation tree, then a new tree of observations based on a similarity defined using the feature tree. Reshuffle rows and columns so that nearby rows are in the same cluster on the row tree and nearby columns are in the same cluster on the column tree.
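Two of the building blocks in this slide are easy to sketch: the initial squared Euclidean distance between observations, and the final reshuffling of rows and columns by cluster. The helper names `squared_distances` and `reshuffle` and the flat cluster labels are assumptions for illustration; the tree-based similarity update itself is not shown here.

```python
import numpy as np


def squared_distances(matrix):
    """Pairwise sum of squared feature differences between rows."""
    diff = matrix[:, None, :] - matrix[None, :, :]
    return (diff ** 2).sum(axis=2)


def reshuffle(matrix, row_labels, col_labels):
    """Reorder rows and columns so members of the same cluster are adjacent."""
    row_order = np.argsort(row_labels, kind="stable")
    col_order = np.argsort(col_labels, kind="stable")
    return matrix[np.ix_(row_order, col_order)], row_order, col_order


A = np.array([[1, 0, 1],
              [0, 5, 0],
              [1, 0, 1]])
D = squared_distances(A)
# Hypothetical cluster labels: rows 0 and 2 belong together, as do columns 0 and 2.
B, row_order, col_order = reshuffle(A, [0, 1, 0], [0, 1, 0])
```

After reshuffling, the two similar rows and the two similar columns sit next to each other, so the matrix becomes block-structured.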

[Figure: tree]

Multiscale affinity

Let $X_k^l$ be the kth node of a tree, at generation l. Let the affinity between f and g be given by

$$\rho(f, g) = e^{-\frac{1}{\epsilon} \sum_x (f(x) - g(x))^2}$$

(Another possible choice is $\rho = \mathrm{cov}(f, g) / (\sigma(f)\,\sigma(g))$, the correlation/inner product between two functions.) $\rho$ is an affinity between two functions; it's equal to 1 if they're similar, and decreases to 0 as they become more different.
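A minimal sketch of the two affinity choices mentioned above, assuming the first has the Gaussian form exp(-(1/eps) * sum of squared differences). The scale parameter `eps` and both function names are assumptions for this example.

```python
import numpy as np


def gaussian_affinity(f, g, eps=1.0):
    """Equals 1 when f == g and decays toward 0 as f and g differ more."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(np.exp(-np.sum((f - g) ** 2) / eps))


def correlation_affinity(f, g):
    """Pearson correlation cov(f, g) / (std(f) * std(g))."""
    return float(np.corrcoef(f, g)[0, 1])


same = gaussian_affinity([1.0, 2.0], [1.0, 2.0])
apart = gaussian_affinity([0.0, 0.0], [1.0, 1.0], eps=2.0)
corr = correlation_affinity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Note that the correlation ranges over [-1, 1] rather than [0, 1], so the "decreases to 0" description applies to the Gaussian form.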

Multiscale affinity

Define the multiscale affinity of two columns $M_{y_1}$, $M_{y_2}$ with respect to a row tree $T_x$ by

$$\rho_{T_x}(M_{y_1}, M_{y_2}) = \sum_{k,l} \frac{1}{|X_k^l|} \, \rho_{X_k^l}(M_{y_1}, M_{y_2})$$

On each folder in the partition tree, we take the affinity between the columns, but just restricted to that folder.
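The folder-by-folder sum can be sketched as below. This assumes a Gaussian-type affinity on each folder, a weight of one over the folder size, and a tree given simply as a list of folders (each a list of row indices, over all levels); the function name and the `eps` parameter are hypothetical.

```python
import numpy as np


def multiscale_affinity(col_a, col_b, folders, eps=1.0):
    """Sum over folders of (restricted affinity) / (folder size)."""
    col_a = np.asarray(col_a, dtype=float)
    col_b = np.asarray(col_b, dtype=float)
    total = 0.0
    for folder in folders:
        idx = np.asarray(folder)
        # Affinity computed only on the rows that belong to this folder.
        local = np.exp(-np.sum((col_a[idx] - col_b[idx]) ** 2) / eps)
        total += local / len(idx)
    return total


# Hypothetical row tree on 4 rows: the root folder and its two children.
folders = [[0, 1, 2, 3], [0, 1], [2, 3]]
identical = multiscale_affinity([0, 0, 1, 1], [0, 0, 1, 1], folders)
partial = multiscale_affinity([0, 0, 1, 1], [0, 0, 2, 2], folders)
```

In the second call the columns agree on the folder [0, 1] and differ on [2, 3], so the agreement on the small folder still contributes fully even though the columns differ globally.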

Multiscale affinity

Define the similarity between the rows by the multiscale affinity with respect to the existing tree on the columns. In the database example, this means that more weight is given to the similarity of two documents in clusters of related words; it incorporates contextual information. Likewise, the similarity of two words is given more weight if they occur in similar documents.

How good is a tree?

After we've iterated this algorithm, how do we know whether we have a good pair of trees and a good reorganization? For similar rows and similar columns to be arranged together, we want to be able to say that the values of the matrix vary smoothly. As a general principle, when you expand a function in a basis (Fourier, wavelets, etc.), faster coefficient decay corresponds to greater smoothness of the function. Cluster trees induce a wavelet-like basis, called a tensor Haar-like basis, that consists of functions on the matrix that take constant values on the rectangles defined by pairs of clusters in the rows and columns.
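To make the tensor Haar-like functions concrete, here is a sketch for a single split on each tree. It assumes the common construction in which a Haar-like function is a normalized, mean-zero difference of indicators of two sibling folders; the helper name `haar_like` and the particular splits are hypothetical.

```python
import numpy as np


def haar_like(n, left, right):
    """Unit-norm, mean-zero vector that is constant on `left` and on `right`."""
    h = np.zeros(n)
    h[left] = 1.0 / len(left)
    h[right] = -1.0 / len(right)
    return h / np.linalg.norm(h)


# One split of 5 rows and one split of 4 columns (hypothetical folders).
row_fn = haar_like(5, [0, 1], [2, 3, 4])
col_fn = haar_like(4, [0], [1, 2, 3])

# The tensor Haar-like function is the outer product: it is constant on each
# rectangle (row folder) x (column folder).
tensor_fn = np.outer(row_fn, col_fn)
```

The coefficient of a data matrix `M` against this basis function would be `np.sum(M * tensor_fn)`.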

[Figure: Haar-like function]

[Figure: tensor Haar-like function]

Smoothness and reshuffling

It can be shown that:
Every function on the matrix can be expressed in an orthonormal basis of tensor Haar-like functions.
Functions can be approximated by taking only those tensor Haar functions which have large coefficients and are supported on large folders.
If a function on the matrix has rapidly decaying coefficients in the tensor Haar expansion, then it satisfies a Hölder smoothness condition, and vice versa.
If a function can be described efficiently in a tensor Haar-like basis, then it decomposes into a typical matrix, which varies smoothly in rows and columns, and a small outlier matrix with irregular behavior but small support; this is a Calderón-Zygmund argument.

Genomic data

Warning: rampant speculation! What if we put individuals on the rows and genes on the columns, and applied the algorithm? The row tree would approximate a family tree. The column tree would give similarities between genes, which should approximate their location on the sequence. This is a possible new technique in computational phylogeny; it doesn't require global optimization.

Genomic data and disease risk

A trait, or disease, is a function on the rows. Rectangles that light up correspond to clusters of genes inherited together and related individuals who have the disease.
Predict disease risk via a tensor Haar basis expansion.
Unlike genome-wide association studies (GWAS) plus logistic regression, this makes no assumption that genes are uncorrelated!
Using a limited number of tensor Haar coefficients imposes a sparsity condition and prevents overfitting.
The cost of sequencing is dropping; we need more efficient statistical tools!
GWAS often find spurious correlations and fail to predict risk of disease.

Compression

To compress a matrix, store both partition trees, and only the coefficients of the tensor Haar functions defined on rectangles bigger than $c\epsilon^2$, where $\epsilon$ is the desired error. This allows storage of matrices in much less space.

Matrix completion

In a matrix with missing values, if we have the trees, we can estimate the values by averaging on local clusters. For example: if you have a matrix of users and clicks/purchases/preferences, reorganizing by the dual clustering algorithm and extrapolating new values can give a new algorithm for collaborative filtering.
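Averaging on local clusters can be sketched at a single scale as follows. This assumes missing entries are marked with NaN and that one level of each tree is given as flat cluster labels; the function name `fill_by_cluster_mean` is hypothetical, and a fuller version would fall back to coarser folders when a block has no observed entries.

```python
import numpy as np


def fill_by_cluster_mean(matrix, row_labels, col_labels):
    """Replace NaNs by the mean of observed entries in the same row/column block."""
    matrix = np.asarray(matrix, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    filled = matrix.copy()
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            block = np.ix_(row_labels == r, col_labels == c)
            values = matrix[block]  # advanced indexing returns a copy
            missing = np.isnan(values)
            # Blocks that are entirely missing are left as NaN in this sketch.
            if missing.any() and not missing.all():
                values[missing] = values[~missing].mean()
                filled[block] = values
    return filled


M = np.array([[1.0, np.nan, 9.0],
              [3.0, 1.0, 9.0],
              [7.0, 7.0, np.nan]])
filled = fill_by_cluster_mean(M, row_labels=[0, 0, 1], col_labels=[0, 0, 1])
```

Here the missing entry at (0, 1) is filled with the mean of its block, while (2, 2) stays missing because its block has no observed values.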

Example: trigonometric functions

If the data matrix you use is $A(m, n) = \sin(2\pi m n / T)$, then the rows are different frequencies of sines, and the columns are different points where the functions are sampled. Shuffling the rows and columns at random, and then reorganizing them by the dual clustering algorithm, recovers the sines in order.
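The test data for this experiment is simple to generate. The sketch below assumes the matrix has the form sin(2*pi*m*n/T), builds it, and applies random row and column permutations; the sizes, the seed, and the name `sine_matrix` are arbitrary choices. It does not run the dual clustering itself. It only keeps the ground-truth permutations so that a reorganization could be checked against them.

```python
import numpy as np


def sine_matrix(n_freqs, n_samples, period):
    """A[m, n] = sin(2*pi*m*n / period), with frequencies m = 1..n_freqs."""
    m = np.arange(1, n_freqs + 1)[:, None]
    n = np.arange(n_samples)[None, :]
    return np.sin(2 * np.pi * m * n / period)


rng = np.random.default_rng(0)
A = sine_matrix(8, 64, period=64)

# Shuffle rows and columns at random; this is the input to the algorithm.
row_perm = rng.permutation(A.shape[0])
col_perm = rng.permutation(A.shape[1])
shuffled = A[np.ix_(row_perm, col_perm)]

# Ground truth: the inverse permutations restore the original ordering.
recovered = shuffled[np.ix_(np.argsort(row_perm), np.argsort(col_perm))]
```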

[Figure: trigonometric functions example]

Trig functions and dual geometry

One can actually prove that the multiscale affinity between two trigonometric functions is an increasing function in the difference between their frequencies. The geometry of trigonometric functions in the multiscale affinity norm gives us back the circle. There's a relationship between the geometry of the surface the functions are defined on, and the geometry of relationships between the functions. We might ask: is this true for other functions on other surfaces?

Eigenfunctions of the Laplacian

The Laplace operator is an anti-averaging function; it's the edge-sharpening filter in image processing. On the plane it's simply

$$\Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}.$$

On arbitrary surfaces, it's defined as $\Delta f = \mathrm{div}(\mathrm{grad}(f))$. An eigenfunction of the Laplacian is a function having the property that $\Delta f = \lambda f$. On the circle, trigonometric functions are the eigenfunctions of the Laplacian.
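The statement about the circle can be checked numerically in a discrete setting. The sketch below uses the second-difference operator on n equally spaced points of the circle as a stand-in for the Laplacian (an assumption of this example, with the name `cycle_laplacian` chosen here) and verifies that a sampled cosine is an eigenvector.

```python
import numpy as np


def cycle_laplacian(n):
    """Second-difference operator on a cycle: (L f)[j] = f[j-1] + f[j+1] - 2 f[j]."""
    eye = np.eye(n)
    return np.roll(eye, 1, axis=0) + np.roll(eye, -1, axis=0) - 2 * eye


n, k = 32, 3
theta = 2 * np.pi * np.arange(n) / n
f = np.cos(k * theta)  # a trigonometric function sampled on the circle

L = cycle_laplacian(n)
# From cos(a - h) + cos(a + h) = 2 cos(a) cos(h), with step h = 2*pi*k/n.
eigenvalue = 2 * np.cos(2 * np.pi * k / n) - 2
residual = np.max(np.abs(L @ f - eigenvalue * f))
```

The eigenvalue is non-positive, matching the sign convention in which the Laplacian is the divergence of the gradient.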

Eigenfunctions of the Laplacian on surfaces

Every function on a smooth manifold can be expanded in terms of eigenfunctions of the Laplacian, just as every periodic function can be expanded in a trigonometric series (the Fourier expansion). For some surfaces the Laplacian eigenfunctions have closed-form solutions: on the disc, the Bessel functions, and on the sphere, the spherical harmonics. Efficiently or sparsely expanding a function on a surface in Laplacian eigenfunctions is valuable for the same reason that fast or sparse Fourier expansion is useful for functions defined on an interval.

Eigenfunctions, not just eigenvalues

Can you hear the shape of a drum? No!

Eigenvalues alone don't determine geometry. But perhaps including relationships between eigenfunctions will.

Counterexample: incoherence

It turns out that you can't always reconstruct the geometry of a surface from the multiscale affinities between its Laplacian eigenfunctions. Indeed, on the sphere, if you choose a basis of Laplacian eigenfunctions at random, one can prove that with high probability their multiscale affinity is close to zero. In other words, instead of getting a geometric relationship back, we get a clump. We can't distinguish eigenfunctions by looking at their multiscale affinity.

Incoherence and compressed sensing

The notion of incoherence comes from the noiselet basis, an orthonormal basis for $L^2$ functions on the line such that the inner product of a noiselet with a Haar function is independent of the choice of noiselet. In other words, if you try to measure a Haar function with noiselets, all the measurements look the same; it's totally uncorrelated. This property of incoherence means that noiselets form a good basis for compressed sensing; if you measure a function with many different noiselets, you can reconstruct the function accurately.

[Figure: compressed sensing]

Incoherence on surfaces: conjecture

If taking a small number of measurements against an incoherent basis allows us to reconstruct the original function, perhaps taking measurements against a random spherical eigenfunction basis will allow us to reconstruct functions on the sphere. (Applications: 3D photographic views, protein reconstruction, etc.) Theoretical question: which bases of Laplacian eigenfunctions in general are coherent with respect to the geometry of the surface? Which are incoherent?

Conclusion

We can organize large, high-dimensional datasets by creating a multiscale folder structure on observations and features. Iteratively improving observation and feature hierarchical trees gives us an organization that simultaneously clusters similar observations and similar features. This procedure has broad applications for machine learning and data science, and makes use of richer information than conventional statistical techniques.
