
Playing Twenty Questions With The Universe

Why Dimensionality Reduction Should Change How You Think About Everything

Yahoo vs. Amazon

Fixed categories or spontaneous organization?

Data organization matters

A technological question; also an epistemological question. How do we choose categories?

How do we organize data?


 

We have to decide on categories for data: for example, dividing vertebrates into mammals, fish, birds, reptiles, and amphibians. Relatedly, we also organize data along coordinates: for example, we can plot all the vertebrates along a line, measuring how far back in time they branched off from a common ancestor.

 

Finding the right axes

Dimensionality reduction problem


 

Poole and Rosenthal studied roll call votes in the US Congress. Each member of Congress is a point in a high-dimensional cloud of data: one dimension for every bill, with a value of 0 or 1 in each dimension according to their vote on that bill. Result: a single coordinate (liberal-conservative) best predicts voting. Empirically, the two-axis Nolan Chart is wrong (at least in Congress): one axis is enough.
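As a toy illustration (hypothetical legislators and bills, not Poole and Rosenthal's actual data), here is how the roll-call cloud can be built and its dominant axis extracted:

```python
import numpy as np

# Hypothetical toy roll-call matrix: one row per legislator, one column
# per bill; 1 = yea, 0 = nay. Names and votes are invented.
votes = np.array([
    [1, 1, 0, 1, 0, 1],   # legislator A
    [1, 1, 0, 1, 0, 0],   # legislator B
    [0, 0, 1, 0, 1, 0],   # legislator C
    [0, 1, 1, 0, 1, 0],   # legislator D
])
X = votes - votes.mean(axis=0)         # center each bill's column
_, _, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt[0]                     # each legislator's position along
print(scores)                          # the single best-predicting axis
```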

 

How to find the best axes? Variance!

Principal Components Analysis


 

PCA transforms the data by a linear change of coordinates so that the greatest variance lies along the first coordinate, the second-greatest along the second coordinate, and so on. Explicitly: if X is the centered data matrix, write its singular value decomposition X = W S Vᵀ, where W is the matrix of eigenvectors of XXᵀ, V is the matrix of eigenvectors of XᵀX, and S is a diagonal matrix of singular values. The transformation is given by Y = WᵀX = S Vᵀ.
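A minimal sketch of this in NumPy, on synthetic data (the array shapes and variable names are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))           # 5 features, 200 samples as columns
X = X - X.mean(axis=1, keepdims=True)   # center each feature (row)

W, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = W @ diag(s) @ Vt
Y = W.T @ X                             # new coordinates; equals diag(s) @ Vt
# Row i of Y is the i-th principal coordinate; variances decrease with i
print(Y.var(axis=1))
```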

PCA for dimensionality reduction


 

Suppose we truncate S, so that it contains only a few of the largest singular values. Then the PCA transformation yields a projection onto the few largest-variance dimensions. If the data lies near a low-dimensional surface (like the plane in the previous example, or the single axis of voting), then the low-rank PCA will be close to the original data. Theorem (Eckart–Young): truncated PCA provides the best rank-k approximation to the data (in Frobenius norm).
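A sketch of the truncation and the Frobenius-norm guarantee, again on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 200))
X = X - X.mean(axis=1, keepdims=True)

W, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = W[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep only the k largest values of S
err = np.linalg.norm(X - X_k, ord="fro")
# Eckart-Young: no rank-k matrix comes closer; the error is exactly the
# energy in the discarded singular values
assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))
```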

Making the subjective objective


There are two kinds of people in the world, introverts and extroverts. US politics is defined by where you fit on the liberal-conservative spectrum. Birds are more like snakes than like mammals. These kinds of claims are not necessarily matters of opinion! Dimensionality reduction techniques can give us objective measures of how to form clusters, what axes are important, and how to measure similarity.

Clustering: Carving Reality at the Joints

Similar concept: look at variance


   

K-means clustering: assign each point to the cluster whose center is nearest, then recompute each center as the mean of its cluster; repeat until the assignments stabilize. Each step provably reduces within-cluster variance. Related to PCA: the subspace spanned by the K cluster centroids is the subspace spanned by the first K-1 principal directions (Ding & He, 2004). You can think of PCA as a way of defining categories.
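A minimal sketch with scikit-learn (the blob data is invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic blobs in the plane
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (3, 0), (0, 3)]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)            # cluster assignment for each point
print(km.cluster_centers_)   # centroids; within-cluster sum of squares is km.inertia_
```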

Beyond PCA: a bunny is not a plane!

Manifold learning

Sometimes data lies on a low-dimensional manifold: a curved surface, not a flat hyperplane. Finding the underlying manifold is like finding the predictive law for the data, a concise description.

Manifold learning techniques




Locally linear embedding: approximate the manifold with small patches that locally look like planes.

Isomap: look at distances between points and piece them together into a global picture (multidimensional scaling).

Laplacian eigenmaps: use the eigenvectors of the graph Laplacian as the coordinates; a geometrical approach.

Diffusion maps: imagine a random walk among the data points; points are close when travel time between them is short. The coordinates are eigenvectors of the stochastic matrix.
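A sketch of the first three, using scikit-learn on the classic swiss-roll dataset (scikit-learn has no built-in diffusion maps; the parameter choices here are arbitrary):

```python
from sklearn import datasets, manifold

# Swiss roll: a 2-D sheet curled up in 3-D space
X, _ = datasets.make_swiss_roll(n_samples=1000, random_state=0)

methods = {
    "LLE": manifold.LocallyLinearEmbedding(n_neighbors=12, n_components=2),
    "Isomap": manifold.Isomap(n_neighbors=12, n_components=2),
    "Laplacian eigenmaps": manifold.SpectralEmbedding(n_neighbors=12,
                                                      n_components=2),
}
# Each technique unrolls the curled sheet into flat 2-D coordinates
flat = {name: m.fit_transform(X) for name, m in methods.items()}
```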

Locally Linear Embedding

Isomap

Laplacian Eigenmaps

Diffusion Maps

Applications

MMPI

Dimensionality Reduction and Psychiatry




A questionnaire is a high-dimensional space: one dimension for every question, one point for every respondent. Coifman et al. used diffusion geometry, a nonlinear dimensionality reduction technique. They found that their intrinsic coordinates (derived algorithmically, without looking at the questions) matched standard diagnostic scores like the depression score, the psychosis score, and so on.
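This is not Coifman's code or data, but a minimal sketch of the diffusion-map idea (the kernel width eps and the number of coordinates are arbitrary choices):

```python
import numpy as np

def diffusion_coordinates(X, eps=1.0, n_coords=2):
    # Gaussian affinities between respondents (rows of X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / eps)
    P = K / K.sum(axis=1, keepdims=True)   # row-stochastic random-walk matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    keep = order[1:n_coords + 1]           # skip the trivial constant eigenvector
    # Diffusion coordinates: eigenvectors scaled by their eigenvalues
    return vecs[:, keep].real * vals[keep].real
```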

 

This should change the way you think.




Data has internal structure. Data analysis can replicate or replace intuitive judgments! It's an empirical question whether a category or axis has good explanatory power.

Examine categories critically: is this a natural cluster? Would I choose these categories if I had no preconceptions about the data? Is within-group variance smaller than between-group variance?

Examine measurements critically: is this a useful thing to measure? Does this axis capture a lot of the variance of the data?
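One way to make the within-versus-between check concrete (a hypothetical helper, not a standard library function):

```python
import numpy as np

def variance_ratio(X, labels):
    """Compare within-group to between-group variance for a proposed
    categorization: small ratios suggest the categories are natural."""
    grand_mean = X.mean(axis=0)
    within, between = 0.0, 0.0
    for g in np.unique(labels):
        group = X[labels == g]
        within += ((group - group.mean(axis=0)) ** 2).sum()
        between += len(group) * ((group.mean(axis=0) - grand_mean) ** 2).sum()
    return within / between
```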

Example: Taxonomy

Example: Autism


Simon Baron-Cohen argues that autism is defined by a systematizing-empathizing axis: autistics are more systematizing, and less empathizing, than neurotypicals. He found, indeed, that autistics score higher on a systematizing questionnaire and lower on an empathizing questionnaire. But that's not enough to prove his point! Alarm bells should go off. Does empathizing/systematizing capture most of the variance? Is it a good axis? Is it a top principal component?
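One way to make those alarm bells quantitative (a hypothetical sketch; X would be a matrix of questionnaire responses and axis a proposed scoring direction, neither taken from Baron-Cohen's study):

```python
import numpy as np

def axis_alignment(X, axis):
    """Fraction of the data's variance captured by a proposed axis,
    and that axis's cosine similarity with the top principal component."""
    Xc = X - X.mean(axis=0)
    axis = axis / np.linalg.norm(axis)
    var_on_axis = ((Xc @ axis) ** 2).sum()
    total_var = (Xc ** 2).sum()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return var_on_axis / total_var, abs(axis @ Vt[0])
```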

 

Modeling the world efficiently




Understanding the world is, fundamentally, a matter of having a map that is simpler than the territory. Making measurements or drawing categories are ways of making simple models of high-dimensional data. Dimensionality reduction techniques tell us how to choose the right categories and measurements to capture the data efficiently. When you play Twenty Questions with the universe, some questions are better than others!
