
Applications of Computational Algebraic Topology to Data Analysis

Rachel Hodos
University of Houston

Dr. Martin Lo
Jet Propulsion Laboratory
● Topology
– Introduction
– Specific Tools and Concepts
● Topology of a Point Cloud
● Application: Structure in Galactic Distribution
– 1st Approach: Genus of Density Isocontours
– 2nd Approach: Persistence
● Recommendations for Future Work
Topology
● The study of shape, structure, and connectivity
without a notion of distance
● Two objects are topologically equivalent if one
can be deformed into the other
Topology: Some Properties
● Dimension
● Continuity
● Number of holes
● Number of components (i.e., distinct objects)
● NOT: size, angle, curvature
Topology: Betti Numbers
● Bk counts the number of topologically distinct, non-bounding cycles of dimension k
● Only concerned with first three Betti numbers
● B0 = components, B1 = tunnels, B2 = voids
● Examples (one per pictured shape):
– B0 = 1, B1 = 2, B2 = 0
– B0 = 1, B1 = 0, B2 = 1
Topology: Genus
● Number of handles:

*Images from: “Genus, mathematics.” Wikipedia. December 19, 2008. Wikimedia Foundation, Inc. May 5, 2009. <http://en.wikipedia.org/wiki/Genus_(mathematics)>
Topology: Genus
● Number of handles
● Gauss-Bonnet Theorem for compact, 2D surfaces:

g = 1 − (B0 − B1 + B2 + b) / 2

● b represents the number of boundary components (b = 3 in the pictured example)
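As a quick check of the genus formula, here is a minimal Python sketch. The function name `genus` and its argument names are illustrative and not taken from the talk; the sketch assumes a connected, orientable, compact surface. It raises an error when the formula would give a non-integer value, which suggests the input does not describe a valid surface of that kind.

```python
def genus(b0, b1, b2, boundaries=0):
    """Genus from g = 1 - (B0 - B1 + B2 + b) / 2.

    Assumes a connected, orientable, compact 2D surface.
    `boundaries` is b, the number of boundary components.
    """
    euler_characteristic = b0 - b1 + b2
    twice_genus = 2 - euler_characteristic - boundaries
    if twice_genus % 2 != 0:
        # The formula would give a half-integer genus.
        raise ValueError("non-integer genus: input is not a valid surface")
    return twice_genus // 2


# Torus: B0 = 1, B1 = 2, B2 = 1, no boundary.
torus_genus = genus(1, 2, 1)
# Sphere: B0 = 1, B1 = 0, B2 = 1, no boundary.
sphere_genus = genus(1, 0, 1)
# Sphere with three holes cut out: B0 = 1, B1 = 2, B2 = 0, b = 3.
three_hole_genus = genus(1, 2, 0, boundaries=3)
```

Under these assumptions the torus has genus 1, while the sphere and the three-holed sphere both have genus 0.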
But what the heck is computational
algebraic topology??
● Subfield of topology
● Uses concepts from abstract algebra on finitely-defined structures to study topological properties
● One example of a finitely-defined structure is a simplicial complex...
Simplicial Complexes
● Building block is called a k-simplex: a k-
dimensional analogue of a triangle
– 0-simplex: point/vertex
– 1-simplex: edge
– 2-simplex: triangle
– 3-simplex: tetrahedron

● A simplicial complex is a finite set of simplices in which every face of a simplex also belongs to the set
Simplicial Complexes:
Finitely-Defined Structures which Imitate Smooth Ones

● 4 Vertices: {1} {2} {3} {4}
● 6 Edges: {1 2} {2 3} {1 3} {1 4} {2 4} {3 4}
● 4 Faces: {1 2 3} {1 2 4} {2 3 4} {1 3 4}
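The vertex, edge, and face lists above are enough to compute Betti numbers by linear algebra. The following is a minimal sketch, not the CHomP or PLEX implementation: it builds boundary matrices with coefficients in GF(2) (an assumption made here for simplicity; this agrees with the usual Betti numbers when there is no torsion) and uses B_k = n_k − rank(∂_k) − rank(∂_{k+1}), where n_k is the number of k-simplices. The names `rank_gf2` and `betti_numbers` are illustrative.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low_bit = pivot & -pivot
        # Clear the pivot's lowest set bit from every remaining row.
        rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank


def betti_numbers(simplices):
    """Betti numbers over GF(2); `simplices` must be closed under faces."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    max_dim = max(by_dim)

    boundary_rank = {}  # rank of the boundary map from k-chains to (k-1)-chains
    for k in range(1, max_dim + 1):
        column = {face: i for i, face in enumerate(by_dim[k - 1])}
        rows = []
        for s in by_dim.get(k, []):
            mask = 0
            for face in combinations(s, k):  # the (k-1)-dimensional faces of s
                mask |= 1 << column[face]
            rows.append(mask)
        boundary_rank[k] = rank_gf2(rows)

    return [
        len(by_dim.get(k, []))
        - boundary_rank.get(k, 0)
        - boundary_rank.get(k + 1, 0)
        for k in range(max_dim + 1)
    ]


# The hollow tetrahedron from the slide: 4 vertices, 6 edges, 4 faces.
vertices = [(1,), (2,), (3,), (4,)]
edges = [(1, 2), (2, 3), (1, 3), (1, 4), (2, 4), (3, 4)]
faces = [(1, 2, 3), (1, 2, 4), (2, 3, 4), (1, 3, 4)]
tetrahedron_betti = betti_numbers(vertices + edges + faces)
```

For this complex the sketch gives B0 = 1, B1 = 0, B2 = 1: one component, no tunnels, and one void, the same Betti numbers as a sphere.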
Pause for some acknowledgements…

Many of the figures in this presentation were created using PLEX
software version 2.5, written by Patrick Perry and Vin de Silva,
Stanford University, Stanford, CA.

Galactic data was provided by Tom Jarrett of IPAC, taken from the
Two-Micron All-Sky Redshift Survey from the April 2009 release.

Homology computations were done using the CHomP Advanced
software package by the Computational Homology Project group
from Rutgers University, Newark, NJ.
Topology of a Point Cloud:
A Game of Connect the Dots
just kidding...
Connecting the Dots:
Čech Complex

● Definition: A set of points {p1, p2, ..., pk} forms a simplex in a Čech complex, Čd, if their closed d/2-ball neighborhoods have a point of common intersection
● Example: {p1, p2} form an edge if they are within a distance d of each other
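The definition can be sketched in Python for edges and triangles. This is an illustrative sketch, not the PLEX implementation, and the function names are hypothetical. It uses the fact that closed d/2-balls around a set of points share a common point exactly when the smallest ball enclosing those points has radius at most d/2. For three points that radius is half the longest side if the triangle is right or obtuse, and the circumradius otherwise.

```python
from itertools import combinations
from math import dist, sqrt


def min_enclosing_radius(p, q, r):
    """Radius of the smallest ball containing three points."""
    a, b, c = sorted((dist(q, r), dist(p, r), dist(p, q)))  # c is the longest side
    if a * a + b * b <= c * c:
        # Right, obtuse, or degenerate triangle: longest side is a diameter.
        return c / 2
    s = (a + b + c) / 2
    area = sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
    return a * b * c / (4 * area)  # circumradius


def cech_complex(points, d):
    """Edges and triangles of the Cech complex at scale d (index tuples)."""
    n = len(points)
    edges = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= d
    ]
    triangles = [
        (i, j, k)
        for i, j, k in combinations(range(n), 3)
        if min_enclosing_radius(points[i], points[j], points[k]) <= d / 2
    ]
    return edges, triangles


# Equilateral triangle with side 1; its circumradius is about 0.577.
pts = [(0.0, 0.0), (1.0, 0.0), (0.5, sqrt(3) / 2)]
edges_small, triangles_small = cech_complex(pts, 1.1)
edges_large, triangles_large = cech_complex(pts, 1.2)
```

At d = 1.1 all three edges are present but the triangle is not filled in, because the three balls of radius 0.55 intersect pairwise without sharing a common point. At d = 1.2 the triangle appears.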


Čech Complex
Topology of a Point Cloud:
Persistence
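Persistence tracks how long each topological feature survives as the scale d grows. As a minimal sketch for components only (B0), assuming edges appear once two points are within distance d of each other, the code below records the value of d at which each component merges into another. The function name and the (birth, death) interval format are illustrative choices, not the output format of PLEX.

```python
from itertools import combinations
from math import dist, inf


def component_persistence(points):
    """(birth, death) intervals for connected components as d grows."""
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process candidate edges from shortest to longest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    intervals = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            intervals.append((0.0, d))  # one component dies at scale d
    intervals.append((0.0, inf))  # the last component never dies
    return intervals


# Two nearby points and one far away.
intervals = component_persistence([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)])
```

Here the short interval ending at d = 1 reflects the close pair merging early, while the interval ending at d = 9 shows that the distant point stays a separate component over a much longer range of scales.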
Application: Finding Structure in
Galactic Distribution
● Data: positions of galaxies from redshift surveys
First Approach: Density Isocontours

genus: 4
components: 4
tunnels: 16
voids: 0
Problem: Non-Integer Genus
● Case #1: the “Point Violation”
● Case #2: the “Edge Violation”
Second Approach: Persistence
Problem: Redundancy in Clusters
Recommendations for Future Work
● Density Isocontours:
– Apply existing topological correction algorithms
– Obtain isocontour using another method
● Persistence:
– Use different type of complex (e.g., Witness)
– Reduce number of points in dense regions
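One possible way to reduce the number of points in dense regions is a greedy thinning pass that keeps a point only if it is at least some minimum separation from every point already kept. This is a sketch of one option under that assumption, not a method used in the talk; the function name `thin_points` and the parameter `min_sep` are hypothetical.

```python
from math import dist


def thin_points(points, min_sep):
    """Greedily keep points that are at least min_sep from all kept points."""
    kept = []
    for p in points:
        if all(dist(p, q) >= min_sep for q in kept):
            kept.append(p)
    return kept


# A tight cluster of three points plus one isolated point.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0)]
thinned = thin_points(cloud, min_sep=0.5)
```

The cluster collapses to a single representative while the isolated point is kept, so sparse regions are unaffected. The result depends on the order of the input, and the pass costs quadratic time in the worst case, so a spatial index would be needed for large catalogs.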
Acknowledgements
Papers:

De Silva, V., Carlsson, G., “Topological Estimation Using Witness Complexes.” Eurographics Symposium on
Point-Based Graphics. Eurographics Association, Zurich, Switzerland, 157-166, 2004.

Gott, J. R. III, et al., “Topology of Structure in the Sloan Digital Sky Survey: Model Testing.” The Astrophysical
Journal, Volume 675, 16-28, 2008.

Gott, J. R. III, Melott, A., Dickinson, M., “The Sponge-Like Topology of Large-Scale Structure in the Universe.” The
Astrophysical Journal, Volume 306, 341-357, July 1986.

Schaap, W. E., van de Weygaert, R., “Continuous Fields and Discrete Samples: Reconstruction through Delaunay
Tessellations.” Astronomy and Astrophysics, Volume 363, L29, February 2008.

Zomorodian, A. J., “Computing and Comprehending Topology: Persistence and Hierarchical Morse Complexes,”
Ph.D. Dissertation, Computer Science Dept., University of Illinois at Urbana-Champaign, Urbana, IL, 2001.

Websites:

“Genus, mathematics.” Wikipedia. December 19, 2008. Wikimedia Foundation, Inc. May 5, 2009.
<http://en.wikipedia.org/wiki/Genus_(mathematics)>
Acknowledgements
Data:

New York University Value-Added Galaxy Catalog (using the Sloan Digital Sky Survey):
**We worked with the void0 subset of the LSS-DR4plus galaxy sample from the NYU-VAGC.

Adelman-McCarthy, et al., “The Sixth Data Release of the Sloan Digital Sky Survey,” Astrophysical Journal, Vol. 175,
297-313, April, 2008.

Blanton, M. R., et al., “New York University Value-Added Galaxy Catalog: A Galaxy Catalog Based on New Public
Surveys,” Astronomical Journal, Vol. 129, 2562-2578, June, 2005.

Padmanabhan, N., et al., “An Improved Photometric Calibration of the Sloan Digital Sky Survey Imaging Data,”
Astrophysical Journal, Vol. 674, 1217-1233, February 2008.

Funding for the Sloan Digital Sky Survey (SDSS) has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the
National Aeronautics and Space Administration, the National Science Foundation, the U.S. Department of Energy, the Japanese
Monbukagakusho, and the Max Planck Society. The SDSS Web site is http://www.sdss.org/. The SDSS is managed by the Astrophysical
Research Consortium (ARC) for the Participating Institutions. The Participating Institutions are The University of Chicago, Fermilab, the
Institute for Advanced Study, the Japan Participation Group, The Johns Hopkins University, Los Alamos National Laboratory, the
Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, University of
Pittsburgh, Princeton University, the United States Naval Observatory, and the University of Washington.
Acknowledgements
Data: (cont.)

Two Micron All Sky Redshift Survey:
**Our data was provided by T. H. Jarrett of IPAC, published April 2009, entitled 2MASS_XSCz_09Apr2009.tbl.gz
and can be found at ftp://spider.ipac.caltech.edu/outgoing/jarrett/XSCz/.

Cutri, R., et al. 2003, Two Micron All Sky Survey (Pasadena: IPAC), http://www.ipac.caltech.edu/2mass

Jarrett, T. H., et al., “2MASS Extended Source Catalog: Overview and Algorithms,” Astronomical Journal, 119, 2498, 2000.

This publication makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of
Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics
and Space Administration and the National Science Foundation.

Software:

CHomP, Software Package, Advanced Version, Computational Homology Project, Rutgers University, Newark, NJ, 2005.

MATLAB, R2008a.

PLEX, Software Package, Version 2.5, Dept. of Mathematics, Stanford University, Stanford, CA, 2006.
THANKS!