Contents
Bregman Divergences: Definition, Properties
Generalization of PCA to the Exponential Family
Generalized² Linear² Models ((GL)²M)
Clustering / Co-clustering with Bregman Divergences
Generalized Nonnegative Matrix Factorization
Conclusion
Generalized relative entropy (also called generalized KL divergence) is another Bregman divergence: D(x‖y) = Σᵢ (xᵢ log(xᵢ/yᵢ) − xᵢ + yᵢ), generated by φ(x) = Σᵢ (xᵢ log xᵢ − xᵢ).
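A short numerical sketch (function names are my own) checking that this formula really is the Bregman divergence generated by φ(x) = Σᵢ (xᵢ log xᵢ − xᵢ):

```python
import numpy as np

def bregman(f, grad_f, x, y):
    """Bregman divergence D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - grad_f(y) @ (x - y)

def generalized_kl(x, y):
    """Generalized relative entropy: sum x log(x/y) - x + y (0 log 0 := 0).

    Reduces to the ordinary KL divergence when x and y both sum to 1.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.where(x > 0, x * np.log(x / y), 0.0)
    return float(np.sum(t - x + y))

# generalized KL is the Bregman divergence of phi(x) = sum x log x - x
phi = lambda v: np.sum(v * np.log(v) - v)
grad_phi = np.log
x, y = np.array([1.0, 2.0]), np.array([0.5, 3.0])
assert abs(generalized_kl(x, y) - bregman(phi, grad_phi, x, y)) < 1e-12
```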
The Itakura-Saito distance is another Bregman divergence: D(x‖y) = Σᵢ (xᵢ/yᵢ − log(xᵢ/yᵢ) − 1), generated by the Burg entropy φ(x) = −Σᵢ log xᵢ; it is widely used in signal processing.
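A sketch (helper names are illustrative) that also verifies the claim against the Bregman form generated by the Burg entropy:

```python
import numpy as np

def itakura_saito(x, y):
    """Itakura-Saito distance: sum x/y - log(x/y) - 1, for positive x, y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = x / y
    return float(np.sum(r - np.log(r) - 1.0))

# sanity check: IS equals the Bregman divergence of f(x) = -sum log x
f = lambda v: -np.sum(np.log(v))
grad_f = lambda v: -1.0 / v
x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
direct = f(x) - f(y) - grad_f(y) @ (x - y)
assert abs(itakura_saito(x, y) - direct) < 1e-12
```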
Gaussian Distribution
Poisson Distribution
The Poisson distribution: p(x; λ) = λ^x e^(−λ) / x!, for x = 0, 1, 2, …
The Poisson distribution is a member of the exponential family, and its expected value is λ. Is there a divergence associated with the Poisson distribution? Yes: p(x) can be rewritten in terms of the generalized relative entropy.
Implication: Poisson distribution ↔ relative entropy
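The rewriting referred to above can be reconstructed (a hedged sketch; d denotes the scalar generalized relative entropy):

$$p(x;\lambda)=\frac{\lambda^x e^{-\lambda}}{x!}
=\exp\bigl(-d(x,\lambda)\bigr)\,b(x),\qquad
d(x,\lambda)=x\log\frac{x}{\lambda}-x+\lambda,\quad
b(x)=\frac{x^x e^{-x}}{x!},$$

so up to a factor independent of λ, the Poisson likelihood is the exponential of the negative generalized relative entropy between the observation x and the mean λ.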
Exponential Distribution
The Exponential distribution: p(x; λ) = λ e^(−λx), for x ≥ 0.
The Exponential distribution is a member of the exponential family, and its expected value is 1/λ. Is there a divergence associated with the Exponential distribution? Yes: p(x) can be rewritten in terms of the Itakura-Saito distance.
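The rewriting can be reconstructed as follows (a hedged sketch, in terms of the mean parameter μ = 1/λ):

$$p(x;\lambda)=\lambda e^{-\lambda x}
=\exp\bigl(-d_{IS}(x,\mu)\bigr)\,b(x),\qquad
\mu=\tfrac{1}{\lambda},\quad
d_{IS}(x,\mu)=\frac{x}{\mu}-\log\frac{x}{\mu}-1,\quad
b(x)=\frac{1}{e\,x},$$

so the Exponential likelihood is, up to a factor independent of μ, the exponential of the negative Itakura-Saito distance between the observation x and the mean μ.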
Fenchel Conjugate
Definition: The Fenchel conjugate of a function f is defined as f*(y) = sup_x { ⟨y, x⟩ − f(x) }.
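A brute-force numerical sketch of the definition (the grid search and helper name are my own, not part of the talk):

```python
import numpy as np

def fenchel_conjugate(f, y, grid):
    """Approximate f*(y) = sup_x [ y*x - f(x) ] by maximizing over a grid."""
    return float(np.max(y * grid - f(grid)))

grid = np.linspace(-10.0, 10.0, 200001)

# f(x) = x^2/2 is self-conjugate, so f*(3) = 3^2/2 = 4.5
print(fenchel_conjugate(lambda x: 0.5 * x**2, 3.0, grid))

# f(x) = e^x has conjugate f*(y) = y log y - y for y > 0
print(fenchel_conjugate(np.exp, 2.0, grid))  # close to 2*log(2) - 2
```

For differentiable convex f the supremum is attained where f′(x) = y, which is why the conjugate is tied to the gradient map f′.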
Hadamard inequality: for a symmetric positive definite matrix A, det(A) ≤ ∏ᵢ aᵢᵢ.
Proof:
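The proof equations were lost in extraction; a standard reconstruction via the Bregman matrix divergence generated by φ(X) = −log det X (the LogDet/Burg divergence):

$$D_{\phi}(X,Y)=\operatorname{tr}(XY^{-1})-\log\det(XY^{-1})-n\;\ge\;0.$$

For symmetric positive definite X, take Y = diag(x₁₁, …, xₙₙ). Then tr(XY⁻¹) = n, so

$$0\le n-\Bigl(\log\det X-\sum_i\log x_{ii}\Bigr)-n
\;\Longrightarrow\;\det X\le\prod_i x_{ii}.$$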
Another inequality:
Clustering
Partition the columns of a data matrix so that similar columns fall in the same partition (Banerjee et al., JMLR, 2005)
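A key result of Banerjee et al. is that the optimal Bregman centroid is the arithmetic mean for every Bregman divergence, so k-means-style hard clustering carries over directly; a minimal sketch (function names are my own):

```python
import numpy as np

def bregman_kmeans(X, k, divergence, n_iter=50, seed=0):
    """Bregman hard clustering (sketch of Banerjee et al., JMLR 2005).

    Each column of X is assigned to the centroid minimizing the given
    Bregman divergence; the optimal centroid for any Bregman divergence
    is the arithmetic mean of its cluster.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    centroids = X[:, rng.choice(n, size=k, replace=False)].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # assignment step: nearest centroid under the divergence
        for j in range(n):
            labels[j] = min(range(k),
                            key=lambda c: divergence(X[:, j], centroids[:, c]))
        # update step: arithmetic mean of each nonempty cluster
        for c in range(k):
            if np.any(labels == c):
                centroids[:, c] = X[:, labels == c].mean(axis=1)
    return labels, centroids

# squared Euclidean distance is itself a Bregman divergence (phi = ||.||^2)
sq = lambda x, y: float(np.sum((x - y) ** 2))
X = np.array([[0.1, 0.2, 5.0, 5.1]])
labels, _ = bregman_kmeans(X, 2, sq)
print(labels)  # the two small and the two large columns form separate clusters
```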
Co-clustering
Simultaneously partition both the rows and columns of a data matrix (Banerjee et al., JMLR, 2007)
(Gordon, NIPS, 2002)
Given a matrix of distances, find the nearest matrix of distances such that all distances satisfy the triangle inequality (Dhillon et al., 2004)
Goal: a unified framework whose special cases include:
PCA, SVD
Exp-family PCA
Infomax ICA
Linear regression
Nonnegative matrix factorization
Under squared loss, predicting 1010 instead of 1000 is penalized exactly as heavily as predicting 3 instead of −7; moreover, squared loss composed with a sigmoid can have exponentially many local minima in the dimension.
Sigmoid regression
The log loss function is convex in the weights! We say f(z) and the log loss match each other.
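A small numerical illustration of the matching-loss point (the data and names are synthetic): the log loss composed with the sigmoid transfer stays convex in the weights, which we can check along a segment between two weight vectors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    """Matching loss for the sigmoid transfer function: convex in w."""
    p = sigmoid(X @ w)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# convexity check along a segment: value at midpoint <= average of endpoints
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w1, w2 = rng.normal(size=3), rng.normal(size=3)
mid = 0.5 * (w1 + w2)
print(log_loss(mid, X, y) <= 0.5 * (log_loss(w1, X, y) + log_loss(w2, X, y)))
```

The same midpoint check with squared loss composed with the sigmoid can fail, which is the practical reason to match the loss to the transfer function.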
Special cases
Thus,
Logistic regression
(GL)²M algorithm
GLM goal:
GLM cost:
(GL)²M goal:
(GL)²M cost:
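The goals and costs were lost in extraction; a hedged reconstruction following Gordon (NIPS 2002), where f is the transfer function with antiderivative F (so F′ = f) and G*, H* are regularizers:

$$\text{GLM goal: } y \approx f(Xw),\qquad
\text{GLM cost: } \sum_i \bigl(F(x_i^\top w) - y_i\, x_i^\top w\bigr)$$

$$(\mathrm{GL})^2\mathrm{M}\ \text{goal: } X \approx f(UV),\qquad
(\mathrm{GL})^2\mathrm{M}\ \text{cost: } \sum_{ij} \bigl(F((UV)_{ij}) - X_{ij}\,(UV)_{ij}\bigr) + G^*(U) + H^*(V),$$

i.e., the matching loss is applied to every entry of the low-rank product UV rather than to a single linear predictor.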
Robot Navigation
A corridor in the CMU CS building, with an initial belief spread over both ends (the robot doesn't know which end it is at).
The belief space is large, but sparse and compressible: the belief vectors lie on a nonlinear manifold. This method can be used for planning, too. They factored a matrix of 400 beliefs using feature-space ranks l = 3, 4, 5, with f(z) = exp(z), H* = 10⁻¹²‖V‖², G* = 10⁻¹²‖U‖² + (U).
Reconstructions using ranks l = 3, 4, 5: with PCA, 85 dimensions are needed to match the (GL)²M rank-5 decomposition, and 25 dimensions for the rank-3 decomposition.
Algorithms:
Without constraints
PCA
Cost function
Special case
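The squared-loss special case corresponds to the classic Lee-Seung multiplicative updates; a hedged sketch (this is the Frobenius-norm case only, not the generalized-divergence updates from the talk):

```python
import numpy as np

def nmf_frobenius(X, r, n_iter=500, seed=0):
    """Multiplicative updates for min ||X - WH||_F^2 with W, H >= 0.

    The squared-loss special case of NMF with Bregman divergences
    (cf. Dhillon & Sra, NIPS 2005); the updates preserve nonnegativity.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# a nonnegative rank-2 matrix should be recovered almost exactly
A = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 0.5]])
B = np.array([[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 2.0]])
X = A @ B
W, H = nmf_frobenius(X, r=2)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # small relative error
```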
Conclusion
Introduced the Bregman divergence
Relationship to the exponential family
Generalization to matrices
Applications: matrix inequalities, exponential-family PCA, NMF, GLM, clustering / biclustering, online learning
Bregman divergences suggest new algorithms
Many existing algorithms turn out to be special cases
A matching loss function can help decrease the number of local minima
References
Matrix Nearness Problems with Bregman Divergences. I. S. Dhillon and J. A. Tropp. SIAM Journal on Matrix Analysis and Applications, vol. 29, no. 4, pages 1120-1146, November 2007.
A Generalized Maximum Entropy Approach to Bregman Co-Clustering and Matrix Approximations. A. Banerjee, I. S. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha. Journal of Machine Learning Research (JMLR), vol. 8, pages 1919-1986, August 2007.
Clustering with Bregman Divergences. A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Journal of Machine Learning Research (JMLR), vol. 6, pages 1705-1749, October 2005.
A Generalized Maximum Entropy Approach to Bregman Co-Clustering and Matrix Approximations. A. Banerjee, I. S. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 509-514, August 2004.
Clustering with Bregman Divergences. A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Proceedings of the Fourth SIAM International Conference on Data Mining, pages 234-245, April 2004.
Nonnegative Matrix Approximation: Algorithms and Applications. S. Sra and I. S. Dhillon. UTCS Technical Report #TR-06-27, June 2006.
Generalized Nonnegative Matrix Approximations with Bregman Divergences. I. S. Dhillon and S. Sra. NIPS, pages 283-290, Vancouver, Canada, December 2005. (Also appears as UTCS Technical Report #TR-05-31, June 1, 2005.)
PPT slides
Irina Rish, Bregman Divergences in Clustering and Dimensionality Reduction
Manfred K. Warmuth, COLT 2000
Inderjit S. Dhillon: Machine Learning with Bregman Divergences; Low-Rank Kernel Learning with Bregman Matrix Divergences; Matrix Nearness Problems Using Bregman Divergences; Information Theoretic Clustering, Co-clustering and Matrix Approximations