Abstract—The problem of learning tree-structured Gaussian graphical models from independent and identically distributed (i.i.d.) samples is considered. The influence of the tree structure and the parameters of the Gaussian distribution on the learning rate as the number of samples increases is discussed. Specifically, the error exponent corresponding to the event that the estimated tree structure differs from the actual unknown tree structure of the distribution is analyzed. Finding the error exponent reduces to a least-squares problem in the very noisy learning regime. In this regime, it is shown that the extremal tree structure that minimizes the error exponent is the star for any fixed set of correlation coefficients on the edges of the tree. If the magnitudes of all the correlation coefficients are less than 0.63, it is also shown that the tree structure that maximizes the error exponent is the Markov chain. In other words, the star and the chain graphs represent the hardest and the easiest structures to learn in the class of tree-structured Gaussian graphical models. This result can also be intuitively explained by correlation decay: pairs of nodes which are far apart, in terms of graph distance, are unlikely to be mistaken as edges by the maximum-likelihood estimator in the asymptotic regime.

Index Terms—Error exponents, Euclidean information theory, Gauss-Markov random fields, Gaussian graphical models, large deviations, structure learning, tree distributions.

I. INTRODUCTION

LEARNING the structure and interdependencies of a large collection of random variables from a set of data samples is an important task in signal and image analysis and many other scientific domains (see examples in [1]–[4] and references therein). This task is extremely challenging when the dimensionality of the data is large compared to the number of samples. Furthermore, structure learning of multivariate distributions is also complicated because it is imperative to find the right balance between data fidelity and overfitting the data to the model. This problem is circumvented when we limit the distributions to the set of Markov tree distributions, which have a fixed number of parameters and are tractable for learning [5] and statistical inference [1], [4].

The problem of maximum-likelihood (ML) learning of a Markov tree distribution from i.i.d. samples has an elegant solution, proposed by Chow and Liu in [5]. The ML tree structure is given by the maximum-weight spanning tree (MWST) with empirical mutual information quantities as the edge weights. Furthermore, the ML algorithm is consistent [6], which implies that the error probability in learning the tree structure decays to zero as the number of samples available for learning grows.

While consistency is an important qualitative property, there is substantial motivation for an additional, more quantitative characterization of performance. One such measure, which we investigate in this theoretical paper, is the rate of decay of the error probability, i.e., the probability that the ML estimate of the edge set differs from the true edge set. When the error probability decays exponentially, the learning rate is usually referred to as the error exponent, which provides a careful measure of performance of the learning algorithm, since a larger rate implies a faster decay of the error probability.

We answer three fundamental questions in this paper: i) Can we characterize the error exponent for structure learning by the ML algorithm for tree-structured Gaussian graphical models (also called Gauss-Markov random fields)? ii) How do the structure and parameters of the model influence the error exponent? iii) What are the extremal tree distributions for learning, i.e., the distributions that maximize and minimize the error exponents? We believe that our intuitively appealing answers to these important questions provide key insights for learning tree-structured Gaussian graphical models from data, and thus for modeling high-dimensional data using parameterized tree-structured distributions.

A. Summary of Main Results

We derive the error exponent as the optimal value of the objective function of a nonconvex optimization problem, which can only be solved numerically (Theorem 2). To gain better insight into when errors occur, we approximate the error exponent with a closed-form expression that can be interpreted as the signal-to-noise ratio (SNR) for structure learning (Theorem 4), thus showing how the parameters of the true model affect learning. Furthermore, we show that, due to correlation decay, pairs of nodes which are far apart in terms of their graph distance are unlikely to be mistaken as edges by the ML estimator. This is not only an intuitive result, but also results in a significant reduction in the computational complexity of finding the exponent: from $O(d^{d-2})$ for an exhaustive search over all spanning trees, and the polynomial complexity reported in [7] for discrete tree models, to $O(d)$ for Gaussians (Proposition 7), where $d$ is the number of nodes.
We then analyze extremal tree structures for learning, given a fixed set of correlation coefficients on the edges of the tree. Our main result is the following: the star graph minimizes the error exponent, and if the absolute values of all the correlation coefficients of the variables along the edges are less than 0.63, then the Markov chain maximizes the error exponent (Theorem 8). Therefore, the extremal tree structures in terms of the diameter are also the extremal trees for learning Gaussian tree distributions. This agrees with the intuition that the amount of correlation decay increases with the tree diameter, and that correlation decay helps the ML estimator to better distinguish the edges from the nonneighbor pairs. Lastly, we analyze how changing the size of the tree influences the magnitude of the error exponent (Propositions 11 and 12).

B. Related Work

There is a substantial body of work on approximate learning of graphical models (also known as Markov random fields) from data, e.g., [8]–[11]. The authors of these papers use various score-based approaches [8], the maximum entropy principle [9], or regularization [10], [11] as approximate structure learning techniques. Consistency guarantees in terms of the number of samples, the number of variables, and the maximum neighborhood size are provided. Information-theoretic limits [12] for learning graphical models have also been derived. In [13], bounds on the error rate for learning the structure of Bayesian networks were provided, but in contrast to our work, these bounds are not asymptotically tight (cf. Theorem 2). Furthermore, the analysis in [13] is tied to the Bayesian Information Criterion. The focus of our paper is the analysis of the Chow-Liu algorithm [5] as an exact learning technique for estimating the tree structure and the comparison of error rates amongst different graphical models. In a recent paper [14], the authors concluded that if the graphical model possesses long-range correlations, then it is difficult to learn. In this paper, we in fact identify the extremal structures and distributions in terms of error exponents for structure learning. The area of study in statistics known as covariance selection [15], [16] also has connections with structure learning in Gaussian graphical models. Covariance selection involves estimating the nonzero elements in the inverse covariance matrix and providing consistency guarantees of the estimate in some norm, e.g., the Frobenius norm in [17].

We previously analyzed the error exponent for learning discrete tree distributions in [7]. We proved that for every discrete spanning tree model, the error exponent for learning is strictly positive, which implies that the error probability decays exponentially fast. In this paper, we extend these results to Gaussian tree models and derive new results which are both explicit and intuitive by exploiting the properties of Gaussians. The results we obtain in Sections III and IV are analogous to the results in [7] obtained for discrete distributions, although the proof techniques are different. Sections V and VI contain new results thanks to simplifications which hold for Gaussians but which do not hold for discrete distributions.

C. Paper Outline

This paper is organized as follows: In Section II, we state the problem precisely and provide the necessary preliminaries on learning Gaussian tree models. In Section III, we derive an expression for the so-called crossover rate of two pairs of nodes. We then relate the set of crossover rates to the error exponent for learning the tree structure. In Section IV, we leverage ideas from Euclidean information theory [18] to state conditions that allow accurate approximations of the error exponent. We demonstrate in Section V how to reduce the computational complexity of calculating the exponent. In Section VI, we identify extremal structures that maximize and minimize the error exponent. Numerical results are presented in Section VII, and we conclude the discussion in Section VIII.

II. PRELIMINARIES AND PROBLEM STATEMENT

A. Basics of Undirected Gaussian Graphical Models

Undirected graphical models or Markov random fields (MRFs)¹ are probability distributions that factorize according to given undirected graphs [3]. In this paper, we focus solely on spanning trees (i.e., undirected, acyclic, connected graphs). A $d$-dimensional random vector $\mathbf{X} = (X_1, \ldots, X_d)$ is said to be Markov on a spanning tree $T_p = (V, E_p)$ with vertex (or node) set $V = \{1, \ldots, d\}$ and edge set $E_p$ if its distribution $p(\mathbf{x})$ satisfies the (local) Markov property $p(x_i \mid x_{V \setminus \{i\}}) = p(x_i \mid x_{\operatorname{nbd}(i)})$, where $\operatorname{nbd}(i) := \{ j \in V : (i,j) \in E_p \}$ denotes the set of neighbors of node $i$. We also denote the set of spanning trees with $d$ nodes as $\mathcal{T}^d$; thus $T_p \in \mathcal{T}^d$. Since $\mathbf{X}$ is Markov on the tree $T_p$, its probability density function (pdf) $p(\mathbf{x})$ factorizes according to $T_p$ into node marginals $\{p_i\}_{i \in V}$ and pairwise marginals $\{p_{i,j}\}_{(i,j) \in E_p}$ in the following specific way [3], given the edge set $E_p$:

$$p(\mathbf{x}) = \prod_{i \in V} p_i(x_i) \prod_{(i,j) \in E_p} \frac{p_{i,j}(x_i, x_j)}{p_i(x_i)\, p_j(x_j)}. \qquad (1)$$

We assume that $\mathbf{X}$, in addition to being Markov on the spanning tree $T_p$, is a Gaussian graphical model, or Gauss-Markov random field (GMRF), with known zero mean² and unknown positive definite covariance matrix $\boldsymbol{\Sigma} \succ 0$. Thus, $p(\mathbf{x})$ can be written as

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left( -\tfrac{1}{2}\, \mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x} \right). \qquad (2)$$

We also use the notation $p = \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ as a shorthand for (2). For Gaussian graphical models, it is known that the fill-pattern of the inverse covariance matrix $\boldsymbol{\Sigma}^{-1}$ encodes the structure of $T_p$ [3], i.e., $[\boldsymbol{\Sigma}^{-1}]_{i,j} = 0$ if and only if (iff) $(i,j) \notin E_p$.

We denote the set of pdfs on $\mathbb{R}^d$ by $\mathcal{P}(\mathbb{R}^d)$, the set of Gaussian pdfs on $\mathbb{R}^d$ by $\mathcal{P}_{\mathcal{N}}(\mathbb{R}^d)$, and the set of Gaussian graphical models which factorize according to some tree in $\mathcal{T}^d$ as $\mathcal{P}_{\mathcal{N}}(\mathbb{R}^d, \mathcal{T}^d)$. For learning the structure of $p$ (or, equivalently, the fill-pattern of $\boldsymbol{\Sigma}^{-1}$), we are provided with a set of $n$ i.i.d. $d$-dimensional samples $\mathbf{x}^n := \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ drawn from $p$, where $\mathbf{x}_k \in \mathbb{R}^d$.

¹In this paper, we use the terms "graphical models" and "Markov random fields" interchangeably.
²Our results also extend to the scenario where the mean of the Gaussian is unknown and has to be estimated from the samples.
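To make these definitions concrete, the following minimal sketch (ours, not from the paper; the helper `tree_covariance` is a hypothetical name) builds the covariance matrix of a unit-variance Gaussian chain from its edge correlations and verifies that the fill-pattern of $\boldsymbol{\Sigma}^{-1}$ matches the edge set, as stated above. It relies on the fact that, for a tree model with unit variances, pairwise correlations multiply along the unique path between two nodes.

```python
import numpy as np

def tree_covariance(edges, rhos, d):
    """Covariance of a unit-variance Gaussian tree model: by the Markov
    property, the correlation between two nodes is the product of the
    edge correlations along the unique path joining them."""
    corr = {tuple(sorted(e)): r for e, r in zip(edges, rhos)}
    adj = {i: [] for i in range(d)}
    for (u, v) in edges:
        adj[u].append(v)
        adj[v].append(u)
    Sigma = np.eye(d)
    for root in range(d):
        # traverse the tree from `root`, accumulating path products
        stack, prod, seen = [root], {root: 1.0}, {root}
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    prod[v] = prod[u] * corr[tuple(sorted((u, v)))]
                    Sigma[root, v] = prod[v]
                    stack.append(v)
    return Sigma

# A 4-node Markov chain 0-1-2-3 with edge correlations 0.6, 0.5, 0.4.
Sigma = tree_covariance([(0, 1), (1, 2), (2, 3)], [0.6, 0.5, 0.4], d=4)
J = np.linalg.inv(Sigma)   # fill-pattern of J encodes the edge set
print(np.round(J, 3))      # J[0,2], J[0,3], J[1,3] are (numerically) zero
```

For the chain, $\boldsymbol{\Sigma}^{-1}$ comes out tridiagonal, matching the edge set $\{(0,1), (1,2), (2,3)\}$ exactly as the fill-pattern statement requires.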
B. ML Estimation of Gaussian Tree Models

In this subsection, we review the Chow-Liu ML learning algorithm [5] for estimating the structure of $p$ given the samples $\mathbf{x}^n$. Denoting by $D(p_1 \,\|\, p_2)$ the Kullback-Leibler (KL) divergence [19] between $p_1$ and $p_2$, the ML estimate of the structure is given by the optimization problem

$$\hat{E}_{\mathrm{ML}} = \mathop{\mathrm{argmin}}_{E_q \,:\, q \in \mathcal{P}_{\mathcal{N}}(\mathbb{R}^d, \mathcal{T}^d)} D(\hat{p} \,\|\, q) \qquad (3)$$

where $\hat{p}$ denotes the empirical distribution of the samples.

…

A proof that the limit in (7) exists is given in Section III (Corollary 3). The value of $K_p$ for different tree models provides an indication of the relative ease of estimating such models. Note that both the parameters and the structure of the model influence the magnitude of $K_p$.
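As noted in Section I, the optimization in (3) is solved by a maximum-weight spanning tree with empirical mutual information edge weights; in the Gaussian case these weights are determined by the empirical correlation coefficients. The sketch below is our illustration of that reduction (the helper names `mwst_kruskal` and `chow_liu_gaussian` are ours, not the paper's); it uses the standard closed form $I(\rho) = -\tfrac{1}{2}\log(1 - \rho^2)$ for Gaussian mutual information.

```python
import numpy as np
from itertools import combinations

def mwst_kruskal(d, weights):
    """Maximum-weight spanning tree via Kruskal's algorithm with union-find.
    `weights` maps pairs (i, j), i < j, to edge weights."""
    parent = list(range(d))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u
    tree = []
    for (i, j) in sorted(weights, key=weights.get, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

def chow_liu_gaussian(X):
    """Chow-Liu ML tree structure from an n-by-d sample matrix: MWST with
    empirical Gaussian mutual information -0.5*log(1 - rho^2) as weights."""
    d = X.shape[1]
    R = np.corrcoef(X, rowvar=False)    # empirical correlation coefficients
    w = {(i, j): -0.5 * np.log(1.0 - R[i, j] ** 2)
         for (i, j) in combinations(range(d), 2)}
    return mwst_kruskal(d, w)

# Example: unit-variance chain 0-1-2-3 with edge correlations 0.6, 0.5, 0.4;
# off-tree correlations are products along the path.
Sigma = np.array([[1.00, 0.60, 0.30, 0.12],
                  [0.60, 1.00, 0.50, 0.20],
                  [0.30, 0.50, 1.00, 0.40],
                  [0.12, 0.20, 0.40, 1.00]])
X = np.random.default_rng(0).multivariate_normal(np.zeros(4), Sigma, size=2000)
print(sorted(chow_liu_gaussian(X)))   # typically recovers [(0,1), (1,2), (2,3)]
```

Since Gaussian mutual information is increasing in $\rho^2$, using $\hat{\rho}_{i,j}^2$ directly as the edge weight would yield the same spanning tree.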
III. DERIVING THE ERROR EXPONENT
…degree of the nodes in the tree graph. Hence, in the worst case, the complexity is already lower than that required if (17) is used directly. We can, in fact, reduce the number of computations to $O(d)$.

Proposition 7 (Complexity in Computing $\widetilde{K}_p$): The approximate error exponent $\widetilde{K}_p$, derived in (17), can be computed in linear time ($O(d)$ operations) as given in (21), where the maximum correlation coefficient on the edges adjacent to a node is defined in (22).

For a tree $T = (V, E)$, let $d(u, v)$ denote the graph distance between nodes $u$ and $v$, and define the diameter

$$\operatorname{diam}(T) := \max_{u, v \in V} d(u, v). \qquad (23)$$

Then it is clear that the two extremal structures, the chain (where there is a simple path passing through all nodes and edges exactly once) and the star (where there is one central node), have the largest and smallest diameters, respectively, i.e., $\operatorname{diam}(T_{\mathrm{chain}}) = d - 1$ and $\operatorname{diam}(T_{\mathrm{star}}) = 2$.

Definition 4 (Line Graph): The line graph [22] of a graph $G = (V, E)$, denoted by $H = L(G)$, is one in which, roughly speaking, the vertices and edges of $G$ are interchanged. More precisely, $L(G)$ is the undirected graph whose vertices are the edges of $G$, and there is an edge between any two vertices in the line graph if the corresponding edges in $G$ have a common node, i.e., are adjacent.

⁵Although the set of correlation coefficients on the edges is fixed, the elements in this set can be arranged in different ways on the edges of the tree. We formalize this concept in (24).
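To illustrate Definition 4, here is a small sketch (ours) that constructs the line graph of a graph given as an edge list; the vertices of $L(G)$ are indexed directly by the edges of $G$.

```python
from itertools import combinations

def line_graph(edges):
    """Return the line graph L(G): one vertex per edge of G, and an edge
    between two such vertices iff the corresponding edges share a node."""
    edges = [tuple(e) for e in edges]
    H = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):        # adjacent in G: common endpoint
            H[e].add(f)
            H[f].add(e)
    return H

# The line graph of a chain is a shorter chain; that of a star is complete.
chain = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
print(line_graph(chain))   # path: (0,1)-(1,2)-(2,3)
print(line_graph(star))    # triangle: every pair of edges shares node 0
```

The two printed examples preview the extremal structures discussed below: a $d$-node chain has a $(d-1)$-node chain as its line graph, while the star's line graph is the complete graph on its $d-1$ edges.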
See Fig. 4 for a graph $G$ and its associated line graph $H = L(G)$.

Fig. 4. (a): A graph $G$. (b): The line graph $H = L(G)$ that corresponds to $G$: $H$ is the graph whose vertices are the edges of $G$ (denoted $e_i$), and there is an edge between any two vertices $i$ and $j$ in $H$ if the corresponding edges in $G$ share a node.

B. Formulation: Extremal Structures for Learning

We now formulate the problem of finding the best and worst tree structures for learning, as well as the distributions associated with them. At a high level, our strategy involves two distinct steps. First, and primarily, we find the structure of the optimal distributions in Section VI-D. It turns out that the optimal structures that maximize and minimize the exponent are the Markov chain (under some conditions on the correlations) and the star, respectively, and these are the extremal structures in terms of the diameter. Second, we optimize over the positions (or placement) of the correlation coefficients on the edges of the optimal structures.

Let $\boldsymbol{\rho} := (\rho_1, \ldots, \rho_{d-1})$ be a fixed vector of feasible correlation coefficients, i.e., $\rho_i \in (-1, 1) \setminus \{0\}$ for all $i$. For a tree, it follows from (18) that if the $\rho_i$'s are the correlation coefficients on the edges, then this feasibility condition is necessary and sufficient to ensure that the covariance matrix is positive definite. Define $\Pi_{d-1}$ to be the group of permutations of order $d - 1$; hence elements $\pi \in \Pi_{d-1}$ are permutations of a given ordered set with cardinality $d - 1$. Also denote the set of tree-structured, $d$-variate Gaussians which have unit variances at all nodes and the entries of $\boldsymbol{\rho}$ as the correlation coefficients on the edges, in some order, as $\mathcal{P}_{\mathcal{N}}(\mathbb{R}^d; \mathcal{T}^d, \boldsymbol{\rho})$. Formally,

$$\mathcal{P}_{\mathcal{N}}(\mathbb{R}^d; \mathcal{T}^d, \boldsymbol{\rho}) := \left\{ p = \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}) \in \mathcal{P}_{\mathcal{N}}(\mathbb{R}^d, \mathcal{T}^d) : \Sigma_{i,i} = 1 \ \forall\, i \in V, \ \big(\rho_e\big)_{e \in E_p} = \pi(\boldsymbol{\rho}) \text{ for some } \pi \in \Pi_{d-1} \right\}. \qquad (24)$$

The extremal distributions are then

$$p_{\max,\boldsymbol{\rho}} := \mathop{\mathrm{argmax}}_{p \,\in\, \mathcal{P}_{\mathcal{N}}(\mathbb{R}^d; \mathcal{T}^d, \boldsymbol{\rho})} \widetilde{K}_p \qquad (25)$$

$$p_{\min,\boldsymbol{\rho}} := \mathop{\mathrm{argmin}}_{p \,\in\, \mathcal{P}_{\mathcal{N}}(\mathbb{R}^d; \mathcal{T}^d, \boldsymbol{\rho})} \widetilde{K}_p. \qquad (26)$$

Thus, $p_{\max,\boldsymbol{\rho}}$ (respectively, $p_{\min,\boldsymbol{\rho}}$) corresponds to the Gaussian tree model which has the largest (respectively, smallest) approximate error exponent.

C. Reformulation as Optimization Over Line Graphs

Since the number of permutations and the number of spanning trees are prohibitively large, finding the optimal distributions cannot be done through a brute-force search unless $d$ is small. Our main idea in this section is to use the notion of line graphs to simplify the problems in (25) and (26). In subsequent sections, we identify the extremal tree structures before identifying the precise best and worst distributions.

Recall that the approximate error exponent $\widetilde{K}_p$ can be expressed in terms of the weights $W(\rho_e, \rho_{e'})$ between two adjacent edges as in (19). Therefore, we can write the extremal distribution in (25) as

$$p_{\max,\boldsymbol{\rho}} = \mathop{\mathrm{argmax}}_{p \,\in\, \mathcal{P}_{\mathcal{N}}(\mathbb{R}^d; \mathcal{T}^d, \boldsymbol{\rho})} \ \min_{\substack{e, e' \in E_p \\ e, e' \text{ adjacent}}} W(\rho_e, \rho_{e'}). \qquad (27)$$

Note that in (27), $E_p$ is the edge set of a weighted graph whose edge weights are given by $W$. Since the weight is between two edges, it is more convenient to consider line graphs, defined in Section VI-A.

We now transform the intractable optimization problem in (27) over the set of trees into an optimization problem over the set of line graphs:

$$\max_{H \in L(\mathcal{T}^d)} \ \max_{\pi \in \Pi_{d-1}} \ \min_{(e, e') \in H} W\!\big(\rho_{\pi(e)}, \rho_{\pi(e')}\big). \qquad (28)$$
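For very small $d$, the brute-force search over trees and permutations that the text deems intractable can in fact be carried out directly, which is useful for sanity checks. The sketch below (ours) enumerates all labeled spanning trees via Prüfer sequences and all placements of the correlations; the pairwise weight function `W` is a hypothetical stand-in, since the paper's weight in (19) is not reproduced here.

```python
import numpy as np
from itertools import permutations, product

def prufer_to_tree(seq, d):
    """Decode a Prüfer sequence (length d-2) into the edge set of a tree."""
    degree = [1] * d
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(d) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, v = [x for x in range(d) if degree[x] == 1]
    edges.append((u, v))
    return edges

def W(r1, r2):
    # Hypothetical stand-in for the adjacent-edge weight in (19).
    return (r1 - r2) ** 2 / (1.0 - r1 * r2) ** 2

def extremal_tree(rhos):
    """Exhaustive version of (27)/(28): maximize, over trees and placements
    of the correlations, the minimum weight over adjacent edge pairs."""
    d = len(rhos) + 1
    best = (-np.inf, None)
    for seq in product(range(d), repeat=d - 2):
        edges = prufer_to_tree(seq, d)
        for perm in permutations(rhos):
            rho_of = dict(zip(edges, perm))
            bottleneck = min(W(rho_of[e], rho_of[f])
                             for e in edges for f in edges
                             if e < f and set(e) & set(f))  # adjacent pairs
            if bottleneck > best[0]:
                best = (bottleneck, edges)
    return best

print(extremal_tree([0.2, 0.3, 0.4]))
```

With the paper's true weight substituted for `W`, the same driver directly evaluates (28) for small $d$ and can be used to check the extremal-structure claims numerically.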
Fig. 6. If the magnitude of the correlation coefficient on one of the two edges is strictly smaller than that of the other, the likelihood of the non-edge (1, 3) replacing the edge (1, 2) would be higher than when the two magnitudes are equal. Hence, the weight $W(\cdot\,;\cdot)$ is maximized when equality holds.
Fig. 9. Left: The symmetric star graphical model used for comparing the true and approximate crossover rates, as described in Section VII-A. Right: The structure of a hybrid tree graph with $d = 10$ nodes, as described in Section VII-B. This is a tree with a length-$d/2$ chain and an order-$d/2$ star attached to one of the leaf nodes of the chain.

Fig. 10. Simulated error probabilities and error exponents for chain, hybrid, and star graphs with a fixed vector of correlation coefficients. The dashed lines show the true error exponent $K_p$ computed numerically using (10) and (11). Observe that the simulated error exponent converges to the true error exponent as $n \to \infty$. The legend applies to both plots.

The correlation coefficients were chosen to be equally spaced in a fixed interval and were randomly placed on the edges of the three tree graphs. We observe from Fig. 10 that, for fixed $n$, the star and the chain have the highest and lowest error probabilities, respectively. The simulated error exponents also converge to their true values as $n \to \infty$. The exponent associated with the star is thus lower than that of the chain, which is corroborated by Theorem 8, even though the theorem only applies in the very noisy case (and, for the chain, when all correlation magnitudes are below 0.63). From this experiment, the claim also seems to hold even though the setup is not very noisy. We also observe that the error exponent of the hybrid graph is between that of the star and the chain.
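The experiment just described is straightforward to replicate in outline. The sketch below is ours: it assumes the Chow-Liu estimator of Section II-B and, for simplicity, a common correlation value on all edges (unlike the equally spaced values used in the experiment). It estimates the error probability by Monte Carlo; the simulated error exponent would then be $-\tfrac{1}{n}\log \hat{\Pr}(\text{error})$ whenever the estimate is nonzero.

```python
import numpy as np
from itertools import combinations

def chow_liu_edges(X):
    """Chow-Liu for Gaussians: MWST with empirical squared correlation as
    the edge weight (Gaussian MI is increasing in rho^2)."""
    d = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    parent = list(range(d))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = set()
    for (i, j) in sorted(combinations(range(d), 2), key=lambda e: -R[e] ** 2):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.add((i, j))
    return tree

def error_rate(Sigma, true_edges, n, trials, rng):
    """Monte Carlo estimate of Pr(estimated edge set != true edge set)."""
    d = Sigma.shape[0]
    errors = 0
    for _ in range(trials):
        X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
        errors += (chow_liu_edges(X) != true_edges)
    return errors / trials

d, rho = 6, 0.4
# Chain 0-1-...-5: Sigma_ij = rho^|i-j|.  Star with hub 0: Sigma_0i = rho
# and Sigma_ij = rho^2 for distinct leaves (unit variances throughout).
chain_S = rho ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
star_S = np.full((d, d), rho ** 2)
star_S[0, :] = star_S[:, 0] = rho
np.fill_diagonal(star_S, 1.0)
chain_E = {(i, i + 1) for i in range(d - 1)}
star_E = {(0, i) for i in range(1, d)}

rng = np.random.default_rng(1)
for n in (200, 400, 800):
    # the star's error probability should exceed the chain's at each n
    print(n, error_rate(chain_S, chain_E, n, 400, rng),
             error_rate(star_S, star_E, n, 400, rng))
```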
VIII. CONCLUSION

Using the theory of large deviations, we have obtained the error exponent associated with learning the structure of a Gaussian tree model. Our analysis in this theoretical paper also answers the fundamental questions as to which sets of parameters and which structures result in high and low error exponents. We conclude that Markov chains (respectively, stars) are the easiest (respectively, hardest) structures to learn, as they maximize (respectively, minimize) the error exponent. Indeed, our numerical experiments on a variety of Gaussian graphical models validate the theory presented. We believe the intuitive results presented in this paper will lend useful insights for modeling high-dimensional data using tree distributions.
APPENDIX A
PROOF OF THEOREM 1

Proof: This proof borrows ideas from [25]. We assume that the two node pairs are disjoint (i.e., disjoint edges) for simplicity; the result for the case where they share a node follows similarly. Let the set of nodes corresponding to the node pairs $e$ and $e'$ be given. For a subset of node pairs, the set of feasible moments [4] is defined as in (33) and (34).

Lemma 13 (Sanov's Theorem, Contraction Principle [20]): For the event that the empirical moments of the i.i.d. observations are equal to a prescribed moment vector, we have the LDP in (35). If the constraint set is feasible, the optimizing pdf in (35) is an exponentially tilted version of the true pdf, where the set of constants is chosen to satisfy the moment constraints given in (34).

From Lemma 13, we conclude that the optimal distribution in (35) is a Gaussian. Thus, we can restrict our search for the optimal distribution to a search over Gaussians, which are parameterized by means and covariances. The crossover event for mutual information defined in (8) is equivalent to the event $\{\hat{\rho}_{e'}^2 \geq \hat{\rho}_e^2\}$, since in the Gaussian case, the mutual information is a monotonic function of the square of the correlation coefficient [cf. (5)]. Thus it suffices to consider the event involving the empirical correlation coefficients instead of the event involving the mutual information quantities. Let the relevant moments of $p$ be the pairwise covariances of the node pairs and the variances of the individual nodes (and similarly for the other moments). Now apply the contraction principle [21, Ch. 3] to the…
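The monotonicity step invoked above can be made explicit. For jointly Gaussian variables with correlation coefficient $\rho_{i,j}$, the mutual information has the standard closed form (presumably the content of (5)):

$$I(X_i; X_j) = -\frac{1}{2} \log\!\left(1 - \rho_{i,j}^2\right),$$

which is strictly increasing in $\rho_{i,j}^2$. Hence the crossover event for the empirical mutual informations coincides with the event $\{\hat{\rho}_{e'}^2 \geq \hat{\rho}_e^2\}$ for the empirical correlation coefficients, as used in the proof.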
Fig. 11. Illustration for the proof of Corollary 3. The correlation coefficient on the non-edge is $\rho_{e'}$, and it satisfies $|\rho_{e'}| = |\rho_e|$ only if the other correlation coefficient on the path has magnitude 1.

Fig. 12. Illustration for the proof of Corollary 3. The unique path between $i$ and $i'$ is $(i, i_1, \ldots, i')$, corresponding to $\operatorname{Path}(e'; E_p)$ for the non-edge $e' = (i, i')$.
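Both illustrations rest on the path-product (correlation decay) property of unit-variance Gaussian tree models, which follows directly from the Markov property and the factorization (1): for any non-edge $e' = (i, i')$,

$$\rho_{e'} = \prod_{e \,\in\, \operatorname{Path}(e';\, E_p)} \rho_e,$$

so $|\rho_{e'}|$ decays as the graph distance between $i$ and $i'$ grows, which is precisely why distant node pairs are unlikely to be mistaken for edges by the ML estimator.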
We now argue that the length-$(d-1)$ chain (in the line graph domain), with the correlation coefficients arranged in decreasing order on its nodes (see Fig. 13), is the line graph that optimizes (48). Note that the edge weights of this line graph are given by $W(\rho_i, \rho_{i+1})$ for $i = 1, \ldots, d-2$. Consider any other line graph. Then we claim the inequality in (49).

To prove (49), note that any edge of the chain line graph is consecutive, i.e., joins adjacent nodes of the chain. Fix any such edge. Removing it defines two subchains (see Fig. 13), with corresponding node sets. Because any line graph of a connected graph is connected,¹¹ there is a set of edges (called cut-set edges) joining the two node sets that ensures the line graph remains connected. The edge weight of each cut-set edge satisfies the required inequality by (46), because the correlation coefficients are arranged in decreasing order. By considering all cut-set edges for a fixed edge, and subsequently all edges, we establish (49). The bound (50) then follows, because the remaining edges of the two line graphs in (49) are common. See Fig. 14 for an example illustrating (49).

Since the chain line graph achieves the maximum bottleneck edge weight, it is the optimal line graph, i.e., it solves (48). Furthermore, since the line graph of a chain…

¹¹The line graph $H = L(G)$ of a connected graph $G$ is connected. In addition, any $H \in L(\mathcal{T}^d)$ must be a claw-free block graph [24, Th. 8.5].

REFERENCES

[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 2nd ed. San Francisco, CA: Morgan Kaufmann, 1988.
[2] D. Geiger and D. Heckerman, "Learning Gaussian networks," in Uncertainty in Artificial Intelligence (UAI). San Francisco, CA: Morgan Kaufmann, 1994.
[3] S. Lauritzen, Graphical Models. Oxford, U.K.: Oxford Univ. Press, 1996.
[4] M. J. Wainwright and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends in Machine Learning. Boston, MA: Now Publishers, 2008.
[5] C. K. Chow and C. N. Liu, "Approximating discrete probability distributions with dependence trees," IEEE Trans. Inf. Theory, vol. 14, no. 3, pp. 462–467, May 1968.
[6] C. K. Chow and T. Wagner, "Consistency of an estimate of tree-dependent probability distributions," IEEE Trans. Inf. Theory, vol. 19, no. 3, pp. 369–371, May 1973.
[7] V. Y. F. Tan, A. Anandkumar, L. Tong, and A. S. Willsky, "A large-deviation analysis for the maximum likelihood learning of tree structures," in Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, Jul. 2009 [Online]. Available: http://arxiv.org/abs/0905.0940
[8] D. M. Chickering, "Learning equivalence classes of Bayesian network structures," J. Mach. Learn. Res., vol. 2, pp. 445–498, 2002.
[9] M. Dudik, S. J. Phillips, and R. E. Schapire, "Performance guarantees for regularized maximum entropy density estimation," in Proc. Conf. Learn. Theory (COLT), 2004.
[10] N. Meinshausen and P. Buehlmann, "High dimensional graphs and variable selection with the Lasso," Ann. Statist., vol. 34, no. 3, pp. 1436–1462, 2006.
[11] M. J. Wainwright, P. Ravikumar, and J. D. Lafferty, "High-dimensional graphical model selection using ℓ₁-regularized logistic regression," in Neural Information Processing Systems (NIPS). Cambridge, MA: MIT Press, 2006.
[12] N. Santhanam and M. J. Wainwright, "Information-theoretic limits of selecting binary graphical models in high dimensions," in Proc. IEEE Int. Symp. Inf. Theory, Toronto, Canada, Jul. 2008.
[13] O. Zuk, S. Margel, and E. Domany, "On the number of samples needed to learn the correct structure of a Bayesian network," in Uncertainty in Artificial Intelligence (UAI). Arlington, VA: AUAI Press, 2006.
[14] A. Montanari and J. A. Pereira, "Which graphical models are difficult to learn?," in Neural Information Processing Systems (NIPS). Cambridge, MA: MIT Press, 2009.
[15] A. Dempster, "Covariance selection," Biometrics, vol. 28, pp. 157–175, 1972.
[16] A. d'Aspremont, O. Banerjee, and L. El Ghaoui, "First-order methods for sparse covariance selection," SIAM J. Matrix Anal. Appl., vol. 30, no. 1, pp. 56–66, Feb. 2008.
[17] A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu, "Sparse permutation invariant covariance estimation," Electron. J. Statist., vol. 2, pp. 494–515, 2008.
[18] S. Borade and L. Zheng, "Euclidean information theory," in Proc. Allerton Conf., Monticello, IL, Sep. 2007.
[19] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley-Interscience, 2006.
[20] J.-D. Deuschel and D. W. Stroock, Large Deviations. Providence, RI: Amer. Math. Soc., 2000.
[21] F. den Hollander, Large Deviations (Fields Institute Monographs). Providence, RI: Amer. Math. Soc., 2000.
[22] D. B. West, Introduction to Graph Theory, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2000.
[23] S. Verdú, "Spectral efficiency in the wideband regime," IEEE Trans. Inf. Theory, vol. 48, no. 6, Jun. 2002.
[24] F. Harary, Graph Theory. Reading, MA: Addison-Wesley, 1972.
[25] S. Shen, "Large deviation for the empirical correlation coefficient of two Gaussian random variables," Acta Math. Scientia, vol. 27, no. 4, pp. 821–828, Oct. 2007.
[26] M. Fazel, H. Hindi, and S. P. Boyd, "Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices," in Proc. Amer. Control Conf., 2003.

Animashree Anandkumar (S'02–M'09) received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Madras, in 2004, and the Ph.D. degree in electrical engineering from Cornell University, Ithaca, NY, in 2009.

She is currently a Postdoctoral Researcher with the Stochastic Systems Group, Massachusetts Institute of Technology (MIT), Cambridge. Her research interests are in the area of statistical signal processing, information theory, and networking, with a focus on distributed inference, learning, and fusion in graphical models.

Dr. Anandkumar received the 2008 IEEE Signal Processing Society (SPS) Young Author Award for her paper coauthored with L. Tong, which appeared in the IEEE TRANSACTIONS ON SIGNAL PROCESSING. She is the recipient of the Fran Allen IBM Ph.D. Fellowship 2008–2009, presented annually to one female Ph.D. student in conjunction with the IBM Ph.D. Fellowship Award. She was named a finalist for the Google Anita Borg Scholarship 2007–2008 and also received the Student Paper Award at the 2006 International Conference on Acoustics, Speech, and Signal Processing (ICASSP). She has served as a reviewer for the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the IEEE TRANSACTIONS ON INFORMATION THEORY, the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, and the IEEE SIGNAL PROCESSING LETTERS.