EDITORS

Jean-Pierre Françoise, Université P.-M. Curie, Paris VI, Paris, France
Gregory L. Naber, Drexel University, Philadelphia, PA, USA
Tsou Sheung Tsun, University of Oxford, Oxford, UK

EDITORIAL ADVISORY BOARD

Sergio Albeverio, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
Huzihiro Araki, Kyoto University, Kyoto, Japan
Abhay Ashtekar, Pennsylvania State University, University Park, PA, USA
Andrea Braides, Università di Roma ‘‘Tor Vergata’’, Roma, Italy
Francesco Calogero, Università di Roma ‘‘La Sapienza’’, Roma, Italy
Cecile DeWitt-Morette, The University of Texas at Austin, Austin, TX, USA
Artur Ekert, University of Cambridge, Cambridge, UK
Giovanni Gallavotti, Università di Roma ‘‘La Sapienza’’, Roma, Italy
Simon Gindikin, Rutgers University, Piscataway, NJ, USA
Gennadi Henkin, Université P.-M. Curie, Paris VI, Paris, France
Allen C. Hirshfeld, Universität Dortmund, Dortmund, Germany
Lisa Jeffrey, University of Toronto, Toronto, Canada
T.W.B. Kibble, Imperial College of Science, Technology and Medicine, London, UK
Antti Kupiainen, University of Helsinki, Helsinki, Finland
Shahn Majid, Queen Mary, University of London, London, UK
Barry M. McCoy, State University of New York Stony Brook, Stony Brook, NY, USA
Hirosi Ooguri, California Institute of Technology, Pasadena, CA, USA
Roger Penrose, University of Oxford, Oxford, UK
Pierre Ramond, University of Florida, Gainesville, FL, USA
Tudor Ratiu, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
Rudolf Schmid, Emory University, Atlanta, GA, USA
Albert Schwarz, University of California, Davis, CA, USA
Yakov Sinai, Princeton University, Princeton, NJ, USA
Herbert Spohn, Technische Universität München, München, Germany
Stephen J. Summers, University of Florida, Gainesville, FL, USA
Roger Temam, Indiana University, Bloomington, IN, USA
Craig A. Tracy, University of California, Davis, CA, USA
Andrzej Trautman, Warsaw University, Warsaw, Poland
Vladimir Turaev, Institut de Recherche Mathématique Avancée, Strasbourg, France
Gabriele Veneziano, CERN, Genève, Switzerland
Reinhard F. Werner, Technische Universität Braunschweig, Braunschweig, Germany
C.N. Yang, Tsinghua University, Beijing, China
Eberhard Zeidler, Max-Planck Institut für Mathematik in den Naturwissenschaften, Leipzig, Germany
Steve Zelditch, Johns Hopkins University, Baltimore, MD, USA

FOREWORD

In bygone centuries, our physical world appeared to be filled to the brim with mysteries. Divine powers
could provide for genuine miracles; water and sunlight could turn arid land into fertile pastures, but the
same powers could lead to miseries and disasters. The force of life, the vis vitalis, was assumed to be the
special agent responsible for all living things. The heavens, whatever they were for, contained stars and other
heavenly bodies that were the exclusive domain of the Gods.
Mathematics did exist, of course. Indeed, there was one aspect of our physical world that was recognised to
be controlled by precise, mathematical logic: the geometric structure of space, elaborated to become a genuine
form of art by the ancient Greeks. From my perspective, the Greeks were the first practitioners of ‘mathematical
physics’, when they discovered that all geometric features of space could be reduced to a small number of
axioms. Today, these would be called ‘fundamental laws of physics’. The fact that the flow of time could be
addressed with similar exactitude, and that it could be handled geometrically together with space, was only
recognised much later. And, yes, there were a few crazy people who were interested in the magic of numbers,
but the real world around us seemed to contain so much more that was way beyond our capacities of analysis.
Gradually, all this changed. The Moon and the planets appeared to follow geometrical laws. Galilei and
Newton managed to identify their logical rules of motion, and by noting that the concept of mass could be
applied to things in the sky just like apples and cannon balls on Earth, they made the sky a little bit more
accessible to us. Electricity, magnetism, light and sound were also found to behave in complete accordance
with mathematical equations.
Yet all of this was just a beginning. The real changes came with the twentieth century. A completely new
way of thinking, by emphasizing mathematical, logical analysis rather than empirical evidence, was pioneered
by Albert Einstein. Applying advanced mathematical concepts, only known to a few pure mathematicians, to
notions as mundane as space and time, was new to the physicists of his time. Einstein himself had a hard
time struggling through the logic of connections and curvatures, notions that were totally new to him, but are
only too familiar to students of mathematical physics today. Indeed, there is no better testimony of Einstein’s
deep insights at that time, than the fact that we now teach these things regularly in our university classrooms.
Special and general relativity are only small corners of the realm of modern physics that is presently being
studied using advanced mathematical methods. We have notoriously complex subjects such as phase transitions in
condensed matter physics, superconductivity, Bose–Einstein condensation, the quantum Hall effect, particularly
the fractional quantum Hall effect, and numerous topics from elementary particle physics, ranging from fibre
bundles and renormalization groups to supergravity, algebraic topology, superstring theory, Calabi–Yau spaces
and what not, all of which require the utmost of our mental skills to comprehend them.
The most bewildering observation that we make today is that our entire physical world
appears to be controlled by mathematical equations, and these are not just sloppy and debatable models, but
precisely documented properties of materials, of systems, and of phenomena in all echelons of our universe.
Does this really apply to our entire world, or only to parts of it? Do features, notions, entities exist that are
emphatically not mathematical? What about intuition, or dreams, and what about consciousness? What
about religion? Here, most of us would say, one should not even try to apply mathematical analysis, although
even here, some brave social scientists are making attempts at coordinating rational approaches.
No, there are clear and important differences between the physical world and the mathematical world.
Where the physical world stands out is the fact that it refers to ‘reality’, whatever ‘reality’ is. Mathematics is
the world of pure logic and pure reasoning. In physics, it is the experimental evidence that ultimately decides
whether a theory is acceptable or not. Also, the methodology in physics is different.
A beautiful example is the serendipitous discovery of superconductivity. In 1911, the Dutch physicist Heike
Kamerlingh Onnes was the first to achieve the liquefaction of helium, for which a temperature below 4.25 K
had to be realized. Heike decided to measure the specific conductivity of mercury, a metal that is frozen solid
at such low temperatures. But something appeared to go wrong during the measurements, since the voltmeter
did not show any voltage at all. All experienced physicists in the team assumed that they were dealing
with a malfunction. It would not have been the first time for a short circuit to occur in the electrical
equipment, but, this time, in spite of several efforts, they failed to locate it. One of the assistants was
responsible for keeping the temperature of the sample well within that of liquid helium, a dull job, requiring
nothing else than continuously watching some dials. During one of the many tests, however, he dozed off.
The temperature rose, and suddenly the measurements showed the normal values again. It then occurred to
the investigators that the effect and its temperature dependence were completely reproducible. Below 4.19
degrees Kelvin the conductivity of mercury appeared to be strictly infinite. Above that temperature, it is
finite, and the transition is a very sudden one. Superconductivity was discovered (D. van Delft, ‘‘Heike
Kamerlingh Onnes’’, Uitgeverij Bert Bakker, Amsterdam, 2005 (in Dutch)).
This is not the way mathematical discoveries are made. Theorems are not produced by assistants falling
asleep, even if examples do exist of incidents involving some miraculous fortune.
The hybrid science of mathematical physics is a very curious one. Some of the topics in this Encyclopedia
are undoubtedly physical. High Tc superconductivity, breaking water waves, and magneto-hydrodynamics
are definitely topics of physics where experimental data are considered more decisive than any high-brow
theory. Cohomology theory, Donaldson–Witten theory, and AdS/CFT correspondence, however, are examples
of purely mathematical exercises, even if these subjects, like all of the others in this compilation, are strongly
inspired by, and related to, questions posed in physics.
It is inevitable, in a compilation of a large number of short articles with many different authors, to see quite a
bit of variation in style and level. In this Encyclopedia, theoretical physicists as well as mathematicians together
made a huge effort to present in a concise and understandable manner their vision on numerous important
issues in advanced mathematical physics. All include references for further reading. We hope and expect that
these efforts will serve a good purpose.

Gerard ’t Hooft,
Spinoza Institute,
Utrecht University,
The Netherlands.
PREFACE

Mathematical Physics as a distinct discipline is relatively new. The International Association of
Mathematical Physics was founded only in 1976. The interaction between physics and mathematics
has, of course, existed since ancient times, but the recent decades, perhaps partly because we are living
through them, appear to have witnessed tremendous progress, yielding new results and insights at a dizzying
pace, so much so that an encyclopedia seems now needed to collate the gathered knowledge.
Mathematical Physics brings together the two great disciplines of Mathematics and Physics to the benefit of
both, the relationship between them being symbiotic. On the one hand, it uses mathematics as a tool to
organize physical ideas of increasing precision and complexity, and on the other it draws on the questions
that physicists pose as a source of inspiration to mathematicians. A classical example of this relationship
exists in Einstein’s theory of relativity, where differential geometry played an essential role in the formulation
of the physical theory while the problems raised by the ensuing physics have in turn boosted the development
of differential geometry. It is indeed a happy coincidence that we are writing now a preface to an
encyclopedia of mathematical physics in the centenary of Einstein’s annus mirabilis.
The project of putting together an encyclopedia of mathematical physics looked, and still looks, to us a
formidable enterprise. We would never have had the courage to undertake such a task if we did not believe,
first, that it is worthwhile and of benefit to the community, and second, that we would get the much-needed
support from our colleagues. And this support we did get, in the form of advice, encouragement, and
practical help too, from members of our Editorial Advisory Board, from our authors, and from others as well,
who have given unstintingly so much of their time to help us shape this Encyclopedia.
Mathematical Physics being a relatively new subject, it is not yet clearly delineated and could mean
different things to different people. In our choice of topics, we were guided in part by the programs of recent
International Congresses on Mathematical Physics, but mainly by the advice from our Editorial Advisory
Board and from our authors. The limitations of space and time, as well as our own limitations, necessitated
the omission of certain topics, but we have tried to include all that we believe to be core subjects and to cover
as much as possible the most active areas.
Our subject being interdisciplinary, we think it appropriate that the Encyclopedia should have certain
special features. Applications of the same mathematical theory, for instance, to different problems in physics
will have different emphasis and treatment. By the same token, the same problem in physics can draw upon
resources from different mathematical fields. This is why we divide the Encyclopedia into two broad sections:
physics subjects and related mathematical subjects. Articles in either section are deliberately allowed a fair
amount of overlap with one another and many articles will appear under more than one heading, but all are
linked together by elaborate cross referencing. We think this gives a better picture of the subject as a whole
and will serve better a community of researchers from widely scattered yet related fields.
The Encyclopedia is intended primarily for experienced researchers but should be of use also to beginning
graduate students. For the latter category of readers, we have included eight elementary introductory articles for easy
reference, with those on mathematics aimed at physics graduates and those on physics aimed at mathematics
graduates, so that these articles can serve as their first port of call to enable them to embark on any of the main
articles without the need to consult other material beforehand. In fact, we think these articles may even form the
foundation of advanced undergraduate courses, as we know that some authors have already made such use of them.
In addition to the printed version, an on-line version of the Encyclopedia is planned, which will allow both
the contents and the articles themselves to be updated if and when the occasion arises. This is probably a
necessary provision in such a rapidly advancing field.
This project was some four years in the making. Our foremost thanks at its completion go to the members
of our Editorial Advisory Board, who have advised, helped and encouraged us all along, and to all our
authors who have so generously devoted so much of their time to writing these articles and given us much
useful advice as well. We ourselves have learnt a lot from these colleagues, and made some wonderful
contacts with some among them. Special thanks are due also to Arthur Greenspoon whose technical expertise
was indispensable.
The project was started with Academic Press, which was later taken over by Elsevier. We thank warmly
members of their staff who have made this transition admirably seamless and gone on to assist us greatly in
our task: both Carey Chapman and Anne Guillaume, who were in charge of the whole project and have been
with us since the beginning, and Edward Taylor responsible for the copy-editing. And Martin Ruck, who
manages to keep an overwhelming amount of details constantly at his fingertips, and who is never known to
have lost a single email, deserves a very special mention.
As a postscript, we would like to express our gratitude to the very large number of authors who generously
agreed to donate their honorariums to support the Committee for Developing Countries of the European
Mathematical Society in their work to help our less fortunate colleagues in the developing world.

Jean-Pierre Françoise
Gregory L. Naber
Tsou Sheung Tsun
PERMISSION ACKNOWLEDGMENTS
The following material is reproduced with kind permission of Nature Publishing Group
Figures 11 and 12 of ‘‘Point-vortex Dynamics’’
http://www.nature.com/nature
The following material is reproduced with kind permission of Oxford University Press
Figure 1 of ‘‘Random Walks in Random Environments’’
http://www.oup.co.uk
Introductory Articles
Introductory Article: Classical Mechanics
G Gallavotti, Università di Roma ‘‘La Sapienza,’’ Rome, Italy
© 2006 G Gallavotti. Published by Elsevier Ltd. All rights reserved.

General Principles

Classical mechanics is a theory of motions of point particles. If $X = (x_1, \ldots, x_n)$ are the particle positions in a Cartesian inertial system of coordinates, the equations of motion are determined by their masses $(m_1, \ldots, m_n)$, $m_j > 0$, and by the potential energy of interaction, $V(x_1, \ldots, x_n)$, as

$$m_i \ddot{x}_i = -\partial_{x_i} V(x_1, \ldots, x_n), \qquad i = 1, \ldots, n \qquad [1]$$

here $x_i = (x_{i1}, \ldots, x_{id})$ are coordinates of the $i$th particle and $\partial_{x_i}$ is the gradient $(\partial_{x_{i1}}, \ldots, \partial_{x_{id}})$; $d$ is the space dimension (i.e., $d = 3$, usually). The potential energy function will be supposed ‘‘smooth,’’ that is, analytic except, possibly, when two positions coincide. The latter exception is necessary to include the important cases of gravitational attraction or, when dealing with electrically charged particles, of Coulomb interaction. A basic result is that if $V$ is bounded below, eqn [1] admits, given initial data $X_0 = X(0)$, $\dot{X}_0 = \dot{X}(0)$, a unique global solution $t \to X(t)$, $t \in (-\infty, \infty)$; otherwise a solution can fail to be global if and only if, in a finite time, it reaches infinity or a singularity point (i.e., a configuration in which two or more particles occupy the same point: an event called a collision).

In eqn [1], $-\partial_{x_i} V(x_1, \ldots, x_n)$ is the force acting on the points. More general forces are often admitted. For instance, velocity-dependent friction forces: they are not considered here because of their phenomenological nature as models for microscopic phenomena which should also, in principle, be explained in terms of conservative forces (furthermore, even from a macroscopic viewpoint, they are rather incomplete models, as they should be considered together with the important heat generation phenomena that accompany them). Another interesting example of forces not corresponding to a potential are certain velocity-dependent forces like the Coriolis force (which, however, appears only in noninertial frames of reference) and the closely related Lorentz force (in electromagnetism): they could be easily accommodated in the Hamiltonian formulation of mechanics; see Appendix 2.

The action principle states that an equivalent formulation of the eqns [1] is that a motion $t \to X_0(t)$ satisfying [1] during a time interval $[t_1, t_2]$ and leading from $X_1 = X_0(t_1)$ to $X_2 = X_0(t_2)$, renders stationary the action

$$A(\{X\}) = \int_{t_1}^{t_2} \left( \sum_{i=1}^{n} \frac{1}{2} m_i \dot{X}_i(t)^2 - V(X(t)) \right) dt \qquad [2]$$

within the class $\mathcal{M}_{t_1,t_2}(X_1, X_2)$ of smooth (i.e., analytic) ‘‘motions’’ $t \to X(t)$ defined for $t \in [t_1, t_2]$ and leading from $X_1$ to $X_2$.

The function

$$L(Y, X) \stackrel{\text{def}}{=} \frac{1}{2} \sum_{i=1}^{n} m_i y_i^2 - V(X) = K(Y) - V(X), \qquad Y = (y_1, \ldots, y_n)$$

is called the Lagrangian function and the action can be written as

$$\int_{t_1}^{t_2} L(\dot{X}(t), X(t)) \, dt$$

The quantity $K(\dot{X}(t))$ is called kinetic energy and motions satisfying [1] conserve energy as time $t$ varies, that is,

$$K(\dot{X}(t)) + V(X(t)) = E = \text{const.} \qquad [3]$$

Hence the action principle can be intuitively thought of as saying that motions proceed by keeping constant the energy, sum of the kinetic and potential energies, while trying to share as evenly as possible their (average over time) contribution to the energy.

In the special case in which $V$ is translation invariant, motions conserve linear momentum $Q \stackrel{\text{def}}{=} \sum_i m_i \dot{x}_i$; if $V$
is rotation invariant around the origin $O$, motions conserve angular momentum $M \stackrel{\text{def}}{=} \sum_i m_i x_i \wedge \dot{x}_i$, where $\wedge$ denotes the vector product in $\mathbb{R}^d$, that is, it is the tensor $(a \wedge b)_{ij} = a_i b_j - b_i a_j$, $i, j = 1, \ldots, d$: if the dimension $d = 3$ the $a \wedge b$ will be naturally regarded as a vector. More generally, to any continuous symmetry group of the Lagrangian correspond conserved quantities: this is formalized in the Noether theorem.

It is convenient to think that the scalar product in $\mathbb{R}^{dn}$ is defined in terms of the ordinary scalar product in $\mathbb{R}^d$, $a \cdot b = \sum_{j=1}^{d} a_j b_j$, by $(v, w) = \sum_{i=1}^{n} m_i v_i \cdot w_i$: so that kinetic energy and line element $ds$ can be written as $K(\dot{X}) = \frac{1}{2} (\dot{X}, \dot{X})$ and $ds^2 = \sum_{i=1}^{n} m_i \, dx_i^2$, respectively. Therefore, the metric generated by the latter scalar product can be called kinetic energy metric.

The interest of the kinetic metric appears from the Maupertuis’ principle (equivalent to [1]): the principle allows us to identify the trajectory traced in $\mathbb{R}^d$ by a motion that leads from $X_1$ to $X_2$ moving with energy $E$. Parametrizing such trajectories as $\tau \to X(\tau)$ by a parameter $\tau$ varying in $[0, 1]$ so that the line element is $ds^2 = (\partial_\tau X, \partial_\tau X) \, d\tau^2$, the principle states that the trajectory of a motion with energy $E$ which leads from $X_1$ to $X_2$ makes stationary, among the analytic curves $x \in \mathcal{M}_{0,1}(X_1, X_2)$, the function

$$L(x) = \int_x \sqrt{E - V(x(s))} \, ds \qquad [4]$$

so that the possible trajectories traced by the solutions of [1] in $\mathbb{R}^{nd}$ and with energy $E$ can be identified with the geodesics of the metric $dm^2 \stackrel{\text{def}}{=} (E - V(X)) \cdot ds^2$.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Constraints

Often particles are subject to constraints which force the motion to take place on a surface $M \subset \mathbb{R}^{nd}$, i.e., $X(t)$ is forced to be a point on the manifold $M$. A typical example is provided by rigid systems in which motions are subject to forces which keep the mutual distances of the particles constant: $|x_i - x_j| = \rho_{ij}$, with $\rho_{ij}$ time-independent positive quantities. In essentially all cases, the forces that imply constraints, called constraint reactions, are velocity dependent and, therefore, are not in the class of conservative forces considered here, cf. [1]. Hence, from a fundamental viewpoint admitting only conservative forces, constrained systems should be regarded as idealizations of systems subject to conservative forces which approximately imply the constraints.

In general, the $\ell$-dimensional manifold $M$ will not admit a global system of coordinates: however, it will be possible to describe points in the vicinity of any $X_0 \in M$ by using $N = nd$ coordinates $q = (q_1, \ldots, q_\ell, q_{\ell+1}, \ldots, q_N)$ varying in an open ball $B_{X_0}$: $X = X(q_1, \ldots, q_\ell, q_{\ell+1}, \ldots, q_N)$.

The $q$-coordinates can be chosen well adapted to the surface $M$ and to the kinetic metric, i.e., so that the points of $M$ are identified by $q_{\ell+1} = \cdots = q_N = 0$ (which is the meaning of ‘‘adapted’’); furthermore, infinitesimal displacements $(0, \ldots, 0, d\varepsilon_{\ell+1}, \ldots, d\varepsilon_N)$ out of a point $X_0 \in M$ are orthogonal to $M$ (in the kinetic metric) and have a length independent of the position of $X_0$ on $M$ (which is the meaning of ‘‘well adapted’’ to the kinetic metric).

Motions constrained on $M$ arise when the potential $V$ has the form

$$V(X) = V_a(X) + \lambda W(X) \qquad [5]$$

where $W$ is a smooth function which reaches its minimum value, say equal to 0, precisely on the manifold $M$ while $V_a$ is another smooth potential. The factor $\lambda > 0$ is a parameter called the rigidity of the constraint.

A particularly interesting case arises when the level surfaces of $W$ also have the geometric property of being ‘‘parallel’’ to the surface $M$: in the precise sense that the matrix $\partial^2_{q_i q_j} W(X)$, $i, j > \ell$ is positive definite and $X$-independent, for all $X \in M$, in a system of coordinates well adapted to the kinetic metric.

A potential $W$ with the latter properties can be called an approximately ideal constraint reaction. In fact, it can be proved that, given an initial datum $X_0 \in M$ with velocity $\dot{X}_0$ tangent to $M$, i.e., given an initial datum whose coordinates in a local system of coordinates are $(q_0, 0)$ and $(\dot{q}_0, 0)$ with $q_0 = (q_{01}, \ldots, q_{0\ell})$ and $\dot{q}_0 = (\dot{q}_{01}, \ldots, \dot{q}_{0\ell})$, the motion generated by [1] with $V$ given by [5] is a motion $t \to X_\lambda(t)$ which

1. as $\lambda \to \infty$ tends to a motion $t \to X_\infty(t)$;
2. as long as $X_\infty(t)$ stays in the vicinity of the initial data, say for $0 \le t \le t_1$, so that it can be described in the above local adapted coordinates, its coordinates have the form $t \to (q(t), 0) = (q_1(t), \ldots, q_\ell(t), 0, \ldots, 0)$: that is, it is a motion developing on the constraint surface $M$; and
3. the curve $t \to X_\infty(t)$, $t \in [0, t_1]$, as an element of the space $\mathcal{M}_{0,t_1}(X_0, X_\infty(t_1))$ of analytic curves on $M$ connecting $X_0$ to $X_\infty(t_1)$, renders the action

$$A(X) = \int_0^{t_1} \left( K(\dot{X}(t)) - V_a(X(t)) \right) dt \qquad [6]$$

stationary.
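
The limit described in the three numbered properties above can be observed numerically. The following minimal sketch is not part of the original article: it assumes a single unit-mass particle in the plane, an applied potential $V_a = g\,y$ (uniform gravity), and a hypothetical constraint potential $W = \frac{1}{2}(|x| - 1)^2$ whose level curves are circles concentric with the unit circle $M$. It integrates [1] with the potential [5] (rigidity written `lam`) by a velocity-Verlet scheme, which is an illustrative choice of integrator, and records the largest distance from $M$ reached along the motion. That distance should shrink as the rigidity grows.

```python
import math

def max_deviation(lam, g=1.0, v0=1.0, dt=1e-3, steps=5000):
    """Integrate x'' = -grad(V_a + lam*W) for a unit mass in the plane.

    Assumed (illustrative) choices, not taken from the article:
      V_a(x, y) = g*y               applied potential (gravity)
      W(x, y)   = 0.5*(r - 1)**2    minimal exactly on the unit circle M
    Returns the largest |r - 1| observed, r being the distance from the origin.
    """
    def force(x, y):
        r = math.hypot(x, y)
        # -grad(lam*W) is radial: -lam*(r - 1)*(x, y)/r ; -grad(V_a) = (0, -g)
        fr = -lam * (r - 1.0) / r
        return fr * x, fr * y - g

    # Initial datum on M with velocity tangent to M (bottom of the circle).
    x, y, vx, vy = 0.0, -1.0, v0, 0.0
    fx, fy = force(x, y)
    worst = 0.0
    for _ in range(steps):
        # One velocity-Verlet step: half kick, drift, half kick.
        vx += 0.5 * dt * fx
        vy += 0.5 * dt * fy
        x += dt * vx
        y += dt * vy
        fx, fy = force(x, y)
        vx += 0.5 * dt * fx
        vy += 0.5 * dt * fy
        worst = max(worst, abs(math.hypot(x, y) - 1.0))
    return worst

if __name__ == "__main__":
    for lam in (1e2, 1e3, 1e4):
        print(f"rigidity = {lam:8.0f}   max |r - 1| = {max_deviation(lam):.2e}")
```

With these assumed parameters the time step stays well below the period of the fast oscillation transverse to $M$; for much larger rigidities `dt` would have to be reduced accordingly.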
The latter property can be formulated ‘‘intrinsically,’’ that is, referring only to $M$ as a surface, via the restriction of the metric $ds^2$ to line elements $ds = (dq_1, \ldots, dq_\ell, 0, \ldots, 0)$ tangent to $M$ at the point $X = (q_0, 0, \ldots, 0) \in M$; we write $ds^2 = \sum_{i,j=1}^{\ell} g_{ij}(q) \, dq_i \, dq_j$. The $\ell \times \ell$ symmetric positive-definite matrix $g$ can be called the metric on $M$ induced by the kinetic energy. Then the action in [6] can be written as

$$A(q) = \int_0^{t_1} \left( \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q(t)) \, \dot{q}_i(t) \, \dot{q}_j(t) - \bar{V}_a(q(t)) \right) dt \qquad [7]$$

where $\bar{V}_a(q) \stackrel{\text{def}}{=} V_a(X(q_1, \ldots, q_\ell, 0, \ldots, 0))$: the function

$$L(\eta, q) \stackrel{\text{def}}{=} \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q) \, \eta_i \eta_j - \bar{V}_a(q) \equiv \frac{1}{2} g(q) \eta \cdot \eta - \bar{V}_a(q) \qquad [8]$$

is called the constrained Lagrangian of the system. An important property is that the constrained motions conserve the energy defined as $E = \frac{1}{2} (g(q) \dot{q}, \dot{q}) + \bar{V}_a(q)$; see next section.

The constrained motion $X_\infty(t)$ of energy $E$ satisfies the Maupertuis’ principle in the sense that the curve on $M$ on which the motion develops renders

$$L(x) = \int_x \sqrt{E - V_a(x(s))} \, ds \qquad [9]$$

stationary among the (smooth) curves that develop on $M$ connecting two fixed values $X_1$ and $X_2$. In the particular case in which $\ell = n$ this is again Maupertuis’ principle for unconstrained motions under the potential $V(X)$. In general, $\ell$ is called the number of degrees of freedom because a complete description of the initial data requires $2\ell$ coordinates $q(0), \dot{q}(0)$.

If $W$ is minimal on $M$ but the condition on $W$ of having level surfaces parallel to $M$ is not satisfied, i.e., if $W$ is not an approximate ideal constraint reaction, it still remains true that the limit motion $X_\infty(t)$ takes place on $M$. However, in general, it will not satisfy the above variational principles. For this reason, motions arising as limits (as $\lambda \to \infty$) of motions developing under the potential [5] with $W$ having minimum on $M$ and level curves parallel (in the above sense) to $M$ are called ideally constrained motions or motions subject by ideal constraints to the surface $M$.

As an example, suppose that $W$ has the form $W(X) = \sum_{i,j \in P} w_{ij}(|x_i - x_j|)$ with $w_{ij}(|x|) \ge 0$ an analytic function vanishing only when $|x| = \rho_{ij}$ for $i, j$ in some set of pairs $P$ and for some given distances $\rho_{ij}$ (e.g., $w_{ij}(x) = \gamma (x^2 - \rho_{ij}^2)^2$, $\gamma > 0$). Then $W$ can be shown to satisfy the mentioned conditions and therefore, the so constrained motions $X_\infty(t)$ of the body satisfy the variational principles mentioned in connection with [7] and [9]: in other words, the above natural way of realizing a rather general rigidity constraint is ideal.

The modern viewpoint on the physical meaning of the constraint reactions is as follows: looking at motions in an inertial Cartesian system, it will appear that the system is subject to the applied forces with potential $V_a(X)$ and to constraint forces which are defined as the differences $R_i = m_i \ddot{x}_i + \partial_{x_i} V_a(X)$. The latter reflect the action of the forces with potential $\lambda W(X)$ in the limit of infinite rigidity ($\lambda \to \infty$).

In applications, sometimes the action of a constraint can be regarded as ideal: the motion will then verify the variational principles mentioned and $R$ can be computed as the differences between the $m_i \ddot{x}_i$ and the active forces $-\partial_{x_i} V_a(X)$. In dynamics problems it is, however, a very difficult and important matter, particularly in engineering, to judge whether a system of particles can be considered as subject to ideal constraints: this leads to important decisions in the construction of machines. It simplifies the calculations of the reactions and fatigue of the materials but a misjudgment can have serious consequences about stability and safety. For statics problems, the difficulty is of lower order: usually assuming that the constraint reaction is ideal leads to an overestimate of the requirements for stability of equilibria. Hence, employing the action principle to statics problems, where it constitutes the principle of virtual work, generally leads to economic problems rather than to safety issues. Its discovery even predates Newtonian mechanics.

We refer the reader to Arnol’d (1989) and Gallavotti (1983) for more details.

Lagrange and Hamilton Forms of the Equations of Motion

The stationarity condition for the action $A(q)$, cf. [7], [8], is formulated in terms of the Lagrangian $L(\eta, x)$, see [8], by

$$\frac{d}{dt} \partial_{\eta_i} L(\dot{q}(t), q(t)) = \partial_{x_i} L(\dot{q}(t), q(t)), \qquad i = 1, \ldots, \ell \qquad [10]$$

which is a second-order differential equation called the Lagrangian equation of motion. It can be cast in ‘‘normal form’’: for this purpose, adopting the convention of ‘‘summation over repeated indices,’’ introduce the ‘‘generalized momenta’’

$$p_i \stackrel{\text{def}}{=} g(q)_{ij} \, \dot{q}_j, \qquad i = 1, \ldots, \ell \qquad [11]$$
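
As a hedged illustration of [8], [10] and [11] (not taken from the original article), consider one degree of freedom with the hypothetical induced metric $g(q) = m(1 + q^2)$ and restricted potential $\bar{V}_a(q) = \frac{1}{2} m G q^2$, which describe a bead sliding on the parabola $y = x^2/2$ under gravity $G$. Equation [10] then reads $\frac{d}{dt}(g(q)\dot{q}) = \frac{1}{2} g'(q) \dot{q}^2 - \bar{V}_a'(q)$, that is, in normal form, $\ddot{q} = -q(\dot{q}^2 + G)/(1 + q^2)$. The sketch below integrates this equation with a classical fourth-order Runge-Kutta step and monitors the energy $E = \frac{1}{2} g(q) \dot{q}^2 + \bar{V}_a(q)$; all function names and parameter values are assumptions made for the example.

```python
def g_metric(q, m=1.0):
    """Induced kinetic metric g(q) = m*(1 + q^2) for a bead on y = x^2/2 (assumed example)."""
    return m * (1.0 + q * q)

def potential(q, m=1.0, G=9.81):
    """Restricted applied potential: gravity m*G*y evaluated at y = q^2/2."""
    return 0.5 * m * G * q * q

def acceleration(q, v, G=9.81):
    """Normal form of the Lagrange equation for this example:
    q'' = -q*(v^2 + G)/(1 + q^2); the mass drops out."""
    return -q * (v * v + G) / (1.0 + q * q)

def energy(q, v, m=1.0, G=9.81):
    """E = (1/2) g(q) v^2 + V(q)."""
    return 0.5 * g_metric(q, m) * v * v + potential(q, m, G)

def momentum(q, v, m=1.0):
    """Generalized momentum p = g(q) * v, cf. [11] with one degree of freedom."""
    return g_metric(q, m) * v

def rk4_step(q, v, dt, G=9.81):
    """One classical fourth-order Runge-Kutta step for the pair (q, v)."""
    k1q, k1v = v, acceleration(q, v, G)
    k2q, k2v = v + 0.5 * dt * k1v, acceleration(q + 0.5 * dt * k1q, v + 0.5 * dt * k1v, G)
    k3q, k3v = v + 0.5 * dt * k2v, acceleration(q + 0.5 * dt * k2q, v + 0.5 * dt * k2v, G)
    k4q, k4v = v + dt * k3v, acceleration(q + dt * k3q, v + dt * k3v, G)
    q_new = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0
    v_new = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return q_new, v_new

def energy_drift(q0=1.0, v0=0.0, dt=1e-3, steps=5000):
    """Largest |E(t) - E(0)| seen along the numerically integrated motion."""
    q, v = q0, v0
    e0 = energy(q, v)
    worst = 0.0
    for _ in range(steps):
        q, v = rk4_step(q, v, dt)
        worst = max(worst, abs(energy(q, v) - e0))
    return worst

if __name__ == "__main__":
    print("largest energy drift:", energy_drift())
```

The drift reported by `energy_drift()` reflects only the integrator's truncation error, since the exact motion conserves $E$; a symplectic scheme would be another reasonable choice for long integration times.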
Since $g(q) > 0$, the motions $t \to q(t)$ and the corresponding velocities $t \to \dot{q}(t)$ can be described equivalently by $t \to (q(t), p(t))$: and the equations of motion [10] become the first-order equations

$$\dot{q}_i = \partial_{p_i} H(p, q), \qquad \dot{p}_i = -\partial_{q_i} H(p, q) \qquad [12]$$

where the function $H$, called the Hamiltonian of the system, is defined by

$$H(p, q) \stackrel{\text{def}}{=} \frac{1}{2} (g(q)^{-1} p, p) + \bar{V}_a(q) \qquad [13]$$

Equations [12], regarded as equations of motion for phase space points $(p, q)$, are called Hamilton equations. In general, $q$ are local coordinates on $M$ and motions are specified by giving $q, \dot{q}$ or $p, q$.

Looking for a coordinate-free representation of motions consider the pairs $X, Y$ with $X \in M$ and $Y$ a vector $Y \in T_X$ tangent to $M$ at the point $X$. The collection of pairs $(Y, X)$ is denoted $T(M) = \cup_{X \in M} (T_X \times \{X\})$ and a motion $t \to (\dot{X}(t), X(t)) \in T(M)$ in local coordinates is represented by $(\dot{q}(t), q(t))$. The space $T(M)$ can be called the space of initial data for Lagrange’s equations of motion: it has $2\ell$ dimensions (also known as the ‘‘tangent bundle’’ of $M$).

Likewise, the space of initial data for the Hamilton equations will be denoted $T^*(M)$ and it consists of pairs $X, P$ with $X \in M$ and $P = g(X) Y$ with $Y$ a vector tangent to $M$ at $X$. The space $T^*(M)$ is called the phase space of the system: it has $2\ell$ dimensions (and it is occasionally called the ‘‘cotangent bundle’’ of $M$).

An immediate consequence of [12] is

$$\frac{d}{dt} H(p(t), q(t)) \equiv 0$$

and it means that $H(p(t), q(t))$ is constant along the solutions of [12]. Noting that $H(p, q) = (1/2)(g(q) \dot{q}, \dot{q}) + \bar{V}_a(q)$ is the sum of the kinetic and potential energies, it follows that the conservation of $H$ along solutions means energy conservation in the presence of ideal constraints.

Let $S_t$ be the flow generated on the phase space variables $(p, q)$ by the solutions of the equations of motion [12], that is, let $t \to S_t(p, q) \equiv (p(t), q(t))$ denote a solution of [12] with initial data $(p, q)$. Then a (measurable) set $\Delta$ in phase space evolves in time $t$ into a new set $S_t \Delta$ with the same volume: this is obvious because the Hamilton equations [12] have manifestly zero divergence (‘‘Liouville’s theorem’’).

The Hamilton equations also satisfy a variational principle, called the Hamilton action principle: that is, if $\mathcal{M}_{t_1,t_2}((p_1, q_1), (p_2, q_2); M)$ denotes the space of the analytic functions $\varphi : t \to (\pi(t), \kappa(t))$ which in the time interval $[t_1, t_2]$ lead from $(p_1, q_1)$ to $(p_2, q_2)$, then the condition that $\varphi_0(t) = (p(t), q(t))$ satisfies [12] can be equivalently formulated by requiring that the function

$$A_H(\varphi) \stackrel{\text{def}}{=} \int_{t_1}^{t_2} \left( \pi(t) \cdot \dot{\kappa}(t) - H(\pi(t), \kappa(t)) \right) dt \qquad [14]$$

be stationary for $\varphi = \varphi_0$: in fact, eqns [12] are the stationarity conditions for the Hamilton action [14] on $\mathcal{M}_{t_1,t_2}((p_1, q_1), (p_2, q_2); M)$. And, since the derivatives of $\pi(t)$ do not appear in [14], stationarity is even achieved in the larger space $\mathcal{M}_{t_1,t_2}(q_1, q_2; M)$ of the motions $\varphi : t \to (\pi(t), \kappa(t))$ leading from $q_1$ to $q_2$ without any restriction on the initial and final momenta $p_1, p_2$ (which, therefore, cannot be prescribed a priori independently of $q_1, q_2$). If the prescribed data $p_1, q_1, p_2, q_2$ are not compatible with the equations of motion (e.g., $H(p_1, q_1) \ne H(p_2, q_2)$), then the action functional has no stationary trajectory in $\mathcal{M}_{t_1,t_2}((p_1, q_1), (p_2, q_2); M)$.

For more details, the reader is referred to Landau and Lifshitz (1976), Arnol’d (1989), and Gallavotti (1983).

Canonical Transformations of Phase Space Coordinates

The Hamiltonian form, [13], of the equations of motion turns out to be quite useful in several problems. It is, therefore, important to remark that it is invariant under a special class of transformations of coordinates, called canonical transformations.

Consider a local change of coordinates on phase space, i.e., a smooth, smoothly invertible map $C(\pi, \kappa) = (\pi', \kappa')$ between an open set $U$ in the phase space of a Hamiltonian system with $\ell$ degrees of freedom, into an open set $U'$ in a $2\ell$-dimensional space. The change of coordinates is said to be canonical if for any solution $t \to (\pi(t), \kappa(t))$ of equations like [12], for any Hamiltonian $H(\pi, \kappa)$ defined on $U$, the $C$-image $t \to (\pi'(t), \kappa'(t)) = C(\pi(t), \kappa(t))$ is a solution of [12] with the ‘‘same’’ Hamiltonian, that is, with Hamiltonian $H'(\pi', \kappa') \stackrel{\text{def}}{=} H(C^{-1}(\pi', \kappa'))$.

The condition that a transformation of coordinates is canonical is obtained by using the arbitrariness of the function $H$ and is simply expressed as a necessary and sufficient property of the Jacobian $L$,

$$L = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad A_{ij} = \partial_{\pi_j} \pi'_i, \quad B_{ij} = \partial_{\kappa_j} \pi'_i, \quad C_{ij} = \partial_{\pi_j} \kappa'_i, \quad D_{ij} = \partial_{\kappa_j} \kappa'_i \qquad [15]$$
Introductory Article: Classical Mechanics 5

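where $i, j = 1, \ldots, \ell$. Let

$$E \stackrel{\text{def}}{=} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

denote the $2\ell \times 2\ell$ matrix formed by four $\ell \times \ell$ blocks, equal to the 0 matrix or, as indicated, to $\pm$(identity matrix); then, if a superscript $T$ denotes matrix transposition, the condition that the map be canonical is that

$$L^{-1} = E L^T E^T \quad \text{or} \quad L^{-1} = \begin{pmatrix} D^T & -B^T \\ -C^T & A^T \end{pmatrix} \qquad [16]$$

which immediately implies that $\det L = \pm 1$. In fact, it is possible to show that [16] implies $\det L = 1$. Equation [16] is equivalent to the four relations $A D^T - B C^T = 1$, $-A B^T + B A^T = 0$, $C D^T - D C^T = 0$, and $-C B^T + D A^T = 1$. More explicitly, since the first and the fourth relations coincide, these can be expressed as

$$\{\pi'_i, \kappa'_j\} = \delta_{ij}, \qquad \{\pi'_i, \pi'_j\} = 0, \qquad \{\kappa'_i, \kappa'_j\} = 0 \qquad [17]$$

where, for any two functions $F(\boldsymbol{\pi}, \boldsymbol{\kappa})$, $G(\boldsymbol{\pi}, \boldsymbol{\kappa})$, the Poisson bracket is

$$\{F, G\}(\boldsymbol{\pi}, \boldsymbol{\kappa}) \stackrel{\text{def}}{=} \sum_{k=1}^{\ell} \left( \partial_{\pi_k} F(\boldsymbol{\pi}, \boldsymbol{\kappa}) \, \partial_{\kappa_k} G(\boldsymbol{\pi}, \boldsymbol{\kappa}) - \partial_{\kappa_k} F(\boldsymbol{\pi}, \boldsymbol{\kappa}) \, \partial_{\pi_k} G(\boldsymbol{\pi}, \boldsymbol{\kappa}) \right) \qquad [18]$$

The latter satisfies Jacobi's identity: $\{\{F, G\}, Q\} + \{\{G, Q\}, F\} + \{\{Q, F\}, G\} = 0$, for any three functions $F, G, Q$ on the phase space. It is quite useful to remark that if $t \to (\mathbf{p}(t), \mathbf{q}(t)) = S_t(\mathbf{p}, \mathbf{q})$ is a solution to the Hamilton equations with Hamiltonian $H$ then any observable $F(\mathbf{p}, \mathbf{q})$ "evolves" as $F(t) \stackrel{\text{def}}{=} F(\mathbf{p}(t), \mathbf{q}(t))$, satisfying

$$\partial_t F(\mathbf{p}(t), \mathbf{q}(t)) = \{H, F\}(\mathbf{p}(t), \mathbf{q}(t))$$

Requiring the latter identity to hold for all observables $F$ is equivalent to requiring that $t \to (\mathbf{p}(t), \mathbf{q}(t))$ be a solution of Hamilton's equations for $H$.

Let $C : U \to U'$ be a smooth, smoothly invertible transformation between two open $2\ell$-dimensional sets: $C(\boldsymbol{\pi}, \boldsymbol{\kappa}) = (\boldsymbol{\pi}', \boldsymbol{\kappa}')$. Suppose that there is a function $\Phi(\boldsymbol{\pi}', \boldsymbol{\kappa})$ defined on a suitable domain $W$ such that

$$C(\boldsymbol{\pi}, \boldsymbol{\kappa}) = (\boldsymbol{\pi}', \boldsymbol{\kappa}') \;\Rightarrow\; \begin{cases} \boldsymbol{\pi} = \partial_{\boldsymbol{\kappa}} \Phi(\boldsymbol{\pi}', \boldsymbol{\kappa}) \\ \boldsymbol{\kappa}' = \partial_{\boldsymbol{\pi}'} \Phi(\boldsymbol{\pi}', \boldsymbol{\kappa}) \end{cases} \qquad [19]$$

then $C$ is canonical. This is because [19] implies that if $\boldsymbol{\kappa}, \boldsymbol{\pi}'$ are varied and if $\boldsymbol{\pi}, \boldsymbol{\kappa}', \boldsymbol{\pi}', \boldsymbol{\kappa}$ are related by $C(\boldsymbol{\pi}, \boldsymbol{\kappa}) = (\boldsymbol{\pi}', \boldsymbol{\kappa}')$, then $\boldsymbol{\pi} \cdot d\boldsymbol{\kappa} + \boldsymbol{\kappa}' \cdot d\boldsymbol{\pi}' = d\Phi(\boldsymbol{\pi}', \boldsymbol{\kappa})$, which implies that

$$\boldsymbol{\pi} \cdot d\boldsymbol{\kappa} - H(\boldsymbol{\pi}, \boldsymbol{\kappa})\,dt \equiv \boldsymbol{\pi}' \cdot d\boldsymbol{\kappa}' - H(C^{-1}(\boldsymbol{\pi}', \boldsymbol{\kappa}'))\,dt + d\Phi(\boldsymbol{\pi}', \boldsymbol{\kappa}) - d(\boldsymbol{\pi}' \cdot \boldsymbol{\kappa}') \qquad [20]$$

It means that the Hamiltonians $H(\mathbf{p}, \mathbf{q})$ and $H'(\mathbf{p}', \mathbf{q}') = H(C^{-1}(\mathbf{p}', \mathbf{q}'))$ have Hamilton actions $\mathcal{A}_H$ and $\mathcal{A}_{H'}$ differing by a constant, if evaluated on corresponding motions $(\mathbf{p}(t), \mathbf{q}(t))$ and $(\mathbf{p}'(t), \mathbf{q}'(t)) = C(\mathbf{p}(t), \mathbf{q}(t))$.

The constant depends only on the initial and final values $(\mathbf{p}(t_1), \mathbf{q}(t_1))$ and $(\mathbf{p}(t_2), \mathbf{q}(t_2))$ and, respectively, $(\mathbf{p}'(t_1), \mathbf{q}'(t_1))$ and $(\mathbf{p}'(t_2), \mathbf{q}'(t_2))$, so that if $(\mathbf{p}(t), \mathbf{q}(t))$ makes $\mathcal{A}_H$ extreme, then $(\mathbf{p}'(t), \mathbf{q}'(t)) = C(\mathbf{p}(t), \mathbf{q}(t))$ also makes $\mathcal{A}_{H'}$ extreme.

Hence, if $t \to (\mathbf{p}(t), \mathbf{q}(t))$ solves the Hamilton equations with Hamiltonian $H(\mathbf{p}, \mathbf{q})$ then the motion $t \to (\mathbf{p}'(t), \mathbf{q}'(t)) = C(\mathbf{p}(t), \mathbf{q}(t))$ solves the Hamilton equations with Hamiltonian $H'(\mathbf{p}', \mathbf{q}') = H(C^{-1}(\mathbf{p}', \mathbf{q}'))$, no matter which $H$ is chosen: therefore, the transformation is canonical. The function $\Phi$ is called its generating function.

Equation [19] provides a way to construct canonical maps. Suppose that a function $\Phi(\boldsymbol{\pi}', \boldsymbol{\kappa})$ is given and defined on some domain $W$; then setting

$$\boldsymbol{\pi} = \partial_{\boldsymbol{\kappa}} \Phi(\boldsymbol{\pi}', \boldsymbol{\kappa}), \qquad \boldsymbol{\kappa}' = \partial_{\boldsymbol{\pi}'} \Phi(\boldsymbol{\pi}', \boldsymbol{\kappa})$$

and inverting the first equation in the form $\boldsymbol{\pi}' = X(\boldsymbol{\pi}, \boldsymbol{\kappa})$ and substituting the value for $\boldsymbol{\pi}'$ thus obtained in the second equation, a map $C(\boldsymbol{\pi}, \boldsymbol{\kappa}) = (\boldsymbol{\pi}', \boldsymbol{\kappa}')$ is defined on some domain (where the mentioned operations can be performed), and if such domain is open and not empty then $C$ is a canonical map.

For similar reasons, if $\Gamma(\boldsymbol{\kappa}, \boldsymbol{\kappa}')$ is a function defined on some domain then setting $\boldsymbol{\pi} = \partial_{\boldsymbol{\kappa}} \Gamma(\boldsymbol{\kappa}, \boldsymbol{\kappa}')$, $\boldsymbol{\pi}' = -\partial_{\boldsymbol{\kappa}'} \Gamma(\boldsymbol{\kappa}, \boldsymbol{\kappa}')$, solving the first relation to express $\boldsymbol{\kappa}' = D(\boldsymbol{\pi}, \boldsymbol{\kappa})$, and substituting in the second relation, a map $(\boldsymbol{\pi}', \boldsymbol{\kappa}') = C(\boldsymbol{\pi}, \boldsymbol{\kappa})$ is defined on some domain (where the mentioned operations can be performed), and if such domain is open and not empty then $C$ is a canonical map.

Likewise, canonical transformations can be constructed starting from a priori given functions $F(\boldsymbol{\pi}, \boldsymbol{\kappa}')$ or $G(\boldsymbol{\pi}, \boldsymbol{\pi}')$. And the most general canonical map can be generated locally (i.e., near a given point in phase space) by a single one of the above four ways, possibly composed with a few "trivial" canonical maps in which one pair of coordinates $(\pi_i, \kappa_i)$ is transformed into $(-\kappa_i, \pi_i)$. The necessity of also including the trivial maps can be traced to the existence of homogeneous canonical maps, that is, maps such that $\boldsymbol{\pi} \cdot d\boldsymbol{\kappa} = \boldsymbol{\pi}' \cdot d\boldsymbol{\kappa}'$ (e.g., the identity map; see below or [49] for nontrivial examples), which are action preserving hence canonical, but which evidently cannot be generated by a function $\Gamma(\boldsymbol{\kappa}, \boldsymbol{\kappa}')$ although they can be generated by a function depending on $\boldsymbol{\pi}', \boldsymbol{\kappa}$.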
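The construction of a canonical map from a generating function can be checked numerically in one degree of freedom. The sketch below is only an illustration under stated assumptions: the generating function $\Phi(\pi', \kappa) = \pi' \sin\kappa$ is a hypothetical choice made here (it is not taken from the text), the helper names are invented for the example, and derivatives are approximated by central differences. Inverting $\pi = \partial_\kappa \Phi$ gives $\pi' = \pi / \cos\kappa$, $\kappa' = \sin\kappa$, and the code evaluates the bracket $\{\pi', \kappa'\}$ of [17] with the sign convention of [18].

```python
import math


def canonical_map(p, k):
    """Map (p, k) -> (p', k') for the illustrative choice Phi(p', k) = p' * sin(k).

    From p = dPhi/dk = p' * cos(k) and k' = dPhi/dp' = sin(k), inverting the
    first relation gives p' = p / cos(k) on the open set where cos(k) != 0.
    """
    return p / math.cos(k), math.sin(k)


def poisson_bracket(F, G, p, k, h=1e-5):
    """{F, G} = dF/dp * dG/dk - dF/dk * dG/dp for one degree of freedom.

    Partial derivatives are approximated by central differences of step h.
    """
    dF_dp = (F(p + h, k) - F(p - h, k)) / (2 * h)
    dF_dk = (F(p, k + h) - F(p, k - h)) / (2 * h)
    dG_dp = (G(p + h, k) - G(p - h, k)) / (2 * h)
    dG_dk = (G(p, k + h) - G(p, k - h)) / (2 * h)
    return dF_dp * dG_dk - dF_dk * dG_dp


def new_p(p, k):
    """First component p' of the transformed point."""
    return canonical_map(p, k)[0]


def new_k(p, k):
    """Second component k' of the transformed point."""
    return canonical_map(p, k)[1]


# Evaluate the bracket at an arbitrary sample point with cos(k) != 0;
# for a canonical map the value should be close to 1.
bracket = poisson_bracket(new_p, new_k, 0.7, 0.4)
print(f"{{p', k'}} = {bracket:.8f}")
```

In one degree of freedom the three conditions in [17] reduce to the single requirement $\{\pi', \kappa'\} = 1$, which is the same as $AD - BC = 1$ for the Jacobian entries, so one bracket evaluation per sample point is enough for this check.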

Simple examples of homogeneous canonical maps are maps in which the coordinates $\mathbf{q}$ are changed into $\mathbf{q}' = R(\mathbf{q})$ and, correspondingly, the $\mathbf{p}$'s are transformed linearly as $\mathbf{p}' = \left((\partial_{\mathbf{q}} R(\mathbf{q}))^{-1}\right)^{T} \mathbf{p}$: indeed, this map is generated by the function $F(\mathbf{p}', \mathbf{q}) \stackrel{\text{def}}{=} \mathbf{p}' \cdot R(\mathbf{q})$.

For instance, consider the map "Cartesian–polar" coordinates $(q_1, q_2) \to (\rho, \theta)$, with $(\rho, \theta)$ the polar coordinates of $\mathbf{q}$ (namely $\rho = \sqrt{q_1^2 + q_2^2}$, $\theta = \arctan(q_2/q_1)$), and let $\mathbf{n} \stackrel{\text{def}}{=} \mathbf{q}/|\mathbf{q}| = (n_1, n_2)$ and $\mathbf{t} \stackrel{\text{def}}{=} (-n_2, n_1)$. Setting $p_\rho \stackrel{\text{def}}{=} \mathbf{p} \cdot \mathbf{n}$, $p_\theta \stackrel{\text{def}}{=} \rho\, \mathbf{p} \cdot \mathbf{t}$, the map $(p_1, p_2, q_1, q_2) \to (p_\rho, p_\theta, \rho, \theta)$ is homogeneous canonical (because $\mathbf{p} \cdot d\mathbf{q} = \mathbf{p} \cdot \mathbf{n}\, d\rho + \mathbf{p} \cdot \mathbf{t}\, \rho\, d\theta = p_\rho\, d\rho + p_\theta\, d\theta$).

As a further example, any area-preserving map $(p, q) \to (p', q')$ defined on an open region of the plane $R^2$ is canonical: because in this case the matrices $A, B, C, D$ are just numbers, which satisfy $AD - BC = 1$ and, therefore, [16] holds.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Quadratures

The simplest mechanical systems are integrable by quadratures. For instance, the Hamiltonian on $R^2$,

$$H(p, q) = \frac{1}{2m} p^2 + V(q) \qquad [21]$$

generates a motion $t \to q(t)$ with initial data $q_0, \dot{q}_0$ such that $H(p_0, q_0) = E$, i.e., $\frac{1}{2} m \dot{q}_0^2 + V(q_0) = E$, satisfying

$$\dot{q}(t) = \pm \sqrt{\frac{2}{m} \left( E - V(q(t)) \right)}$$

If the equation $E = V(q)$ has only two solutions $q_-(E) < q_+(E)$ and $|\partial_q V(q_\pm(E))| > 0$, the motion is periodic with period

$$T(E) = 2 \int_{q_-(E)}^{q_+(E)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [22]$$

The special solution with initial data $q_0 = q_-(E)$, $\dot{q}_0 = 0$ will be denoted $Q(t)$, and it is an analytic function (by the general regularity theorem on ordinary differential equations). For $0 \le t \le T/2$ or for $T/2 \le t \le T$ it is given, respectively, by

$$t = \int_{q_-(E)}^{Q(t)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23a]$$

or

$$t = \frac{T}{2} - \int_{q_+(E)}^{Q(t)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23b]$$

The most general solution with energy $E$ has the form $q(t) = Q(t_0 + t)$, where $t_0$ is defined by $q_0 = Q(t_0)$, $\dot{q}_0 = \dot{Q}(t_0)$, i.e., it is the time needed for the "standard solution" $Q(t)$ to reach the initial data for the new motion.

If the derivative of $V$ vanishes in one of the extremes or if at least one of the two solutions $q_\pm(E)$ does not exist, the motion is not periodic and it may be unbounded: nevertheless, it is still expressible via integrals of the type [22]. If the potential $V$ is periodic in $q$ and the variable $q$ is considered to be varying on a circle then essentially all solutions are periodic: exceptions can occur if the energy $E$ has a value such that $V(q) = E$ admits a solution where $V$ has zero derivative.

Typical examples are the harmonic oscillator, the pendulum, and the Kepler oscillator, whose Hamiltonians, if $m, \omega, g, h, G, k$ are positive constants, are, respectively,

$$\frac{p^2}{2m} + \frac{1}{2} m \omega^2 q^2, \qquad \frac{p^2}{2m} + m g \left( 1 - \cos\frac{q}{h} \right), \qquad \frac{p^2}{2m} - m k \frac{1}{|q|} + m \frac{G^2}{2 q^2} \qquad [24]$$

The Kepler oscillator Hamiltonian has a potential which is singular at $q = 0$, but if $G \ne 0$ the energy conservation forbids too close an approach to $q = 0$ and the singularity becomes irrelevant.

The integral in [23] is called a quadrature and the systems in [21] are therefore integrable by quadratures. Such systems, at least when the motion is periodic, are best described in new coordinates in which periodicity is more manifest. Namely, when $V(q) = E$ has only two roots $q_\pm(E)$ and $\pm V'(q_\pm(E)) > 0$, the energy–time coordinates can be used by replacing $q, \dot{q}$ or $p, q$ by $E, \tau$, where $\tau$ is the time needed for the standard solution $t \to Q(t)$ to reach the given data, that is, $Q(\tau) = q$, $\dot{Q}(\tau) = \dot{q}$. In such coordinates, the motion is simply $(E, \tau) \to (E, \tau + t)$ and, of course, the variable $\tau$ has to be regarded as varying on a circle of radius $T/2\pi$. The $E, \tau$ variables are a kind of polar coordinates, as can be checked by drawing the curves of constant $E$, "energy levels," in the plane $p, q$ in the cases in [24]; see Figure 1.

In the harmonic oscillator case, all trajectories are periodic. In the pendulum case, all motions are periodic except the ones which separate the oscillatory motions (the closed curves in the second drawing) from the rotatory motions (the apparently open curves) which, in fact, are on closed curves as well if the $q$ coordinate, that is, the vertical

coordinate in Figure 1, is regarded as "periodic" with period $2\pi h$. In the Kepler case, only the negative-energy trajectories are periodic and a few of them are drawn in Figure 1. The single dots represent the equilibrium points in phase space.

Figure 1 The energy levels of the harmonic oscillator, the pendulum, and the Kepler motion.

The region of phase space where motions are periodic is a set of points $(p, q)$ with the topological structure of $\bigcup_{u \in U} (\{u\} \times C_u)$, where $u$ is a coordinate varying in an open interval $U$ (e.g., the set of values of the energy), and $C_u$ is a closed curve whose points $(p, q)$ are identified by a coordinate (e.g., by the time necessary for an arbitrarily fixed datum with the same energy to evolve into $(p, q)$).

In the above cases, [24], if the "radial" coordinate is chosen to be the energy, the set $U$ is the interval $(0, +\infty)$ for the harmonic oscillator, $(0, 2mg)$ or $(2mg, +\infty)$ for the pendulum, and $(-\frac{1}{2} m k^2/G^2, 0)$ in the Kepler case. The fixed datum for the reference motion can be taken, in all cases, to be of the form $(0, q_0)$ with the time coordinate $t_0$ given by [23].

It is remarkable that the energy–time coordinates are canonical coordinates: for instance, in the vicinity of $(p_0, q_0)$ and if $p_0 > 0$, this can be seen by setting

$$S(q, E) = \int_{q_0}^{q} \sqrt{2m(E - V(x))}\, dx \qquad [25]$$

and checking that $p = \partial_q S(q, E)$, $t = \partial_E S(q, E)$ are identities if $(p, q)$ and $(E, t)$ are coordinates for the same point, so that the criterion expressed by [20] applies.

It is convenient to standardize the coordinates by replacing the time variable by an angle $\alpha = (2\pi/T(E))\, t$; and instead of the energy any invertible function of it can be used.

It is natural to look for a coordinate $A = A(E)$ such that the map $(p, q) \to (A, \alpha)$ is a canonical map: this is easily done as the function

$$\hat{S}(q, A) = \int_{q_0}^{q} \sqrt{2m(E(A) - V(x))}\, dx \qquad [26]$$

generates (locally) the correspondence between $p = \sqrt{2m(E(A) - V(q))}$ and

$$\alpha = E'(A) \int_{q_0}^{q} \frac{dx}{\sqrt{2 m^{-1} (E(A) - V(x))}}$$

Therefore, by the criterion [20], if

$$E'(A) = \frac{2\pi}{T(E(A))}$$

i.e., if $A'(E) = T(E)/2\pi$, the coordinates $(A, \alpha)$ will be canonical coordinates. Hence, by [22], $A(E)$ can be taken as

$$A = \frac{1}{2\pi}\, 2 \int_{q_-(E)}^{q_+(E)} \sqrt{2m(E - V(q))}\, dq \equiv \frac{1}{2\pi} \oint p\, dq \qquad [27]$$

where the last integral is extended to the closed curve of energy $E$; see Figure 1. The action–angle coordinates $(A, \alpha)$ are defined in open regions of phase space covered by periodic motions: in action–angle coordinates such regions have the form $W = J \times \mathbb{T}$ of a product of an open interval $J$ and a one-dimensional "torus" $\mathbb{T} = [0, 2\pi]$ (i.e., a unit circle).

For details, the reader is again referred to Landau and Lifshitz (1976), Arnol’d (1989), and Gallavotti (1983).

Quasiperiodicity and Integrability

A Hamiltonian is called integrable in an open region $W \subset T^*(M)$ of phase space if

1. there is an analytic and nonsingular (i.e., with nonzero Jacobian) change of coordinates $(\mathbf{p}, \mathbf{q}) \to (\mathbf{I}, \boldsymbol{\varphi})$ mapping $W$ into a set of the form $\mathcal{I} \times \mathbb{T}^\ell$ with $\mathcal{I} \subset R^\ell$ (open); and furthermore
2. the flow $t \to S_t(\mathbf{p}, \mathbf{q})$ on phase space is transformed into $(\mathbf{I}, \boldsymbol{\varphi}) \to (\mathbf{I}, \boldsymbol{\varphi} + \boldsymbol{\omega}(\mathbf{I}) t)$, where $\boldsymbol{\omega}(\mathbf{I})$ is a smooth function on $\mathcal{I}$.

This means that, in suitable coordinates, which can be called "integrating coordinates," the system appears as a set of $\ell$ points with coordinates $\boldsymbol{\varphi} = (\varphi_1, \ldots, \varphi_\ell)$ moving on a unit circle at angular velocities $\boldsymbol{\omega}(\mathbf{I}) = (\omega_1(\mathbf{I}), \ldots, \omega_\ell(\mathbf{I}))$ depending on the actions of the initial data.

A system integrable in a region $W$ which, in integrating coordinates $\mathbf{I}, \boldsymbol{\varphi}$, has the form $\mathcal{I} \times \mathbb{T}^\ell$ is said to be anisochronous if $\det \partial_{\mathbf{I}} \boldsymbol{\omega}(\mathbf{I}) \ne 0$. It is said to be isochronous if $\boldsymbol{\omega}(\mathbf{I}) \equiv \boldsymbol{\omega}$ is independent of $\mathbf{I}$. The motions of integrable systems are called quasiperiodic with frequency spectrum $\boldsymbol{\omega}(\mathbf{I})$, or with frequencies $\boldsymbol{\omega}(\mathbf{I})/2\pi$, in the coordinates $(\mathbf{I}, \boldsymbol{\varphi})$.

Clearly, an integrable system admits $\ell$ independent constants of motion, the $\mathbf{I} = (I_1, \ldots, I_\ell)$, and, for each

choice of $\mathbf{I}$, the other coordinates vary on a "standard" $\ell$-dimensional torus $\mathbb{T}^\ell$: hence, it is possible to say that a phase space region of integrability is foliated into $\ell$-dimensional invariant tori $\mathcal{T}(\mathbf{I})$ parametrized by the values of the constants of motion $\mathbf{I} \in \mathcal{I}$.

If an integrable system is anisochronous then it is canonically integrable: that is, it is possible to define on $W$ a canonical change of coordinates $(\mathbf{p}, \mathbf{q}) = C(\mathbf{A}, \boldsymbol{\alpha})$ mapping $W$ onto $J \times \mathbb{T}^\ell$ and such that $H(C(\mathbf{A}, \boldsymbol{\alpha})) = h(\mathbf{A})$ for a suitable $h$. Then, if $\boldsymbol{\omega}(\mathbf{A}) \stackrel{\text{def}}{=} \partial_{\mathbf{A}} h(\mathbf{A})$, the equations of motion become

$$\dot{\mathbf{A}} = 0, \qquad \dot{\boldsymbol{\alpha}} = \boldsymbol{\omega}(\mathbf{A}) \qquad [28]$$

Given a system $(\mathbf{I}, \boldsymbol{\varphi})$ of coordinates integrating an anisochronous system, the construction of action–angle coordinates can be performed, in principle, via a classical procedure (under a few extra assumptions).

Let $\gamma_1, \ldots, \gamma_\ell$ be $\ell$ topologically independent circles on $\mathbb{T}^\ell$; for definiteness let $\gamma_i(\mathbf{I}) = \{\boldsymbol{\varphi} \mid \varphi_1 = \varphi_2 = \cdots = \varphi_{i-1} = \varphi_{i+1} = \cdots = 0,\ \varphi_i \in [0, 2\pi]\}$, and set

$$A_i(\mathbf{I}) = \frac{1}{2\pi} \oint_{\gamma_i(\mathbf{I})} \mathbf{p} \cdot d\mathbf{q} \qquad [29]$$

If the map $\mathbf{I} \to \mathbf{A}(\mathbf{I})$ is analytically invertible as $\mathbf{I} = \mathbf{I}(\mathbf{A})$, the function

$$S(\mathbf{A}, \boldsymbol{\varphi}) = (\lambda) \int_{\mathbf{0}}^{\boldsymbol{\varphi}} \mathbf{p} \cdot d\mathbf{q} \qquad [30]$$

is well defined if the integral is over any path $\lambda$ joining the points $(\mathbf{p}(\mathbf{I}(\mathbf{A}), \mathbf{0}), \mathbf{q}(\mathbf{I}(\mathbf{A}), \mathbf{0}))$ and $(\mathbf{p}(\mathbf{I}(\mathbf{A}), \boldsymbol{\varphi}), \mathbf{q}(\mathbf{I}(\mathbf{A}), \boldsymbol{\varphi}))$ and lying on the torus parametrized by $\mathbf{I}(\mathbf{A})$.

The key remark in the proof that [30] really defines a function of the only variables $\mathbf{A}, \boldsymbol{\varphi}$ is that anisochrony implies the vanishing of the Poisson brackets (cf. [18]): $\{I_i, I_j\} = 0$ (hence also $\{A_i, A_j\} \equiv \sum_{h,k} \partial_{I_k} A_i\, \partial_{I_h} A_j\, \{I_k, I_h\} = 0$). And the property $\{I_i, I_j\} = 0$ can be checked to be precisely the integrability condition for the differential form $\mathbf{p} \cdot d\mathbf{q}$ restricted to the surface obtained by varying $\mathbf{q}$ while $\mathbf{p}$ is constrained so that $(\mathbf{p}, \mathbf{q})$ stays on the surface $\mathbf{I} = $ constant, i.e., on the invariant torus of the points with fixed $\mathbf{I}$.

The latter property is necessary and sufficient in order that the function $S(\mathbf{A}, \boldsymbol{\varphi})$ be well defined (i.e., be independent of the integration path $\lambda$) up to an additive quantity of the form $\sum_i 2\pi n_i A_i$ with $\mathbf{n} = (n_1, \ldots, n_\ell)$ integers.

Then the action–angle variables are defined by the canonical change of coordinates with $S(\mathbf{A}, \boldsymbol{\varphi})$ as generating function, i.e., by setting

$$\alpha_i = \partial_{A_i} S(\mathbf{A}, \boldsymbol{\varphi}), \qquad I_i = \partial_{\varphi_i} S(\mathbf{A}, \boldsymbol{\varphi}) \qquad [31]$$

and, since the computation of $S(\mathbf{A}, \boldsymbol{\varphi})$ is "reduced to integrations" which can be regarded as a natural extension of the quadratures discussed in the one-dimensional cases, such systems are also called integrable by quadratures. The just-described construction is a version of the more general Arnol’d–Liouville theorem.

In practice, however, the actual evaluation of the integrals in [29], [30] can be difficult: its analysis in various cases (even as "elementary" as the pendulum) has in fact led to key progress in various domains, for example, in the theory of special functions and in group theory.

In general, any surface on phase space on which the restriction of the differential form $\mathbf{p} \cdot d\mathbf{q}$ is locally integrable is called a Lagrangian manifold: hence the invariant tori of an anisochronous integrable system are Lagrangian manifolds.

If an integrable system is anisochronous, it cannot admit more than $\ell$ independent constants of motion; furthermore, it does not admit invariant tori of dimension $> \ell$. Hence $\ell$-dimensional invariant tori are called maximal.

Of course, invariant tori of dimension $< \ell$ can also exist: this happens when the variables $\mathbf{I}$ are such that the frequencies $\boldsymbol{\omega}(\mathbf{I})$ admit nontrivial rational relations; i.e., there is a vector with integer components $\boldsymbol{\nu} \in Z^\ell$, $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_\ell) \ne \mathbf{0}$, such that

$$\boldsymbol{\omega}(\mathbf{I}) \cdot \boldsymbol{\nu} = \sum_i \omega_i(\mathbf{I})\, \nu_i = 0 \qquad [32]$$

in this case, the invariant torus $\mathcal{T}(\mathbf{I})$ is called resonant. If the system is anisochronous then $\det \partial_{\mathbf{I}} \boldsymbol{\omega}(\mathbf{I}) \ne 0$ and, therefore, the resonant tori are associated with values of the constants of motion $\mathbf{I}$ which form a set of measure zero in the space $\mathcal{I}$ but which is not empty and is dense.

Examples of isochronous systems are the systems of harmonic oscillators, i.e., systems with Hamiltonian

$$\sum_{i=1}^{\ell} \frac{1}{2 m_i} p_i^2 + \frac{1}{2} \sum_{i,j}^{1,\ell} c_{ij}\, q_i q_j$$

where the matrix $c = (c_{ij})$ is a positive-definite matrix. This is an isochronous system with frequencies $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_\ell)$ whose squares are the eigenvalues of the matrix $m_i^{-1/2} c_{ij}\, m_j^{-1/2}$. It is integrable in the region $W$ of the data $\mathbf{x} = (\mathbf{p}, \mathbf{q}) \in R^{2\ell}$ such that, setting

$$A_\beta = \frac{1}{2\omega_\beta} \left( \left( \sum_{i=1}^{\ell} \frac{v_{\beta,i}\, p_i}{\sqrt{m_i}} \right)^{2} + \omega_\beta^2 \left( \sum_{i=1}^{\ell} \frac{v_{\beta,i}\, q_i}{\sqrt{m_i^{-1}}} \right)^{2} \right)$$

for all eigenvectors $\mathbf{v}_\beta$, $\beta = 1, \ldots, \ell$, of the above matrix, the vectors $\mathbf{A}$ have all components $> 0$.
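The recipe for the frequencies and actions of a system of harmonic oscillators can be sketched for two degrees of freedom. The code below is a minimal illustration, not a general implementation: the function names, the masses, the coupling matrix, and the sample data are all hypothetical, and the restriction to two oscillators is made only so that the symmetric matrix $m_i^{-1/2} c_{ij} m_j^{-1/2}$ can be diagonalized in closed form. As a consistency check it compares $\sum_\beta \omega_\beta A_\beta$ with the energy of the same data.

```python
import math


def normal_modes(m, c):
    """Frequencies and unit eigenvectors for two coupled harmonic oscillators.

    m = (m1, m2) are the masses and c is a symmetric positive-definite 2x2
    matrix. The matrix M_ij = c_ij / sqrt(m_i * m_j) is diagonalized in closed
    form; the frequencies are the square roots of its eigenvalues.
    """
    a = c[0][0] / m[0]
    b = c[0][1] / math.sqrt(m[0] * m[1])
    d = c[1][1] / m[1]
    mean = (a + d) / 2
    r = math.hypot((a - d) / 2, b)
    phi = 0.5 * math.atan2(2 * b, a - d)
    # Eigenvalue mean - r belongs to (-sin phi, cos phi); mean + r to (cos phi, sin phi).
    eigenvalues = [mean - r, mean + r]
    eigenvectors = [(-math.sin(phi), math.cos(phi)), (math.cos(phi), math.sin(phi))]
    return [math.sqrt(lam) for lam in eigenvalues], eigenvectors


def actions(m, c, p, q):
    """Frequencies omega_beta and actions A_beta for the data (p, q)."""
    omegas, vectors = normal_modes(m, c)
    result = []
    for w, v in zip(omegas, vectors):
        P = sum(v[i] * p[i] / math.sqrt(m[i]) for i in range(2))
        Q = sum(v[i] * q[i] * math.sqrt(m[i]) for i in range(2))
        result.append((P * P + w * w * Q * Q) / (2 * w))
    return omegas, result


def energy(m, c, p, q):
    """Hamiltonian: sum_i p_i^2 / (2 m_i) + (1/2) sum_ij c_ij q_i q_j."""
    kinetic = sum(p[i] ** 2 / (2 * m[i]) for i in range(2))
    potential = 0.5 * sum(c[i][j] * q[i] * q[j] for i in range(2) for j in range(2))
    return kinetic + potential


# Hypothetical sample system and data, chosen only for illustration.
m, c = (1.0, 4.0), [[2.0, -1.0], [-1.0, 3.0]]
p, q = (0.3, -0.5), (0.2, 0.1)
omegas, A = actions(m, c, p, q)
print("frequencies:", omegas)
print("actions:", A)
print("sum of omega*A:", sum(w * a for w, a in zip(omegas, A)))
print("energy:", energy(m, c, p, q))
```

For data with both actions strictly positive the point lies in the region $W$ described above; a vanishing action corresponds to a motion in which one normal mode is not excited.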

Even though this system is isochronous, it nevertheless admits a system of canonical action–angle coordinates in which the Hamiltonian takes the simplest form

$$h(\mathbf{A}) = \sum_{\beta=1}^{\ell} \omega_\beta A_\beta \equiv \boldsymbol{\omega} \cdot \mathbf{A} \qquad [33]$$

with

$$\alpha_\beta = -\arctan\left( \frac{\sum_{i=1}^{\ell} v_{\beta,i}\, p_i / \sqrt{m_i}}{\sum_{i=1}^{\ell} \sqrt{m_i}\, \omega_\beta\, v_{\beta,i}\, q_i} \right)$$

as conjugate angles.

An example of an anisochronous system is the free rotators or free wheels: i.e., $\ell$ noninteracting points on a circle of radius $R$ or $\ell$ noninteracting homogeneous coaxial wheels of radius $R$. If $J_i = m_i R^2$ or, respectively, $J_i = (1/2) m_i R^2$ are the inertia moments and if the positions are determined by $\ell$ angles $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_\ell)$, the angular velocities are constants related to the angular momenta $\mathbf{A} = (A_1, \ldots, A_\ell)$ by $\omega_i = A_i/J_i$. The Hamiltonian and the spectrum are

$$h(\mathbf{A}) = \sum_{i=1}^{\ell} \frac{1}{2 J_i} A_i^2, \qquad \boldsymbol{\omega}(\mathbf{A}) = \left( \frac{1}{J_i} A_i \right)_{i=1,\ldots,\ell} \qquad [34]$$

For further details see Landau and Lifshitz (1976), Gallavotti (1983), Arnol’d (1989), and Fassò (1998).

Multidimensional Quadratures: Central Motion

Several important mechanical systems with more than one degree of freedom are integrable by canonical quadratures in vast regions of phase space. This is checked by showing that there is a foliation into invariant tori $\mathcal{T}(\mathbf{I})$ of dimension equal to the number of degrees of freedom ($\ell$) parametrized by $\ell$ constants of motion $\mathbf{I}$ in involution, i.e., such that $\{I_i, I_j\} = 0$. One then performs, if possible, the construction of the action–angle variables by the quadratures discussed in the previous section.

The above procedure is well illustrated by the theory of the planar motion of a point of mass $m$ attracted by a coplanar center of force: the Lagrangian is, in polar coordinates $(\rho, \theta)$,

$$\mathcal{L} = \frac{m}{2} \left( \dot{\rho}^2 + \rho^2 \dot{\theta}^2 \right) - V(\rho)$$

The planarity of the motion is not a strong restriction as central motion always takes place on a plane.

Hence, the equations of motion are

$$\frac{d}{dt} \left( m \rho^2 \dot{\theta} \right) = 0$$

i.e., $m \rho^2 \dot{\theta} = G$ is a constant of motion (it is the angular momentum), and

$$m \ddot{\rho} = -\partial_\rho V(\rho) + \partial_\rho \frac{m}{2} \rho^2 \dot{\theta}^2 = -\partial_\rho V(\rho) + \frac{G^2}{m \rho^3} \stackrel{\text{def}}{=} -\partial_\rho V_G(\rho)$$

Then the energy conservation yields a second constant of motion $E$,

$$\frac{m}{2} \dot{\rho}^2 + \frac{1}{2} \frac{G^2}{m \rho^2} + V(\rho) = E = \frac{1}{2m} p_\rho^2 + \frac{1}{2m} \frac{p_\theta^2}{\rho^2} + V(\rho) \qquad [35]$$

The right-hand side (rhs) is the Hamiltonian for the system, derived from $\mathcal{L}$, if $p_\rho, p_\theta$ denote conjugate momenta of $\rho, \theta$: $p_\rho = m \dot{\rho}$ and $p_\theta = m \rho^2 \dot{\theta}$ (note that $p_\theta = G$).

Suppose $\rho^2 V(\rho) \to 0$ as $\rho \to 0$: then the singularity at the origin cannot be reached by any motion starting with $\rho > 0$ if $G > 0$. Assume also that the function

$$V_G(\rho) \stackrel{\text{def}}{=} \frac{1}{2} \frac{G^2}{m \rho^2} + V(\rho)$$

has only one minimum $E_0(G)$, no maximum and no horizontal inflection, and tends to a limit $E_\infty(G) \le \infty$ when $\rho \to \infty$. Then the system is integrable in the domain $W = \{(\mathbf{p}, \mathbf{q}) \mid E_0(G) < E < E_\infty(G),\ G \ne 0\}$. This is checked by introducing a "standard" periodic solution $t \to R(t)$ of $m \ddot{\rho} = -\partial_\rho V_G(\rho)$ with energy $E_0(G) < E < E_\infty(G)$ and initial data $\rho = \rho_{E,-}(G)$, $\dot{\rho} = 0$ at time $t = 0$, where $\rho_{E,\pm}(G)$ are the two solutions of $V_G(\rho) = E$ (see the section "Quadratures"): this is a periodic analytic function of $t$ with period

$$T(E, G) = 2 \int_{\rho_{E,-}(G)}^{\rho_{E,+}(G)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}}$$

The function $R(t)$ is given, for $0 \le t \le \frac{1}{2} T(E, G)$ or for $\frac{1}{2} T(E, G) \le t \le T(E, G)$, by the quadratures

$$t = \int_{\rho_{E,-}(G)}^{R(t)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \qquad [36a]$$

or

$$t = \frac{T(E, G)}{2} - \int_{\rho_{E,+}(G)}^{R(t)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \qquad [36b]$$

respectively. The analytic regularity of $R(t)$ follows from the general existence, uniqueness, and regularity theorems applied to the differential equation for $\ddot{\rho}$.

Given an initial datum $\dot{\rho}_0, \rho_0, \dot{\theta}_0, \theta_0$ with energy $E$ and angular momentum $G$, define $t_0$ to be the time such that $R(t_0) = \rho_0$, $\dot{R}(t_0) = \dot{\rho}_0$: then $\rho(t) \equiv R(t + t_0)$ and $\theta(t)$ can be computed as

$$\theta(t) = \theta_0 + \int_0^t \frac{G}{m R(t_0 + t')^2}\, dt'$$

a second quadrature. Therefore, we can use as coordinates for the motion $E, G, t_0$, which determine $\dot{\rho}_0, \rho_0, \dot{\theta}_0$, and a fourth coordinate that determines $\theta_0$, which could be $\theta_0$ itself but which is conveniently determined, via the second quadrature, as follows.

The function $G m^{-1} R(t)^{-2}$ is periodic with period $T(E, G)$; hence it can be expressed in a Fourier series

$$\chi_0(E, G) + \sum_{k \ne 0} \chi_k(E, G) \exp\left( \frac{2\pi}{T(E, G)}\, i t k \right)$$

and the quadrature for $\theta(t)$ can be performed by integrating the series terms. Setting

$$\bar{\theta}(t_0) \stackrel{\text{def}}{=} \frac{T(E, G)}{2\pi} \sum_{k \ne 0} \frac{\chi_k(E, G)}{i k} \exp\left( \frac{2\pi}{T(E, G)}\, i t_0 k \right)$$

and $\varphi_1(t) \stackrel{\text{def}}{=} \theta(t) - \bar{\theta}(t_0 + t)$, so that $\varphi_1(0) = \theta_0 - \bar{\theta}(t_0)$, the expression

$$\theta(t) = \theta_0 + \int_0^t \frac{G}{m R(t_0 + t')^2}\, dt'$$

becomes

$$\varphi_1(t) = \varphi_1(0) + \chi_0(E, G)\, t \qquad [37]$$

Hence the system is integrable and the spectrum is $\boldsymbol{\omega}(E, G) = (\omega_0(E, G), \omega_1(E, G)) \equiv (\omega_0, \omega_1)$ with

$$\omega_0 \stackrel{\text{def}}{=} \frac{2\pi}{T(E, G)} \quad \text{and} \quad \omega_1 \stackrel{\text{def}}{=} \chi_0(E, G)$$

while $\mathbf{I} = (E, G)$ are constants of motion and the angles $\boldsymbol{\varphi} = (\varphi_0, \varphi_1)$ can be taken as

$$\varphi_0 \stackrel{\text{def}}{=} \omega_0 t_0, \qquad \varphi_1 \stackrel{\text{def}}{=} \theta_0 - \bar{\theta}(t_0)$$

At $E, G$ fixed, the motion takes place on a two-dimensional torus $\mathcal{T}(E, G)$ with $\varphi_0, \varphi_1$ as angles.

In the anisochronous cases, i.e., when $\det \partial_{E,G}\, \boldsymbol{\omega}(E, G) \ne 0$, canonical action–angle variables conjugated to $(p_\rho, \rho, p_\theta, \theta)$ can be constructed via [29], [30] by using two cycles $\gamma_1, \gamma_2$ on the torus $\mathcal{T}(E, G)$. It is convenient to choose

1. $\gamma_1$ as the cycle consisting of the points $\rho = x$, $\theta = 0$ whose first half (where $p_\rho \ge 0$) consists in the set $\rho_{E,-}(G) \le x \le \rho_{E,+}(G)$, $p_\rho = \sqrt{2m(E - V_G(x))}$ and $d\theta = 0$; and
2. $\gamma_2$ as the cycle $\rho = $ const, $\theta \in [0, 2\pi]$ on which $d\rho = 0$ and $p_\theta = G$,

obtaining

$$A_1 = \frac{2}{2\pi} \int_{\rho_{E,-}(G)}^{\rho_{E,+}(G)} \sqrt{2m(E - V_G(x))}\, dx, \qquad A_2 = G \qquad [38]$$

According to the general theory (cf. the previous section) a generating function for the canonical change of coordinates from $(p_\rho, \rho, p_\theta, \theta)$ to action–angle variables is (if, to fix ideas, $p_\rho > 0$)

$$S(A_1, A_2, \rho, \theta) = G\theta + \int_{\rho_{E,-}}^{\rho} \sqrt{2m(E - V_G(x))}\, dx \qquad [39]$$

In terms of the above $\omega_0, \chi_0$ the Jacobian matrix $\partial(E, G)/\partial(A_1, A_2)$ is computed from [38], [39] to be $\begin{pmatrix} \omega_0 & \chi_0 \\ 0 & 1 \end{pmatrix}$. It follows that $\partial_E S = t$, $\partial_G S = \theta - \bar{\theta}(t) - \chi_0 t$ so that, see [31],

$$\alpha_1 \stackrel{\text{def}}{=} \partial_{A_1} S = \omega_0 t, \qquad \alpha_2 \stackrel{\text{def}}{=} \partial_{A_2} S = \theta - \bar{\theta}(t) \qquad [40]$$

and $(A_1, \alpha_1)$, $(A_2, \alpha_2)$ are the action–angle pairs.

For more details, see Landau and Lifshitz (1976) and Gallavotti (1983).

Newtonian Potential and Kepler’s Laws

The anisochrony property, that is, $\det \partial(\omega_0, \chi_0)/\partial(A_1, A_2) \ne 0$ or, equivalently, $\det \partial(\omega_0, \chi_0)/\partial(E, G) \ne 0$, is not satisfied in the important cases of the harmonic potential and the Newtonian potential. Anisochrony being only a sufficient condition for canonical integrability, it is still possible (and true) that, nevertheless, in both cases the canonical transformation generated by [39] integrates the system. This is expected since the two potentials are limiting cases of anisochronous ones (e.g., $|\mathbf{q}|^{2+\varepsilon}$ and $|\mathbf{q}|^{-1-\varepsilon}$ with $\varepsilon \to 0$).

The Newtonian potential

$$H(\mathbf{p}, \mathbf{q}) = \frac{1}{2m} \mathbf{p}^2 - \frac{k m}{|\mathbf{q}|}$$

is integrable in the region $G \ne 0$, $E_0(G) = -k^2 m^3/2G^2 < E < 0$, $|G| < \sqrt{k^2 m^3/(-2E)}$. Proceeding as in the last section, one finds integrating coordinates and that the integrable motions develop on ellipses with one focus on the center of attraction $S$, so that motions are periodic, hence not anisochronous: nevertheless, the construction of the canonical coordinates via [29]–[31] (hence [39]) works and leads to canonical coordinates $(L_0, \lambda_0, G_0, \gamma_0)$. To obtain action–angle variables with a simple

interpretation, it is convenient to perform on the variables $(L_0, \lambda_0, G_0, \gamma_0)$ (constructed by following the procedure just indicated) a further trivial canonical transformation by setting $L = L_0 + G_0$, $G = G_0$, $\lambda = \lambda_0$, $\gamma = \gamma_0 - \lambda_0$; then

1. $\lambda$ (average anomaly) is the time necessary for the point $P$ to move from the pericenter to its actual position, in units of the period, times $2\pi$;
2. $L$ (action) is essentially the energy $E = -k^2 m^3/2L^2$;
3. $G$ (angular momentum);
4. $\gamma$ (axis longitude), is the angle between a fixed axis and the major axis of the ellipse oriented from the center of the ellipse $O$ to the center of attraction $S$.

The eccentricity of the ellipse is $e$ such that $G = \pm L \sqrt{1 - e^2}$. The ellipse equation is $\rho = a(1 - e \cos\xi)$, where $\xi$ is the eccentric anomaly (see Figure 2), $a = L^2/k m^2$ is the major semiaxis, and $\rho$ is the distance to the center of attraction $S$.

[Figure 2: three panels showing the points O, S, P, c, the circles D and E, and the angles ξ and θ, drawn for e = 0.75, e = 0.75, and e = 0.3.]

Figure 2 Eccentric and true anomalies of $P$, which moves on a small circle $E$ centered at a point $c$ moving on the circle $D$ located half-way between the two concentric circles containing the Keplerian ellipse: the anomaly of $c$ with respect to the axis $OS$ is $\xi$. The circle $D$ is eccentric with respect to $S$ and therefore $\xi$ is, even today, called eccentric anomaly, whereas the circle $D$ is, in ancient terminology, the deferent circle (eccentric circles were introduced in astronomy by Ptolemy). The small circle $E$ on which the point $P$ moves is, in ancient terminology, an epicycle. The deferent and the epicyclical motions are synchronous (i.e., they have the same period); Kepler discovered that his key a priori hypothesis of inverse proportionality between angular velocity on the deferent and distance between $P$ and $S$ (i.e., $\rho \dot{\xi} = $ constant) implied both synchrony and elliptical shape of the orbit, with focus in $S$. The latter law is equivalent to $\rho^2 \dot{\theta} = $ constant (because of the identity $a \dot{\xi} = \rho \dot{\theta}$). Small eccentricity ellipses can hardly be distinguished from circles.

Finally, the relations between eccentric anomaly $\xi$, average anomaly $\lambda$, true anomaly $\theta$ (the latter is the polar angle), and $SP$ distance $\rho$ are given by the Kepler equations

$$\lambda = \xi - e \sin\xi, \qquad (1 - e \cos\xi)(1 + e \cos\theta) = 1 - e^2,$$
$$\lambda = (1 - e^2)^{3/2} \int_0^{\theta} \frac{d\theta'}{(1 + e \cos\theta')^2}, \qquad \frac{\rho}{a} = \frac{1 - e^2}{1 + e \cos\theta} \qquad [41]$$

and the relation between true anomaly and average anomaly can be inverted in the form

$$\xi = \lambda + g_\lambda, \qquad \theta = \lambda + f_\lambda \;\Rightarrow\; \frac{\rho}{a} = \frac{1 - e^2}{1 + e \cos(\lambda + f_\lambda)} \qquad [42]$$

where $g_\lambda = g(e \sin\lambda, e \cos\lambda)$, $f_\lambda = f(e \sin\lambda, e \cos\lambda)$, and $g(x, y)$, $f(x, y)$ are suitable functions analytic for $|x|, |y| < 1$. Furthermore, $g(x, y) = x(1 + y + \cdots)$, $f(x, y) = 2x(1 + \frac{5}{4} y + \cdots)$ and the ellipses denote terms of degree 2 or higher in $x, y$, containing only even powers of $x$.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Rigid Body

Another fundamental integrable system is the rigid body in the absence of gravity and with a fixed point $O$. It can be naturally described in terms of the Euler angles $\theta_0, \varphi_0, \psi_0$ (see Figure 3) and their derivatives $\dot{\theta}_0, \dot{\varphi}_0, \dot{\psi}_0$.

Figure 3 The Euler angles of the comoving frame $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ with respect to a fixed frame $\mathbf{x}, \mathbf{y}, \mathbf{z}$. The direction $\mathbf{n}$ is the "node line," the intersection between the planes $\mathbf{x}, \mathbf{y}$ and $\mathbf{i}_1, \mathbf{i}_2$.

Let $I_1, I_2, I_3$ be the three principal inertia moments of the body along the three principal axes with unit vectors $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$. The inertia moments and the principal axes are the eigenvalues and the associated unit eigenvectors of the $3 \times 3$ inertia matrix $\mathcal{I}$, which is defined by $\mathcal{I}_{hk} = \sum_{i=1}^{n} m_i \left( |\mathbf{x}_i|^2 \delta_{hk} - (\mathbf{x}_i)_h (\mathbf{x}_i)_k \right)$, where $h, k = 1, 2, 3$ and $\mathbf{x}_i$ is the position of the $i$th particle in a reference frame with origin at $O$ and in which

all particles are at rest: this comoving frame exists as a consequence of the rigidity constraint. The principal axes form a coordinate system which is comoving as well: that is, in the frame $(O; i_1, i_2, i_3)$ as well, the particles are at rest.

The Lagrangian is simply the kinetic energy: we imagine the rigidity constraint to be ideal (e.g., as realized by internal central forces in the limit of infinite rigidity, as mentioned in the section "Lagrange and Hamilton forms of equations of motion"). The angular velocity of the rigid motion is defined by

$$\boldsymbol{\omega} = \dot\theta_0\, n + \dot\varphi_0\, z + \dot\psi_0\, i_3 \qquad [43]$$

expressing that a generic infinitesimal motion must consist of a variation of the three Euler angles and, therefore, it has to be a rotation of speeds $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ around the axes $n, z, i_3$ as shown in Figure 3.

Let $(\omega_1, \omega_2, \omega_3)$ be the components of $\boldsymbol{\omega}$ along the principal axes $i_1, i_2, i_3$: for brevity, the latter axes will often be called 1, 2, 3. Then the angular momentum $M$, with respect to the pivot point O, and the kinetic energy $K$ can be checked to be

$$M = I_1\omega_1 i_1 + I_2\omega_2 i_2 + I_3\omega_3 i_3,\qquad K = \tfrac12\left(I_1\omega_1^2 + I_2\omega_2^2 + I_3\omega_3^2\right) \qquad [44]$$

and are constants of motion. From Figure 3 it follows that $\omega_1 = \dot\theta_0\cos\psi_0 + \dot\varphi_0\sin\theta_0\sin\psi_0$, $\omega_2 = -\dot\theta_0\sin\psi_0 + \dot\varphi_0\sin\theta_0\cos\psi_0$ and $\omega_3 = \dot\varphi_0\cos\theta_0 + \dot\psi_0$, so that the Lagrangian, uninspiring at first, is

$$\mathcal L \overset{\text{def}}{=} \tfrac12 I_1\left(\dot\theta_0\cos\psi_0 + \dot\varphi_0\sin\theta_0\sin\psi_0\right)^2 + \tfrac12 I_2\left(-\dot\theta_0\sin\psi_0 + \dot\varphi_0\sin\theta_0\cos\psi_0\right)^2 + \tfrac12 I_3\left(\dot\varphi_0\cos\theta_0 + \dot\psi_0\right)^2 \qquad [45]$$

Angular momentum conservation does not imply that the components $\omega_j$ are constants because $i_1, i_2, i_3$ also change with time according to

$$\frac{d}{dt}\, i_j = \boldsymbol{\omega}\wedge i_j,\qquad j = 1, 2, 3$$

Hence, $\dot M = 0$ becomes, by the first of [44] and denoting $I\boldsymbol{\omega} = (I_1\omega_1, I_2\omega_2, I_3\omega_3)$, the Euler equations $I\dot{\boldsymbol{\omega}} + \boldsymbol{\omega}\wedge I\boldsymbol{\omega} = 0$, or

$$\begin{aligned}
I_1\dot\omega_1 &= (I_2 - I_3)\,\omega_2\omega_3\\
I_2\dot\omega_2 &= (I_3 - I_1)\,\omega_3\omega_1\\
I_3\dot\omega_3 &= (I_1 - I_2)\,\omega_1\omega_2
\end{aligned}\qquad [46]$$

which can be considered together with the conserved quantities [44].

Since angular momentum is conserved, it is convenient to introduce the laboratory frame $(O; x_0, y_0, z_0)$ with fixed axes $x_0, y_0, z_0$ and (see Figure 4):

1. $(O; x, y, z)$, the momentum frame with fixed axes, but with $z$-axis oriented as $M$, and $x$-axis coinciding with the node (i.e., the intersection) of the $x_0$–$y_0$ plane and the $x$–$y$ plane (orthogonal to $M$). Therefore, $x, y, z$ is determined by the two Euler angles $\zeta, \gamma$ of $(O; x, y, z)$ in $(O; x_0, y_0, z_0)$;
2. $(O; 1, 2, 3)$, the comoving frame, that is, the frame fixed with the body, and with unit vectors $i_1, i_2, i_3$ parallel to the principal axes of the body. The frame is determined by three Euler angles $\theta_0, \varphi_0, \psi_0$;
3. the Euler angles of $(O; 1, 2, 3)$ with respect to $(O; x, y, z)$, which are denoted $\theta, \varphi, \psi$;
4. $G$, the total angular momentum: $G^2 = \sum_j I_j^2\omega_j^2$;
5. $M_3$, the angular momentum along the $z_0$ axis; $M_3 = G\cos\zeta$; and
6. $L$, the projection of $M$ on the axis 3, $L = G\cos\theta$.

The quantities $G, M_3, L, \varphi, \gamma, \psi$ determine $\theta_0, \varphi_0, \psi_0$ and $\dot\theta_0, \dot\varphi_0, \dot\psi_0$, or the $p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ variables conjugated to $\theta_0, \varphi_0, \psi_0$ as shown by the following comment.

Considering Figure 4, the angles $\zeta, \gamma$ determine the location, in the fixed frame $(O; x_0, y_0, z_0)$, of the direction of $M$ and the node line $m$, which are, respectively, the $z$-axis and the $x$-axis of the fixed frame associated with the angular momentum; the angles $\theta, \varphi, \psi$ then determine the position of the comoving frame with respect to the fixed frame $(O; x, y, z)$, hence its position with respect to $(O; x_0, y_0, z_0)$, that is, $(\theta_0, \varphi_0, \psi_0)$. From this and $G$, it is possible to determine $\boldsymbol{\omega}$ because

$$\cos\theta = \frac{I_3\omega_3}{G},\qquad \tan\psi = \frac{I_2\omega_2}{I_1\omega_1},\qquad \omega_2^2 = I_2^{-2}\left(G^2 - I_1^2\omega_1^2 - I_3^2\omega_3^2\right) \qquad [47]$$

and, from [43], $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ are determined.

Figure 4 The laboratory frame, the angular momentum frame, and the comoving frame (and the Deprit angles).
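The Euler equations [46] and the conservation laws [44] can be checked numerically. The following is a minimal sketch, not part of the original article: it integrates [46] with a classical fourth-order Runge–Kutta step and compares $K$ and $G^2 = \sum_j I_j^2\omega_j^2$ at the start and at the end. The inertia moments, the initial angular velocity, the step size, and all function names are illustrative assumptions.

```python
def euler_rhs(w, I):
    """Right-hand side of the Euler equations [46] for a free rigid body."""
    w1, w2, w3 = w
    I1, I2, I3 = I
    return (
        (I2 - I3) * w2 * w3 / I1,
        (I3 - I1) * w3 * w1 / I2,
        (I1 - I2) * w1 * w2 / I3,
    )


def rk4_step(w, I, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(a, b, s):
        return tuple(x + s * y for x, y in zip(a, b))

    k1 = euler_rhs(w, I)
    k2 = euler_rhs(shifted(w, k1, dt / 2), I)
    k3 = euler_rhs(shifted(w, k2, dt / 2), I)
    k4 = euler_rhs(shifted(w, k3, dt), I)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(w, k1, k2, k3, k4)
    )


def kinetic_energy(w, I):
    """K of [44]."""
    return 0.5 * sum(Ij * wj * wj for Ij, wj in zip(I, w))


def angular_momentum_sq(w, I):
    """G^2 = sum_j I_j^2 w_j^2."""
    return sum((Ij * wj) ** 2 for Ij, wj in zip(I, w))


def integrate(w0, I, dt=1e-3, steps=5000):
    w = tuple(w0)
    for _ in range(steps):
        w = rk4_step(w, I, dt)
    return w


# Illustrative (assumed) data: three distinct inertia moments.
I = (1.0, 2.0, 3.0)
w_start = (0.3, 1.0, 0.2)
w_end = integrate(w_start, I)

print("K  :", kinetic_energy(w_start, I), kinetic_energy(w_end, I))
print("G^2:", angular_momentum_sq(w_start, I), angular_momentum_sq(w_end, I))
```

The components $\omega_j$ change along the motion while $K$ and $G^2$ stay constant up to the integration error, which is the content of the remark that angular momentum conservation does not make the $\omega_j$ constant.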
The Lagrangian [45] gives immediately (after expressing $\boldsymbol{\omega}$, i.e., $n, z, i_3$, in terms of the Euler angles $\theta_0, \varphi_0, \psi_0$) an expression for the variables $p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ conjugated to $\theta_0, \varphi_0, \psi_0$:

$$p_{\theta_0} = M\cdot n_0,\qquad p_{\varphi_0} = M\cdot z_0,\qquad p_{\psi_0} = M\cdot i_3 \qquad [48]$$

and, in principle, we could proceed to compute the Hamiltonian.

However, the computation can be avoided because of the very remarkable property (DEPRIT), which can be checked with some patience, making use of [48] and of elementary spherical trigonometry identities,

$$M_3\, d\gamma + G\, d\varphi + L\, d\psi = p_{\varphi_0}\, d\varphi_0 + p_{\psi_0}\, d\psi_0 + p_{\theta_0}\, d\theta_0 \qquad [49]$$

which means that the map $((M_3, \gamma), (L, \psi), (G, \varphi)) \to ((p_{\theta_0}, \theta_0), (p_{\varphi_0}, \varphi_0), (p_{\psi_0}, \psi_0))$ is a canonical map. And in the new coordinates, the kinetic energy, hence the Hamiltonian, takes the form

$$K = \frac12\left[\frac{L^2}{I_3} + (G^2 - L^2)\left(\frac{\sin^2\psi}{I_1} + \frac{\cos^2\psi}{I_2}\right)\right] \qquad [50]$$

This again shows that $G, M_3$ are constants of motion, and the $L, \psi$ variables are determined by a quadrature, because the Hamilton equation for $\psi$ combined with the energy conservation yields

$$\dot\psi = \pm\left(\frac{1}{I_3} - \frac{\sin^2\psi}{I_1} - \frac{\cos^2\psi}{I_2}\right)\sqrt{\frac{2E - G^2\left(\frac{\sin^2\psi}{I_1} + \frac{\cos^2\psi}{I_2}\right)}{\frac{1}{I_3} - \frac{\sin^2\psi}{I_1} - \frac{\cos^2\psi}{I_2}}} \qquad [51]$$

In the integrability region, this motion is periodic with some period $T_L(E, G)$. Once $\psi(t)$ is determined, the Hamilton equation for $\varphi$ leads to the further quadrature

$$\dot\varphi = \left(\frac{\sin^2\psi(t)}{I_1} + \frac{\cos^2\psi(t)}{I_2}\right) G \qquad [52]$$

which determines a second periodic motion with period $T_G(E, G)$. The $\gamma, M_3$ are constants and, therefore, the motion takes place on three-dimensional invariant tori $\mathcal T_{E,G,M_3}$ in phase space, each of which is "always" foliated into two-dimensional invariant tori parametrized by the angle $\gamma$ which is constant (by [50], because $K$ is $M_3$-independent): the latter are in turn foliated by one-dimensional invariant tori, that is, by periodic orbits, with $E, G$ such that the value of $T_L(E, G)/T_G(E, G)$ is rational.

Note that if $I_1 = I_2 = I$, the above analysis is extremely simplified. Furthermore, if gravity $g$ acts on the system the Hamiltonian will simply change by the addition of a potential $mgz$ if $z$ is the height of the center of mass. Then (see Figure 4), if the center of mass of the body is on the axis $i_3$ and $z = h\cos\theta_0$, and $h$ is the distance of the center of mass from O, since $\cos\theta_0 = \cos\theta\cos\zeta - \sin\theta\sin\zeta\cos\varphi$, the Hamiltonian will become $H = K - mgh\cos\theta_0$ or

$$H = \frac{L^2}{2I_3} + \frac{G^2 - L^2}{2I} - mgh\left(\frac{M_3 L}{G^2} - \left(1 - \frac{M_3^2}{G^2}\right)^{1/2}\left(1 - \frac{L^2}{G^2}\right)^{1/2}\cos\varphi\right) \qquad [53]$$

(the first two terms are [50] evaluated at $I_1 = I_2 = I$), so that, again, the system is integrable by quadratures (with the roles of $\psi$ and $\varphi$ "interchanged" with respect to the previous case) in suitable regions of phase space. This is called Lagrange's gyroscope.

A less elementary integrable case is when the inertia moments are related as $I_1 = I_2 = 2I_3$ and the center of mass is in the $i_1$–$i_2$ plane (rather than on the $i_3$-axis) and only gravity acts, besides the constraint force on the pivot point O; this is called Kowalevskaia's gyroscope.

For more details, see Gallavotti (1983).

Other Quadratures

An interesting classical integrable motion is that of a point mass attracted by two equal-mass centers of gravitational attraction, or a point ideally constrained to move on the surface of a general ellipsoid.

New integrable systems have been discovered quite recently and have generated a wealth of new developments ranging from group theory (as integrable systems are closely related to symmetries) to partial differential equations.

It is convenient to extend the notion of integrability by stating that a system is integrable in a region $W$ of phase space if

1. there is a change of coordinates $(p, q) \in W \to \{A, \boldsymbol{\alpha}, Y, y\} \in (U \times \mathbb T^\ell) \times (V \times \mathbb R^m)$ where $U \subset \mathbb R^\ell$, $V \subset \mathbb R^m$, with $\ell + m \ge 1$, are open sets; and
2. the $A, Y$ are constants of motion while the other coordinates vary "linearly":

$$(\boldsymbol{\alpha}, y) \to (\boldsymbol{\alpha} + \boldsymbol{\omega}(A, Y)\,t,\; y + v(A, Y)\,t) \qquad [54]$$

where $\boldsymbol{\omega}(A, Y), v(A, Y)$ are smooth functions.

In the new sense, the systems studied in the previous sections are integrable in much wider regions (essentially on the entire phase space with the exception of a set of data which lie on lower-dimensional surfaces
forming sets of zero volume). The notion is convenient also because it allows us to say that even the systems of free particles are integrable.

Two very remarkable systems integrable in the new sense are the Hamiltonian systems, respectively called Toda lattice (KRUSKAL, ZABUSKY), and Calogero lattice (CALOGERO, MOSER); if $(p_i, q_i) \in \mathbb R^2$, they are

$$\begin{aligned}
H_T(p, q) &= \frac{1}{2m}\sum_{i=1}^{n} p_i^2 + \sum_{i=1}^{n-1} g\, e^{-\kappa(q_{i+1} - q_i)}\\
H_C(p, q) &= \frac{1}{2m}\sum_{i=1}^{n} p_i^2 + \sum_{i<j}^{n} \frac{g}{(q_i - q_j)^2} + \frac12\sum_{i=1}^{n} m\omega^2 q_i^2
\end{aligned}\qquad [55]$$

where $m > 0$ and $\kappa, \omega, g \ge 0$. They describe the motion of $n$ interacting particles on a line.

The integration method for the above systems is again to find first the constants of motion and later to look for quadratures, when appropriate. The constants of motion can be found with the method of the Lax pairs. One shows that there is a pair of self-adjoint $n \times n$ matrices $M(p, q), N(p, q)$ such that the equations of motion become

$$\frac{d}{dt} M(p, q) = i\,[M(p, q), N(p, q)],\qquad i = \sqrt{-1} \qquad [56]$$

which imply that $M(t) = U(t) M(0) U(t)^{-1}$, with $U(t)$ a unitary matrix. When the equations can be written in the above form, it is clear that the $n$ eigenvalues of the matrix $M(0) = M(p_0, q_0)$ are constants of motion. When appropriate (e.g., in the Calogero lattice case with $\omega > 0$), it is possible to proceed to find canonical action–angle coordinates: a task that is quite difficult due to the arbitrariness of $n$, but which is possible.

The Lax pairs for the Calogero lattice (with $\omega = 0$, $g = m = 1$) are

$$M_{hh} = p_h,\quad N_{hh} = 0,\qquad M_{hk} = \frac{i}{q_h - q_k},\quad N_{hk} = \frac{1}{(q_h - q_k)^2},\quad h \ne k \qquad [57]$$

while for the Toda lattice (with $m = g = \tfrac12\kappa = 1$) the nonzero matrix elements of $M, N$ are

$$M_{hh} = p_h,\qquad M_{h,h+1} = M_{h+1,h} = e^{(q_h - q_{h+1})},\qquad N_{h,h+1} = -N_{h+1,h} = -i\, e^{(q_h - q_{h+1})} \qquad [58]$$

which are checked by first trying the case $n = 2$. (The signs in [58] are written here so that [56] reproduces the equations of motion of $H_T$ in [55]; the printed signs are ambiguous.)

Another integrable system (SUTHERLAND) is

$$H_S(p, q) = \frac{1}{2m}\sum_{k=1}^{n} p_k^2 + \sum_{h<k}^{n} \frac{g}{\sinh^2(q_h - q_k)} \qquad [59]$$

whose Lax pair is related to that of the Calogero lattice.

By taking suitable limits as $n \to \infty$ and as the other parameters tend to 0 or $\infty$ at suitable rates, integrability of a few differential equations, among which the Korteweg–deVries equation or the nonlinear Schrödinger equation, can be derived.

As mentioned in the introductory section, symmetry properties under continuous groups imply existence of constants of motion. Hence, it is natural to think that integrability of a mechanical system reflects enough symmetry to imply the existence of as many constants of motion, independent and in involution, as the number of degrees of freedom, $n$.

This is in fact always true, and in some respects it is a tautological statement in the anisochronous cases. Integrability in a region $W$ implies existence of canonical action–angle coordinates $(A, \boldsymbol{\alpha})$ (see the section "Quasiperiodicity and integrability") and the Hamiltonian depends solely on the $A$'s: therefore, its restriction to $W$ is invariant with respect to the action of the continuous commutative group $\mathbb T^n$ of the translations of the angle variables. The actions can be seen as constants of motion whose existence follows from Noether's theorem, at least in the anisochronous cases in which the Hamiltonian formulation is equivalent to a Lagrangian one.

What is nontrivial is to recognize, prior to realizing integrability, that a system admits this kind of symmetry: in most of the interesting cases, the systems either do not exhibit obvious symmetries or they exhibit symmetries apparently unrelated to the group $\mathbb T^n$, which nevertheless imply existence of sufficiently many independent constants of motion as required for integrability. Hence, nontrivial integrable systems possess a "hidden" symmetry under $\mathbb T^n$: the rigid body is an example.

However, very often the symmetries of a Hamiltonian $H$ which imply integrability also imply partial isochrony, that is, they imply that the number of independent frequencies is smaller than $n$ (see the section "Quasiperiodicity and integrability"). Even in such cases, often a map exists from the original coordinates $(p, q)$ to the integrating variables $(A, \boldsymbol{\alpha})$ in which $A$ are constants of motion and the $\boldsymbol{\alpha}$ are uniformly rotating angles (some of which are also constant) with spectrum $\boldsymbol{\omega}(A)$, which is the gradient $\partial_A h(A)$ for some function $h(A)$ depending only on a few of the $A$ coordinates. However, the map might fail to be canonical. The system is then said to be bi-Hamiltonian: in the sense that one can represent motions in two systems of canonical coordinates, not related by a canonical transformation, and by two Hamiltonian functions $H$ and $H' \equiv h$ which generate the same motions in the respective
coordinates (the latter changes of variables are sometimes called "canonical with respect to the pair $H, H'$" while the transformations considered in the section "Canonical transformations of phase space coordinates" are called completely canonical).

For more details, we refer the reader to Calogero and Degasperis (1982).

Generic Nonintegrability

It is natural to try to prove that a system "close" to an integrable one has motions with properties very close to quasiperiodic. This is indeed the case, but in a rather subtle way. That there is a problem is easily seen in the case of a perturbation of an anisochronous integrable system.

Assume that a system is integrable in a region $W$ of phase space which, in the integrating action–angle variables $(A, \boldsymbol{\alpha})$, has the standard form $U \times \mathbb T^\ell$ with a Hamiltonian $h(A)$ with gradient $\boldsymbol{\omega}(A) = \partial_A h(A)$. If the forces are perturbed by a potential which is smooth then the new system will be described, in the same coordinates, by a Hamiltonian like

$$H_\varepsilon(A, \boldsymbol{\alpha}) = h(A) + \varepsilon f(A, \boldsymbol{\alpha}) \qquad [60]$$

with $h, f$ analytic in the variables $A, \boldsymbol{\alpha}$.

If the system really behaved like the unperturbed one, it ought to have $\ell$ constants of motion of the form $F_\varepsilon(A, \boldsymbol{\alpha})$ analytic in $\varepsilon$ near $\varepsilon = 0$ and uniform, that is, single valued (which is the same as periodic) in the variables $\boldsymbol{\alpha}$. However, the following theorem (POINCARÉ) shows that this is a somewhat unlikely possibility.

Theorem 1 If the matrix $\partial^2_{AA} h(A)$ has rank $\ge 2$, the Hamiltonian [60] "generically" (an intuitive notion made precise below) cannot be integrated by a canonical transformation $C_\varepsilon(A, \boldsymbol{\alpha})$ which

(i) reduces to the identity as $\varepsilon \to 0$; and
(ii) is analytic in $\varepsilon$ near $\varepsilon = 0$ and in $(A, \boldsymbol{\alpha}) \in U' \times \mathbb T^\ell$, with $U' \subset U$ open.

Furthermore, no uniform constants of motion $F_\varepsilon(A, \boldsymbol{\alpha})$, defined for $\varepsilon$ near 0 and $(A, \boldsymbol{\alpha})$ in an open domain $U' \times \mathbb T^\ell$, exist other than the functions of $H_\varepsilon$ itself.

Integrability in the sense (i), (ii) can be called analytic integrability and it is the strongest (and most naive) sense that can be given to the attribute.

The first part of the theorem, that is, (i), (ii), holds simply because, if integrability was assumed, a generating function of the integrating map would have the form $A'\cdot\boldsymbol{\alpha} + \Phi_\varepsilon(A', \boldsymbol{\alpha})$ with $\Phi$ admitting a power series expansion in $\varepsilon$ as $\Phi_\varepsilon = \varepsilon\Phi_1 + \varepsilon^2\Phi_2 + \cdots$. Hence, $\Phi_1$ would have to satisfy

$$\boldsymbol{\omega}(A')\cdot\partial_{\boldsymbol{\alpha}}\Phi_1(A', \boldsymbol{\alpha}) + f(A', \boldsymbol{\alpha}) = \bar f(A') \qquad [61]$$

where $\bar f(A')$ depends only on $A'$ (hence integrating both sides with respect to $\boldsymbol{\alpha}$, it appears that $\bar f(A')$ must coincide with the average of $f(A', \boldsymbol{\alpha})$ over $\boldsymbol{\alpha}$).

This implies that the Fourier transform $f_{\boldsymbol{\nu}}(A)$, $\boldsymbol{\nu} \in \mathbb Z^\ell$, should satisfy

$$f_{\boldsymbol{\nu}}(A') = 0 \quad\text{if}\quad \boldsymbol{\omega}(A')\cdot\boldsymbol{\nu} = 0,\ \boldsymbol{\nu} \ne 0 \qquad [62]$$

which is equivalent to the existence of $\tilde f_{\boldsymbol{\nu}}(A')$ such that $f_{\boldsymbol{\nu}}(A') = \boldsymbol{\omega}(A')\cdot\boldsymbol{\nu}\, \tilde f_{\boldsymbol{\nu}}(A')$ for $\boldsymbol{\nu} \ne 0$. But since there is no relation between $\boldsymbol{\omega}(A)$ and $f(A, \boldsymbol{\alpha})$, this property "generically" will not hold in the sense that as close as wished to an $f$ which satisfies the property [62] there will be another $f$ which does not satisfy it, essentially no matter how "closeness" is defined (e.g., with respect to the metric $\|f - g\| = \sum_{\boldsymbol{\nu}} |f_{\boldsymbol{\nu}}(A) - g_{\boldsymbol{\nu}}(A)|$). This is so because the rank of $\partial^2_{AA} h(A)$ is higher than 1 and $\boldsymbol{\omega}(A)$ varies at least on a two-dimensional surface, so that $\boldsymbol{\omega}\cdot\boldsymbol{\nu} = 0$ becomes certainly possible for some $\boldsymbol{\nu} \ne 0$ while $f_{\boldsymbol{\nu}}(A)$ in general will not vanish, so that $\Phi_1$, hence $\Phi_\varepsilon$, does not exist.

This means that close to a function $f$ there is a function $f'$ which violates [62] for some $\boldsymbol{\nu}$. Of course, this depends on what is meant by "close": however, here essentially any topology introduced on the space of the functions $f$ will make the statement correct. For instance, the distance between two functions can be defined by $\sum_{\boldsymbol{\nu}} \sup_{A \in U} |f_{\boldsymbol{\nu}}(A) - g_{\boldsymbol{\nu}}(A)|$ or by $\sup_{A, \boldsymbol{\alpha}} |f(A, \boldsymbol{\alpha}) - g(A, \boldsymbol{\alpha})|$.

The idea behind the last statement of the theorem is in essence the same: consider, for simplicity, the anisochronous case in which the matrix $\partial^2_{AA} h(A)$ has maximal rank $\ell$, that is, the determinant $\det \partial^2_{AA} h(A)$ does not vanish. Anisochrony implies that $\boldsymbol{\omega}(A)\cdot\boldsymbol{\nu} \ne 0$ for all $\boldsymbol{\nu} \ne 0$ and $A$ on a dense set, and this property will be used repeatedly in the following analysis.

Let $B(\varepsilon, A, \boldsymbol{\alpha})$ be a "uniform" constant of motion, meaning that it is single valued and analytic in the non-simply-connected region $U \times \mathbb T^\ell$ and, for $\varepsilon$ small,

$$B(\varepsilon, A, \boldsymbol{\alpha}) = B_0(A, \boldsymbol{\alpha}) + \varepsilon B_1(A, \boldsymbol{\alpha}) + \varepsilon^2 B_2(A, \boldsymbol{\alpha}) + \cdots \qquad [63]$$

The condition that $B$ is a constant of motion can be written order by order in its expansion in $\varepsilon$: the first two orders are

$$\begin{aligned}
&\boldsymbol{\omega}(A)\cdot\partial_{\boldsymbol{\alpha}} B_0(A, \boldsymbol{\alpha}) = 0\\
&\partial_A f(A, \boldsymbol{\alpha})\cdot\partial_{\boldsymbol{\alpha}} B_0(A, \boldsymbol{\alpha}) - \partial_{\boldsymbol{\alpha}} f(A, \boldsymbol{\alpha})\cdot\partial_A B_0(A, \boldsymbol{\alpha}) + \boldsymbol{\omega}(A)\cdot\partial_{\boldsymbol{\alpha}} B_1(A, \boldsymbol{\alpha}) = 0
\end{aligned}\qquad [64]$$
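The obstruction expressed by [61] and [62] can be made concrete in Fourier space, where [61] reads $i\,(\boldsymbol{\omega}\cdot\boldsymbol{\nu})\,\Phi_{1,\boldsymbol{\nu}} + f_{\boldsymbol{\nu}} = 0$ for $\boldsymbol{\nu} \ne 0$. The sketch below is an illustration only: the perturbation $f$, the choice $h(A) = \tfrac12(A_1^2 + A_2^2)$ (so that $\boldsymbol{\omega}(A) = A$), and the function name are assumptions introduced for the example, not taken from the article.

```python
import math


def first_order_generating_function(f_modes, omega, tol=1e-12):
    """Solve omega . d_alpha Phi1 + f = <f> mode by mode (equation [61]).

    f_modes maps integer vectors nu (tuples) to Fourier coefficients f_nu.
    Returns the Fourier coefficients of Phi1; raises if a resonant mode
    (omega . nu = 0 with f_nu != 0) violates condition [62].
    """
    phi = {}
    for nu, f_nu in f_modes.items():
        if all(n == 0 for n in nu):
            continue  # the average of f is the right-hand side of [61]
        divisor = sum(w * n for w, n in zip(omega, nu))
        if abs(divisor) < tol:
            if abs(f_nu) > tol:
                raise ZeroDivisionError(f"resonant mode {nu}: condition [62] fails")
            continue
        # i (omega . nu) Phi_nu + f_nu = 0
        phi[nu] = -f_nu / (1j * divisor)
    return phi


# Assumed perturbation: f(alpha) = cos(alpha_1 - alpha_2) + cos(alpha_1).
f_modes = {(1, -1): 0.5, (-1, 1): 0.5, (1, 0): 0.5, (-1, 0): 0.5}

# Non-resonant action A = (1, sqrt(2)): Phi1 exists.
phi = first_order_generating_function(f_modes, omega=(1.0, math.sqrt(2)))
print(sorted(phi))

# Resonant action A = (1, 1): the mode nu = (1, -1) has omega . nu = 0.
try:
    first_order_generating_function(f_modes, omega=(1.0, 1.0))
except ZeroDivisionError as err:
    print(err)
```

Since $\boldsymbol{\omega}(A) = A$ sweeps a two-dimensional set, resonant actions such as $A = (1, 1)$ are unavoidable in any open region, which is the mechanism behind the word "generically" in the theorem.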
Then the above two relations and anisochrony imply (1) that $B_0$ must be a function of $A$ only and (2) that $\boldsymbol{\omega}(A)\cdot\boldsymbol{\nu}$ and $\partial_A B_0(A)\cdot\boldsymbol{\nu}$ vanish simultaneously for all $\boldsymbol{\nu}$. Hence, the gradient of $B_0$ must be proportional to $\boldsymbol{\omega}(A)$, that is, to the gradient of $h(A)$: $\partial_A B_0(A) = \lambda(A)\,\partial_A h(A)$. Therefore, generically (because of the anisochrony) it must be that $B_0$ depends on $A$ through $h(A)$: $B_0(A) = F(h(A))$ for some $F$.

Looking again, with the new information, at the second of [64] it follows that at fixed $A$ the $\boldsymbol{\alpha}$-derivative in the direction $\boldsymbol{\omega}(A)$ of $B_1$ equals $F'(h(A))$ times the $\boldsymbol{\alpha}$-derivative of $f$, that is, $B_1(A, \boldsymbol{\alpha}) = f(A, \boldsymbol{\alpha})F'(h(A)) + C_1(A)$.

Summarizing: the constant of motion $B$ has been written as $B(A, \boldsymbol{\alpha}) = F(h(A)) + \varepsilon F'(h(A)) f(A, \boldsymbol{\alpha}) + \varepsilon C_1(A) + \varepsilon^2 B_2 + \cdots$ which is equivalent to $B(A, \boldsymbol{\alpha}) = F(H_\varepsilon) + \varepsilon(B'_0 + \varepsilon B'_1 + \cdots)$ and therefore $B'_0 + \varepsilon B'_1 + \cdots$ is another analytic constant of motion. Repeating the argument, also $B'_0 + \varepsilon B'_1 + \cdots$ must have the form $F_1(H_\varepsilon) + \varepsilon(B''_0 + \varepsilon B''_1 + \cdots)$; conclusion

$$B = F(H_\varepsilon) + \varepsilon F_1(H_\varepsilon) + \varepsilon^2 F_2(H_\varepsilon) + \cdots + \varepsilon^n F_n(H_\varepsilon) + O(\varepsilon^{n+1}) \qquad [65]$$

By analyticity, $B = F_\varepsilon(H_\varepsilon(A, \boldsymbol{\alpha}))$ for some $F_\varepsilon$: hence generically all constants of motion are trivial.

Therefore, a system close to integrable cannot behave as it would naively be expected. The problem, however, was not manifest until POINCARÉ's proof of the above results: because in most applications the function $f$ has only finitely many Fourier components, or at least is replaced by an approximation with this property, so that at least [62] and even a few of the higher-order constraints like [64] become possible in open regions of action space. In fact, it may happen that the values of $A$ of interest are restricted so that $\boldsymbol{\omega}(A)\cdot\boldsymbol{\nu} = 0$ only for "large" values of $\boldsymbol{\nu}$ for which $f_{\boldsymbol{\nu}} = 0$. Nevertheless, the property that $f_{\boldsymbol{\nu}}(A) = (\boldsymbol{\omega}(A)\cdot\boldsymbol{\nu})\,\tilde f_{\boldsymbol{\nu}}(A)$ (or the analogous higher-order conditions, e.g., [64]), which we have seen to be necessary for analytic integrability of the perturbed system, can be checked to fail in important problems, if no approximation is made on $f$. Hence a conceptual problem arises.

For more details see Poincaré (1987).

Perturbing Functions

To check, in a given problem, the nonexistence of nontrivial constants of motion along the lines indicated in the previous section, it is necessary to express the potential, usually given in Cartesian coordinates as $\varepsilon V(x)$, in terms of the action–angle variables of the unperturbed, integrable, system.

In particular, the problem arises when trying to check nonexistence of nontrivial constants of motion when the anisochrony assumption (cf. the previous section) is not satisfied. Usually it becomes satisfied "to second order" (or higher): but to show this, more detailed information on the structure of the perturbing function expressed in action–angle variables is needed. For instance, this is often necessary even when the perturbation is approximated by a trigonometric polynomial, as is essentially always the case in celestial mechanics.

Finding explicit expressions for the action–angle variables is in itself a rather nontrivial task which leads to many problems of intrinsic interest even in seemingly simple cases. For instance, in the case of the planar gravitational central motion, the Kepler equation $\ell = \xi - \varepsilon\sin\xi$ (see the first of [41]) must be solved expressing $\xi$ in terms of $\ell$ (see the first of [42]). It is obvious that for small $\varepsilon$, the variable $\xi$ can be expressed as an analytic function of $\varepsilon$: nevertheless, the actual construction of this expression leads to several problems. For small $\varepsilon$, an interesting algorithm is the following.

Let $h(\ell) = \xi - \ell$, so that the equation to solve (i.e., the first of [41]) is

$$h(\ell) = \varepsilon\sin(\ell + h(\ell)) \equiv -\varepsilon\,\frac{\partial c}{\partial \ell}(\ell + h(\ell)) \qquad [66]$$

where $c(\ell) = \cos\ell$; the function $\ell \to h(\ell)$ should be periodic in $\ell$, with period $2\pi$, and analytic in $\varepsilon, \ell$ for $\varepsilon$ small and $\ell$ real. If $h(\ell) = \varepsilon h^{(1)} + \varepsilon^2 h^{(2)} + \cdots$, the Fourier transform of $h^{(k)}(\ell)$ satisfies the recursion relation

$$h^{(k)}_\nu = -\sum_{p=1}^{\infty}\frac{1}{p!}\sum_{\substack{k_1 + \cdots + k_p = k-1\\ \nu_0 + \nu_1 + \cdots + \nu_p = \nu}} (i\nu_0)\, c_{\nu_0}\, (i\nu_0)^p \prod_{j=1}^{p} h^{(k_j)}_{\nu_j},\qquad k > 1 \qquad [67]$$

with $c_\nu$ the Fourier transform of the cosine ($c_{\pm 1} = \tfrac12$, $c_\nu = 0$ if $\nu \ne \pm 1$), and (of course) $h^{(1)}_\nu = -i\nu c_\nu$.

Equation [67] is obtained by expanding the RHS of [66] in powers of $h$ and then taking the Fourier transform of both sides retaining only terms of order $k$ in $\varepsilon$.

Iterating the above relation, imagine drawing all trees $\vartheta$ with $k$ "branches," or "lines," distinguished by a label taking $k$ values, and $k$ nodes and attach to each node $v$ a harmonic label $\nu_v = \pm 1$ as in Figure 5. The trees will be assumed to start with a root line $vr$ linking a point $r$ and the "first node" $v$ (see Figure 5)
and then bifurcate arbitrarily (such trees are sometimes called "rooted trees").

Figure 5 An example of a tree graph and its labels. It contains only one simple node (3). Harmonics are indicated next to their nodes. Labels distinguishing lines are not marked.

Imagine the tree oriented from the endpoints towards the root $r$ (not to be considered a node) and given a node $v$ call $v'$ the node immediately following it. If $v$ is the first node before the root $r$, let $v' = r$ and $\nu_{v'} = 1$. For each such decorated tree define its numerical value

$$\mathrm{Val}(\vartheta) = \frac{-i}{k!}\prod_{\text{lines } l = v'v} (\nu_{v'}\nu_v) \prod_{\text{nodes}} c_{\nu_v} \qquad [68]$$

and define a current $\nu(l)$ on a line $l = v'v$ to be the sum of the harmonics of the nodes preceding $v'$: $\nu(l) = \sum_{w \le v} \nu_w$. Call $\nu(\vartheta)$ the current flowing in the root branch and call order of $\vartheta$ the number of nodes (or branches). Then

$$h^{(k)}_\nu = \sum_{\substack{\vartheta,\ \nu(\vartheta) = \nu\\ \mathrm{order}(\vartheta) = k}} \mathrm{Val}(\vartheta) \qquad [69]$$

provided trees are considered identical if they can be overlapped (labels included) after suitably scaling the lengths of their branches and pivoting them around the nodes out of which they emerge (the root is always imagined to be fixed at the origin).

If the trees are stripped of the harmonic labels, their number is finite and it can be estimated to be $\le k!\,4^k$ (because the labels which distinguish the lines can be attached to an unlabeled tree in many ways). The harmonic labels (i.e., $\nu_v = \pm 1$) can be laid down in $2^k$ ways, and the value of each tree can be bounded by $(1/k!)\,2^{-k}$ (because $c_{\pm 1} = \tfrac12$).

Hence $\sum_\nu |h^{(k)}_\nu| \le 4^k$, which gives a (rough) estimate of the radius of convergence of the expansion of $h$ in powers of $\varepsilon$: namely 0.25 (easily improvable to 0.3678 if $4^k k!$ is replaced by $k^{k-1}$ using Cayley's formula for the enumeration of rooted trees). A simple expression for $h^{(k)}(\ell)$ (LAGRANGE) is

$$h^{(k)}(\ell) = \frac{1}{k!}\,\partial_\ell^{\,k-1}\sin^k\ell$$

(also readable from the tree representation): the actual radius of convergence, first determined by Laplace, of the series for $h$ can also be determined from the latter expression for $h$ (ROUCHÉ) or directly from the tree representation: it is 0.6627.

One can find better estimates or at least more efficient methods for evaluating the sums in [69]: in fact, in performing the sum in [69] important cancellations occur. For instance, the harmonic labels can be subject to the further strong constraint that no line carries zero current because the sum of the values of the trees of fixed order and with at least one line carrying zero current vanishes.

The above expansion can also be simplified by partial resummations. For the purpose of an example, let the nodes with one entering and one exiting line (see Figure 5) be called "simple" nodes. Then all tree graphs which, on any line between two nonsimple nodes, contain any number of simple nodes can be eliminated. This is done by replacing, in evaluating the (remaining) tree values, the factors $\nu_{v'}\nu_v$ in [68] by $\nu_{v'}\nu_v/(1 - \varepsilon\cos\ell)$: then the value of $\vartheta$ (denoted $\mathrm{Val}(\vartheta)_\ell$) for a tree becomes a function of $\ell$ and $\varepsilon$ and [69] is replaced by

$$h(\ell) = \sum_{k=1}^{\infty}\ \sideset{}{^*}\sum_{\substack{\vartheta,\ \nu(\vartheta) = \nu\\ \mathrm{order}(\vartheta) = k}} \varepsilon^k e^{i\nu\ell}\, \mathrm{Val}(\vartheta)_\ell \qquad [70]$$

where the $*$ means that the trees are subject to the further restriction of not containing any simple node. It should be noted that the above graphical representation of the solution of the Kepler equation is strongly reminiscent of the representations of quantities in terms of graphs that occur often in quantum field theory. Here the trees correspond to Feynman graphs, the factors associated with the nodes are the couplings, the factors associated with the lines are the propagators, and the resummations are analogous to the self-energy resummations, while the cancellations mentioned above can be related to the class of identities called Ward identities. Not only can the analogy be shown not to be superficial, but it also turns out to be very helpful in key mechanical problems: see Appendix 1.

The existence of a vast number of identities relating the tree values is shown already by the simple form of the Lagrange series and by the even more remarkable resummation (LEVI-CIVITA) leading to

$$h(\ell) = \sum_{k=1}^{\infty}\frac{(\varepsilon\sin\ell)^k}{k!}\left(\frac{1}{1 - \varepsilon\cos\ell}\,\partial_\ell\right)^{k}\ell \qquad [71]$$
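As a rough numerical check of the two expansions, the sketch below compares a direct Newton solution of the Kepler equation $\ell = \xi - \varepsilon\sin\xi$ with (a) the Lagrange series truncated at third order, using $h^{(1)} = \sin\ell$, $h^{(2)} = \sin\ell\cos\ell$, $h^{(3)} = \sin\ell\cos^2\ell - \tfrac12\sin^3\ell$ obtained from $h^{(k)} = \frac{1}{k!}\partial_\ell^{k-1}\sin^k\ell$, and (b) the first two terms of [71], where $\left(\frac{1}{1-\varepsilon\cos\ell}\partial_\ell\right)^2\ell = -\varepsilon\sin\ell/(1-\varepsilon\cos\ell)^3$. The function names, the sample point and the truncation orders are choices made for this illustration.

```python
import math


def solve_kepler(ell, eps, tol=1e-14):
    """Solve xi - eps*sin(xi) = ell for xi by Newton's method."""
    xi = ell
    for _ in range(100):
        step = (xi - eps * math.sin(xi) - ell) / (1.0 - eps * math.cos(xi))
        xi -= step
        if abs(step) < tol:
            break
    return xi


def lagrange_h(ell, eps):
    """Lagrange series for h = xi - ell, truncated after the eps^3 term."""
    s, c = math.sin(ell), math.cos(ell)
    h1 = s
    h2 = s * c
    h3 = s * c * c - 0.5 * s ** 3
    return eps * h1 + eps ** 2 * h2 + eps ** 3 * h3


def levi_civita_h(ell, eps):
    """First two terms (k = 1, 2) of the resummed series [71]."""
    s, c = math.sin(ell), math.cos(ell)
    d = 1.0 - eps * c
    term1 = eps * s / d
    term2 = (eps * s) ** 2 / 2.0 * (-eps * s / d ** 3)
    return term1 + term2


ell, eps = 1.0, 0.1
exact = solve_kepler(ell, eps) - ell
print("Newton      :", exact)
print("Lagrange    :", lagrange_h(ell, eps))
print("Levi-Civita :", levi_civita_h(ell, eps))
```

Both truncations neglect terms of order $\varepsilon^4$, so at $\varepsilon = 0.1$ they should agree with the Newton value to roughly four decimal places.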
It is even possible to further collect the series terms to express it as a series with much better convergence properties; for instance, its terms can be reorganized and collected (resummed) so that $h$ is expressed as a power series in the parameter

$$\eta = \frac{\varepsilon\, e^{\sqrt{1 - \varepsilon^2}}}{1 + \sqrt{1 - \varepsilon^2}} \qquad [72]$$

with radius of convergence 1, which corresponds to $\varepsilon = 1$ (via a simple argument by Levi-Civita). The analyticity domain for the Lagrange series is $|\eta| < 1$. This also determines the value of Laplace radius, which is the point closest to the origin of the complex curve $|\eta(\varepsilon)| = 1$: it is imaginary so that it is the root of the equation

$$\frac{\varepsilon\, e^{\sqrt{1 + \varepsilon^2}}}{1 + \sqrt{1 + \varepsilon^2}} = 1$$

The analysis provides an example, in a simple case of great interest in applications, of the kind of computations actually necessary to represent the perturbing function in terms of action–angle variables. The property that the function $c(\ell)$ in [66] is the cosine has been used only to limit the range of the label $\nu$ to be $\pm 1$; hence the same method, with similar results, can be applied to study the inversion of the relation between the average anomaly $\ell$ and the true anomaly $\theta$ and to efficiently obtain, for instance, the properties of $f, g$ in [42].

For more details, the reader is referred to Levi-Civita (1956).

Lindstedt and Birkhoff Series: Divergences

Nonexistence of constants of motion, rather than being the end of the attempts to study motions close to integrable ones by perturbation methods, marks the beginning of renewed efforts to understand their nature.

Let $(A, \boldsymbol{\alpha}) \in U \times \mathbb T^\ell$ be action–angle variables defined in the integrability region for an analytic Hamiltonian and let $h(A)$ be its value in the action–angle coordinates. Suppose that $h(A)$ is anisochronous and let $f(A, \boldsymbol{\alpha})$ be an analytic perturbing function. Consider, for $\varepsilon$ small, the Hamiltonian $H_\varepsilon(A, \boldsymbol{\alpha}) = H_0(A) + \varepsilon f(A, \boldsymbol{\alpha})$, with $H_0 \equiv h$.

Let $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(A_0) \equiv \partial_A H_0(A_0)$ be the frequency spectrum (see the section "Quasiperiodicity and integrability") of one of the invariant tori of the unperturbed system corresponding to an action $A_0$. Short of integrability, the question to ask at this point is whether the perturbed system admits an analytic invariant torus on which the motion is quasiperiodic and

1. has the same spectrum $\boldsymbol{\omega}_0$,
2. depends analytically on $\varepsilon$ at least for $\varepsilon$ small,
3. reduces to the "unperturbed torus" $\{A_0\} \times \mathbb T^\ell$ as $\varepsilon \to 0$.

More concretely, the question is:

Are there functions $\mathbf H_\varepsilon(\boldsymbol{\psi}), \mathbf h_\varepsilon(\boldsymbol{\psi})$ analytic in $\boldsymbol{\psi} \in \mathbb T^\ell$ and in $\varepsilon$ near 0, vanishing as $\varepsilon \to 0$ and such that the torus with parametric equations

$$A = A_0 + \mathbf H_\varepsilon(\boldsymbol{\psi}),\qquad \boldsymbol{\alpha} = \boldsymbol{\psi} + \mathbf h_\varepsilon(\boldsymbol{\psi}),\qquad \boldsymbol{\psi} \in \mathbb T^\ell \qquad [73]$$

is invariant and, if $\boldsymbol{\omega}_0 \overset{\text{def}}{=} \boldsymbol{\omega}(A_0)$, the motion on it is simply $\boldsymbol{\psi} \to \boldsymbol{\psi} + \boldsymbol{\omega}_0 t$, i.e., it is quasiperiodic with spectrum $\boldsymbol{\omega}_0$?

In this context, Poincaré's theorem (in the section "Generic nonintegrability") had followed another key result, earlier developed in particular cases and completed by him, which provides a partial answer to the question.

Suppose that $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(A_0) \in \mathbb R^\ell$ satisfies a Diophantine property, namely suppose that there exist constants $C, \tau > 0$ such that

$$|\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}| \ge \frac{1}{C|\boldsymbol{\nu}|^{\tau}},\qquad \text{for all } 0 \ne \boldsymbol{\nu} \in \mathbb Z^\ell \qquad [74]$$

which, for each $\tau > \ell - 1$ fixed, is a property enjoyed by all $\boldsymbol{\omega} \in \mathbb R^\ell$ but for a set of zero measure. Then the motions on the unperturbed torus run over trajectories that fill the torus densely because of the "irrationality" of $\boldsymbol{\omega}_0$ implied by [74]. Writing Hamilton's equations,

$$\dot{\boldsymbol{\alpha}} = \partial_A H_0(A) + \varepsilon\,\partial_A f(A, \boldsymbol{\alpha}),\qquad \dot A = -\varepsilon\,\partial_{\boldsymbol{\alpha}} f(A, \boldsymbol{\alpha})$$

with $A, \boldsymbol{\alpha}$ given by [73] with $\boldsymbol{\psi}$ replaced by $\boldsymbol{\psi} + \boldsymbol{\omega}_0 t$, and using the density of the unperturbed trajectories implied by [74], the conditions that [73] are equations for an invariant torus on which the motion is $\boldsymbol{\psi} \to \boldsymbol{\psi} + \boldsymbol{\omega}_0 t$ are

$$\begin{aligned}
\boldsymbol{\omega}_0 + (\boldsymbol{\omega}_0\cdot\partial_{\boldsymbol{\psi}})\,\mathbf h_\varepsilon(\boldsymbol{\psi}) &= \partial_A H_0(A_0 + \mathbf H_\varepsilon(\boldsymbol{\psi})) + \varepsilon\,\partial_A f(A_0 + \mathbf H_\varepsilon(\boldsymbol{\psi}), \boldsymbol{\psi} + \mathbf h_\varepsilon(\boldsymbol{\psi}))\\
(\boldsymbol{\omega}_0\cdot\partial_{\boldsymbol{\psi}})\,\mathbf H_\varepsilon(\boldsymbol{\psi}) &= -\varepsilon\,\partial_{\boldsymbol{\alpha}} f(A_0 + \mathbf H_\varepsilon(\boldsymbol{\psi}), \boldsymbol{\psi} + \mathbf h_\varepsilon(\boldsymbol{\psi}))
\end{aligned}\qquad [75]$$

The theorem referred to above (POINCARÉ) is that

Theorem 2 If the unperturbed system is anisochronous and $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(A_0)$ satisfies [74] for some $C, \tau > 0$ there exist two well defined power series $\mathbf h_\varepsilon(\boldsymbol{\psi}) = \sum_{k=1}^{\infty} \varepsilon^k \mathbf h^{(k)}(\boldsymbol{\psi})$ and $\mathbf H_\varepsilon(\boldsymbol{\psi}) = \sum_{k=1}^{\infty} \varepsilon^k \mathbf H^{(k)}(\boldsymbol{\psi})$ which
solve [75] to all orders in ". The series for H " is u" ðA0 Þ ¼ "A2
uniquely determined, and such is also the series for
F" ðA0 ; aÞ ¼
h" up to the addition of an arbitrary constant at each
order, so that it is unique if h" is required, as X
1 X ði 2 Þk
"k fn eian ½77
henceforth done with no loss of generality, to have ðið!01 1 þ !02 2 ÞÞkþ1
k¼1 06¼n2Z2
zero average over y .
The algorithm for the construction is illustrated in The series does not converge: in fact, its convergence
a simple case in the next section (see eqns [83], would imply integrability and, consequently,
[84]). Convergence of the above series, called Lindstedt series, even for small $\varepsilon$ has been a problem for rather a long time. Poincaré proved the existence of the formal solution; but his other result, discussed in the section “Generic nonintegrability,” casts doubts on convergence although it does not exclude it, as was immediately stressed by several authors (including Poincaré himself). The result in that section shows the impossibility of solving [75] for all $\boldsymbol{\omega}_0$'s near a given spectrum, analytically and uniformly, but it does not exclude the possibility of solving it for a single $\boldsymbol{\omega}_0$.

The theorem admits several extensions or analogs: an interesting one is to the case of isochronous unperturbed systems:

Given the Hamiltonian $H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = \boldsymbol{\omega}_0\cdot\mathbf{A} + \varepsilon f(\mathbf{A},\boldsymbol{\alpha})$, with $\boldsymbol{\omega}_0$ satisfying [74] and $f$ analytic, there exist power series $\mathcal{C}_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}')$, $u_\varepsilon(\mathbf{A}')$ such that $H_\varepsilon(\mathcal{C}_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}')) = \boldsymbol{\omega}_0\cdot\mathbf{A}' + u_\varepsilon(\mathbf{A}')$ holds as an equality between formal power series (i.e., order by order in $\varepsilon$) and at the same time the $\mathcal{C}_\varepsilon$, regarded as a map, satisfies order by order the condition (i.e., (4.3)) that it is a canonical map.

This means that there is a generating function $\mathbf{A}'\cdot\boldsymbol{\alpha} + F_\varepsilon(\mathbf{A}',\boldsymbol{\alpha})$ also defined by a formal power series $F_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}) = \sum_{k=1}^{\infty}\varepsilon^k F^{(k)}(\mathbf{A}',\boldsymbol{\alpha})$, that is, such that if $\mathcal{C}_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}') = (\mathbf{A},\boldsymbol{\alpha})$ then it is true, order by order in powers of $\varepsilon$, that $\mathbf{A} = \mathbf{A}' + \partial_{\boldsymbol{\alpha}}F_\varepsilon(\mathbf{A}',\boldsymbol{\alpha})$ and $\boldsymbol{\alpha}' = \boldsymbol{\alpha} + \partial_{\mathbf{A}'}F_\varepsilon(\mathbf{A}',\boldsymbol{\alpha})$. The series for $F_\varepsilon$, $u_\varepsilon$ are called Birkhoff series.

In this isochronous case, if Birkhoff series were convergent for small $\varepsilon$ and $(\mathbf{A}',\boldsymbol{\alpha})$ in a region of the form $U\times\mathbb{T}^\ell$, with $U\subset\mathbb{R}^\ell$ open and bounded, it would follow that, for small $\varepsilon$, $H_\varepsilon$ would be integrable in a large region of phase space (i.e., where the generating function can be used to build a canonical map: this would essentially be $U\times\mathbb{T}^\ell$ deprived of a small layer of points near the boundary of $U$). However, convergence for small $\varepsilon$ is false (in general), as shown by the simple two-dimensional example

$$H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = \boldsymbol{\omega}_0\cdot\mathbf{A} + \varepsilon\,(A_2 + f(\boldsymbol{\alpha})),\qquad (\mathbf{A},\boldsymbol{\alpha})\in\mathbb{R}^2\times\mathbb{T}^2 \qquad [76]$$

with $f(\boldsymbol{\alpha})$ an arbitrary analytic function with all Fourier coefficients $f_{\boldsymbol{\nu}}$ positive for $\boldsymbol{\nu}\ne\mathbf{0}$ and $f_{\mathbf{0}} = 0$. In the latter case, the solution is

bounded trajectories in phase space: however, the equations of motion for [76] can be easily solved explicitly and in any open region near given initial data there are other data which have unbounded trajectories if $\omega_{01}/(\omega_{02}+\varepsilon)$ is rational.

Nevertheless, even in this elementary case a formal sum of the series yields

$$u(\mathbf{A}') = \varepsilon A'_2,\qquad F_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}) = \varepsilon\sum_{\mathbf{0}\ne\boldsymbol{\nu}\in\mathbb{Z}^2}\frac{f_{\boldsymbol{\nu}}\,e^{i\boldsymbol{\alpha}\cdot\boldsymbol{\nu}}}{i\,(\omega_{01}\nu_1 + (\omega_{02}+\varepsilon)\nu_2)} \qquad [78]$$

and the series in [78] (no longer a power series in $\varepsilon$) is really convergent if $\boldsymbol{\omega} = (\omega_{01}, \omega_{02}+\varepsilon)$ is a Diophantine vector (by [74], because analyticity implies exponential decay of $|f_{\boldsymbol{\nu}}|$). Remarkably, for such values of $\varepsilon$ the Hamiltonian $H_\varepsilon$ is integrable and it is integrated by the canonical map generated by [78], in spite of the fact that [78] is obtained, from [77], via the nonrigorous sum rule

$$\sum_{k=0}^{\infty} z^k = \frac{1}{1-z}\quad\text{for } z\ne 1 \qquad [79]$$

(applied to cases with $|z|\ge 1$, which are certainly realized for a dense set of $\varepsilon$'s even if $\boldsymbol{\omega}$ is Diophantine because the $z$'s have values $z = -\varepsilon\nu_2/(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu})$). In other words, the integration of the equations is elementary and once performed it becomes apparent that, if $\boldsymbol{\omega}$ is Diophantine, the solutions can be rigorously found from [78]. Note that, for instance, this means that relations like $\sum_{k=0}^{\infty} 2^k = -1$ are really used to obtain [78] from [77].

Another extension of Lindstedt series arises in a perturbation of an anisochronous system when asking the question as to what happens to the unperturbed invariant tori $\mathcal{T}_{\boldsymbol{\omega}_0}$ on which the spectrum is resonant, that is, $\boldsymbol{\omega}_0\cdot\boldsymbol{\nu} = 0$ for some $\boldsymbol{\nu}\ne\mathbf{0}$, $\boldsymbol{\nu}\in\mathbb{Z}^\ell$. The result is that even in such a case there is a formal power series solution showing that at least a few of the (infinitely many) invariant tori into which $\mathcal{T}_{\boldsymbol{\omega}_0}$ is in turn foliated in the unperturbed case can be formally continued at $\varepsilon\ne 0$ (see the section “Resonances and their stability”).

For more details, we refer the reader to Poincaré (1987).
Quasiperiodicity and KAM Stability

To discuss more advanced results, it is convenient to restrict attention to a special (nontrivial) paradigmatic case

$$H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = \tfrac{1}{2}\mathbf{A}^2 + \varepsilon f(\boldsymbol{\alpha}) \qquad [80]$$

In this simple case (called Thirring model: representing $\ell$ particles on a circle interacting via a potential $\varepsilon f(\boldsymbol{\alpha})$) the equations for the maximal tori [75] reduce to equations for the only functions $\mathbf{h}_\varepsilon$:

$$(\boldsymbol{\omega}\cdot\partial_{\boldsymbol{\psi}})^2\,\mathbf{h}_\varepsilon(\boldsymbol{\psi}) = -\varepsilon\,\partial_{\boldsymbol{\alpha}} f(\boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi})),\qquad \boldsymbol{\psi}\in\mathbb{T}^\ell \qquad [81]$$

as the second of [75] simply becomes the definition of $\mathbf{H}_\varepsilon$ because the RHS does not involve $\mathbf{H}_\varepsilon$.

The real problem is therefore whether the formal series considered in the last section converge at least for small $\varepsilon$: and the example [76] on the Birkhoff series shows that sometimes sum rules might be needed in order to give a meaning to the series. In fact, whenever a problem (of physical interest) admits a formal power series solution which is not convergent, or which is such that it is not known whether it is convergent, then one should look for sum rules for it.

The modern theory of perturbations starts with the proof of the convergence for $\varepsilon$ small enough of the Lindstedt series (KOLMOGOROV). The general “KAM” result is:

Theorem 3 (KAM) Consider the Hamiltonian $H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = h(\mathbf{A}) + \varepsilon f(\mathbf{A},\boldsymbol{\alpha})$, defined in $U = V\times\mathbb{T}^\ell$ with $V\subset\mathbb{R}^\ell$ open and bounded and with $f(\mathbf{A},\boldsymbol{\alpha})$, $h(\mathbf{A})$ analytic in the closure $\overline{V}\times\mathbb{T}^\ell$ where $h(\mathbf{A})$ is also anisochronous; let $\boldsymbol{\omega}_0 \stackrel{\mathrm{def}}{=} \boldsymbol{\omega}(\mathbf{A}_0) = \partial_{\mathbf{A}}h(\mathbf{A}_0)$ and assume that $\boldsymbol{\omega}_0$ satisfies [74]. Then

(i) there is $\varepsilon_{C,\tau} > 0$ such that the Lindstedt series converges for $|\varepsilon| < \varepsilon_{C,\tau}$;
(ii) its sum yields two functions $\mathbf{H}_\varepsilon(\boldsymbol{\psi})$, $\mathbf{h}_\varepsilon(\boldsymbol{\psi})$ on $\mathbb{T}^\ell$ which parametrize an invariant torus $\mathcal{T}_{C,\tau}(\mathbf{A}_0,\varepsilon)$;
(iii) on $\mathcal{T}_{C,\tau}(\mathbf{A}_0,\varepsilon)$ the motion is $\boldsymbol{\psi}\to\boldsymbol{\psi} + \boldsymbol{\omega}_0 t$, see [73]; and
(iv) the set of data in $U$ which belong to invariant tori $\mathcal{T}_{C,\tau}(\mathbf{A}_0,\varepsilon)$ with $\boldsymbol{\omega}(\mathbf{A}_0)$ satisfying [74] with prefixed $C,\tau$ has complement with volume $<\mathrm{const}\,C^{-a}$ for a suitable $a > 0$ and with area also $<\mathrm{const}\,C^{-a}$ on each nontrivial surface of constant energy $H_\varepsilon = E$.

In other words, for small $\varepsilon$ the spectra of most unperturbed quasiperiodic motions can still be found as spectra of perturbed quasiperiodic motions developing on tori which are close to the corresponding unperturbed ones (i.e., with the same spectrum).

This is a stability result: for instance, in systems with two degrees of freedom the invariant tori of dimension two which lie on a given three-dimensional energy surface, will separate the points on the energy surface into the set which is “inside” the torus and the set which is “outside.” Hence, an initial datum starting (say) inside cannot reach the outside. Likewise, a point starting between two tori has to stay in between forever. Further, if the two tori are close, this means that motion will stay very localized in action space, with a trajectory accessing only points close to the tori and coming close to all such points, within a distance of the order of the distance between the confining tori. The case of three or more degrees of freedom is quite different (see sections “Diffusion in phase space” and “The three-body problem”).

In the simple case of the rotators system [80] the equations for the parametric representation of the tori are given by [81]. The latter bear some analogy with the easier problem in [66]: but [81] are $\ell$ equations instead of one and they are differential equations rather than ordinary equations. Furthermore, the function $f(\boldsymbol{\alpha})$ which plays here the role of $c(\cdot)$ in [66] has Fourier coefficient $f_{\boldsymbol{\nu}}$ with no restrictions on $\boldsymbol{\nu}$, while the Fourier coefficients $c_\nu$ for $c$ in [66] do not vanish only for $\nu = \pm 1$.

The above differences are, to some extent, “minor” and the power series solution to [81] can be constructed by the same algorithm as used in the case of [66]: namely one forms trees as in Figure 5 with the harmonic labels $\nu_v\in\mathbb{Z}$ replaced by $\boldsymbol{\nu}_v\in\mathbb{Z}^\ell$ (still to be thought of as possible harmonic indices in the Fourier expansion of the perturbing function $f$). All other labels affixed to the trees in the section “Generic nonintegrability” will be the same. In particular, the current flowing on a branch $l = v'v$ will be defined as the sum of the harmonics of the nodes $w\le v$ preceding $v$:

$$\boldsymbol{\nu}(l) \stackrel{\mathrm{def}}{=} \sum_{w\le v}\boldsymbol{\nu}_w \qquad [82]$$

and we call $\boldsymbol{\nu}(\vartheta)$ the current flowing in the root branch.

Here the value $\mathrm{Val}(\vartheta)$ of a tree has to be defined differently because the equation to be solved ([81]) contains the differential operator $(\boldsymbol{\omega}_0\cdot\partial_{\boldsymbol{\psi}})^2$ which, when Fourier transformed, becomes multiplication of the Fourier component with harmonic $\boldsymbol{\nu}$ by $(i\boldsymbol{\omega}\cdot\boldsymbol{\nu})^2$.

The variation due to the presence of the operator $(\boldsymbol{\omega}_0\cdot\partial_{\boldsymbol{\psi}})^2$ and the necessity of its inversion in the evaluation of $\mathbf{u}\cdot\mathbf{h}^{(k)}_{\boldsymbol{\nu}}$, that is, of the component of $\mathbf{h}^{(k)}_{\boldsymbol{\nu}}$ along an arbitrary unit vector $\mathbf{u}$, is nevertheless quite simple: the value of a tree graph $\vartheta$ of order $k$
(i.e., with $k$ nodes and $k$ branches) has to be defined by (cf. [68])

$$\mathrm{Val}(\vartheta) \stackrel{\mathrm{def}}{=} \frac{i(-1)^k}{k!}\left(\prod_{\text{lines } l=v'v}\frac{\boldsymbol{\nu}_{v'}\cdot\boldsymbol{\nu}_v}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l))^2}\right)\left(\prod_{\text{nodes } v} f_{\boldsymbol{\nu}_v}\right) \qquad [83]$$

where the $\boldsymbol{\nu}_{v'}$ appearing in the factor relative to the root line $rv$ from the first node $v$ to the root $r$ (see Figure 5) is interpreted as a unit vector $\mathbf{u}$ (it was interpreted as 1 in the one-dimensional case [66]). Equation [83] makes sense only for trees in which no line carries zero current. Then the component along $\mathbf{u}$ (the harmonic label attached to the root of a tree) of $\mathbf{h}^{(k)}$ is given (see also [69]) by

$$\mathbf{u}\cdot\mathbf{h}^{(k)}_{\boldsymbol{\nu}} = \sum^{*}_{\substack{\vartheta;\ \boldsymbol{\nu}(\vartheta)=\boldsymbol{\nu}\\ \mathrm{order}(\vartheta)=k}}\mathrm{Val}(\vartheta) \qquad [84]$$

where the $*$ means that the sum is only over trees in which a nonzero current $\boldsymbol{\nu}(l)$ flows on the lines $l\in\vartheta$. The quantity $\mathbf{u}\cdot\mathbf{h}^{(k)}_{\mathbf{0}}$ will be defined to be 0 (see the previous section).

In the case of [66] zero-current lines could appear: but the contributions from tree graphs containing at least one zero current line would cancel. In the present case, the statement that the above algorithm actually gives $\mathbf{h}^{(k)}_{\boldsymbol{\nu}}$ by simply ignoring trees with lines with zero current is nontrivial. It was Poincaré's contribution to the theory of Lindstedt series to show that even in the general case (cf. [75]) the equations for the invariant tori can be solved by a formal power series. Equation [84] is proved by induction on $k$ after checking it for the first few orders.

The algorithm just described leading to [83] can be extended to the case of the general Hamiltonian considered in the KAM theorem.

The convergence proof is more delicate than the (elementary) one for eqn [66]. In fact, the values of trees of order $k$ can give large contributions to $\mathbf{h}^{(k)}_{\boldsymbol{\nu}}$: because the “new” factors $(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l))^2$, although not zero, can be quite small and their small size can overwhelm the smallness of the factors $f_{\boldsymbol{\nu}}$ and $\varepsilon$. In fact, even if $f$ is a trigonometric polynomial (so that $f_{\boldsymbol{\nu}}$ vanishes identically for $|\boldsymbol{\nu}|$ large enough) the currents flowing in the branches can be very large, of the order of the number $k$ of nodes in the tree; see [82].

This is called the small-divisors problem. The key to its solution goes back to a related work (SIEGEL) which shows that

Theorem 4 Consider the contribution to the sum in [82] from graphs $\vartheta$ in which no pairs of lines which lie on the same path to the root carry the same current and, furthermore, the node harmonics are bounded by $|\boldsymbol{\nu}|\le N$ for some $N$. Then the number of lines $\ell$ in $\vartheta$ with divisor $\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}_\ell$ satisfying $2^{-n} < C|\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}_\ell| \le 2^{-n+1}$ does not exceed $4Nk2^{-n/\tau}$.

Hence, setting

$$F \stackrel{\mathrm{def}}{=} C^2\max_{|\boldsymbol{\nu}|\le N}|f_{\boldsymbol{\nu}}|$$

the corresponding $\mathrm{Val}(\vartheta)$ can be bounded by

$$\frac{1}{k!}F^kN^{2k}\prod_{n=0}^{\infty}2^{2n(4Nk2^{-n/\tau})} \stackrel{\mathrm{def}}{=} \frac{1}{k!}B^k,\qquad B = FN^2\,2^{8N\sum_n n2^{-n/\tau}} \qquad [85]$$

since the product is convergent. In the case in which $f$ is a trigonometric polynomial of degree $N$, the above restricted contributions to $\mathbf{u}\cdot\mathbf{h}^{(k)}_{\boldsymbol{\nu}}$ would generate a convergent series for $\varepsilon$ small enough. In fact, the number of trees is bounded (as in the section “Perturbing functions”) by $k!\,4^k(2N+1)^{\ell k}$ so that the series $\sum_{\boldsymbol{\nu}}|\varepsilon|^k|\mathbf{u}\cdot\mathbf{h}^{(k)}_{\boldsymbol{\nu}}|$ would converge for small $\varepsilon$ (i.e., $|\varepsilon| < (B\cdot 4(2N+1)^\ell)^{-1}$).

Given this comment, the analysis of the “remaining contributions” becomes the real problem, and it requires new ideas because among the excluded trees there are some simple $k$th order trees whose value alone, if considered separately from the other contributions, would generate a factorially divergent power series in $\varepsilon$.

However, the contributions of all large-valued trees of order $k$ can be shown to cancel: although not exactly (unlike the case of the elementary problem in the section “Perturbing functions,” where the cancellation is not necessary for the proof, in spite of its exact occurrence), but enough so that in spite of the existence of exceedingly large values of individual tree graphs their total sum can still be bounded by a constant to the power $k$ so that the power series actually converges for $\varepsilon$ small enough. The idea is discussed in Appendix 1.

For more details, the reader is referred to Poincaré (1987), Kolmogorov (1954), Moser (1962), and Arnol'd (1989).

Resonances and their Stability

A quasiperiodic motion with $r$ rationally independent frequencies is called resonant if $r$ is strictly less than the number of degrees of freedom, $\ell$. The difference $s = \ell - r$ is the degree of the resonance.

Of particular interest are the cases of a perturbation of an integrable system in which resonant motions take place.
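The degree of a resonance, $s = \ell - r$, can be computed mechanically once the frequencies are written as rational combinations of a fixed set of rationally independent numbers. The following Python sketch rests on that assumption: each frequency is represented by its row of rational coefficients with respect to an assumed independent basis such as $(1, \sqrt{2})$, so that $r$ is the rank of the coefficient matrix over the rationals. The function names `rational_rank` and `resonance_degree` are hypothetical and serve only this illustration.

```python
from fractions import Fraction


def rational_rank(rows):
    """Rank over the rationals of a matrix given as a list of rows (Gaussian elimination)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row, not yet used as a pivot, with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def resonance_degree(coefficients):
    """Return s = l - r, with l the number of frequencies and r the number of independent ones."""
    ell = len(coefficients)
    r = rational_rank(coefficients)
    return ell - r


# Frequencies (1, sqrt(2), 1 + sqrt(2)) in the assumed basis (1, sqrt(2)): l = 3, r = 2.
print(resonance_degree([[1, 0], [0, 1], [1, 1]]))

# Frequencies (sqrt(2), 2*sqrt(2), 3*sqrt(2)): a periodic motion, l = 3, r = 1.
print(resonance_degree([[0, 1], [0, 2], [0, 3]]))
```

A motion is nonresonant exactly when the function returns 0, and the periodic motions are those with $r = 1$.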
A typical example is the $n$-body problem which studies the mutual perturbations of the motions of $n-1$ particles gravitating around a more massive particle. If the particle masses can be considered to be negligible, the system will consist of $n-1$ central Keplerian motions: it will therefore have $\ell = 3(n-1)$ degrees of freedom. In general, only one frequency per body occurs in the absence of the perturbations (the period of the Keplerian orbit). Hence, $r\le n-1$ and $s\ge 2(n-1)$ (or in the planar case $s\ge (n-1)$) with equality holding when the periods are rationally independent.

Another example is the rigid body with a fixed point perturbed by a conservative force: in this case, the unperturbed system has three degrees of freedom but, in general, only two frequencies (see the discussion following [52]).

Furthermore, in the above examples there is the possibility that the independent frequencies assume, for special initial data, values which are rationally related, giving rise to resonances of even higher order (i.e., with smaller values of $r$).

In an integrable anisochronous system, resonant motions will be dense in phase space because the frequencies $\boldsymbol{\omega}(\mathbf{A})$ will vary as much as the actions and therefore resonances of any order (i.e., any $r<\ell$) will be dense in phase space: in particular, the periodic motions (i.e., the highest-order resonances) will be dense.

Resonances, in integrable systems, can arise in a priori stable integrable systems and in a priori unstable systems: the former are systems whose Hamiltonian admits canonical action–angle coordinates $(\mathbf{A},\boldsymbol{\alpha})\in U\times\mathbb{T}^\ell$ with $U\subset\mathbb{R}^\ell$ open, while the latter are systems whose Hamiltonian has, in suitable local canonical coordinates, the form

$$H_0(\mathbf{A}) + \sum_{i=1}^{s_1}\frac{1}{2}\,(p_i^2 - \lambda_i^2q_i^2) + \sum_{j=1}^{s_2}\frac{1}{2}\,(\pi_j^2 + \mu_j^2\kappa_j^2),\qquad \lambda_i,\mu_j > 0 \qquad [86]$$

where $(\mathbf{A},\boldsymbol{\alpha})\in U\times\mathbb{T}^r$, $U\subset\mathbb{R}^r$, $(\mathbf{p},\mathbf{q})\in V\subset\mathbb{R}^{2s_1}$, $(\boldsymbol{\pi},\boldsymbol{\kappa})\in V'\subset\mathbb{R}^{2s_2}$ with $V, V'$ neighborhoods of the origin and $\ell = r + s_1 + s_2$, $s_i\ge 0$, $s_1+s_2>0$ and $\sqrt{\lambda_j}$, $\sqrt{\mu_j}$ are called Lyapunov coefficients of the resonance. The perturbations considered are supposed to have the form $\varepsilon f(\mathbf{A},\boldsymbol{\alpha},\mathbf{p},\mathbf{q},\boldsymbol{\pi},\boldsymbol{\kappa})$. The denomination of a priori stable or unstable refers to the properties of the “a priori given unperturbed Hamiltonian.” The label “a priori unstable” is certainly appropriate if $s_1>0$: here also $s_1=0$ is allowed for notational convenience implying that the Lyapunov coefficients in a priori unstable cases are all of order 1 (whether real $\sqrt{\lambda_j}$ or imaginary $i\sqrt{\mu_j}$). In other words, the a priori stable case, $s_1=s_2=0$ in [86], is the only excluded case. Of course, the stability properties of the motions when a perturbation acts will depend on the perturbation in both cases.

The a priori stable systems usually have a great variety of resonances (e.g., in the anisochronous case, resonances of any dimension are dense). The a priori unstable systems have (among possible other resonances) some very special $r$-dimensional resonances occurring when the unstable coordinates $(\mathbf{p},\mathbf{q})$ and $(\boldsymbol{\pi},\boldsymbol{\kappa})$ are zero and the frequencies of the $r$ action–angle coordinates are rationally independent.

In the first case (a priori stable), the general question is whether the resonant motions, which form invariant tori of dimension $r$ arranged into families that fill $\ell$-dimensional invariant tori, continue to exist, in presence of small enough perturbations $\varepsilon f(\mathbf{A},\boldsymbol{\alpha})$, on slightly deformed invariant tori. Similar questions can be asked in the a priori unstable cases. To examine the matter more closely consider the formulation of the simplest problems.

A priori stable resonances: more precisely, suppose $H_0 = \tfrac{1}{2}\mathbf{A}^2$ and let $\{\mathbf{A}_0\}\times\mathbb{T}^\ell$ be the unperturbed invariant torus $\mathcal{T}_{\mathbf{A}_0}$ with spectrum $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(\mathbf{A}_0) = \partial_{\mathbf{A}}H_0(\mathbf{A}_0)$ with only $r$ rationally independent components. For simplicity, suppose that $\boldsymbol{\omega}_0 = (\omega_1,\ldots,\omega_r,0,\ldots,0) \stackrel{\mathrm{def}}{=} (\boldsymbol{\omega},\mathbf{0})$ with $\boldsymbol{\omega}\in\mathbb{R}^r$. The more general case in which $\boldsymbol{\omega}_0$ has only $r$ rationally independent components can be reduced to the special case above by a canonical linear change of coordinates at the price of changing the $H_0$ to a new one, still quadratic in the actions but containing mixed products $A_iB_j$: the proofs of the results that are discussed here would not be really affected by such more general form of $H$.

It is convenient to distinguish between the “fast” angles $\alpha_1,\ldots,\alpha_r$ and the “resonant” angles $\alpha_{r+1},\ldots,\alpha_\ell$ (also called “slow” or “secular”) and call $\boldsymbol{\alpha} = (\boldsymbol{\alpha}',\boldsymbol{\beta})$ with $\boldsymbol{\alpha}'\in\mathbb{T}^r$ and $\boldsymbol{\beta}\in\mathbb{T}^s$. Likewise, we distinguish the fast actions $\mathbf{A}' = (A_1,\ldots,A_r)$ and the resonant ones $A_{r+1},\ldots,A_\ell$ and set $\mathbf{A} = (\mathbf{A}',\mathbf{B})$ with $\mathbf{A}'\in\mathbb{R}^r$ and $\mathbf{B}\in\mathbb{R}^s$.

Therefore, the torus $\mathcal{T}_{\mathbf{A}_0}$, $\mathbf{A}_0 = (\mathbf{A}'_0,\mathbf{B}_0)$, is in turn a continuum of invariant tori $\mathcal{T}_{\mathbf{A}_0,\boldsymbol{\beta}}$ with trivial parametric equations: $\boldsymbol{\beta}$ fixed, $\boldsymbol{\alpha}' = \boldsymbol{\psi}$, $\boldsymbol{\psi}\in\mathbb{T}^r$, and $\mathbf{A}' = \mathbf{A}'_0$, $\mathbf{B} = \mathbf{B}_0$. On each of them the motion is: $\mathbf{A}', \mathbf{B}, \boldsymbol{\beta}$ constant and $\boldsymbol{\alpha}'\to\boldsymbol{\alpha}' + \boldsymbol{\omega}t$, with rationally independent $\boldsymbol{\omega}\in\mathbb{R}^r$.

Then the natural question is whether there exist functions $\mathbf{h}_\varepsilon, \mathbf{k}_\varepsilon, \mathbf{H}_\varepsilon, \mathbf{K}_\varepsilon$ smooth in $\varepsilon$ near $\varepsilon = 0$ and in $\boldsymbol{\psi}\in\mathbb{T}^r$, vanishing for $\varepsilon = 0$, and such that the torus $\mathcal{T}_{\mathbf{A}_0,\boldsymbol{\beta}_0,\varepsilon}$ with parametric equations

$$\begin{aligned}\mathbf{A}' &= \mathbf{A}'_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), &\boldsymbol{\alpha}' &= \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi}),\\ \mathbf{B} &= \mathbf{B}_0 + \mathbf{K}_\varepsilon(\boldsymbol{\psi}), &\boldsymbol{\beta} &= \boldsymbol{\beta}_0 + \mathbf{k}_\varepsilon(\boldsymbol{\psi}),\end{aligned}\qquad \boldsymbol{\psi}\in\mathbb{T}^r \qquad [87]$$
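is invariant for the motions with Hamiltonian

$$H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = \tfrac{1}{2}\mathbf{A}'^2 + \tfrac{1}{2}\mathbf{B}^2 + \varepsilon f(\boldsymbol{\alpha}',\boldsymbol{\beta})$$

and the motions on it are $\boldsymbol{\psi}\to\boldsymbol{\psi} + \boldsymbol{\omega}t$. The above property, when satisfied, is summarized by saying that the unperturbed resonant motions $\mathbf{A} = (\mathbf{A}'_0,\mathbf{B}_0)$, $\boldsymbol{\alpha} = (\boldsymbol{\alpha}'_0 + \boldsymbol{\omega}t, \boldsymbol{\beta}_0)$ can be continued in presence of perturbation $\varepsilon f$, for small $\varepsilon$, to quasiperiodic motions with the same spectrum and on a slightly deformed torus $\mathcal{T}_{\mathbf{A}'_0,\boldsymbol{\beta}_0,\varepsilon}$.

A priori unstable resonances: here the question is whether the special invariant tori continue to exist in presence of small enough perturbations, of course slightly deformed. This means asking whether, given $\mathbf{A}_0$ such that $\boldsymbol{\omega}(\mathbf{A}_0) = \partial_{\mathbf{A}}H_0(\mathbf{A}_0)$ has rationally independent components, there are functions $(\mathbf{H}_\varepsilon(\boldsymbol{\psi}), \mathbf{h}_\varepsilon(\boldsymbol{\psi}))$, $(\mathbf{P}_\varepsilon(\boldsymbol{\psi}), \mathbf{Q}_\varepsilon(\boldsymbol{\psi}))$ and $(\boldsymbol{\Pi}_\varepsilon(\boldsymbol{\psi}), \mathbf{K}_\varepsilon(\boldsymbol{\psi}))$ smooth in $\varepsilon$ near $\varepsilon = 0$, vanishing for $\varepsilon = 0$, analytic in $\boldsymbol{\psi}\in\mathbb{T}^r$ and such that the $r$-dimensional surface

$$\begin{aligned}\mathbf{A} &= \mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), &\boldsymbol{\alpha} &= \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi})\\ \mathbf{p} &= \mathbf{P}_\varepsilon(\boldsymbol{\psi}), &\mathbf{q} &= \mathbf{Q}_\varepsilon(\boldsymbol{\psi})\\ \boldsymbol{\pi} &= \boldsymbol{\Pi}_\varepsilon(\boldsymbol{\psi}), &\boldsymbol{\kappa} &= \mathbf{K}_\varepsilon(\boldsymbol{\psi})\end{aligned}\qquad \boldsymbol{\psi}\in\mathbb{T}^r \qquad [88]$$

is an invariant torus $\mathcal{T}_{\mathbf{A}_0,\varepsilon}$ on which the motion is $\boldsymbol{\psi}\to\boldsymbol{\psi} + \boldsymbol{\omega}(\mathbf{A}_0)t$. Again, the above property is summarized by saying that the unperturbed special resonant motions can be continued in presence of perturbation $\varepsilon f$ for small $\varepsilon$ to quasiperiodic motions with the same spectrum and on a slightly deformed torus $\mathcal{T}_{\mathbf{A}_0,\varepsilon}$.

Some answers to the above questions are presented in the following section. For more details, the reader is referred to Gallavotti et al. (2004).

Resonances and Lindstedt Series

We discuss eqns [87] in the paradigmatic case in which the Hamiltonian $H_0(\mathbf{A})$ is $\tfrac{1}{2}\mathbf{A}^2$ (cf. [80]). It will be $\boldsymbol{\omega}(\mathbf{A}')\equiv\mathbf{A}'$ so that $\mathbf{A}'_0 = \boldsymbol{\omega}$, $\mathbf{B}_0 = \mathbf{0}$ and the perturbation $f(\boldsymbol{\alpha})$ can be considered as a function of $\boldsymbol{\alpha} = (\boldsymbol{\alpha}',\boldsymbol{\beta})$: let $\bar f(\boldsymbol{\beta})$ be defined as its average over $\boldsymbol{\alpha}'$. The determination of the invariant torus of dimension $r$ which can be continued in the sense discussed in the last section is easily understood in this case.

A resonant invariant torus which, among the tori $\mathcal{T}_{\mathbf{A}_0,\boldsymbol{\beta}}$, has parametric equations that can be continued as a formal power series in $\varepsilon$ is the torus $\mathcal{T}_{\mathbf{A}_0,\boldsymbol{\beta}_0}$ with $\boldsymbol{\beta}_0$ a stationarity point for $\bar f(\boldsymbol{\beta})$, that is, an equilibrium point for the average perturbation: $\partial_{\boldsymbol{\beta}}\bar f(\boldsymbol{\beta}_0) = \mathbf{0}$. In fact, the following theorem holds:

Theorem 5 If $\boldsymbol{\omega}\in\mathbb{R}^r$ satisfies a Diophantine property and if $\boldsymbol{\beta}_0$ is a nondegenerate stationarity point for the “fast angle average” $\bar f(\boldsymbol{\beta})$ (i.e., such that $\det\partial^2_{\boldsymbol{\beta}\boldsymbol{\beta}}\bar f(\boldsymbol{\beta}_0)\ne 0$), then the following equations for the functions $\mathbf{h}_\varepsilon, \mathbf{k}_\varepsilon$,

$$\begin{aligned}(\boldsymbol{\omega}\cdot\partial_{\boldsymbol{\psi}})^2\,\mathbf{h}_\varepsilon(\boldsymbol{\psi}) &= -\varepsilon\,\partial_{\boldsymbol{\alpha}'}f(\boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi}),\ \boldsymbol{\beta}_0 + \mathbf{k}_\varepsilon(\boldsymbol{\psi}))\\ (\boldsymbol{\omega}\cdot\partial_{\boldsymbol{\psi}})^2\,\mathbf{k}_\varepsilon(\boldsymbol{\psi}) &= -\varepsilon\,\partial_{\boldsymbol{\beta}}f(\boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi}),\ \boldsymbol{\beta}_0 + \mathbf{k}_\varepsilon(\boldsymbol{\psi}))\end{aligned} \qquad [89]$$

can be formally solved in powers of $\varepsilon$.

Given the simplicity of the Hamiltonian [80] that we are considering, it is not necessary to discuss the functions $\mathbf{H}_\varepsilon, \mathbf{K}_\varepsilon$ because the equations that they should obey reduce to their definitions as in the section “Quasiperiodicity and KAM stability,” and for the same reason.

In other words, also the resonant tori admit a Lindstedt series representation. It is however very unlikely that the series are, in general, convergent.

Physically, this new aspect is due to the fact that the linearization of the motion near the torus $\mathcal{T}_{\mathbf{A}_0,\boldsymbol{\beta}_0}$ introduces oscillatory motions around $\mathcal{T}_{\mathbf{A}'_0,\boldsymbol{\beta}_0}$ with frequencies proportional to the square roots of the positive eigenvalues of the matrix $\varepsilon\,\partial^2_{\boldsymbol{\beta}\boldsymbol{\beta}}\bar f(\boldsymbol{\beta}_0)$: therefore, it is naively expected that it has to be necessary that a Diophantine property be required on the vector $(\boldsymbol{\omega}, \sqrt{\varepsilon\mu_1},\ldots)$, where $\varepsilon\mu_j$ are the positive eigenvalues. Hence, some values of $\varepsilon$, namely those for which $(\boldsymbol{\omega}, \sqrt{\varepsilon\mu_1},\ldots)$ is not a Diophantine vector or is too close to a non-Diophantine vector, should be excluded or at least should be expected to generate difficulties. Note that the problem arises irrespective of the assumptions about the nondegenerate matrix $\partial^2_{\boldsymbol{\beta}\boldsymbol{\beta}}\bar f(\boldsymbol{\beta}_0)$ (since $\varepsilon$ can have either sign), and no matter how small $|\varepsilon|$ is supposed to be. But we can expect that if the matrix $\partial^2_{\boldsymbol{\beta}\boldsymbol{\beta}}\bar f(\boldsymbol{\beta}_0)$ is (say) positive definite (i.e., $\boldsymbol{\beta}_0$ is a minimum point for $\bar f(\boldsymbol{\beta})$) then the problem should be easier for $\varepsilon<0$ and vice versa, if $\boldsymbol{\beta}_0$ is a maximum, it should be easier for $\varepsilon>0$ (i.e., in the cases in which the eigenvalues of $\varepsilon\,\partial^2_{\boldsymbol{\beta}\boldsymbol{\beta}}\bar f(\boldsymbol{\beta}_0)$ are negative and their roots do not have the interpretation of frequencies).

Technically, the sums of the formal series can be given (so far) a meaning only via summation rules involving divergent series: typically, one has to identify in the formal expressions (denumerably many) geometric series which, although divergent, can be given a meaning by applying the rule [79]. Since the rule can only be applied if $z\ne 1$, this leads to conditions on the parameter $\varepsilon$, in order to exclude that the various $z$ that have to be considered are very close to 1. Hence, this stability result turns out to be rather different from the KAM result for the maximal tori. Namely the series can be given a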
meaning via summation rules provided $f$ and $\boldsymbol{\beta}_0$ satisfy certain additional conditions and provided certain values of $\varepsilon$ are excluded. An example of a theorem is the following:

Theorem 6 Given the Hamiltonian [80] and a resonant torus $\mathcal{T}_{\mathbf{A}'_0,\boldsymbol{\beta}_0}$ with $\boldsymbol{\omega} = \mathbf{A}'_0\in\mathbb{R}^r$ satisfying a Diophantine property let $\boldsymbol{\beta}_0$ be a nondegenerate maximum point for the average potential $\bar f(\boldsymbol{\beta}) \stackrel{\mathrm{def}}{=} (2\pi)^{-r}\int_{\mathbb{T}^r} f(\boldsymbol{\alpha}',\boldsymbol{\beta})\,d^r\boldsymbol{\alpha}'$. Consider the Lindstedt series solution for eqns [89] of the perturbed resonant torus with spectrum $(\boldsymbol{\omega},\mathbf{0})$. It is possible to express the single $n$th-order term of the series as a sum of many terms and then rearrange the series thus obtained so that the resummed series converges for $\varepsilon$ in a domain $\mathcal{E}$ which contains a segment $[0,\varepsilon_0]$ and also a subset of $[-\varepsilon_0,0]$ which, although with open dense complement, is so large that it has 0 as a Lebesgue density point. Furthermore, the resummed series for $\mathbf{h}_\varepsilon, \mathbf{k}_\varepsilon$ define an invariant $r$-dimensional analytic torus with spectrum $\boldsymbol{\omega}$.

More generally, if $\boldsymbol{\beta}_0$ is only a nondegenerate stationarity point for $\bar f(\boldsymbol{\beta})$, the domain of definition of the resummed series is a set $\mathcal{E}\subset[-\varepsilon_0,\varepsilon_0]$ which on both sides of the origin has an open dense complement although it has 0 as a Lebesgue density point.

Theorem 6 can be naturally extended to the general case in which the Hamiltonian is the most general perturbation of an anisochronous integrable system $H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = h(\mathbf{A}) + \varepsilon f(\mathbf{A},\boldsymbol{\alpha})$ if $\partial^2_{\mathbf{A}\mathbf{A}}h$ is a nonsingular matrix and the resonance arises from a spectrum $\boldsymbol{\omega}(\mathbf{A}_0)$ which has $r$ independent components (while the remaining are not necessarily zero).

We see that the convergence is a delicate problem for the Lindstedt series for nearly integrable resonant motions. They might even be divergent (mathematically, a proof of divergence is an open problem but it is a very reasonable conjecture in view of the above physical interpretation); nevertheless, Theorem 6 shows that sum rules can be given that sometimes (i.e., for $\varepsilon$ in a large set near $\varepsilon = 0$) yield a true solution to the problem.

This is reminiscent of the phenomenon met in discussing perturbations of isochronous systems in [76], but it is a much more complex situation. It leaves many open problems: foremost among them is the question of uniqueness. The sum rules of divergent series always contain some arbitrary choices, which lead to doubts about the uniqueness of the functions parametrizing the invariant tori constructed in this way. It might even be that the convergence set $\mathcal{E}$ may depend upon the arbitrary choices, and that considering several of them no $\varepsilon$ with $|\varepsilon|<\varepsilon_0$ is left out.

The case of a priori unstable systems has also been widely studied. In this case too resonances with Diophantine $r$-dimensional spectrum $\boldsymbol{\omega}$ are considered. However, in the case $s_2 = 0$ (called a priori unstable hyperbolic resonance) the Lindstedt series can be shown to be convergent, while in the case $s_1 = 0$ (called a priori unstable elliptic resonance) or in the mixed cases $s_1, s_2 > 0$ extra conditions are needed. They involve $\boldsymbol{\omega}$ and $\boldsymbol{\mu} = (\mu_1,\ldots,\mu_{s_2})$ (cf. [86]) and properties of the perturbations as well. It is also possible to study a slightly different problem: namely to look for conditions on $\boldsymbol{\omega}, \boldsymbol{\mu}, f$ which imply that, for small $\varepsilon$, invariant tori with spectrum $\varepsilon$-dependent but close, in a suitable sense, to $\boldsymbol{\omega}$ exist.

The literature is vast, but it seems fair to say that, given the above comments, particularly those concerning uniqueness and analyticity, the situation is still quite unsatisfactory. We refer the reader to Gallavotti et al. (2004) for more details.

Diffusion in Phase Space

The KAM theorem implies that a perturbation of an analytic anisochronous integrable system, i.e., with an analytic Hamiltonian $H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = H_0(\mathbf{A}) + \varepsilon f(\mathbf{A},\boldsymbol{\alpha})$ and nondegenerate Hessian matrix $\partial^2_{\mathbf{A}\mathbf{A}}h(\mathbf{A})$, generates large families of maximal invariant tori. Such tori lie on the energy surfaces but do not have codimension 1 on them, i.e., they do not split the $(2\ell-1)$-dimensional energy surfaces into disconnected regions except, of course, in the case of systems with two degrees of freedom (see the section “Quasiperiodicity and KAM stability”).

Therefore, there might exist trajectories with initial data close to $\mathbf{A}^i$ in action space which reach phase space points close to $\mathbf{A}^f\ne\mathbf{A}^i$ in action space for $\varepsilon\ne 0$, no matter how small. Such diffusion phenomenon would occur in spite of the fact that the corresponding trajectory has to move in a space in which very close to each $\{\mathbf{A}\}\times\mathbb{T}^\ell$ there is an invariant surface on which points move keeping $\mathbf{A}$ constant within $O(\varepsilon)$, which for $\varepsilon$ small can be $\ll|\mathbf{A}^f - \mathbf{A}^i|$.

In a priori unstable systems (cf. the section “Resonances and their stability”) with $s_1 = 1$, $s_2 = 0$, it is not difficult to see that the corresponding phenomenon can actually occur: the paradigmatic example (ARNOL'D) is the a priori unstable system

$$H_\varepsilon = \frac{A_1^2}{2} + A_2 + \frac{p^2}{2} + g(\cos q - 1) + \varepsilon(\cos\alpha_1 + \sin\alpha_2)(\cos q - 1) \qquad [90]$$
This is a system describing a motion of a “pendulum” ($(p,q)$ coordinates) interacting with a “rotating wheel” ($(A_1,\alpha_1)$ coordinates) and a “clock” ($(A_2,\alpha_2)$ coordinates) a priori unstable near the points $p = 0$, $q = 0, 2\pi$ ($s_1 = 1$, $s_2 = 0$, $\lambda_1 = \sqrt{g}$, cf. [86]). It can be proved that on the energy surface of energy $E$ and for each $\varepsilon\ne 0$ small enough (no matter how small) there are initial data with action coordinates close to $\mathbf{A}^i = (A^i_1, A^i_2)$ with $\tfrac{1}{2}(A^i_1)^2 + A^i_2$ close to $E$ eventually evolving to a datum $\mathbf{A}' = (A'_1, A'_2)$ with $A'_1$ at a distance from $A^f_1$ smaller than an arbitrarily prefixed distance (of course with energy $E$). Furthermore, during the whole process the pendulum energy stays close to zero within $o(\varepsilon)$ (i.e., the pendulum swings following closely the unperturbed separatrices).

In other words, [90] describes a machine (the pendulum) which, working approximately in a cycle, extracts energy from a reservoir (the clock) to transfer it to a mechanical device (the wheel). The statement that diffusion is possible means that the machine can work as soon as $\varepsilon\ne 0$, if the initial actions and the initial phases (i.e., $\alpha_1, \alpha_2, p, q$) are suitably tuned (as functions of $\varepsilon$).

The peculiarity of the system [90] is that the fixed points $P_\pm$ of the unperturbed pendulum (i.e., the equilibria $p = 0$, $q = 0, 2\pi$) remain unstable equilibria even when $\varepsilon\ne 0$ and this is an important simplifying feature.

It is a peculiarity that permits bypassing the obstacle, arising in the analysis of more general cases, represented by the resonance surfaces consisting of the $\mathbf{A}$'s with $A_1\nu_1 + \nu_2 = 0$: the latter correspond to harmonics $(\nu_1,\nu_2)$ present in the perturbing function, i.e., the harmonics which would lead to division by zero in an attempt to construct (as necessary in studying [90] by Arnol'd's method) the parametric equations of the perturbed invariant tori with action close to such $\mathbf{A}$'s. In the case of [90] the problem arises only on the resonance marked in Figure 6 by a heavy line, i.e., $A_1 = 0$, corresponding to $\cos\alpha_1$ in [90].

Figure 6 (a) The $\varepsilon = 0$ geometry: the “partial energy” lines are parabolas, $\tfrac{1}{2}A_1^2 + A_2 = \mathrm{const}$. The vertical lines are the resonances $A_1 = $ rational (i.e., $\nu_1A_1 + \nu_2 = 0$). The disks are neighborhoods of the points $\mathbf{A}^i$ and $\mathbf{A}^f$ (the dots at their centers). (b) $\varepsilon\ne 0$; an artist's rendering of a trajectory in $\mathbf{A}$ space, driven by the pendulum swings to accelerate the wheel from $A^i_1$ to $A^f_1$ at the expenses of the clock energy, sneaking through invariant tori not represented and (approximately) located “away” from the intersections between resonances and partial energy lines (a dense set, however). The pendulum coordinates are not shown: its energy stays close to zero, within a power of $\varepsilon$. Hence the pendulum swings, staying close to the separatrix. The oscillations symbolize the wiggly behavior of the partial energy $\tfrac{1}{2}A_1^2 + A_2$ in the process of sneaking between invariant tori which, because of their invariance, would be impossible without the pendulum. The energy $\tfrac{1}{2}A_1^2$ of the wheel increases slightly at each pendulum swing: accurate estimates yield an increase of the wheel speed $A_1$ of the order of $\varepsilon/(\log\varepsilon^{-1})$ at each swing of the pendulum implying a transition time of the order of $g^{-1/2}\varepsilon^{-1}\log\varepsilon^{-1}$.

If $\varepsilon = 0$, the points $P_-$ with $p = 0$, $q = 0$ and the point $P_+$ with $p = 0$, $q = 2\pi$ are both unstable equilibria (and they are, of course, the same point, if $q$ is an angular variable). The unstable manifold (it is a curve) of $P_+$ coincides with the stable manifold of $P_-$ and vice versa. So that the unperturbed system admits nontrivial motions leading from $P_+$ to $P_-$ and from $P_-$ to $P_+$, both in a bi-infinite time interval $(-\infty,\infty)$: the $p, q$ variables describe a pendulum and $P_\pm$ are its unstable equilibria which are connected by the separatrices (which constitute the zero-energy surfaces for the pendulum).

The latter property remains true for more general a priori unstable Hamiltonians

$$H_\varepsilon = H_0(\mathbf{A}) + H_u(p,q) + \varepsilon f(\mathbf{A},\boldsymbol{\alpha},p,q)\quad\text{in } (U\times\mathbb{T}^\ell)\times(\mathbb{R}^2) \qquad [91]$$

where $H_u$ is a one-dimensional Hamiltonian which has two unstable equilibrium points $P_+$ and $P_-$ linearly repulsive in one direction and linearly attractive in another which are connected by two heteroclinic trajectories which, as time tends to $\pm\infty$, approach $P_-$ and $P_+$ and vice versa.

Actually, the points need not be different but, if coinciding, the trajectories linking them must be nontrivial: in the case [90] the variable $q$ can be considered an angle and then $P_+$ and $P_-$ would coincide (but are connected by nontrivial trajectories, i.e., by trajectories that also visit points different from $P_\pm$). Such trajectories are called heteroclinic if $P_+\ne P_-$ and homoclinic if $P_+ = P_-$. In the general case, besides the homoclinicity (or heteroclinicity) condition, certain weak genericity conditions, automatically satisfied in the example [90], have to be imposed in order to show that, given $\mathbf{A}^i$ and $\mathbf{A}^f$ with the same unperturbed energy $E$, one can find, for all $\varepsilon$ small enough but not equal to zero, initial data ($\varepsilon$-dependent) with actions arbitrarily close to $\mathbf{A}^i$ which evolve to data with actions arbitrarily close to $\mathbf{A}^f$. This is a phenomenon
called the Arnol’d diffusion. Simple sufficient conditions for a transition from near $A_i$ to near $A_f$ are expressed by the following result:

Theorem 7 Given the Hamiltonian [91] with $H_u$ admitting two hyperbolic fixed points $P_\pm$ with heteroclinic connections, $t \to (p_a(t), q_a(t))$, $a = 1, 2$, suppose that:
(i) On the unperturbed energy surface of energy $E = H(A_i) + H_u(P_\pm)$ there is a regular curve $\gamma : s \to A(s)$ joining $A_i$ to $A_f$ such that the unperturbed tori $\{A(s)\} \times \mathbb{T}^\ell$ can be continued at $\varepsilon \ne 0$ into invariant tori $\mathcal{T}_{A(s),\varepsilon}$ for a set of values of $s$ which fills the curve $\gamma$ leaving only gaps of size of order $o(\varepsilon)$.
(ii) The $\ell \times \ell$ matrix $D_{ij}$ of the second derivatives of the integral of $f$ over the heteroclinic motions is not degenerate, that is,

$$|\det D| = \left| \det\left( \int_{-\infty}^{\infty} dt\, \partial_{\alpha_i \alpha_j} f\big(A, \boldsymbol{\alpha} + \boldsymbol{\omega}(A)\,t, p_a(t), q_a(t)\big) \right) \right| > c > 0 \qquad [92]$$

for all $A$'s on the curve $\gamma$ and all $\boldsymbol{\alpha} \in \mathbb{T}^2$.
Given arbitrary $\delta > 0$, for $\varepsilon \ne 0$ small enough there are initial data with action and energy closer than $\delta$ to $A_i$ and $E$, respectively, which after a long enough time acquire an action closer than $\delta$ to $A_f$ (keeping the initial energy).

The above two conditions can be shown to hold generically for many pairs $A_i \ne A_f$ (and many choices of the curves $\gamma$ connecting them) if the number of degrees of freedom is 3. Thus, the result, obtained by a simple extension of the argument originally outlined by Arnol’d to discuss the paradigmatic example [90], proves the existence of diffusion in a priori unstable systems. The integral in [92] is called Melnikov integral.
The real difficulty is to estimate the time needed for the transition: it is a time that obviously has to diverge as $\varepsilon \to 0$. Assuming $g$ fixed (i.e., $\varepsilon$ independent) a naive approach easily leads to estimates which can even be worse than $O(\exp(a\varepsilon^{-b}))$ with some $a, b > 0$. It has finally been shown that in such cases the minimum time can be, for rather general perturbations $\varepsilon f(\boldsymbol{\alpha}, q)$, estimated above by $O(\varepsilon^{-1} \log \varepsilon^{-1})$, which is the best that can be hoped for under generic assumptions.
The reader is referred to Arnol’d (1989) and Chierchia and Valdinoci (2000) for more details.

Long-Time Stability of Quasiperiodic Motions

A more difficult problem is whether the same phenomenon of migration in action space occurs in a priori stable systems. The root of the difficulty is a remarkable stability property of quasiperiodic motions. Consider Hamiltonians $H_\varepsilon(A, \boldsymbol{\alpha}) = h(A) + \varepsilon f(A, \boldsymbol{\alpha})$ with $H_0(A) = h(A)$ strictly convex, analytic, and anisochronous on the closure $\overline{U}$ of an open bounded region $U \subset \mathbb{R}^\ell$, and a perturbation $\varepsilon f(A, \boldsymbol{\alpha})$ analytic in $\overline{U} \times \mathbb{T}^\ell$.
Then a priori bounds are available on how long it can possibly take to migrate from an action close to $A_1$ to one close to $A_2$: and the bound is of “exponential type” as $\varepsilon \to 0$ (i.e., it admits a lower bound which behaves as the exponential of an inverse power of $\varepsilon$). The simplest theorem is (NEKHOROSSEV):

Theorem 7 There are constants $0 < a, b, d, g, \tau$ such that any initial datum $(A, \boldsymbol{\alpha})$ evolves so that $A$ will not change by more than $a\varepsilon^{g}$ before a long time bounded below by $\tau \exp(b\varepsilon^{-d})$.

Thus, this puts an exponential bound, i.e., a bound exponential in an inverse power of $\varepsilon$, to the diffusion time: before a time $\tau \exp(b\varepsilon^{-d})$ actions can only change by $O(\varepsilon^{g})$ so that their variation cannot be large no matter how small $\varepsilon \ne 0$ is chosen. This places a (long) lower bound to the time of diffusion in a priori stable systems.
The proof of the theorem provides, actually, an interesting and detailed picture of the variations in actions showing that some actions may vary more slowly than others.
The theorem is constructive, i.e., all constants $0 < a, b, d, \tau$ can be explicitly chosen and depend on $\ell$, $H_0$, $f$ although some of them can be fixed to depend only on $\ell$ and on the minimum curvature of the convex graph of $H_0$. Its proof can be adapted to cover many cases which do not fall in the class of systems with strictly convex unperturbed Hamiltonian, and even to cases with a resonant unperturbed Hamiltonian.
However, in important problems (e.g., in the three-body problems met in celestial mechanics) there is empirical evidence that diffusion takes place at a fast pace (i.e., not exponentially slow in the above sense) while the above results would forbid a rapid migration in phase space if they applied: however, in such problems the assumptions of the theorem are not satisfied, because the unperturbed system is strongly resonant (as in the celestial mechanics problems, where the number of independent frequencies is a fraction of the number
of degrees of freedom and $h(A)$ is far from strictly convex), leaving wide open the possibility of observing rapid diffusion.
Further, changing the assumptions can dramatically change the results. For instance, rapid diffusion can sometimes be proved even though it might be feared that it should require exponentially long times: an example that has been proposed is the case of a three-timescales system, with Hamiltonian

$$\omega_1 A_1 + \omega_2 A_2 + \frac{p^2}{2} + g(1 + \cos q) + \varepsilon f(\alpha_1, \alpha_2, p, q) \qquad [93]$$

with $\boldsymbol{\omega}_\varepsilon \stackrel{\text{def}}{=} (\omega_1, \omega_2)$, where $\omega_1 = \varepsilon^{-1/2}\omega$, $\omega_2 = \varepsilon^{1/2}\tilde\omega$ and $\omega, \tilde\omega > 0$ constants. The three scales are $\omega_1^{-1}$, $\sqrt{g^{-1}}$, $\omega_2^{-1}$. In this case, there are many (although by no means all) pairs $A_1, A_2$ which can be connected within a time that can be estimated to be of order $O(\varepsilon^{-1} \log \varepsilon^{-1})$.
This is a rapid-diffusion case in an a priori unstable system in which condition [92] is not satisfied: because the $\varepsilon$-dependence of $\boldsymbol{\omega}(A)$ implies that the lower bound $c$ in [92] must depend on $\varepsilon$ (and be exponentially small with an inverse power of $\varepsilon$ as $\varepsilon \to 0$).
The unperturbed system in [93] is nonresonant in the $H_0$ part for $\varepsilon > 0$ outside a set of zero measure (i.e., where the vector $\boldsymbol{\omega}_\varepsilon$ satisfies a suitable Diophantine property) and, furthermore, it is a priori unstable: cases met in applications can be a priori stable and resonant (and often not anisochronous) in the $H_0$ part. In such a system, not only is the speed of diffusion not understood but proposals to prove its existence, if present (as expected), have so far not given really satisfactory results.
For more details, the reader is referred to Nekhorossev (1977).

The Three-Body Problem

Mechanics and the three-body problem can be almost identified with each other, in the sense that the motion of three gravitating masses has long been a key astronomical problem and at the same time the source of inspiration for many techniques: foremost among them the theory of perturbations.
As an introduction, consider a special case. Let three masses $m_S = m_0$, $m_J = m_1$, $m_M = m_2$ interact via gravity, that is, with interaction potential $-k m_i m_j |x_i - x_j|^{-1}$: the simplest problem arises when the third body has a negligible mass compared to the two others and the latter are supposed to be on a circular orbit; furthermore, the mass $m_J$ is $\varepsilon m_S$ with $\varepsilon$ small and the mass $m_M$ moves in the plane of the circular orbit. This will be called the “circular restricted three-body problem.”
In a reference system with center $S$ and rotating at the angular speed of $J$ around $S$ inertial forces (centrifugal and Coriolis) act. Supposing that the body $J$ is located on the axis with unit vector $\mathbf{i}$ at distance $R$ from the origin $S$, the acceleration of the point $M$ is

$$\ddot{\boldsymbol{\varrho}} = \mathbf{F} + \omega_0^2 \left( \boldsymbol{\varrho} - \frac{\varepsilon R}{1 + \varepsilon}\,\mathbf{i} \right) - 2\,\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}}$$

if $\mathbf{F}$ is the force of attraction and $\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}} \equiv \omega_0 \dot{\boldsymbol{\varrho}}^{\perp}$ where $\boldsymbol{\omega}_0$ is a vector with $|\boldsymbol{\omega}_0| = \omega_0$ and perpendicular to the orbital plane and $\boldsymbol{\varrho}^{\perp} \stackrel{\text{def}}{=} (-\varrho_2, \varrho_1)$ if $\boldsymbol{\varrho} = (\varrho_1, \varrho_2)$. Here, taking into account that the origin $S$ rotates around the fixed center of mass, $\omega_0^2 (\boldsymbol{\varrho} - \varepsilon R/(1 + \varepsilon)\,\mathbf{i})$ is the centrifugal force while $-2\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}}$ is the Coriolis force. The equations of motion can therefore be derived from a Lagrangian

$$\mathcal{L} = \frac{1}{2}\dot{\boldsymbol{\varrho}}^2 - W + \omega_0\, \boldsymbol{\varrho}^{\perp} \cdot \dot{\boldsymbol{\varrho}} + \frac{1}{2}\omega_0^2 \boldsymbol{\varrho}^2 - \omega_0^2 \frac{\varepsilon R}{1 + \varepsilon}\, \boldsymbol{\varrho} \cdot \mathbf{i} \qquad [94]$$

with

$$\omega_0^2 R^3 = k m_S (1 + \varepsilon) \stackrel{\text{def}}{=} g_0, \qquad W = -\frac{k m_S}{|\boldsymbol{\varrho}|} - \frac{k m_S\, \varepsilon}{|\boldsymbol{\varrho} - R\,\mathbf{i}|}$$

where $k$ is the gravitational constant, $R$ the distance between $S$ and $J$, and finally the last three terms in [94] come from the Coriolis force (the first) and from the centrifugal force (the other two, taking into account that the origin $S$ rotates around the fixed center of mass).
Setting $g = g_0/(1 + \varepsilon) \equiv k m_S$, the Hamiltonian of the system is

$$H = \frac{1}{2}(\mathbf{p} - \omega_0 \boldsymbol{\varrho}^{\perp})^2 - \frac{g}{|\boldsymbol{\varrho}|} - \frac{1}{2}\omega_0^2 \boldsymbol{\varrho}^2 - \varepsilon \frac{g}{R} \left( \left| \frac{\boldsymbol{\varrho}}{R} - \mathbf{i} \right|^{-1} - \frac{\boldsymbol{\varrho}}{R} \cdot \mathbf{i} \right) \qquad [95]$$

The first part can be expressed immediately in the action–angle coordinates for the two-body problem (cf. the section “Newtonian potential and Kepler’s laws”). Calling such coordinates $(L_0, \lambda_0, G_0, \gamma_0)$ and $\theta_0$ the polar angle of $M$ with respect to the major axis of the ellipse and $\lambda_0$ the mean anomaly of $M$ on its ellipse, the Hamiltonian becomes, taking into account that for $\varepsilon = 0$ the ellipse axis rotates at speed $-\omega_0$,

$$H = -\frac{g^2}{2L_0^2} - \omega_0 G_0 - \varepsilon \frac{g}{R} \left( \left| \frac{\boldsymbol{\varrho}}{R} - \mathbf{i} \right|^{-1} - \frac{\boldsymbol{\varrho}}{R} \cdot \mathbf{i} \right) \qquad [96]$$
which is convenient if we study the interior problem, i.e., $|\boldsymbol{\varrho}| < R$. This can be expressed in the action–angle coordinates via [41], [42]:

$$\theta_0 = \lambda_0 + f_{\lambda_0}, \qquad \theta_0 + \gamma_0 = \lambda_0 + \gamma_0 + f_{\lambda_0}$$
$$e = \left( 1 - \frac{G_0^2}{L_0^2} \right)^{1/2}, \qquad \frac{|\boldsymbol{\varrho}|}{R} = \frac{G_0^2}{gR}\, \frac{1}{1 + e\cos(\lambda_0 + f_{\lambda_0})} \qquad [97]$$

where (see [42]), $f_\lambda = f(e \sin\lambda, e\cos\lambda)$ and

$$f(x, y) = 2x \left( 1 + \frac{5}{4} y + \cdots \right)$$

with the ellipsis denoting higher orders in $x, y$ even in $x$. The Hamiltonian takes the form, if $\omega^2 = gR^{-3}$,

$$H_\varepsilon = -\frac{g^2}{2L_0^2} - \omega G_0 + \varepsilon \frac{g}{R} F(G_0, L_0, \lambda_0, \lambda_0 + \gamma_0) \qquad [98]$$

where the only important feature (for our purposes) is that $F(L, G, \alpha, \beta)$ is an analytic function of $L, G, \alpha, \beta$ near a datum with $|G| < L$ (i.e., $e > 0$) and $|\boldsymbol{\varrho}| < R$. However, the domain of analyticity in $G$ is rather small as it is constrained by $|G| < L$ excluding in particular the circular orbit case $G = L$.
Note that apparently the KAM theorem fails to be applicable to [98] because the matrix of the second derivatives of $H_0(L, G)$ has vanishing determinant. Nevertheless, the proof of the theorem also goes through in this case, with minor changes. This can be checked by studying the proof or, following a remark by Poincaré, by simply noting that the “squared” Hamiltonian $H'_\varepsilon \stackrel{\text{def}}{=} (H_\varepsilon)^2$ has the form

$$H'_\varepsilon = \left( -\frac{g^2}{2L_0^2} - \omega G_0 \right)^2 + \varepsilon F'(G_0, L_0, \lambda_0, \lambda_0 + \gamma_0) \qquad [99]$$

with $F'$ still analytic. But this time

$$\det \frac{\partial^2 H'_0}{\partial (G_0, L_0)} = -6 g^2 L_0^{-4} \omega^2 h \ne 0 \qquad \text{if } h = -g^2 L_0^{-2} - 2\omega G_0 \ne 0$$

Therefore, the KAM theorem applies to $H'_\varepsilon$ and the key observation is that the orbits generated by the Hamiltonian $(H_\varepsilon)^2$ are geometrically the same as those generated by the Hamiltonian $H_\varepsilon$: they are only run at a different speed because of the need of a time rescaling by the constant factor $2H_\varepsilon$.
This shows that, given an unperturbed ellipse of parameters $(L_0, G_0)$ such that $\boldsymbol{\omega} = (g^2/L_0^3, -\omega)$, $G_0 > 0$, with $\omega_1/\omega_2$ Diophantine, then the perturbed system admits a motion which is quasiperiodic with spectrum proportional to $\boldsymbol{\omega}$ and takes place on an orbit which wraps around a torus remaining forever close to the unperturbed torus (which can be visualized as described by a point moving, according to the area law on an ellipse rotating at a rate $-\omega_0$) with actions $(L_0, G_0)$, provided $\varepsilon$ is small enough. Hence,

The KAM theorem answers, at least conceptually, the classical question: can a solution of the three-body problem remain close to an unperturbed one forever? That is, is it possible that a solar system is stable forever?

Assuming $e, |\boldsymbol{\varrho}|/R \ll 1$ and retaining only the lowest orders in $e$ and $|\boldsymbol{\varrho}|/R \ll 1$ the Hamiltonian [98] simplifies into

$$H = -\frac{g^2}{2L_0^2} - \omega G_0 + \delta_\varepsilon(G_0) - \frac{\varepsilon g}{2R} \frac{G_0^4}{g^2 R^2} \left( 3\cos 2(\lambda_0 + \gamma_0) - e\cos\lambda_0 - \frac{9}{2}\, e\cos(\lambda_0 + 2\gamma_0) + \frac{3}{2}\, e\cos(3\lambda_0 + 2\gamma_0) \right) \qquad [100]$$

where

$$\delta_\varepsilon(G_0) = \big( (1 + \varepsilon)^{1/2} - 1 \big)\, \omega G_0 - \frac{\varepsilon g}{2R} \frac{G_0^4}{g^2 R^2}, \qquad e = \left( 1 - \frac{G_0^2}{L_0^2} \right)^{1/2}$$

It is an interesting exercise to estimate, assuming as model the Hamiltonian [100] and following the proof of the KAM theorem, how small $\varepsilon$ has to be if a planet with the data of Mercury can be stable forever on a (slowly precessing) orbit with actions close to the present-day values under the influence of a mass $\varepsilon$ times the solar mass orbiting on a circle, at a distance from the Sun equal to that of Jupiter. It is possible to follow either the above reduction to the ordinary KAM theorem or to apply directly to [100] the Lindstedt series expansion, proceeding along the lines of the section “Quasiperiodicity and KAM stability.” The first approach is easy but the second is more efficient: in both cases, unless the estimates are done in a particularly careful manner, the value found for $\varepsilon m_S$ is not interesting from the viewpoint of astronomy.
The reader is referred to Arnol’d (1989) for more details.

Rationalization and Regularization of Singularities

Often integrable systems have interesting data which lie on the boundary of the integrability domain. For instance, the central motion when $L = G$ (circular orbits) or the rigid body in a rotation around one of the principal axes or the two-body problem when $G = 0$ (collisional data). In such cases, perturbation
theory cannot be applied as discussed above. Typically, the perturbation depends on quantities like $\sqrt{L - G}$ and is not analytic at $L = G$. Nevertheless, it is sometimes possible to enlarge phase space and introduce new coordinates in the vicinity of the data which in the initial phase space are singular.
A notable example is the failure of the analysis of the circular restricted three-body problem: it apparently fails when the orbit that we want to perturb is circular.
It is convenient to introduce the canonical coordinates $L, \lambda$ and $G, \gamma$:

$$L = L_0, \qquad G = L_0 - G_0, \qquad \lambda = \lambda_0 + \gamma_0, \qquad \gamma = -\gamma_0 \qquad [101]$$

so that $e = \sqrt{2GL^{-1}}\,\sqrt{1 - G(2L)^{-1}}$ and $\lambda_0 = \lambda + \gamma$ and $\theta_0 = \lambda_0 + f_{\lambda_0}$, where $f_{\lambda_0}$ is defined in [42] (see also [97]). Hence,

$$\theta_0 = \lambda + \gamma + f_{\lambda+\gamma}, \qquad \theta_0 + \gamma_0 = \lambda + f_{\lambda+\gamma}$$
$$e = \sqrt{2G}\, \sqrt{\frac{1}{L}\left( 1 - \frac{G}{2L} \right)} \qquad [102]$$
$$\frac{|\boldsymbol{\varrho}|}{R} = \frac{L^2 (1 - e^2)}{gR}\, \frac{1}{1 + e\cos(\lambda + \gamma + f_{\lambda+\gamma})}$$

and the Hamiltonian [100] takes the form

$$H_\varepsilon = -\frac{g^2}{2L^2} - \omega L + \omega G + \varepsilon \frac{g}{R} F(L - G, L, \lambda + \gamma, \lambda) \qquad [103]$$

In the coordinates $L, G$ of [101] the unperturbed circular case corresponds to $G = 0$ and [96], once expressed in the action–angle variables $G, L, \gamma, \lambda$, is analytic in a domain whose size is controlled by $\sqrt{G}$. Nevertheless, very often problems of perturbation theory can be “regularized.”
This is done by “enlarging the integrability” domain by adding to it points (one or more) around the singularity (a boundary point of the domain of the coordinates) and introducing new coordinates to describe simultaneously the data close to the singularity and the newly added points: in many interesting cases, the equations of motion are no longer singular (i.e., become analytic) in the new coordinates and are therefore apt to describe the motions that reach the singularity in a finite time. One can say that the singularity was only apparent.
Perhaps this is best illustrated precisely in the above circular restricted three-body problem, with the singularity occurring where $G = 0$, that is, at a circular unperturbed orbit. If we describe the points with $G$ small in a new system of coordinates obtained from the one in [101] by letting alone $L, \lambda$ and setting

$$p = \sqrt{2G}\cos\gamma, \qquad q = \sqrt{2G}\sin\gamma \qquad [104]$$

then $p, q$ vary in a neighborhood of the origin with the origin itself excluded.
Adding the origin of the $p$–$q$ plane then in a full neighborhood of the origin, the Hamiltonian [96] is analytic in $L, \lambda, p, q$. This is because it is analytic (cf. [96], [97]) as a function of $L, \lambda$ and $e\cos\theta_0$ and of $\cos(\gamma_0 + \theta_0)$. Since $\theta_0 = \lambda + \gamma + f_{\lambda+\gamma}$ and $\theta_0 + \gamma_0 = \lambda + f_{\lambda+\gamma}$ by [97], the Hamiltonian [96] is analytic in $L$, $\lambda$, $e\cos(\lambda + \gamma + f_{\lambda+\gamma})$, $\cos(\lambda + f_{\lambda+\gamma})$ for $e$ small (i.e., for $G$ small) and, by [42], $f_{\lambda+\gamma}$ is analytic in $e\sin(\lambda + \gamma)$ and $e\cos(\lambda + \gamma)$. Hence the trigonometric identities

$$e\sin(\lambda + \gamma) = \frac{p\sin\lambda + q\cos\lambda}{\sqrt{L}} \sqrt{1 - \frac{G}{2L}}$$
$$e\cos(\lambda + \gamma) = \frac{p\cos\lambda - q\sin\lambda}{\sqrt{L}} \sqrt{1 - \frac{G}{2L}} \qquad [105]$$

together with $G = (1/2)(p^2 + q^2)$ imply that [103] is analytic near $p = q = 0$ and $L > 0$, $\lambda \in [0, 2\pi]$. The Hamiltonian becomes analytic and the new coordinates are suitable to describe motions crossing the origin: for example, by setting

$$C \stackrel{\text{def}}{=} \frac{1}{2} \left( 1 - \frac{p^2 + q^2}{4L} \right)^{1/2} L^{-1/2}$$

[100] becomes

$$H = -\frac{g^2}{2L^2} - \omega L + \omega\, \tfrac{1}{2}(p^2 + q^2) + \delta_\varepsilon\big(\tfrac{1}{2}(p^2 + q^2)\big) - \frac{\varepsilon g}{2R} \frac{\big(L - \tfrac{1}{2}(p^2 + q^2)\big)^4}{g^2 R^2} \Big( 3\cos 2\lambda - \big( (11\cos\lambda + 3\cos 3\lambda)\,p - (7\sin\lambda + 3\sin 3\lambda)\,q \big)\, C \Big) \qquad [106]$$

The KAM theorem does not apply in the form discussed above to “Cartesian coordinates,” that is, when, as in [106], the unperturbed system is not assigned in action–angle variables. However, there are versions of the theorem (actually its corollaries) which do apply and therefore it becomes possible to obtain some results even for the perturbations of circular motions by the techniques that have been illustrated here.
Likewise, the Hamiltonian of the rigid body with a fixed point $O$ and subject to analytic external forces becomes singular, if expressed in the action–angle coordinates of Deprit, when the body motion nears a rotation around a principal axis or, more generally, nears a configuration in which any two of
the axes $\mathbf{i}_3$, $\mathbf{z}$, or $\mathbf{z}_0$ coincide (i.e., any two among the principal axis, the angular momentum axis and the inertial $z$-axis coincide; see the section “Rigid body”). Nevertheless, by imitating the procedure just described in the simpler cases of the circular three-body problem, it is possible to enlarge the phase space so that in the new coordinates the Hamiltonian is analytic near the singular configurations.
A regularization also arises when considering collisional orbits in the unrestricted planar three-body problem. In this respect, a very remarkable result is the regularization of collisional orbits in the planar three-body problem. After proving that if the total angular momentum does not vanish, simultaneous collisions of the three masses cannot occur within any finite time interval, the question is reduced to the regularization of two-body collisions, under the assumption that the total angular momentum does not vanish.
The local change of coordinates, which changes the relative position coordinates $(x, y)$ of two colliding bodies as $(x, y) \to (\xi, \eta)$, with $x + iy = (\xi + i\eta)^2$, is not one to one, hence it has to be regarded as an enlargement of the positions space, if points with different $(\xi, \eta)$ are considered different. However, the equations of motion written in the variables $\xi, \eta$ have no singularity at $\xi, \eta = 0$ (LEVI-CIVITA).
Another celebrated regularization is the regularization of the Schwarzschild metric, i.e., of the general relativity version of the two-body problem: it is, however, somewhat out of the scope of this review (SYNGE, KRUSKAL).
For more details, the reader is referred to Levi-Civita (1956).

Appendix 1: KAM Resummation Scheme

The idea to control the “remaining contributions” is to reduce the problem to the case in which there are no pairs of lines that follow each other in the tree order and which have the same current. Mark by a scale label “0” the lines, see [74], [83], of a tree whose divisors $C\,|\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l)|$ are $> 1$: these are lines which give no problems in the estimates. Then mark by a scale label “$\ge 1$” the lines with current $\boldsymbol{\nu}(l)$ such that $|\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l)| \le 2^{-n+1}$ for $n = 1$ (i.e., the remaining lines).
The lines labeled 0 are said to be on scale 0, while those labeled $\ge 1$ are said to be on scale $\ge 1$. A cluster of scale 0 will be a maximal collection of lines of scale 0 forming a connected subgraph of a tree $\vartheta$.
Consider only trees $\vartheta_0 \in \Theta_0$ of the family $\Theta_0$ of trees containing no clusters of lines with scale label 0 which have only one line entering the cluster and one exiting it with equal current.
It is useful to introduce the notion of a line $\ell_1$ situated “between” two lines $\ell, \ell'$ with $\ell' > \ell$: this will mean that $\ell_1$ precedes $\ell'$ but not $\ell$.
All trees $\vartheta$ in which there are some pairs $l' > l$ of consecutive lines of scale label $\ge 1$ which have equal current and such that all lines between them bear scale label 0 are obtained by “inserting” on the lines of trees in $\Theta_0$ with label $\ge 1$ any number of clusters of lines and nodes, with lines of scale 0 and with the property that the sum of the harmonics of the nodes inserted vanishes.
Consider a line $l_0 \in \vartheta_0 \in \Theta_0$ linking nodes $v_1 < v_2$ and labeled $\ge 1$ and imagine inserting on it a cluster $\gamma$ of lines of scale 0 with sum of the node harmonics vanishing and out of which emerges one line connecting a node $v_{\text{out}}$ in $\gamma$ to $v_2$ and into which enters one line linking $v_1$ to a node $v_{\text{in}} \in \gamma$. The insertion of a $k$-lines, $|\gamma| = (k + 1)$-nodes, cluster changes the tree value by replacing the line factor, that will be briefly called “value of the cluster $\gamma$”, as

$$\frac{\boldsymbol{\nu}_{v_1} \cdot \boldsymbol{\nu}_{v_2}}{(\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l_0))^2} \;\to\; \frac{\big(\boldsymbol{\nu}_{v_1} \cdot M(\gamma; \boldsymbol{\nu}(l_0))\, \boldsymbol{\nu}_{v_2}\big)}{(\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l_0))^2}\, \frac{1}{(\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l_0))^2} \qquad [107]$$

where $M$ is an $\ell \times \ell$ matrix

$$M_{rs}(\gamma; \boldsymbol{\nu}(l_0)) = \frac{\varepsilon^{|\gamma|}}{k!}\, \nu_{\text{out},r}\, \nu_{\text{in},s} \prod_{v \in \gamma} (f_{\boldsymbol{\nu}_v}) \prod_{l \in \gamma} \frac{\boldsymbol{\nu}_v \cdot \boldsymbol{\nu}_{v'}}{(\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l))^2}$$

if $\ell = v'v$ denotes a line linking $v'$ and $v$. Therefore, if all possible connected clusters are inserted and the resulting values are added up, the result can be taken into account by attributing to the original line $l_0$ a factor like [107] with $M^{(0)}(\boldsymbol{\nu}(l_0)) \stackrel{\text{def}}{=} \sum_\gamma M(\gamma; \boldsymbol{\nu}(l_0))$ replacing $M(\gamma; \boldsymbol{\nu}(l_0))$.
If several connected clusters $\gamma$ are inserted on the same line and their values are summed, the result is a modification of the factor associated with the line $l_0$ into

$$\sum_{k=0}^{\infty} \boldsymbol{\nu}_{v_1} \cdot \left( \frac{M^{(0)}(\boldsymbol{\nu}(l_0))}{(\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l_0))^2} \right)^{k} \frac{1}{(\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l_0))^2}\, \boldsymbol{\nu}_{v_2} = \boldsymbol{\nu}_{v_1} \cdot \left( \frac{1}{(\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l_0))^2 - M^{(0)}(\boldsymbol{\nu}(l_0))} \right) \boldsymbol{\nu}_{v_2} \qquad [108]$$

The series defining $M^{(0)}$ involves, by construction, only trees with lines of scale 0, hence with large divisors, so that it converges to a matrix of small size of order $\varepsilon$ (actually $\varepsilon^2$, more precisely) if $\varepsilon$ is small enough.
Convergence can be established by simply remarking that the series defining $M^{(1)}$ is built with lines with values $> (1/2)$ of the propagator, so that it certainly converges for $\varepsilon$ small enough (by the estimates in the section “Perturbing functions,” where the propagators were identically 1) and the
sum is of order $\varepsilon$ (actually $\varepsilon^2$), hence $< 1$. However, such an argument cannot be repeated when dealing with lines with smaller propagators (which still have to be discussed). Therefore, a method not relying on so trivial a remark on the size of the propagators has eventually to be used when considering lines of scale higher than 1, as it will soon become necessary.
The advantage of the collection of terms achieved with [108] is that we can represent $h$ as a sum of values of trees which are simpler because they contain no pair of lines of scale $\ge 1$ with in between lines of scale 0 with total sum of the node harmonics vanishing. The price is that the divisors are now more involved and we even have a problem due to the fact that we have not proved that the series in [108] converges. In fact, it is a geometric series whose value is the RHS of [108] obtained by the sum rule [79] unless we can prove that the ratio of the geometric series is $< 1$. This is trivial in this case by the previous remark: but it is better to note that there is another reason for convergence, whose use is not really necessary here but will become essential later.
The property that the ratio of the geometric series is $< 1$ can be regarded as due to the consequence of the cancellation mentioned in the section “Quasiperiodicity and KAM stability” which can be shown to imply that the ratio is $< 1$ because $M^{(0)}(\boldsymbol{\nu}) = \varepsilon^2 (\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu})^2\, m^{(0)}(\boldsymbol{\nu})$ with $C\,|m^{(0)}(\boldsymbol{\nu})| < D_0$ for some $D_0 > 0$ and for all $|\varepsilon| < \varepsilon_0$ for some $\varepsilon_0$. Then for small $\varepsilon$ the divisor in [108] is essentially still what it was before starting the resummation.
At this point, an induction can be started. Consider trees evaluated with the new rule and place a scale level “$\ge 2$” on the lines with $C\,|\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l)| \le 2^{-n+1}$ for $n = 2$: leave the label “0” on the lines already marked so and label by “1” the other lines. The lines of scale “1” will satisfy $2^{-n} < |\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(l)| \le 2^{-n+1}$ for $n = 1$. The graphs will now possibly contain lines of scale 0, 1 or $\ge 2$ while lines with label “$\ge 1$” no longer can appear, by construction.
A cluster of scale 1 will be a maximal collection of lines of scales 0, 1 forming a connected subgraph of a tree $\vartheta$ and containing at least one line of scale 1.
The construction carried out by considering clusters of scale 0 can be repeated by considering trees $\vartheta_1 \in \Theta_1$, with $\Theta_1$ the collection of trees with lines marked 0, 1, or $\ge 2$ and in which no pairs of lines with equal momentum appear to follow each other if between them there are only lines marked 0 or 1.
Insertion of connected clusters $\gamma$ of such lines on a line $l_0$ of $\vartheta_1$ leads to define a matrix $M^{(1)}$ formed by summing tree values of clusters $\gamma$ with lines of scales 0 or 1 evaluated with the line factors defined in [107] and with the restriction that in $\gamma$ there are no pairs of lines $\ell < \ell'$ with the same current and which follow each other while any line between them has lower scale (i.e., 0), here “between” means “preceding $l'$ but not preceding $l$,” as above.
Therefore, a scale-independent method has to be devised to check the convergence for $M^{(1)}$ and for the matrices to be introduced later to deal with even smaller propagators. This is achieved by the following extension of Siegel’s theorem mentioned in the section “Quasiperiodicity and KAM stability”:

Theorem 8 Let $\boldsymbol{\omega}_0$ satisfy [74] and set $\boldsymbol{\omega} = C\boldsymbol{\omega}_0$. Consider the contribution to the sum in [82] from graphs $\vartheta$ in which
(i) no pairs $\ell' > \ell$ of lines which lie on the same path to the root carry the same current $\boldsymbol{\nu}$ if all lines $\ell_1$ between them have current $\boldsymbol{\nu}(\ell_1)$ such that $|\boldsymbol{\omega} \cdot \boldsymbol{\nu}(\ell_1)| > 2\,|\boldsymbol{\omega} \cdot \boldsymbol{\nu}|$;
(ii) the node harmonics are bounded by $|\boldsymbol{\nu}| \le N$ for some $N$.
Then the number of lines $\ell$ in $\vartheta$ with divisor $\boldsymbol{\omega} \cdot \boldsymbol{\nu}_\ell$ satisfying $2^{-n} < |\boldsymbol{\omega} \cdot \boldsymbol{\nu}_\ell| \le 2^{-n+1}$ does not exceed $4Nk\,2^{-n/\tau}$, $n = 1, 2, \ldots$.

This implies, by the same estimates in [85], that the series defining $M^{(1)}$ converges. Again, it must be checked that there are cancellations implying that $M^{(1)}(\boldsymbol{\nu}) = \varepsilon^2 (\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu})^2\, m^{(1)}(\boldsymbol{\nu})$ with $|m^{(1)}(\boldsymbol{\nu})| < D_0$ for the same $D_0 > 0$ and the same $\varepsilon_0$.
At this point, one deals with trees containing only lines carrying labels 0, 1, $\ge 2$, and the line factors for the lines $\ell = v'v$ of scale 0 are $\boldsymbol{\nu}_{v'} \cdot \boldsymbol{\nu}_v / (\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(\ell))^2$, those of the lines $\ell = v'v$ of scale 1 have line factors $\boldsymbol{\nu}_{v'} \cdot \big((\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(\ell))^2 - M^{(0)}(\boldsymbol{\nu}(\ell))\big)^{-1} \boldsymbol{\nu}_v$, and those of the lines $\ell = v'v$ of scale $\ge 2$ have line factors

$$\boldsymbol{\nu}_{v'} \cdot \big( (\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}(\ell))^2 - M^{(1)}(\boldsymbol{\nu}(\ell)) \big)^{-1} \boldsymbol{\nu}_v$$

Furthermore, no pair of lines of scale “1” or of scale “$\ge 2$” with the same momentum and with only lines of lower scale (i.e., of scale “0” in the first case or of scale “0”, “1” in the second) between them can follow each other.
This procedure can be iterated until, after infinitely many steps, the problem is reduced to the evaluation of tree values in which each line carries a scale label $n$ and there are no pairs of lines which follow each other and which have only lines of lower scale in between. Then the Siegel argument applies once more and the series so resummed is an absolutely convergent series of functions analytic in $\varepsilon$: hence the original series is convergent.
Although at each step there is a lower bound on the denominators, it would not be possible to avoid using Siegel’s theorem. In fact, the lower bound would become worse and worse as the scale increases. In order to check
the estimates of the constants $D_0$, $\varepsilon_0$ which control the scale independence of the convergence of the various series, it is necessary to take advantage of the theorem, and of the absence (at each step) of the necessity of considering trees with pairs of consecutive lines with equal momentum and intermediate lines of higher scale.
One could also perform the analysis by bounding $h^{(k)}$ order by order with no resummations (i.e., without changing the line factors) and exhibiting the necessary cancellations. Alternatively, the paths that Kolmogorov, Arnol’d and Moser used to prove the first three (somewhat different) versions of the theorem, by successive approximations of the equations for the tori, can be followed.
The invariant tori are Lagrangian manifolds just as the unperturbed ones (cf. comments after [31]) and, in the case of the Hamiltonian [80], the generating function $A \cdot \boldsymbol{\psi} + \Phi(A, \boldsymbol{\psi})$ can be expressed in terms of their parametric equations

$$\Phi(A, \boldsymbol{\psi}) = G(\boldsymbol{\psi}) + a \cdot \boldsymbol{\psi} + h(\boldsymbol{\psi}) \cdot \big( A - \boldsymbol{\omega} - \Delta h(\boldsymbol{\psi}) \big)$$
$$\partial_{\boldsymbol{\psi}} G(\boldsymbol{\psi}) \stackrel{\text{def}}{=} -\Delta h(\boldsymbol{\psi}) + \sum_j h_j(\boldsymbol{\psi})\, \partial_{\boldsymbol{\psi}} \Delta h_j(\boldsymbol{\psi}) - a \qquad [109]$$
$$a \stackrel{\text{def}}{=} \int \Big( -\Delta h(\boldsymbol{\psi}) + \sum_j h_j(\boldsymbol{\psi})\, \partial_{\boldsymbol{\psi}} \Delta h_j(\boldsymbol{\psi}) \Big) \frac{d\boldsymbol{\psi}}{(2\pi)^\ell} = \int \sum_j h_j(\boldsymbol{\psi})\, \partial_{\boldsymbol{\psi}} \Delta h_j(\boldsymbol{\psi})\, \frac{d\boldsymbol{\psi}}{(2\pi)^\ell}$$

where $\Delta = (\boldsymbol{\omega} \cdot \partial_{\boldsymbol{\psi}})$ and the invariant torus corresponds to $A' = \boldsymbol{\omega}$ in the map $\boldsymbol{\alpha} = \boldsymbol{\psi} + \partial_A \Phi(A, \boldsymbol{\psi})$ and $A' = A + \partial_{\boldsymbol{\psi}} \Phi(A, \boldsymbol{\psi})$. In fact, by [109] the latter becomes $A' = A - \Delta h$ and, from the second of [75] written for $f$ depending only on the angles $\boldsymbol{\alpha}$, it is $A = \boldsymbol{\omega} + \Delta h$ when $A, \boldsymbol{\alpha}$ are on the invariant torus.
Note that if $a$ exists it is necessarily determined by the third relation in [109] but the check that the second equation in [109] is soluble (i.e., that the RHS is an exact gradient up to a constant) is nontrivial. The canonical map generated by $A \cdot \boldsymbol{\psi} + \Phi(A, \boldsymbol{\psi})$ is also defined for $A'$ close to $\boldsymbol{\omega}$ and foliates the neighborhood of the invariant torus with other tori: of course, for $A' \ne \boldsymbol{\omega}$ the tori defined in this way are, in general, not invariant.
The reader is referred to Gallavotti et al. (2004) for more details.

Appendix 2: Coriolis and Lorentz Forces – Larmor Precession

Larmor precession refers to the motion of an electrically charged particle in a magnetic field $\mathbf{H}$ (in an inertial frame of reference). It is due to the Lorentz force which, on a unit mass with unit charge, produces an acceleration $\ddot{\boldsymbol{\varrho}} = \mathbf{v} \wedge \mathbf{H}$ if the speed of light is $c = 1$.
Therefore, if $\mathbf{H} = H\mathbf{k}$ is directed along the $\mathbf{k}$-axis, the acceleration it produces is the same that the Coriolis force would impress on a unit mass located in a reference frame which rotates with angular velocity $\omega_0 \mathbf{k}$ around the $\mathbf{k}$-axis if $\mathbf{H} = 2\omega_0 \mathbf{k}$.
The above remarks imply that a homogeneous sphere electrically charged uniformly with a unit charge and freely pivoting about its center in a constant magnetic field $\mathbf{H}$ directed along the $\mathbf{k}$-axis undergoes the same motion as it would follow if not subject to the magnetic field but seen in a noninertial reference frame rotating at constant angular velocity $\omega_0$ around the $\mathbf{k}$-axis if $H$ and $\omega_0$ are related by $H = 2\omega_0$: in this frame, the Coriolis force is interpreted as a magnetic field.
This holds, however, only if the centrifugal force has zero moment with respect to the center: true in the spherical symmetry case only. In spherically nonsymmetric cases, the centrifugal forces have in general nonzero moment, so the equivalence between Coriolis force and the Lorentz force is only approximate.
The Larmor theorem makes this more precise. It gives a quantitative estimate of the difference between the motion of a general system of particles of mass $m$ in a magnetic field and the motion of the same particles in a rotating frame of reference but in the absence of a magnetic field. The approximation is estimated in terms of the size of the Larmor frequency $eH/2mc$, which should be small compared to the other characteristic frequencies of the motion of the system: the physical meaning is that the centrifugal force should be small compared to the other forces.
The vector potential $\mathbf{A}$ for a constant magnetic field in the $\mathbf{k}$-direction, $\mathbf{H} = 2\omega_0 \mathbf{k}$, is $\mathbf{A} = \omega_0 \mathbf{k} \wedge \boldsymbol{\varrho} \equiv \omega_0 \boldsymbol{\varrho}^{\perp}$ (i.e., $\mathbf{A} = \tfrac{1}{2}\mathbf{H} \wedge \boldsymbol{\varrho}$, in agreement with the term $\omega_0 \boldsymbol{\varrho}^{\perp}$ in [95]). Therefore, from the treatment of the Coriolis force in the section “Three-body problem” (see [95]), the motion of a charge $e$ with mass $m$ in a magnetic field $\mathbf{H}$ with vector potential $\mathbf{A}$ and subject to other forces with potential $W$ can be described, in an inertial frame and in generic units, in which the speed of light is $c$, by a Hamiltonian

$$H = \frac{1}{2m} \left( \mathbf{p} - \frac{e}{c}\mathbf{A} \right)^2 + W(\boldsymbol{\varrho}) \qquad [110]$$

where $\mathbf{p} = m\dot{\boldsymbol{\varrho}} + (e/c)\mathbf{A}$ and $\boldsymbol{\varrho}$ are canonically conjugate variables.

Further Reading

Arnol’d VI (1989) Mathematical Methods of Classical Mechanics. Berlin: Springer.
Calogero F and Degasperis A (1982) Spectral Transform and Solitons. Amsterdam: North-Holland.
Chierchia L and Valdinoci E (2000) A note on the construction of Hamiltonian trajectories along heteroclinic chains. Forum Mathematicum 12: 247–255.
Fassò F (1998) Quasi-periodicity of motions and complete integrability of Hamiltonian systems. Ergodic Theory and Dynamical Systems 18: 1349–1362.
Gallavotti G (1983) The Elements of Mechanics. New York: Springer.
Gallavotti G, Bonetto F, and Gentile G (2004) Aspects of the Ergodic, Qualitative and Statistical Properties of Motion. Berlin: Springer.
Kolmogorov N (1954) On the preservation of conditionally periodic motions. Doklady Akademia Nauk SSSR 96: 527–530.
Landau LD and Lifshitz EM (1976) Mechanics. New York: Pergamon Press.
Levi-Civita T (1956) Opere Matematiche. Accademia Nazionale dei Lincei. Bologna: Zanichelli.
Moser J (1962) On invariant curves of an area preserving mapping of the annulus. Nachrichten Akademie Wissenschaften Göttingen 11: 1–20.
Nekhorossev V (1977) An exponential estimate of the time of stability of nearly integrable Hamiltonian systems. Russian Mathematical Surveys 32(6): 1–65.
Poincaré H (1987) Méthodes nouvelles de la mécanique céleste vol. I. Paris: Gauthier-Villars. (reprinted by Gabay, Paris, 1987).

Introductory Article: Differential Geometry


S Paycha, Université Blaise Pascal, Aubière, France
© 2006 Elsevier Ltd. All rights reserved.

Differential geometry is the study of differential properties of geometric objects such as curves, surfaces and higher-dimensional manifolds endowed with additional structures such as metrics and connections. One of the main ideas of differential geometry is to apply the tools of analysis to investigate geometric problems; in particular, it studies their “infinitesimal parts,” thereby linearizing the problem. However, historically, geometric concepts often anticipated the analytic tools required to define them from a differential geometric point of view; the notion of tangent to a curve, for example, arose well before the notion of derivative.

In its barely more than two centuries of existence, differential geometry has always had strong (often two-way) interactions with physics. Just to name a few examples, the theory of curves is used in kinematics, symplectic manifolds arise in Hamiltonian mechanics, pseudo-Riemannian manifolds in general relativity, spinors in quantum mechanics, Lie groups and principal bundles in gauge theory, and infinite-dimensional manifolds in the path-integral approach to quantum field theory.

Curves and Surfaces

The study of differential properties of curves and surfaces resulted from a combination of the coordinate method (or analytic geometry) developed by Descartes and Fermat during the first half of the seventeenth century and infinitesimal calculus developed by Leibniz and Newton during the second half of the seventeenth and beginning of the eighteenth century.

Differential geometry appeared later in the eighteenth century with the works of Euler, Recherches sur la courbure des surfaces (1760) (Investigations on the curvature of surfaces), and Monge, Une application de l’analyse à la géométrie (1795) (An application of analysis to geometry). Until Gauss’ fundamental article Disquisitiones generales circa superficies curvas (General investigations of curved surfaces) published in Latin in 1827 (of which one can find a partial translation to English in Spivak (1979)), surfaces embedded in $\mathbb{R}^3$ were either described by an equation, W(x, y, z) = 0, or by expressing one variable in terms of the others. Although Euler had already noticed that the coordinates of a point on a surface could be expressed as functions of two independent variables, it was Gauss who first made a systematic use of such a parametric representation, thereby initiating the concept of “local chart” which underlies differential geometry.

Differentiable Manifolds

The actual notion of n-manifold independent of a particular embedding in a Euclidean space goes back to a lecture Über die Hypothesen, welche der Geometrie zu Grunde liegen (On the hypotheses which lie at the foundations of geometry) (of which one can find a translation to English and comments in Spivak (1979)) delivered by Riemann at Göttingen University in 1854, in which he makes clear the fact that n-manifolds are locally like n-dimensional Euclidean space. In his work, Riemann mentions the existence of infinite-dimensional manifolds, such as function spaces, which today play an important role since they naturally arise as configuration spaces in quantum field theories.

In modern language a differentiable manifold modeled on a topological vector space V (which can be
finite dimensional, Fréchet, Banach, or Hilbert for example) is a topological space M equipped with a family of local coordinate charts $(U_i, \phi_i)_{i \in I}$ such that the open subsets $U_i \subset M$ cover M and where $\phi_i : U_i \to V$, $i \in I$, are homeomorphisms which give rise to smooth transition maps $\phi_i \circ \phi_j^{-1} : \phi_j(U_i \cap U_j) \to \phi_i(U_i \cap U_j)$. An n-dimensional differentiable manifold is a differentiable manifold modeled on $\mathbb{R}^n$. The sphere $S^{n-1} := \{(x_1, \dots, x_n) \in \mathbb{R}^n, \ \sum_{i=1}^n x_i^2 = 1\}$ is a differentiable manifold of dimension n − 1.

Simple differentiable curves in $\mathbb{R}^n$ are one-dimensional differentiable manifolds locally specified by coordinates $x(t) = (x_1(t), \dots, x_n(t)) \in \mathbb{R}^n$, where $t \mapsto x_j(t)$ is of class $C^k$. The tangent at point $x(t_0)$ to such a curve, which is a straight line passing through this point with direction given by the vector $x'(t_0)$, generalizes to the concept of tangent space $T_m M$ at point $m \in M$ of a smooth manifold M modeled on V, which is a vector space isomorphic to V spanned by tangent vectors at point m to curves $\gamma(t)$ of class $C^1$ on M such that $\gamma(t_0) = m$.

In order to make this more precise, one needs the notion of differentiable mapping. Given two differentiable manifolds M and N, a mapping $f : M \to N$ is differentiable at point m if, for every chart $(U, \phi)$ of M containing m and every chart $(V, \psi)$ of N such that $f(U) \subset V$, the mapping $\psi \circ f \circ \phi^{-1} : \phi(U) \to \psi(V)$ is differentiable at point $\phi(m)$. In particular, differentiable mappings $f : M \to \mathbb{R}$ form the algebra $C^\infty(M, \mathbb{R})$ of smooth real-valued functions on M. Differentiable mappings $\gamma : [a, b] \to M$ from an interval $[a, b] \subset \mathbb{R}$ to a differentiable manifold M are called “differentiable curves” on M. A differentiable mapping $f : M \to N$ which is invertible and with differentiable inverse $f^{-1} : N \to M$ is called a diffeomorphism.

The derivative of a function $f \in C^\infty(M, \mathbb{R})$ along a curve $\gamma : [a, b] \to M$ at point $\gamma(t_0) \in M$ with $t_0 \in [a, b]$ is given by

$$Xf := \frac{d}{dt}\Big|_{t = t_0} f \circ \gamma(t)$$

and the map $f \mapsto Xf$ is called the tangent vector to the curve $\gamma$ at point $\gamma(t_0)$. Tangent vectors to some curve $\gamma : [a, b] \to M$ at a given point $m \in \gamma([a, b])$ form a vector space $T_m M$ called the “tangent space” to M at point m.

A (smooth) map which, to a point $m \in M$, assigns a tangent vector $X \in T_m M$ is called a (smooth) vector field. It can also be seen as a derivation $\tilde{X} : f \mapsto \tilde{X}f$ on $C^\infty(M, \mathbb{R})$ defined by $(\tilde{X}f)(m) := X(m)f$ for any $m \in M$, and the bracket of vector fields is thereby defined from the operator bracket $\widetilde{[X, Y]} := \tilde{X} \circ \tilde{Y} - \tilde{Y} \circ \tilde{X}$. The linear operations on tangent vectors carry over to vector fields: $(X + Y)(m) := X(m) + Y(m)$, $(\lambda X)(m) := \lambda X(m)$ for any $m \in M$ and for any $X, Y \in T_m M$, $\lambda \in \mathbb{R}$, so that vector fields on M form a linear space.

One can generate tangent vectors to M via local one-parameter groups of differentiable transformations of M, that is, mappings $(t, m) \mapsto \phi_t(m)$ from $]-\epsilon, \epsilon[ \times U$ to M (with $\epsilon > 0$ and $U \subset M$ an open subset of M) such that $\phi_0 = \mathrm{Id}$, $\phi_{t+s} = \phi_t \circ \phi_s$ $\forall s, t \in ]-\epsilon, \epsilon[$ with $t + s \in ]-\epsilon, \epsilon[$ and $m \mapsto \phi_t(m)$ is a diffeomorphism of U onto an open subset $\phi_t(U)$. The tangent vector at t = 0 to the curve $\gamma(t) = \phi_t(m)$ yields a tangent vector to M at point $m = \gamma(0)$. Conversely, when M is finite dimensional, the fundamental theorem for systems of ordinary differential equations yields, for any vector field X on M, the existence (around any point $m \in M$) of a local one-parameter group of local transformations $\phi : ]-\epsilon, \epsilon[ \times U \to M$ (with U an open subset containing m) which induces the tangent vector $X(m) \in T_m M$.

A differentiable mapping $\phi : M \to N$ induces a map $\phi_*(m) : T_m M \to T_{\phi(m)} N$ defined by $\phi_* X f = X(f \circ \phi)$. An “immersion” of a manifold M in a manifold N is a differentiable mapping $\phi : M \to N$ such that the maps $\phi_*(m)$ are injective at any point $m \in M$. Such a map is an embedding if it is moreover injective, in which case $\phi(M) \subset N$ is a submanifold of N. The unit sphere $S^n$ is a submanifold of $\mathbb{R}^{n+1}$. Whitney showed that every smooth real n-dimensional manifold can be embedded in $\mathbb{R}^{2n+1}$.

A differentiable manifold whose coordinate charts take values in a complex vector space V and whose transition maps are holomorphic is called a complex manifold, which is complex n-dimensional if $V = \mathbb{C}^n$. The complex projective space $\mathbb{CP}^n$, the set of complex straight lines through 0 in $\mathbb{C}^{n+1}$, is a compact complex manifold of dimension n. Similarly to the notion of differentiable mapping between differentiable manifolds, we have the notion of holomorphic mapping between complex manifolds.

A smooth family $m \mapsto J_m$ of endomorphisms of the tangent spaces $T_m M$ to a differentiable manifold M such that $J_m^2 = -\mathrm{Id}$ gives rise to an almost-complex manifold. The prototype is the almost-complex structure on $\mathbb{C}^n$ defined by $J(\partial_{x_i}) = \partial_{y_i}$; $J(\partial_{y_i}) = -\partial_{x_i}$ with $z = (x_1 + iy_1, \dots, x_n + iy_n) \in \mathbb{C}^n$, which can be transferred to a complex manifold M by means of local charts. An almost-complex structure J on a manifold M is called complex if M is the underlying differentiable manifold of a complex manifold which induces J in this way.

Studying smooth functions on a differentiable manifold can provide information on the topology of the manifold: for example, the behavior of a smooth function on a compact manifold at its critical points is strongly restricted by the topological properties of the manifold. This leads to the Morse
critical point theory which extends to infinite-dimensional manifolds and, among other consequences, leads to conclusions on extremals or closed extremals of variational problems. Rather than privileging points on a manifold, one can study instead the geometry of manifolds from the point of view of spaces of functions, which leads to an algebraic approach to differential geometry. The initial concept there is a commutative ring (which becomes a possibly noncommutative algebra in the framework of noncommutative geometry), namely the ring of smooth functions on the manifold, while the manifold itself is defined in terms of the ring as the space of maximal ideals. In particular, this point of view proves to be fruitful to understand supermanifolds, a generalization of manifolds which is important for supersymmetric field theories.

One can further consider the sheaf of smooth functions on an open subset of the manifold; this point of view leads to sheaf theory which provides a unified approach to establishing connections between local and global properties of topological spaces.

Metric Properties

Riemann focused on the metric properties of manifolds but the first clear formulation of the concept of a manifold equipped with a metric was given by Weyl in Die Idee der Riemannschen Fläche. A Riemannian metric on a differentiable manifold M is a positive-definite scalar product $g_m$ on $T_m M$ for every point $m \in M$ depending smoothly on the point m. A manifold equipped with a Riemannian metric is called a Riemannian manifold. A Weyl transformation, which is multiplying the metric by a smooth positive function, yields a new Riemannian metric with the same angle measurement as the original one, and hence leaves the “conformal” structure on M unchanged.

Riemann also suggested considering metrics on the tangent spaces that are not induced from scalar products; metrics on the manifold built this way were first systematically investigated by Finsler and are therefore called Finsler metrics. Geodesics on a Riemannian manifold M, which correspond to smooth curves $\gamma : [a, b] \to M$ that minimize the length functional

$$L(\gamma) := \frac{1}{2} \int_a^b \sqrt{g_{\gamma(t)}\left(\frac{d\gamma}{dt}, \frac{d\gamma}{dt}\right)} \, dt$$

then generalize straight lines: they are curves which realize the shortest distance between two points chosen sufficiently close.

Euclid’s axioms which naturally lead to Riemannian geometry are also satisfied up to the axiom of parallelism by a geometry developed by Lobachevsky in 1829 and Bolyai in 1832. Non-Euclidean geometries actually played a major role in the development of differential geometry and Lobachevsky’s work inspired Riemann and later Klein.

Dropping the positivity assumption for the bilinear forms $g_m$ on $T_m M$ leads to Lorentzian manifolds which are (n + 1)-dimensional smooth manifolds equipped with bilinear forms on the tangent spaces with signature (1, n). These occur in general relativity and tangent vectors with negative, positive, or vanishing squared length are called timelike, spacelike, and lightlike, respectively.

Just as complex vector spaces can be equipped with positive-definite Hermitian products, a complex manifold M can come equipped with a Hermitian metric, namely a positive-definite Hermitian product $h_m$ on $T_m M$ for every point $m \in M$ depending smoothly on the point m; every Hermitian metric induces a Riemannian one given by its real part. The complex projective space $\mathbb{CP}^n$ comes naturally equipped with the Fubini–Study Hermitian metric.

Transformation Groups

Metric properties can be seen from the point of view of transformation groups. Poncelet in his Traité des propriétés projectives des figures (1822) had investigated classical Euclidean geometry from a projective geometric point of view, but it was not until Cayley (1858) that metric properties were interpreted as those stable under any “projective” transformation which leaves “cyclic points” (points at infinity on the imaginary axis of the complex plane) invariant. Transformation groups were further investigated by Lie, leading to the modern concept of Lie group, a smooth manifold endowed with a group structure such that the group operations are smooth.

A vector field X on a Lie group G is called left- (resp. right-) invariant if it is invariant under left translations $L_g : h \mapsto gh$ (resp. right translations $R_g : h \mapsto hg$) for every $g \in G$, that is, if $(L_g)_* X(h) = X(gh)$ $\forall (g, h) \in G^2$ (resp. $(R_g)_* X(h) = X(hg)$ $\forall (g, h) \in G^2$). The set of all left-invariant vector fields equipped with the sum, scalar multiplication, and the bracket operation on vector fields forms an algebra called the Lie algebra of G.

The group $Gl_n(\mathbb{R})$ (resp. $Gl_n(\mathbb{C})$) of all real (resp. complex) invertible $n \times n$ matrices is a Lie group with Lie algebra the algebra $gl_n(\mathbb{R})$ (resp. $gl_n(\mathbb{C})$) of all real (resp. complex) $n \times n$ matrices, and the bracket operation reads $[A, B] = AB - BA$.

The orthogonal (resp. unitary) group $O_n(\mathbb{R}) := \{A \in Gl_n(\mathbb{R}), \ A^t A = 1\}$, where $A^t$ denotes the transposed matrix (resp. $U_n(\mathbb{C}) := \{A \in Gl_n(\mathbb{C}), \ A^* A = 1\}$, where $A^* = \bar{A}^t$), is a compact Lie group with Lie
algebra $o_n(\mathbb{R}) := \{A \in gl_n(\mathbb{R}), \ A^t = -A\}$ (resp. $u_n(\mathbb{C}) := \{A \in gl_n(\mathbb{C}), \ A^* = -A\}$).

A left-invariant vector field X on a finite-dimensional Lie group G (or equivalently an element X of the Lie algebra of G) generates a global one-parameter group of transformations $\phi_X(t)$, $t \in \mathbb{R}$. The mapping from the Lie algebra of G into G defined by $\exp(X) := \phi_X(1)$ is called the exponential mapping. The exponential mapping on $Gl_n(\mathbb{R})$ (resp. $Gl_n(\mathbb{C})$) is given by the series $\exp(A) = \sum_{i=0}^{\infty} A^i / i!$.

As symmetry groups of physical systems, Lie groups play an important role in physics, in particular in quantum mechanics and Yang–Mills theory. Infinite-dimensional Lie groups arise as symmetry groups, such as the group of diffeomorphisms of a manifold in general relativity, the group of gauge transformations in Yang–Mills theory, and the group of Weyl transformations of metrics on a surface in string theory. The principle “the physics should not depend on how it is described” translates to an invariance under the action of the (possibly infinite-dimensional) group of symmetries of the theory. Anomalies arise when such an invariance holds for the classical action of a physical theory but “breaks” at the quantized level.

In his Erlangen program (1872), Klein puts the concept of transformation group in the foreground, introducing a novel idea by which one should consider a space endowed with some properties as a set of objects invariant under a given group of transformations. One thereby reaches a classification of geometric results according to which group is relevant in a particular problem as, for example, the projective linear group for projective geometry, the orthogonal group for Riemannian geometry, or the symplectic group for “symplectic” geometry.

Fiber Bundles

Transformation groups give rise to principal fiber bundles which play a major role in Yang–Mills theory. The notion of fiber bundle first arose out of questions posed in the 1930s on the topology and the geometry of manifolds, and by 1950 the definition of fiber bundle had been clearly formulated by Steenrod.

A smooth fiber bundle with typical fiber a manifold F is a triple $(E, \pi, B)$, where E and B are smooth manifolds called the total space and the base space, and $\pi : E \to B$ is a smooth surjective map called the projection of the bundle such that the preimage $\pi^{-1}(b)$ of a point $b \in B$, called the fiber of the bundle over b, is isomorphic to F and any base point b has a neighborhood $U \subset B$ with preimage $\pi^{-1}(U)$ diffeomorphic to $U \times F$, where the diffeomorphisms commute with the projection on the base space. Smooth sections of E are maps $\sigma : B \to E$ such that $\pi \circ \sigma = I_B$.

When F is a vector space and when, given open subsets $U_i \subset B$ that cover B with corresponding coordinate charts $(U_i, \phi_i)_{i \in I}$, the local diffeomorphisms $\Phi_i : \pi^{-1}(U_i) \simeq \phi_i(U_i) \times F$ give rise to transition maps $\Phi_i \circ \Phi_j^{-1} : \phi_j(U_i \cap U_j) \times F \to \phi_i(U_i \cap U_j) \times F$ that are linear in the fiber, the bundle is called a “vector bundle.” The tangent bundle $TM = \bigcup_{m \in M} T_m M$ to a differentiable manifold M modeled on a vector space V is a vector bundle with typical fiber V and transition maps $\tau_{ij} = (\phi_i \circ \phi_j^{-1}, \ d(\phi_i \circ \phi_j^{-1}))$ expressed in terms of the differentials of the transition maps on the manifold M. So are the cotangent bundle, the dual of the tangent bundle, and tensor products of the tangent and cotangent vector bundles with typical fiber the dual $V^*$ and tensor products of V and $V^*$. Vector fields defined previously are sections of the tangent bundle, 1-forms on M are sections of the cotangent bundle, and contravariant tensors, resp. covariant tensors, are sections of tensor products of the tangent, resp. cotangent bundles. A differentiable mapping $\phi : M \to N$ takes covariant p-tensor fields on N to their pullbacks by $\phi$, covariant p-tensors on M given by

$$(\phi^* T)(X_1, \dots, X_p) := T(\phi_* X_1, \dots, \phi_* X_p)$$

for any vector fields $X_1, \dots, X_p$ on M.

Differentiating a smooth function f on M gives rise to a 1-form df on M. More generally, exterior p-forms are antisymmetric smooth covariant p-tensors so that $\omega(X_{\sigma(1)}, \dots, X_{\sigma(p)}) = \epsilon(\sigma)\, \omega(X_1, \dots, X_p)$ for any vector fields $X_1, \dots, X_p$ on M and any permutation $\sigma \in \Sigma_p$ with signature $\epsilon(\sigma)$.

Riemannian metrics are covariant 2-tensors and the space of Riemannian metrics on a manifold M is an infinite-dimensional manifold which arises as a configuration space in string theory and general relativity.

A principal bundle is a fiber bundle $(P, \pi, B)$ with typical fiber a Lie group G acting freely and properly on the total space P via a right action $(p, g) \in P \times G \mapsto pg = R_g(p) \in P$ and such that the local diffeomorphisms $\pi^{-1}(U) \simeq U \times G$ are G-equivariant. Given a principal fiber bundle $(P, \pi, B)$ with structure group a finite-dimensional Lie group G, the action of G on P induces a homomorphism which to an element X of the Lie algebra of G assigns a vector field $X^*$ on P called the “fundamental vector field” generated by X. It is defined at $p \in P$ by

$$X^*(p) := \frac{d}{dt}\Big|_{t = 0} R_{\exp(tX)}(p)$$

where exp is the exponential map on G.
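As a small illustration of the exponential map that enters this definition, the following sketch sums the matrix series exp(A) = Σ A^i / i! in pure Python and evaluates the one-parameter group t ↦ exp(tX) for a skew-symmetric 2 × 2 generator. The helper names (mat_mul, mat_exp, one_parameter_group) and the truncation at 30 terms are assumptions made for this example only; they do not come from the article.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=30):
    """Truncated series exp(A) = sum over i < terms of A^i / i!."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds A^i / i!, starting with i = 0
    for i in range(1, terms):
        term = mat_mul(term, a)
        term = [[x / i for x in row] for row in term]
        result = [[result[r][c] + term[r][c] for c in range(n)]
                  for r in range(n)]
    return result

def one_parameter_group(x, t):
    """exp(tX) for a matrix Lie algebra element X."""
    return mat_exp([[t * entry for entry in row] for row in x])

# Skew-symmetric generator: an element of the Lie algebra of O_2(R).
X = [[0.0, -1.0], [1.0, 0.0]]
g = one_parameter_group(X, math.pi / 3)
```

For this generator, exp(tX) is the rotation by the angle t, so g approximates the rotation by π/3, and the group law exp((s + t)X) = exp(sX) exp(tX) can be checked numerically with mat_mul.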
Given an action of G on a vector space V, one builds from a principal bundle with typical fiber G an associated vector bundle with typical fiber V. Principal bundles are essential in gauge theory; U(1)-principal bundles arise in electromagnetism and nonabelian structure groups arise in Yang–Mills theory. There the fields are connections on the principal bundle, and the action of gauge transformations on (irreducible) connections gives rise to an infinite-dimensional principal bundle over the moduli space with structure group given by gauge transformations. Infinite-dimensional bundles arise in other field theories such as string theory where the moduli space corresponds to inequivalent complex structures on a Riemann surface and the infinite-dimensional structure group is built up from Weyl transformations of the metric and diffeomorphisms of the surface.

Connections

On a manifold there is no canonical method to identify tangent spaces at different points. Such an identification, which is needed in order to differentiate vector fields, can be achieved on a Riemannian manifold via “parallel transport” of the vector fields. The basic concepts of the theory of covariant differentiation on a Riemannian manifold were given at the end of the nineteenth century by Ricci and, in a more complete form, in 1901 in collaboration with Levi-Civita in Méthodes de calcul différentiel absolu et leurs applications; on a Riemannian manifold, it is possible to define in a canonical manner a parallel displacement of tangent vectors and thereby to differentiate vector fields covariantly using what has since been called the Levi-Civita connection.

More generally, a (linear) connection (or equivalently a covariant derivation) on a vector bundle E over a manifold M provides a way to identify fibers of the vector bundle at different points; it is a map $\nabla$ taking sections $\sigma$ of E to E-valued 1-forms on M which satisfies a Leibniz rule, $\nabla(f\sigma) = df \otimes \sigma + f \nabla\sigma$, for any smooth function f on M. When E is the tangent bundle over M, curves $\gamma$ on the manifold with covariantly constant velocity $\nabla \dot{\gamma}(t) = 0$ give rise to geodesics. Given an initial velocity $\dot{\gamma}(0) = X \in T_m M$ and provided X has small enough norm, $\gamma_X(1)$ defines a point on the corresponding geodesic and the map $\exp : X \mapsto \gamma_X(1)$ is a diffeomorphism from a neighborhood of 0 in $T_m M$ to a neighborhood of $m \in M$ called the “exponential map” of $\nabla$.

The concept of connection extends to principal bundles where it was developed by Ehresmann building on the work of Cartan. A connection on a principal bundle $(P, \pi, B)$ with structure group G, which is a smooth equivariant (under the action of the group G) decomposition of the tangent space $T_p P = H_p P \oplus V_p P$ at each point p into a horizontal space $H_p P$ and the vertical space $V_p P = \mathrm{Ker}\, d\pi_p$, gives rise to a linear connection on the associated vector bundle.

A connection on P gives rise to a 1-form $\omega$ on P with values in the Lie algebra of the structure group G called the connection 1-form and defined as follows. For each $X \in T_p P$, $\omega(X)$ is the unique element U of the Lie algebra of G such that the corresponding fundamental vector field $U^*(p)$ at point p coincides with the vertical component of X. In particular, $\omega(U^*) = U$ for any element U of the Lie algebra of G.

The space of connections, which is an infinite-dimensional manifold, arises as a configuration space in Yang–Mills theory and also comes into play in the Seiberg–Witten theory.

Geometric Differential Operators

From connections one defines a number of differential operators on a Riemannian manifold, among them second-order Laplacians. In particular, the Laplace–Beltrami operator $f \mapsto -\mathrm{tr}(\nabla^{T^*M} df)$ on smooth functions, where $\nabla^{T^*M}$ is the connection on the cotangent bundle induced by the Levi-Civita connection on M, generalizes the ordinary Laplace operator on Euclidean space. This in turn generalizes to second-order operators $\Delta^E := -\mathrm{tr}(\nabla^{T^*M \otimes E} \nabla^E)$ acting on smooth sections of a vector bundle E over a Riemannian manifold M, where $\nabla^E$ is a connection on E and $\nabla^{T^*M \otimes E}$ the connection on $T^*M \otimes E$ induced by $\nabla^E$ and the Levi-Civita connection on M.

The Dirac operator on a spin Riemannian manifold, a first-order differential operator whose square coincides with the Laplace–Beltrami operator up to zeroth-order terms, can be best understood going back to the initial idea of Dirac. A first-order differential operator with constant matrix coefficients $\sum_{i=1}^n \gamma_i \, (\partial / \partial x_i)$ has square given by the Laplace operator $-\sum_{i=1}^n \partial^2 / \partial x_i^2$ on $\mathbb{R}^n$ if and only if its coefficients satisfy the Clifford relations

$$\gamma_i^2 = -1 \quad \forall i = 1, \dots, n$$
$$\gamma_i \gamma_j + \gamma_j \gamma_i = 0 \quad \forall i \neq j$$

The resulting Clifford algebra, once complexified, is isomorphic in even dimensions n = 2k to the space $\mathrm{End}(S_n)$ (and $\mathrm{End}(S_n) \oplus \mathrm{End}(S_n)$ in odd dimensions n = 2k + 1) of endomorphisms of the space $S_n = \mathbb{C}^{2^k}$ of complex n-spinors. When instead of the canonical metric on $\mathbb{R}^n$ one starts from the metric on the tangent bundle TM induced by the Riemannian
metric on M and provided the corresponding spinor spaces patch up to a “spinor bundle” over M, M is called a spin manifold. The Dirac operator on a spin Riemannian manifold M is a first-order differential operator acting on spinors given by $D_g = \sum_{i=1}^n \gamma_i \nabla_{e_i}$, where $\nabla$ is the connection on spinors (sections of the spinor bundle S) induced by the Levi-Civita connection and $e_1, \dots, e_n$ is an orthonormal frame of the tangent bundle TM. This is a particular case of more general twisted Dirac operators $D_g^W$ on a twisted spinor bundle $S \otimes W$ equipped with the connection $\nabla^{S \otimes W}$ which combines the connection $\nabla$ with a connection $\nabla^W$ on an auxiliary vector bundle W. Their square $(D_g^W)^2$ relates to the Laplacian $\Delta^{S \otimes W}$ built from this twisted connection via the Lichnerowicz formula, which is useful for estimates on the spectrum of the Dirac operator in terms of the underlying geometric data.

When there is no spin structure on M, one can still hope for a $\mathrm{Spin}^c$ structure and a Dirac operator $D^c$ associated with a connection compatible with that structure. In particular, every compact orientable 4-manifold can be equipped with a $\mathrm{Spin}^c$ structure and one can build invariants of the differentiable manifold called Seiberg–Witten invariants from solutions of a system of two partial differential equations, one of which is the Dirac equation $D^c \psi = 0$ associated with a connection compatible with the $\mathrm{Spin}^c$ structure and the other a nonlinear equation involving the curvature.

Curvature

The concept of “curvature,” which is now understood in terms of connections (the curvature of a connection $\nabla$ is defined by $\Omega = \nabla^2$), historically arose prior to that of connection. In its modern form, the concept of curvature dates back to Gauss. Using a spherical representation of surfaces – the Gauss map $\nu$, which sends a point m of an oriented surface $\Sigma \subset \mathbb{R}^3$ to the outward pointing unit normal vector $\nu_m$ – Gauss defined what has since been called the Gaussian curvature $K_m$ at point $m \in U \subset \Sigma$ as the limit when the area of U tends to zero of the ratio $\mathrm{area}(\nu(U)) / \mathrm{area}(U)$. It measures the obstruction to finding a distance-preserving map from a piece of the surface around m to a region in the standard plane. Gauss’ Theorema Egregium says that the Gaussian curvature of a smooth surface in $\mathbb{R}^3$ is defined in terms of the metric on the surface so that it agrees for two isometric surfaces.

From the curvature $\Omega$ of a connection on a Riemannian manifold (M, g), one builds the Riemannian curvature tensor, a 4-tensor which in local coordinates reads

$$R_{ijkl} := g\left(\Omega\left(\frac{\partial}{\partial_i}, \frac{\partial}{\partial_j}\right) \frac{\partial}{\partial_k}, \frac{\partial}{\partial_l}\right);$$

further taking a partial trace leads to the Ricci curvature given by the 2-tensor $\mathrm{Ric}_{ij} = \sum_k R_{ikjk}$, the trace of which gives in turn the scalar curvature $R = \sum_i \mathrm{Ric}_{ii}$. Sectional curvature at a point m in the direction of a two-dimensional plane spanned by two vectors U and V corresponds to $K(U, V) = g(\Omega(U, V)V, U)$. A manifold has constant sectional curvature whenever $K(U, V) / \|U \wedge V\|^2$ is a constant K for all linearly independent vectors U, V. A Riemannian manifold with constant sectional curvature is said to be of spherical, flat, or hyperbolic type depending on whether K > 0, K = 0, or K < 0, respectively. One owes to Cartan the discovery of an important class of Riemannian manifolds, symmetric spaces, which contains the spheres, the Euclidean spaces, the hyperbolic spaces, and compact Lie groups. A connected Riemannian manifold M equipped at every point m with an isometry $\sigma_m$ such that $\sigma_m(m) = m$ and the tangent map $T_m \sigma_m$ equals $-\mathrm{Id}$ on the tangent space (it therefore reverses the geodesics through m) is called symmetric. $\mathbb{CP}^n$ equipped with the Fubini–Study metric is a symmetric space with the isometry given by the reflection with respect to a line in $\mathbb{C}^{n+1}$. A compact symmetric space has non-negative sectional curvature K.

Constraints on the curvature can have topological consequences. Spheres are the only simply connected manifolds with constant positive sectional curvature; if a simply connected complete Riemannian manifold of dimension >1 has non-positive sectional curvature along every plane, then it is homeomorphic to the Euclidean space.

A manifold with Ricci curvature tensor proportional to the metric tensor is called an Einstein manifold. Since Einstein, curvature is a cornerstone of general relativity with gravitational force being interpreted in terms of curvature. For example, the vacuum Einstein equation reads $\mathrm{Ric}_g = (1/2) R_g \, g$ with $\mathrm{Ric}_g$ the Ricci curvature of a metric g and $R_g$ its scalar curvature. In addition, Kaluza–Klein supergravity is a unified theory modeled on a direct product of the Minkowski four-dimensional space and an Einstein manifold with positive scalar curvature.

The Ricci flow $dg(t)/dt = -2\, \mathrm{Ric}_{g(t)}$, which is related to the Einstein equation in general relativity, was only fairly recently introduced in the mathematical literature. Hopes are strong to get a classification of closed 3-manifolds using the Ricci flow as an essential ingredient.
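A standard worked case may help make the Ricci flow concrete. On the round n-sphere of radius r the metric is $g = r^2 g_0$ with $g_0$ the unit round metric, and $\mathrm{Ric}_g = ((n-1)/r^2)\, g$, so with the usual sign convention $dg/dt = -2\,\mathrm{Ric}_g$ the flow reduces to $d(r^2)/dt = -2(n-1)$. The sketch below, whose function names are chosen for this illustration only, compares the closed-form solution with a forward Euler integration of that scalar equation.

```python
def ricci_flow_radius_squared(r0, n, t):
    """Closed form r(t)^2 = r0^2 - 2 (n - 1) t on the round n-sphere."""
    return r0 ** 2 - 2 * (n - 1) * t

def extinction_time(r0, n):
    """Time at which the round n-sphere of initial radius r0 shrinks to a point."""
    return r0 ** 2 / (2 * (n - 1))

def scalar_curvature(r_squared, n):
    """Scalar curvature n (n - 1) / r^2 of the round n-sphere."""
    return n * (n - 1) / r_squared

def euler_radius_squared(r0, n, t, steps=1000):
    """Forward Euler for dg/dt = -2 Ric with g = r^2 g_0 (valid before extinction)."""
    r2 = r0 ** 2
    dt = t / steps
    for _ in range(steps):
        ricci_coefficient = (n - 1) / r2        # Ric_g = ricci_coefficient * g
        r2 += dt * (-2 * ricci_coefficient * r2)  # d(r^2)/dt = -2 * coefficient * r^2
    return r2
```

For instance, a 3-sphere of initial radius 2 has extinction time 1, and its scalar curvature n(n − 1)/r² grows without bound as t approaches that time, which is the simplest example of the singularities the flow can develop.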
Cohomology

Differentiation of functions $f \mapsto df$ on a differentiable manifold M generalizes to exterior differentiation $\alpha \mapsto d\alpha$ of differential forms. A form $\alpha$ is closed whenever it is in the kernel of d and it is exact whenever it lies in the range of d. Since $d^2 = 0$, exact forms are closed.

Cartan’s structure equations $d\omega = -(1/2)[\omega, \omega] + \Omega$ relate the exterior differential of the connection 1-form $\omega$ on a principal bundle to its curvature $\Omega$ given by the exterior covariant derivative $D\omega := d\omega \circ h$, where $h : T_p P \to H_p P$ is the projection onto the horizontal space.

On a complex manifold, forms split into sums of (p, q)-forms, those with p-holomorphic and q-antiholomorphic components, and exterior differentiation splits as $d = \partial + \bar{\partial}$ into holomorphic and antiholomorphic derivatives, with $\partial^2 = \bar{\partial}^2 = 0$.

Geometric data are often expressed in terms of closedness conditions on certain differential forms. For example, a “symplectic manifold” is a manifold M equipped with a closed nondegenerate differential 2-form called the “symplectic form.” The theory of J-holomorphic curves on a manifold equipped with an almost-complex structure J has proved fruitful in building invariants on symplectic manifolds. A Kähler manifold is a complex manifold equipped with a Hermitian metric h whose imaginary part Im h yields a closed (1, 1)-form. The complex projective space $\mathbb{CP}^n$ is Kähler.

The exterior differentiation d gives rise to de Rham cohomology as Ker d/Im d, and de Rham’s theorem establishes an isomorphism between de Rham cohomology and the real singular cohomology of a manifold. Chern (or characteristic) classes are topological invariants associated to fiber bundles and play a crucial role in index theory. Chern–Weil theory builds representatives of these de Rham cohomology classes from a connection $\nabla$ of the form $\mathrm{tr}(f(\nabla^2))$, where f is some analytic function.

When the manifold is Riemannian, the Laplace–Beltrami operator on functions generalizes to differential forms in two different ways, namely to the Bochner Laplacian $\Delta^{\Lambda T^*M}$ on forms (i.e., sections of $\Lambda T^*M$), where the cotangent bundle $T^*M$ is equipped with a connection induced by the Levi-Civita connection and to the Laplace–Beltrami operator on

isomorphic to the space of harmonic (i.e., annihilated by the Laplace–Beltrami operator) differential forms. Thus, the dimension of the space of harmonic k-forms equals the kth Betti number, from which one can define the Euler characteristic $\chi(M)$ of the manifold M taking their alternating sum. Hodge theory plays an important role in mirror symmetry which posits a duality between different manifolds on the geometric side and between different field theories via their correlation functions on the physics side. Calabi–Yau manifolds, which are Ricci-flat Kähler manifolds, are studied extensively in the context of duality.

Index Theory

While the Gaussian curvature is the solution to a local problem, it has strong influence on the global topology of a surface. The Gauss–Bonnet formula (1850) relates the Euler characteristic on a closed surface to the Gaussian curvature by

$$\chi(M) = \frac{1}{2\pi} \int_M K_m \, dA_m$$

where $dA_m$ is the volume element on M. This is the first result relating curvature to global properties and can be seen as one of the starting points for index theory. It generalizes to the Chern–Gauss–Bonnet theorem (1944) on an even-dimensional closed manifold and can be interpreted as an example of the Atiyah–Singer index theorem (1963)

$$\mathrm{ind}(D_g^W) = \int_M \hat{A}(\Omega_g) \, e^{\mathrm{tr}(\Omega^W)}$$

where g denotes a Riemannian metric on a spin manifold M, $D_g^W$ a Dirac operator acting on sections of some twisted bundle $S \otimes W$ with S the spinor bundle on M and W an auxiliary vector bundle over M, $\mathrm{ind}(D_g^W)$ the “index” of the Dirac operator, $\Omega_g$, $\Omega^W$ respectively the curvatures of the Levi-Civita connection and a connection on W, and $\hat{A}(\Omega_g)$ a particular Chern form called the $\hat{A}$-genus. Index theorems are useful to compute anomalies in gauge theories arising from functional quantisation of classical actions.

Given an even-dimensional closed spin manifold (M, g) and a Hermitian vector bundle W over M, the
forms (d þ d )2 = d d þ d d , where d is the (formal) index of the associated Dirac operator DW g yields the
adjoint of the exterior differential d. These are related so-called Atiyah map K0 (M) 7! Z defined by
via Weitzenböck’s formula which in the particular case W 7! ind(DW 0
g ), where K (M) is the group of formal
of 1-forms states that the difference of those two differences of stable homotopy classes of smooth
operators is measured by the Ricci curvature. vector bundles over M. This is the starting point for
When the manifold is compact, Hodge’s theorem the noncommutative geometry approach to index
asserts that the de Rham cohomology groups are theory, in which the space of smooth functions on a
manifold, which arises here in a disguised form since $K^0(M) \simeq K_0(C^\infty(M))$ (which consists of formal differences of smooth homotopy classes of idempotents in the inductive limit of spaces of matrices $gl_n(C^\infty(M))$), is generalized to any noncommutative smooth algebra.

Further Reading

Bishop R and Crittenden R (2001) Geometry of Manifolds. Providence, RI: AMS Chelsea Publishing.
Chern SS, Chen WH, and Lam KS (2000) Lectures on Differential Geometry, Series on University Mathematics. Singapore: World Scientific.
Choquet-Bruhat Y, DeWitt-Morette C, and Dillard-Bleick M (1982) Analysis, Manifolds and Physics, 2nd edn. Amsterdam–New York: North Holland.
Gallot S, Hulin D, and Lafontaine J (1993) Riemannian Geometry, Universitext. Berlin: Springer.
Helgason S (2001) Differential Geometry, Lie Groups, and Symmetric Spaces. Graduate Studies in Mathematics 36. Providence, RI: AMS.
Husemoller D (1994) Fibre Bundles, 3rd edn. Graduate Texts in Mathematics 20. New York: Springer Verlag.
Jost J (1998) Riemannian Geometry and Geometric Analysis, Universitext. Berlin: Springer.
Klingenberg W (1995) Riemannian Geometry, 2nd edn. Berlin: de Gruyter.
Kobayashi S and Nomizu K (1996) Foundations of Differential Geometry I, II. Wiley Classics Library, a Wiley-Interscience Publication. New York: Wiley.
Lang S (1995) Differential and Riemannian Manifolds, 3rd edn. Graduate Texts in Mathematics 160. New York: Springer Verlag.
Milnor J (1997) Topology from the Differentiable Viewpoint. Princeton Landmarks in Mathematics. Princeton, NJ: Princeton University Press.
Nakahara M (2003) Geometry, Topology and Physics, 2nd edn. Bristol: Institute of Physics.
Spivak M (1979) A Comprehensive Introduction to Differential Geometry, vols. 1, 2 and 3. Wilmington, DE: Publish or Perish Inc.
Sternberg S (1983) Lectures on Differential Geometry, 2nd edn. New York: Chelsea Publishing Co.

Introductory Article: Electromagnetism


N M J Woodhouse, University of Oxford, Oxford, UK
© 2006 Springer-Verlag. Published by Elsevier Ltd. All rights reserved.

This article is adapted from Chapters 2 and 3 of Special Relativity, N M J Woodhouse, Springer-Verlag, 2002, by kind permission of the publisher.

Introduction

The modern theory of electromagnetism is built on the foundations of Maxwell’s equations:
\[ \operatorname{div} \mathbf{E} = \frac{\rho}{\epsilon_0} \qquad [1] \]
\[ \operatorname{div} \mathbf{B} = 0 \qquad [2] \]
\[ \operatorname{curl} \mathbf{B} - \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} \qquad [3] \]
\[ \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [4] \]
On the left-hand side are the electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, which are vector-valued functions of position and time. On the right are the sources, the charge density $\rho$, which is a scalar function of position and time, and the current density $\mathbf{J}$. The source terms encode the distribution and velocities of charges, and the equations, together with boundary conditions at infinity, determine the fields that they generate. From these equations, one can derive the familiar predictions of electrostatics and magnetostatics, as well as the dynamical behavior of fields and charges, in particular, the generation and propagation of electromagnetic waves – light waves.
Maxwell would not have recognized the equations in this compact vector notation – still less in the tensorial form that they take in special relativity. It is notable that although his contribution is universally acknowledged in the naming of the equations, it is rare to see references to “Maxwell’s theory.” This is for a good reason. In his early studies of electromagnetism, Maxwell worked with elaborate mechanical models, which he saw as analogies rather than as literal descriptions of the underlying physical reality. In his later work, the mechanical models, in particular the mechanical properties of the “luminiferous ether” through which light waves propagate, were put forward more literally as the foundations of his electromagnetic theory. The equations survive in the modern theory, but the mechanical models with which Maxwell, Faraday, and others wrestled live on only in the survival of archaic terminology, such as “lines of force” and “magnetic flux.” The luminiferous ether evaporated with the advent of special relativity.
Maxwell’s legacy is not his “theory,” but his equations: a consistent system of partial differential equations that describe the whole range of known interactions of electric and magnetic fields with
moving charges. They unify the treatment of electricity and magnetism by revealing for the first time the full duality between the electric and magnetic fields. They have been verified over an almost unimaginable variety of physical processes, from the propagation of light over cosmological distances, through the behavior of the magnetic fields of stars and the everyday applications in electrical engineering and laboratory experiments, down – in their quantum version – to the exchange of photons between individual electrons.
The history of Maxwell’s equations is convoluted, with many false turns. Maxwell himself wrote down an inconsistent form of the equations, with a different sign for $\rho$ in the first equation, in his 1865 work “A dynamical theory of the electromagnetic field.” The consistent form appeared later in his Treatise on Electricity and Magnetism (1873); see Chalmers (1975).
In this article, we shall not follow the historical route to the equations. Some of the complex story of the development hinted at in the remarks above can be found in the articles by Chalmers (1975), Siegel (1985), and Roche (1998). Neither shall we follow the traditional pedagogic route of many textbooks in building up to the full dynamical equations through the study of basic electrical and magnetic phenomena. Instead, we shall follow a path to Maxwell’s equations that is informed by knowledge of their most critical feature, invariance under Lorentz transformations. Maxwell, of course, knew nothing of this.
We shall start with a summary of basic facts about the behavior of charges in electric and magnetic fields, and then establish the full dynamical framework by considering this behavior as seen from moving frames of reference. It is impossible, of course, to do this consistently within the framework of classical ideas of space and time since Maxwell’s equations are inconsistent with Galilean relativity. But it is at least possible to understand some of the key features of the equations, in particular the need for the term involving the time derivative of $\mathbf{E}$, the so-called “displacement current,” in the third of Maxwell’s equations.
We shall begin with some remarks concerning the role of relativity in classical dynamics.

Relativity in Newtonian Dynamics

Newton’s laws hold in all inertial frames. The formalism of classical mechanics is invariant under Galilean transformations and it is impossible to tell by observing the dynamical behavior of particles and other bodies whether a frame of reference is at rest or in uniform motion. In the world of classical mechanics, therefore:

Principle of Relativity. There is no absolute standard of rest; only relative motion is observable.

In his “Dialogue concerning the two chief world systems,” Galileo illustrated the principle by arguing that the uniform motion of a ship on a calm sea does not affect the behavior of fish, butterflies, and other moving objects, as observed in a cabin below deck.
Relativity theory takes the principle as fundamental, as a statement about the nature of space and time as much as about the properties of the Newtonian equations of motion. But if it is to be given such universal significance, then it must apply to all of physics, and not just to Newtonian dynamics. At first this seems unproblematic – it is hard to imagine that it holds at such a basic level, but not for more complex physical interactions. Nonetheless, deep problems emerge when we try to extend it to electromagnetism since Galilean invariance conflicts with Maxwell’s equations.
All appears straightforward for systems involving slow-moving charges and slowly varying electric and magnetic fields. These are governed by laws that appear to be invariant under transformations between uniformly moving frames of reference. One can imagine a modern version of Galileo’s ship also carrying some magnets, batteries, semiconductors, and other electrical components. Salviati’s argument for relativity would seem just as compelling.
The problem arises when we include rapidly varying fields – in particular, when we consider the propagation of light. As Einstein (1905) put it, “Maxwell’s electrodynamics . . . , when applied to moving bodies, leads to asymmetries which do not appear to be inherent in the phenomena.” The central difficulty is that Maxwell’s equations give light, along with other electromagnetic waves, a definite velocity: in empty space, it travels with the same speed in every direction, independently of the motion of the source – a fact that is incompatible with Galilean invariance. Light traveling with speed $c$ in one frame should have speed $c + u$ in a frame moving towards the source of the light with speed $u$. Thus, it should be possible for light to travel with any speed. Light that travels with speed $c$ in a frame in which its source is at rest should have some other speed in a moving frame; so Galilean invariance would imply dependence of the velocity of light on the motion of the source.
A full resolution of the conflict can only be achieved within the special theory of relativity: here, remarkably, Maxwell’s equations retain exactly
their classical form, but the transformations between the space and time coordinates of frames of reference in relative motion do not. The difference appears when the velocities involved are not insignificant when compared with the velocity of light. So long as one can ignore terms of order $u^2/c^2$, Maxwell’s equations are compatible with the Galilean principle of relativity.

Charges, Fields, and the Lorentz-Force Law

The basic objects in the modern form of electromagnetic theory are

• charged particles; and
• the electric and magnetic fields $\mathbf{E}$ and $\mathbf{B}$, which are vector quantities that depend on position and time.

The charge $e$ of a particle, which can be positive or negative, is an intrinsic quantity analogous to gravitational mass. It determines the strength of the particle’s interaction with the electric and magnetic fields – as its mass determines the strength of its interaction with gravitational fields.
The interaction is in two directions. First, electric and magnetic fields exert a force on a charged particle which depends on the value of the charge, the particle’s velocity, and the values of $\mathbf{E}$ and $\mathbf{B}$ at the location of the particle. The force is given by the Lorentz-force law
\[ \mathbf{f} = e(\mathbf{E} + \mathbf{u} \wedge \mathbf{B}) \qquad [5] \]
in which $e$ is the charge and $\mathbf{u}$ is the velocity. It is analogous to the gravitational force
\[ \mathbf{f} = m\mathbf{g} \qquad [6] \]
on a particle of mass $m$ in a gravitational field $\mathbf{g}$. It is through the force law that an observer can, in principle, measure the electric and magnetic fields at a point, by measuring the force on a standard charge moving with known velocity.
Second, moving charges generate electric and magnetic fields. We shall not yet consider in detail the way in which they do this, beyond stating the following basic principles.

EM1. The fields depend linearly on the charges.

This means that if we superimpose two distributions of charge, then the resultant $\mathbf{E}$ and $\mathbf{B}$ fields are the sums of the respective fields that the two distributions generate separately.

EM2. A stationary point charge $e$ generates an electric field, but no magnetic field. The electric field is given by
\[ \mathbf{E} = \frac{k e \mathbf{r}}{r^3} \qquad [7] \]
where $\mathbf{r}$ is the position vector from the charge, $r = |\mathbf{r}|$, and $k$ is a positive constant, analogous to the gravitational constant.

By combining [7] and [5], we obtain an inverse-square law electrostatic force
\[ \frac{k e e'}{r^2} \qquad [8] \]
between two stationary charges; unlike gravity, it is repulsive when the charges have the same sign.

EM3. A point charge moving with velocity $\mathbf{v}$ generates a magnetic field
\[ \mathbf{B} = \frac{k' e \, \mathbf{v} \wedge \mathbf{r}}{r^3} \qquad [9] \]
where $k'$ is a second positive constant.

This is extrapolated from measurements of the magnetic field generated by currents flowing in electrical circuits.
The constants $k$ and $k'$ in EM2 and EM3 determine the strengths of electric and magnetic interactions. They are usually denoted by
\[ k = \frac{1}{4\pi\epsilon_0}, \qquad k' = \frac{\mu_0}{4\pi} \qquad [10] \]
Charge $e$ is measured in coulombs, $|\mathbf{B}|$ in teslas, and $|\mathbf{E}|$ in volts per meter. With other quantities in SI units,
\[ \epsilon_0 = 8.9 \times 10^{-12}, \qquad \mu_0 = 1.3 \times 10^{-6} \qquad [11] \]
The charge of an electron is $-1.6 \times 10^{-19}$ C; the current through an electric fire is a flow of 5–10 C s$^{-1}$. The earth’s magnetic field is about $4 \times 10^{-5}$ T; a bar magnet’s is about 1 T; there is a field of about 50 T on the second floor of the Clarendon Laboratory in Oxford; and the magnetic field on the surface of a neutron star is about $10^8$ T.
Although we are more aware of gravity in everyday life, it is very much weaker than the electrostatic force – the electrostatic repulsion between two protons is a factor of $1.2 \times 10^{36}$ greater than their gravitational attraction (at any separation, both forces obey the inverse-square law).
Our aim is to pass from EM1–EM3 to Maxwell’s equations, by replacing [7] and [9] by partial differential equations that relate the field strengths to the charge and current densities $\rho$ and $\mathbf{J}$ of a
continuous distribution of charge. The densities are defined as the limits
\[ \rho = \lim_{V \to 0} \frac{\sum e}{V}, \qquad \mathbf{J} = \lim_{V \to 0} \frac{\sum e\mathbf{v}}{V} \qquad [12] \]
where $V$ is a small volume containing the point, $e$ is a charge within the volume, and $\mathbf{v}$ is its velocity; the sums are over the charges in $V$ and the limits are taken as the volume is shrunk (although we shall not worry too much about the precise details of the limiting process).

Stationary Distributions of Charge

We begin the task of converting the basic principles into partial differential equations by looking at the electric field of a stationary distribution of charge, where the passage to the continuous limit is made by using the Gauss theorem to restate the inverse-square law.
The Gauss theorem relates the integral of the electric field over a closed surface to the total charge contained within it. For a point charge, the electric field is given by EM2:
\[ \mathbf{E} = \frac{e\mathbf{r}}{4\pi\epsilon_0 r^3} \]
Since $\operatorname{div} \mathbf{r} = 3$ and $\operatorname{grad} r = \mathbf{r}/r$, we have
\[ \operatorname{div}(\mathbf{E}) = \operatorname{div}\left( \frac{e\mathbf{r}}{4\pi\epsilon_0 r^3} \right) = \frac{e}{4\pi\epsilon_0} \left( \frac{3}{r^3} - \frac{3\,\mathbf{r} \cdot \mathbf{r}}{r^5} \right) = 0 \]
everywhere except at $r = 0$. Therefore, by the divergence theorem,
\[ \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = 0 \qquad [13] \]
for any closed surface $\partial V$ bounding a volume $V$ that does not contain the charge.
What if the volume does contain the charge? Consider the region bounded by the sphere $S_R$ of radius $R$ centered on the charge; $S_R$ has outward unit normal $\mathbf{r}/r$. Therefore,
\[ \int_{S_R} \mathbf{E} \cdot d\mathbf{S} = \frac{e}{4\pi R^2 \epsilon_0} \int_{S_R} dS = \frac{e}{\epsilon_0} \]
In particular, the value of the surface integral on the left-hand side does not depend on $R$.
Now consider an arbitrary finite volume bounded by a closed surface $S$. If the charge is not inside the volume, then the integral of $\mathbf{E}$ over $S$ vanishes by [13]. If it is, then we can apply [13] to the volume $V$ between $S$ and a small sphere $S_R$ to deduce that
\[ \int_S \mathbf{E} \cdot d\mathbf{S} - \int_{S_R} \mathbf{E} \cdot d\mathbf{S} = \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = 0 \]
and that the integrals of $\mathbf{E}$ over $S$ and $S_R$ are the same. Therefore,
\[ \int_S \mathbf{E} \cdot d\mathbf{S} = \begin{cases} e/\epsilon_0 & \text{if the charge is in the volume bounded by } S \\ 0 & \text{otherwise} \end{cases} \]
When we sum over a distribution of charges, the integral on the left picks out the total charge within $S$. Therefore, we have the Gauss theorem.

The Gauss theorem. For any closed surface $\partial V$ bounding a volume $V$,
\[ \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = Q/\epsilon_0 \]
where $\mathbf{E}$ is the total electric field and $Q$ is the total charge within $V$.

Now we can pass to the continuous limit. Suppose that $\mathbf{E}$ is generated by a distribution of charges with density $\rho$ (charge per unit volume). Then by the Gauss theorem,
\[ \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = \frac{1}{\epsilon_0} \int_V \rho \, dV \]
for any volume $V$. But then, by the divergence theorem,
\[ \int_V (\operatorname{div} \mathbf{E} - \rho/\epsilon_0) \, dV = 0 \]
Since this holds for any volume $V$, it follows that
\[ \operatorname{div} \mathbf{E} = \rho/\epsilon_0 \qquad [14] \]
By an argument in a similar spirit, we can also show that the electric field of a stationary distribution of charge is conservative in the sense that the total work done by the field when a charge is moved around a closed loop vanishes; that is,
\[ \oint \mathbf{E} \cdot d\mathbf{s} = 0 \]
for any closed path. This is equivalent to
\[ \operatorname{curl} \mathbf{E} = 0 \qquad [15] \]
since, by Stokes’ theorem,
\[ \oint \mathbf{E} \cdot d\mathbf{s} = \int_S \operatorname{curl} \mathbf{E} \cdot d\mathbf{S} \]
where $S$ is any surface spanning the path. This vanishes for every path and for every $S$ if and only if [15] holds.
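As a numerical illustration of the Gauss theorem for a single point charge, the following Python sketch estimates the flux of the Coulomb field [7] through several spheres. It is only a check under stated assumptions, not part of the derivation: the function names, the sample charge, the more precise value of $\epsilon_0$, and the midpoint-rule quadrature on a latitude–longitude grid are all choices made here for illustration.

```python
import math

EPS0 = 8.854e-12       # vacuum permittivity in SI units (the article quotes 8.9e-12)
E_CHARGE = 1.602e-19   # sample point charge in coulombs (an assumed value)


def e_field(x, y, z, charge=E_CHARGE):
    """Coulomb field of a point charge at the origin: E = e r / (4 pi eps0 r^3)."""
    r = math.sqrt(x * x + y * y + z * z)
    k = charge / (4.0 * math.pi * EPS0 * r ** 3)
    return (k * x, k * y, k * z)


def flux_through_sphere(center, radius, n_theta=100, n_phi=200):
    """Midpoint-rule estimate of the surface integral of E . dS over a sphere."""
    cx, cy, cz = center
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # Outward unit normal of the sphere at (theta, phi).
            nx, ny, nz = sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t
            ex, ey, ez = e_field(cx + radius * nx, cy + radius * ny, cz + radius * nz)
            # Area element dS = R^2 sin(theta) dtheta dphi.
            total += (ex * nx + ey * ny + ez * nz) * radius ** 2 * sin_t * d_theta * d_phi
    return total


if __name__ == "__main__":
    expected = E_CHARGE / EPS0
    cases = [
        ("centered, R = 1", (0.0, 0.0, 0.0), 1.0),
        ("centered, R = 5", (0.0, 0.0, 0.0), 5.0),
        ("off-center, charge inside", (0.3, 0.0, 0.0), 1.0),
        ("charge outside", (3.0, 0.0, 0.0), 1.0),
    ]
    for label, center, radius in cases:
        ratio = flux_through_sphere(center, radius) / expected
        print(f"{label}: flux / (e/eps0) = {ratio:.5f}")
```

For the spheres that enclose the charge, the printed ratio should be close to 1 whatever the radius or the position of the center; for the sphere that does not enclose it, the ratio should be close to 0, in line with [13].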
The field of a single stationary charge is conservative since
\[ \mathbf{E} = -\operatorname{grad} \phi, \qquad \phi = \frac{e}{4\pi\epsilon_0 r} \]
and therefore $\operatorname{curl} \mathbf{E} = 0$ since the curl of a gradient vanishes identically. For a continuous distribution, $\mathbf{E} = -\operatorname{grad} \phi$, where
\[ \phi(\mathbf{r}) = \frac{1}{4\pi\epsilon_0} \int_{\mathbf{r}' \in V} \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, dV' \qquad [16] \]
In the integral, $\mathbf{r}$ (the position of the point at which $\phi$ is evaluated) is fixed, and the integration is over the positions $\mathbf{r}'$ of the individual charges. In spite of the singularity at $\mathbf{r} = \mathbf{r}'$, the integral is well defined. So, [15] also holds for a continuous distribution of stationary charge.

The Divergence of the Magnetic Field

We can apply the same argument that established the Gauss theorem to the magnetic field of a slow-moving charge. Here,
\[ \mathbf{B} = \frac{\mu_0 e \, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} \]
where $\mathbf{r}$ is the vector from the charge to the point at which the field is measured. Since $\mathbf{r}/r^3 = -\operatorname{grad}(1/r)$, we have
\[ \operatorname{div}\left( \mathbf{v} \wedge \frac{\mathbf{r}}{r^3} \right) = \mathbf{v} \cdot \operatorname{curl}\left( \operatorname{grad} \frac{1}{r} \right) = 0 \]
Therefore, $\operatorname{div} \mathbf{B} = 0$ except at $r = 0$, as in the case of the electric field. However, in the magnetic case, the integral of the field over a surface surrounding the charge also vanishes, since if $S_R$ is a sphere of radius $R$ centered on the charge, then
\[ \int_{S_R} \mathbf{B} \cdot d\mathbf{S} = \frac{\mu_0 e}{4\pi} \int_{S_R} \frac{\mathbf{v} \wedge \mathbf{r}}{r^3} \cdot \frac{\mathbf{r}}{r} \, dS = 0 \]
By the divergence theorem, the same is true for any surface surrounding the charge. We deduce that if magnetic fields are generated only by moving charges, then
\[ \int_{\partial V} \mathbf{B} \cdot d\mathbf{S} = 0 \]
for any volume $V$, and hence that
\[ \operatorname{div} \mathbf{B} = 0 \qquad [17] \]
Of course, if there were free “magnetic poles” generating magnetic fields in the same way that charges generate electric fields, then this would not hold; there would be a “magnetic pole density” on the right-hand side, by analogy with the charge density in [14].

Inconsistency with Galilean Relativity

Our central concern is the compatibility of the laws of electromagnetism with the principle of relativity. As Einstein observed, simple electromagnetic interactions do indeed depend only on relative motion; the current induced in a conductor moving through the field of a magnet is the same as that generated in a stationary conductor when a magnet is moved past it with the same relative velocity (Einstein 1905). Unfortunately, this symmetry is not reflected in our basic principles. We very quickly come up against contradictions if we assume that they hold in every inertial frame of reference.
One emerges as follows. An observer $O$ can measure the values of $\mathbf{B}$ and $\mathbf{E}$ at a point by measuring the force on a particle of standard charge, which is related to the velocity $\mathbf{v}$ of the charge by the Lorentz-force law,
\[ \mathbf{f} = e(\mathbf{E} + \mathbf{v} \wedge \mathbf{B}) \]
A second observer $O'$ moving relative to the first with velocity $\mathbf{v}$ will see the same force, but now acting on a particle at rest. He will therefore measure the electric field to be $\mathbf{E}' = \mathbf{f}/e$. We conclude that an observer moving with velocity $\mathbf{v}$ through a magnetic field $\mathbf{B}$ and an electric field $\mathbf{E}$ should see an electric field
\[ \mathbf{E}' = \mathbf{E} + \mathbf{v} \wedge \mathbf{B} \qquad [18] \]
By interchanging the roles of the two observers, we should also have
\[ \mathbf{E} = \mathbf{E}' - \mathbf{v} \wedge \mathbf{B}' \qquad [19] \]
where $\mathbf{B}'$ is the magnetic field measured by the second observer. If both are to hold, then $\mathbf{B} - \mathbf{B}'$ must be a scalar multiple of $\mathbf{v}$.
But this is incompatible with EM3; if the fields are those of a point charge at rest relative to the first observer, then $\mathbf{E}$ is given by [7], and
\[ \mathbf{B} = 0 \]
On the other hand, the second observer sees the field of a point charge moving with velocity $-\mathbf{v}$. Therefore,
\[ \mathbf{B}' = -\frac{\mu_0 e \, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} \]
So $\mathbf{B} - \mathbf{B}'$ is orthogonal to $\mathbf{v}$, not parallel to it.
This conspicuous paradox is resolved, in part, by the realization that EM3 is not exact; it holds only when the velocities are small enough for the magnetic force between two particles to be negligible in comparison with the electrostatic force. If $v$ is a typical velocity, then the condition is that $v^2 \mu_0$
should be much less than $1/\epsilon_0$. That is, the velocities involved should be much less than
\[ c = \frac{1}{\sqrt{\epsilon_0 \mu_0}} = 3 \times 10^8 \ \mathrm{m\,s^{-1}} \]
This, of course, is the velocity of light.

The Limits of Galilean Invariance

Our basic principles EM1–EM3 must now be seen to be approximations – they describe the interactions of particles and fields when the particles are moving relative to each other at speeds much less than that of light. To emphasize that we cannot expect, in particular, EM3 to hold for particles moving at speeds comparable with $c$, we must replace it by

EM3′. A charge moving with velocity $\mathbf{v}$, where $v \ll c$, generates a magnetic field
\[ \mathbf{B} = \frac{\mu_0 e \, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} + O(v^2/c^2) \qquad [20] \]
The magnetic field of a system of charges in general motion satisfies
\[ \operatorname{div} \mathbf{B} = 0 \qquad [21] \]

In the second part, we have retained [21] as a differential form of the statement that there are no free magnetic poles; the magnetic field is generated only by the motion of the charges. With this change, the theory is consistent with the principle of relativity, provided that we ignore terms of order $v^2/c^2$. The substitution of EM3′ for EM3 resolves the conspicuous paradox; the symmetry noted by Einstein between the current generated by the motion of the conductor in a magnetic field and by the motion of a magnet past a conductor is explained, provided that the velocities are much less than that of light.
The central problem remains however; the equations of electromagnetism are not invariant under a Galilean transformation with velocity comparable to $c$. The paradox is still there, but it is more subtle than it appeared to be at first. There are three possible ways out: (1) the noninvariance is real and has observable effects (necessarily of order $v^2/c^2$ or smaller); (2) Maxwell’s theory is wrong; or (3) the Galilean transformation is wrong. Disconcertingly, it is the last path that physics has taken. But that is to jump ahead in the story. Our task is to complete the derivation of Maxwell’s equations.

Faraday’s Law of Induction

The magnetic field of a slow-moving charge will always be small in relation to its electric field (even when we replace $\mathbf{B}$ by $c\mathbf{B}$ to put it into the same units as $\mathbf{E}$). The magnetic fields generated by currents in electrical circuits are not, however, dominated by large electric fields. This is because the currents are created by the flow, at slow velocity, of electrons, while overall the matter in the wire is roughly electrically neutral, with the electric fields of the positively charged nuclei and negatively charged electrons canceling.
This is the physical context to keep in mind in the following deduction of Faraday’s law of induction from Galilean invariance for velocities much less than $c$. The law relates the electromotive force or “voltage” around an electrical circuit to the rate of change of the magnetic field $\mathbf{B}$ over a surface spanning the circuit. In its differential form, the law becomes one of Maxwell’s equations.
Suppose first that the fields are generated by charges all moving relative to a given inertial frame of reference $R$ with the same velocity $\mathbf{v}$. Then in a second frame $R'$ moving relative to $R$ with velocity $\mathbf{v}$, there is a stationary distribution of charge. If the velocity is much less than that of light, then the electric field $\mathbf{E}'$ measured in $R'$ is related to the electric and magnetic fields $\mathbf{E}$ and $\mathbf{B}$ measured in $R$ by
\[ \mathbf{E}' = \mathbf{E} + \mathbf{v} \wedge \mathbf{B} \]
Since the field measured in $R'$ is that of a stationary distribution of charge, we have
\[ \operatorname{curl} \mathbf{E}' = 0 \]
In $R$, the charges are all moving with velocity $\mathbf{v}$, so their configuration looks exactly the same from the point $\mathbf{r}$ at time $t$ as it does from the point $\mathbf{r} + \mathbf{v}\tau$ at time $t + \tau$. Therefore,
\[ \mathbf{B}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{B}(\mathbf{r}, t), \qquad \mathbf{E}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{E}(\mathbf{r}, t) \]
and hence by taking derivatives with respect to $\tau$ at $\tau = 0$,
\[ \mathbf{v} \cdot \operatorname{grad} \mathbf{B} + \frac{\partial \mathbf{B}}{\partial t} = 0, \qquad \mathbf{v} \cdot \operatorname{grad} \mathbf{E} + \frac{\partial \mathbf{E}}{\partial t} = 0 \qquad [22] \]
So we must have
\[ \begin{aligned} 0 &= \operatorname{curl} \mathbf{E}' \\ &= \operatorname{curl} \mathbf{E} + \operatorname{curl}(\mathbf{v} \wedge \mathbf{B}) \\ &= \operatorname{curl} \mathbf{E} + \mathbf{v} \operatorname{div} \mathbf{B} - \mathbf{v} \cdot \operatorname{grad} \mathbf{B} \\ &= \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} \end{aligned} \qquad [23] \]
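since $\operatorname{div} \mathbf{B} = 0$. It follows that
\[ \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [24] \]
Equation [24] is linear in $\mathbf{B}$ and $\mathbf{E}$; so by adding the magnetic and electric fields of different streams of charges moving relative to $R$ with different velocities, we deduce that it holds generally for the electric and magnetic fields generated by moving charges.
Equation [24] encodes Faraday’s law of electromagnetic induction, which describes how changing magnetic fields can generate currents. In the static case
\[ \frac{\partial \mathbf{B}}{\partial t} = 0 \]
and the equation reduces to $\operatorname{curl} \mathbf{E} = 0$ – the condition that the electrostatic field should be conservative; that is, it should do no net work when a charge is moved around a closed loop.
More generally, consider a wire loop in the shape of a closed curve $\gamma$. Let $S$ be a fixed surface spanning $\gamma$. Then we can deduce from eqn [24] that
\[ \oint_\gamma \mathbf{E} \cdot d\mathbf{s} = \int_S \operatorname{curl} \mathbf{E} \cdot d\mathbf{S} = -\int_S \frac{\partial \mathbf{B}}{\partial t} \cdot d\mathbf{S} = -\frac{d}{dt} \int_S \mathbf{B} \cdot d\mathbf{S} \qquad [25] \]
If the magnetic field is varying, so that the integral of $\mathbf{B}$ over $S$ is not constant, then the integral of $\mathbf{E}$ around the loop will not be zero. There will be a nonzero electric field along the wire, which will exert a force on the electrons in the wire and cause a current to flow.
The quantity
\[ \oint_\gamma \mathbf{E} \cdot d\mathbf{s} \]
which is measured in volts, is the work done by the electric field when a unit charge makes one circuit of the wire. It is called the electromotive force around the circuit. The integral of $\mathbf{B}$ over $S$ is the magnetic flux linking the circuit. The relationship [25] between electromotive force and rate of change of magnetic flux is Faraday’s law.

The Field of Charges in Uniform Motion

We can extract another of Maxwell’s equations from this argument. By EM3′, a single charge $e$ with velocity $\mathbf{v}$ generates an electric field $\mathbf{E}$ and a magnetic field
\[ \mathbf{B} = \frac{\mu_0 e \, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} + O(v^2/c^2) \]
where $\mathbf{r}$ is the vector from the charge to the point at which the field is measured. In the frame of reference $R'$ in which the charge is at rest, its electric field is
\[ \mathbf{E}' = \frac{e\mathbf{r}}{4\pi\epsilon_0 r^3} \]
In the frame in which it is moving with velocity $\mathbf{v}$, $\mathbf{E} = \mathbf{E}' + O(v/c)$. Therefore,
\[ c\mathbf{B} = \frac{\mathbf{v} \wedge \mathbf{E}'}{c} = \frac{\mathbf{v} \wedge \mathbf{E}}{c} + O\left( \frac{v^2}{c^2} \right) \]
By taking the curl of both sides, and dropping terms of order $v^2/c^2$,
\[ \operatorname{curl}(c\mathbf{B}) = \operatorname{curl}\left( \frac{\mathbf{v} \wedge \mathbf{E}}{c} \right) = \frac{1}{c} (\mathbf{v} \operatorname{div} \mathbf{E} - \mathbf{v} \cdot \operatorname{grad} \mathbf{E}) \]
But
\[ \operatorname{div} \mathbf{E} = \rho/\epsilon_0, \qquad \mathbf{v} \cdot \operatorname{grad} \mathbf{E} = -\frac{\partial \mathbf{E}}{\partial t} \]
by [22]. Therefore,
\[ \operatorname{curl}(c\mathbf{B}) - \frac{1}{c} \frac{\partial \mathbf{E}}{\partial t} = \frac{1}{c\epsilon_0} \mathbf{J} = c\mu_0 \mathbf{J} \]
where $\mathbf{J} = \rho\mathbf{v}$. By summing over the separate particle velocities, we conclude that
\[ \operatorname{curl} \mathbf{B} - \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} \]
holds for an arbitrary distribution of charges, provided that their velocities are much less than that of light.

Maxwell’s Equations

The basic principles, together with the assumption of Galilean invariance for velocities much less than that of light, have allowed us to deduce that the electric and magnetic fields generated by a continuous distribution of moving charges in otherwise empty space satisfy
\[ \operatorname{div} \mathbf{E} = \frac{\rho}{\epsilon_0} \qquad [26] \]
\[ \operatorname{div} \mathbf{B} = 0 \qquad [27] \]
\[ \operatorname{curl} \mathbf{B} - \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} \qquad [28] \]
\[ \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [29] \]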
where  is the charge density, J is the current charge; it is a differential form of the statement
density, and c2 = 1=0 0 . These are Maxwell’s that charges are neither created nor destroyed.
equations, the basis of modern electrodynamics. Together with the Lorentz-force law, they describe the dynamics of charges and electromagnetic fields. We have arrived at them by considering how basic electromagnetic processes appear in moving frames of reference – an unsatisfactory route because we have seen on the way that the principles on which we based the derivation are incompatible with Galilean invariance for velocities comparable with that of light. Maxwell derived them by analyzing an elaborate mechanical model of electric and magnetic fields – as displacements in the luminiferous ether. That is also unsatisfactory because the model has long been abandoned. The reason that they are accepted today as the basis of theoretical and practical applications of electromagnetism has little to do with either argument. It is first that they are self-consistent, and second that they describe the behavior of real fields with unreasonable accuracy.

The Continuity Equation

It is not immediately obvious that the equations are self-consistent. Given $\rho$ and $\mathbf{J}$ as functions of the coordinates and time, Maxwell's equations are two scalar and two vector equations in the unknown components of $\mathbf{E}$ and $\mathbf{B}$. That is, a total of eight equations for six unknowns – more equations than unknowns. Therefore, it is possible that they are in fact inconsistent.

If we take the divergence of eqn [29], then we obtain

$$\frac{\partial}{\partial t}(\operatorname{div}\mathbf{B}) = 0$$

which is consistent with eqn [27]; so no problem arises here. However, by taking the divergence of eqn [28] and substituting from eqn [26], we get

$$0 = \operatorname{div}\operatorname{curl}\mathbf{B} = \frac{1}{c^2}\frac{\partial}{\partial t}(\operatorname{div}\mathbf{E}) + \mu_0\operatorname{div}\mathbf{J} = \mu_0\left(\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J}\right)$$

This gives a contradiction unless

$$\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J} = 0 \qquad [30]$$

So the choice of $\rho$ and $\mathbf{J}$ is not unconstrained; they must be related by the continuity equation [30]. This holds for physically reasonable distributions of charge.

Conservation of Charge

To see the connection between the continuity equation and charge conservation, let us look at the total charge within a fixed volume $V$ bounded by a surface $S$. If charge is conserved, then any increase or decrease in a short period of time must be exactly balanced by an inflow or outflow of charge across $S$.

Consider a small element $dS$ of $S$ with outward unit normal $\mathbf{n}$ and consider all the particles that have a particular charge $e$ and a particular velocity $\mathbf{v}$ at time $t$. Suppose that there are $\sigma$ of these per unit volume ($\sigma$ is a function of position). Those that cross the surface element between $t$ and $t + \delta t$ are those that at time $t$ lie in the region of volume $|\mathbf{v}\cdot\mathbf{n}\,dS\,\delta t|$ shown in Figure 1. They contribute $e\sigma\,\mathbf{v}\cdot d\mathbf{S}\,\delta t$ to the outflow of charge through the surface element. But the value of $\mathbf{J}$ at the surface element is the sum of $e\sigma\mathbf{v}$ over all possible values of $\mathbf{v}$ and $e$. By summing over $\mathbf{v}$, $e$, and the elements of the surface, therefore, and by passing to the limit of a continuous distribution, the total rate of outflow is

$$\int_S \mathbf{J}\cdot d\mathbf{S}$$

Charge conservation implies that the rate of outflow should be equal to the rate of decrease in the total charge within $V$. That is,

$$\frac{d}{dt}\int_V \rho\,dV + \int_S \mathbf{J}\cdot d\mathbf{S} = 0 \qquad [31]$$

By differentiating the first term under the integral sign and by applying the divergence theorem to the second integral,

$$\int_V \left(\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J}\right) dV = 0 \qquad [32]$$

If this is to hold for any choice of $V$, then $\rho$ and $\mathbf{J}$ must satisfy the continuity equation. Conversely, the continuity equation implies charge conservation.

Figure 1 The outflow through a surface element.
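As a simple illustration of the continuity equation, consider (as an assumption made only for this example) a cloud of charge that moves rigidly with a constant velocity $\mathbf{v}$, so that the density at time $t$ is a translate of a hypothetical initial profile $\rho_0$. The sketch below checks that [30] then holds identically.

```latex
% Illustrative assumptions: constant velocity v and initial profile rho_0.
\begin{align*}
\rho(t,\mathbf{r}) &= \rho_0(\mathbf{r}-\mathbf{v}t), \qquad \mathbf{J} = \rho\,\mathbf{v},\\
\frac{\partial\rho}{\partial t}
  &= -\,\mathbf{v}\cdot(\operatorname{grad}\rho_0)(\mathbf{r}-\mathbf{v}t)
   = -\,\mathbf{v}\cdot\operatorname{grad}\rho,\\
\operatorname{div}\mathbf{J}
  &= \rho\,\operatorname{div}\mathbf{v} + \mathbf{v}\cdot\operatorname{grad}\rho
   = \mathbf{v}\cdot\operatorname{grad}\rho
   \qquad (\operatorname{div}\mathbf{v}=0),\\
\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J} &= 0 .
\end{align*}
```

The total charge in a fixed volume then changes only because the profile carries charge across the boundary, which is the content of [31].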
48 Introductory Article: Electromagnetism

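The Displacement Current

The third of Maxwell's equations can be written as

$$\operatorname{curl}\mathbf{B} = \mu_0\left(\mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right) \qquad [33]$$

in which form it can be read as an equation for an unknown magnetic field $\mathbf{B}$ in terms of a known current distribution $\mathbf{J}$ and electric field $\mathbf{E}$. When $\mathbf{E}$ and $\mathbf{J}$ are independent of $t$, it reduces to

$$\operatorname{curl}\mathbf{B} = \mu_0\mathbf{J}$$

which determines the magnetic field of a steady current, in a way that was already familiar to Maxwell's contemporaries. But his second term on the right-hand side of [33] was new; it adds to $\mathbf{J}$ the so-called vacuum displacement current

$$\epsilon_0\frac{\partial\mathbf{E}}{\partial t}$$

The name comes from an analogy with the behavior of charges in an insulating material. Here no steady current can flow, but the distribution of charges within the material is distorted by an external electric field. When the field changes, the distortion also changes, and the result appears as a current – the displacement current – which flows during the period of change. Maxwell's central insight was that the same term should be present even in empty space. The consequence was profound; it allowed him to explain the propagation of light as an electromagnetic phenomenon.

The Source-Free Equations

In a region of empty space, away from the charges generating the electric and magnetic fields, we have $\rho = 0 = \mathbf{J}$, and Maxwell's equations reduce to

$$\operatorname{div}\mathbf{E} = 0 \qquad [34]$$

$$\operatorname{div}\mathbf{B} = 0 \qquad [35]$$

$$\operatorname{curl}\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = 0 \qquad [36]$$

$$\operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 \qquad [37]$$

where $c = 1/\sqrt{\epsilon_0\mu_0}$. By taking the curl of eqn [36] and by substituting from eqns [35] and [37], we obtain

$$0 = \operatorname{grad}(\operatorname{div}\mathbf{B}) - \nabla^2\mathbf{B} - \frac{1}{c^2}\operatorname{curl}\frac{\partial\mathbf{E}}{\partial t} = -\nabla^2\mathbf{B} - \frac{1}{c^2}\frac{\partial}{\partial t}(\operatorname{curl}\mathbf{E}) = -\nabla^2\mathbf{B} + \frac{1}{c^2}\frac{\partial^2\mathbf{B}}{\partial t^2} \qquad [38]$$

Therefore, the three components of $\mathbf{B}$ in empty space satisfy the (scalar) wave equation

$$\Box u = 0$$

Here $\Box$ is the d'Alembertian operator, defined by

$$\Box = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2}$$

By taking the curl of eqn [37], we also obtain $\Box\mathbf{E} = 0$.

Monochromatic Plane Waves

The fact that $\mathbf{E}$ and $\mathbf{B}$ are vector-valued solutions of the wave equation in empty space suggests that we look for "plane wave" solutions of Maxwell's equations in which

$$\mathbf{E} = \mathbf{a}\cos\Omega + \mathbf{b}\sin\Omega \qquad [39]$$

where $\mathbf{a}$, $\mathbf{b}$ are constant vectors and

$$\Omega = \frac{\omega}{c}(ct - \mathbf{r}\cdot\mathbf{e}), \qquad \mathbf{e}\cdot\mathbf{e} = 1 \qquad [40]$$

with $\omega > 0$ and $\mathbf{e}$ constant; $\omega$ is the frequency and $\mathbf{e}$ is a unit vector that gives the direction of propagation (adding $\tau$ to $t$ and $c\tau\mathbf{e}$ to $\mathbf{r}$ leaves $\mathbf{E}$ unchanged). This satisfies the wave equation, but for a general choice of the constants, it will not be possible to find $\mathbf{B}$ such that eqns [34]–[37] also hold.

By taking the divergence of eqn [39], we obtain

$$\operatorname{div}\mathbf{E} = \frac{\omega}{c}(\mathbf{e}\cdot\mathbf{a}\sin\Omega - \mathbf{e}\cdot\mathbf{b}\cos\Omega) \qquad [41]$$

For eqn [34] to hold, therefore, we must choose $\mathbf{a}$ and $\mathbf{b}$ orthogonal to $\mathbf{e}$. For eqn [37] to hold, we must find $\mathbf{B}$ such that

$$\operatorname{curl}\mathbf{E} = \frac{\omega}{c}(\mathbf{e}\wedge\mathbf{a}\sin\Omega - \mathbf{e}\wedge\mathbf{b}\cos\Omega) = -\frac{\partial\mathbf{B}}{\partial t} \qquad [42]$$

A possible choice is

$$\mathbf{B} = \frac{\mathbf{e}\wedge\mathbf{E}}{c} = \frac{1}{c}(\mathbf{e}\wedge\mathbf{a}\cos\Omega + \mathbf{e}\wedge\mathbf{b}\sin\Omega) \qquad [43]$$

and it is not hard to see that $\mathbf{E}$ and $\mathbf{B}$ then satisfy [35] and [36] as well.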
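A plane wave of this kind can also be checked numerically. The following is a minimal sketch rather than part of the original argument: it works in units with c = 1, approximates derivatives by central finite differences, and the helper names (`make_plane_wave`, `maxwell_residuals`) and the sample values of a, b, e, and the frequency are assumptions chosen purely for illustration.

```python
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def make_plane_wave(a, b, e, omega, c=1.0):
    """Return E(t, r) and B(t, r) built as in [39], [40], and [43]."""
    def E(t, r):
        phase = (omega / c) * (c * t - dot(r, e))
        return tuple(ai * math.cos(phase) + bi * math.sin(phase)
                     for ai, bi in zip(a, b))
    def B(t, r):
        return tuple(x / c for x in cross(e, E(t, r)))
    return E, B

def partial(F, t, r, axis, h=1e-5):
    """Central difference of the vector field F; axis 0 is t, axes 1-3 are x, y, z."""
    def at(s):
        if axis == 0:
            return F(t + s, r)
        shifted = list(r)
        shifted[axis - 1] += s
        return F(t, tuple(shifted))
    return tuple((p - m) / (2 * h) for p, m in zip(at(h), at(-h)))

def div(F, t, r):
    return sum(partial(F, t, r, k + 1)[k] for k in range(3))

def curl(F, t, r):
    dx, dy, dz = (partial(F, t, r, k) for k in (1, 2, 3))
    return (dy[2] - dz[1], dz[0] - dx[2], dx[1] - dy[0])

def maxwell_residuals(E, B, t, r, c=1.0):
    """Largest absolute residual of each source-free equation [34]-[37] at (t, r)."""
    dE_dt, dB_dt = partial(E, t, r, 0), partial(B, t, r, 0)
    curl_E, curl_B = curl(E, t, r), curl(B, t, r)
    return {
        "[34] div E": abs(div(E, t, r)),
        "[35] div B": abs(div(B, t, r)),
        "[36] curl B - (1/c^2) dE/dt": max(abs(x - y / c**2)
                                            for x, y in zip(curl_B, dE_dt)),
        "[37] curl E + dB/dt": max(abs(x + y) for x, y in zip(curl_E, dB_dt)),
    }

# Transverse choice: a and b are orthogonal to the propagation direction e.
E, B = make_plane_wave(a=(1.0, 0.0, 0.0), b=(0.0, 2.0, 0.0),
                       e=(0.0, 0.0, 1.0), omega=3.0)
residuals = maxwell_residuals(E, B, t=0.3, r=(0.1, 0.2, 0.4))

# Non-transverse choice: a has a component along e, so [34] fails.
E_bad, B_bad = make_plane_wave(a=(0.0, 0.0, 1.0), b=(0.0, 2.0, 0.0),
                               e=(0.0, 0.0, 1.0), omega=3.0)
residuals_bad = maxwell_residuals(E_bad, B_bad, t=0.3, r=(0.1, 0.2, 0.4))
```

For the transverse choice all four residuals are at the level of the finite-difference error, while the second choice leaves a divergence of order the frequency, which is the numerical counterpart of the condition read off from [41].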

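The solutions obtained in this way are called "monochromatic electromagnetic plane waves." Note that such waves are transverse in the sense that $\mathbf{E}$ and $\mathbf{B}$ are orthogonal to the direction of propagation. The definition of $\mathbf{E}$ can be written more concisely in the form

$$\mathbf{E} = \operatorname{Re}\left((\mathbf{a} + i\mathbf{b})e^{-i\Omega}\right) \qquad [44]$$

It is an exercise in Fourier analysis to show that every solution in empty space is a combination of monochromatic plane waves. A plane wave has "plane" or "linear" polarization if $\mathbf{a}$ and $\mathbf{b}$ are proportional. It has "circular" polarization if $\mathbf{a}\cdot\mathbf{a} = \mathbf{b}\cdot\mathbf{b}$, $\mathbf{a}\cdot\mathbf{b} = 0$.

At the heart of Maxwell's theory was the idea that a light wave with definite frequency or color is represented by a monochromatic plane solution of his equations.

Potentials

For every solution of Maxwell's equations in vacuo, the components of $\mathbf{E}$ and $\mathbf{B}$ satisfy the three-dimensional wave equation; but the converse is not true. That is, it is not true in general that if

$$\Box\mathbf{B} = 0, \qquad \Box\mathbf{E} = 0$$

then $\mathbf{E}$ and $\mathbf{B}$ satisfy Maxwell's equations. For this to happen, the divergence of both fields must vanish, and they must be related by [36] and [37]. These additional constraints are somewhat simpler to handle if we work not with the fields themselves, but with auxiliary quantities called "potentials."

The definition of the potentials depends on standard integrability conditions from vector calculus. Suppose that $\mathbf{v}$ is a vector field, which may depend on time. If $\operatorname{curl}\mathbf{v} = 0$, then there exists a function $\phi$ such that

$$\mathbf{v} = \operatorname{grad}\phi \qquad [45]$$

If $\operatorname{div}\mathbf{v} = 0$, then there exists a second vector field $\mathbf{a}$ such that

$$\mathbf{v} = \operatorname{curl}\mathbf{a} \qquad [46]$$

Neither $\phi$ nor $\mathbf{a}$ is uniquely determined by $\mathbf{v}$. In the first case, if [45] holds, then it also holds when $\phi$ is replaced by $\phi' = \phi + f$, where $f$ is a function of time alone; in the second, if [46] holds, then it also holds when $\mathbf{a}$ is replaced by

$$\mathbf{a}' = \mathbf{a} + \operatorname{grad} u$$

for any scalar function $u$ of position and time. It should be kept in mind that the existence statements are local. If $\mathbf{v}$ is defined on a region $U$ with nontrivial topology, then it may not be possible to find a suitable $\phi$ or $\mathbf{a}$ throughout the whole of $U$.

Suppose now that we are given fields $\mathbf{E}$ and $\mathbf{B}$ satisfying Maxwell's equations [26]–[29] with sources represented by the charge density $\rho$ and the current density $\mathbf{J}$. Since $\operatorname{div}\mathbf{B} = 0$, there exists a time-dependent vector field $\mathbf{A}(t, x, y, z)$ such that

$$\mathbf{B} = \operatorname{curl}\mathbf{A}$$

If we substitute $\mathbf{B} = \operatorname{curl}\mathbf{A}$ into [29] and interchange curl with the time derivative, then we obtain

$$\operatorname{curl}\left(\mathbf{E} + \frac{\partial\mathbf{A}}{\partial t}\right) = 0$$

It follows that there exists a scalar $\phi(t, x, y, z)$ such that

$$\mathbf{E} = -\operatorname{grad}\phi - \frac{\partial\mathbf{A}}{\partial t} \qquad [47]$$

Such a vector field $\mathbf{A}$ is called a "magnetic vector potential"; a function $\phi$ such that eqn [47] holds is called an "electric scalar potential."

Conversely, given scalar and vector functions $\phi$ and $\mathbf{A}$ of $t, x, y, z$, we can define $\mathbf{B}$ and $\mathbf{E}$ by

$$\mathbf{B} = \operatorname{curl}\mathbf{A}, \qquad \mathbf{E} = -\operatorname{grad}\phi - \frac{\partial\mathbf{A}}{\partial t} \qquad [48]$$

Then two of Maxwell's equations hold automatically, since

$$\operatorname{div}\mathbf{B} = 0, \qquad \operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0$$

The remaining pair translate into conditions on $\mathbf{A}$ and $\phi$. Equation [26] becomes

$$\operatorname{div}\mathbf{E} = -\nabla^2\phi - \frac{\partial}{\partial t}(\operatorname{div}\mathbf{A}) = \frac{\rho}{\epsilon_0}$$

and eqn [28] becomes

$$\operatorname{curl}\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = -\nabla^2\mathbf{A} + \operatorname{grad}\operatorname{div}\mathbf{A} + \frac{1}{c^2}\frac{\partial}{\partial t}\left(\operatorname{grad}\phi + \frac{\partial\mathbf{A}}{\partial t}\right) = \mu_0\mathbf{J}$$

If we put

$$\delta = \frac{1}{c^2}\frac{\partial\phi}{\partial t} + \operatorname{div}(\mathbf{A})$$

then we can rewrite the equations for $\mathbf{A}$ and $\phi$ more simply as

$$\Box\phi - \frac{\partial\delta}{\partial t} = \frac{\rho}{\epsilon_0}$$

$$\Box\mathbf{A} + \operatorname{grad}\delta = \mu_0\mathbf{J}$$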
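The two reductions above rest on the identity curl curl A = grad div A − ∇²A and on adding and subtracting a second time derivative of the scalar potential. A sketch of the intermediate steps follows, writing $\delta$ for the combination $(1/c^2)\,\partial\phi/\partial t + \operatorname{div}\mathbf{A}$ and $\Box$ for the d'Alembertian $(1/c^2)\,\partial^2/\partial t^2 - \nabla^2$ defined earlier.

```latex
\begin{align*}
\operatorname{div}\mathbf{E}
  &= -\nabla^2\phi - \frac{\partial}{\partial t}(\operatorname{div}\mathbf{A})
   = \Bigl(\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} - \nabla^2\phi\Bigr)
     - \frac{\partial}{\partial t}\Bigl(\frac{1}{c^2}\frac{\partial\phi}{\partial t}
     + \operatorname{div}\mathbf{A}\Bigr)
   = \Box\phi - \frac{\partial\delta}{\partial t},\\
\operatorname{curl}\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t}
  &= \operatorname{curl}\operatorname{curl}\mathbf{A}
     + \frac{1}{c^2}\frac{\partial}{\partial t}
       \Bigl(\operatorname{grad}\phi + \frac{\partial\mathbf{A}}{\partial t}\Bigr)\\
  &= \Bigl(\frac{1}{c^2}\frac{\partial^2\mathbf{A}}{\partial t^2} - \nabla^2\mathbf{A}\Bigr)
     + \operatorname{grad}\Bigl(\frac{1}{c^2}\frac{\partial\phi}{\partial t}
     + \operatorname{div}\mathbf{A}\Bigr)
   = \Box\mathbf{A} + \operatorname{grad}\delta .
\end{align*}
```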

Here we have four equations (one scalar, one vector) in four unknowns ($\phi$ and the components of $\mathbf{A}$). Any set of solutions $\phi$, $\mathbf{A}$ determines a solution of Maxwell's equations via [48].

Gauge Transformations

Given solutions $\mathbf{E}$ and $\mathbf{B}$ of Maxwell's equations, what freedom is there in the choice of $\mathbf{A}$ and $\phi$? First, $\mathbf{A}$ is determined by $\operatorname{curl}\mathbf{A} = \mathbf{B}$ up to the replacement of $\mathbf{A}$ by

$$\mathbf{A}' = \mathbf{A} + \operatorname{grad} u$$

for some function $u$ of position and time. The scalar potential $\phi'$ corresponding to $\mathbf{A}'$ must be chosen so that

$$-\operatorname{grad}\phi' = \mathbf{E} + \frac{\partial\mathbf{A}'}{\partial t} = \mathbf{E} + \frac{\partial\mathbf{A}}{\partial t} + \operatorname{grad}\left(\frac{\partial u}{\partial t}\right) = -\operatorname{grad}\left(\phi - \frac{\partial u}{\partial t}\right)$$

That is, $\phi' = \phi - \partial u/\partial t + f(t)$, where $f$ is a function of $t$ alone. We can absorb $f$ into $u$ by subtracting

$$\int f\,dt$$

(this does not alter $\mathbf{A}'$). So the freedom in the choice of $\mathbf{A}$ and $\phi$ is to make the transformation

$$\mathbf{A} \mapsto \mathbf{A}' = \mathbf{A} + \operatorname{grad} u, \qquad \phi \mapsto \phi' = \phi - \frac{\partial u}{\partial t} \qquad [49]$$

for any $u = u(t, x, y, z)$. The transformation [49] is called a "gauge transformation."

Under [49],

$$\delta \mapsto \delta' = \frac{1}{c^2}\frac{\partial\phi'}{\partial t} + \operatorname{div}(\mathbf{A}') = \delta - \Box u$$

It is possible to show, under certain very mild conditions on $\delta$, that the inhomogeneous wave equation

$$\Box u = \delta \qquad [50]$$

has a solution $u = u(t, x, y, z)$. If we choose $u$ so that [50] holds, then the transformed potentials $\mathbf{A}'$ and $\phi'$ satisfy

$$\operatorname{div}(\mathbf{A}') + \frac{1}{c^2}\frac{\partial\phi'}{\partial t} = 0$$

This is the "Lorenz gauge condition," named after L Lorenz (not the H A Lorentz of the "Lorentz contraction").

If we impose the Lorenz condition, then the only remaining freedom in the choice of $\mathbf{A}$ and $\phi$ is to make gauge transformations [49] in which $u$ is a solution of the wave equation $\Box u = 0$. Under the Lorenz condition, Maxwell's equations take the form

$$\Box\phi = \rho/\epsilon_0, \qquad \Box\mathbf{A} = \mu_0\mathbf{J} \qquad [51]$$

Consistency with the Lorenz condition follows from the continuity equation on $\rho$ and $\mathbf{J}$.

In the absence of sources, therefore, Maxwell's equations for the potential in the Lorenz gauge reduce to

$$\Box\phi = 0, \qquad \Box\mathbf{A} = 0 \qquad [52]$$

together with the constraint

$$\operatorname{div}\mathbf{A} + \frac{1}{c^2}\frac{\partial\phi}{\partial t} = 0$$

We can, for example, choose three arbitrary solutions of the scalar wave equation for the components of the vector potential, and then define $\phi$ by

$$\phi = -c^2\int \operatorname{div}\mathbf{A}\,dt$$

Whatever choice we make, we shall get a solution of Maxwell's equations, and every solution of Maxwell's equations (without sources) will arise from some such choice.

Historical Note

At the end of the eighteenth century, four types of electromagnetic phenomena were known, but not the connections between them.

• Magnetism, the word derives from the Greek for "stone from Magnesia."
• Static electricity, produced by rubbing amber with fur; the word "electricity" derives from the Greek for "amber."
• Light.
• Galvanism or "animal electricity" – the electricity produced by batteries, discovered by Luigi Galvani.

The construction of a unified theory was a slow and painful business. It was hindered by attempts, which seem bizarre in retrospect, to understand electromagnetism in terms of underlying mechanical models involving such inventions as "electric fluids" and "magnetic vortices." We can see the legacy of this period, which ended with Einstein's work in 1905, in the misleading and archaic terms that still survive in modern terminology: "magnetic flux," "lines of force," "electric displacement," and so on.

Maxwell's contribution was decisive, although much of what we now call "Maxwell's theory" is due to his successors (Lorentz, Hertz, Einstein, and so on); and, as we shall see, a key element in Maxwell's own description of electromagnetism – the "electromagnetic ether," an all-pervasive medium which was supposed to transmit electromagnetic waves – was thrown out by Einstein.

A rough chronology is as follows.

• 1800 Volta demonstrated the connection between galvanism and static electricity.
• 1820 Oersted showed that the current from a battery generates a force on a magnet.
• 1822 Ampère suggested that light was a wave motion in a "luminiferous ether" made up of two types of electric fluid. In the same year, Galileo's "Dialogue concerning the two chief world systems" was removed from the index of prohibited books.
• 1831 Faraday showed that moving magnets can induce currents.
• 1846 Faraday suggested that light is a vibration in magnetic lines of force.
• 1863 Maxwell published the equations that describe the dynamics of electric and magnetic fields.
• 1905 Einstein's paper "On the electrodynamics of moving bodies."

Further Reading

Chalmers AF (1975) Maxwell and the displacement current. Physics Education January 1975: 45–49.
Einstein A (1905) On the Electrodynamics of Moving Bodies. A translation of the paper can be found in The Principle of Relativity by Lorentz HA, Einstein A, Minkowski H, and Weyl H, with notes by Sommerfeld A. New York: Dover, 1952.
Roche J (1998) The present status of Maxwell's displacement current. European Journal of Physics 19: 155–166.
Siegel DM (1985) Mechanical image and reality in Maxwell's electromagnetic theory. In: Harman PM (ed.) Wranglers and Physicists. Manchester: Manchester University Press.
Introductory Article: Equilibrium Statistical Mechanics


G Gallavotti, Università di Roma "La Sapienza," Rome, Italy
© 2006 G Gallavotti. Published by Elsevier Ltd. All rights reserved.

Foundations: Atoms and Molecules

Classical statistical mechanics studies properties of macroscopic aggregates of particles, atoms, and molecules, based on the assumption that they are point masses subject to the laws of classical mechanics. Distinction between macroscopic and microscopic systems is evanescent and in fact the foundations of statistical mechanics have been laid on properties, proved or assumed, of few-particle systems.

Macroscopic systems are often considered in stationary states, which means that their microscopic configurations follow each other as time evolves while looking the same macroscopically. Observing time evolution is the same as sampling ("not too closely" time-wise) independent copies of the system prepared in the same way.

A basic distinction is necessary: a stationary state may or may not be in equilibrium. The first case arises when the particles are enclosed in a container $\Omega$ and are subject only to their mutual conservative interactions and, possibly, to external conservative forces: a typical example is a gas in a container subject to forces due to the walls of $\Omega$ and gravity, besides the internal interactions. This is a very restricted class of systems and states.

A more general case is when the system is in a stationary state but it is also subject to nonconservative forces: a typical example is a gas or fluid in which a wheel rotates, as in the Joule experiment, with some device acting to keep the temperature constant. The device is called a thermostat and in statistical mechanics it has to be modeled by forces, including nonconservative ones, which prevent an indefinite energy transfer from the external forcing to the system: such a transfer would impede the occurrence of stationary states. For instance, the thermostat could simply be a constant friction force (as in stirred incompressible liquids or as in electric wires in which current circulates because of an electromotive force).

A more fundamental approach would be to imagine that the thermostat device is not a phenomenologically introduced nonconservative force (e.g., a friction force) but is due to the interaction with an external infinite system which is in "equilibrium at infinity."

In any event nonequilibrium stationary states are intrinsically more complex than equilibrium states. Here attention will be confined to equilibrium

statistical mechanics of systems of $N$ identical point particles $Q = (q_1, \ldots, q_N)$ enclosed in a cubic box $\Omega$, with volume $V$ and side $L$, normally assumed to have perfectly reflecting walls.

Particles of mass $m$ located at $q$, $q'$ will be supposed to interact via a pair potential $\varphi(q - q')$. The microscopic motion follows the equations

$$m\ddot q_i = -\sum_{j \neq i} \partial_{q_i}\varphi(q_i - q_j) - \partial_{q_i} W_{\text{wall}}(q_i) \stackrel{\text{def}}{=} -\partial_{q_i}\Phi(Q) \qquad [1]$$

where the potential $\varphi$ is assumed to be smooth except, possibly, for $|q - q'| \le r_0$ where it could be $+\infty$, that is, the particles cannot come closer than $r_0$, and at $r_0$ [1] is interpreted by imagining that they undergo elastic collisions; the potential $W_{\text{wall}}$ models the container and it will be replaced, unless explicitly stated, by an elastic collision rule.

The time evolution $(Q, \dot Q) \to S_t(Q, \dot Q)$ will, therefore, be described on the position–velocity space, $\widehat{\mathcal{F}}(N)$, of the $N$ particles or, more conveniently, on the phase space, i.e., by a time evolution $S_t$ on the momentum–position ($P, Q$, with $P = m\dot Q$) space, $\mathcal{F}(N)$. The motion being conservative, the energy

$$U \stackrel{\text{def}}{=} \sum_i \frac{1}{2m}p_i^2 + \sum_{i<j}\varphi(q_i - q_j) + \sum_i W_{\text{wall}}(q_i) \stackrel{\text{def}}{=} K(P) + \Phi(Q)$$

will be a constant of motion; the last term in $\Phi$ is missing if walls are perfect. This makes it convenient to regard the dynamics as associated with two dynamical systems: $(\mathcal{F}(N), S_t)$ on the $6N$-dimensional phase space, and $(\mathcal{F}_U(N), S_t)$ on the $(6N - 1)$-dimensional surface of energy $U$. Since the dynamics [1] is Hamiltonian on phase space, with Hamiltonian

$$H(P, Q) \stackrel{\text{def}}{=} \sum_i \frac{1}{2m}p_i^2 + \Phi(Q) \stackrel{\text{def}}{=} K + \Phi$$

it follows that the volume $d^{3N}P\,d^{3N}Q$ is conserved (i.e., a region $E$ has the same volume as $S_t E$) and also the area $\delta(H(P, Q) - U)\,d^{3N}P\,d^{3N}Q$ is conserved.

The above dynamical systems are well defined, i.e., $S_t$ is a map on phase space globally defined for all $t \in (-\infty, \infty)$, when the interaction potential is bounded below: this is implied by the a priori bounds due to energy conservation. For gravitational or Coulomb interactions, much more has to be said, assumed, and done in order to even define the key quantities needed for a statistical theory of motion.

Although our world is three dimensional (or at least was so believed to be until recent revolutionary theories), it will be useful to consider also systems of particles in dimension $d \neq 3$: in this case the above $6N$ and $3N$ become, respectively, $2dN$ and $dN$. Systems with dimension $d = 1, 2$ are in fact sometimes very good models for thin filaments or thin films. For the same reason, it is often useful to imagine that space is discrete and particles can only be located on a lattice, for example, on $\mathbb{Z}^d$ (see the section "Lattice models").

The reader is referred to Gallavotti (1999) for more details.

Pressure, Temperature, and Kinetic Energy

The beginning was BERNOULLI's derivation of the perfect gas law via the identification of the pressure at numerical density $\rho$ with the average momentum transferred per unit time to a surface element of area $dS$ on the walls: that is, the average of the observable $2mv\,\rho\,v\,dS$, with $v$ the normal component of the velocity of the particles that undergo collisions with $dS$. If $f(v)\,dv$ is the distribution of the normal component of velocity and $f(\mathbf{v})\,d^3\mathbf{v} \equiv \prod_i f(v_i)\,d^3\mathbf{v}$, $\mathbf{v} = (v_1, v_2, v_3)$, is the total velocity distribution, the average of the momentum transferred is $p\,dS$ given by

$$dS\int_{v>0} 2mv^2 f(v)\rho\,dv = dS\int mv^2 f(v)\rho\,dv = \rho\,\frac{2}{3}\,dS\int \frac{m}{2}\mathbf{v}^2 f(\mathbf{v})\,d^3\mathbf{v} = \rho\,\frac{2}{3}\left\langle\frac{K}{N}\right\rangle dS \qquad [2]$$

Furthermore $(2/3)\langle K/N\rangle$ was identified as proportional to the absolute temperature, $\langle K/N\rangle \stackrel{\text{def}}{=} \text{const}\,(3/2)\,T$, which, with present-day notations, is written as $(2/3)\langle K/N\rangle = k_B T$. The constant $k_B$ was (later) called Boltzmann's constant and it is the same for at least all perfect gases. Its independence of the particular nature of the gas is a consequence of Avogadro's law stating that equal volumes of gases at the same conditions of temperature and pressure contain equal numbers of molecules.

Proportionality between average kinetic energy and temperature via the universal constant $k_B$ became in fact a fundamental assumption extending to all aggregates of particles, gaseous or not, never challenged in all later works (until quantum mechanics, where this is no longer true; see the section "Quantum statistics").

For more details, we refer the reader to Gallavotti (1999).

Heat and Entropy

After Clausius' discovery of entropy, BOLTZMANN, in order to explain it mechanically, introduced the heat theorem, which he developed to full generality between 1866 and 1884. Together with the mentioned identification of absolute temperature with average kinetic energy, the heat theorem can also be considered a founding element of statistical mechanics.

The theorem makes precise the notion of time average and then states in great generality that given any mechanical system one can associate with its dynamics four quantities $U$, $V$, $p$, $T$, defined as time averages of suitable mechanical observables (i.e., functions on phase space), so that when the external conditions are infinitesimally varied and the quantities $U$, $V$ change by $dU$, $dV$, respectively, the ratio $(dU + p\,dV)/T$ is exact, i.e., there is a function $S(U, V)$ whose corresponding variation equals the ratio. It will be better, for the purpose of considering very large boxes ($V \to \infty$), to write this relation in terms of intensive quantities $u \stackrel{\text{def}}{=} U/N$ and $v = V/N$ as

$$\frac{du + p\,dv}{T} \quad \text{is exact} \qquad [3]$$

i.e., the ratio equals the variation $ds$ of $s(U/N, V/N) \equiv (1/N)S(U, V)$.

The proof originally dealt with monocyclic systems, i.e., systems in which all motions are periodic. The assumption is clearly much too restrictive and justification for it developed from the early "nonperiodic motions can be regarded as periodic with infinite period" (1866), to the later ergodic hypothesis and finally to the realization that, after all, the heat theorem does not really depend on the ergodic hypothesis (1884).

Although for a one-dimensional system the proof of the heat theorem is a simple check, it was a real breakthrough because it led to an answer to the general question as to under which conditions one could define mechanical quantities whose variations were constrained to satisfy [3] and therefore could be interpreted as a mechanical model of Clausius' macroscopic thermodynamics. It is reproduced in the following.

Consider a one-dimensional system subject to forces with a confining potential $\varphi(x)$ such that $|\varphi'(x)| > 0$ for $|x| > 0$, $\varphi''(0) > 0$ and $\varphi(x) \to +\infty$ as $x \to \infty$. All motions are periodic, so that the system is monocyclic. Suppose that the potential $\varphi(x)$ depends on a parameter $V$ and define a state to be a motion with given energy $U$ and given $V$; let

$$\begin{aligned}
U &= \text{total energy of the system} \equiv K + \varphi\\
T &= \text{time average of the kinetic energy } K = \langle K\rangle\\
V &= \text{the parameter on which } \varphi \text{ is supposed to depend}\\
p &= -\,\text{time average of } \partial_V\varphi,\ -\langle\partial_V\varphi\rangle
\end{aligned} \qquad [4]$$

A state is thus parametrized by $U$, $V$. If such parameters change by $dU$, $dV$, respectively, and if $dL \stackrel{\text{def}}{=} -p\,dV$, $dQ \stackrel{\text{def}}{=} dU + p\,dV$, then [3] holds. In fact, let $x_\pm(U, V)$ be the extremes of the oscillations of the motion with given $U$, $V$ and define $S$ as

$$S = 2\log\int_{x_-(U,V)}^{x_+(U,V)} \sqrt{U - \varphi(x)}\,dx \;\Rightarrow\; dS = \frac{\int (dU - \partial_V\varphi(x)\,dV)\,(dx/\sqrt{K})}{\int (dx/\sqrt{K})\,K} \qquad [5]$$

Noting that $dx/\sqrt{K} = \sqrt{2/m}\,dt$, [3] follows because time averages are given by integrating with respect to $dx/\sqrt{K}$ and dividing by the integral of $1/\sqrt{K}$.

For more details, the reader is referred to Boltzmann (1968b) and Gallavotti (1999).

Heat Theorem and Ergodic Hypothesis

Boltzmann tried to extend the result beyond the one-dimensional systems (e.g., to Keplerian motions, which are not monocyclic unless only motions with a fixed eccentricity are considered). However, the early statement that "aperiodic motions can be regarded as periodic with infinite period" is really the heart of the application of the heat theorem for monocyclic systems to the far more complex gas in a box.

Imagine that the gas container $\Omega$ is closed by a piston of section $A$ located to the right of the origin at distance $L$ and acting as a lid, so that the volume is $V = AL$. The microscopic model for the piston will be a potential $\varphi(L - \xi)$ if $x = (\xi, \eta, \zeta)$ are the coordinates of a particle. The function $\varphi(r)$ will vanish for $r > r_0$, for some $r_0 \ll L$, and diverge to $+\infty$ at $r = 0$. Thus, $r_0$ is the width of the layer near the piston where the force of the wall is felt by the particles that happen to be roaming there.

The contribution to the total potential energy $\Phi$ due to the walls is $W_{\text{wall}} = \sum_j \varphi(L - \xi_j)$ and $\partial_V\varphi = A^{-1}\partial_L\varphi$; assuming monocyclicity, it is necessary to evaluate the time average of $\partial_L\Phi(x) = \partial_L W_{\text{wall}} \equiv \sum_j \varphi'(L - \xi_j)$. As time evolves, the particles $x_j$ with $\xi_j$ in the layer within $r_0$ of the wall will feel the force exercised by the wall and

bounce back. One particle in the layer will contribute to the average of $\partial_L\Phi(x)$ the amount

$$\frac{1}{\text{total time}}\,2\int_{t_0}^{t_1}\varphi'(L - \xi_j)\,dt \qquad [6]$$

if $t_0$ is the first instant when the point $j$ enters the layer and $t_1$ is the instant when the $\xi$-component of the velocity vanishes "against the wall." Since $\varphi'(L - \xi_j)$ is the $\xi$-component of the force, the integral is $2m|\dot\xi_j|$ (by Newton's law), provided, of course, $\dot\xi_j > 0$.

Suppose that no collisions between particles occur while the particles travel within the range of the potential of the wall, i.e., the mean free path is much greater than the range of the potential $\varphi$ defining the wall. The contribution of collisions to the average momentum transfer to the wall per unit time is therefore given by, see [2],

$$\int_{v>0} 2mv\,f(v)\,\rho_{\text{wall}}\,A\,v\,dv$$

if $\rho_{\text{wall}}$, $f(v)$ are the average density near the wall and, respectively, the average fraction of particles with a velocity component normal to the wall between $v$ and $v + dv$. Here $\rho_{\text{wall}}$, $f$ are supposed to be independent of the point on the wall: this should be true up to corrections of size $o(A)$.

Thus, writing the average kinetic energy per particle and per velocity component, $\int (m/2)v^2 f(v)\,dv$, as $(1/2)\beta^{-1}$ (cf. [2]) it follows that

$$p \stackrel{\text{def}}{=} -\langle\partial_V\Phi\rangle = \rho_{\text{wall}}\,\beta^{-1} \qquad [7]$$

has the physical interpretation of pressure. $(1/2)\beta^{-1}$ is the average kinetic energy per degree of freedom: hence, it is proportional to the absolute temperature $T$ (see the section "Pressure, temperature, and kinetic energy").

On the other hand, if motion on the energy surface takes place on a single periodic orbit, the quantity $p$ in [7] is the right quantity that would make the heat theorem work; see [4]. Hence, regarding the trajectory on each energy surface as periodic (i.e., the system as monocyclic) leads to the heat theorem with $p$, $U$, $V$, $T$ having the right physical interpretation corresponding to their appellations. This shows that monocyclic systems provide natural models of thermodynamic behavior.

Assuming that a chaotic system like a gas in a container of volume $V$ will satisfy, for practical purposes, the above property, a quantity $p$ can be defined such that $dU + p\,dV$ admits the inverse of the average kinetic energy $\langle K\rangle$ as an integrating factor and, furthermore, $p$, $U$, $V$, $\langle K\rangle$ have the physical interpretations of pressure, energy, volume, and (up to a proportionality factor) absolute temperature, respectively.

Boltzmann's conception of space (and time) as discrete allowed him to conceive the property that the energy surface is constituted by "points" all of which belong to a single trajectory: a property that would be impossible if the phase space was really a continuum. Regarding phase space as consisting of a finite number of "cells" of finite volume $h^{dN}$, for some $h > 0$ (rather than of a continuum of points), allowed him to think, without logical contradiction, that the energy surface consisted of a single trajectory and, hence, that motion was a cyclic permutation of its points (actually cells).

Furthermore, it implied that the time average of an observable $F(P, Q)$ had to be identified with its average on the energy surface computed via the Liouville distribution

$$C^{-1}\int F(P, Q)\,\delta(H(P, Q) - U)\,dP\,dQ$$

with

$$C = \int \delta(H(P, Q) - U)\,dP\,dQ$$

(the appropriate normalization factor): a property that was written symbolically

$$\frac{dt}{T} = \frac{dP\,dQ}{\int dP\,dQ}$$

or

$$\lim_{T\to\infty}\frac{1}{T}\int_0^T F(S_t(P, Q))\,dt = \frac{\int F(P', Q')\,\delta(H(P', Q') - U)\,dP'\,dQ'}{\int \delta(H(P', Q') - U)\,dP'\,dQ'} \qquad [8]$$

The validity of [8] for all (piecewise smooth) observables $F$ and for all points of the energy surface, with the exception of a set of zero area, is called the ergodic hypothesis.

For more details, the reader is referred to Boltzmann (1968) and Gallavotti (1999).

Ensembles

Eventually Boltzmann in 1884 realized that the validity of the heat theorem for averages computed via the right-hand side (rhs) of [8] held independently of the ergodic hypothesis, that is, [8] was not necessary because the heat theorem (i.e., [3]) could also be derived under the only assumption that the averages involved in its formulation were computed

as averages over phase space with respect to the probability distribution on the rhs of [8].

Furthermore, if $T$ was identified with the average kinetic energy, $U$ with the average energy, and $p$ with the average force per unit surface on the walls of the container $\Omega$ with volume $V$, the relation [3] held for a variety of families of probability distributions on phase space, besides [8]. Among these are:

1. The "microcanonical ensemble," which is the collection of probability distributions on the rhs of [8] parametrized by $u = U/N$, $v = V/N$ (energy and volume per particle),

$$\mu^{mc}_{u,v}(dP\,dQ) = \frac{1}{Z_{mc}(U, N, V)}\,\delta(H(P, Q) - U)\,\frac{dP\,dQ}{N!\,h^{dN}} \qquad [9]$$

where $h$ is a constant with the dimensions of an action which, in the discrete representation of phase space mentioned in the previous section, can be taken such that $h^{dN}$ equals the volume of the cells and, therefore, the integrals with respect to [9] can be interpreted as an (approximate) sum over the cells conceived as microscopic configurations of $N$ indistinguishable particles (whence the $N!$).

2. The "canonical ensemble," which is the collection of probability distributions parametrized by $\beta$, $v = V/N$,

$$\mu^{c}_{\beta,v}(dP\,dQ) = \frac{1}{Z_c(\beta, N, V)}\,e^{-\beta H(P,Q)}\,\frac{dP\,dQ}{N!\,h^{dN}} \qquad [10]$$

to which more ensembles can be added, such as the grand canonical ensemble (Gibbs).

3. The "grand canonical ensemble," which is the collection of probability distributions parametrized by $\beta$, $\lambda$ and defined over the space $\mathcal{F}_{gc} = \bigcup_{N=0}^{\infty}\mathcal{F}(N)$,

$$\mu^{gc}_{\beta,\lambda}(dP\,dQ) = \frac{1}{Z_{gc}(\beta, \lambda, V)}\,e^{\beta\lambda N - \beta H(P,Q)}\,\frac{dP\,dQ}{N!\,h^{dN}} \qquad [11]$$

Hence, there are several different models of thermodynamics. The key tests for accepting them as real microscopic descriptions of macroscopic thermodynamics are as follows.

1. A correspondence between the macroscopic states of thermodynamic equilibrium and the elements of a collection of probability distributions on phase space can be established by identifying, on the one hand, macroscopic thermodynamic states with given values of the thermodynamic functions and, on the other, probability distributions attributing the same average values to the corresponding microscopic observables (i.e., whose averages have the interpretation of thermodynamic functions).

2. Once the correct correspondence between the elements of the different ensembles is established, that is, once the pairs $(u, v)$, $(\beta, v)$, $(\beta, \lambda)$ are so related to produce the same values for the averages $U$, $V$, $k_B T \stackrel{\text{def}}{=} \beta^{-1}$, $p\,|\partial\Omega|$ of

$$H(P, Q), \quad V, \quad \frac{2K(P)}{3N}, \quad \int \delta_{\partial\Omega}(q_1)\,2m(v_1\cdot n)^2\,dq_1 \qquad [12]$$

(where $\delta_{\partial\Omega}(q_1)$ is a delta-function pinning $q_1$ to the surface $\partial\Omega$), then the averages of all physically interesting observables should coincide at least in the thermodynamic limit, $\Omega \to \infty$. In this way, the elements $\mu$ of the considered collection of probability distributions can be identified with the states of macroscopic equilibrium of the system. The $\mu$'s depend on parameters and therefore they form an ensemble: each of them corresponds to a macroscopic equilibrium state whose thermodynamic functions are appropriate averages of microscopic observables and therefore are functions of the parameters identifying $\mu$.

Remark The word "ensemble" is often used to indicate the individual probability distributions of what has been called here an ensemble. The meaning used here seems closer to the original sense in the 1884 paper of Boltzmann (in other words, often by "ensemble" one means that collection of the phase space points on which a given probability distribution is considered, and this does not seem to be the original sense).

For instance, in the case of the microcanonical distributions this means interpreting energy, volume, temperature, and pressure of the equilibrium state with specific energy $u$ and specific volume $v$ as proportional, through appropriate universal proportionality constants, to the integrals with respect to $\mu^{mc}_{u,v}(dP\,dQ)$ of the mechanical quantities in [12]. The averages of other thermodynamic observables in the state with specific energy $u$ and specific volume $v$ should be given by their integrals with respect to $\mu^{mc}_{u,v}$.

Likewise, one can interpret energy, volume, temperature, and pressure of the equilibrium state with specific energy $u$ and specific volume $v$ as the averages of the mechanical quantities [12] with respect to the canonical distribution $\mu^{c}_{\beta,v}(dP\,dQ)$ which has average specific energy precisely $u$. The averages of other thermodynamic observables in the state with specific energy and volume $u$ and $v$ are
56 Introductory Article: Equilibrium Statistical Mechanics

given by their integrals with respect to $\mu^{c}_{\beta,v}$. A similar definition can be given for the description of thermodynamic equilibria via the grand canonical distributions.

For more details, see Gibbs (1981) and Gallavotti (1999).

Equivalence of Ensembles

Boltzmann proved that, computing averages via the microcanonical or canonical distributions, the essential property [3] was satisfied when changes in their parameters (i.e., $u, v$ or $\beta, v$, respectively) induced changes du and dv on energy and volume, respectively. He also proved that the function s, whose existence is implied by [3], was the same function once expressed as a function of u, v (or of any pair of thermodynamic parameters, e.g., of T, v or p, u).

A close examination of Boltzmann's proof shows that [3] holds exactly in the canonical ensemble and up to corrections tending to 0 as $\Lambda \to \infty$ in the microcanonical ensemble. Identity of thermodynamic functions evaluated in the two ensembles holds, as a consequence, up to corrections of this order. In addition, Gibbs added that the same held for the grand canonical ensemble.

Of course, not every collection of stationary probability distributions on phase space would provide a model for thermodynamics: Boltzmann called "orthodic" the collections of stationary distributions which generated models of thermodynamics through the above-mentioned identification of their elements with macroscopic equilibrium states. The microcanonical, canonical, and the later grand canonical ensembles are the chief examples of orthodic ensembles. Boltzmann and Gibbs proved these ensembles to be not only orthodic but to generate the same thermodynamic functions, that is, to generate the same thermodynamics.

This meant freedom from the analysis of the truth of the doubtful ergodic hypothesis (still unproved in any generality) or of the monocyclicity (manifestly false if understood literally rather than regarding the phase space as consisting of finitely many small, discrete cells), and allowed Gibbs to formulate the problem of statistical mechanics of equilibrium as follows.

Problem Study the properties of the collection of probability distributions constituting (any) one of the above ensembles.

However, the three ensembles just introduced by no means exhaust the class of orthodic ensembles producing the same models of thermodynamics in the limit of infinitely large systems. The wealth of ensembles with the orthodicity property, hence leading to equivalent mechanical models of thermodynamics, can be naturally interpreted in connection with the phenomenon of phase transition (see the section "Phase transitions and boundary conditions").

Clearly, the quoted results do not "prove" that thermodynamic equilibria "are" described by the microcanonical, canonical, or grand canonical ensembles. However, they certainly show that, for most systems, independently of the number of degrees of freedom, one can define quite unambiguously a mechanical model of thermodynamics establishing parameter-free, system-independent, physically important relations between thermodynamic quantities (e.g., $\partial_u (p(u,v)/T(u,v)) \equiv \partial_v (1/T(u,v))$, from [3]).

The ergodic hypothesis which was at the root of the mechanical theorems on heat and entropy cannot be taken as a justification of their validity. Naively one would expect that the time scale necessary to see an equilibrium attained, called recurrence time scale, would have to be at least the time that a phase space point takes to visit all possible microscopic states of given energy: hence, an explanation of why the necessarily enormous size of the recurrence time is not a problem becomes necessary.

In fact, the recurrence time can be estimated once the phase space is regarded as discrete: for the purpose of countering mounting criticism, Boltzmann assumed that momentum was discretized in units of $(2 m k_B T)^{1/2}$ (i.e., the average momentum size) and space was discretized in units of $\rho^{-1/3}$ (i.e., the average spacing), implying a volume of cells $h^{3N}$ with $h \overset{def}{=} \rho^{-1/3} (2 m k_B T)^{1/2}$; then he calculated that, even with such a gross discretization, a cell representing a microscopic state of 1 cm$^3$ of hydrogen at normal condition would require a time (called "recurrence time") of the order of $10^{10^{19}}$ times the age of the Universe (!) to visit the entire energy surface. In fact, the phase space volume is $\Gamma = (\rho^{-3} N (2 m k_B T)^{3/2})^N \equiv h^{3N}$ and the number of cells of volume $h^{3N}$ is $\Gamma/(N!\,h^{3N}) \simeq e^{3N}$; and the time to visit all will be $e^{3N}\tau_0$, with $\tau_0$ a typical atomic unit, e.g., $10^{-12}$ s, whereas $N = 10^{19}$. In this sense, the statement boldly made by young Boltzmann that "aperiodic motions can be regarded as periodic with infinite period" was even made quantitative.

The recurrence time is clearly so long as to be irrelevant for all purposes: nevertheless, the correctness of the microscopic theory of thermodynamics can still rely on the microscopic dynamics once it is understood (as stressed by Boltzmann) that the reason why we observe approach to equilibrium, and equilibrium itself, over "human" timescales
(which are far shorter than the recurrence times) is due to the property that on most of the energy surface the (very few) observables whose averages yield macroscopic thermodynamic functions (namely pressure, temperature, energy, ...) assume the same value even if N is only very moderately large (of the order of $10^3$ rather than $10^{19}$). This implies that this value coincides with the average and therefore satisfies the heat theorem without any contradiction with the length of the recurrence time. The latter rather concerns the time needed for the generic observable to thermalize, that is, to reach its time average: the generic observable will indeed take a very long time to "thermalize" but no one will ever notice, because the generic observable (e.g., the position of a pre-identified particle) is not relevant for thermodynamics.

The word "proof" is not used in the mathematical sense so far in this article: the relevance of a mathematically rigorous analysis was widely realized only around the 1960s, at the same time when the first numerical studies of the thermodynamic functions became possible and rigorous results were needed to check the correctness of various numerical simulations.

For more details, the reader is referred to Boltzmann (1968a, b) and Gallavotti (1999).

Thermodynamic Limit

Adopting Gibbs' axiomatic point of view, it is interesting to see the path to be followed to achieve an equivalence proof of the three ensembles introduced in the section "Heat theorem and ergodic hypothesis."

A preliminary step is to consider, given a cubic box $\Lambda$ of volume $V = L^d$, the normalization factors $Z_{gc}(\beta,\lambda,V)$, $Z_c(\beta,N,V)$, and $Z_{mc}(U,N,V)$ in [11], [10], and [9], respectively, and to check that the following thermodynamic limits exist:

$$ \beta p_{gc}(\beta,\lambda) \overset{def}{=} \lim_{V\to\infty} \frac{1}{V} \log Z_{gc}(\beta,\lambda,V) $$
$$ -\beta f_c(\beta,\rho) \overset{def}{=} \lim_{V\to\infty,\ N/V=\rho} \frac{1}{N} \log Z_c(\beta,N,V) \qquad [13] $$
$$ k_B^{-1} s_{mc}(u,\rho) \overset{def}{=} \lim_{V\to\infty,\ N/V=\rho,\ U/N=u} \frac{1}{N} \log Z_{mc}(U,N,V) $$

where the density $\rho \overset{def}{=} v^{-1} \equiv N/V$ is used, instead of v, for later reference. The normalization factors play an important role because they have a simple thermodynamic interpretation (see the next section): they are called grand canonical, canonical, and microcanonical partition functions, respectively.

Not surprisingly, assumptions on the interparticle potential $\varphi(q - q')$ are necessary to achieve an existence proof of the limits in [13]. The assumptions on $\varphi$ are not only quite general but also have a clear physical meaning. They are

1. stability: that is, existence of a constant $B \ge 0$ such that $\sum_{i<j}^{N} \varphi(q_i - q_j) \ge -BN$ for all $N \ge 0$, $q_1, \ldots, q_N \in R^d$, and
2. temperedness: that is, existence of constants $\varepsilon_0$, $R > 0$ such that $|\varphi(q - q')| < B |q - q'|^{-d-\varepsilon_0}$ for $|q - q'| > R$.

The assumptions are satisfied by essentially all microscopic interactions with the notable exceptions of the gravitational and Coulombic interactions, which require a separate treatment (and lead to somewhat different results on the thermodynamic behavior).

For instance, assumptions (1), (2) are satisfied if $\varphi(q)$ is $+\infty$ for $|q| < r_0$ and smooth for $|q| > r_0$, for some $r_0 \ge 0$, and furthermore $\varphi(q) > B_0 |q|^{-(d+\varepsilon_0)}$ if $r_0 < |q| \le R$, while for $|q| > R$ it is $|\varphi(q)| < B_1 |q|^{-(d+\varepsilon_0)}$, for some $B_0, B_1, \varepsilon_0 > 0$, $R > r_0$. Briefly, $\varphi$ is fast diverging at contact and fast approaching 0 at large distance. This is called a (generalized) Lennard-Jones potential. If $r_0 > 0$, $\varphi$ is called a hard-core potential. If $B_1 = 0$, the potential is said to have finite range. (See Appendix 1 for physical implications of violations of the above stability and temperedness properties.) However, in the following, it will be necessary, both for simplicity and to contain the length of the exposition, to restrict consideration to the case $B_1 = 0$, i.e., to

$$ \varphi(q) > B_0 |q|^{-(d+\varepsilon_0)},\ r_0 < |q| \le R; \qquad |\varphi(q)| \equiv 0,\ |q| > R \qquad [14] $$

unless explicitly stated.

Assuming stability and temperedness, the existence of the limits in [13] can be mathematically proved: in Appendix 2, the proof of the first is analyzed to provide the simplest example of the technique. A remarkable property of the functions $\beta p_{gc}(\beta,\lambda)$, $-\beta\rho f_c(\beta,\rho)$, and $s_{mc}(u,\rho)$ is that they are convex functions: hence, they are continuous in the interior of their domains of definition and, at one variable fixed, are differentiable with respect to the other with at most countably many exceptions.

In the case of a potential without hard core ($\rho_{max} = \infty$), $-\beta\rho f_c(\beta,\rho)$ can be checked to tend to 0 slower than $\rho$ as $\rho \to 0$, and to $-\infty$ faster than $-\rho$ as $\rho \to \infty$ (essentially proportionally to $-\rho \log \rho$ in both cases). Likewise, in the same case, $s_{mc}(u,\rho)$ can be shown to tend to 0 slower than $u - u_{min}$ as $u \to u_{min}$, and to $-\infty$ faster than $-u$ as $u \to \infty$. The latter
asymptotic properties can be exploited to derive, from the relations between the partition functions in [13],

$$ Z_{gc}(\beta,\lambda,V) = \sum_{N=0}^{\infty} e^{\beta\lambda N} Z_c(\beta,N,V) $$
$$ Z_c(\beta,N,V) = \int_{-B}^{\infty} e^{-\beta U} Z_{mc}(U,N,V)\, dU \qquad [15] $$

and, from the above-mentioned convexity, the consequences

$$ \beta p_{gc}(\beta,\lambda) = \max_{v} \left( \beta\lambda v^{-1} - \beta v^{-1} f_c(\beta, v^{-1}) \right) $$
$$ -\beta f_c(\beta, v^{-1}) = \max_{u} \left( -\beta u + k_B^{-1} s_{mc}(u, v^{-1}) \right) \qquad [16] $$

and that the maxima are attained in points, or intervals, internal to the intervals of definition. Let $v_{gc}$, $u_c$ be points where the maxima are, respectively, attained in [16].

Note that the quantity $e^{\beta\lambda N} Z_c(\beta,N,V)/Z_{gc}(\beta,\lambda,V)$ has the interpretation of probability of a density $v^{-1} = N/V$ evaluated in the grand canonical distribution. It follows that, if the maximum in the first of [16] is strict, that is, it is reached at a single point, the values of $v^{-1}$ in closed intervals not containing the maximum point $v_{gc}^{-1}$ have a probability behaving as $< e^{-cV}$, $c > 0$, as $V \to \infty$, compared to the probability of $v^{-1}$'s in any interval containing $v_{gc}^{-1}$. Hence, $v_{gc}$ has the interpretation of average value of v in the grand canonical distribution, in the limit $V \to \infty$.

Likewise, the interpretation of

$$ e^{-\beta u N} Z_{mc}(uN,N,V)/Z_c(\beta,N,V) $$

as probability in the canonical distribution of an energy density u shows that, if the maximum in the second of [16] is strict, the values of u in closed intervals not containing the maximum point $u_c$ have a probability behaving as $< e^{-cV}$, $c > 0$, as $V \to \infty$, compared to the probability of u's in any interval containing $u_c$. Hence, in the limit $\Lambda \to \infty$, the average value of u in the canonical distribution is $u_c$.

If the maxima are strict, [16] also establishes a relation between the grand canonical density, the canonical free energy and the grand canonical parameter $\lambda$, or between the canonical energy, the microcanonical entropy, and the canonical parameter $\beta$:

$$ \lambda = \partial_{v^{-1}} \left( v_{gc}^{-1} f_c(\beta, v_{gc}^{-1}) \right), \qquad k_B \beta = \partial_u s_{mc}(u_c, v^{-1}) \qquad [17] $$

where convexity and strictness of the maxima imply the existence of the derivatives.

Remark Therefore, in the equivalence between canonical and microcanonical ensembles, the canonical distribution with parameters $(\beta, v)$ should correspond with the microcanonical with parameters $(u_c, v)$. The grand canonical distribution with parameters $(\lambda, \beta)$ should correspond with the canonical with parameters $(\beta, v_{gc})$.

For more details, the reader is referred to Ruelle (1969) and Gallavotti (1999).

Physical Interpretation of Thermodynamic Functions

The existence of the limits [13] implies several properties of interest. The first is the possibility of finding the physical meaning of the functions $p_{gc}$, $f_c$, $s_{mc}$ and of the parameters $\lambda$, $\beta$.

Note first that, for all V, the grand canonical average $\langle K \rangle_{\beta,\lambda}$ is $(d/2)\beta^{-1} \langle N \rangle_{\beta,\lambda}$, so that $\beta^{-1}$ is proportional to the temperature $T_{gc} = T(\beta,\lambda)$ in the grand canonical distribution: $\beta^{-1} = k_B T(\beta,\lambda)$. Proceeding heuristically, the physical meaning of $p(\beta,\lambda)$ and $\lambda$ can be found through the following remarks.

Consider the microcanonical distribution $\mu^{mc}_{u,v}$ and denote by $\int^{*}$ the integral over (P, Q) extended to the domain of the (P, Q) such that $H(P,Q) = U$ and, at the same time, $q_1 \in dV$, where dV is an infinitesimal volume surrounding the region $\Lambda$. Then, by the microscopic definition of the pressure p (see the introductory section), it is

$$ p\,dV = \frac{N}{Z(U,N,V)} \int^{*} \delta\, \frac{2}{3} \frac{p_1^2}{2m}\, \frac{dP\,dQ}{N!\,h^{dN}} \equiv \frac{2}{3 Z(U,N,V)} \int^{*} \delta\, K(P)\, \frac{dP\,dQ}{N!\,h^{dN}} \qquad [18] $$

where $\delta \equiv \delta(H(P,Q) - U)$. The RHS of [18] can be compared with

$$ \frac{\partial_V Z(U,N,V)\,dV}{Z(U,N,V)} = \frac{N}{Z(U,N,V)} \int^{*} \delta\, \frac{dP\,dQ}{N!\,h^{dN}} $$

to give

$$ \frac{\partial_V Z\,dV}{Z} = N \frac{p\,dV}{(2/3)\langle K \rangle^{*}} = \beta p\,dV $$

because $\langle K \rangle^{*}$, which denotes the average $\int^{*} \delta K / \int^{*} \delta\, 1$, should be essentially the same as the microcanonical average $\langle K \rangle_{mc}$ (i.e., insensitive to the fact that one particle is constrained to the volume dV) if N is large. In the limit $V \to \infty$, $V/N = v$, the latter remark together with the second of [17] yields

$$ k_B^{-1} \partial_v s_{mc}(u, v^{-1}) = \beta p(u,v), \qquad k_B^{-1} \partial_u s_{mc}(u, v^{-1}) = \beta \qquad [19] $$

respectively. Note that $p \ge 0$ and it is not increasing in v because $s_{mc}(\rho)$ is concave as a function of $v = \rho^{-1}$ (in fact, by the remark following [14], $s_{mc}(u,\rho)$ is convex in $\rho$ and, in general, if $g(\rho)$ is convex in $\rho$ then $g(v^{-1})$ is always concave in $v = \rho^{-1}$).
Hence, $ds_{mc}(u,v) = (du + p\,dv)/T$, so that, taking into account the physical meaning of p, T (as pressure and temperature, see the section "Pressure, temperature, and kinetic energy"), $s_{mc}$ is, in thermodynamics, the entropy. Therefore (see the second of [16]), $-\beta f_c(\beta,\rho) = -\beta u_c + k_B^{-1} s_{mc}(u_c,\rho)$ becomes

$$ f_c(\beta,\rho) = u_c - T_c\, s_{mc}(u_c,\rho), \qquad df_c = -p\,dv - s_{mc}\,dT \qquad [20] $$

and since $u_c$ has the interpretation (as mentioned in the last section) of average energy in the canonical distribution $\mu^{c}_{\beta,v}$, it follows that $f_c$ has the thermodynamic interpretation of free energy (once compared with the definition of free energy, $F = U - TS$, in thermodynamics).

By [17] and [20],

$$ \lambda = \partial_{v^{-1}} \left( v_{gc}^{-1} f_c(\beta, v_{gc}^{-1}) \right) \equiv u_c - T_c\, s_{mc} + p\, v_{gc} $$

and $v_{gc}$ has the meaning of specific volume v. Hence, after comparison with the definition of chemical potential, $\lambda V = U - TS + pV$, in thermodynamics, it follows that the thermodynamic interpretation of $\lambda$ is the chemical potential and (see [16], [17]) the grand canonical relation

$$ \beta p_{gc}(\beta,\lambda) = \beta\lambda v_{gc}^{-1} + v_{gc}^{-1} \left( -\beta u_c + k_B^{-1} s_{mc}(u_c, v_{gc}^{-1}) \right) $$

shows that $p_{gc}(\beta,\lambda) \equiv p$, implying that $p_{gc}(\beta,\lambda)$ is the pressure expressed, however, as a function of temperature and chemical potential.

To go beyond the heuristic derivations above, it should be remarked that convexity and the property that the maxima in [16], [17] are reached in the interior of the intervals of variability of v or u are sufficient to turn the above arguments into rigorous mathematical deductions: this means that given [19] as definitions of $p(u,v)$, $\beta(u,v)$, the second of [20] follows as well as $p_{gc}(\beta,\lambda) \equiv p(u_c, v_{gc}^{-1})$. But the values $v_{gc}$ and $u_c$ in [16] are not necessarily unique: convex functions can contain horizontal segments and therefore the general conclusion is that the maxima may possibly be attained in intervals. Hence, instead of a single $v_{gc}$, there might be a whole interval $[v_-, v_+]$, where the RHS of [16] reaches the maximum and, instead of a single $u_c$, there might be a whole interval $[u_-, u_+]$ where the RHS of [17] reaches the maximum.

Convexity implies that the values of $\lambda$ or $\beta$ for which the maxima in [16] or [17] are attained in intervals rather than in single points are rare (i.e., at most denumerably many): the interpretation is, in such cases, that the thermodynamic functions show discontinuities, and the corresponding phenomena are called phase transitions (see the next section).

For more details the reader is referred to Ruelle (1969) and Gallavotti (1999).

Phase Transitions and Boundary Conditions

The analysis in the last two sections of the relations between elements of ensembles of distributions describing macroscopic equilibrium states not only allows us to obtain mechanical models of thermodynamics but also shows that the models, for a given system, coincide at least as $\Lambda \to \infty$. Furthermore, the equivalence between the thermodynamic functions computed via corresponding distributions in different ensembles can be extended to a full equivalence of the distributions.

If the maxima in [16] are attained at single points $v_{gc}$ or $u_c$, the equivalence should take place in the sense that a correspondence between $\mu^{gc}_{\beta,\lambda}$, $\mu^{c}_{\beta,v}$, $\mu^{mc}_{u,v}$ can be established so that any local observable F(P, Q), defined as an observable depending on (P, Q) only through the $p_i$, $q_i$ with $q_i \in \Lambda_0$, where $\Lambda_0 \subset \Lambda$ is a finite region, has the same average with respect to corresponding distributions in the limit $\Lambda \to \infty$.

The correspondence is established by considering $(\lambda, \beta) \leftrightarrow (\beta, v_{gc}) \leftrightarrow (u_{mc}, v)$, where $v_{gc}$ is where the maximum in [16] is attained, $u_{mc} \equiv u_c$ is where the maximum in [17] is attained and $v_{gc} \equiv v$ (cf. also [19], [20]). This means that the limits

$$ \lim_{V\to\infty} \int F(P,Q)\, \mu^{a}_{\Lambda}(dP\,dQ) \overset{def}{=} \langle F \rangle_a \quad (a\text{-independent}), \qquad a = gc, c, mc \qquad [21] $$

coincide if the averages are evaluated by the distributions $\mu^{gc}_{\beta,\lambda}$, $\mu^{c}_{\beta,v_c}$, $\mu^{mc}_{u_{mc},v_{mc}}$.

Exceptions to [21] are possible, and are certainly likely to occur at values of u, v where the maxima in [16] or [17] are attained in intervals rather than in isolated points; but this does not exhaust, in general, the cases in which [21] may not hold.

However, no case in which [21] fails has to be regarded as an exception. It rather signals that an interesting and important phenomenon occurs. To understand it properly, it is necessary to realize that the grand canonical, canonical, and microcanonical families of probability distributions are by far not the only ensembles of probability distributions whose elements can be considered to generate models of thermodynamics, that is, which are orthodic in the sense of the discussion in the section "Equivalence of ensembles." More general families of orthodic statistical ensembles of probability
distributions can be very easily conceived. In particular:

Definition Consider the grand canonical, canonical, and microcanonical distributions associated with an energy function in which the potential energy contains, besides the interaction $\Phi$ between particles located inside the container, also the interaction energy $\Phi_{in,out}$ between particles inside the container and external particles, identical to the ones in the container but not allowed to move and fixed in positions such that in every unit cube $\Delta$ external to $\Lambda$ there is a finite number of them bounded independently of $\Delta$. Such configurations of external particles will be called "boundary conditions of fixed external particles."

The thermodynamic limit with such boundary conditions is obtained by considering the grand canonical, canonical, and microcanonical distributions constructed with potential energy function $\Phi + \Phi_{in,out}$ in containers $\Lambda$ of increasing size taking care that, while the size increases, the fixed particles that would become internal to $\Lambda$ are eliminated. The argument used in the section "Thermodynamic limit" to show that the three models of thermodynamics, considered there, did define the same thermodynamic functions can be repeated to reach the conclusion that also the (infinitely many) "new" models of thermodynamics in fact give rise to the same thermodynamic functions and averages of local observables. Furthermore, the values of the limits corresponding to [13] can be computed using the new partition functions and coincide with the ones in [13] (i.e., they are independent of the boundary conditions).

However, it may happen, and in general it is the case, for many models and for particular values of the state parameters, that the limits in [21] do not coincide with the analogous limits computed in the new ensembles, that is, the averages of some local observables are unstable with respect to changes of boundary conditions with fixed particles.

There is a very natural interpretation of such apparent ambiguity of the various models of thermodynamics: namely, at the values of the parameters that are selected to describe the macroscopic states under consideration, there may correspond different equilibrium states with the same parameters. When the maximum in [16] is reached on an interval of densities, one should not think of any failure of the microscopic models for thermodynamics: rather one has to think that there are several states possible with the same $\beta$, $\lambda$ and that they can be identified with the probability distributions obtained by forming the grand canonical, canonical, or microcanonical distributions with different kinds of boundary conditions.

For instance, a boundary condition with high density may produce an equilibrium state with parameters $\beta$, $\lambda$ which also has high density, i.e., the density $v_+^{-1}$ at the right extreme of the interval in which the maximum in [16] is attained, while using a low-density boundary condition the limit in [21] may describe the averages taken in a state with density $v_-^{-1}$ at the left extreme of the interval or, perhaps, with a density intermediate between the two extremes. Therefore, the following definition emerges.

Definition If the grand canonical distributions with parameters $(\beta, \lambda)$ and different choices of fixed external particles boundary conditions generate for some local observable F average values which are different by more than a quantity $\delta > 0$ for all large enough volumes $\Lambda$ then one says that the system has a phase transition at $(\beta, \lambda)$. This implies that the limits in [21], when existing, will depend on the boundary condition and their values will represent averages of the observables in "different phases." A corresponding definition is given in the case of the canonical and microcanonical distributions when, given $(\beta, v)$ or $(u, v)$, the limit in [21] depends on the boundary conditions for some F.

Remarks

1. The idea is that by fixing one of the thermodynamic ensembles and by varying the boundary conditions one can realize all possible states of equilibrium of the system that can exist with the given values of the parameters determining the state in the chosen ensemble (i.e., $(\beta, \lambda)$, $(\beta, v)$, or $(u, v)$ in the grand canonical, canonical, or microcanonical cases, respectively).

2. The impression that in order to define a phase transition the thermodynamic limit is necessary is incorrect: the definition does not require considering the limit $\Lambda \to \infty$. The phenomenon that occurs is that by changing boundary conditions the average of a local observable can change at least by amounts independent of the system size. Hence, occurrence of a phase transition is perfectly observable in finite volume: it suffices to check that by changing boundary conditions the average of some observable changes by an amount whose minimal size is volume independent. It is a manifestation of an instability of the averages with respect to changes in boundary conditions: an instability which does not fade away when the boundary recedes to infinity, i.e., boundary perturbations produce
bulk effects and at a phase transition the averages of the local observable, if existing at all, will exhibit a nontrivial dependence on the boundary conditions. This is also called "long range order."

3. It is possible to show that when this happens then some thermodynamic function whose value is independent of the boundary condition (e.g., the free energy in the canonical distributions) has discontinuous derivatives in terms of the parameters of the ensemble. This is in fact one of the frequently used alternative definitions of phase transitions: the latter two natural definitions of first-order phase transition are equivalent. However, it is very difficult to prove that a given system shows a phase transition. For instance, existence of a liquid-gas phase transition is still an open problem in systems of the type considered until the section "Lattice models" below.

4. A remarkable unification of the theory of the equilibrium ensembles emerges: all distributions of any ensemble describe equilibrium states. If a boundary condition is fixed once and for all, then some equilibrium states might fail to be described by an element of an ensemble. However, if all boundary conditions are allowed then all equilibrium states should be realizable in a given ensemble by varying the boundary conditions.

5. The analysis leads us to consider as completely equivalent without exceptions grand canonical, canonical, or microcanonical ensembles enlarged by adding to them the distributions with potential energy augmented by the interaction with fixed external particles.

6. The above picture is really proved only for special classes of models (typically in models in which particles are constrained to occupy points of a lattice and in systems with hard core interactions, $r_0 > 0$ in [14]) but it is believed to be correct in general. At least it is consistent with all that is known so far in classical statistical mechanics. The difficulty is that, conceivably, one might even need boundary conditions more complicated than the fixed particles boundary conditions (e.g., putting different particles outside, interacting with the system with an arbitrary potential, rather than via $\varphi$).

The discussion of the equivalence of the ensembles and the question of the importance of boundary conditions has already imposed the consideration of several limits as $\Lambda \to \infty$. Occasionally, it will again come up. For conciseness, it is useful to set up a formal definition of equilibrium states of an infinite-volume system: although infinite volume is an idealization void of physical reality, it is nevertheless useful to define such states because certain notions (e.g., that of pure state) can be sharply defined, with few words and avoiding wide circumvolutions, in terms of them. Therefore, let:

Definition An infinite-volume state with parameters $(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ is a collection of average values $F \to \langle F \rangle$ obtained, respectively, as limits of finite-volume averages $\langle F \rangle_{\Lambda_n}$ defined from canonical, microcanonical, or grand canonical distributions in $\Lambda_n$ with fixed parameters $(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ and with general boundary condition of fixed external particles, on sequences $\Lambda_n \to \infty$ for which such limits exist simultaneously for all local observables F.

Having set the definition of infinite-volume state, consider a local observable G(X) and let $\tau_\xi G(X) = G(X + \xi)$, $\xi \in R^d$, with $X + \xi$ denoting the configuration X in which all particles are translated by $\xi$: then an infinite-volume state is called a pure state if for any pair of local observables F, G it is

$$ \langle F\, \tau_\xi G \rangle - \langle F \rangle \langle \tau_\xi G \rangle \to 0 \quad \text{as } \xi \to \infty \qquad [22] $$

which is called a cluster property of the pair F, G.

The result alluded to in remark (6) is that at least in the case of hard-core systems (or of the simple lattice systems discussed in the section "Lattice models") the infinite-volume equilibrium states in the above sense exhaust at least the totality of the infinite-volume pure states. Furthermore, the other states that can be obtained in the same way are convex combinations of the pure states, i.e., they are "statistical mixtures" of pure phases. Note that $\langle \tau_\xi G \rangle$ cannot be replaced, in general, by $\langle G \rangle$ because not all infinite-volume states are necessarily translation invariant and in simple cases (e.g., crystals) it is even possible that no translation-invariant state is a pure state.

Remarks

1. This means that, in the latter models, generalizing the boundary conditions, for example considering external particles to be not identical to the ones inside the system, using periodic or partially periodic boundary conditions, or the widely used alternative of introducing a small auxiliary potential and first taking the infinite-volume states in presence of it and then letting the potential vanish, does not enlarge further the set of states (but may sometimes be useful: an example of a study of a phase transition by using the latter method of small fields will be given in the section "Continuous symmetries: 'no d = 2 crystal' theorem").
2. If $\chi$ is the indicator function of a local event, it will make sense to consider the probability of occurrence of the event in an infinite-volume state defining it as $\langle \chi \rangle$. In particular, the probability density for finding p particles at $x_1, x_2, \ldots, x_p$, called the p-point correlation function, will thus be defined in an infinite-volume state. For instance, if the state is obtained as a limit of canonical states $\langle \cdot \rangle_{\Lambda_n}$ with parameters $\beta$, $\rho$, $\rho = N_n/V_n$, in a sequence of containers $\Lambda_n$, then

$$ \rho(x) = \lim_{n} \left\langle \sum_{j=1}^{N_n} \delta(x - q_j) \right\rangle_{\Lambda_n} $$
$$ \rho(x_1, x_2, \ldots, x_p) = \lim_{n} \left\langle \sum_{i_1,\ldots,i_p}^{N_n} \prod_{j=1}^{p} \delta(x_j - q_{i_j}) \right\rangle_{\Lambda_n} $$

where the sum is over the ordered p-ples $(i_1, \ldots, i_p)$. Thus, the pair correlation $\rho(q, q')$ and its possible cluster property are

$$ \rho(q, q') \overset{def}{=} \lim_{n} \frac{\int_{\Lambda_n} \exp(-\beta U(q, q', q_1, \ldots, q_{N_n-2}))\, dq_1 \cdots dq_{N_n-2}}{(N_n - 2)!\, Z_c^0(\beta, \rho, V_n)} $$
$$ \rho(q, (q' + x)) - \rho(q)\rho(q' + x) \to 0 \quad \text{as } x \to \infty \qquad [23] $$

where

$$ Z_c^0 \overset{def}{=} \int e^{-\beta U(Q)}\, dQ $$

is the "configurational" partition function.

The reader is referred to Ruelle (1969), Dobrushin (1968), Lanford and Ruelle (1969), and Gallavotti (1999).

Virial Theorem and Atomic Dimensions

For a long time it has been doubted that "just changing boundary conditions" could produce such dramatic changes as macroscopically different states (i.e., phase transitions in the sense of the definition in the last section). The first evidence that by taking the thermodynamic limit very regular analytic functions like $N^{-1} \log Z_c(\beta, N, V)$ (as a function of $\beta$, $v = V/N$) could develop, in the limit $\Lambda \to \infty$, singularities like discontinuous derivatives (corresponding to the maximum in [16] being reached on a plateau and to a consequent existence of several pure phases) arose in the van der Waals' theory of liquid-gas transition.

Consider a real gas with N identical particles with mass m in a container $\Lambda$ with volume V. Let the force acting on the ith particle be $f_i$; multiplying both sides of the equations of motion, $m \ddot{q}_i = f_i$, by $-(1/2) q_i$ and summing over i, it follows that

$$ -\frac{1}{2} \sum_{i=1}^{N} m\, q_i \cdot \ddot{q}_i = -\frac{1}{2} \sum_{i=1}^{N} q_i \cdot f_i \overset{def}{=} \frac{1}{2} C(q) $$

and the quantity C(q) defines the virial of the forces in the configuration q. Note that C(q) is not translation invariant because of the presence of the forces due to the walls.

Writing the force $f_i$ as a sum of the internal and the external forces (due to the walls), the virial C can be expressed naturally as the sum of the virial $C_{int}$ of the internal forces (translation invariant) and of the virial $C_{ext}$ of the external forces.

By dividing both sides of the definition of the virial by $\tau$ and integrating over the time interval $[0, \tau]$, one finds in the limit $\tau \to +\infty$, that is, up to quantities relatively infinitesimal as $\tau \to \infty$, that

$$ \langle K \rangle = \tfrac{1}{2} \langle C \rangle \quad \text{and} \quad \langle C_{ext} \rangle = 3pV $$

where p is the pressure and V the volume. Hence

$$ \langle K \rangle = \tfrac{3}{2}\, pV + \tfrac{1}{2} \langle C_{int} \rangle $$

or

$$ \frac{1}{\beta} = pv + \frac{\langle C_{int} \rangle}{3N} \qquad [24] $$

Equation [24] is Clausius' virial theorem: in the case of no internal forces, it yields $\beta p v = 1$, the ideal-gas equation.

The internal virial $C_{int}$ can be written, if $f_{j \to i} = -\partial_{q_i} \varphi(q_i - q_j)$, as

$$ C_{int} = -\sum_{i=1}^{N} \sum_{j \ne i} f_{j \to i} \cdot q_i \equiv \sum_{i<j} \partial_{q_i} \varphi(q_i - q_j) \cdot (q_i - q_j) $$

which shows that the contribution to the virial by the internal repulsive forces is negative while that of the attractive forces is positive. The average of $C_{int}$ can be computed by the canonical distribution, which is convenient for the purpose. van der Waals first used the virial theorem to perform an actual computation of the corrections to the perfect-gas laws. Simply neglect the third-order term in the density and use the approximation $\rho(q_1, q_2) = \rho^2 e^{-\beta \varphi(q_1 - q_2)}$ for the pair correlation function, [23]; then

$$ \frac{1}{2} \langle C_{int} \rangle = V\, \frac{3}{2}\, \rho^2 \beta^{-1} I(\beta) + V\, O(\rho^3) \qquad [25] $$

where

\[ I(\beta) = \frac{1}{2}\int \big(e^{-\beta\varphi(q)} - 1\big)\, d^3 q \]

and the equation of state [24] becomes

\[ \beta p v + \frac{I(\beta)}{v} + O(v^{-2}) = 1 \]

For the purpose of illustration, the calculation of $I$ can be performed approximately at "high temperature" ($\beta$ small) in the case

\[ \varphi(r) = 4\varepsilon\left(\Big(\frac{r_0}{r}\Big)^{12} - \Big(\frac{r_0}{r}\Big)^{6}\right) \]

(the classical Lennard–Jones potential), $\varepsilon, r_0 > 0$. The result is

\[ I \cong -(b - \beta a), \qquad b = 4v_0, \quad a = \frac{32}{3}\,\varepsilon v_0, \quad v_0 = \frac{4\pi}{3}\Big(\frac{r_0}{2}\Big)^3 \]

Hence,

\[ pv + \frac{a}{v} - \frac{b}{\beta v} = \frac{1}{\beta} + O\Big(\frac{1}{\beta v^2}\Big) \]

\[ \Big(p + \frac{a}{v^2}\Big) v = \frac{1}{\beta}\Big(1 + \frac{b}{v}\Big) = \frac{1}{\beta}\,\frac{1}{1 - b/v} + O\Big(\frac{1}{\beta v^2}\Big) \]

or

\[ \Big(p + \frac{a}{v^2}\Big)(v - b)\,\beta = 1 + O(v^{-2}) \qquad [26] \]

which gives the equation of state for $\beta\varepsilon \ll 1$. Equation [26] can be compared with the well-known empirical van der Waals equation of state:

\[ \beta\Big(p + \frac{a}{v^2}\Big)(v - b) = 1 \]

or

\[ (p + An^2/V^2)(V - nB) = nRT \qquad [27] \]

where, if $N_A$ is Avogadro's number, $A = aN_A^2$, $B = bN_A$, $R = k_B N_A$, $n = N/N_A$. It shows the possibility of accessing the microscopic parameters $\varepsilon$ and $r_0$ of the potential $\varphi$ via measurements detecting deviations from the Boyle–Mariotte law, $\beta p v = 1$, of the rarefied gases: $\varepsilon = 3a/8b = 3A/8BN_A$, $r_0 = (3b/2\pi)^{1/3} = (3B/2\pi N_A)^{1/3}$.

As a final comment, it is worth stressing that the virial theorem gives in principle the exact corrections to the equation of state, in a rather direct and simple form, as time averages of the virial of the internal forces. Since the virial of the internal forces is easy to calculate from the positions of the particles as a function of time, the theorem provides a method for computing the equation of state in numerical simulations. In fact, this idea has been exploited in many numerical experiments, in which [24] plays a key role.

For more details, the reader is referred to Gallavotti (1999).

van der Waals Theory

Equation [27] is empirically used beyond its validity region (small density and small $\beta$) by regarding $A$, $B$ as phenomenological parameters to be experimentally determined by measuring them near generic values of $p$, $V$, $T$. The measured values of $A$, $B$ do not "usually vary too much" as functions of $v$, $T$ and, apart from this small variability, the predictions of [27] have reasonably agreed with experience until, as experimental precision increased over the years, serious inadequacies eventually emerged.

Certain consequences of [27] are appealing: for example, Figure 1 shows that it does not give a $p$ monotonic nonincreasing in $v$ if the temperature is small enough. A critical temperature can be defined as the largest value, $T_c$, of the temperature below which the graph of $p$ as a function of $v$ is not monotonic decreasing; the critical volume $V_c$ is the value of $v$ at the horizontal inflection point occurring for $T = T_c$.

For $T < T_c$ the van der Waals interpretation of the equation of state is that the function $p(v)$ may describe metastable states while the actual equilibrium states would follow an equation with a monotonic dependence on $v$ and $p(v)$ becoming horizontal in the coexistence region of specific volumes. The precise value of $p$ where to draw the plateau (see Figure 1) would then be fixed by experiment or theoretically predicted via the simple rule that the plateau associated with the represented isotherm is drawn at a height such that the areas of the two cycles in the resulting loop are equal.

Figure 1 The van der Waals equation of state at a temperature $T < T_c$ where the pressure is not monotonic. The horizontal line illustrates the "Maxwell rule." (Axes: $p$ versus $v$, with $v_l$ and $v_g$ marked on the $v$ axis.)

This is Maxwell's rule: obtained by assuming that the isotherm curve joining the extreme points of the plateau and the plateau itself define a cycle

(see Figure 1) representing a sequence of possible macroscopic equilibrium states (the ones corresponding to the plateau) or states with extremely long time of stability ("metastable") represented by the curved part. This would be an isothermal Carnot cycle which, therefore, could not produce work: since the work produced in the cycle (i.e., $\oint p\, dv$) is the signed area enclosed by the cycle, the rule just means that the area is zero. The argument is doubtful at least because it is not clear that the intermediate states with $p$ increasing with $v$ could be realized experimentally or could even be theoretically possible.

A striking prediction of [27], taken literally, is that the gas undergoes a gas–liquid phase transition with a critical point at a temperature $T_c$, volume $v_c$, and pressure $p_c$ that can be computed via [27] and are given by $RT_c = 8A/27B$, $V_c = 3B$ ($n = 1$).

At the same time, the above prediction is interesting as it shows that there are simple relations between the critical parameters and the microscopic interaction constants, i.e., $\varepsilon \simeq k_B T_c$ and $r_0 \simeq (V_c/N_A)^{1/3}$: or more precisely $\varepsilon = 81 k_B T_c/64$, $r_0 = (V_c/2\pi N_A)^{1/3}$ if a classical Lennard–Jones potential (i.e., $\varphi = 4\varepsilon((r_0/|q|)^{12} - (r_0/|q|)^{6})$; see the last section) is used for the interaction potential $\varphi$.

However, [27] cannot be accepted uncritically, not only because of the approximations (essentially the neglecting of $O(v^{-1})$ in the equation of state), but mainly because, as remarked above, for $T < T_c$ the function $p$ is no longer monotonic in $v$ as it must be; see comment following [19].

The van der Waals equation, refined and complemented by Maxwell's rule, predicts the following behavior:

\[ (p - p_c) \propto (v - v_c)^{\delta}, \quad \delta = 3, \quad T = T_c \]
\[ (v_g - v_l) \propto (T_c - T)^{\beta}, \quad \beta = 1/2, \quad \text{for } T \to T_c^- \qquad [28] \]

which are in sharp contrast with the experimental data gathered in the twentieth century. For the simplest substances, one finds instead $\delta \cong 5$, $\beta \cong 1/3$.

Finally, blind faith in the equation of state [27] is untenable, last but not least, also because nothing in the analysis would change if the space dimension was $d = 2$ or $d = 1$: but for $d = 1$, it is easily proved that the system, if the interaction decays rapidly at infinity, does not undergo phase transitions (see next section).

In fact, it is now understood that van der Waals' equation represents rigorously only a limiting situation, in which particles have a hard-core interaction (or a strongly repulsive one at close distance) and a further smooth interaction $\varphi$ with very long range. More precisely, suppose that the part of the potential outside a hard-core radius $r_0 > 0$ is attractive (i.e., nonpositive) and has the form $\gamma^d \varphi_1(\gamma |q|) \le 0$, and call $P_0(v)$ the ($\beta$-independent) product of $\beta$ times the pressure of the hard-core system without any attractive tail ($P_0(v)$ is not explicitly known except if $d = 1$, in which case it is $P_0(v)(v - b) = 1$, $b = r_0$), and let

\[ a = \frac{1}{2}\int_{|q| > r_0} |\varphi_1(q)|\, dq \]

If $p(\beta, v; \gamma)$ is the pressure when $\gamma > 0$ then it can be proved that

\[ \beta p(\beta, v) \stackrel{\rm def}{=} \lim_{\gamma \to 0} \beta p(\beta, v; \gamma) = \Big[ -\frac{\beta a}{v^2} + P_0(v) \Big]_{\text{Maxwell's rule}} \qquad [29] \]

where the subscript means that the graph of $p(\beta, v)$ as a function of $v$ is obtained from the function in the square bracket by applying to it Maxwell's rule, described above in the case of the van der Waals equation. Equation [29] reduces exactly to the van der Waals equation for $d = 1$, and for $d > 1$ it leads to an equation with identical critical behavior (even though $P_0(v)$ cannot be explicitly computed).

The reader is referred to Lebowitz and Penrose (1979) and Gallavotti (1999) for more details.

Absence of Phase Transitions: d = 1

One of the most quoted no-go theorems in statistical mechanics is that one-dimensional systems of particles interacting via short-range forces do not exhibit phase transitions (cf. the next section) unless the somewhat unphysical situation of having zero absolute temperature is considered. This is particularly easy to check in the case of "nearest-neighbor hard-core interactions." Let the hard-core size be $r_0$, so that the interaction potential $\varphi(r) = +\infty$ if $r \le r_0$, and suppose also that $\varphi(r) \equiv 0$ if $r \ge 2r_0$. In this case, the thermodynamic functions can be exactly computed and checked to be analytic: hence the equation of state cannot have any phase transition plateau. This is a special case of van Hove's theorem establishing smoothness of the equation of state for interactions extending beyond the nearest neighbor and rapidly decreasing at infinity.

If the definition of phase transition based on the sensitivity of the thermodynamic limit to variations of boundary conditions is adopted then a more general, conceptually simple, argument can be given to show that in one-dimensional systems there cannot be any phase transition if the potential energy of mutual interaction between a

configuration $Q$ of particles to the left of a reference particle (located at the origin $O$, say) and a configuration $Q'$ to the right of the particle (with $Q \cup O \cup Q'$ compatible with the hard cores) is uniformly bounded below. Then a mathematical proof can be devised showing that the influence of boundary conditions disappears as the boundaries recede to infinity. One also says that no long-range order can be established in a one-dimensional case, in the sense that one loses any trace of the boundary conditions imposed.

The analysis fails if the space dimension is $\ge 2$: in this case, even if the interaction is short-ranged, the energy of interaction between two regions of space separated by a boundary is of the order of the boundary area. Hence, one cannot bound above and below the probability of any two configurations in two half-spaces by the product of the probabilities of the two configurations, each computed as if the other was not there. This is because such a bound would be proportional to the exponential of the surface of separation, which tends to $\infty$ when the surface grows large. This means that we cannot consider, at least not in general, the configurations in the two half-spaces as independently distributed.

Analytically, a condition on the potential sufficient to imply that the energy between a configuration to the left and one to the right of the origin is bounded below, if $d = 1$, is simply expressed by

\[ \int_{r'}^{\infty} r\,|\varphi(r)|\, dr < +\infty \quad \text{for } r' > r_0 \]

Therefore, in order to have phase transitions in $d = 1$, a potential is needed that is "so long range" that it has a divergent first moment. It can be shown by counterexamples that if the latter condition fails there can be phase transitions even in $d = 1$ systems.

The results just quoted also apply to discrete models like lattice gases or lattice spin models that will be considered later in the article.

For more details, we refer the reader to Landau and Lifschitz (1967), Dyson (1969), Gallavotti (1999), and Gallavotti et al. (2004).

Continuous Symmetries: "No d = 2 Crystal" Theorem

A second case in which it is possible to rule out existence of phase transitions or at least of certain kinds of transitions arises when the system under analysis enjoys large symmetry. By symmetry is meant a group of transformations acting on the configurations and transforming each of them into a configuration which, at least for one boundary condition (e.g., periodic or open), has the same energy.

A symmetry is said to be "continuous" if the group of transformations is a continuous group. For instance, continuous systems have translational symmetry if considered in a container $\Lambda$ with periodic boundary conditions. Systems with "too much symmetry" sometimes cannot show phase transitions. For instance, the continuous translation symmetry of a gas in a container $\Lambda$ with periodic boundary conditions is sufficient to exclude the possibility of crystallization in dimension $d = 2$.

To discuss this, which is a prototype of a proof which can be used to infer absence of many transitions in systems with continuous symmetries, consider the translational symmetry and a potential satisfying, besides the usual [14] and with the symbols used in [14], the further property that $|q|^2 |\partial^2_{ij}\varphi(q)| < B|q|^{-(d+\varepsilon_0)}$, with $\varepsilon_0 > 0$, for some $B$ holds for $r_0 < |q| \le R$. This is a very mild extra requirement (and it allows for a hard-core interaction).

Consider an "ideal crystal" on a square lattice (for simplicity) of spacing $a$, exactly fitting in its container $\Lambda$ of side $L$ assumed with periodic boundary conditions: so that $N = (L/a)^d$ is the number of particles and $a^{-d}$ is the density, which is supposed to be smaller than the close packing density if the interaction $\varphi$ has a hard core. The probability distribution of the particles is rather trivial:

\[ \mu = \sum_p \prod_n \delta(q_{p(n)} - a\,n)\, \frac{dQ}{N!} \]

the sum running over the permutations $m \to p(m)$ of the sites $m \in \Lambda$, $m \in \mathbb{Z}^d$, $0 < m_i \le La^{-1}$. The density at $q$ is

\[ \hat\rho(q) = \sum_n \delta(q - a\,n) \equiv \Big\langle \sum_{j=1}^{N} \delta(q - q_j) \Big\rangle \]

and its Fourier transform is proportional to

\[ \rho(k) \stackrel{\rm def}{=} \frac{1}{N}\Big\langle \sum_j e^{-i k\cdot q_j} \Big\rangle, \qquad k = \frac{2\pi}{L}\, n, \quad n \in \mathbb{Z}^d \]

$\rho(k)$ has value 1 for all $k$ of the form $K = (2\pi/a)n$ and $(1/N)\,O(\max_{c=1,2} |e^{i k_c a} - 1|^{-2})$ otherwise. In presence of interaction, it has to be expected that, in a crystal state, $\rho(k)$ has peaks near the values $K$: but the value of $\rho(k)$ can depend on the boundary conditions.

Since the system is translation invariant a crystal state defined as a state with a distribution "close" to $\mu$,

i.e., with $\hat\rho(q)$ with peaks at the ideal lattice points $q = na$, cannot be realized under periodic boundary conditions, even when the system state is crystalline. To realize such a state, a symmetry-breaking term is needed in the interaction.

This can be done in several ways, for example, by changing the boundary condition. Such a choice implies a discussion of how much the boundary conditions influence the positions of the peaks of $\rho(k)$: for instance, it is not obvious that a boundary condition will not generate a state with a period different from the one that a priori has been selected for disproval (a possibility which would imply a reciprocal lattice of $K$'s different from the one considered to begin with). Therefore, here the choice will be to imagine that an external weak force with potential $\varepsilon W(q)$ acts forcing a symmetry breaking that favors the occupation of regions around the points of the ideal lattice (which would mark the average positions of the particles in the crystal state that is being sought). The proof (Mermin's theorem) that no equilibrium state exists with a particle distribution "close" to $\mu$, i.e., with peaks in place of the delta functions (see below), is essentially reproduced below.

Take $W(q) = \sum_{na \in \Lambda} \chi(q - na)$, where $\chi(q) \le 0$ is smooth and zero everywhere except in a small vicinity of the lattice points around which it decreases to some negative minimum keeping a rotation symmetry around them. The potential $W$ is invariant under translations by the lattice steps. By the choice of the boundary condition and $\varepsilon W$, the density $\tilde\rho_\varepsilon(q)$ will be periodic with period $a$ so that $\rho_\varepsilon(k)$ will, possibly, not have a vanishing limit as $N \to \infty$ only if $k$ is a reciprocal vector $K = (2\pi/a)n$. If the potential is $\varphi + \varepsilon W$ and if there exists a crystal state in which particles have higher probability of being near the lattice points $na$, it should be expected that for small $\varepsilon > 0$ the system will be found in a state with Fourier transform of the density, $\rho_\varepsilon(k)$, satisfying, for some vector $K \ne 0$ in the reciprocal lattice,

\[ \lim_{\varepsilon \to 0}\, \lim_{N \to \infty} |\rho_\varepsilon(K)| = r > 0 \qquad [30] \]

that is, the requirement is that uniformly in $\varepsilon \to 0$ the Fourier transform of the density has a peak at some $K \ne 0$. Note that if $k$ is not in the reciprocal lattice $\rho_\varepsilon(k) \to 0$ as $N \to \infty$, being bounded above by

\[ \frac{1}{N}\, O\Big( \max_{j=1,2} |e^{i k_j a} - 1|^{-2} \Big) \]

because $(1/N)\tilde\rho_\varepsilon$ is periodic and its integral over $q$ is equal to 1. Hence, excluding the existence of a crystal will be identified with the impossibility of [30]. Other criteria can be imagined, for example, considering crystals with a lattice different from simple cubic, which lead to the same result by following the same technique. Nevertheless, it is not mathematically excluded (but unlikely) that, with some weaker existence definition, a crystal state could be possible even in two dimensions.

The following inequalities hold under the present assumptions on the potential and in the canonical distribution with periodic boundary conditions and parameters $(\beta, \rho)$, $\rho = a^{-d}$, in a box $\Lambda$ with side multiple of $a$ (so that $N = (La^{-1})^d$) and potential of interaction $\varphi + \varepsilon W$. The further assumption that the lattice $na$ is not a close-packed lattice is (of course) necessary when the interaction potential has a hard core. Then, for suitable $B_0, B, B_1, B_2 > 0$, independent of $N$ and $\varepsilon$, for $|\kappa| < \pi/a$ and for all reciprocal vectors $K \ne 0$,

\[ \frac{1}{N}\Big\langle \Big| \sum_{j=1}^{N} e^{-i(\kappa + K)\cdot q_j} \Big|^2 \Big\rangle \ge B\, \frac{(\rho_\varepsilon(K) + \rho_\varepsilon(K + 2\kappa))^2}{B_1 \kappa^2 + \varepsilon B_2} \]

\[ \frac{1}{N}\sum_{\kappa} \gamma(\kappa)\, \frac{1}{N}\Big\langle \Big| \sum_{j=1}^{N} e^{-i(\kappa + K)\cdot q_j} \Big|^2 \Big\rangle \le B_0 < \infty \qquad [31] \]

where the averages are in the canonical distribution $(\beta, \rho)$ with periodic boundary conditions and a symmetry-breaking potential $\varepsilon W(q)$; $\gamma(\kappa) \ge 0$ is an (arbitrary) smooth function vanishing for $2|\kappa| \ge \delta$ with $\delta < 2\pi/a$ and $B_0$ depends on $\gamma$. See Appendix 3 for a derivation of [31].

Multiplying both sides of the first equation in [31] by $N^{-1}\gamma(\kappa)$ and summing over $\kappa$, the crystallinity condition in the form [30] implies

\[ B_0 \ge B\, r^2 a^d \int_{|\kappa| < \delta} \frac{\gamma(\kappa)\, d\kappa}{\kappa^2 B_1 + \varepsilon B_2} \]

For $d = 1, 2$ the integral diverges, as $\varepsilon^{-1/2}$ or $\log \varepsilon^{-1}$, respectively, implying $|\rho_\varepsilon(K)| \to r = 0$ as $\varepsilon \to 0$: the criterion of crystallinity, [30], cannot be satisfied if $d = 1, 2$.

The above inequality is an example of a general class of inequalities called infrared inequalities stemming from another inequality called Bogoliubov's inequality (see Appendix 3), which lead to the proof that certain kinds of ordered phases cannot exist if the dimension of the ambient space is $d = 2$ when the system in a finite volume, under suitable boundary conditions (e.g., periodic), shows a continuous symmetry. The excluded phenomenon is, more precisely, the existence of equilibrium states exhibiting, in the thermodynamic limit, a symmetry lower than the continuous symmetry holding in a finite volume.

In general, existence of thermodynamic equilibrium states with symmetry lower than the

symmetry enjoyed by the system in finite volume and under suitable boundary conditions is called a "spontaneous symmetry breaking." It is yet another manifestation of instability with respect to changes in boundary conditions, hence its occurrence reveals a phase transition. There is a large class of systems for which an infrared inequality implies absence of spontaneous symmetry breaking: in most of the one- or two-dimensional systems a continuous symmetry cannot be spontaneously broken.

The limitation to dimension $d \le 2$ is a strong limitation to the generality of the applicability of infrared theorems to exclude phase transitions. More precisely, systems can be divided into classes each of which has a "critical dimension" below which too much symmetry implies absence of phase transitions (or of certain kinds of phase transitions).

It should be stressed that, at the critical dimension, the symmetry breaking is usually so weakly forbidden that one might need astronomically large containers to destroy small effects (due to boundary conditions or to very small fields) which break the symmetry. For example, in the crystallization just discussed, the Fourier transform peaks are only bounded by $O(1/\sqrt{\log \varepsilon^{-1}})$. Hence, from a practical point of view, it might still be possible to have some kind of order even in large containers.

The reader is referred to Mermin (1968), Hohenberg (1969), and Ruelle (1969).

High Temperature and Small Density

There is another class of systems in which no phase transitions take place. These are the systems with stable and tempered interactions $\varphi$ (e.g., those satisfying [14]) in the high-temperature and low-density region. The property is obtained by showing that the equation of state is analytic in the variables $(\beta, \rho)$ near the origin $(0, 0)$.

A simple algorithm (Mayer's series) yields the coefficients of the virial series

\[ \beta p(\beta, \rho) = \rho + \sum_{k=2}^{\infty} c_k(\beta)\, \rho^k \]

It has the drawback that the $k$th order coefficient $c_k(\beta)$ is expressed as a sum of many terms (a number growing more than exponentially fast in the order $k$) and it is not so easy (but possible) to show combinatorially that their sum is bounded exponentially in $k$ if $\beta$ is small enough. A more efficient approach leads quickly to the desired solution.

Denoting $\Phi(q_1, \ldots, q_n) \stackrel{\rm def}{=} \sum_{i<j} \varphi(q_i - q_j)$, consider the ("spatial or configurational") correlation functions defined, in the grand canonical distribution with parameters $\beta$, $\lambda$ (and empty boundary conditions), by

\[ \rho_\Lambda(q_1, \ldots, q_n) \stackrel{\rm def}{=} \frac{1}{Z^{gc}(\beta, \lambda, V)} \sum_{m=0}^{\infty} z^{n+m} \int_\Lambda e^{-\beta\Phi(q_1, \ldots, q_n, y_1, \ldots, y_m)}\, \frac{dy_1 \cdots dy_m}{m!} \qquad [32] \]

This is the probability density for finding particles with any momentum in the volume element $dq_1 \cdots dq_n$ (irrespective of where other particles are), and $z = e^{\beta\lambda}\big(\sqrt{2\pi m \beta^{-1} h^{-2}}\big)^d$ accounts for the integration over the momenta variables and is called the activity: it has the dimension of a density (cf. [23]).

Assuming that the potential has a hard core (for simplicity) of radius $R$, the interaction energy $\Phi_{q_1}(q_2, \ldots, q_n)$ of a particle at $q_1$ with any number of other particles at $q_2, \ldots, q_n$ with $|q_i - q_j| > R$ is bounded below by $-B$ for some $B \ge 0$ (related but not equal to the $B$ in [14]). The functions $\rho_\Lambda$ will be regarded as a sequence of functions "of one, two, ... particle positions": $\rho_\Lambda = \{\rho_\Lambda(q_1, \ldots, q_n)\}_{n=1}^{\infty}$ vanishing for $q_j \notin \Lambda$. Then, one checks that

\[ \rho_\Lambda(q_1, \ldots, q_n) = z\,\delta_{n,1}\,\chi_\Lambda(q_1) + z\,K\rho_\Lambda(q_1, \ldots, q_n) \qquad [33a] \]

with

\[ K\rho_\Lambda(q_1, \ldots, q_n) \stackrel{\rm def}{=} e^{-\beta\Phi_{q_1}(q_2, \ldots, q_n)} \Big( \rho_\Lambda(q_2, \ldots, q_n)\,\delta_{n>1} + \sum_{s=1}^{\infty} \int_\Lambda \frac{dy_1 \cdots dy_s}{s!} \prod_{k=1}^{s} \big(e^{-\beta\varphi(q_1 - y_k)} - 1\big)\, \rho_\Lambda(q_2, \ldots, q_n, y_1, \ldots, y_s) \Big) \qquad [33b] \]

where $\delta_{n,1}$, $\delta_{n>1}$ are Kronecker deltas and $\chi_\Lambda(q)$ is the indicator function of $\Lambda$. Equation [33] is called the Kirkwood–Salsburg equation for the family of correlation functions in $\Lambda$. The kernel $K$ of the equations is independent of $\Lambda$, but the domain of integration is $\Lambda$.

Calling $\alpha_\Lambda$ the sequence of functions $\alpha_\Lambda(q_1, \ldots, q_n) \equiv 0$ if $n \ne 1$ and $\alpha_\Lambda(q) = \chi_\Lambda(q)$, a recursive expansion arises, namely

\[ \rho_\Lambda = z\,\alpha_\Lambda + z^2 K\alpha_\Lambda + z^3 K^2\alpha_\Lambda + z^4 K^3\alpha_\Lambda + \cdots \qquad [34] \]

It gives the correlation functions, provided the series converges. The inequality

\[ |K^p\alpha_\Lambda(q_1, \ldots, q_n)| \le e^{(2\beta B + 1)p} \Big( \int |e^{-\beta\varphi(q)} - 1|\, dq \Big)^p \stackrel{\rm def}{=} e^{(2\beta B + 1)p}\, r(\beta)^{3p} \qquad [35] \]

shows that the series [34], called Mayer's series, converges if $|z| < e^{-(2\beta B + 1)} r(\beta)^{-3}$. Convergence is uniform (as $\Lambda \to \infty$) and $(K^p\alpha_\Lambda)(q_1, \ldots, q_n)$ tends to a limit as $V \to \infty$ at fixed $q_1, \ldots, q_n$ and the limit is simply $(K^p\alpha)(q_1, \ldots, q_n)$, if $\alpha(q_1, \ldots, q_n) \equiv 0$ for $n \ne 1$, and $\alpha(q_1) \equiv 1$. This is because the kernel $K$ contains

the factors $(e^{-\beta\varphi(q_1 - y)} - 1)$ which decay rapidly or, if $\varphi$ has finite range, will eventually even vanish. It is also clear that $(K^p\alpha)(q_1, \ldots, q_n)$ is translation invariant.

Hence, if $|z|\, e^{2\beta B + 1} r(\beta)^3 < 1$, the limits, as $\Lambda \to \infty$, of the correlation functions exist and can be computed by a convergent power series in $z$; the correlation functions will be translation invariant (in the thermodynamic limit).

In particular, the one-point correlation function $\rho = \rho(q)$ is $\rho = z(1 + O(z\, r(\beta)^3))$, which, to lowest order in $z$, just shows that activity and density essentially coincide when they are small enough. Furthermore, $\beta p_\Lambda = (1/V)\log Z^{gc}(\beta, \lambda, V)$ is such that

\[ z\,\partial_z\, \beta p_\Lambda = \frac{1}{V}\int_\Lambda \rho_\Lambda(q)\, dq \]

(from the definition of $\rho_\Lambda$ in [32]). Therefore,

\[ \beta p(\beta, z) = \lim_{V \to \infty} \frac{1}{V}\log Z^{gc}(\beta, \lambda, V) = \int_0^z \frac{dz'}{z'}\, \rho(\beta, z') \qquad [36] \]

and, since the density $\rho$ is analytic in $z$ as well and $\rho \simeq z$ for $z$ small, the grand canonical pressure is analytic in the density and $\beta p = \rho(1 + O(\rho))$, at small density. In other words, the equation of state is, to lowest order, essentially the equation of a perfect gas. All quantities that are conceivably of some interest turn out to be analytic functions of temperature and density. The system is essentially a free gas and it has no phase transitions in the sense of a discontinuity or of a singularity in the dependence of a thermodynamic function in terms of others. Furthermore, the system cannot show phase transitions in the sense of sensitive dependence on boundary conditions of fixed external particles. This also follows, with some extra work, from the Kirkwood–Salsburg equations.

The reader is referred to Ruelle (1969) and Gallavotti (1969) for more details.

Lattice Models

The problem of proving the existence of phase transitions in models of homogeneous gases with pair interactions is still open. Therefore, it makes sense to study the problem of phase transitions in simpler models, tractable to some extent but nontrivial, and which are of practical interest in their own right.

The simplest models are the so-called lattice models in which particles are constrained to points of a lattice: they cannot move in the ordinary sense of the word (but, of course, they could jump) and therefore their configurations do not contain momentum variables.

The interaction energy is just the potential energy, and ensembles are defined as collections of probability distributions on the position coordinates of the particle configurations. Usually, the potential is a pair potential decaying fast at $\infty$ and, often, with a hard core forbidding double or higher occupancy of the same lattice site. For instance, the lattice gas with potential $\varphi$, in a cubic box $\Lambda$ with $|\Lambda| = V = L^d$ sites of a square lattice with mesh $a > 0$, is defined by the potential energy attributed to the configuration $X$ of occupied distinct sites, i.e., subsets $X \subset \Lambda$:

\[ H(X) = -\sum_{(x,y) \in X} \varphi(x - y) \qquad [37] \]

where the sum is over pairs of distinct points in $X$. The canonical ensemble and the grand canonical ensemble are the collections of distributions, parametrized by $(\beta, \rho)$, ($\rho = N/V$), or, respectively, by $(\beta, \lambda)$, attributing to $X$ the probability

\[ p_{\beta,\rho}(X) = \frac{e^{-\beta H(X)}\, \delta_{|X|,N}}{Z^c_p(\beta, N, \Lambda)} \qquad [38a] \]

or

\[ p_{\beta,\lambda}(X) = \frac{e^{\beta\lambda|X|}\, e^{-\beta H(X)}}{Z^{gc}_p(\beta, \lambda, \Lambda)} \qquad [38b] \]

where the denominators are normalization factors that can, respectively, be called, in analogy with the theory of continuous systems, canonical and grand canonical partition functions; the subscript $p$ stands for particles.

A lattice gas in which in each site there can be at most one particle can be regarded as a model for the distribution of a family of spins on a lattice. Such models are quite common and useful (e.g., they arise in studying systems with magnetic properties). Simply identify an "occupied" site with a "spin up" or $+$ and an "empty" site with a "spin down" or $-$ (say). If $s = \{\sigma_x\}_{x \in \Lambda}$ is a spin configuration, the energy of the configuration "for potential $\varphi$ and magnetic field $h$" will be

\[ H(s) = -\sum_{(x,y) \in \Lambda} \varphi(x - y)\, \sigma_x \sigma_y - h \sum_x \sigma_x \qquad [39] \]

with the sum running over pairs $(x, y) \in \Lambda$ of distinct sites. If $\varphi(x - y) \equiv J_{xy} \ge 0$, the model is called a ferromagnetic Ising model. As in the case of continuous systems, $\varphi$ will be assumed to have a finite range: that is, $\varphi(x) = 0$ for $|x| > R$, for some $R$, unless explicitly stated otherwise.
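As an illustration of the lattice gas energy [37] and of the grand canonical probabilities [38b], the following minimal sketch enumerates every configuration of a very small system and normalizes the weights by brute force. It assumes a one-dimensional ring of n >= 3 sites and a nearest-neighbor potential of strength phi_nn; these choices, and all function names, are hypothetical and serve only this example.

```python
import itertools
import math


def gas_energy(occupied, n, phi_nn=1.0):
    """H(X) as in [37] on a ring of n >= 3 sites (assumed geometry).

    Each unordered nearest-neighbour pair of occupied sites is counted once.
    """
    pairs = sum(1 for x in occupied if (x + 1) % n in occupied)
    return -phi_nn * pairs


def grand_canonical(n, beta, lam, phi_nn=1.0):
    """Return {X: probability} following [38b], by explicit enumeration."""
    weights = {}
    for r in range(n + 1):
        for sites in itertools.combinations(range(n), r):
            X = frozenset(sites)
            exponent = beta * lam * len(X) - beta * gas_energy(X, n, phi_nn)
            weights[X] = math.exp(exponent)
    Z = sum(weights.values())  # grand canonical partition function
    return {X: w / Z for X, w in weights.items()}


def mean_density(probs, n):
    """Average fraction of occupied sites."""
    return sum(p * len(X) for X, p in probs.items()) / n
```

With phi_nn = 0 the sites decouple and the density reduces to e^{beta*lam}/(1 + e^{beta*lam}); a positive phi_nn lowers the energy of neighboring occupied sites and therefore raises the density at fixed chemical potential.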
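A minimal sketch of the spin energy [39] may help fix the sign conventions. It assumes a ring of n >= 3 sites, a nearest-neighbor coupling J and a uniform field h; the geometry and the function names are hypothetical choices made for this example only. It lists the lowest-energy configurations by enumeration.

```python
import itertools


def ising_energy(spins, J=1.0, h=0.0):
    """Energy [39] on a ring of n >= 3 sites with nearest-neighbour coupling J.

    Each unordered pair (x, x+1) is counted once; spins take values +1 or -1.
    """
    n = len(spins)
    pair_term = sum(spins[x] * spins[(x + 1) % n] for x in range(n))
    return -J * pair_term - h * sum(spins)


def lowest_energy_states(n, J=1.0, h=0.0):
    """Return the minimum energy and every configuration attaining it."""
    configs = list(itertools.product((-1, 1), repeat=n))
    e_min = min(ising_energy(s, J, h) for s in configs)
    return e_min, [s for s in configs if ising_energy(s, J, h) == e_min]
```

For J > 0 and h = 0 the two fully aligned configurations share the minimum energy, while any h > 0 selects the all-plus configuration; this is the finite-volume symmetry that the later discussion of symmetry breaking refers to.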

The canonical and grand canonical ensembles in the can be shown to exist by a method similar to the
box  with respective parameters (, m) or (, h) will one discussed in Appendix 2. They have convexity
be defined as the probability distributions
P on the spin and continuity properties as in the cases of the
configurations s = {x }x2 with x2 x = M = mV
 continuum systems. In the case of a lattice gas, the
or without constraint on M, respectively; hence, f , p functions are still interpreted as free energy
 P  and pressure, respectively. In the case of spin, f (, h)
exp  ðx;yÞ ’ðx  yÞx y has the interpretation of magnetic free energy,
p;m ðsÞ ¼ while g(, m) does not have a special name in the
Zcs ð; M; Þ
thermodynamics of magnetic systems. As in the
p;h ðsÞ ½40
  continuum systems, it is occasionally useful to define
P P
exp h x   ðx;yÞ ’ðx  yÞx y infinite-volume equilibrium states:
¼
Zgc
s ð; h; Þ Definition An infinite-volume state with para-
meters (, h) or (, m) is a collection of average
where the denominators are normalization factors
values F ! hFi obtained, respectively, as limits of
again called, respectively, the canonical and grand
finite-volume averages hFin defined from canonical
canonical partition functions. As in the study of the
or grand canonical distributions in n with fixed
previous continuous systems, canonical and grand
parameters (, h) or (, m), or (u, v) and with general
canonical ensembles with ‘‘external fixed particle
boundary condition of fixed external spins or empty
configurations’’ can be defined together with the
sites, on sequences n ! 1 for which such limits
corresponding ensembles with ‘‘external fixed spin
exist simultaneously for all local observables F.
configurations’’; the subscript s stands for spins.
For each configuration X  of a lattice gas, let This is taken verbatim from the definition in the
{nx } be nx = 1 if x 2 X and nx = 0 if x 62 X. Then the section ‘‘Phase transitions and boundary condi-
transformation x = 2nx  1 establishes a correspon- tions.’’ In this way, it makes sense to define the
dence between lattice gas and spin distributions. In the correspondence, the potential $\varphi(x-y)$ of the lattice gas generates a potential $(1/4)\varphi(x-y)$ for the corresponding spin system and the chemical potential $\lambda$ for the lattice gas is associated with a magnetic field $h$ for the spin system with $h = (1/2)\bigl(\lambda + \sum_{x\neq 0}\varphi(x)\bigr)$.

The correspondence between boundary conditions is natural: for instance, a boundary condition for the lattice gas in which all external sites are occupied becomes a boundary condition in which external sites contain a spin $+$. The close relation between lattice gas and spin systems permits switching from one to the other with little discussion.

In the case of spin systems, empty boundary conditions are often considered (no spins outside $\Omega$). In lattice gases and spin systems (as well as in continuum systems), often periodic and semiperiodic boundary conditions are considered (i.e., periodic in one or more directions and with empty or fixed external particles or spins in the others).

Thermodynamic limits for the partition functions

$$-\beta f(\beta,v) = \lim_{\Omega\to\infty,\; V/N=v} \frac{1}{N}\log Z^{c}_{p}(\beta,N,\Omega)$$
$$\beta p(\beta,\lambda) = \lim_{\Omega\to\infty} \frac{1}{V}\log Z^{gc}_{p}(\beta,\lambda,\Omega)$$
$$-\beta g(\beta,m) = \lim_{\Omega\to\infty,\; M/V\to m} \frac{1}{V}\log Z^{c}_{s}(\beta,M,\Omega)$$
$$\beta f(\beta,h) = \lim_{\Omega\to\infty} \frac{1}{V}\log Z^{gc}_{s}(\beta,h,\Omega) \qquad [41]$$

spin correlation functions for $X = (x_1,\ldots,x_n)$ as $\langle\sigma_X\rangle$ if $\sigma_X = \prod_j \sigma_{x_j}$. For instance, we shall call $\rho(x_1,x_2) \stackrel{\rm def}{=} \langle\sigma_{x_1}\sigma_{x_2}\rangle$ and a pure phase can be defined as an infinite-volume state such that

$$\langle\sigma_X\sigma_{Y+x}\rangle - \langle\sigma_X\rangle\langle\sigma_{Y+x}\rangle \to 0 \quad\text{as } x\to\infty \qquad [42]$$

Again, for more details, we refer the reader to Ruelle (1969) and Gallavotti (1969).

Thermodynamic Limits and Inequalities

An interesting property of lattice systems is that it is possible to study delicate questions like the existence of infinite-volume states in some (moderate) generality. A typical tool is the use of inequalities. As the simplest example of a vast class of inequalities, consider the ferromagnetic Ising model with some finite (but arbitrary) range interaction $J_{xy}\ge 0$ in a field $h_x\ge 0$: $J$, $h$ may even be not translationally invariant. Then the average of $\sigma_X \stackrel{\rm def}{=} \sigma_{x_1}\sigma_{x_2}\cdots\sigma_{x_n}$, $X = (x_1,\ldots,x_n)$, in a state with "empty boundary conditions" (i.e., no external spins) satisfies the inequalities

$$\langle\sigma_X\rangle,\quad \partial_{h_x}\langle\sigma_X\rangle,\quad \partial_{J_{xy}}\langle\sigma_X\rangle \;\ge\; 0, \qquad X = (x_1,\ldots,x_n)$$

More generally, let $H(\sigma)$ in [39] be replaced by $H(\sigma) = -\sum_X J_X\sigma_X$ with $J_X\ge 0$ and $X$ can be any finite set; then, if $Y = (y_1,\ldots,y_n)$, $X = (x_1,\ldots,x_n)$, the following Griffiths inequalities hold:

$$\langle\sigma_X\rangle \ge 0, \qquad \partial_{J_Y}\langle\sigma_X\rangle \equiv \langle\sigma_X\sigma_Y\rangle - \langle\sigma_X\rangle\langle\sigma_Y\rangle \ge 0 \qquad [43]$$
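The inequalities [43] can be checked by brute force on a very small system. The following Python sketch is only an illustration under stated assumptions: the function name `griffiths_check`, the choice of four sites, and the particular couplings $J_X$ are hypothetical, and the weight of a configuration is taken to be $e^{-\beta H(\sigma)}$ with $H(\sigma) = -\sum_X J_X\sigma_X$ and empty boundary conditions.

```python
import itertools
import math


def griffiths_check(n_sites, couplings, beta=1.0):
    """Return (min <sigma_X>, min covariance of sigma_X and sigma_Y) over all
    nonempty subsets X, Y, for H = -sum_X J_X sigma_X with empty boundary."""
    configs = list(itertools.product((-1, 1), repeat=n_sites))

    def sigma(X, s):
        # product of the spins sitting on the sites of X
        return math.prod(s[x] for x in X)

    # Boltzmann weights exp(-beta * H) = exp(beta * sum_X J_X sigma_X)
    weights = [
        math.exp(beta * sum(J * sigma(X, s) for X, J in couplings.items()))
        for s in configs
    ]
    Z = sum(weights)

    def avg(f):
        return sum(w * f(s) for w, s in zip(weights, configs)) / Z

    subsets = [
        X
        for r in range(1, n_sites + 1)
        for X in itertools.combinations(range(n_sites), r)
    ]
    mean = {X: avg(lambda s, X=X: sigma(X, s)) for X in subsets}
    min_mean = min(mean.values())
    min_cov = min(
        avg(lambda s, X=X, Y=Y: sigma(X, s) * sigma(Y, s)) - mean[X] * mean[Y]
        for X in subsets
        for Y in subsets
    )
    return min_mean, min_cov


# Hypothetical ferromagnetic couplings J_X >= 0 (one-, two- and four-body terms)
J = {(0,): 0.2, (0, 1): 1.0, (1, 2): 0.5, (2, 3): 0.7, (0, 1, 2, 3): 0.3}
min_mean, min_cov = griffiths_check(4, J, beta=0.8)
# Both flags should be True (up to rounding) if [43] holds for this model.
print(min_mean >= -1e-12, min_cov >= -1e-12)
```

Exhaustive enumeration is feasible only for a handful of spins; it is meant as a sanity check of the statement, not as a proof or a practical method.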

The inequalities can be used to check, in ferromagnetic Ising models, [39], existence of infinite-volume states (cf. the sections "Phase transitions and boundary conditions" and "Lattice models") obtained by fixing the boundary condition $B$ to be either "all external spins $+$" or "all external sites empty." If $\langle F\rangle_{B,\Omega}$ denotes the grand canonical average with boundary condition $B$ and any fixed $\beta, h > 0$, this means that for all local observables $F(\sigma_\Lambda)$ (i.e., for all $F$ depending on the spin configuration in any fixed region $\Lambda$) all the following limits exist:

$$\lim_{\Omega\to\infty}\langle F\rangle_{B,\Omega} = \langle F\rangle_{B} \qquad [44]$$

The reason is that the inequalities [43] imply that all averages $\langle\sigma_X\rangle_{B,\Omega}$ are monotonic in $\Omega$ for all fixed $X\subset\Omega$: so the limit [44] exists for $F(\sigma) = \sigma_X$. Hence, it exists for all $F$'s depending only on finitely many spins, because any local function $F$ "measurable in $\Lambda$" can be expressed (uniquely) as a linear combination of functions $\sigma_X$ with $X\subseteq\Lambda$.

Monotonicity with empty boundary conditions is seen by considering the sites outside $\Omega$ and in a region $\Omega'$ with side one unit larger than that of $\Omega$ and imagining that the couplings $J_X$ with $X\subset\Omega'$ but $X\not\subset\Omega$ vanish. Then, $\langle\sigma_X\rangle_{\Omega'} \ge \langle\sigma_X\rangle_{\Omega}$, because $\langle\sigma_X\rangle_{\Omega'}$ is an average computed with a distribution corresponding to an energy with the couplings $J_X$ with $X\not\subset\Omega$, but $X\subset\Omega'$, changed from $0$ to $J_X\ge 0$.

Likewise, if the boundary condition is $+$, then enlarging the box from $\Omega$ to $\Omega'$ corresponds to decreasing an external field $h$ acting on the external spins from $+\infty$ (which would force all external spins to be $+$) to a finite value $h\ge 0$: so, increasing the box $\Omega$ causes $\langle\sigma_X\rangle_{+,\Omega}$ to decrease. Therefore, as $\Omega$ increases, Ising ferromagnets spin correlations increase if the boundary condition is empty and decrease if it is $+$.

The inequalities can be used in similar ways to prove that the infinite-volume states obtained from $+$ or empty boundary conditions are translation invariant; and that in zero external field, $h = 0$, the $+$ and $-$ boundary conditions generate pure states if the interaction potential is only a pair ferromagnetic interaction.

There are many other important inequalities which can be used to prove several existence theorems along very simple paths. Unfortunately, their use is mostly restricted to lattice systems and requires very special assumptions on the energy (e.g., ferromagnetic interactions in the above example). The quoted examples were among the first discovered and provide a way to exhibit nontrivial thermodynamic limits and pure states.

For more details, see Ruelle (1969), Lebowitz (1974), Gallavotti (1999), Lieb and Thirring (2001), and Lieb (2002).

Symmetry-Breaking Phase Transitions

The simplest phase transitions (see the section "Phase transitions and boundary conditions") are symmetry-breaking transitions in lattice systems: they take place when the energy of the system in a container $\Omega$ and with some special boundary condition (e.g., periodic, antiperiodic, or empty) is invariant with respect to the action of a group $G$ on phase space. This means that on the points $x$ of phase space acts a group of transformations $G$ so that with each $\gamma\in G$ is associated a map $x\to x\gamma$ which transforms $x$ into $x\gamma$ respecting the composition law in $G$, that is, $(x\gamma)\gamma' \equiv x(\gamma\gamma')$. If $F$ is an observable, the action of the group on phase space induces an action on the observable $F$ changing $F(x)$ into $F_\gamma(x) \stackrel{\rm def}{=} F(x\gamma^{-1})$.

A symmetry-breaking transition occurs when, by fixing suitable boundary conditions and taking the thermodynamic limit, a state $F\to\langle F\rangle$ is obtained in which some local observable shows a nonsymmetric average $\langle F\rangle \neq \langle F_\gamma\rangle$ for some $\gamma$.

An example is provided by the "nearest-neighbor ferromagnetic Ising model" on a $d$-dimensional lattice with energy function given by [39] with $h = 0$ and $\varphi(x-y)\equiv 0$ unless $|x-y| = 1$, i.e., unless $x, y$ are nearest neighbors, in which case $\varphi(x-y) = J > 0$. With periodic or empty boundary conditions, it exhibits a discrete "up–down" symmetry $\sigma\to-\sigma$.

Figure 2 The dashed line is the boundary of $\Omega$; the outer spins correspond to the $\pm$ boundary condition. The points A, B are points where an open "line" ends. (The center of the box is marked O.)

Instability with respect to boundary conditions can be revealed by considering the two boundary conditions, denoted $+$ or $-$, in which the lattice sites outside the container $\Omega$ are either occupied by spins $+$ or by spins $-$. Consider also, for later reference, (1) the boundary conditions in which the boundary spins in the upper half of the boundary are $+$ and the ones in the lower part are $-$: call this the $\pm$-boundary condition (see Figure 2); or (2) the boundary conditions in
which some of the opposite sides of $\Omega$ are identified while $+$ or $-$ conditions are assigned on the remaining sides: call these "cylindrical or semiperiodic boundary conditions."

A new description of the spin configurations is useful: given $\sigma$, draw a unit segment perpendicular to the center of each bond $b$ having opposite spins at its extremes. An example of this construction is provided by Figure 2 for the boundary condition $\pm$. The set of segments can be grouped into lines separating regions where the spins are positive from regions where they are negative. If the boundary condition is $+$ or $-$, the lines form "closed polygons", whereas, if the condition is $\pm$, there is also a single polygon $\lambda_1$ which is not closed (as in Figure 2). If the boundary condition is periodic or cylindrical, all polygons are closed but some may "go around" $\Omega$. The polygons are also called "contours" and the length of a polygon $\gamma$ will be denoted $|\gamma|$.

The correspondence $(\gamma_1,\gamma_2,\ldots,\gamma_n,\lambda_1)\leftrightarrow\sigma$, for the boundary condition $\pm$ or, for the boundary condition $+$ (or $-$), $\sigma\leftrightarrow(\gamma_1,\ldots,\gamma_n)$ is one-to-one and, if $h = 0$, the energy $H_\Omega(\sigma)$ of a configuration is higher than $-J\cdot(\text{number of bonds in }\Omega)$ by an amount $2J(|\lambda_1| + \sum_i|\gamma_i|)$ or, respectively, $2J\sum_i|\gamma_i|$. The grand canonical probability of each spin configuration is therefore proportional, if $h = 0$, respectively, to

$$e^{-2\beta J(|\lambda_1| + \sum_i|\gamma_i|)} \quad\text{or}\quad e^{-2\beta J\sum_i|\gamma_i|} \qquad [45]$$

and the "up–down" symmetry is clearly reflected by [45].

The average $\langle\sigma_x\rangle_{\Omega,+}$ of $\sigma_x$ with $+$ boundary conditions is given by $\langle\sigma_x\rangle_{\Omega,+} = 1 - 2P_{\Omega,+}(-)$, where $P_{\Omega,+}(-)$ is the probability that the spin $\sigma_x$ is $-1$. If the site $x$ is occupied by a negative spin then the point $x$ is inside some contour $\gamma$ associated with the spin configuration $\sigma$ under consideration. Hence, if $\rho(\gamma)$ is the probability that a given contour belongs to the set of contours describing a configuration $\sigma$, it is $P_{\Omega,+}(-) \le \sum_{\gamma\, o\, x}\rho(\gamma)$ where $\gamma\, o\, x$ means that $\gamma$ "surrounds" $x$.

If $\Gamma = (\gamma_1,\ldots,\gamma_n)$ is a spin configuration and if the symbol $\Gamma\ \mathrm{comp}\ \gamma$ means that the contour $\gamma$ is "disjoint" from $\gamma_1,\ldots,\gamma_n$ (i.e., $\{\gamma\cup\Gamma\}$ is a new spin configuration), then

$$\rho(\gamma) = \frac{\sum_{\Gamma\ni\gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}}{\sum_{\Gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}} \equiv e^{-2\beta J|\gamma|}\,\frac{\sum_{\Gamma\,\mathrm{comp}\,\gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}}{\sum_{\Gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}} \le e^{-2\beta J|\gamma|} \qquad [46]$$

because the last ratio in [46] does not exceed 1. Note that there are at most $3^p$ different shapes of $\gamma$ with perimeter $p$ and at most $p^2$ congruent $\gamma$'s containing $x$; therefore, the probability that the spin at $x$ is $-$ when the boundary condition is $+$ satisfies the inequality

$$P_{\Omega,+}(-) \le \sum_{p=4}^{\infty} p^2\, 3^p\, e^{-2\beta J p} \to 0 \quad\text{as } \beta\to\infty$$

This probability can be made arbitrarily small so that $\langle\sigma_x\rangle_{\Omega,+}$ is estimated by a quantity which is as close to 1 as desired provided $\beta$ is large enough and the closeness of $\langle\sigma_x\rangle_{\Omega,+}$ to 1 is estimated by a quantity which is both $x$ and $\Omega$ independent.

A similar argument for the $(-)$-boundary condition, or the remark that for $h = 0$ it is $\langle\sigma_x\rangle_{\Omega,-} = -\langle\sigma_x\rangle_{\Omega,+}$, leads to conclude that, at large $\beta$, $\langle\sigma_x\rangle_{\Omega,-} \neq \langle\sigma_x\rangle_{\Omega,+}$ and the difference between the two quantities is positive uniformly in $\Omega$. This is the proof (Peierls' theorem) of the fact that there is, if $\beta$ is large, a strong instability of the magnetization with respect to the boundary conditions, i.e., the nearest-neighbor Ising model in dimension 2 (or greater, by an identical argument) has a phase transition. If the dimension is 1, the argument clearly fails and no phase transition occurs (see the section "Absence of phase transitions: d = 1").

For more details, see Gallavotti (1999).

Finite-Volume Effects

The description in the last section of the phase transition in the nearest-neighbor Ising model can be made more precise both from physical and mathematical points of view giving insights into the nature of the phase transitions. Assume that the boundary condition is the $(+)$-boundary condition and describe a spin configuration $\sigma$ by means of the associated closed disjoint polygons $(\gamma_1,\ldots,\gamma_n)$. Attribute to $\sigma = (\gamma_1,\ldots,\gamma_n)$ a probability proportional to [45]. Then the following Minlos–Sinai's theorem holds:

Theorem If $\beta$ is large enough there exist $C > 0$, $\rho(\gamma) > 0$ with $\rho(\gamma)\le e^{-2\beta J|\gamma|}$ and such that a spin configuration $\sigma$ randomly chosen out of the grand canonical distribution with $+$ boundary conditions and $h = 0$ will contain, with probability approaching 1 as $\Omega\to\infty$, a number $K_{(\gamma)}(\sigma)$ of contours congruent to $\gamma$ such that

$$\bigl|K_{(\gamma)}(\sigma) - \rho(\gamma)|\Omega|\bigr| \le C\sqrt{|\Omega|}\; e^{-\beta J|\gamma|} \qquad [47]$$

and this relation holds simultaneously for all $\gamma$'s.
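The Peierls estimate of the section "Symmetry-breaking phase transitions", which bounds the probability of a negative spin by $\sum_{p\ge 4} p^2 3^p e^{-2\beta J p}$, is easy to evaluate numerically. The Python sketch below is a rough illustration, not part of the original argument: the function name `peierls_bound`, the default $J = 1$ and the truncation `p_max` are assumptions made for the example. The series converges only when $3e^{-2\beta J} < 1$.

```python
import math


def peierls_bound(beta, J=1.0, p_max=10_000):
    """Upper bound sum_{p>=4} p^2 3^p exp(-2 beta J p) on the probability
    that a given spin is -1 with + boundary conditions (truncated at p_max)."""
    x = 3.0 * math.exp(-2.0 * beta * J)
    if x >= 1.0:
        return math.inf  # the geometric-type series diverges: no information
    return sum(p * p * x**p for p in range(4, p_max + 1))


for beta in (0.5, 1.0, 2.0, 3.0):
    bound = peierls_bound(beta)
    # 1 - 2 * bound is the resulting lower bound on the magnetization <sigma_x>
    print(beta, bound, 1.0 - 2.0 * bound)
```

The bound is useful only once it drops below $1/2$, which in this crude form requires fairly large $\beta J$; it is independent of both $x$ and $\Omega$, as stated in the text.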

Thus, there are very few contours (and the larger they are the smaller is, in absolute and relative value, their number): a typical spin configuration in the grand canonical ensemble with $(+)$-boundary conditions is such that the large majority of the spins is "positive" and, in the "sea" of positive spins, there are a few negative spins distributed in small and rare regions (their number, however, is still of order of $|\Omega|$).

Another consequence of the analysis in the last section concerns the approximate equation of state near the phase transition region at low temperatures and finite $\Omega$. If $\Omega$ is finite, the graph of $h$ versus $m_\Omega(\beta,h)$ will have a rather different behavior depending on the possible boundary conditions. For example, if the boundary condition is $(+)$ or $(-)$, one gets, respectively, the results depicted in Figure 3a and 3b, where $m^*(\beta)$ denotes the spontaneous magnetization (i.e., $m^*(\beta) \stackrel{\rm def}{=} \lim_{h\to 0^+}\lim_{\Omega\to\infty} m_\Omega(\beta,h)$).

With periodic or empty boundary conditions, the diagram changes as in Figure 4. The thermodynamic limit $m(\beta,h) = \lim_{\Omega\to\infty} m_\Omega(\beta,h)$ exists for all $h\neq 0$ and the resulting graph is in Figure 4b, which shows that at $h = 0$ the limit is discontinuous. It can be proved, if $\beta$ is large enough, that $\infty > \lim_{h\to 0^+}\partial_h m(\beta,h) = \chi(\beta) > 0$ (i.e., the angle between the vertical part of the graph and the rest is sharp).

Furthermore, it can be proved that $m(\beta,h)$ is analytic in $h$ for $h\neq 0$. If $\beta$ is small enough, analyticity holds at all $h$. For $\beta$ large, the function $f(\beta,h)$ has an essential singularity at $h = 0$: a result that can be interpreted as excluding a naive theory of metastability as a description of states governed by an equation of state obtained from an analytic continuation to negative values of $h$ of $f(\beta,h)$.

The above considerations and results further clarify the meaning of a phase transition for a finite system. For more details, we refer the reader to Gallavotti (1999) and Friedli and Pfister (2004).

Figure 3 The $h$ vs $m_\Omega(\beta,h)$ graphs for $\Omega$ finite and (a) $+$ and (b) $-$ conditions. (Marked values: $1$ and $\pm m^*(\beta)$ on the $m_\Omega(\beta,h)$ axis, $\pm O(|\Omega|^{-1/2})$ on the $h$ axis.)

Figure 4 (a) The $h$ vs $m_\Omega(\beta,h)$ graph for periodic or empty boundary conditions. (b) The discontinuity (at $h = 0$) of the thermodynamic limit. (Marked values: $1$ and $\pm m^*(\beta)$ on the vertical axis, $\pm O(|\Omega|^{-1/2})$ on the $h$ axis.)

Beyond Low Temperatures (Ferromagnetic Ising Model)

A limitation of the results discussed above is the condition of low temperature ("$\beta$ large enough"). A natural problem is to go beyond the low-temperature region and to describe fully the phenomena in the region where boundary condition instability takes place and first develops. A number of interesting partial results are known, which considerably improve the picture emerging from the previous analysis. A striking list, but far from exhaustive, of such results follows and focuses on the properties of ferromagnetic Ising spin systems. The reason for restricting to such cases is that they are simple enough to allow a rather fine analysis, which sheds considerable light on the structure of statistical mechanics suggesting precise formulation
of the problems that it would be desirable to understand in more general systems.

1. Let $z \stackrel{\rm def}{=} e^{\beta h}$ and consider the product of $z^V$ ($V$ is the number of sites $|\Omega|$ of $\Omega$) times the partition function with periodic or perfect-wall boundary conditions and with finite-range ferromagnetic interaction, not necessarily nearest-neighbor; a polynomial in $z$ (of degree $2V$) is thus obtained. Its zeros lie on the unit circle $|z| = 1$: this is Lee–Yang's theorem. It implies that the only singularities of $f(\beta,h)$ in the region $0 < \beta < \infty$, $-\infty < h < +\infty$ can be found at $h = 0$.

A singularity can appear only if the point $z = 1$ is an accumulation point of the limiting distribution (as $\Omega\to\infty$) of the zeros on the unit circle: if the zeros are $z_1,\ldots,z_{2V}$ then

$$\frac{1}{V}\log z^V Z(\beta,h,\Omega,\text{periodic}) = 2\beta J + \beta h + \frac{1}{V}\sum_{i=1}^{2V}\log(z - z_i)$$

and if

$$V^{-1}\cdot(\text{number of zeros of the form } z_j = e^{i\theta_j},\ \theta\le\theta_j\le\theta + d\theta) \to \frac{d\rho_\beta(\theta)}{2\pi} \quad\text{as } \Omega\to\infty$$

it is

$$\beta f(\beta,h) = 2\beta J + \frac{1}{2\pi}\int_{-\pi}^{\pi}\log(z - e^{i\theta})\, d\rho_\beta(\theta) \qquad [48]$$

The existence of the measure $d\rho_\beta(\theta)$ follows from the existence of the thermodynamic limit: but $d\rho_\beta(\theta)$ is not necessarily $d\theta$-continuous, i.e., not necessarily proportional to $d\theta$.

2. It can be shown that, with not necessarily a nearest-neighbor interaction, the zeros of the partition function do not move too much under small perturbations of the potential even if one perturbs the energy (at perfect-wall or periodic boundary conditions) into

$$H'_\Omega(\sigma) = H_\Omega(\sigma) + (\delta H_\Omega)(\sigma), \qquad (\delta H_\Omega)(\sigma) = \sum_{X\subset\Omega} J'(X)\,\sigma_X \qquad [49]$$

where $J'(X)$ is very general and defined on subsets $X = (x_1,\ldots,x_k)\subset\Omega$ such that the quantity $\|J'\| = \sup_{y\in Z^d}\sum_{X\ni y}|J'(X)|$ is small enough. More precisely, with a ferromagnetic pair potential $J$ fixed, suppose that one knows that, when $J' = 0$, the partition function zeros in the variable $z = e^{\beta h}$ lie in a certain closed set $N$ (of the unit circle) in the $z$-plane. Then, if $J'\neq 0$, they lie in a closed set $N_1$, $\Omega$-independent and contained in a neighborhood of $N$ of width shrinking to 0 when $\|J'\|\to 0$. This makes it possible to establish various relations between analyticity properties and boundary condition instability as described in (3) below.

3. In the ferromagnetic Ising model, with not necessarily a nearest-neighbor interaction, one says that there is a gap around 0 if $d\rho_\beta(\theta) = 0$ near $\theta = 0$. It can be shown that if $\beta$ is small enough there is a gap for all $h$ of width uniform in $h$.

4. Another question is whether the boundary condition instability is always revealed by the one-spin correlation function (i.e., by the magnetization) or whether it might be shown only by some correlation functions of higher order. It can be proved that no boundary condition instability occurs for $h\neq 0$; at $h = 0$ it is possible only if

$$\lim_{h\to 0^-} m(\beta,h) \neq \lim_{h\to 0^+} m(\beta,h) \qquad [50]$$

5. A consequence of the Griffiths' inequalities (cf. the section "Thermodynamic limits and inequalities") is that if [50] is true for a given $\beta_0$ then it is true for all $\beta > \beta_0$. Therefore, item (4) leads to a natural definition of the critical temperature $T_c$ as the least upper bound of the $T$'s such that [50] holds ($k_B T = \beta^{-1}$).

6. If $d = 2$ the free energy of the nearest-neighbor ferromagnetic Ising model has a singularity at $\beta_c$ and the value of $\beta_c$ is known exactly from the exact solutions of the model: $m(\beta,0^+) \stackrel{\rm def}{=} m^*(\beta) \equiv (1 - \sinh^{-4} 2\beta J)^{1/8}$. The location and nature of the singularities of $f(\beta,0)$ as a function of $\beta$ remains an open question for $d = 3$. In particular, the question whether there is a singularity of $f(\beta,0)$ at $\beta = \beta_c$ is open.

7. For $\beta > \beta_c$ there is instability with respect to boundary conditions (see (6) above) and a natural question is: how many "pure" phases can exist in the ferromagnetic Ising model? (cf. the section "Phase transitions and boundary conditions," eqn [22]). Intuition suggests that there should be only two phases: the positively magnetized and the negatively magnetized ones.

One has to distinguish between translation-invariant pure phases and non-translation-invariant ones. It can be proved that, in the case of the two-dimensional nearest-neighbor ferromagnetic Ising models, all infinite-volume states (cf. the section "Lattice models") are translationally invariant. Furthermore, they can be obtained by
considering just the two boundary conditions $+$ and $-$: the latter states are also pure states for models with non-nearest-neighbor ferromagnetic interaction. The solution of this problem has led to the introduction of many new ideas and techniques in statistical mechanics and probability theory.

8. In any dimension $d\ge 2$, for $\beta$ large enough, it can be proved that the nearest-neighbor Ising model has only two translation-invariant phases. If the dimension is $\ge 3$ and $\beta$ is large, the $+$ and $-$ phases exhaust the set of translation-invariant pure phases but there exist non-translation-invariant phases. For $\beta$ close to $\beta_c$, however, the question is much more difficult.

For more details, see Onsager (1944), Lee and Yang (1952), Ruelle (1971), Sinai (1991), Gallavotti (1999), Aizenman (1980), Higuchi (1981), and Friedli and Pfister (2004).

Geometry of Phase Coexistence

Intuition about the phenomena connected with the classical phase transitions is usually based on the properties of the liquid–gas phase transition; this transition is usually experimentally investigated in situations in which the total number of particles is fixed (canonical ensemble) and in presence of an external field (gravity).

The importance of such experimental conditions is obvious; the external field produces a nontranslationally invariant situation and the corresponding separation of the two phases. The fact that the number of particles is fixed determines, on the other hand, the fraction of volume occupied by each of the two phases.

Once more, consider the nearest-neighbor ferromagnetic Ising model: the results available for it can be used to obtain a clear picture of the solution to problems that one would like to solve but which in most other models are intractable with present-day techniques.

It will be convenient to discuss phase coexistence in the canonical ensemble distributions on configurations of fixed total magnetization $M = mV$ (see the section "Lattice models"; [40]). Let $\beta$ be large enough to be in the two-phase region and, for a fixed $\alpha\in(0,1)$, let

$$m = \alpha\, m^*(\beta) + (1-\alpha)\,(-m^*(\beta)) = (2\alpha - 1)\, m^*(\beta) \qquad [51]$$

that is, $m$ is in the vertical part of the diagram $m = m(\beta,h)$ at $\beta$ fixed (see Figure 4).

Fixing $m$ as in [51] does not yet determine the separation of the phases in two different regions; for this effect, it will be necessary to introduce some external cause favoring the occupation of a part of the volume by a single phase. Such an asymmetry can be obtained in at least two ways: through a weak uniform external field (in complete analogy with the gravitational field in the liquid–vapor transition) or through an asymmetric field acting only on boundary spins. The latter should have the same qualitative effect as the former, because in a phase transition region a boundary perturbation produces volume effects (see sections "Phase transitions and inequalities" and "Symmetry-breaking phase transitions"). From a mathematical point of view, it is simpler to use a boundary asymmetry to produce phase separations and the simplest geometry is obtained by considering $\pm$-cylindrical or $++$-cylindrical boundary conditions: this means $++$ or $\pm$ boundary conditions periodic in one direction (e.g., in Figure 2 imagine the right and left boundary identified after removing the boundary spins on them).

Spins adjacent to the bases of $\Omega$ act as symmetry-breaking external fields. The $++$-cylindrical boundary condition should favor the formation inside $\Omega$ of the positively magnetized phase; therefore, it will be natural to consider, in the canonical distribution, this boundary condition only when the total magnetization is fixed to be the spontaneous magnetization $m^*(\beta)$.

On the other hand, the $\pm$-boundary condition favors the separation of phases (positively magnetized phase near the top of $\Omega$ and negatively magnetized phase near the bottom). Therefore, it will be natural to consider the latter boundary condition in the case of a canonical distribution with magnetization $m = (2\alpha - 1)\,m^*(\beta)$ with $0 < \alpha < 1$ ([51]). In the latter case, the positive phase can be expected to adhere to the top of $\Omega$ and to extend, in some sense to be discussed, up to a distance $O(L)$ from it; and then to change into the negatively magnetized pure phase.

To make the phenomenological description precise, consider the spin configurations $\sigma$ through the associated sets of disjoint polygons (cf. the section "Symmetry-breaking phase transitions"). Fix the boundary conditions to be $++$ or $\pm$-cylindrical boundary conditions and note that polygons associated with a spin configuration $\sigma$ are all closed and of two types: the ones of the first type, denoted $\gamma_1,\ldots,\gamma_n$, are polygons which do not encircle $\Omega$; the second type of polygons, denoted by the symbols $\lambda_\alpha$, are the ones which wind up, at least once, around $\Omega$.

So, a spin configuration $\sigma$ will be described by a set of polygons; the statistical weight of a configuration $\sigma = (\gamma_1,\ldots,\gamma_n,\lambda_1,\ldots,\lambda_h)$ is (cf. [45]):

$$e^{-2\beta J\left(\sum_i|\gamma_i| + \sum_j|\lambda_j|\right)} \qquad [52]$$
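The weights [45] and [52] rest on the fact that, at $h = 0$, the energy of a configuration exceeds $-J\cdot(\text{number of bonds})$ by $2J$ times the total contour length, i.e., $2J$ times the number of bonds joining opposite spins. The following Python sketch checks this bookkeeping for a square box with $+$ boundary conditions (not the cylindrical geometry of this section); the function name `energy_and_contours` and the box size are assumptions made for the illustration.

```python
import random


def energy_and_contours(spins, J=1.0):
    """spins: L x L nested list of +1/-1; every external spin is fixed to +1.
    Returns (H, number of bonds, number of broken bonds) at h = 0."""
    L = len(spins)

    def inside(i, j):
        return 0 <= i < L and 0 <= j < L

    def spin(i, j):
        return spins[i][j] if inside(i, j) else 1  # "+" boundary condition

    energy, n_bonds, broken = 0.0, 0, 0
    for i in range(-1, L):
        for j in range(-1, L):
            for di, dj in ((1, 0), (0, 1)):
                k, l = i + di, j + dj
                if not (inside(i, j) or inside(k, l)):
                    continue  # bond between two external sites: not counted
                n_bonds += 1
                prod = spin(i, j) * spin(k, l)
                energy -= J * prod
                if prod < 0:
                    broken += 1  # a unit contour segment crosses this bond
    return energy, n_bonds, broken


random.seed(0)
L = 6
sigma = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
H, n_bonds, broken = energy_and_contours(sigma, J=1.0)
# H should equal -J * n_bonds + 2 * J * (total contour length)
print(H, -n_bonds + 2 * broken)
```

Since the total contour length is just the number of broken bonds, the Boltzmann factor $e^{-\beta H}$ is proportional to $e^{-2\beta J\sum_i|\gamma_i|}$, which is the form used in [45].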

The reason why the contours $\lambda$ that go around the cylinder $\Omega$ are denoted by $\lambda$ (rather than by $\gamma$) is that they "look like" open contours (see the section "Symmetry-breaking phase transitions") if one forgets that the opposite sides of $\Omega$ have to be identified. In the case of the $\pm$-boundary conditions the number of polygons of $\lambda$-type must be odd (hence $\neq 0$), while for the $++$-boundary condition the number of $\lambda$-type polygons must be even (hence it could be 0).

For more details, the reader is referred to Sinai (1991) and Gallavotti (1999).

Separation and Coexistence of Phases

In the context of the geometric description of the spin configuration in the last section, consider the canonical distributions with $++$-cylindrical or the $\pm$-cylindrical boundary conditions and zero field: they will be denoted briefly as $\mu_{\beta,++}$, $\mu_{\beta,\pm}$, respectively. The following theorem (Minlos–Sinai's theorem) provided the foundations of the microscopic theory of coexistence: it is formulated in dimension $d = 2$ but, modulo obvious changes, it holds for $d\ge 2$.

Theorem For $0 < \alpha < 1$ fixed, let $m = (2\alpha - 1)\,m^*(\beta)$; then for $\beta$ large enough a spin configuration $\sigma = (\gamma_1,\ldots,\gamma_n,\lambda_1,\ldots,\lambda_{2h+1})$ randomly chosen with the distribution $\mu_{\beta,\pm}$ enjoys the properties (i)–(iv) below with a $\mu_{\beta,\pm}$-probability approaching 1 as $\Omega\to\infty$:

(i) $\sigma$ contains only one contour of $\lambda$-type and

$$\bigl||\lambda| - (1 + \varepsilon(\beta))L\bigr| < o(L) \qquad [53]$$

where $\varepsilon(\beta) > 0$ is a suitable ($\alpha$-independent) function of $\beta$ tending to zero exponentially fast as $\beta\to\infty$.

(ii) If $\Omega^+_\lambda$, $\Omega^-_\lambda$ denote, respectively, the regions above and below $\lambda$, and $|\Omega|\equiv V$, $|\Omega^+_\lambda|$, $|\Omega^-_\lambda|$ are, respectively, the volumes of $\Omega$, $\Omega^+_\lambda$, $\Omega^-_\lambda$ then

$$\bigl||\Omega^+_\lambda| - \alpha V\bigr| < \kappa(\beta)\,V^{3/4}, \qquad \bigl||\Omega^-_\lambda| - (1-\alpha)V\bigr| < \kappa(\beta)\,V^{3/4} \qquad [54]$$

where $\kappa(\beta)\to 0$ as $\beta\to\infty$ exponentially fast; the exponent 3/4, here and below, is not optimal.

(iii) If $M^+_\lambda = \sum_{x\in\Omega^+_\lambda}\sigma_x$ and $M^-_\lambda = \sum_{x\in\Omega^-_\lambda}\sigma_x$, then

$$\bigl|M^+_\lambda - \alpha\, m^*(\beta)\,V\bigr| < \kappa(\beta)\,V^{3/4}, \qquad \bigl|M^-_\lambda + (1-\alpha)\, m^*(\beta)\,V\bigr| < \kappa(\beta)\,V^{3/4} \qquad [55]$$

(iv) If $K^\lambda_\gamma(\sigma)$ denotes the number of contours congruent to a given $\gamma$ and lying in $\Omega^+_\lambda$ then, simultaneously for all the shapes of $\gamma$:

$$\bigl|K^\lambda_\gamma(\sigma) - \rho(\gamma)\,\alpha V\bigr| \le C e^{-\beta J|\gamma|}\, V^{1/2}, \qquad C > 0 \qquad [56]$$

where $\rho(\gamma)\le e^{-2\beta J|\gamma|}$ is the same quantity as already mentioned in the text of the theorem of "Finite-volume effects". A similar result holds for the contours below $\lambda$ (cf. the comments on [47]).

The above theorem not only provides a detailed and rather satisfactory description of the phase separation phenomenon, but it also furnishes a precise microscopic definition of the line of separation between the two phases, which should be naturally identified with the (random) line $\lambda$.

A similar result holds in the canonical distribution $\mu_{\beta,++,m^*(\beta)}$ where (i) is replaced by: no $\lambda$-type polygon is present, while (ii), (iii) become superfluous, and (iv) is modified in the obvious way. In other words, a typical configuration for the distribution $\mu_{\beta,++,m^*(\beta)}$ has the same appearance as a typical configuration of the corresponding grand canonical ensemble with $(+)$-boundary condition (whose properties are described by the theorem given in the section "Finite-volume effects").

For more details, see Sinai (1991) and Gallavotti (1999).

Phase Separation Line and Surface Tension

Continuing to refer to the nearest-neighbor Ising ferromagnet, the theorem of the last section means that, if $\beta$ is large enough, then the microscopic line $\lambda$, separating the two phases, is almost straight (since $\varepsilon(\beta)$ is small). The deviations of $\lambda$ from a straight line are more conveniently studied in the grand canonical distributions $\mu^0$ with boundary condition set to $+1$ in the upper half of $\partial\Omega$, vertical sites included, and to $-1$ in the lower half: this is illustrated in Figure 2 (see the section "Symmetry-breaking phase transitions"). The results can be converted into very similar results for grand canonical distributions with $\pm$-cylindrical boundary conditions of the last section.

Define $\lambda$ to be rigid if the probability that $\lambda$ passes through the center of the box $\Omega$ (i.e., 0) does not tend to 0 as $\Omega\to\infty$; otherwise, it is not rigid.

The notion of rigidity distinguishes between the possibilities for the line $\lambda$ to be "straight." The "excess" length $\varepsilon(\beta)L$ (see [53]) can be obtained in two ways: either the line is essentially straight (in the geometric sense) with a few "bumps" distributed with a density of order $\varepsilon(\beta)$ or, otherwise, it is only locally straight and with an important part of the excess length being gained through a small bending on a large length scale. In three dimensions a similar phenomenon is possible. Rigidity of $\lambda$, or its failure, can in principle be investigated by optical means;
there can be interference of coherent light scattered by macroscopically separated surface elements of $\lambda$ only if $\lambda$ is rigid in the above sense.

It has been rigorously proved that the line $\lambda$ is not rigid in dimension 2. And, at least at low temperature, the fluctuation of the middle point is of the order $O(\sqrt{L})$. In dimension 3, however, it has been shown that the surface $\lambda$ is rigid at low enough temperature.

A deeper analysis is needed to study the shape of the separation surface under other conditions, for example, with $+$ boundary conditions in a canonical distribution with magnetization intermediate between $\pm m^*(\beta)$. It involves, as a prerequisite, the definition and many properties of the surface tension between the two phases. Here only the definition of surface tension in the case of $\pm$-boundary conditions in the two-dimensional case will be mentioned. If $Z^{++}(\Omega, m^*(\beta))$ and $Z^{+-}(\Omega, m)$ are, respectively, the canonical partition functions for the $++$- and $\pm$-cylindrical boundary conditions the tension $\tau(\beta)$ is defined as

$$\beta\tau(\beta) = -\lim_{\Omega\to\infty}\frac{1}{L}\log\frac{Z^{+-}(\Omega, m)}{Z^{++}(\Omega, m^*(\beta))}$$

The limit can be shown to be $\alpha$-independent for $\beta$ large enough: the definition and its justification are based on the microscopic geometric description in the section "Geometry of phase coexistence." The definition can be naturally extended to higher dimension (and to more general non-nearest-neighbor models). If $d = 2$, the tension $\tau$ can be exactly computed at all temperatures below criticality and is $\beta\tau(\beta) = 2\beta J + \log\tanh\beta J$.

More remarkably, the definition can be extended to define the surface tension $\tau(\beta, n)$ in the "direction $n$," that is, when the boundary conditions are such that the line of separation is in the average orthogonal to the unit vector $n$. In this way, if $d = 2$ and $\alpha\in(0,1)$ is fixed, it can be proved that at low enough temperature the canonical distribution with $+$ boundary conditions and intermediate magnetization $m = (1 - 2\alpha)\,m^*(\beta)$ has typical configurations containing a spin $-$ region of area $\simeq\alpha V$; furthermore, if the container is rescaled to size $L = 1$, the region will have a limiting shape filling an area $\alpha$ bounded by a smooth curve whose form is determined by the classical macroscopic Wulff's theory of the shape of crystals in terms of the surface tension $\tau(n)$.

An interesting question remains open in the three-dimensional case: it is conceivable that the surface, although rigid at low temperature, might become "loose" at a temperature $\tilde T_c$ smaller than the critical temperature $T_c$ (the latter being defined as the highest temperature below which there are at least two pure phases). The temperature $\tilde T_c$, whose existence is rather well established in numerical experiments, would be called the "roughening transition" temperature. The rigidity of $\lambda$ is connected with the existence of translationally noninvariant equilibrium states. The latter exist in dimension $d = 3$, but not in dimension $d = 2$, where the discussed nonrigidity of $\lambda$, established all the way to $T_c$, provides the intuitive reason for the absence of non-translation-invariant states. It has been shown that in $d = 3$ the roughening temperature $\tilde T_c(\beta)$ necessarily cannot be smaller than the critical temperature of the two-dimensional Ising model with the same coupling.

Note that existence of translationally noninvariant equilibrium states is not necessary for the description of coexistence phenomena. The theory of the nearest-neighbor two-dimensional Ising model is a clear proof of this statement.

The reader is referred to Onsager (1944), van Beyeren (1975), Sinai (1991), Miracle-Solé (1995), Pfister and Velenik (1999), and Gallavotti (1999) for more details.

Critical Points

Correlation functions for a system with short-range interactions and in an equilibrium state (which is a pure phase) have cluster properties (see [22]): their physical meaning is that in a pure phase there is independence between fluctuations occurring in widely separated regions. The simplest cluster property concerns the "pair correlation function," that is, the probability density $\rho(q_1,q_2)$ of finding particles at points $q_1, q_2$ independently of where the other particles may happen to be (see [23]).

In the case of spin systems, the pair correlation $\rho(q_1,q_2) = \langle\sigma_{q_1}\sigma_{q_2}\rangle$ will be considered. The pair correlation of a translation-invariant equilibrium state has a cluster property ([22], [42]), if

$$|\rho(q_1,q_2) - \rho^2| \to 0 \quad\text{as } |q_1 - q_2|\to\infty \qquad [57]$$

where $\rho$ is the probability density for finding a particle at $q$ (i.e., the physical density of the state) or $\rho = \langle\sigma_q\rangle$ is the average of the value of the spin at $q$ (i.e., the magnetization of the state).

A general definition of critical point is a point $c$ in the space of the parameters characterizing equilibrium states, for example, $\beta, \lambda$ in grand canonical distributions, $\beta, v$ in canonical distributions, or $\beta, h$ in the case of lattice spin systems in a grand canonical
distribution. In systems with short-range interaction (i.e., with $\varphi(r)$ vanishing for $|r|$ large enough) the point $c$ is a critical point if the pair correlation tends to 0 (see [57]), slower than exponential (e.g., as a power of the distance $|r| = |q_1 - q_2|$).

A typical example is the two-dimensional Ising model on a square lattice and with nearest-neighbor ferromagnetic interaction of size $J$. It has a single critical point at $\beta = \beta_c$, $h = 0$ with $\sinh 2\beta_c J = 1$. The cluster property is that $\langle\sigma_x\sigma_y\rangle - \langle\sigma_x\rangle\langle\sigma_y\rangle$ tends to 0 for $|x-y|\to\infty$ as

$$A_+(\beta)\,\frac{e^{-\kappa(\beta)|x-y|}}{\sqrt{|x-y|}}, \qquad A_-(\beta)\,\frac{e^{-\kappa(\beta)|x-y|}}{|x-y|^{2}}, \qquad A_c\,\frac{1}{|x-y|^{1/4}} \qquad [58]$$

for $\beta < \beta_c$, $\beta > \beta_c$, or $\beta = \beta_c$, respectively, where $A_\pm(\beta), A_c, \kappa(\beta) > 0$. The properties [58] stem from the exact solution of the model.

This means that if $\xi_i$ are regarded as points in $R^d$ there are functions $\rho_{2n}$ such that

$$\rho_{2n}\Bigl(0, \frac{\xi_1}{\lambda}, \ldots, \frac{\xi_{2n-1}}{\lambda}\Bigr) = \lambda^{\omega_{2n}}\,\rho_{2n}(0, \xi_1, \ldots, \xi_{2n-1}), \qquad 0 < \lambda\in R \qquad [59]$$

and $\langle\sigma_0\sigma_{\xi_1}\cdots\sigma_{\xi_{2n-1}}\rangle \propto \rho_{2n}(0,\xi_1,\ldots,\xi_{2n-1})$ if $1 \ll |x_i - x_j| \ll l_0(\beta)$. The numbers $\omega_{2n}$ define a sequence of critical exponents.

Other critical exponents can be associated with approaching the critical point along other directions (e.g., along $h\to 0$ at $\beta = \beta_c$). In this case, the length up to which there are scaling phenomena is $l_0(h) = \ell_o h^{-\bar\nu}$. Further, the magnetization $m(h)$ tends to 0 as $h\to 0$ at fixed $\beta = \beta_c$ as $m(h) = m_0 h^{1/\delta}$ for $\delta > 0$.

None of the features of critical exponents is known rigorously, including their existence. An exception is the case of the two-dimensional nearest-neighbor Ising ferromagnet where some exponents are known exactly (e.g., $\omega_2 = 1/4$, $\omega_{2n} = n\omega_2$, or $\nu = 1$, while $\delta$, $\bar\nu$ are not rigorously known). Nevertheless, for Ising ferromagnets (not even nearest-neighbor but, as always here, finite-range) in all dimensions, all of the exponents mentioned are conjectured to be the same as those of the nearest-neighbor Ising ferromagnet. A further exception is the derivation of rigorous relations between critical exponents and, in some cases, even their values under the assumption that they exist.

At the critical point, several interesting phenomena occur: the lack of exponential decay indicates lack of a length scale over which really distinct phenomena can take place, and properties of the system observed at different length scales are likely to be simply related by suitable scaling transformations. Many efforts have been dedicated to finding ways of understanding quantitatively the scaling properties pertaining to different observables. The
result has been the development of the renormaliza-
Remark Naively it could be expected that in a pure
tion group approach to critical phenomena (cf. the
state in P zero field with hx i = 0 the quantity
section ‘‘Renormalization group’’). The picture that
s = jj1=2 x2 x , if  is a cubic box of side ‘,
emerges is that the closer the critical point is the
should have a probability distribution which is
larger becomes the maximal scale of length below
Gaussian, with dispersion lim!1 hs2 i. This is
which scaling properties are observed. For instance,
‘‘usually true,’’ but not always. Properties [58]
in a lattice spin system in zero field the magnetiza-
show that in the d = 2 ferromagnetic nearest-
tion Mjja in a box   should have essentially
neighbor Ising model, hs2 i diverges proportionally
the same distribution for all ’s with side < l0 () and 1
to ‘24 so that the variable s cannot have the above
l0 () ! 1 as  ! c , provided a is suitably chosen.
Gaussian distribution. The variable S = jj7=8
P
The number a is called a critical exponent.
There are several other ‘‘critical exponents’’ that x2 x will have a finite dispersion: however,
there is no reason that it should be Gaussian. This
can be defined near a critical point. They can
makes clear the great interest of a fluctuation theory
be associated with singularities of the thermody-
and its relevance for the critical point studies (see
namic function or with the behavior of
the next two sections).
the correlation functions involving joint densities at
two or more than two points. As an example, For more details, the reader is referred to Onsager
consider a lattice spin system: then the ‘‘2n–spins (1944), Domb and Green (1972), McCoy and Wu
correlation’’ h0 1 . . . 2n1 ic could behave propor- (1973), and Aizenman (1982).
tionally to 2n (0, 1 , . . . , 2n1 ), n = 1, 2, 3, . . . , for a
suitable family of homogeneous functions n , of
some degree !2n , of the coordinates (1 , . . . , 2n1 ) Fluctuations
at east when the reciprocal distances are large but
As it appears from the discussion in the last section,
< l0 () and
fluctuations of observables around their averages
l0 ðÞ ¼ const:ð  c Þ ! 1 have interesting properties particularly at critical
!0 points. Of particular interest are observables that
are averages, over large volumes $\Lambda$, of local functions $F(x)$ on phase space: this is so because macroscopic observables often have this form. For instance, given a region $\Lambda$ inside the system container $\Lambda_0$, $\Lambda \subset \Lambda_0$, consider a configuration $x = (P, Q)$ and the number of particles $N_\Lambda = \sum_{q \in \Lambda} 1$ in $\Lambda$, or the potential energy $\Phi_\Lambda = \sum_{(q, q') \in \Lambda^2} \varphi(q - q')$, or the kinetic energy $K_\Lambda = \sum_{q \in \Lambda} (1/2m)\, p^2$. In the case of lattice spin systems, consider a configuration $s$ and, for instance, the magnetization $M_\Lambda = \sum_{i \in \Lambda} \sigma_i$ in $\Lambda$. Label the above four examples by $\alpha = 1, \ldots, 4$.

Let $\mu_\alpha$ be the probability distribution describing the equilibrium state in which the quantities $X_\alpha$ are considered; let $x_\alpha = \langle X_\alpha / |\Lambda| \rangle_{\mu_\alpha}$ and $p \stackrel{\mathrm{def}}{=} X_\alpha / |\Lambda| - x_\alpha$. Then typical properties of fluctuations that should be investigated are ($\alpha = 1, \ldots, 4$):

1. for all $\delta > 0$ it is $\lim_{\Lambda \to \infty} \mu_\alpha(|p| > \delta) = 0$ (law of large numbers);
2. there is $D_\alpha > 0$ such that
$$\mu_\alpha\big(p \sqrt{|\Lambda|} \in [a, b]\big) \xrightarrow[\Lambda \to \infty]{} \int_a^b \frac{dz}{\sqrt{2 \pi D_\alpha}}\, e^{-z^2 / 2 D_\alpha}$$
(central limit law); and
3. there is an interval $I_\alpha = (p^*_{\alpha,-}, p^*_{\alpha,+})$ and a concave function $F_\alpha(p)$, $p \in I_\alpha$, such that if $[a, b] \subset I_\alpha$ then
$$\frac{1}{|\Lambda|} \log \mu_\alpha(p \in [a, b]) \xrightarrow[\Lambda \to \infty]{} \max_{p \in [a, b]} F_\alpha(p)$$
(large deviations law).

The law of large numbers provides the certainty of the macroscopic values; the central limit law controls the small fluctuations (of order $\sqrt{|\Lambda|}$) of $X_\alpha$ around its average; and the large deviations law concerns the fluctuations of order $|\Lambda|$.

The relations (1)–(3) above are not always true: they can be proved under further general assumptions if the potential $\varphi$ satisfies [14] in the case of particle systems or if $\sum_q |\varphi(q)| < \infty$ in the case of lattice spin systems. The function $F_\alpha(p)$ is defined in terms of the thermodynamic limits of suitable thermodynamic functions associated with the equilibrium state $\mu_\alpha$. The further assumption is, essentially in all cases, that a suitable thermodynamic function in terms of which $F_\alpha(p)$ will be expressed is smooth and has a nonvanishing second derivative.

For the purpose of a simple concrete example, consider a lattice spin system of Ising type with energy $-\sum_{x, y \in \Lambda_0} \varphi(x - y) \sigma_x \sigma_y - \sum_x h \sigma_x$ and the fluctuations of the magnetization $M_\Lambda = \sum_{x \in \Lambda} \sigma_x$, $\Lambda \subset \Lambda_0$, in the grand canonical equilibrium states $\mu_{h, \beta}$. Let the free energy be $f(\beta, h)$ (see [41]), let $m = m(h) \stackrel{\mathrm{def}}{=} \langle M_\Lambda / |\Lambda| \rangle$ and let $h(m)$ be the inverse function of $m(h)$. If $p = M_\Lambda / |\Lambda|$ the function $F(p)$ is given by

$$F(p) = \beta \big( f(\beta, h(p)) - f(\beta, h) - \partial_h f(\beta, h)\, (h(p) - h) \big) \qquad [60]$$

then a quite general result is:

Theorem The relations (1)–(3) hold if the potential satisfies $\sum_x |\varphi(x)| < \infty$ and if $F(p)$ [60] is smooth and $F''(p) \ne 0$ in open intervals around those in which $p$ is considered, that is, around $p = 0$ for the law of large numbers and for the central limit law or in an open interval containing $a, b$ for the case of the large deviations law.

In the cases envisaged, the theory of equivalence of ensembles implies that the function $F$ can also be computed via thermodynamic functions naturally associated with other equilibrium ensembles. For instance, instead of the grand canonical $f(\beta, h)$, one could consider the canonical $g(\beta, m)$ (see [41]), then

$$F(p) = -\beta \big( g(\beta, p) - g(\beta, m) - \partial_m g(\beta, m)\, (p - m) \big) \qquad [61]$$

It has to be remarked that there should be a strong relation between the central limit law and the law of large deviations. Setting aside stating the conditions for a precise mathematical theorem, the statement can be efficiently illustrated in the case of a ferromagnetic lattice spin system and with $\Lambda \equiv \Lambda_0$, by showing that the law of large deviations in small intervals, around the average $m(h_0)$, at a value $h_0$ of the external field, is implied by the validity of the central limit law for all values of $h$ near $h_0$ and vice versa (here $\beta$ is fixed). Taking $h_0 = 0$ (for simplicity), the heuristic reasons are the following. Let $\mu_{h, \Lambda}$ be the grand canonical distribution in external field $h$. Then:

1. The probability $\mu_{h, \Lambda}(p \in dp)$ is proportional, by definition, to $\mu_{0, \Lambda}(p \in dp)\, e^{\beta h p |\Lambda|}$. Hence, if the central limit law holds for all $h$ near $h_0 = 0$, there will exist two functions $m(h)$ and $D(h) > 0$, defined for $h$ near $h_0 = 0$, with $m(0) = 0$ and
$$\mu_{0, \Lambda}(p \in dp)\, e^{\beta h p |\Lambda|} = \mathrm{const.} \exp\Big( -|\Lambda| \Big( \frac{(p - m(h))^2}{2 D(h)} + o\big((p - m(h))^2\big) \Big) \Big)\, dp \qquad [62]$$
2. There is a function $\zeta(m)$ such that $\partial_m \zeta(m(h)) = \beta h$ and $\partial_m^2 \zeta(m(h)) = D(h)^{-1}$. (This is obtained by noting that, given $D(h)$, the differential equation $\beta\, \partial_m h = D(h)^{-1}$ with the initial value $h(0) = 0$ determines the function $h(m)$; therefore, $\zeta(m)$ is determined by a second integration, from $\partial_m \zeta(m) = \beta h(m)$.)
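
The two steps above can be checked by hand in the simplest possible case, which is an assumption made here purely for illustration and is not discussed in the text: non-interacting spins ($\varphi = 0$), for which $m(h) = \tanh \beta h$ and $D(h) = 1 - m(h)^2$. In this sketch the function of step 2 is called `zeta` (a hypothetical name), and its closed form below is the one obtained by integrating $\partial_m \zeta = \beta h(m) = \operatorname{artanh} m$. The code compares its derivatives with $\beta h$ and $D(h)^{-1}$, and compares the exact zero-field binomial probability with $e^{-\zeta(p)|\Lambda|}$.

```python
import math

def zeta(m):
    """Free-spin candidate for step 2: zeta'(m) = artanh(m), zeta''(m) = 1 / (1 - m^2)."""
    return 0.5 * (1 + m) * math.log(1 + m) + 0.5 * (1 - m) * math.log(1 - m)

def first_derivative(f, x, eps=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def second_derivative(f, x, eps=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

def log_prob_zero_field(n_spins, n_up):
    """Exact log-probability that n_up out of n_spins independent fair +/-1 spins are up."""
    return (math.lgamma(n_spins + 1) - math.lgamma(n_up + 1)
            - math.lgamma(n_spins - n_up + 1) - n_spins * math.log(2))

# Step 2: derivatives of zeta at m(h) for free spins in field h (assumed model).
beta, h = 1.0, 0.3
m = math.tanh(beta * h)          # average magnetization per spin
D = 1 - m ** 2                   # variance of a single spin, i.e. the CLT dispersion
slope = first_derivative(zeta, m)
curvature = second_derivative(zeta, m)

# Zero-field probability of magnetization density p, on the exponential scale.
n_spins, p = 20000, 0.4
n_up = round(n_spins * (1 + p) / 2)
rate = log_prob_zero_field(n_spins, n_up) / n_spins   # close to -zeta(p)
```

The remaining difference between `rate` and `-zeta(p)` is of order $\log |\Lambda| / |\Lambda|$, which is invisible on the exponential scale used in the large deviations law.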

It then follows, heuristically, that the probability of $p$ in zero field has the form $\mathrm{const.}\, e^{-\zeta(p) |\Lambda|}\, dp$ so that the probability that $p \in [a, b]$ will be $\mathrm{const.} \exp\big( |\Lambda| \max_{p \in [a, b]} (-\zeta(p)) \big)$.

Conversely, the large deviations law for $p$ at $h = 0$ implies the validity of the central limit law for the fluctuations of $p$ in all small enough fields $h$: this simply arises from the function $F(p)$ having a negative second derivative.

This means that there is a ‘‘duality’’ between central limit law and large deviation law or that the law of large deviations is a ‘‘global version’’ of the central limit law, in the sense that:

1. if the central limit law holds for $h$ in an interval around $h_0$ then the fluctuations of the magnetization at field $h_0$ satisfy a large deviation law in a small enough interval $J$ around $m(h_0)$; and
2. if a large deviation law is satisfied in an interval around $h_0$ then the central limit law holds for the fluctuations of magnetization around its average in all fields $h$ with $h - h_0$ small enough.

Going beyond the heuristic level in establishing the duality amounts to giving a precise meaning to ‘‘small enough’’ and to discussing which properties of $m(h)$ and $D(h)$, or $F(p)$, are needed to derive properties (1), (2).

For purposes of illustration consider the Ising model with ferromagnetic short-range interaction $\varphi$: then the central limit law holds for all $h$ if $\beta$ is small enough and, under the same condition on $\beta$, the large deviations law holds for all $h$ and all intervals $[a, b] \subset (-1, 1)$. If $\beta$ is not small then the condition $h \ne 0$ has to be added. Hence, the conditions are fairly weak and the apparent exceptions concern the value $h = 0$ and $\beta$ not small, where the statements may become invalid because of possible phase transitions.

In presence of phase transitions, the law of large numbers, the central limit law, and the law of large deviations should be reformulated. Basically, one has to add the requirement that fluctuations are considered in pure phases and change, in a natural way, the formulation of the laws. For instance, the large fluctuations of magnetization in a pure phase of the Ising model in zero field and large $\beta$ (i.e., in a state obtained as limit of finite-volume states with $+$ or $-$ boundary conditions) in intervals $[a, b]$ which do not contain the average magnetization $m^*$ are not necessarily exponentially small with the size of $|\Lambda|$: if $[a, b] \subset [-m^*, m^*]$ they are exponentially small but only with the size of the surface of $\Lambda$ (i.e., with $|\Lambda|^{(d-1)/d}$) while they are exponentially small with the volume if $[a, b] \cap [-m^*, m^*] = \emptyset$.

The discussion of the last section shows that at the critical point the nature of the large fluctuations is also expected to change: no central limit law is expected to hold in general because of the example of [58] with the divergence of the average of the normal second moment of the magnetization in a box as the side tends to $\infty$.

For more details the reader is referred to Olla (1987).

Renormalization Group

The theory of fluctuations just discussed concerns only fluctuations of a single quantity. The problem of joint fluctuations of several quantities is also interesting and in fact led to really new developments in the 1970s. It is necessary to restrict attention to rather special cases in order to illustrate some ideas and the philosophy behind the approach.

Consider, therefore, the equilibrium distribution $\mu_0$ associated with one of the classical equilibrium ensembles. To fix the ideas we consider the equilibrium distribution of an Ising energy function $H_0$, having included the temperature factor $\beta$ in the energy: the inclusion is done because the discussion will deal with the properties of $\mu_0$ as a function of $\beta$. It will also be assumed that the average of each spin is zero (‘‘no magnetic field,’’ see [39] with $h = 0$). Keeping in mind a concrete case, imagine that $H_0$ is the energy function of the nearest-neighbor Ising ferromagnet in zero field.

Imagine that the volume $\Lambda$ of the container has periodic boundary conditions and is very large, ideally infinite. Define the family of blocks $k\xi$, parametrized by $\xi \in \mathbb{Z}^d$ and with $k$ an integer, consisting of the lattice sites $x$ with $k \xi_i \le x_i < k (\xi_i + 1)$. This is a lattice of cubic blocks with side size $k$ that will be called the ‘‘$k$-rescaled lattice.’’

Given $\alpha$, the quantities $m_\xi = k^{-\alpha d} \sum_{x \in k\xi} \sigma_x$ are called the block spins and define the map $R^*_{\alpha, k}\, \mu_0 = \mu_k$ transforming the initial distribution on the original spins into the distribution of the block spins. Note that if the initial spins have only two values $\sigma_x = \pm 1$, the block spins take values between $-k^d / k^{\alpha d}$ and $k^d / k^{\alpha d}$ at steps of size $2 / k^{\alpha d}$. Furthermore, the map $R^*_{\alpha, k}$ makes sense independently of how many values the initial spins can assume, and even if they assume a continuum of values $S_x \in \mathbb{R}$.

Taking $\alpha = 1$ means, for $k$ large, looking at the probability distribution of the joint large fluctuations in the blocks $k\xi$. Taking $\alpha = 1/2$ corresponds to studying a joint central limit property for the block variables.

Considering a one-parameter family of initial distributions $\mu_0$ parametrized by a parameter $\beta$
(that will be identified with the inverse temperature), typically there will be a unique value $\alpha(\beta)$ of $\alpha$ such that the joint fluctuations of the block variables admit a limiting distribution,

$$\mathrm{prob}_k\big(m_\xi \in [a_\xi, b_\xi],\ \xi \in \Delta\big) \xrightarrow[k \to \infty]{} \int_{\{a_\xi\}}^{\{b_\xi\}} g_\Delta\big((S_\xi)_{\xi \in \Delta}\big) \prod_{\xi \in \Delta} dS_\xi \qquad [63]$$

for some distribution $g_\Delta(z)$ on $\mathbb{R}^\Delta$.

If $\alpha > \alpha(\beta)$, the limit will then be $\prod_{\xi \in \Delta} \delta(S_\xi)\, dS_\xi$, or if $\alpha < \alpha(\beta)$ the limit will not exist (because the block variables will be too large, with a dispersion diverging as $k \to \infty$).

It is convenient to choose as sequence of $k \to \infty$ the sequence $k = 2^n$ with $n = 0, 1, \ldots$ because in this way it is $R^*_{\alpha, k} \equiv R^{*n}_{\alpha, 1}$ and the limits $k \to \infty$ along the sequence $k = 2^n$ can be regarded as limits on a sequence of iterations of a map $R^*_{\alpha, 1}$ acting on the probability distributions of generic spins $S_x$ on the lattice $\mathbb{Z}^d$ (the sequence $3^n$ would be equally suited).

It is even more convenient to consider probability distributions that are expressed in terms of energy functions $H$ which generate, in the thermodynamic limit, a distribution $\mu$: then $R^*_{\alpha, 1}$ defines an action $R_\alpha$ on the energy functions so that $R_\alpha H = H'$ if $H$ generates $\mu$, $H'$ generates $\mu'$ and $R^*_{\alpha, 1}\, \mu = \mu'$. Of course, the energy function will be more general than [39] and at least a form like $U$ in [49] has to be admitted.

In other words, $R_\alpha$ gives the result of the action of $R^*_{\alpha, 1}$ expressed as a map acting on the energy functions. Its iterates also define a semigroup which is called the block spin renormalization group.

While the map $R^*_{\alpha, 1}$ is certainly well defined as a map of probability distributions into probability distributions, it is by no means clear that $R_\alpha$ is well defined as a map on the energy functions: if $\mu$ is given by an energy function, it is not clear that $R^*_{\alpha, 1}\, \mu$ is such.

A remarkable theorem can be (easily) proved when $R^*_{\alpha, 1}$ and its iterates act on initial $\mu_0$'s which are equilibrium states of a spin system with short-range interactions and at high temperature ($\beta$ small). In this case, if $\alpha = 1/2$, the sequence of distributions $R^{*n}_{1/2, 1}\, \mu_0(\beta)$ admits a limit which is given by a product of independent Gaussians:

$$\mathrm{prob}_k\big(m_\xi \in [a_\xi, b_\xi],\ \xi \in \Delta\big) \xrightarrow[k \to \infty]{} \int_{\{a_\xi\}}^{\{b_\xi\}} \prod_{\xi \in \Delta} \exp\Big( -\frac{1}{2 D(\beta)} S_\xi^2 \Big) \prod_{\xi \in \Delta} \frac{dS_\xi}{\sqrt{2 \pi D(\beta)}} \qquad [64]$$

Note that this theorem is stated without even mentioning the renormalization maps $R^n_{1/2}$: it can nevertheless be interpreted as stating that

$$R^n_{1/2}\, \beta H_0 \xrightarrow[n \to \infty]{} \sum_{\xi \in \mathbb{Z}^d} \frac{1}{2 D(\beta)}\, S_\xi^2 \qquad [65]$$

but the interpretation is not rigorous because [64] does not require that $R^n_{1/2} H_0(\beta)$ makes sense for $n \ge 1$. It states that at high temperature block spins have normal independent fluctuations: it is therefore an extension of the central limit law.

There are a few cases in which the map $R_\alpha$ can be rigorously shown to be well defined at least when acting on special equilibrium states like the high-temperature lattice spin systems: but these are exceptional cases of relatively little interest.

Nevertheless, there is a vast literature dealing with approximate representations of the map $R_\alpha$. The reason is that, assuming not only its existence but also that it has the properties that one would normally expect to hold for a map acting on a finite-dimensional space, it follows that a number of consequences can be drawn; quite nontrivial ones, as they led to the first theory of the critical point that goes beyond the van der Waals theory described in the section ‘‘van der Waals theory.’’

The argument proceeds essentially as follows. At the critical point, the fluctuations are expected to be anomalous (cf. the last remark in the section ‘‘Critical points’’) in the sense that $\langle (\sum_{x \in \Lambda} \sigma_x / \sqrt{|\Lambda|})^2 \rangle$ will tend to $\infty$, because $\alpha = 1/2$ does not correspond to the right fluctuation scale of $\sum_{x \in \Lambda} \sigma_x$, signaling that $R^{*n}_{1/2, 1}\, \mu_0(\beta_c)$ will not have a limit but, possibly, there is $\alpha_c > 1/2$ such that $R^{*n}_{\alpha_c, 1}\, \mu_0(\beta_c)$ converges to a limit in the sense of [63]. In the case of the critical nearest-neighbor Ising ferromagnet $\alpha_c = 7/8$ (see the ending remark in the section ‘‘Critical points’’). Therefore, if the map $R^*_{\alpha_c, 1}$ is considered as acting on $\mu_0(\beta)$, it will happen that for all $\beta < \beta_c$, $R^{*n}_{\alpha_c, 1}\, \mu_0(\beta)$ will converge to a trivial limit $\prod_{\xi \in \Delta} \delta(S_\xi)\, dS_\xi$ because the value $\alpha_c$ is greater than $1/2$ while normal fluctuations are expected.

If the map $R_{\alpha_c}$ can be considered as a map on the energy functions, this says that $\prod_{\xi \in \Delta} \delta(S_\xi)\, dS_\xi$ is a ‘‘(trivial) fixed point of the renormalization group’’ which ‘‘attracts’’ the energy functions $\beta H_0$ corresponding to the high-temperature phases.

The existence of the critical $\beta_c$ can be associated with the existence of a nontrivial fixed point $H^*$ for $R_{\alpha_c}$ which is hyperbolic with just one Lyapunov exponent $\lambda > 1$; hence, it has a stable manifold of codimension 1. Call $\mu^*$ the probability distribution corresponding to $H^*$.

The migration towards the trivial fixed point for $\beta < \beta_c$ can be explained simply by the fact that for
such values of $\beta$ the initial energy function $\beta H_0$ is outside the stable manifold of the nontrivial fixed point and under application of the renormalization transformation $R^n_{\alpha_c}$, $\beta H_0$ migrates toward the trivial fixed point, which is attractive in all directions.

By increasing $\beta$, it may happen that, for $\beta = \beta_c$, $\beta H_0$ crosses the stable manifold of the nontrivial fixed point $H^*$ for $R_{\alpha_c}$. Then $R^n_{\alpha_c}\, \beta_c H_0$ will no longer tend to the trivial fixed point but it will tend to $H^*$: this means that the block spin variables will exhibit a completely different fluctuation behavior. If $\beta$ is close to $\beta_c$, the iterations of $R_{\alpha_c}$ will bring $R^n_{\alpha_c}\, \beta H_0$ close to $H^*$, only to be eventually repelled along the unstable direction, reaching a distance from it increasing as $\lambda^n |\beta - \beta_c|$.

This means that up to a scale length $O(2^{n(\beta)})$ lattice units with $\lambda^{n(\beta)} |\beta - \beta_c| = 1$ (i.e., up to a scale $O(|\beta - \beta_c|^{-1/\log_2 \lambda})$), the fluctuations will be close to those of the fixed point distribution $\mu^*$, but beyond that scale they will come close to those of the trivial fixed point: to see them the block spins would have to be normalized with index $\alpha = 1/2$ and they would appear as uncorrelated Gaussian fluctuations (cf. [64], [65]).

The next question concerns finding the nontrivial fixed points, which means finding the energy functions $H^*$ and the corresponding $\alpha_c$ which are fixed points of $R_{\alpha_c}$. If the above picture is correct, the distributions $\mu^*$ corresponding to the $H^*$ would describe the critical fluctuations and, if there was only one choice, or a limited number of choices, of $\alpha_c$ and $H^*$ this would open the way to a universality theory of the critical point hinted already by the ‘‘primitive’’ results of van der Waals' theory.

The initial hope was, perhaps, that there would be a very small number of critical values $\alpha_c$ and $H^*$ possible: but it rapidly faded away leaving, however, the possibility that the critical fluctuations could be classified into universality classes. Each class would contain many energy functions which, upon iterated actions of $R_{\alpha_c}$, would evolve under the control of the trivial fixed point (always existing) for $\beta$ small while, for $\beta = \beta_c$, they would be controlled, instead, by a nontrivial fixed point $H^*$ for $R_{\alpha_c}$ with the same $\alpha_c$ and the same $H^*$. For $\beta < \beta_c$, a ‘‘resolution’’ of the approach to the trivial fixed point would be seen by considering the map $R_{1/2}$ rather than $R_{\alpha_c}$, whose iterates would, however, lead to a Gaussian distribution like [64] (and to a limit energy function like [65]).

The picture is highly hypothetical: but it is the first suggestion of a mechanism leading to critical points with the character of universality and with exponents different from those of the van der Waals theory or, for ferromagnets on a lattice, from those of its lattice version (the Curie–Weiss theory). Furthermore, accepting the approximations (e.g., the Wilson–Fisher $\varepsilon$-expansion) that allow one to pass from the well-defined $R^*_{\alpha, 1}$ to the action of $R_\alpha$ on the energy functions, it is possible to obtain quite unambiguously values for $\alpha_c$ and expressions for $H^*$ which are associated with the action of $R_{\alpha_c}$ on various classes of models.

For instance, it can lead to the conclusion that all ferromagnetic finite-range lattice spin systems (with energy functions given by [39]) have critical points controlled by the same $\alpha_c$ and the same nontrivial fixed point: this property is far from being mathematically proved, but it is considered a major success of the theory. One has to compare it with van der Waals' critical point theory: for the first time, an approximation scheme has led, even though under approximations not fully controllable, to computable critical exponents which are not equal to those of the van der Waals theory.

The renormalization group approach to critical phenomena has many variants, depending on which kind of fluctuations are considered and on the models to which it is applied. In statistical mechanics, there are a few mathematically complete applications: certain results in higher dimensions, theory of dipole gas in $d = 2$, hierarchical models, some problems in condensed matter and in statistical mechanics of lattice spins, and a few others. Its main mathematical successes have occurred in various related fields where not only can the philosophy described above be applied but it leads to renormalization transformations that can be defined precisely and studied in detail: for example, constructive field theory, KAM theory of quasiperiodic motions, and various problems in dynamical systems.

However, the applications always concern special cases and in each of them the general picture of the trivial–nontrivial fixed point dichotomy appears realized but without being accompanied, except in rare cases (like the hierarchical models or the universality theory of maps of the interval), by the full description of stable manifold, unstable direction, and action of the renormalization transformation on objects other than the one of immediate interest (a generality which often looks like an intractable problem, but which also turns out not to be necessary).

In the renormalization group context, mathematical physics has played an important role also by providing clear evidence that universality classes could not be too few: this was shown by the numerous exact solutions after Onsager's solution of the nearest-neighbor Ising ferromagnet: there are in fact several lattice models in $d = 2$ that exhibit critical points with some critical exponents exactly computable and that depend continuously on the model parameters.
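
The block-spin map of this section can be tried out numerically in the one case where nothing has to be assumed about the energy function: independent spins $\sigma_x = \pm 1$ with equal probability, i.e., infinite temperature. This choice, the lattice size, and the function names below are assumptions made only for illustration. The sketch builds the block spins $k^{-\alpha d} \sum \sigma_x$ on a two-dimensional lattice and measures their variance: with $\alpha = 1/2$ it stays close to $1$ for every $k$ (the Gaussian limit with dispersion $1$), while with $\alpha = 1$ it shrinks like $k^{-d}$, the numerical counterpart of the trivial delta-function limit.

```python
import random

def block_spins(spins, k, alpha, d=2):
    """Block spins k^(-alpha*d) * (sum of the spins in each k x k block) of a square lattice."""
    side = len(spins)
    norm = k ** (alpha * d)
    usable = side - side % k      # ignore an incomplete last row/column of blocks
    blocks = []
    for bx in range(0, usable, k):
        for by in range(0, usable, k):
            total = sum(spins[x][y]
                        for x in range(bx, bx + k)
                        for y in range(by, by + k))
            blocks.append(total / norm)
    return blocks

def variance(values):
    """Empirical variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Independent fair +/-1 spins (assumed initial distribution, no interaction).
rng = random.Random(0)
side = 256
spins = [[rng.choice((-1, 1)) for _ in range(side)] for _ in range(side)]

block_sizes = (2, 4, 8)
var_central_limit = {k: variance(block_spins(spins, k, 0.5)) for k in block_sizes}
var_large_numbers = {k: variance(block_spins(spins, k, 1.0)) for k in block_sizes}
```

For interacting spins at a critical point the same experiment would require sampling the equilibrium distribution first, and the variance would stay bounded only for the anomalous normalization discussed in the text.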

For more details, we refer the reader to McCoy and Wu (1973), Baxter (1982), Bleher and Sinai (1975), Wilson and Fisher (1972), Gawedzki and Kupiainen (1983, 1985), Benfatto and Gallavotti (1995), and Mastropietro (2004).

Quantum Statistics

Statistical mechanics is extended to assemblies of quantum particles rather straightforwardly. In the case of $N$ identical particles, the observables are operators $O$ on the Hilbert space

$$H_N = L_2(\Lambda)^N_\alpha \qquad \text{or} \qquad H_N = (L_2(\Lambda) \otimes \mathbb{C}^2)^N_\alpha$$

where $\alpha = +, -$, of the symmetric ($\alpha = +$, bosonic particles) or antisymmetric ($\alpha = -$, fermionic particles) functions $\psi(Q)$, $Q = (q_1, \ldots, q_N)$, of the position coordinates of the particles or of the position and spin coordinates $\psi(Q, s)$, $s = (\sigma_1, \ldots, \sigma_N)$, normalized so that

$$\int |\psi(Q)|^2\, dQ = 1 \qquad \text{or} \qquad \sum_s \int |\psi(Q, s)|^2\, dQ = 1$$

here only $\sigma_j = \pm 1$ is considered. As in classical mechanics, a state is defined by the average values $\langle O \rangle$ that it attributes to the observables.

Microcanonical, canonical, and grand canonical ensembles can be defined quite easily. For instance, consider a system described by the Hamiltonian ($\hbar$ = Planck's constant)

$$H_N = -\frac{\hbar^2}{2m} \sum_{j=1}^N \Delta_{q_j} + \sum_{j < j'} \varphi(q_j - q_{j'}) + \sum_j w(q_j) \stackrel{\mathrm{def}}{=} K + \Phi \qquad [66]$$

where periodic boundary conditions are imagined on $\Lambda$ and $w(q)$ is a periodic, smooth potential (the side of $\Lambda$ is supposed to be a multiple of the period of the periodic potential if $w \ne 0$). Then a canonical equilibrium state with inverse temperature $\beta$ and specific volume $v = V/N$ attributes to the observable $O$ the average value

$$\langle O \rangle \stackrel{\mathrm{def}}{=} \frac{\mathrm{tr}\, e^{-\beta H_N} O}{\mathrm{tr}\, e^{-\beta H_N}} \qquad [67]$$

Similar definitions can be given for the grand canonical equilibrium states.

Remarkably, the ensembles are orthodic and a ‘‘heat theorem’’ (see the section ‘‘Heat theorem and ergodic hypothesis’’) can be proved. However, ‘‘equipartition’’ does not hold: that is, $\langle K \rangle \ne (d/2) N \beta^{-1}$, although $\beta^{-1}$ is still the integrating factor of $dU + p\, dV$ in the heat theorem; hence, $\beta^{-1}$ continues to be proportional to temperature.

Lack of equipartition is important, as it solves paradoxes that arise in classical statistical mechanics applied to systems with infinitely many degrees of freedom, like crystals (modeled by lattices of coupled oscillators) or fields (e.g., the electromagnetic field important in the study of black body radiation). However, although this has been the first surprise of quantum statistics (and in fact responsible for the very discovery of quanta), it is by no means the last.

At low temperatures, new unexpected (i.e., with no analogs in classical statistical mechanics) phenomena occur: Bose–Einstein condensation (superfluidity), Fermi surface instability (superconductivity), and appearance of off-diagonal long-range order (ODLRO) will be selected to illustrate the deeply different kinds of problems of quantum statistical mechanics. Largely not yet understood, such phenomena pose very interesting problems not only from the physical point of view but also from the mathematical point of view and may pose challenges even at the level of a definition. However, it should be kept in mind that in the interesting cases (i.e., three-dimensional systems and even most two- and one-dimensional systems) there is no proof that the objects defined below really exist for the systems like [66] (see, however, the final comment for an important exception).

Bose–Einstein Condensation

In a canonical state with parameters $\beta, v$, a definition of the occurrence of Bose condensation is in terms of the eigenvalues $\nu_j(\Lambda, N)$ of the kernel $\rho(q, q')$ on $L_2(\Lambda)$, called the one-particle reduced density matrix, defined by

$$N \sum_{n=1}^\infty \frac{e^{-\beta E_n(\Lambda, N)}}{\mathrm{tr}\, e^{-\beta H_N}} \int \psi_n(q, q_1, \ldots, q_{N-1})\, \overline{\psi_n(q', q_1, \ldots, q_{N-1})}\, dq_1 \ldots dq_{N-1} \qquad [68]$$

where $E_n(\Lambda, N)$ are the eigenvalues of $H_N$ and $\psi_n(q_1, \ldots, q_N)$ are the corresponding eigenfunctions. If the $\nu_j$ are ordered by decreasing value, the state with parameters $\beta, v$ is said to contain a Bose–Einstein condensate if $\nu_1(\Lambda, N) \ge bN > 0$ for all large $\Lambda$ at $v = V/N$, $\beta$ fixed. This receives the interpretation that there are more than $bN$ particles with equal momentum. The free Bose gas exhibits a Bose condensation phenomenon at fixed density and small temperature.

Fermi Surface

The wave functions $\psi_n(q_1, \sigma_1, \ldots, q_N, \sigma_N) \equiv \psi_n(Q, s)$ are now antisymmetric in the permutations of the pairs $(q_i, \sigma_i)$. Let $\psi(Q, s; N, n)$ denote the $n$th
eigenfunction of the N-particle energy HN in [66] with The system is said to contain Cooper pairs with
eigenvalue E(N, n) (labeled by n = 0, 1, . . . and non- spins , ( = þ or  = ) if there exist functions
decreasingly ordered). Setting Q00 = (q001 , . . . , q00Np ), g (q, ) 6¼ 0 with
s 00 = (001 , . . . , 00Np ), introduce the kernels H
p (Q, s;
N Z
0
Q0 , s 0 ) by g ðq; Þg ðq; Þ dq ¼ 0 if  6¼ 0
p ðQ;s;Q0 ;s 0 Þ
 Z X such that
def N X1 EðN;nÞ
e
¼ p! dNp Q00 lim ðx  y; ; x0  y0 ; 0 ; x  x0 Þ
p tr eHN V!1
s 00 n¼0 X
 ðQ;s;Q00 ;s 00 ;N;nÞ ðQ0 ; s 0 ;Q00 ;s 00 ; N; nÞ ½69 !0
g ðx  y; Þg ðx0  y0 ; 0 Þ ½70
xx !1

which are called p-particle reduced density matrices
In this case, g (x  y, ) with largest L2 norm can be
(extending the corresponding one-particle reduced
def P called, after normalize, the wave function of the paired
density matrix [68]). Denote (q1  q2 ) =  1
state of lowest energy: this is the analog of the plane
(q1 , , q2 , ). It is also useful to consider spinless
wave for a free particle (and, like it, it is manifestly not
fermionic systems: the corresponding definitions are
normalizable, i.e., it is not square integrable as a
obtained simply by suppressing the spin labels and
will not be repeated.

Let $r_1(k)$ be the Fourier transform of $\rho_1(q - q')$: the Fermi surface can be defined as the locus of the $k$'s in the neighborhood of which $\partial_k r_1(k)$ is unbounded as $\Lambda \to \infty$, $\beta \to \infty$. The limit as $\beta \to \infty$ is important because the notion of a Fermi surface is, possibly, precise only at zero temperature, that is, at $\beta = \infty$.

So far, existence of a Fermi surface (i.e., the smoothness of $r_1(k)$ except on a smooth surface in $k$-space) has been proved in free Fermi systems ($\varphi = 0$) and

1. in certain exactly soluble one-dimensional spinless systems and
2. in rather general one-dimensional spinless systems or systems with spin and repulsive pair interaction, possibly in an external periodic potential.

The spinning case in a periodic potential and dimension $d \ge 2$ is the most interesting case to study for its relevance in the theory of conduction in crystals. Essentially no mathematical results are available, as the above-mentioned ones do not concern any case in dimension $> 1$: this is a rather disappointing aspect of the theory and a challenge.

In dimension 2 or higher, for fermionic systems with Hamiltonian [66], not only are there no results available, even without spin, but it is not even clear that a Fermi surface can exist in the presence of interesting interactions.

Cooper Pairs

The superconductivity theory has been phenomenologically related to the existence of Cooper pairs. Consider the Hamiltonian [66] and define (cf. [69])

$$\rho(x - y, \sigma; x' - y', \sigma'; x - x') \stackrel{\mathrm{def}}{=} \rho_2(x, \sigma, y, -\sigma; x', \sigma', y', -\sigma')$$

function of $x, y$). If the system contains Cooper pairs and the nonleading terms in the limit [70] vanish quickly enough, the two-particle reduced density matrix [70] regarded as a kernel operator has an eigenvalue of order $V$ as $V \to \infty$: that is, the state of lowest energy is "macroscopically occupied," quite like the free Bose condensation in the ground state.

Cooper pairs instability might destroy the Fermi surface in the sense that $r_1(k)$ becomes analytic in $k$; but it is also possible that, even in the presence of them, there remains a surface which is the locus of the singularities of the function $r_1(k)$. In the first case, there should remain a trace of it as a very steep gradient of $r_1(k)$ of the order of an exponential in the inverse of the coupling strength; this is what happens in the BCS model for superconductivity. The model is, however, a mean-field model and this particular regularity aspect might be one of its peculiarities. In any event, a smooth singularity surface is very likely to exist for some interesting density matrix (e.g., in the BCS model with "gap parameter $\gamma$" the wave function

$$g(x - y, \sigma) \equiv \frac{1}{(2\pi)^d} \int_{\varepsilon(k) > 0} e^{i k \cdot (x - y)} \frac{\gamma}{\sqrt{\varepsilon(k)^2 + \gamma^2}} \, dk$$

of the lowest energy level of the Cooper pairs is singular on a surface coinciding with the Fermi surface of the free system).

ODLRO

Consider the $k$-fermion reduced density matrix $\rho_k(Q, \sigma; Q', \sigma')$ as kernel operators $O_k$ on $L_2((\Lambda \times C^2)^k)$. Suppose $k$ is even; then if $O_k$ has a (generalized) eigenvalue of order $N^{k/2}$ as $N \to \infty$, $N/V = \rho$, the system is said to exhibit off-diagonal long-range order of order $k$. For $k$ odd, ODLRO is defined to exist if $O_k$ has an eigenvalue of order $N^{(k-1)/2}$ and $k \ge 3$ (if $k = 1$ the largest eigenvalue of $O_1$ is necessarily $\le 1$).
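The remark that the one-particle fermionic operator $O_1$ has largest eigenvalue at most 1, while a Bose condensate produces an eigenvalue of order $N$, can be checked on a toy model. The following is a minimal sketch under simplifying assumptions that are not in the text: spinless particles on a ring of `sites` lattice points, ideal (noninteracting) ground states, and illustrative function names (`plane_wave`, `one_body_density_matrix`, `largest_eigenvalue`). The largest eigenvalue is obtained by plain power iteration.

```python
import cmath
import math


def plane_wave(k, x, sites):
    """Normalized plane wave of integer momentum index k on a ring of `sites` points."""
    return cmath.exp(2j * math.pi * k * x / sites) / math.sqrt(sites)


def one_body_density_matrix(occupations, sites):
    """rho_1(x, y) = sum_k n_k phi_k(x) conj(phi_k(y)) for occupation numbers n_k."""
    return [
        [
            sum(
                n * plane_wave(k, x, sites) * plane_wave(k, y, sites).conjugate()
                for k, n in occupations.items()
            )
            for y in range(sites)
        ]
        for x in range(sites)
    ]


def largest_eigenvalue(matrix, iterations=50):
    """Power iteration for a Hermitian positive semidefinite matrix."""
    size = len(matrix)
    # Deterministic start vector; it overlaps the k = 0 plane wave, which is
    # occupied in both examples below.
    vector = [complex(1.0 / (1 + i), 0.0) for i in range(size)]

    def apply(v):
        return [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]

    for _ in range(iterations):
        image = apply(vector)
        norm = math.sqrt(sum(abs(c) ** 2 for c in image))
        vector = [c / norm for c in image]
    image = apply(vector)
    # Rayleigh quotient of the normalized iterate.
    return sum(vector[i].conjugate() * image[i] for i in range(size)).real


SITES = 12
N = 5

# Ideal Fermi ground state: the N lowest momenta are each occupied once.
fermi_matrix = one_body_density_matrix({k: 1 for k in (-2, -1, 0, 1, 2)}, SITES)
# Ideal Bose ground state: all N particles sit in the k = 0 mode.
bose_matrix = one_body_density_matrix({0: N}, SITES)

fermi_top = largest_eigenvalue(fermi_matrix)
bose_top = largest_eigenvalue(bose_matrix)
print("largest eigenvalue, fermions:", round(fermi_top, 6))
print("largest eigenvalue, bosons:  ", round(bose_top, 6))
```

In this sketch both matrices have trace $N$, but the fermionic one is a projector (all nonzero eigenvalues equal 1), whereas the bosonic one concentrates the whole trace in a single eigenvalue.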
For bosons, consider the reduced density matrix $\rho_k(Q; Q')$, regarding it as a kernel operator $O_k$ on $L_2(\Lambda)^k_+$, and define ODLRO of order $k$ to be present if $O_k$ has a (generalized) eigenvalue of order $N^k$ as $N \to \infty$, $N/V = \rho$.

ODLRO can be regarded as a unification of the notions of Bose condensation and of the existence of Cooper pairs, because Bose condensation could be said to correspond to the kernel operator $\rho_1(q_1 - q_2)$ in [68] having a (generalized) eigenvalue of order $N$, and to be a case of ODLRO of order 1. If the state is pure in the sense that it has a cluster property (see the sections "Phase transitions and boundary conditions" and "Lattice models"), then the existence of ODLRO, Bose condensation, and Cooper pairs implies that the system shows a spontaneously broken symmetry: conservation of particle number and clustering imply that the off-diagonal elements of (all) reduced density matrices vanish at infinite separation in states obtained as limits of states with periodic boundary conditions and Hamiltonian [66], and this is incompatible with ODLRO.

The free Fermi gas has no ODLRO, the BCS model of superconductivity has Cooper pairs and ODLRO with $k = 2$, but no Fermi surface in the above sense (possibly too strict). Fermionic systems cannot have ODLRO of order 1 (because the reduced density matrix of order 1 is bounded by 1).

The contribution of mathematical physics has been particularly effective in providing exactly soluble models: however, the soluble models deal with one-dimensional systems and it can be shown that in dimensions 1, 2 no ODLRO can take place. A major advance is the recent proof of ODLRO and Bose condensation in the case of a lattice version of [66] at a special density value (and $d \ge 3$).

In no case, for the Hamiltonian [66] with $\varphi \ne 0$, has existence of Cooper pairs been proved, nor existence of a Fermi surface for $d > 1$. Nevertheless, both Bose condensation and Cooper pairs formation can be proved to occur rigorously in certain limiting situations. There are also a variety of phenomena (e.g., simple spectral properties of the Hamiltonians) which are believed to occur once some of the above-mentioned ones do occur, and several of them can be proved to exist in concrete models.

If $d = 1, 2$, ODLRO can be proved to be impossible at $T > 0$ through the use of Bogoliubov's inequality (used in the "no $d = 2$ crystal theorem," see the section "Continuous symmetries: 'no d = 2 crystal' theorem").

For more details, the reader is referred to Penrose and Onsager (1956), Yang (1962), Ruelle (1969), Hohenberg (1967), Gallavotti (1999), and Aizenman et al. (2004).

Appendix 1: The Physical Meaning of the Stability Conditions

It is useful to see what would happen if the conditions of stability and temperedness (see [14]) are violated. The analysis also illustrates some of the typical methods of statistical mechanics.

Coalescence Catastrophe due to Short-Distance Attraction

The simplest violation of the first condition in [14] occurs when the potential $\varphi$ is smooth and negative at the origin.

Let $\delta > 0$ be so small that the potential at distances $\le 2\delta$ is $\le -b < 0$. Consider the canonical distribution with parameters $\beta$, $N$ in a (cubic) box $\Lambda$ of volume $V$. The probability $P_{\mathrm{collapse}}$ that all the $N$ particles are located in a little sphere of radius $\delta$ around the center of the box (or around any prefixed point of the box) is estimated from below by remarking that

$$\Phi \le -b \binom{N}{2} \sim -\frac{b}{2} N^2$$

so that

$$P_{\mathrm{collapse}} = \frac{\int_{\mathcal{C}} \frac{dp\,dq}{h^{3N} N!}\, e^{-\beta(K(p) + \Phi(q))}}{\int \frac{dp\,dq}{h^{3N} N!}\, e^{-\beta(K(p) + \Phi(q))}} \ge \frac{\left( \frac{4\pi}{3 h^3} \sqrt{2\pi m \beta^{-1}}^{\,3} \right)^{N} \frac{\delta^{3N}}{N!}\, e^{\beta b \frac{1}{2} N(N-1)}}{\int \frac{dq}{h^{3N} N!}\, e^{-\beta \Phi(q)}} \qquad [71]$$

The phase space is extremely small: nevertheless, such configurations are far more probable than the configurations which "look macroscopically correct," that is, configurations with particles more or less spaced by the average particle distance expected in a macroscopically homogeneous configuration, namely $(N/V)^{-1/3} = \rho^{-1/3}$. Their energy $\Phi(q)$ is of the order of $uN$ for some $u$, so that their probability will be bounded above by

$$P_{\mathrm{regular}} \le \frac{\int \frac{dp\,dq}{h^{3N} N!}\, e^{-\beta(K(p) + uN)}}{\int \frac{dp\,dq}{h^{3N} N!}\, e^{-\beta(K(p) + \Phi(q))}} = \frac{\frac{V^N \sqrt{2\pi m \beta^{-1}}^{\,3N}}{h^{3N} N!}\, e^{-\beta u N}}{\int \frac{dq}{h^{3N} N!}\, e^{-\beta \Phi(q)}} \qquad [72]$$
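Dividing the bound [72] by the bound [71], the common denominator and the momentum factors cancel, which leaves $P_{\mathrm{regular}}/P_{\mathrm{collapse}} \le (3V/(4\pi\delta^3))^N e^{-\beta u N - \beta b N(N-1)/2}$. The following minimal sketch evaluates the logarithm of this bound along a sequence with $N/V$ fixed. The function name `log_ratio_bound` and all numerical values (specific volume, $\delta$, $\beta$, $b$, $u$) are illustrative assumptions, not values taken from the text.

```python
import math


def log_ratio_bound(n_particles, specific_volume, delta, beta, b, u):
    """Logarithm of the upper bound on P_regular / P_collapse from [72] / [71]."""
    volume = specific_volume * n_particles           # V = v N at fixed density
    sphere = 4.0 * math.pi * delta ** 3 / 3.0        # volume of the small sphere
    # Phase-space (entropy) advantage of the spread-out configurations: N log(V / sphere).
    entropy_gain = n_particles * math.log(volume / sphere)
    # Energy advantage of the collapsed configurations: beta u N + beta b N (N - 1) / 2.
    energy_cost = beta * u * n_particles + beta * b * n_particles * (n_particles - 1) / 2.0
    return entropy_gain - energy_cost


# Hypothetical parameters chosen only to display the trend.
params = dict(specific_volume=1.0, delta=0.1, beta=1.0, b=0.5, u=-1.0)

for n in (10, 100, 1000):
    print(n, round(log_ratio_bound(n, **params), 2))
```

With these sample values the bound is still large for $N = 10$, where the phase-space factor wins, but it becomes very negative as $N$ grows: the term quadratic in $N$ overtakes the $N \log N$ growth of the phase-space term.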
However, no matter how small $\delta$ is, the ratio $P_{\mathrm{regular}}/P_{\mathrm{collapse}}$ will approach 0 as $V \to \infty$, $N/V \to v^{-1}$; this occurs extremely rapidly because $e^{\beta b N^2/2}$ eventually dominates over $V^N \sim e^{N \log N}$.

Thus, it is far more probable to find the system in a microscopic volume of size $\delta$ rather than in a configuration in which the energy has some macroscopic value proportional to $N$. This catastrophe can be called an ultraviolet catastrophe (as it is due to the behavior at very short distances) and it causes the collapse of the particles into configurations concentrated in regions as small as we please as $V \to \infty$.

Coalescence Catastrophe due to Long-Range Attraction

It occurs when the potential is too attractive near $\infty$. For simplicity, suppose that the potential has a hard core, i.e., it is $+\infty$ for $r < r_0$, so that the above-discussed coalescence cannot occur and the system density is bounded above by a certain quantity $\rho_{cp} < \infty$ (close-packing density).

The catastrophe occurs if $\varphi(q) \sim -g|q|^{-3+\varepsilon}$, $g, \varepsilon > 0$, for $|q|$ large. For instance, this is the case for matter interacting gravitationally; if $k$ is the gravitational constant and $m$ is the particle mass, then $g = km^2$ and $\varepsilon = 2$.

The probability $P_{\mathrm{regular}}$ of "regular configurations," where particles are at distances of order $\rho^{-1/3}$ from their close neighbors, is compared with the probability $P_{\mathrm{collapse}}$ of "catastrophic configurations," with the particles at distances $r_0$ from their close neighbors to form a configuration of density $\rho_{cp}/(1 + \delta)^3$ almost in close packing (so that $r_0$ is equal to the hard-core radius times $1 + \delta$). In the latter case, the system does not fill the available volume and leaves empty a region whose volume is a fraction $\sim ((\rho_{cp} - \rho)/\rho_{cp}) V$ of $V$. Further, it can be checked that the ratio $P_{\mathrm{regular}}/P_{\mathrm{collapse}}$ tends to 0 at a rate $O(\exp(-\beta g \tfrac{1}{2} N (\rho_{cp}(1 + \delta)^{-3} - \rho)))$ if $\delta$ is small enough (and $\rho < \rho_{cp}$).

A system which is too attractive at infinity will not occupy the available volume but will stay confined in a close-packed configuration even in empty space.

This is important in the theory of stars: stars cannot be expected to obey "regular thermodynamics" and in particular will not "evaporate" because their particles interact via the gravitational force at large distances. Stars do not occupy the whole volume given to them (i.e., the universe); they do not collapse to a point only because the interaction has a strongly repulsive core (even when they are burnt out and the radiation pressure is no longer able to keep them at a reasonable size).

Evaporation Catastrophe

This is another infrared catastrophe, that is, a catastrophe due to the long-range structure of the interactions in the above subsection; it occurs when the potential is too repulsive at $\infty$, that is,

$$\varphi(q) \sim +g|q|^{-3+\varepsilon} \quad \text{as } q \to \infty$$

so that the temperedness condition is again violated.

In addition, in this case, the system does not occupy the whole volume: it will generate a layer of particles sticking, in close-packed configuration, to the walls of the container. Therefore, if the density is lower than the close-packing density, $\rho < \rho_{cp}$, the system will leave a region around the center of the container $\Lambda$ empty; and the volume of the empty region will still be of the order of the total volume of the box (i.e., its diameter will be a fraction of the box side $L$). The proof is completely analogous to the one of the previous case, except that now the configuration with lowest energy will be the one sticking to the wall and close packed there, rather than the one close packed at the center.

This catastrophe is also important, as it is realized in systems of charged particles bearing the same charge: the charges adhere to the boundary in close-packing configuration, and dispose themselves so that the electrostatic potential energy is minimal. Therefore, charges deposited on a metal will not occupy the whole volume: they will rather form a surface layer minimizing the potential energy (i.e., so that the Coulomb potential in the interior is constant). In general, charges in excess of neutrality do not behave thermodynamically: for instance, besides not occupying the whole volume given to them, they will not contribute normally to the specific heat.

Neutral systems of charges behave thermodynamically if they have hard cores, so that the ultraviolet catastrophe cannot occur, or if they obey quantum-mechanical laws and consist of fermionic particles (plus possibly bosonic particles with charges of only one sign).

For more details, we refer the reader to Lieb and Lebowitz (1972) and Lieb and Thirring (2001).

Appendix 2: The Subadditivity Method

A simple consequence of the assumptions is that the exponential in (5.2) can be bounded above by $e^{\beta B N} \exp\left(-\frac{\beta}{2m} \sum_{i=1}^{N} p_i^2\right)$ so that

$$1 \le Z_{gc}(\beta, \lambda, V) \le \exp\left( V e^{\beta\lambda} e^{\beta B} \sqrt{2\pi m \beta^{-1}}^{\,d} \right) \;\Rightarrow\; 0 \le \frac{1}{V} \log Z_{gc}(\beta, \lambda, V) \le e^{\beta\lambda} e^{\beta B} \sqrt{2\pi m \beta^{-1}}^{\,d} \qquad [73]$$

Consider, for simplicity, the case of a hard-core interaction with finite range (cf. [14]). Consider a
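sequence of boxes $\Lambda_n$ with sides $2^n L_0$, where $L_0 > 0$ is arbitrarily fixed to be $> 2R$. The partition function $Z_{gc}(\beta, z)$ relative to the volume $\Lambda_n$ is

$$Z_n = \sum_{N=0}^{\infty} \frac{z^N}{N!} \int_{\Lambda_n^N} dQ\, e^{-\beta \Phi(Q)}$$

because the integral over the $P$ variables can be explicitly performed and included in $z^N$ if $z$ is defined as $z = e^{\beta\lambda} (2\pi m \beta^{-1})^{d/2}$.

Then the box $\Lambda_n$ contains $2^d$ boxes $\Lambda_{n-1}$ for $n \ge 1$ and

$$1 \le Z_n \le Z_{n-1}^{2^d} \exp\left( \beta B\, 2d\, (L_{n-1}/R)^{d-1}\, 2^{2d} \right) \qquad [74]$$

because the corridor of width $2R$ around the boundaries of the $2^d$ cubes $\Lambda_{n-1}$ filling $\Lambda_n$ has volume $2R L_{n-1}^{d-1} 2^d$ and contains at most $(L_{n-1}/R)^{d-1} 2^d$ particles, each of which interacts with at most $2^d$ other particles. Therefore,

$$\beta p_n \stackrel{\mathrm{def}}{=} L_n^{-d} \log Z_n \le L_{n-1}^{-d} \log Z_{n-1} + \beta B\, \Gamma_d\, 2^{-n} (L_0/R)^{d-1}$$

for some $\Gamma_d > 0$. Hence, $0 \le \beta p_n \le \beta p_{n-1} + \Gamma'_d 2^{-n}$ for some $\Gamma'_d > 0$ and $p_n$ is bounded above and below uniformly in $n$. So, the limit [13] exists on the sequence $L_n = L_0 2^n$ and defines a function $\beta p_\infty(\beta, \lambda)$.

A box of arbitrary size $L$ can be filled with about $(L/L_{\bar n})^d$ boxes of side $L_{\bar n}$, with $\bar n$ so large that, prefixed $\delta > 0$, $|p_\infty - p_n| < \delta$ for all $n \ge \bar n$. Likewise, a box of size $L_n$ can be filled by about $(L_n/L)^d$ boxes of size $L$ if $n$ is large. The latter remarks lead us to conclude, by standard inequalities, that the limit in [13] exists and coincides with $\beta p_\infty$.

The subadditivity method just demonstrated for finite-range potentials with hard core can be extended to the potentials satisfying just stability and temperedness (cf. the section "Thermodynamic limit").

For more details, the reader is referred to Ruelle (1969) and Gallavotti (1999).

Appendix 3: An Infrared Inequality

The infrared inequalities stem from Bogoliubov's inequality. Consider as an example the problem of crystallization discussed in the section "Continuous symmetries: 'no d = 2 crystal' theorem". Let $\langle \cdot \rangle$ denote average over a canonical equilibrium state with Hamiltonian

$$H = \sum_{j=1}^{N} \frac{p_j^2}{2} + U(Q) + \varepsilon W(Q)$$

with given temperature and density parameters $\beta$, $\rho$, $\rho = a^{-3}$. Let $\{X, Y\} = \sum_j (\partial_{p_j} X\, \partial_{q_j} Y - \partial_{q_j} X\, \partial_{p_j} Y)$ be the Poisson bracket. Integration by parts, with periodic boundary conditions, yields

$$\langle A^* \{C, H\} \rangle \equiv -\frac{\int A^* \{C, e^{-\beta H}\}\, dP\, dQ}{\beta Z_c(\beta, \rho, N)} \equiv -\beta^{-1} \langle \{A^*, C\} \rangle \qquad [75]$$

as a general identity. The latter identity implies, for $A = \{C, H\}$, that

$$\langle \{H, C\}^* \{H, C\} \rangle = -\beta^{-1} \langle \{C, \{H, C^*\}\} \rangle \qquad [76]$$

Hence, the Schwarz inequality $\langle A^* A \rangle \langle \{H, C\}^* \{H, C\} \rangle \ge |\langle A^* \{H, C\} \rangle|^2 = \beta^{-2} |\langle \{A^*, C\} \rangle|^2$, combined with the two relations in [75], [76], yields Bogoliubov's inequality:

$$\langle A^* A \rangle \ge \beta^{-1} \frac{|\langle \{A^*, C\} \rangle|^2}{\langle \{C, \{C^*, H\}\} \rangle} \qquad [77]$$

Let $g, h$ be arbitrary complex (differentiable) functions and $\partial_j = \partial_{q_j}$:

$$A(Q) \stackrel{\mathrm{def}}{=} \sum_{j=1}^{N} g(q_j), \qquad C(P, Q) \stackrel{\mathrm{def}}{=} \sum_{j=1}^{N} p_j h(q_j) \qquad [78]$$

Then $H = \sum_j \frac{1}{2} p_j^2 + \Phi(q_1, \ldots, q_N)$, if

$$\Phi(q_1, \ldots, q_N) = \frac{1}{2} \sum_{j \ne j'} \varphi(|q_j - q_{j'}|) + \varepsilon \sum_j W(q_j)$$

so that, via algebra,

$$\{C, H\} \equiv \sum_j \left( h_j\, \partial_j \Phi - p_j\, (p_j \cdot \partial_j) h_j \right)$$

with $h_j \stackrel{\mathrm{def}}{=} h(q_j)$. If $h$ is real valued, $\langle \{C, \{C^*, H\}\} \rangle$ becomes, again via algebra,

$$\Big\langle \sum_{j j'} h_j h_{j'}\, \partial_j \cdot \partial_{j'} \Phi(Q) \Big\rangle + \varepsilon \Big\langle \sum_j h_j^2\, \Delta W(q_j) + \frac{4}{\beta} \sum_j (\partial_j h_j)^2 \Big\rangle$$

(integrals on $p_j$ just replace $p_j^2$ by $2\beta^{-1}$ and $\langle (p_j)_i (p_j)_{i'} \rangle = \beta^{-1} \delta_{i, i'}$). Therefore, the average $\langle \{C, \{C^*, H\}\} \rangle$ becomes

$$\Big\langle \frac{1}{2} \sum_{j j'} (h_j - h_{j'})^2\, \Delta\varphi(|q_j - q_{j'}|) + \varepsilon \sum_j h_j^2\, \Delta W(q_j) + 4\beta^{-1} \sum_j (\partial_j h_j)^2 \Big\rangle \qquad [79]$$

Choose $g(q) \equiv e^{-i(\kappa + K) \cdot q}$, $h(q) = \cos q \cdot \kappa$ and bound $(h_j - h_{j'})^2$ by $\kappa^2 (q_j - q_{j'})^2$, $(\partial_j h_j)^2$ by $\kappa^2$ and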
$h_j^2$ by 1. Hence [79] is bounded above by $N D(\kappa)$ with

$$D(\kappa) \stackrel{\mathrm{def}}{=} \Big\langle \kappa^2 \Big( 4\beta^{-1} + \frac{1}{2N} \sum_{j \ne j'} (q_j - q_{j'})^2\, |\Delta\varphi(q_j - q_{j'})| \Big) + \varepsilon \frac{1}{N} \sum_j |\Delta W(q_j)| \Big\rangle \qquad [80]$$

This can be used to estimate the denominator in [77]. For the LHS remark that

$$\langle A^* A \rangle = \Big\langle \Big| \sum_{j=1}^{N} e^{-i q_j \cdot (\kappa + K)} \Big|^2 \Big\rangle$$

and

$$|\langle \{A^*, C\} \rangle|^2 = \Big| \Big\langle \sum_j h_j\, \partial_j g_j^* \Big\rangle \Big|^2 = |K + \kappa|^2 N^2 \left( \rho_\varepsilon(K) + \rho_\varepsilon(K + 2\kappa) \right)^2$$

hence [77] becomes, after multiplying both sides by the auxiliary function $\gamma(\kappa)$ (assumed even and vanishing for $|\kappa| > \pi/a$) and summing over $\kappa$,

$$D_1 \stackrel{\mathrm{def}}{=} \frac{1}{N} \sum_\kappa \gamma(\kappa) \Big\langle \frac{1}{N} \Big| \sum_{j=1}^{N} e^{-i(K + \kappa) \cdot q_j} \Big|^2 \Big\rangle \ge \frac{1}{N} \sum_\kappa \gamma(\kappa)\, \frac{|K|^2 \left( \rho_\varepsilon(K) + \rho_\varepsilon(K + 2\kappa) \right)^2}{4 \beta D(\kappa)} \qquad [81]$$

To apply [77] the averages in [80], [81] have to be bounded above: this is a technical point that is discussed here, as it illustrates a general method of using the results on the thermodynamic limits and their convexity properties to obtain estimates.

Note that $\big\langle (1/N) \sum_\kappa \gamma(\kappa)\, |\sum_{j=1}^{N} e^{-i\kappa \cdot q_j}|^2 \big\rangle$ is identically $\tilde\varphi(0) + (2/N) \big\langle \sum_{j < j'} \tilde\varphi(q_j - q_{j'}) \big\rangle$ with $\tilde\varphi(q) \stackrel{\mathrm{def}}{=} (1/N) \sum_\kappa \gamma(\kappa) e^{i\kappa \cdot q}$.

Let $\varphi_{\lambda, \zeta}(q) \stackrel{\mathrm{def}}{=} \varphi(q) + \lambda q^2 |\Delta\varphi(q)| + \zeta \tilde\varphi(q)$ and let $F_V(\lambda, \eta, \zeta) \stackrel{\mathrm{def}}{=} (1/N) \log Z_c(\lambda, \eta, \zeta)$ with $Z_c$ the partition function in the volume $\Lambda$ computed with energy $U' = \sum_{j j'} \varphi_{\lambda, \zeta}(q_j - q_{j'}) + \varepsilon \sum_j W(q_j) + \eta \varepsilon \sum_j |\Delta W(q_j)|$. Then $F_V(\lambda, \eta, \zeta)$ is convex in $\lambda, \eta$ and it is uniformly bounded above and below if $|\eta|, |\varepsilon|, |\zeta| \le 1$ (say) and $|\lambda| \le \lambda_0$: here $\lambda_0 > 0$ exists if $r^2 |\Delta\varphi(r)|$ satisfies the assumption set at the beginning of the section "Continuous symmetries: 'no d = 2 crystal' theorem" and the density is smaller than a close packing (this is because the potential $U'$ will still satisfy conditions similar to [14] uniformly in $|\varepsilon|, |\zeta| < 1$ and $|\lambda|$ small enough).

Convexity and boundedness above and below in an interval imply bounds on the derivatives in the interior points, in this case on the derivatives of $F_V$ with respect to $\lambda, \eta, \zeta$ at 0. The latter are identical to the averages in [80], [81]. In this way, the constants $B_1, B_2, B_0$ such that $D(\kappa) \le \kappa^2 B_1 + \varepsilon B_2$ and $B_0 > D_1$ are found.

For more details, the reader is referred to Mermin (1968).

Further Reading

Aizenman M (1980) Translation invariance and instability of phase coexistence in the two dimensional Ising system. Communications in Mathematical Physics 73: 83–94.
Aizenman M (1982) Geometric analysis of $\varphi^4$ fields and Ising models. Communications in Mathematical Physics 86: 1–48.
Aizenman M, Lieb EH, Seiringer R, Solovej JP, and Yngvason J (2004) Bose–Einstein condensation as a quantum phase transition in an optical lattice. Physical Review A 70: 023612.
Baxter R (1982) Exactly Solved Models. London: Academic Press.
Benfatto G and Gallavotti G (1995) Renormalization Group. Princeton: Princeton University Press.
Bleher P and Sinai Y (1975) Critical indices for Dyson's asymptotically hierarchical models. Communications in Mathematical Physics 45: 247–278.
Boltzmann L (1968a) Über die mechanische Bedeutung des zweiten Hauptsatzes der Wärmetheorie. In: Hasenöhrl F (ed.) Wissenschaftliche Abhandlungen, vol. I, pp. 9–33. New York: Chelsea.
Boltzmann L (1968b) Über die Eigenschaften monozyklischer und anderer damit verwandter Systeme. In: Hasenöhrl FP (ed.) Wissenschaftliche Abhandlungen, vol. III, pp. 122–152. New York: Chelsea.
Dobrushin RL (1968) Gibbsian random fields for lattice systems with pairwise interactions. Functional Analysis and Applications 2: 31–43.
Domb C and Green MS (1972) Phase Transitions and Critical Points. New York: Wiley.
Dyson F (1969) Existence of a phase transition in a one-dimensional Ising ferromagnet. Communications in Mathematical Physics 12: 91–107.
Dyson F and Lenard A (1967, 1968) Stability of matter. Journal of Mathematical Physics 8: 423–434, 9: 698–711.
Friedli S and Pfister C (2004) On the singularity of the free energy at a first order phase transition. Communications in Mathematical Physics 245: 69–103.
Gallavotti G (1999) Statistical Mechanics. Berlin: Springer.
Gallavotti G, Bonetto F and Gentile G (2004) Aspects of the Ergodic, Qualitative and Statistical Properties of Motion. Berlin: Springer.
Gawedzky K and Kupiainen A (1983) Block spin renormalization group for dipole gas and $(\partial\varphi)^4$. Annals of Physics 147: 198–243.
Gawedzky K and Kupiainen A (1985) Massless lattice $\varphi^4_4$ theory: rigorous control of a renormalizable asymptotically free model. Communications in Mathematical Physics 99: 197–252.
Gibbs JW (1981) Elementary Principles in Statistical Mechanics. Woodbridge (Connecticut): Ox Bow Press (reprint of the 1902 edition).
Higuchi Y (1981) On the absence of non translationally invariant Gibbs states for the two dimensional Ising system. In: Fritz J, Lebowitz JL, and Szaz D (eds.) Random Fields. Amsterdam: North-Holland.
Hohenberg PC (1967) Existence of long range order in one and two dimensions. Physical Review 158: 383–386.
Landau L and Lifschitz LE (1967) Physique Statistique. Moscow: MIR.
Lanford O and Ruelle D (1969) Observables at infinity and states with short range correlations in statistical mechanics. Communications in Mathematical Physics 13: 194–215.
Lebowitz JL (1974) GHS and other inequalities. Communications in Mathematical Physics 28: 313–321.
Lebowitz JL and Penrose O (1979) Towards a rigorous molecular theory of metastability. In: Montroll EW and Lebowitz JL (eds.) Fluctuation Phenomena. Amsterdam: North-Holland.
Lee TD and Yang CN (1952) Statistical theory of equations of state and phase transitions, II. Lattice gas and Ising model. Physical Review 87: 410–419.
Lieb EH (2002) Inequalities. Berlin: Springer.
Lieb EH and Lebowitz JL (1972) Lectures on the Thermodynamic Limit for Coulomb Systems. In: Lenard A (ed.) Springer Lecture Notes in Physics, vol. 20, pp. 135–161. Berlin: Springer.
Lieb EH and Thirring WE (2001) Stability of Matter from Atoms to Stars. Berlin: Springer.
Mastropietro V (2004) Ising models with four spin interaction at criticality. Communications in Mathematical Physics 244: 595–642.
McCoy BM and Wu TT (1973) The Two Dimensional Ising Model. Cambridge: Harvard University Press.
Mermin ND (1968) Crystalline order in two dimensions. Physical Review 176: 250–254.
Miracle-Solé S (1995) Surface tension, step free energy and facets in the equilibrium crystal shape. Journal of Statistical Physics 79: 183–214.
Olla S (1987) Large deviations for Gibbs random fields. Probability Theory and Related Fields 77: 343–357.
Onsager L (1944) Crystal statistics. I. A two dimensional Ising model with an order–disorder transition. Physical Review 65: 117–149.
Penrose O and Onsager L (1956) Bose–Einstein condensation and liquid helium. Physical Review 104: 576–584.
Pfister C and Velenik Y (1999) Interface, surface tension and reentrant pinning transition in the 2D Ising model. Communications in Mathematical Physics 204: 269–312.
Ruelle D (1969) Statistical Mechanics. New York: Benjamin.
Ruelle D (1971) Extension of the Lee–Yang circle theorem. Physical Review Letters 26: 303–304.
Sinai Ya G (1991) Mathematical Problems of Statistical Mechanics. Singapore: World Scientific.
van Beyeren H (1975) Interphase sharpness in the Ising model. Communications in Mathematical Physics 40: 1–6.
Wilson KG and Fisher ME (1972) Critical exponents in 3.99 dimensions. Physical Review Letters 28: 240–243.
Yang CN (1962) Concept of off-diagonal long-range order and the quantum phases of liquid He and of superconductors. Reviews of Modern Physics 34: 694–704.

Introductory Article: Functional Analysis


S Paycha, Université Blaise Pascal, Aubière, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Functional analysis is concerned with the study of functions and function spaces, combining techniques borrowed from classical analysis with algebraic techniques. Modern functional analysis developed around the problem of solving equations with solutions given by functions. After the differential and partial differential equations, which were studied in the eighteenth century, came the integral equations and other types of functional equations investigated in the nineteenth century, at the end of which arose the need to develop a new analysis, with functions of an infinite number of variables instead of the usual functions. In 1887, Volterra, inspired by the calculus of variations, suggested a new infinitesimal calculus where usual functions are replaced by functionals, that is, by maps from a function space to R or C, but he and his followers were still missing some algebraic and topological tools to be developed later. Modern analysis was born with the development of an "algebra of the infinite" closely related to classical linear algebra which by 1890 had (up to the concept of duality, which was developed later) settled on firm ground. Strongly inspired by algebraic methods, Fredholm's work at the turn of the nineteenth century, in which emerged the concept of kernel of an operator, became a founding stone for the modern theory of integral equations. Hilbert developed further Fredholm's methods for symmetric kernels, exploiting analogies with the theory of real quadratic forms and thereby making clear the importance of the notion of square-integrable functions. With Hilbert's Grundzüge einer allgemeinen Theorie der Integralgleichung, a further step was made from the "algebra of the infinite" to the "geometry of the infinite." The contribution of Fréchet, who introduced the abstract notion of a space endowed with a distance, made it possible to transfer Euclidean geometry to the framework of what have since then been called Hilbert spaces, a basic concept in mathematics and quantum physics.

The usefulness of functional analysis in the study of quantum systems became clear in the 1950s when Kato proved the self-adjointness of atomic Hamiltonians, and Garding and Wightman formulated axioms for quantum field theory. Ever since, functional analysis has been at the very heart of many approaches to quantum field theory. Applications of functional analysis stretch out to many branches of mathematics, among which are numerical
analysis, global analysis, the theory of pseudodifferential operators, differential geometry, operator algebras, noncommutative geometry, etc.

Topological Vector Spaces

Most topological spaces one comes across in practice are metric spaces. A metric on a topological space $E$ is a map $d : E \times E \to [0, +\infty[$ which is symmetric, such that $d(u, v) = 0 \Leftrightarrow u = v$, and which verifies the triangle inequality $d(u, w) \le d(u, v) + d(v, w)$ for all vectors $u, v, w$. A topological space $E$ is metrizable if there is a metric $d$ on $E$ compatible with the topology on $E$, in which case the balls with radius $1/n$ centered at any point $x \in E$ form a local base at $x$, that is, a collection of neighborhoods of $x$ such that every neighborhood of $x$ contains a member of this collection. A sequence $(u_n)$ in $E$ then converges to $u \in E$ if and only if $d(u_n, u)$ converges to 0.

The Banach fixed-point theorem on a complete metric space $(E, d)$ is a useful tool in nonlinear functional analysis: it states that a (strict) contraction on $E$, that is, a map $T : E \to E$ such that $d(Tu, Tv) \le k\, d(u, v)$ for all $u \ne v \in E$ and fixed $0 < k < 1$, has a unique fixed point $T u_0 = u_0$. In particular, it provides local existence and uniqueness of solutions of differential equations $dy/dt = F(y, t)$ with initial condition $y(0) = y_0$, where $F$ is Lipschitz continuous.

Linear functional analysis starts from topological vector spaces, that is, vector spaces equipped with a topology for which the operations are continuous. A topological vector space equipped with a local base whose members are convex is said to be locally convex. Examples of locally convex spaces are normed linear spaces, namely vector spaces equipped with a norm, a concept that first arose in the work of Fréchet. A seminorm on a vector space $V$ is a map $\rho : V \to [0, \infty[$ which obeys the triangle inequality $\rho(u + v) \le \rho(u) + \rho(v)$ for any vectors $u, v$ and such that $\rho(\lambda u) = |\lambda| \rho(u)$ for any scalar $\lambda$ and any vector $u$; if $\rho(u) = 0 \Rightarrow u = 0$, it is a norm, often denoted by $\|\cdot\|$. A norm on a vector space $E$ gives rise to a translation-invariant distance function $d(u, v) = \|u - v\|$ making it a metric space.

Historically, one of the first examples of normed spaces is the space $C([0, 1])$, investigated by Riesz, of (real- or complex-valued) continuous functions on the interval $[0, 1]$ equipped with the supremum norm $\|f\|_\infty := \sup_{x \in [0,1]} |f(x)|$. In the 1920s, the general definition of Banach space arose in connection with the works of Hahn and Banach. A normed linear space is a Banach space if it is complete as a metric space for the induced metric, $C([0, 1])$ being a prototype of a Banach space. More generally, for any non-negative integer $k$, the space $C^k([0, 1])$ of functions on $[0, 1]$ of class $C^k$ equipped with the norm $\|f\|_k = \sum_{i=0}^{k} \|f^{(i)}\|_\infty$, expressed in terms of a finite number of seminorms $\|f^{(i)}\|_\infty = \sup_{x \in [0,1]} |f^{(i)}(x)|$, $i = 0, \ldots, k$, is also a Banach space.

The space $C^\infty([0, 1])$ of smooth functions on the interval $[0, 1]$ is no longer a Banach space, since its topology is described by a countable family of seminorms $\|f\|_k$ with $k$ varying in the positive integers. The metric

$$d(f, g) = \sum_{k=1}^{\infty} 2^{-k} \frac{\|f - g\|_k}{1 + \|f - g\|_k}$$

turns it into a Fréchet space, that is, a locally convex complete metric space. The space $S(R^n)$ of rapidly decreasing functions, which are smooth functions $f$ on $R^n$ for which

$$\|f\|_{\alpha, \beta} := \sup_{x \in R^n} |x^\alpha D_x^\beta f(x)|$$

is finite for any multiindices $\alpha$ and $\beta$, is also a Fréchet space with the topology given by the seminorms $\|\cdot\|_{\alpha, \beta}$. Further examples of Fréchet spaces are the space $C_0^\infty(K)$ of smooth functions with support in a fixed compact subset $K \subset R^n$ equipped with the countable family of seminorms

$$\|D^\alpha f\|_{\infty, K} = \sup_{x \in K} |D_x^\alpha f(x)|, \quad \alpha \in N_0^n$$

and the space $C^\infty(M, E)$ of smooth sections of a vector bundle $E$ over a closed manifold $M$ equipped with a similar countable family of seminorms. Given an open subset $\Omega = \cup_{p \in N} K_p$ with $K_p$, $p \in N$, compact subsets of $R^n$, the space $D(\Omega) = \cup_{p \in N} C_0^\infty(K_p)$ equipped with the inductive limit topology (for which a sequence $(f_n)$ in $D(\Omega)$ converges to $f \in D(\Omega)$ if each $f_n$ has support in some fixed compact subset $K$ and $(D^\alpha f_n)$ converges uniformly to $D^\alpha f$ on $K$ for each multiindex $\alpha$) is a locally convex space.

Among Banach spaces are Hilbert spaces, which have properties very similar to those of finite-dimensional spaces and are historically the first type of infinite-dimensional space to appear with the works of Hilbert at the beginning of the twentieth century. A Hilbert space is a Banach space equipped with a norm $\|\cdot\|$ that derives from an inner product, that is, $\|u\|^2 = \langle u, u \rangle$ with $\langle \cdot, \cdot \rangle$ a positive-definite bilinear (or sesquilinear, according to whether the base space is real or complex) form. Hilbert spaces are fundamental building blocks in quantum mechanics; using (closed) tensor products, from a Hilbert space $H$ one builds the Fock space $\mathcal{F}(H) = \sum_{k=0}^{\infty} \otimes^k H$ and from there the bosonic Fock space $\mathcal{F}(H) = \sum_{k=0}^{\infty} \otimes_s^k H$ (where $\otimes_s$ stands for the (closed) symmetrized tensor product) as well
as the fermionic Fock space $\mathcal{F}(H) = \sum_{k=0}^{\infty} \Lambda^k H$ (where $\Lambda^k$ stands for the antisymmetrized (closed) tensor product).

A prototype of Hilbert space is the space $l^2(Z)$ of complex-valued sequences $(u_n)_{n \in Z}$ such that $\sum_{n \in Z} |u_n|^2$ is finite, which is already implicit in Hilbert's Grundzügen. Shortly afterwards, Riesz and Fischer, with the help of the integration tool introduced by Lebesgue, showed that the space $L^2(]0, 1[)$ (first introduced by Riesz) of square-summable functions on the interval $]0, 1[$, that is, functions $f$ such that

$$\|f\|_{L^2} = \left( \int_0^1 |f(x)|^2\, dx \right)^{1/2}$$

is finite, provides an example of Hilbert space. These were then further generalized to spaces $L^p(]0, 1[)$ of $p$-summable ($1 \le p < \infty$) functions on $]0, 1[$ (i.e., functions $f$ such that

$$\|f\|_{L^p} = \left( \int_0^1 |f(x)|^p\, dx \right)^{1/p}$$

is finite), which are not Hilbert unless $p = 2$ but which provide further examples of Banach spaces, the space $L^\infty(]0, 1[)$ of functions on $]0, 1[$ bounded almost everywhere with respect to the Lebesgue measure offering yet another example of Banach space.

In 1936, Sobolev gave a generalization of the notion of function and their derivatives through integration by parts, which led to the so-called Sobolev spaces $W^{k,p}(]0, 1[)$ of functions $f \in L^p(]0, 1[)$ with derivatives up to order $k$ lying in $L^p(]0, 1[)$, obtained as the closure of $C^\infty(]0, 1[)$ for the norm

$$f \mapsto \|f\|_{W^{k,p}} = \left( \sum_{j=0}^{k} \|\partial^j f\|_{L^p}^p \right)^{1/p}$$

(for $p = 2$, $W^{k,p}(]0, 1[)$ is a Hilbert space often denoted by $H^k(]0, 1[)$). They differ from the Sobolev spaces $W_0^{k,p}(]0, 1[)$, which correspond to the closure of the set $D(]0, 1[)$ for the norm $f \mapsto \|f\|_{W^{k,p}}$; for example, an element $u \in W^{1,p}(]0, 1[)$ lies in $W_0^{1,p}(]0, 1[)$ if and only if it vanishes at 0 and 1, that is, if and only if it satisfies Dirichlet-type boundary conditions on the boundary of the interval. Similarly, one defines Sobolev spaces $W_0^{k,p}(R) = W^{k,p}(R)$ on $R$, Sobolev spaces $W^{k,p}(\Omega)$ and $W_0^{k,p}(\Omega)$ on open subsets $\Omega \subset R^n$ and, using a partition of unity on a closed manifold $M$, Sobolev spaces $H^k(M, E) = W^{k,2}(M, E)$ of sections of vector bundles $E$ over $M$. Using the Fourier transform (discussed later), one can drop the assumption that $k$ be an integer and extend the notion of Sobolev space to define $W^{s,p}(\Omega)$ and $H^s(M, E)$ with $s$ any real number.

Sobolev spaces arise in many areas of mathematics; one central example in probability theory is the Cameron–Martin space $H^1([0, t])$ embedded in the Wiener space $C([0, t])$. This embedding is a particular case of more general Sobolev embedding theorems, which embed (possibly continuously, sometimes even compactly; the notion of compact operator is discussed in a later section) $W^{k,p}$-Sobolev spaces in $L^q$-spaces with $q > p$, such as the continuous inclusion $W^{k,p}(R^n) \subset L^q(R^n)$ with $1/q = 1/p - k/n$, or in $C^l$-spaces with $l \le k$ such as, for a bounded open and regular enough subset $\Omega$ of $R^n$ and for any $s \ge l + n/p$ with $p > n$, the continuous inclusion $W^{s,p}(\Omega) \subset C^l(\bar\Omega)$ (the set of functions in $C^l(\Omega)$ such that $D^\alpha u$ can be continuously extended to the closure $\bar\Omega$ for all $|\alpha| \le l$). Sobolev embeddings have important applications for the regularity of solutions of partial differential equations, when showing that weak solutions one constructs are in fact smooth. In particular, on an $n$-dimensional closed manifold $M$ for $s > l + n/2$, the Sobolev space $H^s(M, E)$ can be continuously embedded in the space $C^l(M, E)$ of sections of $E$ of class $C^l$, which in particular implies that the solutions of a hypoelliptic partial differential equation $Au = v$ with $v \in L^2(M, E)$ are smooth, as for example in the case of solutions of the Seiberg–Witten equations.

Duality

The concept of duality (in a topological sense) was initiated at the beginning of the twentieth century by Hadamard, who was looking for continuous linear functionals on the Banach space $C(I)$ of continuous functions on a compact interval $I$ equipped with a uniform topology. It is implicit in Hilbert's theory and plays a central part in the work of Riesz, who managed to express such continuous functionals as Stieltjes integrals, one of the starting points for the modern theory of integration.

The topological dual of a topological vector space $E$ is the space $E^*$ of continuous linear forms on $E$ which, when $E$ is a normed space, can be equipped with the dual norm $\|L\|_{E^*} = \sup_{u \in E,\, \|u\| \le 1} |L(u)|$. Dual spaces often provide a receptacle for singular objects; any of the functions $f \in L^p(R^n)$ ($p \ge 1$) and the delta-function at point $x \in R^n$, $\delta_x : f \mapsto f(x)$, all lie in the space $S'(R^n)$ dual to $S(R^n)$ of tempered distributions on $R^n$, which is itself contained in the space $D'(R^n)$ of distributions dual to $D(R^n)$. Furthermore, the topological dual $E^*$ of a nuclear space $E$ contains the support of a probability
measure with characteristic function (see the next section) given by a continuous positive-definite function on $E$. Among nuclear spaces are projective limits $E = \bigcap_{p \in \mathbb{N}} H_p$ (a sequence $(u_n) \in E$ converges to $u \in E$ whenever it converges to $u$ in each $H_p$) of countably many nested Hilbert spaces $\cdots \subset H_p \subset H_{p-1} \subset \cdots \subset H_0$ such that the embedding $H_p \subset H_{p-1}$ is a trace-class operator (see the section "Operator algebras"). If $H_p$ is the closure of $E$ for the norm $\|\cdot\|_p$, the topological dual $E'$ of $E$ for the norm $\|\cdot\|_0$ is an inductive limit $E' = \bigcup_{p \in \mathbb{N}_0} H_{-p}$, where the $H_{-p}$ are the dual (with respect to $\|\cdot\|_0$) Hilbert spaces with norm $\|\cdot\|_{-p}$ (a sequence $(u_n) \in E'$ converges to $u \in E'$ whenever it lies in some $H_{-p}$ and converges to $u$ for the topology of $H_{-p}$) and we have
\[ E \subset \cdots \subset H_p \subset H_{p-1} \subset \cdots \subset H_0 = H_0' \subset H_{-1} \subset \cdots \subset H_{-p} \subset \cdots \subset E' \]
As a result of the theory of elliptic operators on a closed manifold, the Fréchet space $C^\infty(M,E)$ of smooth sections of a vector bundle over a closed manifold $M$ is nuclear as the projective limit of countably many Sobolev spaces $H^p(M,E)$, with $L^2$-dual given by the inductive limit of countably many Sobolev spaces $H^{-p}(M,E)$.
The existence of nontrivial continuous linear forms on a normed linear space $E$ is ensured by the Hahn–Banach theorem, which asserts that for any closed proper linear subspace $F$ of $E$, there is a continuous linear form, not identically zero, that vanishes on $F$. When the space is a Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$, it follows from the Riesz–Fréchet theorem that any continuous linear form $L$ on $H$ is represented in a unique way by a vector $v \in H$ such that $L(u) = \langle v, u\rangle_H$ for all $u \in H$, thus relating the dual pairing on the left with the Hilbert inner product on the right and identifying the topological dual $H^*$ with $H$.
The strong topology induced by the norm $\|\cdot\|$ on a normed vector space $E$ (that is, the topology in which a sequence $(u_n)$ converges to $u$ whenever $\|u_n - u\| \to 0$) is too refined to have compact sets when $E$ is infinite dimensional, since the compactness of the unit ball in $E$ for the strong topology characterizes finite-dimensional spaces. Since compact sets are useful for existence theorems, one is inclined to weaken the topology: the weak topology on $E$ (which coincides with the strong topology when $E$ is finite dimensional and for which a sequence $(u_n)$ converges to $u$ if and only if $L(u_n) \to L(u)$ $\forall L \in E^*$) has compact unit ball if and only if $E$ is reflexive or, in other words, if $E$ can be canonically identified with its double dual $(E^*)^*$. For $1 < p < \infty$, given an open subset $\Omega \subset \mathbb{R}^n$, the topological dual of $L^p(\Omega)$ can be identified via the Riesz representation with $L^{p^*}(\Omega)$ with $p^*$ conjugate to $p$, that is, $1/p + 1/p^* = 1$, and $L^p(\Omega)$ is reflexive, whereas the topological duals of $W^{s,p}(\Omega)$ and $W_0^{s,p}(\Omega)$ both coincide with $W_0^{-s,p^*}(\Omega)$ so that only $W_0^{s,p}(\Omega)$ is reflexive. Neither $L^1(\Omega)$ nor its topological dual $L^\infty(\Omega)$ is reflexive, since $L^1(\Omega)$ is strictly contained in the topological dual of $L^\infty(\Omega)$: there are continuous linear forms $L$ on $L^\infty(\Omega)$ that are not of the form
\[ L(u) = \int_\Omega u v \quad \forall u \in L^\infty(\Omega) \quad \text{with } v \in L^1(\Omega) \]
Similarly, the topological dual $E^*$ of a normed linear space $E$ can be equipped with the topology induced by the dual norm $\|\cdot\|_{E^*}$ and with the weak*-topology, namely the weakest one for which the maps $L \mapsto L(u)$, $u \in E$, are continuous, and the unit ball in $E^*$ is indeed compact for this topology (Banach–Alaoglu theorem).
Duality does not always preserve separability (a topological vector space is separable if it has a countable dense subset), since $L^\infty(\Omega)$, which is not separable, is the topological dual of $L^1(\Omega)$, which is separable. However, as a consequence of the Hahn–Banach theorem, if the topological dual of a Banach space is separable then so is the original space, and one has equivalence when adding the reflexivity assumption; a Banach space is reflexive and separable whenever its topological dual is. For $1 \le p < \infty$, $L^p(\Omega)$ and $W_0^{s,p}(\Omega)$ are separable and moreover reflexive if $p \ne 1$.

Fourier Transform

In the middle of the eighteenth century, oscillations of a vibrating string were interpreted by Bernoulli as a limit case for the oscillation of $n$ point masses when $n$ tends to infinity, and Bernoulli introduced the novel idea of the superposition principle, by which the general oscillation of the string should decompose into a superposition of "proper oscillations." This point of view triggered off a discussion as to whether or not an arbitrary function can be expanded as a trigonometric series. Other examples of expansions in "orthogonal functions" (this terminology actually only appears with Hilbert) had been found in the meantime in relation to oscillation problems and investigations on heat theory, but it was only in the nineteenth century, with the works of Fourier and Dirichlet, that the superposition problem was solved.
Separable Hilbert spaces can be equipped with a countable orthonormal system $\{e_n\}_{n \in \mathbb{Z}}$ ($\langle e_n, e_m\rangle_H = \delta_{mn}$ with $\langle\cdot,\cdot\rangle_H$ the scalar product on $H$) which is
complete, that is, any vector $u \in H$ can be expanded in this system in a unique way $u = \sum_{n \in \mathbb{Z}} \hat u_n e_n$ with Fourier coefficients $\hat u_n = \langle u, e_n\rangle$. The latter obey Parseval's relation $\sum_{n \in \mathbb{Z}} |\hat u_n|^2 = \|u\|^2$ (where $\|\cdot\|$ is the norm associated with $\langle\cdot,\cdot\rangle$), and the Fourier transform $u \mapsto (\hat u(n))_{n \in \mathbb{Z}}$ gives rise to an isometric isomorphism between the separable Hilbert space $H$ and the Hilbert space $l^2(\mathbb{Z})$ of square-summable sequences of complex numbers. In particular, the space $L^2(S^1)$ of $L^2$-functions on the unit circle $S^1 = \mathbb{R}/\mathbb{Z}$ with its usual Haar measure $dt$ is separable with complete orthonormal system $t \mapsto e_n(t) = e^{2i\pi n t}$, $n \in \mathbb{Z}$, and the Fourier transform
\[ u \mapsto \left( \hat u(n) = \int_0^1 e^{-2i\pi n t} u(t) \, dt \right)_{n \in \mathbb{Z}} \]
identifies it with the space $l^2(\mathbb{Z})$. Under this identification, the Hilbert subspace $l^2(\mathbb{N})$ obtained as the range in $l^2(\mathbb{Z})$ of the projection $p_+ : (u_n)_{n \in \mathbb{Z}} \mapsto (u_n)_{n \in \mathbb{N}}$ corresponds to the Hardy space $H^2(S^1)$.
The Fourier transform extends to the space $\mathcal{S}(\mathbb{R}^n)$, sending a function $f \in \mathcal{S}(\mathbb{R}^n)$ to the map
\[ \xi \mapsto \hat f(\xi) = \frac{1}{\sqrt{(2\pi)^n}} \int_{\mathbb{R}^n} e^{-i x \cdot \xi} f(x) \, dx \]
and maps $\mathcal{S}(\mathbb{R}^n)$ onto itself linearly and continuously with continuous inverse $f \mapsto \hat f(-\xi)$. When $n = 1$, the Poisson formula relates $f \in \mathcal{S}(\mathbb{R})$ with its Fourier transform $\hat f$ by $\sum_{n=-\infty}^{\infty} f(2\pi n) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} \hat f(n)$.
Since Fourier transformation turns (up to a constant multiplicative factor) differentiation $D^\alpha$ for a multi-index $\alpha = (\alpha_1, \ldots, \alpha_n)$ into multiplication by $\xi^\alpha = \xi_1^{\alpha_1} \cdots \xi_n^{\alpha_n}$, it can be used to define $W^{s,p}$-Sobolev spaces with $s$ a real number as the space of $L^p$-functions with finite Sobolev norms $\|u\|_{W^{s,p}} = \left( \int |(1 + |\xi|)^s \hat u(\xi)|^p \right)^{1/p}$ (which coincide with the ones defined previously when $s = k$ is a non-negative integer).
Fourier transforms are also used to describe a linear pseudodifferential operator $A$ (see the next two sections, where the notions of bounded and unbounded linear operator are discussed) of order $a$ acting on smooth functions on an open subset $U$ of $\mathbb{R}^n$ in terms of its symbol $\sigma_A$, a smooth map on $U \times \mathbb{R}^n$ with compact support in $x$ such that for any multi-indices $\alpha, \beta \in \mathbb{N}_0^n$, there is a constant $C_{\alpha,\beta}$ with
\[ |D_x^\alpha D_\xi^\beta \sigma_A(x,\xi)| \le C_{\alpha,\beta} (1 + |\xi|)^{a - |\beta|} \]
for any $\xi \in \mathbb{R}^n$. The operator is then given by
\[ (Af)(x) = \frac{1}{\sqrt{(2\pi)^n}} \int_{\mathbb{R}^n} e^{i x \cdot \xi} \sigma_A(x,\xi) \hat f(\xi) \, d\xi \]
The Fourier transform maps a Gaussian function $x \mapsto e^{-(1/2)\lambda |x|^2}$ on $\mathbb{R}^n$, where $\lambda$ is a nonzero scalar, to another Gaussian function $\xi \mapsto e^{-(1/2)\lambda^{-1} |\xi|^2}$ (up to a nonzero multiplicative factor), a starting point for T-duality in string theory. More generally, the characteristic function
\[ \hat\mu(\xi) := \int_H e^{i \langle x, \xi\rangle_H} \mu(dx) \]
of a Gaussian probability measure $\mu$ with covariance $C$ on a Hilbert space $H$ is the function $\xi \mapsto e^{-(1/2)\langle \xi, C\xi\rangle_H}$. Such probability measures typically arise in Euclidean quantum field theory; in axiomatic quantum field theory, the analyticity properties of $n$-point functions can be derived from the Wightman axioms using Fourier transforms. Thus, Fourier transformation underlies many different aspects of quantum field theory.

Fredholm Operators

A complex-valued continuous function $K$ on $[0,1] \times [0,1]$ gives rise to an integral operator
\[ A : f \mapsto \int_0^1 K(x,y) f(y) \, dy \]
on complex-valued continuous functions on $[0,1]$ (equipped with the supremum norm $\|\cdot\|_\infty$) with the following upper bound property:
\[ \|Af\|_\infty \le \sup_{[0,1] \times [0,1]} |K(x,y)| \, \|f\|_\infty \]
In other words, $A$ is a bounded linear operator with norm bounded from above by $\sup_{[0,1] \times [0,1]} |K(x,y)|$; a linear operator $A : E \to F$ from a normed linear space $(E, \|\cdot\|_E)$ to a normed linear space $(F, \|\cdot\|_F)$ is bounded (or continuous) if and only if its (operator) norm $|\|A\|| := \sup_{\|u\|_E \le 1} \|Au\|_F$ is finite.
An integral operator
\[ A : f \mapsto \int_0^1 K(x,y) f(y) \, dy \]
defined by a continuous kernel $K$ is, moreover, compact; a compact operator is a bounded operator of normed spaces that maps bounded sets to precompact sets, that is, to sets whose closure is compact. Other examples of compact operators on normed spaces are finite-rank operators, that is, operators with finite-dimensional range. In fact, any compact operator on a separable Hilbert space can be approximated in the topology induced by the operator norm $|\|\cdot\||$ by a sequence of finite-rank operators.
Inspired by the work of Volterra, who, in the case of the integral operator defined above, produced
continuous solutions $\varphi = (I - A)^{-1} f$ of the equation $f = (I - A)\varphi$ for $f \in C([0,1])$, Fredholm in 1900 (Sur une classe d'équations fonctionnelles) studied the equation $f = (I - \lambda A)\varphi$, introducing a complex parameter $\lambda$. He proved what is since then called the Fredholm alternative, which states that either the equation $f = (I - \lambda A)\varphi$ has a unique solution for every $f \in C([0,1])$ or the corresponding homogeneous equation $(I - \lambda A)\varphi = 0$ has nontrivial solutions. In modern language, it means that for a compact linear operator $A$ and $\lambda \ne 0$, the operator $A - \lambda I$ is surjective if and only if it is injective, in which case the resolvent $R(A,\lambda) = (A - \lambda I)^{-1}$ exists. The Fredholm alternative is a powerful tool to solve partial differential equations, among which is the Dirichlet problem, the solutions of which are harmonic functions $u$ (i.e., $\Delta u = 0$, where $\Delta u = -\sum_{i=1}^n \partial^2 u / \partial x_i^2$) on some domain $\Omega \subset \mathbb{R}^n$ with Dirichlet boundary conditions $u|_{\partial\Omega} = f$, where $f$ is a continuous function on the boundary $\partial\Omega$. The Dirichlet problem has geometric applications, in particular to the nonlinear Plateau problem, which minimizes the area of a surface in $\mathbb{R}^d$ with given boundary curves and which reduces to a (linear) Dirichlet problem.
The operator $B = I - A$ built from the compact operator $A$ is a particular Fredholm operator, namely a bounded linear operator $B : E \to F$ which is invertible "up to compact operators," that is, such that there is a bounded linear operator $C : F \to E$ with both $BC - I_F$ and $CB - I_E$ compact. A Fredholm operator $B$ has a finite-dimensional kernel $\mathrm{Ker}\, B$ and, when $(E, \langle\cdot,\cdot\rangle_E)$ and $(F, \langle\cdot,\cdot\rangle_F)$ are Hilbert spaces, its cokernel $\mathrm{Ker}\, B^*$, where $B^*$ is the adjoint of $B$ defined by
\[ \langle Bu, v\rangle_F = \langle u, B^* v\rangle_E \quad \forall u \in E, \ \forall v \in F \]
is also finite dimensional, so that it has a well-defined index $\mathrm{ind}(B) = \dim(\mathrm{Ker}\, B) - \dim(\mathrm{Ker}\, B^*)$, a starting point for index theory. Toeplitz operators $T_\varphi$, where $\varphi$ is a continuous function on the unit circle $S^1$, provide first examples of Fredholm operators; they act on the Hardy space $H^2(S^1)$ by
\[ T_{e_n} \left( \sum_{m \ge 0} a_m e_m \right) = \sum_{m \ge 0} a_{m+n} e_m \]
under the identification $H^2(S^1) \simeq l^2(\mathbb{N}) \subset l^2(\mathbb{Z})$, with $l^2(\mathbb{Z})$ equipped with the canonical complete orthonormal basis $(e_n, n \in \mathbb{Z})$. The Fredholm index $\mathrm{ind}(T_{e_n})$ is exactly the integer $n$, so that the index of its adjoint is $-n$, as a consequence of which the index map from Fredholm operators to integers is onto.

One-Parameter (Semi)groups

Unlike in the finite-dimensional situation, a linear operator $A : E \to F$ between two normed linear spaces $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ is not expected to be bounded. Unbounded operators arise in partial differential equations that involve differential operators such as the Laplacian $\Delta$ on an open subset $\Omega \subset \mathbb{R}^n$. The following equations provide fundamental examples of partial differential equations which arose over time from the study of various problems in mathematical physics with the works of Poisson, Fourier, and Cauchy:
\[ \Delta u = 0 \quad \text{(Laplace equation)} \]
\[ \frac{\partial^2 u}{\partial t^2} + \Delta u = 0 \quad \text{(wave equation)} \]
\[ \frac{\partial u}{\partial t} + \Delta u = 0 \quad \text{(heat equation)} \]
and later the Schrödinger equation in quantum mechanics:
\[ i \frac{\partial u}{\partial t} = \Delta u \]
where $t$ is a time parameter.
An unbounded linear operator $A$ on an infinite-dimensional normed space $E$ is usually defined on a domain $D(A)$ which is strictly contained in $E$. The Laplacian $\Delta$ is defined on the dense domain $D(A) = H^2(\mathbb{R}^n)$ in $L^2(\mathbb{R}^n)$; it defines a bounded operator from $H^2(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$ but does not extend to a bounded operator on $L^2(\mathbb{R}^n)$. Like this operator, most unbounded operators $A : E \to F$ one comes across have dense domain $D(A)$ in $E$ and are closed, that is, their graph $\{(u, Au),\ u \in D(A)\}$ is closed as a subset of the normed linear space $E \times F$. When not actually closed, they can be closable, that is, they can have a closed extension, the smallest of which is called the closure of the operator. By the closed-graph theorem, when $E$ and $F$ are Banach spaces, a linear operator $A : E \to F$ is continuous whenever its graph is closed, as a consequence of which a closed linear operator $A : E \to F$ defined on a dense domain is bounded provided its domain coincides with the whole space.
For a closed operator $A : E \to F$ with dense domain $D(A)$, when $E$ and $F$ are Hilbert spaces equipped with inner products $\langle\cdot,\cdot\rangle_E$ and $\langle\cdot,\cdot\rangle_F$, the adjoint $A^*$ of $A$ is defined on its domain $D(A^*)$ by
\[ \langle Au, v\rangle_F = \langle u, A^* v\rangle_E \quad \forall (u,v) \in D(A) \times D(A^*) \]
A self-adjoint operator $A$ with domain $D(A)$ is one for which $D(A) = D(A^*)$ and $A = A^*$; the Laplacian $\Delta$ on $\mathbb{R}^n$ is self-adjoint on the Sobolev space $H^2(\mathbb{R}^n)$ but it is only essentially self-adjoint on the dense domain $\mathcal{D}(\mathbb{R}^n)$, the latter meaning that its closure is self-adjoint.
Unbounded self-adjoint operators can arise as generators of one-parameter semigroups of bounded
operators. A one-parameter family of bounded operators $T_t$, $t \ge 0$ (resp. $T_t$, $t \in \mathbb{R}$) on a Hilbert space $H$ is a semigroup (resp. group) if $T_s T_t = T_{t+s}$ $\forall t, s \ge 0$ (resp. $\forall t, s \in \mathbb{R}$) and it is strongly continuous (or simply continuous) if $\lim_{t \to t_0} T_t u = T_{t_0} u$ at any $t_0 \ge 0$ (resp. $t_0 \in \mathbb{R}$) and for any $u \in H$.
Stone's theorem sets up a one-to-one correspondence between continuous one-parameter unitary ($U_t^* U_t = U_t U_t^* = I$) groups $U_t$, $t \in \mathbb{R}$, on a Hilbert space such that $U_0 = \mathrm{Id}$ and self-adjoint operators $A$ obtained as infinitesimal generators, that is, as the strong limit
\[ Au = \lim_{t \to 0} \frac{U_t u - u}{i t}, \quad u \in D(A) \]
of $U_t$, $t \in \mathbb{R}$, which in a compact form reads $U_t = e^{itA}$. An important example in quantum mechanics is $U_t = e^{itH} U_0$, $t \in \mathbb{R}$, with $H$ a self-adjoint Hamiltonian, which solves the Schrödinger equation $\frac{d}{dt} u = iHu$. The Lie–Trotter formula, which has important applications for Feynman path integrals, expresses the unitary group generated by $A + B$, where $A$, $B$, and $A + B$ are self-adjoint on their respective domains, as a strong limit
\[ e^{it(A+B)} = \lim_{n \to \infty} \left( e^{itA/n} e^{itB/n} \right)^n \]
On the other hand, positive operators on a Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$ (that is, $A$ self-adjoint and such that $\langle Au, u\rangle_H \ge 0$ $\forall u \in D(A)$) generate one-parameter semigroups $T_t = e^{-tA}$, $t \ge 0$. Hille and Yosida proved that on a Hilbert space, strongly continuous contraction (i.e., $|\|T_t\|| \le 1$ $\forall t > 0$) semigroups such that $T_0 = \mathrm{Id}$ are in one-to-one correspondence with densely defined positive operators $A : D(A) \subset H \to H$ that are maximal (i.e., $I + A$ is onto), obtained as (minus the) infinitesimal generators
\[ Au = -\lim_{t \to 0} \frac{T_t u - u}{t}, \quad u \in D(A) \]
of the corresponding semigroups. Similarly, a positive densely defined self-adjoint operator $A$ on a Hilbert space $H$ gives rise to a densely defined closed symmetric sesquilinear form $(u,v) \mapsto \langle \sqrt{A} u, \sqrt{A} v\rangle_H$ (see the next section for a definition of $\sqrt{A}$; $\langle\cdot,\cdot\rangle_H$ is the scalar product on $H$) and this map yields a one-to-one correspondence between operators and sesquilinear forms on $H$ with the aforementioned properties, one of the starting points for the theory of Dirichlet forms. To a probability measure $\mu$ on a separable Banach space $E$, one can associate a densely defined closed symmetric sesquilinear form (it is in fact a Dirichlet form) on a Hilbert space $H$ such that $E^* \subset H^* = H \subset E$, which in the particular case of the standard Wiener measure $\mu$ on the Wiener space $E = C([0,t])$ and with Hilbert space given by the Cameron–Martin space $H = H^1([0,t])$, is the bilinear form
\[ (u,v) \mapsto \int_E \langle \nabla u, \nabla v\rangle_H \, d\mu \]
with $\nabla$ the (closed) gradient of Malliavin calculus.
The operator $-\Delta$, where $\Delta$ is the Laplacian on $\mathbb{R}^n$, generates the heat-operator semigroup $e^{-t\Delta}$, $t \ge 0$. It has a smooth kernel $K_t \in C^\infty(\mathbb{R}^n \times \mathbb{R}^n)$ defined by
\[ (e^{-t\Delta} f)(x) = \int_{\mathbb{R}^n} K_t(x,y) f(y) \, dy \quad \forall f \in C_0^\infty(\mathbb{R}^n) \]
and defines a smoothing operator, an operator that maps Sobolev functions to smooth functions. In general, a pseudodifferential operator $A$ on an open subset $U$ of $\mathbb{R}^n$ with symbol $\sigma_A$ only has a distribution kernel, given up to a normalization constant by
\[ K_A(x,y) = \int_{\mathbb{R}^n} e^{i \langle x - y, \xi\rangle} \sigma_A(x,\xi) \, d\xi \]
The kernel of the inverse Laplacian $(\Delta + m^2)^{-1}$ on $\mathbb{R}^n$ (the non-negative real number $m^2$ stands for the squared mass), called Green's function on $\mathbb{R}^n$, plays an essential role in the theory of Feynman graphs.

Spectral Theory

Spectral theory is the study of the distribution of the values of the complex parameter $\lambda$ for which, given a linear operator $A$ on a normed space $E$, the operator $A - \lambda I$ has an inverse, and of the properties of this inverse when it exists, the resolvent $R(A,\lambda) = (A - \lambda I)^{-1}$ of $A$. The resolvent set $\rho(A)$ of $A$ is the set of complex numbers $\lambda$ for which $A - \lambda I$ is invertible with densely defined bounded inverse. The spectrum $\mathrm{Sp}(A)$ of $A$ is the complement in $\mathbb{C}$ of the resolvent set; it is the union of three disjoint sets: the set of all complex numbers $\lambda$ for which $A - \lambda I$ is not injective, called the point spectrum (such a $\lambda$ is an eigenvalue of $A$ with associated eigenfunction any nonzero $u \in D(A)$ such that $Au = \lambda u$); the set of points $\lambda$ for which $A - \lambda I$ has a densely defined unbounded inverse $R(A,\lambda)$, called the continuous spectrum; and the set of points $\lambda$ for which $A - \lambda I$ has a well-defined but not densely defined inverse $R(A,\lambda)$, called the residual spectrum.
A bounded operator has bounded spectrum and a self-adjoint operator $A$ acting on a Hilbert space has real spectrum and no residual spectrum since the range of $A - \lambda I$ is dense. As a consequence of the
Fredholm alternative, the nonzero spectrum of a compact operator consists only of point spectrum; the spectrum is countable with 0 as its only possible accumulation point. A Hamiltonian of a quantum mechanical system can have both point and continuous spectra, but its point spectrum is of special interest because the corresponding eigenfunctions are stationary states of the system. As was first pointed out by Kac ("Can you hear the shape of a drum?"), the spectrum of an operator acting on functions can reflect the geometry of the space these functions are defined on, a starting point for many interesting and far-reaching questions in differential geometry.
A self-adjoint linear operator on a Hilbert space can be described in terms of a family of projections $E_\lambda$, $\lambda \in \mathbb{R}$, via the spectral representation
\[ A = \int_{\mathrm{Sp}(A)} \lambda \, dE_\lambda \]
Given a Borel real-valued function $f$ on $\mathbb{R}$, the operator
\[ f(A) = \int_{\mathrm{Sp}(A)} f(\lambda) \, dE_\lambda \]
yields another self-adjoint operator. A positive operator $A$ on a dense domain $D(A)$ of some Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$ has non-negative spectrum and, for any positive real number $t$, the map $\lambda \mapsto e^{-t\lambda}$ gives the associated bounded heat-operator
\[ e^{-tA} = \int_{\mathrm{Sp}(A)} e^{-t\lambda} \, dE_\lambda \]
while the map $\lambda \mapsto \sqrt{\lambda}$ gives rise to a positive operator $\sqrt{A}$ such that $(\sqrt{A})^2 = A$.
The resolvent can also be used to define new operators
\[ f(A) = \frac{1}{2 i \pi} \int_C f(\lambda) R(A,\lambda) \, d\lambda \]
from a linear operator via a Cauchy-type integral along a contour $C$ around the spectrum; this way one defines complex powers $A^{-z}$ of (essentially self-adjoint) positive elliptic pseudodifferential operators, which enter the definition of the zeta-function, $z \mapsto \zeta(A,z)$, of the operator $A$. The $\zeta$-function is a useful tool to extend the ordinary determinant to $\zeta$-determinants of self-adjoint elliptic operators, thereby providing an ansatz to give a meaning to partition functions in the path integral approach to quantum field theory.

Operator Algebras

Bounded linear operators on a Hilbert space $H$ form an algebra $\mathcal{L}(H)$ closed for the operator norm with involution given by the adjoint operation $A \mapsto A^*$; it is a $C^*$-algebra, that is, an algebra $\mathcal{A}$ over $\mathbb{C}$ with a norm $\|\cdot\|$ and an involution $*$ such that $\mathcal{A}$ is closed for this norm and such that $\|ab\| \le \|a\| \|b\|$ and $\|a^* a\| = \|a\|^2$ for all $a, b \in \mathcal{A}$, and by the Gelfand–Naimark theorem, every $C^*$-algebra is isomorphic to a sub-$C^*$-algebra of some $\mathcal{L}(H)$. The notion of spectrum extends from bounded operators to $C^*$-algebras; the spectrum $\mathrm{sp}(a)$ of an element $a$ in a $C^*$-algebra $\mathcal{A}$ is the (compact) set of complex numbers $\lambda$ such that $a - \lambda \cdot 1$ is not invertible. The notion of self-adjointness also extends ($a = a^*$), and just as a self-adjoint operator $B \in \mathcal{L}(H)$ is non-negative (in which case its spectrum lies in $\mathbb{R}^+$) if and only if $B = A^* A$ for some bounded operator $A$, an element $b \in \mathcal{A}$ is said to be non-negative if and only if $b = a^* a$ for some $a \in \mathcal{A}$, in which case $\mathrm{sp}(b) \subset \mathbb{R}_0^+$.
The algebra $C(X)$ of continuous functions $f : X \to \mathbb{C}$ vanishing at infinity on some locally compact Hausdorff space $X$, equipped with the supremum norm and the conjugation $f \mapsto \bar f$, is also a $C^*$-algebra and a prototype for abelian $C^*$-algebras, since Gelfand showed that every abelian $C^*$-algebra is isometrically isomorphic to $C(X)$, with $X$ compact if the algebra is unital. To a $C^*$-algebra $\mathcal{A}$, one can associate an abelian group $K_0(\mathcal{A})$ which is dual to the Grothendieck group $K^0(X)$ of isomorphism classes of vector bundles over a compact Hausdorff space $X$.
Compact operators on a Hilbert space $H$ form the only proper two-sided ideal $\mathcal{K}(H)$ of the $C^*$-algebra $\mathcal{L}(H)$ which is closed for the operator norm topology on $\mathcal{L}(H)$. The quotient $\mathcal{L}(H)/\mathcal{K}(H)$ is called the Calkin algebra, after Calkin, who classified all two-sided ideals in $\mathcal{L}(H)$ for a separable Hilbert space $H$; one can set up a one-to-one correspondence between such ideals and certain sequence spaces. Corresponding to the Banach space $l^1(\mathbb{Z})$ of complex-valued sequences $(u_n)$ such that $\sum_{n \in \mathbb{Z}} |u_n| < \infty$ is the $*$-ideal $\mathcal{I}_1(H)$ of trace-class operators. The trace $\mathrm{tr}(A) = \sum_{n \in \mathbb{Z}} \langle A e_n, e_n\rangle_H$ of a non-negative operator $A \in \mathcal{L}(H)$ lies in $[0, +\infty]$ and is independent of the choice of the complete orthonormal basis $\{e_n, n \in \mathbb{Z}\}$ of $H$ equipped with the inner product $\langle\cdot,\cdot\rangle_H$. $\mathcal{I}_1(H)$ is the Banach space of bounded linear operators on $H$ such that $\|A\|_1 = \mathrm{tr}(|A|)$ is finite. Given an (essentially self-adjoint) positive differential operator $D$ of order $d$ acting on smooth functions on a closed $n$-dimensional Riemannian manifold $M$, its complex power $D^{-z}$ is trace class on the space of $L^2$-functions on $M$ provided $\mathrm{Re}(z) > n/d$, and the corresponding trace $\mathrm{tr}(D^{-z})$ extends to a meromorphic function on the whole plane, the $\zeta$-function $\zeta(D,z)$, which is holomorphic at 0.
More generally, Banach spaces $l^p(\mathbb{Z})$, $1 \le p < \infty$, of complex-valued sequences $(u_n)_{n \in \mathbb{Z}}$ such that $\sum_{n \in \mathbb{Z}} |u_n|^p < \infty$ relate to Schatten ideals $\mathcal{I}_p(H)$, $1 \le p < \infty$, where $\mathcal{I}_p(H)$ is the Banach space of bounded linear operators on $H$ such that $\|A\|_p = (\mathrm{tr}(|A|^p))^{1/p}$ is finite. Just as all $l^p$-sequences converge to 0, the Schatten ideals $\mathcal{I}_p(H)$ all lie in $\mathcal{K}(H)$ and we have $\mathcal{I}_1(H) \subset \cdots \subset \mathcal{I}_p(H) \subset \mathcal{I}_{p+1}(H) \subset \cdots \subset \mathcal{K}(H)$.
Compact operators and Schatten ideals are useful to extend index theory to a noncommutative context; a Fredholm module $(H, F)$ over an involutive algebra $\mathcal{A}$ is given by an involutive representation $\pi$ of $\mathcal{A}$ in a Hilbert space $H$ and a self-adjoint bounded linear operator $F$ on $H$ such that $F^2 = \mathrm{Id}_H$ and the operator brackets $[F, \pi(a)]$ are compact for all $a \in \mathcal{A}$. To a $p$-summable Fredholm module $(H, F)$, that is, one with $[F, \pi(a)] \in \mathcal{I}_p(H)$ for all $a \in \mathcal{A}$, one associates a representative of the Chern character $\mathrm{ch}^*(H, F)$ given by a cyclic cocycle on $\mathcal{A}$, which pairs up with K-theory to build an integer-valued index map on K-theory.
Schatten ideals are also useful to investigate the geometry of infinite-dimensional spaces such as loop groups, for which the Hilbert–Schmidt operators (operators in $\mathcal{I}_2(H)$ are also called Hilbert–Schmidt operators) are particularly useful. A Hölder-type inequality shows that the product of two Hilbert–Schmidt operators is trace-class. Moreover, for any two Hilbert–Schmidt operators $A$ and $B$, the "cyclicity property" $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ holds, and the sesquilinear form $(A, B) \mapsto \mathrm{tr}(AB^*)$ makes $\mathcal{I}_2(H)$ a Hilbert space.

Further Reading

Adams R (1975) Sobolev Spaces. London: Academic Press.
Dunford N and Schwartz J (1971) Linear Operators. Part I. General Theory. Part II. Spectral Theory. Part III. Spectral Operators. New York: Wiley.
Hille E (1972) Methods in Classical and Functional Analysis. London: Academic Press and Addison-Wesley.
Kato T (1982) A Short Introduction to Perturbation Theory for Linear Operators. New York–Berlin: Springer.
Reed M and Simon B (1980) Methods of Modern Mathematical Physics vols. I–IV, 2nd edn. New York: Academic Press.
Riesz F and Sz-Nagy B (1968) Leçons d'analyse fonctionnelle. Paris: Gauthier–Villars; Budapest: Akademiai Kiado.
Rudin W (1994) Functional Analysis, 2nd edn. New York: International Series in Pure and Applied Mathematics.
Yosida K (1980) Functional Analysis, 6th edn. Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen Band vol. 132. Berlin–New York: Springer.

Introductory Article: Minkowski Spacetime and Special Relativity


G L Naber, Drexel University, Philadelphia, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Minkowski spacetime is generally regarded as the appropriate mathematical context within which to formulate those laws of physics that do not refer specifically to gravitational phenomena. Here we shall describe this context in rigorous terms, postulate what experience has shown to be its correct physical interpretation, and illustrate by means of examples its appropriateness for the formulation of physical laws.

Minkowski Spacetime and the Lorentz Group

Minkowski spacetime $M$ is a four-dimensional real vector space on which is defined a bilinear form $g : M \times M \to \mathbb{R}$ that is symmetric ($g(v,w) = g(w,v)$ for all $v, w \in M$) and nondegenerate ($g(v,w) = 0$ for all $w \in M$ implies $v = 0$). Further, $g$ has index 1, that is, there exists a basis $\{e_1, e_2, e_3, e_4\}$ for $M$ with
\[ g(e_a, e_b) = \eta_{ab} = \begin{cases} 1 & \text{if } a = b = 1, 2, 3 \\ -1 & \text{if } a = b = 4 \\ 0 & \text{if } a \ne b \end{cases} \]
$g$ is called a Lorentz inner product for $M$ and any basis of the type just described is an orthonormal basis for $M$. We shall often write $v \cdot w$ for the value $g(v,w)$ of $g$ on $(v,w) \in M \times M$. A vector $v \in M$ is said to be spacelike, timelike, or null if $v \cdot v$ is positive, negative, or zero, respectively, and the set $C_N$ of all null vectors is called the null cone in $M$. If $\{e_1, e_2, e_3, e_4\}$ is an orthonormal basis and if we write $v = v^1 e_1 + v^2 e_2 + v^3 e_3 + v^4 e_4 = v^a e_a$ (using the Einstein summation convention, according to which a repeated index, one subscript and one superscript, is summed over its possible values) and $w = w^b e_b$, then
\[ v \cdot w = v^1 w^1 + v^2 w^2 + v^3 w^3 - v^4 w^4 = \eta_{ab} v^a w^b \]
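The coordinate formula for $v \cdot w$ can be turned into a short computational sketch. The Python below assumes components are given relative to an orthonormal basis with the timelike vector in the fourth slot, as in the text; the function names `lorentz_dot` and `causal_character` and the tolerance parameter `tol` are hypothetical, introduced only for this illustration.

```python
# Diagonal entries of eta_ab in an orthonormal basis {e_1, e_2, e_3, e_4}.
ETA = (1.0, 1.0, 1.0, -1.0)

def lorentz_dot(v, w):
    """Lorentz inner product v.w = v1 w1 + v2 w2 + v3 w3 - v4 w4."""
    return sum(eta * a * b for eta, a, b in zip(ETA, v, w))

def causal_character(v, tol=0.0):
    """Classify v as 'spacelike', 'timelike' or 'null' from the sign of v.v.

    tol is an assumed tolerance for floating-point input; with the default
    tol=0.0 only an exactly vanishing v.v counts as null.
    """
    q = lorentz_dot(v, v)
    if q > tol:
        return "spacelike"
    if q < -tol:
        return "timelike"
    return "null"

# Example vectors, written as (v1, v2, v3, v4).
e1 = (1.0, 0.0, 0.0, 0.0)      # a spatial basis vector
e4 = (0.0, 0.0, 0.0, 1.0)      # the timelike basis vector
light = (1.0, 0.0, 0.0, 1.0)   # lies on the null cone
```

For instance, `causal_character(light)` reports a null vector because the spatial and temporal contributions cancel exactly.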
In particular, $v$ is null if and only if
\[ (v^4)^2 = (v^1)^2 + (v^2)^2 + (v^3)^2 \]
(hence the name null "cone" for $C_N$). Timelike vectors are "inside" the null cone and spacelike vectors are "outside" (see Figure 1).

Figure 1 Spacelike, timelike and null vectors.

We select some orientation for the vector space $M$ and will henceforth consider only oriented, orthonormal bases for $M$. From the Schwarz inequality for $\mathbb{R}^3$, one can show (Naber 1992, theorem 1.3.1) that, if $v$ is timelike and $w$ is either timelike or null and nonzero, then $v \cdot w < 0$ if and only if $v^4 w^4 > 0$ in any orthonormal basis. In particular, one can define an equivalence relation on the set of all timelike vectors by decreeing that two such, $v$ and $w$, are equivalent if and only if $v \cdot w < 0$. For reasons that will emerge shortly we then say that $v$ and $w$ have the same time orientation. There are precisely two equivalence classes, one of which we select and designate future directed. Timelike vectors in the other class are then called past directed. One can show (Naber 1992, section 1.3 and corollary 1.4.5) that this classification can be extended to nonzero null vectors as well (but not to spacelike vectors). We will call an oriented, orthonormal basis time oriented if its timelike vector $e_4$ is future directed and will consider only these in what follows. An oriented, time-oriented, orthonormal basis for $M$ will be called an admissible basis. If $\{e_1, e_2, e_3, e_4\}$ and $\{\hat e_1, \hat e_2, \hat e_3, \hat e_4\}$ are two such bases and if we write
\[ e_b = \Lambda^1{}_b \hat e_1 + \Lambda^2{}_b \hat e_2 + \Lambda^3{}_b \hat e_3 + \Lambda^4{}_b \hat e_4 = \Lambda^a{}_b \hat e_a, \quad b = 1, 2, 3, 4 \qquad [1] \]
then the matrix $\Lambda = (\Lambda^a{}_b)$ ($a$ = row index, $b$ = column index) can be shown to satisfy the following three conditions (Naber 1992, section 1.3):
1. (orthogonality) $\Lambda^T \eta \Lambda = \eta$, where $T$ means transpose and
\[ \eta = (\eta_{ab}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \]
2. (orientability) $\det \Lambda = 1$, and
3. (time orientability) $\Lambda^4{}_4 \ge 1$.
We shall refer to any $4 \times 4$ matrix $\Lambda = (\Lambda^a{}_b)$ satisfying these three conditions as a Lorentz transformation (although one often sees the adjectives "proper" and "orthochronous" appended to emphasize conditions (2) and (3), respectively). The set $\mathcal{L}$ of all such matrices forms a group under matrix multiplication that we call simply the Lorentz group. It is a simple matter to show (Naber 1992, lemma 1.3.4) from the orthogonality condition (1) that, if $\Lambda^4{}_4 = 1$, then $\Lambda$ must be of the form
\[ \begin{pmatrix} & & & 0 \\ & (R^i{}_j) & & 0 \\ & & & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
where $(R^i{}_j)$ is an element of SO(3), that is, a $3 \times 3$ orthogonal matrix with determinant 1. The set $\mathcal{R}$ of all matrices of this form is a subgroup of $\mathcal{L}$ called the rotation subgroup. Although it will play no role in what we do here, it should be pointed out that in many applications (e.g., in particle physics) it is necessary to consider the larger group of transformations of $M$ generated by the Lorentz group and spacetime translations ($x^a \to x^a + \Lambda^a$, for some constants $\Lambda^a$, $a = 1, 2, 3, 4$). This is called the inhomogeneous Lorentz group, or Poincaré group.

Physical Interpretation

For the purpose of describing how one is to think of Minkowski spacetime and the Lorentz group physically it will be convenient to distinguish (intuitively and terminologically, if not mathematically) between a "vector" in $M$ and a "point" in $M$ (the "tip" of a vector). The points in $M$ are called events and are to be thought of as actual physical occurrences, albeit idealized as "point events" which have no spatial extension and no duration. One might picture, for example, an instantaneous collision, or explosion, or an "instant" in the history of some point material particle or photon ("particle of light").
Events are observed and identified by the assignment of coordinates. We will be interested in coordinates assigned in a very particular way by a
very particular type of observer. Specifically, our admissible observers preside over three-dimensional, right-handed, Cartesian spatial coordinate systems, relative to which photons always move along straight lines in any direction. With a single clock located at the origin, such an observer can determine the speed, $c$, of light in vacuo by the so-called Fizeau procedure (emit a photon from the origin when the clock there reads $t_1$, bounce it back from a mirror located at $(x^1, x^2, x^3)$, receive the photon at the origin again when the clock there reads $t_2$ and set $c = 2\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}/(t_2 - t_1)$). Now place an identical clock at each spatial point and synchronize them by emitting from the origin a spherical electromagnetic wave (photons in all directions) and setting the clock whose location is $(x^1, x^2, x^3)$ to read $\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}/c$ at the instant the wave arrives. An observer now assigns to an event the three spatial coordinates of the location at which it occurred in his coordinate system as well as the time reading on the clock at that location at the instant the event occurred. We shall assume also that our admissible observers are inertial in the sense of Newtonian mechanics (the trajectory of a particle on which no forces act, when described in terms of the coordinates just introduced, is a point or a straight line traversed at constant speed). It is an experimental fact (and quite a remarkable one) that all of these admissible observers (whether or not they are in relative motion) agree on the numerical value of the speed of light in vacuo ($c \approx 3.00 \times 10^{10}$ cm s$^{-1}$). We shall exploit this fact at the outset to have all of our admissible observers measure time in units of distance by simply multiplying their time coordinates $t$ by $c$. The resulting time coordinate is denoted $x^4 = ct$. In these units all speeds are dimensionless and the speed of light in vacuo is 1.

In our mathematical model $\mathcal{M}$ of the world of events, this very subtle and complex notion of an admissible observer is fully identified with the conceptually very simple notion of an admissible basis $\{e_1, e_2, e_3, e_4\}$. If $x \in \mathcal{M}$ is an event and if we write $x = x^a e_a$, then $(x^1, x^2, x^3)$ are the spatial and $x^4$ is the time coordinate supplied for $x$ by the corresponding observer. If $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ is another basis/observer related to $\{e_1, e_2, e_3, e_4\}$ by [1] and if we write $x = \hat{x}^a \hat{e}_a$, then

$$\hat{x}^a = \Lambda^a{}_b x^b, \qquad a = 1, 2, 3, 4 \qquad [2]$$

Thus, Lorentz transformations relate the space and time coordinates supplied for any given event by two admissible observers. If $(\Lambda^a{}_b) \in \mathcal{R}$, then the two observers differ only in the orientation of their spatial coordinate axes. On the other hand, for any real number $\theta$ one can define an element $L(\theta)$ of $\mathcal{L}$ by

$$L(\theta) = \begin{pmatrix} \cosh\theta & 0 & 0 & -\sinh\theta \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh\theta & 0 & 0 & \cosh\theta \end{pmatrix} \qquad [3]$$

and, if two admissible bases are related by this Lorentz transformation, then the coordinate transformation [2] becomes

$$\begin{aligned} \hat{x}^1 &= (\cosh\theta)\, x^1 - (\sinh\theta)\, x^4 \\ \hat{x}^2 &= x^2 \\ \hat{x}^3 &= x^3 \\ \hat{x}^4 &= -(\sinh\theta)\, x^1 + (\cosh\theta)\, x^4 \end{aligned} \qquad [4]$$

Letting $\beta = \tanh\theta$ (so that $-1 < \beta < 1$) and suppressing $\hat{x}^2 = x^2$ and $\hat{x}^3 = x^3$, one obtains

$$\begin{aligned} \hat{x}^1 &= \frac{1}{\sqrt{1 - \beta^2}}\, x^1 - \frac{\beta}{\sqrt{1 - \beta^2}}\, x^4 \\ \hat{x}^4 &= -\frac{\beta}{\sqrt{1 - \beta^2}}\, x^1 + \frac{1}{\sqrt{1 - \beta^2}}\, x^4 \end{aligned} \qquad [5]$$

This corresponds to two observers whose spatial axes are oriented as shown in Figure 2 with the hatted coordinate system moving along the common $x^1$-, $\hat{x}^1$-axis with speed $|\beta|$, to the right if $\beta > 0$ and to the left if $\beta < 0$.

We remark that, reverting to traditional time units, $\beta = v/c$, where $|v|$ is the relative speed of the two coordinate systems, and [5] becomes what is generally referred to as a "Lorentz transformation" in elementary expositions of special relativity, that is,

$$\begin{aligned} \hat{x}^1 &= \frac{x^1 - vt}{\sqrt{1 - v^2/c^2}} \\ \hat{t} &= \frac{t - (v/c^2)\, x^1}{\sqrt{1 - v^2/c^2}} \end{aligned} \qquad [6]$$

Figure 2 Observers in standard configuration.
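The special Lorentz transformation can be checked numerically. The following is a minimal sketch in Python, assuming units in which the speed of light is 1; the function names `boost`, `boost_rapidity`, and `interval` are illustrative choices and do not come from the article. It applies the transformation in both its hyperbolic form [4] and its velocity form [5] and compares the quantity $(x^1)^2 - (x^4)^2$ before and after.

```python
import math


def boost(beta, x1, x4):
    """Velocity form [5] of the special Lorentz transformation (c = 1)."""
    if not -1.0 < beta < 1.0:
        raise ValueError("beta must satisfy -1 < beta < 1")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    x1_hat = gamma * (x1 - beta * x4)
    x4_hat = gamma * (-beta * x1 + x4)
    return x1_hat, x4_hat


def boost_rapidity(theta, x1, x4):
    """Hyperbolic form [4], with theta related to beta by beta = tanh(theta)."""
    x1_hat = math.cosh(theta) * x1 - math.sinh(theta) * x4
    x4_hat = -math.sinh(theta) * x1 + math.cosh(theta) * x4
    return x1_hat, x4_hat


def interval(x1, x4):
    """Two-dimensional interval (x1)^2 - (x4)^2 (x2 and x3 suppressed)."""
    return x1 * x1 - x4 * x4


if __name__ == "__main__":
    beta = 0.6                      # hypothetical relative speed
    theta = math.atanh(beta)        # corresponding value of theta
    x1, x4 = 2.0, 5.0               # hypothetical event coordinates
    print("form [5]:", boost(beta, x1, x4))
    print("form [4]:", boost_rapidity(theta, x1, x4))
    print("interval before:", interval(x1, x4))
    print("interval after: ", interval(*boost(beta, x1, x4)))
```

Up to floating-point rounding, the two forms should return the same hatted coordinates and the interval should be unchanged, which is the numerical counterpart of the statement that Lorentz transformations preserve the Lorentz inner product.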
There is a sense in which, to understand the kinematic effects of special relativity, it is enough to restrict one's attention to the so-called special Lorentz transformations $L(\theta)$. Specifically, one can show (Naber 1992, theorem 1.3.5) that if $\Lambda \in \mathcal{L}$ is any Lorentz transformation, then there exists a real number $\theta$ and two rotations $R_1, R_2 \in \mathcal{R}$ such that $\Lambda = R_1 L(\theta) R_2$. Since $R_1$ and $R_2$ involve no relative motion, all of the kinematics is contained in $L(\theta)$. We shall explore these kinematic effects in more detail shortly.

Now suppose that $x$ and $x_0$ are two distinct events in $\mathcal{M}$ and consider the displacement vector $x - x_0$ from $x_0$ to $x$. If $\{e_1, e_2, e_3, e_4\}$ is an admissible basis and if we write $x = x^a e_a$ and $x_0 = x_0^a e_a$, then $x - x_0 = (x^a - x_0^a) e_a = \Delta x^a e_a$. If $x - x_0$ is null, then

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 = (\Delta x^4)^2$$

so the spatial separation of the two events is equal to the distance light would travel during the time lapse between the events. The same must be true in any other admissible basis since Lorentz transformations are the matrices of linear maps that preserve the Lorentz inner product. Consequently, all admissible observers agree that $x_0$ and $x$ are "connectible by a photon." They even agree as to which of the two events is to be regarded as the "emission" of the photon and which is to be regarded as its "reception" since one can show (Naber 1992, theorem 1.3.3) that, when a vector is either timelike or null and nonzero, the sign of its fourth coordinate is the same in every admissible basis (because $\Lambda^4{}_4 \geq 1$). Thus, $x^4 - x_0^4$ is either positive for all admissible observers ($x_0$ occurred before $x$) or negative for all admissible observers ($x_0$ occurred after $x$). Since photons move along straight lines in admissible coordinate systems we adopt the following terminology. If $x_0, x \in \mathcal{M}$ are such that $x - x_0$ is null, then the straight line in $\mathcal{M}$ containing $x_0$ and $x$ is called the world line of a photon in $\mathcal{M}$ and is to be thought of as the set of all events in the history of some particle of light that "experiences" both $x_0$ and $x$.

Let us now suppose instead that $x - x_0$ is timelike. Then, in any admissible basis,

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 < (\Delta x^4)^2$$

so the spatial separation of $x_0$ and $x$ is less than the distance light would travel during the time lapse between the events. In this case, one can prove (Naber 1992, section 1.4) that there exists an admissible basis $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^1 = \Delta\hat{x}^2 = \Delta\hat{x}^3 = 0$, that is, there is an admissible observer for whom the two events occur at the same spatial location, one after the other. Thinking of this location as occupied by some material object (e.g., the observer's clock situated at that point) we find that the events $x_0$ and $x$ are both "experienced" by this material particle and that, moreover, $\sqrt{|g(x - x_0, x - x_0)|}$ is just the time lapse between the events recorded by a clock carried along by this material particle. To any other admissible observer this material particle appears "free" (not subject to forces) because it moves on a straight line with constant speed. This leads us to the following definitions. If $x_0, x \in \mathcal{M}$ are such that $x - x_0$ is timelike, then the straight line in $\mathcal{M}$ containing $x_0$ and $x$ is called the world line of a free material particle in $\mathcal{M}$ and $\sqrt{|g(x - x_0, x - x_0)|}$, usually written $\tau(x - x_0)$, or simply $\Delta\tau$, is the proper time separation of $x_0$ and $x$.

One can think of $\tau(x - x_0)$ as a sort of "length" for $x - x_0$ measured, however, by a clock carried along by a free material particle that experiences both $x_0$ and $x$. It is an odd sort of length, however, since it satisfies not the usual triangle inequality, but the following "reversed" version.

Reversed triangle inequality (Naber 1992, theorem 1.4.2) Let $x_0$, $x$ and $y$ be events in $\mathcal{M}$ for which $y - x$ and $x - x_0$ are timelike with the same time orientation. Then $y - x_0 = (y - x) + (x - x_0)$ is timelike and

$$\tau(y - x_0) \geq \tau(y - x) + \tau(x - x_0) \qquad [7]$$

with equality holding if and only if $y - x$ and $x - x_0$ are linearly dependent.

The sense of the inequality in [7] has interesting consequences about which we will have more to say shortly.

Finally, let us suppose that $x - x_0$ is spacelike. Then, in any admissible basis

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 > (\Delta x^4)^2$$

so the spatial separation of $x_0$ and $x$ is greater than the distance light could travel during the time lapse that separates them. There is clearly no admissible observer for whom the events occur at the same location. No free material particle (or even photon) can experience both $x_0$ and $x$. However, one can show (Naber 1992, section 1.5) that, given any real number $T$ (positive, negative, or zero), one can find an admissible basis $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^4 = T$. Some admissible observers will judge the events simultaneous, some will assert that $x_0$ occurred before $x$, and others will reverse the order. Temporal order, cause and effect, have no meaning for such pairs of events. For those admissible observers for whom the events are simultaneous ($\Delta\hat{x}^4 = 0$), the quantity $\sqrt{g(x - x_0, x - x_0)}$ is the distance between them and for this reason this quantity is called the proper spatial separation of $x_0$ and $x$ (whenever $x - x_0$ is spacelike).
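The three cases above (null, timelike, spacelike) and the reversed triangle inequality [7] lend themselves to a short numerical illustration. The sketch below is written under stated assumptions: displacement vectors are given by their four components in some admissible basis, the Lorentz inner product is taken with signature (+, +, +, -) as in the text, and the helper names `g`, `classify`, `proper_time`, and `sub` are hypothetical.

```python
import math


def g(v, w):
    """Lorentz inner product of two 4-component vectors, signature (+, +, +, -)."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]


def classify(dx):
    """Return 'timelike', 'null', or 'spacelike' for a displacement vector.

    The comparison with zero is exact, so this is only reliable for inputs
    whose interval is computed without rounding error.
    """
    q = g(dx, dx)
    if q < 0:
        return "timelike"
    if q > 0:
        return "spacelike"
    return "null"


def proper_time(dx):
    """Proper time separation sqrt(|g(dx, dx)|) of a timelike displacement."""
    if classify(dx) != "timelike":
        raise ValueError("proper time is defined here only for timelike vectors")
    return math.sqrt(-g(dx, dx))


def sub(x, y):
    """Componentwise difference x - y."""
    return tuple(a - b for a, b in zip(x, y))


if __name__ == "__main__":
    # Hypothetical events: a particle goes out from x0 to x and back to y.
    x0 = (0.0, 0.0, 0.0, 0.0)
    x = (3.0, 0.0, 0.0, 5.0)
    y = (0.0, 0.0, 0.0, 10.0)
    direct = proper_time(sub(y, x0))
    detour = proper_time(sub(y, x)) + proper_time(sub(x, x0))
    print("direct:", direct, "via x:", detour)
    print("reversed triangle inequality holds:", direct >= detour)
```

In this example the straight path from $x_0$ to $y$ accumulates more proper time than the path through $x$, in agreement with the direction of the inequality in [7].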
For any two events $x_0, x \in \mathcal{M}$, $g(x - x_0, x - x_0)$ is given in any admissible basis by $(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 - (\Delta x^4)^2$ and is called the interval separating $x_0$ and $x$. It is the closest analog in Minkowskian geometry to the (squared) length in Euclidean geometry. It can, however, assume any real value depending on the physical relationship between the events $x_0$ and $x$. Historically, of course, it was the various physical interpretations of this interval that we have just described which led Minkowski (Einstein et al. 1958) to the introduction of the structure that bears his name.

Kinematic Effects

All of the well-known kinematic effects of special relativity (the addition of velocities formula, the relativity of simultaneity, time dilation, and length contraction) follow easily from what we have done. Because it eases visualization and because, as we mentioned earlier, it suffices to do so, we will limit our discussion to the special Lorentz transformations.

Let $\theta_1$ and $\theta_2$ be two real numbers and consider the corresponding elements $L(\theta_1)$ and $L(\theta_2)$ of $\mathcal{L}$ defined by [3]. Sum formulas for $\sinh\theta$ and $\cosh\theta$ imply that $L(\theta_1)L(\theta_2) = L(\theta_1 + \theta_2)$. Defining $\beta_i = \tanh\theta_i$, $i = 1, 2$, and $\beta = \tanh(\theta_1 + \theta_2)$, the sum formula for $\tanh\theta$ then gives

$$\beta = \frac{\beta_1 + \beta_2}{1 + \beta_1 \beta_2} \qquad [8]$$

The physical interpretation is simple. One has three admissible observers whose spatial axes are related in the manner shown in Figure 2. If the speed of the second relative to the first is $\beta_1$ and the speed of the third relative to the second is $\beta_2$, then the speed of the third relative to the first is not $\beta_1 + \beta_2$ as a Newtonian predisposition would lead one to expect, but rather $\beta$, given by [8]. This is the relativistic addition of velocities formula.

We have seen already that, when the interval between $x_0$ and $x$ is spacelike, the events will be judged simultaneous by some admissible observers, but not by others. Indeed, if $\Delta x^4 = 0$ and the observers are related by [5], then $\Delta\hat{x}^4 = (-\beta/\sqrt{1 - \beta^2})\,\Delta x^1 = -\beta\,\Delta\hat{x}^1$, which will not be zero unless $\beta = 0$ and so there is no relative motion ($\Delta\hat{x}^1$ cannot be zero since then $\Delta\hat{x}^a = 0$ for $a = 1, 2, 3, 4$ and $x = x_0$). This phenomenon is called the relativity of simultaneity and we now construct a simple geometrical representation of it.

Select two perpendicular lines in the plane to represent the $x^1$- and $x^4$-axes (the Euclidean orthogonality of the lines has no physical significance and is unnecessary, but makes the pictures easier to draw). The $\hat{x}^1$-axis will be represented by the straight line $\hat{x}^4 = 0$ which, from [5], is given by $x^4 = \beta x^1$ (in Figure 3 we have assumed that $\beta > 0$). Similarly, the $\hat{x}^4$-axis is identified with the line $x^4 = (1/\beta)x^1$. Since Lorentz transformations leave the Lorentz inner product invariant, the hyperbolas $(x^1)^2 - (x^4)^2 = k$ coincide with $(\hat{x}^1)^2 - (\hat{x}^4)^2 = k$ and we calibrate the axes accordingly, for example, the branch of $(x^1)^2 - (x^4)^2 = 1$ with $x^1 > 0$ intersects the $x^1$-axis at the point $(x^1, x^4) = (1, 0)$ and intersects the $\hat{x}^1$-axis at the point $(\hat{x}^1, \hat{x}^4) = (1, 0)$. This necessitates a different scale on the hatted and unhatted axes, but one can show (Naber 1992, section 1.3) that, with this calibration, all coordinates can be obtained geometrically by projecting parallel to the opposite axis (e.g., the $x^4$- and $\hat{x}^4$-coordinates of an event result from projecting parallel to the $x^1$- and $\hat{x}^1$-axes, respectively).

Thus, a line of simultaneity in the hatted (respectively, unhatted) coordinates is parallel to the $\hat{x}^1$- (respectively, $x^1$-) axis so that, in general, a pair of events lying on one will not lie on the other (note, however, that these lines are "really" three-dimensional hyperplanes so what appears to be a point of intersection is actually a two-dimensional "plane of agreement", any two events in which are judged simultaneous by both observers).

Figure 3 Relativity of simultaneity. (Spacetime diagram showing the $x^1$-, $x^4$-, $\hat{x}^1$-, and $\hat{x}^4$-axes, the hyperbola $(x^1)^2 - (x^4)^2 = 1$ with the calibration points $(x^1, x^4) = (1, 0)$ and $(\hat{x}^1, \hat{x}^4) = (1, 0)$, a hatted line of simultaneity, and an unhatted line of simultaneity.)

For any two events whatsoever the relationship between the time lapse $\Delta\hat{x}^4$ in the hatted coordinates and the time lapse $\Delta x^4$ in the unhatted coordinates is, from [5],

$$\Delta\hat{x}^4 = -\frac{\beta}{\sqrt{1 - \beta^2}}\,\Delta x^1 + \frac{1}{\sqrt{1 - \beta^2}}\,\Delta x^4$$

so the two are generally not equal. Consider, in particular, two events on the world line of a point at rest in the unhatted coordinate system, for
example, two readings on the clock at rest at the origin in this system. Then $\Delta x^1 = 0$ so

$$\Delta\hat{x}^4 = \frac{1}{\sqrt{1 - \beta^2}}\,\Delta x^4 > \Delta x^4$$

This effect is entirely symmetrical since, if $\Delta\hat{x}^1 = 0$, then [5] implies

$$\Delta x^4 = \frac{1}{\sqrt{1 - \beta^2}}\,\Delta\hat{x}^4 > \Delta\hat{x}^4$$

Each observer judges the other's clocks to be running slow. This phenomenon is called time dilation and is clearly visible in the spacetime diagram in Figure 4 (e.g., both observers agree on the time reading "0" for the clock at the origin of the unhatted system, but the line $\hat{x}^4 = 1$ intersects the world line of the clock, i.e., the $x^4$-axis, at a point below $(x^1, x^4) = (0, 1)$).

Figure 4 Time dilation. (Spacetime diagram showing the $x^1$-, $x^4$-, $\hat{x}^1$-, and $\hat{x}^4$-axes, the hyperbola $(x^1)^2 - (x^4)^2 = -1$, the line $\hat{x}^4 = 1$, and the points $(x^1, x^4) = (0, 1)$ and $(\hat{x}^1, \hat{x}^4) = (0, 1)$.)

We should emphasize that this phenomenon is quite "real" in the physical sense. For example, certain types of elementary particles (mesons) found in cosmic radiation are so short-lived (at rest) that, even if they could travel at the speed of light, the time required to traverse our atmosphere would be some ten times their normal life span. They should not be able to reach the earth, but they do. Time dilation "keeps them young" in the sense that what seems a normal life time to the meson appears much longer to us.

Finally, since admissible observers generally disagree on which events are simultaneous and since the only way to measure the "length" of a moving object (say, a measuring rod) is to locate its end points "simultaneously," it should come as no surprise that length, like simultaneity, and time, depends on the admissible observer measuring it. Specifically, let us consider a measuring rod lying at rest along the $\hat{x}^1$-axis of the hatted coordinate system. Its "length" in this coordinate system is $\Delta\hat{x}^1$. The world lines of its end points are two straight lines parallel to the $\hat{x}^4$-axis. If the unhatted observer locates two events on these world lines "simultaneously" their coordinates will satisfy $\Delta x^4 = 0$ and, by [5], $\Delta\hat{x}^1 = (1/\sqrt{1 - \beta^2})\,\Delta x^1$ so

$$\Delta x^1 = \sqrt{1 - \beta^2}\,\Delta\hat{x}^1 < \Delta\hat{x}^1$$

and the moving measuring rod appears contracted in its direction of motion by a factor of $\sqrt{1 - \beta^2}$. As for time dilation, this phenomenon, known as length contraction, is entirely symmetrical, quite real, and clearly visible in a spacetime diagram (Figure 5).

Figure 5 Length contraction. (Spacetime diagram showing the $x^1$-, $x^4$-, $\hat{x}^1$-, and $\hat{x}^4$-axes, the hyperbola $(x^1)^2 - (x^4)^2 = 1$, the line $\hat{x}^1 = 1$, and the points $(x^1, x^4) = (1, 0)$ and $(\hat{x}^1, \hat{x}^4) = (1, 0)$.)

The Relativity Principle

We have found that admissible observers can disagree about some rather startling things (whether or not two events are simultaneous, the time lapse between two events even when no one thinks they are simultaneous, and the length of a measuring rod). This would be a matter of no concern at all, of course, if one could determine, in any given situation, who was really right. Surely, two events are either simultaneous or they are not and we need only sort out which admissible observer has the correct view of the situation? Unfortunately (or fortunately, depending on one's point of view) this distinction between the judgments made by different admissible observers is precisely what physics forbids.

The relativity principle (Einstein et al. 1958). All admissible observers are completely equivalent for the formulation of the laws of physics.

We must be clear that this is not a mathematical statement. It is rather a statement about the physical world around us and how it should be described, gleaned from observations, some of which are
complex and subtle and some of which are commonplace (a passenger in a smooth, quiet airplane traveling at constant groundspeed cannot "feel" his motion relative to the earth). It is a powerful guide for constructing the laws of relativistic physics, but even more fundamentally it prohibits us from regarding any particular admissible observer as having a privileged view of the universe. In particular, we are forbidden from attaching any objective significance to such questions as, "Were the two supernovae simultaneous?", "How long did the meson survive?", and "What is the distance between the Crab Nebula and Alpha Centauri?" This is severe, but one must deal with it.

Particles and 4-Momentum

If $I \subseteq \mathbb{R}$ is an interval, then a map $\alpha : I \to \mathcal{M}$ is a curve in $\mathcal{M}$. Relative to any admissible basis we can write

$$\alpha(\xi) = x^a(\xi)\, e_a$$

for each $\xi \in I$. We shall assume that $\alpha$ is smooth in the sense that each $x^a(\xi)$, $a = 1, 2, 3, 4$, is infinitely differentiable ($C^\infty$) on $I$ and the velocity vector

$$\alpha'(\xi) = \frac{dx^a}{d\xi}\, e_a$$

is nonzero for every $\xi \in I$ (we adopt the usual custom, in a vector space, of identifying the tangent space at each point with the vector space itself). This definition of smoothness clearly does not depend on the choice of admissible basis for $\mathcal{M}$. The curve $\alpha$ is said to be spacelike, timelike, or null if

$$\alpha'(\xi) \cdot \alpha'(\xi) = \eta_{ab}\, \frac{dx^a}{d\xi} \frac{dx^b}{d\xi}$$

is positive, negative, or zero, respectively, for each $\xi \in I$. A timelike curve $\alpha$ for which $\alpha'(\xi)$ is future directed for each $\xi \in I$ is called a timelike world line and its image is identified with the set of all events in the history of some (not necessarily free) point material particle. If $I = [\xi_0, \xi_1]$ and $\alpha : [\xi_0, \xi_1] \to \mathcal{M}$ is a timelike world line, then the proper time length of $\alpha$ is defined by

$$L(\alpha) = \int_{\xi_0}^{\xi_1} \sqrt{|g(\alpha'(\xi), \alpha'(\xi))|}\; d\xi = \int_{\xi_0}^{\xi_1} \sqrt{-\eta_{ab}\, \frac{dx^a}{d\xi} \frac{dx^b}{d\xi}}\; d\xi$$

and interpreted as the time lapse between the events $\alpha(\xi_0)$ and $\alpha(\xi_1)$ as recorded by a clock carried along by the particle whose world line is $\alpha$. This interpretation is easily motivated by writing out a Riemann sum approximation to the integral and appealing to our interpretation of the proper time separation $\Delta\tau = \sqrt{-\eta_{ab}\, \Delta x^a \Delta x^b}$. There are subtleties, however, both mathematical and physical (Naber 1992, section 1.4). The mathematical ones are addressed by the following result (which combines theorems 1.4.6 and 1.4.8 of Naber (1992)).

Theorem Let $x_0$ and $x$ be two events in $\mathcal{M}$. Then $x - x_0$ is timelike and future directed if and only if there exists a timelike world line $\alpha : [\xi_0, \xi_1] \to \mathcal{M}$ in $\mathcal{M}$ with $\alpha(\xi_0) = x_0$ and $\alpha(\xi_1) = x$ and, in this case,

$$L(\alpha) \leq \tau(x - x_0) \qquad [9]$$

with equality holding if and only if $\alpha$ is a parametrization of a timelike straight line.

The inequality [9] asserts that if two material particles experience both $x_0$ and $x$, then the one that is free (and so can be regarded as at rest in some admissible coordinate system) has longer to wait for the occurrence of the second event (moving clocks run slow). For many years this basically obvious fact was christened "The Twin Paradox."

Just as a smooth curve in Euclidean space has an arc length parametrization, so a timelike world line has a proper time parametrization defined as follows. For each $\xi$ in $[\xi_0, \xi_1]$ let

$$\tau = \tau(\xi) = \int_{\xi_0}^{\xi} \sqrt{|g(\alpha'(\zeta), \alpha'(\zeta))|}\; d\zeta$$

(the proper time length of $\alpha$ from $\alpha(\xi_0)$ to $\alpha(\xi)$). Then $\tau = \tau(\xi)$ has a smooth inverse $\xi = \xi(\tau)$ so $\alpha$ can be reparametrized by $\tau$. We will abuse our notation slightly and write

$$\alpha(\tau) = x^a(\tau)\, e_a$$

The velocity vector with this parametrization is denoted

$$U = U(\tau) = \frac{dx^a}{d\tau}\, e_a$$

called the 4-velocity of the world line and is the unit tangent vector field to $\alpha$, that is,

$$U(\tau) \cdot U(\tau) = -1 \qquad [10]$$

for each $\tau$. An admissible observer is, of course, more likely to parametrize a world line by his own time coordinate $x^4$. Then

$$\alpha'(x^4) = \frac{dx^1}{dx^4}\, e_1 + \frac{dx^2}{dx^4}\, e_2 + \frac{dx^3}{dx^4}\, e_3 + e_4$$

so

$$\left| g\left(\alpha'(x^4), \alpha'(x^4)\right) \right| = 1 - \|\vec{V}\|^2$$
where

$$\|\vec{V}\| = \sqrt{\left(\frac{dx^1}{dx^4}\right)^2 + \left(\frac{dx^2}{dx^4}\right)^2 + \left(\frac{dx^3}{dx^4}\right)^2}$$

is the usual magnitude of the particle's velocity vector

$$\vec{V} = \vec{V}(x^4) = \frac{dx^1}{dx^4}\, e_1 + \frac{dx^2}{dx^4}\, e_2 + \frac{dx^3}{dx^4}\, e_3 = V^i e_i$$

in the given admissible coordinate system. One finds then that

$$U = \left(1 - \|\vec{V}\|^2\right)^{-1/2} (\vec{V} + e_4) \qquad [11]$$

We shall identify a material particle in $\mathcal{M}$ with a pair $(\alpha, m)$, where $\alpha$ is a timelike world line and $m$ is a positive constant called the particle's proper mass (or rest mass). If each $dx^a/d\tau$, $a = 1, 2, 3, 4$, is constant, then $(\alpha, m)$ is a free material particle with proper mass $m$. The 4-momentum of $(\alpha, m)$ is defined by $P = mU$. Thus,

$$P \cdot P = -m^2 \qquad [12]$$

In any admissible basis we write

$$P = P^a e_a = m U^a e_a = m\, \frac{dx^a}{d\tau}\, e_a = m \left(1 - \|\vec{V}\|^2\right)^{-1/2} (\vec{V} + e_4) \qquad [13]$$

The "spatial part" of $P$ in these coordinates is

$$\vec{P} = \frac{m}{\sqrt{1 - \|\vec{V}\|^2}}\, \vec{V}$$

which, for $\|\vec{V}\| \ll 1$, is approximately $m\vec{V}$. Identifying $m$ with the inertial mass of Newtonian mechanics (measured by an observer for whom the particle's speed is small), this is simply the classical momentum of the particle. Somewhat more explicitly, if one expands $1/\sqrt{1 - \|\vec{V}\|^2}$ by the Binomial Theorem one finds that

$$P^i = \frac{m}{\sqrt{1 - \|\vec{V}\|^2}}\, V^i = m V^i + \frac{1}{2}\, m V^i \|\vec{V}\|^2 + \cdots, \qquad i = 1, 2, 3 \qquad [14]$$

which gives the components of the classical momentum plus "relativistic corrections." In order to preserve a formal similarity with Newtonian mechanics one often sees $m/\sqrt{1 - \|\vec{V}\|^2}$ referred to as the "relativistic mass" of the particle, but we shall avoid this terminology. The fourth component of $P$ is given by

$$P^4 = -P \cdot e_4 = \frac{m}{\sqrt{1 - \|\vec{V}\|^2}} = m + \frac{1}{2}\, m \|\vec{V}\|^2 + \cdots \qquad [15]$$

The appearance of the term $(1/2)\, m \|\vec{V}\|^2$ corresponding to the Newtonian kinetic energy suggests that $P^4$ be denoted $E$ and called the total relativistic energy measured by the given admissible observer for the particle:

$$E = -P \cdot e_4 \qquad [16]$$

Now, one must understand that the concept of "energy" in physics is a subtle one and simply giving $-P \cdot e_4$ this name does not ensure that there is any physical content. Whether or not the name is appropriate can only be determined experimentally. In particular, one should ask if the appearance of the term $m$ in [15] is consistent with the view that $P^4$ represents the "energy" of the particle. Observe that if $\|\vec{V}\| = 0$ (i.e., if the particle is at rest relative to the given observer), then [15] gives

$$E = m \quad (= mc^2, \text{ in standard units}) \qquad [17]$$

which we interpret as saying that, even when the particle is at rest, it still has energy. If this is really "energy" in the physical sense, then it should be possible to liberate and use it. That this is, indeed, possible has, of course, been rather convincingly demonstrated.

Next we observe that not only material particles, but also photons possess "momentum" and "energy" and therefore should have 4-momentum (witness, e.g., the photoelectric effect in which photons collide with and eject electrons from their orbits in an atom). Unlike a material particle, however, a photon's characteristic feature is not proper mass, but frequency $\nu$, or wavelength $\lambda = 1/\nu$, related to its energy $E$ by $E = h\nu$ ($h$ being Planck's constant) and these are highly observer dependent (Doppler effect). There is, moreover, no "proper frequency" analogous to "proper mass" since there is no admissible observer for whom the photon is at rest. In an attempt to model these features we consider a point $x_0 \in \mathcal{M}$, a future directed null vector $N$ and an interval $I \subseteq \mathbb{R}$. The curve $\alpha : I \to \mathcal{M}$ defined by

$$\alpha(\xi) = x_0 + \xi N \qquad [18]$$
is a parametrization of the world line of a photon through $x_0$. Being null, $N$ can be written in any admissible basis as

$$N = (-N \cdot e_4)(\vec{d} + e_4) \qquad [19]$$

where

$$\vec{d} = \left[(N \cdot e_1)^2 + (N \cdot e_2)^2 + (N \cdot e_3)^2\right]^{-1/2} \left[(N \cdot e_1)\, e_1 + (N \cdot e_2)\, e_2 + (N \cdot e_3)\, e_3\right] \qquad [20]$$

is the direction vector of the world line in the corresponding spatial coordinate system. Now, by analogy with [16], we define a photon in $\mathcal{M}$ to be a curve in $\mathcal{M}$ of the form [18], take $N$ to be its 4-momentum and define the energy $E$ of the photon in the admissible basis $\{e_1, e_2, e_3, e_4\}$ by

$$E = -N \cdot e_4 \qquad [21]$$

Then, by [19],

$$N = E\,(\vec{d} + e_4) \qquad [22]$$

The corresponding frequency $\nu$ and wavelength $\lambda$ are then defined by $\nu = E/h$ and $\lambda = 1/\nu$. In another admissible basis, one has $N = \hat{E}(\hat{\vec{d}} + \hat{e}_4)$, where $\hat{\vec{d}}$ and $\hat{E}$ are defined by the hatted versions of [20] and [21]. One can then show (Naber 1992, section 1.8) that

$$\frac{\hat{E}}{E} = \frac{\hat{\nu}}{\nu} = \frac{1 - \beta\cos\theta}{\sqrt{1 - \beta^2}} = (1 - \beta\cos\theta) + \frac{1}{2}\,\beta^2 (1 - \beta\cos\theta) + \cdots \qquad [23]$$

where $\beta$ is the relative speed of the two spatial coordinate systems and $\theta$ is the angle (in the unhatted spatial coordinate system) between the direction $\vec{d}$ of the photon and the direction of motion of the hatted spatial coordinate system. Equation [23] is the formula for the relativistic Doppler effect with the first term in the series being the classical formula.

We conclude this section by examining a few simple interactions between particles of the sort modeled by our definitions, assuming only that 4-momentum is conserved in the interaction. For convenience, we will use the term free particle to refer to either a free material particle or a photon. If $\mathcal{A}$ is a finite set of free particles, then each element of $\mathcal{A}$ has a unique 4-momentum which is a future-directed timelike or null vector. The sum of any such collection of vectors is timelike and future directed, except when all of the vectors are null and parallel, in which case the sum is null and future directed (Naber 1992, lemma 1.4.3). We call this sum the total 4-momentum of $\mathcal{A}$. Now we formulate a definition which is intended to model a finite set of free particles colliding at some event with a (perhaps new) set of free particles emerging from the collision (e.g., an electron and proton collide, with a neutron and neutrino emerging from the collision). A contact interaction in $\mathcal{M}$ is a triple $(\mathcal{A}, x, \tilde{\mathcal{A}})$, where $\mathcal{A}$ and $\tilde{\mathcal{A}}$ are two finite sets of free particles, neither of which contains a pair of particles with linearly dependent 4-momenta (which would presumably be physically indistinguishable) and $x \in \mathcal{M}$ is an event such that

1. $x$ is the terminal point of all of the particles in $\mathcal{A}$ (i.e., for each world line $\alpha : [\xi_0, \xi_1] \to \mathcal{M}$ of a particle in $\mathcal{A}$, $\alpha(\xi_1) = x$);
2. $x$ is the initial point of all the particles in $\tilde{\mathcal{A}}$, and
3. the total 4-momentum of $\mathcal{A}$ equals the total 4-momentum of $\tilde{\mathcal{A}}$.

Property (3) is called the conservation of 4-momentum. If $\mathcal{A}$ consists of a single free particle, then $(\mathcal{A}, x, \tilde{\mathcal{A}})$ is called a decay (e.g., a neutron decays into a proton, an electron and an antineutrino).

Consider, for example, an interaction $(\mathcal{A}, x, \tilde{\mathcal{A}})$ for which $\tilde{\mathcal{A}}$ consists of a single photon. The total 4-momentum of $\tilde{\mathcal{A}}$ is null so the same must be true of $\mathcal{A}$. Since the 4-momenta of the individual particles in $\mathcal{A}$ are timelike or null and future directed their sum can be null only if they are, in fact, all null and parallel. Since $\mathcal{A}$ cannot contain distinct photons with parallel 4-momenta, it must consist of a single photon which, by (3), must have the same 4-momentum as the photon in $\tilde{\mathcal{A}}$. In essence, "nothing happened at $x$." We conclude that no nontrivial interaction of the type modeled by our definition can result in a single photon and nothing else. Reversing the roles of $\mathcal{A}$ and $\tilde{\mathcal{A}}$ shows that, if 4-momentum is to be conserved, a photon cannot decay.

Next let us consider the decay of a single material particle into two material particles, for example, the spontaneous disintegration of an atom through $\alpha$-emission. Thus, we consider a contact interaction $(\mathcal{A}, x, \tilde{\mathcal{A}})$ in which $\mathcal{A}$ consists of a single free material particle of proper mass $m_0$ and $\tilde{\mathcal{A}}$ consists of two free material particles with proper masses $m_1$ and $m_2$. Let $P_0$, $P_1$, and $P_2$ be the 4-momenta of the particles of proper mass $m_0$, $m_1$, and $m_2$, respectively. Then $P_0 = P_1 + P_2$. Appealing to the "reversed triangle inequality," the fact that $P_1$ and $P_2$ are linearly independent and future directed, and [12] we conclude that

$$m_0 > m_1 + m_2 \qquad [23]$$
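The mass inequality for a two-particle decay can be illustrated numerically. The sketch below makes several assumptions that are not part of the article: units with the speed of light equal to 1, 4-momenta stored as tuples of components in one fixed admissible basis, and hypothetical helper names (`four_momentum`, `g`, `add`, `proper_mass`). It builds the 4-momenta of the two outgoing particles from [13], adds them as conservation of 4-momentum requires, and recovers the proper mass of the decaying particle from [12].

```python
import math


def four_momentum(m, v):
    """4-momentum m (1 - |v|^2)^(-1/2) (v + e4) of a free particle, as in [13].

    m is the proper mass and v the ordinary 3-velocity, with |v| < 1.
    """
    speed_sq = sum(c * c for c in v)
    if speed_sq >= 1.0:
        raise ValueError("a material particle must have speed less than 1")
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return (m * gamma * v[0], m * gamma * v[1], m * gamma * v[2], m * gamma)


def g(p, q):
    """Lorentz inner product, signature (+, +, +, -)."""
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2] - p[3] * q[3]


def add(p, q):
    """Componentwise sum of two 4-vectors."""
    return tuple(a + b for a, b in zip(p, q))


def proper_mass(p):
    """Proper mass from P . P = -m^2, equation [12]; p must be timelike."""
    return math.sqrt(-g(p, p))


if __name__ == "__main__":
    # Hypothetical decay products moving in opposite directions along x1.
    m1, m2 = 1.0, 2.0
    p1 = four_momentum(m1, (0.6, 0.0, 0.0))
    p2 = four_momentum(m2, (-0.6, 0.0, 0.0))
    p0 = add(p1, p2)            # conservation of 4-momentum
    m0 = proper_mass(p0)
    print("m0 =", m0, " m1 + m2 =", m1 + m2)
    print("excess mass:", m0 - (m1 + m2))
```

For these sample values the proper mass of the initial particle exceeds the sum of the proper masses of the products, as the reversed triangle inequality predicts whenever the two outgoing 4-momenta are linearly independent.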
The excess mass $m_0 - (m_1 + m_2)$ of the initial particle is regarded, via [17], as a measure of the amount of energy required to split $m_0$ into two pieces. Stated somewhat differently, when the two particles in $\tilde{\mathcal{A}}$ were held together to form the single particle in $\mathcal{A}$, the ‘‘binding energy’’ contributed to the mass of this latter particle.

Reversing the roles of $\mathcal{A}$ and $\tilde{\mathcal{A}}$ in the last example gives a contact interaction modelling an inelastic collision (two free material particles with masses $m_1$ and $m_2$ collide and coalesce to form a third of mass $m_0$). The inequality [23] remains true, of course, and a somewhat more detailed analysis (Naber 1992, section 1.8) yields an approximate formula for $m_0 - (m_1 + m_2)$ which can be compared (favorably) with the Newtonian formula for the loss in kinetic energy that results from the collision (energy which, classically, is viewed as taking the form of heat in the combined particle). An analysis of the interaction in which both $\mathcal{A}$ and $\tilde{\mathcal{A}}$ consist of an electron and a photon yields (Naber 1992, section 1.8) a formula for the so-called Compton effect. Many more such examples of this sort are treated in great detail in Synge (1972, chapter VI, § 14).

Charged Particles and Electromagnetic Fields

A charged particle in $\mathcal{M}$ is a triple $(\alpha, m, q)$, where $(\alpha, m)$ is a material particle and q is a nonzero real number called the charge of the particle. Charged particles do two things of interest to us. By their very presence they create electromagnetic fields and they also respond to the electromagnetic fields created by other charges.

Charged particles ‘‘respond’’ to an electromagnetic field by experiencing changes in 4-momentum. The quantitative nature of this response, that is, the equation of motion, is generally taken to be the so-called Lorentz 4-force law which expresses the proper time rate of change of the particle’s 4-momentum at each point of the world line as a linear function of the 4-velocity. Thus, at each point $\alpha(\tau)$ of the world line

$$\frac{dP(\tau)}{d\tau} = q\,\tilde{F}_{\alpha(\tau)}(U(\tau)) \qquad [24]$$

where $\tilde{F}_{\alpha(\tau)} : \mathcal{M} \to \mathcal{M}$ is a linear transformation determined, in each admissible coordinate system, by the classical electric $\mathbf{E}$ and magnetic $\mathbf{B}$ fields (here we are assuming that the contribution of q to the ambient electromagnetic field is negligible, that is, $(\alpha, m, q)$ is a ‘‘test charge’’). Let us write [24] more simply as

$$\tilde{F}(U) = \frac{m}{q}\frac{dU}{d\tau} \qquad [25]$$

Dotting both sides of [25] with U gives

$$\tilde{F}(U) \cdot U = \frac{m}{q}\frac{dU}{d\tau} \cdot U = \frac{m}{2q}\frac{d}{d\tau}(U \cdot U) = \frac{m}{2q}\frac{d}{d\tau}(-1) = 0$$

Since any future-directed timelike unit vector u is the 4-velocity of some charged particle, we find that $\tilde{F}(u) \cdot u = 0$ for any such vector. Linearity then implies $\tilde{F}(v) \cdot v = 0$ for any timelike vector. Now, if u and v are timelike and future directed, then $u + v$ is timelike so $0 = \tilde{F}(u + v) \cdot (u + v) = \tilde{F}(u) \cdot v + u \cdot \tilde{F}(v)$ and therefore $\tilde{F}(u) \cdot v = -u \cdot \tilde{F}(v)$. But $\mathcal{M}$ has a basis of future-directed timelike vectors so

$$\tilde{F}(x) \cdot y = -x \cdot \tilde{F}(y) \qquad [26]$$

for all $x, y \in \mathcal{M}$. Thus, at each point, the linear transformation $\tilde{F}$ must be skew-symmetric with respect to the Lorentz inner product. One could therefore model an electromagnetic field on $\mathcal{M}$ by an assignment to each point of a skew-symmetric linear transformation whose job it is to assign to the 4-velocity of a charged particle whose world line passes through that point the change in 4-momentum that the particle should expect to experience because of the presence of the field. However, a slightly different perspective has proved more convenient. Notice that a skew-symmetric linear transformation $\tilde{F} : \mathcal{M} \to \mathcal{M}$ and the Lorentz inner product together determine a bilinear form $F : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$ given by

$$F(x, y) = \tilde{F}(x) \cdot y$$

which is also skew-symmetric ($F(y, x) = \tilde{F}(y) \cdot x = -F(x, y)$) and that, conversely, a skew-symmetric bilinear form uniquely determines a skew-symmetric linear transformation. Now, an assignment of a skew-symmetric bilinear form to each point of $\mathcal{M}$ is nothing other than a 2-form on $\mathcal{M}$ and it is in the language of forms that we choose to phrase classical electromagnetic theory (a concise introduction to this language is available, for example, in Spivak (1965, chapter 4)).

Nature imposes a certain restriction on which 2-forms can reasonably represent an electromagnetic field on $\mathcal{M}$ (‘‘Maxwell’s equations’’). To formulate these we introduce a source 1-form J as follows: If

$x^1, x^2, x^3, x^4$ is any admissible coordinate system on $\mathcal{M}$, then

$$J = J_1\,dx^1 + J_2\,dx^2 + J_3\,dx^3 - \rho\,dx^4 \qquad [27]$$

where $\rho : \mathcal{M} \to \mathbb{R}$ is a charge density function and $\mathbf{J} = J^1 e_1 + J^2 e_2 + J^3 e_3$ is a current density vector field (these are to be regarded as the usual ‘‘smoothed out,’’ pointwise versions of ‘‘charge per unit volume’’ and ‘‘charge flow per unit area per unit time’’ as measured by the corresponding admissible observer). Now, our formal definition is as follows: The electromagnetic field on $\mathcal{M}$ determined by the source 1-form J on $\mathcal{M}$ is a 2-form F on $\mathcal{M}$ that satisfies Maxwell’s equations

$$dF = 0 \qquad [28]$$

and

$${*}d{*}F = J \qquad [29]$$

A few comments are in order here. We have chosen units in which not only the speed of light, but also various other constants that one often finds in Maxwell’s equations (the dielectric constant $\varepsilon_0$ and magnetic permeability $\mu_0$) are 1 and a factor of $4\pi$ in [29] is ‘‘normalized out.’’ The $*$ in [29] is the Hodge star operator determined by the Lorentz inner product and the chosen orientation of $\mathcal{M}$. This is a natural isomorphism

$${*} : \Omega^p(\mathcal{M}) \to \Omega^{4-p}(\mathcal{M}), \qquad p = 0, 1, 2, 3, 4$$

of the p-forms on $\mathcal{M}$ to the $(4 - p)$-forms on $\mathcal{M}$ and is most simply defined as follows: let $x^1, x^2, x^3, x^4$ be any admissible coordinate system on $\mathcal{M}$. If $1 \in \Omega^0(\mathcal{M})$ is the constant function (0-form) on $\mathcal{M}$ whose value is $1 \in \mathbb{R}$, then

$${*}1 = dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4$$

is the volume form on $\mathcal{M}$. If $1 \le i_1 < \cdots < i_k \le 4$, then ${*}(dx^{i_1} \wedge \cdots \wedge dx^{i_k})$ is uniquely determined by

$$(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) \wedge {*}(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) = \pm\,dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4$$

where the sign is $-$ if $i_k = 4$ (the timelike direction) and $+$ otherwise. Thus, for example, ${*}dx^2 = -dx^1 \wedge dx^3 \wedge dx^4$, ${*}(dx^1 \wedge dx^2) = dx^3 \wedge dx^4$, ${*}(dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4) = -1$, etc. It follows that, if $\eta$ is a p-form on $\mathcal{M}$, then

$${*}{*}\eta = (-1)^{p+1}\eta \qquad [30]$$

(a more thorough discussion is available in Choquet-Bruhat et al. (1977, chapter V A3)). In particular, [29] is equivalent to

$$d{*}F = {*}J \qquad [31]$$

On regions in which there are no charges, so that $J = 0$, [28] and [31] become the source free Maxwell equations

$$dF = 0 \qquad [32]$$

and

$$d{*}F = 0 \qquad [33]$$

that is, both F and ${*}F$ are closed 2-forms.

Any 2-form F on $\mathcal{M}$ can be written in any admissible coordinate system as $F = \frac{1}{2}F_{ab}\,dx^a \wedge dx^b$ (summation convention!), where $(F_{ab})$ is the skew-symmetric matrix of components of F. In order to make contact with the notation generally employed in physics, we introduce the following names for these components:

$$(F_{ab}) = \begin{pmatrix} 0 & B^3 & -B^2 & E^1 \\ -B^3 & 0 & B^1 & E^2 \\ B^2 & -B^1 & 0 & E^3 \\ -E^1 & -E^2 & -E^3 & 0 \end{pmatrix} \qquad [34]$$

Thus,

$$F = E^1\,dx^1 \wedge dx^4 + E^2\,dx^2 \wedge dx^4 + E^3\,dx^3 \wedge dx^4 + B^3\,dx^1 \wedge dx^2 + B^2\,dx^3 \wedge dx^1 + B^1\,dx^2 \wedge dx^3 \qquad [35]$$

Computing ${*}F$, $dF$, $d{*}F$ and ${*}d{*}F$ and writing $\mathbf{E} = E^1 e_1 + E^2 e_2 + E^3 e_3$ and $\mathbf{B} = B^1 e_1 + B^2 e_2 + B^3 e_3$ one finds that $dF = 0$ is equivalent to

$$\operatorname{div} \mathbf{B} = 0 \qquad [36]$$

and

$$\operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [37]$$

while ${*}d{*}F = J$ is equivalent to

$$\operatorname{div} \mathbf{E} = \rho \qquad [38]$$

and

$$\operatorname{curl} \mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} = \mathbf{J} \qquad [39]$$

Equations [36]–[39] are the more traditional renderings of Maxwell’s equations.

In another admissible coordinate system $\hat{x}^1, \hat{x}^2, \hat{x}^3, \hat{x}^4$ on $\mathcal{M}$ (related to the first by [2]) the 2-form F would be written $F = \frac{1}{2}\hat{F}_{ab}\,d\hat{x}^a \wedge d\hat{x}^b$. Setting $\hat{x}^a = \Lambda^a{}_\alpha x^\alpha$ and $\hat{x}^b = \Lambda^b{}_\beta x^\beta$ gives $F = \frac{1}{2}(\Lambda^a{}_\alpha \Lambda^b{}_\beta \hat{F}_{ab})\,dx^\alpha \wedge dx^\beta$, so

$$F_{\alpha\beta} = \Lambda^a{}_\alpha \Lambda^b{}_\beta \hat{F}_{ab}, \qquad \alpha, \beta = 1, 2, 3, 4 \qquad [40]$$

Now, suppose that we wish to describe the electromagnetic field of a uniformly moving charge. According to the relativity principle, it does not matter at all whether we view the charge as moving

relative to a ‘‘fixed’’ admissible observer, or the observer as moving relative to a ‘‘stationary’’ charge. Thus, we shall write out the field due to a charge fixed at the origin of the hatted coordinate system (‘‘Coulomb’s law’’) and transform, by [40], to an unhatted coordinate system moving relative to it.

Relative to $\hat{x}^1, \hat{x}^2, \hat{x}^3, \hat{x}^4$, the familiar inverse square law for a fixed point charge q located at the spatial origin gives $\hat{\mathbf{B}} = 0$ and $\hat{\mathbf{E}} = (q/\hat{r}^3)\hat{\mathbf{r}}$, where $\hat{\mathbf{r}} = \hat{x}^1\hat{e}_1 + \hat{x}^2\hat{e}_2 + \hat{x}^3\hat{e}_3$ and $\hat{r} = ((\hat{x}^1)^2 + (\hat{x}^2)^2 + (\hat{x}^3)^2)^{1/2}$ (note that $\hat{\mathbf{E}}$ is defined only on $\mathcal{M} - \operatorname{Span}\{\hat{e}_4\}$). Thus,

$$(\hat{F}_{ab}) = \frac{q}{\hat{r}^3}\begin{pmatrix} 0 & 0 & 0 & \hat{x}^1 \\ 0 & 0 & 0 & \hat{x}^2 \\ 0 & 0 & 0 & \hat{x}^3 \\ -\hat{x}^1 & -\hat{x}^2 & -\hat{x}^3 & 0 \end{pmatrix} \qquad [41]$$

It is a simple matter to verify that, on its domain, $(\hat{F}_{ab})$ satisfies the source free Maxwell equations. Taking $\Lambda$ to be the special Lorentz transformation corresponding to [5] and writing out [40] with $(\hat{F}_{ab})$ given by [41] yields

$$\begin{aligned}
E^1 &= q\,\frac{\hat{x}^1}{\hat{r}^3}, &
E^2 &= \frac{q}{\sqrt{1 - \beta^2}}\,\frac{\hat{x}^2}{\hat{r}^3}, &
E^3 &= \frac{q}{\sqrt{1 - \beta^2}}\,\frac{\hat{x}^3}{\hat{r}^3} \\
B^1 &= 0, &
B^2 &= -\frac{q\beta}{\sqrt{1 - \beta^2}}\,\frac{\hat{x}^3}{\hat{r}^3}, &
B^3 &= \frac{q\beta}{\sqrt{1 - \beta^2}}\,\frac{\hat{x}^2}{\hat{r}^3}
\end{aligned} \qquad [42]$$

We wish to express these in terms of measurements made by the unhatted observer at the instant the charge passes through his spatial origin. Setting $x^4 = 0$ in [5] gives

$$\hat{x}^1 = \frac{1}{\sqrt{1 - \beta^2}}\,x^1, \qquad \hat{x}^2 = x^2, \qquad \hat{x}^3 = x^3$$

and so

$$\hat{r}^2 = \frac{1}{1 - \beta^2}(x^1)^2 + (x^2)^2 + (x^3)^2$$

which, for convenience, we write $r^2$. Making these substitutions in [42] gives

$$\mathbf{E} = \frac{q}{\sqrt{1 - \beta^2}}\,\frac{1}{r^3}\left(x^1 e_1 + x^2 e_2 + x^3 e_3\right) = \frac{q}{\sqrt{1 - \beta^2}}\,\frac{1}{r^3}\,\mathbf{r} \qquad [43]$$

and

$$\mathbf{B} = \frac{q}{\sqrt{1 - \beta^2}}\,\frac{1}{r^3}\left(0\,e_1 - \beta x^3 e_2 + \beta x^2 e_3\right) = \frac{q}{\sqrt{1 - \beta^2}}\,\frac{1}{r^3}\left((\beta e_1) \times \mathbf{r}\right) \qquad [44]$$

for the field of a charge moving uniformly with velocity $\beta e_1$ at the instant the charge passes through the origin. Observe that when $\beta \ll 1$, $r \approx |\mathbf{r}|$, so [43] says that the electric field of a slowly moving charge is approximately the Coulomb field. When $\beta \ll 1$, [44] reduces to the Biot–Savart law.

Let us consider one other simple application, that is, the response of a charged particle $(\alpha, m, q)$ to an electromagnetic field which, for some admissible observer, is constant and purely magnetic. For simplicity, we assume that, for this observer $\mathbf{E} = 0$ and $\mathbf{B} = b e_3$, where b is a nonzero constant. The corresponding 2-form F has components

$$(F_{ab}) = \begin{pmatrix} 0 & b & 0 & 0 \\ -b & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

(from [34]). The corresponding linear transformation $\tilde{F}$ has the same matrix relative to this basis so, with $\alpha(\tau) = x^a(\tau)e_a$ and $U(\tau) = U^a(\tau)e_a$, the Lorentz 4-force law [25] reduces to the system of linear differential equations

$$\frac{dU^1}{d\tau} = \frac{bq}{m}U^2, \qquad \frac{dU^2}{d\tau} = -\frac{bq}{m}U^1, \qquad \frac{dU^3}{d\tau} = 0, \qquad \frac{dU^4}{d\tau} = 0$$

The system is easily solved and the results easily integrated to give

$$\alpha(\tau) = x_0 + a\sin\left(\frac{bq}{m}\tau + \phi\right)e_1 + a\cos\left(\frac{bq}{m}\tau + \phi\right)e_2 + c\tau\,e_3 + \left(1 + \frac{a^2b^2q^2}{m^2} + c^2\right)^{1/2}\tau\,e_4 \qquad [45]$$

where $x_0 = x_0^a e_a \in \mathcal{M}$ is constant and a, $\phi$, and c are real constants with $a > 0$ (we have used $U \cdot U = -1$ to eliminate one other arbitrary real constant). Note that, at each point on $\alpha$, $(x^1 - x_0^1)^2 + (x^2 - x_0^2)^2 = a^2$. Thus, if $c \ne 0$ the spatial trajectory in this coordinate system is a helix along the $e_3$-direction (i.e., along the magnetic field lines). If $c = 0$, the trajectory is a circle in the $x^1$–$x^2$ plane. This case is of some practical significance since one can

introduce constant magnetic fields in a bubble chamber so as to induce a particle of interest to follow a circular path. We show now how to measure the charge-to-mass ratio for such a particle. Taking $c = 0$ in [45] and computing $U(\tau)$, then using [11] to solve for the coordinate velocity vector V of the particle gives

$$V = \frac{abq}{m}\sqrt{1 - \|V\|^2}\,\left[\cos\left(\frac{bq}{m}\tau + \phi\right)e_1 - \sin\left(\frac{bq}{m}\tau + \phi\right)e_2\right]$$

From this one computes

$$\|V\|^2 = \left(1 + \frac{m^2}{a^2b^2q^2}\right)^{-1}$$

(note that this is a constant). Solving this last equation for $q/m$ (and assuming $q > 0$ for convenience) one arrives at

$$\frac{q}{m} = \frac{1}{a|b|}\,\frac{\|V\|}{\sqrt{1 - \|V\|^2}}$$

Since a, b, and $\|V\|$ are measurable, one obtains the desired charge-to-mass ratio.

To conclude we wish to briefly consider the existence and use of ‘‘potentials’’ for electromagnetic fields. Suppose F is an electromagnetic field defined on some connected, open region X in $\mathcal{M}$. Then F is a 2-form on X which, by [28], is closed. Suppose also that the second de Rham cohomology $H^2(X; \mathbb{R})$ of X is trivial (since $\mathcal{M}$ is topologically $\mathbb{R}^4$ this will be the case, for example, when X is all of $\mathcal{M}$, or an open ball in $\mathcal{M}$, or, more generally, an open ‘‘star-shaped’’ region in $\mathcal{M}$). Then, by definition, every closed 2-form on X is exact so, in particular, there exists a 1-form A on X satisfying

$$F = dA \qquad [46]$$

In particular, such a 1-form A always exists locally on a neighborhood of any point in X for any F. Such an A is not uniquely determined, however, because, if A satisfies [46], then so does $A + df$ for any smooth real-valued function (0-form) f on X ($d^2 = 0$ implies $d(A + df) = dA + d^2f = dA = F$). Any 1-form A satisfying [46] is called a (gauge) potential for F. The replacement $A \to A + df$ for some f is called a gauge transformation of the potential and the freedom to make such a replacement without altering [46] is called gauge freedom.

One can show that, given F, it is always possible to locally solve $dA = F$ for A subject to an arbitrary specification of the 0-form ${*}d{*}A$. More precisely, if F is any 2-form satisfying $dF = 0$ and g is an arbitrary 0-form, then locally, on a neighborhood of any point, there exists a 1-form A satisfying

$$dA = F \quad \text{and} \quad {*}d{*}A = g \qquad [47]$$

(a more general result is proved in Parrott (1987, appendix 2) and a still more general one in section 2.9 of this same source). The usefulness of the second condition in [47] can be illustrated as follows. Suppose we are given some (physical) configuration of charges and currents (i.e., some source 1-form J) and we wish to find the corresponding electromagnetic field F. We must solve Maxwell’s equations $dF = 0$ and ${*}d{*}F = J$ (subject to whatever boundary conditions are appropriate). Locally, at least, we may seek instead a corresponding potential A (so that $F = dA$). Then the first of Maxwell’s equations is automatically satisfied ($dF = d(dA) = 0$) and we need only solve ${*}d{*}(dA) = J$. To simplify the notation let us temporarily write $\delta = {*}d{*}$ and consider the operator $\Delta = d\delta + \delta d$ on forms (variously called the Laplace–Beltrami operator, Laplace–de Rham operator, or Hodge Laplacian on Minkowski spacetime). Then

$$\Delta A = d(\delta A) + \delta(dA) = d({*}d{*}A) + {*}d{*}(dA) \qquad [48]$$

According to the result quoted above, we may narrow down our search by imposing the condition ${*}d{*}A = 0$, that is

$$\delta A = 0 \qquad [49]$$

(this is generally referred to as imposing the Lorentz gauge). With this, [48] becomes $\Delta A = {*}d{*}(dA)$ and to satisfy the second Maxwell equation we must solve

$$\Delta A = J \qquad [50]$$

Thus, we see that the problem of (locally) solving Maxwell’s equations for a given source J reduces to that of solving [49] and [50] for the potential A. To understand how this simplifies the problem, we note that a calculation in admissible coordinates shows that the operator $\Delta$ reduces to the componentwise d’Alembertian $\Box$, defined on real-valued functions by

$$\Box = \frac{\partial^2}{\partial(x^1)^2} + \frac{\partial^2}{\partial(x^2)^2} + \frac{\partial^2}{\partial(x^3)^2} - \frac{\partial^2}{\partial(x^4)^2}$$

Thus, eqn [50] decouples into four scalar equations

$$\Box A_a = J_a, \qquad a = 1, 2, 3, 4 \qquad [51]$$

each of which is the well-studied inhomogeneous wave equation.
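To make the scalar equations [51] concrete, here is a minimal numerical sketch of the d’Alembertian acting on a plane wave, assuming units with c = 1 and $x^4$ as the time coordinate. The finite-difference helper, the wave vector, and the evaluation point are illustrative assumptions rather than anything prescribed by the article. For $u = \cos(\mathbf{k} \cdot \mathbf{x} - \omega x^4)$ one expects $\Box u = (\omega^2 - |\mathbf{k}|^2)u$, so the source free equation holds exactly when $\omega = |\mathbf{k}|$.

```python
import math

def dalembertian(f, x, h=1e-3):
    """Central-difference estimate of the d'Alembertian of f at the point x.

    Implements d2/d(x1)2 + d2/d(x2)2 + d2/d(x3)2 - d2/d(x4)2 with c = 1.
    """
    signs = (1.0, 1.0, 1.0, -1.0)  # the time coordinate x4 enters with a minus sign
    total = 0.0
    for a, sign in enumerate(signs):
        forward = list(x)
        backward = list(x)
        forward[a] += h
        backward[a] -= h
        total += sign * (f(forward) - 2.0 * f(x) + f(backward)) / h ** 2
    return total

def plane_wave(k, omega):
    """Return u(x) = cos(k.x - omega * x4) for a spatial wave vector k."""
    def u(x):
        return math.cos(k[0] * x[0] + k[1] * x[1] + k[2] * x[2] - omega * x[3])
    return u

k = (1.0, 2.0, 2.0)              # |k| = 3 (hypothetical sample wave vector)
point = (0.3, -0.1, 0.7, 0.2)    # arbitrary evaluation point (x1, x2, x3, x4)

on_shell = dalembertian(plane_wave(k, 3.0), point)   # omega = |k|: source free
off_shell = dalembertian(plane_wave(k, 2.0), point)  # omega != |k|: nonzero result
print(on_shell, off_shell)
```

The off-shell value agrees, up to discretization error, with $(\omega^2 - |\mathbf{k}|^2)u = -5u$ at the chosen point, which is the source such a component would require on the right-hand side of [51].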

Further Reading

Choquet-Bruhat Y, De Witt-Morette C, and Dillard-Bleick M (1977) Analysis, Manifolds and Physics. Amsterdam: North-Holland.
Einstein A et al. (1958) The Principle of Relativity. New York: Dover.
Naber GL (1992) The Geometry of Minkowski Spacetime. Berlin: Springer.
Parrott S (1987) Relativistic Electrodynamics and Differential Geometry. Berlin: Springer.
Spivak M (1965) Calculus on Manifolds. New York: W A Benjamin.
Synge JL (1972) Relativity: The Special Theory. Amsterdam: North-Holland.

Introductory Article: Quantum Mechanics


G F dell’Antonio, Università di Roma ‘‘La Sapienza,’’ Rome, Italy

© 2006 Elsevier Ltd. All rights reserved.

Historical Background

In this section we shall briefly recall the basic empirical facts and the first theoretical attempts from which the theory and the formalism of present-day quantum mechanics (QM) has grown. In the next sections we shall give the mathematical and computational structure of QM, mention the physical problems that QM has solved with much success, and describe the serious conceptual consistency problems which are posed by QM (and which remain unsolved up to now).

Empirical rules of discretization were observed already, starting from the 1850s, in the absorption and in the emission of light. Fraunhofer noticed that the dark lines in the absorption spectrum of the light of the sun coincide with the bright lines in the emission lines of all elements. G Kirchhoff and R Bunsen reached the conclusion that the relative intensities of the emission and absorption of light implied that the ratio between energy emitted and absorbed is independent of the atom considered. This was the starting point of the analysis by Planck.

On the other hand, by the end of the nineteenth century, the spatial structure of the atom had been investigated; the most successful model was that of Rutherford, in which the atom appeared as a small nucleus of charge Z surrounded by Z electrons attracted by the nucleus according to Coulomb’s law. This model represents, for distances of the order of the size of an atom, a complete departure from Newton’s laws combined with the laws of classical electrodynamics; indeed, according to these laws, the atom would be unstable against collapse, and would certainly not exhibit a discrete energy spectrum. We must conclude that the classical laws are inadequate for the description of emission and absorption of light, in which the internal structure of the atom plays a major role.

The birth of the old quantum theory is placed traditionally at the date of M Planck’s discussion of the blackbody radiation in 1900.

Planck put forward the postulate that light is emitted and absorbed by matter in discrete energy quanta through ‘‘resonators’’ that have an energy proportional to their frequency. This assumption led, through the use of Gibbs’s rules of Statistical Mechanics applied to a gas of resonators, to a law (Planck’s law) which reproduces the empirical findings on the radiation from a blackbody. It led Einstein to ascribe to light (which had, since the times of Maxwell, a successful description in terms of waves) a discrete, particle-like nature. Nine years later A Einstein gave further support to Planck’s postulate by showing that it can reproduce correctly the energy fluctuations in blackbody radiation and even clarifies the properties of specific heat. Soon afterwards, Einstein (1924, 1925) proved that the putative particle of light satisfied the relativistic laws (relation between energy and momentum) of a particle with zero mass.

This dual nature of light received further support from the experiments on the Compton effect and from the description, by Einstein, of the photoelectric effect (Einstein 1905). It should be emphasized that while Planck considered light in interaction with matter as composed of bits of energy $h\nu$ ($h \simeq 6.6 \times 10^{-27}$ erg s), Einstein’s analysis went much further in assigning to the quantum of light properties of a particle-like (localized) object. This marks a complete departure from the laws of classical electromagnetism. Therefore, quoting Einstein,

It is conceivable that the wave theory of light, which retains its effectiveness for the representation of purely optical phenomena and is based on continuous functions over space, will lead to contradiction with the experiments when applied to phenomena in which there is creation or conversion of light; indeed these phenomena can be better

described on the assumption that light is distributed discontinuously in space and described by a finite number of quanta which move without being divided and which must be absorbed or emitted as a whole.

Notice that, for a wavelength of $8 \times 10^3$ Å, a 30 W lamp emits roughly $10^{20}$ photons s$^{-1}$; for macroscopic objects the discrete nature of light has no appreciable consequence.

Planck’s postulate and energy conservation imply that in emitting and absorbing light the atoms of the various elements can lose or gain energy only by discrete amounts. Therefore, atoms as producers or absorbers of radiation are better described by a theory that assigns to each atom a (possibly infinite) discrete set of states which have a definite energy.

The old quantum theory of matter addresses precisely this question. Its main proponent is N Bohr (Bohr 1913, 1918). The new theory is entirely phenomenological (as is Planck’s theory) and based on Rutherford’s model and on three more postulates (Born 1924):

(i) The states of the atom are stable periodic orbits, as given by Newton’s laws, of energy $E_n$, $n \in \mathbb{Z}^+$, given by $E_n = h\nu_n f(n)$, where h is Planck’s constant, $\nu_n$ is the frequency of the electron on that orbit, and $f(n)$ is for each atom a function approximately linear in Z at least for small values of Z.
(ii) When radiation is emitted or absorbed, the atom makes a transition to a different state. The frequency of the radiation emitted or absorbed when making a transition is $\nu_{n,m} = h^{-1}|E_n - E_m|$.
(iii) For large values of n and m and small values of $(n - m)/(n + m)$ the prediction of the theory should agree with those of the classical theory of the interaction of matter with radiation.

Later, A Sommerfeld gave a different version of the first postulate, by requiring that the allowed orbits be those for which the classical action is an integer multiple of Planck’s constant.

The old quantum theory met success when applied to simple systems (atoms with $Z < 5$) but it soon appeared evident that a new, radically different point of view was needed and a fresh start; the new theory was to contain few free parameters, and the role of postulate (iii) was now to fix the value of these parameters.

There were two (successful) attempts to construct a consistent theory; both required a more sharply defined mathematical formalism. The first one was sparked by W Heisenberg, and further important ideas and mathematical support came from M Born, P Jordan, W Pauli, P Dirac and, on the mathematical side, also by J von Neumann and H Weyl. This formulation maintains that one should only consider relations between observable quantities, described by elements that depend only on the initial and final states of the system; each state has an internal energy. By energy conservation, the difference between the energies must be proportional (with a universal constant) to the frequency of the radiation absorbed or emitted. This is enough to define the energy of the state of a single atom modulo an additive constant. The theory must also take into account the probability of transitions under the influence of an external electromagnetic field. We shall give some details later on, which will help to follow the basis of this approach.

The other attempt was originated by L de Broglie following early remarks by HW Bragg and M Brillouin. Instead of emphasizing the discrete nature of light, he stressed the possible wave nature of particles, using as a guide the Hamilton–Jacobi formulation of classical mechanics. This attempt was soon supported by the experiments of Davisson and Germer (1927) on the scattering of a beam of electrons from a crystal. These experiments showed that, while electrons are recorded as ‘‘point particles,’’ their distribution follows the law of the intensity for the diffraction of a (dispersive) wave. Moreover, the relation between momentum and frequency was, within experimental errors, the same as that obtained by Einstein for photons.

The theory started by de Broglie was soon placed in almost definitive form by E Schrödinger. In this approach one is naturally led to formulate and solve partial differential equations and the full development of the theory requires regularity results from the theory of functions.

Schrödinger soon realized that the relations which were found in the approach of Heisenberg could be easily (modulo technical details which we shall discuss later) obtained within the formalism he was advocating and indeed he gave a proof that the two formalisms were equivalent. This proof was later refined, from the mathematical point of view, by J von Neumann and G Mackey.

In fact, Schrödinger’s approach has proved much more useful in the solution of most physical problems in the nonrelativistic domain, because it can rely on the developments and practical use of the theory of functions and of partial differential equations. Heisenberg’s ‘‘algebraic’’ approach has therefore a lesser role in solving concrete problems in (nonrelativistic) QM.

If one considers processes in which the number of particles may change in time, one is forced to

introduce a Hilbert space that accommodates states with an arbitrarily large number of particles, as is the case in the theory of relativistic quantized fields or in quantum statistical mechanics; it is then more difficult to follow the line of Schrödinger, due to difficulties in handling spaces of functions of infinitely many variables. The approach of Heisenberg, based on the algebra of matrices, has a rather natural extension to suitable algebras of operators; the approach of Schrödinger, based on the description of a state as a (wave) function, encounters more difficulties since one must introduce functionals over spaces of functions and the description of dynamics does not have a simple form.

From this point of view, the generalization of Heisenberg’s approach has led to much progress in the understanding of the structure of the resulting theory. Still some relevant results have been obtained in a Schrödinger representation. We shall not elaborate further on this point.

We shall end this introductory section with a short description of the emergence of the structure of QM in Heisenberg’s and Schrödinger’s approaches; this will provide a motivation for the axioms of QM which we shall introduce in the following section. For an extended analysis, see, for example, Jammer (1979).

The specific form that was postulated by de Broglie (1923) for the wave nature of a particle relies on the relation of geometrical optics with wave propagation and on the formulation of Hamiltonian mechanics as a sort of ‘‘wave front propagation’’ through the solution of the Hamilton–Jacobi equation and the introduction of group velocity.

By analogy with electromagnetic waves, it is natural to associate with a free nonrelativistic particle of momentum p and mass m the plane wave

$$\psi_p(x, t) = e^{i(p \cdot x - Et)/\hbar}, \qquad \hbar \equiv \frac{h}{2\pi}, \qquad E = \frac{p^2}{2m}$$

Schrödinger obtained the equation for a quantum particle in a field of conservative forces with potential $V(x)$ by considering an analogy with the propagation of an electromagnetic wave in a medium with refraction index $n(x, \omega)$ that varies slowly on the scale of the wavelength. Indeed, in this case the ‘‘wave’’ follows the laws of geometrical optics, and has therefore a ‘‘particle-like’’ behavior. If one denotes by $\hat{u}(x, \omega)$ the Fourier transform (with respect to time) of a generic component of the electric field and one assumes that the field be essentially monochromatic (so that the support of $\hat{u}(x, \omega)$ as a function of $\omega$ is in a very small neighborhood of $\omega_0$), one finds that $\hat{u}(x, \omega)$ is an approximate solution of the equation

$$-\Delta\hat{u}(x, \omega) = \frac{\omega_0^2}{c^2}\,n^2(x, \omega)\,\hat{u}(x, \omega) \qquad [1]$$

Writing $\hat{u}(x, \omega) = A(x, \omega)\,e^{i(\omega/c)W(x, \omega)}$ the phase $W(x, \omega)$ satisfies, in the high-frequency limit, the eikonal equation $|\nabla W(x, \omega)|^2 = n^2(x, \omega)$. One can define for the solution a phase velocity $v_f$ and it turns out that $v_f = c/|\nabla W(x, \omega)|$.

On the other hand, classical mechanics can also be described by propagation of surfaces of constant value for the solution $W(x, t)$ of the Hamilton–Jacobi equation $H(x, \nabla W) = E$, with $H = p^2/2m + V(x)$. Recall that high frequency (the realm of geometric optics) corresponds to small distances. This analogy led Schrödinger (1926) to postulate that the dynamics satisfied by the waves associated with the particles was given by the (Schrödinger) equation

$$i\hbar\,\frac{\partial\psi(x, t)}{\partial t} = -\frac{\hbar^2}{2m}\,\Delta_x\psi(x, t) + V(x)\,\psi(x, t) \qquad [2]$$

This wave was to describe the particle and its motion, but, being complex valued, it could not represent any measurable property. It is a mathematical property of the solutions of [2] that the quantity $\int |\psi(x, t)|^2\,d^3x$ is preserved in time. Furthermore, if one sets

$$\rho(x, t) \equiv |\psi(x, t)|^2, \qquad j(x, t) \equiv -i\,\frac{\hbar}{2m}\left[\bar{\psi}(x, t)\nabla\psi(x, t) - \psi(x, t)\nabla\bar{\psi}(x, t)\right] \qquad [3]$$

one easily verifies the local conservation law

$$\frac{\partial\rho}{\partial t} + \operatorname{div} j(x, t) = 0 \qquad [4]$$

These mathematical properties led to the statistical interpretation given by Max Born: in those experiments in which the position of the particles is measured, the integral of $|\psi(x, t)|^2$ over a region $\Omega$ of space gives the probability that at time t the particle is localized in the region $\Omega$. Moreover, the current associated with a charged particle is given locally by $j(x, t)$ defined above.

Let us now briefly review Heisenberg’s approach. At the heart of this approach are: empirical formulas for the intensities of emission and absorption of radiation (dispersion relations), Sommerfeld’s quantum condition for the action and the vague statement ‘‘the analogue of the derivative for the discrete action variable is the corresponding finite difference quotient.’’ And, most important, the remark that the correct description of atomic physics was through quantities associated with pairs of states, that is, (infinite) matrices and the

empirical fact that the frequency (or rather the wave The conclusion Born and Heisenberg drew is that
number) !k, j of the radiation (emitted or absorbed) the matrix A that takes the place of the momentum
in the transition between the atomic levels k and in the classical theory must be such that
j (k 6¼ j) satisfies the Ritz combination principle jAnþm, n j2 = e2 hm1 f (n þ m, n). In the same vein,
!m, j þ !j, k = !m, k . It easy to see that any doubly considering the polarization in a static electric
indexed family satisfying this relation must have the form $\omega_{m,k} = E_m - E_k$ for suitable constants $E_j$.

It was empirically verified by Kramers that the dipole moment of an atom in an external monochromatic field with frequency $\nu$ was proportional to the field with a coefficient (of polarization)

$$P = \frac{e^2}{4\pi m}\sum_i \left(\frac{f_i}{\nu_i^2-\nu^2} - \frac{F_i}{\nu_i^2-\nu^2}\right) \qquad [5]$$

where $e$, $m$ are the charge and the mass of the electron and $f_i$, $F_i$ are the probabilities that the frequency $\nu_i$ is emitted or absorbed.

A detailed analysis of the phenomenon of polarization in classical mechanics, with the clearly stated aim ‘‘of presenting the results in a way that may give hints for the construction of a New Mechanics’’ was made by Max Born (1924). He makes use of action-angle variables $\{J_i, \theta_i\}$, assuming that the atom can be considered as a collection of harmonic oscillators with frequencies $\nu_i$ coupled linearly to the electric field of frequency $\nu$.

In the dipole approximation one obtains the following result for the polarization $P$ (linear response in energy to the electric field):

$$P = 2\sum_{(\nu\cdot m)>0} (m\cdot\nabla_J)\,\frac{|A(J)|^2\,(\nu\cdot m)}{(m\cdot\nu)^2-\nu^2} \qquad [6]$$

where $\nu_k = \partial H/\partial J_k$ ($H$ is the interaction Hamiltonian), and $A(J)$ is a suitable matrix. In order to derive the new dynamics, having as a guide the correspondence principle, one has to compare this result with the Kramers dispersion relation, which we write (to make the comparison easier) in the form

$$P = \frac{e^2}{4\pi m}\sum_{n,m}\left(\frac{f_{m,n}}{\nu_{n,m}^2-\nu^2} - \frac{f_{n,m}}{\nu_{n,m}^2-\nu^2}\right), \qquad E_m > E_n \qquad [7]$$

Bohr’s rule implies that $\nu(n+\tau, n) = (E(n+\tau) - E(n))/h$.

Born and Heisenberg noticed that, for $n$ sufficiently large and $k$ small, one can approximate the differential operator in [6] with the corresponding difference operator, with an error of the order of $k/n$. Therefore, [6] could be substituted by

$$P = \frac{1}{h}\sum_{m_k>0}\left[\frac{|A_{n+m,n}|^2}{\nu(n+m)^2-\nu^2} - \frac{|A_{n-m,n}|^2}{\nu(n-m)^2-\nu^2}\right] \qquad [8]$$

field, it is possible to find an expression for the matrix that takes the place of the coordinate $x$ in classical Hamiltonian theory.

In general, the new approach (matrix mechanics) associates matrices with some relevant classical observables (such as functions of position or momentum) with a time dependence that is derived from the empirical dispersion relations of Kramers, the correspondence principle, Bohr’s rule, the Sommerfeld action principle, and first- (and second-) order perturbation theory for the interaction of an atom with an external electromagnetic field. It was soon clear to Born and Jordan (1925) that this dynamics took the form $i\hbar\dot{A} = AH - HA$ for a matrix $H$ that, for the case of the hydrogen atom, is obtained from the classical Hamiltonian with the prescription given for the coordinates $x$ and $p$. The relation $[\hat{x}_h, \hat{p}_k] = i\,\delta_{h,k}\,I$ among the matrices $\hat{x}_k$ and $\hat{p}_k$ corresponding to position and momentum was also seen as plausible. One year later P Dirac (1926) pointed out the structural identity of this relation with the Poisson bracket of Hamiltonian dynamics, developed a ‘‘quantum algebra’’ and a ‘‘quantum differentiation’’ and proved that any $*$-derivation (derivation which preserves the adjoint) of the algebra $B_N$ of $N \times N$ matrices is inner, that is, is given by $\delta(a) = i[a, h]$ for a Hermitian matrix $h$. Much later this theorem was extended (with some assumptions) to the algebra of all bounded operators on a separable Hilbert space. Since the derivations are generators of a one-parameter continuous group of automorphisms, that is, of a dynamics, this result lent further strength to the ideas of Born and Heisenberg.

The algebraic structure introduced by Born, Jordan, and Heisenberg (1926) was used by Pauli (1927) to give a purely group-theoretical derivation of the spectrum of the hydrogen atom, following the lines of the derivation in symplectic mechanics of the SO(4) symmetry of the Coulomb system. This remarkable success gave much strength to the Heisenberg formulation of QM, which was soon recognized as an efficient instrument in the study of the atomic world.

The algebraic formulation was also instrumental in the description given by Pauli (1928) of the ‘‘spin’’ (a property of electrons empirically postulated by Goudsmit and Uhlenbeck to account for a hyperfine splitting of some emission lines) as an ‘‘internal’’ degree of freedom without reference to spatial coordinates and still connected with the
properties of the system under the group of spatial rotations. This description through matrices also has a major role in the formulation by Pauli of the exclusion principle (and its relation with Fermi–Dirac statistics), which gave further credit to Heisenberg’s theory by helping to reproduce correctly the classification of the atoms.

These features may explain why the ‘‘standard’’ formulation of the axioms of QM given in the next section shows the influence of Heisenberg’s approach. On the other hand, comparison with experiments is usually set in the framework of Schrödinger’s approach. Posing the problems in terms of properties of the solution of the Schrödinger equation, one is led to a pragmatic use of the formalism, leaving aside difficulties of interpretation. This separation of ‘‘the axioms’’ from the ‘‘practical use’’ may be one of the reasons why a serious analysis of the axioms and of the problems that arise from them is apparently not a concern for most of the research in QM, even from the point of view of mathematical physics.

One should stress that both the approach of Born and Heisenberg and that of de Broglie and Schrödinger are rooted in a mixture of attention to the experimental data, deep understanding of the previous theory, bold analogies and approximations, and deep concern for the consistency of the ‘‘new mechanics.’’

There is an essential difference between the starting points of the two approaches. In Heisenberg’s approach, the atom has a priori no spatial structure; the description is entirely in terms of its properties under emission and absorption of light, and therefore its observable quantities are represented by matrices. Dynamics enters through the study of the interaction with the electromagnetic field, and some analogies with the classical theory of electrodynamics in an asymptotic regime (correspondence principle). In this way, as we have briefly indicated, one finds the special role of some matrices, which have a mutual relation similar to the relation of position and momentum in Hamiltonian theory. Following this analogy, it is possible to extend the theory beyond its original scope and consider phenomena in which the electrons are not bound to an atom.

In the approach of Schrödinger, on the other hand, particles and collections of particles are represented by spatial structures (waves). Spatial coordinates are therefore introduced a priori, and the position of a particle is related to the intensity of the corresponding wave (this was stressed by Born). Position and momentum are both basic measurable quantities as in classical mechanics. Physical interpretation forces the particle wave to be square integrable, and mathematics provides a limitation on the simultaneous localization in momentum and position, leading to Heisenberg’s uncertainty principle. Dynamics is obtained from a particle–wave duality and an analogy with the relativistic wave equation in the low-energy regime. The presence of bound states with quantized energies is seen as a consequence of the well-known fact that waves confined to a bounded spatial region have their wave number (and therefore energy) quantized.

Formal Structure

In this section we describe the formal mathematical structure that is commonly associated with QM. It constitutes a coherent mathematical theory, but the interpretation axiom it contains leads to conceptual difficulties.

We state the axioms in the form in which they were codified by J von Neumann (1966); they constitute a mathematically precise rendering of the formalism of Born, Heisenberg, and Jordan. The formalism of Schrödinger per se does not require general statements about the category of observables.

Axiom I

(i) Observables are represented by self-adjoint operators in a complex separable Hilbert space $\mathcal{H}$.
(ii) Every such operator represents an observable.

Remark Axiom I (ii) is introduced only for mathematical simplicity. There is no physical justification for part (ii). In principle, an observable must be connected to a procedure of measurement (observation), and for most of the self-adjoint operators on $\mathcal{H}$ (e.g., in the Schrödinger representation, for $i x_k (\partial/\partial x_h) x_k$) such a procedure has not yet been given.

Axiom II

(i) Pure states of the system are represented by normalized vectors in $\mathcal{H}$.
(ii) If a measurement of the observable $A$ is made on a system in the state represented by the element $\phi \in \mathcal{H}$, the average of the numerical values one obtains is $\langle\phi, A\phi\rangle$, a real number because $A$ is self-adjoint (we have denoted by $\langle\phi, \psi\rangle$ the scalar product in $\mathcal{H}$).

Remark Notice that Axiom II makes no statement about the outcome of a single measurement.

Using the natural complex structure of $B(\mathcal{H})$, pure states can be extended as linear real functionals on $B(\mathcal{H})$.
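As a minimal illustration of Axiom II (ii), the following Python sketch evaluates the average $\langle\phi, A\phi\rangle$ in a two-dimensional Hilbert space and checks that it is real when $A$ is self-adjoint. The choice of the Pauli matrix $\sigma_y$ as observable, the particular vector `phi`, and the helper names `inner`, `apply_matrix`, and `expectation` are assumptions made only for this example; they do not come from the text.

```python
# Illustrative sketch (not from the article): the average <phi, A phi>
# of Axiom II (ii) for a 2x2 self-adjoint matrix, in pure Python.

def inner(u, v):
    """Scalar product <u, v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply_matrix(A, v):
    """Matrix-vector product A v for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def expectation(A, phi):
    """Average value <phi, A phi> of the observable A in the state phi."""
    return inner(phi, apply_matrix(A, phi))

# Hypothetical observable: the Pauli matrix sigma_y (self-adjoint).
SIGMA_Y = [[0, -1j],
           [1j, 0]]

# Hypothetical normalized vector: |0.6|^2 + |0.8i|^2 = 1.
phi = [complex(0.6, 0.0), complex(0.0, 0.8)]

average = expectation(SIGMA_Y, phi)
print("norm squared:", inner(phi, phi).real)
print("average:", average.real, "imaginary part:", average.imag)
```

For this choice the imaginary part vanishes up to rounding, as the axiom requires. Replacing `SIGMA_Y` with a matrix that is not self-adjoint would in general give a complex value.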
One defines a state as any linear real positive functional on $B(\mathcal{H})$ (all bounded operators on the separable Hilbert space $\mathcal{H}$) and says that a state is normal if it is continuous in the strong topology. It can be proved that a normal state can be decomposed into a convex combination of at most a denumerable set of pure states. With these definitions a state is pure iff it has no nontrivial decomposition. It is worth stressing that this statement is true only if the operators that correspond to observable quantities generate all of $B(\mathcal{H})$; one refers to this condition by stating that there are no superselection rules.

By general results in the theory of the algebra $B(\mathcal{H})$, a normal state $\rho$ is represented by a positive operator of trace class $\sigma$ through the formula $\rho(A) = \mathrm{Tr}(\sigma A)$. Since a positive trace-class operator (usually referred to as density matrix in analogy with its classical counterpart) has eigenvalues $\lambda_k$ that are positive and sum up to 1, the decomposition of the normal state $\rho$ takes the form $\sigma = \sum_k \lambda_k \Pi_k$, where $\Pi_k$ is the projection operator onto the $k$th eigenstate (counting multiplicity).

It is also convenient to know that if a sequence of normal states $\sigma_k$ on $B(\mathcal{H})$ converges weakly (i.e., for each $A \in B(\mathcal{H})$ the sequence $\sigma_k(A)$ converges) then the limit state is normal. This useful result is false in general for closed subalgebras of $B(\mathcal{H})$, for example, for algebras that contain no minimal projections.

Note that no pure state is dispersion free with respect to all the observables (contrary to what happens in classical mechanics). Recall that the dispersion of the state $\sigma$ with respect to the observable $A$ is defined as $\Delta_\sigma(A) \equiv \sigma(A^2) - (\sigma(A))^2$.

The connection of the state with the outcome of a single measurement of an observable associated with an operator $A$ is given by the following axiom, which we shall formulate only for the case when the self-adjoint operator $A$ has only discrete spectrum. The generalization to the other case is straightforward but requires the use of the spectral projections of $A$.

Axiom III

(i) If $A$ has only discrete spectrum, the possible outcomes of a measurement of $A$ are its eigenvalues $\{a_k\}$.
(ii) If the state of the system immediately before the measurement is represented by the vector $\phi \in \mathcal{H}$, the probability that the outcome be $a_k$ is $\sum_h |\langle\phi, \psi_h^{A;k}\rangle|^2$, where the $\psi_h^{A;k}$ are a complete orthonormal set in the Hilbert space spanned by the eigenvectors of $A$ to the eigenvalue $a_k$.
(iii) If a system is in the pure state $\phi$ and one performs a measurement of the observable $A$ with outcome $a_j \in (b-\epsilon, b+\epsilon)$ for some $b, \epsilon \in R$, then immediately after the measurement the system can be in any (not necessarily pure) state which lies in the convex hull of the pure states which are in the spectral subspace of the operator $A$ in the interval $\Delta_{b;\epsilon} \equiv (b-\epsilon, b+\epsilon)$.

Note Statements (ii) and (iii) can be extended without modification to the case in which the initial state is not a pure state, and is represented by a density matrix $\sigma$.

Remark 1 Axiom III makes sure that if one performs, immediately after the first, a further measurement of the same observable $A$ the outcome will still lie in the interval $\Delta_{b;\epsilon}$. This is needed to give some objectivity to the statement made about the outcome; notice that one must place the condition ‘‘immediately after’’ because the evolution may not leave invariant the spectral subspaces of $A$.

If the operator $A$ has, in the interval $\Delta_{b;\epsilon}$, only discrete (pure point) spectrum, one can express Axiom III in the following way: the outcome can be any state that can be represented by a convex affine superposition of the eigenstates of $A$ with eigenvalues contained in $\Delta_{b;\epsilon}$.

In the very special case when $A$ has only one eigenvalue in $\Delta_{b;\epsilon}$ and this eigenvalue is not degenerate, one can state Axiom III in the following form (commonly referred to as ‘‘reduction of the wave packet’’): the system after the measurement is pure and is represented by an eigenstate of the operator $A$.

Remark 2 Notice that the third axiom makes a statement about the state of the system after the measurement is completed.

It follows from Axiom III that one can measure ‘‘simultaneously’’ only observables which are represented by self-adjoint operators that commute with each other (i.e., their spectral projections mutually commute). It follows from the spectral representation of the self-adjoint operators that a family $\{A_k\}$ of commuting operators can be considered (i.e., there is a representation in which they are) functions over a common measure space.

Axioms I–III give a mathematically consistent formulation of QM and allow a statistical description (and statistical prediction) of the outcome of the measurement of any observable. It is worth remarking that while the predictions will have only a statistical nature, the dynamical evolution of the observables (and by duality of the states) will be described by deterministic laws. The intrinsically statistical aspect of the predictions comes only from
the third postulate, which connects the mathematical content of the theory with the measurement process.

The third axiom, while crucial for the connection of the mathematical formalism with the experimental data, contains the seed of the conceptual difficulties which plague QM and have not been cured so far.

Indeed, the third axiom indicates that the process of measurement is described by laws that are intrinsically different from the laws that rule the evolution without measurement. This privileged role of the change of state caused by a measurement leads to serious conceptual difficulties, since the change is independent of whether or not the result is recorded by some observer; one should therefore have a way to distinguish between measurements and generic interactions with the environment.

A related problem that originates from Axiom III is that the formulation of this axiom refers implicitly to the presence of a classical observer that certifies the outcomes of measurements and is allowed to make use of classical probability theory. This observer is therefore not subject to the laws of QM.

These two aspects of the conceptual difficulties have their common origin in the separation of the measuring device and of the measured systems into disjoint entities satisfying different laws. The difficulties in the theory of measurement have not yet received a satisfactory answer, but various attempts have been made, with various degrees of success, and some of them are described briefly in the section ‘‘Interpretation problems.’’ It appears therefore that QM in its present formulation is a refined and successful instrument for the description of the nonrelativistic phenomena at the Planck scale, but its internal consistency is still standing on shaky ground.

Returning to the axioms, it is worth remarking explicitly that according to Axiom II a state is a linear functional over the observables, but it is represented by a sesquilinear function on the complex Hilbert space $\mathcal{H}$. Since Axiom II states that any normalized element of $\mathcal{H}$ represents a state (and elements that differ only by a phase represent the same state), together with $\phi$, $\psi$ also $\xi \equiv a\phi + b\psi$, $|a|^2 + |b|^2 = 1$, represents a state, the superposition of $\phi$ and $\psi$ (superposition principle).

But for an observable $A$, one has in general $\langle\xi, A\xi\rangle \neq |a|^2\langle\phi, A\phi\rangle + |b|^2\langle\psi, A\psi\rangle$, due to the cross-terms in the scalar product. The superposition principle is one of the characteristic features of QM. The superposition of the two pure states $\phi$ and $\psi$ has properties completely different from those of a statistical mixture of the same two states, defined by the density matrix $\sigma = |a|^2\Pi_\phi + |b|^2\Pi_\psi$, where we have denoted by $\Pi_\phi$ the orthogonal projection onto the normalized vector $\phi$. Therefore, the search for these interference terms is one of the means to verify the predictions of QM, and their smallness under given conditions is a sign of quasiclassical behavior of the system under study.

Strictly connected to superposition are entanglement and the partial trace operation. Suppose that one has two systems which when considered separately are described by vectors in two Hilbert spaces $\mathcal{H}_i$, $i = 1, 2$, and which have observables $A_i \in B(\mathcal{H}_i)$. When we want to study their mutual interaction, it is natural to describe both of them in the Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$ and to consider the observables $A_1 \otimes I$ and $I \otimes A_2$.

When the systems interact, the interaction will not in general commute with the projection operator $\Pi_1$ onto $\mathcal{H}_1$. Therefore, even if the initial state is of the form $\phi_1 \otimes \phi_2$, $\phi_i \in \mathcal{H}_i$, the final state (after the interaction) is a vector $\psi \in \mathcal{H}_1 \otimes \mathcal{H}_2$ which in general cannot be written as $\psi = \zeta_1 \otimes \zeta_2$ with $\zeta_i \in \mathcal{H}_i$. It can be shown, however, that there always exist two orthonormal families of vectors $\phi_n \in \mathcal{H}_1$ and $\psi_n \in \mathcal{H}_2$ such that $\psi = \sum_n c_n\, \phi_n \otimes \psi_n$ for suitable $c_n \in C$, $\sum_n |c_n|^2 = 1$ (this decomposition is not unique in general).

Recalling that $\langle\phi\otimes\psi, (A_1 \otimes I)(\phi\otimes\psi)\rangle = \langle\phi, A_1\phi\rangle$, one can write

$$\langle\psi, (A_1 \otimes I)\psi\rangle = \sum_n |c_n|^2 \langle\phi_n, A_1\phi_n\rangle = \mathrm{Tr}(\sigma A_1), \qquad \sigma \equiv \sum_n |c_n|^2\, \Pi_{\phi_n}$$

The map $\Gamma_2 : \Pi_\psi \to \sigma$ is called reduction (or also conditioning) with respect to $\mathcal{H}_2$; it is also called ‘‘partial trace’’ with respect to $\mathcal{H}_2$. The first notation reflects the analogy with conditioning in classical probability theory.

The map $\Gamma_2$ can be extended by linearity to a map from normal states (density matrices) on $B(\mathcal{H}_1 \otimes \mathcal{H}_2)$ to normal states on $B(\mathcal{H}_1)$ and gives rise to a positivity-preserving and trace-preserving map. One can in fact prove (Takesaki 1971) that any conditioning for normal states of a von Neumann algebra $M$ is completely positive in the sense that it remains positive after tensorization of $M$ with $B(K)$, where $K$ is an arbitrary Hilbert space.

It can also be proved that a partial converse is true, that is, that every completely positive trace-preserving map $\Phi$ on normal states of a von Neumann algebra $A \subset B(\mathcal{H})$ can be written, for a suitable choice of a larger Hilbert space $K$ and partial isometries $V_k$, in the form (Kraus form) $\Phi(a) = \sum_k V_k^* a V_k$.
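The reduction (partial trace) discussed above can be made concrete in the smallest case, two systems each with a two-dimensional Hilbert space. The sketch below is only an illustration under stated assumptions: the vector is stored as a table `psi[i][j]` of amplitudes on the product basis $e_i \otimes f_j$, and the function names, the sample vectors, and the observable `A1` are all hypothetical choices, not taken from the text. It checks that the reduced density matrix reproduces the averages of $A_1 \otimes I$, and that it is pure only for a product vector.

```python
# Illustrative sketch (not from the article): partial trace over the second
# factor of a vector in C^2 (x) C^2, using only the standard library.

def reduced_density_matrix(psi):
    """sigma[i][k] = sum_j psi[i][j] * conj(psi[k][j]) (trace over factor 2)."""
    n, m = len(psi), len(psi[0])
    return [[sum(psi[i][j] * psi[k][j].conjugate() for j in range(m))
             for k in range(n)] for i in range(n)]

def trace_product(S, A):
    """Tr(S A) for square matrices given as lists of rows."""
    n = len(S)
    return sum(S[i][k] * A[k][i] for i in range(n) for k in range(n))

def expectation_on_first_factor(psi, A):
    """<psi, (A (x) I) psi>, computed directly on the product space."""
    n, m = len(psi), len(psi[0])
    return sum(psi[i][j].conjugate() * A[i][k] * psi[k][j]
               for i in range(n) for k in range(n) for j in range(m))

def purity(S):
    """Tr(S^2): equal to 1 for a pure state, smaller for a mixture."""
    return trace_product(S, S).real

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
A1 = [[1.0, 2.0], [2.0, -1.0]]          # hypothetical self-adjoint observable

# Entangled vector 0.6 e0(x)f0 + 0.8i e1(x)f1 (already in Schmidt form).
psi_entangled = [[complex(0.6, 0.0), 0.0],
                 [0.0, complex(0.0, 0.8)]]
# Product vector (0.6 e0 + 0.8 e1) (x) f0.
psi_product = [[0.6, 0.0],
               [0.8, 0.0]]

sigma = reduced_density_matrix(psi_entangled)
print("Tr(sigma A1):", trace_product(sigma, A1).real)
print("direct value:", expectation_on_first_factor(psi_entangled, A1).real)
print("purity (entangled):", purity(sigma))
print("purity (product):", purity(reduced_density_matrix(psi_product)))
```

In this example the squared moduli of the coefficients, 0.36 and 0.64, appear as the eigenvalues of the reduced density matrix, in line with the decomposition $\sigma = \sum_n |c_n|^2 \Pi_{\phi_n}$.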
But it must be remarked that, if $U(t)$ is a one-parameter group of unitary operators on $\mathcal{H}_1 \otimes \mathcal{H}_2$ and $\sigma$ is a density matrix, the one-parameter family of maps $\Gamma(t) : \sigma \to \Gamma_2(U(t)\,\sigma\,U^*(t))$ does not, in general, have the semigroup property $\Gamma(t+s) = \Gamma(t)\cdot\Gamma(s)$, $s, t > 0$, and therefore there is in general no generator (of a reduced dynamics) associated with it. Only in special cases and under very strong hypotheses and approximations is there a reduced dynamics given by a semigroup (Markov property).

Since entanglement and (nontrivial) conditioning are marks of QM, and on the other side the Markov property described above is typical of conditioning in classical mechanics, it is natural to search for conditions and approximations under which the Markov property is recovered, and more generally under which the coherence properties characteristic of QM are suppressed (decoherence). We shall discuss briefly this problem in the section ‘‘Interpretation problems,’’ devoted to the attempts to overcome the serious conceptual difficulties that descend from Axiom III.

It is seen from the remarks and definitions above that normal states (density matrices) play the role that in classical mechanics is attributed to measures over phase space, with the exception that pure states in QM do not correspond to Dirac measures (later on we shall discuss the possibility of describing quantum-mechanical states with a function (Wigner function) on phase space).

In this correspondence, evaluation of an observable (a measurable function over phase space) over a state (a normalized, positive measure) is related to finding the (Hilbert space) trace of the product of an operator in $B(\mathcal{H})$ with a density matrix. Notice that the trace operation shares some of the properties of the integral, in particular $\mathrm{tr}\,AB = \mathrm{tr}\,BA$ if $A$ is in trace class and $B \in B(\mathcal{H})$ (cf. $g \in L^1$ and $f \in L^\infty$) and $\mathrm{tr}\,AB > 0$ if $A$ is a density matrix and $B$ is a positive operator. This suggests defining functions over the density matrices that correspond to quantities which are important in the theory of dynamical systems, in particular the entropy.

This is readily done if the Hilbert space is finite dimensional, and in the infinite-dimensional case if one takes as observables all Hermitian bounded operators. In quantum statistical mechanics one is led to consider an infinite collection of subsystems, each one described with a Hilbert space (finite or infinite dimensional) $\mathcal{H}_i$, $i = 1, 2, \ldots$; the space of representation is a subspace $K$ of $\mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots$, and the observables are a (weakly closed) subalgebra $A$ of $B(K)$ (typically constructed as an inductive limit of elements of the form $I \otimes I \otimes \cdots \otimes A_k \otimes I \otimes \cdots$). In this context one also considers normal states on $A$ and defines a trace operation, with the properties described above for a trace. Most of the definitions (e.g., of entropy) can be given in this enlarged context, but differences may occur, since in general $A$ does not contain finite-dimensional projections, and therefore the trace function is not the trace commonly defined in a Hilbert space. We shall not describe further this very interesting and much developed theory, of major relevance in quantum statistical mechanics. For a thorough presentation see Ohya and Petz (1993).

The simplest and most-studied example is the case when each Hilbert space $\mathcal{H}_i$ is a complex two-dimensional space. The resulting system is constructed in analogy with the Ising model of classical statistical mechanics, but in contrast to that system it possesses, for each value of the index $i$, infinitely many pure states. The corresponding algebra of observables is a closed subalgebra of $(C^2 \otimes C^2)^{\otimes Z}$ and generically does not contain any finite-dimensional projection.

This model, restricted to the case $(C^2 \otimes C^2)^{K}$, $K$ a finite integer, has become popular in the study of quantum information and quantum computation, in which case a normalized element of $\mathcal{H}_i$ is called a q-bit (in analogy with the bits of information in classical information theory). It is clear that the unit sphere in $(C^2 \otimes C^2)$ contains many more than four points, and this gives much more freedom for operations on the system. This is the basis of quantum computation and quantum information, a very interesting field which has received much attention in recent years.

Quantization and Dynamics

The evolution in nonrelativistic QM is described by the Schrödinger equation in the representation in which for an $N$-particle system the Hilbert space is $L^2(R^{3N}) \otimes C^k$, where $C^k$ is a finite-dimensional space which accounts for the fact that some of the particles may have a spin content.

Apart from (often) inessential parameters, the Schrödinger equation for spin-0 particles can be written typically as

$$i\hbar\frac{\partial\phi}{\partial t} = H\phi, \qquad H \equiv \sum_{k=1}^{N} m_k\,(i\hbar\nabla_k + A_k)^2 + \sum_{k=1}^{N} V_k(x_k) + \sum_{i\neq k,\,1}^{N} V_{i,k}(x_i - x_k) \qquad [9]$$

where $\hbar$ is Planck’s constant, $A_k$ are vector-valued functions (vector potentials), and $V_k$ and $V_{i,k}$ are scalar-valued functions (scalar potentials) on $R^3$.
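To see what an operator of this type looks like in the simplest setting, the following sketch discretizes a one-particle, one-dimensional Hamiltonian $-(1/2m)\,d^2/dx^2 + V(x)$ on a grid. Everything specific here is an assumption made for illustration and is not taken from the text: the units $\hbar = 1$, the absence of a vector potential, the interval $[0, 1]$ with vanishing boundary values, the second-order finite-difference formula, and the function names. With $V = 0$ and $m = 1$ the lowest eigenfunction is $\sin(\pi x)$, with energy $\pi^2/2$, and the discrete operator reproduces this value up to the discretization error.

```python
# Illustrative sketch (not from the article): finite-difference version of
# H = -(1/2m) d^2/dx^2 + V(x) on [0, 1], in units with hbar = 1.
import math

def hamiltonian_apply(v, potential, h, mass=1.0):
    """Apply the discretized H to interior grid values v (zero at both ends)."""
    n = len(v)
    out = []
    for j in range(n):
        left = v[j - 1] if j > 0 else 0.0
        right = v[j + 1] if j < n - 1 else 0.0
        # Second-order central difference for the second derivative.
        kinetic = -(left - 2.0 * v[j] + right) / (2.0 * mass * h * h)
        out.append(kinetic + potential[j] * v[j])
    return out

def rayleigh_quotient(v, potential, h, mass=1.0):
    """<v, H v> / <v, v> for a real grid vector v."""
    hv = hamiltonian_apply(v, potential, h, mass)
    return sum(a * b for a, b in zip(v, hv)) / sum(a * a for a in v)

n = 200                                  # number of interior grid points
h = 1.0 / (n + 1)                        # grid spacing
grid = [(j + 1) * h for j in range(n)]
ground = [math.sin(math.pi * x) for x in grid]

energy = rayleigh_quotient(ground, [0.0] * n, h)
print("discrete energy:", energy)
print("exact value pi^2/2:", math.pi ** 2 / 2)
```

The same routine accepts any list of potential values, so a confining potential can be tried by replacing the zero list with, for example, values of a quadratic well on the grid.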
If some particles have spin 1/2, the corresponding kinetic energy term should read $(i\hbar\,\sigma\cdot\nabla)^2$, where $\sigma_k$, $k = 1, 2, 3$, are the Pauli matrices, and one must add a term $W(x)$ which is a matrix field with values in $C^k \otimes C^k$ and takes into account the coupling between the spin degrees of freedom. Notice that the local operator $i\sigma\cdot\nabla$ is a ‘‘square root’’ of the Laplacian.

A relativistic extension of the Schrödinger equation for a free particle of mass $m \geq 0$ in dimension 3 was obtained by Dirac in a space of spinor-valued functions $\psi_k(x, t)$, $k = 0, 1, 2, 3$, which carries an irreducible representation of the Lorentz group. In analogy with the electromagnetic field, for which a linear partial differential equation (PDE) can be written using a four-dimensional representation of the Lorentz group, the relativistic Dirac equation is the linear PDE

$$i\sum_{k=0}^{3}\gamma_k\frac{\partial\psi}{\partial x_k} = m\psi, \qquad x_0 \equiv ct$$

where the $\gamma_k$ generate the algebra of a representation of the Lorentz group. The operator $\sum_k (\partial/\partial x_k)\gamma_k$ is a local square root of the relativistically invariant d’Alembert operator $\partial^2/\partial x_0^2 + \Delta - m\cdot I$.

When one tries to introduce (relativistically invariant) local interactions, one faces the same problem as in classical mechanics, namely one must introduce relativistically covariant fields (e.g., the electromagnetic field), that is, systems with an infinite number of degrees of freedom. If this field is considered as external, one faces technical problems, which can be overcome in favorable cases. But if one tries to obtain a fully quantized theory (by also quantizing the field) the obstacles become unsurmountable, due also to the nonuniqueness of the representation of the canonical commutation relations if these are taken as the basis of quantization, as in the finite-dimensional case.

In a favorable case (e.g., the interaction of a quantum particle with the quantized electromagnetic field) one can set up a perturbation scheme in a parameter $\alpha$ (the physical value of $\alpha$ in natural units is roughly 1/137). We shall come back later to perturbation schemes in the context of the Schrödinger operator; in the present case one has been able to find procedures (renormalization) by which the series in $\alpha$ that describe relevant physical quantities are well defined term by term. But even in this favorable case, where the sum of the first few terms of the series is in excellent agreement with the experimental data, one has reasons to believe that the series is not convergent, and one does not even know whether the series is asymptotic.

One is led to wonder whether the structure of fields (operator-valued elements in the dual of compactly supported smooth functions on classical spacetime), taken over in a simple way from the field structure of classical electromagnetism, is a valid instrument in the description of phenomena that take place at a scale incomparably smaller than the scale (atomic scale) at which we have reasons to believe that the formalisms of Schrödinger and Heisenberg provide a suitable model for the description of natural phenomena.

The phenomena which are related to the interaction of a nonrelativistic quantum particle with the quantized electromagnetic field take place at the atomic scale. These phenomena have been the subject of very intense research in theoretical physics, mostly within perturbation theory, and the analysis to the first few orders has led to very spectacular results (although there is at present no proof that the perturbation series are at least asymptotic).

In this field rigorous results are scarce, but recently some progress has been made, establishing, among other things, the existence of the ground state (a nontrivial result, because there is no gap separating the ground-state energy from the continuous part of the spectrum) and paving the way for the description of scattering phenomena; the latter result is again nontrivial because the photon field may lead to an anomalous infrared (long-range) behavior, much in the same way that the long-range Coulomb interaction requires a special treatment in nonrelativistic scattering theory.

This contribution to the Encyclopedia is meant to be an introduction to QM and therefore we shall limit ourselves to the basic structure of nonrelativistic theory, which deals with systems of a finite number of particles interacting among themselves and with external (classical) potential fields, leaving for more specialized contributions a discussion of more advanced items in QM and of the successes and failures of a relativistically invariant theory of interaction between quantum particles and quantized fields.

We shall return therefore to basics.

One may begin a section on dynamics in QM by discussing some properties of the solutions of the Schrödinger equation, in particular dispersive effects and the related scattering theory, the problem of bound states and resonances, the case of time-dependent perturbation and the ionization effect, the binding of atoms and molecules, the Rayleigh scattering, the Hall effect and other effects in nanophysics, the various multiscale and adiabatic limits, and in general all the physical problems that
have been successfully solved by Schrödinger’s QM (as well as the very many interesting and unsolved problems).

We will consider briefly these issues and the approximation schemes that have been developed in order to derive explicit estimates for quantities of physical interest. Since there are very many excellent reviews of present-day research in QM (e.g., Araki and Ezawa (2004), Blanchard and Dell’Antonio (2004), Cycon et al. (1986), Hislop and Sigal (1996), Lieb (1990), Le Bris (2005), Simon (2002), and Schlag (2004)) we refer the reader to the more specialized contributions to this Encyclopedia for a detailed analysis and precise statements about the results.

We prefer to come back first to the foundations of the theory; we shall take the point of view of Heisenberg and start discussing the mapping properties of the algebra of observables and of the states. Since transition probabilities play an important role, we consider only transformations $\alpha$ which are such that, for any pair of pure states $\phi_1$ and $\phi_2$, one has $\langle\alpha(\phi_1), \alpha(\phi_2)\rangle = \langle\phi_1, \phi_2\rangle$. We call these maps Wigner automorphisms.

A result of Wigner (see Weyl (1931)) states that if $\alpha$ is a Wigner automorphism then there exists a unique operator $U_\alpha$, either unitary or antiunitary, such that $\alpha(P) = U_\alpha P U_\alpha^*$ for all projection operators. If there is a one-parameter group of such automorphisms, the corresponding operators are all unitary (but they need not form a group).

A generalization of this result is due to Kadison. Denoting by $I_{1,+}$ the set of density matrices, a Kadison automorphism $\beta$ is, by definition, such that for all $\sigma_1, \sigma_2 \in I_{1,+}$ and all $0 < s < 1$ one has $\beta(s\sigma_1 + (1-s)\sigma_2) = s\beta(\sigma_1) + (1-s)\beta(\sigma_2)$. For Kadison automorphisms the same result holds as for Wigner’s.

A similar result holds for automorphisms of the observables. Notice that the product of two Hermitian operators is not Hermitian in general, but Hermiticity is preserved under Jordan’s product defined as $A \circ B \equiv (1/2)[AB + BA]$.

A Segal automorphism is, by definition, an automorphism of the Hermitian operators that preserves the Jordan product structure. A theorem of Segal states that $\gamma$ is a Segal automorphism if and only if there exist an orthogonal projector $E$, a unitary operator $U$ in $E\mathcal{H}$, and an antiunitary operator $V$ in $(I-E)\mathcal{H}$ such that $\gamma(A) = W A W^*$, where $W \equiv U \oplus V$.

We can study now in more detail the description of the dynamics in terms of automorphisms of Wigner or Kadison type when it refers to states and of Segal type when it refers to observables. We require that the evolution be continuous in suitable topologies. The strongest result refers to Wigner’s case. One can prove that if a one-parameter group of Wigner automorphisms $\alpha_t$ is measurable in the weak topology (i.e., $(\alpha_t\sigma)(A)$ is measurable in $t$ for every choice of $A$ and $\sigma$) then it is possible to choose the $U(t)$ provided by Wigner’s theorem in such a way that they form a group which is continuous in the strong topology. Similar results are obtained for the cases of Kadison and Segal automorphisms, but in both cases one has to assume continuity of $\alpha_t$ in a stronger topology (the strong operator topology in the Segal case, the norm topology in Kadison’s). Weak continuity is sufficient if the operator product is preserved (in this case one speaks of automorphisms of the algebra of bounded operators). The existence of the continuous group $U(t)$ defines a Hamiltonian evolution. One has indeed:

Theorem 1 (Stone). The map $t \to U(t)$, $t \in R$, is a weakly continuous representation of $R$ in the set of unitary operators in a Hilbert space $\mathcal{H}$ if and only if there exists a self-adjoint operator $H$ on (a dense set of) $\mathcal{H}$ such that $U(t) = e^{-itH}$ and therefore

$$\phi \in D(H) \;\to\; i\,\frac{dU(t)}{dt}\,\phi = H\,U(t)\,\phi \qquad [10]$$

The operator $H$ is called generator of the dynamics described by $U(t)$.

Note In Schrödinger’s approach the operator described in Stone’s theorem is called Hamiltonian, in analogy with the classical case. In the case of one particle of mass $m$ in $R^3$ subject to a conservative force with potential energy $V(x)$ it has the following form, in units in which $\hbar = 1$:

$$H = -\frac{1}{2m}\Delta + V(x), \qquad \Delta = \sum_k \frac{\partial^2}{\partial x_k^2} \qquad [11]$$

If the potential $V$ depends on time, Stone’s theorem is not directly applicable, but still the spectral properties of the self-adjoint operators $H_t$ and of the kernel of the group $\tau \to e^{-iH_t\tau}$ are essential to solve the (time-dependent) Schrödinger equation.

The semigroup $t \to e^{-tH_0}$ is usually a positivity-preserving semigroup of contractions and defines a Markov process; in favorable cases, the same is true of $t \to e^{-tH}$ (Feynman–Kac formula).

There is an analogous situation in the general theory of dynamical systems on a von Neumann algebra; in analogy with the case of elliptic operators, one defines as ‘‘dissipation’’ a map $\mathcal{L}$ on a von Neumann algebra $M$ which satisfies $\mathcal{L}(a^* a) \geq a^*\mathcal{L}(a) + \mathcal{L}(a^*)\,a$ for all $a \in M$. The positive dissipation $\mathcal{L}$ is called completely positive if it remains positive after tensorization with $B(K)$ for any
Hilbert space $K$. Notice that according to this definition every $*$-derivation is a completely positive dissipation. For dissipations there is an analog of the theorem of Stinespring, and often a bounded dissipation can be written as

$$\delta(a) = i[h, a] + \sum_k \Big( V_k^* a V_k - \frac{1}{2}\{V_k^* V_k, a\} \Big) \qquad \text{for } a \in M$$

(the symbols $\{\cdot\,,\cdot\}$ denote the anticommutator).

In general terms, by quantization is meant the construction of a theory by deforming a commutative algebra of functions on a classical phase space $X$ in such a way that the dynamics of the quantum system can be derived from the prescription of deformation, usually by deforming the Poisson brackets if $X$ is a cotangent bundle $T^*M$ (Halbut 2002, Landsman 2002). We shall discuss only the Weyl quantization (Weyl 1931), which has its roots in Heisenberg's formulation of QM and refers to the case in which the configuration space is $R^N$ or, with some variant (Floquet–Zak), the $N$-dimensional torus. We shall add a few remarks on the Wick (anti-Weyl) quantization. More general formulations are needed when one tries to quantize a classical system defined on the cotangent bundle of a generic variety, and even more so if it is defined on a generic symplectic manifold.

The Weyl quantization is a mathematically accurate rendering of the essential content of the procedure adopted by Born and Heisenberg to construct dynamics by finding operators which play the role of symplectic coordinates.

Consider a system with one degree of freedom. The first naive attempt would be to find operators $\hat q$, $\hat p$ that satisfy the relation

$$[\hat q, \hat p] \subseteq iI \qquad [12]$$

and to construct the Hamiltonian in analogy with the classical case. To play a similar role, the operators $\hat q$ and $\hat p$ must be self-adjoint and satisfy [12] at least in a weak sense. If both are bounded, [12] implies $e^{ib\hat p}\, \hat q\, e^{-ib\hat p} = \hat q + bI$ (the exponential is defined through a convergent series) and therefore the spectrum of $\hat q$ is the entire real line, a contradiction. Therefore, the inclusion sign in [12] is strict and we face domain problems, and as a consequence [12] has many inequivalent solutions ("equivalence" here means "unitary equivalence").

Apart from "pathological" ones, defined on $L^2$-spaces over multiple coverings of $R$, there are inequivalent solutions of [12] which are effectively used in QM.

The most common solution is on the Hilbert space $L^2(R)$ (with Lebesgue measure), with $\hat x$ defined as the essentially self-adjoint operator that acts on the smooth functions with compact support as multiplication by the coordinate $x$, and $\hat p$ defined similarly in Fourier space. This representation can be trivially generalized to construct operators $\hat q_k$ and $\hat p_k$ in $L^2(R^N)$.

Another frequently used representation of [12] is on $L^2(S^1)$ (and, when generalized to $N$ degrees of freedom, on $T^N$). In this representation, the operator $\hat p$ is defined by $c_k \to k c_k$ on functions $f(\theta) = \sum_{k=-M}^{N} c_k e^{ik\theta}$, $0 \leq M, N < \infty$. In this case the operator $\hat q$ is defined as multiplication by the angle coordinate $\theta$. It is easy to check that this representation is inequivalent to the previous one and that [12] is satisfied (as an identity) on the (dense) set of vectors which are in the domain both of $\hat p \hat q$ and of $\hat q \hat p$. But notice that the domain of essential self-adjointness of $\hat p$ is not left invariant by the action of $\hat q$ ($\theta f(\theta)$ is a function on $S^1$ only if $f(2\pi) = 0$).

We shall denote $\hat p$ in this representation by the symbol $\partial/\partial\theta_{\mathrm{per}}$ and refer to it as the Bloch representation. It can be modified by setting the action of $\hat p$ as $c_n \to n c_n + \alpha$, $0 < \alpha < 2\pi$, and this gives rise to the various Bloch–Zak and magnetic representations.

The Bloch representation can be extended to periodic functions on $R^1$ by noticing that $L^2(R) = L^2(S^1) \otimes l^2(N)$; similarly, the Bloch–Zak and the magnetic representations can be extended to $L^2(R^N)$.

The difference between the representations can be seen more clearly if one considers the one-parameter groups of unitary operators generated by the "canonical operators" $\hat q$ and $\hat p$. In the Schrödinger representation on $L^2(R)$, these groups satisfy

$$U(a)V(b) = e^{-iab}\, V(b)U(a), \qquad U(a) = e^{ia\hat q}, \quad V(b) = e^{ib\hat p}$$

and therefore, setting $z = a + ib$ and $W(z) \equiv e^{-iab/2}\, V(b)U(a)$, one has

$$W(z)W(z') = e^{-i\omega(z,z')/2}\, W(z + z'), \qquad z \in C, \quad \omega(z, z') = \operatorname{Im}\,(z, z') \qquad [13]$$

The unitary operators $W(z)$ are therefore projective representations of the additive group $C$. This generalizes immediately to the case of $N$ degrees of freedom; the representation is now of the additive group $C^N$ and $\omega$ is the standard symplectic form on $C^N$.

In the Bloch representation, the unitaries $U(a)V(b)U^*(a)V^*(b)$ are not multiples of the identity, and have no particularly simple form. The map $C^N \ni z \to W(z)$ with the structure [13] is called a Weyl system; it plays a major role in QM. The following
theorem has therefore a major importance in the mathematical theory of QM.

Theorem 2 (von Neumann 1965). There exists only one, modulo unitary equivalence, irreducible representation of the Weyl system.

The proof of this theorem follows a general pattern in the theory of group representations. One introduces an algebra $W^{(N)}$ of operators

$$W_f \equiv \int f(z)\, W(z)\, dz, \qquad f \in L^1(C^N)$$

called Weyl algebra.

It is easy to see that $\|W_f\| = \|f\|_1$ and that $f \to W_f$ is a linear isomorphism of algebras if one considers $W^{(N)}$ with its natural product structure and $L^1$ as a noncommutative algebra with product structure

$$f \times g \equiv \int dz'\, f(z - z')\, g(z')\, \exp\Big\{ -\frac{i}{2}\,\omega(z, z') \Big\} \qquad [14]$$

So far the algebra $W^{(N)}$ is a concrete algebra of bounded operators on $L^2(R^N)$. But it can also be considered an abstract $C^*$-algebra, which we still denote by $W^{(N)}$.

It is easy to see that, according to [14], if $f_0$ is chosen to be a suitable Gaussian, then $W_{f_0}$ is a projection operator which commutes with all the $W_f$'s. Moreover, $W_f W_g = \chi_{f,g}\, W_{f \times g}$ for a suitable phase factor $\chi$. Considering the Gelfand–Neumark–Segal construction for the $C^*$-algebra $W^{(N)}$, one finds that these properties lead to a decomposition of any representation in cyclic irreducible equivalent ones, completing the proof of the theorem.

The Weyl system has a representation (equivalent to the Schrödinger one) in the space $L^2(R^N, g)$, where $g$ is Gauss's measure. This allows an extension in which $C^N$ is replaced by an infinite-dimensional Banach space equipped with a Gauss measure (weak distribution (Segal 1965, Gross 1972, Wiener 1938)). Uniqueness fails in this more general setting (uniqueness is strictly connected with the compactness of the unit ball in $C^N$). Notice that in the Schrödinger representation (and, therefore, in any other representation) the Hamiltonian for the harmonic oscillator defines a positive self-adjoint operator

$$N = \sum_{k=1}^{N} N_k, \qquad N_k = \frac{1}{2}\Big( -\frac{\partial^2}{\partial x_k^2} + x_k^2 - 1 \Big)$$

The spectrum of each of the commuting operators $N_k$ consists of the positive integers (including 0), and $N_k$ is therefore called the number operator for the $k$th degree of freedom. The operator $N_k$ can be written as $N_k = a_k^* a_k$, where $a_k = (1/\sqrt{2})(x_k + \partial/\partial x_k)$ and $a_k^*$ is the formal adjoint of $a_k$ in $L^2(R)$. One has $\|a_k (N_k + 1)^{-1/2}\| < \infty$. In the domain of $N$ these operators satisfy the following relations (canonical commutation relations)

$$[a_k, a_h^*] = \delta_{k,h}, \qquad [a_h, a_k] = 0, \qquad [N_k, a_h] = -a_h\,\delta_{h,k}, \qquad [N_h, a_k^*] = a_k^*\,\delta_{h,k} \qquad [15]$$

In view of the last two relations, the operator $a_k$ is called the annihilation operator (relative to the $k$th degree of freedom) and its formal adjoint is called the creation operator. The operators $a_k$ have as spectrum the entire complex plane, the operators $a_k^*$ have empty spectrum; the eigenvectors of $N_k$ are the Hermite polynomials in the variable $x_k$. The eigenvectors of $a_k$ (i.e., the solutions in $L^2(R)$ of the equation $a_k \phi_\lambda = \lambda \phi_\lambda$, $\lambda \in C$) are called coherent states; they have a major role in the Bargmann–Fock–Segal quantization and in general in the semiclassical limit.

The operators $\{N_k\}$ generate a maximal abelian system and therefore the space $L^2(R^N)$ has a natural representation as the symmetrized subspace of $\bigoplus_k (C^N)^{\otimes k}$ (Fock representation). In this representation, a natural basis is given by the common eigenvectors $\phi_{\{n_k\}}$, $k = 1, \ldots, N$, of the operators $N_k$. A generic vector can be written as

$$\phi = \sum_{\{n_k\}} c_{\{n_k\}}\, \phi_{\{n_k\}}, \qquad \sum_{\{n_k\}} |c_{\{n_k\}}|^2 < \infty$$

and therefore can be represented by the sequence $c_{\{n_k\}}$. Notice that the creation operators do not create particles in $R^N$ but rather act as a shift in the basis of the Hermite polynomials.

It is traditional to denote by $\Gamma(L^2(R^N))$ the Fock representation (also called second quantization because for each degree of freedom the wave function is written in the quantized basis of the harmonic oscillator) and to denote by $\Gamma(A)$ the lift of a matrix $A \in B(C^N)$. These notations are especially used if $C^N$ is substituted with a Banach space $X$. This terminology was introduced by Segal in his work on quantization of the wave equation; it has been used ever since, mostly in a perturbative context.

In the theory of quantized fields, the space $C^N$ is substituted with a Banach space, $X$, of functions. In this setting, "second quantization" (Segal 1965, Nelson 1974) considers the state $\phi_{\{n_k\}}$ as representing a configuration of the system in which there are precisely $n_k$ particles in the $k$th physical state (this presupposes having chosen a basis in the space of distributions on $R^3$). There is no problem in doing this (Gross 1972) and one can choose for $X$ a suitable Sobolev space (which one depends on the Gaussian measure given in $X$) if one wants that the
generalization of the commutation relations [15] be of the form $[a(f), a^*(g)] = \langle f, g \rangle$ with a suitable scalar product $\langle \cdot\,, \cdot \rangle$ in $X$. The problem with quantization of relativistic fields is that, in order to ensure locality, one is forced to use a Sobolev space of negative index (depending on the dimension of physical space), and this gives rise to difficulties in the definition of the dynamics for nonlinear vector fields.

One should notice that in the work of Segal (1965), and then in constructive field theory (Nelson 1974), the Fock representation is placed in a Schrödinger context, exhibiting the relevant operators as acting on a space $L^2(X, g)$, where $X$ is a subspace of the space of Schwartz distributions on the physical space of the particles one wants to describe and $g$ is a suitably defined Gauss measure on $X$.

The Fock representation is related to the Bargmann–Fock–Segal representation (Bargmann 1967), a representation in a space of holomorphic functions on $C^N$ square integrable with respect to a Gaussian measure. For its development, this representation relies on the properties of Toeplitz operators and on Tauberian estimates. It is much used in the study of the semiclassical limit and in the formulation of QM in systems for which the classical version has, for phase space, a manifold which is not a cotangent bundle (e.g., the 2-sphere).

Remark The Fock representation associated with the Weyl system in the infinite-dimensional context can describe only particles obeying Bose–Einstein statistics; indeed, the states are qualified by their particle content for each element of the basis chosen and there is no possibility of identifying each particle in an $N$-particle state. This is obvious in the finite-dimensional case: the Hermite polynomial of order 2 cannot be seen as "composed" of two polynomials of order 1.

In the infinite-dimensional context, if one wants to treat particles which obey Fermi–Dirac statistics, one must rely on the Pauli exclusion principle (Pauli 1928), which states that two such particles cannot be in the same configuration; to ensure this, the wave function must be antisymmetric under permutation of the particle symbols. It is a matter of fact (and a theorem in relativistic quantum field theory, which follows in that theory from covariance, locality and positivity of the energy (Streater and Wightman 1964)) that particles with half-integer spin obey the Fermi–Dirac statistics. Therefore, to quantize such systems, one must introduce (commutation) relations different from those of Weyl. Since it must now be that $(a^*)^2 = 0$, due to antisymmetry, it is reasonable to introduce the following relations (canonical anticommutation relations):

$$\{a_k, a_h^*\} = \delta_{k,h}, \qquad \{a_h, a_k\} = 0, \qquad [N_k, a_h] = -a_h\,\delta_{h,k}, \qquad \{A, B\} \equiv AB + BA \qquad [16]$$

The Hilbert space is now $\mathcal{H}_2^{\otimes N}$, where $\mathcal{H}_2$ is a two-dimensional complex Hilbert space. Notice that $\mathcal{H}_2$ carries an irreducible two-dimensional representation of $su(2) \simeq o(3)$ (spin representation), so that this quantization associates spin 1/2 and antisymmetry.

The operators in [16] are all bounded (in fact bounded by 1 in norm). The Fock representation is constructed as in the case of Weyl (see Araki (1988)), with $n_k$ equal to 0 or 1 for each index $k$. The infinite-dimensional case is defined in the same way, and leads to inequivalent irreducible representations (Araki 1988); only in one of them is the number operator defined and bounded below. Some of these representations can be given a Schrödinger-like form, with the introduction of a gauge and an integration formalism based on a trace (Gross 1972). This system is much used in quantum statistical mechanics because it deals with bounded operators and can take advantage of strong results in the theory of $C^*$-algebras. In the finite-dimensional case (and occasionally also in the general case) it is used in quantum information (the space $\mathcal{H}_2$ is the space of a quantum bit).

Returning to the Weyl system, we now introduce the strictly related Wigner function, which plays an important role in the analysis of the semiclassical limit and in the discussion of some scaling limits, in particular the hydrodynamical limit and the Bose–Einstein condensation when $N \to \infty$.

The Wigner function $W_\psi$ for a pure state $\psi$ is a real-valued function on the phase space of the classical system which represents the state faithfully. It is defined as

$$W_\psi(x, \xi) = (2\pi)^{-n} \int_{R^n} e^{-i(\xi, y)}\, \psi\Big(x + \frac{y}{2}\Big)\, \bar\psi\Big(x - \frac{y}{2}\Big)\, dy$$

The Wigner function is not positive in general (the only exceptions are those Gaussian states that satisfy $\Delta(x) \cdot \Delta(p) \simeq \hbar$). But it has the interesting property that its marginals reproduce correctly the Born rule. In fact, one has $\int W_\psi(x, \xi)\, dx = |\hat\psi(\xi)|^2$. If the function $\psi(t, x)$, $x \in R^n$, is a solution of the free Schrödinger equation $i\hbar\, \partial\psi/\partial t = -\hbar^2 \Delta\psi$, then its Wigner function satisfies the Liouville (transport) equation $\partial W/\partial t + \xi \cdot \nabla W = 0$.

The Wigner function is strictly linked with the Weyl quantization. This quantization associates with every function $\sigma(p, x)$ in a given regularity
class an operator $\sigma(D, x)$ (with Weyl symbol $\sigma$) defined by

$$(\sigma(D, x) f, g) \equiv \int \sigma(\xi, x)\, W(f, g)(\xi, x)\, d\xi\, dx, \qquad W(f, g)(\xi, x) \equiv \int e^{-i(\xi, p)}\, f\Big(x + \frac{p}{2}\Big)\, \bar g\Big(x - \frac{p}{2}\Big)\, dp$$

It can be verified that the action of $F$ preserves the Schwartz classes $S$ and $S'$ and is unitary in $L^2(R^{2N})$. Moreover, one has $\bar\sigma(D, x) = \sigma(D, x)^*$.

The relation between Weyl's quantization and Wigner functions can be readily seen from the natural duality between bounded operators and pure states:

$$\operatorname{tr}(\hat A \hat\rho) \equiv \int a(p, q)\, \rho(p, q)\, dp\, dq, \qquad \rho(p, q) = \int e^{i(p, q')}\, \rho(q', q)\, dq'$$

We give now a brief discussion of the general structure of a quantization, and apply it to the Weyl quantization. By quantization of a Hamiltonian system we mean a correspondence, parametrized by a small parameter $\hbar$, between classical observables (real functions on a phase space $\mathcal{F}$) and quantum observables (self-adjoint operators on a Hilbert space $\mathcal{H}$) with the property that the corresponding structures coincide in the limit $\hbar \to 0$ and the difference for $\hbar \neq 0$ can be estimated in a suitable topology.

This last requirement is important for the applications and, from this point of view, Weyl's quantization gives stronger results than the other formalisms of quantization.

We limit our analysis to the case $\mathcal{F} \equiv T^*X$, with $X \equiv R^N$, and we make use of the realization of $\mathcal{H}$ as $L^2(R^N)$.

Let $\{x_i\}$ be Cartesian coordinates in $R^N$ and consider a correspondence $A \to \hat A$ that satisfies the following requirements:

1. $A \leftrightarrow \hat A$ is linear;
2. $x_k \leftrightarrow \hat x_k$, where $\hat x_k$ is multiplication by $x_k$;
3. $p_k \leftrightarrow -i\hbar\, \partial/\partial x_k$;
4. if $f$ is a continuous function in $R^N$, one has $f(x) \leftrightarrow f(\hat x)$ and $\hat f(p) = (Ff)(\hat x)$, where $F$ denotes a Fourier transform;
5. $L_\zeta \leftrightarrow \hat L_\zeta$, $\zeta \equiv (\alpha, \beta)$, $\alpha, \beta \in R^N$, where $L_\zeta$ is the generator of the translations in phase space in the direction $\zeta$ and $\hat L_\zeta$ is the generator of the one-parameter group $t \to W(t\zeta)$ associated with $\zeta$ by the Weyl system.

Note that (1) and (4) imply (2) and (3) through a limit procedure.

Under the correspondence $A \leftrightarrow \hat A$, linear symplectic maps correspond to unitary transformations. This is not in general the case for nonlinear maps. One can prove that conditions (1)–(5) give a complete characterization of the map $A \leftrightarrow \hat A$. Moreover, the correspondence cannot be extended to other functions in phase space. Indeed, one has:

Theorem 3 (van Hove). Let $G$ be the class of $C^\infty$ functions on $R^{2N}$ which are generators of global symplectic flows. For $g \in G$ let $\Phi_g(t)$ be the corresponding group. There cannot exist for every $g$ a correspondence $g \leftrightarrow \hat g$, with $\hat g$ self-adjoint, such that $\hat g(x, p) = g(\hat x, \hat p)$.

We described the Weyl quantization as a correspondence between functions in the Schwartz class $S$ and a class of bounded operators. Weyl's quantization can be extended to a much wider class of functions. Operators that can be so constructed are called Fourier integral operators. One uses the notation $\hat\sigma \equiv \sigma(D, x)$.

We have the following useful theorems (Robert 1987):

Theorem 4 Let $l_1, \ldots, l_K$ be linear functions on $R^N$ such that $\{l_i, l_k\} = 0$. Let $P$ be a polynomial and let $\sigma(\xi, x) \equiv P[l_1(\xi, x), \ldots, l_K(\xi, x)]$. Then
(i) $\sigma(D, x)$ maps $S$ into $L^2(R^N)$ and is self-adjoint;
(ii) if $g$ is continuous, then $(g(\sigma))(D, x) = g(\sigma(D, x))$.

One proves that $\sigma(D, x)$ extends to a continuous map $S'(X) \to S'(X)$ and, moreover,

Theorem 5 (Calderon–Vaillancourt). If $\sigma_0 \equiv \sum_{|\alpha| + |\beta| \leq 2N+1} \|D_\xi^\alpha D_x^\beta \sigma\| < \infty$, the norm of the operator $\sigma(D, x)$ is bounded by $\sigma_0$.

Any operator obtained from a suitable class of functions through Weyl's quantization is called a pseudodifferential operator. If $\sigma(q, p) = P(p)$, where $P$ is a polynomial, $\hat\sigma(p, q)$ is a differential operator. Moreover, if $\sigma(p, x) \in L^2$ then $\sigma(D, x)$ is a Hilbert–Schmidt operator and

$$\|\sigma(D, x)\|_{HS} = (2\pi\hbar)^{-n/2} \Big( \int |\sigma(z)|^2\, dz \Big)^{1/2}$$

Pseudodifferential operators turn out to be very important, in particular in the quantum theory of molecules (Le Bris 2003), where adiabatic analysis and Peierls substitution rules force the use of pseudodifferential operators.

The next important problem in the theory of quantization is related to dynamics.

Let $\kappa$ be a quantization procedure and let $H(p, q)$ be a classical Hamiltonian on phase space. Let $A_t$ be
the evolution of a classical observable $A$ under the flow defined by $H$ and assume that $\kappa(A_t)$ is well defined for all $t$.

Is there a self-adjoint operator $\hat H$ such that $\kappa(A_t) = e^{it\hat H}\, \kappa(A)\, e^{-it\hat H}$? If so, can one estimate $\|\hat H - \kappa(H)\|$? Conversely, if the generator of the quantized flow is, by definition, $\hat H$ (as is usually assumed), is it possible to give an estimate of the difference $\|(\kappa(A_t) - (\kappa(A))_t)\phi\|$ for a dense set of $\phi \in \mathcal{H}$, where $\hat A_t \equiv e^{it\hat H} \hat A e^{-it\hat H}$, or to estimate $\|\tilde A_t - A_t\|_\infty$, where $\tilde A_t$ is defined by $\kappa(\tilde A_t) = (\kappa(A))_t$? Is it possible to write an asymptotic series in $\hbar$ for the differences?

For the Weyl quantization some quantitative results have been obtained if one makes use of the semiclassical observables (Robert 1987). We shall not elaborate further on this point.

For completeness, we briefly mention another quantization procedure which is often used in mathematical physics.

Wick Quantization

This quantization assigns positive operators to positive functions, but does not preserve polynomial relations. It is strictly related to the Bargmann–Fock–Segal representation.

Call coherent state centered in the point $(y, \eta)$ of phase space the normalized solution of $(i\hat p + \hat x - i\eta - y)\, \phi_{y,\eta}(x) = 0$.

Wick's quantization of the classical observable $A$ is by definition the map $A \to \mathrm{Op}^W_\hbar(A)$, where

$$\mathrm{Op}^W_\hbar(A) \equiv (2\pi\hbar)^{-n} \int\!\!\int A(y, \eta)\, (\,\cdot\,, \phi_{y,\eta})\, \phi_{y,\eta}\, dy\, d\eta$$

One can prove, either directly or going through Weyl's representation, that

1. if $A \geq 0$ then $\mathrm{Op}^W_\hbar(A) \geq 0$;
2. the Weyl symbol of the operator $\mathrm{Op}^W_\hbar(A)$ is
$$(\pi\hbar)^{-n} \int\!\!\int A(y, \eta)\, e^{-\frac{1}{\hbar}[(x - y)^2 + (\xi - \eta)^2]}\, dy\, d\eta$$
3. for every $A \in O(0)$ one has $\|\mathrm{Op}^W_\hbar(A) - \hat A\| = O(\hbar)$.

Wick's quantization associates with every vector $\phi \in \mathcal{H}$ a positive Radon measure $\mu_\phi$ in phase space, called Husimi measure. It is defined by $\int A\, d\mu_\phi = (\mathrm{Op}^W_\hbar(A)\phi, \phi)$, $A \in S(z)$. Wick's quantization is less adapted to the treatment of nonrelativistic particles; in particular, Ehrenfest's rule does not apply, and the semiclassical propagation theorem has a more complicated formulation. It is very much used for the analysis in Fock space in the theory of quantized relativistic fields, where a special role is assigned to Wick ordering, according to which the polynomials in $\hat x_\hbar$ and $\hat p_\hbar$ are reordered in terms of creation and annihilation operators by placing all creation operators to the left.

We now come back to Schrödinger's equation and notice that it can be derived within Heisenberg's formalism and Weyl's quantization scheme from the Hamiltonian of an $N$-particle system in Hamiltonian mechanics (at least if one neglects spin, which has no classical analog).

Apart from (often) inessential parameters, the Schrödinger equation for $N$ scalar particles in $R^3$ can be written as

$$i\hbar\, \frac{\partial\psi}{\partial t} = \sum_{k=1}^{N} (i\hbar\nabla_k + A_k)^2 \psi + V\psi \equiv H\psi, \qquad \psi \in L^2(R^{3N}) \qquad [17]$$

where $A_k$ are vector-valued functions (vector potentials) and $V = \sum_k V_k(x_k) + \sum_{i,k} V_{i,k}(x_i - x_k)$ are scalar-valued functions (scalar potentials) on $R^3$.

Typical problems in Schrödinger's quantum mechanics are:

1. Self-adjointness of $H$, existence of bound states (discrete spectrum of the operator), their number and distribution, and, in general, the properties of the spectrum.
2. Existence, completeness, and continuity properties of the wave operators
$$W_\pm \equiv \operatorname{s-lim}_{t \to \pm\infty}\, e^{itH_0}\, e^{-itH} \qquad [18]$$
and the ensuing existence and properties of the S-matrix and of the scattering cross sections. In [18] $H_0$ is a suitable reference operator, usually $-\Delta$ (with periodic boundary conditions if the potentials are periodic in space), for which Schrödinger's equation can be somewhat analytically controlled.
3. Existence and properties of a semiclassical limit.

In [17] and [18] we have implicitly assumed that $H$ is time independent; very interesting problems arise when $H$ depends on time, in particular if it is periodic or quasiperiodic in time, giving rise to ionization phenomena. In the periodic case, one is helped by Floquet's theory, but even in this case many interesting problems are still unsolved.

If the potentials are sufficiently regular, the spectrum of $H$ consists of an absolutely continuous part (made up of several bands in the space-periodic case) and a discrete part, with few accumulation points.

On the contrary, if $V(x, \omega)$ is a measurable function on some probability space $\Omega$, with a suitable distribution (e.g., Gaussian), the spectrum may have totally different properties almost surely.
For example, in the case $N = 1$ (so that the terms $V_{i,j}$ are absent) in one and two spatial dimensions the spectrum is pure point and dense, with eigenfunctions which decrease at infinity exponentially fast (although not uniformly); as a consequence, the evolution group does not give rise to a dispersive motion. The same is true in three dimensions if the potential is sufficiently strong and the kinetic energy content of the initial state is sufficiently limited. This very interesting behavior is due roughly to the randomness of the "barriers" generated by the potential and is also present, to a large extent, for potentials quasiperiodic in space (Pastur and Figotin 1992).

In these as well as in most problems related to Schrödinger's equation, a crucial role is taken by the resolvent operator $(H - \lambda I)^{-1}$, where $\lambda$ is any complex number outside the spectrum of $H$; many of the results are obtained when the difference $(H - \lambda I)^{-1} - (H_0 - \lambda I)^{-1}$ is a compact operator.

Problems of type (1) and (2) are of great physical interest, and are of course common with theoretical physics and quantum chemistry (Le Bris 2003), although the instruments of investigation are somewhat different in mathematical physics. The semiclassical limit is often more of theoretical interest, but its analysis has relevance in quantum chemistry and its methods are very useful whenever it is convenient to use multiscale methods, as in the study of molecular spectra.

We start with a brief description of point (3); it provides a valid instrument in the description of quantum-mechanical systems at a scale where it is convenient to use units in which the physical constant $\hbar$ has a very small value ($\hbar \simeq 10^{-27}$ in CGS units). From Heisenberg's commutation relations, $[\hat x, \hat p] \subseteq i\hbar I$, it follows that the product of the dispersion (uncertainty) of the position and momentum variables is proportional to $\hbar$ and therefore at least one of these two quantities must have very large values (compared to $\hbar$). One considers usually the case in which these dispersions have comparable values, which are therefore very small, of the order of magnitude $\hbar^{1/2}$ (but very large as compared with $\hbar$). In order to make connection with the Hamilton–Jacobi formalism of classical mechanics one can also consider the case in which the dispersion in momentum is of the order $\hbar$ (the WKB method).

The semiclassical limit takes advantage mathematically of the fact that the parameter $\hbar$ is very small in natural units, and performs an asymptotic analysis, in which the terms of "lowest order" are exactly described and the difference is estimated.

The problem one faces is that the Schrödinger equation becomes, in the "mathematical limit" $\hbar \to 0$, a very singular PDE (the coefficients of the differential terms go to zero in this limit).

Dividing each term of the equation by $\hbar$ (because we do not want to change the scale of time) leads, in the case of one quantum particle in $R^3$ in potential field $V(x)$ (we treat, for simplicity, only this case), to the equation

$$i\, \frac{\partial\psi(x, t)}{\partial t} = -\hbar\, \Delta\psi(x, t) + \hbar^{-1} V(x)\, \psi(x, t) \qquad [19]$$

It is convenient therefore to "rescale" the spatial variables by a factor $\hbar^{1/2}$ (i.e., choose different units), setting $x = \sqrt{\hbar}\, X$, and look for solutions of [19] which remain regular in the limit $\hbar \to 0$ as functions of the rescaled variable $X$. One searches therefore for solutions that on the "physical scale" have support that becomes "vanishingly small" in the limit. It is therefore not surprising that, in the limit, these solutions may describe point particles; the main result of semiclassical analysis is that the coordinates of these particles obey Hamilton's laws of classical mechanics.

This can be roughly seen as follows (accurate estimates are needed to make this empirical analysis precise). Using multiscale analysis, one may write the solution in the form $\psi(X, x, t)$ and seek solutions which are smooth in $X$ and $x$. Both terms on the right-hand side of [19] contain contributions of order $-2$ and $-1$ in $\sqrt{\hbar}$, and in order to have regular solutions one must have cancellations between equally singular contributions. For this, one must perform an expansion to the second order of the potential (assumed at least twice differentiable) around a suitable trajectory $q(t)$, $q \in R^3$, and choose this trajectory in such a way that the cancellations take place.

A formal analysis shows that this is achieved only if the trajectory chosen is precisely a solution of the classical Lagrange equations. Of course, a more refined analysis and good estimates are needed to make this argument precise, and to estimate the error that is made when one neglects in the resulting equation terms of order $\sqrt{\hbar}$; in favorable cases, for each chosen $T$ the error in the solution for most initial conditions of the type described is of order $\sqrt{\hbar}$ for $|t| < T$.

This semiclassical result is most easily visualized using the formalism of Wigner functions (the technical details, needed to make the formal arguments into a proof, take advantage of regularity estimates in the theory of functions).

In natural units, one defines

$$W_{\hbar,\psi}(x, \xi; t) = \Big(\frac{i}{2\pi}\Big)^{N}\, W_\psi\Big(x, \frac{\xi}{\hbar}; t\Big)$$
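The cancellation argument described above for equation [19] can be made explicit in a short formal computation. The following is only a sketch under stated assumptions: the wave function is written $\psi$, and the ansatz, the phase $S(t)$, the momentum $p(t)$ and the profile $u(X, t)$ are illustrative names introduced here, not notation taken from the article. The rescaled variable is centered on the trajectory, $X = (x - q(t))/\sqrt{\hbar}$.

```latex
% Ansatz (an assumption of this sketch):
%   \psi(x,t) = \exp\{ i [ S(t)/\hbar + p(t)\cdot X/\sqrt{\hbar} ] \}\, u(X,t),
%   X = (x - q(t))/\sqrt{\hbar}.
\begin{align*}
\hbar^{-1} V\bigl(q + \sqrt{\hbar}\,X\bigr)
  &= \hbar^{-1} V(q) + \hbar^{-1/2}\,\nabla V(q)\cdot X
     + \tfrac12\, X\cdot V''(q)\,X + O(\hbar^{1/2}), \\
-\hbar\,\Delta_x &= -\Delta_X .
\end{align*}
% Inserting the ansatz in [19] and collecting powers of \sqrt{\hbar}:
\begin{align*}
\hbar^{-1}:\quad   & \dot S = p\cdot\dot q - |p|^2 - V(q), \\
\hbar^{-1/2}:\quad & \dot q = 2p, \qquad \dot p = -\nabla V(q), \\
\hbar^{0}:\quad    & i\,\partial_t u = -\Delta_X u
                     + \tfrac12\, X\cdot V''(q(t))\,X\, u .
\end{align*}
```

In this sketch the two singular orders cancel exactly when $(q(t), p(t))$ solves Hamilton's equations for $H(q, p) = |p|^2 + V(q)$ and $S$ is the corresponding classical action; what remains at order $\hbar^0$ is an equation with a Hamiltonian at most quadratic in $X$, and the neglected terms are formally of order $\sqrt{\hbar}$.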
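The order counting in the second-order expansion of the potential around a trajectory point $q$ can also be checked numerically. The sketch below is an illustration only, in one dimension: the function name `scaled_remainder`, the test potential $V = \cos$, and the sample values of `q`, `X` and $\hbar$ are all assumptions chosen for the example. It compares $\hbar^{-1} V(q + \sqrt{\hbar}\,X)$ with its expansion to second order and checks that the remainder shrinks like $\sqrt{\hbar}$.

```python
import math


def scaled_remainder(V, dV, d2V, q, X, hbar):
    """Return hbar^{-1} V(q + sqrt(hbar) X) minus its expansion
    to second order in sqrt(hbar) * X around the point q."""
    s = math.sqrt(hbar)
    exact = V(q + s * X) / hbar
    taylor = V(q) / hbar + dV(q) * X / s + 0.5 * d2V(q) * X * X
    return exact - taylor


# Hypothetical smooth test potential and its first two derivatives.
V = math.cos
dV = lambda x: -math.sin(x)
d2V = lambda x: -math.cos(x)

q, X = 0.7, 1.3  # arbitrary sample point and rescaled coordinate
r1 = scaled_remainder(V, dV, d2V, q, X, 1e-2)
r2 = scaled_remainder(V, dV, d2V, q, X, 1e-4)

# If the remainder is O(sqrt(hbar)), reducing hbar by a factor 100
# should reduce it by roughly a factor 10.
ratio = r1 / r2
print(r1, r2, ratio)
```

The leading part of the remainder is $\tfrac16 V'''(q) X^3 \sqrt{\hbar}$, so the ratio is close to, but not exactly, 10; the deviation comes from the next term in the expansion.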

In terms of the Wigner function $W_{\hbar,\psi}$ the Schrödinger equation [19] takes the form

$$\frac{\partial f^\hbar}{\partial t} + \xi \cdot \nabla_x f^\hbar + K_\hbar * f^\hbar = 0, \qquad f^\hbar(t = 0) = f_0(\hbar) \qquad [20]$$

where

$$K_\hbar = \frac{i}{(2\pi)^N} \int e^{-i(\xi, y)}\, \hbar^{-1} \Big[ V\Big(x + \frac{\hbar y}{2}\Big) - V\Big(x - \frac{\hbar y}{2}\Big) \Big]\, dy$$

It can be proved (Robert 1987) that if the potential is sufficiently regular and if the initial datum converges in a suitable topology to a positive measure $f_0$, then, for all times, $W_{\hbar,\psi}(x, \xi; t)$ converges to a (weak) solution of the Liouville equation

$$\frac{\partial f}{\partial t} + \xi \cdot \nabla_x f - \nabla V(x) \cdot \nabla_\xi f = 0$$

This leads to the semiclassical limit if, for example, one considers a sequence of initial data $\psi_n$, where $\psi_n$ is a sequence of functions centered at $x_0$ with Fourier transform centered at $p_0$ and dispersion of order $\hbar^{1/2}$ both in position and in momentum. In this case, the limit measure is a Dirac measure centered on the classical paths.

In the course of the proof of the semiclassical limit theorem, one becomes aware of the special status of the Hamiltonians that are at most quadratic in $\hat x$ and $\hat p$. Indeed, it is easy to verify that for these Hamiltonians the expectation values of $\hat x$ and $\hat p$ obey the classical equation of motion (Ehrenfest's rule).

From the point of view of Heisenberg, this can be understood as a consequence of the fact that operators at most bilinear in $a$ and $a^*$ form an algebra $D$ under commutation and, moreover, the homogeneous part of order 2 is a closed subalgebra such that its action on $D$ (by commutation) has the same structure as the algebra of generators of the Hamiltonian flow and its tangent flow. Apart from (important) technicalities, the proof of the semiclassical limit theorem reduces to the proof that one can estimate the contribution of the terms of order higher than 2 in the expansion of the quantum Hamiltonian at the classical trajectory as being of order $\hbar^{1/2}$ in a suitable topology (Hepp 1974).

We end this overview by giving a brief analysis of problems (1) and (2), which refer to the description of phenomena that are directly accessible to comparison with experimental data, and therefore have been extensively studied in theoretical physics and quantum chemistry (McWeeny 1992); some of them have been analyzed with the instruments of mathematical physics, often with considerable success. We give here a very naive introduction to these problems and refer the reader to the more specialized contributions to this Encyclopedia for a rigorous analysis and exact statements.

Of course, most of the problems of physical interest are not "exactly solvable," in the sense that rarely the final result is given explicitly in terms of simple functions. As a consequence, exact numerical results, to be compared with experimental data, are rarely obtained in physically relevant problems, and most often one has to rely on approximation schemes with (in favorable cases) precise estimates on the error.

Formal perturbation theory is the easiest of such schemes, but it seldom gives reliable results for physically interesting problems. One writes

$$H_\epsilon \equiv H_0 + \epsilon V \qquad [21]$$

where $\epsilon$ is a small real parameter, and sets a formal scheme in case (1) by writing

$$H_\epsilon \phi_\epsilon = E_\epsilon \phi_\epsilon, \qquad E_\epsilon \equiv \sum_{k=0}^{\infty} \epsilon^k E_k, \qquad \phi_\epsilon \equiv \sum_{k=0}^{\infty} \epsilon^k \phi_k$$

and, in case (2), iterating Duhamel's formula

$$e^{itH_\epsilon} = e^{itH_0} + i\epsilon \int_0^t e^{i(t-s)H_\epsilon}\, V\, e^{isH_0}\, ds \qquad [22]$$

Very seldom does the perturbation series converge, and one has to resort to more refined procedures.

In some cases, it turns out to be convenient to consider the formal primitive $\tilde E_\epsilon$ of $E_\epsilon$ (as a differentiable function of $\epsilon$) and prove that it is differentiable in $\epsilon$ for $0 < \epsilon < \epsilon_0$ (but not for $\epsilon = 0$). In favorable cases, this procedure may lead to

$$E_\epsilon = \sum_{k=0}^{N} \epsilon^k E_k + R_N(\epsilon), \qquad \lim_{N \to \infty} |R_N|(\epsilon) = +\infty$$

with explicit estimates of $|R_N(\epsilon)|$ for $0 \leq \epsilon < \epsilon_0$. Re-summation techniques of the formal power series may be of help in some cases.

The estimate of the lowest eigenvalues of an operator bounded below is often done by variational analysis, making use of min–max techniques applied to the quadratic form $Q(\phi) \equiv (\phi, H\phi)$.

Semiclassical analysis can be useful to search for the distribution of eigenvalues and in the study of the dynamics of states whose dispersions both in position and in momentum are very large in units in which $\hbar = 1$.

A case of particular interest in molecular and atomic physics occurs when the physical parameters which appear in $H$ (typically the masses of the particles involved in the process) are such that one
can a priori guess the presence of coordinates which have a rapid dependence on time (fast variables) and a complementary set of coordinates whose dependence on time is slow. This suggests that one can try an asymptotic analysis, often in connection with adiabatic techniques. One seldom deals with cases in which the hypotheses of elementary adiabatic theorems are satisfied, and one has to refine the analysis, mostly through subtle estimates which ensure the existence of quasi-invariant subspaces.

Asymptotic techniques and refined estimates are also needed to study the effective description of a system of $N$ interacting identical particles when $N$ becomes very large; for example, in statistical mechanics, one searches for results which are valid when $N \to \infty$.

The most spectacular results in this direction are the proof of stability of matter by E Lieb and collaborators, and the study of the phenomenon of Bose–Einstein condensation and the related Gross–Pitaevskii (nonlinear Schrödinger) equation. The experimental discovery of the state of matter corresponding to a Bose–Einstein condensate is clear evidence of the nonclassical behavior of matter even at a comparatively macroscopic size. From the point of view of mathematical physics, the ongoing research in this direction is very challenging.

One should also recognize the increasing role that research in QM is taking in applications, also in connection with the increasing success of nanotechnology. In this respect, from the point of view of mathematical physics, the study of nanostructures (quantum-mechanical systems constrained to very small regions of space or to lower-dimensional manifolds, such as sheets or graphs) is still in its infancy and will require refined mathematical techniques and most likely entirely new ideas.

Finally, one should stress the important role played by numerical analysis (Le Bris 2003) and especially computer simulations. In problems involving very many particles, present-day analytical techniques provide at most qualitative estimates and in favorable cases bounds on the value of the quantities of interest. Approximation schemes are not always applicable and often are not reliable. Hints for progress in the mathematical treatment of some relevant physical phenomena of interest in QM (mostly in condensed matter physics) may come from the ab initio analysis made by simulations on large computers; this may provide a qualitative and, to a certain extent, quantitative behavior of the solutions of Schrödinger's equation corresponding to “typical” initial conditions. In recent times the availability of more efficient computing tools has made computer simulation more reliable and more apt to concur with mathematical investigation to a fuller comprehension of QM.

Interpretation Problems

In this section we describe some of the conceptual problems that plague present-day QM and some of the attempts that have been made to cure these problems, either within its formalism or with an altogether different approach.

Approaches within the QM Formalism

We begin with the approaches “from within.” We have pointed out that the main obstacle in the measurement problem is the description of what occurs during an act of measurement. Axiom III claims that it must be seen as a “destruction” act, and the outcome is to some extent random. The final state of the system is one of the eigenstates of the observable, and the dependence on the initial state is only through an a priori probability assignment; the act of measurement is therefore not a causal one, contrary to the (continuous) causal reversible description of the interaction with the environment. One should be able to distinguish a priori the acts of measurement from a generic interaction.

There is a further difficulty. Due to the superposition principle, if a system $S$ on which we want to make a measurement of the property associated with the operator $A$ “interacts” with an instrument $I$ described by the operator $S$, the final state of the combined system will be a coherent superposition of tensor products of (normalized) eigenstates of the two systems
\[ \Psi = \sum_{n,m} c_{n,m}\, \phi^A_n \otimes \phi^S_m, \qquad \sum_{n,m} |c_{n,m}|^2 = 1 \qquad [23] \]
Measurement as described by Axiom III of QM claims that once the measurement is over, the measured system is, with probability $\sum_m |c_{n,m}|^2$, in the state $\phi^A_n$ and the instrument is in a state which carries the information about the final state of the system (after all, what one reads at the end is an indicator of the final state of the instrument).

It is therefore convenient to write $\Psi$ in the form
\[ \Psi = \sum_n d_n\, \phi^A_n \otimes \zeta_n, \qquad \sum_n |d_n|^2 = 1 \qquad [24] \]
(this defines $\zeta_n$ if the spectrum of $A$ is pure point and nondegenerate). It is seen from [24] that, due to the reduction postulate, we know that the measured system is in the state $\phi^A_{n_0}$ if a measurement of an observable $T$ with nondegenerate spectrum,
eigenvectors $\{\zeta_n\}$, and eigenvalues $\{z_n\}$ gives the result $z_{n_0}$.

Along these lines, one does not solve the measurement problem (the outcome is still probabilistic) but at least one can find the reason why the measuring apparatus may be considered “classical.”

It is more convenient to go back to [23] and to assume that one is able to construct the measuring apparatus in such a way that one divides (roughly) its pure (microscopic) states in sets $\Sigma_n$ (each corresponding to a “macroscopic” state) which are (roughly) in one-to-one correspondence to the eigenstates of $A$. The sets $\Sigma_n$ contain a very large number, $N_n$, of elements, so that the sets $\Sigma_n$ need not be given with extreme precision. And the sets $\Sigma_n$ must be in a sense “stable” under small external perturbations.

It is clear from this rough description that the apparatus should contain a large number of small components and still its interaction with the “small” system $A$ should lead to a more or less sudden change of the sets $\Sigma_n$.

A concrete model of this mechanism has been proposed by K Hepp (1972) for the case when $A$ is a $2 \times 2$ matrix, and the measuring apparatus is made of a chain of $N$ spins, $N \to \infty$; the analysis was recently completed by Sewell (2005) with an estimate on the error which is made if $N$ is finite but large. This is a dynamical model, in which the observable $A$ (a spin) interacts with a chain of spins (“moves over the spins”) leaving the trace of its passage. It is this trace (final macroscopic state of the apparatus) which is measured and associated with the final state of $A$. The interaction is not “instantaneous” but may require a very short time, depending on the parameters used to describe the apparatus and the interaction.

We call “decoherence” the weakening of the superposition principle due to the interaction with the environment.

Two different models of decoherence have been analyzed in some detail; we shall call them the thermal-bath model and the scattering model; both are dynamical models and both point to a solution, to various extents, of the problem of the reduction to a final density matrix which commutes with the operator $A$ (and therefore to the suppression of the interference terms).

The thermal-bath model makes use of the Heisenberg representation and relies on results of the theory of $C^*$-algebras. This approach is closely linked with (quantum) statistical mechanics; its aim is to prove, after conditioning with respect to the degrees of freedom of the bath, that a special role emerges for a commuting set of operators of the measured system, and these are the observables that specify the outcome of the measurement in probabilistic terms.

The scattering approach relies on the Schrödinger approach to QM, and on results from the theory of scattering. This approach describes the interaction of the system $S$ (typically a heavy particle) with an environment made of a large number of light particles and seeks to describe the state of $S$ after the interaction when one does not have any information on the final state of the light particle. One seeks to prove that the reduced density matrix is (almost) diagonal in a given representation (typically the one given by the spatial coordinates). This defines the observable (typically, position) that can be measured and the probability of each outcome.

Both approaches rely on the loss of information in the process to cancel the effect of the superposition principle and to bring the measurement problem within the realm of classical probability theory. Neither of them provides a causal dependence of the result of the measurement on the initial state of the system.

We describe these attempts only very briefly.

In its more basic form, the “scattering approach” has as starting point the Schrödinger equation for a system of two particles, one of which has mass very much smaller than the other one. The heavy particle may be seen as representing the system on which a measurement is being made. The outline of the method of analysis (which in favorable cases can be made rigorous) (Joos and Zeh 1985, Tegmark 1993) is the following. One chooses units in which the mass of the heavy particle is 1, and one denotes by $\varepsilon$ the mass of the light particle. If $x$ is the coordinate of the heavy particle and $y$ that of the light one, and if the initial state of the system is denoted by $\Psi_0(x, y)$, the solution of the equation for the system is (apart from inessential factors)
\[ \Psi_t = \exp\{-i(-\Delta_x - \varepsilon^{-1}\Delta_y + W(x) + V(x - y))\,t\}\,\Psi_0 \]
Making use of center-of-mass and relative coordinates, one sees that when $\varepsilon$ is very small one should be able to describe the system on two timescales, one fast (for the light particle) and one slow (for the heavy one) and, therefore, place oneself in a setting which may allow the use of adiabatic techniques. In this setting, for the measurement of the heavy particle (e.g., its position) one may be allowed to consider the light particle in a scattering regime, and use the wave operator corresponding to a potential $V_x(y) \equiv V(y - x)$.

Taking the partial trace with respect to the degrees of freedom of the light particle (this
corresponds to no information of its final state) one finds, at least heuristically, that the state of the heavy particle is now described (due to the trace operation) by a density matrix $\rho$ for which, in the coordinate representation, the off-diagonal terms $\rho_{x,x'}$ are slightly suppressed by a factor $\Lambda_{x,x'} = 1 - (W^+_x\psi, W^+_{x'}\psi)$, where $\psi$ represents the initial state of the light particle and $W^+_x$ is the wave operator for the motion of the light particle in the potential $V_x$. One must assume that the function $\phi$ which represents the initial state of the heavy particle is sufficiently localized so that $\Lambda_{x,x'} < 1$ for every $x' \neq x$ in its support.

If the environment is made of very many particles (their number $N(\varepsilon)$ must be such that $\lim_{\varepsilon \to 0} N(\varepsilon) = \infty$) and the heavy particle can be supposed to have separate interactions with all of them, the off-diagonal elements of the density matrix tend to 0 as $\varepsilon \to 0$ and the resulting density matrix tends to have the form $\rho(x, x') = \delta(x - x')\,\sigma(x)$, $\sigma(x) \ge 0$, $\int \sigma(x)\,dx = 1$. If it can be supposed that all interactions take place within a time $T(\varepsilon)$ satisfying a suitable bound, one has $\sigma(x) = |\phi(x)|^2$.

If the interactions are not independent, the analysis becomes much more involved since it has to be treated by many-body scattering theory; this suggests that the scattering approach can hardly be used in the context of the “thermal-bath model.” In any case, the selection of a “preferred basis” (the coordinate representation) depends on the fact that one is dealing with a scattering phenomenon. A few steps have been made toward a rigorous analysis (Teta 2004) but we are very far from a mathematically satisfactory answer.

The thermal-bath approach has been studied within the algebraic formulation of QM and stands on good mathematical ground (Alicki 2002, Blanchard et al. 2003, Sewell 2005). Its drawback is that it is difficult to associate the formal scheme with actual physical situations and it is difficult to give a realistic estimate of the decoherence time.

The thermal-bath approach attributes the decoherence effect to the practical impossibility of distinguishing between a vast majority of the pure states of the systems and the corresponding statistical mixtures. In this approach, the observables are represented by self-adjoint elements of a weakly closed subalgebra $\mathcal{M}$ of all bounded operators $B(H)$ on a Hilbert space $H$. This subalgebra may depend on the measuring apparatus (i.e., not all the apparatuses are fit to measure a set of observables). A “classical” observable by definition commutes with all other observables and therefore must belong to the center of $\mathcal{M}$, which is isomorphic to a collection of functions on a probability space $M$.

So the appearance of classical properties of a quantum system corresponds to the “emergence” of an algebra with nontrivial center. Since automorphic evolutions of an algebra preserve its center, this program can be achieved only if we admit the loss of quantum coherence, and this requires that the quantum systems we describe are open and interact with the environment, and moreover that the commutative algebra which emerges be stable for time evolution.

It may be shown that one must consider the quantum environment in the thermodynamic limit, that is, consider the interaction of the system to be measured with a thermal bath. A discussion of the possible emergence of classical observables and of the corresponding dynamics is given by Gell-Mann (1993). In all these approaches, the commutative subalgebra is selected by the specific form of the interaction; therefore, the measuring apparatus determines the algebra of classical observables.

On the experimental side, a number of very interesting results have been obtained, using very refined techniques; these experiments usually also determine the “decoherence time.” The experiments, both for the collision model (Hornberger et al. 2003) and for the thermal-bath model (Hackermueller et al. 2004), are done mostly with fullerene (a molecule which is heavy enough and is not deflected too much after a collision with a particle of the gas). They show a reasonable accordance with the (rough) theoretical conclusions.

The most refined experiments about decoherence are those connected with quantum optics (circularly polarized atoms in superconducting cavities). These are not related to the wave nature of the particles but in a sense to the “wave nature” of a photon as a single unit. The electromagnetic field is now regarded as an incoherent superposition of states with an arbitrarily large number of photons. Polarized photons can be produced one by one, and they retain their individuality and their polarization until each of them interacts with “the environment” (e.g., the boundary of the cavity or a particle of the gas). In a sense, these experimental results refer to a “decoherence by collision” theory. The experiments by Haroche (2003) prove that coherence may persist for a measurable interval of time and are the most controlled experiments on coherence so far.

Other Approaches

We end this section with a brief discussion of the problem of “hidden variables” and a presentation of an entirely different approach to QM, originated by
D Bohm (1952) and put recently on firm mathematical grounds by Duerr et al. (1999). The approach is radically different from the traditional one and it is not clear at present whether it can give a solution to the measurement problem and a description of all the phenomena which traditional QM accounts for. But it is very interesting from the point of view of the mathematics involved.

We have remarked that the formulation of QM that is summarized in the three axioms given earlier has many unsatisfactory aspects, mainly connected with the superposition principle (described in its extremal form by the Schrödinger's cat “paradox”) and with the problem of measurement which reveals, for example, through the Einstein–Podolsky–Rosen “paradox,” an intrinsic nonlocality if one maintains that “objective” properties can be attributed to systems which are far apart. From the very beginning of QM, attempts have been made to attribute these features to the presence of “hidden variables”; the statistical nature of the predictions of QM is, from this point of view, due to the incompleteness of the parameters used to describe the systems. The impossibility of matching the statistical predictions of QM (confirmed by experimental findings) with a local theory based on hidden variables and classical probability theory has been known for some time (Kochen and Specker 1967), also through the use of “Bell inequalities” (Bell 1964) among correlations of outcomes of separate measurements performed on entangled systems (mainly two photons or two spin-1/2 particles created in a suitable entangled state).

A proof of the intrinsic nonlocality of QM (in the above sense) was given by L Hardy (see Haroche (2003)).

While experimental results prove that one cannot substitute QM with a “naive” theory of hidden variables, more refined attempts may have success. We shall only discuss the approach of Bohm (following a previous attempt by de Broglie) as presented in Duerr et al. (1999). It is a dynamical theory in which representative points follow “classical paths” and their motion is governed by a time-dependent vector “velocity” field (in this sense, it is not Newtonian). In a sense, Bohmian mechanics is a minimal completion of QM if one wants to keep the position as primitive observable. To these primitive objects, Bohm's theory adds a complex-valued function $\psi$ (the “guiding wave” in Bohm's terminology) defined on the configuration space $Q$ of the particles. In the case of particles with spin, the function $\psi$ is spinor-valued. Dynamics is given by two equations: one for the coordinates of the particles and one for the guiding wave. If $x \equiv x_1, \ldots, x_N$ describes the configuration of the points, the dynamics in a potential field $V(x)$ is described in the following way: for the wave $\psi$ by a nonrelativistic Schrödinger equation with potential $V$ and for the coordinates by the ordinary differential equation (ODE)
\[ \dot{x}_k = (\hbar/m_k)\, \mathrm{Im}\!\left(\frac{\psi^* \nabla_k \psi}{\psi^* \psi}\right)(x), \qquad x_k \in \mathbb{R}^3 \]
where $m_k$ is the mass of the $k$th particle.

Notice that the vector field is singular at the zeros of the wave function, therefore global existence and uniqueness must be proved. To see why Bohmian mechanics is empirically equivalent to QM, at least for measurement of position, notice that the equation for the points coincides with the continuity equation in QM. It follows that if one has at time zero a collection of points distributed with density $|\psi_0|^2$, the density at time $t$ will be $|\psi(t)|^2$ where $\psi(t)$ is the solution of the Schrödinger equation with initial datum $\psi_0$.

Bohm (1952) formulated the theory as a modification of Newton's laws (and in this form it has been widely used) through the introduction of a “quantum potential” $V_Q$. This was achieved by writing the wave function in its polar form $\psi = R\,e^{iS/\hbar}$ and writing the Schrödinger equation as a continuity equation together with a modified Hamilton–Jacobi equation. The version of Bohm's theory discussed in Duerr et al. (1999) introduces only the guiding wave function and the coordinates of the points, and puts the theory on firm mathematical grounds. Through an impressive series of mathematical results, these authors and their collaborators deal with the completeness of the velocity vector field, the asymptotic behavior of the trajectories of the points (both for the scattering regime and for the trapped trajectories, which are shown to correspond to bound states in QM), with a rigorous analysis of the theorem on the flux across a surface (a cornerstone in scattering theory) and the detailed analysis of the “two-slit” experiment through a study of the interaction with the measuring apparatus. The theory is completely causal, both for the trajectories of the points and for the time development of the pilot wave, and can also accommodate points with spin. It leads to a mathematically precise formulation of the semiclassical limit, and it may also resolve the measurement problem by relating the pilot wave of the entire system to its approximate decomposition into an incoherent superposition of pilot waves associated with the particle and with the measuring apparatus (this would be the way to see the “collapse of the wave function” in QM). A weak point of this approach is the relation of the representative points with observable quantities.
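As an illustration of the guidance equation and of the statement that an initial $|\psi_0|^2$ distribution is carried to $|\psi(t)|^2$, the following minimal Python sketch integrates the trajectory equation for a single free particle in one dimension. The units $\hbar = m = 1$, the choice $V = 0$, the Gaussian initial packet of width `sigma0`, and all function names (`bohm_velocity`, `integrate_trajectory`, `exact_trajectory`) are assumptions made for this example and do not come from the text. For this packet the velocity field has the closed form $v(x,t) = x\,\tau/[2\sigma_0^2(1+\tau^2)]$ with $\tau = t/(2\sigma_0^2)$, and the trajectories are $x(t) = x_0\sqrt{1+\tau^2}$, which the numerical integration can be compared against.

```python
import math


def bohm_velocity(x, t, sigma0=1.0):
    """Guidance field Im(psi'/psi) for a free Gaussian packet (assumed hbar = m = 1).

    psi(x, t) is proportional to exp(-x**2 / (4 sigma0**2 (1 + i tau))),
    with tau = t / (2 sigma0**2).
    """
    tau = t / (2.0 * sigma0 ** 2)
    return x * tau / (2.0 * sigma0 ** 2 * (1.0 + tau ** 2))


def integrate_trajectory(x0, t_final, steps=1000, sigma0=1.0):
    """Integrate dx/dt = v(x, t) from t = 0 with the classical 4th-order Runge-Kutta scheme."""
    x, t = x0, 0.0
    h = t_final / steps
    for _ in range(steps):
        k1 = bohm_velocity(x, t, sigma0)
        k2 = bohm_velocity(x + 0.5 * h * k1, t + 0.5 * h, sigma0)
        k3 = bohm_velocity(x + 0.5 * h * k2, t + 0.5 * h, sigma0)
        k4 = bohm_velocity(x + h * k3, t + h, sigma0)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x


def exact_trajectory(x0, t, sigma0=1.0):
    """Closed-form trajectory x0 * sqrt(1 + tau**2) for the same packet."""
    tau = t / (2.0 * sigma0 ** 2)
    return x0 * math.sqrt(1.0 + tau ** 2)


if __name__ == "__main__":
    for x0 in (-1.0, 0.3, 2.0):
        print(x0, integrate_trajectory(x0, 3.0), exact_trajectory(x0, 3.0))
```

In this example every trajectory is rescaled by the same factor $\sqrt{1+\tau^2}$, so an ensemble of points distributed with density $|\psi_0|^2$ (standard deviation $\sigma_0$) at time zero has standard deviation $\sigma_0\sqrt{1+\tau^2}$ at time $t$, which is the width of $|\psi(t)|^2$ for the freely spreading packet.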
Further Reading

Alicki R (2004) Pure decoherence in quantum systems. Open Syst. Inf. Dyn. 11: 53–61.
Araki H (1988) In: Jorgensen P and Muhly P (eds.) Operator Algebras and Mathematical Physics, Contemporary Mathematics 62. Providence, RI: American Mathematical Society.
Araki H and Ezawa H (eds.) (2004) Topics in the theory of Schroedinger Operators. River Edge, NJ: World Scientific.
Bach V, Froelich J, and Sigal IM (1998) Quantum electrodynamics of constrained non relativistic particles. Advanced Mathematics 137: 299–395.
Bargmann V (1967) On a Hilbert space of analytic functions and an associated integral transform. Communications of Pure and Applied Mathematics 20: 1–101.
Bell J (1966) On the problem of hidden variables in quantum mechanics. Reviews of Modern Physics 38: 4247–4280.
Blanchard P and Dell'Antonio GF (eds.) (2004) Multiscale methods in quantum mechanics, theory and experiments. Trends in Mathematics. Boston: Birkhäuser.
Blanchard P and Olkiewicz R (2003) Decoherence in the Heisenberg representation. International Journal of Physics B 18: 501–507.
Bohm D (1952) A suggested interpretation of quantum theory in terms of “hidden” variables I, II. Physical Review 85: 161–179, 180–193.
Bohr N (1913) On the constitution of atoms and molecules. Philosophical Magazine 26: 1–25, 476–502, 857–875.
Bohr N (1918) On the quantum theory of line spectra. Kongelige Danske Videnskabernes Selskabs Skrifter Series 8, IV, 1, 1–118.
Born M (1924) Über Quantenmechanik. Zeitschrift für Physik 32: 379–395.
Born M and Jordan P (1925) Zur Quantenmechanik. Zeitschrift für Physik 34: 858–888.
Born M, Jordan P, and Heisenberg W (1926) Zur Quantenmechanik II. Zeitschrift für Physik 35: 587–615.
Cycon HL, Froese RG, Kirsch W, and Simon B (1987) Schroedinger operators with application to quantum mechanics and geometry. Texts and Monographs in Physics. Berlin: Springer Verlag.
de Broglie L (1923) Ondes et quanta. Comptes Rendus 177: 507–510.
Dell'Antonio GF (2004) On decoherence. Journal of Mathematical Physics 44: 4939–4955.
Dirac PAM (1925) The fundamental equations of quantum mechanics. Proceedings of the Royal Society of London A 109: 642–653.
Dirac PAM (1926) The quantum algebra. Proceedings of the Cambridge Philosophical Society 23: 412–428.
Dirac PAM (1928) The quantum theory of the electron. Proceedings of the Royal Society of London A 117: 610–624, 118: 351–361.
Duerr D, Goldstein S, and Zanghì N (1996) Bohmian mechanics as the foundation of quantum mechanics. Boston Studies Philosophical Society 184: 21–44. Dordrecht: Kluwer Academic.
Einstein A (1905) The Collected Papers of Albert Einstein, vol. 2, pp. 347–377, 564–585. Princeton, NJ: Princeton University Press.
Einstein A (1924–1925) Quantentheorie des einatomigen idealen Gases. Berliner Berichte (1924) 261–267, (1925) 3–14.
Gell-Mann M and Hartle JB (1997) Strong decoherence. Quantum-Classical Correspondence, pp. 3–35. Cambridge, MA: International Press.
Gross L (1972) Existence and uniqueness of physical ground state. Journal of Functional Analysis 19: 52–109.
Haroche S (2003) Quantum Information in cavity quantum electrodynamics. Royal Society of London Philosophical Transactions Serial A Mathematical and Physics Engineering Science 361: 1339–1347.
Heisenberg W (1925) Über quantentheoretische Umdeutung kinematischer und mechanischer Beziehungen. Zeitschrift für Physik 33: 879–893.
Heisenberg W (1926) Über quantentheoretische Kinematik und Mechanik. Mathematische Annalen 95: 694–705.
Hepp K (1974) The classical limit of quantum correlation functions. Communications in Mathematical Physics 35: 265–277.
Hepp K (1975) Results and problem in the irreversible statistical mechanics of open systems. Lecture Notes in Physics 39. Berlin: Springer Verlag.
Hornberger K and Sipe JE (2003) Collisional decoherence reexamined. Physical Review A 68: 012105, 1–16.
Hislop P and Sigal IM (1996) Introduction to spectral theory with application to Schroedinger operators. Applied Mathematical Sciences 113. New York: Springer Verlag.
Jammer M (1989) The Conceptual Development of Quantum Mechanics, 2nd edn. Tomash Publishers, American Institute of Physics.
Joos E et al. (eds.) (2003) Quantum Theory and the Appearance of a Classical World, 2nd edn. Berlin: Springer Verlag.
Kochen S and Specker EP (1967) The problem of hidden variables in quantum mechanics. Journal of Mathematics and Mechanics 17: 59–87.
Le Bris C (2002) Problematiques numeriques pour la simulation moleculaire. ESAIM Proceedings of the 11th Society on Mathematics and Applied Industries, pp. 127–190. Paris.
Le Bris C and Lions PL (2005) From atoms to crystals: a mathematical journey. Bulletin of the American Mathematical Society (NS) 42: 291–363.
Lieb E (1990) From atoms to stars. Bulletin of the American Mathematical Society 22: 1–49.
Mackey GW (1963) Mathematical Foundations of Quantum Mechanics. New York–Amsterdam: Benjamin.
Mc Weeny E (1992) An overview of molecular quantum mechanics. Methods of Computational Molecular Physics. New York: Plenum Press.
Nelson E (1973) Construction of quantum fields from Markoff fields. Journal of Functional Analysis 12: 97–112.
von Neumann J (1996) Mathematical foundation of quantum mechanics. Princeton Landmarks in Mathematics. Princeton, NJ: Princeton University Press.
Nielsen M and Chuang I (2000) Quantum Computation and Quantum Information. Cambridge, MA: Cambridge University Press.
Ohya M and Petz D (1993) Quantum Entropy and Its Use. Text and Monographs in Physics. Berlin: Springer Verlag.
Pauli W (1927) Zur Quantenmechanik des magnetischen Elektrons. Zeitschrift für Physik 43: 661–623.
Pauli W (1928) Collected Scientific Papers, vol. 2. 151–160, 198–213, 1073–1096.
Robert D (1987) Autour de l'approximation semi-classique. Progress in Mathematics 68. Boston: Birkhäuser.
Schroedinger E (1926) Quantisierung als Eigenwertproblem. Annalen der Physik 79: 361–376, 489–527, 80: 437–490, 81: 109–139.
Segal I (1996) Quantization, Nonlinear PDE and Operator Algebra, pp. 175–202. Proceedings of the Symposium on Pure Mathematics 59. Providence, RI: American Mathematical Society.
Sewell J (2004) Interplay between classical and quantum structure in algebraic quantum theory. Rend. Circ. Mat. Palermo Suppl. 73: 127–136.
Simon B (2000) Schrodinger operators in the twentieth century. Journal of Mathematical Physics 41: 3523–3555.
Streater RF and Wightman AS (1964) PCT, Spin and Statistics and All That. New York–Amsterdam: Benjamin.
Takesaki M (1971) One parameter automorphism groups and states of operator algebras. Actes du Congrès International des Mathématiciens Nice, 1970, Tome 2, pp. 427–432. Paris: Gauthier Villars.
Teta A (2004) On a rigorous proof of the Joos–Zeh formula for decoherence in a two-body problem. Multiscale Methods in Quantum Mechanics, pp. 197–205. Trends in Mathematics. Boston: Birkhäuser.
Weyl H (1931) The Theory of Groups and Quantum Mechanics. New York: Dover.
Wiener N (1938) The homogeneous chaos. American Journal of Mathematics 60: 897–936.
Wigner EP (1952) Die Messung quantenmechanischer Operatoren. Zeitschrift für Physik 133: 101–108.
Yafaev DR (1992) Mathematical scattering theory. Translations of Mathematical Monographs. Providence, RI: American Mathematical Society.
Zeh HD (1970) On the interpretation of measurement in quantum theory. Foundations of Physics 1: 69–76.
Zurek WH (1982) Environment induced superselection rules. Physical Review D 26(3): 1862–1880.

Introductory Article: Topology


Tsou Sheung Tsun, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

This will be an elementary introduction to general topology. We shall not even touch upon algebraic topology, which will be dealt with in Cohomology Theories, although in some mathematics departments it is introduced in an advanced undergraduate course.

We believe such an elementary article is useful for the encyclopaedia, purely for quick reference. Most of the concepts will be familiar to physicists, but usually in a general rather vague sense. This article will provide the rigorous definitions and results whenever they are needed when consulting other articles in the work. To make sure that this is the case, we have in fact experimentally tested the article on physicists for usefulness.

Topology is very often described as “rubber-sheet geometry,” that is, one is allowed to deform objects without actually breaking them. This is the all-important concept of continuity, which underlies most of what we shall study here.

We shall give full definitions, state theorems rigorously, but shall not give any detailed proofs. On the other hand, we shall cite many examples, with a view to applications to mathematical physics, taking for granted that familiar more advanced concepts there need not be defined. By the same token, the choice of topics will also be so dictated.

Essential Concepts

Definition 1 Let $X$ be a set. A collection $\mathcal{T}$ of subsets of $X$ is called a topology if the following are satisfied:
(i) $\emptyset, X \in \mathcal{T}$.
(ii) Let $I$ be an index set. Then $A_\alpha \in \mathcal{T},\ \alpha \in I \implies \bigcup_{\alpha \in I} A_\alpha \in \mathcal{T}$.
(iii) $A_i \in \mathcal{T},\ i = 1, \ldots, n \implies \bigcap_{i=1}^{n} A_i \in \mathcal{T}$.

Definition 2 A member of the topology $\mathcal{T}$ is called an open set (of $X$ with topology $\mathcal{T}$).

Remark The last two properties are more easily put as: arbitrary unions of open sets are open, and finite intersections of open sets are open. One can easily see the significance of this: if we take the “usual topology” (which will be defined in due course) of the real line, then the intersection of all open intervals $(-1/n, 1/n)$, $n$ a positive integer, is just the single point $\{0\}$, which is manifestly not open in the usual sense.

Example If we postulate that $\emptyset$, and the entire set $X$, are the only open subsets, we get what is called the indiscrete or coarsest topology. At the other extreme, if we postulate that all subsets are open, then we get the discrete or finest topology. Both seem quite unnatural if we think in terms of the real line or plane, but in fact it would be more unnatural to explicitly exclude them from the definition. They prove to be quite useful in certain respects.

Definition 3 A subset of $X$ is closed if its complement in $X$ is open.

Remarks
(i) One could easily build a topology using closed sets instead of open sets, because of the simple relation that the complement of a union is the intersection of the complements.
(ii) From the definitions, there is nothing to prevent a set being both open and closed, or neither.
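For a finite set, the axioms of Definition 1 can be checked mechanically. The following Python sketch is only an illustration under that finiteness assumption; the function names `is_topology` and `closed_sets` are hypothetical and chosen for this example. On a finite collection, closure under pairwise unions and pairwise intersections already implies closure under arbitrary unions and finite intersections, so only pairs need to be tested.

```python
from itertools import combinations


def is_topology(X, T):
    """Check the axioms of Definition 1 for a finite set X and a finite collection T."""
    X = frozenset(X)
    T = {frozenset(A) for A in T}
    # Every member of T must be a subset of X.
    if any(not A <= X for A in T):
        return False
    # Axiom (i): the empty set and X itself belong to T.
    if frozenset() not in T or X not in T:
        return False
    # Axioms (ii) and (iii): for a finite collection it suffices to test pairs.
    for A, B in combinations(T, 2):
        if A | B not in T or A & B not in T:
            return False
    return True


def closed_sets(X, T):
    """Closed sets in the sense of Definition 3: complements of the open sets."""
    X = frozenset(X)
    return {X - frozenset(A) for A in T}


if __name__ == "__main__":
    X = {1, 2, 3}
    print(is_topology(X, [set(), X]))                 # indiscrete topology
    print(is_topology(X, [set(), {1}, X]))            # a topology with three open sets
    print(is_topology(X, [set(), {1}, {2}, X]))       # the union {1, 2} is missing
```

With `X = {1, 2, 3}` and open sets `∅`, `{1}`, `X`, the closed sets are `X`, `{2, 3}`, and `∅`; the sets `∅` and `X` are both open and closed, while `{2}` is neither, as in Remark (ii).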
Definition 4 A set equipped with a topology is called a topological space (with respect to the given topology). Elements of a topological space are sometimes called points.

Definition 5 Let $x \in X$. A neighborhood of x is a subset of X containing an open set which contains x.

Remark This seems a clumsy definition, but turns out to be more useful in the general case than restricting to open neighborhoods, which is often done.

Definition 6 A subcollection of open sets $B \subset T$ is called a basis for the topology T if every open set is a union of sets of B.

Definition 7 A subcollection of open sets $S \subset T$ is called a sub-basis for the topology T if every open set is a union of finite intersections of sets of S.

Definition 8 The closure $\bar{A}$ of a subset A of X is the smallest closed set containing A.

Definition 9 The interior $\mathring{A}$ of a subset A of X is the largest open set contained in A.

Remark It is sometimes useful to define the boundary of A as the set $\bar{A} \setminus \mathring{A} = \{x \in \bar{A},\ x \notin \mathring{A}\}$.

Definition 10 Let A be a subset of a topological space X. A point $x \in X$ is called a limit point of A if every open set containing x contains some point of A other than x.

Definition 11 A subset A of X is said to be dense in X if $\bar{A} = X$.

Definition 12 A topological space X is called a Hausdorff space if for any two distinct points $x, y \in X$, there exist an open neighborhood A of x and an open neighborhood B of y such that A and B are disjoint (that is, $A \cap B = \emptyset$).

Remark and Examples
(i) This is looking more like what we expect. However, certain mildly non-Hausdorff spaces turn out to be quite useful, for example, in twistor theory. A “pocket” furnishes such an example. Explicitly, consider X to be the subset of the real plane consisting of the interval $[-1, 1]$ on the x-axis, together with the interval $[0, 1]$ on the line $y = 1$, where the following pairs of points are identified: $(x, 0) \cong (x, 1)$, $0 < x \le 1$. Then the two points $(0, 0)$ and $(0, 1)$ do not have any disjoint neighborhoods. Strictly speaking, one needs the notion of a quotient topology, introduced below.
(ii) For a more “truly” non-Hausdorff topology, consider the space of positive integers $N = \{1, 2, 3, \ldots\}$, and take as open sets the following: $\emptyset$, $N$, and the sets $\{1, 2, \ldots, n\}$ for each $n \in N$. This space is neither Hausdorff nor compact (see later for definition of compactness).

Definition 13 Let X and Y be two topological spaces and let $f : X \to Y$ be a map from X to Y. We say that f is continuous if $f^{-1}(A)$ is open (in X) whenever A is open (in Y).

Remark Continuity is the single most important concept here. In this general setting, it looks a little different from the “$\epsilon$–$\delta$” definition, but this latter works only for metric spaces, which we shall come to shortly.

Definition 14 A map $f : X \to Y$ is a homeomorphism if it is a continuous bijective map such that its inverse $f^{-1}$ is also continuous.

Remark Homeomorphisms are the natural maps for topological spaces, in the sense that two homeomorphic spaces are “indistinguishable” from the point of view of topology. Topological invariants are properties of topological spaces which are preserved under homeomorphisms.

Definition 15 Let $B \subset A$. Then one can define the relative topology of B by saying that a subset $C \subset B$ is open if and only if there exists an open set D of A such that $C = D \cap B$.

Definition 16 A subset $B \subset A$ equipped with the relative topology is called a subspace of the topological space A.

Remark Thus, if for subsets of the real line, we consider $A = [0, 3]$, $B = [0, 2]$, then $C = (1, 2]$ is open in B, in the relative topology induced by the usual topology of $R$.

Definition 17 Given two topological spaces X and Y, we can define a product topological space $Z = X \times Y$, where the set is the Cartesian product of the two sets X and Y, and sets of the form $A \times B$, where A is open in X and B is open in Y, form a basis for the topology.

Remark Note that the open sets of $X \times Y$ are not always of this product form ($A \times B$).

Definition 18 Suppose there is a partition of X into disjoint subsets $A_\alpha$, $\alpha \in I$, for some index set I, or equivalently, there is defined on X an equivalence relation $\sim$. Then one can define the quotient topology on the set of equivalence classes $\{A_\alpha, \alpha \in I\}$, usually denoted as the quotient space $X/\!\sim\ = Y$, as follows. Consider the map $\pi : X \to Y$, called the canonical projection, which maps the element $x \in X$ to its equivalence class $[x]$. Then a subset $U \subset Y$ is open if and only if $\pi^{-1}(U)$ is open.

Proposition 1 Let T be the quotient topology on the quotient space Y. Suppose $T'$ is another
topology on Y such that the canonical projection is continuous, then $T' \subset T$.

Definition 19 An (open) cover $\{U_\alpha : \alpha \in I\}$ for X is a collection of open sets $U_\alpha \subset X$ such that their union equals X. A subcover of this cover is then a subset of the collection which is itself a cover for X.

Definition 20 A topological space X is said to be compact if every cover contains a finite subcover.

Remark So for a compact space, however one chooses to cover it, it is always sufficient to use a finite number of open subsets. This is one of the essential differences between an open interval (not compact) and a closed interval (compact). The former is in fact homeomorphic to the entire real line.

Definition 21 A topological space X is said to be connected if it cannot be written as the union of two nonempty disjoint open sets.

Remark A useful equivalent definition is that any continuous map from X to the two-point set $\{0, 1\}$, equipped with the discrete topology, cannot be surjective.

Definition 22 Given two points x, y in a topological space X, a path from x to y is a continuous map $f : [0, 1] \to X$ such that $f(0) = x$, $f(1) = y$. We also say that such a path joins x and y.

Definition 23 A topological space X is path-connected if every two points in X can be joined by a path lying entirely in X.

Proposition 2 A path-connected space is connected.

Proposition 3 A connected open subspace of $R^n$ is path-connected.

Definition 24 Given a topological space X, define an equivalence relation by saying that $x \sim y$ if and only if x and y belong to the same connected subspace of X. Then the equivalence classes are called (connected) components of X.

Examples
(i) The Lie group O(3) of $3 \times 3$ orthogonal matrices has two connected components. The identity connected component is SO(3) and is a subgroup.
(ii) The proper orthochronous Lorentz transformations of Minkowski space form the identity component of the group of Lorentz transformations.

Metric Spaces

A special class of topological spaces plays an important role: metric spaces.

Definition 25 A metric space is a set X together with a function $d : X \times X \to R$ satisfying
(i) $d(x, y) = d(y, x) \ge 0$ (symmetry and non-negativity),
(ii) $d(x, y) = 0 \Leftrightarrow x = y$,
(iii) $d(x, z) \le d(x, y) + d(y, z)$ (“triangle inequality”).

Remarks
(i) The function d is called the metric, or distance function, between the two points.
(ii) This concept of metric is what is generally known as “Euclidean” metric in mathematical physics. The distinguishing feature is the positive definiteness (and the triangle inequality). One can, and does, introduce indefinite metrics (for example, the Minkowski metric) with various signatures. But these metrics are not usually used to induce topologies in the spaces concerned.

Definition 26 Given a metric space X and a point $x \in X$, we define the open ball centred at x with radius r (a positive real number) as

$$B_r(x) = \{y \in X : d(x, y) < r\}$$

Given a metric space X, we can immediately define a topology on it by taking all the open balls in X as a basis. We say that this is the topology induced by the given metric. Then we can recover our usual “$\epsilon$–$\delta$” definition of continuity.

Proposition 4 Let $f : X \to Y$ be a map from the metric space X to the metric space Y. Then f is continuous (with respect to the corresponding induced topologies) at $x \in X$ if and only if given any $\epsilon > 0$, $\exists\, \delta > 0$ such that $d(x, x') < \delta$ implies $d(f(x), f(x')) < \epsilon$.

Note that we do not bother to give two different symbols to the two metrics, as it is clear which spaces are involved. The proof is easily seen by taking the relevant balls as neighborhoods. Equally easy is the following:

Proposition 5 A metric space is Hausdorff.

Definition 27 A map $f : X \to Y$ of metric spaces is uniformly continuous if given any $\epsilon > 0$ there exists $\delta > 0$ such that for any $x_1, x_2 \in X$, $d(x_1, x_2) < \delta$ implies $d(f(x_1), f(x_2)) < \epsilon$.

Remark Note the difference between continuity and uniform continuity: the latter is stronger and requires the same $\delta$ for the whole space.

Definition 28 Two metrics $d_1$ and $d_2$ defined on X are equivalent if there exist positive constants a and b such that for any two points $x, y \in X$ we have

$$a\, d_1(x, y) \le d_2(x, y) \le b\, d_1(x, y)$$
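The inequality in Definition 28 can be probed numerically. The following is a minimal sketch, assuming for illustration the Euclidean metric and the maximum metric on $R^2$ (this choice, the constants $a = 1$, $b = \sqrt{2}$, and all function names are assumptions of the sketch, not part of the definition). A finite sample can only fail to refute the inequality; it does not prove it.

```python
import math
import random

def d_euclid(x, y):
    """Euclidean metric on R^n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_max(x, y):
    """Maximum metric on R^n."""
    return max(abs(a - b) for a, b in zip(x, y))

def check_equivalence(d1, d2, a, b, points, tol=1e-12):
    """Return True if a*d1 <= d2 <= b*d1 holds for every pair of sample points."""
    for x in points:
        for y in points:
            lower, middle, upper = a * d1(x, y), d2(x, y), b * d1(x, y)
            if not (lower - tol <= middle <= upper + tol):
                return False
    return True

random.seed(0)
sample = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(50)]

# d_max <= d_euclid <= sqrt(2) * d_max on R^2
print(check_equivalence(d_max, d_euclid, 1.0, math.sqrt(2.0), sample))
```

Tightening the upper constant to $b = 1$ makes the check fail already for the pair $(0, 0)$, $(1, 1)$, which shows that the constants matter.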
Remark This is clearly an equivalence relation. Two equivalent metrics induce the same topology.

Examples
(i) Given a set X, we can define the discrete metric as follows: $d_0(x, y) = 1$ whenever $x \ne y$. This induces the discrete topology on X. This is quite a convenient way of describing the discrete topology.
(ii) In $R$, the usual metric is $d(x, y) = |x - y|$, and the usual topology is the one induced by this.
(iii) More generally, in $R^n$, we can define a metric for every $p \ge 1$ by

$$d_p(x, y) = \left( \sum_{k=1}^{n} |x_k - y_k|^p \right)^{1/p}$$

where $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$. In particular, for $p = 2$ we have the usual Euclidean metric, but the other cases are also useful. To continue the series, one can define

$$d_\infty = \max_{1 \le k \le n} \{ |x_k - y_k| \}$$

All these metrics induce the same topology on $R^n$.
(iv) In a vector space V, say over the real or the complex field, a function $\| \cdot \| : V \to R^+$ is called a norm if it satisfies the following axioms:
(a) $\|x\| = 0$ if and only if $x = 0$,
(b) $\|\alpha x\| = |\alpha| \|x\|$, and
(c) $\|x + y\| \le \|x\| + \|y\|$.

Then it is easy to see that a metric can be defined using the norm

$$d(x, y) = \|x - y\|$$

In many cases, for example, the metrics defined in example (iii) above, one can define the norm of a vector as just the distance of it from the origin. One obvious exception is the discrete metric.

A slightly more general concept is found to be useful for spaces of functions and operators: that of seminorms. A seminorm is one which satisfies the last two of the conditions, but not necessarily the first, for a norm, as listed above.

Definition 29 Given a metric space X, a sequence of points $\{x_1, x_2, \ldots\}$ is called a Cauchy sequence if, given any $\epsilon > 0$, there exists a positive integer N such that for any $k, \ell > N$ we have $d(x_k, x_\ell) < \epsilon$.

Definition 30 Given a sequence of points $\{x_1, x_2, \ldots\}$ in a metric space X, a point $x \in X$ is called a limit of the sequence if given any $\epsilon > 0$, there exists a positive integer N such that for any $n > N$ we have $d(x, x_n) < \epsilon$. We say that the sequence converges to x.

Definition 31 A metric space X is complete if every Cauchy sequence in X converges to a limit in it.

Examples
(i) The closed interval $[0, 1]$ on the real line is complete, whereas the open interval $(0, 1)$ is not. For example, the Cauchy sequence $\{1/n,\ n = 2, 3, \ldots\}$ has no limit in this open interval. (Considered as a sequence on the real line, it has of course the limit point 0.)
(ii) The spaces $R^n$ are complete.
(iii) The Hilbert space $\ell^2$ consisting of all sequences of real numbers $\{x_1, x_2, \ldots\}$ such that $\sum_{1}^{\infty} x_k^2$ converges is complete with respect to the obvious metric which is a generalization to infinite dimension of $d_2$ above. For arbitrary $p \ge 1$, one can similarly define $\ell^p$, which are also complete and are hence Banach spaces.

Remarks Completeness is not a topological invariant. For example, the open interval $(-1, 1)$ and the whole real line are homeomorphic (with respect to the usual topologies) but the former is not complete while the latter is. The homeomorphism can conveniently be given in terms of the trigonometric function tangent.

Definition 32 A subset B of the metric space X is bounded if there exists a ball of radius R ($R > 0$) which contains it entirely.

Theorem 1 (Heine–Borel) Any closed bounded subset of $R^n$ is compact.

Remark The converse is also true. We have thus a nice characterization of compact subsets of $R^n$ as being closed and bounded.

Proposition 6 Any bounded sequence in $R^n$ has a convergent subsequence.

Definition 33 Consider a sequence $\{f_n\}$ of real-valued functions on a subset A (usually an interval) of $R$. We say that $\{f_n\}$ converges pointwise in A if the sequence of real numbers $\{f_n(x)\}$ converges for every $x \in A$. We can then define a function $f : A \to R$ by $f(x) = \lim_{n \to \infty} f_n(x)$, and write $f_n \to f$.

Definition 34 A sequence of functions $f_n : A \to R$, $A \subset R$ is said to converge uniformly to a function $f : A \to R$ if given any $\epsilon > 0$, there exists a positive integer N such that, for all x, $|f_n(x) - f(x)| < \epsilon$ whenever $n > N$.

Theorem 2 Let $f_n : (a, b) \to R$ be a sequence of functions continuous at the point $c \in (a, b)$, and suppose $f_n$ converges uniformly to f on $(a, b)$. Then f is continuous at c.
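The difference between pointwise and uniform convergence can be made concrete with a small numerical sketch. It assumes, purely as an illustration, the sequence $f_n(x) = x^n$; the function names are hypothetical. On $[0, 1/2]$ the largest deviation from the limit 0 is $(1/2)^n$, which tends to 0, so the convergence is uniform there. On $[0, 1)$ the pointwise limit is still 0, but at the witness point $x_n = 1 - 1/n$ the deviation stays close to $1/e$, so no single N works for all x.

```python
def f(n, x):
    """Illustrative sequence f_n(x) = x**n."""
    return x ** n

def sup_gap_on_half(n):
    """sup over [0, 1/2] of |f_n(x) - 0|; x**n is increasing, so it is attained at x = 1/2."""
    return f(n, 0.5)

def witness_gap(n):
    """|f_n(x_n) - 0| at the witness point x_n = 1 - 1/n, which lies in [0, 1)."""
    return f(n, 1.0 - 1.0 / n)

for n in (10, 100, 1000):
    print(n, sup_gap_on_half(n), witness_gap(n))
```

The second column shrinks geometrically, while the third stays bounded away from 0, in line with Definition 34.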
Remark and Example The pointwise limit of continuous functions need not be continuous, as can be shown by the following example: $f_n(x) = x^n$, $x \in [0, 1]$. We see that the limit function f is not continuous:

$$f(x) = \begin{cases} 0 & x \ne 1 \\ 1 & x = 1 \end{cases}$$

Definition 35 Let X be a metric space. A map $f : X \to X$ is a contraction if there exists $0 \le c < 1$ such that $d(f(x), f(y)) \le c\, d(x, y)$ for all $x, y \in X$.

Theorem 3 (Banach) If X is a complete metric space and f is a contraction in X, then f has a unique fixed point $x \in X$, that is, $f(x) = x$.

Some Function and Operator Spaces

The spaces of functions and operators can be equipped with different topologies, given by various concepts of convergence and of norms (or sometimes seminorms), very often with different such concepts for the same space. As we saw earlier, a norm in a vector space gives rise to a metric, and hence to a topology. Similarly with the concept of convergence for sequences of functions and operators, as one then knows what the limit points, and hence closed sets, are.

But before we do that, let us introduce, in a slightly different context, a topology which is in some sense the natural one for the space of continuous maps from one space to another.

Definition 36 Consider a family F of maps from a topological space X to a topological space Y, and define $W(K, U) = \{f : f \in F,\ f(K) \subset U\}$. Then the family of all sets of the form $W(K, U)$ with K compact (in X) and U open (in Y) form a sub-basis for the compact open topology for F.

Consider a topological space X and sequences of functions $(f_n)$ on it. Let $D \subset X$. We can then define pointwise convergence and uniform convergence exactly as for functions on subsets of the real line.

Definition 37 Let X, D and $(f_n)$ be as above.
(i) The functions $f_n$ converge pointwise on D to a function f if the sequence of numbers $f_n(x) \to f(x)$, $\forall x \in D$.
(ii) The functions $f_n$ converge uniformly on D to a function f if given $\epsilon > 0$, there exists N such that for all $n > N$ we have $|f_n(x) - f(x)| < \epsilon$, $\forall x \in D$.

Next we consider the Lebesgue spaces $L^p$, that is, functions f defined on subsets of $R^n$, such that $|f(x)|^p$ is Lebesgue integrable, for real numbers $p \ge 1$. To define these spaces, we tacitly take equivalence classes of functions which are equal almost everywhere (that is, up to a null set), but very often we can take representatives of these classes and just deal with genuine functions instead. Note that of all $L^p$, only $L^2$ is a Hilbert space.

Definition 38 In the space $L^p$, we define its norm by

$$\|f\| = \left( \int |f(x)|^p \, dx \right)^{1/p}$$

Now we turn to general normed spaces, and operators on them.

Definition 39 Convergence in the norm is also called strong convergence. In other words, a sequence $(x_n)$ in a normed space X is said to converge strongly to x if

$$\lim_{n \to \infty} \|x_n - x\| = 0$$

Definition 40 A sequence $(x_n)$ in a normed space X is said to converge weakly to x if

$$\lim_{n \to \infty} f(x_n) = f(x)$$

for all bounded linear functionals f.

Consider the space $B(X, Y)$ of bounded linear operators T from X to Y. We can make this into a normed space by defining the following norm:

$$\|T\| = \sup_{x \in X,\ \|x\| = 1} \|Tx\|$$

Then we can define three different concepts of convergence on $B(X, Y)$. There are in fact more in current use in functional analysis.

Definition 41 Let X and Y be normed spaces and let $(T_n)$ be a sequence of operators $T_n \in B(X, Y)$.
(i) $(T_n)$ is uniformly convergent if it converges in the norm.
(ii) $(T_n)$ is strongly convergent if $(T_n x)$ converges strongly for every $x \in X$.
(iii) $(T_n)$ is weakly convergent if $(T_n x)$ converges weakly for every $x \in X$.

Remark Clearly we have: uniform convergence $\Rightarrow$ strong convergence $\Rightarrow$ weak convergence, and the limits are the same in all three cases. However, the converses are in general not true.

Homotopy Groups

The most elementary and obvious property of a topological space X is the number of connected components it has. The next such property, in a certain sense, is the number of holes X has. There
are higher analogues of these, called the homotopy groups, which are topological invariants, that is, they are invariant under homeomorphisms. They play important roles in many topological considerations in field theory and other topics of mathematical physics. The articles Topological Defects and Their Homotopy Classification and Electric–Magnetic Duality contain some examples.

Definition 42 Given a topological space X, the zeroth homotopy set, denoted $\pi_0(X)$, is the set of connected components of X. One sometimes writes $\pi_0(X) = 0$ if X is connected.

To define the fundamental group of X, or $\pi_1(X)$, we shall need the concept of closed loops, which we shall find useful in other ways too. For simplicity, we shall consider based loops (that is, loops passing through a fixed point in X). It seems that in most applications, these are the relevant ones. One could consider loops of various smoothness (when X is a manifold), but in view of applications to quantum field theory, we shall consider continuous loops, which are also the ones relevant for topology.

Definition 43 Given a topological space X and a point $x_0 \in X$, a (closed) (based) loop is a continuous function of the parametrized circle to X:

$$\xi : [0, 2\pi] \to X$$

satisfying $\xi(0) = \xi(2\pi) = x_0$.

Definition 44 Given a connected topological space X and a point $x_0 \in X$, the space of all closed based loops is called the (parametrized based) loop space of X, denoted $\Omega X$.

Remarks
(i) The loop space $\Omega X$ inherits the relative compact–open topology from the space of continuous maps from the closed interval $[0, 2\pi]$ to X. It also has a natural base point: the constant function mapping all of $[0, 2\pi]$ to $x_0$. Hence it is easy to iterate the construction and define $\Omega^k X$, $k \ge 1$.
(ii) Here we have chosen to parametrize the circle by $[0, 2\pi]$, as is more natural if we think in terms of the phase angle. We could easily have chosen the unit interval $[0, 1]$ instead. This would perhaps harmonize better with our previous definition of paths and the definitions of homotopies below.

Proposition 7 The fundamental group of a topological space X, denoted $\pi_1(X)$, consists of classes of closed loops in X which cannot be continuously deformed into one another while preserving the base point.

Definition 45 A space X is called simply connected if $\pi_1(X)$ is trivial.

To define the higher homotopy groups, let us go into a little detail about homotopy.

Definition 46 Given two topological spaces X and Y, and maps

$$p, q : X \to Y$$

we say that h is a homotopy between the maps p, q if

$$h : X \times I \to Y$$

is a continuous map such that $h(x, 0) = p(x)$, $h(x, 1) = q(x)$, where I is the unit interval $[0, 1]$. In this case, we write $p \simeq q$.

Definition 47 A map $f : X \to Y$ is a homotopy equivalence if there exists a map $g : Y \to X$ such that $g \circ f \simeq \mathrm{id}_X$ and $f \circ g \simeq \mathrm{id}_Y$.

Remark This is an equivalence relation.

Definition 48 For a topological space X with base point $x_0$, we define $\pi_n(X)$, $n \ge 0$ as the set of homotopy equivalence classes of based maps from the n-sphere $S^n$ to X.

Remark This coincides with the previous definitions for $\pi_0$ and $\pi_1$.

There is a very nice relation between homotopy classes and loop spaces.

Proposition 8 $\pi_n(X) = \pi_{n-1}(\Omega X) = \cdots = \pi_0(\Omega^n X)$.

Remarks
(i) When we consider the gauge group G in a Yang–Mills theory, its fundamental group classifies the monopoles that can occur in the theory.
(ii) For $n \ge 1$, $\pi_n(X)$ is a group, the group action coming from the joining of two loops together to form a new loop. On the other hand, $\pi_0(X)$ in general is not a group. However, when X is a Lie group, then $\pi_0(X)$ inherits a group structure from X, because it can be identified with the quotient group of X by its identity-connected component. For example, the two components of O(3) can be identified with the two elements of the group $Z_2$, the component where the determinant equals 1 corresponding to 0 in $Z_2$ and the component where the determinant equals $-1$ corresponding to 1 in $Z_2$.
(iii) For $n \ge 2$, the group $\pi_n(X)$ is always abelian.
(iv) Examples of nonabelian $\pi_1$ are the fundamental groups of some Riemann surfaces.
(v) Since $\pi_1$ is not necessarily abelian, much of the direct-sum notation we use for the homotopy
groups should more correctly be written multiplicatively. However, in most literature in mathematical physics, the additive notation seems to be preferred.

Examples
(i) $\pi_n(X \times Y) = \pi_n(X) + \pi_n(Y)$, $n \ge 1$.
(ii) For the spheres, we have the following results:

$$\pi_i(S^n) = \begin{cases} 0 & \text{if } i < n \\ Z & \text{if } i = n \end{cases}$$
$$\pi_i(S^1) = 0 \quad \text{if } i > 1$$
$$\pi_{n+1}(S^n) = Z_2 \quad \text{if } n \ge 3$$
$$\pi_{n+2}(S^n) = Z_2 \quad \text{if } n \ge 2$$
$$\pi_6(S^3) = Z_{12}$$

(iii) From the theory of sphere bundles, we can deduce:

$$\pi_i(S^2) = \pi_{i-1}(S^1) + \pi_i(S^3) \quad \text{if } i \ge 2$$
$$\pi_i(S^4) = \pi_{i-1}(S^3) + \pi_i(S^7) \quad \text{if } i \ge 2$$
$$\pi_i(S^8) = \pi_{i-1}(S^7) + \pi_i(S^{15}) \quad \text{if } i \ge 2$$

and the first of these relations gives the following more succinct result:

$$\pi_i(S^3) = \pi_i(S^2) \quad \text{if } i \ge 3$$

(iv) A result of Serre says that all the homotopy groups of spheres are in fact finite except $\pi_n(S^n)$ and $\pi_{4n-1}(S^{2n})$, $n \ge 1$.

Definition 49 Given a connected space X, a map $\pi : B \to X$ is called a covering if (i) $\pi(B) = X$, and (ii) for each $x \in X$, there exists an open connected neighborhood V of x such that each component of $\pi^{-1}(V)$ is open in B, and $\pi$ restricted to each component is a homeomorphism. The space B is called a covering space.

Examples
(i) The real line $R$ is a covering of the group U(1).
(ii) The group SU(2) is a double cover of the group SO(3).
(iii) The group SL(2, C) is a double cover of the Lorentz group SO(1, 3).
(iv) The group SU(2, 2) is a 4-fold cover of the conformal group in four dimensions. This local isomorphism is of great importance in twistor theory.

Remarks
(i) By considering closed loops in X and their coverings in B it is easily seen that the fundamental group $\pi_1(X)$ acts on the coverings of X. If we further assume that the action is transitive, then we have the following nice result: coverings of X are in 1–1 correspondence with normal subgroups of $\pi_1(X)$.
(ii) Given a connected space X, there always exists a unique connected simply connected covering space $\tilde{X}$, called the universal covering space. Furthermore, $\tilde{X}$ covers all the other covering spaces of X. For the higher homotopy groups, one has

$$\pi_n(X) = \pi_n(\tilde{X}), \quad n \ge 2$$

One very important class of homotopy groups are those of Lie groups. To simplify matters, we shall consider only connected groups, that is, $\pi_0(G) = 0$. Also we shall deal mainly with the classical groups, and in particular, the orthogonal and unitary groups.

Proposition 9 Suppose that G is a connected Lie group.
(i) If G is compact and semi-simple, then $\pi_1(G)$ is finite. This implies that $\tilde{G}$ is still compact.
(ii) $\pi_2(G) = 0$.
(iii) For G compact, simple, and nonabelian, $\pi_3(G) = Z$.
(iv) For G compact, simply connected, and simple, $\pi_4(G) = 0$ or $Z_2$.

Examples
(i) $\pi_1(SU(n)) = 0$.
(ii) $\pi_1(SO(n)) = Z_2$ for $n \ge 3$.
(iii) Since the unitary groups U(n) are topologically the product of SU(n) with a circle $S^1$, their homotopy groups are easily computed using the product formula. We remind ourselves that U(1) is topologically a circle and SU(2) topologically $S^3$.
(iv) For $i \ge 2$, we have:

$$\pi_i(SO(3)) = \pi_i(SU(2))$$
$$\pi_i(SO(5)) = \pi_i(Sp(2))$$
$$\pi_i(SO(6)) = \pi_i(SU(4))$$

Just for interest, and to show the richness of the subject, some isomorphisms for homotopy groups are shown in Table 1 and some homotopy groups for low SU(n) and SO(n) are listed in Table 2.

Table 1 Some isomorphisms for homotopy groups

Isomorphism                            Range
$\pi_i(SO(n)) \cong \pi_i(SO(m))$      $n, m \ge i + 2$
$\pi_i(SU(n)) \cong \pi_i(SU(m))$      $n, m \ge \frac{1}{2}(i + 1)$
$\pi_i(Sp(n)) \cong \pi_i(Sp(m))$      $n, m \ge \frac{1}{4}(i - 1)$
$\pi_i(G_2) \cong \pi_i(SO(7))$        $2 \le i \le 5$
$\pi_i(F_4) \cong \pi_i(SO(9))$        $2 \le i \le 6$
$\pi_i(SO(9)) \cong \pi_i(SO(7))$      $i \le 13$
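The statement above that SU(2) is a double cover of SO(3) can be checked numerically in a small sketch. It assumes the standard construction $R_{ij} = \frac{1}{2}\,\mathrm{tr}(\sigma_i U \sigma_j U^\dagger)$ with the Pauli matrices $\sigma_i$, and a parametrization of SU(2) by unit quaternions; the helper names are hypothetical. The sketch verifies that the image is a rotation matrix and that $U$ and $-U$ map to the same rotation. It does not prove that these are the only two preimages.

```python
import math

# Pauli matrices as 2x2 nested lists.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def su2(a, b, c, d):
    """SU(2) matrix [[alpha, beta], [-conj(beta), conj(alpha)]] from a normalized quaternion."""
    norm = math.sqrt(a * a + b * b + c * c + d * d)
    a, b, c, d = (t / norm for t in (a, b, c, d))
    return [[complex(a, b), complex(c, d)], [complex(-c, d), complex(a, -b)]]

def negate(u):
    return [[-z for z in row] for row in u]

def rotation(u):
    """R_ij = (1/2) tr(sigma_i U sigma_j U^dagger)."""
    ud = dagger(u)
    r = []
    for i in range(3):
        row = []
        for j in range(3):
            m = matmul(SIGMA[i], matmul(u, matmul(SIGMA[j], ud)))
            row.append(0.5 * (m[0][0] + m[1][1]).real)
        r.append(row)
    return r

def det3(r):
    return (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
            - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
            + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))

def is_rotation(r, tol=1e-12):
    """True if r is orthogonal with determinant +1, within tol."""
    for i in range(3):
        for j in range(3):
            dot = sum(r[i][k] * r[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return abs(det3(r) - 1.0) <= tol

u = su2(1.0, 2.0, 3.0, 4.0)
r_plus, r_minus = rotation(u), rotation(negate(u))
same = all(abs(r_plus[i][j] - r_minus[i][j]) < 1e-12 for i in range(3) for j in range(3))
print(is_rotation(r_plus), same)
```

The two signs cancel because $U$ enters the formula quadratically, which is the algebraic reason the map is two-to-one.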
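Table 2 Some homotopy groups for low SU(n) and SO(n)

         $\pi_4$  $\pi_5$  $\pi_6$    $\pi_7$    $\pi_8$              $\pi_9$              $\pi_{10}$
SU(2)    $Z_2$    $Z_2$    $Z_{12}$   $Z_2$      $Z_2$                $Z_3$                $Z_{15}$
SU(3)    0        $Z$      $Z_6$      0          $Z_{12}$             $Z_3$                $Z_{30}$
SU(4)    0        $Z$      0          $Z$        $Z_{24}$             $Z_2$                $Z_{120} + Z_2$
SU(5)    0        $Z$      0          $Z$        0                    $Z$                  $Z_{120}$
SU(6)    0        $Z$      0          $Z$        0                    $Z$                  $Z_3$
SO(5)    $Z_2$    $Z_2$    0          $Z$        0                    0                    $Z_{120}$
SO(6)    0        $Z$      0          $Z$        $Z_{24}$             $Z_2$                $Z_{120} + Z_2$
SO(7)    0        0        0          $Z$        $Z_2 + Z_2$          $Z_2 + Z_2$          $Z_{24}$
SO(8)    0        0        0          $Z + Z$    $Z_2 + Z_2 + Z_2$    $Z_2 + Z_2 + Z_2$    $Z_{24} + Z_{24}$
SO(9)    0        0        0          $Z$        $Z_2 + Z_2$          $Z_2 + Z_2$          $Z_{24}$
SO(10)   0        0        0          $Z$        $Z_2$                $Z + Z_2$            $Z_{12}$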

Appendix: A Mathematician’s Basic Toolkit

The following is a drastically condensed list, most of which is what a mathematics undergraduate learns in the first few weeks. The rest is included for easy reference. These notations and concepts are used universally in mathematical writing. We have not endeavored to arrange the material in a logical order. Furthermore, given structures such as sets, groups, etc., one can usually define “substructures” such as subsets, subgroups, etc., in a straightforward manner. We shall therefore not spell this out.

Sets

$A \cup B = \{x : x \in A \text{ or } x \in B\}$          union
$A \cap B = \{x : x \in A \text{ and } x \in B\}$         intersection
$A \setminus B = \{x : x \in A \text{ and } x \notin B\}$   complement
$A \times B = \{(x, y) : x \in A,\ y \in B\}$             Cartesian product

Maps

1. A map or mapping $f : A \to B$ is an assignment of an element $f(x)$ of B for every $x \in A$.
2. A map $f : A \to B$ is injective if $f(x) = f(y) \Rightarrow x = y$. This is sometimes called a 1–1 map, a term to be avoided.
3. A map $f : A \to B$ is surjective if for every $y \in B$ there exists an $x \in A$ such that $y = f(x)$. This is sometimes called an “onto” map.
4. A map $f : A \to B$ is bijective if it is both surjective and injective. This is also sometimes called a 1–1 map, a term to be equally avoided.
5. For any map $f : A \to B$ and any subset $C \subset B$, the inverse image $f^{-1}(C) = \{x : f(x) \in C\} \subset A$ is always defined, although, of course, it can be empty. On the other hand, the map $f^{-1}$ is defined if and only if f is bijective.
6. A map from a set to either the real or complex numbers is usually called a function.
7. A map between vector spaces, and more particularly normed spaces (including Hilbert spaces), is called an operator. Most often, one considers linear operators.
8. An operator from a vector space to its field of scalars is called a functional. Again, one considers almost exclusively linear functionals.

Relations

1. A relation $\sim$ on a set A is a subset $R \subset A \times A$. We say that $x \sim y$ if $(x, y) \in R$.
2. We shall only be interested in equivalence relations. An equivalence relation $\sim$ is one satisfying, for all $x, y, z \in A$:
(a) $x \sim x$ (“reflexive”),
(b) $x \sim y \Rightarrow y \sim x$ (“symmetric”),
(c) $x \sim y,\ y \sim z \Rightarrow x \sim z$ (“transitive”).
3. If $\sim$ is an equivalence relation in A, then for each $x \in A$, we can define its equivalence class:

$$[x] = \{y \in A : y \sim x\}$$

It can be shown that equivalence classes are nonempty, any two equivalence classes are either equal or disjoint, and they together partition the set A. Subgroup equivalence classes are called cosets.
4. An element of an equivalence class is called a representative.

Groups

A group is a set G with a map, called multiplication or group law

$$G \times G \to G$$
$$(x, y) \mapsto xy$$

satisfying
1. $(xy)z = x(yz)$, $\forall x, y, z \in G$ (“associative”);
2. there exists a neutral element (or identity) 1 such that $1x = x1 = x$, $\forall x \in G$; and
3. every element $x \in G$ has an inverse $x^{-1}$, that is, $x x^{-1} = x^{-1} x = 1$.

A map such as the multiplication in the definition is an example of a binary operation. Note that we have denoted the group law as multiplication here. It is usual to denote it additively if the group is abelian, that is, if $xy = yx$, $\forall x, y \in G$. In this case, we may write the condition as $x + y = y + x$, and call the identity element 0.

Rings

A ring is a set R equipped with two binary operations, $x + y$ called addition, and $xy$ called multiplication, such that

1. R is an abelian group under addition;
2. the multiplication is associative; and
3. $(x + y)z = xz + yz$, $x(y + z) = xy + xz$, $\forall x, y, z \in R$ (“distributive”).

If the multiplication is commutative ($xy = yx$) then the ring is said to be commutative. A ring may contain a multiplicative identity, in which case it is called a ring with unit element.

An ideal I of R is a subring of R, satisfying in addition

$$r \in R,\ a \in I \Rightarrow ra \in I,\ ar \in I$$

One can define in an obvious fashion a left-ideal and a right-ideal. The above definition will then be for a two-sided ideal.

Modules

Given a ring R, an R-module is an abelian group M, together with an operation, $M \times R \to M$, denoted multiplicatively, satisfying, for $x, y \in M$, $r, s \in R$,

1. $(x + y)r = xr + yr$,
2. $x(r + s) = xr + xs$,
3. $x(rs) = (xr)s$, and
4. $x1 = x$.

The term right R-module is sometimes used, to distinguish it from obviously defined left R-modules.

Fields

A field F is a commutative ring in which every nonzero element is invertible.

The additive identity 0 is never invertible, unless $0 = 1$, so it is usual to assume that a field has at least two elements, 0 and 1.

The most common fields we come across are, of course, the number fields: the rationals, the reals, and the complex numbers.

Vector Spaces

A vector space, or sometimes linear space, V, over a field F, is an abelian group, written additively, with a map $F \times V \to V$ such that, for $x, y \in V$, $\alpha, \beta \in F$,

1. $\alpha(x + y) = \alpha x + \alpha y$ (“linearity”),
2. $(\alpha + \beta)x = \alpha x + \beta x$,
3. $(\alpha\beta)x = \alpha(\beta x)$, and
4. $1x = x$.

A vector space is then a right (or left) F-module. The elements of V are called vectors, and those of F scalars.

Algebras

An algebra A over a field F is a ring which is a vector space over F, such that

$$\alpha(ab) = (\alpha a)b = a(\alpha b), \quad \alpha \in F,\ a, b \in A$$

Note that in some older literature, particularly the Russian school, an algebra of operators is called a ring of operators.
A
Abelian and Nonabelian Gauge Theories Using Differential Forms
A C Hirshfeld, Universität Dortmund, Dortmund, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Quantum electrodynamics is the theory of the electromagnetic interactions of photons and electrons. When attempting to generalize this theory to other interactions it turns out to be necessary to identify its essential components. The essential properties of electrodynamics are contained in its formulation as an "abelian gauge theory." The generalization to include other interactions is then reduced to incorporating the structure of nonabelian groups. This becomes particularly clear when we formulate the theory in the language of differential forms.

Here we first present the formulation of electrodynamics using differential forms. The electromagnetic fields are introduced via the Lorentz force equation. They are recognized as the components of a differential 2-form. This form fulfills two differential conditions, which are equivalent to Maxwell's equations. These are expressed with the help of a differential operator and its Hermitian conjugate, the codifferential operator. We consider the effects of charge conservation and introduce electromagnetic potentials, which are defined up to gauge transformations. We finally consider Weyl's argument for the existence of the electromagnetic interaction as a consequence of the local phase invariance of the electron wave function.

We then go on to present the nonabelian generalization. The gauge bosons appear in a theory with fermions by requiring invariance of the theory with respect to local gauge transformations. When the fermions group into symmetry multiplets this gives rise to a gauge group SU(N) involving $N^2 - 1$ gauge bosons mediating the interaction, where $N^2 - 1$ is the dimension of the Lie algebra su(N). The interaction arises through the necessity of replacing the usual derivatives by covariant derivatives, which transform in a natural way in order to preserve the gauge invariance. The covariant derivatives involve the gauge potentials, whose transformation properties are dictated by those of the covariant derivative.

Whereas for an abelian gauge theory such as electromagnetism scalar-valued p-forms are sufficient (actually only p = 1, 2), a nonabelian gauge theory involves the use of Lie-algebra-valued p-forms. These are introduced and used to construct the Yang–Mills action, which involves the field strength tensor, itself determined from the gauge potentials. This action leads to the Yang–Mills equations for the gauge potentials, which are the nonabelian generalizations of the Maxwell equations.

Relativistic Kinematics

The trajectory of a mass point is described as $x^\mu(\tau)$, where $\tau$ is the invariant proper time interval:

\[ d\tau^2 = dt^2 - d\mathbf{x}\cdot d\mathbf{x} = dt^2(1 - v^2) \qquad [1] \]

with $\mathbf{v} = d\mathbf{x}/dt$. With the abbreviation $\gamma = (1 - v^2)^{-1/2}$ this yields $d\tau = (1/\gamma)\,dt$.

The 4-velocity of a point is defined as $u^\mu = dx^\mu/d\tau = \gamma\,(dx^\mu/dt)$. The quantity

\[ u^2 = g_{\mu\nu} u^\mu u^\nu = \frac{dx^\mu\, dx_\mu}{d\tau^2} = 1 \qquad [2] \]

is a relativistic invariant. Here

\[ g_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \qquad [3] \]

is the metric of Minkowski space.

The 4-momentum of a particle is $p^\mu = m_0 u^\mu = (\gamma m_0, \gamma m_0 \mathbf{v})$, and $p^\mu p_\mu = m_0^2$. The 4-force is

\[ f^\mu = \frac{dp^\mu}{d\tau} = \gamma \frac{dp^\mu}{dt} = \gamma\left(\frac{dp^0}{dt}, \mathbf{f}\right) \qquad [4] \]

with the 3-force

\[ \mathbf{f} = \frac{d(\gamma m_0 \mathbf{v})}{dt} \qquad [5] \]
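The normalizations in eqns [2] and [4] can be checked numerically. The following is a minimal sketch in Python, assuming units with c = 1; the sample velocity, the rest mass value, and the helper names `four_velocity` and `minkowski_square` are illustrative choices and not part of the article.

```python
import math

def four_velocity(v):
    """Return (gamma, u) for a 3-velocity v, in units with c = 1."""
    v2 = sum(c * c for c in v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    return gamma, (gamma,) + tuple(gamma * c for c in v)

def minkowski_square(a):
    """Compute a^mu a_mu with the metric diag(1, -1, -1, -1) of eqn [3]."""
    return a[0] ** 2 - sum(c * c for c in a[1:])

gamma, u = four_velocity((0.3, 0.4, 0.0))   # sample velocity with |v| = 0.5
m0 = 2.0                                    # hypothetical rest mass
p = tuple(m0 * c for c in u)                # 4-momentum p = m0 * u

print(gamma)                 # Lorentz factor
print(minkowski_square(u))   # equals 1 up to rounding, as in eqn [2]
print(minkowski_square(p))   # equals m0**2 up to rounding
```

Both invariants come out independent of the chosen velocity, which is the content of the statement that $u^2$ is a relativistic invariant.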

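Differentiating $p^2 = m_0^2$ with respect to $\tau$ yields

\[ 2 p_\mu f^\mu = 2 m_0 \gamma^2 \left( \frac{dp^0}{dt} - \mathbf{f}\cdot\mathbf{v} \right) = 0 \qquad [6] \]

or

\[ \frac{dp^0}{dt} = \mathbf{f}\cdot\mathbf{v} = \mathbf{f}\cdot\frac{d\mathbf{x}}{dt} \qquad [7] \]

This says that

\[ dp^0 = \mathbf{f}\cdot d\mathbf{x} = dW \qquad [8] \]

where W is the work done and $p^0$ is the energy.

For a charged particle, the Lorentz force is

\[ \mathbf{f} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) \qquad [9] \]

where q is the charge of the particle, $\mathbf{E}$ is the electric, and $\mathbf{B}$ the magnetic field strength. Since $\mathbf{f}\cdot\mathbf{v} = q\mathbf{E}\cdot\mathbf{v}$, we have the four-dimensional form of the Lorentz force:

\[ f^\mu = \gamma q(\mathbf{E}\cdot\mathbf{v},\ \mathbf{E} + \mathbf{v}\times\mathbf{B}) \qquad [10] \]

The Lorentz Force Equation with Differential Forms

We write the Lorentz force equation as an equation for a differential form $f = f_\mu dx^\mu$, with $f_\mu = g_{\mu\nu} f^\nu$. The velocity-dependent Lorentz force is

\[ f = q\, i_u F \qquad [11] \]

with

\[ u = \gamma\left( \frac{\partial}{\partial t} + v_x \frac{\partial}{\partial x} + v_y \frac{\partial}{\partial y} + v_z \frac{\partial}{\partial z} \right) \qquad [12] \]

the 4-velocity and F the electromagnetic field strength:

\[ F = E \wedge dt + B \qquad [13] \]

where E is a 1-form in three dimensions,

\[ E = E_x\, dx + E_y\, dy + E_z\, dz \qquad [14] \]

and B is a 2-form in three dimensions,

\[ B = B_x\, dy \wedge dz + B_y\, dz \wedge dx + B_z\, dx \wedge dy \qquad [15] \]

The symbol $i_u$ indicates a contraction of a 2-form with a vector, which is defined as

\[ i_u F(v) = F(u, v) \qquad [16] \]

for an arbitrary vector v. The contraction of a 2-form with a vector yields a 1-form.

It is easily seen that a 2-form can be expressed in terms of a polar vector and an axial vector: if it is to be invariant with respect to parity transformations

\[ t \to t, \quad x \to -x, \quad y \to -y, \quad z \to -z \qquad [17] \]

the fields in eqn [13] must transform as

\[ \mathbf{E} \to -\mathbf{E}, \qquad \mathbf{B} \to \mathbf{B} \qquad [18] \]

Now we check the validity of eqn [11]. We have

\[ f = q\, i_u F = \gamma q(\mathbf{v}\cdot\mathbf{E})\,dt - \gamma q\left[ (E_x + (\mathbf{v}\times\mathbf{B})_x)\,dx + (E_y + (\mathbf{v}\times\mathbf{B})_y)\,dy + (E_z + (\mathbf{v}\times\mathbf{B})_z)\,dz \right] \qquad [19] \]

in agreement with eqn [10]. We remember to change the signs in $E_x = -E^x$, $B_x = -B^x$, etc.

The Codifferential Operator

The space of p-forms on an n-dimensional manifold is an

\[ \binom{n}{p} = \binom{n}{n-p} = \frac{n!}{(n-p)!\,p!} \qquad [20] \]

dimensional vector space. The space of p-forms is thus isomorphic to the space of (n − p)-forms. The Hodge dual operator maps the p-forms into the (n − p)-forms, and is defined by

\[ \alpha \wedge {*\beta} = \langle \alpha, \beta \rangle\, dx^1 \wedge \cdots \wedge dx^n \qquad [21] \]

Here $\langle \alpha, \beta \rangle$ is the scalar product of two p-forms:

\[ \langle \alpha, \beta \rangle = \alpha_{i_1 \cdots i_p}\, \beta^{i_1 \cdots i_p} \qquad [22] \]

where $\alpha_{i_1 \cdots i_p}$ are the coefficients of the form $\alpha$,

\[ \alpha = \alpha_{i_1 \cdots i_p}\, dx^{i_1} \wedge \cdots \wedge dx^{i_p} \qquad [23] \]

$\beta_{j_1 \cdots j_p}$ are the coefficients of the form $\beta$,

\[ \beta = \beta_{j_1 \cdots j_p}\, dx^{j_1} \wedge \cdots \wedge dx^{j_p} \qquad [24] \]

and

\[ \beta^{i_1 \cdots i_p} = g^{i_1 j_1} \cdots g^{i_p j_p}\, \beta_{j_1 \cdots j_p} \qquad [25] \]

The indices satisfy $i_1 < \cdots < i_p$ and $j_1 < \cdots < j_p$.

The basis elements are orthogonal with respect to this scalar product, and

\[ \langle dx^{i_1} \wedge \cdots \wedge dx^{i_p},\ dx^{i_1} \wedge \cdots \wedge dx^{i_p} \rangle = g^{i_1 i_1} \cdots g^{i_p i_p} \qquad [26] \]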
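As an illustration of eqns [20] and [26], the basis p-forms of four-dimensional Minkowski space can be enumerated together with their norms. This is a minimal sketch in Python under the assumption of the diagonal metric of eqn [3]; the function name `basis_norms` and the index convention (0 for t, 1 to 3 for x, y, z) are illustrative only.

```python
from itertools import combinations
from math import comb

# Diagonal of g^{mu nu}; for this metric it coincides with that of g_{mu nu}.
G_DIAG = (1, -1, -1, -1)

def basis_norms(p, g_diag=G_DIAG):
    """Map each ordered index tuple i1 < ... < ip to the norm given by eqn [26]."""
    norms = {}
    for idx in combinations(range(len(g_diag)), p):
        value = 1
        for i in idx:
            value *= g_diag[i]
        norms[idx] = value
    return norms

for p in range(5):
    norms = basis_norms(p)
    # The number of basis p-forms matches the dimension formula of eqn [20].
    assert len(norms) == comb(4, p) == comb(4, 4 - p)
    print(p, len(norms), sorted(norms.values()))
```

For p = 2 the sketch gives six basis forms: the three containing dt have norm −1 and the three purely spatial ones have norm +1, which is the sign pattern that separates the electric and magnetic parts of a 2-form.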

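The Hodge dual has the property that

\[ *\left( dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)} \right) = g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(p)\sigma(p)}\, (\operatorname{sign} \sigma)\, dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)} \qquad [27] \]

where $\sigma$ is a permutation of the indices (1, ..., n), $\sigma(1) < \cdots < \sigma(p)$, and $\sigma(p+1) < \cdots < \sigma(n)$. We also have

\[ *\left( dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)} \right) = g^{\sigma(p+1)\sigma(p+1)} \cdots g^{\sigma(n)\sigma(n)}\, (-1)^{p(n-p)}\, (\operatorname{sign} \sigma)\, dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)} \qquad [28] \]

We therefore find that the application of the Hodge dual to a p-form twice yields

\[ {*}{*}\left( dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)} \right) = g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(p)\sigma(p)}\, (\operatorname{sign} \sigma)\, {*}\left( dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)} \right) = g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(n)\sigma(n)}\, (-1)^{p(n-p)}\, dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)} \qquad [29] \]

or

\[ {*}{*} = (-1)^{p(n-p)}\, (-1)^{\operatorname{Ind} g}\, \mathrm{Id} \qquad [30] \]

where Ind g is the number of times (−1) occurs along the diagonal of g.

Now let $\alpha$ be a (p − 1)-form, and $\beta$ a p-form. Then $d{*\beta}$ is an (n − p + 1)-form, and

\[ \begin{aligned} d(\alpha \wedge {*\beta}) &= d\alpha \wedge {*\beta} + (-1)^{p-1}\, \alpha \wedge d{*\beta} \\ &= d\alpha \wedge {*\beta} + (-1)^{(p-1)} (-1)^{(n-p+1)(p-1)} (-1)^{\operatorname{Ind} g}\, \alpha \wedge ({*}{*})\, d{*\beta} \\ &= d\alpha \wedge {*\beta} + (-1)^{n(p-1)} (-1)^{\operatorname{Ind} g}\, \alpha \wedge {*}({*}d{*}\beta) \end{aligned} \qquad [31] \]

We then have

\[ (d\alpha, \beta) - (\alpha, d^\dagger \beta) = \int_M d(\alpha \wedge {*\beta}) \qquad [32] \]

with

\[ d^\dagger = -(-1)^{n(p-1)} (-1)^{\operatorname{Ind} g}\, {*}d{*} \qquad [33] \]

We are here using the scalar product of two p-forms

\[ (\alpha, \beta) := \int_M (\alpha \wedge {*\beta}) \qquad [34] \]

With the help of Stokes' theorem the last integral in eqn [32] may be turned into a surface term at infinity, which vanishes for $\alpha$ and $\beta$ with compact support. $d^\dagger$ is the adjoint operator to d with respect to the scalar product ( , ). Whereas the differential operator d maps p-forms into (p + 1)-forms, the codifferential operator $d^\dagger$ maps p-forms into (p − 1)-forms.

The relation $d^2 = 0$ leads to

\[ (d^\dagger)^2 \propto ({*}d{*})({*}d{*}) \propto {*}d^2{*} = 0 \qquad [35] \]

This fact plays an essential role in connection with the conservation laws.

Finally, we want to obtain a coordinate expression for $d^\dagger \beta$. Indeed $d^\dagger \beta = -\operatorname{Div} \beta$ for

\[ (\operatorname{Div} \beta)^K = \frac{\partial \beta^{jK}}{\partial x^j} \qquad [36] \]

where K is the multi-index of the coefficients in $\beta = \beta_K\, dx^K$, with $K = (k_1, \ldots, k_p)$ taken in the order $k_1 < \cdots < k_p$. We will show that $(\alpha, d^\dagger \beta) = -(\alpha, \operatorname{Div} \beta)$ for an arbitrary (p − 1)-form $\alpha$. It is a fact that

\[ (\alpha, d^\dagger \beta) = (d\alpha, \beta) = \int (d\alpha)_I\, \beta^I\, {*1} \qquad [37] \]

Now we have the coordinate expressions

\[ d\alpha = (d\alpha_L) \wedge dx^L \qquad [38] \]

and $(dx^L)_K = \delta^L_K$. It follows that

\[ (d\alpha)_I = (d\alpha_L \wedge dx^L)_I = \delta^{jK}_I\, \frac{\partial \alpha_L}{\partial x^j}\, \delta^L_K \qquad [39] \]

or

\[ (d\alpha)_I = \delta^{jK}_I\, \frac{\partial \alpha_K}{\partial x^j} \qquad [40] \]

Here we use

\[ (\alpha \wedge \beta)_I = \delta^{KL}_I\, \alpha_K \beta_L \qquad [41] \]

where

\[ \delta^{KL}_I = \begin{cases} 1 & \text{if } (KL) \text{ is an even permutation of } I \\ -1 & \text{if } (KL) \text{ is an odd permutation of } I \\ 0 & \text{otherwise} \end{cases} \qquad [42] \]

Use of the Leibniz rule yields

\[ \int (d\alpha)_I\, \beta^I\, {*1} = \int \delta^{jK}_I\, \frac{\partial \alpha_K}{\partial x^j}\, \beta^I\, {*1} = \int \frac{\partial (\delta^{jK}_I\, \alpha_K \beta^I)}{\partial x^j}\, {*1} - \int \delta^{jK}_I\, \alpha_K\, \frac{\partial \beta^I}{\partial x^j}\, {*1} \qquad [43] \]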

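The first term corresponds to a surface integration and we can neglect it. We then have $\delta^{jK}_I \beta^I = \beta^{jK}$ from the antisymmetry of $\beta$, so that

\[ (\alpha, d^\dagger \beta) = -\int \alpha_K\, \frac{\partial \beta^{jK}}{\partial x^j}\, {*1} = -(\alpha, \operatorname{Div} \beta) \qquad [44] \]

The Maxwell Equations

The Maxwell equations become remarkably concise when expressed in terms of differential forms, namely

\[ dF = 0, \qquad d^\dagger F = j \qquad [45] \]

where F is the field strength and j is the current density. We wish to demonstrate this. We use a (3 + 1)-separation of the exterior derivative into a timelike and a spacelike part:

\[ d = \mathbf{d} + dt \wedge \frac{\partial}{\partial t} \qquad [46] \]

We then get

\[ dF = \left( \mathbf{d}E + \frac{\partial B}{\partial t} \right) \wedge dt + \mathbf{d}B = 0 \qquad [47] \]

By comparing coefficients, we arrive at

\[ \mathbf{d}E = -\frac{\partial B}{\partial t}, \qquad \mathbf{d}B = 0 \qquad [48] \]

In vector notation

\[ \operatorname{curl} \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad \operatorname{div} \mathbf{B} = 0 \qquad [49] \]

the usual form of the homogeneous Maxwell equations.

By direct application of the formula [27], one finds

\[ {*F} = -{\star B} \wedge dt + {\star E} \qquad [50] \]

where $\star$ means the Hodge dual in three space dimensions. One finds

\[ d{*F} = \mathbf{d}{\star E} - \left( \mathbf{d}{\star B} - \frac{\partial {\star E}}{\partial t} \right) \wedge dt \qquad [51] \]

Therefore,

\[ \begin{aligned} d{*F} = {} & (\operatorname{div} \mathbf{E})\, dx \wedge dy \wedge dz + \left( (\operatorname{curl} \mathbf{B})^x - \frac{\partial E^x}{\partial t} \right) dy \wedge dz \wedge dt \\ & + \left( (\operatorname{curl} \mathbf{B})^y - \frac{\partial E^y}{\partial t} \right) dz \wedge dx \wedge dt + \left( (\operatorname{curl} \mathbf{B})^z - \frac{\partial E^z}{\partial t} \right) dx \wedge dy \wedge dt \end{aligned} \qquad [52] \]

We apply again the Hodge dual:

\[ {*}d{*}F = (\operatorname{div} \mathbf{E})\, dt + \left( (\operatorname{curl} \mathbf{B})_x - \frac{\partial E_x}{\partial t} \right) dx + \left( (\operatorname{curl} \mathbf{B})_y - \frac{\partial E_y}{\partial t} \right) dy + \left( (\operatorname{curl} \mathbf{B})_z - \frac{\partial E_z}{\partial t} \right) dz \qquad [53] \]

In Minkowski space the expression ${*}d{*}$ equals the codifferential. Therefore, the equation $d^\dagger F = {*}d{*}F = j$ holds, with j given by $j^\mu = (\rho, \mathbf{J})$, which is equivalent to

\[ \operatorname{div} \mathbf{E} = \rho, \qquad \operatorname{curl} \mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} = \mathbf{J} \qquad [54] \]

the inhomogeneous Maxwell equations.

Current Conservation

The electromagnetic 4-current is

\[ j^\mu = \rho_0 u^\mu = (\gamma \rho_0, \gamma \rho_0 \mathbf{v}) = (\rho, \mathbf{J}) \qquad [55] \]

where $\rho$ is the charge density and $\mathbf{J}$ the current density. This corresponds to a 1-form

\[ j = \rho\, dt - J_x\, dx - J_y\, dy - J_z\, dz \qquad [56] \]

The Hodge dual is ${*j} = \rho_3 - j_2 \wedge dt$, with the 3-form $\rho_3 = \rho\, dx \wedge dy \wedge dz$, and the 2-form

\[ j_2 = J_x\, dy \wedge dz + J_y\, dz \wedge dx + J_z\, dx \wedge dy \qquad [57] \]

From the Maxwell equation $d^\dagger F = j$, it follows that

\[ (d^\dagger)^2 F = d^\dagger j = 0 \qquad [58] \]

that is

\[ {*}d({*j}) = {*}d(\rho_3 - j_2 \wedge dt) = {*}(d\rho_3 - \mathbf{d}j_2 \wedge dt) = {*}\left[ \left( \frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{J} \right) dt \wedge dx \wedge dy \wedge dz \right] \propto \frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{J} = 0 \qquad [59] \]

This is the "continuity equation."

The total charge inside a volume V is $Q = \int_V \rho\, dV$, therefore

\[ \frac{dQ}{dt} = \frac{d}{dt} \int_V \rho\, dV = -\oint_{\partial V} \mathbf{J} \cdot \mathbf{n}\, dS \qquad [60] \]

where $\partial V$ is the surface which encloses the volume V, dS is the surface element, and $\mathbf{n}$ is the normal vector to this surface. This is current conservation.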

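The Gauge Potential

The "Poincaré lemma" tells us that dF = 0 implies F = dA, with the 4-potential A:

\[ A = -\phi\, dt + \mathbf{A} \qquad [61] \]

and the vector potential $\mathbf{A} = A_x\, dx + A_y\, dy + A_z\, dz$. From

\[ F = E \wedge dt + B = \left( \mathbf{d} + dt \wedge \frac{\partial}{\partial t} \right)(-\phi\, dt + \mathbf{A}) = -\mathbf{d}\phi \wedge dt + \mathbf{d}\mathbf{A} + dt \wedge \frac{\partial \mathbf{A}}{\partial t} \qquad [62] \]

it follows by comparing coefficients that

\[ E = -\mathbf{d}\phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad B = \mathbf{d}\mathbf{A} \qquad [63] \]

In vector notation this is

\[ \mathbf{E} = -\operatorname{grad} \phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \operatorname{curl} \mathbf{A} \qquad [64] \]

The 4-potential is determined up to a gauge function $\lambda$:

\[ A' = A + d\lambda \qquad [65] \]

This gauge freedom has no influence on the observable quantities E and B:

\[ F' = dA' = dA + d^2\lambda = dA = F \qquad [66] \]

The Laplace operator is $\Delta = (d + d^\dagger)^2 = d d^\dagger + d^\dagger d$, so when the 4-potential A fulfills the condition $d^\dagger A = 0$, we have

\[ \Delta A = d^\dagger dA = d^\dagger F = j \qquad [67] \]

the "classical wave equation." The condition $d^\dagger A = 0$ is called the "Lorentz gauge condition." This condition can always be fulfilled by using the gauge freedom: $d^\dagger (A + d\lambda) = 0$ is fulfilled when $d^\dagger d\lambda = \Delta\lambda = -d^\dagger A$, where we have used the fact that $d^\dagger \lambda = 0$ for functions. That is to say, $d^\dagger A = 0$ is fulfilled when $\lambda$ is a solution of the inhomogeneous wave equation.

Gauge Invariance

In quantum mechanics, the electron is described by a wave function which is determined up to a free phase. Indeed, at every point in space this phase can be chosen arbitrarily:

\[ \begin{aligned} \psi(x) &\to \psi'(x) = \exp\{i\alpha(x)\}\, \psi(x) \\ \bar\psi(x) &\to \bar\psi'(x) = \bar\psi(x)\, \exp\{-i\alpha(x)\} \end{aligned} \qquad [68] \]

with the only condition being that $\alpha(x)$ is a continuous function. The gauge transformation is of the form $g = \exp\{i\alpha(x)\}$, with g an element of the abelian gauge group G = U(1). The free action is

\[ S_0 = \int L_0\, d^4x \qquad [69] \]

with

\[ L_0 = \bar\psi \left( i\gamma^\mu \partial_\mu - m \right) \psi \qquad [70] \]

the "Lagrange density." This action is not invariant under gauge transformations:

\[ L_0 \to L_0' = \bar\psi \left( i\gamma^\mu \partial_\mu - m \right) \psi - (\partial_\mu \alpha)\, \bar\psi \gamma^\mu \psi \qquad [71] \]

The undesired term can be compensated by the introduction of a gauge potential $\omega$ in a covariant derivative of $\psi$,

\[ D\psi = (d + \omega)\psi \qquad [72] \]

which has the desired transformation property $D\psi \to \exp\{i\alpha\} D\psi$ when, besides the transformation $\psi(x) \to \exp\{i\alpha(x)\}\psi(x)$ of the matter field, the gauge potential simultaneously transforms according to the gauge transformation $\omega \to \omega - i\, d\alpha$. The new Lagrange density is

\[ L = \bar\psi \left( i\gamma^\mu D_\mu - m \right) \psi = L_0 + i\omega_\mu\, \bar\psi(x) \gamma^\mu \psi(x) \qquad [73] \]

The substitution $\partial_\mu \to D_\mu$ is known to physicists; with $\omega_\mu = -iqA_\mu$ it is the ansatz of minimal coupling for taking into account electromagnetic effects: $\partial_\mu \to \partial_\mu - iqA_\mu$. The Lagrange density becomes in this notation $L = L_0 - A_\mu J^\mu$, where $J^\mu = -q\, \bar\psi \gamma^\mu \psi$.

The Lagrange density must now be completed by a kinetic term for the gauge potential and we get the complete electromagnetic Lagrange density

\[ L = L_0 - A_\mu J^\mu - \tfrac{1}{4} F_{\mu\nu} F^{\mu\nu} \qquad [74] \]

with $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$. In the action this corresponds to

\[ S = S_0 - \int_M A_\mu J^\mu\, \mathrm{vol}^4 - \frac{1}{4} \int_M F_{\mu\nu} F^{\mu\nu}\, \mathrm{vol}^4 \qquad [75] \]

We get the field equations for the potential A by demanding that the variation of the action vanishes:

\[ \delta S[A] = -\delta \int_M A_\mu J^\mu\, \mathrm{vol}^4 - \frac{1}{4}\, \delta \int_M F_{\mu\nu} F^{\mu\nu}\, \mathrm{vol}^4 \qquad [76] \]

We write now

\[ \delta \int_M A_\mu J^\mu\, \mathrm{vol}^4 = (\delta A, j) \qquad [77] \]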

and

\[ \frac{1}{4}\, \delta \int_M F_{\mu\nu} F^{\mu\nu}\, \mathrm{vol}^4 = \frac{1}{2}\, \delta \int_M F \wedge {*F} = \frac{1}{2}\, \delta (F, F) = (\delta\, dA, F) = (d\, \delta A, F) = (\delta A, d^\dagger F) \qquad [78] \]

where we have exchanged the action of $\delta$ and d. Since this holds for arbitrary variations $\delta A$ we find

\[ d^\dagger F = j \qquad [79] \]

the inhomogeneous Maxwell equation.

Nonabelian Gauge Theories

In SU(N) gauge theory the elementary particles are taken to be members of symmetry multiplets. For example, in electroweak theory the left-handed electron and the neutrino are members of an SU(2) doublet:

\[ \psi = \begin{pmatrix} e \\ \nu \end{pmatrix} \qquad [80] \]

A gauge transformation is conventionally written

\[ \psi'(x) = g^{-1}(x)\, \psi(x), \qquad \bar\psi'(x) = \bar\psi(x)\, g(x) \qquad [81] \]

with

\[ g(x) = \exp\{\theta(x)\} \qquad [82] \]

where g(x) is an element of the Lie group SU(2) and $\theta$ is an element of the Lie algebra su(2). The Lie algebra is a vector space, and its elements may be expanded in terms of a basis:

\[ \theta(x) = \theta^a(x)\, T_a \qquad [83] \]

For su(2) the basis elements are traceless and anti-Hermitian (see below); they are conventionally expressed in terms of the Pauli matrices,

\[ T_a = \frac{\sigma_a}{2i} \qquad [84] \]

with

\[ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad [85] \]

They are conventionally normalized according to

\[ \operatorname{tr}(T_a T_b) = -\tfrac{1}{2}\, \delta_{ab} \qquad [86] \]

The Dirac Lagrangian is not invariant with respect to local gauge transformations:

\[ L_0 = \bar\psi \left( i\gamma^\mu \partial_\mu - m \right) \psi \to L_0' = L_0 + i \bar\psi \gamma^\mu \left( g\, \partial_\mu g^{-1} \right) \psi \qquad [87] \]

We introduce the gauge potential

\[ \omega_\mu(x) = \omega^a_\mu(x)\, T_a \qquad [88] \]

with a gauge transformation

\[ \omega_\mu \to \omega_\mu' = g^{-1} \omega_\mu\, g + g^{-1} \partial_\mu g \qquad [89] \]

The Lagrange density is modified through a covariant derivative:

\[ \partial_\mu \psi \to D_\mu \psi = \partial_\mu \psi + \omega_\mu \psi \qquad [90] \]

The covariant derivative $D_\mu$ transforms according to

\[ D_\mu \to D_\mu' = g^{-1} D_\mu\, g \qquad [91] \]

and thus the modified Lagrange density

\[ L = \bar\psi \left( i\gamma^\mu D_\mu - m \right) \psi = L_0 + i \bar\psi \gamma^\mu \omega_\mu \psi \qquad [92] \]

is invariant with respect to local gauge transformations. The extra term in the Lagrange density is

\[ J^\mu_a A^a_\mu \qquad [93] \]

with

\[ A^a_\mu = iq\, \omega^a_\mu \qquad [94] \]

and

\[ J^\mu_a = \bar\psi \gamma^\mu T_a \psi \qquad [95] \]

In mathematical terminology $\omega$ is called a connection. The quantity A is the physicist's gauge potential. The connection is anti-Hermitian and the gauge potential Hermitian. The gauge potential also includes the coupling constant q. We will refer to both $\omega$ and A as the gauge potential, where the relation between them is given by eqn [94].

We can write the gauge potential as $A = A^a_\mu\, dx^\mu\, T_a$ or, in the SU(2) case, as

\[ A = A^1 T_1 + A^2 T_2 + A^3 T_3 \qquad [96] \]

where we see explicitly that it involves three vector fields, which couple to the electroweak currents [95] with the single coupling constant q, and which will become after symmetry breaking the three vector bosons $W^+$, $W^-$, $Z^0$ of the electroweak gauge theory. Actually, a mix of the neutral gauge boson and the photon will combine to yield the $Z^0$ boson, while the orthogonal mixture gives rise to the electromagnetic interaction, in an SU(2) × U(1) theory. At this stage,

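the gauge bosons are all massless; their masses are generated by the "Higgs' mechanism."

Lie-Algebra-Valued p-Forms

To describe nonabelian fields, we need Lie-algebra-valued p-forms:

\[ \varphi = T_a\, \varphi^a \qquad [97] \]

where $T_a$ is a generator of the Lie algebra, the index a runs over the number of generators of the Lie algebra, and the $\varphi^a$ are the usual scalar-valued p-forms. The composition in a Lie algebra is a Lie bracket, which is defined for two Lie-algebra-valued p-forms by

\[ [\varphi, \chi] := [T_a, T_b]\, \varphi^a \wedge \chi^b \qquad [98] \]

The Lie bracket in the algebra is

\[ [T_a, T_b] = f^c_{ab}\, T_c \qquad [99] \]

where $f^a_{bc}$ are the structure constants. It follows from this that

\[ [\varphi, \chi] = [T_a, T_b]\, \varphi^a \wedge \chi^b = -[T_b, T_a]\, \varphi^a \wedge \chi^b \qquad [100] \]

or

\[ [\varphi, \chi] = (-1)^{pq+1}\, [\chi, \varphi] \qquad [101] \]

when $\varphi$ is a p-form and $\chi$ is a q-form. In the special case that $T_a$ is a matrix, also the product $T_a T_b$ is defined, and from this the product of two Lie-algebra-valued p-forms

\[ \varphi \wedge \chi = T_a \varphi^a \wedge T_b \chi^b = T_a T_b\, \varphi^a \wedge \chi^b \qquad [102] \]

Now the Lie bracket is a commutator:

\[ [T_a, T_b] = T_a T_b - T_b T_a \qquad [103] \]

and

\[ [\varphi, \chi] = [T_a, T_b]\, \varphi^a \wedge \chi^b = T_a \varphi^a \wedge T_b \chi^b - (-1)^{pq}\, T_b \chi^b \wedge T_a \varphi^a = \varphi \wedge \chi - (-1)^{pq}\, \chi \wedge \varphi \qquad [104] \]

From this relation it follows that for $\varphi$ and $\chi$ odd p-forms

\[ [\varphi, \chi] = \varphi \wedge \chi + \chi \wedge \varphi \qquad [105] \]

For an odd p-form $\varphi$

\[ [\varphi, \varphi] = \varphi \wedge \varphi + \varphi \wedge \varphi = 2(\varphi \wedge \varphi) \qquad [106] \]

The Gauge Potential and the Field Strength

The generalization of the abelian relationship between the gauge potential and the field strength, F = dA, is

\[ \Omega = d\omega + \tfrac{1}{2} [\omega, \omega] = d\omega + \omega \wedge \omega \qquad [107] \]

where because $\omega$ is a 1-form we can use eqn [106]. The mathematician refers to $\Omega$ as the curvature. The physicist writes, in analogy to eqn [94],

\[ F = iq\, \Omega = \tfrac{1}{2} F^a_{\mu\nu}\, dx^\mu \wedge dx^\nu\, T_a \qquad [108] \]

One obtains for the components

\[ F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - iq f^a_{bc}\, A^b_\mu A^c_\nu \qquad [109] \]

A generalization of the gauge transformation of A, that is, $A' = A + d\lambda$, is eqn [89]:

\[ \omega' = g^{-1} \omega g + g^{-1} dg \qquad [110] \]

A quantity with the transformation property

\[ \varphi' = g^{-1} \varphi\, g \qquad [111] \]

is called a "tensorial" quantity. The gauge potential $\omega$ is according to this definition nontensorial. Nevertheless the field strength is tensorial. Indeed

\[ \begin{aligned} \Omega' = {} & d(g^{-1} \omega g) + (dg^{-1}) \wedge dg + \tfrac{1}{2} [g^{-1} \omega g + g^{-1} dg,\ g^{-1} \omega g + g^{-1} dg] \\ = {} & (dg^{-1}) \wedge \omega g + g^{-1} d\omega\, g - g^{-1} \omega \wedge dg + (dg^{-1}) \wedge dg \\ & + \tfrac{1}{2} g^{-1} [\omega, \omega] g + \tfrac{1}{2} [g^{-1} \omega g, g^{-1} dg] + \tfrac{1}{2} [g^{-1} dg, g^{-1} \omega g] + \tfrac{1}{2} [g^{-1} dg, g^{-1} dg] \\ = {} & g^{-1} \Omega g + (dg^{-1}) \wedge \omega g - g^{-1} \omega \wedge dg + (dg^{-1}) \wedge dg \\ & + g^{-1} \omega \wedge dg + g^{-1} dg \wedge g^{-1} \omega g + g^{-1} dg \wedge g^{-1} dg \\ = {} & g^{-1} \Omega g \end{aligned} \qquad [112] \]

where we have used the derivation of the relation $g^{-1} g = \mathrm{Id}$ to get

\[ dg^{-1} = -g^{-1}\, dg\, g^{-1} \qquad [113] \]

In the abelian case, we had dF = 0. The nonabelian analog is

\[ d\Omega = d\omega \wedge \omega - \omega \wedge d\omega = (\Omega - \omega \wedge \omega) \wedge \omega - \omega \wedge (\Omega - \omega \wedge \omega) = \Omega \wedge \omega - \omega \wedge \Omega \qquad [114] \]

or

\[ d\Omega + \omega \wedge \Omega - \Omega \wedge \omega = 0 \qquad [115] \]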

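the Bianchi identity. It can also be written as

\[ d\Omega + \omega \wedge \Omega - \Omega \wedge \omega = d\Omega + [\omega, \Omega] = 0 \qquad [116] \]

because from eqn [104]

\[ \omega \wedge \Omega - (-1)^{2 \cdot 1}\, \Omega \wedge \omega = [\omega, \Omega] \qquad [117] \]

The covariant derivative D is defined as

\[ D\varphi := d\varphi + [\omega, \varphi] \qquad [118] \]

for a tensorial quantity $\varphi$. The covariant derivative takes tensorial p-forms into tensorial (p + 1)-forms:

\[ \begin{aligned} D'\varphi' &= d(g^{-1} \varphi g) + [g^{-1} \omega g + g^{-1} dg,\ g^{-1} \varphi g] \\ &= dg^{-1} \wedge \varphi g + g^{-1} d\varphi\, g + (-1)^p g^{-1} \varphi \wedge dg + [g^{-1} \omega g, g^{-1} \varphi g] + [g^{-1} dg, g^{-1} \varphi g] \\ &= g^{-1} D\varphi\, g + dg^{-1} \wedge \varphi g + (-1)^p g^{-1} \varphi \wedge dg + g^{-1} dg\, g^{-1} \wedge \varphi g - (-1)^p g^{-1} \varphi \wedge dg \\ &= g^{-1} D\varphi\, g \end{aligned} \qquad [119] \]

We have thereby verified the transformation property of eqn [91].

The Gauge Group

From the gauge transformation $\psi' = g\psi$ the requirement $|\psi'|^2 = |\psi|^2$ leads to $g^\dagger g = 1$. That means that g belongs to the unitary Lie group G = U(n), whose elements fulfill $g^\dagger = \bar g^T = g^{-1}$. For elements of the Lie algebra $\mathfrak{g} = u(n)$ this implies

\[ \left( e^X \right)^\dagger = e^{\bar X^T} = e^{-X} \qquad [120] \]

or

\[ X^\dagger = \bar X^T = -X \qquad [121] \]

where $\bar X$ is complex conjugation and $X^T$ means transposition.

For elements of the Lie algebra we can define a scalar product (the Killing metric)

\[ \langle X, Y \rangle := -\operatorname{tr}(XY) = -X^\alpha{}_\beta\, Y^\beta{}_\alpha \qquad [122] \]

The scalar product is real:

\[ \langle \bar X, \bar Y \rangle = -\bar X^\alpha{}_\beta\, \bar Y^\beta{}_\alpha = -X^\beta{}_\alpha\, Y^\alpha{}_\beta = \langle X, Y \rangle \qquad [123] \]

symmetric:

\[ \langle X, Y \rangle = -\operatorname{tr}(XY) = -\operatorname{tr}(YX) = \langle Y, X \rangle \qquad [124] \]

and positive definite:

\[ \langle X, X \rangle = -X^\alpha{}_\beta\, X^\beta{}_\alpha = X^\alpha{}_\beta\, \bar X^\alpha{}_\beta = |X^\alpha{}_\beta|^2 \qquad [125] \]

The scalar product is invariant under the action of G on $\mathfrak{g}$: for $g \in G$

\[ \langle gXg^{-1}, gYg^{-1} \rangle = -\operatorname{tr}(gXYg^{-1}) = -\operatorname{tr}(XY) = \langle X, Y \rangle \qquad [126] \]

or for $X, Y, Z \in \mathfrak{g}$

\[ \langle e^{tX} Y e^{-tX},\ e^{tX} Z e^{-tX} \rangle = \langle Y, Z \rangle \qquad [127] \]

We take the derivative of this equation with respect to t at the value t = 0 and get:

\[ \langle [X, Y], Z \rangle + \langle Y, [X, Z] \rangle = 0 \qquad [128] \]

We define an action of the algebra $\mathfrak{g}$ on itself, $\mathrm{ad}(X): \mathfrak{g} \to \mathfrak{g}$,

\[ \mathrm{ad}(X)\, Y = [X, Y] \qquad [129] \]

We can then formulate our conclusion as follows: the action of $\mathfrak{g}$ on itself is anti-Hermitian:

\[ \langle \mathrm{ad}(X) Y, Z \rangle = -\langle Y, \mathrm{ad}(X) Z \rangle \qquad [130] \]

or

\[ [\mathrm{ad}(X)]^\dagger = -\mathrm{ad}(X) \qquad [131] \]

From $g^\dagger g = 1$ we have $|\det(g)|^2 = 1$. For the gauge group G = SU(N) we require in addition det(g) = 1. Since

\[ \det(g) = \det(\exp(X)) = \exp(\operatorname{tr}(X)) \qquad [132] \]

the elements $X \in su(N)$ must be traceless. A basis of the vector space of traceless, anti-Hermitian (2 × 2) matrices is given by the generators $T_a = \sigma_a/(2i)$ of eqn [84], built from the Pauli matrices, eqn [85].

The Yang–Mills Action

The SU(2) Yang–Mills action is, in analogy to the abelian case,

\[ S = -\frac{1}{4q^2} \int_M F^a_{\mu\nu} F_a^{\mu\nu}\, \mathrm{vol}^4 = \frac{1}{2q^2} \int_M \operatorname{tr}(F_{\mu\nu} F^{\mu\nu})\, \mathrm{vol}^4 = \frac{1}{2q^2} \int_M \operatorname{tr}(F \wedge {*F}) \qquad [133] \]

We have included the trace in our definition of the scalar product:

\[ (\varphi, \chi) := -\int_M \operatorname{tr} \langle \varphi_I\, \chi^I \rangle\, \mathrm{vol}^n = -\int_M \operatorname{tr}(\varphi \wedge {*\chi}) \qquad [134] \]

We then write eqn [133] as

\[ S[\omega] = \tfrac{1}{2} (\Omega, \Omega) \qquad [135] \]

taking into account the relation between $\Omega$ and the field strength F, and indicating the dependence on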

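the gauge potential. Since $\Omega$ is tensorial the action is invariant.

Now we calculate the variation of $S[\omega]$ with respect to a variation of the gauge potential:

\[ \begin{aligned} \delta S[\omega] &= \frac{d}{dt} S[\omega(t)] \Big|_{t=0} = \frac{1}{2}\, \delta (\Omega, \Omega) = \frac{1}{2} \left( (\delta\Omega, \Omega) + (\Omega, \delta\Omega) \right) \\ &= (\delta\Omega, \Omega) = \left( \delta \left( d\omega + \tfrac{1}{2} [\omega, \omega] \right), \Omega \right) \\ &= \left( d\,\delta\omega + \tfrac{1}{2} [\delta\omega, \omega] + \tfrac{1}{2} [\omega, \delta\omega],\ \Omega \right) \\ &= (d\,\delta\omega + [\omega, \delta\omega],\ \Omega) \end{aligned} \qquad [136] \]

where we have exchanged the order of $\delta$ and d. We remark that although $\omega$ is not a tensorial section, $\delta\omega$ is: for $\omega_1' = g^{-1} \omega_1 g + g^{-1} dg$ and $\omega_2' = g^{-1} \omega_2 g + g^{-1} dg$ we have

\[ \delta\omega' = \omega_1' - \omega_2' = g^{-1} (\omega_1 - \omega_2)\, g \qquad [137] \]

The quantity $\Omega$ is in any case tensorial. Therefore, the covariant derivative is defined, and we have

\[ D\,\delta\omega = d\,\delta\omega + [\omega, \delta\omega] \qquad [138] \]

and

\[ D\Omega = d\Omega + [\omega, \Omega] \qquad [139] \]

In general, the action of the covariant derivative on tensorial quantities can be written as $D = d + \mathrm{ad}(\omega)$, where ad(X) is the representation of the Lie algebra on itself introduced in the previous section. We now have

\[ \delta S[\omega] = (D\,\delta\omega, \Omega) = (\delta\omega, D^\dagger \Omega) = 0 \qquad [140] \]

for an arbitrary variation $\delta\omega$. Therefore, $D^\dagger \Omega = 0$.

We have obtained

\[ D^\dagger \Omega = 0 \qquad [141] \]

the "Yang–Mills equations," and

\[ D\Omega = 0 \qquad [142] \]

the "Bianchi identities." These are the generalizations of the Maxwell equations $d^\dagger F = 0$ and dF = 0 in the absence of external sources. For the general case of interacting fermions, we write out the full action, in analogy to eqn [74], and obtain, in analogy to eqns [79] and [58],

\[ D^\dagger \Omega = J, \qquad D^\dagger J = 0 \qquad [143] \]

We shall now derive, again for the pure gauge sector, coordinate expressions for the Yang–Mills equations. Consider the expression

\[ \delta S[\omega] = (D\,\delta\omega, \Omega) = (\delta\omega, D^\dagger \Omega) = (d\,\delta\omega + [\omega, \delta\omega],\ \Omega) \qquad [144] \]

The first term in the last expression is

\[ (d\,\delta\omega, \Omega) = (\delta\omega, d^\dagger \Omega) = -\int_M \operatorname{tr}\, \delta\omega_\mu \{ d^\dagger \Omega \}^\mu\, \mathrm{vol}^4 \qquad [145] \]

The second term can be computed using

\[ [\omega, \delta\omega]_{\mu\nu} = \{ \omega \wedge \delta\omega + \delta\omega \wedge \omega \}(\partial_\mu, \partial_\nu) = \omega_\mu \delta\omega_\nu - \omega_\nu \delta\omega_\mu + \delta\omega_\mu \omega_\nu - \delta\omega_\nu \omega_\mu \qquad [146] \]

and hence

\[ [\omega, \delta\omega]_{\mu\nu}\, \Omega^{\mu\nu} = 2 [\omega_\mu, \delta\omega_\nu]\, \Omega^{\mu\nu} \qquad [147] \]

because $\Omega$ is antisymmetric, $\Omega^{\mu\nu} = -\Omega^{\nu\mu}$. Thus,

\[ \begin{aligned} ([\omega, \delta\omega], \Omega) &= -\int_M \operatorname{tr}([\omega, \delta\omega] \wedge {*\Omega}) = -\frac{1}{2} \int_M \operatorname{tr}([\omega, \delta\omega]_{\mu\nu}\, \Omega^{\mu\nu})\, \mathrm{vol}^4 \\ &= -\int_M \operatorname{tr}([\omega_\mu, \delta\omega_\nu]\, \Omega^{\mu\nu})\, \mathrm{vol}^4 = \int_M \langle [\omega_\mu, \delta\omega_\nu], \Omega^{\mu\nu} \rangle\, \mathrm{vol}^4 \end{aligned} \qquad [148] \]

where $\langle\ ,\ \rangle$ is the scalar product in $\mathfrak{g}$. From eqn [128] this equals

\[ -\int_M \langle \delta\omega_\nu, [\omega_\mu, \Omega^{\mu\nu}] \rangle\, \mathrm{vol}^4 = \int_M \operatorname{tr}(\delta\omega_\nu [\omega_\mu, \Omega^{\mu\nu}])\, \mathrm{vol}^4 \qquad [149] \]

Combining this with eqn [144] gives

\[ (\delta\omega, D^\dagger \Omega) = -\int_M \operatorname{tr}\left( \delta\omega_\nu \{ (d^\dagger \Omega)^\nu - [\omega_\mu, \Omega^{\mu\nu}] \} \right) \mathrm{vol}^4 = (\delta\omega, \{ (d^\dagger \Omega) - [\omega_\mu, \Omega^{\mu\nu}] \}) \qquad [150] \]

We can now insert the coordinate expression for

\[ (d^\dagger \Omega)^\nu = -\partial_\mu \Omega^{\mu\nu} \qquad [151] \]

Finally, the coordinate expressions of the Yang–Mills equations $D^\dagger \Omega = 0$ are

\[ (D^\dagger \Omega)^\nu = -\{ \partial_\mu \Omega^{\mu\nu} + [\omega_\mu, \Omega^{\mu\nu}] \} = 0 \qquad [152] \]

The Analogy with Electromagnetism

The Yang–Mills equation and the Bianchi identity in the absence of external sources are

\[ \partial_\mu F^{\mu\nu} - iq [A_\mu, F^{\mu\nu}] = 0 \qquad [153] \]

and

\[ \partial_\mu F_{\nu\lambda} + \partial_\nu F_{\lambda\mu} + \partial_\lambda F_{\mu\nu} - iq \{ [A_\mu, F_{\nu\lambda}] + [A_\nu, F_{\lambda\mu}] + [A_\lambda, F_{\mu\nu}] \} = 0 \qquad [154] \]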
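The commutator terms in eqns [153] and [154] are what separates these equations from their Maxwell counterparts; they vanish for U(1) but not for su(2). The following minimal sketch in Python checks this for the generators $T_a = \sigma_a/(2i)$ of eqn [84], together with the normalization of eqn [86]. The small matrix helpers (`matmul`, `sub`, `scale`, `trace`, `commutator`) are illustrative and written out by hand so that only the standard library is needed.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def scale(c, a):
    """Multiply every entry of the matrix a by the scalar c."""
    return tuple(tuple(c * x for x in row) for row in a)

def sub(a, b):
    """Entrywise difference a - b."""
    return tuple(tuple(x - y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))

def trace(a):
    return a[0][0] + a[1][1]

def commutator(a, b):
    return sub(matmul(a, b), matmul(b, a))

# Pauli matrices, eqn [85]
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)
# Anti-Hermitian generators T_a = sigma_a / (2i), eqn [84]
T = tuple(scale(1 / 2j, s) for s in SIGMA)

# Matrix of traces tr(T_a T_b); eqn [86] says it is -1/2 times the identity.
gram = [[trace(matmul(T[a], T[b])) for b in range(3)] for a in range(3)]

# [T_1, T_2] - T_3: all entries vanish, so the commutator is nonzero and closes on T_3.
closure = sub(commutator(T[0], T[1]), T[2])

print(gram)
print(closure)
```

In this basis the structure constants of eqn [99] are those of the totally antisymmetric symbol, so a term such as $[A_\mu, F^{\mu\nu}]$ is generically nonzero whenever A and F point in different Lie-algebra directions.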

We shall write these equations in terms of the fields

\[ F^{i0} = E^i, \qquad i = 1, 2, 3 \qquad [155] \]

\[ F^{12} = B^3, \qquad F^{31} = B^2, \qquad F^{23} = B^1 \qquad [156] \]

where the E and B vectors may be thought of as "electric" and "magnetic" fields, even though they have Lie-algebra indices, $F^{i0} = (F^a)^{i0}\, T_a$, etc. In the context of the SU(3) theory, they are referred to as the "chromoelectric" and "chromomagnetic" fields, respectively.

The Yang–Mills equations with $\nu = 0$ are

\[ \partial_i F^{i0} - iq [A_i, F^{i0}] = 0 \qquad [157] \]

with i = 1, 2, 3 a spatial index. In vector notation this is

\[ \operatorname{div} \mathbf{E} = iq (\mathbf{A} \cdot \mathbf{E} - \mathbf{E} \cdot \mathbf{A}) \qquad [158] \]

This is the analog of Gauss's equation. Even though we started out without external sources, $iq(\mathbf{A} \cdot \mathbf{E} - \mathbf{E} \cdot \mathbf{A})$ plays the role of a "charge density." The Yang–Mills field E and the potential A combine to act as a source for the Yang–Mills field. This is an essential feature of nonabelian gauge theories in which they differ from the abelian case, due to the fact that the commutator [A, E] is nonvanishing.

Now consider the Yang–Mills equations with a spatial index $\nu = i$:

\[ \partial_0 F^{0i} + \partial_j F^{ji} - iq [A_0, F^{0i}] - iq [A_j, F^{ji}] = 0 \qquad [159] \]

In vector notation this is

\[ \operatorname{curl} \mathbf{B} = \frac{\partial \mathbf{E}}{\partial t} - iq (A_0 \mathbf{E} - \mathbf{E} A_0) + iq (\mathbf{A} \times \mathbf{B} + \mathbf{B} \times \mathbf{A}) \qquad [160] \]

replacing the Ampère–Maxwell law. Note that there are two extra contributions to the "current" other than the displacement current.

The analogs of the laws of Faraday and of the absence of magnetic monopoles are derived similarly from the Bianchi identities. The results are

\[ \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = iq \{ (\mathbf{A} \times \mathbf{E} + \mathbf{E} \times \mathbf{A}) + (A_0 \mathbf{B} - \mathbf{B} A_0) \} \qquad [161] \]

and

\[ \operatorname{div} \mathbf{B} = iq (\mathbf{A} \cdot \mathbf{B} - \mathbf{B} \cdot \mathbf{A}) \qquad [162] \]

Further Remarks

The foundations of the mathematics of differential forms were laid down by Poincaré (1953). They were applied to the description of electrodynamics already by Cartan (1923). A modern presentation of differential forms and the manifolds on which they are defined is given in Abraham et al. (1983). A recent treatment of electrodynamics in this approach is Hehl and Obukhov (2003). Weyl's argument is in his paper of 1929.

Nonabelian gauge theories today explain the electromagnetic, the strong and weak nuclear interactions. The original paper is that of Yang and Mills (1954). Glashow, Salam, and Weinberg (1980) saw the way to apply it to the weak interactions by using spontaneous symmetry breaking to generate the masses through the use of the Higgs' (1964) mechanism. 't Hooft and Veltman (1972) showed that the resulting quantum field theory was renormalizable. The strong interactions were recognized as the nonabelian gauge theory with gauge group SU(3) by Gell-Mann (1972). For a modern treatment which puts nonabelian gauge theories in the context of differential geometry, see Frankel (1987).

See also: Dirac Fields in Gravitation and Nonabelian Gauge Theory; Electroweak Theory; Measure on Loop Spaces; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Electrodynamics and its Precision Tests.

Further Reading

Abraham A, Marsden J, and Ratiu T (1983) Manifolds, Tensor Analysis, and Applications. MA: Addison-Wesley.
Cartan É (1923) On manifolds with an Affine Connection and the Theory of General Relativity. English translation of the French original 1923/1924 (Bibliopolis, Napoli 1986).
Frankel T (1987) The Geometry of Physics, An Introduction. Cambridge University Press.
Gell-Mann M (1972) Quarks: developments in the quark theory of hadrons. Acta Physica Austriaca Suppl. IV: 733.
Glashow SL (1980) Towards a unified theory: threads in a tapestry. Reviews of Modern Physics 52: 539.
Hehl FW and Obukhov YN (2003) Foundations of Classical Electrodynamics. Boston: Birkhäuser.
Higgs PW (1964) Broken symmetries and the masses of gauge bosons. Physical Review Letters 13: 508.
't Hooft G and Veltman M (1972) Regularization and renormalization of gauge fields. Nuclear Physics B 44: 189.
Poincaré H (1953) Oeuvre. Paris: Gauthier-Villars.
Salam A (1980) Gauge unification of fundamental forces. Reviews of Modern Physics 52: 525.
Weinberg SM (1980) Conceptual foundations of the unified theory of weak and electromagnetic interactions. Reviews of Modern Physics 52: 515.
Weyl H (1929) Elektron und gravitation. Zeitschrift fuer Physik 56: 330.
Yang CN and Mills RL (1954) Construction of isotopic spin and isotopic gauge invariance. Physical Review 96: 191.

Abelian Higgs Vortices
J M Speight, University of Leeds, Leeds, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

For the purpose of this article, vortices are topological solitons arising in field theories in (2 + 1)-dimensional spacetime when a complex-valued field $\phi$ is allowed to acquire winding at infinity, meaning that the phase of $\phi(t, \mathbf{x})$, as $\mathbf{x}$ traverses a large circle in the spatial plane, changes by $2\pi n$, where n is a nonzero integer. Such winding cannot be removed by any continuous deformation of $\phi$ (hence "topological") and traps a considerable amount of energy which tends to coalesce into smooth, stable lumps with highly particle-like characteristics (hence "solitons"). Clearly, the universe is (3 + 1) dimensional. Nonetheless, planar field theories are of physical interest for two main reasons. First, the theory may arise by dimensional reduction of a (3 + 1)-dimensional model under the assumption of translation invariance in one direction. Vortices are then transverse slices through straight tube-like objects variously interpreted as magnetic flux tubes in a superconductor or cosmic strings. Second, a crucial ingredient of the standard model of particle physics is spontaneous breaking of gauge symmetry by a Higgs field. As well as endowing the fundamental gauge bosons and chiral fermions with mass, this mechanism can potentially generate various types of topological solitons (monopoles, strings, and domain walls) whose structure and interactions one would like to understand. Vortices in (2 + 1) dimensions are interesting in this regard because they arise in the simplest field theory exhibiting the Higgs mechanism, the abelian Higgs model (AHM). They are thus a useful theoretical laboratory in which to test ideas which may ultimately find application in more realistic theories. This article describes the properties of abelian Higgs vortices and explains how, using a mixture of numerical and analytical techniques, a good understanding of their dynamical interactions has been obtained.

The Abelian Higgs Model

Throughout this article spacetime will be $\mathbb{R}^{2+1}$ endowed with the Minkowski metric with signature (+, −, −), and Cartesian coordinates $x^\mu$, $\mu = 0, 1, 2$, with $x^0 = t$ (the speed of light c = 1). A spacetime point will be denoted x, its spatial part by $\mathbf{x} = (x_1, x_2)$. Latin indices j, k, ... range over 1, 2, and repeated indices (Latin or Greek) are summed over. We sometimes use polar coordinates in the spatial plane, $\mathbf{x} = r(\cos\theta, \sin\theta)$, and sometimes a complex coordinate $z = x_1 + i x_2 = r e^{i\theta}$. Occasionally, it is convenient to think of $\mathbb{R}^{2+1}$ as a subspace of $\mathbb{R}^{3+1}$ and denote by $\mathbf{k}$ the unit vector in the (fictitious) third spatial direction. The complex scalar Higgs field is denoted $\phi$, and the electromagnetic gauge potential $A_\mu$, best thought of as the components of a 1-form $A = A_\mu\, dx^\mu$. $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is the field strength tensor which, in $\mathbb{R}^{2+1}$, has only three independent components, identified with the magnetic field $B = F_{12}$ and electric field $(E_1, E_2) = (F_{01}, F_{02})$. The gauge-covariant derivative is $D_\mu \phi = \partial_\mu \phi - ieA_\mu \phi$, e being the electric charge of the Higgs. Under a U(1) gauge transformation,

\[ \phi \mapsto e^{i\chi} \phi, \qquad A_\mu \mapsto A_\mu + e^{-1} \partial_\mu \chi \qquad [1] \]

$\chi: \mathbb{R}^{2+1} \to \mathbb{R}$ being any smooth function, $F_{\mu\nu}$ and $|\phi|$ remain invariant, while $D_\mu \phi \mapsto e^{i\chi} D_\mu \phi$. Only gauge-invariant quantities are physically observable (classically).

With these conventions, the AHM has Lagrangian density

\[ L = -\frac{1}{4} F_{\mu\nu} F^{\mu\nu} + \frac{\gamma}{2}\, \overline{D_\mu \phi}\, D^\mu \phi - \frac{\lambda}{8} \left( \nu^2 - |\phi|^2 \right)^2 \qquad [2] \]

which is manifestly gauge invariant. By rescaling $\phi$, $A_\mu$, x and the unit of action, we can (and henceforth will) assume that $e = \gamma = \nu = 1$. The only parameter which cannot be scaled away is $\lambda > 0$. Its value greatly influences the model's behavior.

The field equations, obtained by demanding that $\phi(x)$, $A_\mu(x)$ be a local extremal of the action $S = \int L\, d^3x$, are

\[ \begin{aligned} D_\mu D^\mu \phi - \frac{\lambda}{2} (1 - |\phi|^2)\, \phi &= 0 \\ \partial^\mu F_{\mu\nu} + \frac{i}{2} \left( \bar\phi\, D_\nu \phi - \phi\, \overline{D_\nu \phi} \right) &= 0 \end{aligned} \qquad [3] \]

This is a coupled set of nonlinear second-order PDEs. Of particular interest are solutions which have finite total energy. Energy is not a Lorentz-invariant quantity. To define it we must choose an inertial frame and, having broken Lorentz invariance, it is convenient to work in a temporal gauge, for which $A_0 \equiv 0$ (which may be obtained by a gauge transformation with $\chi(t, \mathbf{x}) = -\int_0^t A_0(t', \mathbf{x})\, dt'$, after which only time-independent gauge transformations are permitted). The potential energy of a field is then

\[ E = \frac{1}{2} \int \left( B^2 + \overline{D_i \phi}\, D_i \phi + \frac{\lambda}{4} (1 - |\phi|^2)^2 \right) dx_1\, dx_2 = E_{\mathrm{mag}} + E_{\mathrm{grad}} + E_{\mathrm{self}} \qquad [4] \]

while its kinetic energy is
\[ E_{\rm kin} = \frac{1}{2}\int \left( |\partial_0 A|^2 + \overline{\partial_0\phi}\,\partial_0\phi \right) dx^1\, dx^2 \qquad [5] \]
If $\phi$, $A$ satisfy the field equations then the total energy $E_{\rm tot} = E_{\rm kin} + E$ is independent of $t$. By Derrick's theorem, static solutions have $E_{\rm mag} = E_{\rm self}$ (Manton and Sutcliffe 2004, pp. 82–87).

Configurations with finite energy have quantized total magnetic flux. To see this, note that $E$ finite implies $|\phi| \to 1$ as $r \to \infty$, so $\phi \sim e^{i\chi(r,\theta)}$ at large $r$ for some real (in general, multivalued) function $\chi$. The winding number of $\phi$ is its winding around a circle of large radius $R$, that is, the integer $n = (\chi(R, 2\pi) - \chi(R, 0))/2\pi$. Although the phase of $\phi$ is clearly gauge dependent, $n$ is not, because to change this, a gauge transformation $e^{i\chi} : \mathbb{R}^2 \to U(1)$ would itself need nonzero winding around the circle, contradicting smoothness of $e^{i\chi}$. The model is invariant under spatial reflexions, under which $n \mapsto -n$, so we will assume (unless noted otherwise) that $n \geq 0$. Finiteness of $E$ also implies that $D\phi = d\phi - iA\phi \to 0$, so $A \sim -i\, d\phi/\phi \sim d\chi$ as $r \to \infty$ (note $\phi \neq 0$ for large $r$). Hence, the total magnetic flux is
\[ \int_{\mathbb{R}^2} B\, d^2x = \lim_{R\to\infty} \oint_{S_R} A = \lim_{R\to\infty} \int_0^{2\pi} \partial_\theta \chi\, d\theta = 2\pi n \qquad [6] \]
where $S_R = \{\mathbf{x} : |\mathbf{x}| = R\}$ and we have used Stokes's theorem. The above argument uses only generic properties of $E$, namely that finite $E_{\rm self}$ requires $|\phi|$ to assume a nonzero constant value as $r \to \infty$. So flux quantization is a robust feature of this type of model. As presented, the argument is somewhat formal, but it can be made mathematically rigorous at the cost of gauge-fixing technicalities (Manton and Sutcliffe 2004, pp. 164–166). Note that if $n \neq 0$ then, by continuity, $\phi(\mathbf{x})$ must vanish at some $\mathbf{x} \in \mathbb{R}^2$, and one expects a lump of energy density to be associated with each such $\mathbf{x}$ since $\phi = 0$ maximizes the integrand of $E_{\rm self}$.

Radially Symmetric Vortices

The model supports static solutions within the radially symmetric ansatz $\phi = \sigma(r) e^{in\theta}$, $A = a(r)\, d\theta$, which reduces the field equations to a coupled pair of nonlinear ODEs:
\[ \frac{d^2\sigma}{dr^2} + \frac{1}{r}\frac{d\sigma}{dr} - \frac{1}{r^2}(n - a)^2\sigma + \frac{\lambda}{2}(1 - \sigma^2)\sigma = 0, \qquad \frac{d^2a}{dr^2} - \frac{1}{r}\frac{da}{dr} + (n - a)\sigma^2 = 0 \qquad [7] \]
Finite energy requires $\lim_{r\to\infty}\sigma(r) = 1$, $\lim_{r\to\infty} a(r) = n$, while smoothness requires $\sigma(r) \sim {\rm const}_1\, r^n$, $a(r) \sim {\rm const}_2\, r^2$ as $r \to 0$. It is known that solutions to this system, which we shall call n-vortices, exist for all $n$, $\lambda$, though no explicit formulas for them are known. They may be found numerically, and are depicted in Figure 1. Note that $\sigma$ and $a$ always rise monotonically to their vacuum values, and $B$ always falls monotonically to 0, as $r$ increases. These solutions have their magnetic flux concentrated in a single, symmetric lump, a flux tube in the $\mathbb{R}^{3+1}$ picture. In contrast, the total energy density (integrand of $E$ in [4]) is nonmonotonic for $n \geq 2$, being peaked on a ring whose radius grows with $n$. This is a common feature of planar solitons.

The large $r$ asymptotics of n-vortices are well understood. For $\lambda \leq 4$ one may linearize [7] about $\sigma = 1$, $a = n$, yielding
\[ \sigma(r) \sim 1 + \frac{q_n}{2\pi} K_0(\sqrt{\lambda}\, r) \qquad [8] \]
\[ a(r) \sim n + \frac{m_n}{2\pi}\, r K_1(r) \qquad [9] \]
where $q_n$, $m_n$ are unknown constants and $K_\alpha$ denotes the modified Bessel function. For $\lambda > 4$ linearization is no longer well justified, and the asymptotic behaviour of $\sigma$ (though not $a$) is quite different (Manton and Sutcliffe 2004, pp. 174–175). We shall not consider this rather extreme regime further. Note that
\[ K_\alpha(r) \sim \sqrt{\frac{\pi}{2r}}\, e^{-r} \quad \text{as } r \to \infty \qquad [10] \]
for all $\alpha$, so both $\sigma$ and $a$ approach their vacuum values exponentially fast, but with different decay lengths: $1/\sqrt{\lambda}$ for $\sigma$, 1 for $a$. This can be seen in Figure 1a. The constants $q_n$ and $m_n$ depend on $\lambda$ and must be inferred by comparing the numerical solutions with [8], [9]; $q = q_1$ and $m = m_1$ will receive a physical interpretation shortly.

The 1-vortex (henceforth just "vortex") is stable for all $\lambda$, but n-vortices with $n \geq 2$ are unstable to break up into $n$ separate vortices if $\lambda > 1$. We shall say that the AHM is type I if $\lambda < 1$, type II if $\lambda > 1$, and critically coupled if $\lambda = 1$, based on this distinction. Let $E_n$ denote the energy of an n-vortex. Figure 2 shows the energy per vortex $E_n/n$ plotted against $n$ for $\lambda = 0.5$, 1, and 2. It decreases with $n$ for $\lambda = 0.5$, indicating that it is energetically favorable for isolated vortices to coalesce into higher winding lumps. For $\lambda = 2$, by contrast, $E_n/n$ increases with $n$, indicating that it is energetically favorable for n-vortices to fission into their constituent vortex parts. The case $\lambda = 1$ balances between these behaviors: $E_n/n$ is independent of $n$. In fact, the energy of a collection of vortices is independent of their positions in this case.
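The asymptotic statement [10], and the two decay lengths it implies, can be checked numerically. The following is a minimal sketch, not taken from the article: it assumes the standard integral representation $K_\alpha(r) = \int_0^\infty e^{-r\cosh t}\cosh(\alpha t)\, dt$ and evaluates it with a plain trapezoidal rule using only the Python standard library. The function names `bessel_k` and `leading_asymptote` are hypothetical.

```python
import math


def bessel_k(alpha: float, r: float, t_max: float = 20.0, steps: int = 4000) -> float:
    """Approximate the modified Bessel function K_alpha(r) for r > 0.

    Uses K_alpha(r) = int_0^inf exp(-r cosh t) cosh(alpha t) dt with the
    trapezoidal rule on [0, t_max]; the integrand is negligible at t_max.
    """
    h = t_max / steps
    total = 0.5 * math.exp(-r)  # t = 0 endpoint carries weight 1/2
    for k in range(1, steps + 1):
        t = k * h
        total += math.exp(-r * math.cosh(t)) * math.cosh(alpha * t)
    return total * h


def leading_asymptote(r: float) -> float:
    """Right-hand side of [10]: sqrt(pi / (2 r)) * exp(-r)."""
    return math.sqrt(math.pi / (2.0 * r)) * math.exp(-r)


if __name__ == "__main__":
    # The ratio K_alpha(r) / asymptote should tend to 1 for every alpha.
    for r in (5.0, 10.0, 20.0):
        ratios = [bessel_k(alpha, r) / leading_asymptote(r) for alpha in (0.0, 1.0)]
        print(r, ratios)
    # Decay lengths: sigma - 1 ~ K_0(sqrt(lam) r) falls off on the scale
    # 1/sqrt(lam), whereas a - n ~ r K_1(r) falls off on the scale 1.
    lam = 0.5  # illustrative type I value
    print(bessel_k(0.0, math.sqrt(lam) * 10.0), 10.0 * bessel_k(1.0, 10.0))
```

The ratios approach 1 from below for $\alpha = 0$ and from above for $\alpha = 1$, which is why fitting $q_n$ and $m_n$ against [8], [9] is best done with the Bessel functions themselves rather than with the leading exponential alone.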

Figure 1 Static, radially symmetric n-vortices: (a) the 1-vortex profile functions $\sigma(r)$ (solid curve) and $a(r)$ (dashed curve) for $\lambda = 2$, 1, and 1/2, left to right; (b) the magnetic field $B$; and (c) the energy density of n-vortices, $n = 1$ to 5, left to right, for $\lambda = 1$. All panels are plotted against $r$.

Figure 2 The energy per unit winding $E_n/n\pi$ of radially symmetric n-vortices for $\lambda = 1/2$, 1, and 2, plotted against $n$.

Interaction Energy

A precise understanding of the type I/II dichotomy can be obtained using the 2-vortex interaction energy $E_{\rm int}(s)$ introduced by Jacobs and Rebbi. This is defined to be the minimum of $E$ over all $n = 2$ configurations for which $\phi(\mathbf{x}) = 0$ at some pair of points $\mathbf{x}_1$, $\mathbf{x}_2$ distance $s$ apart. One interprets $\mathbf{x}_1$, $\mathbf{x}_2$ as the vortex positions. $E_{\rm int}$ can only depend on their separation $s = |\mathbf{x}_1 - \mathbf{x}_2|$, by translation and rotation invariance. Figure 3 presents graphs of $E_{\rm int}(s)$ generated by a lattice minimization algorithm. For $\lambda < 1$, vortices uniformly attract one another, so a vortex pair has least energy when coincident. For $\lambda > 1$, vortices uniformly repel, always lowering their energy by moving further apart. The graph for $\lambda = 1$ would be a horizontal line, $E_{\rm int}(s) = 2\pi$.
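The attract/repel dichotomy can be illustrated with the large-separation form of the interaction energy given later in this article as equation [15], $E_{\rm int}^\infty(s) - 2E_1 = -\frac{q^2}{2\pi}K_0(\sqrt{\lambda}\, s) + \frac{m^2}{2\pi}K_0(s)$. The sketch below is only indicative: the true constants $q$, $m$ depend on $\lambda$ and must be fitted to numerical vortex profiles, so the values `q = m = 1` used here are placeholders chosen purely to expose the competition between the two ranges. The function names are hypothetical, and $K_0$ is evaluated from its standard integral representation with the standard library only.

```python
import math


def bessel_k0(r: float, t_max: float = 20.0, steps: int = 4000) -> float:
    """K_0(r) for r > 0 via int_0^inf exp(-r cosh t) dt (trapezoidal rule)."""
    h = t_max / steps
    total = 0.5 * math.exp(-r)  # t = 0 endpoint carries weight 1/2
    for k in range(1, steps + 1):
        total += math.exp(-r * math.cosh(k * h))
    return total * h


def asymptotic_interaction(s: float, lam: float, q: float = 1.0, m: float = 1.0) -> float:
    """E_int^inf(s) - 2 E_1 from equation [15].

    q and m are PLACEHOLDER charges (assumed equal to 1 here); the real
    values depend on lam and are not known in closed form.
    """
    scalar_attraction = -q * q * bessel_k0(math.sqrt(lam) * s)
    magnetic_repulsion = m * m * bessel_k0(s)
    return (scalar_attraction + magnetic_repulsion) / (2.0 * math.pi)


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        values = [asymptotic_interaction(s, lam) for s in (6.0, 8.0, 10.0)]
        print(lam, values)
```

With these placeholder charges the energy rises towards zero with increasing $s$ for $\lambda = 0.5$ (attraction), falls towards zero for $\lambda = 2$ (repulsion), and vanishes identically for $\lambda = 1$, in line with the qualitative behavior described above.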

Figure 3 The 2-vortex interaction energy $E_{\rm int}(s)$ as a function of vortex separation $s$ (solid curve), in comparison with its asymptotic form $E_{\rm int}^\infty(s)$ (dashed curve) for (a) $\lambda = 1/2$ and (b) $\lambda = 2$.

The large $s$ behavior of $E_{\rm int}(s)$ is known, and can be understood in two ways (Manton and Sutcliffe 2004, pp. 177–181). Speight, adapting ideas of Manton on asymptotic monopole interactions, observed that, in the real $\phi$ gauge ($\phi \mapsto e^{-i\theta}\phi$, $A \mapsto A - d\theta$), the difference between the vortex and the vacuum $\phi = 1$, $A = 0$ at large $r$,
\[ \phi = 1 + \frac{q}{2\pi} K_0(\sqrt{\lambda}\, r) \qquad [11] \]
\[ (A_0, \mathbf{A}) \sim \left(0, -\frac{m}{2\pi}\, \mathbf{k} \times \nabla K_0(r)\right) \qquad [12] \]
is identical to the solution of a linear Klein–Gordon–Proca theory,
\[ (\partial_\mu \partial^\mu + \lambda)\psi = \rho, \qquad (\partial_\mu \partial^\mu + 1)A_\nu = j_\nu \qquad [13] \]
in the presence of a composite point source,
\[ \rho = q\, \delta(\mathbf{x}), \qquad (j_0, \mathbf{j}) = -m\left(0, \mathbf{k} \times \nabla\delta(\mathbf{x})\right) \qquad [14] \]
located at the vortex position. Viewed from afar, therefore, a vortex looks like a point particle carrying both a scalar monopole charge $q$ and a magnetic dipole moment $m$, a "point vortex," inducing a real scalar field of mass $\sqrt{\lambda}$ (the Higgs particle) and a vector boson field of mass 1 (the "photon"). If physics is to be model independent, therefore, the interaction energy of a pair of well-separated vortices should approach that of the corresponding pair of point vortices as the separation grows. Computing the latter is an easy exercise in classical linear field theory, yielding
\[ E_{\rm int}(s) \sim E_{\rm int}^\infty(s) = 2E_1 - \frac{q^2}{2\pi} K_0(\sqrt{\lambda}\, s) + \frac{m^2}{2\pi} K_0(s) \qquad [15] \]
Bettencourt and Rivers obtained the same formula by a more direct superposition ansatz approach, though they did not give the constants $q$, $m$ a physical interpretation.

The force between a well-separated vortex pair, $-E_{\rm int}'(s)$, consists of the mutual attraction of identical scalar monopoles, of range $1/\sqrt{\lambda}$, and the mutual repulsion of identical magnetic dipoles, of range 1. If $\lambda < 1$, scalar attraction dominates at large $s$ so vortices attract. If $\lambda > 1$, magnetic repulsion dominates and they repel. If $\lambda = 1$ then $q = m$, as we shall see, so the forces cancel exactly. Figure 3 shows both $E_{\rm int}$ and $E_{\rm int}^\infty$ for $\lambda = 0.5$, 2. The agreement is good for $s$ large, but breaks down for $s < 4$, as one expects. Vortices are not point particles, as in the linear model, and when they lie close together the overlap of their cores produces significant effects.

The same method predicts the interaction energy between an $n_1$-vortex and an $n_2$-vortex at large separation. We just replace $2E_1$ by $E_{n_1} + E_{n_2}$, $q^2$ by $q_{n_1} q_{n_2}$, and $m^2$ by $m_{n_1} m_{n_2}$. In particular, an antivortex (($-1$)-vortex) has $E_{-1} = E_1$, $q_{-1} = q_1 = q$, and $m_{-1} = -m_1 = -m$, so the interaction energy for a vortex–antivortex pair is
\[ E_{\rm int}^{v\bar{v}}(s) \sim 2E_1 - \frac{q^2}{2\pi} K_0(\sqrt{\lambda}\, s) - \frac{m^2}{2\pi} K_0(s) \qquad [16] \]

which is uniformly attractive. It would be pleasing if $q_n$, $m_n$ could be deduced easily from $q$, $m$. One might guess $q_n = |n| q$, $m_n = nm$, in analogy with monopoles. Unfortunately, this is false: $q_n$, $m_n$ grow approximately exponentially with $|n|$.

Vortex Scattering

The AHM being Lorentz invariant, one can obtain time-dependent solutions wherein a single n-vortex travels at constant velocity, with speed $0 < v < 1$ and $E_{\rm tot} = (1 - v^2)^{-1/2} E_n$, by Lorentz boosting the static solutions described above. Of more dynamical interest are solutions in which two or more vortices undergo relative motion. The simplest problem is vortex scattering. Two vortices, initially well separated, are propelled towards one another. In the center-of-mass (COM) frame they have, as $t \to -\infty$, equal speed $v$, and approach one another along parallel lines distance $b$ (the impact parameter) apart, see Figure 4. If $b = 0$, they approach head-on. Assuming they do not capture one another, they interact and, as $t \to \infty$, recede along parallel straight lines having been deflected through an angle $\Theta$ (the scattering angle). If scattering is elastic, the exit lines also lie $b$ apart and each vortex travels at speed $v$ as $t \to \infty$. The dependence of $\Theta$ on $v$, $b$, and $\lambda$ has been studied through lattice simulations by several authors, perhaps most comprehensively by Myers, Rebbi, and Strilka (1992). We shall now describe their results.

Figure 4 The geometry of vortex scattering.

Note first that vortex scattering is actually inelastic: vortices recede with speed $< v$ because some of their initial kinetic energy is dispersed by the collision as small-amplitude traveling waves ("radiation"). This energy loss can be as high as 80% in very fast collisions at small $b$. At small $v$ the energy loss is tiny, but can still have important consequences for type I vortices: if $v$ is very small, they start with only just enough energy to escape their mutual attraction. In undergoing a small $b$ collision they can lose enough of this energy to become trapped in an oscillating bound state. In this case they do not truly scatter and $\Theta$ is ill-defined. Myers et al. find that $v \geq 0.2$ suffices to avoid capture when $\lambda = 1/2$. Since type I vortices attract, one might expect $\Theta$ to be always negative, indicating that the vortices deflect towards one another. In fact, as Figure 5a shows, this happens only for small $v$ and large $b$. Another naive expectation is that $\Theta = 0$ or $\Theta = 180^\circ$ when $b = 0$ (either vortices pass through one another or ricochet backwards in a head-on collision). In fact $\Theta = 90^\circ$, the only other possibility allowed by reflexion symmetry of the initial data. Figure 6 depicts snapshots of such a scattering process at modest $v$. The vortices deform each other as they get close until, at the moment of coincidence, they are close to the static 2-vortex ring. They then break apart along a line perpendicular to their line of approach. One may consider them to have exchanged half-vortices, so that each emergent vortex is a mixture of the incoming vortices. This rather surprising phenomenon was actually predicted by Ruback in advance of any numerical simulations and turns out to be a generic feature of planar topological solitons.

Consider now the type II case ($\lambda = 2$, Figure 5b). Here, $\Theta > 0$ for all $v$, $b$, as one expects of particles that repel each other. Head-on scattering is more interesting now since two regimes emerge: for $v > v_{\rm crit} \approx 0.3$, one has the surprising $90^\circ$ scattering already described, while for $v < v_{\rm crit}$ the vortices bounce backwards, $\Theta = 180^\circ$. This is easily explained. In order to undergo $90^\circ$ head-on scattering, the vortices must become coincident (otherwise reflexion symmetry is violated), hence must have initial energy at least $E_2$. For $v < v_{\rm crit}$, where
\[ \frac{2E_1}{\sqrt{1 - v_{\rm crit}^2}} = E_2 \qquad [17] \]
they have too little energy, so come to a halt before coincidence, then recede from one another. The solution $v_{\rm crit}$ of [17] depends on $\lambda$ and is plotted in Figure 7. For $v$ slightly above $v_{\rm crit}$, we see that, in contrast to the type I case, $\Theta(b)$ is not monotonic: maximum deflection occurs at nonzero $b$.

The point vortex formalism yields a simple model of type II vortex scattering which is remarkably successful at small $v$. One writes down the Lagrangian for two identical (nonrelativistic) point particles of mass $E_1$ moving along trajectories $\mathbf{x}_1(t)$, $\mathbf{x}_2(t)$ under the influence of the repulsive potential $E_{\rm int}^\infty$,
\[ L = \tfrac{1}{2} E_1 \left( |\dot{\mathbf{x}}_1|^2 + |\dot{\mathbf{x}}_2|^2 \right) - E_{\rm int}^\infty(|\mathbf{x}_1 - \mathbf{x}_2|) \qquad [18] \]
Energy and angular momentum conservation reduce $\Theta(v, b)$ to an integral over one variable ($s = |\mathbf{x}_1 - \mathbf{x}_2|$) which is easily computed numerically. To illustrate, Figure 5b shows the result for $\lambda = 2$, $v = 0.1$ in comparison with the lattice simulations of

Myers et al. The agreement is almost perfect. For large $v$ the approximation breaks down not only because relativistic corrections become significant, but also because small $b$ collisions then probe the small $|\mathbf{x}_1 - \mathbf{x}_2|$ region where vortex core overlap effects become important. For the same reason, the point vortex model is less useful for type I scattering. Here there is no repulsion to keep the vortices well separated, so its validity is restricted to the small $v$, large $b$ regime.

Critical coupling is theoretically the most interesting regime, where most analytic progress has been made. Since $E_{\rm int} \equiv E_{\rm int}^\infty \equiv 2E_1$ is constant, one might expect vortex scattering to be trivial ($\Theta(v, b) \equiv 0$), but this is quite wrong, as shown in Figure 5c. In particular, $\Theta(v, 0) = 90^\circ$ for all $v$, just as in the large $v$ type I and type II cases. The point is that scalar attraction and magnetic repulsion of vortices are mediated by fields with different Lorentz transformation properties. While they cancel for static vortices, there is no reason to expect them to cancel for vortices in relative motion.

Figure 5 The 2-vortex scattering angle $\Theta$ (in degrees) as a function of impact parameter $b$ for $v = 0.1$ (▽), $v = 0.2$ (△), $v = 0.3$ (◇), $v = 0.4$ (□), $v = 0.5$ (×), and $v = 0.9$ (+), as computed by Myers et al. (1992): (a) $\lambda = 1/2$; (b) $\lambda = 2$; (c) $\lambda = 1$. The dotted curves are merely guides to the eye. The solid curves in (b), (c) were computed using the point vortex model. Note that Myers et al. use different normalizations, so $b = \sqrt{2}\, b_{\rm MRS}$ and $\lambda = \lambda_{\rm MRS}/2$.

Critical Coupling

The AHM with $\lambda = 1$ has many remarkable properties, at which we have so far only hinted. These all stem from Bogomol'nyi's crucial observation (Manton and Sutcliffe 2004, pp. 197–202) that the potential energy in this case can be rewritten as

\[ E = \frac{1}{2}\int_{\mathbb{R}^2} \left\{ \left( B - \frac{1}{2}\left(1 - |\phi|^2\right) \right)^2 + |D_1\phi + iD_2\phi|^2 + B \right\} d^2x - \frac{i}{2}\int_{\mathbb{R}^2} d\left(\bar{\phi}\, D\phi\right) \qquad [19] \]
The last integral vanishes by Stokes's theorem, so $E \geq \pi n$ by flux quantization [6], and $E = \pi n$ if and only if
\[ (D_1 + iD_2)\phi = 0 \qquad [20] \]
\[ \tfrac{1}{2}\left(1 - |\phi|^2\right) = B \qquad [21] \]
Note that system [20], [21] is first order, in contrast to the second-order field equations [3]. No explicit solutions of [20], [21] are known. However, Taubes has proved that for each unordered list $[z_1, z_2, \ldots, z_n]$ of $n$ points in $\mathbb{C}$, not necessarily distinct, there exists a solution of [20], [21], unique up to gauge transformations, with $\phi(z_1) = \phi(z_2) = \cdots = \phi(z_n) = 0$ and $\phi$ nonvanishing elsewhere, the zero at $z_r$ having the same multiplicity as $z_r$ has in the list. Note that the list is unordered: a solution is uniquely determined by the positions and multiplicities of the zeroes of $\phi$, but the order in which we label these is irrelevant. The solution minimizes $E$ within the class $\mathcal{C}_n$ of winding $n$ configurations, so is automatically a stable static solution of the model.

Equation [20] applied to the symmetric n-vortex, $\phi = \sigma(r) e^{in\theta}$, $A = a(r)\, d\theta$, implies $a(r) = n - r\sigma'(r)/\sigma(r)$. Comparing with [8], [9], it follows that $q_n = m_n$ when $\lambda = 1$ as previously claimed, since $K_1 = -K_0'$. Tong has conjectured, based on a string duality argument, that $q_1 = -2\pi\, 8^{1/4}$. This is consistent with current numerics but has no direct derivation so far.

Figure 6 Snapshots of the energy density during a head-on collision of vortices. This $90^\circ$ scattering phenomenon is a generic feature of planar topological soliton dynamics.

Figure 7 The critical velocity for $90^\circ$ head-on scattering of type II vortices $v_{\rm crit}$ as a function of $\lambda$, as predicted by equation [17] (solid curve), in comparison with the results of Myers et al. (1992) (crosses).
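The step from [20] to $q_n = m_n$ can be checked numerically. The sketch below, which is an illustration rather than anything from the article, takes the asymptotic Higgs profile [8] at $\lambda = 1$, feeds it through $a(r) = n - r\sigma'(r)/\sigma(r)$ using a central finite difference, and compares the result with the gauge tail [9] carrying the same constant. The helper names are hypothetical, the Bessel functions are evaluated from their standard integral representation with the standard library only, and the value $q/2\pi = -8^{1/4}$ is used simply as a convenient number (it is the conjectured value quoted above for $n = 1$).

```python
import math


def bessel_k(alpha: float, r: float, t_max: float = 20.0, steps: int = 4000) -> float:
    """K_alpha(r) for r > 0 via int_0^inf exp(-r cosh t) cosh(alpha t) dt."""
    h = t_max / steps
    total = 0.5 * math.exp(-r)  # t = 0 endpoint carries weight 1/2
    for k in range(1, steps + 1):
        t = k * h
        total += math.exp(-r * math.cosh(t)) * math.cosh(alpha * t)
    return total * h


Q_OVER_2PI = -(8.0 ** 0.25)  # illustrative value of q / (2 pi), see lead-in


def sigma_tail(r: float) -> float:
    """Asymptotic Higgs profile [8] at lambda = 1."""
    return 1.0 + Q_OVER_2PI * bessel_k(0.0, r)


def gauge_profile_from_bogomolny(r: float, n: int = 1, h: float = 1e-4) -> float:
    """a(r) = n - r sigma'(r) / sigma(r), with sigma' by central difference."""
    dsigma = (sigma_tail(r + h) - sigma_tail(r - h)) / (2.0 * h)
    return n - r * dsigma / sigma_tail(r)


def gauge_tail(r: float, n: int = 1) -> float:
    """Asymptotic gauge profile [9] with m_n set equal to q_n."""
    return n + Q_OVER_2PI * r * bessel_k(1.0, r)


if __name__ == "__main__":
    for r in (6.0, 8.0, 10.0):
        lhs = gauge_profile_from_bogomolny(r) - 1
        rhs = gauge_tail(r) - 1
        print(r, lhs / rhs)  # tends to 1 as r grows
```

The ratio differs from 1 only by terms of order $K_0(r)$, which are beyond the linearized approximation, so to the accuracy of [8], [9] the scalar charge and the dipole moment coincide.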

Taubes's theorem shows that this n-vortex is just one point, corresponding to the list $[0, 0, \ldots, 0]$, in a $2n$-dimensional space of static multivortex solutions called the moduli space $\mathcal{M}_n$. This space may be visualized as the flat, finite-dimensional valley bottom in $\mathcal{C}_n$ on which $E$ attains its minimum value, $\pi n$. Points in $\mathcal{M}_n$ are in one-to-one correspondence with distinct unordered lists $[z_1, z_2, \ldots, z_n]$, which are themselves in one-to-one correspondence with points in $\mathbb{C}^n$, as follows. To each list, we assign the unique monic polynomial whose roots are $z_r$,
\[ p(z) = (z - z_1)(z - z_2)\cdots(z - z_n) = a_0 + a_1 z + \cdots + a_{n-1} z^{n-1} + z^n \qquad [22] \]
This polynomial is uniquely determined by its coefficients $(a_0, a_1, \ldots, a_{n-1}) \in \mathbb{C}^n$, which give good global coordinates on $\mathcal{M}_n \cong \mathbb{C}^n$. The zeros $z_r$ of $\phi$ may be used as local coordinates on $\mathcal{M}_n$, away from $\Delta$, the subset of $\mathcal{M}_n$ on which two or more of the zeros $z_r$ coincide, but are not good global coordinates.

Let $(\phi, A)_a$ denote the static solution corresponding to $a \in \mathbb{C}^n$. If the zeros $z_r$ are all at least $s$ apart, Taubes showed the solution is just a linear superposition of 1-vortices located at $z_r$, up to corrections exponentially small in $s$. Imagine these constituent vortices are pushed with small initial velocities. Then $(\phi(t), A(t))$ must remain close to the valley bottom $\mathcal{M}_n$, since departing from it costs kinetic energy, of which there is little. Manton has suggested, therefore, that the dynamics is well approximated by the constrained variational problem wherein $(\phi(t), A(t)) = (\phi, A)_{a(t)} \in \mathcal{M}_n$ for all $t$. Since the action $S = \int \mathcal{L}\, d^3x = \int (E_{\rm kin} - E)\, dt$, and $E = \pi n$, constant, on $\mathcal{M}_n$, this constrained problem amounts to Lagrangian mechanics on configuration space $\mathcal{M}_n$ with Lagrangian $L = E_{\rm kin}|_{\mathcal{M}_n}$. Now $E_{\rm kin}$ is real, positive, and quadratic in time derivatives of $\phi$, $A$, so
\[ L = \tfrac{1}{2}\gamma_{rs}(a)\, \dot{a}_r\, \dot{\bar{a}}_s \qquad [23] \]
$\gamma_{rs}$ forming the entries of a positive-definite $n \times n$ Hermitian matrix ($\gamma_{sr} \equiv \bar{\gamma}_{rs}$). Since $(\phi, A)_a$ is not known explicitly, neither are $\gamma_{rs}(a)$. Observe, however, that $L$ is the Lagrangian for geodesic motion in $\mathcal{M}_n$ with respect to the Riemannian metric
\[ \gamma = \gamma_{rs}(a)\, da_r\, d\bar{a}_s \qquad [24] \]
Manton originally proposed this geodesic approximation for monopoles, but it is now standard for all topological solitons of Bogomol'nyi type (where one has a moduli space of static multisolitons saturating a topological lower bound on $E$). Note that geodesics are independent of initial speed, which agrees with Myers et al.: Figure 5c shows that $\Theta(v, b)$ is approximately independent of $v$ for $v \leq 0.5$. Further, Stuart (1994) has proved that, for initial speeds of order $\epsilon$, small, the fields stay (pointwise) $\epsilon^2$ close to their geodesic approximant for times of order $\epsilon^{-1}$.

On symmetry grounds, two vortex dynamics in the COM frame reduces to geodesic motion in $\mathcal{M}_2^0 \cong \mathbb{C}$, the subspace of centered 2-vortices ($a_1 = 0$, so $z_1 = -z_2$), with induced metric
\[ \gamma^0 = G(|a_0|)\, da_0\, d\bar{a}_0 \qquad [25] \]
$G$ being some positive function. Note that $a_0 = z_1 z_2$, so the intervortex distance $|z_1 - z_2| = 2|z_1| = 2|a_0|^{1/2}$. The line $a_0 = \alpha \in \mathbb{R}$, traversed with $\alpha$ increasing, say, is geodesic in $\mathcal{M}_2^0$. The vortex positions (roots of $z^2 + a_0$) are $\pm\sqrt{|\alpha|}$ for $\alpha \leq 0$ and $\pm i\sqrt{\alpha}$ for $\alpha > 0$. This describes perfectly the $90^\circ$ scattering phenomenon: two vortices approach head-on along the $x^1$ axis, coincide to form a 2-vortex ring, then break apart along the $x^2$ axis, as in Figure 6. This behavior occurs because $a_0 = z_1 z_2$, rather than $z_1 - z_2$, is the correct global coordinate on $\mathcal{M}_2^0$, since vortices are classically indistinguishable.

Samols found a useful formula (Manton and Sutcliffe 2004, pp. 205–215) for $\gamma$ in terms of the behavior of $|\phi_a|$ close to its zeros, using which he devised an efficient numerical scheme to evaluate $G(|a_0|)$, and computed $\Theta(b)$ in detail, finding excellent agreement with lattice simulations at low speeds. He also studied the quantum scattering of vortices, approximating the quantum state by a wave function $\Psi$ on $\mathcal{M}_n$ evolving according to the natural Schrödinger equation for quantum geodesic motion,
\[ i\hbar\frac{\partial\Psi}{\partial t} = -\tfrac{1}{2}\hbar^2\, \Delta_\gamma \Psi \qquad [26] \]
where $\Delta_\gamma$ is the Laplace–Beltrami operator on $(\mathcal{M}_n, \gamma)$. This technique, introduced for monopoles by Gibbons and Manton, is now standard for solitons of Bogomol'nyi type.

By analyzing the forces between moving point vortices at $\lambda = 1$, Manton and Speight (2003) showed that, as the vortex separations become uniformly large, the metric on $\mathcal{M}_n$ approaches
\[ \gamma^\infty = \pi\sum_r dz_r\, d\bar{z}_r - \frac{q^2}{4\pi}\sum_{s \neq r} K_0(|z_r - z_s|)\, (dz_r - dz_s)(d\bar{z}_r - d\bar{z}_s) \qquad [27] \]
This formula can also be obtained by a method of matched asymptotic expansions. We can use [27] to study 2-vortex scattering for large $b$, when the

vortices remain well separated. (Note that $\gamma^\infty$ is not positive definite if any $|z_r - z_s|$ becomes too small.) The results are good, provided $v \leq 0.5$ and $b \geq 3$ (see Figure 5c).

Other Developments

The (critically coupled) AHM on a compact physical space $\Sigma$ is of considerable theoretical and physical interest. Bradlow showed that $\mathcal{M}_n(\Sigma)$ is empty unless $V = {\rm Area}(\Sigma) \geq 4\pi n$, so there is a limit to how many vortices a space of finite area can accommodate (Manton and Sutcliffe 2004, pp. 227–230). Manton has analyzed the thermodynamics of a gas of vortices by studying the statistical mechanics of geodesic flow on $\mathcal{M}_n(\Sigma)$. In this context, spatial compactness is a technical device to allow nonzero vortex density $n/V$ for finite $n$, without confining the fields to a finite box, which would destroy the Bogomol'nyi properties. In the limit of interest, $n, V \to \infty$ with $n/V$ fixed, the thermodynamical properties turn out to depend on $\Sigma$ only through $V$, so $\Sigma = S^2$ and $\Sigma = T^2$ give equivalent results, for example. The equation of state of the gas is ($P$ = pressure, $T$ = temperature)
\[ P = \frac{nT}{V - 4\pi n} \qquad [28] \]
which is similar, at low density $n/V$, to that of a gas of hard disks of area $2\pi$. The crucial step in deriving [28] is to find the volume of $\mathcal{M}_n(\Sigma)$ which, despite there being no formula for $\gamma$, may be computed exactly by remarkable indirect arguments (Manton and Sutcliffe 2004, pp. 231–234).

The static AHM coincides with the Ginzburg–Landau model of superconductivity, which has precisely the same type I/II classification. Here the "Higgs" field represents the wave function of a condensate of Cooper pairs, usually (but not always) electrons. There has been a parallel development of the static model by condensed matter theorists, therefore; see Fossheim and Sudbo (2004), for example. In fact the vortex was actually first discovered by Abrikosov in the condensed matter context. One important difference is that type I superconductors do not support vortex solutions in an external magnetic field $B_{\rm ext}$ because the critical $|B_{\rm ext}|$ required to create a single vortex is greater than the critical $|B_{\rm ext}|$ required to destroy the condensate completely ($\phi \equiv 0$). Type II superconductors do support vortices, and there are such superconductors with $\lambda \approx 1$, but the vortex dynamics we have described is not relevant to these systems. In this context there is an obvious preferred reference frame (the rest frame of the superconductor) so it is unsurprising that the Lorentz-invariant AHM is inappropriate. Insofar as vortices move at all, they seem to obey a first-order (in time) dynamical system, in contrast to the second-order AHM. Manton has devised a first-order system which may have relevance to superconductivity, by replacing $E_{\rm kin}$ with a Chern–Simons–Schrödinger functional (Manton and Sutcliffe 2004, pp. 193–197). Rather than attracting or repelling, vortices now tend to orbit one another at constant separation. There is again a moduli space approximation to slow vortex dynamics for $\lambda \approx 1$, but it has a Hamiltonian-mechanical rather than Riemannian-geometric flavor.

Finally, an interesting simplification of the AHM, which arises, for example, as a phenomenological model of liquid helium-4, is obtained if we discard the gauge field $A_\mu$, or equivalently set the electric charge of $\phi$ to $e = 0$. There is now no type I/II classification, since $\lambda$ may be absorbed by rescaling. The resulting model, which has only global $U(1)$ phase symmetry, supports n-vortices $\phi = \sigma(r) e^{in\theta}$ for all $n$, but these are not exponentially spatially localized,
\[ \sigma(r) = 1 - \frac{n^2}{\lambda r^2} - \frac{n^2(8 + n^2)}{2\lambda^2 r^4} + O(r^{-6}) \qquad [29] \]
and cannot have finite $E$ by Derrick's theorem. They are unstable for $|n| > 1$, and 1-vortices uniformly repel one another. They can be given an interesting first-order dynamics (the Gross–Pitaevski equation).

Abbreviations

$A$  electromagnetic gauge potential
$b$  impact parameter
$D$  gauge-covariant derivative
$E$  potential energy
$E_{\rm kin}$  kinetic energy
$F$  electromagnetic field strength tensor
$L$  Lagrangian
$\mathcal{L}$  Lagrangian density
$S$  action
$\phi$  Higgs field
$\Theta$  scattering angle

See also: Fractional Quantum Hall Effect; Ginzburg–Landau Equation; High Tc Superconductor Theory; Integrable Systems: Overview; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Fields with Topological Defects; Solitons and Other Extended Field Configurations; Symmetry Breaking in Field Theory; Topological Defects and Their Homotopy Classification; Variational Techniques for Ginzburg–Landau Energies.

Further Reading

Atiyah M and Hitchin N (1988) The Geometry and Dynamics of Magnetic Monopoles. Princeton: Princeton University Press.
Fossheim K and Sudbo A (2004) Superconductivity: Physics and Applications. Hoboken NJ: Wiley.
Jaffe A and Taubes C (1980) Vortices and Monopoles: Structure of Static Gauge Theories. Boston: Birkhäuser.
Nakahara M (1990) Geometry, Topology and Physics. Bristol: Adam-Hilger.
Manton NS and Speight JM (2003) Asymptotic interactions of critically coupled vortices. Communications in Mathematical Physics 236: 535–555.
Manton NS and Sutcliffe PM (2004) Topological Solitons. Cambridge: Cambridge University Press.
Myers E, Rebbi C, and Strilka R (1992) Study of the interaction and scattering of vortices in the abelian Higgs (or Ginzburg-Landau) model. Physical Review D 45: 1355–1364.
Rajaraman R (1989) Solitons and Instantons. Amsterdam: North-Holland.
Stuart D (1994) Dynamics of abelian Higgs vortices in the near Bogomolny regime. Communications in Mathematical Physics 159: 51–91.
Vilenkin A and Shellard EPS (1994) Cosmic Strings and Other Topological Defects. Cambridge: Cambridge University Press.

Adiabatic Piston
Ch Gruber, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
A Lesne, Université P.-M. Curie, Paris VI, Paris, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Macroscopic Problem

The "adiabatic piston" is an old problem of thermodynamics which has had a long and controversial history. It is the simplest example concerning the time evolution of an adiabatic wall, that is, a wall which does not conduct heat. The system consists of a gas in a cylinder divided by an adiabatic wall (the piston). Initially, the piston is held fixed by a clamp and the two gases are in thermal equilibrium characterized by $(p^\pm, T^\pm, N^\pm)$, where the index $-/+$ refers to the gas on the left/right side of the piston and $(p, T, N)$ denote the pressure, the temperature, and the number of particles (Figure 1). Since the piston is adiabatic, the whole system remains in equilibrium even if $T^- \neq T^+$. At time $t = 0$, the clamp is removed and the piston is let free to move without any friction in the cylinder. The question is to find the final state, that is, the final position $X_f$ of the piston and the parameters $(p_f^\pm, T_f^\pm)$ of the gases.

Figure 1 The adiabatic piston problem. The cylinder extends from 0 to $L$, with the piston of cross-section $A$ at position $X$; the gas on the left is labeled by $(N^-, p^-, T^-)$ and the gas on the right by $(N^+, p^+, T^+)$.

In the late 1950s, using the two laws of equilibrium thermodynamics (i.e., thermostatics), Landau and Lifshitz concluded that the adiabatic piston will evolve toward a final state where $p^-/T^- = p^+/T^+$. Later, Callen (1963) and others realized that the maximum entropy condition implies that the system will reach mechanical equilibrium where the pressures are equal, $p_f^- = p_f^+$; however, nothing could be said concerning the final position $X_f$ or the final temperatures $T_f^\pm$, which should depend explicitly on the viscosity of the fluids. It thus became a controversial problem since one was forced to accept that the two laws of thermostatics are not sufficient to predict the final state as soon as adiabatic movable walls are involved (see early references in Gruber (1999)).

Experimentally, the adiabatic piston was used already before 1924 to measure the ratio $c_p/c_v$ of the specific heats of gases. In 2000, new measurements have shown that one has to distinguish between two regimes, corresponding to weak damping or strong damping, with very different properties; for example, for weak damping the frequency of oscillations corresponds to adiabatic oscillations, whereas for strong damping it corresponds to isothermal oscillations.

Microscopic Problem

The "adiabatic piston" was first considered from a microscopic point of view by Lebowitz who introduced in 1959 a simple model to study heat conduction. In this model, the gas consists of point particles of mass $m$ making purely elastic collisions on the wall of the cylinder and on the piston. Furthermore, the gas is very dilute so that the

equation of state $p = n k_B T$ is satisfied at equilibrium, where $n$ is the density of particles in the gas and $k_B$ the Boltzmann constant. The adiabatic piston is taken as a heavy particle of mass $M \gg m$ without any internal degree of freedom. Using this same model Feynman (1965) gave a qualitative analysis in Lectures in Physics. He argued intuitively but correctly that the system should converge first toward a state of mechanical equilibrium where $p^- = p^+$ and then very slowly toward thermal equilibrium. This approach toward thermal equilibrium is associated with the "wiggles" of the piston induced by the random collisions with the atoms of the gas. Of course, this stochastic behavior is not part of thermodynamics and the evolution beyond the mechanical equilibrium cannot appear in the macroscopical framework assuming that the piston does not conduct heat.

From a microscopical point of view, one is confronted with two different problems: the approach toward mechanical equilibrium in the absence of any a priori friction (where the entropy of both gases should increase) and, on a different timescale, the approach toward thermal equilibrium (where the entropy of one gas should decrease but the total entropy increase).

The conceptual difficulties of the problem beyond mechanical equilibrium come from the following intuitive reasoning. When the piston moves toward the hotter gas, the atoms of the hotter gas gain energy, whereas those of the cooler gas lose energy. When the piston moves toward the cooler side, it is the opposite. Since on an average the hotter side should cool down and the cold side should warm up, we are led to conclude that on an average the piston should move toward the colder side. On the other hand, from $p = n k_B T$, the piston should move toward the warmer side to maintain pressure balance.

In 1996, Crosignani, Di Porto, and Segev introduced a kinetic model to obtain equations describing the adiabatic approach toward mechanical equilibrium. Starting with the microscopical model introduced by Lebowitz, Gruber, Piasecki, and Frachebourg, later joined by Lesne and Pache, initiated in 1998 a systematic investigation of the adiabatic piston within the framework of statistical mechanics, together with a large number of numerical simulations. This analysis was based on the fact that $m/M$ is a very small parameter to investigate expansions in powers of $m/M$ (see Gruber and Piasecki (1999) and Gruber et al. (2003) and reference therein). An approach using dynamical system methods was then developed by Lebowitz et al. (2000) and Chernov et al. (2002). An extension to hard-disk particles was analyzed at the same time by Kestemont et al. (2000). Recently, several other authors have contributed to this subject.

The general picture which emerges from all the investigations is the following. For an infinite cylinder, starting with mechanical equilibrium $p^- = p^+ = p$, the piston evolves to a stationary stochastic state with nonzero velocity toward the warmer side
\[ \langle V \rangle = \frac{m}{M}\sqrt{\frac{\pi k_B}{8m}}\left(\sqrt{T^+} - \sqrt{T^-}\right) + o\!\left(\frac{m}{M}\right) \qquad [1] \]
with relaxation time
\[ \tau = \frac{M}{A}\sqrt{\frac{\pi k_B}{8m}}\, \frac{1}{p}\left(\frac{1}{\sqrt{T^-}} + \frac{1}{\sqrt{T^+}}\right)^{-1} \qquad [2] \]
where $M/A$ is the mass per unit area of the piston. In this state the piston has a temperature $T_P = \sqrt{T^+ T^-}$ and there is a heat flux
\[ j_Q = \left(\sqrt{T^-} - \sqrt{T^+}\right)\frac{m}{M}\sqrt{\frac{8k_B}{\pi m}}\; p + o\!\left(\frac{m}{M}\right) \qquad (p^- = p^+ = p) \qquad [3] \]
For a finite cylinder and $p^+ \neq p^-$, the evolution proceeds in four different stages. The first two are deterministic and adiabatic. They correspond to the thermodynamic evolution of the (macroscopic) adiabatic piston. The last two stages, which go beyond thermodynamics, are stochastic with heat transfer across the piston. More precisely:

1. In the first stage, whose duration is the time needed for the shock wave to bounce back on the piston, the evolution corresponds to the case of the infinite cylinder (with $p^- \neq p^+$). If $R = Nm/M > 10$, the piston will be able to reach and maintain a constant velocity
\[ V = \sqrt{\frac{\pi k_B}{8m}}\, (p^- - p^+)\, \frac{\sqrt{T^- T^+}}{p^-\sqrt{T^+} + p^+\sqrt{T^-}} + O\!\left(\frac{m}{M}\right) \quad \text{for } |p^- - p^+| \ll 1 \qquad [4] \]
2. In the second stage the evolution toward mechanical equilibrium is either weakly or strongly damped depending on $R$. If $R < 1$, the evolution is very weakly damped, the dynamics takes place on a timescale $t' = \sqrt{R}\, t$, and the effect of the collisions on the piston is to introduce an external potential $\Phi(X) = c_1/X^2 + c_2/(L - X)^2$. On the other hand, if $R > 4$, the evolution is strongly damped (with two oscillations only) and depends neither on $M$ nor on $R$.
162 Adiabatic Piston

3. After mechanical equilibrium has been reached, the third stage is a stochastic approach toward thermal equilibrium associated with heat transfer across the piston. This evolution is very slow and exhibits a scaling property with respect to $t' = mt/M$.
4. After thermal equilibrium has been reached ($T^- = T^+$, $p^- = p^+$), in a fourth stage the gas will evolve very slowly toward a state with Maxwellian distribution of velocities, induced by the collision with the stochastic piston.

The general conclusion is thus that a wall which is adiabatic when fixed will become a heat conductor under a stochastic motion. However, it should be stressed that the time required to reach thermal equilibrium will be several orders of magnitude larger than the age of the universe for a macroscopical piston, and such a wall could not reasonably be called a heat conductor. However, for mesoscopic systems, the effect of stochasticity may lead to very interesting properties, as shown by Van den Broeck et al. (2004) in their investigations of Brownian (or biological) motors.

Microscopical Model

The system consists of two fluids separated by an ‘‘adiabatic’’ piston inside a cylinder with x-axis, length $L$, and area $A$. The fluids are made of $N^\pm$ identical light particles of mass $m$. The piston is a heavy flat disk, without any internal degree of freedom, of mass $M \gg m$, orthogonal to the x-axis, and velocity parallel to this x-axis. If the piston is fixed at some position $X_0$, and if the two fluids are in thermal equilibrium characterized by $(p_0^\pm, T_0^\pm, N^\pm)$, then they will remain in equilibrium forever even if $T_0^+ \neq T_0^-$: it is thus an ‘‘adiabatic piston’’ in the sense of thermodynamics. At a certain time $t = 0$, the piston is set free to move and the problem is to study the time evolution. To define the dynamics, we consider that the system is purely Hamiltonian, that is, the particles and the piston move without any friction according to the laws of mechanics. In particular, the collisions between the particles and the walls of the cylinder, or the piston, are purely elastic and the total energy of the system is conserved. In most studies, one considers that the particles are point particles making purely elastic collisions. Since the piston is bound to move only in the x-direction, the velocity components of the particles in the transverse directions play no role in this problem. Moreover, since there is no coupling between the components in the x- and transverse directions, one can simplify the model further by assuming that all probability distributions are independent of the transverse coordinates. We are thus led to a formally one-dimensional problem (except for normalizations). Therefore, in this review, we consider that the particles are noninteracting and all velocities are parallel to the x-axis.

From the collision law, if $v$ and $V$ denote the velocities of a particle and the piston before a collision, then under the collision on the piston:

$$v \to v' = 2V - v + \alpha(v - V), \qquad V \to V' = V + \alpha(v - V) \qquad [5]$$

where

$$\alpha = \frac{2m}{M+m} \qquad [6]$$

Similarly, under a collision of a particle with the boundary at $x = 0$ or $x = L$:

$$v \to v' = -v \qquad [7]$$

Let us mention that more general models have also been considered, for example, the case where the two fluids are made of point particles with different masses $m^\pm$, or two-dimensional models where the particles are hard disks. However, no significant differences appear in these more general models and we restrict this article to the simplest case.

One can study different situations: $L = \infty$, $L$ finite, and $L \to \infty$. Furthermore, taking first $M$ and $A$ finite, one can investigate several limits.

1. Thermodynamic limit for the piston only. In this limit, $L$ is fixed (finite or infinite) and $A \to \infty$, $M \to \infty$, keeping constant the initial densities $n^\pm$ of the fluid and the parameter

$$\gamma = \frac{2mA}{M+m} = \alpha A \simeq 2m\,\frac{A}{M} \qquad [8]$$

If $L$ is finite, this means that $N^\pm \to \infty$ while keeping constant the parameters

$$R^\pm = \frac{mN^\pm}{M} = \frac{M^\pm_{\rm gas}}{M} \qquad [9]$$

2. Thermodynamic limit for the whole system, where $L \to \infty$ and $A \sim L^2$, $N^\pm \sim L^3$. In this limit, space and time variables are rescaled according to $x' = x/L$ and $t' = t/L$. This limit can be considered as a limiting case of (1) where $R^\pm \sim \sqrt{A} \to \infty$ (and time is scaled).
3. Continuum limit where $L$ and $M$ are fixed and $N^\pm \to \infty$, $m \to 0$ keeping $M^\pm_{\rm gas}$ constant, that is, $R^\pm$ = const.

The case $L$ infinite and the limit (1) have been investigated using statistical mechanics (Liouville or

Boltzmann’s equations). On the other hand, the limit (2) has been studied using dynamical system methods, reducing first the system to a billiard in an $(N^+ + N^- + 1)$-dimensional polyhedron. The limit (3) has been introduced to derive hydrodynamical equations for the fluids.

In this article, we present the approach based on statistical mechanics. Although not as rigorous as (2) on a mathematical level, it yields more information on the approach toward mechanical and thermal equilibrium. Moreover, it indicates which open problems remain to be solved mathematically. In all investigations, advantage is taken of the fact that $m/M$ is very small and one introduces the small parameter

$$\epsilon = \sqrt{m/M} \ll 1 \qquad [10]$$

Let us note that $\epsilon$ measures the ratio of thermal velocities for the piston and a fluid particle, whereas $\alpha \simeq 2\epsilon^2$ measures the ratio of velocity changes during a collision.

Starting Point: Exact Equations

Using the statistical point of view, the time evolution is given by Liouville’s equation for the probability distribution on the whole phase space for $(N^+ + N^- + 1)$ particles, with $L$, $A$, $N^\pm$, and $M$ finite. Initially ($t \le 0$), the piston is fixed at $(X_0, V_0 = 0)$ and the fluids are in thermal equilibrium with homogeneous densities $n_0^\pm$, velocity distributions $\varphi_0^\pm(v) = \varphi_0^\pm(-v)$, and temperatures

$$T_0^\pm = m\int_{-\infty}^{\infty} dv\; n_0^\pm\,\varphi_0^\pm(v)\,v^2 \qquad [11]$$

Integrating out the irrelevant degrees of freedom, Liouville’s equation yields the equations for the distribution $\rho^\pm(x, v; t)$ of the right and left particles:

$$\partial_t\rho^\pm(x,v;t) + v\,\partial_x\rho^\pm(x,v;t) = I^\pm(x,v;t) \qquad [12]$$

The collision term $I^\pm(x, v; t)$ is a functional of $\rho_{\pm,P}(X, v; X, V; t)$, the two-point correlation function for a right (resp. left) particle at $(x = X, v)$ and the piston at $(X, V)$. Similarly, one obtains for the velocity distribution of the piston:

$$\partial_t\Phi(V;t) = A\int_{-\infty}^{\infty}(V-v)\left[\theta(V-v)\,\rho^-_{\rm surf}(v';V';t) + \theta(v-V)\,\rho^-_{\rm surf}(v;V;t)\right]dv - A\int_{-\infty}^{\infty}(V-v)\left[\theta(v-V)\,\rho^+_{\rm surf}(v';V';t) + \theta(V-v)\,\rho^+_{\rm surf}(v;V;t)\right]dv \qquad [13]$$

where $(v', V')$ are given by eqn [5] and

$$\rho^\pm_{\rm surf}(v;V;t) = \int_{-\infty}^{\infty} dX\;\rho_{\pm,P}(X, v; X, V; t) \qquad [14]$$

We thus have to solve eqns [12]–[13] with initial conditions

$$\rho^-(x,v;t=0) = n_0^-\,\varphi_0^-(v)\,\theta(x)\,\theta(X_0 - x)$$
$$\rho^+(x,v;t=0) = n_0^+\,\varphi_0^+(v)\,\theta(L - x)\,\theta(x - X_0) \qquad [15]$$
$$\Phi(V;t=0) = \delta(V)$$

Using the fact that $\alpha = 2m/(M+m) \ll 1$, we can rewrite eqn [13] as a formal series in powers of $\alpha$:

$$\partial_t\Phi(V;t) = \gamma\sum_{k=1}^{\infty}\frac{(-1)^k\alpha^{k-1}}{k!}\,\frac{\partial^k}{\partial V^k}\,\tilde F_{k+1}(V;t) \qquad [16]$$

$$\tilde F_k(V;t) = \int_V^{\infty}(v-V)^k\,\rho^-_{\rm surf}(v;V;t)\,dv - \int_{-\infty}^{V}(v-V)^k\,\rho^+_{\rm surf}(v;V;t)\,dv \qquad [17]$$

from which one obtains the equations for the moments of the piston velocity:

$$\frac{1}{\gamma}\frac{d\langle V^n\rangle}{dt} = \sum_{k=1}^{n}\alpha^{k-1}\frac{n!}{k!\,(n-k)!}\int_{-\infty}^{\infty} dV\;V^{n-k}\,\tilde F_{k+1}(V;t) \qquad [18]$$

However, we do not know the two-point correlation functions.

If the length of the cylinder is infinite, the condition $M \gg m$ implies that the probability for a particle to make more than one collision on the piston is negligible. Alternatively, one could choose initial distributions $\varphi_0^\pm(v)$ which are zero for $|v| < v_{\min}$, where $v_{\min}$ is taken such that the probability of a recollision is strictly zero. Therefore, if $L = \infty$, one can consider that before a collision on the piston the particles are distributed with $\varphi_0^\pm(v)$ for all $t$, and the two-point correlation functions factorize, that is,

$$\rho^-_{\rm surf}(v;V;t) = \rho^-_{\rm surf}(v;t)\,\Phi(V;t), \quad \text{if } v > V$$
$$\rho^+_{\rm surf}(v;V;t) = \rho^+_{\rm surf}(v;t)\,\Phi(V;t), \quad \text{if } v < V \qquad [19]$$

where for $L = \infty$, $\rho^\pm_{\rm surf}(v;t) = n_0^\pm\varphi_0^\pm(v)$ and thus the conditions to obtain eqn [18] are satisfied.

If $L$ is finite, one can show that the factorization property (eqn [19]) is an exact relation in the thermodynamic limit for the piston ($A \to \infty$, $M/A$ = const). For finite $L$ and finite $A$, we introduce

Assumption 1 (Factorization condition). Before a collision the two-point correlation functions have the factorization property (eqn [19]) to first order in $\alpha$.

Under the factorization condition, we have

$$\tilde F_k(V;t) = F_k(V;t)\,\Phi(V;t) \qquad [20]$$

with

$$F_k(V;t) = \int_V^{\infty} dv\,(v-V)^k\,\rho^-_{\rm surf}(v;t) - \int_{-\infty}^{V} dv\,(v-V)^k\,\rho^+_{\rm surf}(v;t) = F_k^-(V;t) - F_k^+(V;t) \qquad [21]$$

and from eqn [18]

$$\frac{M}{A}\frac{d}{dt}\langle V\rangle = M\alpha\,\langle F_2(V;t)\rangle \qquad [22]$$

$$\frac{M}{A}\frac{d}{dt}\langle V^2\rangle = M\alpha\left[2\langle V F_2(V;t)\rangle + \alpha\,\langle F_3(V;t)\rangle\right] \qquad [23]$$

Introducing $\bar V = \langle V\rangle$ then from eqns [12] and [20], it follows that the (kinetic) energies satisfy (here and below, upper signs refer to the right fluid, $+$)

$$\frac{d}{dt}\frac{\langle E^\pm\rangle}{A} = \pm M\alpha\left[\langle F_2^\pm(V;t)\rangle\,\bar V + \langle (V-\bar V)F_2^\pm(V;t)\rangle + \frac{\alpha}{2}\langle F_3^\pm(V;t)\rangle\right] \qquad [24]$$

which implies conservation of energy. From the first law of thermodynamics,

$$\frac{d}{dt}\frac{\langle E^\pm\rangle}{A} = \frac{1}{A}\left[P_W^{P\to\pm} + P_Q^{P\to\pm}\right] \qquad [25]$$

where $P_W^{P\to\pm}$ and $P_Q^{P\to\pm}$ denote the work- and heat-power transmitted by the piston to the fluid, we conclude from eqns [22] and [25] that the heat flux is

$$\frac{1}{A}P_Q^{P\to\pm} = \pm M\alpha\left[\langle (V-\bar V)F_2^\pm(V;t)\rangle + \frac{\alpha}{2}\langle F_3^\pm(V;t)\rangle\right] \qquad [26]$$

Since $\alpha \ll 1$, it is interesting to introduce the irreducible moments

$$\Delta_r = \langle (V-\bar V)^r\rangle \qquad [27]$$

and the expansion around $\bar V = \langle V\rangle_t$,

$$F_n^\pm(V;t) = \sum_{r=0}^{\infty}\frac{1}{r!}\,F_n^{(r,\pm)}(\bar V)\,(V-\bar V)^r \qquad [28]$$

from which one obtains equations for $d\Delta_r/dt$. In particular, using the identities

$$F_3^{(r+1,\pm)} = -3F_2^{(r,\pm)}, \qquad F_2^{(r+2,\pm)} = 2F_0^{(r,\pm)} \qquad [29]$$

in [22] and [24], we have

$$\langle F_2^\pm(V;t)\rangle = F_2^\pm(\bar V;t) + \sum_{r\ge 0}\frac{2}{(2+r)!}\,F_0^{(r,\pm)}\,\Delta_{2+r} \qquad [30]$$

$$\frac{d}{dt}\frac{\langle E^\pm\rangle}{A} = \pm M\alpha\left\{\langle F_2^\pm(V;t)\rangle\,\bar V + \frac{\alpha}{2}F_3^\pm(\bar V;t) + \frac{1}{2}\sum_{r\ge 2}\frac{1}{r!}\,(2r-3\alpha)\,F_2^{(r-1,\pm)}(\bar V;t)\,\Delta_r\right\} \qquad [31]$$

Depending on the questions or approximations one wants to study, either the distribution $\Phi(V;t)$ or the moments $\langle V^n\rangle_t$ will be the interesting objects.

Finally, with the condition [19], one can take eqn [12] for $x \neq X_t$ and impose the boundary conditions at $x = X_t$:

$$\rho^-(X_t, v; t) = \rho^-(X_t, v'; t), \quad \text{if } v < V_t$$
$$\rho^+(X_t, v; t) = \rho^+(X_t, v'; t), \quad \text{if } v > V_t \qquad [32]$$

and similarly for $x = 0$ and $x = L$ with $v' = -v$.

Let us note that this factorization condition is of the same nature as the molecular chaos assumption introduced in kinetic theory, and with this condition eqn [13] yields the Boltzmann equation for this model.

In the following, to obtain explicit results as a function of the initial temperatures $T_0^\pm$, we take Maxwellian distributions $\varphi_0^\pm(v)$ and initial conditions $(p_0^\pm, T_0^\pm, n_0^\pm)$ such that the velocity of the piston remains small (i.e., $|\langle V\rangle_t| \ll |\langle v\rangle_0|$).

Distribution $\Phi(V;t)$ for the Infinite Cylinder ($L = \infty$)

To lowest order in $\epsilon = \sqrt{m/M}$, and assuming $|1 - p^+/p^-|$ is of order $\epsilon$, one obtains from eqn [16] the usual Fokker–Planck equation whose solution gives

$$\Phi_0(V;t) = \frac{1}{\sqrt{2\pi}\,\Delta(t)}\exp\!\left[-\frac{(V-\bar V(t))^2}{2\Delta^2(t)}\right] \qquad [33]$$

rffiffiffiffiffiffiffiffi 9
with m kB pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi >
rffiffiffiffiffiffiffiffi 1 hVistat ¼ ð Tþ  TÞ > =
kB pþ p M 8m if p ¼ pþ ½38

VðtÞ 
¼ ðp  p Þ þ
pffiffiffiffiffiffiffi þ pffiffiffiffiffiffiffi ð1  e t Þ k p ffiffiffiffiffiffiffiffiffiffiffiffiffi >
>
TTþ ;
8m T þ 2 B
T hV 2 istat  hVistat ¼
sffiffiffiffiffiffiffiffi M
A 8m pþ p
¼ pffiffiffiffiffiffiffi þ pffiffiffiffiffiffiffi ½34 Let us remark that we have established eqn [35]
M kB T þ T under the condition that j1  pþ =p j = O(), but as
pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi
kB pffiffiffiffiffiffiffiffiffiffiffiffiffi pþ T þ þ p T  we see in the next section, the stationary value Vstat
2 ðtÞ ¼ TTþ pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi ð1  e2 t Þ obtained from eqn [36] ffi remains valid whenever
M pþ T  þ p T þ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
j(1  pþ =p )(1  T þ =T  )j  1.
where we have dropped the index ‘‘zero’’ on the
variable T  , n and used the equation of state
p = n kB T  .
In conclusion, in the thermodynamic limit for the Moments hV n it : Thermodynamic Limit
piston (M ! 1, M=A fixed), eqn [33] shows that for the Piston
the evolution is deterministic, that is, (V; t) = General Equations: Adiabatic Evolution


(V  V(t), 
where the velocity V(t) of the piston
tends exponentially fast toward stationary value In the thermodynamic limit M ! 1,  ! 0,  = A

Vstat = V(1) with relaxation time  = 1 . is fixed and eqn [16] reduces to
Let us note that for pþ = p , we have V(t)  0
@ ~
and the evolution [33] is identical to the @t ðV; tÞ ¼  F2 ðV; tÞ ½39
@V
Ornstein–Uhlenbeck process of thermalization of
the Brownian particle starting with zero velocity Integrating [39] with initial condition (V; t = 0) =
and friction coefficient . The analysis of [16] to
(V) yields
first order in  yields then
" #

ðV;tÞ ¼
ðV  VðtÞÞ; that is; hV n it ¼ hVint ½40
X3
ðV; tÞ ¼ 1 þ  
ak ðtÞðV  VðtÞÞ k
0 ðV; tÞ ½35 where
k¼0
d
where ak (t) can be explicitly calculated and a0 (t) = VðtÞ ¼ F2 ðVðtÞ; tÞ; Vðt ¼ 0Þ ¼ 0 ½41
dt
2 (t)a2 (t) because of the normalization condition.
Moreover, a2 (t)  (p  pþ ), that is, a2 (t) = 0 if Moreover,
p = pþ . From [35], one obtains ~2 ðV; tÞ ¼ F2 ðV; tÞðV; tÞ
rffiffiffiffiffiffiffiffi F ½42
pffiffiffiffiffiffiffiffiffiffiffiffiffi
kB T T þ
hVit ¼ pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi and
8m pþ T  þ p T þ
n
; P ðX; v; X; V; tÞ ¼  ðx; v; tÞ
ðX  XðtÞÞ

ðp  pþ Þð1  e t Þ


ðV  VðtÞÞ ½43
  ðp T þ  pþ T  Þ
þ 2
þ ðp  p Þ pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi
8 ðpþ T  þ p T þ Þ2 where dX(t)=dt = V(t), X(t = 0) = X0 .
In conclusion, as already mentioned, in this limit

ð1  2 te t  e2 t Þ
the factorization condition (eqn [19]) is an exact
m 1 relation. Let us note that  
þ pffiffiffiffiffiffiffiffiffiffiffiffiffi ðp T þ  pþ T  Þ surf (v; t) = surf (2V  v; t) if
M TTþ v > V(t) (on the right) or v < V(t) (on the left). Let
pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi! o us also remark that 2mF2 (V(t); t) represents the
p þ T þ þ p T 

pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi ð1  e t Þ2 ½36 effective pressure from the right/left exerted on the
p þ T  þ p T þ piston. Moreover, since for any distribution
þ
and  
surf (v; t), the functions F2 (V; t) and F2 (V; t) are
rffiffiffiffiffi
monotonically decreasing, we can introduce the
m decomposition
2
hV it  hVi2t 2
¼  ðtÞ 1 þ 2
2 ðtÞa2 ðtÞ ½37
M  
M 
From eqn [36], we now conclude that for equal p surf ¼ 2mF 
2 ðV; tÞ ¼ ^
p 
 ðV; tÞV ½44
A
pressures p = pþ , the piston will evolve stochasti-
cally to a stationary state with nonzero velocity where the static pressure at the surface is
toward the warmer side ^ (t) = p
p surf (V = 0; t) and the friction coefficients

$\lambda^\pm(V;t)$ are strictly positive. The evolution [41] is thus of the form

$$\frac{d}{dt}V(t) = \frac{A}{M}\left(\hat p^- - \hat p^+\right) - \lambda(V)\,V \qquad [45]$$

It involves the difference of static pressure and the friction coefficient $\lambda(V) = \lambda^-(V) + \lambda^+(V)$. Finally, from eqn [12], we obtain the evolution of the (kinetic) energy per unit area for the fluids in the left and right compartments (upper signs refer to the right fluid, $+$):

$$\frac{d}{dt}\frac{\langle E^\pm\rangle}{A} = \pm 2m\,F_2^\pm(V;t)\,V \qquad [46]$$

Therefore, from [40] and [46], and the first law of thermodynamics, we recover the conclusions obtained in the previous section, that is, in the thermodynamic limit for the piston, the evolution (eqns [41], [12], and [35]) is deterministic and adiabatic (i.e., in [46] only work and no heat is involved).

Infinite Cylinder ($L = \infty$, $M = \infty$)

As already discussed, for $L = \infty$ we can neglect the recollisions. Therefore, in $F_2$ the distribution $\rho^\pm(v;t)$ can be replaced by $n_0^\pm\varphi_0^\pm(v)$ and $F_2(V)$ is independent of $t$. In this case, the evolution of the piston is simply given by the ordinary differential equation

$$\frac{d}{dt}V(t) = \frac{A}{M}\,2m\,F_2(V), \qquad V(t=0) = 0 \qquad [47]$$

where $F_2(V)$ is a strictly decreasing function of $V$. If $p_0^+ = p_0^-$, then $V(t) = 0$, that is, the piston remains at rest and the two fluids remain in their original thermal equilibrium. If $p_0^+ \neq p_0^-$, that is, $n_0^+k_BT_0^+ \neq n_0^-k_BT_0^-$, the piston will evolve monotonically to a stationary state with constant velocity $V_{\rm stat}$ solution of $F_2(V_{\rm stat}) = 0$. From [34], it follows that $V_{\rm stat}$ is a function of $n_0^+/n_0^-$, $T_0^+$, $T_0^-$ but does not depend on the value $M/A$. Moreover, the approach to this stationary state is exponentially fast with relaxation time $\tau_0 = 1/\lambda(V = 0)$. For Maxwellian distributions $\varphi_0^\pm(v)$, $V_{\rm stat}$ is a solution of

$$k_B\left(n_0^-T_0^- - n_0^+T_0^+\right) - V_{\rm stat}\sqrt{\frac{8k_Bm}{\pi}}\left(n_0^-\sqrt{T_0^-} + n_0^+\sqrt{T_0^+}\right) + V_{\rm stat}^2\,m\left(n_0^- - n_0^+\right) + O\!\left(V_{\rm stat}^3\right) = 0 \qquad [48]$$

Moreover,

$$\tau_0^{-1} = \frac{A}{M}\sqrt{\frac{8k_Bm}{\pi}}\left(n_0^-\sqrt{T_0^-} + n_0^+\sqrt{T_0^+}\right) \qquad [49]$$

which implies that the relaxation time will be very small either if $M/A \ll 1$, or if the densities $n_0^\pm$ are a large multiple of some reference densities $\tilde n_0^\pm$. In this case, the piston acquires almost immediately its final velocity $V_{\rm stat}$ and one can solve eqn [12] to obtain the evolution of the fluids.

Finite Cylinder ($L < \infty$, $M = \infty$)

For finite $L$, introducing the average temperature in the fluids

$$T^\pm_{\rm av} = \frac{2\langle E^\pm\rangle_t}{k_BN^\pm} \qquad [50]$$

we have to solve [41] and [46], that is,

$$\frac{d}{dt}V(t) = \frac{A}{M}\,2m\left[F_2^-(V;t) - F_2^+(V;t)\right]$$
$$\frac{d}{dt}\,k_BT^\pm_{\rm av} = \pm 4m\,\frac{A}{N^\pm}\,F_2^\pm(V;t)\,V \qquad [51]$$

where $F_2^\pm(V;t)$ is a functional of $\rho^\pm_{\rm surf}(v;t)$ which we decompose as

$$2m\,F_2^\pm(V;t) = \hat n^\pm(t)\,k_B\hat T^\pm(t) \pm \frac{M}{A}\,\lambda^\pm(V;t)\,V \qquad [52]$$

with

$$\hat n^-(t) = \int_0^{\infty} dv\;\rho^-_{\rm surf}(v;t), \qquad \hat n^+(t) = \int_{-\infty}^{0} dv\;\rho^+_{\rm surf}(v;t) \qquad [53]$$

and

$$\hat n^\pm\,k_B\hat T^\pm = \hat p^\pm \qquad [54]$$

For a time interval $\tau_1 = L\sqrt{m/k_BT}$, which is the time for the shock wave to bounce back, the piston will evolve as already discussed. In particular, if $R$ is sufficiently large, then after a time $\tau_0 = O((R^\pm)^{-1})$ the piston will reach the velocity $\bar V$ given by $F_2(\bar V, t) = 0$ (eqn [47]). For $t > \tau_1$, $F_2(V;t)$ depends explicitly on time. For $R$ sufficiently large, we can expect that for all $t$ the velocity $V(t)$ will be a functional of $\rho^\pm_{\rm surf}(v;t)$ given by $F_2[V(t); \rho_{\rm surf}(\cdot\,;t)] = 0$, and thus the problem is to solve eqn [12] with the boundary condition (eqn [32]). Since $V(t)$ so defined is independent of $M/A$, the evolution will be independent of $M/A$ if $R$ is sufficiently large. This conclusion, which we cannot prove rigorously, will be confirmed by numerical simulations.

To give a qualitative discussion of the evolution for arbitrary values of $R^\pm$, we shall use the following assumption already introduced in the experimental measurement of $c_p/c_v$.

Assumption 2 (Average assumption). The surface coefficients $\hat n^\pm(t)$ and $\hat T^\pm(t)$ (eqns [52]–[53]) coincide to order 1 in $\alpha$ with the average value of the density and temperature in the fluids, that is,

$$\hat n^- = \frac{N^-}{A\,X(t)}, \qquad \hat n^+ = \frac{N^+}{A\,(L - X(t))}, \qquad \hat T^\pm = T^\pm_{\rm av}(t) \qquad [55]$$

We still need an expression for the friction coefficients. From

$$2m\,F_2^\pm(V;t) = \hat p^\pm(t) - 4mV\,F_1^\pm(V=0;t) + mV^2\,\hat n^\pm(t) + O(V^3) \qquad [56]$$

then, assuming that to first order in $\alpha$, $F_1^\pm(V=0;t)$ is the same function of $\hat T^\pm(t)$ as for Maxwellian distributions, we have (upper signs refer to the right fluid, $+$)

$$\lambda^\pm(V) = \frac{A}{M}\,m\,\hat n^\pm\left[\sqrt{\frac{8k_B\hat T^\pm}{\pi m}} \pm V\right] + O(V^2) \qquad [57]$$

Therefore, choosing initial condition such that $V(t)$ is small for all time, eqn [51] yields

$$\sqrt{\hat T^-}\,X - \sqrt{\hat T^+}\,(L - X) = C = \sqrt{\hat T_0^-}\,X_0 - \sqrt{\hat T_0^+}\,(L - X_0) \qquad [58]$$

We thus obtain the equilibrium point for the adiabatic evolution ($M = \infty$):

$$\frac{N^-}{A}\,T_f^- = \frac{2E_0}{Ak_B}\,\frac{X_f}{L} \qquad [59]$$

$$\frac{N^+}{A}\,T_f^+ = \frac{2E_0}{Ak_B}\left(1 - \frac{X_f}{L}\right) \qquad [60]$$

where

$$\frac{2E_0}{Ak_B} = \frac{N^-}{A}\,T_0^- + \frac{N^+}{A}\,T_0^+ \qquad [61]$$

and

$$\sqrt{\frac{A}{N^-}\,X_f^3} - \sqrt{\frac{A}{N^+}\,(L - X_f)^3} = C\,\sqrt{\frac{A\,L\,k_B}{2E_0}} \qquad [62]$$

Solving [58]–[62] gives the equilibrium state $(X_f, T_f^\pm)$, which is a state of mechanical equilibrium $p_f^- = p_f^+$, but not thermal equilibrium $T_f^- \neq T_f^+$. Moreover, this equilibrium state does not depend on $M$. Having obtained the equilibrium point, we can then investigate the evolution close to the equilibrium point. Linearizing eqn [51] around $(X_f, T_f^\pm)$ yields:

$$\frac{d}{dt}V = k_B\left[\frac{N^-}{M}\,\frac{T_f^-X_f^2}{X^3} - \frac{N^+}{M}\,\frac{T_f^+(L - X_f)^2}{(L - X)^3}\right] - \lambda(V=0)\,V \qquad [63]$$

In other words, the effect of collisions on the piston is to induce an external potential of the form $[c_1|X|^{-2} + c_2(L - X)^{-2}]$ and a friction force. It is a damped harmonic oscillator with

$$\omega_0^2 = 6\,\frac{E_0}{M}\,\frac{1}{X_f(L - X_f)}, \qquad \lambda = 4\sqrt{\frac{1}{\pi}}\sqrt{\frac{E_0}{ML}}\left[\sqrt{\frac{R^-}{X_f}} + \sqrt{\frac{R^+}{L - X_f}}\right] \qquad [64]$$

(recall that $R^\pm = mN^\pm/M$). For the case $N^- = N^+$ to be considered in the simulations, eqn [64] implies that the motion is weakly damped if

$$R < R_{\max} = \frac{3\pi}{2}\left[\sqrt{\frac{X_f}{L}} + \sqrt{1 - \frac{X_f}{L}}\right]^{-2} \qquad [65]$$

with period

$$\tau = \frac{2\pi}{\omega_0}\,\frac{1}{\sqrt{1 - R/R_{\max}}} \qquad [66]$$

and strongly damped if $R > R_{\max}$, in agreement with experimental observations.

Moments $\langle V^n\rangle_t$: Piston with Finite Mass

Equation to First Order in $\alpha = 2m/(M + m)$

If the mass of the piston is finite with $M \gg m$, then the irreducible moments $\Delta_r$ are of the order $\alpha^{[(r+1)/2]}$ where $[(r + 1)/2]$ is the integral part of $(r + 1)/2$. If the factorization condition [19] is satisfied, to first order in $\alpha$ we have

$$\langle V^n\rangle_t = \bar V^n(t) + \frac{n(n-1)}{2}\,\bar V^{n-2}(t)\,\Delta^2(t) \qquad [67]$$

where $\bar V(t) = \langle V\rangle_t$ and $\Delta^2(t) = \langle V^2\rangle_t - \langle V\rangle_t^2$ are solutions of

$$\frac{1}{\gamma}\frac{d}{dt}\bar V(t) = F_2 + \Delta^2F_0$$
$$\frac{1}{\gamma}\frac{d}{dt}\Delta^2(t) = -4\Delta^2F_1 + \alpha F_3 \qquad [68]$$
$$\frac{1}{\gamma}\frac{d}{dt}\langle E^\pm\rangle_t = \pm M\left[F_2^\pm + \Delta^2F_0^\pm\right]\bar V \mp \frac{M}{2}\left[4\Delta^2F_1^\pm - \alpha F_3^\pm\right]$$

and $\Delta^2 = k_BT_P/M$ defines the temperature of the piston.

Infinite Cylinder: Heat Transfer

For the infinite cylinder, the factorization assumption is an exact relation and in this case the functions $F_k(V;t)$ are independent of $t$. The solution

of the autonomous system [68] with Fk = Fk (V) observables. The initial conditions are set on the
shows that the piston evolves to a stationary state first-stage solution. The initial conditions of the
 given by
with velocity V second regime match the asymptotic behavior of the
  first-stage solution (‘‘matching condition’’).
 þ  F3 ðVÞF0 ðVÞ ¼ 0
F2 ðVÞ ½69 The slaving principle is implemented by interpret-
4 F1 ðVÞ
ing an evolution equation of the form
The temperature of the piston is da da
 ¼ Að; aÞ; A ¼ Oð1Þ ½73
 dt d
 2 ¼ kB TP ¼  F3 ðVÞ
 ½70
M 
4 F1 ðVÞ as follows: it indicates that a is in fact a fast quantity
relaxing at short times ( ) toward a stationary
and the heat flux from the piston to the fluid is
state aeq () slaved to the slow evolution and

1 P! m2 F3þ F1  F3 F1þ determined by the condition
PQ ¼ ½71
A 2M F1  F1þ A½; aeq ðÞ ¼ 0 ½74
If we choose initial conditions such that jV(t)j  1
(at lowest order in , actually A[, aeq ()] = O()
for all t, and Maxwellian distributions ’ (v), the
which prescribes the leading order of aeq ()); the
solutions V(t), 2 (t) coincide with the solutions
following-order terms can be arbitrarily fixed as
previously obtained (eqns [36] and [37]) and
long as only the first order of perturbation is
rffiffiffiffiffiffiffiffi
1 P! m 8kB implemented. Physically, such a condition arises to
þ 
P ¼ ðT  T Þ
express that an instantaneous mechanical equili-
A Q M m
brium takes place at each time  of the slow
p pþ

pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi ½72 relaxation to thermal equilibrium.
ðpþ T  þ p T þ Þ
In conclusion, to first order in m=M, there is a heat
Equations for the fluctuation-induced evolution of
flux from the warm side to the cold one propor-
the system Following this procedure, we arrive at
tional to (T þ  T  ), induced by the stochastic
explicit expressions for the rescaled quantities (of order
motion of the piston. e = V=,  e 2 = 2 =, and  e = (p  pþ )=:
O(1))V
   þ þ 
Finite Cylinder (L < 1, M < 1)
Ve ¼ m AL F3 F1  F3 F1 þ OðÞ
3 E0 F1
Singular character of the perturbation approach  
Whereas the leading order is actually the ‘‘thermo- e
 2m AL
¼ ðF3 F1þ  F3þ F1 Þ
dynamic behavior’’ M = 1 in the first two stages of 2m 3 E0 ½75
the evolution (fast relaxation toward mechanical F3 F1
equilibrium), the fluctuations of order O() rule the  þ OðÞ
4F1
slow relaxation toward thermal equilibrium. It is
thus obvious that a naive perturbation approach e 2 ¼ F3 þ OðÞ
4F1
cannot give access to ‘‘both’’ regimes. This difficulty
is reminiscent of the boundary-layer problems We then introduce a (dimensionless) rescaled posi-
encountered in hydrodynamics, and the perturbation tion for the piston
method to be used here is the exact temporal analog
1 X 1 1
of the matched perturbative expansion method ¼  2  ; ½76
developed for these boundary layers. The idea is to 2 L 2 2
implement two different perturbation approaches: which satisfies
1. one at short times, with time variable t describing  
d  þ 2A F1 F1þ
the fast dynamics ruling the fast relaxation ¼ kB ðT  T Þ ½77
d 3E0 F1
toward mechanical equilibrium; and
2. one for longer times, with a rescaled time To discuss eqn [77], a third assumption has to be
variable  = t. introduced.
The second perturbation approach above is supple- Assumption 3 (Maxwellian Identities). In the
mented with a ‘‘slaving principle,’’ expressing that at regime when V = O(), the relations between the
each time of the slow evolution, that is, at fixed , functionals F1 , F2 , and F3 are the same at lowest
the still present fast dynamics has reached a local order in  as if the distributions surf (v; V; t) were
asymptotic state, slaved to the values of the slow Maxwellian in v:

$$F_1^\pm(V) \approx \mp\,\rho^\pm\sqrt{\frac{k_BT^\pm}{2\pi m}}, \qquad F_3^\pm(V) \approx \frac{2k_BT^\pm}{m}\,F_1^\pm(V) - V\,F_2^\pm(V) \qquad [78]$$

Using these identities and the (dimensionless) rescaled time

$$s = \frac{2\tau}{3L}\sqrt{\frac{k_B}{\pi m}}\sqrt{\frac{2\left(N^-T_0^- + N^+T_0^+\right)}{N}} \qquad [79]$$

where $N = N^+ + N^-$, we obtain a deterministic equation describing the piston motion (Gruber et al. 2003):

$$\frac{d\xi}{ds} = -\left[\sqrt{\frac{N}{2N^+}\,(1 + 2\xi)} - \sqrt{\frac{N}{2N^-}\,(1 - 2\xi)}\right], \qquad \xi(0) = \frac{1}{2} - \frac{X_{\rm ad}}{L} \qquad [80]$$

where $X_{\rm ad}$ is the piston position at the end of the adiabatic regime (i.e., $X_f$, eqn [62]). The meaningful observables straightforwardly follow from the solution $\xi(s)$:

$$X(s) = L\left[\frac{1}{2} - \xi(s)\right], \qquad T^\pm(s) = \frac{N^-T_0^- + N^+T_0^+}{2N^\pm}\left[1 \pm 2\xi(s)\right] \qquad [81]$$

The first-order perturbation analysis using a single rescaled time $t_1 = \alpha t_0$ is valid in the regime when $V = O(\alpha)$ and it gives access to the relaxation toward thermal equilibrium up to a temperature difference $T^+ - T^- = O(\alpha)$. For the sake of technical completeness (rather than physical relevance, since the above first-order analysis is enough to get the observable, meaningful behavior), let us mention that the perturbation analysis can be carried over at higher orders; using further rescaled times $t_2 = \alpha^2t_0, \ldots, t_n = \alpha^nt_0$, it would allow us to control the evolution up to a temperature difference $|T^+ - T^-| = O(\alpha^n)$; however, one could expect that the factorization condition does not hold at higher orders.

Numerical Simulations

As we have seen, the results were established under the condition that $m/M$ is a small parameter. Moreover, for finite systems ($L < \infty$, $M < \infty$), it was assumed that before collisions and to first order in $m/M$, the factorization and the average assumptions are satisfied. The numerical simulations are thus essential to check the validity of these assumptions, to determine the range of acceptable values $m/M$ for the perturbation expansion, to investigate the thermodynamic limit, and to guide the intuition.

In all simulations, we have taken $k_B = 1$, $m = 1$, $T^- = 1$ and usually $T^+ = 10$. For $L$ finite, we have taken $L = 60$, $X_0 = 10$, $A = 10^5$, and $N^+ = N^- = N/2$, that is, $p^- = R(M/A)(1/10)$ and $p^+ = 2p^-$. The number of particles $N$ was varied from a few hundred to one or several millions; the mass $M$ of the piston from 1 to $10^5$. We give below some of the results which have been obtained for $L = \infty$ (Figures 2 and 3)

Figure 2 Evolution of the piston for $L = \infty$ and $p^- = p^+ = 1$ as observed in simulations (stochastic line in (a), dots in (b)) compared with prediction: (a) position $X(t)$ versus $t$ for $T^+ = 10$ (curves labelled $M$ = 5, 10, 15, 25, 50, 100); and (b) stationary velocity $V_{\rm stat}$ for $T^+ = 10$ (continuous line) and $T^+ = 100$ (dotted line), as a function of $M$.
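
The prediction curves of panel (b) can be illustrated with the lowest-order closed form for equal pressures, $\langle V\rangle_{\rm stat} = (m/M)\sqrt{\pi k_B/8m}\,(\sqrt{T^+}-\sqrt{T^-})$. The following is a minimal sketch in the simulation units $k_B = m = 1$, $T^- = 1$; the function name `stationary_velocity` and its argument names are illustrative choices, not taken from the article, and the formula is only expected to be accurate when $m/M$ is small.

```python
import math


def stationary_velocity(M, T_plus, T_minus=1.0, m=1.0, kB=1.0):
    """Lowest-order stationary piston velocity for equal pressures.

    Evaluates (m/M) * sqrt(pi*kB/(8*m)) * (sqrt(T+) - sqrt(T-)).
    A positive value means the piston drifts toward the warmer (+) side.
    """
    prefactor = (m / M) * math.sqrt(math.pi * kB / (8.0 * m))
    return prefactor * (math.sqrt(T_plus) - math.sqrt(T_minus))


if __name__ == "__main__":
    # Tabulate the two prediction curves (T+ = 10 and T+ = 100) against M.
    for M in (5, 10, 15, 25, 50, 100):
        print(M, stationary_velocity(M, 10.0), stationary_velocity(M, 100.0))
```

Because the expression is proportional to $1/M$, doubling the piston mass halves the predicted drift, and the drift vanishes when $T^+ = T^-$.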

Figure 3 Evolution of the piston for $L = \infty$, $M = 10^4$, and $p^+ \neq p^-$ as observed in simulations (continuous line) compared with
predictions (dotted line): (a) p  = 1, p þ = p  þ p, from top to bottom p=p  = 0.05, 0.1, 0.2, 1, 2, 3; and (b) p  = , p þ = 2 ,
p=p  = 1; X 0 = X , t 0 = t, = 103 , 102 , 101 , 1, 10, 102 , 103 , 104 .

Figure 4 ‘‘Deterministic’’ evolution toward mechanical equilibrium for $L < \infty$, $M = 10^5$: (a) position $X(t)$; one finds $X_{\rm ad}^{\rm sim} = 8.3$ whereas $X_{\rm ad}^{\rm th} = 8.42$ and (b) velocity $V(t)$; one finds $\bar V^{\rm sim} = 0.343$ whereas $\bar V^{\rm th} = 0.3433$. From top to bottom: $R = 12$: strong damping, independent of $R$ and $M$ for $R > 4$ and $M > 10^3$. $R = 2$: critical damping. $R = 0.1$: weak damping; damping coefficient increases with $R$ and $\omega_0 \sim \sqrt{R}$ for $R < 1$ but is independent of $M$ for $M > 10^3$.
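
The theoretical first-stage velocity quoted in this caption can be illustrated numerically. The sketch below assumes Maxwellian surface distributions $n_0^\pm\varphi_0^\pm(v)$, writes $F_2(V) = F_2^-(V) - F_2^+(V)$ in closed form with the complementary error function, and finds the root of $F_2(V) = 0$ by bisection. The function names, the bracketing interval, and the choice $n_0^- = 1$ are assumptions made for illustration; with $T^- = 1$, $T^+ = 10$ and $p^+ = 2p^-$ (so $n_0^+ = 0.2\,n_0^-$) the root has magnitude close to the 0.3433 of the caption, with the piston moving toward the low-pressure ($-$) side.

```python
import math


def half_moment(V, sigma):
    """Integral of (v - V)^2 * phi(v) over v > V, phi a centred Gaussian
    with standard deviation sigma (closed form via erfc)."""
    z = V / sigma
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))            # P(v > V)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (sigma * sigma + V * V) * tail - V * sigma * pdf


def F2(V, n_minus, T_minus, n_plus, T_plus, m=1.0, kB=1.0):
    """F2(V) = F2^-(V) - F2^+(V) for Maxwellian fluids on both sides."""
    s_minus = math.sqrt(kB * T_minus / m)
    s_plus = math.sqrt(kB * T_plus / m)
    # Right-hand fluid: integral over v < V, mapped to v > -V by symmetry.
    return n_minus * half_moment(V, s_minus) - n_plus * half_moment(-V, s_plus)


def first_stage_velocity(n_minus, T_minus, n_plus, T_plus,
                         lo=-5.0, hi=5.0, tol=1e-12):
    """Root of F2(V) = 0 by bisection; F2 is strictly decreasing in V."""
    args = (n_minus, T_minus, n_plus, T_plus)
    if F2(lo, *args) <= 0.0 or F2(hi, *args) >= 0.0:
        raise ValueError("bracket [lo, hi] does not contain the root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F2(mid, *args) > 0.0:
            lo = mid          # root lies at larger V
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # T- = 1, T+ = 10, p+ = 2 p-  =>  n+ = 0.2 n-
    print(first_stage_velocity(1.0, 1.0, 0.2, 10.0))
```

Using the exact Gaussian integrals rather than the expansion truncated at $V^2$ matters here: the truncated quadratic gives a root that differs from the exact one at the percent level.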

Figure 5 Same conditions as Figure 4, $R = 12$: (a) average pressure and temperature in the fluid: $p^\pm_{\rm av}(t) = 2E^\pm n^\pm/N^\pm$, $T^\pm_{\rm av} = 2E^\pm/N^\pm k_B$ and (b) pressure and temperature at the surface of the piston, $p^\pm_{\rm surf}$ and $T^\pm_{\rm surf}(t)$, versus $t$. Prediction: $T^-_{\rm ad} = 1.54$, $T^+_{\rm ad} = 9.46$, $p^-_{\rm ad} = p^+_{\rm ad} = 2.2$. Simulations: $T^-_{\rm ad} = 1.52$, $T^+_{\rm ad} = 9.48$, $p^-_{\rm ad} = p^+_{\rm ad} = 2.2$.

and for $L < \infty$ approach to mechanical equilibrium (Figures 4–6) and to thermal equilibrium (Figures 7 and 8).

Conclusions and Open Problems

In this article, the adiabatic piston has been investigated to first order in the small parameter $m/M$, but no attempt has been made to control the remainder terms. For an infinite cylinder, no other assumptions were necessary and the numerical simulations (Figures 2 and 3) are in perfect agreement with the theoretical prediction, in particular for the stationary velocity $V_{stat}$, the friction coefficient as a function of $V$, and the relaxation time.

For a finite cylinder ($L < \infty$) and in the thermodynamic limit ($M = \infty$), we were forced to introduce the average assumption to obtain a set of autonomous equations. As we have seen, when initially $p^- \neq p^+$, this limiting case also describes the evolution to lowest order during the first two stages, characterized by a time of the order $t_1 = L\sqrt{m/k_B T}$, where the evolution is adiabatic and deterministic. In the first stage, that is, before the shock wave bounces back on the piston, the simulations confirm the theoretical predictions. In particular, they show that if $R > 4$, the piston will be able to reach and maintain for some time the velocity $V_{stat}$, whereas this will not be the case for $R < 1$ (Figure 4b). In the second stage of the evolution, the simulations (Figure 4) exhibit damped oscillations toward mechanical equilibrium which are in very good agreement with the predictions for the final state $(X_{ad}, T_{ad}^{\pm})$, the frequency of oscillations, and the existence of weak and strong damping depending on $R < 1$ or $R > 4$. Moreover, the general behavior of the evolution observed in the simulations as a function of the parameters was as predicted. However, the damping coefficient of these oscillations is wrong by one or several orders of magnitude. To understand this discrepancy, we note that using the average assumption we have related the damping to the friction coefficient. However, the simulations clearly show that those two dissipative effects have totally different origins. Indeed, as one can see with $L = \infty$, friction is associated with the fact that the density of the gas in front and in the back of the piston is not the same as in the bulk, and this generates a shock wave that propagates in the fluid. For finite $L$, when $R > 4$, the stationary velocity $V_{stat}$ is reached and the effect of friction is
Figure 6 Velocity distribution $\rho^-(v)$ in the left compartment. Same conditions as Figure 4, $R = 12$. Dotted line corresponds to Maxwellian with $T^- = 1.52$: (a) $t = 12, 24, 36, 48, 60, 92, 144, 240$ from top to bottom and (b) $t = 276$–$460$.

to transfer in this first stage more and more energy to the fluid on one side and vice versa on the other side. However, to stop the piston and reverse its motion, only a certain amount of the transferred energy is necessary and the rest remains as dissipated energy in the fluid, leading to a strong damping. On the other hand, for $R < 1$, the value $V_{stat}$ is never reached and all the energy transferred is necessary to revert the motion. In this case very little dissipation is involved and the damping will be very small. This indicates that the mechanism responsible for damping is associated with shock waves bouncing back and forth and the average assumption, which corresponds to a homogeneity condition throughout the gas, cannot describe the situation. In fact, the simulations (Figure 5b) indicate that the average assumption does

Figure 7 Approach to thermal equilibrium, $N^{\pm} = 3 \times 10^4$. The smooth curves correspond to the predictions, the stochastic curves to simulations: (a) position $X(\tau)$, $\tau = \alpha t$, no visible difference for $M = 100, 200, 1000$ and (b) average temperatures $T^{\pm}(\tau)$, $\tau = \alpha t$, $M = 200$.

Figure 8 Approach to thermal equilibrium from $T_{ad}^- = 1.54$ (dotted line in (a)) to $T_f = 5.5$ (heavy line in (b)). Velocity distribution function $\rho^-(v)/n^-$ on the left for $M = 200$, $N^{\pm} = 5 \times 10^4$. (a) $\tau = \alpha t = 2, 4, 14, 48, 92, 144$ and (b) approach to Maxwellian distribution for $\tau > 445$.

not hold in this second stage. In conclusion, one is forced to admit that to describe correctly the adiabatic evolution, it is necessary to study the coupling between the motion of the piston and the hydrodynamic equations of the gas. Preliminary investigations have been initiated, but this is still one of the major open problems. Another problem would be to study the evolution in the case of interacting particles. However, investigations with hard disks suggest that no new effects should appear. To investigate adiabatic evolution, a simpler version of the adiabatic piston problem, without any controversy, has been introduced: this is the model of a standard piston with a constant force acting on it.

In the third stage, that is, the very slow approach to thermal equilibrium, another assumption was necessary, namely the factorization condition. The simulations (Figure 7) show a very good agreement with the prediction, and in particular the scaling property with $t' = t/M$ is perfectly verified. It appears that the small discrepancy between simulations and theoretical predictions could be due to the fact that, to compute explicitly the coefficients in the equations of motion, we have taken Maxwellian relations for the velocities of the gas particles, which is clearly not the case (Figure 8a).

The fourth stage of the evolution, that is, the approach to Maxwellian distributions (Figure 8b), is still another major open problem. Some preliminary studies have been conducted, where one investigates the stability and the evolution of the system when initially the two gases are in the same equilibrium state, but characterized by a distribution function which is not Maxwellian.

Finally, let us mention that the relation between the piston problem and the second law of thermodynamics is one more major problem. The question of entropy production out of equilibrium, and the validity of the second law, are still highly controversial. Again, preliminary results can be found in the literature. Among other things, this question has led to a model of heat conductivity in gases, which reproduces the correct behavior (Gruber and Lesne 2005).

See also: Billiards in Bounded Convex Domains; Boltzmann Equation (Classical and Quantum); Hamiltonian Fluid Dynamics; Multiscale Approaches; Nonequilibrium Statistical Mechanics (Stationary): Overview; Nonequilibrium Statistical Mechanics: Dynamical Systems Approach.

Further Reading

Callen HB (1963) Thermodynamics. New York: Wiley. (Appendix C. See also Callen HB (1985) Thermodynamics and Thermostatics, 2nd edn., pp. 51 and 53. New York: Wiley.)
Chernov N, Sinai YaG, and Lebowitz JL (2002) Scaling dynamics of a massive piston in a cube filled with ideal gas: exact results. Journal of Statistical Physics 109: 529–548.
Feynman RP (1965) Lectures in Physics I. New York: Addison-Wesley.
Gruber Ch (1999) Thermodynamics of systems with internal adiabatic constraints: time evolution of the adiabatic piston. European Journal of Physics 20: 259–266.
Gruber Ch and Lesne A (2005) Hamiltonian model of heat conductivity and Fourier law. Physica A 351: 358.
Gruber Ch, Pache S, and Lesne A (2003) Two-time-scale relaxation towards thermal equilibrium of the enigmatic piston. Journal of Statistical Physics 112: 1199–1228.
Gruber Ch and Piasecki J (1999) Stationary motion of the adiabatic piston. Physica A 268: 412–442.
Kestemont E, Van den Broeck C, and Malek MM (2000) The "adiabatic" piston: and yet it moves. Europhysics Letters 49: 143.
Lebowitz JL, Piasecki J, and Sinai YaG (2000) Scaling dynamics of a massive piston in an ideal gas. In: Szász D (ed.) Hard Ball Systems and the Lorentz Gas, Encyclopedia of Mathematical Sciences Series, vol. 101, pp. 217–227. Berlin: Springer.
Van den Broeck C, Meurs P, and Kawai R (2004) From Maxwell demon to Brownian motor. New Journal of Physics 7: 10.

AdS/CFT Correspondence
C P Herzog, University of California at Santa Barbara, Santa Barbara, CA, USA
I R Klebanov, Princeton University, Princeton, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The anti-de Sitter/conformal field theory (AdS/CFT) correspondence is a conjectured equivalence between a quantum field theory in $d$ spacetime dimensions with conformal scaling symmetry and a quantum theory of gravity in $(d+1)$-dimensional anti-de Sitter space. The most promising approaches to quantizing gravity involve superstring theories, which are most easily defined in 10 spacetime dimensions, or M-theory, which is defined in 11 spacetime dimensions. Hence, the AdS/CFT correspondences based on superstrings typically involve backgrounds of the form $AdS_{d+1} \times Y_{9-d}$ while those based on M-theory involve backgrounds of the form $AdS_{d+1} \times Y_{10-d}$, where $Y$ are compact spaces.

The examples of the AdS/CFT correspondence discussed in this article are dualities between (super)conformal nonabelian gauge theories and superstrings on $AdS_5 \times Y_5$, where $Y_5$ is a five-dimensional Einstein space (i.e., a space whose Ricci tensor is proportional to the metric, $R_{ij} = 4g_{ij}$). In particular, the most basic (and maximally supersymmetric) such duality relates $\mathcal{N} = 4$ $SU(N)$ super Yang–Mills (SYM) and type IIB superstring in the curved background $AdS_5 \times S^5$.

There exist special limits where this duality is more tractable than in the general case. If we take the large-$N$ limit while keeping the 't Hooft coupling $\lambda = g_{YM}^2 N$ fixed ($g_{YM}$ is the Yang–Mills coupling strength), then each Feynman graph of the gauge theory carries a topological factor $N^{\chi}$, where $\chi$ is the Euler characteristic of the graph. The graphs of spherical topology (often called "planar"), to be identified with string tree diagrams, are weighted by $N^2$; the graphs of toroidal topology, to be identified with string one-loop diagrams, by $N^0$, etc. This counting corresponds to the closed-string coupling constant of order $N^{-1}$. Thus, in the large-$N$ limit the gauge theory becomes "planar," and the dual string theory becomes classical. For small $g_{YM}^2 N$, the gauge theory can be studied perturbatively; in this regime the dual string theory has not been very useful because the background becomes highly curved. The real power of the AdS/CFT duality, which already has made it a very useful tool, lies in the fact that, when the gauge theory becomes strongly coupled, the curvature in the dual description becomes small; therefore, classical supergravity provides a systematic starting point for approximating the string theory.

There is a strong motivation for an improved understanding of dualities of this type. In one direction, generalizations of this duality provide the tantalizing hope of a better understanding of quantum chromodynamics (QCD); QCD is a nonabelian gauge theory that describes the strong interactions of mesons, baryons, and glueballs, and has a conformal symmetry which is broken by quantum effects. In the other direction, AdS/CFT suggests that quantum gravity may be understandable as a gauge theory. Understanding the confinement of quarks and gluons that takes place in low-energy QCD and quantizing gravity are well acknowledged to be two of the most important outstanding problems of theoretical physics.

Some Geometrical Preliminaries

The $d$-dimensional sphere of radius $L$, $S^d$, may be defined by a constraint

$$\sum_{i=1}^{d+1} (X^i)^2 = L^2 \qquad [1]$$

on $d+1$ real coordinates $X^i$. It is a positively curved maximally symmetric space with symmetry group $SO(d+1)$. We will denote the round metric on $S^d$ of unit radius by $d\Omega_d^2$.
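As a small numerical illustration of the constraint [1] and of its $SO(d+1)$ symmetry, the following sketch draws a point on $S^d$ and checks that a rotation in one coordinate plane keeps it on the sphere. The function names, the sampling method (normalizing a Gaussian vector), and the values $d = 5$, $L = 2$ are illustrative assumptions, not part of the article.

```python
import math
import random

def random_point_on_sphere(d, L, rng):
    """Return d+1 coordinates X^i with sum_i (X^i)^2 = L^2, i.e. a point on S^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d + 1)]
    norm = math.sqrt(sum(x * x for x in g))
    return [L * x / norm for x in g]

def rotate_in_plane(X, i, j, theta):
    """Rotate coordinates i and j by angle theta (one element of SO(d+1))."""
    Y = list(X)
    c, s = math.cos(theta), math.sin(theta)
    Y[i] = c * X[i] - s * X[j]
    Y[j] = s * X[i] + c * X[j]
    return Y

def constraint(X):
    """Left-hand side of the sphere constraint [1]."""
    return sum(x * x for x in X)

# Illustrative values (assumed): S^5 of radius L = 2.
rng = random.Random(0)
d, L = 5, 2.0
X = random_point_on_sphere(d, L, rng)
Y = rotate_in_plane(X, 0, 3, 0.7)
print(constraint(X), constraint(Y), L ** 2)
```

Both printed constraint values should agree with $L^2$ up to floating-point rounding, since a plane rotation preserves the sum of squares.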
The $d$-dimensional anti-de Sitter space, $AdS_d$, may be defined by a constraint

$$(X^0)^2 + (X^d)^2 - \sum_{i=1}^{d-1} (X^i)^2 = L^2 \qquad [2]$$

This constraint shows that the symmetry group of $AdS_d$ is $SO(2, d-1)$. $AdS_d$ is a negatively curved maximally symmetric space, that is, its curvature tensor is related to the metric by

$$R_{abcd} = -\frac{1}{L^2}\left[g_{ac}g_{bd} - g_{ad}g_{bc}\right] \qquad [3]$$

Its metric may be written as

$$ds^2_{AdS} = L^2\left(-(y^2+1)\,dt^2 + \frac{dy^2}{y^2+1} + y^2\,d\Omega_{d-2}^2\right) \qquad [4]$$

where the radial coordinate $y \in [0, \infty)$, and $t$ is defined on a circle of length $2\pi$. This space has closed timelike curves; to eliminate them, we will work with the universal covering space where $t \in (-\infty, \infty)$. The boundary of $AdS_d$, which plays an important role in the AdS/CFT correspondence, is located at infinite $y$. There exists a subspace of $AdS_d$ called the Poincaré wedge, with the metric

$$ds^2 = \frac{L^2}{z^2}\left(dz^2 - (dx^0)^2 + \sum_{i=1}^{d-2} (dx^i)^2\right) \qquad [5]$$

where $z \in [0, \infty)$.

A Euclidean continuation of $AdS_d$ is the Lobachevsky space (hyperboloid), $L_d$. It is obtained by reversing the sign of $(X^d)^2$, $dt^2$, and $(dx^0)^2$ in [2], [4], and [5], respectively. After this Euclidean continuation, the metrics [4] and [5] become equivalent; both of them cover the entire $L_d$. Another equivalent way of writing the metric is

$$ds^2_L = L^2\left(d\rho^2 + \sinh^2\!\rho\; d\Omega_{d-1}^2\right) \qquad [6]$$

which shows that the boundary at infinite $\rho$ has the topology of $S^{d-1}$. In terms of the Euclideanized metric [5], the boundary consists of the $R^{d-1}$ at $z = 0$, and a single point at $z = \infty$.

The Geometry of Dirichlet Branes

Our path toward formulating the $AdS_5/CFT_4$ correspondence requires introduction of Dirichlet branes, or D-branes for short. They are soliton-like "membranes" of various internal dimensionalities contained in type II superstring theories. A Dirichlet $p$-brane (or D$p$ brane) is a $(p+1)$-dimensional hyperplane in $(9+1)$-dimensional spacetime where strings are allowed to end. A D-brane is much like a topological defect: upon touching a D-brane, a closed string can open up and turn into an open string whose ends are free to move along the D-brane. For the endpoints of such a string the $p+1$ longitudinal coordinates satisfy the conventional free (Neumann) boundary conditions, while the $9-p$ coordinates transverse to the D$p$ brane have the fixed (Dirichlet) boundary conditions, hence the origin of the term "Dirichlet brane." The D$p$ brane preserves half of the bulk supersymmetries and carries an elementary unit of charge with respect to the $(p+1)$-form gauge potential from the Ramond–Ramond (RR) sector of type II superstring.

For this article, the most important property of D-branes is that they realize gauge theories on their world volume. The massless spectrum of open strings living on a D$p$ brane is that of a maximally supersymmetric $U(1)$ gauge theory in $p+1$ dimensions. The $9-p$ massless scalar fields present in this supermultiplet are the expected Goldstone modes associated with the transverse oscillations of the D$p$ brane, while the photons and fermions provide the unique supersymmetric completion. If we consider $N$ parallel D-branes, then there are $N^2$ different species of open strings because they can begin and end on any of the D-branes. $N^2$ is the dimension of the adjoint representation of $U(N)$, and indeed we find the maximally supersymmetric $U(N)$ gauge theory in this setting.

The relative separations of the D$p$ branes in the $9-p$ transverse dimensions are determined by the expectation values of the scalar fields. We will be interested in the case where all scalar expectation values vanish, so that the $N$ D$p$ branes are stacked on top of each other. If $N$ is large, then this stack is a heavy object embedded into a theory of closed strings which contains gravity. Naturally, this macroscopic object will curve space: it may be described by some classical metric and other background fields including the RR $(p+2)$-form field strength. Thus, we have two very different descriptions of the stack of D$p$ branes: one in terms of the $U(N)$ supersymmetric gauge theory on its world volume, and the other in terms of the classical RR charged $p$-brane background of the type II closed superstring theory. The relation between these two descriptions is at the heart of the connections between gauge fields and strings that are the subject of this article.

Coincident D3 Branes

Gauge theories in $3+1$ dimensions play an important role in physics, and as explained above, parallel D3 branes realize a $(3+1)$-dimensional $U(N)$ SYM
theory. Let us compare a stack of D3 branes with the RR-charged black 3-brane classical solution where the metric assumes the form

$$ds^2 = H^{-1/2}(r)\left[-f(r)(dx^0)^2 + (dx^i)^2\right] + H^{1/2}(r)\left[f^{-1}(r)\,dr^2 + r^2\,d\Omega_5^2\right] \qquad [7]$$

where $i = 1, 2, 3$ and

$$H(r) = 1 + \frac{L^4}{r^4}, \qquad f(r) = 1 - \frac{r_0^4}{r^4}$$

The solution also contains an RR self-dual 5-form field strength

$$F = dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3 \wedge d(H^{-1}) + 4L^4\,\mathrm{vol}(S^5) \qquad [8]$$

so that the Einstein equation of type IIB supergravity, $R_{\mu\nu} = F_{\mu\alpha\beta\gamma\delta}F_{\nu}{}^{\alpha\beta\gamma\delta}/96$, is satisfied.

In the extremal limit $r_0 \to 0$, the 3-brane metric becomes

$$ds^2 = \left(1 + \frac{L^4}{r^4}\right)^{-1/2}\left(-(dx^0)^2 + (dx^i)^2\right) + \left(1 + \frac{L^4}{r^4}\right)^{1/2}\left(dr^2 + r^2\,d\Omega_5^2\right) \qquad [9]$$

Just like the stack of parallel, ground-state D3 branes, the extremal solution preserves 16 of the 32 supersymmetries present in the type IIB theory. Introducing $z = L^2/r$, one notes that the limiting form of [9] as $r \to 0$ factorizes into the direct product of two smooth spaces, the Poincaré wedge [5] of $AdS_5$, and $S^5$, with equal radii of curvature $L$. The 3-brane geometry may thus be viewed as a semi-infinite throat of radius $L$ which, for $r \gg L$, opens up into flat $(9+1)$-dimensional space. Thus, for $L$ much larger than the string length scale, $\sqrt{\alpha'}$, the entire 3-brane geometry has small curvatures everywhere and is appropriately described by the supergravity approximation to type IIB string theory.

The relation between $L$ and $\sqrt{\alpha'}$ may be found by equating the gravitational tension of the extremal 3-brane classical solution to $N$ times the tension of a single D3 brane:

$$\frac{2}{\kappa^2}\,L^4\,\mathrm{vol}(S^5) = N\,\frac{\sqrt{\pi}}{\kappa} \qquad [10]$$

where $\mathrm{vol}(S^5) = \pi^3$ is the volume of a unit 5-sphere, and $\kappa = \sqrt{8\pi G}$ is the ten-dimensional gravitational constant. It follows that

$$L^4 = \frac{\kappa}{2\pi^{5/2}}\,N = g_{YM}^2 N \alpha'^2 \qquad [11]$$

where we used the standard relations $\kappa = 8\pi^{7/2} g_{st}\alpha'^2$ and $g_{YM}^2 = 4\pi g_{st}$ [10]. Thus, the size of the throat in string units is $\lambda^{1/4}$. This remarkable emergence of the 't Hooft coupling from gravitational considerations is at the heart of the success of the AdS/CFT correspondence. Moreover, the requirement $L \gg \sqrt{\alpha'}$ translates into $\lambda \gg 1$: the gravitational approach is valid when the 't Hooft coupling is very strong and the perturbative field-theoretic methods are not applicable.

Example: Thermal Gauge Theory from Near-Extremal D3 Branes

An important black hole observable is the Bekenstein–Hawking (BH) entropy, which is proportional to the area of the event horizon. For the 3-brane solution [7], the horizon is located at $r = r_0$. For $r_0 > 0$ the 3-brane carries some excess energy $E$ above its extremal value, and the BH entropy is also nonvanishing. The Hawking temperature is then defined by $T^{-1} = \partial S_{BH}/\partial E$.

Setting $r_0 \ll L$ in [7], we obtain a near-extremal 3-brane geometry, whose Hawking temperature is found to be $T = r_0/(\pi L^2)$. The eight-dimensional "area" of the horizon is

$$A_h = (r_0/L)^3\,V_3\,L^5\,\mathrm{vol}(S^5) = \pi^6 L^8 T^3 V_3 \qquad [12]$$

where $V_3$ is the spatial volume of the D3 brane (i.e., the volume of the $x^1, x^2, x^3$ coordinates). Therefore, the BH entropy is

$$S_{BH} = \frac{2\pi A_h}{\kappa^2} = \frac{\pi^2}{2}\,N^2 V_3 T^3 \qquad [13]$$

This gravitational entropy of a near-extremal 3-brane of Hawking temperature $T$ is to be identified with the entropy of $\mathcal{N} = 4$ supersymmetric $U(N)$ gauge theory (which lives on $N$ coincident D3 branes) heated up to the same temperature.

The entropy of a free $U(N)$ $\mathcal{N} = 4$ supermultiplet (which consists of the gauge field, $6N^2$ massless scalars, and $4N^2$ Weyl fermions) can be calculated using the standard statistical mechanics of a massless gas (the blackbody problem), and the answer is

$$S_0 = \frac{2\pi^2}{3}\,N^2 V_3 T^3 \qquad [14]$$

It is remarkable that the 3-brane geometry captures the $T^3$ scaling characteristic of a conformal field theory (CFT) (in a CFT this scaling is guaranteed by the extensivity of the entropy and the absence of dimensionful parameters). Also, the $N^2$ scaling indicates the presence of $O(N^2)$ unconfined degrees
of freedom, which is exactly what we expect in the $\mathcal{N} = 4$ supersymmetric $U(N)$ gauge theory. But what is the explanation of the relative factor of 3/4 between $S_{BH}$ and $S_0$? In fact, this factor is not a contradiction but rather a prediction about the strongly coupled $\mathcal{N} = 4$ SYM theory at finite temperature. As we argued above, the supergravity calculation of the BH entropy, [13], is relevant to the $\lambda \to \infty$ limit of the $\mathcal{N} = 4$ $SU(N)$ gauge theory, while the free-field calculation, [14], applies to the $\lambda \to 0$ limit. Thus, the relative factor of 3/4 is not a discrepancy: it relates two different limits of the theory. Indeed, on general field-theoretic grounds, we expect that in the 't Hooft large-$N$ limit, the entropy is given by

$$S = \frac{2\pi^2}{3}\,N^2 f(\lambda)\,V_3 T^3 \qquad [15]$$

The function $f$ is certainly not constant: perturbative calculations valid for small $\lambda = g_{YM}^2 N$ give

$$f(\lambda) = 1 - \frac{3}{2\pi^2}\,\lambda + \frac{3 + \sqrt{2}}{\pi^3}\,\lambda^{3/2} + \cdots \qquad [16]$$

Thus, the BH entropy in supergravity, [13], is translated into the prediction that

$$\lim_{\lambda \to \infty} f(\lambda) = \frac{3}{4} \qquad [17]$$

The Essentials of the AdS/CFT Correspondence

The AdS/CFT correspondence asserts a detailed map between the physics of type IIB string theory in the throat of the classical 3-brane geometry, that is, the region $r \ll L$, and the gauge theory living on a stack of D3 branes. As already noted, in this limit $r \ll L$, the extremal D3 brane geometry factors into a direct product of $AdS_5 \times S^5$. Moreover, the gauge theory on this stack of D3 branes is the maximally supersymmetric $\mathcal{N} = 4$ SYM.

Since the horizon of the near-extremal 3-brane lies in the region $r \ll L$, the entropy calculation could have been carried out directly in the throat limit, where $H(r)$ is replaced by $L^4/r^4$. Another way to motivate the identification of the gauge theory with the throat is to think about the absorption of massless particles. In the D-brane description, a particle incident from asymptotic infinity is converted into an excitation of the stack of D-branes, that is, into an excitation of the gauge theory on the world volume. In the supergravity description, a particle incident from the asymptotic (large $r$) region tunnels into the $r \ll L$ region and produces an excitation of the throat. The fact that the two different descriptions of the absorption process give identical cross sections supports the identification of excitations of $AdS_5 \times S^5$ with the excited states of the $\mathcal{N} = 4$ SYM theory.

Maldacena (1998) motivated this correspondence by thinking about the low-energy ($\alpha' \to 0$) limit of the string theory. On the D3 brane side, in this low-energy limit, the interaction between the D3 branes and the closed strings propagating in the bulk vanishes, leaving a pure $\mathcal{N} = 4$ SYM theory on the D3 branes decoupled from type IIB superstrings in flat space. Around the classical 3-brane solutions, there are two types of low-energy excitations. The first type propagates in the bulk region, $r \gg L$, and has a cross section for absorption by the throat which vanishes as the cube of the energy. The second type is localized in the throat, $r \ll L$, and finds it harder to tunnel into the asymptotically flat region as the energy is taken smaller. Thus, both the D3 branes and the classical 3-brane solution have two decoupled components in the low-energy limit, and in both cases, one of these components is type IIB superstrings in flat space. Maldacena conjectured an equivalence between the other two components.

Immediate support for this identification comes from symmetry considerations. The isometry group of $AdS_5$ is $SO(2, 4)$, and this is also the conformal group in $3+1$ dimensions. In addition, we have the isometries of $S^5$ which form $SU(4) \sim SO(6)$. This group is identical to the R-symmetry of the $\mathcal{N} = 4$ SYM theory. After including the fermionic generators required by supersymmetry, the full isometry supergroup of the $AdS_5 \times S^5$ background is $SU(2, 2|4)$, which is identical to the $\mathcal{N} = 4$ superconformal symmetry. We will see that, in theories with reduced supersymmetry, the $S^5$ factor is replaced by other compact Einstein spaces $Y_5$, but $AdS_5$ is the "universal" factor present in the dual description of any large-$N$ CFT and makes the $SO(2, 4)$ conformal symmetry a geometric one.

The correspondence extends beyond the supergravity limit, and we must think of $AdS_5 \times Y_5$ as a background of string theory. Indeed, type IIB strings are dual to the electric flux lines in the gauge theory, providing a string-theoretic setup for calculating correlation functions of Wilson loops. Furthermore, if $N \to \infty$ while $g_{YM}^2 N$ is held fixed and finite, then there are string scale corrections to the supergravity limit (Maldacena 1998, Gubser et al. 1998, Witten 1998) which proceed in powers of $\alpha'/L^2 = (g_{YM}^2 N)^{-1/2}$. For finite $N$, there are also
string loop corrections in powers of $\kappa^2/L^8 \sim N^{-2}$. As expected, with $N \to \infty$ we can take the classical limit of the string theory on $AdS_5 \times Y_5$. However, in order to understand the large-$N$ gauge theory with finite 't Hooft coupling, we should think of $AdS_5 \times Y_5$ as the target space of a two-dimensional sigma model describing the classical string physics.

Correlation Functions and the Bulk/Boundary Correspondence

A basic premise of the AdS/CFT correspondence is the existence of a one-to-one map between gauge-invariant operators in the CFT and fields (or extended objects) in AdS. Gubser et al. (1998) and Witten (1998) formulated precise methods for calculating correlation functions of various operators in a CFT using its dual formulation. A physical motivation for these methods comes from earlier calculations of absorption by 3-branes. When a wave is absorbed, it tunnels from asymptotic infinity into the throat region, and then continues to propagate toward smaller $r$. Let us separate the 3-brane geometry into two regions: $r \gtrsim L$ and $r \lesssim L$. For $r \lesssim L$ the metric is approximately that of $AdS_5 \times S^5$, while for $r \gtrsim L$ it becomes very different and eventually approaches the flat metric. Signals coming in from large $r$ (small $z = L^2/r$) may be considered as disturbing the "boundary" of $AdS_5$ at $r \sim L$, and then propagating into the bulk of $AdS_5$. Discarding the $r \gtrsim L$ part of the 3-brane metric, the gauge theory correlation functions are related to the response of the string theory to boundary conditions at $r \sim L$. It is therefore natural to identify the generating functional of correlation functions in the gauge theory with the string theory path integral subject to the boundary conditions that $\phi(x, z) = \phi_0(x)$ at $z = L$ (at $z = \infty$ all fluctuations are required to vanish). In calculating correlation functions in a CFT, we will carry out the standard Euclidean continuation; then on the string theory side, we will work with $L_5$, which is the Euclidean version of $AdS_5$.

More explicitly, we identify a gauge theory quantity $W$ with a string-theory quantity $Z_{string}$:

$$W[\phi_0(x)] = Z_{string}[\phi_0(x)] \qquad [18]$$

$W$ generates the connected Euclidean Green's functions of a gauge-theory operator $O$,

$$W[\phi_0(x)] = \left\langle \exp \int d^4x\, \phi_0\, O \right\rangle \qquad [19]$$

$Z_{string}$ is the string theory path integral calculated as a functional of $\phi_0$, the boundary condition on the field $\phi$ related to $O$ by the AdS/CFT duality. In the large-$N$ limit, the string theory becomes classical, which implies

$$Z_{string} \sim e^{-I[\phi_0(x)]} \qquad [20]$$

where $I[\phi_0(x)]$ is the extremum of the classical string action calculated as a functional of $\phi_0$. If we are further interested in correlation functions at very large 't Hooft coupling, then the problem of extremizing the classical string action reduces to solving the equations of motion in type IIB supergravity whose form is known explicitly. A simple example of such a calculation is presented in the next subsection.

Our reasoning suggests that from the point of view of the metric [5], the boundary conditions are imposed not quite at $z = 0$, which is the true boundary of $L_5$, but at some finite value $z = \epsilon$. It does not matter which value it is since the metric [5] is unchanged by an overall rescaling of the coordinates $(z, x)$; thus, such a rescaling can take $z = L$ into $z = \epsilon$ for any $\epsilon$. The physical meaning of this cutoff is that it acts as a UV regulator in the gauge theory. Indeed, the radial coordinate $z$ is to be considered as the effective energy scale of the gauge theory, and decreasing $z$ corresponds to increasing the energy. A safe method for performing calculations of correlation functions, therefore, is to keep the cutoff on the $z$-coordinate at intermediate stages and remove it only at the end.

Two-Point Functions and Operator Dimensions

In the following, we present a brief discussion of two-point functions of scalar operators in $CFT_d$. The corresponding field in $L_{d+1}$ is a scalar field of mass $m$ whose Euclidean action is proportional to

$$\frac{1}{2}\int d^dx\, dz\, z^{-d+1}\left[(\partial_z\phi)^2 + \sum_{a=1}^{d} (\partial_a\phi)^2 + \frac{m^2L^2}{z^2}\,\phi^2\right] \qquad [21]$$

In calculating correlation functions of vertex operators from the AdS/CFT correspondence, the first problem is to reconstruct an on-shell field in $L_{d+1}$ from its boundary behavior. The near-boundary, that is, small $z$, behavior of the classical solution is

$$\phi(z, x) \to z^{d-\Delta}\left[\phi_0(x) + O(z^2)\right] + z^{\Delta}\left[A(x) + O(z^2)\right] \qquad [22]$$

where $\Delta$ is one of the roots of

$$\Delta(\Delta - d) = m^2L^2 \qquad [23]$$
$\phi_0(x)$ is regarded as a "source" in [19] that couples to the dual gauge-invariant operator $O$ of dimension $\Delta$, while $A(x)$ is related to the expectation value,

$$A(x) = \frac{1}{2\Delta - d}\,\langle O(x)\rangle \qquad [24]$$

It is possible to regularize the Euclidean action to obtain the following value as a functional of the source:

$$I[\phi_0(x)] = -\left(\Delta - \frac{d}{2}\right)\pi^{-d/2}\,\frac{\Gamma(\Delta)}{\Gamma(\Delta - (d/2))}\int d^dx \int d^dx'\,\frac{\phi_0(x)\,\phi_0(x')}{|x - x'|^{2\Delta}} \qquad [25]$$

Varying twice with respect to $\phi_0$, we find that the two-point function of the corresponding operator is

$$\langle O(x)\,O(x')\rangle = \frac{(2\Delta - d)\,\Gamma(\Delta)}{\pi^{d/2}\,\Gamma(\Delta - (d/2))}\,\frac{1}{|x - x'|^{2\Delta}} \qquad [26]$$

Which of the two roots, $\Delta_+$ or $\Delta_-$, of [23]

$$\Delta_{\pm} = \frac{d}{2} \pm \sqrt{\frac{d^2}{4} + m^2L^2} \qquad [27]$$

should we choose for the operator dimension? For positive $m^2$, $\Delta_+$ is certainly the right choice: here the other root, $\Delta_-$, is negative. However, it turns out that for

$$-\frac{d^2}{4} < m^2L^2 < -\frac{d^2}{4} + 1 \qquad [28]$$

both roots of [23] may be chosen. Thus, there are two possible CFTs corresponding to the same classical AdS action: in one of them the corresponding operator has dimension $\Delta_+$, while in the other the dimension is $\Delta_-$. We note that $\Delta_-$ is bounded from below by $(d - 2)/2$, which is precisely the unitarity bound on dimensions of scalar operators in $d$-dimensional field theory! Thus, the ability to choose dimension $\Delta_-$ is crucial for consistency of the AdS/CFT duality.

Whether string theory on $AdS_5 \times Y_5$ contains fields with $m^2$ in the range [28] depends on $Y_5$. The example discussed in the next section, $Y_5 = T^{1,1}$, turns out to contain such fields, and the possibility of having dimension $\Delta_-$, [27], is crucial for consistency of the AdS/CFT duality in that case. However, for $Y_5 = S^5$, which is dual to the $\mathcal{N} = 4$ large-$N$ SYM theory, there are no such fields and all scalar dimensions are given by $\Delta_+$ of [27].

The operators in the $\mathcal{N} = 4$ large-$N$ SYM theory naturally break up into two classes: those that correspond to the Kaluza–Klein states of supergravity and those that correspond to massive string states. Since the radius of the $S^5$ is $L$, the masses of the Kaluza–Klein states are proportional to $1/L$. Thus, the dimensions of the corresponding operators are independent of $L$ and therefore also of $\lambda$. On the gauge-theory side, this independence is explained by the fact that the supersymmetry protects the dimensions of certain operators from being renormalized: they are completely determined by the representation under the superconformal symmetry. All families of the Kaluza–Klein states, which correspond to such protected operators, were classified long ago. Correlation functions of such operators in the strong 't Hooft coupling limit may be obtained from the dependence of the supergravity action on the boundary values of corresponding Kaluza–Klein fields, as in [19]. A variety of explicit calculations have been performed for two-, three-, and even four-point functions. The four-point functions are particularly interesting because their dependence on operator positions is not determined by the conformal invariance.

On the other hand, the masses of string excitations are $m^2 = 4n/\alpha'$, where $n$ is an integer. For the corresponding operators the formula [27] predicts that the dimensions do depend on the 't Hooft coupling and, in fact, blow up for large $\lambda = g_{YM}^2 N$ as $2\lambda^{1/4}\sqrt{n}$.

Calculation of Wilson Loops

The Wilson loop operator of a nonabelian gauge theory

$$W(C) = \mathrm{tr}\left[P \exp\left(i\oint_C A\right)\right] \qquad [29]$$

involves the path-ordered integral of the gauge connection $A$ along a contour $C$. For $\mathcal{N} = 4$ SYM, one typically uses a generalization of this loop operator which incorporates other fields in the $\mathcal{N} = 4$ multiplet, the adjoint scalars and fermions. Using a rectangular contour, we can calculate the quark–antiquark potential from the expectation value $\langle W(C)\rangle$. One thinks of the quarks located a distance $L$ apart for a time $T$, yielding

$$\langle W\rangle \sim e^{-TV(L)} \qquad [30]$$

where $V(L)$ is the potential.

According to Maldacena, and Rey and Yee, the AdS/CFT correspondence relates the Wilson loop expectation value to a sum over string world sheets ending on the boundary of $L_5$ ($z = 0$) along the contour $C$:

$$\langle W\rangle \sim \int e^{-S} \qquad [31]$$
where S is the action functional of the string world sheet. In the large ‘t Hooft coupling limit $\lambda \to \infty$, this path integral may be evaluated using a saddle-point approximation. The leading answer is $e^{-S_0}$, where $S_0$ is the action for the classical solution, which is proportional to the minimal area of the string world sheet in L5 subject to the boundary conditions. The area as currently defined is actually divergent, and to regularize it one must position the contour at $z = \epsilon$ (this is the same type of regulator as used in the definition of correlation functions).

Consider a circular Wilson loop of radius a. The action of the corresponding classical string world sheet is

$S_0 = \sqrt{\lambda}\left(\frac{a}{\epsilon} - 1\right)$    [32]

Subtracting the linearly divergent term, which is proportional to the length of the contour, one finds

$\ln\langle W\rangle = \sqrt{\lambda} + O(\ln \lambda)$    [33]

a result which has been duplicated in field theory by summing certain classes of rainbow Feynman diagrams in N = 4 SYM. From these sums, one finds

$\langle W\rangle_{\rm rainbow} = \frac{2}{\sqrt{\lambda}}\, I_1\!\left(\sqrt{\lambda}\right)$    [34]

where $I_1$ is a Bessel function. This formula is one of the few available proposals for extrapolation of an observable from small to large coupling. At large $\lambda$,

$\langle W\rangle_{\rm rainbow} \approx \sqrt{\frac{2}{\pi}}\,\frac{e^{\sqrt{\lambda}}}{\lambda^{3/4}}$    [35]

in agreement with the geometric prediction.

The quark–antiquark potential is extracted from a rectangular Wilson loop of width L and length T. After regularizing the divergent contribution to the energy, one finds the attractive potential

$V(L) = -\frac{4\pi^2\sqrt{\lambda}}{\Gamma(1/4)^4\, L}$    [36]

The Coulombic 1/L dependence is required by the conformal invariance of the theory. The fact that the potential scales as the square root of the ‘t Hooft coupling indicates some screening of the charges at large coupling.

Conformal Field Theories and Einstein Manifolds

Interesting generalizations of the duality between $AdS_5 \times S^5$ and N = 4 SYM with less supersymmetry and more complicated gauge groups can be produced by placing D3 branes at the tip of a Ricci-flat six-dimensional cone X (see Figure 1). The cone metric may be cast in the form

$ds_X^2 = dr^2 + r^2\, ds_Y^2$    [37]

where Y is the level surface of X. In particular, Y is a positively curved Einstein manifold, that is, one for which $R_{ij} = 4 g_{ij}$. In order to preserve the N = 1 supersymmetry, X must be a Calabi–Yau space; then Y is defined to be Sasaki–Einstein.

Figure 1 D3 branes placed at the tip of a Ricci-flat cone X.

The D3 branes appear as a point in X and span the transverse Minkowski space $R^{3,1}$. The ten-dimensional metric they produce assumes the form [9], but with the sphere metric $d\Omega_5^2$ replaced by the metric on Y, $ds_Y^2$. The equality of tensions [10] now requires that

$L^4 = \frac{\sqrt{\pi}\,\kappa N}{2\,\mathrm{vol}(Y)} = 4\pi g_s N \alpha'^2\,\frac{\pi^3}{\mathrm{vol}(Y)}$    [38]

In the near-horizon limit, $r \to 0$, the geometry factors into $AdS_5 \times Y$. Because the D3 branes are located at a singularity, the gauge theory becomes much more complicated, typically involving a product of several SU(N) factors coupled to matter in bifundamental representations, often described using a quiver diagram (see Figure 2 for an example).

Figure 2 The quiver for $Y^{4,3}$. Each node corresponds to an SU(N) gauge group and each arrow to a bifundamental chiral superfield.
The simplest examples of X are orbifolds $C^3/\Gamma$, where $\Gamma$ is a discrete subgroup of SO(6). Indeed, if $\Gamma \subset SU(3)$, then N = 1 supersymmetry is preserved. The level surface of such an X is $Y = S^5/\Gamma$. In this case, the product structure of the gauge theory can be motivated by thinking about image stacks of D3 branes from the action of $\Gamma$.

The next simplest example of a Calabi–Yau cone X is the conifold which may be described by the following equation in four complex variables:

$\sum_{a=1}^{4} z_a^2 = 0$    [39]

Since this equation is symmetric under an overall rescaling of the coordinates, this space is a cone. The level surface Y of the conifold is a coset manifold $T^{1,1} = (SU(2) \times SU(2))/U(1)$. This space has the $SO(4) \sim SU(2) \times SU(2)$ symmetry which rotates the z’s, and also the U(1) R-symmetry under $z_a \to e^{i\theta} z_a$. The metric on $T^{1,1}$ is known explicitly; it assumes the form of an $S^1$ bundle over $S^2 \times S^2$.

The supersymmetric field theory on the D3 branes probing the conifold singularity is $SU(N) \times SU(N)$ gauge theory coupled to two chiral superfields, $A_i$, in the $(N, \bar{N})$ representation and two chiral superfields, $B_j$, in the $(\bar{N}, N)$ representation. The A’s transform as a doublet under one of the global SU(2)’s, while the B’s transform as a doublet under the other SU(2). Cancelation of the anomaly in the U(1) R-symmetry requires that the A’s and the B’s each have R-charge 1/2. For consistency of the duality, it is necessary that we add an exactly marginal superpotential which preserves the $SU(2) \times SU(2) \times U(1)_R$ symmetry of the theory. Since a marginal superpotential has R-charge equal to 2 it must be quartic, and the symmetries fix it uniquely up to overall normalization:

$W = \epsilon^{ij}\epsilon^{kl}\,\mathrm{tr}\, A_i B_k A_j B_l$    [40]

There are in fact infinite families of Calabi–Yau cones X, but there are two problems one faces in studying these generalized AdS/CFT correspondences. The first is geometric: the cones X are not all well understood and only for relatively few do we have explicit metrics. However, it is often possible to calculate important quantities such as the vol(Y) without knowing the metric. The second problem is gauge theoretic: although many techniques exist, there is no completely general procedure for constructing the gauge theory on a stack of D-branes at an arbitrary singularity.

Let us mention two important classes of Calabi–Yau cones X. The first class consists of cones over the so-called $Y^{p,q}$ Sasaki–Einstein spaces. Here, p and q are integers with $p \geq q$. Gauntlett et al. (2004) discovered metrics on all the $Y^{p,q}$, and the quiver gauge theories that live on the D-branes probing the singularity are now known. Making contact with the simpler examples discussed above, the $Y^{p,0}$ are orbifolds of $T^{1,1}$ while the $Y^{p,p}$ are orbifolds of $S^5$.

In the second class of cones X, a del Pezzo surface shrinks to zero size at the tip of the cone. A del Pezzo surface is an algebraic surface of complex dimension 2 with positive first Chern class. One simple del Pezzo surface is a complex projective space of dimension 2, $P^2$, which gives rise to the N = 1 preserving $S^5/Z_3$ orbifold. Another simple case is $P^1 \times P^1$, which leads to $T^{1,1}/Z_2$. The remaining del Pezzo surfaces $B_k$ are $P^2$ blown up at k points, $1 \leq k \leq 8$. The cone where $B_1$ shrinks to zero size has level surface $Y^{2,1}$. Gauge theories for all the del Pezzos have been constructed. Except for the three del Pezzos just discussed, and possibly also for $B_6$, metrics on the cones over these del Pezzos are not known. Nevertheless, it is known that for $3 \leq k \leq 8$, the volume of the Sasaki–Einstein manifold Y associated with $B_k$ is $\pi^3(9 - k)/27$.

The Central Charge

The central charge provides one of the most amazing ways to check the generalized AdS/CFT correspondences. The central charge c and conformal anomaly a can be defined as coefficients of certain curvature invariants in the trace of the stress energy tensor of the conformal gauge theory:

$\langle T^\alpha_\alpha\rangle = -a E_4 - c I_4$    [41]

(The curvature invariants $E_4$ and $I_4$ are quadratic in the Riemann tensor and vanish for Minkowski space.) As discussed above, correlators such as $\langle T^\alpha_\alpha\rangle$ can be calculated from supergravity, and one finds

$a = c = \frac{\pi^3 N^2}{4\,\mathrm{vol}(Y)}$    [42]

On the gauge-theory side of the correspondence, anomalies completely determine a and c:

$a = \frac{3}{32}\left(3\,\mathrm{tr}\,R^3 - \mathrm{tr}\,R\right)$
$c = \frac{1}{32}\left(9\,\mathrm{tr}\,R^3 - 5\,\mathrm{tr}\,R\right)$    [43]

The trace notation implies a sum over the R-charges of all of the fermions in the gauge theory. (From the geometric knowledge that a = c, we can conclude that tr R = 0.)

The R-charges can be determined using the principle of a-maximization. For a superconformal gauge theory, the R-charges of the fermions maximize a subject to the constraints that the
Novikov–Shifman–Vainshtein–Zakharov (NSVZ) beta function of each gauge group vanishes and the R-charge of each superpotential term is 2. For the $Y^{p,q}$ spaces mentioned above, one finds that

$\mathrm{vol}(Y^{p,q}) = \frac{q^2\left[2p + \sqrt{4p^2 - 3q^2}\right]}{3p^2\left[3q^2 - 2p^2 + p\sqrt{4p^2 - 3q^2}\right]}\,\pi^3$    [44]

The gauge theory consists of $p - q$ fields Z, $p + q$ fields Y, 2p fields U, and 2q fields V. These fields all transform in the bifundamental representation of a pair of SU(N) gauge groups (the quiver diagram for $Y^{4,3}$ is given in Figure 2). The NSVZ beta function and superpotential constraints determine the R-charges up to two free parameters x and y. Let x be the R-charge of Z and y the R-charge of Y. Then the U have R-charge $1 - (1/2)(x + y)$ and the V have R-charge $1 + (1/2)(x - y)$.

The technique of a-maximization leads to the result

$x = \frac{1}{3q^2}\left[-4p^2 - 2pq + 3q^2 + (2p + q)\sqrt{4p^2 - 3q^2}\right]$
$y = \frac{1}{3q^2}\left[-4p^2 + 2pq + 3q^2 + (2p - q)\sqrt{4p^2 - 3q^2}\right]$

With this assignment, all fields present at q = p (the U, V, and Y) have R-charge 2/3, as expected for an orbifold of $S^5$. Thus, as calculated by Benvenuti et al. (2004) and Bertolini et al. (2004)

$a(Y^{p,q}) = \frac{\pi^3 N^2}{4\,\mathrm{vol}(Y^{p,q})}$    [45]

in remarkable agreement with the prediction [42] of the AdS/CFT duality.

A Path to a Confining Theory

There exists an interesting way of breaking the conformal invariance for spaces Y whose topology includes an $S^2$ factor (examples of such spaces include $T^{1,1}$ and $Y^{p,q}$, which are topologically $S^2 \times S^3$). At the tip of the cone over Y, one may add M wrapped D5 branes to the N D3 branes. The gauge theory on such a combined stack is no longer conformal; it exhibits a novel pattern of quasiperiodic renormalization group flow, called a duality cascade.

To date, the most extensive study of a theory of this type has been carried out for the conifold, where one finds an N = 1 supersymmetric $SU(N) \times SU(N + M)$ theory coupled to chiral superfields $A_1$, $A_2$ in the $(N, \overline{N + M})$ representation, and $B_1$, $B_2$ in the $(\bar{N}, N + M)$ representation. D5 branes source RR 3-form flux; hence, the supergravity dual of this theory has to include M units of this flux. Klebanov and Strassler (2000) found an exact nonsingular supergravity solution incorporating the 3-form and the 5-form RR field strengths, and their back-reaction on the geometry. This back-reaction creates a ‘‘geometric transition’’ to the deformed conifold

$\sum_{a=1}^{4} z_a^2 = \epsilon^2$    [46]

and introduces a ‘‘warp factor’’ so that the full ten-dimensional geometry has the form

$ds_{10}^2 = h^{-1/2}(\tau)\left(-(dx^0)^2 + (dx^i)^2\right) + h^{1/2}(\tau)\, d\tilde{s}_6^2$    [47]

where $d\tilde{s}_6^2$ is the Calabi–Yau metric of the deformed conifold, which is known explicitly.

The field-theoretic interpretation of this solution is unconventional. After a finite amount of RG flow, the $SU(N + M)$ group undergoes a Seiberg duality transformation. After this transformation, and an interchange of the two gauge groups, the new gauge theory is $SU(\tilde{N}) \times SU(\tilde{N} + M)$ with the same matter and superpotential, and with $\tilde{N} = N - M$. The self-similar structure of the gauge theory under the Seiberg duality is the crucial fact that allows this pattern to repeat many times. If $N = (k + 1)M$, where k is an integer, then the duality cascade stops after k steps, and we find $SU(M) \times SU(2M)$ gauge theory. This IR gauge theory exhibits a multitude of interesting effects visible in the dual supergravity background. One of them is confinement, which follows from the fact that the warp factor h is finite and nonvanishing at the smallest radial coordinate, $\tau = 0$. The methods presented in the section ‘‘Calculation of Wilson loops’’ then imply that the quark–antiquark potential grows linearly at large distances. Other notable IR effects are chiral symmetry breaking and the Goldstone mechanism. Particularly interesting is the appearance of an entire ‘‘baryonic branch’’ of the moduli space in the gauge theory, whose existence has been demonstrated also in the dual supergravity language.

Conclusions

This article tries to present a logical path from studying gravitational properties of D-branes to the formulation of an exact duality between conformal field theories and string theory in anti-de Sitter backgrounds, and also sketches some methods for breaking the conformal symmetry. Due to space limitations, many aspects and applications of the AdS/CFT correspondence have been omitted. At the moment, practical applications of this duality are limited mainly to very strongly coupled, large-N gauge theories, where the dual string description is well approximated by classical supergravity. To understand the implications of the duality for more general parameters, it is necessary to find better
methods for attacking the world sheet approach to string theories in anti-de Sitter backgrounds with RR background fields turned on. When such methods are found, it is likely that the material presented here will have turned out to be just a tiny tip of a monumental iceberg of dualities between fields and strings.

Acknowledgments

The authors are very grateful to all their collaborators on gauge/string duality for their valuable input over many years. The research of I R Klebanov is supported in part by the National Science Foundation (NSF) grant no. PHY-0243680. The research of C P Herzog is supported in part by the NSF under grant no. PHY99-07949. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

See also: Brane Construction of Gauge Theories; Branes and Black Hole Statistical Mechanics; Einstein Equations: Exact Solutions; Gauge Theories from Strings; Large-N and Topological Strings; Large-N Dualities; Mirror Symmetry: A Geometric Survey; Quantum Chromodynamics; Quantum Field Theory in Curved Spacetime; Superstring Theories.

Further Reading

Aharony O, Gubser SS, Maldacena JM, Ooguri H, and Oz Y (2000) Large N field theories, string theory and gravity. Physics Reports 323: 183 (arXiv:hep-th/9905111).
Benvenuti S, Franco S, Hanany A, Martelli D, and Sparks J (2005) An infinite family of superconformal quiver gauge theories with Sasaki–Einstein duals. JHEP 0506: 064 (arXiv:hep-th/0411264).
Bertolini M, Bigazzi F, and Cotrone AL (2004) New checks and subtleties for AdS/CFT and a-maximization. JHEP 0412: 024 (arXiv:hep-th/0411249).
Bigazzi F, Cotrone AL, Petrini M, and Zaffaroni A (2002) Supergravity duals of supersymmetric four dimensional gauge theories. Rivista del Nuovo Cimento 25N12: 1 (arXiv:hep-th/0303191).
D’Hoker E and Freedman DZ (2002) Supersymmetric gauge theories and the AdS/CFT correspondence, arXiv:hep-th/0201253.
Gauntlett J, Martelli D, Sparks J, and Waldram D (2004) Sasaki–Einstein metrics on S2 × S3. Advances in Theoretical and Mathematical Physics 8: 711 (arXiv:hep-th/0403002).
Gubser SS, Klebanov IR, and Polyakov AM (1998) Gauge theory correlators from noncritical string theory. Physics Letters B 428: 105 (arXiv:hep-th/9802109).
Herzog CP, Klebanov IR, and Ouyang P (2002) D-branes on the conifold and N = 1 gauge/gravity dualities, arXiv:hep-th/0205100.
Klebanov IR (2000) TASI lectures: introduction to the AdS/CFT correspondence, arXiv:hep-th/0009139.
Klebanov IR and Strassler MJ (2000) Supergravity and a confining gauge theory: Duality cascades and χSB-resolution of naked singularities. JHEP 0008: 052 (arXiv:hep-th/0007191).
Maldacena J (1998) The large N limit of superconformal field theories and supergravity. Advances in Theoretical and Mathematical Physics 2: 231 (arXiv:hep-th/9711200).
Maldacena JM (1998) Wilson loops in large N field theories. Physical Review Letters 80: 4859 (arXiv:hep-th/9803002).
Polchinski J (1998) String Theory. Cambridge: Cambridge University Press.
Polyakov AM (1999) The wall of the cave. International Journal of Modern Physics A 14: 645.
Rey SJ and Yee JT (2001) Macroscopic strings as heavy quarks in large N gauge theory and anti-de Sitter supergravity. European Physical Journal C 22: 379 (arXiv:hep-th/9803001).
Semenoff GW and Zarembo K (2002) Wilson loops in SYM theory: from weak to strong coupling. Nuclear Physics Proceeding Supplements 108: 106 (arXiv:hep-th/0202156).
Strassler MJ The duality cascade, TASI 2003 lectures, arXiv:hep-th/0505153.
Witten E (1998) Anti-de Sitter space and holography. Advances in Theoretical and Mathematical Physics 2: 253 (arXiv:hep-th/9802150).

Affine Quantum Groups


G W Delius and N MacKay, University of York, York, UK
© 2006 G W Delius. Published by Elsevier Ltd. All rights reserved.

Affine quantum groups are certain pseudoquasitriangular Hopf algebras that arise in mathematical physics in the context of integrable quantum field theory, integrable quantum spin chains, and solvable lattice models. They provide the algebraic framework behind the spectral parameter dependent Yang–Baxter equation

$R_{12}(u)\,R_{13}(u + v)\,R_{23}(v) = R_{23}(v)\,R_{13}(u + v)\,R_{12}(u)$    [1]

One can distinguish three classes of affine quantum groups, each leading to a different dependence of the R-matrices on the spectral parameter u: Yangians lead to rational R-matrices, quantum affine algebras lead to trigonometric R-matrices, and elliptic quantum groups lead to elliptic R-matrices. We will mostly concentrate on the quantum affine algebras but many results hold similarly for the other classes.

After giving mathematical details about quantum affine algebras and Yangians in the first two sections, we describe how these algebras arise in different areas of mathematical physics in the three following sections. We end with a description of boundary quantum groups which extend the formalism to the boundary Yang–Baxter (reflection) equation.
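To make equation [1] concrete, the following sketch checks it numerically for one familiar trigonometric solution, the 4 × 4 six-vertex R-matrix with entries sinh(u + η), sinh(u), and sinh(η). That particular matrix, the parameter name `eta`, and the helper names `embed`, `matmul`, and `ybe_residual` are assumptions introduced for illustration; the article itself does not write this matrix down.

```python
import math
from itertools import product


def six_vertex_r(u, eta):
    """4x4 six-vertex R-matrix in the basis (00, 01, 10, 11)."""
    a, b, c = math.sinh(u + eta), math.sinh(u), math.sinh(eta)
    return [[a, 0, 0, 0],
            [0, b, c, 0],
            [0, c, b, 0],
            [0, 0, 0, a]]


def embed(r, a, b, n_sites=3, dim=2):
    """Lift a two-site operator r to act on sites a and b of an n-site chain."""
    states = list(product(range(dim), repeat=n_sites))
    index = {s: k for k, s in enumerate(states)}
    big = [[0.0] * len(states) for _ in states]
    for s_out in states:
        for s_in in states:
            # Sites other than a and b are spectators and must not change.
            if any(s_out[k] != s_in[k] for k in range(n_sites) if k not in (a, b)):
                continue
            row = s_out[a] * dim + s_out[b]
            col = s_in[a] * dim + s_in[b]
            big[index[s_out]][index[s_in]] = r[row][col]
    return big


def matmul(x, y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def ybe_residual(r_of_u, u, v):
    """Largest entry of R12(u)R13(u+v)R23(v) - R23(v)R13(u+v)R12(u)."""
    r12 = embed(r_of_u(u), 0, 1)
    r13 = embed(r_of_u(u + v), 0, 2)
    r23 = embed(r_of_u(v), 1, 2)
    lhs = matmul(matmul(r12, r13), r23)
    rhs = matmul(matmul(r23, r13), r12)
    return max(abs(p - q) for rl, rr in zip(lhs, rhs) for p, q in zip(rl, rr))


residual = ybe_residual(lambda u: six_vertex_r(u, 0.6), 0.37, -1.12)
print(residual)  # should be at the level of floating-point rounding
```

The check is only numerical and only for one pair (u, v), so it illustrates the equation rather than proving it.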
Quantum Affine Algebras

Definition

A quantum affine algebra $U_q(\hat{g})$ is a quantization of the enveloping algebra $U(\hat{g})$ of an affine Lie algebra (Kac–Moody algebra) $\hat{g}$. So we start by introducing affine Lie algebras and their enveloping algebras before proceeding to give their quantizations.

Let g be a semisimple finite-dimensional Lie algebra over C of rank r with Cartan matrix $(a_{ij})_{i,j=1,\dots,r}$, symmetrizable via positive integers $d_i$, so that $d_i a_{ij}$ is symmetric. In terms of the simple roots $\alpha_i$, we have

$a_{ij} = 2\,\frac{\alpha_i \cdot \alpha_j}{|\alpha_i|^2}$ and $d_i = \frac{|\alpha_i|^2}{2}$.

We can introduce an $\alpha_0 = \sum_{i=1}^{r} n_i \alpha_i$ in such a way that the extended Cartan matrix $(a_{ij})_{i,j=0,\dots,r}$ is of affine type – that is, it is positive semidefinite of rank r. The integers $n_i$ are referred to as Kac indices. Choosing $\alpha_0$ to be the highest root of g leads to an untwisted affine Kac–Moody algebra while choosing $\alpha_0$ to be the highest short root of g leads to a twisted affine Kac–Moody algebra.

One defines the affine Lie algebra $\hat{g}$ corresponding to this affine Cartan matrix as the Lie algebra (over C) with generators $H_i$, $E_i^\pm$ for i = 0, 1, ..., r and D with relations

$[H_i, E_j^\pm] = \pm a_{ij} E_j^\pm, \quad [H_i, H_j] = 0$
$[E_i^+, E_j^-] = \delta_{ij} H_i$    [2]
$[D, H_i] = 0, \quad [D, E_i^\pm] = \pm\delta_{i,0} E_i^\pm$
$\sum_{k=0}^{1-a_{ij}} (-1)^k \binom{1-a_{ij}}{k} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1-a_{ij}-k} = 0, \quad i \neq j$

The $E_i^\pm$ are referred to as Chevalley generators and the last set of relations are known as Serre relations. The generator D is known as the canonical derivation. We will denote the algebra obtained by dropping the generator D by $\hat{g}'$.

In applications to physics, the affine Lie algebra $\hat{g}$ often occurs in an isomorphic form as the loop Lie algebra $g[z, z^{-1}] \oplus C \cdot c$ with Lie product (for untwisted $\hat{g}$)

$[X z^k, Y z^l] = [X, Y] z^{k+l} + k\,\delta_{k,-l}\,(X, Y)\,c, \quad \text{for } X, Y \in g;\ k, l \in Z$    [3]

and c being the central element.

The universal enveloping algebra $U(\hat{g})$ of $\hat{g}$ is the unital algebra over C with generators $H_i$, $E_i^\pm$ for i = 0, 1, ..., r and D and with relations given by [2] where now [ , ] stands for the commutator instead of the Lie product.

To define the quantization of $U(\hat{g})$, one can either define $U_h(\hat{g})$ (Drinfeld 1985) as an algebra over the ring C[[h]] of formal power series over an indeterminate h or one can define $U_q(\hat{g})$ (Jimbo 1985) as an algebra over the field Q(q) of rational functions of q with coefficients in Q. We will present $U_h(\hat{g})$ first.

The quantum affine algebra $U_h(\hat{g})$ is the unital algebra over C[[h]] topologically generated by $H_i$, $E_i^\pm$ for i = 0, 1, ..., r and D with relations

$[H_i, E_j^\pm] = \pm a_{ij} E_j^\pm, \quad [H_i, H_j] = 0$
$[E_i^+, E_j^-] = \delta_{ij}\,\frac{q_i^{H_i} - q_i^{-H_i}}{q_i - q_i^{-1}}$    [4]
$[D, H_i] = 0, \quad [D, E_i^\pm] = \pm\delta_{i,0} E_i^\pm$
$\sum_{k=0}^{1-a_{ij}} (-1)^k \binom{1-a_{ij}}{k}_{q_i} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1-a_{ij}-k} = 0, \quad i \neq j$

where $q_i = q^{d_i}$ and $q = e^h$. The q-binomial coefficients are defined by

$[n]_q = \frac{q^n - q^{-n}}{q - q^{-1}}$    [5]
$[n]_q! = [n]_q\,[n-1]_q \cdots [2]_q\,[1]_q$    [6]
$\binom{m}{n}_q = \frac{[m]_q!}{[n]_q!\,[m-n]_q!}$    [7]

The quantum affine algebra $U_h(\hat{g})$ is a Hopf algebra with coproduct

$\Delta(D) = D \otimes 1 + 1 \otimes D$
$\Delta(H_i) = H_i \otimes 1 + 1 \otimes H_i$    [8]
$\Delta(E_i^\pm) = E_i^\pm \otimes q_i^{H_i/2} + q_i^{-H_i/2} \otimes E_i^\pm$

antipode

$S(D) = -D, \quad S(H_i) = -H_i, \quad S(E_i^\pm) = -q_i^{\pm 1} E_i^\pm$    [9]

and co-unit

$\epsilon(D) = \epsilon(H_i) = \epsilon(E_i^\pm) = 0$    [10]

It is easy to see that the classical enveloping algebra $U(\hat{g})$ can be obtained from the above by setting h = 0, or more formally,

$U_h(\hat{g})/h\,U_h(\hat{g}) = U(\hat{g})$

We can also define the quantum affine algebra $U_q(\hat{g})$ as the algebra over Q(q) with generators $K_i$, $E_i^\pm$, D for i = 0, 1, ..., r and relations that are
obtained from the ones given above for $U_h(\hat{g})$ by setting

$q_i^{H_i/2} = K_i, \quad i = 0, \dots, r$    [11]

One can go further to an algebraic formulation over C in which q is a complex number (with some points including q = 0 not allowed). This has the advantage that it becomes possible to specialize, for example, to q a root of unity, where special phenomena occur.

Representations

For applications in physics, the finite-dimensional representations of $U_h(\hat{g}')$ are the most interesting. As will be explained in later sections, these occur, for example, as particle multiplets in 2D quantum field theory or as spin Hilbert spaces in quantum spin chains. In the next subsection, we will use them to derive matrix solutions to the Yang–Baxter equation.

While for a nonaffine quantum algebra $U_h(g)$ the ring of representations is isomorphic to that of the classical enveloping algebra U(g) (because in fact the algebras are isomorphic, as Drinfeld has pointed out), the corresponding fact is no longer true for affine quantum groups, except in the case $\hat{g} = a_n^{(1)} = \widehat{sl}_{n+1}$.

For the classical enveloping algebras $U(\hat{g}')$, any finite-dimensional representation of U(g) also carries a finite-dimensional representation of $U(\hat{g}')$. In the quantum case, however, in general, an irreducible representation of $U_h(\hat{g}')$ reduces to a sum of representations of $U_h(g)$.

To classify the finite-dimensional representations of $U_h(\hat{g}')$, it is necessary to use a different realization of $U_h(\hat{g}')$ that looks more like a quantization of the loop algebra realization [3] than the realization in terms of Chevalley generators. In terms of the generators in this alternative realization, which we do not give here because of its complexity, the finite-dimensional representations can be viewed as pseudo-highest-weight representations. There is a set of r ‘‘fundamental’’ representations $V^a$, a = 1, ..., r, each containing the corresponding $U_h(g)$ fundamental representation as a component, from the tensor products of which all the other finite-dimensional representations may be constructed. The details can be found in Chari and Pressley (1994).

Given some representation $\pi: U_h(\hat{g}') \to \mathrm{End}(V)$, we can introduce a parameter $\lambda$ with the help of the automorphism $\tau_\lambda$ of $U_h(\hat{g}')$ generated by D and given by

$\tau_\lambda(E_i^\pm) = \lambda^{\pm s_i} E_i^\pm, \quad \tau_\lambda(H_i) = H_i, \quad i = 0, \dots, r$    [12]

Different choices for the $s_i$ correspond to different gradations. Commonly used are the ‘‘homogeneous gradation,’’ $s_0 = 1$, $s_1 = \cdots = s_r = 0$, and the ‘‘principal gradation,’’ $s_0 = s_1 = \cdots = s_r = 1$. We shall also need the ‘‘spin gradation’’ $s_i = d_i^{-1}$. The representations

$\pi_\lambda = \pi \circ \tau_\lambda$

play an important role in applications to integrable models where $\lambda$ is referred to as the (multiplicative) spectral parameter. In applications to particle scattering introduced in a later section, it is related to the rapidity of the particle. The generator D can be realized as an infinitesimal scaling operator on $\lambda$ and thus plays the role of the Lorentz boost generator.

The tensor product representations $\pi^a_\lambda \otimes \pi^b_\mu$ are irreducible generically but become reducible for certain values of $\lambda/\mu$, a fact which again is important in applications (fusion procedure, particle-bound states).

R-Matrices

A Hopf algebra A is said to be ‘‘almost cocommutative’’ if there exists an invertible element $R \in A \otimes A$ such that

$R\,\Delta(x) = (\sigma \circ \Delta(x))\,R, \quad \text{for all } x \in A$    [13]

where $\sigma: x \otimes y \mapsto y \otimes x$ exchanges the two factors in the coproduct. In a quasitriangular Hopf algebra, this element R satisfies

$(\Delta \otimes \mathrm{id})(R) = R_{13} R_{23}$
$(\mathrm{id} \otimes \Delta)(R) = R_{13} R_{12}$    [14]

and is known as the ‘‘universal R-matrix’’ (see Hopf Algebras and q-Deformation Quantum Groups). As a consequence of [13] and [14], it automatically satisfies the Yang–Baxter equation

$R_{12} R_{13} R_{23} = R_{23} R_{13} R_{12}$    [15]

For technical reasons, to do with the infinite number of root vectors of $\hat{g}$, the quantum affine algebra $U_h(\hat{g})$ does not possess a universal R-matrix that is an element of $U_h(\hat{g}) \otimes U_h(\hat{g})$. However, as pointed out by Drinfeld (1985), it possesses a pseudouniversal R-matrix $R(\lambda) \in (U_h(\hat{g}') \otimes U_h(\hat{g}'))((\lambda))$. The $\lambda$ is related to the automorphism $\tau_\lambda$ defined in [12]. When using the homogeneous gradation, $R(\lambda)$ is a formal power series in $\lambda$.

When the pseudouniversal R-matrix is evaluated in the tensor product of any two indecomposable finite-dimensional representations $\pi_1$ and $\pi_2$, one obtains a numerical R-matrix

$R^{12}(\lambda) = (\pi_1 \otimes \pi_2)\,R(\lambda)$    [16]
The entries of these numerical R-matrices are rational functions of the multiplicative spectral parameter $\lambda$ but when written in terms of the additive spectral parameter $u = \log(\lambda)$ they are trigonometric functions of u and satisfy the Yang–Baxter equation in the form given in [1]. The matrix

$\check{R}^{12}(\lambda) = \sigma \circ R^{12}(\lambda)$

satisfies the intertwining relation

$\check{R}^{12}(\lambda/\mu) \cdot (\pi^1_\lambda \otimes \pi^2_\mu)(\Delta(x)) = (\pi^2_\mu \otimes \pi^1_\lambda)(\Delta(x)) \cdot \check{R}^{12}(\lambda/\mu)$    [17]

for any $x \in U_h(\hat{g}')$. It follows from the irreducibility of the tensor product representations that these R-matrices satisfy the Yang–Baxter equations

$(\mathrm{id} \otimes \check{R}^{23}(\mu/\nu))(\check{R}^{13}(\lambda/\nu) \otimes \mathrm{id})(\mathrm{id} \otimes \check{R}^{12}(\lambda/\mu)) = (\check{R}^{12}(\lambda/\mu) \otimes \mathrm{id})(\mathrm{id} \otimes \check{R}^{13}(\lambda/\nu))(\check{R}^{23}(\mu/\nu) \otimes \mathrm{id})$    [18]

or, graphically,

[Diagram: the two ways of braiding $V^1_\lambda \otimes V^2_\mu \otimes V^3_\nu$ into $V^3_\nu \otimes V^2_\mu \otimes V^1_\lambda$ are equal.]

Explicit formulas for the pseudouniversal R-matrices were found by Khoroshkin and Tolstoy. However, these are difficult to evaluate explicitly in specific representations so that in practice it is easiest to find the numerical R-matrices $\check{R}^{ab}(\lambda)$ by solving the intertwining relation [17]. It should be stressed that solving the intertwining relation, which is a linear equation for the R-matrix, is much easier than directly solving the Yang–Baxter equation, a cubic equation.

Yangians

As remarked by Drinfeld (1986), for untwisted $\hat{g}$ the quantum affine algebra $U_h(\hat{g}')$ degenerates as $h \to 0$ into another quasipseudotriangular Hopf algebra, the ‘‘Yangian’’ Y(g) (Drinfeld 1985). It is associated with R-matrices which are rational functions of the additive spectral parameter u. Its representation ring coincides with that of $U_h(\hat{g}')$.

Consider a general presentation of a Lie algebra g, with generators $I_a$ and structure constants $f_{abc}$, so that

$[I_a, I_b] = f_{abc} I_c, \quad \Delta(I_a) = I_a \otimes 1 + 1 \otimes I_a$

(with summation over repeated indices). The Yangian Y(g) is the algebra generated by these and a second set of generators $J_a$ satisfying

$[I_a, J_b] = f_{abc} J_c, \quad \Delta(J_a) = J_a \otimes 1 + 1 \otimes J_a + \tfrac{1}{2} f_{abc}\, I_c \otimes I_b$

The requirement that $\Delta$ be a homomorphism imposes further relations:

$[J_a, [J_b, I_c]] - [I_a, [J_b, J_c]] = \alpha_{abcdeg}\{I_d, I_e, I_g\}$

and

$[[J_a, J_b], [I_l, J_m]] + [[J_l, J_m], [I_a, J_b]] = (\alpha_{abcdeg} f_{lmc} + \alpha_{lmcdeg} f_{abc})\{I_d, I_e, J_g\}$

where

$\alpha_{abcdeg} = \frac{1}{24} f_{adi} f_{bej} f_{cgk} f_{ijk}, \quad \{x_1, x_2, x_3\} = \sum_{i \neq j \neq k} x_i x_j x_k$

When g = sl2 the first of these is trivial, while for g ≠ sl2 the first implies the second. The co-unit is $\epsilon(I_a) = \epsilon(J_a) = 0$; the antipode is $s(I_a) = -I_a$, $s(J_a) = -J_a + (1/2) f_{abc} I_c I_b$. The Yangian may be obtained from $U_h(\hat{g}')$ by expanding in powers of h. For the precise relationship, see Drinfeld (1985) and MacKay (2005). In the spin gradation, the automorphism [12] generated by D descends to Y(g) as $I_a \mapsto I_a$, $J_a \mapsto J_a + u I_a$.

There are two other realizations of Y(g). The first (see, for example, Molev 2003) defines $Y(gl_n)$ directly from

$R(u - v)\,T_1(u)\,T_2(v) = T_2(v)\,T_1(u)\,R(u - v)$

where $T_1(u) = T(u) \otimes \mathrm{id}$, $T_2(v) = \mathrm{id} \otimes T(v)$, and

$T(u) = \sum_{i,j=1}^{n} t_{ij}(u) \otimes e_{ij}$
$t_{ij}(u) = \delta_{ij} + I_{ij} u^{-1} + J_{ij} u^{-2} + \cdots$

where $e_{ij}$ are the standard matrix units for $gl_n$. The rational R-matrix for the n-dimensional representation of $gl_n$ is

$R(u - v) = 1 - \frac{P}{u - v}, \quad \text{where } P = \sum_{i,j=1}^{n} e_{ij} \otimes e_{ji}$

is the transposition operator. $Y(gl_n)$ is then defined to be the algebra generated by $I_{ij}$, $J_{ij}$, and must be quotiented by the ‘‘quantum determinant’’ at its center to define $Y(sl_n)$. The coproduct takes a particularly simple form,

$\Delta(t_{ij}(u)) = \sum_{k=1}^{n} t_{ik}(u) \otimes t_{kj}(u)$
Here we do not give explicitly the third realization, where R T() = T(1, 1; ) and T(x, y; ) =
y
namely Drinfeld’s ‘‘new’’ realization of Y(g ) (Drinfeld P exp( x L(; ) d). Taking the trace of this relation
1988), but we remark that it was in this presentation gives an infinity of charges in involution.
that Drinfeld found a correspondence between certain Quantization is problematic, owing to divergences
sets of polynomials and finite-dimensional irreducible in T. The QISM regularizes these by putting the
representations of Y(g ), thus classifying these (although model on a lattice of spacing , defining the lattice
not thereby deducing their dimension or constructing Lax operator to be
the action of Y(g )). As remarked earlier, the structure is
as in the earlier section: Y(g ) representations are in Ln ðÞ ¼ Tððn  1=2Þ; ðn þ 1=2Þ; Þ
Z ðnþð1=2ÞÞ !
general g -reducible, and there is a set of r fundamental
Y(g )-representations, containing the fundamental ¼ P exp Lð; Þ d
ðnð1=2ÞÞ
g -representations as components, from which all
other representations can be constructed.
The lattice monodromy matrix is then T() =
liml ! 1, m ! 1 Tlm where Tlm = Lm Lm1 . . . Llþ1 ,
and its trace again yields an infinity of commuting
Origins in the Quantum charges, provided that there exists a quantum
Inverse-Scattering Method R-matrix R(1 , 2 ) such that
Quantum affine algebras for general ^g first appear in Rð1 ; 2 ÞL1n ð1 ÞL2n ð2 Þ
Drinfeld (1985, 1986) and Jimbo (1985, 1986), but
they have their origin in the ‘‘quantum inverse- ¼ L2n ð2 ÞL1n ð1 ÞRð1 ; 2 Þ ½19
scattering method’’ (QISM) of the St. Petersburg
c2 ) first where L1n (1 ) = Ln (1 )  id, L2n (2 ) = id  Ln (2 ).
school, and the essential features of Uh (sl
That R solves the Yang–Baxter equation follows
appear in Kulish and Reshetikhin (1983). In this
from the equivalence of the two ways of intertwining
section, we explain how the quantization of the Lax-
Ln (1 )  Ln (2 )  Ln (3 ) with Ln (3 )  Ln (2 ) 
pair description of affine Toda theory led to the
Ln (1 ).
discovery of the Uh (^g ) coproduct, commutation
To compute Ln (), one uses the canonical, equal-
relations, and R-matrix. We use the normalizations
time commutation relations for the i and _ i . In
of Jimbo (1986), in which the Hi are rescaled so that
terms of the lattice fields
the Cartan matrix aij = i .j is symmetric.
We begin with the affine Toda field equations Z ðnþð1=2ÞÞ
pi;n ¼ _ i ðxÞ dx
2X
r  
m aij j 0 :j j ðnð1=2ÞÞ
@ @ i ¼  e  ni e Z
ðnþð1=2ÞÞ X
j¼1
qi;n ¼ eð =2Þaij j ðxÞ dx
ðnð1=2ÞÞ j
an integrable model in R 1þ1 of r real scalar fields
i (x, t) with a mass parameter m and coupling the only nontrivial relation is [pi, n , qj, n ] =
constant . Equivalently, we may write (ih =2)ij qj, n , and one finds
[@x þ Lx , @t þ Lt ] = 0 for the Lax pair
! !
X X
X r
mX r   Ln ðÞ ¼ exp Hi pi;n þ exp Hj pj;n
Lx ðx; tÞ ¼ Hi @t i þ eð =2Þaij j Eþ 
i þ Ei 2 i 4 j
2 i¼1 2 i;j¼1
"
  m X  
m X ð =2Þa0j j
r
1  qi;n Eþ 
þ e Eþ þ E  i þ Ei
2 j¼1 0
 0 2 i
Y n  #
X r
mX r   1
Lt ðx; tÞ ¼ Hi @x i þ eð =2Þaij j Eþ  þ qi;n i Eþ 0 þ E0

i  Ei 
2 i¼1 2 i;j¼1 i
!
 
mX r
1  X
þ e ð =2Þa0j j þ
E0  E0  exp Hj pj;n þ Oð2 Þ
2 j¼1  4 j

with arbitrary  2 C. The classical integrability of the the expression used by the St Petersburg school and
system is seen in the existence of r(, 0 ) such that by Jimbo. We now make the replacement
Hi =4  Hi =4
Ei 7! q Ei q , where q = exp(ih 2 =2), and
fTðÞ  Tð0 Þg ¼ ½rð; 0 Þ; TðÞ  Tð0 Þ compute the O() terms in [19], which reduce to

RðzÞðHi  1 þ 1  Hi Þ (S-matrix) for solitons must be proportional to the


intertwiner for these tensor product representa-
¼ ðHi  1 þ 1  Hi ÞRðzÞ

tions, the R matrix:
RðzÞ Ei q
Hi =2
þ qHi =2  E i

 ab ðÞ
Sab ðÞ ¼ f ab ðÞR
Hi =2
¼ qHi =2  E i þ E 
i  q RðzÞ

with  proportional to u, the additive spectral
RðzÞ z1 E
0 q
H0 =2
þ qH0 =2  E 0
parameter. The scalar prefactor f ab () is not deter-

mined by the symmetry but is fixed by other
H0 =2
¼ qH0 =2  E 0 þ z1 
E 0  q RðzÞ requirements like unitarity, crossing symmetry, and
the bootstrap principle.
where z = 1 =2 . We recognize in these the Uh (g ^) It turns out that the axiomatic properties of the
coproduct and thus the intertwining relations, in the R-matrices are in perfect agreement with the
homogeneous gradation. These equations were axiomatic properties of the analytic S-matrix. For
solved for R in defining representations of example, crossing symmetry of the S-matrix, gra-
nonexceptional g by Jimbo (1986). phically represented by
For ^g = slc2 , it was Kulish and Reshetikhin (1983)
b a b a b a
who first discovered that the requirement that the
coproduct must be an algebra homomorphism forces = = 20
the replacement of the commutation relations of θ iπ – θ iπ – θ
b 2 ) by those of Uh (sl
U(sl b 2 ); more generally it requires
a b a b a b
the replacement of U(^g ) by Uh (^g ).
is a consequence of the property of the universal
R-matrix with respect to the action of the antipode S,
Affine Quantum Group Symmetry
and the Exact S-Matrix ðS  1ÞR ¼ R1
In the last section, we saw the origins of Uh (^g ) in the An S-matrix will have poles at certain imaginary
‘‘auxiliary’’ algebra introduced in the Lax pair. rapidities ab
c corresponding to the formation of
However, the quantum affine algebras also play a virtual bound states. This is graphically represented
second role, as a symmetry algebra. An imaginary- in Figure 1b. The location of the pole is determined
coupled affine Toda field theory based on the affine by the masses of the three particles involved,
algebra ^g _ possesses the quantum affine algebra
Uh (^g ) as a symmetry algebra, where ^g _ is the m2c ¼ m2a þ m2b þ 2ma mb cosðiab
c Þ
Langland dual to ^g (the algebra obtained by At the bound state pole, the S-matrix will project
replacing roots by coroots).  matrix has to have
onto the multiplet V c . Thus, the R
The solitonic particle states in affine Toda theories this projection property as well and indeed, this turns
form multiplets which transform in the fundamental out to be the case. The bootstrap principle, whereby
representations of the quantum affine algebra. Multi- the S-matrix for a bound state is obtained from the
particle states transform in tensor product representa- S-matrices of the constituent particles,
tions V a  V b . The scattering of two solitons of type
c d c
a and b with relative rapidity  is described by the
S-matrix Sab () : V a  V b ! V b  V a , graphically
d
represented in Figure 1a. It then follows from the = 21
symmetry that the two-particle scattering matrix d

a b d a b
b a b a
is a consequence of the property [14] of the universal
c R-matrix with respect to the coproduct.
θ There is a famous no-go theorem due to Coleman
θcab
a b a b and Mandula which states the ‘‘impossibility of
(a) (b) combining space-time and internal symmetries in
Figure 1 (a) Graphical representation of a two-particle any but a trivial way.’’ Affine quantum group
scattering process described by the S-matrix Sab (). (b) At symmetry circumvents this no-go theorem. In fact,
special values cab of the relative spectral parameter, the two the derivation D is the infinitesimal two-dimensional
particles of types a and b form a bound state of type c. Lorentz boost generator and the other symmetry

charges transform nontrivially under these Lorentz transformations, see [2].

The noncocommutative coproduct [8] means that a $U_h(\hat{g})$ symmetry generator, when acting on a 2-soliton state, acts differently on the left soliton than on the right soliton. This is only possible because the generator is a nonlocal symmetry charge, that is, a charge which is obtained as the space integral of the time component of a current which itself is a nonlocal expression in terms of the fields of the theory.

Similarly, many nonlinear sigma models possess nonlocal charges which form $Y(g)$, and the construction proceeds similarly, now utilizing rational R-matrices, and with particle multiplets forming fundamental representations of $Y(g)$. In each case, the three-point couplings corresponding to the formation of bound states, and thus the analogs for $U_h(\hat{g})$ and $Y(g)$ of the Clebsch–Gordan couplings, obey a rather beautiful geometric rule originally deduced in simpler, purely elastic scattering models (Chari and Pressley 1996).

More details about this topic can be found in Delius (1995) and MacKay (2005).

Integrable Quantum Spin Chains

Affine quantum groups provide an unlimited supply of integrable quantum spin chains. From any R-matrix $R(\theta)$ for any tensor product of finite-dimensional representations $W \otimes V$, one can produce an integrable quantum system on the Hilbert space $V^{\otimes n}$. This Hilbert space can then be interpreted as the space of $n$ interacting spins. The space $W$ is an auxiliary space required in the construction but not playing a role in the physics.

Given an arbitrary R-matrix $R(\theta)$, one defines the monodromy matrix $T(\theta) \in \mathrm{End}(W \otimes V^{\otimes n})$ by
\[ T(\theta) = R_{01}(\theta - \theta_1) R_{02}(\theta - \theta_2) \cdots R_{0n}(\theta - \theta_n) \]
where, as usual, $R_{ij}$ is the R-matrix acting on the $i$th and $j$th component of the tensor product space. The $\theta_i$ can be chosen arbitrarily for convenience. Graphically the monodromy matrix can be represented as

[Diagram: the line of the auxiliary space $W$ crossing the lines $V_1, V_2, V_3, \ldots, V_{n-1}, V_n$]

As a consequence of the Yang–Baxter equation satisfied by the R-matrices the monodromy matrix satisfies
\[ RTT = TTR \tag{22} \]
or, graphically,

[Diagram: two lines $W$ and $W'$ crossing $V_1, V_2, \ldots, V_n$, drawn on both sides of an equality]

One defines the transfer matrix
\[ \tau(\theta) = \mathrm{tr}_W\, T(\theta) \]
which is now an operator on $V^{\otimes n}$, the Hilbert space of the quantum spin chain. Due to [22], two transfer matrices commute,
\[ [\tau(\theta), \tau(\theta')] = 0 \]
and thus $\tau(\theta)$ can be seen as a generating function of an infinite number of commuting charges, one of which will be chosen as the Hamiltonian. This Hamiltonian can then be diagonalized using the algebraic Bethe ansatz.

One is usually interested in the thermodynamic limit where the number of spins goes to infinity. In this limit, it has been conjectured, the Hilbert space of the spin chain carries a certain infinite-dimensional representation of the quantum affine algebra and this has been used to solve the model algebraically, using vertex operators (Jimbo and Miwa 1995).

Boundary Quantum Groups

In applications to physical systems that have a boundary, the Yang–Baxter equation [1] appears in conjunction with the boundary Yang–Baxter equation, also known as the reflection equation,
\[ R_{12}(u - v) K_1(u) R_{21}(u + v) K_2(v) = K_2(v) R_{12}(u + v) K_1(u) R_{21}(u - v) \tag{23} \]
The matrices $K$ are known as reflection matrices. This equation was originally introduced by Cherednik to describe the reflection of particles from a boundary in an integrable scattering theory and was used by Sklyanin to construct integrable spin chains and quantum field theories with boundaries.

Boundary quantum groups are certain co-ideal subalgebras of affine quantum groups. They provide the algebraic structures underlying the solutions of the boundary Yang–Baxter equation in the same way in which affine quantum groups underlie the solutions of the ordinary Yang–Baxter equation. Both allow one to find solutions of the respective Yang–Baxter equation by solving a linear intertwining relation. In the case without spectral parameters these algebras appear in the theory of braided groups (see Hopf Algebras and q-Deformation Quantum Groups and Braided and Modular Tensor Categories).
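To make the reflection equation [23] concrete, the following sketch checks it for the rational R-matrix $R(u) = 1 - P/u$ of the Yangian section on $\mathbb{C}^2 \otimes \mathbb{C}^2$. The diagonal reflection matrix $K(u) = \mathrm{diag}(\xi + u, \xi - u)$ and the parameter name `xi` are illustrative assumptions, not taken from the text; for this R-matrix $R_{21} = R_{12}$, because $R$ commutes with $P$.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker (tensor) product of two nested-list matrices."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


# Transposition operator on C^2 (x) C^2 in the basis 00, 01, 10, 11.
PERM = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]


def r_matrix(u):
    """Rational R-matrix R(u) = 1 - P/u as a 4 x 4 matrix (u nonzero)."""
    return [[Fraction(int(i == j)) - Fraction(PERM[i][j]) / u
             for j in range(4)] for i in range(4)]


def k_matrix(u, xi):
    """Assumed diagonal reflection matrix K(u) = diag(xi + u, xi - u)."""
    return [[xi + u, Fraction(0)], [Fraction(0), xi - u]]


def reflection_sides(u, v, xi):
    """Return both sides of the reflection equation [23]."""
    eye = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    k1 = kron(k_matrix(u, xi), eye)   # K acting on the first factor
    k2 = kron(eye, k_matrix(v, xi))   # K acting on the second factor
    r_minus, r_plus = r_matrix(u - v), r_matrix(u + v)
    lhs = matmul(matmul(r_minus, k1), matmul(r_plus, k2))
    rhs = matmul(matmul(k2, r_plus), matmul(k1, r_minus))
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = reflection_sides(Fraction(3, 2), Fraction(1, 4), Fraction(5, 7))
    print("Reflection equation holds:", lhs == rhs)
```

Exact fractions are used so the two sides can be compared with `==`; the arguments only need to satisfy $u \neq \pm v$.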

For example, the subalgebra $B_\epsilon(\hat{g})$ of $U_h(\hat{g}')$ generated by
\[ Q_i = q_i^{H_i/2}(E_i^+ + E_i^-) + \epsilon_i (q_i^{H_i} - 1), \qquad i = 0, \ldots, r \tag{24} \]
is a boundary quantum group for certain choices of the parameters $\epsilon_i \in \mathbb{C}[[h]]$. It is a left co-ideal subalgebra of $U_h(\hat{g}')$ because
\[ \Delta(Q_i) = Q_i \otimes 1 + q_i^{H_i} \otimes Q_i \in U_h(\hat{g}') \otimes B_\epsilon(\hat{g}) \tag{25} \]
Intertwiners $K(\lambda) : V_{\eta\lambda} \to V_{\eta/\lambda}$ for some constant $\eta$ satisfying
\[ K(\lambda)\, \pi_{\eta\lambda}(Q) = \pi_{\eta/\lambda}(Q)\, K(\lambda), \qquad \text{for all } Q \in B_\epsilon(\hat{g}) \tag{26} \]
provide solutions of the reflection equation in the form
\[ (\mathrm{id} \otimes K_2(\mu))\, \check{R}_{12}(\lambda\mu)\, (\mathrm{id} \otimes K_1(\lambda))\, \check{R}_{21}(\lambda/\mu) = \check{R}_{12}(\lambda/\mu)\, (\mathrm{id} \otimes K_1(\lambda))\, \check{R}_{21}(\lambda\mu)\, (\mathrm{id} \otimes K_2(\mu)) \tag{27} \]
This can be extended to the case where the boundary itself carries a representation $W$ of $B_\epsilon(\hat{g})$. The boundary Yang–Baxter equation can be represented graphically as

[Diagram: lines labeled $V^1_\lambda$, $V^2_\mu$ reflecting from the boundary $W$ into $V^1_{1/\lambda}$, $V^2_{1/\mu}$, drawn on both sides of an equality]

Another example is provided by twisted Yangians where, when the $I_a$ and $J_a$ are constructed as nonlocal charges in sigma models, it is found that a boundary condition which preserves integrability leaves only the subset
\[ I_i \quad \text{and} \quad \tilde{J}_p = J_p + \tfrac{1}{4} f_{piq} (I_i I_q + I_q I_i) \]
conserved, where $i$ labels the $h$-indices and $p, q$ the $k$-indices of a symmetric splitting $g = h + k$. The algebra $Y(g, h)$ generated by the $I_i$, $\tilde{J}_p$ is, like $B_\epsilon(\hat{g})$, a co-ideal subalgebra, $\Delta(Y(g, h)) \subset Y(g) \otimes Y(g, h)$, and again yields an intertwining relation for K-matrices. For $g = sl_n$ and $h = so_n$ or $sp_{2n}$, $Y(g, h)$ is the "twisted Yangian" described in Molev (2003).

All the constructions in earlier sections of this review have analogs in the boundary setting. For more details see Delius and MacKay (2003) and MacKay (2005).

See also: Bethe Ansatz; Boundary Conformal Field Theory; Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Hopf Algebras and q-Deformation Quantum Groups; Riemann–Hilbert Problem; Solitons and Kac–Moody Lie Algebras; Yang–Baxter Equations.

Further Reading

Chari V and Pressley AN (1994) Quantum Groups. Cambridge: Cambridge University Press.
Chari V and Pressley AN (1996) Yangians, integrable quantum systems and Dorey's rule. Communications in Mathematical Physics 181: 265–302.
Delius GW (1995) Exact S-matrices with affine quantum group symmetry. Nuclear Physics B 451: 445–465.
Delius GW and MacKay NJ (2003) Quantum group symmetry in sine-Gordon and affine Toda field theories on the half-line. Communications in Mathematical Physics 233: 173–190.
Drinfeld V (1985) Hopf algebras and the quantum Yang–Baxter equation. Soviet Mathematics Doklady 32: 254–258.
Drinfeld V (1986) Quantum groups. Proc. Int. Cong. Math. (Berkeley), pp. 798–820.
Drinfeld V (1988) A new realization of Yangians and quantized affine algebras. Soviet Mathematics Doklady 36: 212–216.
Jimbo M (1985) A q-difference analogue of U(g) and the Yang–Baxter equation. Letters in Mathematical Physics 10: 63–69.
Jimbo M (1986) Quantum R-matrix for the generalized Toda system. Communications in Mathematical Physics 102: 537–547.
Jimbo M and Miwa T (1995) Algebraic Analysis of Solvable Lattice Models. Providence, RI: American Mathematical Society.
Kulish PP and Reshetikhin NY (1983) Quantum linear problem for the sine-Gordon equation and higher representations. Journal of Soviet Mathematics 23: 2435.
MacKay NJ (2005) Introduction to Yangian symmetry in integrable field theory. International Journal of Modern Physics (to appear).
Molev A (2003) Yangians and their applications. In: Hazewinkel M (ed.) Handbook of Algebra, vol. 3, pp. 907–959. Elsevier.

Aharonov–Bohm Effect
M Socolovsky, Universidad Nacional Autónoma de México, México DF, México
© 2006 Elsevier Ltd. All rights reserved.

Introduction

In classical electrodynamics, the interaction of charged particles with the electromagnetic field is local, through the pointlike coupling of the electric charge of the particles with the electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, respectively. This is mathematically expressed by the Lorentz-force law. The scalar and vector potentials, $\varphi$ and $\mathbf{A}$, which are the time and space components of the relativistic 4-potential $A_\mu$, are considered auxiliary quantities in terms of which the field strengths $\mathbf{E}$ and $\mathbf{B}$, the observables, are expressed in a gauge-invariant manner. The homogeneous or first pair of Maxwell equations are a direct consequence of the definition of the field strengths in terms of $A_\mu$. The inhomogeneous or second pair of Maxwell equations, which involve the charges and currents present in the problem, are also usually written in terms of $\mathbf{E}$ and $\mathbf{B}$; however, when writing them in terms of $A_\mu$, the number of degrees of freedom of the electromagnetic field is explicitly reduced from six to four; and finally, with two additional gauge transformations, one ends with the two physical degrees of freedom of the electromagnetic field.

In quantum mechanics, however, both the Schrödinger equation and the path-integral approaches for scalar and unpolarized charged particles in the presence of electromagnetic fields are written in terms of the potential and not of the field strengths. Even in the case of the Schrödinger–Pauli equation for spin-1/2 electrons with magnetic moment $\mathbf{m}$ interacting with a magnetic field $\mathbf{B}$, one knows that the coupling $\mathbf{m} \cdot \mathbf{B}$ is the nonrelativistic limit of the Dirac equation, which depends on $A_\mu$ but not on $\mathbf{E}$ and $\mathbf{B}$. Since gauge invariance also holds in the quantum domain, it was thought that $\mathbf{A}$ and $\varphi$ were mere auxiliary quantities, as in the classical case.

Aharonov and Bohm, in 1959, predicted a quantum interference effect due to the motion of charged particles in regions where $\mathbf{B}$ ($\mathbf{E}$) vanishes, but not $\mathbf{A}$ ($\varphi$), leading to a nonlocal gauge-invariant effect depending on the flux of the magnetic field in the inaccessible region, in the magnetic case, and on the difference of the integrals over time of time-varying potentials, in the electric case. (The magnetic effect was already noticed 10 years before by Ehrenberg and Siday in a paper on the refractive index of electrons.)

In the context of the Schrödinger equation, one can show that due to gauge invariance, if $\psi_0$ is a solution to the equation in the absence of an electromagnetic potential, then the product of $\psi_0(x)$ and the phase factor built from the integral of $A_\mu$ over a path joining an arbitrary reference point $x_0$ to $x$ is also a solution, if the integral is path independent. However, it is the path integral of Feynman which in the formulas for propagators of charged particles in the presence of electromagnetic fields clearly shows that the action of these fields on charged particles is nonlocal, and it is given by the celebrated nonintegrable (path-dependent) phase factor of Wu and Yang (1975). Moreover, this fact provides an additional proof of the nonlocal character of quantum mechanics: to surround fluxes, or to develop a potential difference, the particle has to travel simultaneously through at least two paths.

Thus, the fact that the Aharonov–Bohm (A–B) effect was verified experimentally, by Chambers and others, demonstrates the necessity of introducing the (gauge-dependent) potential $A_\mu$ in describing the electromagnetic interactions of the quantum particle. This is widely regarded as the single most important piece of evidence for electromagnetism being a gauge theory. Moreover, it shows, to paraphrase Yang, that the field underdescribes the physical theory, while the potential overdescribes it, and it is the phase factor which describes it exactly.

The content of this article is essentially twofold. The first four sections are mainly physical, where we describe the magnetic A–B effect using the Schrödinger equation and the Feynman path integral. The fifth section is geometrical and is the longest of the article. We describe the effect in the context of fiber bundles and connections, namely as a result of the coupling of the wave function (section of an associated bundle) to a nontrivial flat connection (non-pure gauge vector potential with zero magnetic field) in a trivial bundle (the A–B bundle) with topologically nontrivial (non-simply-connected) base space. We discuss the moduli space of flat connections and the holonomy groups giving the phase shifts of the interference patterns. Finally, in the last section, we briefly comment on the nonabelian A–B effect.

Electromagnetic Fields in Classical Physics

In classical physics, the motion of charged particles in the presence of electromagnetic fields is governed by the equation

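\[ \frac{d}{dt}\mathbf{p} = q\left(\mathbf{E} + \frac{\mathbf{v}}{c} \times \mathbf{B}\right) \tag{1} \]
where
\[ \mathbf{p} = \frac{m\mathbf{v}}{\sqrt{1 - (v^2/c^2)}} \]
is the mechanical momentum of the particle with electric charge $q$, mass $m$, and velocity $\mathbf{v} = \dot{\mathbf{x}}$ ($c$ is the velocity of light in vacuum, and for $|\mathbf{v}| \ll c$ the left-hand side (LHS) of [1] is approximately $m\dot{\mathbf{v}}$); the right-hand side (RHS) is the Lorentz force, where $\mathbf{E}$ and $\mathbf{B}$ are, respectively, the electric and magnetic fields at the spacetime point $(t, \mathbf{x})$ where the particle is located. Equation [1] is easily derived from the Euler–Lagrange equation
\[ \frac{d}{dt}\left(\frac{\partial L}{\partial \mathbf{v}}\right) - \frac{\partial L}{\partial \mathbf{x}} = 0 \tag{2} \]
with the Lagrangian $L$ given by the sum of the free Lagrangian for the particle,
\[ L_0 = -mc^2 \sqrt{1 - \frac{v^2}{c^2}} \tag{3} \]
and the Lagrangian describing the particle–field interaction,
\[ L_{int} = \frac{q}{c}\mathbf{A} \cdot \mathbf{v} - q\varphi \tag{4} \]
In [4], $\mathbf{A}$ and $\varphi$ are, respectively, the vector potential and the scalar potential, which together form the 4-potential $A_\mu = (A_0, -\mathbf{A}) = (\varphi, -A^i)$, $i = 1, 2, 3$, in terms of which the electric and magnetic field strengths are given by
\[ \mathbf{E} = -\frac{1}{c}\frac{\partial}{\partial t}\mathbf{A} - \nabla\varphi \tag{5a} \]
\[ \mathbf{B} = \nabla \times \mathbf{A} \tag{5b} \]
The classical action corresponding to a given path of the particle is
\[ S = \int_{t_1}^{t_2} dt\, L = \int_{t_1}^{t_2} dt\,(L_0 + L_{int}) = \int_{t_1}^{t_2} dt\, L_0 + \int_{t_1}^{t_2} dt\, L_{int} \equiv S_0 + S_{int} \tag{6} \]
$\mathbf{E}$, $\mathbf{B}$, and $S$ are invariant under the gauge transformation
\[ \mathbf{A} \to \mathbf{A}' = \mathbf{A} - \nabla\Lambda \tag{7a} \]
\[ \varphi \to \varphi' = \varphi + \frac{1}{c}\frac{\partial}{\partial t}\Lambda \tag{7b} \]
where $\Lambda$ is a real-valued differentiable scalar function (at least of class $C^2$) on spacetime. That is, if $\mathbf{E}'$, $\mathbf{B}'$, and $S'_{int}$ are defined in terms of $\mathbf{A}'$ and $\varphi'$ as $\mathbf{E}$, $\mathbf{B}$, and $S_{int}$ are defined in terms of $\mathbf{A}$ and $\varphi$, then $\mathbf{E}' = \mathbf{E}$, $\mathbf{B}' = \mathbf{B}$, and $S'_{int} = S_{int}$. This fact leads to the concept that, classically, the observables $\mathbf{E}$ and $\mathbf{B}$ are the physical quantities, while $A_\mu$ is only an auxiliary quantity. Also, and most important in the present context, eqn [1] states that the motion of the particles is determined by the values or state of the field strengths in an infinitesimal neighborhood of the particles, that is, classically, $\mathbf{E}$ and $\mathbf{B}$ act locally. If one defines the differential 1-form $A \equiv A_\mu\, dx^\mu$ (with $dx^0 = c\, dt$), then the components of the differential 2-form $F = dA = (1/2)(\partial_\mu A_\nu - \partial_\nu A_\mu)\, dx^\mu \wedge dx^\nu \equiv (1/2) F_{\mu\nu}\, dx^\mu \wedge dx^\nu$ are precisely the electric and magnetic fields:
\[ F_{\mu\nu} = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ -E_1 & 0 & -B_3 & B_2 \\ -E_2 & B_3 & 0 & -B_1 \\ -E_3 & -B_2 & B_1 & 0 \end{pmatrix} \tag{8} \]
At the level of $A$,
\[ dF = d^2 A = 0 \tag{9} \]
is an identity, but at the level of $\mathbf{E}$ and $\mathbf{B}$, [9] amounts to the homogeneous (or first pair of) Maxwell equations obeyed by the field strengths:
\[ \nabla \cdot \mathbf{B} = 0 \tag{10a} \]
\[ \nabla \times \mathbf{E} + \frac{1}{c}\frac{\partial}{\partial t}\mathbf{B} = 0 \tag{10b} \]
Therefore, these equations have a geometrical origin. The second pair of Maxwell equations is dynamical, and is obtained from the field action (in the Heaviside system of units)
\[ S_{field} = -\frac{1}{4c}\int d^4x\, F_{\mu\nu}F^{\mu\nu} \tag{11} \]
which leads to
\[ \nabla \cdot \mathbf{E} = 4\pi\rho \tag{12a} \]
\[ \nabla \times \mathbf{B} - \frac{1}{c}\frac{\partial}{\partial t}\mathbf{E} = \frac{4\pi\mathbf{j}}{c} \tag{12b} \]
where $(\rho, \mathbf{j}) = (j^0, \mathbf{j})$ is the 4-current satisfying, as a consequence of [12a] and [12b], the conservation law
\[ \partial_\mu j^\mu = 0 \tag{13} \]
For a pointlike particle, $\rho(t, \mathbf{x}) = q\,\delta^3(\mathbf{x} - \mathbf{x}(t))$ and $\mathbf{j} = \rho\mathbf{v}$.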

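Electromagnetic Fields in Quantum Physics

In quantum physics, the motion of charged particles in external electromagnetic fields is governed by the Schrödinger equation or, equivalently, by the Feynman path integral. In both cases, however, it is the 4-potential $A_\mu$ which appears in the equations, and not the field strengths. For simplicity, we consider here scalar (spinless) charged particles or unpolarized electrons (spin-1/2 particles), both of which, in the nonrelativistic approximation, can be described quantum mechanically by a complex wave function $\psi(t, \mathbf{x})$.

To derive the Schrödinger equation, one starts from the classical Hamiltonian
\[ H = \mathbf{P} \cdot \mathbf{v} - L - mc^2 = \frac{1}{2m}\left(\mathbf{P} - \frac{q}{c}\mathbf{A}\right)^2 + q\varphi \tag{14} \]
where
\[ \mathbf{P} = \frac{\partial}{\partial \mathbf{v}} L = \mathbf{p} + \frac{q}{c}\mathbf{A} \]
is the canonical momentum of the particle, and we have subtracted its rest energy. The replacements $\mathbf{P} \to -i\hbar\nabla$ and $H \to i\hbar\,\partial/\partial t$ lead to
\[
\begin{aligned}
i\hbar\frac{\partial}{\partial t}\psi
&= \left[\frac{1}{2m}\left(i\hbar\nabla + \frac{q}{c}\mathbf{A}\right)^2 + q\varphi\right]\psi \\
&= \left[-\frac{\hbar^2}{2m}\nabla^2 + \frac{q^2}{2mc^2}\mathbf{A}^2 + \frac{i\hbar q}{2mc}\nabla \cdot \mathbf{A} + \frac{i\hbar q}{mc}\mathbf{A} \cdot \nabla + q\varphi\right]\psi
\end{aligned}
\tag{15}
\]
The gauge transformation [7a] and [7b] is a symmetry of this equation if, simultaneously with the change of the 4-potential, the wave function transforms as follows:
\[ \psi(t, \mathbf{x}) \to \psi'(t, \mathbf{x}) = e^{-(iq/\hbar c)\Lambda}\,\psi(t, \mathbf{x}) \tag{7c} \]
So, $A'_\mu$ and $\psi'$ obey [15]. At each $(t, \mathbf{x})$, $e^{-(iq/\hbar c)\Lambda}$ belongs to $U(1)$, the unit circle in the complex plane.

In the path-integral approach, the kernel $K(t', \mathbf{x}'; t, \mathbf{x})$, which gives the probability amplitude for the propagation of the particle from the spacetime point $(t, \mathbf{x})$ to the spacetime point $(t', \mathbf{x}')$ ($t < t'$), is given by
\[
\begin{aligned}
K(t', \mathbf{x}'; t, \mathbf{x})
&= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left(\frac{i}{\hbar}(S_0 + S_{int})\right) \\
&= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left(\frac{i}{\hbar}\int_t^{t'} d\tau \left(\frac{1}{2}m\dot{\mathbf{x}}^2 + \frac{q}{c}\mathbf{A} \cdot \mathbf{v} - q\varphi\right)\right) \\
&= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left(\frac{i}{\hbar}\int_t^{t'} d\tau\, \frac{1}{2}m\dot{\mathbf{x}}^2\right) \times \exp\left(\frac{iq}{\hbar c}\int_t^{t'} \left(\mathbf{A} \cdot d\mathbf{x} - \varphi\, dx^0\right)\right) \\
&= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left(\frac{i}{\hbar}\int_t^{t'} d\tau\, \frac{1}{2}m\dot{\mathbf{x}}^2\right) \times \exp\left(-\frac{iq}{\hbar c}\int_t^{t'} dx^\mu A_\mu\right)
\end{aligned}
\tag{16}
\]
where the integral $\int \mathcal{D}\mathbf{x}(\tau) \ldots$ is over all continuous spacetime paths $(\tau, \mathbf{x}(\tau))$ which join $(t, \mathbf{x})$ with $(t', \mathbf{x}')$. If one knows the wave function at $(t, \mathbf{x})$, then the wave function at $(t', \mathbf{x}')$ is given by
\[ \psi(t', \mathbf{x}') = \int d^3x\, K(t', \mathbf{x}'; t, \mathbf{x})\, \psi(t, \mathbf{x}) \tag{17} \]
An important point is the natural appearance in the integrand of the functional integral of the factor
\[ e^{-(iq/\hbar c)\int_\gamma A} \]
for each path $\gamma$ joining $(t, \mathbf{x})$ with $(t', \mathbf{x}')$.

A Solution to the Schrödinger Equation

In what follows, we shall restrict ourselves to static magnetic fields; then in the previous formulas, we set $\varphi = 0$ and $\mathbf{A}(t, \mathbf{x}) = \mathbf{A}(\mathbf{x})$. It is then easy to show that if $\mathbf{x}_0$ is an arbitrary reference point and the integral $\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}') \cdot d\mathbf{x}'$ is independent of the integration path from $\mathbf{x}_0$ to $\mathbf{x}$, that is, it is a well-defined function $f$ of $\mathbf{x}$, and if $\psi_0$ is a solution of the free Schrödinger equation, that is,
\[ i\hbar\frac{\partial}{\partial t}\psi_0 = -\frac{\hbar^2}{2m}\nabla^2\psi_0 \tag{18} \]
then
\[ \psi(t, \mathbf{x}) = \exp\left(\frac{iq}{\hbar c}\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}') \cdot d\mathbf{x}'\right)\psi_0(t, \mathbf{x}) \tag{19} \]
is a solution of [15]. In fact, replacing [19] in [15], the LHS gives
\[ \exp\left(\frac{iq}{\hbar c} f(\mathbf{x})\right) i\hbar\frac{\partial}{\partial t}\psi_0 \]
while for the RHS one has
\[ \exp\left(\frac{iq}{\hbar c} f(\mathbf{x})\right)\left(-\frac{\hbar^2}{2m}\nabla^2\psi_0\right) \]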
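The RHS result can be spelled out in two steps. The following is a sketch that assumes only path independence, in the form $\nabla f = \mathbf{A}$ with $f(\mathbf{x}) = \int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}') \cdot d\mathbf{x}'$, together with $\varphi = 0$; the shorthand $\boldsymbol{\chi}$ for an arbitrary vector-valued function is introduced here for illustration.

```latex
\begin{align*}
\left(-i\hbar\nabla - \tfrac{q}{c}\mathbf{A}\right)\left(e^{(iq/\hbar c) f}\,\psi_0\right)
  &= e^{(iq/\hbar c) f}\left(-i\hbar\nabla\psi_0 + \tfrac{q}{c}(\nabla f)\,\psi_0 - \tfrac{q}{c}\mathbf{A}\,\psi_0\right)
   = e^{(iq/\hbar c) f}\left(-i\hbar\nabla\psi_0\right), \\
\left(-i\hbar\nabla - \tfrac{q}{c}\mathbf{A}\right)\cdot\left(e^{(iq/\hbar c) f}\,\boldsymbol{\chi}\right)
  &= e^{(iq/\hbar c) f}\left(-i\hbar\nabla\cdot\boldsymbol{\chi}\right)
  \quad\text{for any vector field } \boldsymbol{\chi}, \\
\frac{1}{2m}\left(-i\hbar\nabla - \tfrac{q}{c}\mathbf{A}\right)^2\left(e^{(iq/\hbar c) f}\,\psi_0\right)
  &= e^{(iq/\hbar c) f}\left(-\frac{\hbar^2}{2m}\nabla^2\psi_0\right)
  \quad\text{(take } \boldsymbol{\chi} = -i\hbar\nabla\psi_0\text{)}.
\end{align*}
```

Each application of the covariant operator simply passes through the phase factor, because the term generated by differentiating the exponential cancels the explicit $\mathbf{A}$ term.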

The cancelation of the exponential factors shows that, under the condition of path independence, there is no effect of the potential on the charged particles. Another way to see this is by making a gauge transformation [7a]–[7c] with $\Lambda(\mathbf{x}) = f(\mathbf{x})$, which changes $\psi \to \psi_0$ and $\mathbf{A} \to \mathbf{A}' = \mathbf{A} - \nabla\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}') \cdot d\mathbf{x}' = \mathbf{A} - \mathbf{A} = 0$.

The condition of path independence amounts, however, to the condition that no magnetic field is present since, if $\int_\gamma \mathbf{A}$ depends on $\gamma$, then for some pair of paths $\gamma$ and $\gamma'$ from $(t, \mathbf{x})$ to $(t', \mathbf{x}')$,
\[ 0 \neq \int_\gamma \mathbf{A} - \int_{\gamma'} \mathbf{A} = \int_\gamma \mathbf{A} + \int_{-\gamma'} \mathbf{A} = \oint_{\gamma \cup (-\gamma')} \mathbf{A} = \int_\Sigma d\mathbf{s} \cdot (\nabla \times \mathbf{A}), \]
where in the last equality we applied Stokes theorem ($\Sigma$ is any surface with boundary $\gamma \cup (-\gamma')$), which shows that $\mathbf{B} = \nabla \times \mathbf{A}$ must not vanish everywhere and has a nonzero flux $\Phi$ through $\Sigma$ given by
\[ \Phi = \int_\Sigma d\mathbf{s} \cdot \mathbf{B} \tag{20} \]
The conclusion of this section is that the ansatz [19] for solving [15] can only be applied in simply connected regions with no magnetic field strength present.

Aharonov–Bohm Proposal

In 1959, Aharonov and Bohm proposed an experiment to test, in quantum mechanics, the coupling of electric charges to electromagnetic field strengths through a local interaction with the electromagnetic potential $A_\mu$, but not with the field strengths themselves. However, as we saw before, no physical effect exists, that is, $A_\mu$ can be gauged away, unless magnetic and/or electric fields exist somewhere, although not necessarily overlapping the wave function of the particles.

Consider the usual two-slit experiment as depicted in Figure 1, with the additional presence, behind the slits, of a long and narrow solenoid enclosing a nonvanishing magnetic flux $\Phi$ due to a constant and homogeneous magnetic field $\mathbf{B}$ normal to the plane of the figure (in direction $z$); outside of the solenoid, the magnetic field is zero. If the radius of the solenoid is $R$, a vector potential $\mathbf{A}$ that produces such field strength is given by
\[ \mathbf{A}(\mathbf{x}) = \begin{cases} (|\mathbf{B}|\, r/2)\,\hat{\varphi}, & r \le R \\ (\Phi/2\pi r)\,\hat{\varphi}, & r > R \end{cases} \tag{21} \]
where $\Phi = \pi R^2 |\mathbf{B}|$ and $\hat{\varphi}$ is a unit vector in the azimuthal direction. In fact,
\[ \mathbf{B} = \nabla \times \mathbf{A}(\mathbf{x}) = \begin{cases} |\mathbf{B}|\,\hat{z}, & r \le R \\ 0, & r > R \end{cases} \tag{22} \]
Notice that at $r = R$, $\mathbf{A}$ is continuous but not continuously differentiable. Also, the ideal limit of an infinitely long solenoid makes the problem two-dimensional, that is, in the $x$–$y$ plane.

Figure 1 Magnetic Aharonov–Bohm effect. (Labels in the figure: source S, slits 1 and 2, paths I and II, solenoid R, point P on the screen $\Pi$, axes $x$, $y$, $z$.)

The probability amplitude for an electron emitted at the source S to arrive at the point P on the screen $\Pi$ is given by the sum of two probability amplitudes, namely those corresponding to passing through the slits 1 and 2. The solenoid is assumed to be impenetrable to the electrons; mathematically, this corresponds to a motion in a non-simply-connected region. In the approximation for the path integral [16], in which one considers the contribution of only two classes of paths, that is, the class $\{\gamma\}$ represented by path I, and the class $\{\gamma'\}$ represented by path II, if the wave function at the source is $\psi_S$, then the wave function at P is given by
\[
\begin{aligned}
\psi_P
&= \left(\int_{\{\gamma\}} e^{(i/\hbar) S_0(\gamma)}\, e^{(i|e|/\hbar c)\int_\gamma A} + \int_{\{\gamma'\}} e^{(i/\hbar) S_0(\gamma')}\, e^{(i|e|/\hbar c)\int_{\gamma'} A}\right)\psi_S \\
&= e^{(i|e|/\hbar c)\int_{I} A}\int_{\{\gamma\}} e^{(i/\hbar) S_0(\gamma)}\,\psi_S + e^{(i|e|/\hbar c)\int_{II} A}\int_{\{\gamma'\}} e^{(i/\hbar) S_0(\gamma')}\,\psi_S \\
&= e^{(i|e|/\hbar c)\int_{I} A}\left(\psi^0_P(I) + e^{(i|e|/\hbar c)\oint_{II \cup (-I)} A}\,\psi^0_P(II)\right) \\
&= e^{(i|e|/\hbar c)\int_{I} A}\left(\psi^0_P(I) + e^{2\pi i(\Phi/\Phi_0)}\,\psi^0_P(II)\right)
\end{aligned}
\tag{23}
\]
where, in the second line, we used the path independence of the integral of $A$ within each class of paths;
\[ \psi^0_P(I) = \int_{\{\gamma\}} e^{(i/\hbar) S_0(\gamma)}\,\psi_S \]

and
\[ \psi^0_P(II) = \int_{\{\gamma'\}} e^{(i/\hbar) S_0(\gamma')}\,\psi_S \]
and, in the last equality, we applied the extended version of Stokes theorem (by Craven), to allow for noncontinuously differentiable vector potentials; and the quantum of magnetic flux associated with the charge $|e|$ is defined by
\[ \Phi_0 = 2\pi\frac{\hbar c}{|e|} \cong 4.135 \times 10^{-7}\ \mathrm{G\,cm^2} \tag{24} \]
($\Phi_0 = 2\pi/|e| = \sqrt{\pi/\alpha} \cong \sqrt{137\pi}$ in the natural system of units (n.s.u.) $\hbar = c = 1$; $\alpha$ is the fine structure constant). Then the probability of finding the electron at P is proportional to
\[ |\psi_P|^2 = |\psi^0_P(I)|^2 + |\psi^0_P(II)|^2 + 2\,\mathrm{Re}\left(e^{2\pi i(\Phi/\Phi_0)}\,\psi^0_P(I)^{*}\,\psi^0_P(II)\right) \tag{25} \]
which exhibits an interference pattern shifted with respect to that without the magnetic field: as $\mathbf{B}$ and therefore $\Phi$ change, dark and bright interference fringes alternate periodically at the screen $\Pi$, with period $\Phi_0$. This is the magnetic A–B effect, which has been quantitatively verified in many experiments, the first one in 1960 by Chambers. The effect is:

1. gauge invariant, since $\mathbf{B}$ and therefore $\Phi$ are gauge invariant;
2. nonlocal, since it depends on the magnetic field inside the solenoid, where the electrons never enter;
3. quantum mechanical, since classically the charges do not feel any force and therefore no effect would be expected in this limit; and
4. topological, since the electrons necessarily move in a non-simply-connected space.

But perhaps the most important implication of the A–B effect is a dramatic additional confirmation of the nonlocal character of quantum mechanics: the electron has to "travel" along the two paths (I and II) simultaneously; otherwise, no flux would be surrounded and then no shift of the (then nonexistent) interference fringes would be observed at the screen $\Pi$.

Calculations in the path-integral approach including the whole set of homotopy classes of paths around the solenoid, indexed by an integer $m$, have been performed by several authors, leading to a formula of the type
\[ \psi_P = \sum_{m=-\infty}^{\infty} e^{im\delta}\,\psi^0_P(m) \tag{26} \]
with
\[ \delta = 2\pi\frac{\Phi}{\Phi_0} \tag{27} \]
(Schulman 1971, Kobe 1979). As in [23],
\[ \psi_P(\Phi + k\Phi_0) = \psi_P(\Phi), \qquad k \in \mathbb{Z} \tag{28} \]
There is a close relation between the A–B effect and the Dirac quantization condition (DQC) in the presence of electric and magnetic charges: according to [25] (or [26]) the A–B effect disappears when the flux $\Phi$ equals $n\Phi_0 = 2\pi n(\hbar c/|e|)$, $n \in \mathbb{Z}$, that is, when the condition
\[ |e|\Phi = 2\pi n\hbar c \tag{29} \]
holds. But this is the DQC (Dirac 1931) when $\Phi$ is the flux associated with a magnetic charge $g$: $\Phi(g) = (g/4\pi r^2) \times 4\pi r^2 = g$, leading to $|e|g = 2\pi n\hbar c$ ($2\pi n$ in the n.s.u.). This is precisely the condition for the Dirac string to be unobservable in quantum mechanics: to give no A–B effect.

Geometry of the A–B Effect

In this section we study the space of gauge classes of flat potentials outside the solenoid, which determine the A–B effect; the topological structure of the A–B bundle; and the holonomy groups of the connections, which precisely give the phase shifts of the wave functions. We use the n.s.u. system; in particular, if $[L]$ is the unit of length, then $[A_\mu] = [L]^{-1}$, $[|e|] = [L]^0$, and $\Phi_0 = 2\pi/|e| = \sqrt{\pi/\alpha} \cong \sqrt{137\pi}$, where $\alpha$ is the fine structure constant.

To synthesize, one can say that the abelian A–B effect is a nonlocal gauge-invariant quantum effect due to the coupling of the wave function (section of an associated bundle) to a nontrivial (non-exact) flat (closed) connection in a trivial principal bundle with a non-simply-connected base space. In the following subsections, we will give a detailed explanation of these statements.

The A–B Bundle

The gauge group of electromagnetism is the abelian Lie group $U(1)$ with Lie algebra (the tangent space at the identity) $u(1) = i\mathbb{R}$. In the limit of an infinitely long and infinitesimally thin solenoid carrying the magnetic flux $\Phi$, the space available to the electrons is the plane minus a point, that is, $\mathbb{R}^{2*}$, which is of the same homotopy type as the circle $S^1$. Then the set of isomorphism classes of $U(1)$ bundles over $\mathbb{R}^{2*}$ is in one-to-one correspondence with the set of homotopy classes of maps from $S^0$ to $S^1$ (Steenrod 1951), which consists of only one point: if $f, g :$

$S^0 \to S^1$ are given by $f(1) = e^{i\varphi_1}$, $f(-1) = e^{i\varphi_2}$, $g(1) = e^{i\theta_1}$, and $g(-1) = e^{i\theta_2}$, then $H : S^0 \times [0,1] \to S^1$ given by $H(1,t) = e^{i((1-t)\varphi_1 + t\theta_1)}$ and $H(-1,t) = e^{i((1-t)\varphi_2 + t\theta_2)}$ is a homotopy between $f$ and $g$. Then, up to equivalence, the relevant bundle for the A–B effect is the product bundle

$$\xi_{A\text{–}B} : U(1) \to \mathbb{R}^{2*} \times U(1) \to \mathbb{R}^{2*} \qquad [30a]$$

Since $\mathbb{R}^{2*}$ is homeomorphic to an open disk minus a point, $(D_0^2)^*$, the total space of the bundle is homeomorphic to an open solid 2-torus minus a circle, since $(T_0^2)^* = (D_0^2)^* \times S^1$. Then the A–B bundle has the topological structure

$$\xi_{A\text{–}B} : S^1 \to (T_0^2)^* \to (D_0^2)^* \qquad [30b]$$

The Gauge Group and the Moduli Space of Flat Connections

The gauge group of the bundle $\xi_{A\text{–}B}$ is the set of smooth functions from the base space to the structure group, that is, $\mathcal{G} = C^\infty(\mathbb{R}^{2*}, U(1))$. Since $\mathcal{G} \subset C^0(\mathbb{R}^{2*}, U(1)) = \{$continuous functions $\mathbb{R}^{2*} \to U(1)\}$ and $[\mathbb{R}^{2*}, U(1)] = \{$homotopy classes of continuous functions $\mathbb{R}^{2*} \to U(1)\} \cong [S^1, S^1] \cong \pi_1(S^1) \cong \mathbb{Z}$, given $f \in \mathcal{G}$ there exists a unique $n \in \mathbb{Z}$ such that $f$ is homotopic to $f_n$ ($f \sim f_n$), where $f_n : \mathbb{R}^{2*} \to U(1)$ is given by $f_n(re^{i\varphi}) = e^{in\varphi}$, $\varphi \in [0, 2\pi)$.

$\mathcal{G}$ acts on the space of flat connections on $\xi_{A\text{–}B}$, given by the closed $u(1)$-valued differential 1-forms on $\mathbb{R}^{2*}$:

$$\mathcal{C}_0 = \{A \in \Omega^1(\mathbb{R}^{2*}; u(1)),\ dA = 0\} \qquad [31]$$

through

$$\mathcal{C}_0 \times \mathcal{G} \to \mathcal{C}_0, \quad (A, f) \mapsto A + f^{-1}df \qquad [32]$$

where $f^{-1}(x,y) = (f(x,y))^{-1}$. The moduli space

$$\mathcal{M}_0 = \frac{\mathcal{C}_0}{\mathcal{G}} = \{\text{gauge equivalence classes of flat connections on } \xi_{A\text{–}B}\} = \{[A] = \{A + f^{-1}df,\ f \in \mathcal{G}\},\ A \in \mathcal{C}_0\} \qquad [33]$$

is isomorphic to the circle $S^1$ with length 1. This can be seen as follows: the de Rham cohomology of $\mathbb{R}^{2*}$ with coefficients in $i\mathbb{R}$ in dimension 1 is

$$H^1_{DR}(\mathbb{R}^{2*}; i\mathbb{R}) = \{\lambda [A_0]_{DR},\ \lambda \in \mathbb{R}\} \cong H^1_{DR}(S^1; i\mathbb{R}) \cong \mathbb{R} \qquad [34]$$

where

$$A_0 = i\,\frac{x\,dy - y\,dx}{x^2 + y^2} \in \mathcal{C}_0 \qquad [35]$$

is the connection that, once multiplied by $|e|^{-1}$ (see below), generates the flux $\Phi_0$ and therefore no A–B effect: $A_0$ is closed ($dA_0 = 0$) but not exact ($(x\,dy - y\,dx)/(x^2 + y^2) = d\varphi$ only for $\varphi \in (0, 2\pi)$, $\varphi = 0$ is excluded); $[A_0]_{DR} = A_0 + d\beta$ with $\beta \in \Omega^0(\mathbb{R}^{2*}; i\mathbb{R})$. $\beta$ gives an element of $\mathcal{G}$ through the composite $\exp \circ \beta : \mathbb{R}^{2*} \to U(1)$, $(x,y) \mapsto e^{\beta(x,y)}$. The A–B effect with flux $\Phi = \lambda\Phi_0$ is produced by the connection $A = \lambda A_0$. To determine $\mathcal{M}_0$, one finds the smallest $\mu \in \mathbb{R}$ such that $(\lambda + \mu)A_0 \sim \lambda A_0$, that is, $(\lambda + \mu)A_0 \in [\lambda A_0]$, which means, from [33], that $(\lambda + \mu)A_0 = \lambda A_0 + f^{-1}df$ or $\mu A_0 = f^{-1}df$. For $\varphi \neq 0$, $A_0 = i\,d\varphi$ and $f_1^{-1}df_1 = i\,d\varphi$, then $\mu = 1$, and therefore $(\lambda + 1)A_0 \sim \lambda A_0$, in particular $A_0 \sim 0$.

A remark concerning the gauge group $\mathcal{G}$ is the following. In classical electrodynamics, according to [7a] and [7b], the symmetry group could be taken to be the additive group $(\mathbb{R}, +)$ instead of the multiplicative group $U(1)$. Since $\mathbb{R}$ is contractible, the gauge group would be $\mathcal{G}_{cl} = C^\infty(\mathbb{R}^{2*}, \mathbb{R})$ with $[\mathbb{R}^{2*}, \mathbb{R}] \cong 0$, so that the homomorphism $\eta : \mathcal{G}_{cl} \to \mathcal{G}$, $\eta(f)(x) = e^{if(x)}$ would not exhaust $\mathcal{G}$ since $\eta(f) \in [1]$ for any $f \in \mathcal{G}_{cl}$: in fact, $H : \mathbb{R}^{2*} \times [0,1] \to U(1)$ given by $H(x,t) = e^{i(1-t)f(x)}$ is a homotopy between $\eta(f)$ and 1. However, the quantization of electric charges implies that in fact the gauge group is $U(1)$ and not $\mathbb{R}$. This is equivalent mathematically to the possible existence of magnetic monopoles, which require nontrivial bundles for their description.

Covariant Derivative, Parallel Transport, and Holonomy

Let $G$ be a matrix Lie group with Lie algebra $\mathfrak{g}$, $B$ a differentiable manifold, $\xi : G \to P \xrightarrow{\pi} B$ a principal bundle, $V$ a vector space, $G \times V \to V$ an action, and $\xi_V : V \to P \times_G V \xrightarrow{\pi_V} B$ the corresponding associated vector bundle ($\xi_V$ is trivial if $\xi$ is trivial). Call $\Gamma(\xi_V)$ the sections of $\xi_V$, $\Gamma(TB)$ ($\Gamma(TP)$) the sections of the tangent bundle of $B$ ($P$), and $\Gamma_{eq}(P, V)$ the set of functions $\gamma : P \to V$ satisfying $\gamma(pg) = g^{-1}\gamma(p)$ (equivariant functions from $P$ to $V$). $s \in \Gamma(\xi_V)$ induces $\gamma_s \in \Gamma_{eq}(P, V)$ with $\gamma_s(p) = v$, where $s(\pi(p)) = [p, v]$, and $\gamma \in \Gamma_{eq}(P, V)$ induces $s_\gamma \in \Gamma(\xi_V)$ with $s_\gamma(b) = [p, \gamma(p)]$, where $p \in \pi^{-1}(\{b\})$. If $H$ is a connection on $\xi$, that is, a smooth assignment of a (horizontal) vector subspace $H_p$ of $T_pP$ at each $p$ of $P$, algebraically determined by a smooth $\mathfrak{g}$-valued 1-form $\omega$ on $P$ through $H_p = \ker(\omega_p)$, $s \in \Gamma(\xi_V)$, $X \in \Gamma(TB)$, and $X^\uparrow \in \Gamma(TP)$ the horizontal lifting of $X$ by $\omega$, then $X^\uparrow(\gamma_s) \in \Gamma_{eq}(P, V)$, and covariant

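derivative of $s$ with respect to $\omega$ in the direction of $X$ is defined by

$$\nabla^{\omega}_{X} s := s_{X^\uparrow(\gamma_s)} \qquad [36a]$$

If $\Psi : \pi^{-1}(U) \to U \times G$ is a local trivialization of $\xi$, $x^\mu$, $\mu = 1, \ldots, \dim B$ are local coordinates on $U$, and $e_i$, $i = 1, \ldots, \dim V$ is a basis of the local sections of $\xi_V$ over $U$, then the local expression of [36a] is

$$\nabla^{\omega_U}_{X^\mu \partial/\partial x^\mu}(s^i e_i) = X^\mu \left(\delta^j_i \frac{\partial}{\partial x^\mu} + A^j_{\mu i}\right) s^i e_j \qquad [36b]$$

where

$$A^j_{U i} = A^j_{\mu i}\,dx^\mu = (\sigma^*\omega_U)^j_i \qquad [36c]$$

is the geometrical gauge potential in $U$, given by the pullback of $\omega_U$, the restriction of $\omega$ to $\pi^{-1}(U)$, by the local section $\sigma : U \to \pi^{-1}(U)$, $\sigma(b) = \Psi^{-1}(b, 1)$. ($A^j_{\mu i}$ is defined through $\nabla^{\omega_U}_{\partial/\partial x^\mu} e_i = A^j_{\mu i} e_j$.) The operator

$$D^j_{\mu i} = \delta^j_i \frac{\partial}{\partial x^\mu} + A^j_{\mu i} \qquad [36d]$$

is the usual local covariant derivative. In an overlapping trivialization, [36b] is replaced by

$$\nabla^{\omega_{U'}}_{X^\mu \partial/\partial x^\mu}(s'^i e'_i) = X^\mu \left(\delta^j_i \frac{\partial}{\partial x^\mu} + A'^j_{\mu i}\right) s'^i e'_j$$

with $e'_j = g_{kj} e_k$ and $s'^i = g^{-1}_{il} s^l$ on $U \cap U'$; then the local potential transforms as

$$A'^j_{\mu l} = g_{jk}\, A^i_{\mu k}\, g^{-1}_{il} + (\partial_\mu g_{jk})\, g^{-1}_{kl} \qquad [36e]$$

which for $G$ abelian has the form [32].

For each smooth path $c : [0,1] \to B$ joining the points $b$ and $b'$, and each $p \in P_b = \pi^{-1}(\{b\})$, there exists a unique path $c^\uparrow$ in $P$ through $p$ with $\dot{c}^\uparrow(t) \in H_{c^\uparrow(t)}$ for all $t \in [0,1]$. $c^\uparrow$ is the horizontal lifting of $c$ by $\omega$ through $p$. Thus, for each connection and path there exists a diffeomorphism $P^\omega_c : P_b \to P_{b'}$ called parallel transport. If $c$ is a loop at $b$, then $P^\omega_c \in \mathrm{Diff}(P_b)$ is called the holonomy of $\omega$ at $b$ along $c$. To the loop space of $B$ at $b$, $\Omega(B; b)$, corresponds a subgroup $\mathrm{Hol}^\omega_b$ of $\mathrm{Diff}(P_b)$ called the holonomy of $\omega$ at $b$. If $c \in \Omega(B; b)$ and $\tilde{c}$ is a lifting of $c$ through $q \in P_b$, then there exists a unique path $g : [0,1] \to G$ such that $c^\uparrow(t) = \tilde{c}(t)g(t)$ with $c^\uparrow(0) = qg(0) = p$; $g$ satisfies the differential equation

$$\frac{d}{dt}g(t) + \omega_{\tilde{c}(t)}\bigl(\dot{\tilde{c}}(t)\bigr)\,g(t) = 0 \qquad [37]$$

whose solution is the time-ordered exponential

$$g(t)g(0)^{-1} = T\exp\left(-\int_0^t d\tau\, \omega_{\tilde{c}(\tau)}\bigl(\dot{\tilde{c}}(\tau)\bigr)\right) = 1 + \sum_{m=1}^{\infty} (-1)^m \int_0^t d\tau_1\, \omega_{\tilde{c}(\tau_1)}\bigl(\dot{\tilde{c}}(\tau_1)\bigr) \int_0^{\tau_1} d\tau_2\, \omega_{\tilde{c}(\tau_2)}\bigl(\dot{\tilde{c}}(\tau_2)\bigr) \cdots \int_0^{\tau_{m-1}} d\tau_m\, \omega_{\tilde{c}(\tau_m)}\bigl(\dot{\tilde{c}}(\tau_m)\bigr) \qquad [38]$$

If $q = p$ then $g(0) = 1$. For each $p \in P$, the set of elements $g' \in G$ such that $c^\uparrow(1) = pg'$ for $c \in \Omega(B; \pi(p))$ is a subgroup of $G$, $\mathrm{Hol}^\omega_p$, called the holonomy of $\omega$ at $p$. (For each $p$, there exists a group isomorphism $\mathrm{Hol}^\omega_p \to \mathrm{Hol}^\omega_{\pi(p)}$, and if $p$ and $p'$ are connected by a horizontal curve, then $\mathrm{Hol}^\omega_p = \mathrm{Hol}^\omega_{p'}$; if all $p$'s in $P$ are horizontally connected, then $\mathrm{Hol}^\omega_p = G$ for all $p \in P$.) If $(U, \Psi)$ is a local trivialization of $\xi$, $c \subset U$, and $\tilde{c}(t) = \sigma(c(t))$, then one has the local formula

$$c^\uparrow(t) = \Psi^{-1}(c(t), 1)\left(T\exp\left(-\int_{c(0)}^{c(t)} A_U\right)\right) g(0) \qquad [39]$$

In particular, if $\xi$ is a product bundle, then $\Psi$ is the identity, and choosing $g(0) = 1$ gives

$$c^\uparrow(t) = \left(c(t),\ T\exp\left(-\int_{c(0)}^{c(t)} A_U\right)\right) \qquad [40]$$

In our case, $V = \mathbb{C}$, $\xi$ is a product bundle, $s = \psi$, the wave function, is a global section of the associated bundle

$$\xi_{\mathbb{C}} : \mathbb{C} \to \mathbb{R}^{2*} \times \mathbb{C} \xrightarrow{\pi_{\mathbb{C}}} \mathbb{R}^{2*} \qquad [41]$$

$G = U(1)$ with $\mathfrak{g} = i\mathbb{R}$ and an action $U(1) \times \mathbb{C} \to \mathbb{C}$, $(e^{i\varphi}, z) \mapsto e^{i\varphi}z$; therefore, $A_\mu = i a_\mu$ with $a_\mu$ real valued, and the covariant derivative is

$$D_\mu = \frac{\partial}{\partial x^\mu} + i a_\mu \qquad [36f]$$

If $\psi$ carries the electric charge $q$, we define the physical gauge potential $A_\mu$ through

$$a_\mu = qA_\mu \qquad [42]$$

and, for the covariant derivative, after multiplying by $i$, we obtain the operator appearing in eqn [15], $iD_\mu = (i(\partial/\partial x^\mu) - qA_\mu)$: in fact, for the spatial part the coupling is $(i\nabla + q\mathbf{A})\psi$, and for the temporal part one has $(i\,\partial/\partial t - q\varphi)\psi$. For the electron, $q = -|e|$ and $a_\mu = -|e|A_\mu = -(2\pi/\Phi_0)A_\mu$.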
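The parallel-transport formula [40] can be checked numerically for the flat connection $A = \lambda A_0$ of eqn [35]. The sketch below is only an illustration under stated assumptions: the function names, the elliptical test loop, and the step count are hypothetical choices, and the holonomy is taken as $\exp(-\oint A)$ as in [40], so the overall sign of the phase depends on the orientation and charge conventions. It shows that the loop integral depends only on the number of turns around the origin, not on the shape of the loop, and that $\lambda = 1$ gives trivial holonomy.

```python
import cmath
import math


def loop_integral(n_turns, a=2.0, b=0.5, steps=20000):
    """Integrate (x dy - y dx)/(x^2 + y^2) along an ellipse around the origin.

    The ellipse has semi-axes a and b (illustrative values) and is traversed
    n_turns times. Because the 1-form is closed, the result should be
    2*pi*n_turns regardless of a and b.
    """
    total = 0.0
    d_angle = 2.0 * math.pi * n_turns / steps
    for k in range(steps):
        # midpoint rule in the angular parameter
        angle = (k + 0.5) * d_angle
        x, y = a * math.cos(angle), b * math.sin(angle)
        dx = -a * math.sin(angle) * d_angle
        dy = b * math.cos(angle) * d_angle
        total += (x * dy - y * dx) / (x * x + y * y)
    return total


def holonomy(lam, n_turns):
    """Holonomy exp(-loop integral of A) for A = lam * A0, with A0 = i*(x dy - y dx)/(x^2 + y^2)."""
    return cmath.exp(-1j * lam * loop_integral(n_turns))


if __name__ == "__main__":
    print(loop_integral(1) / (2.0 * math.pi))  # close to 1
    print(holonomy(1.0, 1))                    # close to 1: flux Phi_0 gives no effect
    print(holonomy(0.5, 1))                    # close to -1: half a flux quantum
```

Changing `a` and `b` leaves the results unchanged up to rounding, which is the numerical counterpart of $dA_0 = 0$; the nonzero value of the integral reflects that $A_0$ is not exact.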

For $c \in \Omega(\mathbb{R}^{2*}; (x_0, y_0))$, which turns $n$ times around the solenoid at $(0,0)$, eqn [40] gives

$$c^\uparrow = \bigl((x_0, y_0),\ e^{-n\oint_c A}\bigr) = \bigl((x_0, y_0),\ e^{-in\oint_c a}\bigr) = \bigl((x_0, y_0),\ e^{i|e|n\oint_c \mathbf{A}\cdot d\mathbf{x}}\bigr) = \bigl((x_0, y_0),\ e^{2\pi i n\Phi/\Phi_0}\bigr)$$

and therefore, for $\Phi/\Phi_0 = \lambda \in [0,1)$ we have the holonomy groups

$$\mathrm{Hol}^{\omega(\lambda)}_{((x_0,y_0),1)} = \{e^{2\pi i n\lambda}\}_{n\in\mathbb{Z}} \cong \begin{cases} \mathbb{Z}_q, & \lambda = p/q,\ p, q \in \mathbb{Z},\ (p,q) = 1 \\ \mathbb{Z}, & \lambda \notin \mathbb{Q} \end{cases} \qquad [43]$$

In the second case, $\mathrm{Hol}^{\omega(\lambda)}_{((x_0,y_0),1)}$ is dense in $U(1)$: in fact, suppose that for $n_1, n_2 \in \mathbb{Z}$, $n_1 \neq n_2$, $e^{2\pi i n_1\lambda} = e^{2\pi i n_2\lambda}$, then $e^{2\pi i(n_1 - n_2)\lambda} = 1$ and so $(n_1 - n_2)\lambda = m$ for some $m \in \mathbb{Z}$; therefore, $\lambda \in \mathbb{Q}$, which is a contradiction. Hence the elements $e^{2\pi i n\lambda}$ are pairwise distinct, and an infinite subgroup of $U(1)$ is necessarily dense.

Finally, we should mention that the A–B effect can be understood as a geometric phase à la Berry, though not necessarily through an adiabatic change of the parameters on which the Hamiltonian depends. The Berry potential $a_B$ turns out to be proportional to the real magnetic vector potential $\mathbf{A}$: in the n.s.u., and for electrons,

$$a_B = -|e|\mathbf{A} \qquad [44]$$

Nonabelian and Gravitational A–B Effects

Since the fundamental group $\pi_1(\mathbb{R}^{2*}, (x_0, y_0)) \cong \mathbb{Z}$, eqn [43] shows that there is a homomorphism $\varphi(\omega) : \pi_1(\mathbb{R}^{2*}, (x_0, y_0)) \to U(1)$, $\varphi(\omega)(n) = e^{2\pi i n\lambda}$, with $\varphi(\omega)(\pi_1(\mathbb{R}^{2*})) = \mathrm{Hol}^{\omega(\lambda)}_{((x_0,y_0),1)}$, which characterizes the A–B effect in that case. In general, an A–B effect in a $G$-bundle with a connection $\omega$ is characterized by a group homomorphism from the fundamental group of the base space $B$ onto the holonomy group of the connection, which is a subgroup of the structure group. The A–B effect is nonabelian if the holonomy group is nonabelian, which requires both $G$ and $\pi_1(B, x)$ to be nonabelian. Examples with Yang–Mills and gravitational fields are considered in the literature.

Acknowledgment

The author thanks the University of Valencia, Spain, where part of this work was done.

See also: Deformation Quantization and Representation Theory; Fractional Quantum Hall Effect; Geometric Phases; Moduli Spaces: An Introduction; Quantum Chromodynamics; Variational Techniques for Ginzburg–Landau Energies.

Further Reading

Aguilar MA and Socolovsky M (2002) Aharonov–Bohm effect, flat connections, and Green’s theorem. International Journal of Theoretical Physics 41: 839–860.
Aharonov Y and Bohm D (1959) Significance of electromagnetic potentials in the quantum theory. Physical Review 115: 485–491.
Berry MV (1984) Quantal phase factors accompanying adiabatic changes. Proceedings of the Royal Society of London A 392: 45–57.
Chambers RG (1960) Shift of an electron interference pattern by enclosed magnetic flux. Physical Review Letters 5: 3–5.
Corichi A and Pierri M (1995) Gravity and geometric phases. Physical Review D 51: 5870–5875.
Dirac PAM (1931) Quantised singularities in the electromagnetic field. Proceedings of the Royal Society of London A 133: 60–72.
Kobe DH (1979) Aharonov–Bohm effect revisited. Annals of Physics 123: 381–410.
Peshkin M and Tonomura A (1989) The Aharonov–Bohm Effect. Berlin: Springer.
Schulman LS (1971) Approximate topologies. Journal of Mathematical Physics 12: 304–308.
Steenrod N (1951) The Topology of Fibre Bundles. Princeton, NJ: Princeton University Press.
Sundrum R and Tassie LJ (1986) Non-abelian Aharonov–Bohm effects, Feynman paths, and topology. Journal of Mathematical Physics 27: 1566–1570.
Wu TT and Yang CN (1975) Concept of nonintegrable phase factors and global formulation of gauge fields. Physical Review D 12: 3845–3857.

Algebraic Approach to Quantum Field Theory


R Brunetti and K Fredenhagen, Universität Hamburg, Hamburg, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Quantum field theory may be understood as the incorporation of the principle of locality, which is at the basis of classical field theory, into quantum physics. There are, however, severe obstacles against a straightforward translation of concepts of classical field theory into quantum theory, among them the notorious divergences of quantum field theory and the intrinsic nonlocality of quantum physics. Therefore, the concept of locality is somewhat obscured in the formalism of quantum field theory as it is typically presented in textbooks. Nonlocal concepts such as the vacuum, the notion of particles, or the S-matrix play a fundamental role, and neither the relation to classical field theory nor the influence of background fields can be properly treated.

Algebraic quantum field theory (AQFT; synonymously, local quantum physics), on the contrary, aims at emphasizing the concept of locality at every instance. As the nonlocal features of quantum physics occur at the level of states (‘‘entanglement’’), not at the level of observables, it is better not to base the theory on the Hilbert space of states but on the algebra of observables. Subsystems of a given system then simply correspond to subalgebras of a given algebra. The locality concept is abstractly encoded in a notion of independence of subsystems; two subsystems are independent if the algebra of observables which they generate is isomorphic to the tensor product of the algebras of the subsystems.

Spacetime can then, in the spirit of Leibniz, be considered as an ordering device for systems. So, one associates with regions of spacetime the algebras of observables which can be measured in the pertinent region, with the condition that the algebras of subregions of a given region can be identified with subalgebras of the algebra of the region.

Problems arise if one aims at a generally covariant approach in the spirit of general relativity. Then, in order to avoid pitfalls like in the ‘‘hole problem,’’ systems corresponding to isometric regions must be isomorphic. Since isomorphic regions may be embedded into different spacetimes, this amounts to a simultaneous treatment of all spacetimes of a suitable class. We will see that category theory furnishes such a description, where the objects are the systems and the morphisms the embeddings of a system as a subsystem of other systems.

States arise as secondary objects via Hilbert space representations, or directly as linear functionals on the algebras of observables which can be interpreted as expectation values and are, therefore, positive and normalized. It is crucial that inequivalent representations (‘‘sectors’’) can occur, and the analysis of the structure of the sectors is one of the big successes of AQFT. One can also study the particle interpretation of certain states as well as (equilibrium and nonequilibrium) thermodynamical properties.

The mathematical methods in AQFT are mainly taken from the theory of operator algebras, a field of mathematics which developed in close contact with mathematical physics, in particular with AQFT. Unfortunately, the most important field theories from the point of view of elementary particle physics, such as quantum electrodynamics or the standard model, could not yet be constructed beyond formal perturbation theory, with the annoying consequence that it seemed that the concepts of AQFT could not be applied to them. However, it has recently been shown that formal perturbation theory can be reshaped in the spirit of AQFT such that the algebras of observables of these models can be constructed as algebras of formal power series of Hilbert space operators. The price to pay is that the deep mathematics of operator algebras cannot be applied, but the crucial features of the algebraic approach can be used.

AQFT was originally proposed by Haag as a concept by which scattering of particles can be understood as a consequence of the principle of locality. It was then put into a mathematically precise form by Araki, Haag, and Kastler. After the analysis of particle scattering by Haag and Ruelle and the clarification of the relation to the Lehmann–Symanzik–Zimmermann (LSZ) formalism by Hepp, the structure of superselection sectors was studied first by Borchers and then in a fundamental series of papers by Doplicher, Haag, and Roberts (DHR) (see, e.g., Doplicher et al. (1971, 1974)) (soon after, Buchholz and Fredenhagen established the relation to particles), and finally Doplicher and Roberts uncovered the structure of superselection sectors as the dual of a compact group, thereby generalizing the Tannaka–Krein theorem of characterization of group duals.

With the advent of two-dimensional conformal field theory, new models were constructed and it was shown that the DHR analysis can be generalized to these models. Directly related to conformal theories is the algebraic approach to holography in anti-de Sitter (AdS) spacetime by Rehren.

The general framework of AQFT may be described as a covariant functor between two categories. The first one contains the information on local relations and is crucial for the interpretation. Its objects are topological spaces with additional structures (typically globally hyperbolic Lorentzian spaces, possibly spin bundles with connections, etc.), its morphisms being the structure-preserving embeddings. In the case of globally hyperbolic Lorentzian spacetimes, one requires that the embeddings are isometric and preserve the causal structure. The second category describes the algebraic structure of observables. In quantum physics the standard assumption is that one deals with the category of C*-algebras where the morphisms are unital embeddings. In classical physics, one looks instead at Poisson algebras, and in perturbative quantum field theory one admits algebras which possess nontrivial representations as formal power series of Hilbert space operators. It is the leading principle of AQFT that the functor $\mathscr{A}$ contains all physical information. In particular, two theories are equivalent if the corresponding functors are naturally equivalent.
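Spelled out, natural equivalence of two functors $\mathscr{A}$ and $\mathscr{B}$ from the locality category to the category of observable algebras means the following. This is the standard categorical definition written in the present setting; the symbols $\mathscr{B}$ and $\beta_M$, and the superscripts distinguishing the morphism maps $\alpha_\psi$ of the two functors, are notation assumed here for illustration.

```latex
\begin{gather*}
\beta_M : \mathscr{A}(M) \longrightarrow \mathscr{B}(M)
  \quad \text{an isomorphism for every object } M, \\
\beta_{M_2} \circ \alpha^{\mathscr{A}}_{\psi}
  \;=\; \alpha^{\mathscr{B}}_{\psi} \circ \beta_{M_1}
  \quad \text{for every embedding } \psi : M_1 \to M_2 .
\end{gather*}
```

The second line says that identifying the two theories spacetime by spacetime is compatible with every embedding of one spacetime into another.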

In the analysis of the functor $\mathscr{A}$, a crucial role is played by natural transformations from other functors on the locality category. For instance, a field $A$ may be defined as a natural transformation from the category of test function spaces to the category of observable algebras via their functors related to the locality category.

Quantum Field Theories as Covariant Functors

The rigorous implementation of the generally covariant locality principle uses the language of category theory. The following two categories are used:

Loc: The class of objects obj(Loc) is formed by all (smooth) $d$-dimensional ($d \ge 2$ is held fixed), globally hyperbolic Lorentzian spacetimes $M$ which are oriented and time oriented. Given any two such objects $M_1$ and $M_2$, the morphisms $\psi \in \hom_{Loc}(M_1, M_2)$ are taken to be the isometric embeddings $\psi : M_1 \to M_2$ of $M_1$ into $M_2$ but with the following constraints:

(i) if $\gamma : [a,b] \to M_2$ is any causal curve and $\gamma(a), \gamma(b) \in \psi(M_1)$ then the whole curve must be in the image $\psi(M_1)$, that is, $\gamma(t) \in \psi(M_1)$ for all $t \in [a,b]$;
(ii) any morphism preserves orientation and time orientation of the embedded spacetime.

The composition is defined as the composition of maps; the unit element in $\hom_{Loc}(M, M)$ is given by the identical embedding $\mathrm{id}_M : M \to M$ for any $M \in$ obj(Loc).

Obs: The class of objects obj(Obs) is formed by all C*-algebras possessing unit elements, and the morphisms are faithful (injective) unit-preserving *-homomorphisms. The composition is again defined as the composition of maps; the unit element in $\hom_{Obs}(\mathcal{A}, \mathcal{A})$ is for any $\mathcal{A} \in$ obj(Obs) given by the identical map $\mathrm{id}_{\mathcal{A}} : A \mapsto A$, $A \in \mathcal{A}$.

The categories are chosen for definiteness. One may envisage changes according to particular needs, as, for instance, in perturbation theory where instead of C*-algebras general topological *-algebras are better suited. Or one may use von Neumann algebras, in case particular states are selected. On the other hand, one might consider for Loc bundles over spacetimes, or (in conformally invariant theories) admit conformal embeddings as morphisms. In case one is interested in spacetimes which are not globally hyperbolic, one could look at the globally hyperbolic subregions (where one needs to be careful about the causal convexity condition (i) above).

The concept of locally covariant quantum field theory is defined as follows.

Definition 1

(i) A locally covariant quantum field theory is a covariant functor $\mathscr{A}$ from Loc to Obs and (writing $\alpha_\psi$ for $\mathscr{A}(\psi)$) with the covariance properties

$$\alpha_{\psi'} \circ \alpha_\psi = \alpha_{\psi' \circ \psi}, \qquad \alpha_{\mathrm{id}_M} = \mathrm{id}_{\mathscr{A}(M)}$$

for all morphisms $\psi \in \hom_{Loc}(M_1, M_2)$, all morphisms $\psi' \in \hom_{Loc}(M_2, M_3)$, and all $M \in$ obj(Loc).

(ii) A locally covariant quantum field theory described by a covariant functor $\mathscr{A}$ is called ‘‘causal’’ if the following holds: whenever there are morphisms $\psi_j \in \hom_{Loc}(M_j, M)$, $j = 1, 2$, so that the sets $\psi_1(M_1)$ and $\psi_2(M_2)$ are causally separated in $M$, then one has

$$\bigl[\alpha_{\psi_1}(\mathscr{A}(M_1)),\ \alpha_{\psi_2}(\mathscr{A}(M_2))\bigr] = \{0\}$$

where the element-wise commutation makes sense in $\mathscr{A}(M)$.

(iii) One says that a locally covariant quantum field theory given by the functor $\mathscr{A}$ obeys the ‘‘time-slice axiom’’ if

$$\alpha_\psi(\mathscr{A}(M)) = \mathscr{A}(M')$$

holds for all $\psi \in \hom_{Loc}(M, M')$ such that $\psi(M)$ contains a Cauchy surface for $M'$.

Thus, a quantum field theory is an assignment of C*-algebras to (all) globally hyperbolic spacetimes so that the algebras are identifiable when the spacetimes are isometric, in the indicated way. This is a precise description of the generally covariant locality principle.

The Traditional Approach

The traditional framework of AQFT, in the Araki–Haag–Kastler sense, on a fixed globally hyperbolic spacetime can be recovered from a locally covariant quantum field theory, that is, from a covariant functor $\mathscr{A}$ with the properties listed above.

Indeed, let $M$ be an object in obj(Loc). $\mathcal{K}(M)$ denotes the set of all open subsets in $M$ which are relatively compact and also contain, with each pair of points $x$ and $y$, all $g$-causal curves in $M$ connecting $x$ and $y$ (cf. condition (i) in the definition of Loc). $O \in \mathcal{K}(M)$, endowed with the metric of $M$ restricted to $O$ and with the induced orientation and time orientation, is a member of obj(Loc), and the injection map $\iota_{M,O} : O \to M$, that is, the identical map restricted to $O$, is an element in $\hom_{Loc}(O, M)$.
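A short worked consequence of the covariance property in Definition 1, applied to these injection maps, may help here. The following is a sketch, assuming $O_1 \subset O_2$ are both regions of the kind just described inside $M$ (so that the inclusion of $O_1$ into $O_2$ is again a morphism of Loc); the symbol $\iota_{O_2,O_1}$ for that inclusion is notation introduced for illustration.

```latex
\begin{align*}
\iota_{M,O_1} &= \iota_{M,O_2} \circ \iota_{O_2,O_1}, \\
\alpha_{\iota_{M,O_1}} &= \alpha_{\iota_{M,O_2}} \circ \alpha_{\iota_{O_2,O_1}}, \\
\alpha_{\iota_{M,O_1}}\bigl(\mathscr{A}(O_1)\bigr)
  &= \alpha_{\iota_{M,O_2}}\Bigl(\alpha_{\iota_{O_2,O_1}}\bigl(\mathscr{A}(O_1)\bigr)\Bigr)
  \subset \alpha_{\iota_{M,O_2}}\bigl(\mathscr{A}(O_2)\bigr).
\end{align*}
```

The inclusion in the last line holds because $\alpha_{\iota_{O_2,O_1}}$ maps $\mathscr{A}(O_1)$ into $\mathscr{A}(O_2)$. Read inside $\mathscr{A}(M)$, this says that the image of the algebra of the smaller region is contained in the image of the algebra of the larger one.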

With this notation, it is easy to prove the following assertion:

Theorem 1 Let $\mathscr{A}$ be a covariant functor with the above-stated properties, and define a map $\mathcal{K}(M) \ni O \mapsto \mathcal{A}(O) \subset \mathscr{A}(M)$ by setting

$$\mathcal{A}(O) := \alpha_{M,O}(\mathscr{A}(O))$$

(with $\alpha_{M,O} := \alpha_{\iota_{M,O}}$). Then the following statements hold:

(i) The map fulfills isotony, that is,

$$O_1 \subset O_2 \Rightarrow \mathcal{A}(O_1) \subset \mathcal{A}(O_2) \quad \text{for all } O_1, O_2 \in \mathcal{K}(M)$$

(ii) If there exists a group $G$ of isometric diffeomorphisms $\kappa : M \to M$ (so that $\kappa_* g = g$) preserving orientation and time orientation, then there is a representation $G \ni \kappa \mapsto \tilde{\alpha}_\kappa$ of $G$ by C*-algebra automorphisms $\tilde{\alpha}_\kappa : \mathscr{A}(M) \to \mathscr{A}(M)$ such that

$$\tilde{\alpha}_\kappa(\mathcal{A}(O)) = \mathcal{A}(\kappa(O)), \quad O \in \mathcal{K}(M)$$

(iii) If the theory given by $\mathscr{A}$ is additionally causal, then it holds that

$$[\mathcal{A}(O_1), \mathcal{A}(O_2)] = \{0\}$$

for all $O_1, O_2 \in \mathcal{K}(M)$ with $O_1$ causally separated from $O_2$.

These properties are just the basic assumptions of the Araki–Haag–Kastler framework.

The Achievements of the Traditional Approach

In the Araki–Haag–Kastler approach in Minkowski spacetime $\mathbb{M}$, many results have been obtained in the last 40 years, some of them also becoming a source of inspiration to mathematics. A description of the achievements can be organized in terms of a length-scale basis, from the small to the large. We assume in this section that the algebra $\mathscr{A}(\mathbb{M})$ is faithfully and irreducibly represented on a Hilbert space $\mathcal{H}$, that the Poincaré transformations are unitarily implemented with positive energy, and that the subspace of Poincaré invariant vectors is one dimensional (uniqueness of the vacuum). Moreover, algebras corresponding to regions which are spacelike to a nonempty open region are assumed to be weakly closed (i.e., von Neumann algebras on $\mathcal{H}$), and the condition of weak additivity is fulfilled, that is, for all $O \in \mathcal{K}(\mathbb{M})$ the algebra generated from the algebras $\mathcal{A}(O + x)$, $x \in \mathbb{M}$ is weakly dense in $\mathscr{A}(\mathbb{M})$.

Ultraviolet Structure and Idealized Localizations

This section deals with the problem of inspecting the theory at very small scales. In the limiting case, one is interested in idealized localizations, eventually the points of spacetimes. But the observable algebras are trivial at any point $x \in \mathbb{M}$, namely

$$\bigcap_{O \ni x} \mathcal{A}(O) = \mathbb{C}\,1, \quad O \in \mathcal{K}(\mathbb{M})$$

Hence, pointlike localized observables are necessarily singular. Actually, the Wightman formulation of quantum field theory is based on the use of distributions on spacetime with values in the algebra of observables (as a topological *-algebra). In spite of technical complications whose physical significance is unclear, this formalism is well suited for a discussion of the connection with the Euclidean theory, which allows, in fortunate cases, a treatment by path integrals; it is more directly related to models and admits, via the operator-product expansion, a study of the short-distance behavior. It is, therefore, an important question how the algebraic approach is related to the Wightman formalism. The reader is referred to the literature for exploring the results on this relation.

Whereas these results point to an essential equivalence of both formalisms, one needs in addition a criterion for the existence of sufficiently many Wightman fields associated with a given local net. Such a criterion can be given in terms of a compactness condition to be discussed in the next subsection. As a benefit, one derives an operator-product expansion which has to be assumed in the Wightman approach.

In the purely algebraic approach, the ultraviolet structure has been investigated by Buchholz and Verch. Small-scale properties of theories are studied with the help of the so-called scaling algebras whose elements can be described as orbits of observables under all possible renormalization group motions. There results a classification of theories in the scaling limit which can be grouped into three broad classes: theories for which the scaling limit is purely classical (commutative algebras), those for which the limit is essentially unique (stable ultraviolet fixed point) and not classical, and those for which this is not the case (unstable ultraviolet fixed point). This classification does not rely on perturbation expansions. It allows an intrinsic definition of confinement in terms of the so-called ultraparticles, that is, particles which are visible only in the scaling limit.

Phase-Space Analysis

As far as finite distances are concerned, there are two apparently competing principles, those of

nuclearity and modularity. The first one suggests that locally, after a cutoff in energy, one has a situation similar to that of old quantum mechanics, namely a finite number of states in a finite volume of phase space. Aiming at a precise formulation, Haag and Swieca introduced their notion of compactness, which Buchholz and Wichmann sharpened into that of nuclearity. The latter authors proposed that the set generated from the vacuum vector $\Omega$,

$$\{e^{-\beta H} A\Omega \mid A \in \mathcal{A}(O),\ \|A\| < 1\}$$

$H$ denoting the generator of time translations (Hamiltonian), is nuclear for any $\beta > 0$, roughly stating that it is contained in the image of the unit ball under a trace class operator. The nuclear size $Z(\beta, O)$ of the set plays the role of the partition function of the model and has to satisfy certain bounds in the parameter $\beta$. The consequence of this constraint is the existence of product states, namely those normal states for which observables localized in two given spacelike separated regions are uncorrelated. A further consequence is the existence of thermal equilibrium states (KMS states) for all $\beta > 0$.

The second principle concerns the fact that, even locally, quantum field theory has infinitely many degrees of freedom. This becomes visible in the Reeh–Schlieder theorem, which states that every vector $\Phi$ which is in the range of $e^{-\beta H}$ for some $\beta > 0$ (in particular, the vacuum $\Omega$) is cyclic and separating for the algebras $\mathcal{A}(O)$, $O \in \mathcal{K}(\mathbb{M})$, that is, $\mathcal{A}(O)\Phi$ is dense in $\mathcal{H}$ ($\Phi$ is cyclic) and $A\Phi = 0$, $A \in \mathcal{A}(O)$ implies $A = 0$ ($\Phi$ is separating). The pair $(\mathcal{A}(O), \Omega)$ is then a von Neumann algebra in the so-called standard form. On such a pair, the Tomita–Takesaki theory can be applied, namely the densely defined operator

$$S A\Omega = A^*\Omega, \quad A \in \mathcal{A}(O)$$

is closable, and the polar decomposition of its closure $\bar{S} = J\Delta^{1/2}$ delivers an antiunitary involution $J$ (the modular conjugation) and a positive self-adjoint operator $\Delta$ (the modular operator) associated with the standard pair $(\mathcal{A}(O), \Omega)$. These operators have the properties

$$J\mathcal{A}(O)J = \mathcal{A}(O)'$$

where the prime denotes the commutant, and

$$\Delta^{it}\mathcal{A}(O)\Delta^{-it} = \mathcal{A}(O), \quad t \in \mathbb{R}$$

The importance of this structure is based on the fact disclosed by Bisognano and Wichmann using Poincaré-covariant Wightman fields and local algebras generated by them, that for specific regions in Minkowski spacetime the modular operators have a geometrical meaning. Indeed, these authors showed for the pair $(\mathcal{A}(W), \Omega)$, where $W$ denotes the wedge region $W = \{x \in \mathbb{M} \mid |x^0| < x^1\}$, that the associated modular unitary $\Delta^{it}$ is the Lorentz boost with velocity $\tanh(2\pi t)$ in the direction 1 and that the modular conjugation $J$ is the $CP_1T$ symmetry operator with parity $P_1$ the reflection with respect to the $x^1 = 0$ plane. Later, Borchers discovered that already on the purely algebraic level a corresponding structure exists. He proved that, given any standard pair $(\mathcal{A}, \Omega)$ and a one-parameter group of unitaries $\tau \mapsto U(\tau)$ acting on the Hilbert space $\mathcal{H}$ with a positive generator and such that $\Omega$ is invariant and $U(\tau)\mathcal{A}U(-\tau) \subset \mathcal{A}$, $\tau > 0$, then the associated modular operators $\Delta$ and $J$ fulfill the commutation relations

$$\Delta^{it}U(\tau)\Delta^{-it} = U(e^{-2\pi t}\tau)$$
$$JU(\tau)J = U(-\tau)$$

which are just the commutation relations between boosts and lightlike translations.

Surprisingly, there is a direct connection between the two concepts of nuclearity and modularity. Indeed, in the nuclearity condition, it is possible to replace the Hamiltonian operator by a specific function of the modular operator associated with a slightly larger region. Furthermore, under mild conditions, nuclearity and modularity together determine the structure of local algebras completely; they are isomorphic to the unique hyperfinite type III$_1$ von Neumann algebra.

Sectors, Symmetries, Statistics, and Particles

Large scales are appropriate for discussing global issues like superselection sectors, statistics and symmetries as far as large spacelike distances are concerned, and scattering theory, with the resulting notions of particles and infraparticles, as far as large timelike distances are concerned.

In purely massive theories, where the vacuum sector has a mass gap and the mass shell of the particles is isolated, a very satisfactory description of the multiparticle structure at large times can be given. Using the concept of almost local particle generators,

$$\Psi = A(t)\Omega$$

where $\Psi$ is a single-particle state (i.e., an eigenstate of the mass operator), $A(t)$ is a family of almost local operators essentially localized in the kinematical region accessible from a given point by a motion with the velocities contained in the spectrum of $\Psi$, one obtains the multiparticle states as limits of products $A_1(t) \cdots A_n(t)\Omega$ for disjoint velocity supports. The corresponding closed subspaces are

invariant under Poincaré transformations and are unitarily equivalent to the Fock spaces of
noninteracting particles.

For massless particles, no almost-local particle generators can be expected to exist. In even
dimensions, however, one can exploit Huygens' principle to construct asymptotic particle generators
which are in the commutant of the algebra of the forward or backward lightcone, respectively. Again,
their products can be determined and multiparticle states obtained.

Much less well understood is the case of massive particles in a theory which also possesses massless
particles. Here, in general, the corresponding states are not eigenstates of the mass operator. Since
quantum electrodynamics (QED) as well as the standard model of elementary particles have this
problem, the correct treatment of scattering in these models is still under discussion. One attempt at a
correct treatment is based on the concept of the so-called particle weights, that is, unbounded positive
functionals on a suitable algebra. This algebra is generated by positive almost-local operators
annihilating the vacuum and interpreted as counters.

The structure at large spacelike scales may be analyzed by the theory of superselection sectors. The
best-understood case is that of locally generated sectors which are the objects of the DHR theory.
Starting from a distinguished representation $\pi_0$ (vacuum representation) which is assumed to fulfill
the Haag duality,

$$\pi_0\bigl(A(O)\bigr)'' = \pi_0\bigl(A(O')\bigr)'$$

for all double cones $O$, one may look at all representations which are equivalent to the vacuum
representation if restricted to the observables localized in double cones in the spacelike complement of
a given double cone. Such representations give rise to endomorphisms of the algebra of observables,
and the product of endomorphisms can be interpreted as a product of sectors (“fusion”). In general,
these representations violate the Haag duality, but there is a subclass of the so-called finite statistics
sectors where the violation of Haag duality is small, in the sense that the nontrivial inclusion

$$\pi\bigl(A(O)\bigr)'' \subset \pi\bigl(A(O')\bigr)'$$

has a finite Jones index. These sectors form (in at least three spacetime dimensions) a symmetric tensor
category with some further properties which can be identified, in a generalization of the Tannaka–Krein
theorem, as the dual of a unique compact group. This group plays the role of a global gauge group. The
symmetry of the category is expressed in terms of a representation of the symmetric group. One may then
enlarge the algebra of observables and obtain an algebra of operators which transform covariantly
under the global gauge group and satisfy Bose or Fermi commutation relations for spacelike separation.

In two spacetime dimensions, one obtains instead braided tensor categories. They have been classified
under additional conditions (conformal symmetry, central charge c < 1) in a remarkable work by
Kawahigashi and Longo. Moreover, in their paper, one finds that by using completely new methods
(Q-systems) a new model is unveiled, apparently inaccessible by methods used by others. To some
extent, these categories can be interpreted as duals of generalized quantum groups.

The question arises whether all representations describing elementary particles are, in the massive
case, DHR representations. One can show that in the case of a representation with an isolated mass shell
there is an associated vacuum representation which becomes equivalent to the particle representation
after restriction to observables localized spacelike to a given infinitely extended spacelike cone. This
property is weaker than the DHR condition but allows, in four spacetime dimensions, the same
construction of a global gauge group and of covariant fields with Bose and Fermi commutation relations,
respectively, as the DHR condition. In three spacetime dimensions, however, one finds a braided tensor
category, which has properties similar to those known from topological field theories in three
dimensions.

The sector structure in massless theories is not well understood, due to the infrared problem. This is
in particular true for QED.

Fields as Natural Transformations

In order to be able to interpret the theory in terms of measurements, one has to be able to compare
observables associated with different regions of spacetime, or even different spacetimes. In the
absence of nontrivial isometries, such a comparison can be made in terms of locally covariant fields. By
definition, these are natural transformations from the functor of quantum field theory to another
functor on the category of spacetimes Loc.

The standard case is the functor which associates with every spacetime $M$ its space $\mathcal{D}(M)$ of
smooth compactly supported test functions. There, the morphisms are the pushforwards
$\mathcal{D}\chi \equiv \chi_*$.

Definition 2 A locally covariant quantum field $\Phi$ is a natural transformation between the functors
$\mathcal{D}$ and $\mathcal{A}$, that is, for any object $M$ in obj(Loc) there exists a morphism
$\Phi_M : \mathcal{D}(M) \to \mathcal{A}(M)$ such that for
any pair of objects $M_1$ and $M_2$ and any morphism $\chi$ between them, the following diagram commutes:

$$\begin{array}{ccc}
\mathcal{D}(M_1) & \xrightarrow{\;\Phi_{M_1}\;} & \mathcal{A}(M_1) \\
{\scriptstyle \chi_*}\big\downarrow & & \big\downarrow{\scriptstyle \alpha_\chi} \\
\mathcal{D}(M_2) & \xrightarrow{\;\Phi_{M_2}\;} & \mathcal{A}(M_2)
\end{array}$$

The commutativity of the diagram means, explicitly, that

$$\alpha_\chi \circ \Phi_{M_1} = \Phi_{M_2} \circ \chi_*$$

which is the requirement sought for the covariance of fields. It contains, in particular, the standard
covariance condition for spacetime isometries.

Fields in the above sense are not necessarily linear. Examples of fields which are also linear are the
scalar massive free Klein–Gordon field on all globally hyperbolic spacetimes and its locally covariant
Wick polynomials. In particular, the energy–momentum tensors can be constructed as locally covariant
fields, and they provide a crucial tool for discussing the back-reaction problem for matter fields.

An example of the more general notion of a field is given by the local S-matrices in the
Stückelberg–Bogolubov–Epstein–Glaser sense. These are unitaries $S_M(\lambda)$ with
$M \in \mathrm{obj}(\mathrm{Loc})$ and $\lambda \in \mathcal{D}(M)$ which satisfy the conditions

$$S_M(0) = 1$$

$$S_M(\lambda + \mu + \nu) = S_M(\lambda + \mu)\, S_M(\mu)^{-1}\, S_M(\mu + \nu)$$

for $\lambda, \mu, \nu \in \mathcal{D}(M)$ such that the supports of $\lambda$ and $\nu$ can be separated by a
Cauchy surface of $M$ with $\operatorname{supp} \lambda$ in the future of the surface.

The importance of these S-matrices rests on the fact that they can be used to define a new quantum
field theory. The new theory is locally covariant if the original theory is and if the local S-matrices
satisfy the condition of the locally covariant field above. A perturbative construction of interacting
quantum field theory on globally hyperbolic spacetimes was completed in this way by Hollands and Wald,
based on previous work by Brunetti and Fredenhagen.

See also: Axiomatic Quantum Field Theory; Constructive Quantum Field Theory; Current Algebra;
Deformation Quantization and Representation Theory; Dispersion Relations; Indefinite Metric;
Integrability and Quantum Field Theory; Operads; Perturbative Renormalization Theory and BRST;
Quantum Central Limit Theorems; Quantum Field Theory: A Brief Introduction; Quantum Field Theory in
Curved Spacetime; Quantum Fields with Indefinite Metric: Non-Trivial Models; Quantum Fields with
Topological Defects; Quantum Geometry and its Applications; Scattering in Relativistic Quantum Field
Theory: Fundamental Concepts and Tools; Scattering in Relativistic Quantum Field Theory: The Analytic
Program; Spin Foams; Symmetries in Quantum Field Theory: Algebraic Aspects; Symmetries in Quantum
Field Theory of Lower Spacetime Dimensions; Tomita–Takesaki Modular Theory; Two-Dimensional Models;
von Neumann Algebras: Introduction, Modular Theory and Classification Theory; von Neumann Algebras:
Subfactor Theory.

Further Reading

Araki H (1999) Mathematical Theory of Quantum Fields. Oxford: Oxford University Press.
Baumgärtel H and Wollenberg M (1992) Causal Nets of Operator Algebras. Berlin: Akademie Verlag.
Borchers HJ (1996) Translation Group and Particle Representation in Quantum Field Theory, Lecture Notes in Physics. New Series m: Monographs, 40. Berlin: Springer.
Borchers HJ (2000) On revolutionizing quantum field theory with Tomita’s modular theory. Journal of Mathematical Physics 41: 3604–3673.
Bratteli O and Robinson DW (1987) Operator Algebras and Quantum Statistical Mechanics, vol. 1. Berlin: Springer.
Brunetti R and Fredenhagen K (2000) Microlocal analysis and interacting quantum field theories: renormalization on physical backgrounds. Communications in Mathematical Physics 208: 623–661.
Brunetti R, Fredenhagen K, and Verch R (2003) The generally covariant locality principle – a new paradigm for local quantum field theory. Communications in Mathematical Physics 237: 31–68.
Buchholz D and Haag R (2000) The quest for understanding in relativistic quantum physics. Journal of Mathematical Physics 41: 3674–3697.
Dixmier J (1964) Les C*-algèbres et leurs représentations. Paris: Gauthier-Villars.
Doplicher S, Haag R, and Roberts JE (1971) Local observables and particle statistics I. Communications in Mathematical Physics 23: 199–230.
Doplicher S, Haag R, and Roberts JE (1974) Local observables and particle statistics II. Communications in Mathematical Physics 35: 49–85.
Evans DE and Kawahigashi Y (1998) Quantum Symmetries on Operator Algebras. New York: Clarendon Press.
Haag R (1996) Local Quantum Physics, 2nd edn. Berlin: Springer.
Haag R and Kastler D (1964) An algebraic approach to quantum field theory. Journal of Mathematical Physics 5: 848–861.
Hollands S and Wald RM (2001) Local Wick polynomials and time ordered products of quantum fields in curved spacetime. Communications in Mathematical Physics 223: 289–326.
Hollands S and Wald RM (2002) Existence of local covariant time ordered products of quantum fields in curved spacetime. Communications in Mathematical Physics 231: 309–345.
Kastler D (ed.) (1990) The Algebraic Theory of Superselection Sectors. Introductions and Recent Results. Singapore: World Scientific.
Kawahigashi Y and Longo R (2004) Classification of local conformal nets. Case c < 1. Annals of Mathematics 160: 1–30.
Takesaki M (2003) Theory of Operator Algebras I, II, III, Encyclopedia of Mathematical Sciences, vols. 124, 125, 127. Berlin: Springer.
Wald RM (1994) Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics. Chicago: University of Chicago Press.
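
Worked note on Definition 2. The statement that the naturality condition contains the standard covariance condition for spacetime isometries can be made explicit in a short calculation. The following is only a sketch under stated assumptions that are not part of the article: the field is taken to be linear, it is written formally as $\Phi_M(f) = \int_M \Phi_M(x) f(x)\, d\mu(x)$ with $\mu$ the metric volume measure, and the pushforward is taken to act as $(\chi_* f)(x) = f(\chi^{-1}(x))$ for an isometry $\chi : M \to M$.

```latex
% Sketch only. Assumptions (not from the article): \Phi_M is linear,
% \Phi_M(f) = \int_M \Phi_M(x) f(x) d\mu(x) formally, and
% (\chi_* f)(x) = f(\chi^{-1}(x)) for an isometry \chi : M \to M.
\begin{align*}
\alpha_\chi\bigl(\Phi_M(f)\bigr) &= \Phi_M(\chi_* f)
  && \text{(naturality with } M_1 = M_2 = M\text{)} \\
\int_M \alpha_\chi\bigl(\Phi_M(x)\bigr)\, f(x)\, d\mu(x)
  &= \int_M \Phi_M(x)\, f\bigl(\chi^{-1}(x)\bigr)\, d\mu(x) \\
  &= \int_M \Phi_M\bigl(\chi(y)\bigr)\, f(y)\, d\mu(y)
  && \text{(substitute } x = \chi(y)\text{; } \mu \text{ is invariant under } \chi\text{)} \\
\Longrightarrow\quad \alpha_\chi\bigl(\Phi_M(x)\bigr) &= \Phi_M\bigl(\chi(x)\bigr)
  && \text{(since } f \text{ is arbitrary)}
\end{align*}
```

The last line is the familiar transformation law of a scalar field under a spacetime symmetry. For nonlinear fields the first line still holds, but the formal integral representation used in the second line is no longer available.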

Anderson Localization see Localization for Quasi-Periodic Potentials

Anomalies

S L Adler, Institute for Advanced Study, Princeton, NJ, USA

© 2006 Elsevier Ltd. All rights reserved.

Synopsis

Anomalies are the breaking of classical symmetries by quantum mechanical radiative corrections, which
arise when the regularizations needed to evaluate small fermion loop Feynman diagrams conflict with a
classical symmetry of the theory. They have important implications for a wide range of issues in quantum
field theory, mathematical physics, and string theory.

Chiral Anomalies, Abelian and Nonabelian

Consider quantum electrodynamics, with the fermionic Lagrangian density

$$\mathcal{L} = \bar\psi\,(i\gamma\cdot\partial - e_0\,\gamma\cdot B - m_0)\,\psi$$   [1a]

where $\bar\psi = \psi^\dagger\gamma^0$, $e_0$ and $m_0$ are the bare charge and mass, and $B^\mu$ is the
electromagnetic gauge potential. (We reserve the notation $A$ for axial-vector quantities.) Under a
chiral transformation

$$\psi \to e^{i\alpha\gamma_5}\,\psi$$   [1b]

with constant $\alpha$, the kinetic term in eqn [1a] is invariant (because $\gamma_5$ commutes with
$\gamma^0\gamma^\mu$), whereas the mass term is not invariant. Therefore, naive application of Noether’s
theorem would lead one to expect that the axial-vector current

$$j^5_\mu = \bar\psi\,\gamma_\mu\gamma_5\,\psi$$   [1c]

obtained from the Lagrangian density by applying a chiral transformation with spatially varying
$\alpha$, should have a divergence given by the change under chiral transformation of the mass term in
eqn [1a]. In the tree approximation, this is indeed true, but when one computes the AVV Feynman diagram
with one axial-vector and two vector vertices (see Figure 1), and insists on conservation of the vector
current $j_\mu = \bar\psi\gamma_\mu\psi$, one finds that to order $e_0^2$, the classical Noether theorem is
modified to read

$$\partial^\mu j^5_\mu(x) = 2 i m_0\, j^5(x) + \frac{e_0^2}{16\pi^2}\, F^{\xi\sigma}(x)\, F^{\tau\rho}(x)\, \epsilon_{\xi\sigma\tau\rho}$$   [2]

with $F^{\xi\sigma}(x) = \partial^\sigma B^\xi(x) - \partial^\xi B^\sigma(x)$ the electromagnetic field strength
tensor. The second term in eqn [2], which would be unexpected from the application of the classical
Noether theorem, is the abelian axial-vector anomaly (often called the Adler–Bell–Jackiw (or ABJ)
anomaly after the seminal papers on the subject). Since vector current conservation, together with the
axial-vector current anomaly, implies that the left- and right-handed chiral currents
$j_\mu \mp j^5_\mu$ are also anomalous, the axial-vector anomaly is frequently called the “chiral anomaly,”
and we shall use the terms interchangeably in this article.

Figure 1 The AVV triangle diagram responsible for the abelian chiral anomaly.

There are a number of different ways to understand why the extra term in eqn [2] appears. (1) Working
through the formal Feynman diagrammatic Ward identity proof of the Noether theorem, one finds that
there is a step where the closed fermion loop contributions are eliminated by a shift of the
loop-integration variable. For Feynman diagrams that are convergent, this is not a problem, but the AVV
diagram is linearly divergent. The linear divergence vanishes under symmetric integration, but the shift
then produces a finite residue, which gives the anomaly. (2) If one defines the AVV diagram by
Pauli–Villars regularization with regulator mass $M_0$ that is allowed to approach infinity at the end
of the calculation, one finds a classical Noether theorem in the regulated theory,

$$\partial^\mu j^5_\mu\big|_{m_0} - \partial^\mu j^5_\mu\big|_{M_0} = 2 i m_0\, j^5\big|_{m_0} - 2 i M_0\, j^5\big|_{M_0}$$   [3a]

with the subscripts $m_0$ and $M_0$ indicating that fermion loops are to be calculated with fermion
mass $m_0$ and $M_0$, respectively. Taking the vacuum to two-photon matrix element of eqn [3a], one finds
that the matrix element $\langle 0|\, j^5|_{M_0}\, |\gamma\gamma\rangle$, which is unambiguously computable
after imposing vector-current conservation, falls off only as $M_0^{-1}$ as the regulator mass approaches
infinity. Thus, the product of $2iM_0$ with this matrix element has a finite limit, which gives the
anomaly. (3) If the
gauge-invariant axial-vector current is defined by point-splitting

$$j^5_\mu(x) = \bar\psi(x + \epsilon/2)\,\gamma_\mu\gamma_5\,\psi(x - \epsilon/2)\, e^{-i e_0 \epsilon^\sigma B_\sigma(x)}$$   [3b]

with $\epsilon \to 0$ at the end of the calculation, one observes that the divergence of eqn [3b] contains
an extra term with a factor of $\epsilon$. On careful evaluation, one finds that the coefficient of this
factor is an expression that behaves as $\epsilon^{-1}$, which gives the anomaly in the limit of vanishing
$\epsilon$. (4) Finally, if the field theory is defined by a functional integral over the classical action,
the standard Noether analysis shows that the classical action is invariant under the chiral
transformation of eqn [1b], apart from the contribution of the mass term, which gives the naive
axial-vector divergence. However, as pointed out by Fujikawa, the chiral transformation must also be
applied to the functional integration measure, and since the measure is an infinite product, it must be
regularized to be well defined. Careful calculation shows that the regularized measure is not chiral
invariant, but contributes an extra term to the axial-vector Ward identity that is precisely the chiral
anomaly.

A key feature of the anomaly is that it is irreducible: a local polynomial counter term cannot be added
to the AVV diagram that preserves vector-current conservation and eliminates the anomaly. More
generally, one can show that there is no way of modifying quantum electrodynamics so as to eliminate the
chiral anomaly, without spoiling either vector-current conservation (i.e., electromagnetic gauge
invariance), renormalizability, or unitarity. Thus, the chiral anomaly is a new physical effect in
renormalizable quantum field theory, which is not present in the prequantization classical theory.

The abelian chiral anomaly is the simplest case of the anomaly phenomenon. It was extended to
nonabelian gauge theories by Bardeen using a point-splitting method to compute the divergence, followed
by adding polynomial counter terms to remove as many of the residual terms as possible. The resulting
irreducible divergence is the nonabelian chiral anomaly, which in terms of Yang–Mills field strengths
for vector and axial-vector gauge potentials $V^\mu$ and $A^\mu$,

$$\begin{aligned}
F_V^{\mu\nu}(x) &= \partial^\mu V^\nu(x) - \partial^\nu V^\mu(x) - i[V^\mu(x), V^\nu(x)] - i[A^\mu(x), A^\nu(x)] \\
F_A^{\mu\nu}(x) &= \partial^\mu A^\nu(x) - \partial^\nu A^\mu(x) - i[V^\mu(x), A^\nu(x)] - i[A^\mu(x), V^\nu(x)]
\end{aligned}$$   [4a]

is given by

$$\begin{aligned}
\partial^\mu j^a_{5\mu}(x) = {} & \text{normal divergence term} \\
& + \frac{1}{4\pi^2}\,\epsilon_{\mu\nu\sigma\tau}\,\mathrm{tr}\,\lambda_A^a \Bigl[ \tfrac{1}{4} F_V^{\mu\nu}(x) F_V^{\sigma\tau}(x)
  + \tfrac{1}{12} F_A^{\mu\nu}(x) F_A^{\sigma\tau}(x) \\
& \qquad + \tfrac{2}{3}\, i A^\mu(x) A^\nu(x) F_V^{\sigma\tau}(x)
  + \tfrac{2}{3}\, i F_V^{\mu\nu}(x) A^\sigma(x) A^\tau(x) \\
& \qquad + \tfrac{2}{3}\, i A^\mu(x) F_V^{\nu\sigma}(x) A^\tau(x)
  - \tfrac{8}{3}\, A^\mu(x) A^\nu(x) A^\sigma(x) A^\tau(x) \Bigr]
\end{aligned}$$   [4b]

In eqn [4b], “tr” denotes a trace over internal degrees of freedom, and $\lambda_A^a$ is the internal
symmetry matrix associated with the axial-vector external field. In the abelian case, where there is no
internal symmetry structure, the terms involving two or four factors of $A^\mu, A^\nu, \ldots$ vanish by
antisymmetry of $\epsilon_{\mu\nu\sigma\tau}$, and one recovers the AVV triangle anomaly, as well as a
kinematically related anomaly in the AAA triangle diagram. In the nonabelian case, with nontrivial
internal symmetry structure, there are also box- and pentagon-diagram anomalies.

In addition to coupling to spin-1 gauge fields, fermions can also couple to spin-2 gauge fields,
associated with the graviton. When the coupling of fermions to gravitation is taken into account, the
axial-vector current $\bar\psi\, T \gamma_\mu\gamma_5\, \psi$, with $T$ an internal symmetry matrix, has an
additional anomalous contribution to its divergence proportional to

$$\mathrm{tr}\, T\; \epsilon^{\xi\sigma\tau\rho}\, R_{\xi\sigma\alpha\beta}\, R_{\tau\rho}{}^{\alpha\beta}$$   [4c]

where $R_{\xi\sigma\alpha\beta}$ is the Riemann curvature tensor of the gravitational field.

Chiral Anomaly Nonrenormalization

A salient feature of the chiral anomaly is the fact that it is not renormalized by higher-order
radiative corrections. In other words, the one-loop expressions of eqns [2] and [4b] give the exact
anomaly coefficient without modification in higher orders of perturbation theory. In gauge theories
such as quantum electrodynamics and quantum chromodynamics, this result (the Adler–Bardeen theorem) can
be understood heuristically as follows. Write down a modified Lagrangian, in which regulators are
included for all gauge-boson fields. Since the gauge-boson regulators do not influence the
chiral-symmetry properties of the theory, the divergences of the chiral currents are not affected by
their inclusion, and so the only sources of anomalies in the regularized theory are small
single-fermion loops, giving the anomaly expressions of eqns [2] and [4b]. Since the renormalized theory
is obtained as the limit of
the regularized theory as the regulator masses approach infinity, this result applies to the
renormalized theory as well.

The above argument can be made precise, and extends to nongauge theories such as the $\sigma$-model as
well. For both gauge theories and the $\sigma$-model, cancellation of radiative corrections to the anomaly
coefficient has been explicitly demonstrated in fourth-order calculations. Nonperturbative
demonstrations of anomaly nonrenormalization have also been given using the Callan–Symanzik equations.
For example, in quantum electrodynamics, Zee, and Lowenstein and Schroer, showed that a factor $f$ that
gives the ratio of the true anomaly to its one-loop value obeys the differential equation

$$\left( m \frac{\partial}{\partial m} + \alpha\beta(\alpha) \frac{\partial}{\partial\alpha} \right) f = 0$$   [5]

Since $f$ is dimensionless, it can have no dependence on the mass $m$, and since $\beta(\alpha)$ is nonzero
this implies $\partial f/\partial\alpha = 0$. Thus, $f$ has no dependence on $\alpha$, and so $f = 1$.

Applications of Chiral Anomalies

Chiral anomalies have numerous applications in the standard model of particle physics and its
extensions, and we describe here a few of the most important ones.

Neutral Pion Decay $\pi^0 \to \gamma\gamma$

As a result of the abelian chiral anomaly, the partially conserved axial-vector current (PCAC)
equation relevant to neutral pion decay is modified to read

$$\partial^\mu \mathcal{F}^5_{3\mu}(x) = \frac{f_\pi \mu_\pi^2}{\sqrt{2}}\, \phi_\pi(x) + S\, \frac{\alpha_0}{4\pi}\, F^{\xi\sigma}(x)\, F^{\tau\rho}(x)\, \epsilon_{\xi\sigma\tau\rho}$$   [6a]

with $\mu_\pi$ the pion mass, $f_\pi \simeq 131$ MeV the charged-pion decay constant, and $S$ a constant
determined by the constituent fermion charges and axial-vector couplings. Taking the matrix element of
eqn [6a] between the vacuum state and a two-photon state, and using the fact that the left-hand side has
a kinematic zero (the Sutherland–Veltman theorem), one sees that the $\pi^0 \to \gamma\gamma$ amplitude
$F^\pi$ is completely determined by the anomaly term, giving the formula

$$F^\pi = -(\alpha/\pi)\, 2S\, \sqrt{2}/f_\pi$$   [6b]

For a single set of fractionally charged quarks, the amplitude $F^\pi$ is a factor of three too small to
agree with experiment; for three fractionally charged quarks (or an equivalent Han–Nambu triplet), eqn
[6b] gives the correct neutral pion decay rate. This calculation was one of the first pieces of evidence
for the color degree of freedom of quarks.

Anomaly Cancellation in Gauge Theories

In quantum electrodynamics, the gauge particle (the photon) couples to the vector current, and so the
anomalous conservation properties of the axial-vector current have no effect. The same statement holds
for the gauge gluons in quantum chromodynamics, when treated in isolation from the other interactions.
However, in the electroweak theory that embeds quantum electrodynamics in a theory of the weak force,
the gauge particles (the $W^\pm$ and $Z$ intermediate bosons) couple to chiral currents, which are left-
or right-handed linear combinations of the vector and axial-vector currents. In this case, the chiral
anomaly leads to problems with the renormalizability of the theory, unless the anomalies cancel between
different fermion species. Writing all fermions as left-handed, the condition for anomaly cancellation
is

$$\mathrm{tr}\{T_\alpha, T_\beta\}\, T_\gamma = \mathrm{tr}\,(T_\alpha T_\beta + T_\beta T_\alpha)\, T_\gamma = 0 \quad \text{for all } \alpha, \beta, \gamma$$   [7]

with $T_\alpha$ the coupling matrices of gauge bosons to left-handed fermions. These conditions are obeyed
in the standard model, by virtue of three nontrivial sum rules on the fermion gauge couplings being
satisfied (four sum rules, if one includes the gravitational contribution to the chiral anomaly given in
eqn [4c], which also cancels in the standard model). Note that anomaly cancellation in the locally
gauged currents of the standard model does not imply anomaly cancellation in global-flavor currents.
Thus, the flavor axial-vector current anomaly that gives the $\pi^0 \to \gamma\gamma$ matrix element
remains anomalous in the full electroweak theory. Anomaly cancellation imposes important constraints on
the construction of grand unified models that combine the electroweak theory with quantum
chromodynamics. For instance, in SU(5) the fermions are put into a $\bar{5}$ and $10$ representation,
which together, but not individually, are anomaly free. The larger unification groups SO(10) and $E_6$
satisfy eqn [7] for all representations, and so are automatically anomaly free.

Instanton Physics and the Theta Vacuum

The theory of anomalies is intimately tied to the physics associated with instanton classical
Yang–Mills theory solutions. Since the instanton field
strength is self-dual, the nonvanishing instanton Euclidean action

$$S_E = \frac{1}{4} \int d^4x\, F_{\mu\nu} F_{\mu\nu} = 8\pi^2$$   [8a]

implies that the integral of the pseudoscalar density $F_{\mu\nu} F_{\sigma\tau}\, \epsilon_{\mu\nu\sigma\tau}$
over the instanton is also nonzero,

$$\int d^4x\, F_{\mu\nu} F_{\sigma\tau}\, \epsilon_{\mu\nu\sigma\tau} = 64\pi^2$$   [8b]

Referring back to eqn [4b], this means that the integral of the nonabelian chiral anomaly for fermions
in the background field of an instanton is an integer, which in the Minkowski space continuation has the
interpretation of a topological winding number change produced by the instanton tunneling solution.
This fact has a number of profound consequences. Since a vacuum with a definite winding number
$|\nu\rangle$ is unstable under instanton tunneling, careful analysis shows that the nonabelian vacuum
that has correct clustering properties is a Fourier superposition

$$|\theta\rangle = \sum_\nu e^{i\theta\nu}\, |\nu\rangle$$   [8c]

giving rise to the $\theta$-vacuum of quantum chromodynamics, and a host of issues associated with (the
lack of) strong CP violation, the Peccei–Quinn mechanism, and axion physics. Also, the fact that the
integral of eqn [8b] is nonzero means that the U(1) chiral symmetry of quantum chromodynamics is broken
by instantons, which as shown by ’t Hooft resolves the longstanding “U(1) problem” of strong
interactions, that of explaining why the flavor singlet pseudoscalar meson $\eta'$ is not light, unlike
its flavor octet partners.

Anomaly Matching Conditions

The anomaly structure of a theory, as shown by ’t Hooft, leads to important constraints on the
formation of massless composite bound states. Consider a theory with a set of left-handed fermions
$\psi_{if}$, with $i$ a “color” index acted on by a nonabelian gauge force, and $f$ an ungauged family or
“flavor” index. Suppose that the family multiplet structure is such that the global chiral symmetries
associated with the flavor index have nonvanishing anomalies $\mathrm{tr}\{T_\alpha, T_\beta\} T_\gamma$.
Then the ’t Hooft condition asserts that if the color forces result in the formation of composite
massless bound states of the original completely confined fermions, and if there is no spontaneous
breaking of the original global flavor symmetries, then these bound states must contain left-handed
spin-1/2 composites with a representation structure $S$ that has the same anomaly coefficient as that
in the underlying theory. In other words, we must have

$$\mathrm{tr}\{S_\alpha, S_\beta\}\, S_\gamma = \mathrm{tr}\{T_\alpha, T_\beta\}\, T_\gamma$$   [9]

To prove this, one adjoins to the theory a set of right-handed spectator fermions $\psi_f$ with the same
flavor structure as the original set, but which are not acted on by the color force. These right-handed
fermions cancel the original anomaly, making the underlying theory anomaly free at zero color coupling;
since dynamics cannot spontaneously generate anomalies, the theory, when the color dynamics is turned
on, must also have no global chiral anomalies. This implies that the bound-state spectrum must conspire
to cancel the anomalies associated with the right-handed spectators; in other words, the bound-state
anomaly structure must match that of the original fermions. This anomaly matching condition has found
applications in the study of the possible compositeness of quarks and leptons. It has also been applied
to the derivation of nonperturbative dynamical results in whole classes of supersymmetric theories,
where the combined tools of holomorphicity, instanton physics, and anomaly matching have given incisive
results.

Global Structure of Anomalies

We noted earlier that chiral anomalies are irreducible, in that they cannot be eliminated by adding a
local polynomial counter-term to the action. However, anomalies can be described by a nonlocal
effective action, obtained by integrating out the fermion field dynamics, and this point of view proves
very useful in the nonabelian case. Starting with the abelian case for orientation, we note that if
$A^\mu$ is an external axial-vector field, and we write an effective action $\Gamma[A]$, then the
axial-vector current $j^5_\mu$ associated with $A^\mu$ is given (up to an overall constant) by the
variational derivative expression

$$j^5_\mu(x) = \frac{\delta\Gamma[A]}{\delta A^\mu(x)}$$   [10a]

and the abelian anomaly appears as the fact that the expression

$$\partial^\mu j^5_\mu = X\,\Gamma[A] = G \neq 0, \qquad X = \partial^\mu \frac{\delta}{\delta A^\mu(x)}$$   [10b]

is nonvanishing even when the theory is classically chiral invariant. Turning now to the nonabelian
case, the variational derivative appearing in eqns [10a] and [10b] must be replaced by an appropriate
covariant derivative. In terms of the internal-symmetry component fields $A_a^\mu$ and $V_a^\mu$ of the
Yang–Mills potentials of eqn [4a], one introduces operators

$$\begin{aligned}
X_a(x) &= \partial^\mu \frac{\delta}{\delta A_a^\mu(x)} + f_{abc}\, V_b^\mu\, \frac{\delta}{\delta A_c^\mu(x)} + f_{abc}\, A_b^\mu\, \frac{\delta}{\delta V_c^\mu(x)} \\
Y_a(x) &= \partial^\mu \frac{\delta}{\delta V_a^\mu(x)} + f_{abc}\, V_b^\mu\, \frac{\delta}{\delta V_c^\mu(x)} + f_{abc}\, A_b^\mu\, \frac{\delta}{\delta A_c^\mu(x)}
\end{aligned}$$   [11a]

with $f_{abc}$ the antisymmetric nonabelian group structure constants. The operators $X_a$ and $Y_a$ are
easily seen to obey the commutation relations

$$\begin{aligned}
[X_a(x), X_b(y)] &= f_{abc}\, \delta(x - y)\, Y_c(x) \\
[X_a(x), Y_b(y)] &= f_{abc}\, \delta(x - y)\, X_c(x) \\
[Y_a(x), Y_b(y)] &= f_{abc}\, \delta(x - y)\, Y_c(x)
\end{aligned}$$   [11b]

Let $\Gamma[V, A]$ be the effective action as a functional of the fields $V^\mu$, $A^\mu$, constructed so
that the vector currents are covariantly conserved, as expressed formally by

$$Y_a\, \Gamma[V, A] = 0$$   [12a]

Then the nonabelian axial-vector current anomaly is given by

$$X_a\, \Gamma[V, A] = G_a$$   [12b]

From eqns [12a] and [12b] and the first line of eqn [11b], we have

$$X_b G_a - X_a G_b = (X_b X_a - X_a X_b)\, \Gamma[V, A] \propto f_{abc}\, Y_c\, \Gamma[V, A] = 0$$   [12c]

which is the Wess–Zumino consistency condition on the structure of the anomaly $G_a$. It can be shown
that this condition uniquely fixes the form of the nonabelian anomaly to be that of eqn [4b], up to an
overall constant, which can be determined by comparison with the simplest anomalous AVV triangle graph.
A physical consequence of the consistency condition is that the $\pi^0 \to \gamma\gamma$ decay amplitude
determines uniquely certain other anomalous amplitudes, such as $2\gamma \to 3\pi$, $\gamma \to 3\pi$, and
a five pseudoscalar vertex.

Although the action $\Gamma[V, A]$ is necessarily nonlocal, Wess and Zumino were able to write down a
local action, involving an auxiliary pseudoscalar field, that obeys the anomalous Ward identities and
the consistency conditions. Subsequently, Witten gave a new construction of this local action, in terms
of the integral of a fifth-rank antisymmetric tensor over a five-dimensional disk which has a
four-dimensional space as its boundary. He also showed that requiring $e^{i\Gamma}$ to be independent of
the choice of the spanning disk requires, in analogy with Dirac’s quantization condition for monopole
charge, the condition that the overall coefficient in the nonabelian anomaly be quantized in integer
multiples. Comparison with the lowest-order triangle diagram shows that in the case of SU($N_c$) gauge
theory, this integer is just the number of colors $N_c$. Thus, global considerations tightly constrain
the nonabelian chiral anomaly structure, and dictate that up to an integer-proportionality constant, it
must have the form given in eqns [4a] and [4b].

Trace Anomalies

The discovery of chiral anomalies inspired the search for other examples of anomalous behavior. First
indications of a perturbative trace anomaly obtained in a study of broken scale invariance by Coleman
and Jackiw were shown by Crewther, and by Chanowitz and Ellis, to correspond to an anomaly in the
three-point function $\theta^\mu{}_\mu V_\alpha V_\beta$, where $\theta_{\mu\nu}$ is the energy–momentum
tensor. Letting $\Delta_{\alpha\beta}(p)$ be the momentum space expression for this three-point function,
and $\Pi_{\alpha\beta}$ the corresponding $V_\alpha V_\beta$ two-point function, the trace anomaly equation
in quantum electrodynamics reads

$$\Delta_{\alpha\beta}(p) = \left( 2 - p_\sigma \frac{\partial}{\partial p_\sigma} \right) \Pi_{\alpha\beta}(p) - \frac{R}{6\pi^2}\, (p_\alpha p_\beta - \eta_{\alpha\beta}\, p^2)$$   [13a]

with the first term on the right-hand side the naive divergence, and the second term the trace anomaly,
with anomaly coefficient $R$ given by

$$R = \sum_{i,\ \mathrm{spin}\ \frac{1}{2}} Q_i^2 + \frac{1}{4} \sum_{i,\ \mathrm{spin}\ 0} Q_i^2$$   [13b]

The fact that there should be a trace anomaly can readily be inferred from a trace analog of the
Pauli–Villars regulator argument for the chiral anomaly given in eqn [3a]. Letting $j = \bar\psi\psi$ be
the scalar current in abelian electrodynamics, one has

$$\theta^\mu{}_\mu\big|_{m_0} - \theta^\mu{}_\mu\big|_{M_0} = m_0\, j\big|_{m_0} - M_0\, j\big|_{M_0}$$   [13c]

Taking the vacuum to two-photon matrix element of this equation, and imposing vector-current
conservation, one finds that the matrix element $\langle 0|\, j|_{M_0}\, |\gamma\gamma\rangle$ is
proportional to $M_0^{-1} \langle 0| F_{\lambda\sigma} F^{\lambda\sigma} |\gamma\gamma\rangle$ for a large
regulator mass, and so makes a
nonvanishing contribution to the right-hand side of eqn [13c], giving the lowest-order trace anomaly.

Unlike the chiral anomaly, the trace anomaly is renormalized in higher orders of perturbation theory;
heuristically, the reason is that whereas boson field regulators do not affect the chiral symmetry
properties of a gauge theory (which are determined just by the fermionic terms in the Lagrangian), they
do alter the energy–momentum tensor, since gravitation couples to all fields, including regulator
fields. An analysis using the Callan–Symanzik equations shows, however, that the trace anomaly is
computable to all orders in terms of various renormalization group functions of the coupling. For
example, in abelian electrodynamics, defining $\beta(\alpha)$ and $\delta(\alpha)$ by
$\beta(\alpha) = (m/\alpha)\, \partial\alpha/\partial m$ and
$1 + \delta(\alpha) = (m/m_0)\, \partial m_0/\partial m$, the trace of the energy–momentum tensor is given
to all orders by

$$\theta^\mu{}_\mu = [1 + \delta(\alpha)]\, m_0\, \bar\psi\psi + \tfrac{1}{4}\, \beta(\alpha)\, N[F_{\lambda\sigma} F^{\lambda\sigma}] + \cdots$$   [14]

with $N[\ ]$ specifying conditions that make the division into two terms in eqn [14] unique, and with the
ellipsis $\cdots$ indicating terms that vanish by the equations of motion. A similar relation holds in
the nonabelian case, again with the $\beta$ function appearing as the coefficient of the anomalous
$\mathrm{tr}\, N[F_{\lambda\sigma} F^{\lambda\sigma}]$ term.

Just as in the chiral anomaly case, when spin-0, spin-1/2, or spin-1 fields propagate on a background
spacetime, there are curvature-dependent contributions to the trace anomaly, in other words,
gravitational anomalies. These typically take the form of complicated linear combinations of terms of
the form $R^2$, $R_{\mu\nu} R^{\mu\nu}$, $R_{\mu\nu\lambda\sigma} R^{\mu\nu\lambda\sigma}$, $R_{,\mu}{}^{;\mu}$,
with coefficients depending on the matter fields involved.

In supersymmetric theories, the axial-vector current and the energy–momentum tensor are both
components of the supercurrent, and so their anomalies imply the existence of corresponding
supercurrent anomalies. The issue of how the nonrenormalization of chiral anomalies (which have a
supercurrent generalization given by the Konishi anomaly), and the renormalization of trace anomalies,
can coexist in supersymmetric theories originally engendered considerable confusion. This apparent
puzzle is now understood in the context of a perturbatively exact

sectors” of a theory, which do not contain the physical fields that we directly observe, to the
“physical sector” containing the observed fields.

Further Anomaly Topics

The above discussion has focused on some of the principal features and applications of anomalies. There
are further topics of interest in the physics and mathematics of anomalies that are discussed in detail
in the references cited in the “Further reading” section. We briefly describe a few of them here.

Anomalies in Other Spacetime Dimensions and in String Theory

The focus above has been on anomalies in four-dimensional spacetime, but anomalies of various types
occur both in lower-dimensional quantum field theories (such as theories in two- and three-dimensional
spacetimes) and in quantum field theories in higher-dimensional spacetimes (such as N = 1 supergravity
in ten-dimensional spacetime). Anomalies also play an important role in the formulation and consistency
of string theory. The bosonic string is consistent only in 26-dimensional spacetime, and the analogous
supersymmetric string only in ten-dimensional spacetime, because in other dimensions both these
theories violate Lorentz invariance after quantization. In the Polyakov path-integral formulation of
these string theories, these special dimensions are associated with the cancellation of the Weyl
anomaly, which is the relevant form of the trace anomaly discussed above. Yang–Mills, gravitational, and
mixed Yang–Mills gravitational anomalies make an appearance both in N = 1 ten-dimensional supergravity
and in superstring theory, and again special dimensions play a role. In these theories, only when the
associated internal symmetry groups are either SO(32) or $E_8 \times E_8$ is elimination of all anomalies
possible, by cancellation of hexagon-diagram anomalies with anomalous tree diagrams involving exchange
of a massless antisymmetric two-form field. This mechanism, due to Green and Schwarz, requires the
factorization of a sixth-order trace invariant that appears in the hexagon anomaly in terms of
lower-order invariants, as well as two
expression for the function in supersymmetric field numerical conditions on the adjoint representation
theories (the so-called NSVZ, for Novikov, Shifman, generator structure, restricting the allowed gauge
Vainshtein, and Zakharov, function). Supersymme- groups to the two noted above.
try anomalies can be used to infer the structure of
effective actions in supersymmetric theories, and these
in turn have important implications for possibilities Covariant versus Consistent Anomalies;
Descent Equations
for dynamical supersymmetry breaking. Anomalies
may also play a role, through anomaly mediation, in The nonabelian anomaly of eqns [4a] and [4b] is
communicating supersymmetry breaking in ‘‘hidden called the ‘‘consistent anomaly,’’ because it obeys the

Wess–Zumino consistency conditions of eqn [12c]. This anomaly, however, is not gauge covariant, as can be seen from the fact that it involves not only the Yang–Mills field strengths $F^{\mu\nu}_{V,A}$, but the potentials $V^{\mu}$, $A^{\mu}$ as well. It turns out to be possible, by adding appropriate polynomials to the currents, to transform the consistent anomaly to a form, called the “covariant anomaly,” which is gauge covariant under gauge transformations of the potentials $V^{\mu}$, $A^{\mu}$. This anomaly, however, does not obey the Wess–Zumino consistency conditions, and cannot be obtained from variation of an effective action functional.

The consistent anomalies (but not the covariant anomalies) obey a remarkable set of relations, called the Stora–Zumino descent equations, which relate the abelian anomaly in $2n + 2$ spacetime dimensions to the nonabelian anomaly in $2n$ spacetime dimensions. This set of equations has been interpreted physically by Callan and Harvey as reflecting the fact that the Dirac equation has chiral zero modes in the presence of strings in $2n + 2$ dimensions and of domain walls in $2n + 1$ dimensions.

Anomalies and Fermion Doubling in Lattice Gauge Theories

A longstanding problem in lattice formulations of gauge field theories is that when fermions are introduced on the lattice, the process of discretization introduces an undesirable doubling of the fermion particle modes. In particular, when an attempt is made to put chiral gauge theories, such as the electroweak theory, on the lattice, one finds that the doublers eliminate the chiral anomalies, by cancellation between modes with positive and negative axial-vector charge. Thus, for a long time, it appeared doubtful whether chiral gauge theories could be simulated on the lattice. However, recent work has led to formulations of lattice fermions that use a mathematical analog of a domain wall to successfully incorporate chiral fermions and the chiral anomaly into lattice gauge theory calculations.

Relation of Anomalies to the Atiyah–Singer Index Theorem

The singlet ($\lambda^a_A = 1$) anomaly of eqn [4b] is closely related to the Atiyah–Singer index theorem. Specifically, the Euclidean spacetime integral of the singlet anomaly constructed from a gauge field can be shown to give the index of the related Dirac operator for a fermion moving in that background gauge field, where the index is defined as the difference between the numbers of right- and left-handed zero-eigenvalue normalizable solutions of the Dirac equation. Since the index is a topological invariant, this again implies that the Euclidean spacetime integral of the anomaly is a topological invariant, as noted above in our discussion of instanton-related applications of anomalies.

Retrospect

The wide range of implications of anomalies has surprised, even astonished, the founders of the subject. New anomaly applications have appeared within the last few years, and very likely the future will see continued growth of the area of quantum field theory concerned with the physics and mathematics of anomalies.

Acknowledgment

This work is supported, in part, by the Department of Energy under grant #DE-FG02-90ER40542.

See also: Bosons and Fermions in External Fields; BRST Quantization; Effective Field Theories; Gauge Theories from Strings; Gerbes in Quantum Field Theory; Index Theorems; Lagrangian Dispersion (Passive Scalar); Lattice Gauge Theory; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Electrodynamics and Its Precision Tests; Quillen Determinant; Renormalization: General Theory; Seiberg–Witten Theory.

Further Reading

Adler SL (1969) Axial-vector vertex in spinor electrodynamics. Physical Review 177: 2426–2438.
Adler SL (1970) Perturbation theory anomalies. In: Deser S, Grisaru M, and Pendleton H (eds.) Lectures on Elementary Particles and Quantum Field Theory, vol. 1, pp. 3–164. Cambridge, MA: MIT Press.
Adler SL (2005) Anomalies to all orders. In: ’t Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 187–228. Singapore: World Scientific.
Adler SL and Bardeen WA (1969) Absence of higher order corrections in the anomalous axial-vector divergence equation. Physical Review 182: 1517–1536.
Bardeen W (1969) Anomalous Ward identities in spinor field theories. Physical Review 184: 1848–1859.
Bell JS and Jackiw R (1969) A PCAC puzzle: $\pi^0 \to \gamma\gamma$ in the $\sigma$-model. Nuovo Cimento A 60: 47–61.
Bertlmann RA (1996) Anomalies in Quantum Field Theory. Oxford: Clarendon.
De Azcárraga JA and Izquierdo JM (1995) Lie Groups, Lie Algebras, Cohomology and Some Applications in Physics, ch. 10. Cambridge: Cambridge University Press.
Fujikawa K and Suzuki H (2004) Path Integrals and Quantum Anomalies. Oxford: Oxford University Press.
Golterman M (2001) Lattice chiral gauge theories. Nuclear Physics Proceeding Supplements 94: 189–203.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory. vol. 2, sects. 13.3–13.5. Cambridge: Cambridge University Press.
Hasenfratz P (2005) Chiral symmetry on the lattice. In: ’t Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 377–398. Singapore: World Scientific.

Jackiw R (1985) Field theoretic investigations in current algebra and topological investigations in quantum gauge theories. In: Treiman S, Jackiw R, Zumino B, and Witten E (eds.) Current Algebra and Anomalies. Singapore: World Scientific and Princeton: Princeton University Press.
Jackiw R (2005) Fifty years of Yang–Mills theory and our moments of triumph. In: ’t Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 229–251. Singapore: World Scientific.
Makeenko Y (2002) Methods of Contemporary Gauge Theory, ch. 3. Cambridge: Cambridge University Press.
Neuberger H (2000) Chiral fermions on the lattice. Nuclear Physics Proceeding Supplements 83: 67–76.
Polchinski J (1999) String Theory, vol. 1, sect. 3.4; vol. 2, sect. 12.2. Cambridge: Cambridge University Press.
Shifman M (1997) Non-perturbative dynamics in supersymmetric gauge theories. Progress in Particle and Nuclear Physics 39: 1–116.
van Nieuwenhuizen P (1988) Anomalies in Quantum Field Theory: Cancellation of Anomalies in d = 10 Supergravity. Leuven: Leuven University Press.
Volovik GE (2003) The Universe in a Helium Droplet, ch. 18. Oxford: Clarendon.
Weinberg S (1996) The Quantum Theory of Fields, Vol. II Modern Applications, ch. 22. Cambridge: Cambridge University Press.
Zee A (2003) Quantum Field Theory in a Nutshell, sect. IV.7. Princeton: Princeton University Press.

Arithmetic Quantum Chaos


J Marklof, University of Bristol, Bristol, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The central objective in the study of quantum chaos is to characterize universal properties of quantum systems that reflect the regular or chaotic features of the underlying classical dynamics. Most developments of the past 25 years have been influenced by the pioneering models on statistical properties of eigenstates (Berry 1977) and energy levels (Berry and Tabor 1977, Bohigas et al. 1984). Arithmetic quantum chaos (AQC) refers to the investigation of quantum systems with additional arithmetic structures that allow a significantly more extensive analysis than is generally possible. On the other hand, the special number-theoretic features also render these systems nongeneric, and thus some of the expected universal phenomena fail to emerge. Important examples of such systems include the modular surface and linear automorphisms of tori (“cat maps”), which will be described below.

The geodesic motion of a point particle on a compact Riemannian surface $M$ of constant negative curvature is the prime example of an Anosov flow, one of the strongest characterizations of dynamical chaos. The corresponding quantum eigenstates $\varphi_j$ and energy levels $\lambda_j$ are given by the solution of the eigenvalue problem for the Laplace–Beltrami operator $\Delta$ (or Laplacian for short)

$$ (\Delta + \lambda)\varphi = 0, \qquad \|\varphi\|_{L^2(M)} = 1 \qquad [1] $$

where the eigenvalues

$$ 0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots \to \infty \qquad [2] $$

form a discrete spectrum with an asymptotic density governed by Weyl’s law

$$ \#\{j : \lambda_j \le \lambda\} \sim \frac{\operatorname{Area}(\Gamma\backslash\mathbb{H})}{4\pi}\,\lambda, \qquad \lambda \to \infty \qquad [3] $$

We rescale the sequence by setting

$$ X_j = \frac{\operatorname{Area}(\Gamma\backslash\mathbb{H})}{4\pi}\,\lambda_j \qquad [4] $$

which yields a sequence of asymptotic density 1. One of the central conjectures in AQC says that, if $M$ is an arithmetic hyperbolic surface (see the next section for examples of this very special class of surfaces of constant negative curvature), the eigenvalues of the Laplacian have the same local statistical properties as independent random variables from a Poisson process (see, e.g., the surveys by Sarnak (1995) and Bogomolny et al. (1997)). This means that the probability of finding $k$ eigenvalues $X_j$ in a randomly shifted interval $[X, X + L]$ of fixed length $L$ is distributed according to the Poisson law $L^k e^{-L}/k!$. The gaps between eigenvalues have an exponential distribution,

$$ \frac{1}{N}\,\#\{j \le N : X_{j+1} - X_j \in [a, b]\} \to \int_a^b e^{-s}\,ds \qquad [5] $$

as $N \to \infty$, and thus eigenvalues are likely to appear in clusters. This is in contrast to the general expectation that the energy level statistics of generic chaotic systems follow the distributions of random matrix ensembles; Poisson statistics are usually associated with quantized integrable systems. Although we are at present far from a proof of [5], the deviation from random matrix theory is well understood (see the section “Eigenvalue statistics and Selberg trace formula”).

Highly excited quantum eigenstates $\varphi_j$ ($j \to \infty$) (cf. Figure 1) of chaotic systems are conjectured to behave locally like random wave solutions of [1],

where boundary conditions are ignored. This hypothesis was put forward by Berry in 1977 and tested numerically, for example, in the case of certain arithmetic and nonarithmetic surfaces of constant negative curvature (Hejhal and Rackner 1992, Aurich and Steiner 1993). One of the implications is that eigenstates should have uniform mass on the surface $M$, that is, for any bounded continuous function $g : M \to \mathbb{R}$

$$ \int_M |\varphi_j|^2\, g\, dA \to \int_M g\, dA, \qquad j \to \infty \qquad [6] $$

where $dA$ is the Riemannian area element on $M$. This phenomenon, referred to as quantum unique ergodicity (QUE), is expected to hold for general surfaces of negative curvature, according to a conjecture by Rudnick and Sarnak (1994). In the case of arithmetic hyperbolic surfaces, there has been substantial progress on this conjecture in the works of Lindenstrauss, Watson, and Luo–Sarnak (discussed later in this article; see also the review by Sarnak (2003)). For general manifolds with ergodic geodesic flow, the convergence in [6] is so far established only for subsequences of eigenfunctions of density 1 (Schnirelman–Zelditch–Colin de Verdière theorem, see Quantum Ergodicity and Mixing of Eigenfunctions), and it cannot be ruled out that exceptional subsequences of eigenfunctions have a singular limit, for example, localized on closed geodesics. Such “scarring” of eigenfunctions, at least in some weak form, has been suggested by numerical experiments in Euclidean domains, and the existence of singular quantum limits is a matter of controversy in the current physics and mathematics literature. A first rigorous proof of the existence of scarred eigenstates has recently been established in the case of quantized toral automorphisms. Remarkably, these quantum cat maps may also exhibit QUE. A more detailed account of results for these maps is given in the section “Quantum eigenstates of cat maps”; see also Rudnick (2001) and De Bièvre (to appear).

Figure 1 Image of the absolute-value-squared of an eigenfunction $\varphi_j(z)$ for a nonarithmetic surface of genus 2. The surface is obtained by identifying opposite sides of the fundamental region. Reproduced from Aurich and Steiner (1993) Statistical properties of highly excited quantum eigenstates of a strongly chaotic system. Physica D 64(1–3): 185–214, with permission from R Aurich.

There have been a number of other fruitful interactions between quantum chaos and number theory, in particular the connections of spectral statistics of integrable quantum systems with the value distribution properties of quadratic forms, and analogies in the statistical behavior of energy levels of chaotic systems and the zeros of the Riemann zeta function. We refer the reader to Marklof (2006) and Berry and Keating (1999), respectively, for information on these topics.

Hyperbolic Surfaces

Let us begin with some basic notions of hyperbolic geometry. The hyperbolic plane $\mathbb{H}$ may be abstractly defined as the simply connected two-dimensional Riemannian manifold with Gaussian curvature $-1$. A convenient parametrization of $\mathbb{H}$ is provided by the complex upper-half plane, $\mathbb{H} = \{x + iy : x \in \mathbb{R},\ y > 0\}$, with Riemannian line and volume elements

$$ ds^2 = \frac{dx^2 + dy^2}{y^2}, \qquad dA = \frac{dx\,dy}{y^2} \qquad [7] $$

respectively. The group of orientation-preserving isometries of $\mathbb{H}$ is given by fractional linear transformations

$$ \mathbb{H} \to \mathbb{H}, \qquad z \mapsto \frac{az + b}{cz + d}, \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R}) \qquad [8] $$

where $\mathrm{SL}(2, \mathbb{R})$ is the group of $2 \times 2$ matrices with unit determinant. Since the matrices $1$ and $-1$ represent the same transformation, the group of orientation-preserving isometries can be identified with $\mathrm{PSL}(2, \mathbb{R}) := \mathrm{SL}(2, \mathbb{R})/\{\pm 1\}$. A finite-volume hyperbolic surface may now be represented as the quotient $\Gamma\backslash\mathbb{H}$, where $\Gamma \subset \mathrm{PSL}(2, \mathbb{R})$ is a Fuchsian group of the first kind. An arithmetic hyperbolic surface (such as the modular surface) is obtained, if $\Gamma$ has, loosely speaking, some representation in $n \times n$ matrices with integer coefficients, for some suitable $n$.

This is evident in the case of the modular surface, where the fundamental group is the modular group

$$ \Gamma = \mathrm{PSL}(2, \mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R}) : a, b, c, d \in \mathbb{Z} \right\} \Big/ \{\pm 1\} $$

A fundamental domain for the action of the modular group $\mathrm{PSL}(2, \mathbb{Z})$ on $\mathbb{H}$ is the set

$$ F_{\mathrm{PSL}(2,\mathbb{Z})} = \left\{ z \in \mathbb{H} : |z| > 1,\ -\tfrac{1}{2} < \operatorname{Re} z < \tfrac{1}{2} \right\} \qquad [9] $$

(see Figure 2). The modular group is generated by the translation

$$ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} : z \mapsto z + 1 $$

and the inversion

$$ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} : z \mapsto -1/z $$

These generators identify sections of the boundary of $F_{\mathrm{PSL}(2,\mathbb{Z})}$. By gluing the fundamental domain along identified edges, we obtain a realization of the modular surface, a noncompact surface with one cusp at $z \to \infty$, and two conic singularities at $z = i$ and $z = 1/2 + i\sqrt{3}/2$.

Figure 2 Fundamental domain of the modular group $\mathrm{PSL}(2, \mathbb{Z})$ in the complex upper-half plane.

An interesting example of a compact arithmetic surface is the “regular octagon,” a hyperbolic surface of genus 2. Its fundamental domain is shown in Figure 3 as a subset of the Poincaré disc $\mathbb{D} = \{z \in \mathbb{C} : |z| < 1\}$, which yields an alternative parametrization of the hyperbolic plane $\mathbb{H}$. In these coordinates, the Riemannian line and volume elements read

$$ ds^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2}, \qquad dA = \frac{4\,dx\,dy}{(1 - x^2 - y^2)^2} \qquad [10] $$

Figure 3 Fundamental domain of the regular octagon in the Poincaré disk.

The group of orientation-preserving isometries is now represented by $\mathrm{PSU}(1, 1) = \mathrm{SU}(1, 1)/\{\pm 1\}$, where

$$ \mathrm{SU}(1, 1) = \left\{ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \bar\alpha \end{pmatrix} : \alpha, \beta \in \mathbb{C},\ |\alpha|^2 - |\beta|^2 = 1 \right\} \qquad [11] $$

acting on $\mathbb{D}$ as above via fractional linear transformations. The fundamental group of the regular octagon surface is the subgroup of all elements in $\mathrm{PSU}(1, 1)$ with coefficients of the form

$$ \alpha = k + l\sqrt{2}, \qquad \beta = (m + n\sqrt{2})\sqrt{1 + \sqrt{2}} \qquad [12] $$

where $k, l, m, n \in \mathbb{Z}[i]$, that is, Gaussian integers of the form $k_1 + i k_2$, $k_1, k_2 \in \mathbb{Z}$. Note that not all choices of $k, l, m, n \in \mathbb{Z}[i]$ satisfy the condition $|\alpha|^2 - |\beta|^2 = 1$. Since all elements $\gamma \ne 1$ of $\Gamma$ act fix-point free on $\mathbb{H}$, the surface $\Gamma\backslash\mathbb{H}$ is smooth without conic singularities.

In the following, we will restrict our attention to a representative case, the modular surface with $\Gamma = \mathrm{PSL}(2, \mathbb{Z})$.

Eigenvalue Statistics and Selberg Trace Formula

The statistical properties of the rescaled eigenvalues $X_j$ (cf. [4]) of the Laplacian can be characterized by their distribution in small intervals

$$ N(x, L) := \#\{j : x \le X_j \le x + L\} \qquad [13] $$

where $x$ is uniformly distributed, say, in the interval $[X, 2X]$, $X$ large. Numerical experiments by Bogomolny, Georgeot, Giannoni, and Schmit, as well as Bolte, Steil, and Steiner (see references in

Bogomolny (1997)) suggest that the $X_j$ are asymptotically Poisson distributed:

Conjecture 1 For any bounded function $g : \mathbb{Z}_{\ge 0} \to \mathbb{C}$ we have

$$ \frac{1}{X} \int_X^{2X} g(N(x, L))\,dx \to \sum_{k=0}^{\infty} g(k)\,\frac{L^k e^{-L}}{k!} \qquad [14] $$

as $X \to \infty$.

One may also consider larger intervals, where $L \to \infty$ as $X \to \infty$. In this case, the assumption on the independence of the $X_j$ predicts a central-limit theorem. Weyl’s law [3] implies that the expectation value is asymptotically, for $X \to \infty$,

$$ \frac{1}{X} \int_X^{2X} N(x, L)\,dx \sim L \qquad [15] $$

This asymptotics holds for any sequence of $L$ bounded away from zero (e.g., $L$ constant, or $L \to \infty$).

Define the variance by

$$ \Sigma^2(X, L) = \frac{1}{X} \int_X^{2X} (N(x, L) - L)^2\,dx \qquad [16] $$

In view of the above conjecture, one expects $\Sigma^2(X, L) \sim L$ in the limit $X \to \infty$, $L/\sqrt{X} \to 0$ (the variance exhibits a less universal behavior in the range $L \gg \sqrt{X}$, cf. Sarnak (1995); the notation $A \ll B$ means there is a constant $c > 0$ such that $A \le cB$), and a central-limit theorem for the fluctuations around the mean:

Conjecture 2 For any bounded function $g : \mathbb{R} \to \mathbb{C}$ we have

$$ \frac{1}{X} \int_X^{2X} g\!\left( \frac{N(x, L) - L}{\sqrt{\Sigma^2(x, L)}} \right) dx \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-(1/2)t^2}\,dt \qquad [17] $$

as $X, L \to \infty$, $L \ll X$.

The main tool in the attempts to prove the above conjectures has been the Selberg trace formula. It relates sums over eigenvalues of the Laplacians to sums over lengths of closed geodesics on the hyperbolic surface. The trace formula is in its simplest form in the case of compact hyperbolic surfaces; we have

$$ \sum_{j=0}^{\infty} h(\rho_j) = \frac{\operatorname{Area}(M)}{4\pi} \int_{-\infty}^{\infty} h(\rho)\,\tanh(\pi\rho)\,\rho\,d\rho + \sum_{\gamma \in H_*} \sum_{n=1}^{\infty} \frac{\ell_\gamma\, g(n\ell_\gamma)}{2\sinh(n\ell_\gamma/2)} \qquad [18] $$

where $H_*$ is the set of all primitive oriented closed geodesics $\gamma$, and $\ell_\gamma$ their lengths. The quantity $\rho_j$ is related to the eigenvalue $\lambda_j$ by the equation $\lambda_j = \rho_j^2 + 1/4$. The trace formula [18] holds for a large class of even test functions $h$. For example, it is sufficient to assume that $h$ is infinitely differentiable, and that the Fourier transform of $h$,

$$ g(t) = \frac{1}{2\pi} \int_{\mathbb{R}} h(\rho)\, e^{-i\rho t}\,d\rho \qquad [19] $$

has compact support. The trace formula for noncompact surfaces has additional terms from the parabolic elements in the corresponding group, and includes also sums over the resonances of the continuous part of the spectrum. The noncompact modular surface behaves in many ways like a compact surface. In particular, Selberg showed that the number of eigenvalues embedded in the continuous spectrum satisfies the same Weyl law as in the compact case (Sarnak 2003).

Setting

$$ h(\rho) = \chi_{[X, X+L]}\!\left( \frac{\operatorname{Area}(M)}{4\pi} \left( \rho^2 + \frac{1}{4} \right) \right) \qquad [20] $$

where $\chi_{[X, X+L]}$ is the characteristic function of the interval $[X, X + L]$, we may thus view $N(X, L)$ as the left-hand side of the trace formula. The above test function $h$ is, however, not admissible, and requires appropriate smoothing. Luo and Sarnak (cf. Sarnak (2003)) developed an argument of this type to obtain a lower bound on the average number variance,

$$ \frac{1}{L} \int_0^L \Sigma^2(X, L')\,dL' \gg \frac{\sqrt{X}}{(\log X)^2} \qquad [21] $$

in the regime $\sqrt{X}/\log X \ll L \ll \sqrt{X}$, which is consistent with the Poisson conjecture $\Sigma^2(X, L) \sim L$. Bogomolny, Leyvraz, and Schmit suggested a remarkable limiting formula for the two-point correlation function for the modular surface (cf. Bogomolny et al. (1997) and Bogomolny (2006)), based on an analysis of the correlations between multiplicities of lengths of closed geodesics. A rigorous analysis of the fluctuations of multiplicities is given by Peter (cf. Bogomolny (2006)). Rudnick (2005) has recently established a smoothed version of Conjecture 2 in the regime

$$ \frac{\sqrt{X}}{L} \to \infty, \qquad \frac{\sqrt{X}}{L \log X} \to 0 \qquad [22] $$

where the characteristic function in [20] is replaced by a certain class of smooth test functions.

All of the above approaches use the Selberg trace formula, exploiting the particular properties of the

distribution of lengths of closed geodesics in arithmetic hyperbolic surfaces. These will be discussed in more detail in the next section, following the work of Bogomolny, Georgeot, Giannoni and Schmit, Bolte, and Luo and Sarnak (see Bogomolny et al. (1997) and Sarnak (1995) for references).

Distribution of Lengths of Closed Geodesics

The classical prime geodesic theorem asserts that the number $N(\ell)$ of primitive closed geodesics of length less than $\ell$ is asymptotically

$$ N(\ell) \sim \frac{e^{\ell}}{\ell} \qquad [23] $$

One of the significant geometrical characteristics of arithmetic hyperbolic surfaces is that the number of closed geodesics with the same length $\ell$ grows exponentially with $\ell$. This phenomenon is most easily explained in the case of the modular surface, where the set of lengths $\ell$ appearing in the length spectrum is characterized by the condition

$$ 2\cosh(\ell/2) = |\mathrm{tr}\,\gamma| \qquad [24] $$

where $\gamma$ runs over all elements in $\mathrm{SL}(2, \mathbb{Z})$ with $|\mathrm{tr}\,\gamma| > 2$. It is not hard to see that any integer $n > 2$ appears in the set $\{|\mathrm{tr}\,\gamma| : \gamma \in \mathrm{SL}(2, \mathbb{Z})\}$, and hence the set of distinct lengths of closed geodesics is

$$ \mathcal{L} = \{2\,\mathrm{arcosh}(n/2) : n = 3, 4, 5, \ldots\} \qquad [25] $$

Therefore, the number of distinct lengths less than $\ell$ is asymptotically (for large $\ell$)

$$ N'(\ell) = \#(\mathcal{L} \cap [0, \ell]) \sim e^{\ell/2} \qquad [26] $$

Equations [26] and [23] say that on average the number of geodesics with the same length is at least $\asymp e^{\ell/2}/\ell$.

The prime geodesic theorem [23] holds equally for all hyperbolic surfaces with finite area, while [26] is specific to the modular surface. For general arithmetic surfaces, we have the upper bound

$$ N'(\ell) \le c\, e^{\ell/2} \qquad [27] $$

for some constant $c > 0$ that may depend on the surface. Although one expects $N'(\ell)$ to be asymptotic to $(1/2)N(\ell)$ for generic surfaces (since most geodesics have a time-reversal partner which thus has the same length, and otherwise all lengths are distinct), there are examples of nonarithmetic Hecke triangles where numerical and heuristic arguments suggest $N'(\ell) \sim c_1 e^{c_2 \ell}/\ell$ for suitable constants $c_1 > 0$ and $0 < c_2 < 1/2$ (cf. Bogomolny (2006)). Hence exponential degeneracy in the length spectrum seems to occur in a weaker form also for nonarithmetic surfaces.

A further useful property of the length spectrum of arithmetic surfaces is the bounded clustering property: there is a constant $C$ (again surface dependent) such that

$$ \#(\mathcal{L} \cap [\ell, \ell + 1]) \le C \qquad [28] $$

for all $\ell$. This fact is evident in the case of the modular surface; the general case is proved by Luo and Sarnak (cf. Sarnak (1995)).

Quantum Unique Ergodicity

The unit tangent bundle of a hyperbolic surface $\Gamma\backslash\mathbb{H}$ describes the physical phase space on which the classical dynamics takes place. A convenient parametrization of the unit tangent bundle is given by the quotient $\Gamma\backslash\mathrm{PSL}(2, \mathbb{R})$; this may be seen by means of the Iwasawa decomposition for an element $g \in \mathrm{PSL}(2, \mathbb{R})$,

$$ g = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y^{1/2} & 0 \\ 0 & y^{-1/2} \end{pmatrix} \begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix} \qquad [29] $$

where $x + iy \in \mathbb{H}$ represents the position of the particle in $\Gamma\backslash\mathbb{H}$ in half-plane coordinates, and $\theta \in [0, 2\pi)$ the direction of its velocity. Multiplying the matrix [29] from the left by $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and writing the result again in the Iwasawa form [29], one obtains the action

$$ (z, \theta) \mapsto \left( \frac{az + b}{cz + d},\ \theta - 2\arg(cz + d) \right) \qquad [30] $$

which represents precisely the geometric action of isometries on the unit tangent bundle.

The geodesic flow $\Phi^t$ on $\Gamma\backslash\mathrm{PSL}(2, \mathbb{R})$ is represented by the right translation

$$ \Phi^t : \Gamma g \mapsto \Gamma g \begin{pmatrix} e^{t/2} & 0 \\ 0 & e^{-t/2} \end{pmatrix} \qquad [31] $$

The Haar measure $\mu$ on $\mathrm{PSL}(2, \mathbb{R})$ is thus trivially invariant under the geodesic flow. It is well known that $\mu$ is not the only invariant measure, that is, $\Phi^t$ is not uniquely ergodic, and that there is in fact an abundance of invariant measures. The simplest examples are those with uniform mass on one, or a countable collection of, closed geodesics.

To test the distribution of an eigenfunction $\varphi_j$ in phase space, one associates with a function

$a \in C^\infty(\Gamma\backslash\mathrm{PSL}(2, \mathbb{R}))$ the quantum observable $\mathrm{Op}(a)$, a zeroth order pseudodifferential operator with principal symbol $a$. Using semiclassical techniques based on Friedrichs symmetrization, one can show that the matrix element

$$ \nu_j(a) = \langle \mathrm{Op}(a)\varphi_j, \varphi_j \rangle \qquad [32] $$

is asymptotic (as $j \to \infty$) to a positive functional that defines a probability measure on $\Gamma\backslash\mathrm{PSL}(2, \mathbb{R})$. Therefore, if $M$ is compact, any weak limit of $\nu_j$ represents a probability measure on $\Gamma\backslash\mathrm{PSL}(2, \mathbb{R})$. Egorov’s theorem (see Quantum Ergodicity and Mixing of Eigenfunctions) in turn implies that any such limit must be invariant under the geodesic flow, and the main challenge in proving QUE is to rule out all invariant measures apart from Haar.

Conjecture 3 (Rudnick and Sarnak (1994); see Sarnak (1995, 2003)). For every compact hyperbolic surface $\Gamma\backslash\mathbb{H}$, the sequence $\nu_j$ converges weakly to $\mu$.

Lindenstrauss has proved this conjecture for compact arithmetic hyperbolic surfaces of congruence type (such as the second example in the section “Hyperbolic surfaces”) for special bases of eigenfunctions, using ergodic-theoretic methods. These will be discussed in more detail in the next section. His results extend to the noncompact case, that is, to the modular surface where $\Gamma = \mathrm{PSL}(2, \mathbb{Z})$. Here he shows that any weak limit of subsequences of $\nu_j$ is of the form $c\mu$, where $c$ is a constant with values in $[0, 1]$. One believes that $c = 1$, but with present techniques it cannot be ruled out that a proportion of the mass of the eigenfunction escapes into the noncompact cusp of the surface. For the modular surface, $c = 1$ can be proved under the assumption of the generalized Riemann hypothesis (see the section “Eigenfunctions and L-functions” and Sarnak (2003)). QUE also holds for the continuous part of the spectrum, which is furnished by the Eisenstein series $E(z, s)$, where $s = 1/2 + ir$ is the spectral parameter. Note that the measures associated with the matrix elements

$$ \nu_r(a) = \langle \mathrm{Op}(a) E(\cdot, 1/2 + ir), E(\cdot, 1/2 + ir) \rangle \qquad [33] $$

are not probability measures but only Radon measures, since $E(z, s)$ is not square-integrable. Luo and Sarnak, and Jakobson have shown that

$$ \lim_{r \to \infty} \frac{\nu_r(a)}{\nu_r(b)} = \frac{\mu(a)}{\mu(b)} \qquad [34] $$

for suitable test functions $a, b \in C^\infty(\Gamma\backslash\mathrm{PSL}(2, \mathbb{R}))$ (cf. Sarnak (2003)).

Hecke Operators, Entropy and Measure Rigidity

For compact surfaces, the sequence of probability measures approaching the matrix elements $\nu_j$ is relatively compact. That is, every infinite sequence contains a convergent subsequence. Lindenstrauss’ central idea in the proof of QUE is to exploit the presence of Hecke operators to understand the invariance properties of possible quantum limits. We will sketch his argument in the case of the modular surface (ignoring issues related to the noncompactness of the surface), where it is most transparent.

For every positive integer $n$, the Hecke operator $T_n$ acting on continuous functions on $\Gamma\backslash\mathbb{H}$ with $\Gamma = \mathrm{SL}(2, \mathbb{Z})$ is defined by

$$ T_n f(z) = \frac{1}{\sqrt{n}} \sum_{\substack{a, d = 1 \\ ad = n}}^{n} \sum_{b=0}^{d-1} f\!\left( \frac{az + b}{d} \right) \qquad [35] $$

The set $M_n$ of matrices with integer coefficients and determinant $n$ can be expressed as the disjoint union

$$ M_n = \bigcup_{\substack{a, d = 1 \\ ad = n}}^{n} \bigcup_{b=0}^{d-1} \Gamma \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \qquad [36] $$

and hence the sum in [35] can be viewed as a sum over the cosets in this decomposition. We note the product formula

$$ T_m T_n = \sum_{d \mid \gcd(m, n)} T_{mn/d^2} \qquad [37] $$

The Hecke operators are normal, form a commuting family, and in addition they commute with the Laplacian $\Delta$. In the following, we consider an orthonormal basis of eigenfunctions $\varphi_j$ of $\Delta$ that are simultaneously eigenfunctions of all Hecke operators. We will refer to such eigenfunctions as Hecke eigenfunctions. The above assumption is automatically satisfied if the spectrum of $\Delta$ is simple (i.e., no eigenvalues coincide), a property conjectured by Cartier and supported by numerical computations. Lindenstrauss’ work is based on the following two observations. Firstly, all quantum limits of Hecke eigenfunctions are geodesic-flow invariant measures of positive entropy, and secondly, the only such measure of positive entropy that is recurrent under Hecke correspondences is the Lebesgue measure.

The first property is proved by Bourgain and Lindenstrauss (2003) and refines arguments of Rudnick and Sarnak (1994) and Wolpert (2001) on the distribution of Hecke points (see Sarnak (2003) for

references to these papers). For a given point $z \in \mathbb{H}$ the set of Hecke points is defined as

$$T_n(z) := \Gamma \backslash M_n z \qquad [38]$$

For most primes, the set $T_{p^k}(z)$ comprises $(p+1)p^{k-1}$ distinct points on $\Gamma \backslash \mathbb{H}$. For each $z$, the Hecke operator $T_n$ may now be interpreted as the adjacency matrix for a finite graph embedded in $\Gamma \backslash \mathbb{H}$, whose vertices are the Hecke points $T_n(z)$. Hecke eigenfunctions $\varphi_j$ with

$$T_n \varphi_j = \lambda_j(n) \varphi_j \qquad [39]$$

give rise to eigenfunctions of the adjacency matrix. Exploiting this fact, Bourgain and Lindenstrauss show that for a large set of integers $n$

$$|\varphi_j(z)|^2 \ll \sum_{w \in T_n(z)} |\varphi_j(w)|^2 \qquad [40]$$

that is, pointwise values of $|\varphi_j|^2$ cannot be substantially larger than its sum over Hecke points. This, and the observation that Hecke points for a large set of integers $n$ are sufficiently uniformly distributed on $\Gamma \backslash \mathbb{H}$ as $n \to \infty$, yields the estimate of positive entropy with a quantitative lower bound.

Lindenstrauss' proof of the second property, which shows that Lebesgue measure is the only quantum limit of Hecke eigenfunctions, is a result of a currently very active branch of ergodic theory: measure rigidity. Invariance under the geodesic flow alone is not sufficient to rule out other possible limit measures. In fact, there are uncountably many measures with this property. As limits of Hecke eigenfunctions, all quantum limits possess an additional property, namely recurrence under Hecke correspondences. Since the explanation of these is rather involved, let us recall an analogous result in a simpler setup. The map $\times 2 : x \mapsto 2x \bmod 1$ defines a hyperbolic dynamical system on the unit circle with a wealth of invariant measures, similar to the case of the geodesic flow on a surface of negative curvature. Furstenberg conjectured that, up to trivial invariant measures that are localized on finitely many rational points, Lebesgue measure is the only $\times 2$-invariant measure that is also invariant under the action of $\times 3 : x \mapsto 3x \bmod 1$. This fundamental problem is still unsolved and one of the central conjectures in measure rigidity. Rudolph, however, showed that Furstenberg's conjecture is true if one restricts the statement to $\times 2$-invariant measures of positive entropy (cf. Lindenstrauss (to appear)). In Lindenstrauss' work, $\times 2$ plays the role of the geodesic flow, and $\times 3$ the role of the Hecke correspondences. Although here it might also be interesting to ask whether an analog of Furstenberg's conjecture holds, it is inessential for the proof of QUE due to the positive entropy of quantum limits discussed in the previous paragraph.

Eigenfunctions and L-Functions

An even eigenfunction $\varphi_j(z)$ for $\Gamma = SL(2, \mathbb{Z})$ has the Fourier expansion

$$\varphi_j(z) = \sum_{n=1}^{\infty} a_j(n)\, y^{1/2} K_{i\rho_j}(2\pi n y) \cos(2\pi n x) \qquad [41]$$

We associate with $\varphi_j(z)$ the Dirichlet series

$$L(s, \varphi_j) = \sum_{n=1}^{\infty} a_j(n)\, n^{-s} \qquad [42]$$

which converges for $\mathrm{Re}\, s$ large enough. These series have an analytic continuation to the entire complex plane $\mathbb{C}$ and satisfy a functional equation,

$$\Lambda(s, \varphi_j) = \Lambda(1 - s, \varphi_j) \qquad [43]$$

where

$$\Lambda(s, \varphi_j) = \pi^{-s}\, \Gamma\!\left(\frac{s + i\rho_j}{2}\right) \Gamma\!\left(\frac{s - i\rho_j}{2}\right) L(s, \varphi_j) \qquad [44]$$

If $\varphi_j(z)$ is in addition an eigenfunction of all Hecke operators, then the Fourier coefficients in fact coincide (up to a normalization constant) with the eigenvalues of the Hecke operators

$$a_j(m) = \lambda_j(m)\, a_j(1) \qquad [45]$$

If we normalize $a_j(1) = 1$, the Hecke relations [37] result in an Euler product formula for the L-function,

$$L(s, \varphi_j) = \prod_{p\ \mathrm{prime}} \left(1 - a_j(p)\, p^{-s} + p^{-1-2s}\right)^{-1} \qquad [46]$$

These L-functions behave in many other ways like the Riemann zeta or classical Dirichlet L-functions. In particular, they are expected to satisfy a Riemann hypothesis, that is, all nontrivial zeros are constrained to the critical line $\mathrm{Re}\, s = 1/2$.

Questions on the distribution of Hecke eigenfunctions, such as QUE or value distribution properties, can now be translated to analytic properties of L-functions. We will discuss two examples.

The asymptotics in [6] can be established by proving [6] for the choices $g = \varphi_k$, $k = 1, 2, \ldots$, that is,

$$\int_M |\varphi_j|^2 \varphi_k \, dA \to 0 \qquad [47]$$
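The step from the Euler product [46] back to the Dirichlet series [42] amounts to expanding each local factor as a power series in $p^{-s}$. The following is a minimal sketch of that expansion, taking the local factor exactly as printed in [46]; the function names `euler_factor_coeffs` and `residual` and the sample value used for $a_j(p)$ are hypothetical and chosen only for illustration (the value is not an actual Hecke eigenvalue).

```python
from fractions import Fraction


def euler_factor_coeffs(a_p, p, kmax):
    """Coefficients c_0..c_kmax of 1 / (1 - a_p*x + x**2/p), where x stands for p**(-s)."""
    c = [Fraction(1), Fraction(a_p)]
    for k in range(1, kmax):
        # Three-term recursion read off from
        # (1 - a_p*x + x**2/p) * sum_k c_k x**k = 1.
        c.append(Fraction(a_p) * c[k] - Fraction(1, p) * c[k - 1])
    return c[:kmax + 1]


def residual(a_p, p, kmax):
    """Degree 0..kmax coefficients of (1 - a_p*x + x**2/p) times the truncated series."""
    c = euler_factor_coeffs(a_p, p, kmax)
    denom = [Fraction(1), -Fraction(a_p), Fraction(1, p)]
    return [
        sum(denom[i] * c[k - i] for i in range(3) if k - i >= 0)
        for k in range(kmax + 1)
    ]


if __name__ == "__main__":
    # Hypothetical illustration value for a_j(p); exact rational arithmetic avoids rounding.
    a_p, p = Fraction(3, 2), 5
    print(euler_factor_coeffs(a_p, p, 4))
    # The product should be 1 up to the truncation order: [1, 0, 0, ...].
    print(residual(a_p, p, 10))
```

Exact rationals are used so that the cancellation of all coefficients of positive degree can be checked without floating-point tolerance.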

Watson discovered the remarkable relation (Sarnak 2003)

$$\left| \int_M \varphi_{j_1} \varphi_{j_2} \varphi_{j_3} \, dA \right|^2 = \frac{\pi^4\, \Lambda\!\left(\tfrac{1}{2}, \varphi_{j_1} \times \varphi_{j_2} \times \varphi_{j_3}\right)}{\Lambda(1, \mathrm{sym}^2 \varphi_{j_1})\, \Lambda(1, \mathrm{sym}^2 \varphi_{j_2})\, \Lambda(1, \mathrm{sym}^2 \varphi_{j_3})} \qquad [48]$$

The L-functions $\Lambda(s, g)$ in Watson's formula are more advanced cousins of those introduced earlier (see Sarnak (2003) for details). The Riemann hypothesis for such L-functions then implies, via [48], a precise rate of convergence to QUE for the modular surface,

$$\int_M |\varphi_j|^2 g \, dA = \int_M g \, dA + O\!\left(\lambda_j^{-1/4 + \epsilon}\right) \qquad [49]$$

for any $\epsilon > 0$, where the implied constant depends on $\epsilon$ and $g$.

A second example of the connection between statistical properties of the matrix elements $\mu_j(a) = \langle \mathrm{Op}(a) \varphi_j, \varphi_j \rangle$ (for fixed $a$ and random $j$) and values of L-functions has appeared in the work of Luo and Sarnak (cf. Sarnak (2003)). Define the variance

$$V_\lambda(a) = \frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} \left| \mu_j(a) - \mu(a) \right|^2 \qquad [50]$$

with $N(\lambda) = \#\{j : \lambda_j \le \lambda\}$; cf. [3]. Following a conjecture by Feingold–Peres and Eckhardt et al. (see Sarnak (2003) for references) for "generic" quantum chaotic systems, one expects a central-limit theorem for the statistical fluctuations of the $\mu_j(a)$, where the normalized variance $N(\lambda)^{1/2} V_\lambda(a)$ is asymptotic to the classical autocorrelation function $C(a)$, see eqn [54].

Conjecture 4 For any bounded function $g : \mathbb{R} \to \mathbb{C}$ we have

$$\frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} g\!\left( \frac{\mu_j(a) - \mu(a)}{\sqrt{V_\lambda(a)}} \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-(1/2) t^2} \, dt \qquad [51]$$

as $\lambda \to \infty$.

Luo and Sarnak prove that in the case of the modular surface the variance has the asymptotics

$$\lim_{\lambda \to \infty} N(\lambda)^{1/2} V_\lambda(a) = \langle B a, a \rangle \qquad [52]$$

where $B$ is a non-negative self-adjoint operator which commutes with the Laplacian $\Delta$ and all Hecke operators $T_n$. In particular, we have

$$B \varphi_j = \tfrac{1}{2} L\!\left(\tfrac{1}{2}, \varphi_j\right) C(\varphi_j)\, \varphi_j \qquad [53]$$

where

$$C(a) := \int_{\mathbb{R}} \int_{\Gamma \backslash PSL(2, \mathbb{R})} a(\Phi^t(g))\, a(g) \, d\mu(g) \, dt \qquad [54]$$

is the classical autocorrelation function for the geodesic flow with respect to the observable $a$ (Sarnak 2003). Up to the arithmetic factor $(1/2) L(1/2, \varphi_j)$, eqn [53] is consistent with the Feingold–Peres prediction for the variance of generic chaotic systems. Furthermore, recent estimates of moments by Rudnick and Soundararajan (2005) indicate that Conjecture 4 is not valid in the case of the modular surface.

Quantum Eigenstates of Cat Maps

Cat maps are probably the simplest area-preserving maps on a compact surface that are highly chaotic. They are defined as linear automorphisms on the torus $\mathbb{T}^2 = \mathbb{R}^2 / \mathbb{Z}^2$,

$$\Phi_A : \mathbb{T}^2 \to \mathbb{T}^2 \qquad [55]$$

where a point $\xi \in \mathbb{R}^2$ (mod $\mathbb{Z}^2$) is mapped to $A\xi$ (mod $\mathbb{Z}^2$); $A$ is a fixed matrix in $GL(2, \mathbb{Z})$ with eigenvalues off the unit circle (this guarantees hyperbolicity). We view the torus $\mathbb{T}^2$ as a symplectic manifold, the phase space of the dynamical system. Since $\mathbb{T}^2$ is compact, the Hilbert space of quantum states is an $N$-dimensional vector space $H_N$, $N$ integer. The semiclassical limit, or limit of small wavelengths, corresponds here to $N \to \infty$.

It is convenient to identify $H_N$ with $L^2(\mathbb{Z}/N\mathbb{Z})$, with the inner product

$$\langle \psi_1, \psi_2 \rangle = \frac{1}{N} \sum_{Q \bmod N} \psi_1(Q)\, \overline{\psi_2(Q)} \qquad [56]$$

For any smooth function $f \in C^\infty(\mathbb{T}^2)$, define a quantum observable

$$\mathrm{Op}_N(f) = \sum_{n \in \mathbb{Z}^2} \hat{f}(n)\, T_N(n)$$

where $\hat{f}(n)$ are the Fourier coefficients of $f$, and $T_N(n)$ are translation operators

$$T_N(n) = e^{\pi i n_1 n_2 / N}\, t_2^{n_2} t_1^{n_1} \qquad [57]$$

$$[t_1 \psi](Q) = \psi(Q + 1), \qquad [t_2 \psi](Q) = e^{2\pi i Q / N} \psi(Q) \qquad [58]$$

The operators $\mathrm{Op}_N(a)$ are the analogs of the pseudodifferential operators discussed in the section "Quantum unique ergodicity."
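The operators in [57] and [58] are finite matrices, so their algebra can be checked numerically. The sketch below assumes NumPy is available and represents a state $\psi$ as a length-$N$ vector indexed by $Q = 0, \ldots, N-1$; the function names `shift_and_phase` and `translation` are hypothetical. It builds $t_1$, $t_2$ and $T_N(n)$ for nonnegative $n_1, n_2$ and checks the relation $t_1 t_2 = e^{2\pi i/N} t_2 t_1$, which follows directly from [58], together with the unitarity of $T_N(n)$.

```python
import numpy as np


def shift_and_phase(N):
    """Matrices of t1 (shift) and t2 (phase multiplication) on C^N, as in [58]."""
    Q = np.arange(N)
    # [t1 psi](Q) = psi(Q + 1 mod N): row Q has a single 1 in column Q + 1.
    t1 = np.zeros((N, N), dtype=complex)
    t1[Q, (Q + 1) % N] = 1.0
    # [t2 psi](Q) = exp(2*pi*i*Q/N) * psi(Q): a diagonal matrix.
    t2 = np.diag(np.exp(2j * np.pi * Q / N))
    return t1, t2


def translation(N, n1, n2):
    """T_N(n) = exp(pi*i*n1*n2/N) * t2**n2 * t1**n1 for integers n1, n2 >= 0, as in [57]."""
    t1, t2 = shift_and_phase(N)
    phase = np.exp(1j * np.pi * n1 * n2 / N)
    return phase * (np.linalg.matrix_power(t2, n2) @ np.linalg.matrix_power(t1, n1))


if __name__ == "__main__":
    N = 7
    t1, t2 = shift_and_phase(N)
    # Commutation relation implied by [58].
    print(np.allclose(t1 @ t2, np.exp(2j * np.pi / N) * (t2 @ t1)))
    # T_N(n) is unitary.
    T = translation(N, 2, 3)
    print(np.allclose(T.conj().T @ T, np.eye(N)))
```

From the same commutation relation one obtains $T_N(m) T_N(n) = e^{\pi i (m_1 n_2 - m_2 n_1)/N}\, T_N(m+n)$, which can be verified with `translation` in the same way.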

A quantization of $\Phi_A$ is a unitary operator $U_N(A)$ on $L^2(\mathbb{Z}/N\mathbb{Z})$ satisfying the equation

$$U_N(A)^{-1}\, \mathrm{Op}_N(f)\, U_N(A) = \mathrm{Op}_N(f \circ \Phi_A) \qquad [59]$$

for all $f \in C^\infty(\mathbb{T}^2)$. There are explicit formulas for $U_N(A)$ when $A$ is in the group

$$\Gamma = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z}) : ab \equiv cd \equiv 0 \bmod 2 \right\} \qquad [60]$$

These may be viewed as analogs of the Shale–Weil or metaplectic representation for SL(2). For example, the quantization of

$$A = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix} \qquad [61]$$

yields

$$U_N(A) \psi(Q) = N^{-1/2} \sum_{Q' \bmod N} \exp\!\left[ \frac{2\pi i}{N} \left( Q^2 - Q Q' + Q'^2 \right) \right] \psi(Q') \qquad [62]$$

In analogy with [1], we are interested in the statistical features of the eigenvalues and eigenfunctions of $U_N(A)$, that is, the solutions to

$$U_N(A) \varphi = \lambda \varphi, \qquad \|\varphi\|_{L^2(\mathbb{Z}/N\mathbb{Z})} = 1 \qquad [63]$$

Unlike typical quantum-chaotic maps, the statistics of the $N$ eigenvalues

$$\lambda_{N1}, \lambda_{N2}, \ldots, \lambda_{NN} \in S^1 \qquad [64]$$

do not follow the distributions of unitary random matrices in the limit $N \to \infty$, but are rather singular (Keating 1991). In analogy with the Selberg trace formula for hyperbolic surfaces [18], there is an exact trace formula relating sums over eigenvalues of $U_N(A)$ with sums over fixed points of the classical map (Keating 1991).

As in the case of arithmetic surfaces, the eigenfunctions of cat maps appear to behave more generically. The analog of the Schnirelman–Zelditch–Colin de Verdière theorem states that, for any orthonormal basis of eigenfunctions $\{\varphi_{Nj}\}_{j=1}^{N}$ we have, for all $f \in C^\infty(\mathbb{T}^2)$,

$$\langle \mathrm{Op}(f) \varphi_{Nj}, \varphi_{Nj} \rangle \to \int_{\mathbb{T}^2} f(\xi) \, d\xi \qquad [65]$$

as $N \to \infty$, for all $j$ in an index set $J_N$ of full density, that is, $\# J_N \sim N$. Kurlberg and Rudnick (see Rudnick (2001)) have characterized special bases of eigenfunctions $\{\varphi_{Nj}\}_{j=1}^{N}$ (termed Hecke eigenbases, in analogy with arithmetic surfaces) for which QUE holds, generalizing earlier work of Degli Esposti, Graffi, and Isola (1995). That is, [65] holds for all $j = 1, \ldots, N$. Rudnick and Kurlberg, and more recently Gurevich and Hadani, have established results on the rate of convergence analogous to [49]. These results are unconditional. Gurevich and Hadani use methods from algebraic geometry based on those developed by Deligne in his proof of the Weil conjectures (an analog of the Riemann hypothesis for finite fields).

In the case of quantum-cat maps, there are values of $N$ for which the number of coinciding eigenvalues can be large, a major difference to what is expected for the modular surface. Linear combinations of eigenstates with the same eigenvalue are as well eigenstates, and may lead to different quantum limits. Indeed, Faure, Nonnenmacher, and De Bièvre (see De Bièvre (to appear)) have shown that there are subsequences of values of $N$, so that, for all $f \in C^\infty(\mathbb{T}^2)$,

$$\langle \mathrm{Op}(f) \varphi_{Nj}, \varphi_{Nj} \rangle \to \frac{1}{2} \int_{\mathbb{T}^2} f(\xi) \, d\xi + \frac{1}{2} f(0) \qquad [66]$$

that is, half of the mass of the quantum limit localizes on the hyperbolic fixed point of the map. This is the first, and to date the only, rigorous result concerning the existence of scarred eigenfunctions in systems with chaotic classical limit.

Acknowledgment

The author is supported by an EPSRC Advanced Research Fellowship.

See also: Quantum Ergodicity and Mixing of Eigenfunctions; Random Matrix Theory in Physics.

Further Reading

Aurich R and Steiner F (1993) Statistical properties of highly excited quantum eigenstates of a strongly chaotic system. Physica D 64(1–3): 185–214.
Berry MV and Keating JP (1999) The Riemann zeros and eigenvalue asymptotics. SIAM Review 41(2): 236–266.
Bogomolny EB (2006) Quantum and arithmetical chaos. In: Cartier PE, Julia B, Moussa P, and Vanhove P (eds.) Frontiers in Number Theory, Physics and Geometry on Random Matrices, Zeta Functions, and Dynamical Systems, Springer Lecture Notes. Les Houches.
Bogomolny EB, Georgeot B, Giannoni M-J, and Schmit C (1997) Arithmetical chaos. Physics Reports 291(5–6): 219–324.
De Bièvre S Recent Results on Quantum Map Eigenstates, Proceedings of QMATH9, Giens 2004 (to appear).
Hejhal DA and Rackner BN (1992) On the topography of Maass waveforms for PSL(2, Z). Experiment. Math. 1(4): 275–305.
Keating JP (1991) The cat maps: quantum mechanics and classical motion. Nonlinearity 4(2): 309–341.
Asymptotic Structure and Conformal Infinity 221

Lindenstrauss E Rigidity of multi-parameter actions. Israel Journal of Mathematics (Furstenberg Special Volume) (to appear).
Marklof J (2006) Energy level statistics, lattice point problems and almost modular functions. In: Cartier PE, Julia B, Moussa P, and Vanhove P (eds.) Frontiers in Number Theory, Physics and Geometry on Random Matrices, Zeta Functions, and Dynamical Systems, Springer Lecture Notes. Les Houches.
Rudnick Z (2001) On quantum unique ergodicity for linear maps of the torus. In: European Congress of Mathematics, (Barcelona, 2000), Progr. Math., vol. 202, pp. 429–437. Basel: Birkhäuser.
Rudnick Z (2005) A central limit theorem for the spectrum of the modular group, Park city lectures. Annales Henri Poincaré 6: 863–883.
Sarnak P Arithmetic quantum chaos. The Schur lectures (1992) (Tel Aviv), Israel Math. Conf. Proc., 8, pp. 183–236. Bar-Ilan Univ., Ramat Gan, 1995.
Sarnak P (2003) Spectra of hyperbolic surfaces. Bulletin of the American Mathematical Society (N.S.) 40(4): 441–478.

Asymptotic Structure and Conformal Infinity


J Frauendiener, Universität Tübingen, Tübingen, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

A major motivation for studying the asymptotic structure of spacetimes has been the need for a rigorous description of what should be understood by an "isolated system" in Einstein's theory of gravity. As an example, consider a gravitating system somewhere in our universe (e.g., a galaxy, a cluster of galaxies, a binary system, or a star) evolving according to its own gravitational interaction, and possibly reacting to gravitational radiation impinging on it from the outside. Thereby it will emit gravitational radiation. We are interested in describing these waves because they provide us with important information about the physics governing the system.

To adequately describe this situation, we need to idealize the real situation in an appropriate way, since it is hopeless to try to analyze the behavior of the system in its interaction with the rest of the universe. We are mainly interested in the behavior of the system, and not so much in other processes taking place at large distances from the system. Since we would like to ignore those regions, we need a way to isolate the system from their influence.

The notion of an isolated system allows us to select individual subsystems of the universe and describe their properties regardless of the rest of the universe so that we can assign to each subsystem such physical attributes as its energy–momentum, angular momentum, or its emitted radiation field. Without this notion, we would always have to take into account the interaction of the system with its environment in full detail.

In general relativity (GR) it turns out to be a rather difficult task to describe an isolated system and the reason is, as always in Einstein's theory, the fact that the metric acts both as the physical field and as the background. In other theories, like electrodynamics, the physical field, such as the Maxwell field, is very different from the background field, the flat metric of Minkowski space. The fact that the metric in GR plays a dual role makes it difficult to extract physical meaning from the metric because there is no nondynamical reference point.

Imagine a system alone in the universe. As we recede from the system we would expect its influence to decrease. So we expect that the spacetime which models this situation mathematically will resemble the flat Minkowski spacetime and it will approximate it even better the farther away we go. This implies that one needs to impose fall-off conditions for the curvature and that the manifold will be asymptotically flat in an appropriate sense. However, there is the problem that fall-off conditions necessarily imply the use of coordinates and it is awkward to decide which coordinates should be "good ones." Thus, it is not clear whether the notion of an asymptotically flat spacetime is an invariant concept.

What is needed, therefore, is an invariant definition of asymptotically flat spacetimes. The key observation in this context is that "infinity" is far away with respect to the spacetime metric. This means that geodesics heading away from the system should be able to "run forever," that is, be defined for arbitrary values of their affine parameter $s$. "Infinity" will be reached for $s \to \infty$. However, suppose we do not use the spacetime metric $g$ but a metric $\hat{g}$ which is scaled down with respect to $g$, that is, in such a way that $\hat{g} = \Omega^2 g$ for some function $\Omega$. Then it might be possible to arrange $\Omega$ in such a way that geodesics for the metric $\hat{g}$ cover the same events (strictly speaking, this holds only for null geodesics, but this is irrelevant for the present plausibility argument) as those for the metric $g$ yet that their affine parameter $\hat{s}$ (which is also scaled down with respect to $s$) approaches a finite value $\hat{s}_0$ for $s \to \infty$. Then we could attach a boundary to the spacetime manifold consisting of all the limit points corresponding to the events with $\hat{s} = \hat{s}_0$ on the $\hat{g}$-geodesics.
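The plausibility argument can be made concrete with a short calculation. The following is a sketch under two assumptions that are not stated in the text: that the affine parameters of a null geodesic with respect to $g$ and $\hat{g} = \Omega^2 g$ are related by $d\hat{s} = \Omega^2 \, ds$ (a standard property of conformal rescalings, valid up to a constant factor), and that $\Omega$ falls off like $1/s$ along the geodesic. The starting value $s_1 > 0$ is arbitrary.

```latex
\hat{s}(s) = \hat{s}(s_1) + \int_{s_1}^{s} \Omega^2 \, ds'
           = \hat{s}(s_1) + \int_{s_1}^{s} \frac{ds'}{s'^2}
           = \hat{s}(s_1) + \frac{1}{s_1} - \frac{1}{s}
\;\xrightarrow[s \to \infty]{}\;
\hat{s}_0 := \hat{s}(s_1) + \frac{1}{s_1} < \infty
```

Under these assumptions the rescaled parameter stays bounded although $s$ itself runs to infinity, which is the behavior the construction requires.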

This boundary would have to be interpreted as "infinity" for the spacetime because it takes infinitely long for the $g$-geodesics to get there.

We arrived at this idea of attaching a boundary by considering the metric structure only "up to arbitrary scaling," that is, by looking at metrics which differ only by a factor. This is the conformal structure of the spacetime manifold in question. By considering the spacetime only from the point of view of its conformal structure we obtain a picture of the spacetime which is essentially finite but which leaves its causal properties unchanged, and hence in particular the properties of wave propagation. This is exactly what is needed for a rigorous treatment of radiation emitted by the system.

Infinity for Minkowski Spacetime

The above discussion suggests that we should consider the spacetime metric only up to scale, that is, to focus on the conformal structure of the spacetime in question. Since we are interested in systems which approach Minkowski spacetime at large distances from the source, it is illuminating to study Minkowski spacetime as a preliminary example. So consider the manifold $\mathbb{M} = \mathbb{R}^4$ equipped with the flat metric

$$g = dt^2 - dr^2 - r^2 \, d\sigma^2 \qquad [1]$$

where $r$ is the standard radial coordinate defined by $r^2 = x^2 + y^2 + z^2$ and

$$d\sigma^2 = d\theta^2 + \sin^2\theta \, d\phi^2$$

is the standard metric on the unit sphere $S^2$. We now introduce retarded and advanced time coordinates, which are adapted to the null cone and hence to the conformal structure of $g$ by the definition

$$u = t - r, \qquad v = t + r$$

and obtain the metric in the form

$$g = du \, dv - \tfrac{1}{4} (v - u)^2 \, d\sigma^2$$

The coordinates $u$ and $v$ both take arbitrary real values but they are restricted by the relation $v - u = 2r \ge 0$. In order to see what happens "at infinity," we introduce the coordinates $U$ and $V$ by the relations

$$u = \tan U, \qquad v = \tan V$$

Then $U$ and $V$ both take values in the open interval $(-\pi/2, \pi/2)$ with $V \ge U$ and the metric is transformed to

$$g = \frac{1}{4 \cos^2 U \cos^2 V} \left( 4 \, dU \, dV - \sin^2(V - U) \, d\sigma^2 \right) \qquad [2]$$

Clearly, the metric is undefined at events with $\cos U = 0$ or $\cos V = 0$. These would correspond to events with $u = \pm\infty$ or $v = \pm\infty$ which do not lie in $\mathbb{M}$. However, by defining the function

$$\Omega = 2 \cos U \cos V$$

we find that the metric $\hat{g} = \Omega^2 g$ with

$$\hat{g} = 4 \, dU \, dV - \sin^2(V - U) \, d\sigma^2 \qquad [3]$$

is conformally equivalent to $g$ and is regular for all values of $U$ and $V$ (keeping $V \ge U$). In fact, by defining the coordinates

$$T = V + U, \qquad R = V - U$$

this metric takes the form

$$\hat{g} = dT^2 - dR^2 - \sin^2 R \, d\sigma^2 \qquad [4]$$

the metric of the static Einstein universe $E$. Thus, we may regard the Minkowski spacetime as the part of the Einstein cylinder defined by restricting the coordinates $T$ and $R$ to the region $|T| + R < \pi$ as illustrated in Figure 1.

[Figure 1 The embedding of Minkowski spacetime into the Einstein cylinder. Labels in the figure: $i^+$, $\mathscr{I}^+$, $i^0$, $\mathscr{I}^-$, $i^-$.]

Although $\mathbb{M}$ can be considered as being diffeomorphic to the shaded part in Figure 1, these two manifolds are not isometric. This is obvious from considering the properties of the events lying on

the boundary $\partial\mathbb{M}$ of $\mathbb{M}$ in $E$. Fix a point $P$ inside $\mathbb{M}$ and follow a null geodesic with respect to the metric $\hat{g}$ from $P$ toward the future. It will intersect $\partial\mathbb{M}$ after a finite amount of its affine parameter has elapsed. When we follow a null geodesic with respect to $g$ from $P$ in the same direction, we find that it does not reach $\partial\mathbb{M}$ for any value of its affine parameter. Thus, the boundary is at infinity for the metric $g$ but at a finite location with respect to the metric $\hat{g}$. When we consider all possible kinds of geodesics for the metric $g$ we find that $\partial\mathbb{M}$ consists of five qualitatively different pieces. The future pointing timelike geodesics all approach the point $i^+$ given by $(T, R) = (\pi, 0)$, while the past-pointing geodesics approach $i^-$ with coordinates $(-\pi, 0)$. All spacelike geodesics come arbitrarily close to a point $i^0$ with coordinates $(0, \pi)$ (located on the front of the cylinder in Figure 1). Null geodesics, however, are different. For any point $(T, \pi - |T|)$ with $T \ne 0, \pm\pi$ on $\partial\mathbb{M}$ there are $g$-null-geodesics which come arbitrarily close.

In this sense, we may regard $\partial\mathbb{M}$ as consisting of limit points obtained by tracing $g$-geodesics for infinite values of their affine parameters. According to the causal character of the geodesics the set of their respective limit points is called future/past timelike infinity $i^\pm$, spacelike infinity $i^0$ or future/past null-infinity, denoted by $\mathscr{I}^\pm$. These two parts of null-infinity are three-dimensional regular submanifolds of the embedding manifold $E$, while the points $i^\pm$, $i^0$ are regular points in $E$ in the sense that the metric $\hat{g}$ is regular there. This is not automatic, considering the fact that infinitely many geodesics converge to a single point. However, the flatness of Minkowski spacetime guarantees that the geodesics approach at just the appropriate rate for the limit points to be regular.

This example shows that the structure of the boundary is determined entirely by the metric $g$ of Minkowski spacetime. If we had chosen a different function $\Omega' = \omega\Omega$ with $\omega > 0$ then we would not have obtained the Einstein cylinder but some different Lorentzian manifold $(M', g')$. Yet, the boundary of $\mathbb{M}$ in $M'$ would have had the same properties.

Asymptotically Flat Spacetimes

The physical idea of an isolated system is captured mathematically by an asymptotically flat spacetime. Since such a spacetime $M$ is expected to approach Minkowski spacetime asymptotically, the asymptotic structure of $M$ is also expected to be similar to that of $\mathbb{M}$. This expectation is expressed in

Definition 1 A spacetime $(M, g_{ab})$ is called "asymptotically simple" if there exists a manifold-with-boundary $\hat{M}$ with metric $\hat{g}_{ab}$ and scalar field $\Omega$ on $\hat{M}$ and boundary $\mathscr{I} = \partial\hat{M}$ such that the following conditions hold:
1. $M$ is the interior of $\hat{M}$: $M = \mathrm{int}\, \hat{M}$;
2. $\hat{g}_{ab} = \Omega^2 g_{ab}$ on $M$;
3. $\Omega$ and $\hat{g}_{ab}$ are smooth on all of $\hat{M}$;
4. $\Omega > 0$ on $M$; $\Omega = 0$, $\nabla_a \Omega \ne 0$ on $\mathscr{I}$; and
5. each null geodesic acquires both future and past endpoints on $\mathscr{I}$.

This definition formalizes the construction which was explicitly performed above, by which one attaches a regular (nonempty) boundary to a spacetime after suitably rescaling its metric. Asymptotically simple spacetimes are exactly those for which this process of conformal compactification is possible. The purpose of condition 5 is to exclude pathological cases. There are spacetimes which do not satisfy this condition (e.g., the Schwarzschild spacetime, where some of the null geodesics enter the event horizon and cannot escape to infinity). Yet, one would like to include them as being asymptotically simple in a sense, because they clearly describe isolated systems. For these cases, there exists the notion of weakly asymptotically simple spacetimes.

In order to arrive at asymptotically flat spacetimes, one needs to make certain assumptions about the behavior of the curvature near the boundary, thus:

Definition 2 An asymptotically simple spacetime is called "asymptotically flat" if its Ricci tensor $\mathrm{Ric}[g]$ vanishes in a neighborhood of $\mathscr{I}$.

Note that this definition imposes a rather strong restriction on the Ricci curvature; less restrictive assumptions are possible. This condition applies only near $\mathscr{I}$. Thus, it is possible to consider spacetimes which contain matter fields as long as these fields do not extend to infinity.

Other asymptotically simple spacetimes which are not asymptotically flat are the de Sitter and anti-de Sitter spacetimes which are solutions of the Einstein equations with nonvanishing cosmological constant $\lambda$. It is a simple consequence of the definition that the boundary $\mathscr{I}$ is a regular three-dimensional hypersurface of the embedding spacetime $\hat{M}$ which is timelike, spacelike, or null depending on the sign of $\lambda$. In particular, for the Minkowski spacetime ($\lambda = 0$) the boundary is necessarily a null hypersurface, as noted above.

The requirement that the vacuum Einstein equations hold near $\mathscr{I}$ has several important

consequences. First, $\mathscr{I}$ is a null hypersurface with the special property of being shear-free. This means that any cross section of a bundle of its null generators does not suffer any distortions when moved along the generators. Only expansion or contraction can occur. The global structure of $\mathscr{I}$ is the same as the one from the example above. Null infinity consists of two connected components, $\mathscr{I}^\pm$, each of which is diffeomorphic to $S^2 \times \mathbb{R}$. Thus, topologically, $\mathscr{I}^\pm$ are cylinders. The cone-like appearance as seen in Figure 1 is artificial. It depends on the particular conformal factor $\Omega$ chosen for the conformal compactification. Furthermore, it is only in very exceptional cases that the metric $\hat{g}$ is regular at $i^0$ or $i^\pm$.

The most important consequence, however, concerns the conformal Weyl tensor $C^a{}_{bcd}$. This is the part of the full Riemann curvature tensor $R^a{}_{bcd}$ which is trace-free. It is invariant under conformal rescalings of the metric. Thus, on $M$, $C^a{}_{bcd} = \hat{C}^a{}_{bcd}$. When the vanishing of the Ricci tensor near $\mathscr{I}$ is assumed then it turns out that the Weyl tensor necessarily vanishes on $\mathscr{I}$. This is the ultimate justification for calling such manifolds asymptotically flat because the entire curvature vanishes on $\mathscr{I}$.

Some Consequences

There are several consequences of the existence of the conformal boundary $\mathscr{I}$. They all can be traced back to the fact that this boundary can be used to separate the geometric fields into a universal background field and dynamical fields which propagate on it. The background is given by the boundary points attached to an asymptotically flat spacetime which always form a three-dimensional null hypersurface $\mathscr{I}$ with two connected components (in the sequel, we restrict our attention to $\mathscr{I}^+$ only; $\mathscr{I}^-$ is treated similarly), each with the topology of a cylinder. And in each case, $\mathscr{I}$ is shear-free.

The BMS Group

Since the structure of null-infinity is universal over all asymptotically flat spacetimes, it is obvious that its symmetry group should also possess a universal meaning. This group, the so-called Bondi–Metzner–Sachs (BMS) group is in many respects similar to the Poincaré group, the symmetry group of $\mathbb{M}$. It is the semidirect product of the Lorentz group with an abelian group which, however, is not the four-dimensional translation group but an infinite-dimensional group of supertranslations. This group is a normal subgroup, so the factor group is isomorphic to the Lorentz group.

In physical terms, the supertranslations arise because there are infinitely many directions from which observers at infinity (whose world lines coincide with the null generators of $\mathscr{I}$ in a certain limit) can observe the system and because each observer is free to choose its own origin of proper time $u$. The observers surrounding the system are not synchronized, because under the assumptions made there is no natural way to fix a unique common origin. Hence, a supertranslation is a shift of the parameter along each null generator of $\mathscr{I}^+$ corresponding to a change of origin for each individual observer. It can be given as a map $S^2 \to \mathbb{R}$. A choice of origin on each null generator of $\mathscr{I}^+$ is referred to as a "cut" of $\mathscr{I}^+$. It is a two-dimensional surface of spherical topology which intersects each null generator exactly once. It is an open question whether one can always synchronize the observers by imposing canonical conditions at $i^0$ or $i^\pm$, thereby reducing the BMS group to the smaller Poincaré group.

The supertranslations contain a unique four-dimensional normal subgroup. In $\mathbb{M}$ these special supertranslations are the ones which are induced by the translations of Minkowski spacetime in the following way. Take the future light cone of some event $P$ and follow it out to $\mathscr{I}^+$, where its intersection defines an origin for each observer located there. Now consider the light cone of another event $Q$ obtained from $P$ by a translation in a spatial direction. Then the light emitted from $Q$ will arrive at $\mathscr{I}^+$ earlier than that from $P$ for observers in the direction of the translation, while it will be delayed for observers in the opposite direction. This change in arrival time defines a specific supertranslation. Similarly, for a translation in a temporal direction, the light from $Q$ will arrive later than that from $P$ for all observers. Thus, every translation in $\mathbb{M}$ defines a particular supertranslation on $\mathscr{I}^+$. These can be characterized in a different way, which is intrinsic to $\mathscr{I}^+$ and which can be used in the general case even though there will be no Killing vectors present in a general asymptotically flat spacetime. In an appropriate coordinate system, the asymptotic translations are given as linear combinations of the first four spherical harmonics $Y_{00}$, $Y_{10}$, $Y_{1\pm1}$. The space of asymptotic translations $\mathcal{T}$ is in a natural way isometric to $\mathbb{M}$.

The Peeling Property

Now consider the Weyl tensor $C^a{}_{bcd}$ on $\hat{M}$. Since it vanishes on $\mathscr{I}$ where $\Omega = 0$ we may form the quotient

$$K^a{}_{bcd} = \Omega^{-1} C^a{}_{bcd}$$

which can be shown to be smooth on $\mathscr{I}^+$. The physical interpretation of this tensor field is based on the following properties. In source-free regions the field satisfies the spin-2 zero-rest-mass equation

$$\hat{\nabla}_a K^a{}_{bcd} = 0$$

which is very similar to the Maxwell equations for the electromagnetic (spin-1) Faraday tensor. Thus, $K^a{}_{bcd}$ is interpreted as the gravitational field, which describes the gravitational waves contained inside the system. The zero-rest-mass equation for $K^a{}_{bcd}$ and the fact that the field is smooth on $\mathscr{I}$ implies that the Weyl tensor satisfies the "peeling" property. This is a characteristic conspiracy between the fall-off behavior of certain components of the Weyl tensor along outgoing $g$-null-geodesics approaching $\mathscr{I}^+$ in $M$ with respect to an affine parameter $s$ for $s \to \infty$ and their algebraic type. Symbolically, the Weyl tensor has the following behavior as $s \to \infty$ along the null geodesic:

$$C = \frac{[4]}{s} + \frac{[31]}{s^2} + \frac{[211]}{s^3} + \frac{[1111]}{s^4} + O(s^{-5}) \qquad [5]$$

where the numerator of each component indicates its Petrov type. The repeated principal null direction (PND) in the first three components and one of the PNDs in the fourth component are aligned with the tangent vector of the geodesic. This implies that the farthest reaching component of the Weyl tensor, which is $O(1/s)$, has the Petrov type of a radiation field. It is customary to combine the components which are $O(1/s^i)$ into one complex function and denote it by $\psi_{5-i}$. When expressed in terms of the field $K^a{}_{bcd}$ on $\hat{M}$, this fall-off behavior implies that of all components of $K^a{}_{bcd}$ only $\psi_4$ does not necessarily vanish on $\mathscr{I}^+$.

In special cases like the Minkowski, Schwarzschild, Kerr, and more generally in all asymptotically flat stationary spacetimes, even $\psi_4$ vanishes on $\mathscr{I}^+$. For these reasons, $\psi_4$ is called the radiation field of the system, that is, that part of the gravitational field which can be registered by the observers at infinity. It describes the outgoing radiation which is being emitted by the system during its evolution.

The Bondi–Sachs Mass-Loss Formula

Gravitational waves carry away energy from the system. This is a consequence of the Bondi–Sachs mass-loss formula. The Bondi–Sachs energy–momentum is related to a weighted integral over a cut $C$,

$$P_C[W] = -\frac{1}{4\pi G} \int_C W \left[ \psi_2 + \sigma \dot{\bar{\sigma}} \right] d^2 S \qquad [6]$$

The quantity in brackets, the mass aspect, is a combination of the scalar $\psi_2$ which in a sense measures the strength of the Coulomb-like part of the gravitational field on $\mathscr{I}^+$ and the complex quantity $\sigma$. In a so-called Bondi coordinate system, this quantity is related to the radiation field $\psi_4$ by the relation

$$\psi_4 = -\ddot{\bar{\sigma}}$$

the dot indicating differentiation with respect to the affine parameter along the null generators. Thus, $\sigma$ is essentially the second time integral of the radiation field. The mass aspect is integrated against a function $W$ which is an asymptotic translation, that is, a linear combination of the first four spherical harmonics. Thus, one can view the expression [6] as defining a linear map $\mathcal{T} \to \mathbb{R}$. Since $\mathcal{T}$ and $\mathbb{M}$ are isometric this defines a covector $P_a$ on $\mathbb{M}$, which can always be shown to be timelike, $P_a P^a \ge 0$. This positivity property together with the fact that in the special cases of Schwarzschild and Kerr spacetimes the integral yields the mass parameters when evaluated for a time translation ($W = 1$) motivates the interpretation of $P_C$ as the energy–momentum 4-vector of the spacetime at the instant defined by the cut $C$. In particular, for $W = 1$ the integral gives the time component of $P_C$, the Bondi–Sachs energy $E$.

The interpretation of [6] as energy–momentum is strengthened by the fact that $P_C$ arises as dual to the translations which is familiar from Lagrangian field theories where energy and momentum appear as generators for time and space translations. In fact, one can set up a Hamiltonian framework where the role of the Bondi–Sachs energy–momentum as generator of asymptotic translations is made explicit.

This point of view suggests that one should also be able to define a notion of angular momentum for asymptotically flat spacetimes because angular momentum arises as the generator of rotations, which can also be defined asymptotically. However, while there is a unique notion of translation on $\mathscr{I}^+$, this is not the case for rotations (and boosts). The reason is hidden in the structure of the BMS group where the Lorentz group appears naturally as a factor group but not as a unique subgroup. In physical terms, the angular momentum depends on an origin but there is no natural way to choose an origin on $\mathscr{I}^+$. This ambiguity in the choice of origin leads to several nonequivalent expressions for angular momentum in the literature.

Consider now two cuts $C$ and $C'$, with $C'$ later than $C$. Then we may compute the difference $\Delta E = E - E'$ of the Bondi–Sachs energies with respect to the two
cuts. It turns out that this difference can be expressed as an integral over the (three-dimensional) piece $\Sigma$ of $\mathscr{I}^+$ which is bounded by the two cuts (i.e., $\partial\Sigma = C' - C$):
\[
E - E' = \frac{1}{4\pi G} \int_\Sigma \dot{\sigma}\, \dot{\bar{\sigma}}\, d^3V \tag*{[7]}
\]
This result means that the Bondi–Sachs energy of the system decreases, since $E' < E$, and the rate of decrease is given by the (positive-definite) amount of gravitational radiation which leaves the system during the period defined by the two cuts.

It is necessary to point out that in this article the structure of null infinity has been postulated based on physical reasoning. The Einstein equations have been used only in a very weak sense, namely only in a neighborhood of $\mathscr{I}$. It is an entirely different question whether the field equations are compatible with this postulated structure. To answer it, one needs to show that there are global solutions of the Einstein equations which exhibit the postulated behavior in the asymptotic region. This question has been settled recently in the affirmative: there are many global spacetimes which are asymptotically flat in the sense described here.

This article has discussed the notion of null infinity, that is, of spacetimes which are asymptotically flat in lightlike directions. Spacetimes which are asymptotically flat in spacelike directions have not been covered. The latter is a notion which has been developed largely independently of null infinity since it is essentially a property of an initial data set and not of the entire four-dimensional spacetime. Ultimately, these two notions should coincide, in the sense that if one has an initial data set which is asymptotically flat in spatial directions in an appropriate sense then its Cauchy development will be an asymptotically flat spacetime. However, as of yet, it is not clear what the appropriate conditions should be because the structure of the gravitational field in the neighborhood of spacelike infinity $i^0$ is not sufficiently well understood so far.

See also: Black Hole Mechanics; Boundaries for Spacetimes; Canonical General Relativity; Einstein Equations: Exact Solutions; Einstein Equations: Initial Value Formulation; General Relativity: Overview; Gravitational Waves; Quantum Entropy; Spacetime Topology, Causal Structure and Singularities; Stability of Minkowski Space; Stationary Black Holes.

Further Reading

Ashtekar A (1987) Asymptotic Quantization. Naples: Bibliopolis.
Bondi H, van der Burg MGJ, and Metzner AWK (1962) Gravitational waves in general relativity VII. Waves from axi-symmetric isolated systems. Proceedings of the Royal Society of London, Series A 269: 21–52.
Frauendiener J (2004) Conformal infinity. Living Reviews in Relativity, vol. 3. http://relativity.livingreviews.org/Articles/lrr-2004-1/index.html.
Friedrich H (1992) Asymptotic structure of space-time. In: Janis AI and Porter JR (eds.) Recent Advances in General Relativity. Boston: Birkhäuser.
Friedrich H (1998a) Einstein’s equation and conformal structure. In: Huggett SA, Mason LJ, Tod KP, Tsou SS, and Woodhouse NMJ (eds.) The Geometric Universe: Science, Geometry and the Work of Roger Penrose. Oxford: Oxford University Press.
Friedrich H (1998b) Gravitational fields near space-like and null infinity. Journal of Geometry and Physics 24: 83–163.
Geroch R (1977) Asymptotic structure of space-time. In: Esposito FP and Witten L (eds.) Asymptotic Structure of Space-Time. New York: Plenum.
Hawking S and Ellis GFR (1973) The Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Penrose R (1965) Zero rest-mass fields including gravitation: asymptotic behaviour. Proceedings of the Royal Society of London, Series A 284: 159–203.
Penrose R (1968) Structure of space-time. In: DeWitt CM and Wheeler JA (eds.) Battelle Rencontres. New York: W. A. Benjamin.
Penrose R and Rindler W (1984, 1986) Spinors and Space-Time. Cambridge: Cambridge University Press.
Sachs RK (1962) Gravitational waves in general relativity VIII. Waves in asymptotically flat space-time. Proceedings of the Royal Society of London, Series A 270: 103–127.

Averaging Methods
A I Neishtadt, Russian Academy of Sciences, Moscow, Russia
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Averaging methods are the methods of perturbation theory that are based on the averaging principle and the idea of dividing the dynamics into slow drift and fast oscillations. The most common field of applications of averaging methods is the analysis of the behavior of dynamical systems that differ from integrable systems by small perturbations.

Averaging Principle

Equations of motion of a system that differs from an integrable system by small perturbations often can be written in the form
\[
\dot{I} = \varepsilon g(I, \varphi, \varepsilon), \qquad \dot{\varphi} = \omega(I) + \varepsilon f(I, \varphi, \varepsilon)
\]
\[
I = (I_1, \ldots, I_n) \in \mathbb{R}^n, \qquad \varphi = (\varphi_1, \ldots, \varphi_m) \in T^m \ (\mathrm{mod}\ 2\pi), \qquad 0 < \varepsilon \ll 1 \tag*{[1]}
\]
The small parameter $\varepsilon$ characterizes the amplitude of the perturbation. For $\varepsilon = 0$ one gets the unperturbed system. The equation $I = \mathrm{const.}$ singles out an invariant $m$-dimensional torus of the unperturbed system. The motion on this torus is quasiperiodic with frequency vector $\omega(I)$; components of vector $I$ are called “slow variables” whereas components of vector $\varphi$ are called “fast variables” or “phases.” The right-hand sides of system [1] are $2\pi$-periodic with respect to all $\varphi_j$. It is assumed that they are smooth enough functions of all arguments. It is also assumed that components of the frequency vector are not linearly dependent over the ring of integer numbers identically with respect to $I$. System [1] is called a “system with rotating phases.”

In applications, one is often interested mainly in the behavior of slow variables. The “averaging principle” (or method) consists in replacing the system of perturbed equations [1] by the “averaged system”
\[
\dot{J} = \varepsilon G(J), \qquad G(J) = (2\pi)^{-m} \oint_{T^m} g(J, \varphi, 0)\, d\varphi \tag*{[2]}
\]
for the purpose of providing an approximate description of the evolution of the slow variables over time intervals of order $1/\varepsilon$ or longer. Here, $d\varphi = d\varphi_1 \cdots d\varphi_m$. System [2] contains only slow variables and, therefore, is much simpler for investigation than system [1]. When passing from system [1] to system [2], one ignores the terms $g(I, \varphi, 0) - G(I)$ on the right-hand side of [1]. The averaging principle is based on the idea that these terms oscillate and lead only to small oscillations which are superimposed on the drift described by the averaged system. To justify the averaging principle, one should establish a relation between the behavior of the solutions of systems [1] and [2]. This problem is still far from being completely solved.

Another version of the averaging principle is used in the case when frequencies are approximately in resonance. This means that one or several relations of the form $(k, \omega) = 0$ are approximately valid with irreducible integer coefficient vectors $k \neq 0$; here, $(k, \omega)$ is the standard scalar product in $\mathbb{R}^m$. Let $\Gamma$ be a sublattice of the integer lattice $\mathbb{Z}^m$ generated by these vectors. Let $r = \mathrm{rank}\, \Gamma$ and $k^{(1)}, k^{(2)}, \ldots, k^{(m)}$ be a basis in $\mathbb{Z}^m$, the first $r$ vectors of which belong to $\Gamma$. Instead of $\varphi$, one can introduce new variables:
\[
\vartheta = (\vartheta_1, \ldots, \vartheta_r) \in T^r \ (\mathrm{mod}\ 2\pi), \qquad \chi = (\chi_1, \ldots, \chi_{m-r}) \in T^{m-r} \ (\mathrm{mod}\ 2\pi)
\]
\[
\vartheta_i = (k^{(i)}, \varphi), \qquad \chi_j = (k^{(r+j)}, \varphi)
\]
Let $R$ be an $r \times m$ matrix whose rows are vectors $k^{(i)}$, $1 \le i \le r$. For an approximate description of the behavior of variables $I$, $\vartheta$, the averaging principle prescribes replacing system [1] by the system
\[
\dot{J} = \varepsilon G_\Gamma(J, \psi), \qquad \dot{\psi} = R\omega(J) + \varepsilon R F_\Gamma(J, \psi)
\]
\[
G_\Gamma(J, \vartheta) = (2\pi)^{-(m-r)} \oint_{T^{m-r}} g(J, \varphi, 0)\, d\chi, \qquad
F_\Gamma(J, \vartheta) = (2\pi)^{-(m-r)} \oint_{T^{m-r}} f(J, \varphi, 0)\, d\chi \tag*{[3]}
\]
(one should express $g$, $f$ through $\vartheta$, $\chi$ and then integrate over $\chi$, $d\chi = d\chi_1 \cdots d\chi_{m-r}$). System [3] is called the “partially averaged system” for resonances in $\Gamma$. Functions $G_\Gamma$, $F_\Gamma$ can be obtained from Fourier series expansions of functions $g$, $f$ for $\varepsilon = 0$ by throwing away harmonics $\exp(i(k, \varphi))$, $k \notin \Gamma$ (nonresonant harmonics). Passing from system [1] to system [3] is based on the idea that the ignored nonresonant harmonics oscillate fast and do not affect essentially the evolution of the slow variables.

Now let system [1] be a Hamiltonian system close to an integrable one. The Hamiltonian function has the form
\[
H = H_0(p) + \varepsilon H_1(p, \varphi, y, x, \varepsilon)
\]
where $\varphi$, $x$ are coordinates and $p$, $y$ are the momenta conjugate to them. The equations of motion have the same form as [1], with $I = (p, y, x)$:
\[
\dot{p} = -\varepsilon \frac{\partial H_1}{\partial \varphi}, \qquad
\dot{y} = -\varepsilon \frac{\partial H_1}{\partial x}, \qquad
\dot{x} = \varepsilon \frac{\partial H_1}{\partial y}, \qquad
\dot{\varphi} = \frac{\partial H_0}{\partial p} + \varepsilon \frac{\partial H_1}{\partial p} \tag*{[4]}
\]
The averaging principle in the case when there are no resonant relations leads to the system
\[
\dot{p} = 0, \qquad
\dot{y} = -\varepsilon \frac{\partial \mathcal{H}_1}{\partial x}, \qquad
\dot{x} = \varepsilon \frac{\partial \mathcal{H}_1}{\partial y}, \qquad
\mathcal{H}_1 = (2\pi)^{-m} \oint_{T^m} H_1(p, \varphi, y, x, 0)\, d\varphi \tag*{[5]}
\]
Therefore, in this case there is no drift in $p$, and the behavior of $y$, $x$ is described by the Hamiltonian system, which contains $p$ as a parameter. Equations of motion of planets around the Sun can be reduced to the form [4]. The issue of the absence of the evolution of momenta $p$ is known in this problem as
the Lagrange–Laplace theorem, about the absence of the evolution of semimajor axes of planetary orbits.

Elimination of Fast Variables, Decoupling of Slow and Fast Motions

The basic role in the averaging method is played by the idea that the exact system can be in the principal approximation transformed into the averaged system by means of a transformation of variables close to the identical one. The extension of this idea is the idea that similar transformation of variables allows one to eliminate, up to an arbitrary degree of accuracy, the fast phases from the right-hand sides of the equations of perturbed motion and in this way decouple the slow motion from the fast one.

For system [1], provided there are no resonant relations between frequencies, the elimination of fast variables is performed as follows. The desirable transformation of variables $(I, \varphi) \mapsto (J, \psi)$ is sought as a formal series
\[
I = J + \varepsilon u_1(J, \psi) + \varepsilon^2 u_2(J, \psi) + \cdots, \qquad
\varphi = \psi + \varepsilon v_1(J, \psi) + \varepsilon^2 v_2(J, \psi) + \cdots \tag*{[6]}
\]
where functions $u_j$, $v_j$ are $2\pi$-periodic in $\psi$. The transformation [6] should be chosen in such a way that in the new variables the right-hand sides of equations of motion do not contain fast variables, that is, the equations of motion should have the form
\[
\dot{J} = \varepsilon G_0(J) + \varepsilon^2 G_1(J) + \cdots, \qquad
\dot{\psi} = \omega(J) + \varepsilon F_0(J) + \varepsilon^2 F_1(J) + \cdots \tag*{[7]}
\]
Substituting [6] into [7], taking into account [1], and equating the terms of the same order in $\varepsilon$, we obtain the following set of relations:
\[
\begin{aligned}
G_0(J) &= g(J, \psi, 0) - \frac{\partial u_1}{\partial \psi}\, \omega \\
F_0(J) &= f(J, \psi, 0) + \frac{\partial \omega}{\partial J}\, u_1 - \frac{\partial v_1}{\partial \psi}\, \omega \\
G_i(J) &= X_i(J, \psi) - \frac{\partial u_{i+1}}{\partial \psi}\, \omega \\
F_i(J) &= Y_i(J, \psi) + \frac{\partial \omega}{\partial J}\, u_{i+1} - \frac{\partial v_{i+1}}{\partial \psi}\, \omega, \qquad i \ge 1
\end{aligned} \tag*{[8]}
\]
The functions $X_i$, $Y_i$ are uniquely determined by the terms $u_1, v_1, \ldots, u_i, v_i$ in expansion [6]. The first equation in [8] implies that
\[
G_0(J) = g_0(J) = G(J), \qquad
u_1(J, \psi) = \sum_{k \neq 0} \frac{g_k}{i(k, \omega)} \exp(i(k, \psi)) + u_1^0(J) \tag*{[9]}
\]
where $g_k$, $k \in \mathbb{Z}^m$, are Fourier coefficients of function $g$ at $\varepsilon = 0$, and $u_1^0$ is an arbitrary function of $J$. It is assumed that the denominators in [9] do not vanish, and that the series in [9] converges and determines a smooth function. In the same way, from the other equations in [8] one can sequentially determine $F_0, v_1, \ldots, G_i, u_{i+1}, F_i, v_{i+1}$, $i \ge 1$.

On truncating the series in [6] and [7] at the terms of order $\varepsilon^l$, we obtain a truncated system of the $l$th approximation. The equation for $J$ is decoupled from the other equations and can be solved separately. Then the behavior of $\psi$ is determined by means of quadrature. The behavior of original variable $I$ in this approximation is a slow drift (described by the equation for $J$), on which small oscillations (described by transformation of variables) are superimposed. The behavior of $\varphi$ can be represented as a rotation with slowly varying frequency, on which oscillations are also superimposed. For $l = 1$, the truncated system coincides with the averaged system [2].

If the sublattice $\Gamma \subset \mathbb{Z}^m$ specifying possible resonant relations is given, then in an analogous manner one can construct a formal transformation of variables $(I, \varphi) \mapsto (J, \psi)$ such that, in the new variables, the fast phase will appear on the right-hand sides of the differential equations for the new variables only in combinations $(k, \psi)$, with $k \in \Gamma$ (see, e.g., Arnol’d et al. (1988)). Again, on truncating the series on the right-hand sides of the differential equations for the new variables at the terms of order $\varepsilon^l$, we obtain a truncated system of the $l$th approximation. At $l = 1$, this truncated system coincides with the partially averaged system [3] (for some special choice of arbitrary functions that are contained in the formulas for transformation of variables). If the original system is a Hamiltonian system of the form [4], then the transformation of variables eliminating the fast phases from the right-hand sides of the differential equations can be chosen to be symplectic. The corresponding procedures are called “Lindstedt method” and “Newcomb method” (nonresonant case for $n = m$), “Delaunay method” (resonant case for $n = m$), and “von Zeipel method” (resonant case for $n \ge m$) (see Poincaré (1957) and Arnol’d et al. (1988)).

The calculation of high-order terms in the procedures of elimination of fast variables is rather cumbersome. There are versions of these procedures which are convenient for symbolic processors (especially for Hamiltonian systems, e.g., the Deprit–Hori method; Giacaglia 1972).

The averaging method consists in using the averaged system for the description of motion in the first approximation and the truncated systems
obtained by means of the procedures of elimination of fast variables in the higher approximations, together with the corresponding transformations of variables.

Justification of the Averaging Method

To justify the averaging method, one should establish conditions under which the deviation of the slow variables along the solutions of the exact system from the solutions of the averaged system with appropriate initial data on time intervals of order $1/\varepsilon$ or longer tends to 0 as $\varepsilon \to 0$. It is desirable to have estimates from above for these deviations. The estimates of deviations of the solutions of the exact system from the solutions of the truncated systems obtained by means of the procedure of elimination of fast phases are important as well. It can happen that there are “bad” initial data for which the slow component of the solution of the exact system deviates from the solution of the averaged system by a value of order 1 over time of order $1/\varepsilon$. In this case, one should have estimates from above for the measure of the set of such “bad” initial data; on the complementary set of initial data, one should have estimates from above for the deviation of slow variables along the solutions of the exact system from the solution of the averaged system. These problems are currently far from being completely solved. Some general results are described in the following.

Let functions $\omega$, $f$, $g$ on the right-hand side of system [1] be defined and bounded together with a sufficient number of derivatives in the domain $D\{I\} \times T^m\{\varphi\} \times [0, \varepsilon_0]$. Let $J(t)$ be the solution of the averaged system [2] with initial condition $I_0 \in D$. Let $(I(t), \varphi(t))$ be the solution of the exact system [1] with initial conditions $(I_0, \varphi_0)$. So, $I(0) = J(0)$. It is assumed that the solution $J(t)$ is defined and stays at a positive distance from the boundary of $D$ on the time interval $0 \le t \le K/\varepsilon$, $K = \mathrm{const} > 0$.

If system [1] is a one-frequency system ($m = 1$), and the frequency $\omega$ does not vanish in $D$, then for $0 \le t \le K/\varepsilon$ the solution $(I(t), \varphi(t))$ is well defined, and $|I(t) - J(t)| < C\varepsilon$, $C = \mathrm{const.} > 0$. For $\omega = 1$, this assertion was proved by P Fatou (1928) and, by a different method, by L I Mandel’shtam and L D Papaleksi (1934). This was historically the first result on the justification of the averaging method (Mitropol’skii 1971). There is a proof based on the elimination of fast variables (see, e.g., Arnol’d (1983)). For a one-frequency system, higher approximations of the procedure of elimination of fast variables allow the description of the dynamics with an accuracy of the order of any power in $\varepsilon$ on time intervals of order $1/\varepsilon$ (Bogolyubov and Mitropol’skii 1961).

If system [1] is a multifrequency system ($m \ge 2$), but the vector of frequencies is constant and nonresonant, then for any $\rho > 0$ and small enough $\varepsilon < \varepsilon_0(\rho)$ it holds that $|I(t) - J(t)| < \rho$ for $0 \le t \le K/\varepsilon$ (Bogolyubov 1945, Bogolyubov and Mitropol’skii 1961). If, in addition, the frequencies satisfy the Diophantine condition $|(k, \omega)| > \mathrm{const}\, |k|^{-\nu}$ for all $k \in \mathbb{Z}^m \setminus \{0\}$ and some $\nu > 0$, then one can choose $\rho = O(\varepsilon)$. In this case, higher approximations of the procedure of elimination of fast variables allow one to describe the dynamics with an accuracy of the order of any power in $\varepsilon$ on time intervals of order $1/\varepsilon$ (see, e.g., Arnol’d et al. (1988)).

If the system is a multifrequency system, and frequencies are not constant (but depend on the slow variables $I$), then due to the evolution of slow variables the frequencies themselves are evolving slowly. At certain time moments, they can satisfy certain resonant relations. One of the phenomena that can take place here is a capture into a resonance; this capture leads to a large deviation of the solutions of the exact and averaged systems. However, the general Anosov averaging theorem (Anosov 1960) implies that if the frequencies $\omega$ are nonresonant for almost all $I$, then for any $\rho > 0$, the inequality $|I(t) - J(t)| < \rho$ is satisfied for $0 \le t \le K/\varepsilon$ for all initial data outside a set $E(\rho, \varepsilon)$ whose measure tends to 0 as $\varepsilon \to 0$. In many cases, it turns out that $\mathrm{mes}\, E(\rho, \varepsilon) = O(\sqrt{\varepsilon}/\rho)$ (in particular, the sufficient condition for the last estimate is that $\mathrm{rank}(\partial\omega/\partial I) = m$) (Arnol’d et al. (1988)).

The knowledge about averaging in two-frequency systems ($m = 2$) on time intervals of order $1/\varepsilon$ is relatively more complete (see Arnol’d (1983), Arnol’d et al. (1988), and Lochak and Meunier (1988)). For Hamiltonian and reversible systems, the justification of the averaging method is a by-product of Kolmogorov–Arnold–Moser (KAM) theory. The KAM theory provides estimates of the difference between the solutions of the exact and averaged systems for the majority of initial data on the infinite time interval $-\infty < t < +\infty$. For the remaining data this difference can grow because of Arnol’d diffusion, but, in general, very slowly. According to the Nekhoroshev theorem, this difference is small on time intervals whose length grows exponentially when the perturbation decays linearly (for an analytic Hamiltonian if the unperturbed Hamiltonian is a generic function, the so-called steep function).

Another aspect of justification of the averaging method is establishing relations between invariant manifolds of the exact and averaged systems. Consider, in particular, the case of a one-frequency
system and a multifrequency system with constant Diophantine frequencies. Suppose that the averaged system has an equilibrium such that real parts of all its eigenvalues are different from 0, or a limit cycle such that the absolute values of all but one of its multipliers are different from 1. Then the exact system has an invariant torus, respectively, $m$- or $(m + 1)$-dimensional, whose projection onto the space of the slow variables is $O(\varepsilon)$-close to the equilibrium (cycle) of the averaged system. This torus is stable or unstable together with the equilibrium (cycle) of the averaged system. For Hamiltonian and reversible systems, the problem of invariant manifolds is considered in the framework of the KAM theory.

Averaging in Bogolyubov’s Systems

Systems in the standard form of Bogolyubov (1945) are of the form
\[
\dot{x} = \varepsilon X(t, x, \varepsilon), \qquad x \in \mathbb{R}^p, \qquad 0 < \varepsilon \ll 1 \tag*{[10]}
\]
It is assumed that the function $X$, besides the usual smoothness conditions, satisfies the condition of uniform average: the limit (time average)
\[
X_0(x) = \lim_{T \to \infty} \frac{1}{T} \int_0^T X(t, x, 0)\, dt \tag*{[11]}
\]
exists uniformly in $x$. The averaging principle of Bogolyubov consists of the replacement of the original system in standard form by the averaged system
\[
\dot{\xi} = \varepsilon X_0(\xi) \tag*{[12]}
\]
with a goal to provide an approximate description of the behavior of $x$. This approach generalizes the approach of the section “Averaging principle” for the case of constant frequencies ($\omega = \mathrm{const}$). Upon introducing in the given system with constant frequencies the deviation from uniform rotation $\psi = \varphi - \omega t$ and denoting $x = (I, \psi)$, we obtain a system in the standard form [10]. Here the condition of uniform average is fulfilled because $X(t, x, 0)$ is a quasiperiodic function of time $t$. The averaged system [12] for nonresonant frequencies coincides with the averaged system [2]; for resonant frequencies, it coincides with the partially averaged system [3] (one should only supply systems [2] and [3] with equations for some components of the vector $\varphi - \omega t$ that do not enter into the right-hand side of the averaged system).

The averaging principle of Bogolyubov is justified by three Bogolyubov theorems. According to the first theorem, if $\xi(t)$, $0 \le t \le K/\varepsilon$, is a solution of the averaged system, and $x(t)$ is a solution of the exact system with initial condition $x(0) = \xi(0)$, then for any $\rho > 0$ there exists $\varepsilon_0(\rho) > 0$ such that $|x(t) - \xi(t)| < \rho$ for $0 \le t \le K/\varepsilon$ and $0 < \varepsilon < \varepsilon_0(\rho)$. The second and the third Bogolyubov theorems describe the motion in the neighborhoods of equilibria and the limit cycles of the averaged system. In particular, if for an equilibrium real parts of all its eigenvalues are different from 0, or, for a limit cycle, the absolute values of all but one of its multipliers are different from 1, then the exact system has a solution which eternally stays near this equilibrium (cycle). The stability properties of this solution are the same as the stability properties of the corresponding equilibrium (cycle) of the averaged system.

For systems of the form [10] a procedure exists that, similarly to the procedure in the section “Elimination of fast variables, decoupling of slow and fast motions,” allows us to eliminate time $t$ from the right-hand side of the system with an accuracy of the order of any power in $\varepsilon$ by means of a transformation of variables. (To perform this procedure, one should assume that the conditions of uniform average are satisfied for functions that arise in the process of constructing higher approximations in this procedure (Bogolyubov and Mitropol’skii 1961).) In the first approximation, such a transformation of variables transforms the original system into the averaged one.

The condition of uniform average is very important for the theory. If the limit in [11] exists, but convergence is nonuniform in $x$, then the time average $X_0$ could be, for example, a discontinuous function of $x$, and the averaged system would not be well defined.

Averaging in Slow–Fast Systems

Systems of the form [1] are particular cases of the systems of the form
\[
\dot{x} = f(x, y, \varepsilon), \qquad \dot{y} = \varepsilon g(x, y, \varepsilon) \tag*{[13]}
\]
which are called “slow–fast systems” (or systems with slow and fast motions, with slow and fast variables). The generalization of the approach of the section “Averaging principle” for these systems is the following averaging principle of Anosov (1960). In the system [13], let $x \in M$, $y \in \mathbb{R}^n$, where $M$ is a smooth compact $m$-dimensional manifold. At $\varepsilon = 0$, the system for fast variables $x$ contains slow variables $y$ as parameters. Assume that this system (which is called the “fast system”) has a finite smooth
invariant measure $\mu_y$ and is ergodic for almost all values of $y$. Introduce the averaged system
\[
\dot{Y} = \varepsilon G(Y), \qquad G(Y) = \frac{1}{\mu_Y(M)} \int_M g(x, Y, 0)\, d\mu_Y
\]
According to the averaging principle, one should use the solution $Y(t)$ of the averaged system with initial condition $Y(0) = y(0)$ for approximate description of slow motion $y(t)$ in the original system. This averaging principle is justified by the following Anosov theorem (Anosov 1960): for any positive $\rho$ the measure of the set $E(\rho, \varepsilon)$ of initial data (from a compact in the phase space) such that
\[
\max_{0 \le t \le 1/\varepsilon} |y(t) - Y(t)| > \rho
\]
tends to 0 as $\varepsilon \to 0$.

The particular case when the original system is a Hamiltonian system depending on slowly varying parameter $\lambda = \varepsilon t$, and for almost all values of $\lambda$ the motion of the system with $\lambda = \mathrm{const}$ is ergodic on almost all energy levels, is considered in Kasuga (1961).

For the case when the fast system has strong mixing properties, see Bakhtin (2004) and Kifer (2004).

For slow–fast systems, there is also a generalization of the approach of the previous section that uses time averaging and the condition of uniform average (Volosov 1962).

Applications of the Averaging Method

The averaging method is one of the most productive methods of perturbation theory, and its applications are immense. It is widely used in celestial mechanics and space flight dynamics for the description of the evolution of motions of celestial bodies, in plasma physics and theory of accelerators for description of motion of charged particles, and in radio engineering for the description of nonlinear oscillatory regimes. There are also applications in hydrodynamics, physics of lasers, optics, acoustics, etc. (see Arnol’d et al. (1988), Bogolyubov and Mitropol’skii (1961), Lochak and Meunier (1988), Mitropol’skii (1971), and Volosov (1962)).

See also: Central Manifolds, Normal Forms; Diagrammatic Techniques in Perturbation Theory; Hamiltonian Systems: Stability and Instability Theory; KAM Theory and Celestial Mechanics; Multiscale Approaches; Random Walks in Random Environments; Separatrix Splitting; Stability Problems in Celestial Mechanics; Stability Theory and KAM.

Further Reading

Anosov DV (1960) Averaging in systems of ordinary differential equations with rapidly oscillating solutions. Izvestiya Akademii Nauk SSSR, Ser. Mat. 24(5): 721–742 (Russian).
Arnol’d VI (1983) Geometrical Methods in the Theory of Ordinary Differential Equations. New York–Berlin: Springer.
Arnol’d VI, Kozlov VV, and Neishtadt AI (1988) Mathematical Aspects of Classical and Celestial Mechanics, Encyclopaedia of Mathematical Sciences, vol. 3. Berlin: Springer.
Bakhtin VI (2004) Cramér asymptotics in the averaging method for systems with fast hyperbolic motions. Proceedings of the Steklov Institute of Mathematics 244(1): 79.
Bogolyubov NN (1945) On some statistical methods in mathematical physics. Akad. Nauk USSR. L’vov (Russian).
Bogolyubov NN and Mitropol’skii YuA (1961) Asymptotic Methods in the Theory of Nonlinear Oscillations. New York: Gordon and Breach.
Giacaglia GEO (1972) Perturbation Methods in Nonlinear Systems, Applied Mathematical Science, vol. 8. Berlin: Springer.
Kasuga T (1961) On the adiabatic theorem for the Hamiltonian system of differential equations in the classical mechanics I, II, III. Proceedings of the Japan Academy 37(7): 366–382.
Kevorkian J and Cole JD (1996) Multiple Scale and Singular Perturbations Methods, Applied Mathematical Sciences, vol. 114. New York: Springer.
Kifer Y (2004) Some recent advances in averaging. In: Modern Dynamical Systems and Applications, 403. Cambridge: Cambridge University Press.
Lochak P and Meunier P (1988) Multiphase Averaging for Classical Systems, Applied Mathematical Sciences, vol. 72. New York: Springer.
Mitropol’skii YuA (1971) Averaging Method in Nonlinear Mechanics. Kiev: Naukova Dumka (Russian).
Poincaré H (1957) Les Méthodes Nouvelles de la Mécanique Céleste, vols. 1–3. New York: Dover.
Sanders JA and Verhulst F (1985) Averaging Methods in Nonlinear Dynamical Systems, Applied Mathematical Sciences, vol. 59. New York: Springer.
Volosov VM (1962) Averaging in systems of ordinary differential equations. Russian Mathematical Surveys 17(6): 1–126.
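The one-frequency estimate $|I(t) - J(t)| < C\varepsilon$ on $0 \le t \le K/\varepsilon$, quoted in the section on justification of the averaging method, can be checked numerically on a toy problem. The following Python sketch is an illustration only: the right-hand side $g(I, \varphi) = -I(1 + \cos\varphi)$, the choices $\omega = 1$ and $f = 0$, the function names, and the step size are all assumptions made for this example and are not taken from the article.

```python
import math

OMEGA = 1.0  # assumed constant frequency (one-frequency case, m = 1)

def g(I, phi):
    """Assumed perturbation of the slow variable: g = -I (1 + cos(phi))."""
    return -I * (1.0 + math.cos(phi))

def G(J):
    """Phase average of g over one period, as in [2]; for this g it equals -J."""
    return -J

def rhs(eps, y):
    """Right-hand side for y = (I, phi, J): exact system [1] together with averaged system [2]."""
    I, phi, J = y
    return (eps * g(I, phi), OMEGA, eps * G(J))

def rk4_step(eps, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(eps, y)
    k2 = rhs(eps, tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
    k3 = rhs(eps, tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
    k4 = rhs(eps, tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                 for a, p, q, r, s in zip(y, k1, k2, k3, k4))

def max_deviation(eps, I0=1.0, phi0=0.0, K=1.0, h=0.01):
    """Largest |I(t) - J(t)| on 0 <= t <= K/eps, starting from I(0) = J(0) = I0."""
    y = (I0, phi0, I0)
    steps = int(round(K / (eps * h)))
    worst = 0.0
    for _ in range(steps):
        y = rk4_step(eps, y, h)
        worst = max(worst, abs(y[0] - y[2]))
    return worst

if __name__ == "__main__":
    for eps in (0.02, 0.01, 0.005):
        dev = max_deviation(eps)
        print(f"eps = {eps:<6} max |I - J| = {dev:.3e}   ratio to eps = {dev / eps:.3f}")
```

For this particular example the ratio of the maximal deviation to $\varepsilon$ stays roughly constant as $\varepsilon$ decreases, which is consistent with a deviation of order $\varepsilon$; the sketch is not meant as evidence for the general theorem.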
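The first step of the elimination procedure [6]–[9] for the averaging method can be illustrated in the same spirit. The sketch below is a hedged example under stated assumptions: the perturbation $g = -I(1 + \cos\varphi)$, the choices $\omega = 1$, $f = 0$, $u_1^0 = 0$, and all function names are hypothetical. For this $g$, formula [9] gives $u_1(J, \psi) = -J \sin\psi$, and since $f = 0$ and $\omega$ is constant one may take $\psi = \varphi$. The code compares $I$ with $J$ and with the corrected value $J + \varepsilon u_1(J, \psi)$.

```python
import math

def u1(J, psi):
    """First-order correction from [9] for the assumed g = -I (1 + cos(phi)), omega = 1, u1^0 = 0."""
    return -J * math.sin(psi)

def rhs(eps, y):
    """Right-hand side for y = (I, phi, J): assumed exact system and its averaged system."""
    I, phi, J = y
    return (-eps * I * (1.0 + math.cos(phi)), 1.0, -eps * J)

def rk4_step(eps, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(eps, y)
    k2 = rhs(eps, tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
    k3 = rhs(eps, tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
    k4 = rhs(eps, tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                 for a, p, q, r, s in zip(y, k1, k2, k3, k4))

def deviations(eps, I0=1.0, phi0=0.0, K=1.0, h=0.01):
    """Return (max |I - J|, max |I - (J + eps*u1(J, phi))|) on 0 <= t <= K/eps."""
    # Choose J(0) so that I0 = J0 + eps * u1(J0, phi0) holds at t = 0.
    J0 = I0 / (1.0 - eps * math.sin(phi0))
    y = (I0, phi0, J0)
    steps = int(round(K / (eps * h)))
    worst_plain = worst_corrected = 0.0
    for _ in range(steps):
        y = rk4_step(eps, y, h)
        I, phi, J = y
        worst_plain = max(worst_plain, abs(I - J))
        worst_corrected = max(worst_corrected, abs(I - (J + eps * u1(J, phi))))
    return worst_plain, worst_corrected

if __name__ == "__main__":
    for eps in (0.02, 0.01):
        plain, corrected = deviations(eps)
        print(f"eps = {eps}: |I - J| <= {plain:.3e}, with eps*u1 correction <= {corrected:.3e}")
```

In this example the corrected deviation scales like $\varepsilon^2$. That relies on a special feature of the assumed $g$, for which the averaged equation for $J$ needs no $\varepsilon^2$ correction; in general the equation for $J$ would have to be taken from the second-approximation truncation of [7] to reach this accuracy on time intervals of order $1/\varepsilon$.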

Axiomatic Approach to Topological Quantum Field Theory


C Blanchet, Université de Bretagne-Sud, Vannes, France
V Turaev, IRMA, Strasbourg, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The idea of topological invariants defined via path integrals was introduced by A S Schwartz (1977) in a special case and by E Witten (1988) in its full power. To formalize this idea, Witten (1988) introduced a notion of a topological quantum field theory (TQFT). Such theories, independent of Riemannian metrics, are rather rare in quantum physics. On the other hand, they admit a simple axiomatic description first suggested by M Atiyah (1989). This description was inspired by G Segal’s (1988) axioms for a two-dimensional conformal field theory. The axiomatic formulation of TQFTs makes them suitable for a purely mathematical research combining methods of topology, algebra, and mathematical physics. Several authors explored axiomatic foundations of TQFTs (see Quinn (1995) and Turaev (1994)).

Axioms of a TQFT

An $(n + 1)$-dimensional TQFT $(V, \tau)$ over a scalar field $k$ assigns to every closed oriented $n$-dimensional manifold $X$ a finite-dimensional vector space $V(X)$ over $k$ and assigns to every cobordism $(M, X, Y)$ a $k$-linear map
\[
\tau(M) = \tau(M, X, Y) : V(X) \to V(Y)
\]
Here a cobordism $(M, X, Y)$ between $X$ and $Y$ is a compact oriented $(n + 1)$-dimensional manifold $M$ endowed with a diffeomorphism $\partial M \cong \overline{X} \sqcup Y$ (the overline indicates the orientation reversal). All manifolds and cobordisms are supposed to be smooth. A TQFT must satisfy the following axioms.

1. Naturality Any orientation-preserving diffeomorphism of closed oriented $n$-dimensional manifolds $f : X \to X'$ induces an isomorphism $f_\sharp : V(X) \to V(X')$. For a diffeomorphism $g$ between the cobordisms $(M, X, Y)$ and $(M', X', Y')$, the following diagram is commutative:
\[
\begin{array}{ccc}
V(X) & \xrightarrow{\ (g|_X)_\sharp\ } & V(X') \\
\tau(M) \downarrow & & \downarrow \tau(M') \\
V(Y) & \xrightarrow{\ (g|_Y)_\sharp\ } & V(Y')
\end{array}
\]

2. Functoriality If a cobordism $(W, X, Z)$ is obtained by gluing two cobordisms $(M, X, Y)$ and $(M', Y', Z)$ along a diffeomorphism $f : Y \to Y'$, then the following diagram is commutative:
\[
\begin{array}{ccc}
V(X) & \xrightarrow{\ \tau(W)\ } & V(Z) \\
\tau(M) \downarrow & & \uparrow \tau(M') \\
V(Y) & \xrightarrow{\ f_\sharp\ } & V(Y')
\end{array}
\]

3. Normalization For any $n$-dimensional manifold $X$, the linear map
\[
\tau([0, 1] \times X) : V(X) \to V(X)
\]
is identity.

4. Multiplicativity There are functorial isomorphisms
\[
V(X \sqcup Y) \cong V(X) \otimes V(Y), \qquad V(\emptyset) \cong k
\]
such that the following diagrams are commutative:
\[
\begin{array}{ccc}
V((X \sqcup Y) \sqcup Z) & \cong & (V(X) \otimes V(Y)) \otimes V(Z) \\
\downarrow & & \downarrow \\
V(X \sqcup (Y \sqcup Z)) & \cong & V(X) \otimes (V(Y) \otimes V(Z))
\end{array}
\]
\[
\begin{array}{ccc}
V(X \sqcup \emptyset) & \cong & V(X) \otimes k \\
\downarrow & & \downarrow \\
V(X) & = & V(X)
\end{array}
\]
Here $\otimes = \otimes_k$ is the tensor product over $k$. The vertical maps are respectively the ones induced by the obvious diffeomorphisms, and the standard isomorphisms of vector spaces.

5. Symmetry The isomorphism
\[
V(X \sqcup Y) \cong V(Y \sqcup X)
\]
induced by the obvious diffeomorphism corresponds to the standard isomorphism of vector spaces
\[
V(X) \otimes V(Y) \cong V(Y) \otimes V(X)
\]

Given a TQFT $(V, \tau)$, we obtain an action of the group of diffeomorphisms of a closed oriented $n$-dimensional manifold $X$ on the vector space $V(X)$. This action can be used to study this group.

An important feature of a TQFT $(V, \tau)$ is that it provides numerical invariants of compact oriented $(n + 1)$-dimensional manifolds without boundary. Indeed, such a manifold $M$ can be considered as a cobordism between two copies of $\emptyset$ so that $\tau(M) \in \mathrm{Hom}_k(k, k) = k$. Any compact oriented $(n + 1)$-dimensional manifold $M$ can be considered as a
cobordism between $\emptyset$ and $\partial M$; the TQFT assigns to this cobordism a vector $\tau(M)$ in $\mathrm{Hom}_k(k, V(\partial M)) = V(\partial M)$ called the vacuum vector.

The manifold $[0, 1] \times X$, considered as a cobordism from $\overline{X} \sqcup X$ to $\emptyset$, induces a nonsingular pairing

\[ V(\overline{X}) \otimes V(X) \to k. \]

We obtain a functorial isomorphism $V(\overline{X}) = V(X)^* = \mathrm{Hom}_k(V(X), k)$.

We now outline definitions of several important classes of TQFTs.

If the scalar field $k$ has a conjugation and all the vector spaces $V(X)$ are equipped with natural nondegenerate Hermitian forms, then the TQFT $(V, \tau)$ is Hermitian. If $k = \mathbb{C}$ is the field of complex numbers and the Hermitian forms are positive definite, then the TQFT is unitary.

A TQFT $(V, \tau)$ is nondegenerate or cobordism generated if for any closed oriented $n$-dimensional manifold $X$, the vector space $V(X)$ is generated by the vacuum vectors derived as above from the manifolds bounded by $X$.

Fix a Dedekind domain $D \subset \mathbb{C}$. A TQFT $(V, \tau)$ over $\mathbb{C}$ is almost $D$-integral if it is nondegenerate and there is $d \in \mathbb{C}$ such that $d\,\tau(M) \in D$ for all $M$ with $\partial M = \emptyset$. Given an almost integral TQFT $(V, \tau)$ and a closed oriented $n$-dimensional manifold $X$, we define $S(X)$ to be the $D$-submodule of $V(X)$ generated by all the vacuum vectors. This module is preserved under the action of self-diffeomorphisms of $X$ and yields a finer "arithmetic" version of $V(X)$.

The notion of an $(n+1)$-dimensional TQFT over $k$ can be reformulated in the categorical language as a symmetric monoidal functor from the category of $n$-manifolds and $(n+1)$-cobordisms to the category of finite-dimensional vector spaces over $k$. The source category is called the $(n+1)$-dimensional cobordism category. Its objects are closed oriented $n$-dimensional manifolds. Its morphisms are cobordisms considered up to the following equivalence: cobordisms $(M, X, Y)$ and $(M', X, Y)$ are equivalent if there is a diffeomorphism $M \to M'$ compatible with the diffeomorphisms $\partial M \cong \overline{X} \sqcup Y \cong \partial M'$.

TQFTs in Low Dimensions

TQFTs in dimension $0 + 1 = 1$ are in one-to-one correspondence with finite-dimensional vector spaces. The correspondence goes by associating with a one-dimensional TQFT $(V, \tau)$ the vector space $V(\mathrm{pt})$ where $\mathrm{pt}$ is a point with positive orientation.

Let $(V, \tau)$ be a two-dimensional TQFT. The linear map $\tau$ associated with a pair of pants (a 2-disk with two holes considered as a cobordism between two circles $S^1 \sqcup S^1$ and one circle $S^1$) defines a commutative multiplication on the vector space $A = V(S^1)$. The 2-disk, considered as a cobordism between $S^1$ and $\emptyset$, induces a nondegenerate trace on the algebra $A$. This makes $A$ into a commutative Frobenius algebra (also called a symmetric algebra). This algebra completely determines the TQFT $(V, \tau)$. Moreover, this construction defines a one-to-one correspondence between equivalence classes of two-dimensional TQFTs and isomorphism classes of finite-dimensional commutative Frobenius algebras (Kock 2003).

The formalism of TQFTs was to a great extent motivated by the three-dimensional case, specifically, Witten's Chern–Simons TQFTs. A mathematical definition of these TQFTs was first given by Reshetikhin and Turaev using the theory of quantum groups. The Witten–Reshetikhin–Turaev three-dimensional TQFTs do not satisfy exactly the definition above: the naturality and the functoriality axioms only hold up to invertible scalar factors called framing anomalies. Such TQFTs are said to be projective. In order to get rid of the framing anomalies, one has to add extra structures on the three-dimensional cobordism category. Usually one endows surfaces $X$ with Lagrangians (maximal isotropic subspaces in $H_1(X; \mathbb{R})$). For 3-cobordisms, several competing – but essentially equivalent – additional structures are considered in the literature: 2-framings (Atiyah 1989), $p_1$-structures (Blanchet et al. 1995), numerical weights (K Walker, V Turaev).

Large families of three-dimensional TQFTs are obtained from the so-called modular categories. The latter are constructed from quantum groups at roots of unity or from the skein theory of links. See Quantum 3-Manifold Invariants.

Additional Structures

The axiomatic definition of a TQFT extends in various directions. In dimension 2 it is interesting to consider the so-called open–closed theories involving 1-manifolds formed by circles and intervals and two-dimensional cobordisms with boundary (G Moore, G Segal). In dimension 3 one often considers cobordisms including framed links and graphs whose components (resp. edges) are labeled with objects of a certain fixed category $\mathcal{C}$. In such a theory, surfaces are endowed with finite sets of points labeled with objects of $\mathcal{C}$ and enriched with tangent directions. In all dimensions one can study manifolds and cobordisms endowed with homotopy classes of mappings to a fixed space (homotopy quantum field theory, in the sense of Turaev). Additional structures on the tangent bundles – spin

structures, framings, etc. – may be also considered provided the gluing is well defined.

See also: Braided and Modular Tensor Categories; Hopf Algebras and q-Deformation Quantum Groups; Indefinite Metric; Quantum 3-Manifold Invariants; Topological Gravity, Two-Dimensional; Topological Quantum Field Theory: Overview.

Further Reading

Atiyah M (1989) Topological quantum field theories. Publications Mathématiques de l'IHÉS 68: 175–186.
Bakalov B and Kirillov A Jr. (2001) Lectures on Tensor Categories and Modular Functors. University Lecture Series, vol. 21. Providence, RI: American Mathematical Society.
Blanchet C, Habegger N, Masbaum G, and Vogel P (1995) Topological quantum field theories derived from the Kauffman bracket. Topology 34: 883–927.
Kock J (2003) Frobenius Algebras and 2D Topological Quantum Field Theories. LMS Student Texts, vol. 59. Cambridge: Cambridge University Press.
Quinn F (1995) Lectures on axiomatic topological quantum field theory. In: Freed DS and Uhlenbeck KK (eds.) Geometry and Quantum Field Theory, pp. 325–453. IAS/Park City Mathematical Series, University of Texas, Austin: American Mathematical Society.
Segal G (1988) Two-dimensional conformal field theories and modular functors. In: Simon B, Truman A, and Davies IM (eds.) IXth International Congress on Mathematical Physics, pp. 22–37. Bristol: Adam Hilger Ltd.
Turaev V (1994) Quantum Invariants of Knots and 3-Manifolds. de Gruyter Studies in Mathematics, vol. 18. Berlin: Walter de Gruyter.
Witten E (1988) Topological quantum field theory. Communications in Mathematical Physics 117(3): 353–386.
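
As a concrete illustration of the two-dimensional case described under "TQFTs in Low Dimensions", the following Python sketch computes the numbers that a two-dimensional TQFT assigns to closed oriented surfaces, starting from a commutative Frobenius algebra. It is a minimal sketch under stated assumptions and is not part of the original article: the algebra is given by structure constants `mult[i][j][k]` in a chosen basis together with the trace `eps`; all function names are hypothetical; and the genus-g value is computed by the standard formula eps(h^g), where h is the handle element built from the inverse of the pairing (cf. Kock 2003). The example algebra, the group algebra of Z/n with the trace that picks out the coefficient of the identity, is chosen only for illustration.

```python
from fractions import Fraction


def pairing(mult, eps):
    """Matrix of the bilinear pairing eta[i][j] = eps(e_i * e_j)."""
    n = len(eps)
    return [[sum(mult[i][j][k] * eps[k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse(matrix):
    """Gauss-Jordan inverse over the rationals; raises ValueError if singular."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("degenerate pairing: not a Frobenius algebra")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiply(mult, a, b):
    """Product of two algebra elements given as coefficient lists."""
    n = len(a)
    return [sum(a[i] * b[j] * mult[i][j][k] for i in range(n) for j in range(n))
            for k in range(n)]


def handle_element(mult, eps):
    """h = sum_{i,j} (eta^{-1})[i][j] e_i e_j; gluing one handle multiplies by h."""
    n = len(eps)
    eta_inv = inverse(pairing(mult, eps))
    return [sum(eta_inv[i][j] * mult[i][j][k] for i in range(n) for j in range(n))
            for k in range(n)]


def closed_surface_invariant(mult, eps, unit, genus):
    """Value eps(h^genus) assigned to the closed oriented surface of that genus."""
    h = handle_element(mult, eps)
    v = list(unit)
    for _ in range(genus):
        v = multiply(mult, v, h)
    return sum(e * x for e, x in zip(eps, v))


def cyclic_group_algebra(n):
    """Illustrative example: group algebra of Z/n, trace = coefficient of identity."""
    mult = [[[int(k == (i + j) % n) for k in range(n)] for j in range(n)]
            for i in range(n)]
    eps = [int(k == 0) for k in range(n)]
    unit = [int(k == 0) for k in range(n)]
    return mult, eps, unit
```

For this example algebra the handle element is n times the unit, so `closed_surface_invariant(*cyclic_group_algebra(3), 2)` evaluates to 9. For the torus (genus 1) the value is n, the dimension of the algebra, as expected for the vector space V(S^1).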

Axiomatic Quantum Field Theory


B Kuckert, Universität Hamburg, Hamburg, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The term "axiomatic quantum field theory" subsumes a collection of research branches of quantum field theory analyzing the general principles of relativistic quantum physics. The content of the results typically is structural and retrospective rather than quantitative and predictive.

The first axiomatic activities in quantum field theory date back to the 1950s, when several groups started investigating the notion of scattering and S-matrix in detail (Lehmann, Symanzik, and Zimmermann 1955 (LSZ-approach), Bogoliubov and Parasiuk 1957, Hepp and Zimmermann (BPHZ-approach), Haag 1957–59 and Ruelle 1962 (Haag–Ruelle theory)) (see Scattering, Asymptotic Completeness and Bound States and Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools).

Wightman (1956) analyzed the properties of the vacuum expectation values used in these approaches and formulated a system of axioms that the vacuum expectation values ought to satisfy in general. Together with Gårding (1965), he later formulated a system of axioms in order to characterize general quantum fields in terms of operator-valued functionals, and the two systems have been found to be equivalent.

A couple of spectacular theorems such as the PCT theorem and the spin–statistics theorem have been obtained in this setting, but no interacting quantum fields satisfying the axioms have been found so far (in $1+3$ spacetime dimensions). So, the development of alternatives and modifications of the setting got into the focus of the theory, and the axioms themselves became the objects of research. Their role as axioms – understood in the common sense – turned into the role of mere properties of quantum fields. Today, the term "axiomatic quantum field theory" is widely avoided for this reason.

In a long list of publications spread over the 1960s, Araki, Borchers, Haag, Kastler, and others worked out an algebraic approach to quantum field theory in the spirit of Segal's "Postulates for general quantum mechanics" (1947) (see Algebraic Approach to Quantum Field Theory).

The Wightman setting was the basis of a framework into which the causal construction of the S-matrix developed by Stückelberg (1951) and Bogoliubov and Shirkov (1959) has been fitted by Epstein and Glaser (1973). The causality principle fixes the time-ordered products up to a finite number of parameters at each order, which are to be put in as the renormalization constants.

Already in 1949, Dyson had seen that problems in the formulation of quantum electrodynamics (QED) could be avoided by "just" multiplying the time variable and, correspondingly, the energy variable by the imaginary unit constant ("Wick rotation"). Schwinger then investigated time-ordered Green functions of QED in this Euclidean setting. This approach was formulated in terms of axioms by Osterwalder and Schrader (1973, 1975) (see Euclidean Field Theory).

Other extensions of the aforementioned settings are objects of current research (see Indefinite Metric,

Quantum Field Theory in Curved Spacetime, Symmetries in Quantum Field Theory of Lower Spacetime Dimensions, and Thermal Quantum Field Theory).

Quantum Fields

Gårding and Wightman characterized operator-valued quantum fields on the Minkowski spacetime $\mathbb{R}^{1+3}$ by a couple of axioms. Given additional assumptions concerning the high-energy behavior, the Gårding–Wightman fields are in one–one correspondence with algebraic field theories.

Without specifying or presupposing these additional assumptions, the axioms will now be formulated and discussed in detail and compared to the corresponding conditions in the algebraic setting. Adjoint operators are marked by an asterisk, and Einstein's summation convention is used.

Operator-valued functionals. The components of a field $F$ are an $n$-tuple $F_1, \dots, F_n$ of linear maps that assign to each test function $\varphi \in C_0^\infty(\mathbb{R}^{1+3})$ linear operators $F_1(\varphi), \dots, F_n(\varphi)$ in a Hilbert space $\mathcal{H}$ with domains of definition $D(F_1(\varphi)), \dots, D(F_n(\varphi))$. There exists a dense subspace $\mathcal{D}$ of $\mathcal{H}$ with $\mathcal{D} \subset D(F_\nu(\varphi)) \cap D(F_\nu(\varphi)^*)$ and $F_\nu(\varphi)\mathcal{D} \cup F_\nu(\varphi)^*\mathcal{D} \subset \mathcal{D}$ for all indices $\nu$. Consider $m$ such fields $F^1, \dots, F^m$ with components $F^a_\nu$, $1 \le a \le m$, $1 \le \nu \le n_a$. Assume there to be an involution $* : (1, \dots, m) \to (1, \dots, m)$ such that $F^{a^*}_\nu(\varphi) = F^a_\nu(\overline{\varphi})^*$, where $\overline{\varphi}(x) := \overline{\varphi(x)}$.

Quantum fields cannot be operator-valued functions on $\mathbb{R}^{1+3}$ if one wants them to exhibit (part of) the properties to follow. But point fields can be quadratic forms; typically this is the case for fields in a Fock space.

For each component $F^a_\nu$ and each open region $\mathcal{O} \subset \mathbb{R}^{1+3}$, the field operators $F^a_\nu(\varphi)$ with $\mathrm{supp}\,\varphi \subset \mathcal{O}$ generate a $*$-algebra $\mathcal{F}^a(\mathcal{O})$ of operators defined on $\mathcal{D}$. These operators typically are unbounded, which is one of the differences with the traditional setting of the algebraic approach. There a $C^*$-algebra $\mathcal{A}(\mathcal{O})$ is assigned to each open region $\mathcal{O}$ in such a way that $\mathcal{O} \subset \mathcal{P}$ implies $\mathcal{A}(\mathcal{O}) \subset \mathcal{A}(\mathcal{P})$. Each $C^*$-algebra is a $*$-algebra, but in contrast to a $C^*$-algebra, a $*$-algebra does not need to be endowed with a norm. The fundamental observables in quantum theory are bounded positive operators (typically, but not always, projections), and these generate a $C^*$-algebra.

There is no fundamental physical motivation for confining the setting to fields with a finite number of components, except that it includes most of the fields known from "daily life."

Continuity as a distribution. For all $\Phi, \Psi \in \mathcal{D}$, the linear functionals $T^a_{\Phi,\Psi,\nu}$ on $C_0^\infty(\mathbb{R}^{1+3})$ defined by

\[ T^a_{\Phi,\Psi,\nu}(\varphi) := \langle \Phi, F^a_\nu(\varphi)\Psi \rangle \]

are distributions. They can be extended to tempered distributions.

The Fourier transform of a tempered distribution is well defined as a tempered distribution. It is mainly due to the importance of Fourier transformations that the preceding assumption is convenient. Bogoliubov et al. (1975) remark that the assumption is not a mere technicality, since it rules out nonrenormalizable quantum fields.

Microcausality (Bose–Fermi alternative). If $\varphi$ and $\psi$ are test functions with spacelike separated support, then

\[ F^a_\nu(\varphi) F^b_\mu(\psi)|_{\mathcal{D}} = \pm F^b_\mu(\psi) F^a_\nu(\varphi)|_{\mathcal{D}}. \]

The sign depends on the statistics of the fields: it is "$-$" if and only if both $F^a$ and $F^b$ are fermion fields.

Microcausality is closely related to Einstein causality. Einstein causality requires that any two observables located in spacelike separated regions commute in the strong sense, that is, their spectral measures commute. But fields with Fermi–Dirac statistics are not observables, and not even for Bose–Einstein fields with self-adjoint field operators does the above condition imply that the spectral projections commute, which is the criterion for commensurability. The sign on the right-hand side does, however, specify the statistics of the field.

This is a crucial difference with the algebraic approach. If $\mathcal{O}$ and $\mathcal{P}$ are spacelike separated open regions and if $A \in \mathcal{A}(\mathcal{O})$ and $B \in \mathcal{A}(\mathcal{P})$, then one assumes, like in the above case, that $AB = BA$ (locality). But being elements of $C^*$-algebras, $A$ and $B$ are bounded operators (or can be represented accordingly), so if $A$ and $B$ are self-adjoint, they are, indeed, commensurable.

Doplicher, Haag, and Roberts (1974) and Buchholz and Fredenhagen (1984) have derived from this input of observables a field structure of localized particle states, and they showed that the statistics of these fields is Bose–Einstein, Fermi–Dirac, or some corresponding parastatistics (which is, a priori, forbidden if one assumes microcausality).

Recall that the unimodular group $SL(2, \mathbb{C})$ is isomorphic to the universal covering group of the restricted Lorentz group $L_+^\uparrow$ (the connected component containing the unit element). Denote by $\Lambda : SL(2, \mathbb{C}) \to L_+^\uparrow$ a covering map.
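
One explicit choice of such a covering map can be checked numerically. The following Python sketch assumes a common convention that the text itself does not fix: a point x of Minkowski spacetime is identified with the Hermitian 2×2 matrix X = x^0 σ_0 + x^1 σ_1 + x^2 σ_2 + x^3 σ_3 (σ_0 the identity, σ_1, σ_2, σ_3 the Pauli matrices), and g in SL(2, C) acts by X ↦ g X g†. Since det X = (x^0)^2 − (x^1)^2 − (x^2)^2 − (x^3)^2 and det g = 1, this action preserves the Minkowski form. The function names are hypothetical and chosen for illustration only.

```python
import math

# sigma_0 (identity) and the three Pauli matrices, as nested lists.
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def lorentz_from_sl2c(g):
    """4x4 matrix with entries (1/2) tr(sigma_mu g sigma_nu g^dagger).

    Assumed convention: X -> g X g^dagger on Hermitian matrices X = x^mu sigma_mu.
    """
    gd = dagger(g)
    lam = [[0.0] * 4 for _ in range(4)]
    for mu in range(4):
        for nu in range(4):
            m = matmul(matmul(SIGMA[mu], g), matmul(SIGMA[nu], gd))
            # The trace of a product of two Hermitian matrices is real.
            lam[mu][nu] = (0.5 * (m[0][0] + m[1][1])).real
    return lam


def preserves_minkowski_form(lam, tol=1e-12):
    """Check lam^T eta lam = eta for eta = diag(1, -1, -1, -1)."""
    eta = [1.0, -1.0, -1.0, -1.0]
    for mu in range(4):
        for nu in range(4):
            s = sum(lam[r][mu] * eta[r] * lam[r][nu] for r in range(4))
            expected = eta[mu] if mu == nu else 0.0
            if abs(s - expected) > tol:
                return False
    return True
```

With this convention, the diagonal matrix g = diag(e^{χ/2}, e^{−χ/2}) is sent to the boost with rapidity χ along the third axis, and g and −g are sent to the same Lorentz matrix, which reflects the two-to-one nature of the covering.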

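Covariance. There exist strongly continuous unitary representations $U$ and $T$ of $SL(2, \mathbb{C})$ and $(\mathbb{R}^{1+3}, +)$, respectively, and representations $D^1, \dots, D^m$ of $SL(2, \mathbb{C})$ in $\mathbb{C}^{n_1}, \dots, \mathbb{C}^{n_m}$, respectively, such that

\[ U(g) F^a_\nu(\varphi) U(g)^* = D^a(g^{-1})_\nu{}^\mu \, F^a_\mu(\varphi(\Lambda(g)^{-1}\,\cdot\,)) \]

and

\[ T(y) F^a_\nu(\varphi) T(y)^* = F^a_\nu(\varphi(\cdot - y)), \]

where $D^a(g^{-1})_\nu{}^\mu$ are the elements of the matrix $D^a(g^{-1})$. Dropping coordinate indices, this reads

\[ U(g) F^a(\varphi) U(g)^* = D^a(g^{-1}) F^a(\varphi(\Lambda(g)^{-1}\,\cdot\,)) \]

and

\[ T(y) F^a(\varphi) T(y)^* = F^a(\varphi(\cdot - y)). \]

The representations $U$ and $T$ generate a representation of the universal covering of the restricted Poincaré group.

As it stands, this assumption is a very strong one, since it manifestly fixes the action of the representation on the field operators. In the algebraic approach, the covariance assumption is more modestly formulated. Namely, it is assumed that $U(g)\mathcal{A}(\mathcal{O})U(g)^* = \mathcal{A}(\Lambda(g)\mathcal{O})$ and $T(y)\mathcal{A}(\mathcal{O})T(y)^* = \mathcal{A}(\mathcal{O} + y)$, leaving open how the representation acts on the single local observables.

Vacuum vector. There exists a unique (up to a multiple) vector $\Omega \in \mathcal{D}$ that is invariant under the representations $U$ and $T$ and cyclic with respect to the algebra $\mathcal{F}(\mathbb{R}^{1+3})$ generated by all field operators $F^a_\nu(\varphi)$, that is, $\overline{\mathcal{F}(\mathbb{R}^{1+3})\Omega} = \mathcal{H}$.

Spectrum condition. The joint spectrum of the components of the 4-momentum, i.e., of the generators of the spacetime translations, has support in the closed forward light cone $\overline{V}_+$, that is, the set $\{k : k^2 \ge 0,\ k_0 \ge 0\}$.

The existence of an invariant ground state called the vacuum is standard in algebraic quantum field theory as well.

N-Point Functions

Consider the above fields $F^1, \dots, F^m$. For each $N \in \mathbb{N}$ and each $N$-tuple $(a_1, \dots, a_N)$ of natural numbers $\le m$ (labeling fields), define families $(F^{a_1 \cdots a_N}) := (F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N})_{\nu_i \le n_{a_i}}$ and $w^{a_1 \cdots a_N} := (w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N})_{\nu_i \le n_{a_i}}$ of distributions on $(\mathbb{R}^{1+3})^N$ by

\[ F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_N) := F^{a_1}_{\nu_1}(\varphi_1) \cdots F^{a_N}_{\nu_N}(\varphi_N) \]

(using the nuclear theorem) and

\[ w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\psi) := \langle \Omega, F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\psi)\Omega \rangle. \tag{1} \]

These distributions are called the "N-point functions" of the fields $F^1, \dots, F^m$ and yield the vacuum expectation values of the theory. It is straightforward to deduce the following properties from the Gårding–Wightman axioms.

Microcausality (Bose–Fermi alternative). If $\varphi_i$ and $\varphi_{i+1}$ have spacelike separated supports, then

\[ w^{a_1 \cdots a_i a_{i+1} \cdots a_N}_{\nu_1 \cdots \nu_i \nu_{i+1} \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_i \otimes \varphi_{i+1} \otimes \cdots \otimes \varphi_N) = \pm\, w^{a_1 \cdots a_{i+1} a_i \cdots a_N}_{\nu_1 \cdots \nu_{i+1} \nu_i \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_{i+1} \otimes \varphi_i \otimes \cdots \otimes \varphi_N), \]

or dropping coordinate indices,

\[ w^{a_1 \cdots a_i a_{i+1} \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_i \otimes \varphi_{i+1} \otimes \cdots \otimes \varphi_N) = \pm\, w^{a_1 \cdots a_{i+1} a_i \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_{i+1} \otimes \varphi_i \otimes \cdots \otimes \varphi_N). \]

Invariance. For all $g \in SL(2, \mathbb{C})$ and $y \in \mathbb{R}^{1+3}$, one has

\[
\begin{aligned}
w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_N)
&= D^{a_1}(g^{-1})_{\nu_1}{}^{\mu_1} \cdots D^{a_N}(g^{-1})_{\nu_N}{}^{\mu_N}\, w^{a_1 \cdots a_N}_{\mu_1 \cdots \mu_N}(\Lambda(g)\varphi_1 \otimes \cdots \otimes \Lambda(g)\varphi_N) \\
&= w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1(\cdot - y) \otimes \cdots \otimes \varphi_N(\cdot - y)),
\end{aligned}
\]

or dropping coordinate indices,

\[
\begin{aligned}
w^{a_1 \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_N)
&= D^{a_1}(g^{-1}) \otimes \cdots \otimes D^{a_N}(g^{-1})\, w^{a_1 \cdots a_N}(\Lambda(g)\varphi_1 \otimes \cdots \otimes \Lambda(g)\varphi_N) \\
&= w^{a_1 \cdots a_N}(\varphi_1(\cdot - y) \otimes \cdots \otimes \varphi_N(\cdot - y)).
\end{aligned}
\]

By translation invariance, the N-point functions $w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(x_1, \dots, x_N)$ only depend on the $N - 1$ relative-position vectors $\xi_1 := x_1 - x_2$, $\xi_2 := x_2 - x_3$, ..., $\xi_{N-1} := x_{N-1} - x_N$. This means that there are distributions $W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ on $(\mathbb{R}^{1+3})^{N-1}$ related to the N-point functions by the symbolic condition

\[ w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(x_1, \dots, x_N) = W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\xi_1, \dots, \xi_{N-1}). \]

In precise notation, this reads

\[ w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi) = \int_{\mathbb{R}^{1+3}} W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_x)\, dx, \]

where

\[ \varphi_x(\xi_1, \dots, \xi_{N-1}) := \varphi(x,\ x - \xi_1,\ x - \xi_1 - \xi_2,\ \dots,\ x - \xi_1 - \cdots - \xi_{N-1}). \]

The functions $W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ are called the Wightman functions, and they have the following property because of the spectrum condition of the field.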

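Spectrum condition. The support of the Fourier transform of each $W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ is contained in $(\overline{V}_+)^{N-1}$.

The uniqueness of the vacuum vector (up to a phase) is equivalent to the following condition.

Cluster property. For $N \ge 2$, let $x$ be a spacelike vector in $\mathbb{R}^{1+3}$, let $L$ be a natural number $< N$, and let $\varphi$ and $\psi$ be tempered test functions on $(\mathbb{R}^{1+3})^L$ and $(\mathbb{R}^{1+3})^{N-L}$, respectively. Then

\[ \lim_{\lambda \to \infty} w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi \otimes \psi(\cdot - \lambda x)) = w^{a_1 \cdots a_L}_{\nu_1 \cdots \nu_L}(\varphi)\, w^{a_{L+1} \cdots a_N}_{\nu_{L+1} \cdots \nu_N}(\psi). \]

On the one hand, these properties have been deduced from the Gårding–Wightman axioms via eqn [1]. Conversely, a family of distributions labeled in the above fashion and satisfying the above properties may be used to construct a Gårding–Wightman field theory provided that two more conditions – which hold for all systems of N-point functions – are satisfied. This requires some elementary notation.

Define the index sets

\[ I^N := \left\{ \binom{a_1 \cdots a_N}{\nu_1 \cdots \nu_N} : 1 \le a_i \le m,\ 1 \le \nu_i \le n_{a_i} \text{ for all } 1 \le i \le N \right\}, \quad N \in \mathbb{N}, \]

$I^0 := \{\emptyset\}$, and $I := \bigcup_{N \in \mathbb{N}_0} I^N$. On $I$ a concatenation is defined by

\[ \binom{a_1 \cdots a_N}{\nu_1 \cdots \nu_N} \binom{b_1 \cdots b_M}{\mu_1 \cdots \mu_M} := \binom{a_1 \cdots a_N\, b_1 \cdots b_M}{\nu_1 \cdots \nu_N\, \mu_1 \cdots \mu_M} \]

and

\[ \emptyset\iota := \iota\emptyset := \iota \]

and an involution $*$ by

\[ \binom{a_1 \cdots a_N}{\nu_1 \cdots \nu_N}^{*} := \binom{a_N^* \cdots a_1^*}{\nu_N \cdots \nu_1} \quad \text{and} \quad \emptyset^* := \emptyset. \]

Define an antilinear involution $*$ on $\mathcal{S}_N := \mathcal{S}((\mathbb{R}^{1+3})^N)$ by

\[ \varphi^*(x_1, \dots, x_N) := \overline{\varphi(x_N, \dots, x_1)} \]

for each $N \in \mathbb{N}$. Put $\mathcal{S}_0 := \mathbb{C}$ and $z^* := \overline{z}$ for all $z \in \mathbb{C}$.

Define $\mathcal{S}_{I^N} := \mathcal{S}_N \times I^N$, and $\mathcal{S}_I := \bigcup_N \mathcal{S}_{I^N}$. For each $\iota \in I^N$, the set $\mathcal{S}_\iota := \mathcal{S}((\mathbb{R}^{1+3})^N) \times \{\iota\}$ is a linear space. On the direct sum $\mathcal{B}_I := \bigoplus_{\iota \in I} \mathcal{S}_\iota$ define an associative product by

\[ (\varphi, \iota)(\psi, \kappa) := (\varphi \otimes \psi, \iota\kappa) \]

and an antilinear involution $*$ by $(\varphi, \iota)^* := (\varphi^*, \iota^*)$. This endows $\mathcal{B}_I$ with the structure of a nonabelian $*$-algebra with unit element $1 = (1, \emptyset)$ (Borchers algebra).

If one defines $F_\emptyset(z) := z1$, then $w_\emptyset(z) = z$, and the Wightman functions induce a $\mathbb{C}$-linear functional $\omega$ on $\mathcal{B}_I$ by

\[ \omega(\varphi, \iota) := w_\iota(\varphi). \tag{2} \]

$\omega$ exhibits the following two properties, which are the announced additional conditions required for reconstructing the fields from the N-point functions.

Hermiticity. $\omega(\sigma^*) = \overline{\omega(\sigma)}$.

Positivity. $\omega(\sigma^*\sigma) \ge 0$.

To see Hermiticity, compute

\[ \omega(\varphi^*, \iota^*) = \langle \Omega, F_{\iota^*}(\varphi^*)\Omega \rangle = \langle F_\iota(\varphi)\Omega, \Omega \rangle = \overline{\omega(\varphi, \iota)} \]

and use $\mathbb{C}$-linearity to prove the statement for arbitrary $\sigma \in \mathcal{B}_I$. For positivity, write any $\sigma$ as a finite sum $\sigma = (\varphi_1, \iota_1) + \cdots + (\varphi_M, \iota_M)$, and compute

\[
\begin{aligned}
\omega(\sigma^*\sigma)
&= \omega\Big( \sum_{i,j=1}^{M} (\varphi_i, \iota_i)^* (\varphi_j, \iota_j) \Big)
 = \omega\Big( \sum_{i,j} (\varphi_i^* \otimes \varphi_j,\ \iota_i^*\iota_j) \Big) \\
&= \sum_{i,j} w_{\iota_i^*\iota_j}(\varphi_i^* \otimes \varphi_j)
 = \sum_{i,j} \langle \Omega, F_{\iota_i^*\iota_j}(\varphi_i^* \otimes \varphi_j)\Omega \rangle \\
&= \sum_{i,j} \langle \Omega, F_{\iota_i}(\varphi_i)^* F_{\iota_j}(\varphi_j)\Omega \rangle
 = \sum_{i,j} \langle F_{\iota_i}(\varphi_i)\Omega, F_{\iota_j}(\varphi_j)\Omega \rangle \\
&= \Big\| \sum_i F_{\iota_i}(\varphi_i)\Omega \Big\|^2 \ge 0.
\end{aligned}
\]

Theorem 1 (Wightman's reconstruction theorem). Let $m$ and $n_1, \dots, n_m$ be natural numbers, let $I^0, I^1, I^2, \dots$, and $I$ be the above index sets, and let $\mathcal{B}_I$ be the above Borchers algebra. Let $D^1, \dots, D^m$ be matrix representations of $SL(2, \mathbb{C})$ in $\mathbb{C}^{n_1}, \dots, \mathbb{C}^{n_m}$, respectively.

For each natural number $N$, let $(w_\iota)_{\iota \in I^N}$ be a family of distributions on $(\mathbb{R}^{1+3})^N$. Suppose the family $(w_\iota)_{\iota \in I}$ defined this way satisfies microcausality, covariance, spectrum condition, and the cluster property. If the linear functional $\omega$ defined on $\mathcal{B}_I$ by eqn [2] is Hermitian and positive, then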

there is (up to unitary equivalence) a unique family $F^1, \dots, F^m$ of Gårding–Wightman fields with $n_1, \dots, n_m$ components such that eqn [1] holds.

The proof uses the GNS construction known from the theory of operator algebras. The Borchers algebra plays several roles. On the one hand, it is a linear space with an inner product. The Hilbert space $\mathcal{H}$ and the invariant space $\mathcal{D}$ of the field theory are constructed from this structure. On the other hand, the Borchers algebra acts on itself as an algebra of linear operators by its own algebra multiplication. This is the structure the $*$-algebra of field operators is constructed from.

Results

The mathematical and structural analysis of quantum fields has improved the understanding of scattering theory in the different approaches mentioned above; see Bogoliubov et al. (1975) and the relevant articles in this encyclopedia. Apart from this, the following results deserve to be mentioned. Evidently, many others have to be omitted for practical reasons.

PCT Symmetry

An early famous result was Lüders's proof (1957) that all fields in the above setting exhibit PCT symmetry, that is, the symmetry under reflections in all space and time variables combined with a charge conjugation. This symmetry is exhibited by all particle reactions observed so far. The proof, like several of the main results, made extensive use of the fact that the N-point functions are boundary values of analytic functions due to the spectrum condition, and that a fundamental theorem by Bargmann, Hall, and Wightman (1957) yields invariant analytic extensions.

Reeh–Schlieder Theorem

For each field $F^a$ and each bounded open region $\mathcal{O} \subset \mathbb{R}^{1+3}$, the vacuum vector is cyclic with respect to $\mathcal{F}^a(\mathcal{O})$ (Reeh and Schlieder 1961). So excitations of the vacuum vector by field operators located in $\mathcal{O}$ are not to be considered as state vectors of a particle localized in $\mathcal{O}$, since they are not perpendicular to the excitations by field operators located outside $\mathcal{O}$.

Unruh Effect and Modular $P_1CT$ Symmetry

In the 1970s, Bisognano and Wichmann (1975, 1976) discovered a surprising link of symmetries to the intrinsic algebraic structure of quantum fields, which is established by the Tomita–Takesaki modular theory (see Tomita–Takesaki Modular Theory). Namely, the unitary operators implementing the Lorentz boosts on the fields are elements of modular groups. This means that a uniformly accelerated observer perceives the vacuum as a thermal state with a temperature proportional to its acceleration, corresponding to the famous Unruh effect.

In addition, it was shown that $P_1CT$ symmetries (i.e., PCT combined with rotations by the angle $\pi$) are implemented by modular conjugations (modular $P_1CT$ symmetry). Modular $P_1CT$ symmetry is a consequence of the Unruh effect (Guido and Longo 1995).

Spin and Statistics

Immediately following Lüders's PCT theorem, the spin–statistics theorem was proved for the N-point functions of the Wightman setting (Lüders and Zumino 1958, Burgoyne 1958, Dell'Antonio 1961). This was remarkable and widely acknowledged progress. But as remarked earlier, the confinement to finite-component fields, which is used in the proof, cannot be motivated by physical first principles (i.e., in a truly axiomatic fashion). The representation $D$ of $SL(2, \mathbb{C})$ acting on the components, however, is forced to be finite dimensional by this assumption, and since the representations $D^a$ are objects of investigation, a considerable part of the result is assumed this way from the outset. Even more so, there are examples of fields with a "wrong" spin–statistics connection and infinitely many components.

This was one reason to continue working on the subject. At the beginning of the 1990s, it was found that the spin–statistics theorem can be derived from the symmetries discovered by Bisognano and Wichmann, and Unruh. Two approaches not referring to the number of internal degrees of freedom have been worked out: one assumes the Unruh effect (Guido and Longo 1995), the other modular $P_1CT$ symmetry (Kuckert 1995, 2005, Kuckert and Lorenzen 2005). The first approach has been generalized to conformal fields, the second to the case that the symmetry group's homogeneous part is not $SL(2, \mathbb{C})$, but only $SU(2)$.

Both approaches can be applied to infinite-component fields. They yield existence theorems; a distinguished representation is constructed from the modular symmetries, and this representation exhibits Pauli's spin–statistics connection. As mentioned before, nothing more can be expected at this level of generality. The line of argument works in both the algebraic and the Wightman setting.

A Dynamical Property of the Vacuum

One can derive the spectrum condition, the Bisognano–Wichmann symmetries/the Unruh effect, and

covariance from the condition that no (inertial or) uniformly accelerated observer can extract mechanical energy from the field in vacuo by means of a cyclic process (Kuckert 2002).

Interacting Fields

The examples of interacting quantum fields that fit into the above settings live in one or two spatial dimensions only, and their relevance for physics mainly consists in being such examples. This has contributed to some frustration and to doubts on whether one is not, in fact, proving theorems on pretty empty sets, or in other words, working on "the most sophisticated theory of the free field."

The computations in quantum field theory are, like most of the computations in physics, perturbative. In order to be successful, they need to yield good agreement with experiment with reasonable computational efforts, that is, by evaluation up to the second or third order. This asymptotic convergence is more important than convergence of the series as a whole.

There are low-dimensional examples of interacting Wightman fields (e.g., $(\varphi^4)_2$; cf. the monograph by Glimm and Jaffe (1987)), and time will tell whether four-dimensional interacting Wightman fields exist. But there is no reason to expect convergence for general interacting fields; for example, QED does not fit into the Wightman framework.

The appropriate extension of the Wightman setting has been formulated by Epstein and Glaser (1973). It defines the S-matrix rather than the field itself as a (in general divergent) formal power series of operator-valued distributions.

The above results apply to this somewhat more modest setting as well, so the "axiomatic" approaches do help in understanding the known high-energy physics interactions. This even includes gauge theories (see Perturbative Renormalization Theory and BRST). The high-precision results of QED can be reproduced within this setting, and there occur no UV singularities: renormalization amounts to the need to extend distributions by fixing some parameters, that is, the renormalization constants. The infrared problem is circumvented by considering the S-matrix as a (position-dependent) distribution taking values in the unitary formal power series of distributions rather than as a single (global) unitary operator (or unitary power series).

Quantum Energy Inequalities

Energy densities of Wightman fields admit negative expectation values (Epstein, Glaser, and Jaffe 1965). This is in contrast to the positivity conditions that the energy–momentum tensors of classical general (and, hence, also special) relativity have to satisfy to ensure causality. But the conflict can be solved by smearing the densities out in space or time, as was first realized by Ford (1991). The extent to which the energy density can become negative depends on the extent to which it is smeared out: "more smearing means less violation of positivity," so the classical positivity conditions are restored at medium and large scales. There are many ways to make this principle concrete. Quantum energy inequalities hold for thermodynamically well-behaved quantum fields on causally well-behaved classical spacetime backgrounds.

Bibliographic Notes

Important monographs on axiomatic quantum field theory are those by Streater and Wightman (1964), Jost (1965), Bogoliubov et al. (1975), and Bogoliubov et al. (1990). Note that the books of Bogoliubov et al. differ in setup fundamentally and that neither replaces the other. For a lecture notes volume, see also Völkel (1977), and for a review article, see Streater (1975). A valuable discussion of the Wightman axioms can also be found in the second volume of the series by Reed and Simon (1970).

The first monograph on the algebraic approach to quantum field theory is due to Haag (1992); a more recent one has been written by Araki (1999).

Concerning the sufficient conditions for "switching" between the Gårding–Wightman and the algebraic approach, see Wollenberg (1988) and the Ph.D. thesis of Bostelmann (2000) and references given there. The dynamical and thermodynamical foundation of the standard axioms, the Bisognano–Wichmann symmetries (Unruh effect), and the spin–statistics theorem have been investigated by Kuckert (2002, 2005); see also the references given there for related work.

In different formulations and at differing degrees of mathematical sophistication, the causal approach to perturbation theory can be found in the monographs by Bogoliubov and Shirkov (1959), Scharf (1989, 2001), and Steinmann (2000). Two modern review articles have been written by Brunetti and Fredenhagen (2000) and by Dütsch and Fredenhagen (2004).

The original reference articles on the Euclidean axioms are those of Osterwalder and Schrader (1973, 1975). Note that the first one contains an error (cf. also Zinoviev (1995)). A monograph on Euclidean field theory and its relations to the other axiomatic settings of quantum field theory and to statistical mechanics is that by Glimm and Jaffe (1987).

A recent review on quantum energy inequalities is due to Fewster (2003).
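
The remark under "Interacting Fields" that agreement at low orders matters more than convergence of the whole series can be illustrated with a toy model. The following Python sketch is an assumption-laden illustration and not taken from the article: it uses the function F(g) given by the integral of exp(-t)/(1 + g t) over t from 0 to infinity, whose formal power series in g has coefficients (-1)^n n! and therefore diverges for every g > 0, although its first few terms approximate F(g) well for small g. The function names, the cutoff `upper`, and the number of integration steps are hypothetical choices.

```python
import math


def exact_value(g, upper=60.0, steps=60000):
    """Composite Simpson approximation of the integral of exp(-t)/(1 + g*t)
    over [0, upper]; the tail beyond `upper` is negligible (of order exp(-60))."""
    h = upper / steps  # `steps` must be even for Simpson's rule

    def f(t):
        return math.exp(-t) / (1.0 + g * t)

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0


def partial_sum(g, order):
    """Partial sum of the formal series sum_n (-1)^n n! g^n up to `order`."""
    return sum((-1) ** n * math.factorial(n) * g ** n for n in range(order + 1))
```

For g = 0.1 the third-order partial sum already agrees with the numerically integrated value to within about two parts in a thousand, whereas the partial sum of order 40 is off by many orders of magnitude. This is the behavior of an asymptotic, non-convergent series that the text refers to.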

Acknowledgments

The author is a fellow of the Emmy-Noether Programme (DFG). Thanks for discussions are due to Professor D Arlt.

See also: Algebraic Approach to Quantum Field Theory; C*-Algebras and Their Classification; Constructive Quantum Field Theory; Dispersion Relations; Euclidean Field Theory; Indefinite Metric; Perturbative Renormalization Theory and BRST; Quantum Field Theory: A Brief Introduction; Quantum Field Theory in Curved Spacetime; Scattering, Asymptotic Completeness and Bound States; Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools; Scattering in Relativistic Quantum Field Theory: The Analytic Program; Symmetries in Quantum Field Theory: Algebraic Aspects; Symmetries in Quantum Field Theory of Lower Spacetime Dimensions; Thermal Quantum Field Theory; Tomita–Takesaki Modular Theory; Two-Dimensional Models.

Further Reading

Araki H (1999) Mathematical Theory of Quantum Fields. Oxford: Oxford University Press.
Bogoliubov NN, Logunov AA, and Todorov IT (1975) Introduction to Axiomatic Quantum Field Theory (Russian original edition: Nauka (Moscow) 1969). New York: Benjamin.
Bogoliubov NN, Logunov AA, Oksak AI, and Todorov IT (1990) General Principles of Quantum Field Theory (Russian original edition: Nauka (Moscow) 1987). Dordrecht–Boston–London: Kluwer.
Bostelmann H (2000) Lokale Algebren und Operatorprodukte am Punkt (in German). Ph.D. thesis, Göttingen.
Dütsch M and Fredenhagen K (2004) Causal perturbation theory in terms of retarded products, and a proof of the Action Ward Identity. Reviews in Mathematical Physics (to appear).
Fewster CJ (2003) Energy Inequalities in Quantum Field Theory. Proceedings of the International Conference on Mathematical Physics (revised version under math-ph/0501073).
Glimm J and Jaffe A (1987) Quantum Physics: A Functional Integral Point of View, 2nd edn. Berlin–Heidelberg–New York: Springer.
Guido D and Longo R (1995) An algebraic spin and statistics theorem. Communications in Mathematical Physics 172: 517.
Haag R (1992) Local Quantum Physics. Berlin–Heidelberg–New York: Springer.
Jost R (1965) The General Theory of Quantized Fields. American Mathematical Society.
Kuckert B (2002) Covariant thermodynamics of quantum systems: passivity, semipassivity, and the Unruh effect. Annals of Physics 295: 216.
Kuckert B (2005) Spin, statistics, and reflections, I. Annales Henri Poincaré 6: 849.
Kuckert B and Lorenzen R (2005) Spin, statistics, and reflections, II. Preprint (math-ph/0512068).
Osterwalder K and Schrader R (1973) Axioms for Euclidean Green's functions. Communications in Mathematical Physics 31: 83.
Osterwalder K and Schrader R (1975) Axioms for Euclidean Green's functions. 2. Communications in Mathematical Physics 42: 281.
Reed M and Simon B (1970) Methods of Modern Mathematical Physics (4 volumes). London: Academic Press.
Scharf G (1989) Finite Quantum Electrodynamics. Berlin–Heidelberg–New York: Springer.
Scharf G (2001) Quantum Gauge Theories: A True Ghost Story. Weinheim: Wiley.
Streater RF (1975) Outline of axiomatic quantum field theory. Reports on Progress in Physics 38: 771.
Streater RF and Wightman AS (1964) PCT, Spin & Statistics, and All That. New York: Benjamin.
Völkel AH (1977) Fields, Particles, and Currents. Lecture Notes in Physics, vol. 66. Berlin–Heidelberg–New York: Springer.
Brunetti R and Fredenhagen K (2004) Microlocal Analysis and Wollenberg M (1988) The existence of quantum fields for local
Interacting Quantum Field Theories: Renormalization on nets of observables. Journal of Mathematical Physics 29: 2106.
Physical Backgrounds. Communications in Mathematical Zinoviev YM (1995) Equivalence of Euclidean and Wightman field
Physics 208: 623. theories. Communications in Mathematical Physics 174: 1.
B
Bäcklund Transformations
D Levi, Università ‘‘Roma Tre’’, Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction
Bäcklund transformations appeared for the first time in the work of the geometers of the end of the nineteenth century, for instance, Bianchi, Lie, Bäcklund, and Darboux, when studying surfaces of constant curvature. If on a surface in three-dimensional Euclidean space, the asymptotic directions are taken as coordinate directions, then the surface metric may be written as

$$ds^2 = dx^2 + 2\cos(w)\,dx\,dy + dy^2 \qquad [1]$$

where $w(x, y)$ is a function of the surface coordinates $x$, $y$. A necessary and sufficient condition for the surface to be of constant curvature is that $w$ satisfies the nonlinear partial differential equation

$$w_{,xy} = \sin(w) \qquad [2]$$

where the subscript denotes partial derivative. Equation [2] is nowadays called the sine-Gordon (sG) equation. Bianchi (1879), Lie (1888, 1890, 1893), and Bäcklund (1874) introduced a transformation which allows one to pass from a solution of eqn [2] to a new solution, that is, from a surface of constant curvature to a new one. Starting from the work of Clarin (1903), this transformation has been referred to as Bäcklund transformation (BT). The BT for eqn [2] reads

$$\tilde w_{,x} = w_{,x} + 2a\,\sin\left(\frac{\tilde w + w}{2}\right) \qquad [3a]$$

$$\tilde w_{,y} = -w_{,y} + \frac{2}{a}\,\sin\left(\frac{\tilde w - w}{2}\right) \qquad [3b]$$

where $a$ is a nonzero constant parameter and $\tilde w$ is a different solution of eqn [2]. It is immediate to prove by appropriate differentiation of eqns [3] with respect to $y$ and $x$ that both $w$ and $\tilde w$ must satisfy eqn [2]. The BT [3] provides a denumerable set of exact solutions once a solution $w$ is known. Bianchi showed that four such solutions can be related in an algebraic way:

$$\tan\left(\frac{\tilde w' - w}{4}\right) = \frac{a_1 + a_2}{a_1 - a_2}\,\tan\left(\frac{w' - \tilde w}{4}\right) \qquad [4]$$

Equation [4] is derived using the permutability theorem proved by Bianchi in his Ph.D. thesis in 1879:

$$\begin{array}{ccccc} & & w' & & \\ & \overset{a_1}{\nearrow} & & \overset{a_2}{\searrow} & \\ w & & & & \tilde w' \\ & \underset{a_2}{\searrow} & & \underset{a_1}{\nearrow} & \\ & & \tilde w & & \end{array} \qquad [5]$$

where by the diagram

$$w \xrightarrow{\;a\;} w'$$

we mean a BT from $w$ to $w'$ with parameter $a$.
For sG equation [2] a trivial solution is given, for example, by $w(x, y) = \pi$. Then, from eqn [3a] we get

$$\tilde w(x, y) = 2 \arcsin\left(\frac{1 - e^{-2[ax + \alpha(y)]}}{1 + e^{-2[ax + \alpha(y)]}}\right)$$

Introducing this result in eqn [3b], we get $\alpha_{,y} = -1/a$. So, the application of the BT [3] to sG equation gives the nontrivial solution

$$\tilde w = 4 \arctan\left(\frac{1 - e^{-[ax - y/a]}}{1 + e^{-[ax - y/a]}}\right) \qquad [6]$$

Clarin (1903) extended the results of Bäcklund to the case of a generic partial differential equation of second order,

$$F(x, y, w, w_{,x}, w_{,y}, w_{,xx}, w_{,xy}, w_{,yy}) = 0 \qquad [7]$$

by assuming that

$$w_{,x} = f(w, \tilde w, \tilde w_{,x}, \tilde w_{,y}), \qquad w_{,y} = g(w, \tilde w, \tilde w_{,x}, \tilde w_{,y}) \qquad [8]$$
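The sine-Gordon example can be checked numerically. The sketch below evaluates, by central finite differences, the residuals of the BT [3a], [3b] and of eqn [2] for the seed $w = \pi$ and the solution [6]. It is only an illustration: the function names, the sample value $a = 1.3$, the step sizes, and the sign conventions read into the formulas (namely $\tilde w_{,y} = -w_{,y} + (2/a)\sin((\tilde w - w)/2)$ and the exponent $-[ax - y/a]$) are assumptions of this sketch, not statements taken from the original text.

```python
import math

A = 1.3  # sample value of the Backlund parameter a (hypothetical choice)


def w_seed(x, y):
    """Trivial seed solution w = pi of w_xy = sin(w)."""
    return math.pi


def w_new(x, y, a=A):
    """Solution [6]: 4*arctan((1 - e^{-z}) / (1 + e^{-z})) with z = a*x - y/a."""
    z = a * x - y / a
    e = math.exp(-z)
    return 4.0 * math.atan((1.0 - e) / (1.0 + e))


def d_dx(f, x, y, h=1e-5):
    """Central-difference approximation of the x-derivative."""
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)


def d_dy(f, x, y, h=1e-5):
    """Central-difference approximation of the y-derivative."""
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)


def d2_dxdy(f, x, y, h=1e-3):
    """Central-difference approximation of the mixed derivative f_xy."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)


def bt_residuals(x, y, a=A):
    """Residuals of [3a], [3b] and of the sG equation [2] for w_new."""
    new = lambda s, t: w_new(s, t, a)
    w, wt = w_seed(x, y), new(x, y)
    r3a = d_dx(new, x, y) - d_dx(w_seed, x, y) - 2.0 * a * math.sin((wt + w) / 2.0)
    r3b = d_dy(new, x, y) + d_dy(w_seed, x, y) - (2.0 / a) * math.sin((wt - w) / 2.0)
    r2 = d2_dxdy(new, x, y) - math.sin(wt)
    return r3a, r3b, r2


if __name__ == "__main__":
    for point in [(-1.0, 0.5), (0.0, 0.0), (0.7, -1.2)]:
        print(point, bt_residuals(*point))
```

All three residuals should vanish up to discretization error, which is what one expects if $\tilde w$ is related to $w = \pi$ by the BT and solves eqn [2] itself.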

If the compatibility of eqns [8]

$$f_{,y} - g_{,x} = 0 \qquad [9]$$

is identically satisfied by eqn [7] for the variable $\tilde w(x, y)$, then we say that eqns [8] are an auto-Bäcklund transformation for eqn [7]. In this case, eqns [8] transform a solution of eqn [7] into a new solution of the same equation. Thus, eqns [8] simplify the problem of finding solutions of eqn [7]. Given one solution $w(x, y)$ of eqn [7], the existence of a BT reduces the problem of integrating eqn [7] into that of solving two first-order ordinary differential equations. From this point of view, the Cauchy–Riemann relations

$$w_{,x} = \tilde w_{,y}, \qquad w_{,y} = -\tilde w_{,x} \qquad [10]$$

for the Laplace equation

$$w_{,xx} + w_{,yy} = 0 \qquad [11]$$

are a BT ante litteram (however, without a free parameter).
Consider the case when $\tilde w(x, y)$ satisfies a different partial differential equation,

$$G(x, y, \tilde w, \tilde w_{,x}, \tilde w_{,y}, \tilde w_{,xx}, \tilde w_{,xy}, \tilde w_{,yy}) = 0 \qquad [12]$$

In this case, one still has a BT, but not an auto-BT. The best-known cases are when $F_1 = w_{,y} + w_{,xxx} + w w_{,x}$ and $G_1 = \tilde w_{,y} + \tilde w_{,xxx} + \tilde w^2 \tilde w_{,x}$, and $F_2 = w_{,xy} - e^{w}$ and $G_2 = \tilde w_{,xy}$ (Lamb 1976). In the first case, the BT relates the Korteweg–de Vries (KdV) equation to the modified KdV equation and this transformation paved the way to the discovery of the complete integrability of the KdV equation by Gardner et al. (1967). In the second case, the BT relates the Liouville equation to the wave equation, and can be used to solve it completely. Due to the first example, often a non-auto-BT is denoted as Miura transformation.
One can now state an operative definition of BT, extending the results of Bäcklund and Clarin to more general equations.
Definition 1 Consider two partial differential equations of order $m_1$ and $m_2$:

$$F_1(x, u, u_{(1)}, u_{(2)}, \ldots, u_{(m_1)}) = 0 \qquad [13a]$$

$$F_2(x, \tilde u, \tilde u_{(1)}, \tilde u_{(2)}, \ldots, \tilde u_{(m_2)}) = 0 \qquad [13b]$$

where $x \in \mathbb{R}^n$ and $(u, \tilde u) \in \mathbb{C}^p$, and $u_{(k)}$ is the set of $k$th-order derivatives of $u$. The set of $n$ equations

$$G_j(x, u, u_{(1)}, \ldots, u_{(s_1)}, \tilde u, \tilde u_{(1)}, \ldots, \tilde u_{(s_2)}) = 0, \qquad j = 1, 2, \ldots, n \qquad [14]$$

with $s_1 < m_1$ and $s_2 < m_2$, represents the BT of eqns [13] iff the compatibility of eqns [14] is identically satisfied on the solutions of eqns [13] and $G_j$ depends on a set of essential arbitrary constant parameters.
The Clarin formulation [8] and the classical BT for the sG [3] are clearly special subcases of this definition. When a solution of $F_1 = 0$ is known, a solution of $F_2 = 0$ is obtained by solving a set of lower-order partial differential equations. By a proper choice of the BT parameters, once a new solution is obtained by solving the BT [14], one can use the obtained solution as a starting point to construct another one, and so on. In this way, one can construct a whole ladder of solutions, a priori a denumerable set of solutions. This same construction has been applied also to the case of functional equations. In particular, it has been considered for the case of differential–difference and difference–difference equations both for finite (dynamical systems (Wojciechowski 1982)) and infinite lattices (Toda 1989).
In the case when $F_1$ and $F_2$ represent the same equation, $s_1 = s_2 = 1$ and the BTs $G_j = 0$ are linear in $u_{(1)}$, then Definition 1 is strictly related to the notion of nonclassical symmetry or conditional symmetry (Levi and Winternitz 1989, Olver 1993), an extension of the concept of Lie symmetry used to reduce and integrate a differential equation. In the case of the nonclassical symmetries, the known solution $\tilde u$ is included in the arbitrary $x$-dependent coefficients of the transformation. In this case, the BT is just a way to construct an explicit solution of the differential equation [7].
Definition 1 is often too general to be able to get explicit results. It is constructive for any partial differential equation, linear or nonlinear, but if one is not able to get a nontrivial BT this does not mean that a BT does not exist. As noted later, the existence of an auto-BT is associated to the existence of an infinity of symmetries, and this is a condition for the exact integrability of eqn [13] (Fokas 1980, Ibragimov and Shabat 1980). So, the existence of a BT is closely related to the integrability of eqn [13].

Bäcklund via Integrability
One can derive the BT from the integrability properties of eqn [13a]. Equation [13a] is said to be integrable if it can be written as the compatibility condition of an overdetermined system of linear partial differential equations for an auxiliary function depending on a free parameter belonging to the

complex $\mathbb{C}$ plane. The prototype of such a situation is given by the Lax pair for the KdV equation

$$u_{,t} + u_{,xxx} - 6 u u_{,x} = 0 \qquad [15]$$

introduced by Lax (1968):

$$L\psi = k^2 \psi, \qquad L = -\partial_x^2 + u(x, t) \qquad [16a]$$

$$\psi_{,t} = -M\psi, \qquad M = 4\partial_{xxx} - 3(u\,\partial_x + \partial_x u) \qquad [16b]$$

where $k$ is a free parameter and $\psi = \psi(x, t; k)$. As eqn [16a] is nothing else but the stationary Schrödinger equation, the function $\psi$ can be interpreted as a wave function, and $k^2$ is the spectral parameter corresponding to the potential $u(x, t)$. The condition for the existence of a solution of the overdetermined system of eqns [16] is given by the operator equation

$$L_{,t} = [L, M] \qquad [17]$$

the so-called Lax equation. In the case of asymptotically bounded potentials, eqn [16a] defines the spectrum uniquely. Introducing the following asymptotic boundary conditions for the wave function $\psi$,

$$\psi(x, t; k) \xrightarrow[x \to -\infty]{} T(k, t)\,e^{-ikx}, \qquad \psi(x, t; k) \xrightarrow[x \to +\infty]{} e^{-ikx} + R(k, t)\,e^{ikx} \qquad [18]$$

where $R(k, t)$ and $T(k, t)$ are, respectively, the reflection and the transmission coefficient, the spectrum is defined in the complex plane of the variable $k$ by

$$S[u] \equiv \{R(k, t),\ -\infty < k < \infty;\ p_n,\ c_n(t),\ n = 1, 2, \ldots, N\} \qquad [19]$$

where $p_n$ are the bound state parameters corresponding to isolated singularities of the reflection coefficients on the imaginary positive $k$-axis corresponding to a solution $\psi_n(x, t; p_n)$ of the spectral problem vanishing for $x \to \pm\infty$ and such that

$$\lim_{x \to +\infty} \left[e^{p_n x}\,\psi_n(x, t; p_n)\right] = 1 \qquad [20]$$

and $c_n$ are some functions of $t$ related to the residues of $R(k, t)$ at the poles $p_n$. There is a one-to-one correspondence between the evolution of the potential $u(x, t)$ in eqn [15] and that of the spectrum $S[u]$ of the Schrödinger spectral problem [16a]. In particular, for the KdV, taking into account eqn [16b], the evolution of the reflection coefficient $R(k, t)$ is given by

$$\frac{dR(k, t)}{dt} = 8ik^3 R(k, t) \qquad [21]$$

In eqn [21] and henceforth, $d/dt$ denotes the total derivative with respect to $t$.
In the following, for the sake of the simplicity of exposition and for the concreteness of the presentation, all the results presented on the BT will be derived for the KdV equation. Similar results can be obtained and have been obtained in the literature for many classes of integrable partial differential equations in two and three dimensions and for differential–difference and difference–difference equations. For a partial review of the available recent literature on the subject, see Rogers and Shadwick (1982) and Coley et al. (2001).
A more general form of introducing the nonlinear partial differential equation as a compatibility of an overdetermined system of linear equations has been provided by Zaharov and Shabat (1979) with the dressing method (DM). In the DM, the differential equations [16] are substituted by a matrix system of linear equations

$$\Psi_{,x} = U(u(x, t); k)\,\Psi \qquad [22a]$$

$$\Psi_{,t} = V(u(x, t); k)\,\Psi \qquad [22b]$$

where $\Psi = \Psi(x, t; k)$ and $U$ and $V$ are matrix functions. The existence of a nonsingular solution of the system of linear equations [22] requires that the matrix functions $U$ and $V$ satisfy the equation

$$U_{,t} - V_{,x} + [U, V] = 0 \qquad [23]$$

often called zero-curvature condition. The KdV equation [15] in the DM is obtained by choosing

$$U(u(x, t); k) = \begin{pmatrix} ik & u(x, t) \\ 1 & -ik \end{pmatrix}, \qquad V(u(x, t); k) = \begin{pmatrix} u_x + 2iku + 4ik^3 & 2u(u + 2k^2) - 2iku_x - u_{xx} \\ 2u + 4k^2 & -u_x - 2iku - 4ik^3 \end{pmatrix} \qquad [24]$$

The existence of an auto-BT implies the existence of a differential equation (see Definition 1) which relates two solutions of the same nonlinear equation. The new solution $\tilde u(x, t)$ of eqn [15] will be associated to a different Lax operator and a different spectral problem (but of the same operational form)

$$\tilde L = -\partial_{xx} + \tilde u(x, t) \qquad [25a]$$

$$\tilde L \tilde\psi = k^2 \tilde\psi \qquad [25b]$$
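The KdV equation [15] and the spectral problem [16a] can be made concrete with a small numerical check. The sketch below uses the standard one-soliton solution $u = -2p^2\,\mathrm{sech}^2(p(x - 4p^2 t))$, which is not written out in the text and is brought in here as an assumption, together with the reading $L = -\partial_x^2 + u$, $L\psi = k^2\psi$ of eqn [16a]. It evaluates by finite differences the residual of eqn [15] and the residual of the bound-state equation at $k = ip$. Function names, the sample value of $p$, and the step sizes are illustrative choices.

```python
import math

P = 0.8  # sample bound-state parameter p (hypothetical choice)


def u(x, t, p=P):
    """One-soliton potential u = -2 p^2 sech^2(p (x - 4 p^2 t))."""
    return -2.0 * p * p / math.cosh(p * (x - 4.0 * p * p * t)) ** 2


def psi(x, t, p=P):
    """Bound-state wave function sech(p (x - 4 p^2 t)), up to normalization."""
    return 1.0 / math.cosh(p * (x - 4.0 * p * p * t))


def kdv_residual(x, t, p=P, h=1e-3):
    """u_t + u_xxx - 6 u u_x (eqn [15]) by central differences."""
    u_t = (u(x, t + h, p) - u(x, t - h, p)) / (2.0 * h)
    u_x = (u(x + h, t, p) - u(x - h, t, p)) / (2.0 * h)
    u_xxx = (u(x + 2 * h, t, p) - 2.0 * u(x + h, t, p)
             + 2.0 * u(x - h, t, p) - u(x - 2 * h, t, p)) / (2.0 * h ** 3)
    return u_t + u_xxx - 6.0 * u(x, t, p) * u_x


def spectral_residual(x, t, p=P, h=1e-4):
    """L psi - k^2 psi with L = -d^2/dx^2 + u and k = i p, so k^2 = -p^2."""
    psi_xx = (psi(x + h, t, p) - 2.0 * psi(x, t, p) + psi(x - h, t, p)) / (h * h)
    return -psi_xx + u(x, t, p) * psi(x, t, p) + p * p * psi(x, t, p)


if __name__ == "__main__":
    for point in [(-1.0, 0.0), (0.3, 0.2), (2.0, 0.5)]:
        print(point, kdv_residual(*point), spectral_residual(*point))
```

Both residuals should be zero up to discretization error. The bound state sits at $k = ip$ on the positive imaginary axis, in line with the description of the parameters $p_n$ in eqn [19].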

The existence of a relation between the potentials $u(x, t)$ and $\tilde u(x, t)$ thus implies that there must be a $(u, \tilde u; k)$-dependent operator $D$ such that

$$\tilde\psi = D\psi \qquad [26]$$

The compatibility of eqns [16a], [25b], and [26] implies that $\tilde L D\psi = D k^2 \psi$, that is,

$$\tilde L D = D L \qquad [27]$$

Equation [27] is the auto-BT in the Lax formalism. If $\tilde L$ and $L$ are two different spectral problems related to two different nonlinear partial differential equations, then eqn [27] will provide a Miura transformation. In the DM, the requirement of the existence of a BT is given again by eqn [26] with $\psi$ and $\tilde\psi$ substituted by $\Psi$ and $\tilde\Psi$ and the operator $D$ substituted by a matrix function $D$. The BT in the DM is given by

$$D_{,x} = U(\tilde u(x, t); k)\,D - D\,U(u(x, t); k) \qquad [28a]$$

$$D_{,t} = V(\tilde u(x, t); k)\,D - D\,V(u(x, t); k) \qquad [28b]$$

In the particular case of the Hilbert–Riemann problem with zeros, providing the soliton solutions, the matrix $D$ can be expressed as a function of $\Psi$. In this way, one derives the Moutard or Darboux transformation (DT) (Moutard 1878, Levi et al. 1984), the most efficient way to get soliton solutions of the nonlinear partial differential equation.
Given a linear ordinary differential equation for the unknown $\psi$, depending on a set of arbitrary functions $u(x)$ and parameters $k$, the DT provides a discrete transformation which leaves the equation invariant. In the particular case of the KdV equation associated with the stationary Schrödinger spectral problem [16a], we have

$$\tilde u(x, t) = u(x, t) - 2\,(\log F(x, t))_{,xx} \qquad [29a]$$

$$\tilde\psi(x, t; k) = -\frac{i}{k + ip}\left[\psi_{,x}(x, t; k) - \frac{F_{,x}(x, t)}{F(x, t)}\,\psi(x, t; k)\right] \qquad [29b]$$

where the intermediate wave function

$$F(x, t) = \psi(x, t; k = ip) + a\,\psi(x, t; k = -ip)$$

is a linear combination of the Jost solution of the Schrödinger spectral problem with $p$ a real parameter and $a$ an arbitrary constant. If one looks for an equation involving only the potentials $u$ and $\tilde u$, from eqns [29], one gets the BT for the KdV equation. Given a trivial solution of the KdV equation, together with the corresponding solution of the spectral problem, eqn [29a] provides a new solution of the KdV, while eqn [29b] gives a new solution of the spectral problem. This procedure can be carried out recursively and gives a ladder of explicit solutions for the KdV equation.
The DM is a particularly simple setting in which one can derive DTs. In fact, expressing the matrix $D$ in terms of $\Psi$, eqn [28a] gives a relation between the potentials of the type given by eqn [29a], while eqn [26] gives eqn [29b]. Depending on the form of the matrix $D$ in terms of $k$, one can introduce more parameters in the DT. The classical DT [29] depends on just one parameter; however, in the case of the Schrödinger spectral problem [16a], one can also have DTs depending on two parameters, a TDT.
A more general DT, which can provide solutions even when the initial solution is not bounded asymptotically, can be obtained for many equations and, in particular, also for the KdV equation. This is obtained in a particular limit of the TDT when the parameters coincide (Levi 1988) and it is often referred to as binary DT (Matveev and Salle 1991). The binary DT for the KdV is given by

$$\tilde u(x, t) = u(x, t) - 2\,(\log F(x, t))_{,xx} \qquad [30a]$$

$$\tilde\psi(x, t; k) = \frac{1}{k^2 - \kappa^2}\left\{\left[k^2 - \kappa^2 - \frac{F_{,xx}(x, t)}{2F(x, t)}\right]\psi(x, t; k) + \frac{F_{,x}(x, t)}{F(x, t)}\,\psi_{,x}(x, t; k)\right\} \qquad [30b]$$

where $\kappa$ is a value of $k$ for which the function $\psi(x, t; k)$ is asymptotically bounded at $+\infty$ and the function $F(x, t)$ is given by

$$F(x, t) = 1 + \gamma \int_x^{+\infty} \psi(y, t; \kappa)^2\,dy \qquad [31]$$

with $\gamma$ an arbitrary constant. The corresponding BT obtained eliminating the function $F$ from eqns [30] reads

$$\tilde q_{,xx} - q_{,xx} = -\frac{1}{8}(\tilde q - q)^3 - \left[\tilde q_x + q_x - 2g(x) + 2\kappa^2\right](\tilde q - q) + \frac{1}{2}\,\frac{(\tilde q_x - q_x)^2}{\tilde q - q} \qquad [32]$$

where $q = \int_x^{\infty} u_0(y, t)\,dy$ with $u_0(x, t) = u(x, t) - g(x)$, the asymptotically bounded part of $u(x, t)$, and $g(x)$ its asymptotic behavior, and $\tilde q = \int_x^{\infty} \tilde u_0(y, t)\,dy$ with $\tilde u_0(x, t) = \tilde u(x, t) - g(x)$.
Once the Lax operator $L$ is given, we can obtain in a constructive way the operators $M$ which give the admissible nonlinear partial differential

equations and the operators $D$ which give the admissible BT. A technique to do so is provided by the so-called Lax technique introduced by Bruschi and Ragnisco (1980a–c). Using the Lax technique, we can easily obtain the nonlinear partial differential equations and BT associated with the Lax operator [16a] both in the isospectral and nonisospectral case (when $k_{,t} = 0$ and when $k_{,t} \neq 0$) and the corresponding evolution of the spectrum. We have

$$u_{,t} = f(\mathcal{L}, t)\,u_x + g(\mathcal{L}, t)\,[x u_x + 2u] \qquad [33a]$$

$$k_{,t} = k\,g(-4k^2, t), \qquad \frac{dR(k, t)}{dt} = 2ik\,f(-4k^2, t)\,R(k, t) \qquad [33b]$$

$$F(\Lambda)(\tilde u - u) + G(\Lambda)\,\Gamma\,1 = 0 \qquad [33c]$$

$$\tilde R(k, t) = \frac{F(-4k^2) - 2ik\,G(-4k^2)}{F(-4k^2) + 2ik\,G(-4k^2)}\,R(k, t) \qquad [33d]$$

where the functions $f$, $g$, $F$, and $G$ are entire functions of their first argument and the recursive operators $\mathcal{L}$ and $\Lambda$ are given by

$$\mathcal{L} f(x) = f_{,xx}(x) - 4u(x, t)\,f(x) + 2u_{,x}(x, t)\int_x^{+\infty} f(y)\,dy \qquad [34a]$$

$$\Lambda f(x) = f_{,xx}(x) - 2[\tilde u(x, t) + u(x, t)]\,f(x) + \Gamma \int_x^{+\infty} f(y)\,dy \qquad [34b]$$

$$\Gamma f(x) = [\tilde u_{,x}(x, t) + u_{,x}(x, t)]\,f(x) + [\tilde u(x, t) - u(x, t)]\int_x^{+\infty} [\tilde u(y, t) - u(y, t)]\,f(y)\,dy \qquad [34c]$$

In the limit when $\tilde u \to u$ the operator $\Lambda \to \mathcal{L}$. A BT is obtained by choosing the functions $F$ and $G$ in eqn [33c]. The simplest BT is obtained by setting $F = \beta$ and $G = 1$:

$$\tilde v_{,x} + v_{,x} + (\tilde v - v)\left[\beta - \tfrac{1}{2}(\tilde v - v)\right] = 0 \qquad [35]$$

with $u(x, t) = v_{,x}(x, t)$ and $\beta$ is the Bäcklund parameter. By combining together BT of the form [35] with different parameters as in eqn [5], we get the permutability theorem for the KdV BTs:

$$\tilde v' = v - \frac{(\beta_1 + \beta_2)\,[v' - \tilde v]}{\beta_1 - \beta_2 + (1/2)(v' - \tilde v)} \qquad [36]$$

Its proof is immediate from the point of view of the spectrum.

Bäcklund and Symmetries
A symmetry of the nonlinear equation [15] is given by a flow commuting with it, that is, by an equation

$$u_{,\epsilon} = f(u, u_x, u_t, \ldots) \qquad [37]$$

where $\epsilon$ is the group parameter, $u = u(x, t; \epsilon)$, and the $\epsilon$ derivative of [15] is zero on its set of solutions. A group transformation is obtained by integrating it. Usually this is possible only when eqn [37] is a quasilinear partial differential equation of the first order. Taking into account the evolution of the spectrum of the KdV equation [15], it is easy to prove that its symmetries are given by

$$u_{,\epsilon} = \left\{\sum_{n=0}^{+\infty} \alpha_n \mathcal{L}^n - 3\sum_{n=0}^{+\infty} \beta_n\,t\,\mathcal{L}^{n+1}\right\} u_{,x} + \left\{\sum_{n=0}^{+\infty} \beta_n \mathcal{L}^n\right\} [x u_{,x} + 2u] \qquad [38]$$

where $\alpha_n$ and $\beta_n$ are a set of constant parameters. For each choice of the parameters $\alpha_n$ and $\beta_n$, one gets a symmetry of the KdV equation [15]. With eqn [38] one can associate the following evolution of the reflection coefficient $R(k, t; \epsilon)$:

$$\frac{dR}{d\epsilon} = 2ik\left\{\sum_{n=0}^{+\infty} \alpha_n (-4k^2)^n - 3\sum_{n=0}^{+\infty} \beta_n\,t\,(-4k^2)^{n+1}\right\} R \qquad [39]$$

and of the spectral parameter $k$

$$k_{,\epsilon} = \sum_{n=0}^{+\infty} \beta_n (-4k^2)^n\,k \qquad [40]$$

As $-(1/2)\mathcal{L}\,1 = x u_{,x} + 2u$, one can add to the symmetries [38] the exceptional one (which has no spectral counterpart as $u$ is not bounded asymptotically):

$$u_{,\epsilon} = 1 + 6t\,u_{,x} \qquad [41]$$

By a proper natural choice of the constant parameters $\alpha_n$ and $\beta_n$, one can define two infinite series of symmetries. The first one is obtained by choosing $\beta_n = 0$ and $\alpha_n = \delta_{n,m}$ with $m = 1, 2, \ldots, \infty$ and can be denoted as the isospectral series as $k_{,\epsilon} = 0$. This is formed by commuting symmetries. The second one is given by $\alpha_n = 0$ and $\beta_n = \delta_{n,m}$ with $m = 1, 2, \ldots, \infty$ and can be denoted as the nonisospectral series as $k_{,\epsilon} \neq 0$. The nonisospectral symmetries have a nonzero commutation relation among themselves and with the isospectral ones.
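The role of the recursion operator in eqns [33a] and [34a] can be illustrated numerically. Reading [34a] as $\mathcal{L} f = f_{,xx} - 4uf + 2u_{,x}\int_x^{+\infty} f\,dy$, one finds $\mathcal{L} u_{,x} = u_{,xxx} - 6uu_{,x}$, so the KdV equation [15] is the flow $u_{,t} = -\mathcal{L} u_{,x}$. On the one-soliton profile $u = -2p^2\,\mathrm{sech}^2(px)$ this gives $\mathcal{L} u_{,x} = 4p^2 u_{,x}$, that is, the value of $-4k^2$ at $k = ip$. The sketch below checks this identity with a Simpson quadrature and a central second difference. The soliton profile, the sign reading of [34a], the function names, and all numerical parameters are assumptions of this sketch.

```python
import math

P = 0.8  # sample soliton parameter p (hypothetical choice)


def u(x, p=P):
    """One-soliton profile u = -2 p^2 sech^2(p x) at a fixed time."""
    return -2.0 * p * p / math.cosh(p * x) ** 2


def u_x(x, p=P):
    """Analytic x-derivative of the soliton profile."""
    return 4.0 * p ** 3 * math.tanh(p * x) / math.cosh(p * x) ** 2


def tail_integral(f, x, length=60.0, n=6000):
    """Composite Simpson rule for the integral of f over [x, x + length]."""
    h = length / n
    total = f(x) + f(x + length)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(x + i * h)
    return total * h / 3.0


def recursion_operator(f, x, p=P, h=1e-3):
    """(L f)(x) = f_xx - 4 u f + 2 u_x * integral_x^inf f(y) dy, as in [34a]."""
    f_xx = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return f_xx - 4.0 * u(x, p) * f(x) + 2.0 * u_x(x, p) * tail_integral(f, x)


def eigen_check(x, p=P):
    """Return (L u_x, 4 p^2 u_x) at the point x; the two should agree."""
    f = lambda y: u_x(y, p)
    return recursion_operator(f, x, p), 4.0 * p * p * u_x(x, p)


if __name__ == "__main__":
    for x in [-1.5, -0.3, 0.4, 2.0]:
        print(x, eigen_check(x))
```

The infinite upper limit is truncated at a finite length, which is harmless here because the integrand decays exponentially.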

Except for a few Lie point symmetries (given by eqn [41] and by choosing inside the series [38] those with different from zero only $\alpha_0$ or $\beta_0$ or $\alpha_1$) they are all generalized symmetries (Olver 1993). By analyzing their spectrum, it is easy to prove that the choice [38] is such that they are all independent. For the isospectral class, the evolution of the spectrum is simple and can be integrated to provide the group transformation of the spectrum

$$R(k, t; \epsilon) = R(k, t)\,\exp\left[2ik\left\{\sum_{n=0}^{+\infty} \alpha_n (-4k^2)^n\right\}\epsilon\right] \qquad [42]$$

Let us now consider the simplest BT obtained by choosing, in eqn [33c], $F(\Lambda) = \beta$ and $G(\Lambda) = 1$, where $\beta$ is an arbitrary parameter. In the spectral space, this corresponds to the following change of the spectrum:

$$\tilde R(k, t) = \frac{\beta - 2ik}{\beta + 2ik}\,R(k, t) \qquad [43]$$

Defining $\tilde R(k, t) = R(k, t; \epsilon)$, eqn [42] is equal to eqn [43] iff

$$\epsilon\,\alpha_n = -\frac{2}{\beta^{2n+1}(2n + 1)}, \qquad n = 0, 1, \ldots, \infty \qquad [44]$$

So we need an infinite number of symmetries to be able to reconstruct the change of the spectrum given by the BT. This shows that the existence of a BT is strictly connected to the existence of an infinity of symmetries which is a condition for the exact integrability of the nonlinear partial differential equation (Fokas 1980, Ibragimov and Shabat 1980).

Discretization via Bäcklund
BTs, apart from providing classes of exact solutions to nonlinear equations, play a very important role in the discretization of partial differential equations. As noted earlier, an auto-BT is a differential relation between two different solutions of the same nonlinear partial differential equation. If it is assumed that the new solution $\tilde u$ is just the old solution $u$ computed in a different point of a lattice, then the BT becomes just a differential–difference equation (Chiu and Ladik 1977, Levi and Benguria 1980). This can be carried out also at the level of the associated compatibility condition and in such a way one is able to also obtain its Lax pair. This demonstrates the integrability of the differential–difference equation

$$v(n+1, t)_{,t} + v(n, t)_{,t} + [v(n+1, t) - v(n, t)]\left\{\beta - \tfrac{1}{2}[v(n+1, t) - v(n, t)]\right\} = 0 \qquad [45]$$

which is an integrable differential–difference approximation to the KdV equation or

$$w(n+1, t)_{,t} = w(n, t)_{,t} + 2a\,\sin\left(\frac{w(n+1, t) + w(n, t)}{2}\right) \qquad [46]$$

a discrete integrable differential–difference approximation to the sG equation (Hirota 1977, Orfanidis 1978).
As the nonlinear superposition formulas are purely algebraic relations involving potentials associated with integrable nonlinear partial differential equations, one can interpret them as difference–difference equations. In the case of the sG equation from eqn [4], we have

$$w_{n+1,m+1} - w_{n,m} = 4 \arctan\left[\frac{a_1 + a_2}{a_1 - a_2}\,\tan\left(\frac{w_{n,m+1} - w_{n+1,m}}{4}\right)\right] \qquad [47]$$

where $w(x, t) = w_{n,m}$, $\tilde w(x, t) = w_{n+1,m}$, $w'(x, t) = w_{n,m+1}$, and $\tilde w'(x, t) = w_{n+1,m+1}$. In a similar manner, from [36], one gets

$$v_{n+1,m+1} = v_{n,m} - \frac{(\beta_1 + \beta_2)\,[v_{n+1,m} - v_{n,m+1}]}{\beta_1 - \beta_2 + \tfrac{1}{2}[v_{n+1,m} - v_{n,m+1}]} \qquad [48]$$

The continuous limit of eqn [47], obtained by setting $x = \epsilon_1 n$ and $y = \epsilon_2 m$ and choosing

$$\frac{a_1}{a_2} = \frac{\epsilon_1 \epsilon_2}{4}$$

gives back eqn [2] (Rogers and Schief 1997). It is worth mentioning that one can also use known nonlinear lattice equations to construct BT for nonlinear partial differential equations (Levi 1981).

See also: Integrable Systems and Discrete Geometry; Integrable Systems: Overview; Painlevé Equations; Solitons and Kac–Moody Lie Algebras; Toda Lattices.

Further Reading
Bäcklund AV (1874) Einiges über Curven und Flächentransformationen. Lunds Universitets Årsskrift 10: 1–12.
Bianchi L (1879) Ricerche sulle superficie a curvatura costante e sulle elicoidi. Annali della R. Scuola normale superiore di Pisa 2: 285.
Bruschi M and Ragnisco O (1980a) Existence of a Lax pair for any member of the class of nonlinear evolution equations associated to the matrix Schrödinger spectral problem. Lettere al Nuovo Cimento 29: 321–326.
Bruschi M and Ragnisco O (1980b) Extension of the Lax method to solve a class of nonlinear evolution equations with x-dependent coefficients associated to the matrix Schrödinger spectral problem. Lettere al Nuovo Cimento 29: 327–330.
Bruschi M and Ragnisco O (1980c) Bäcklund transformations and Lax technique. Lettere al Nuovo Cimento 29: 331–334.
Chiu S-C and Ladik JF (1977) Generating exactly soluble nonlinear discrete evolution equations by a generalized Wronskian technique. Journal of Mathematical Physics 18: 690–700.
Clarin J (1903) Sur quelques équations aux dérivées partielles du second ordre. Annales de la Faculté des Sciences de Toulouse pour les Sciences Mathématiques et les Sciences Physiques. Série 2 5: 437–458.
Coley A, Levi D, Milson R, Rogers C, and Winternitz P (eds.) (2001) Bäcklund and Darboux transformations. The Geometry of solitons. Proceedings of the AARMS-CRM Workshop, Halifax, NS, June 4–9, 1999. CRM Proceedings and Lecture Notes, vol. 29. Providence, RI: American Mathematical Society.
Faddeev LD and Takhtajan LA (1987) Hamiltonian Methods in the Theory of Solitons. Berlin: Springer.
Fokas AS (1980) A symmetry approach to exactly solvable evolution equations. Journal of Mathematical Physics 21: 1318–1325.
Gardner CS, Greene JM, Kruskal MD, and Miura RM (1967) Method for solving the Korteweg–de Vries equation. Physical Review Letters 19: 1095–1097.
Hirota R (1977) Nonlinear partial difference equations. III. Discrete sine-Gordon equation. Journal of the Physical Society of Japan 43: 2079–2086.
Ibragimov NH and Shabat AB (1980) Infinite Lie–Bäcklund algebras (in Russian). Funktsional. Anal. i Prilozhen 14: 79–80.
Lamb GL (1976) Bäcklund transformations at the turn of the century. In: Miura RM (ed.) Bäcklund Transformations, pp. 69–79. Berlin: Springer.
Lax PD (1968) Integrals of nonlinear equations of evolution and solitary waves. Communications in Pure and Applied Mathematics 21: 647–690.
Levi D (1981) Nonlinear differential difference equations as Bäcklund transformations. Journal of Physics A: Mathematical and General 14: 1083–1098.
Levi D (1988) On a new Darboux transformation for the construction of exact solutions of the Schroedinger equation. Inverse Problems 4: 165–172.
Levi D and Benguria R (1980) Bäcklund transformations and nonlinear differential difference equations. Proceedings of the National Academy of Science USA 77: 5025–5027.
Levi D and Winternitz P (1989) Non-classical symmetry reduction: example of the Boussinesq equation. Journal of Physics A: Mathematical and General 22: 2915–2924.
Levi D, Ragnisco O, and Sym A (1984) Dressing method vs. classical Darboux transformation. Il Nuovo Cimento 83B: 34–42.
Lie S (1888, 1890, 1893) Theorie der Transformationsgruppen. Leipzig: B.G. Teubner.
Matveev VB and Salle LA (1991) Darboux Transformations and Solitons. Berlin: Springer.
Moutard Th-F (1878) Sur la construction des équations de la forme (1/z)(d²z/dx dy) = λ(x, y), qui admettent une intégrale générale explicite. Journal de l’Ecole Polytechnique, Paris 28: 1–11.
Olver PJ (1993) Applications of Lie Groups to Differential Equations. New York: Springer.
Orfanidis SJ (1978) Discrete sine-Gordon equations. Physical Review D 18: 3822–3827.
Rogers C and Schief WK (1997) The classical Bäcklund transformation and integrable discretization of characteristic equations. Physics Letters A 232: 217–223.
Rogers C and Shadwick WF (1982) Bäcklund Transformations and Their Applications. New York: Academic Press.
Toda M (1989) Theory of Nonlinear Lattices. Berlin: Springer.
Wojciechowski S (1982) The analogue of the Bäcklund transformation for integrable many-body systems. Journal of Physics A: Mathematical and General 15: L653–L657.
Zaharov VE and Shabat AB (1979) Integration of the nonlinear equations of mathematical physics by the method of the inverse scattering problem. II (Russian). Funktsional Analiz i ego Prilozheniya 13: 13–22. (English translation: Functional Analysis and Applications 13: 166–173 (1980)).

Batalin–Vilkovisky Quantization
A C Hirshfeld, Universität Dortmund, Dortmund, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction
The Batalin–Vilkovisky formalism for quantizing gauge theories has a long history of development. It began with the Faddeev–Popov procedure for quantizing Yang–Mills theory, involving the Faddeev–Popov ghost fields (Faddeev and Popov 1967). It continued with the discovery of BRST symmetry by Becchi et al. (1976). Then Zinn-Justin (1975) introduced sources for these transformations, and a symmetric structure in the space of fields and sources in his study of renormalizability of these theories. Finally, Batalin and Vilkovisky (1981) systematized and generalized these developments. A more detailed account of this history can be found in Gomis et al. (1994), where many worked examples of the Batalin–Vilkovisky formalism are given. At the present time, it is the most general treatment available. Alexandrov, Kontsevich, Schwarz, and Zaboronsky (AKSZ 1997) have presented a geometric interpretation for the case in which the action is topologically invariant.

Structure of the Set of Gauge Transformations
Consider a system whose dynamics is governed by a classical action $S[\phi^i]$ which depends on the fields $\phi^i(x)$, $i = 1, \ldots, n$. We employ a compact notation in which the multi-index $i$ may denote the various fields involved, the discrete indices on which they depend, and the dependence on the spacetime variables as well. The generalized summation convention then means that a repeated index may denote not only a sum over discrete variables, but also integration over the spacetime variables. $\epsilon_i = \epsilon(\phi^i)$ denotes the

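Grassmann parity of the fields. Fields with $\epsilon_i = 0$ are called bosonic, with $\epsilon_i = 1$ fermionic. The graded commutation rule is

$$\phi^i(x)\,\phi^j(y) = (-1)^{\epsilon_i \epsilon_j}\,\phi^j(y)\,\phi^i(x) \qquad [1]$$

For a gauge theory the action is invariant under a set of gauge transformations with infinitesimal form

$$\delta\phi^i = R^i_\alpha\,\varepsilon^\alpha, \qquad \alpha = 1 \text{ or } 2 \text{ or } \ldots\, m \qquad [2]$$

The $\varepsilon^\alpha$ are the infinitesimal gauge parameters and $R^i_\alpha$ the generators of the gauge transformations. When $\epsilon_\alpha = \epsilon(\varepsilon^\alpha) = 0$ we have an ordinary symmetry, when $\epsilon_\alpha = 1$ the equation is characteristic of a supersymmetry. The Grassmann parity of $R^i_\alpha$ is $\epsilon(R^i_\alpha) = \epsilon_i + \epsilon_\alpha \pmod 2$.
A subscript after a comma denotes the right derivative with respect to the corresponding field, that is, the field is to be commuted to the far right and then dropped. The field equations may then be written as

$$S_{0,i} = 0 \qquad [3]$$

where $S_0$ is the classical action. Let $\Sigma$ denote the surface in the space of solutions where the field equations are satisfied:

$$S_{0,i}\big|_\Sigma = 0 \qquad [4]$$

If the gauge transformations are ‘‘independent’’ on-shell, that is,

$$\operatorname{rank} R^i_\alpha\big|_\Sigma = m \qquad [5]$$

the gauge theory is said to be ‘‘irreducible.’’ We assume here that this is the case. When it is not, the theory is ‘‘reducible.’’ For details of the treatment in that case, see Gomis, Paris, and Samuel. The classical solutions are $\phi_0 \in \Sigma$.
The Noether identities are

$$S_{0,i}\,R^i_\alpha = 0 \qquad [6]$$

The general solution to the Noether identity is

$$\lambda^i = R^i_\alpha\,T^\alpha + S_{0,j}\,E^{ji} \qquad [7]$$

The commutator of two gauge transformations is

$$[\delta_1, \delta_2]\,\phi^i = \left(R^i_{\alpha,j} R^j_\beta - (-1)^{\epsilon_\alpha \epsilon_\beta} R^i_{\beta,j} R^j_\alpha\right)\varepsilon_1^\beta\,\varepsilon_2^\alpha \qquad [8]$$

Since this commutator is a symmetry of the action, it satisfies the Noether identity

$$S_{0,i}\left(R^i_{\alpha,j} R^j_\beta - (-1)^{\epsilon_\alpha \epsilon_\beta} R^i_{\beta,j} R^j_\alpha\right) = 0 \qquad [9]$$

which by eqn [7] implies that

$$R^i_{\alpha,j} R^j_\beta - (-1)^{\epsilon_\alpha \epsilon_\beta} R^i_{\beta,j} R^j_\alpha = R^i_\gamma\,T^\gamma_{\alpha\beta} - S_{0,j}\,E^{ji}_{\alpha\beta} \qquad [10]$$

Equations [8] and [10] lead to the following condition:

$$[\delta_1, \delta_2]\,\phi^i = \left(R^i_\gamma\,T^\gamma_{\alpha\beta} - S_{0,j}\,E^{ji}_{\alpha\beta}\right)\varepsilon_1^\beta\,\varepsilon_2^\alpha \qquad [11]$$

The tensors $T^\gamma_{\alpha\beta}$ are called the structure constants of the gauge algebra, although they depend, in general, on the fields of the theory. When $E^{ij}_{\alpha\beta} = 0$, the gauge algebra is said to be ‘‘closed,’’ otherwise it is ‘‘open.’’ Equation [11] defines a Lie algebra if the algebra is closed and the $T^\gamma_{\alpha\beta}$ are independent of the fields.
The gauge tensors have the following graded symmetry properties:

$$T^\gamma_{\alpha\beta} = -(-1)^{\epsilon_\alpha \epsilon_\beta}\,T^\gamma_{\beta\alpha}, \qquad E^{ij}_{\alpha\beta} = -(-1)^{\epsilon_i \epsilon_j}\,E^{ji}_{\alpha\beta} = -(-1)^{\epsilon_\alpha \epsilon_\beta}\,E^{ij}_{\beta\alpha} \qquad [12]$$

The Grassmann parities are

$$\epsilon(T^\gamma_{\alpha\beta}) = \epsilon_\alpha + \epsilon_\beta + \epsilon_\gamma \pmod 2 \qquad [13]$$

and

$$\epsilon(E^{ij}_{\alpha\beta}) = \epsilon_i + \epsilon_j + \epsilon_\alpha + \epsilon_\beta \pmod 2 \qquad [14]$$

Various restrictions are imposed by the Jacobi identity

$$\sum_{\mathrm{cyclic}(123)} [\delta_1, [\delta_2, \delta_3]] = 0 \qquad [15]$$

These restrictions are

$$\sum_{\mathrm{cyclic}(123)} \left(R^i_\delta\,A^\delta_{\alpha\beta\gamma} - S_{0,j}\,B^{ji}_{\alpha\beta\gamma}\right)\varepsilon_1^\gamma\,\varepsilon_2^\beta\,\varepsilon_3^\alpha = 0 \qquad [16]$$

where

$$3A^\delta_{\alpha\beta\gamma} \equiv \left(T^\delta_{\alpha\beta,k} R^k_\gamma - T^\delta_{\alpha\eta} T^\eta_{\beta\gamma}\right) + (-1)^{\epsilon_\alpha(\epsilon_\beta + \epsilon_\gamma)}\left(T^\delta_{\beta\gamma,k} R^k_\alpha - T^\delta_{\beta\eta} T^\eta_{\gamma\alpha}\right) + (-1)^{\epsilon_\gamma(\epsilon_\alpha + \epsilon_\beta)}\left(T^\delta_{\gamma\alpha,k} R^k_\beta - T^\delta_{\gamma\eta} T^\eta_{\alpha\beta}\right)$$

and

$$3B^{ji}_{\alpha\beta\gamma} \equiv \left(E^{ji}_{\alpha\beta,k} R^k_\gamma - E^{ji}_{\alpha\delta} T^\delta_{\beta\gamma} - (-1)^{\epsilon_i \epsilon_\alpha} R^j_{\alpha,k} E^{ki}_{\beta\gamma} + (-1)^{\epsilon_j(\epsilon_i + \epsilon_\alpha)} R^i_{\alpha,k} E^{kj}_{\beta\gamma}\right) + (-1)^{\epsilon_\alpha(\epsilon_\beta + \epsilon_\gamma)}\,(\alpha \to \beta,\ \beta \to \gamma,\ \gamma \to \alpha) + (-1)^{\epsilon_\gamma(\epsilon_\alpha + \epsilon_\beta)}\,(\alpha \to \gamma,\ \beta \to \alpha,\ \gamma \to \beta)$$

As in the familiar Faddeev–Popov procedure, it is useful to introduce ghost fields $C^\alpha$ with opposite Grassmann parities to the gauge parameters $\varepsilon^\alpha$:

$$\epsilon(C^\alpha) = \epsilon_\alpha + 1 \pmod 2 \qquad [17]$$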

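and to replace the gauge parameters by ghost fields. One must then modify the graded symmetry properties of the gauge structure tensors according to

\[ T_{\alpha_1\alpha_2\alpha_3\alpha_4\ldots} \to (-1)^{\epsilon_{\alpha_2}+\epsilon_{\alpha_4}+\cdots}\,T_{\alpha_1\alpha_2\alpha_3\alpha_4\ldots} \qquad [18] \]

The Noether identities then take the form

\[ S_{0,i}\,R^i_\alpha C^\alpha = 0 \qquad [19] \]

and the structure relations [10] become

\[ \left(2R^i_{\alpha,j}R^j_\beta - R^i_\gamma T^\gamma_{\alpha\beta} + S_{0,j}E^{ji}_{\alpha\beta}\right)C^\beta C^\alpha = 0 \qquad [20] \]

Introducing the Antifields

We incorporate the ghost fields into the field set $\Phi^A = \{\phi^i, C^\alpha\}$, where $i = 1,\ldots,n$ and $\alpha = 1,\ldots,m$. Clearly $A = 1,\ldots,N$, where $N = n + m$. One then further increases the set by introducing an antifield $\Phi^*_A$ for each field $\Phi^A$. The Grassmann parity of the antifields is

\[ \epsilon(\Phi^*_A) = \epsilon(\Phi^A) + 1 \pmod 2 \qquad [21] \]

Each field is assigned a ghost number, with

\[ \mathrm{gh}[\phi^i] = 0, \qquad \mathrm{gh}[C^\alpha] = 1, \qquad \mathrm{gh}[\Phi^*_A] = -\mathrm{gh}[\Phi^A] - 1 \qquad [22] \]

In the space of fields and antifields, the antibracket is defined by

\[ (X,Y) = \frac{\partial_r X}{\partial\Phi^A}\frac{\partial_l Y}{\partial\Phi^*_A} - \frac{\partial_r X}{\partial\Phi^*_A}\frac{\partial_l Y}{\partial\Phi^A} \qquad [23] \]

where $\partial_r$ denotes the right, $\partial_l$ the left derivative. The antibracket is graded antisymmetric:

\[ (X,Y) = -(-1)^{(\epsilon_X+1)(\epsilon_Y+1)}\,(Y,X) \qquad [24] \]

It satisfies a graded Jacobi identity

\[ ((X,Y),Z) + (-1)^{(\epsilon_X+1)(\epsilon_Y+\epsilon_Z)}\,((Y,Z),X) + (-1)^{(\epsilon_Z+1)(\epsilon_X+\epsilon_Y)}\,((Z,X),Y) = 0 \qquad [25] \]

It is a graded derivation

\[ (X,YZ) = (X,Y)Z + (-1)^{\epsilon_Y\epsilon_Z}(X,Z)Y, \qquad (XY,Z) = X(Y,Z) + (-1)^{\epsilon_X\epsilon_Y}\,Y(X,Z) \qquad [26] \]

It has ghost number

\[ \mathrm{gh}[(X,Y)] = \mathrm{gh}[X] + \mathrm{gh}[Y] + 1 \qquad [27] \]

and Grassmann parity

\[ \epsilon((X,Y)) = \epsilon(X) + \epsilon(Y) + 1 \pmod 2 \qquad [28] \]

For a bosonic functional $B$

\[ (B,B) = 2\,\frac{\partial B}{\partial\Phi^A}\frac{\partial B}{\partial\Phi^*_A} \qquad [29] \]

for a fermionic functional $F$

\[ (F,F) = 0 \qquad [30] \]

and for any $X$

\[ ((X,X),X) = 0 \qquad [31] \]

If one groups the fields and the antifields together into the set

\[ z^a = \{\Phi^A, \Phi^*_A\}, \qquad a = 1,\ldots,2N \qquad [32] \]

then the antibracket is seen to define a symplectic structure on the space of fields and antifields

\[ (X,Y) = \frac{\partial_r X}{\partial z^a}\,\omega^{ab}\,\frac{\partial_l Y}{\partial z^b} \qquad [33] \]

with

\[ \omega^{ab} = \begin{pmatrix} 0 & \delta^A_B \\ -\delta^A_B & 0 \end{pmatrix} \qquad [34] \]

The antifields can be thought of as conjugate variables to the fields, since

\[ (\Phi^A, \Phi^*_B) = \delta^A_B \qquad [35] \]

The Classical Master Equation

Let $S[\Phi^A, \Phi^*_A]$ be a functional of the fields and antifields with the dimension of an action, vanishing ghost number and even Grassmann parity. The equation

\[ (S,S) = 2\,\frac{\partial S}{\partial\Phi^A}\frac{\partial S}{\partial\Phi^*_A} = 0 \qquad [36] \]

is the classical master equation. Solutions of the classical master equation with suitable boundary conditions turn out to be generating functionals for the gauge structure of the theory. $S$ is also the starting point for the quantization. One denotes by $\Sigma$ the subspace of stationary points of the action in the space of fields and antifields:

\[ \Sigma = \left\{ z^a \,\Big|\, \frac{\partial S}{\partial z^a} = 0 \right\} \qquad [37] \]

Given a classical solution $\phi_0$ of $S_0$ one stationary point is

\[ \phi^i = \phi^i_0, \qquad C^\alpha = 0, \qquad \Phi^*_A = 0 \qquad [38] \]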
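As a short consistency check, the special cases [30] and [31] can be read off from the graded antisymmetry [24] and the graded Jacobi identity. The sketch below assumes the standard sign convention in which the second Jacobi term carries the factor $(-1)^{(\epsilon_X+1)(\epsilon_Y+\epsilon_Z)}$ and the third the factor $(-1)^{(\epsilon_Z+1)(\epsilon_X+\epsilon_Y)}$; it is an illustration, not a full derivation.

```latex
% Fermionic F: set X = Y = F with \epsilon_F = 1 in the antisymmetry relation [24]
(F,F) = -(-1)^{(1+1)(1+1)}\,(F,F) = -(F,F)
\quad\Longrightarrow\quad (F,F) = 0 .

% Bosonic B: with \epsilon_B = 0 the same relation gives no constraint,
(B,B) = -(-1)^{(0+1)(0+1)}\,(B,B) = (B,B) ,
% so (B,B) need not vanish; this is why (S,S) = 0 is a nontrivial equation for even S.

% Any X of definite parity: set X = Y = Z in the Jacobi identity.
% Both sign exponents equal 2\,\epsilon_X(\epsilon_X+1), which is even, so
((X,X),X)\,\bigl[\,1 + 1 + 1\,\bigr] = 0
\quad\Longrightarrow\quad ((X,X),X) = 0 .
```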

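An action which satisfies the classical master equation has its own set of invariances:

\[ \frac{\partial S}{\partial z^a}\,R^a_b = 0 \qquad [39] \]

with

\[ R^a_b = \omega^{ac}\,\frac{\partial_l\partial_r S}{\partial z^c\,\partial z^b} \qquad [40] \]

This equation implies

\[ R^a_c R^c_b\big|_\Sigma = 0 \qquad [41] \]

One says that $R^a_b$ is nilpotent on-shell. A nilpotent $2N \times 2N$ matrix has rank $\le N$. Let $r$ be the rank of the hessian of $S$ at the stationary point:

\[ r = \operatorname{rank}\,\frac{\partial_l\partial_r S}{\partial z^a\,\partial z^b}\bigg|_\Sigma \qquad [42] \]

We then have $r \le N$. The relevant solutions of the classical master equation are those for which $r = N$. In this case the number of independent gauge invariances of the type in eqn [39] equals the number of antifields. When at a later stage the gauge is fixed, the nonphysical antifields are eliminated.

To ensure the correct classical limit, the proper solution must contain the classical action $S_0$ in the sense that

\[ S[\Phi^A, \Phi^*_A]\big|_{\Phi^*_A = 0} = S_0[\phi^i] \qquad [43] \]

The action $S[\Phi^A, \Phi^*_A]$ can be expanded in a series in the antifields, while maintaining vanishing ghost number and even Grassmann parity:

\[ S[\Phi,\Phi^*] = S_0 + \phi^*_i R^i_\alpha C^\alpha + C^*_\gamma\,\tfrac{1}{2}\,T^\gamma_{\alpha\beta}(-1)^{\epsilon_\alpha}C^\beta C^\alpha + \phi^*_i\phi^*_j(-1)^{\epsilon_i}\,\tfrac{1}{4}\,E^{ji}_{\alpha\beta}(-1)^{\epsilon_\alpha}C^\beta C^\alpha + \cdots \qquad [44] \]

When this is inserted into the classical master equation, one finds that this equation implies the gauge structure of the classical theory.

Gauge Fixing and Quantization

Equation [39] shows that the action $S$ still possesses gauge invariances, and hence is not yet suitable for quantization via the path integral approach: a gauge-fixing procedure is necessary. In the Batalin–Vilkovisky approach the gauge is fixed, and the antifields eliminated, by use of a gauge-fixing fermion $\Psi$ which has Grassmann parity $\epsilon(\Psi) = 1$ and $\mathrm{gh}[\Psi] = -1$. It is a functional of the fields $\Phi^A$ only; its relation to the antifields is

\[ \Phi^*_A = \frac{\partial\Psi}{\partial\Phi^A} \qquad [45] \]

We define a surface in functional space

\[ \Sigma_\Psi = \left\{ (\Phi^A, \Phi^*_A) \,\Big|\, \Phi^*_A = \frac{\partial\Psi}{\partial\Phi^A} \right\} \qquad [46] \]

so that for any functional $X[\Phi,\Phi^*]$

\[ X\big|_{\Sigma_\Psi} = X\!\left[\Phi, \frac{\partial\Psi}{\partial\Phi}\right] \qquad [47] \]

To construct a gauge-fixing fermion $\Psi$ of ghost number $-1$, one must again introduce additional fields. The simplest choice utilizes a trivial pair $\bar{C}_\alpha$, $\bar{\pi}_\alpha$ with

\[ \epsilon(\bar{C}_\alpha) = \epsilon_\alpha + 1, \quad \epsilon(\bar{\pi}_\alpha) = \epsilon_\alpha, \qquad \mathrm{gh}[\bar{C}_\alpha] = -1, \quad \mathrm{gh}[\bar{\pi}_\alpha] = 0 \qquad [48] \]

The fields $\bar{C}_\alpha$ are the Faddeev–Popov antighosts. Along with these fields we include the corresponding antifields $\bar{C}^{*\alpha}$, $\bar{\pi}^{*\alpha}$. Adding the term $\bar{\pi}_\alpha\bar{C}^{*\alpha}$ to the action $S$ does not spoil its properties as a proper solution to the classical master equation, and one gets the nonminimal action

\[ S_{\mathrm{non}} = S + \bar{\pi}_\alpha\bar{C}^{*\alpha} \qquad [49] \]

The simplest possibility for $\Psi$ is

\[ \Psi = \bar{C}_\alpha\,\chi^\alpha(\phi) \qquad [50] \]

where $\chi^\alpha$ are the gauge-fixing conditions for the fields $\phi$. The gauge-fixed action is denoted by

\[ S_\Psi = S_{\mathrm{non}}\big|_{\Sigma_\Psi} \qquad [51] \]

Quantization is performed using the path integral to calculate a correlation function $X$, with the constraint [45] implemented by a $\delta$-function:

\[ I_\Psi(X) = \int \mathcal{D}\Phi\,\mathcal{D}\Phi^*\;\delta\!\left(\Phi^*_A - \frac{\partial\Psi}{\partial\Phi^A}\right)\exp\!\left(\frac{i}{\hbar}W[\Phi,\Phi^*]\right)X[\Phi,\Phi^*] \qquad [52] \]

Here $W$ is the quantum action, which reduces to $S$ in the limit $\hbar \to 0$. An admissible $\Psi$ leads to well-defined propagators when the path integral is expressed as a perturbation series expansion.

The results of a calculation should be independent of the gauge fixing. Consider the integrand in eqn [52],

\[ I[\Phi,\Phi^*] = \exp\!\left(\frac{i}{\hbar}W[\Phi,\Phi^*]\right)X[\Phi,\Phi^*] \qquad [53] \]

Under an infinitesimal change in $\Psi$

\[ I_{\Psi+\delta\Psi}(X) - I_\Psi(X) = \int \mathcal{D}\Phi\;\Delta I\,\big|_{\Sigma_\Psi}\,\delta\Psi \qquad [54] \]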

where the Laplacian $\Delta$ is

\[ \Delta = (-1)^{\epsilon_A+1}\,\frac{\partial_r}{\partial\Phi^A}\frac{\partial_r}{\partial\Phi^*_A} \qquad [55] \]

Obviously, the integral $I_\Psi(X)$ is independent of $\Psi$ if $\Delta I = 0$. For $X = 1$ one gets the requirement

\[ \Delta\exp\!\left(\frac{i}{\hbar}W\right) = \exp\!\left(\frac{i}{\hbar}W\right)\left(\frac{i}{\hbar}\Delta W - \frac{1}{2\hbar^2}(W,W)\right) = 0 \qquad [56] \]

The formula

\[ \tfrac{1}{2}(W,W) = i\hbar\,\Delta W \qquad [57] \]

is the quantum master equation. A gauge-invariant correlation function satisfies

\[ (X,W) = i\hbar\,\Delta X \qquad [58] \]

The terms of higher order in $\hbar$ by which the quantum action $W$ may differ from the solution of the classical master equation $S$ correspond to the counter-terms of the renormalizable gauge theory if

\[ \Delta S = 0 \qquad [59] \]

One must, of course, use a regularization scheme which respects the symmetries of the theory. For $W = S + O(\hbar)$ the quantum master equation [57] reduces in this case to the classical master equation

\[ (S,S) = 0 \qquad [60] \]

Hence, up to possible counter-terms, one may simply choose $W = S$.

To implement the gauge fixing, one uses for the action $W = S_{\mathrm{non}}$. For the path integral $Z = I_\Psi(X = 1)$, the integration over the antifields in eqn [52] is performed by using the $\delta$-function. The result is

\[ Z = \int \mathcal{D}\Phi\,\exp\!\left(\frac{i}{\hbar}S_\Psi\right) \qquad [61] \]

Geometrical Interpretation of Topological Field Theories

The Batalin–Vilkovisky formalism for topological field theories has been given a geometrical interpretation by AKSZ (1997).

A supermanifold equipped with an odd vector field $Q$ satisfying $Q^2 = 0$ is called a Q-manifold. A Q-manifold provided with an odd symplectic structure $\omega$ (P-structure) is called a QP-manifold if the odd symplectic structure is Q-invariant, that is, $L_Q\omega = 0$. Every solution to the classical master equation determines a QP-structure on $M$ and vice versa. The geometric object corresponding to a classical mechanical system in the Batalin–Vilkovisky formalism is a QP-manifold.

The nondegenerate closed 2-form $\omega$ is written as

\[ \omega = dz^a\,\omega_{ab}\,dz^b \qquad [62] \]

where $z^a$ are local coordinates in the supermanifold $M$. For functions on $M$, an (odd) Poisson bracket is defined as in eqn [33], where $\omega^{ab}$ stands for the inverse matrix of $\omega_{ab}$. An even function $S$ on $M$ satisfies the classical master equation if $(S,S) = 0$. The correspondence between vector fields and functions on $M$ is given by $K_F G = (G,F)$, where $K_F$ is the vector field, $F$ the given function, and $G$ an arbitrary function. The function $F$ is called the Hamiltonian of the vector field $K_F$.

Geometrically, equivalent QP-manifolds describe the same physics. In particular, one can consider an even Hamiltonian vector field $K_F$ corresponding to an odd function $F$. This vector field determines an infinitesimal transformation preserving the P-structure. It transforms a solution $S$ to the classical master equation into the physically equivalent solution $S + \varepsilon(S,F)$, where $\varepsilon$ is an infinitesimally small parameter.

A submanifold $L$ of a P-manifold $M$ is called a Lagrangian submanifold if the restriction of the form $\omega$ to $L$ vanishes. In the particular case when $M = \Pi T^*N$ (the cotangent bundle to $N$ with reversed parity of fibres) with standard P-structure, one can construct many examples of Lagrangian submanifolds in the following way. Fix an odd function $\Psi$ on $N$, the gauge fermion. The submanifold $L \subset M$ determined by the equation

\[ \xi_a = \frac{\partial\Psi}{\partial x^a} \qquad [63] \]

where $\{x^a, \xi_a\}$ are coordinates corresponding to the identification of $M$, will be a Lagrangian submanifold of $M$.

The P-manifold $M$ in the neighborhood of $L$ can be identified with $\Pi T^*L$. In other words, one can find a neighborhood $U$ of $L$ in $M$ and a neighborhood $V$ of $L$ in $\Pi T^*L$ such that there exists an isomorphism of P-manifolds $U$ and $V$ leaving $L$ intact. Using this isomorphism a function $\Psi$ defined on a Lagrangian submanifold $L \subset M$ determines another Lagrangian submanifold $L_\Psi \subset M$.

Consider a solution $S$ to the classical master equation on $M$. In the Batalin–Vilkovisky formalism we have to restrict $S$ to a Lagrangian submanifold $L \subset M$; the quantization of $S$ can then be performed by integration of $\exp(iS/\hbar)$ over $L$. One may construct an odd vector field $Q$ on $L$ in such a

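way that the functional $S$ restricted to $L$ is Q-invariant. This invariance is BRST invariance. AKSZ apply these geometric constructions to obtain in a natural way the action functionals of two-dimensional sigma-models (Witten 1988) and to show that the Chern–Simons theory (Axelrod and Singer 1991) in Batalin–Vilkovisky formalism arises as a sigma-model with target space $\Pi G$, where $G$ stands for a Lie algebra and $\Pi$ denotes parity inversion.

The Poisson-Sigma Model

The quantization of the Poisson-sigma model was performed by Hirshfeld and Schwarzweller (2000) and by Cattaneo and Felder (2001). The Poisson-sigma model is the simplest topological field theory in two dimensions. It is a field theory on a two-dimensional world sheet without boundary (Schaller and Strobl 1994). It involves a set of bosonic scalar fields, which can be seen as a set of maps $X^i : M \to N$, where $N$ is a Poisson manifold. In addition, one has a 1-form $A$ on the world sheet $M$ which takes values in $T^*(N)$; for coordinates $x^\mu$ on $M$ we have $A = A_{\mu i}\,dx^\mu \wedge dX^i$. Its action is

\[ S_0[X,A] = \int_M \mu\,\epsilon^{\mu\nu}\left(A_{\mu i}\,\partial_\nu X^i + P^{ij}(X)A_{\mu i}A_{\nu j}\right) \qquad [64] \]

where $\epsilon^{\mu\nu}$ is the antisymmetric tensor and $\mu$ is the volume form on $M$. The gauge transformations of the model are

\[ \delta X^i = P^{ij}(X)\,\varepsilon_j, \qquad \delta A_{\mu i} = D^j_{\mu i}\,\varepsilon_j \qquad [65] \]

where $D^j_{\mu i} = \partial_\mu\delta^j_i + P^{kj}{}_{,i}A_{\mu k}$. The equations of motion are

\[ \epsilon^{\mu\nu}D^j_{\mu i}A_{\nu j} = 0 \qquad [66] \]

and

\[ \epsilon^{\mu\nu}\left(\partial_\nu X^i + P^{ij}A_{\nu j}\right) = \epsilon^{\mu\nu}D_\nu X^i = 0 \qquad [67] \]

The gauge algebra is given by

\[ [\delta(\varepsilon_1),\delta(\varepsilon_2)]X^i = P^{ji}\left(P^{mn}{}_{,j}\,\varepsilon_{1n}\varepsilon_{2m}\right) \]
\[ [\delta(\varepsilon_1),\delta(\varepsilon_2)]A_{\mu i} = D^j_{\mu i}\left(P^{mn}{}_{,j}\,\varepsilon_{1n}\varepsilon_{2m}\right) - \epsilon_{\mu\nu}\left(\epsilon^{\nu\rho}D_\rho X^j\right)P^{mn}{}_{,ji}\,\varepsilon_{1n}\varepsilon_{2m} \qquad [68] \]

In our general notation the generators of the gauge transformations $R$ are here $P^{ij}$ and $D^j_{\mu i}$. The gauge tensors $T$ and $E$ are $P^{ij}{}_{,k}$ and $\epsilon_{\mu\nu}P^{mn}{}_{,ji}$. The higher-order gauge tensors $A$ and $B$ vanish.

The ghost fields are again denoted by $C_i$. The Noether identities are then

\[ \int_M \mu\left(\epsilon^{\mu\nu}D^j_{\mu i}A_{\nu j}\,P^{ki} + \left(\epsilon^{\mu\nu}D_\nu X^i\right)D^k_{\mu i}\right)C_k = 0 \qquad [69] \]

Considering the commutator of two gauge transformations leads to (see eqns [8]–[11])

\[ \int_M \mu\left(2P^{mi}{}_{,j}P^{nj} - P^{ji}P^{mn}{}_{,j}\right)C_m C_n = 0 \]
\[ \int_M \mu\left(2\left(P^{jk}{}_{,i}D^l_{\mu j} + P^{mk}{}_{,ij}A_{\mu m}P^{jl}\right) - D^m_{\mu i}P^{kl}{}_{,m} + \epsilon_{\mu\nu}\left(\epsilon^{\nu\rho}D_\rho X^j\right)P^{kl}{}_{,ji}\right)C_l C_k = 0 \qquad [70] \]

The Jacobi identity is

\[ P^{ij}{}_{,m}P^{mk}\,C_i C_j C_k = 0 \qquad [71] \]

The fields and antifields of the model are

\[ \Phi^A = \{A_{\mu i}, X^i, C_i\} \quad\text{and}\quad \Phi^*_A = \{A^{*\mu i}, X^*_i, C^{*i}\} \qquad [72] \]

The extended action is

\[ S = \int_M \mu\Big[\epsilon^{\mu\nu}\left(A_{\mu i}\,\partial_\nu X^i + P^{ij}(X)A_{\mu i}A_{\nu j}\right) + A^{*\mu i}D^j_{\mu i}C_j + X^*_i P^{ji}(X)C_j + \tfrac{1}{2}C^{*i}P^{jk}{}_{,i}(X)C_j C_k + \tfrac{1}{4}A^{*\mu i}A^{*\nu j}\epsilon_{\mu\nu}P^{kl}{}_{,ij}(X)C_k C_l\Big] \qquad [73] \]

The gauge-fixing conditions are taken to be of the form $\chi_i(A,X)$, so that the gauge fermion [50] becomes $\Psi = \bar{C}^i\chi_i(A,X)$. The antifields are then fixed to be

\[ A^{*\mu i} = \bar{C}^j\,\frac{\partial\chi_j(A,X)}{\partial A_{\mu i}}, \qquad X^*_i = \bar{C}^j\,\frac{\partial\chi_j(A,X)}{\partial X^i}, \qquad C^{*i} = 0, \qquad \bar{C}^*_i = \chi_i(A,X) \qquad [74] \]

The gauge-fixed action is

\[ S_\Psi = \int_M \mu\Big[\epsilon^{\mu\nu}\left(A_{\mu i}\,\partial_\nu X^i + P^{ij}(X)A_{\mu i}A_{\nu j}\right) + \bar{C}^k\frac{\partial\chi_k(A,X)}{\partial A_{\mu i}}D^j_{\mu i}C_j + \bar{C}^k\frac{\partial\chi_k(A,X)}{\partial X^i}P^{ij}C_j + \tfrac{1}{4}\bar{C}^m\frac{\partial\chi_m(A,X)}{\partial A_{\mu i}}\bar{C}^n\frac{\partial\chi_n(A,X)}{\partial A_{\nu j}}\epsilon_{\mu\nu}P^{kl}{}_{,ij}(X)C_k C_l + \bar{\pi}^i\chi_i(A,X)\Big] \qquad [75] \]

Now consider different gauge conditions:

1. First, the Landau gauge for the gauge potential, $\chi_i = \partial^\mu A_{\mu i}$, so that the gauge fermion becomes $\Psi = \bar{C}^i\,\partial^\mu A_{\mu i}$. The antifields are fixed to be

\[ A^{*\mu i} = \partial^\mu\bar{C}^i, \qquad X^*_i = C^{*i} = 0, \qquad \bar{C}^*_i = \partial^\mu A_{\mu i} \qquad [76] \]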

For this gauge choice the gauge-fixed action is

\[ S_\Psi = \int_M \mu\Big[\epsilon^{\mu\nu}\left(A_{\mu i}\,\partial_\nu X^i + P^{ij}(X)A_{\mu i}A_{\nu j}\right) + \bar{C}^i\,\partial^\mu D^j_{\mu i}C_j + \tfrac{1}{4}(\partial^\mu\bar{C}^i)(\partial^\nu\bar{C}^j)\,\epsilon_{\mu\nu}P^{kl}{}_{,ij}(X)C_k C_l + \bar{\pi}^i(\partial^\mu A_{\mu i})\Big] \qquad [77] \]

Translating this action into the notation of Cattaneo and Felder, one sees that it is exactly the expression they use to derive the perturbation series.

2. Now consider the temporal gauge $\chi_i = A_{0i}$. The gauge fermion is given by $\Psi = \bar{C}^i A_{0i}$. The antifields are fixed to

\[ A^{*0i} = \bar{C}^i, \qquad A^{*1i} = 0, \qquad X^*_i = C^{*i} = 0, \qquad \bar{C}^*_i = A_{0i} \qquad [78] \]

The gauge-fixed action is

\[ S_\Psi = \int_M \mu\Big[\epsilon^{\mu\nu}\left(A_{\mu i}\,\partial_\nu X^i + P^{ij}(X)A_{\mu i}A_{\nu j}\right) + \bar{C}^i D^j_{0i}C_j + \bar{\pi}^i A_{0i}\Big] \qquad [79] \]

3. Finally consider the Schwinger–Fock gauge $\chi_i = x^\mu A_{\mu i}$. Then the antifields are fixed to be

\[ A^{*\mu i} = x^\mu\bar{C}^i, \qquad X^*_i = C^{*i} = 0, \qquad \bar{C}^*_i = x^\mu A_{\mu i} \qquad [80] \]

For this gauge choice the gauge-fixed action is

\[ S_\Psi = \int_M \mu\Big[\epsilon^{\mu\nu}\left(A_{\mu i}\,\partial_\nu X^i + P^{ij}(X)A_{\mu i}A_{\nu j}\right) + \bar{C}^i x^\mu D^j_{\mu i}C_j + \bar{\pi}^i(x^\mu A_{\mu i})\Big] \qquad [81] \]

Notice that in the noncovariant gauges 2 and 3 the action simplifies, in that the term which arose because of the nonclosed nature of the gauge algebra vanishes.

See also: BF Theories; BRST Quantization; Constrained Systems; Graded Poisson Algebras; Operads; Perturbative Renormalization Theory and BRST; Supermanifolds; Topological Sigma Models.

Further Reading

Alexandrov M, Kontsevich M, Schwarz A, and Zaboronsky O (1997) Geometry of the master equation. International Journal of Modern Physics A12: 1405–1430.
Axelrod S and Singer IM (1991) Chern–Simons perturbation theory. Proceedings of the XXth Conference on Differential Geometric Methods in Physics, Baruch College/CUNY, NY. (hep-th/9110056).
Batalin IA and Vilkovisky GA (1977) Gauge algebra and quantization. Physics Letters 69B: 309–312.
Becchi C, Rouet A, and Stora R (1976) Renormalization of gauge theories. Annals of Physics (NY) 98: 287–321.
Cattaneo AS and Felder G (2001) On the AKSZ formulation of the Poisson–Sigma model. Letters in Mathematical Physics 56: 163–179.
Faddeev LD and Popov VN (1967) Feynman diagrams for the Yang–Mills field. Physics Letters 25B: 29–30.
Gomis J, Paris J, and Samuel S (1994) Antibracket, antifields and gauge-theory quantization. Physics Reports 269: 1–145.
Hirshfeld AC and Schwarzweller T (2000) Path integral quantization of the Poisson–Sigma model. Annals of Physics (Leipzig) 9: 83–101.
Schaller P and Strobl T (1994) Poisson structure induced (topological) field theories. Modern Physics Letters A9: 3129–3136.
Witten E (1988) Topological sigma models. Communications in Mathematical Physics 118: 411–449.
Zinn-Justin J (1975) Renormalization of gauge theories. In: Rollnik H and Dietz K (eds.) Trends in Elementary Particle Physics, Lecture Notes in Physics, vol. 37. Berlin: Springer.
Bethe Ansatz
M T Batchelor, Australian National University, Canberra, ACT, Australia
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The Bethe ansatz is a particular form of wave function introduced in the diagonalization of the Heisenberg spin chain. It underpins the majority of exactly solved models in statistical mechanics and quantum field theory. At the heart of the Bethe ansatz is the way in which multibody interactions factor into two-body interactions. The Bethe ansatz is thus intimately entwined with the theory of integrability.

The way in which the Bethe ansatz works is best understood by working through an explicit hands-on example. The canonical example is the isotropic antiferromagnetic Heisenberg Hamiltonian

\[ H = \sum_{i=1}^{L-1} h_{i,i+1} + h_{L,1}, \qquad h_{ij} = \tfrac{1}{2}\left(\boldsymbol{\sigma}_i\cdot\boldsymbol{\sigma}_j + 1\right) \qquad [1] \]

where $\boldsymbol{\sigma} = (\sigma^x, \sigma^y, \sigma^z)$ are Pauli matrices and $L$ is the length of the chain. Periodic boundary conditions are imposed. However, open boundary conditions may also be treated, along with the addition of magnetic bulk and boundary fields. The z-component of each spin is either up or down. Since the z-component of the total spin commutes with the Hamiltonian, the total number $n$ of up spins serves as a good quantum number. A state of the system can therefore be conveniently described in terms of the coordinates of all the up spins. Denote these coordinates by $x_i$, with $1 \le x_i \le L$. The quantum number $n$ ensures that the Hamiltonian decomposes into $L + 1$ sectors, the sector with $n$ up spins having size $\binom{L}{n}$. The antiferromagnetic ground state occurs in the largest sector.

The normalization of the Hamiltonian [1] is such that its action is that of the permutation operator:

\[ h\,|{-}{-}\rangle = |{-}{-}\rangle, \quad h\,|{+}{+}\rangle = |{+}{+}\rangle, \quad h\,|{-}{+}\rangle = |{+}{-}\rangle, \quad h\,|{+}{-}\rangle = |{-}{+}\rangle \qquad [2] \]

Diagonalization of Sectors

One can address the diagonalization of the sectors for various cases.

Case 1: n = 0

Consider the case with all spins down. The eigenstate is $\psi = |{-}{-}\cdots{-}\rangle$, with $H\psi = L\psi$ and, thus, $E = L$ is the trivial solution.

Case 2: n = 1

There are $L$ states, with

\[ \psi = \sum_{x=1}^{L} a(x)\,|\varphi(x)\rangle \qquad [3] \]

where $|\varphi(x)\rangle$ is the state with an up spin at site $x$. The aim is to find the amplitudes $a(x)$. It is clear that

\[ H|\varphi(x)\rangle = (L-2)\,|\varphi(x)\rangle + |\varphi(x-1)\rangle + |\varphi(x+1)\rangle \qquad [4] \]

in the bulk (away from either boundary). Insertion of [3] into $H\psi = E\psi$ gives

\[ E\,a(x) = (L-2)\,a(x) + a(x-1) + a(x+1) \qquad [5] \]

Substitution of spin waves $a(x) = e^{ikx}$ gives

\[ E = L - 2 + 2\cos k \qquad [6] \]

The boundary conditions are such that $a(0) = a(L)$ and $a(L+1) = a(1)$; either gives $e^{ikL} = 1$, from which the $L$ values of $k$ follow.

Case 3: n = 2

Here the wave function can be written in terms of the two flipped spins as

\[ \psi = \sum_{x<y} a(x,y)\,|\varphi(x,y)\rangle \qquad [7] \]

It is to be emphasized that one is working in the region with $x < y$. There are two cases to consider: (1) $y > x + 1$ and (2) $y = x + 1$. Consider the interactions in the bulk. For (1) the action of the Hamiltonian implies

\[ E\,a(x,y) = (L-4)\,a(x,y) + a(x-1,y) + a(x+1,y) + a(x,y-1) + a(x,y+1) \qquad [8] \]

and for (2)

\[ E\,a(x,x+1) = (L-2)\,a(x,x+1) + a(x-1,x+1) + a(x,x+2) \qquad [9] \]

The compatibility of these two equations requires that

\[ 2a(x,x+1) = a(x,x) + a(x+1,x+1) \qquad [10] \]

which is known as the ‘‘collision’’ or ‘‘meeting’’ condition.

Some adjustments need to be made for spins which get flipped at the boundaries. Looking at [8] and [9] with $x = 1$ and $x = L$, it is evident that one can take

\[ a(y, x+L) = a(x,y) \qquad [11] \]

to restore the original ordering. The terms which arise involve up spins at sites $0$ and $L + 1$. This illustrates the periodic boundary condition.

We now assume (the Bethe ansatz) that

\[ a(x,y) = A_{12}\,e^{ik_1 x}e^{ik_2 y} + A_{21}\,e^{ik_2 x}e^{ik_1 y} \qquad [12] \]

Substitution of the ansatz [12] into [8] gives

\[ E = L - 4 + 2\cos k_1 + 2\cos k_2 \qquad [13] \]

Substitution of [12] into [10] gives

\[ \frac{A_{12}}{A_{21}} = -\,\frac{1 - 2e^{ik_1} + e^{i(k_1+k_2)}}{1 - 2e^{ik_2} + e^{i(k_1+k_2)}} \qquad [14] \]

The three relations [11], [12], and [14] give the Bethe equations

\[ e^{ik_1 L} = \frac{A_{12}}{A_{21}} \quad\text{and}\quad e^{ik_2 L} = \frac{A_{21}}{A_{12}} \qquad [15] \]

which are to be solved for $k_1$ and $k_2$. Note that $e^{i(k_1+k_2)L} = 1$.

Case 4: n = 3

The full power of the Bethe ansatz method becomes evident for three particles. Here

\[ \psi = \sum_{x<y<z} a(x,y,z)\,|\varphi(x,y,z)\rangle \qquad [16] \]

There are several cases to consider:

1. $y > x + 1$ and $z > y + 1$, where

\[ E\,a(x,y,z) = (L-6)\,a(x,y,z) + a(x\pm1,y,z) + a(x,y\pm1,z) + a(x,y,z\pm1) \qquad [17] \]

By $a(x\pm1,y,z)$, we mean $a(x+1,y,z) + a(x-1,y,z)$, etc.

2. $y = x + 1$ and $z > y + 1$, with

\[ E\,a(x,x+1,z) = (L-4)\,a(x,x+1,z) + a(x-1,x+1,z) + a(x,x+2,z) + a(x,x+1,z\pm1) \qquad [18] \]

3. $y > x + 1$ and $z = y + 1$, where

\[ E\,a(x,y,y+1) = (L-4)\,a(x,y,y+1) + a(x\pm1,y,y+1) + a(x,y-1,y+1) + a(x,y,y+2) \qquad [19] \]

4. $y = x + 1$ and $z = y + 1$, for which

\[ E\,a(x,x+1,x+2) = (L-2)\,a(x,x+1,x+2) + a(x-1,x+1,x+2) + a(x,x+1,x+3) \qquad [20] \]

Again, we must ensure that these equations are compatible. This involves comparison of the last three equations with [17]. The three equations to be satisfied are

\[ 2a(x,x+1,z) = a(x,x,z) + a(x+1,x+1,z) \qquad [21] \]

\[ 2a(x,y,y+1) = a(x,y,y) + a(x,y+1,y+1) \qquad [22] \]

\[ 4a(x,x+1,x+2) = a(x,x,x+2) + a(x,x+1,x+1) + a(x,x+2,x+2) + a(x+1,x+1,x+2) \qquad [23] \]

But note that setting $z = x + 2$ in [21] and $y = x + 1$ in [22], and adding the results, leads to [23] being automatically satisfied. We are thus left with only two equations, [21] and [22]. Note the similarity between these two equations and the meeting condition [10] for the $n = 2$ case.

In this case the Bethe ansatz is

\[ a(x,y,z) = A_{123}\,z_1^x z_2^y z_3^z + A_{132}\,z_1^x z_3^y z_2^z + A_{213}\,z_2^x z_1^y z_3^z + A_{231}\,z_2^x z_3^y z_1^z + A_{321}\,z_3^x z_2^y z_1^z + A_{312}\,z_3^x z_1^y z_2^z \qquad [24] \]

in which $z_j = e^{ik_j}$. This is a sum over the $3!$ permutations of the integers 1, 2, 3. Inserting this ansatz into [17] gives

\[ E = L - 6 + 2(\cos k_1 + \cos k_2 + \cos k_3) \qquad [25] \]

To determine the $k_j$, it is convenient to define

\[ s_{ij} = 1 - 2z_j + z_i z_j \qquad [26] \]

Substitution of [24] into the meeting conditions [21] and [22] then gives

\[ s_{12}A_{123} + s_{21}A_{213} + s_{13}A_{132} + s_{31}A_{312} + s_{23}A_{231} + s_{32}A_{321} = 0 \qquad [27] \]

\[ s_{23}A_{123} + s_{32}A_{132} + s_{13}A_{213} + s_{31}A_{231} + s_{21}A_{321} + s_{12}A_{312} = 0 \qquad [28] \]

These equations are assumed to be satisfied in permutation pairs, that is,

\[ s_{12}A_{123} + s_{21}A_{213} = 0, \qquad s_{23}A_{123} + s_{32}A_{132} = 0, \quad\text{etc.} \qquad [29] \]

Up to an overall constant, the relations [27] and [28] are satisfied by

\[ A_{123} = s_{21}s_{31}s_{32}, \quad A_{132} = -s_{31}s_{21}s_{23}, \quad A_{312} = s_{13}s_{23}s_{21}, \quad A_{321} = -s_{23}s_{13}s_{12}, \quad A_{231} = s_{32}s_{12}s_{13}, \quad A_{213} = -s_{12}s_{32}s_{31} \qquad [30] \]

The boundary condition, $a(y,z,x+L) = a(x,y,z)$, gives

\[ \left(z_1^L A_{321} - A_{132}\right)z_1^x z_3^y z_2^z + \left(z_2^L A_{312} - A_{231}\right)z_2^x z_3^y z_1^z + \left(z_1^L A_{231} - A_{123}\right)z_1^x z_2^y z_3^z + \left(z_3^L A_{213} - A_{321}\right)z_3^x z_2^y z_1^z + \left(z_2^L A_{132} - A_{213}\right)z_2^x z_1^y z_3^z + \left(z_3^L A_{123} - A_{312}\right)z_3^x z_1^y z_2^z = 0 \qquad [31] \]

This leads to the equations

\[ z_1^L = \frac{A_{123}}{A_{231}} = \frac{A_{132}}{A_{321}} = \frac{s_{21}s_{31}}{s_{12}s_{13}}, \qquad z_2^L = \frac{A_{213}}{A_{132}} = \frac{A_{231}}{A_{312}} = \frac{s_{12}s_{32}}{s_{21}s_{23}}, \qquad z_3^L = \frac{A_{321}}{A_{213}} = \frac{A_{312}}{A_{123}} = \frac{s_{13}s_{23}}{s_{31}s_{32}} \qquad [32] \]

which can be solved for the Bethe roots $k_j$.
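The relations derived so far can be checked numerically. The following is a minimal sketch rather than part of the original treatment: it builds the sector matrices of the permutation Hamiltonian [1]–[2] by brute force, verifies the one-magnon relation [5]–[6], and confirms that amplitudes of the form [30] (and their two-particle analogue, whose ratio is [14]) satisfy the meeting conditions [10], [21], and [22] for generic momenta. All function names are hypothetical, sites are labelled 0 to L−1 instead of 1 to L, and the momenta used in the amplitude checks are arbitrary values, not Bethe roots.

```python
import cmath
import math
from itertools import combinations, permutations


def sector_matrix(L, n):
    """H = sum of nearest-neighbour permutation operators, restricted to the
    sector with n up spins on a periodic chain with sites 0..L-1 (L >= 3)."""
    basis = [frozenset(c) for c in combinations(range(L), n)]
    index = {b: i for i, b in enumerate(basis)}
    H = [[0.0] * len(basis) for _ in basis]
    for col, state in enumerate(basis):
        for i in range(L):
            j = (i + 1) % L
            # Swapping two unequal spins moves the up spin across the bond;
            # swapping two equal spins leaves the state unchanged.
            new = state ^ {i, j} if (i in state) != (j in state) else state
            H[index[new]][col] += 1.0
    return basis, H


def max_residual_n1(L, m):
    """Largest violation of H a = E a for a(x) = exp(ikx), k = 2*pi*m/L."""
    k = 2 * math.pi * m / L
    _, H = sector_matrix(L, 1)
    a = [cmath.exp(1j * k * x) for x in range(L)]
    E = L - 2 + 2 * math.cos(k)
    return max(abs(sum(H[r][c] * a[c] for c in range(L)) - E * a[r])
               for r in range(L))


def s(zi, zj):
    """Two-body factor s_ij = 1 - 2 z_j + z_i z_j of eqn [26]."""
    return 1 - 2 * zj + zi * zj


def sign(p):
    """Signature of the permutation p, from its number of inversions."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1


def amplitude(p, z):
    """sign(P) * product over i < j of s_{p_j, p_i}; for n = 3 this is eqn [30]."""
    prod = complex(sign(p))
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            prod *= s(z[p[j]], z[p[i]])
    return prod


def bethe_amplitude(coords, z):
    """Sum over permutations P of A_P * prod_m z_{p_m}^{x_m}, as in [12] and [24]."""
    total = 0
    for p in permutations(range(len(z))):
        term = amplitude(p, z)
        for x, pm in zip(coords, p):
            term *= z[pm] ** x
        total += term
    return total


def meeting_residual_n2(z, x):
    """Violation of the meeting condition [10]."""
    a = lambda *c: bethe_amplitude(c, z)
    return abs(2 * a(x, x + 1) - a(x, x) - a(x + 1, x + 1))


def meeting_residuals_n3(z, x, y, w):
    """Violations of the meeting conditions [21] and [22]."""
    a = lambda *c: bethe_amplitude(c, z)
    r21 = abs(2 * a(x, x + 1, w) - a(x, x, w) - a(x + 1, x + 1, w))
    r22 = abs(2 * a(x, y, y + 1) - a(x, y, y) - a(x, y + 1, y + 1))
    return r21, r22


if __name__ == "__main__":
    z = [cmath.exp(1j * k) for k in (0.3, 1.1, -0.7)]  # arbitrary momenta
    print("n = 1 residual:", max_residual_n1(6, 1))
    print("n = 2 meeting residual:", meeting_residual_n2(z[:2], 2))
    print("n = 3 meeting residuals:", meeting_residuals_n3(z, 1, 4, 7))
```

All residuals should vanish to rounding error. The meeting conditions hold for any momenta; it is only the boundary conditions [15] and [32] that quantize the $k_j$.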

General n

The general Bethe ansatz is

\[ a(x_1,\ldots,x_n) = \sum_P A_{p_1,\ldots,p_n}\, z_{p_1}^{x_1}\cdots z_{p_n}^{x_n} \qquad [33] \]

where the sum is over all $n!$ permutations $P = \{p_1,\ldots,p_n\}$ of the integers $1,\ldots,n$. The boundary condition is

\[ a(x_2,x_3,\ldots,x_n,x_1+L) = a(x_1,x_2,\ldots,x_n) \qquad [34] \]

leading to the Bethe equations

\[ z_{p_1}^L = \frac{A_{p_1,\ldots,p_n}}{A_{p_2,\ldots,p_n,p_1}} \qquad [35] \]

for all permutations, with

\[ A_{p_1,\ldots,p_n} = \epsilon_P \prod_{1\le i<j\le n} s_{p_j,p_i} \qquad [36] \]

where $\epsilon_P$ is the signature of the permutation. Finally,

\[ z_{p_1}^L = (-)^{n-1}\prod_{\ell=2}^{n}\frac{s_{p_\ell,p_1}}{s_{p_1,p_\ell}} \quad\text{or}\quad z_j^L = (-)^{n-1}\prod_{\substack{\ell=1\\ \ell\ne j}}^{n}\frac{s_{\ell,j}}{s_{j,\ell}} \qquad [37] \]

for $j = 1,\ldots,n$. The eigenvalues are given by

\[ E = L + \sum_{j=1}^{n}\left(2\cos k_j - 2\right) \qquad [38] \]

Another form of the Bethe equations is obtained by defining

\[ e^{ik_j} = \frac{u_j - \tfrac{1}{2}i}{u_j + \tfrac{1}{2}i} \qquad [39] \]

which gives

\[ E = L - \sum_{j=1}^{n}\frac{1}{u_j^2 + \tfrac{1}{4}} \qquad [40] \]

with $u_j$ satisfying

\[ \left(\frac{u_j - \tfrac{1}{2}i}{u_j + \tfrac{1}{2}i}\right)^{L} = -\prod_{\ell=1}^{n}\frac{u_j - u_\ell - i}{u_j - u_\ell + i} \qquad [41] \]

for $j = 1,\ldots,n$.

All eigenvalues of the Heisenberg spin chain may be obtained in terms of the Bethe ansatz solution. For example, the roots $u_j$ for the ground state are real and distributed symmetrically about the origin. Excitations may involve complex roots. Although obtained exactly in terms of the Bethe roots, the Bethe ansatz wave function is cumbersome.

We have thus seen how the Bethe ansatz works for the Heisenberg spin chain. The underlying mechanism is the way in which the collision or meeting conditions can be handled in terms of two-body interactions. To see this more clearly, the six permutation pair equations [29] can be written in the general form $A_{abc} = Y_{ab}A_{bac}$ and $A_{abc} = Y_{bc}A_{acb}$, where $Y_{ab} = -s_{ba}/s_{ab}$. Now there are two possible paths to get from $A_{abc}$ to $A_{cba}$, namely

\[ A_{cba} = Y_{ab}Y_{ac}Y_{bc}A_{abc}, \qquad A_{cba} = Y_{bc}Y_{ac}Y_{ab}A_{abc} \qquad [42] \]

Both paths must be equivalent, with

\[ Y_{ab}Y_{ba} = 1 \quad\text{and}\quad Y_{ab}Y_{ac}Y_{bc} = Y_{bc}Y_{ac}Y_{ab} \qquad [43] \]

The latter is a condition of nondiffraction or equivalently a manifestation of the Yang–Baxter equation.

Historically, the next model to be exactly solved in terms of the Bethe ansatz was the one-dimensional model of $N$ interacting bosons on a line of length $L$ defined by the Hamiltonian

\[ H = -\sum_{i=1}^{N}\frac{\partial^2}{\partial x_i^2} + 2c\sum_{1\le i<j\le N}\delta(x_i - x_j) \qquad [44] \]

where $c$ is a measure of the interaction strength. For this model the Bethe ansatz wave function is of the same form as [33] with the two-body interaction term given by

\[ s_{ab} = k_a - k_b + ic \qquad [45] \]

The Bethe equations are given by

\[ \exp(ik_j L) = -\prod_{\ell=1}^{N}\frac{k_j - k_\ell + ic}{k_j - k_\ell - ic} \quad\text{for } j = 1,\ldots,N \qquad [46] \]

The energy eigenvalue is

\[ E = \sum_{j=1}^{N}k_j^2 \qquad [47] \]

For repulsive ($c > 0$) interactions, one can prove that all Bethe roots are real.

The Bethe ansatz has been applied to a number of other and more general models, both for discrete spins and in the continuum. These include the anisotropic Heisenberg (XXZ) spin chain, for which the above working readily generalizes to trigonometric functions. The underlying ansatz [33] remains the same. One key generalization is the nested Bethe ansatz, which arises, for example, in the solution of the general N-state permutator model, the Hubbard model, and the Gaudin–Yang model of interacting fermions. For such models the nested Bethe ansatz involves an additional level of work to determine the amplitudes appearing in the

wave function [33] due to higher symmetries. This results in Bethe equations involving different types or colors of roots.

The exactly solved one-dimensional quantum spin chains may also be obtained from their two-dimensional classical counterparts – the vertex models. For example, the six-vertex model shares the same Bethe ansatz wave function and Bethe equations as the XXZ spin chain. The more general permutator Hamiltonians are related to multistate vertex models. One may also consider other spin-S models.

The discussion in this article has centered on what is known as the coordinate Bethe ansatz. Another formulation is the algebraic Bethe ansatz, which was developed for the systematic treatment of the higher-spin models. In this formulation, operators create the Bethe states by acting on a vacuum. The algebraic Bethe ansatz goes hand-in-hand with the quantum inverse-scattering method. In all of the exactly solved Bethe ansatz models, it is possible to derive quantities like the ground-state energy per site via the root density method, which assumes that the Bethe roots form a uniform distribution in the infinite-size limit. The thermodynamics of the Bethe ansatz solvable models may also be calculated in a systematic fashion.

Despite Bethe’s early optimism, the Bethe ansatz has not been extended to higher-dimensional systems.

See also: Affine Quantum Groups; Eight Vertex and Hard Hexagon Models; Integrability and Quantum Field Theory; Integrable Systems: Overview; Quantum Spin Systems; Yang–Baxter Equations.

Further Reading

Baxter RJ (1983) Exactly Solved Models in Statistical Mechanics. London: Academic Press.
Baxter RJ (2003) Completeness of the Bethe ansatz for the six- and eight-vertex models. Journal of Statistical Physics 108: 1–48.
Bethe HA (1931) Zur Theorie der Metalle I. Eigenwerte und Eigenfunktionen der linearen Atomkette. Zeitschrift für Physik 71: 205–226.
Gaudin M (1967) Un Système à Une Dimension de Fermions en Interaction. Physics Letters A 24: 55–56.
Gaudin M (1983) La Fonction d’onde de Bethe. Paris: Masson.
Korepin VE, Izergin AG, and Bogoliubov NM (1993) Quantum Inverse Scattering Method and Correlation Functions. Cambridge: Cambridge University Press.
Lieb EH and Liniger W (1963) Exact analysis of an interacting Bose gas I. The general solution and the ground state. Physical Review 130: 1605–1616.
Mattis DC (1993) The Many-Body Problem: An Encyclopaedia of Exactly Solved Models in One-Dimension. Singapore: World Scientific.
McGuire JB (1964) Study of exactly soluble one-dimensional N-body problems. Journal of Mathematical Physics 5: 622–636.
Sutherland B (2004) Beautiful Models: 70 Years of Exactly Solved Quantum Many–Body Problems. Singapore: World Scientific.
Takahashi M (1999) Thermodynamics of One-Dimensional Solvable Models. Cambridge: Cambridge University Press.
Yang CN (1967) Some exact results for the many-body problem in one-dimension with repulsive Delta-function interaction. Physical Review Letters 19: 1312–1315.

BF Theories
M Blau, Université de Neuchâtel, Neuchâtel, Switzerland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

BF theories are a class of gauge theories with a nontrivial metric-independent classical action. As such these theories are candidate topological field theories akin to the Chern–Simons theory in three dimensions, but in contrast to the Chern–Simons theory these exist and are well defined in arbitrary dimensions.

The name ‘‘BF theories’’ derives from the fact that, roughly (see [1] below and the subsequent discussion for a more precise description), the action of the BF theory takes the form $\int B \wedge F_A$ with $F_A$ the curvature of a connection $A$ and $B$ a Lagrange multiplier. The classical equations of motion imply that $A$ is flat, $F_A = 0$, and thus BF theories are topological gauge theories of flat connections.

Abelian BF theories and their relation to topological invariants (the Ray–Singer torsion) were originally discussed by Schwarz (1978, 1979). In the context of topological field theory, nonabelian BF theories were introduced in Horowitz (1989) and Blau and Thompson (1989, 1991).

Since then, BF theories have attracted a lot of attention as simple toy-models of (topological) gauge theories, and also because of their relationships with the Chern–Simons theory, the Yang–Mills theory, and gauge-theory formulations of gravity, as well as because of the rather rich and intricate structure of their quantum theories.

The purpose of this article is to provide an overview of these various features of BF theories. The standard reference for the basic classical and quantum properties of BF theories is Birmingham et al. (1991).

Basic Classical Properties of BF Theories Stora–Tyupkin procedure), a typical gauge choice


being dA0 ? (A  A0 ) = 0 where A0 is a reference
Nonabelian BF Theories
connection, and ? is the Hodge duality operator
The classical action and equations of motion Typi- corresponding to a choice of metric on M.
Typically, the classical action of the BF theory takes
the form Local p-form symmetries For n = 2, the only local
Z symmetries of the BF action are the above G gauge
SBF ðA; BÞ ¼ trG B ^ FA ½1 transformations. For n > 2, however, there are other
M local symmetries associated with shifts of Bp 2
where FA is the curvature of a connection A on a p (M, g) with p = n  2 > 0. Indeed, integration by
principal G-bundle P ! M over an n-dimensional parts using Stokes’ theorem and @M = 0 shows that [1]
manifold M, B is an ad-equivariant horizontal is invariant under
(n  2)-form on P, and trG (a trace) denotes an
ad-invariant nondegenerate scalar product on the A ! A; Bp ! Bp þ dA p1 ; p1 2 p1 ðM; gÞ ½6
Lie algebra g of the Lie group G. Generalizations of For p = 1,  is a 0-form and the invariance follows.
this are possible, in particular, for G abelian or for For p > 1, however, the gauge parameter has, in
n = 3 and are mentioned below. some sense, its own gauge invariance. Namely,
We consider FA and B as forms on M taking under the shift
values in the bundle of Lie algebras ad P = P ad g
and refer to such objects as elements of  (M, g). p1 ! p1 þ dA p2 ½7
Then tr B ^ FA 2 n (M, R) is a volume form on M. one has
In order to simplify the exposition, in the following
we will mostly assume that G is compact semisimple dA p1 ! dA p1 þ ½FA ; p2  ½8
and that M is compact without a boundary (even Thus for FA = 0, the shift [7] has no effect on the
though relaxing any one of these conditions is local symmetry [6]. Likewise, for p > 2 the parameter
possible and also of interest in its own right). p2 itself has a similar invariance, etc. Since FA = 0
Varying the action [1] with respect to A and B, is one of the classical equations of motion, the shift
one obtains the classical equations of motion symmetry [6] is what is called an ‘‘on-shell reducible
FA ¼ 0; dA B ¼ 0 ½2 symmetry.’’ Gauge-fixing such symmetries is not
straightforward, and one generally appeals to the
where Batalin–Vilkovisky formalism to accomplish this.
dA B ¼ dB þ ½A; B ½3
Diffeomorphisms and local symmetries One mani-
is the covariant exterior derivative. In particular, festation of the general covariance of the BF action
therefore, the equations of motion imply that the [1] is the on-shell equivalence of (infinitesimal)
connection A is flat. diffeomorphisms and (infinitesimal) local symme-
tries. Diffeomorphisms are generated by the Lie
Gauge invariance For any n, the action [1] is derivative LX along a vector field X. The action of
invariant under G gauge transformations (vertical LX on differential forms is given by the Cartan
automorphisms of P) acting on A and B as formula LX = diX þ iX d, where i(.) is the operation
of contraction. The action of the Lie derivative on
A ! g1 Ag þ g1 dg; B ! g1 Bg ½4 A and B can be written in gauge covariant form as
(the latter is what is meant by the fact that B takes LX A ¼ iX FA þ dA ðXÞ;
values in ad P), because FA is also ad-equivariant, ½9
LX B ¼ iX dA B þ ½B; ðXÞ þ dA 0 ðXÞ
FA ! g1 FAg , and trG is ad-invariant. The infinitesi-
mal version of this statement is that the action is where (X) = iX A and 0 (X) = iX B. This shows that
invariant under the variations on-shell diffeomorphisms are equivalent to field-
dependent gauge and p-form symmetries of the
A ¼ dA ; B ¼ ½B;  ½5 BF action.
where  2 0 (M, g) can (formally) be thought of as
an element of the Lie algebra of the group of gauge The classical moduli space The classical moduli
transformations. space C = C(P, M, G) is the space of solutions to the
Gauge-fixing this symmetry can proceed in the classical equations of motion modulo the local
usual way (via the Faddeev–Popov or Becchi–Rouet– symmetries of the action. Since the field content
and the nature of the local symmetries of the BF theory depend strongly on the dimension $n$ of $M$, the structure and interpretation of the classical moduli space also depend on $n$.

For $n = 2$, by [5] the equation of motion [2] for $B \in \Omega^{0}(M,\mathfrak{g})$ says that $A$ is invariant under the infinitesimal gauge transformation generated by $B$. Thus if $A$ is "irreducible," there are no nontrivial solutions for $B$ and, away from reducible flat connections, the classical moduli space is just the moduli space of flat connections on $P \to M$ over the surface $M$:

$$\mathcal{C}_{n=2} = \mathcal{M}_{\mathrm{flat}}(P,G) \qquad [10]$$

This space may or may not be empty, depending on whether $P$ admits flat connections or not.

For $n = 3$, the equation of motion [2] for $B \in \Omega^{1}(M,\mathfrak{g})$ says that $B$ is a tangent vector to the space of flat connections at the flat connection $A$, in the sense that under the variation $\delta A = B$, one has

$$\delta F_A = d_A B = 0 \qquad [11]$$

The local $G$ gauge symmetry and the 1-form symmetry [6] now imply that the moduli space of classical solutions can be identified with the (co-)tangent bundle of the moduli space of flat connections on $P \to M$ over the 3-manifold $M$:

$$\mathcal{C}_{n=3} = T\mathcal{M}_{\mathrm{flat}}(P,G) \qquad [12]$$

In higher dimensions there appears to be less geometrical structure associated with BF theories, and all that can be said in general is that the tangent space to $\mathcal{C}_n$ at a solution $(A,B)$ of the equations of motion [2] is the vector space:

$$T_{(A,B)}\mathcal{C}_n = H_A^{1}(M,\mathfrak{g}) \oplus H_A^{n-2}(M,\mathfrak{g}) \qquad [13]$$

where $H_A^{k}(M,\mathfrak{g})$ are the cohomology groups of the deformation complex

$$d_A : \Omega^{\bullet}(M,\mathfrak{g}) \to \Omega^{\bullet+1}(M,\mathfrak{g}) \qquad [14]$$

associated with the flat connection $A$, $F_A = (d_A)^2 = 0$.

When $M$ is topologically of the form $M = \Sigma \times \mathbb{R}$ (where one can think of $\mathbb{R}$ as time), one has

$$T_{(A,B)}\mathcal{C}_n = H_A^{1}(\Sigma,\mathfrak{g}) \oplus H_A^{n-2}(\Sigma,\mathfrak{g}) \qquad [15]$$

This is naturally a symplectic vector space (necessary for a phase space), the nondegenerate antisymmetric pairing being given by Poincaré duality:

$$\omega([a_1],[b_1],[a_2],[b_2]) = \int_\Sigma \mathrm{tr}_G\,(a_1 \wedge b_2 - a_2 \wedge b_1) \qquad [16]$$

Metric independence. Perhaps the most important property of the action [1] is that, in contrast to, for example, the usual Yang–Mills action for nonabelian gauge fields

$$S_{YM} = \frac{1}{4g^2}\int_M \mathrm{tr}_G\, F_A \wedge \star F_A \qquad [17]$$

it does not require a metric (or the corresponding Hodge duality operator $\star$) for its formulation. This makes it a candidate action for a "topological field theory," this term loosely referring to field theories which, in a suitable sense, do not depend on additional structures imposed on the underlying space(-time) manifold $M$, in this case a Riemannian structure.

To establish that BF theories are "topological quantum field theories," one needs to show that the partition function (and correlation functions) of the quantized BF theory are also metric independent. This is not completely automatic as typically the metric enters in the gauge fixing of the local symmetries of the action which is required to make the quantum theory well defined. The usual lore is that since the metric only enters through the gauge fixing and since the quantum theory should be independent of the choice of gauge, it should also be metric independent. In the case of nonabelian BF theories, the complexity of their local symmetries complicates the analysis somewhat, but it can nevertheless be shown that BF theories indeed define topological field theories also at the quantum level.

Special Features of Abelian BF Theories

All the features of nonabelian BF theories discussed above are, of course, also valid when $G$ is abelian (with some obvious modifications and simplifications). However, when $G$ is abelian, a more general action than [1] is possible. Indeed, although there is no obvious higher p-form analog of nonabelian gauge fields, in the abelian case $G = U(1)$ or $G = \mathbb{R}$ the condition $F_A \in \Omega^{2}(M,\mathbb{R})$ can be relaxed. In particular, one can consider the actions

$$S(n,p) \equiv S(B_p, C_{n-p-1}) = \int_M B_p \wedge dC_{n-p-1} \qquad [18]$$

with $B_p \in \Omega^{p}(M,\mathbb{R})$, $C_{n-p-1} \in \Omega^{n-p-1}(M,\mathbb{R})$, and $F_C = dC$ its $(n-p)$-form field strength. More generally, one can also consider the hybrid action

$$S_A(n,p) = \int_M B_p \wedge d_A C_{n-p-1} \qquad [19]$$

where $A$ is a fixed (nondynamical) flat $G$-connection, $d_A^2 = 0$, and $B$ and $C$ take values in the corresponding adjoint bundle. This action can be considered as the linearization of the nonabelian BF action [1] around
the flat connection $A$, and it reduces to the abelian BF action [18] for $\mathfrak{g} = \mathbb{R}$.

The action is invariant under the (reducible) local symmetries

$$B_p \to B_p + d_A\lambda_{p-1}, \qquad C_{n-p-1} \to C_{n-p-1} + d_A\lambda'_{n-p-2} \qquad [20]$$

The space of solutions to the equations of motion $d_A C = d_A B = 0$ modulo gauge symmetries is (cf. [13]) the finite-dimensional vector space

$$\mathcal{C}_{n,p} = H_A^{p}(M,\mathfrak{g}) \oplus H_A^{n-p-1}(M,\mathfrak{g}) \qquad [21]$$

which is naturally symplectic for $M = \Sigma \times \mathbb{R}$.

Uses and Applications of Quantum Abelian BF Theories

Quantization of Abelian BF Theories and the Ray–Singer Torsion

We will now show that the partition function of the abelian BF theory (actually more generally that of the linearized nonabelian BF action [19]) is related to the Ray–Singer torsion of $M$. This requires some preparatory material on Gaussian path integrals, determinants, and gauge fixing that we present first.

In order to simplify the exposition, we assume that there are no harmonic modes, either because they have been gauged away or because the cohomology groups of $d_A$ are trivial, $H_A^{k}(M,\mathfrak{g}) = 0$, that is, the deformation complex [14] is "acyclic."

Laplacians, determinants, and the Ray–Singer torsion. Choosing a Riemannian metric $g$ (and Hodge duality operator $\star$) on $M$, the twisted Laplacian on p-forms is

$$\Delta_A^{(p)} = (d_A + d_A^{\star})^2 = d_A d_A^{\star} + d_A^{\star} d_A \qquad [22]$$

where $d_A^{\star} = \pm \star d_A \star$ is the adjoint of $d_A$ with respect to the scalar product on p-forms defined by $\star$. This is an elliptic operator whose determinant can be defined, for example, by a $\zeta$-function regularization. Denoting the (nonzero) eigenvalues of $\Delta_A^{(p)}$ by $\lambda_k^{(p)}$, its $\zeta$-function is

$$\zeta^{(p)}(s) = \sum_k \left(\lambda_k^{(p)}\right)^{-s} \qquad [23]$$

This converges for $\mathrm{Re}(s)$ sufficiently large and can be analytically continued to a meromorphic function of $s$ analytic at $s = 0$, so that

$$\det \Delta_A^{(p)} := e^{-\zeta^{(p)\prime}(0)} \qquad [24]$$

is well defined. The Ray–Singer torsion of $(M,g)$ (with respect to the flat connection $A$) is then defined by

$$T_A(M) = \prod_{p=0}^{n} \left(\det \Delta_A^{(p)}\right)^{(-1)^p p/2} \qquad [25]$$

Even though this definition depends strongly on the metric $g$ on $M$, the Ray–Singer torsion has the remarkable property of being independent of $g$. The Ray–Singer torsion can be shown to be trivial (essentially $= 1$ modulo zero-mode contributions) in even dimensions, but is a nontrivial topological invariant in odd dimensions. Henceforth, we will suppress the dependence on $M$ and denote the $n$-dimensional Ray–Singer torsion by $T_A(n)$.

Gaussian path integrals and determinants. The path integral for abelian BF theories is modeled on the usual formula for a $\delta$-function

$$\delta^n(x) = \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n} d^n\lambda\; e^{i\lambda\cdot x} \qquad [26]$$

from which one deduces the Gaussian integral formula

$$\frac{1}{(2\pi)^n}\int_{\mathbb{R}^n\times\mathbb{R}^n} d^n\lambda\, d^n x\; e^{i\lambda\cdot Dx + iK\cdot x + i\lambda\cdot J} = \int_{\mathbb{R}^n} d^n x\; \delta^n(Dx + J)\, e^{iK\cdot x} = \frac{1}{\det D}\, e^{-iK\cdot D^{-1}J} \qquad [27]$$

Here, we have assumed that the operator (matrix) $D$ is invertible. The model that one uses in the path integral is that

$$\int d[\Phi]\, d[\Lambda]\; e^{i\int_M \Lambda \star D\Phi} = (\det D)^{\mp 1} \qquad [28]$$

where $\Phi$ is a set of fields and the $\Lambda$ are a set of dual fields with $D$ again a nondegenerate operator. The inverse determinant arises for Grassmann even fields (as in [27]), while it is the determinant that appears for Grassmann odd fields.

Gauge fixing – the Faddeev–Popov trick. If the action [19], $S_A(n,p) = \int B_p\, d_A C_{n-p-1}$, were nondegenerate, its partition function could be defined directly by [28]. However, because of gauge invariance of the action, the kinetic term is degenerate and one needs to eliminate the gauge freedom to obtain an (at least formally) well-defined expression for the partition function. Concretely, this degeneracy can be seen by
recalling that, when there are no harmonic forms (as we have assumed), there is a unique orthogonal Hodge decomposition of a p-form $B_p \in \Omega^{p}(M,\mathfrak{g})$ into a sum of a $d_A$-exact and a $d_A$-coexact form:

$$B_p = d_A\alpha_{p-1} + d_A^{\star}\beta_{p+1} \qquad [29]$$

(and likewise for $C$). Evidently, the exact (longitudinal) parts $d_A\alpha$ of $B$ and $C$ do not appear in the action, and these are precisely the gauge-dependent parts of $B$ and $C$ under the gauge transformation [20]. Gauge fixing amounts to imposing a condition $\mathcal{F}(B_p) = 0$ on $B_p$ that determines the longitudinal part uniquely in terms of the transversal part $d_A^{\star}\beta$. A natural condition is

$$d_A\alpha_{p-1} = 0 \;\Leftrightarrow\; \mathcal{F}(B_p) = d_A^{\star}B_p = 0 \qquad [30]$$

A partition function independent of the gauge-fixing condition results from inserting "1" in the form of

$$1 = \int_{\mathcal{G}} d[g]\; \delta(\mathcal{F}(B^g))\, \Delta_{\mathcal{F}}(B) \qquad [31]$$

into the functional integral (the Faddeev–Popov trick), where $\mathcal{G}$ is the gauge group. This defines the Faddeev–Popov determinant $\Delta_{\mathcal{F}}$, and the functional properties of the delta functional imply that $\Delta_{\mathcal{F}}$ is the determinant of the operator that one obtains upon gauge variation of $\mathcal{F}(B)$.

In the general case of reducible gauge symmetries, the nature of the gauge group is complicated and requires some more thought. In the irreducible case, however, that is, for $p = 1$, the Lie algebra of the gauge group can be identified with $\Omega^{0}(M,\mathfrak{g})$, and $\Delta_{\mathcal{F}}$ is the determinant of the operator:

$$\frac{\delta\mathcal{F}}{\delta B}\, d_A : \Omega^{0}(M,\mathfrak{g}) \to \Omega^{0}(M,\mathfrak{g}) \qquad [32]$$

For [30], this is simply the Laplacian on 0-forms, and thus

$$\Delta_{\mathcal{F}} = \det \Delta_A^{(0)} \qquad [33]$$

The partition function. Following the finite-dimensional model, both the $\delta$-function implementing the gauge-fixing condition and the Faddeev–Popov determinant can be lifted into the exponential, the former by a Lagrange multiplier $\pi$ [26], a Grassmann even 0-form, and the latter by a pair of Grassmann odd 0-forms $c$ and $\bar{c}$ [28], the ghost and antighost fields, respectively. The sum of the classical action and these gauge-fixing and ghost terms defines the (BRST-invariant) "quantum action" $S_A^{q}(n,p)$, and the partition function is

$$Z_A(n,p) = \int d[\Phi]\; e^{iS_A^{q}(n,p)(\Phi)} \qquad [34]$$

where $\Phi$ denotes collectively all the fields. Concretely, when $n = 2$ and $p = 0$ (or, equivalently, $p = 1$), the quantum action is

$$S_A^{q}(2,0) = \int B_0\, d_A C_1 + \pi\, d_A \star C_1 + \bar{c} \star \Delta_A^{(0)} c \qquad [35]$$

Likewise, for $n = 3$ and $p = 1$ (the only other case when the gauge symmetry is indeed irreducible), both $B_1$ and $C_1$ require separate gauge fixing, and the quantum action is

$$S_A^{q}(3,1) = \int B_1\, d_A C_1 + \pi\, d_A \star C_1 + \bar{c} \star \Delta_A^{(0)} c + \pi'\, d_A \star B_1 + \bar{c}' \star \Delta_A^{(0)} c' \qquad [36]$$

Formally, therefore, the two-dimensional partition function is

$$Z_A(2,0) = \frac{\det \Delta_A^{(0)}}{\det D_A} \qquad [37]$$

where $D_A$ is the operator:

$$D_A = \begin{pmatrix} \star d_A \\ \star d_A \star \end{pmatrix} : \Omega^{1}(M,\mathfrak{g}) \to \Omega^{0}(M,\mathfrak{g}) \oplus \Omega^{0}(M,\mathfrak{g}) \qquad [38]$$

One can define the determinant of this operator as the square root of the determinant of the operator $D_A^{\star} D_A = \Delta_A^{(1)}$, and therefore the partition function

$$Z_A(2,0) = \det \Delta_A^{(0)} \left(\det \Delta_A^{(1)}\right)^{-1/2} = T_A(2) \qquad [39]$$

is equal to the two-dimensional Ray–Singer torsion [25]. In this case, it is easy to see directly that the even-dimensional Ray–Singer torsion is trivial, as one could have equally well defined the determinant of $D_A$ as the square root of the determinant of the operator $D_A D_A^{\star} = \Delta_A^{(0)} \oplus \Delta_A^{(0)}$, which implies $Z_A(2,0) = 1$.

In three dimensions, the two pairs of ghosts each contribute a $\det \Delta_A^{(0)}$, and thus

$$Z_A(3,1) = \frac{\left(\det \Delta_A^{(0)}\right)^2}{\det D_A} \qquad [40]$$

where

$$D_A = \begin{pmatrix} \star d_A & d_A \\ d_A \star & 0 \end{pmatrix} : \Omega^{0}(M,\mathfrak{g}) \oplus \Omega^{1}(M,\mathfrak{g}) \to \Omega^{0}(M,\mathfrak{g}) \oplus \Omega^{1}(M,\mathfrak{g}) \qquad [41]$$

is the operator acting on the fields $(B_1, C_1, \pi, \pi')$. As before, this operator can be diagonalized by squaring it, $D_A D_A = \Delta_A^{(0)} \oplus \Delta_A^{(1)}$, and thus

$$Z_A(3,1) = \left(\det \Delta_A^{(0)}\right)^{3/2} \left(\det \Delta_A^{(1)}\right)^{-1/2} = T_A(3)^{-1} \qquad [42]$$
is again related to the (this time genuinely nontrivial) Ray–Singer torsion.

In spite of the complications caused by reducible gauge symmetries, it can be shown that all of the above generalizes to arbitrary $n$ and $p$, with the result that (for $n$ odd)

$$Z_A(n,p) = T_A(n)^{(-1)^p} \qquad [43]$$

confirming the topological nature of BF theories.

In the nonabelian case, the situation is significantly more complicated because of the complexity of the classical moduli space, the (higher cohomology) zero modes, and the on-shell reducibility of the gauge symmetries. Nevertheless, ignoring all the zero modes except those of $A$, that is, except the moduli $m$ of flat connections $A(m)$, the result is similar to that in the abelian case, in that the partition function reduces to an integral over the moduli space of flat connections, with measure determined by the Ray–Singer torsion $T_{A(m)}$.

Linking Numbers as Observables of Abelian BF Theories

With the exception of $p = 0$, there are no interesting "local" observables (gauge-invariant functionals of the fields $C$ and $B$) in the abelian BF theory, since the gauge-invariant field strengths $dC$ and $dB$ vanish by the equations of motion. (For $p = 0$, $B$ is a gauge-invariant 0-form and hence $B(x)$ is a good local observable.) However, as in the Chern–Simons and Yang–Mills theories, certain (weakly) nonlocal observables such as Wilson loops are also of interest. In the case at hand (eqn [18]), we have abelian Wilson surface operators

$$W_S[B] = \int_S B, \qquad W_{S'}[C] = \int_{S'} C \qquad [44]$$

associated with $p$- and $(n-p-1)$-dimensional submanifolds $S$ and $S'$ of $M$, respectively. These operators are gauge invariant, that is, invariant under the local symmetries [20] provided that $\partial S = \partial S' = 0$, so that $S$ and $S'$ represent homology cycles of $M$.

For $M = \mathbb{R}^n$, correlation functions of these operators are related to the topological linking number of $S$ and $S'$. We choose $S = \partial\Sigma$ and $S' = \partial\Sigma'$ to be disjoint compact-oriented boundaries of oriented submanifolds $\Sigma$ and $\Sigma'$ of $\mathbb{R}^n$. We also introduce de Rham currents $\Delta_\Sigma$ and $\Delta_S$ (essentially distributional differential forms with $\delta$-function support on $\Sigma$ or $S$, respectively), characterized by the properties

$$\int_S \omega_p = \int_M \Delta_S \wedge \omega_p, \qquad \int_\Sigma \omega_{p+1} = \int_M \Delta_\Sigma \wedge \omega_{p+1} \qquad [45]$$

for all $\omega_k \in \Omega^{k}(M,\mathbb{R})$ (and likewise for $S'$ and $\Sigma'$).

Since the dimension of $\Sigma$ is equal to the codimension of $S' = \partial\Sigma'$, $\Sigma$ and $S'$ will generically intersect transversally at isolated points, and we define the "linking number" of $S$ and $S'$ to be the intersection number of $\Sigma$ and $S'$, expressed in terms of de Rham currents as

$$L(S,S') = \int_\Sigma \Delta_{S'} = \int_M \Delta_\Sigma \wedge \Delta_{S'} \qquad [46]$$

In terms of de Rham currents, the Wilson surface operators can be written as $W_S[B] = \int_M \Delta_S \wedge B$, etc. Thus, the generating functional for correlation functions of Wilson surface operators

$$\left\langle e^{i\lambda W_S[B]}\, e^{i\mu W_{S'}[C]} \right\rangle = \int D[C]\, D[B]\; e^{i\int_M (B\, dC + \mu\Delta_{S'} C + \lambda\Delta_S B)} \qquad [47]$$

is simply a Gaussian path integral. Using the defining properties of de Rham currents, this can be formally evaluated (using [27]) to give

$$\left\langle e^{i\lambda W_S[B]}\, e^{i\mu W_{S'}[C]} \right\rangle = e^{i\lambda\mu L(S,S')} \qquad [48]$$

As expected, correlation functions of these topological field theories encode topological information.

Uses and Applications of Classical Nonabelian BF Theories

Low-dimensional BF theories are closely related to other theories of interest, for example, the Yang–Mills theory, the Chern–Simons theory, and gravity. Here, we briefly review some of these relationships. In order to avoid the complexities of quantum nonabelian BF theories, we focus on their classical features. Brief suggestions for further reading are provided at the end of each subsection.

Relation with Yang–Mills Theory

In any dimension, the nonabelian BF action can be regarded as the zero-coupling limit $g^2 \to 0$ of the Yang–Mills theory since the Yang–Mills action [17] can be written in first-order form as

$$\frac{1}{4g^2}\int_M \mathrm{tr}_G\, F_A \wedge \star F_A \;\sim\; \int_M \mathrm{tr}_G \left[\, iB_{n-2} \wedge F_A + g^2 B_{n-2} \wedge \star B_{n-2} \,\right] \qquad [49]$$

However, whereas for $n \geq 3$ the $B^2$-term breaks the p-form gauge invariance of the BF action (and thus liberates the physical Yang–Mills degrees of freedom), this limit is nonsingular in two dimensions where this p-form symmetry is absent and, indeed, both theories have zero physical degrees of freedom.
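
The equivalence [49] can be made explicit by eliminating $B$ through its own equation of motion. The following is only a sketch under assumptions not spelled out above: a Riemannian metric (so that $\star$ is an isometry), the shorthand $\langle\alpha,\beta\rangle = \int_M \mathrm{tr}_G\, \alpha \wedge \star\beta$ (symmetric on forms of equal degree, and introduced here purely for convenience), and a classical elimination that ignores the field-independent normalization of the Gaussian integral over $B$.

```latex
\begin{align*}
S[A,B] &= \int_M \mathrm{tr}_G\left[\, iB\wedge F_A + g^2\, B\wedge\star B \,\right]
        = i\,\langle B,\star^{-1}F_A\rangle + g^2\,\langle B,B\rangle ,\\[4pt]
0 &= \frac{\delta S}{\delta B}
   \;\Longrightarrow\; B = -\frac{i}{2g^2}\,\star^{-1}F_A ,\\[4pt]
S\big|_{B} &= \frac{1}{2g^2}\,\langle\star^{-1}F_A,\star^{-1}F_A\rangle
            - \frac{1}{4g^2}\,\langle\star^{-1}F_A,\star^{-1}F_A\rangle
          = \frac{1}{4g^2}\int_M \mathrm{tr}_G\, F_A\wedge\star F_A .
\end{align*}
```

The last step uses $\langle\star^{-1}F_A,\star^{-1}F_A\rangle = \langle F_A,F_A\rangle$. In the first-order form the $g^2 \to 0$ limit simply drops the quadratic term and leaves the BF term, which is the sense in which the BF action is the zero-coupling limit.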

A nonsingular BF-like zero coupling limit of the Yang–Mills theory for $n \geq 3$ can be obtained by introducing an auxiliary (Stückelberg) field $\eta \in \Omega^{n-3}(M,\mathfrak{g})$ which restores the p-form gauge invariance. The resulting BF Yang–Mills action is

$$S_{BFYM} = \int_M \mathrm{tr}_G \left[\, iB_{n-2} \wedge F_A + g^2 \left(B_{n-2} - \frac{1}{\sqrt{2}\,g}\, d_A\eta\right) \wedge \star \left(B_{n-2} - \frac{1}{\sqrt{2}\,g}\, d_A\eta\right) \right] \qquad [50]$$

This action is not only invariant under ordinary $G$ gauge transformations, but also under the p-form gauge symmetry $B \to B + d_A\lambda$ [6] provided that $\eta$ transforms as $\eta \to \eta + \sqrt{2}\,g\lambda$. Thus, this shift can be used to set $\eta$ to zero, upon which one recovers the first-order form of the Yang–Mills action. Moreover, in the zero-coupling limit all that survives is a standard (and nontopological) minimal coupling of $\eta$ to the BF action:

$$\lim_{g^2 \to 0} S_{BFYM} = \int_M \mathrm{tr}_G \left[\, iB_{n-2} \wedge F_A + \tfrac{1}{2}\, d_A\eta \wedge \star\, d_A\eta \,\right] \qquad [51]$$

accounting for the correct number of degrees of freedom of the Yang–Mills theory (the $(n-3)$-form $\eta$ being absent for $n = 2$).

Two-dimensional quantum BF and Yang–Mills theories have a variety of interesting topological properties. An account of some of them can be found in Blau and Thompson (1994) and Witten (1991). For a detailed discussion of the gauge symmetries and gauge fixing of the BFYM action, see Cattaneo et al. (1998).

Chern–Simons Theory, Gravity, and (Deformed) BF Theory

The Chern–Simons theory is a three-dimensional gauge theory. The Chern–Simons action for an $H$-connection $C$, $H$ the gauge group, is

$$S_{CS}(C) = \int_M \mathrm{tr}_H \left( C \wedge dC + \tfrac{2}{3}\, C \wedge C \wedge C \right) \qquad [52]$$

It is invariant under the infinitesimal gauge transformations $\delta C = d_C\lambda$, $\lambda \in \Omega^{0}(M,\mathfrak{h})$, and the gauge-invariant equation of motion is the flatness condition $F_C = 0$.

Now let $H = TG$ be the tangent bundle group $TG \cong G \times_s \mathfrak{g}$. This is a semidirect product group with $G$ acting on $\mathfrak{g}$ via the adjoint and $\mathfrak{g}$ regarded as an abelian Lie algebra of translations. Thus, in terms of generators $(J_a, P_a)$, where the $J_a$ are generators of $G$, the commutation relations are $[J_a, J_b] = f_{ab}^{c} J_c$, $[J_a, P_b] = f_{ab}^{c} P_c$ and $[P_a, P_b] = 0$, and the curvature of the $TG$-connection $C = J_a A^a + P_a B^a$ is

$$F_C = J_a F_A^a + P_a\, d_A B^a \qquad [53]$$

Thus, the equations of motion of the $TG$ Chern–Simons theory are equivalent to the equations of motion [2] of the BF theory with gauge group $G$. This equivalence also holds at the level of the action:

$$\tfrac{1}{2}\, S_{CS}(C) = S_{BF}(A,B) \qquad [54]$$

provided that one chooses the nondegenerate invariant scalar product to be

$$\mathrm{tr}_{TG}(J_a P_b) = \mathrm{tr}_G(J_a J_b), \qquad \mathrm{tr}_{TG}(J_a J_b) = \mathrm{tr}_{TG}(P_a P_b) = 0 \qquad [55]$$

For $G = SO(3)$, $TG$ is the Euclidean group of isometries of $\mathbb{R}^3$ and for $G = SO(2,1)$, $TG$ is the Poincaré group of isometries of the three-dimensional Minkowski space $\mathbb{R}^{2,1}$. For these gauge groups, the BF action takes the form of the three-dimensional (Euclidean or Lorentzian) Einstein–Hilbert action, with the interpretation of $B = e$ as the dreibein and $A = \omega$ as the spin connection. The equations of motion for $e$ and $\omega$ express the vanishing of the torsion and the Riemann tensor (equivalent to the vanishing of the Ricci tensor for $n = 3$), respectively. This Chern–Simons interpretation of three-dimensional gravity extends to gravity with a cosmological constant, with $H$ the appropriate de Sitter or anti-de Sitter isometry group ($SO(4)$, $SO(3,1)$, or $SO(2,2)$, depending on the signature and the sign of the cosmological constant). In terms of the BF interpretation, this corresponds to the simple topological deformation

$$S_{\Lambda BF}(A,B) = \int_M \mathrm{tr}_G \left( B \wedge F_A + \tfrac{\Lambda}{3}\, B \wedge B \wedge B \right) \qquad [56]$$

of the BF action, which has the deformed local symmetries (cf. [5] and [6])

$$\delta A = d_A\lambda + \Lambda[B,\lambda'], \qquad \delta B = [B,\lambda] + d_A\lambda' \qquad [57]$$

A simple way to understand these symmetries is to note that the action can be written as the difference of two Chern–Simons actions:

$$S_{CS}(A + \sqrt{\Lambda}\,B) - S_{CS}(A - \sqrt{\Lambda}\,B) = 4\sqrt{\Lambda}\; S_{\Lambda BF}(A,B) \qquad [58]$$

whose evident standard local gauge symmetries $\delta(A \pm \sqrt{\Lambda}\,B) = d_{A \pm \sqrt{\Lambda}B}\,\lambda^{\pm}$ are equivalent to [57] for $\lambda^{\pm} = \lambda \pm \sqrt{\Lambda}\,\lambda'$.

A detailed account of three-dimensional classical and quantum gravity can be found in Carlip (1998).
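
The identity [58] can be checked by a short expansion. The sketch below assumes matrix-valued forms with $F_A = dA + A \wedge A$, a closed manifold $M$ (so that total derivatives integrate to zero), graded cyclicity of the trace, and the shorthand $\epsilon = \sqrt{\Lambda}$, which is introduced here only for brevity.

```latex
\begin{align*}
S_{CS}(A+\epsilon B)\Big|_{\text{odd in }\epsilon}
 &= \epsilon\int_M \mathrm{tr}\,\big(A\wedge dB + B\wedge dA + 2\,B\wedge A\wedge A\big)
    + \tfrac{2}{3}\,\epsilon^{3}\int_M \mathrm{tr}\,B\wedge B\wedge B \\
 &= 2\epsilon\int_M \mathrm{tr}\,B\wedge F_A
    + \tfrac{2}{3}\,\epsilon^{3}\int_M \mathrm{tr}\,B\wedge B\wedge B ,\\[4pt]
S_{CS}(A+\epsilon B)-S_{CS}(A-\epsilon B)
 &= 4\epsilon\int_M \mathrm{tr}\,\Big(B\wedge F_A + \tfrac{\epsilon^{2}}{3}\,B\wedge B\wedge B\Big)
  = 4\sqrt{\Lambda}\;S_{\Lambda BF}(A,B) .
\end{align*}
```

The first line uses $\int_M \mathrm{tr}\, A \wedge dB = \int_M \mathrm{tr}\, B \wedge dA$ and $\mathrm{tr}(B A A + A B A + A A B) = 3\,\mathrm{tr}(B \wedge A \wedge A)$; the terms even in $\epsilon$ cancel in the difference. The term linear in $\epsilon$, $2\int_M \mathrm{tr}\, B \wedge F_A$, is also the only one that survives in $S_{CS}(C)$ when the scalar product [55] is used, consistent with [54].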

Relation with Gravity

Theories of two-dimensional gravity and topological gravity also have a BF formulation (Blau and Thompson 1991, Birmingham et al. 1991) which resembles the Chern–Simons BF formulation of three-dimensional gravity described above, the natural gauge group now being $SO(2,1)$ or $SO(3)$ or one of its contractions.

In the first-order (Palatini) formulation, the Einstein–Hilbert action for four-dimensional gravity can be written as

$$S_{EH} = \int \mathrm{tr}\,(e \wedge e \wedge F_\omega) \qquad [59]$$

where $e$ is the vierbein and $\omega$ is the spin connection. This action has the general form of a BF action with a constraint that $B = e \wedge e$ be a simple bi(co-)vector. Thus, four-dimensional general relativity can be regarded as a constrained BF theory. Although this constraint drastically changes the number of physical degrees of freedom (BF theory has zero degrees of freedom, while four-dimensional gravity has two), this is nevertheless a fruitful analogy which also lies at the heart of the spin-foam quantization approach to quantum gravity. This constrained BF description of gravity is also available for higher-dimensional gravity theories.

For further details, and references, see Freidel et al. (1999) and the review article (Baez 2000).

Knot and Generalized Knot Invariants

The known relationship between Wilson loop observables of the Chern–Simons theory with a compact gauge group and knot invariants (Witten 1989), and the interpretation of the three-dimensional BF theory as a Chern–Simons theory with a noncompact gauge group raise the question of the relation of observables of an $n = 3$ BF theory to knot invariants, and suggest the possibility of using an $n \geq 4$ BF theory to define higher-dimensional analogs of knot invariants. It turns out that an appropriate observable of $n = 3$ BF theory for $G = SU(2)$ is related to the Alexander–Conway polynomial. The analysis of higher-dimensional BF theories requires the full power of the Batalin–Vilkovisky (BV) formalism. BV observables generalizing Wilson loops have been shown to give rise to cohomology classes on the space of imbedded curves. For a detailed discussion of these issues, see Cattaneo and Rossi (2001) and references therein.

A relation between the algebra of generalized Wilson loops and string topology has been investigated in Cattaneo et al. (2003).

See also: Batalin–Vilkovisky Quantization; BRST Quantization; Chern–Simons Models: Rigorous Results; Gauge Theories From Strings; Knot Invariants and Quantum Gravity; Loop Quantum Gravity; Moduli Spaces: An Introduction; Nonperturbative and Topological Aspects of Gauge Theory; Schwarz-Type Topological Quantum Field Theory; Spin Foams; Topological Quantum Field Theory: Overview.

Further Reading

Baez J (2000) An introduction to spin foam models of quantum gravity and BF theory. Lecture Notes in Physics 543: 25–94.
Birmingham D, Blau M, Rakowski M, and Thompson G (1991) Topological field theory. Physics Reports 209: 129–340.
Blau M and Thompson G (1989) A New Class of Topological Field Theories and the Ray–Singer Torsion. Physics Letters B 228: 64–68.
Blau M and Thompson G (1991) Topological gauge theories of antisymmetric tensor fields. Annals of Physics 205: 130–172.
Blau M and Thompson G (1994) Lectures on 2d gauge theories: topological aspects and path integral techniques. In: Gava E, Masiero A, Narain KS, Randjbar–Daemi S, and Shafi Q (eds.) Proceedings of the 1993 Trieste Summer School on High Energy Physics and Cosmology, pp. 175–244. Singapore: World Scientific.
Carlip S (1998) Quantum Gravity in 2 + 1 Dimensions. Cambridge: Cambridge University Press.
Cattaneo A and Rossi C (2001) Higher-dimensional BF theories in the Batalin–Vilkovisky formalism: the BV action and generalized Wilson loops. Communications in Mathematical Physics 221: 591–657.
Cattaneo A, Cotta-Ramusino P, Fucito F, Martellini M, Rinaldi M, et al. (1998) Four-dimensional Yang–Mills theory as a deformation of topological BF theory. Communications in Mathematical Physics 197: 571–621.
Cattaneo A, Pedrini P, and Fröhlich J (2003) Topological field theory interpretation of string topology. Communications in Mathematical Physics 240: 397–421.
Freidel L, Krasnov K, and Puzio R (1999) BF description of higher-dimensional gravity theories. Advances in Theoretical and Mathematical Physics 3: 1289–1324.
Horowitz GT (1989) Exactly soluble diffeomorphism invariant theories. Communications in Mathematical Physics 125: 417–437.
Schwarz AS (1978) The partition function of a degenerate quadratic functional and Ray–Singer Invariants. Letters in Mathematical Physics 2: 247–252.
Schwarz AS (1979) The partition function of a degenerate functional. Communications in Mathematical Physics 67: 1–16.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 127: 351–399.
Witten E (1991) On quantum gauge theories in two dimensions. Communications in Mathematical Physics 141: 153–209.

Bicrossproduct Hopf Algebras and Noncommutative Spacetime

S Majid, Queen Mary, University of London, London, UK

© 2006 Elsevier Ltd. All rights reserved.

Introduction

One of the sources of quantum groups is a bicrossproduct construction coming in the case of Lie groups from considerations of Planck-scale physics in the 1980s. This article describes these objects and their currently known applications. See also the overview of Hopf algebras which provides the algebraic context (see Hopf Algebras and q-Deformation Quantum Groups).

The construction of quantum groups here is viewed as a microcosm of the problem of quantization in a manner compatible with geometry. Here quantization enters in the noncommutativity of the algebra of observables and "curvature" enters as a quantum nonabelian group structure on phase space. Among the main features of the resulting bicrossproduct models (Majid 1988) are

1. Compatibility takes the form of nonlinear "matched pair equations" generically leading to singular accumulation regions (event horizons or a maximum value of momentum depending on context).
2. The equations are solved in an "equal and opposite" form from local factorization of a larger object.
3. Different classical limits are related by observer–observed symmetry and Hopf algebra duality.
4. Nonabelian Born reciprocity re-emerges and is linked to T-duality.

It has also been argued that noncommutative geometry should emerge as an effective theory of the first corrections to geometry coming from any unknown theory of quantum gravity. Concrete models of noncommutative spacetime currently provide the first framework for the experimental verification of such effects. The most basic of these possible effects is curvature in momentum space or "cogravity." We start with this.

Cogravity

We recall that curvature in space or spacetime means by definition noncommutativity among the covariant derivatives $D_i$. Here the natural momenta are $p_i = i\hbar D_i$ and the situation is typified by the top line in Figure 1. There are also mixed relations between the $D_i$ and position functions as indicated for flat space in the bottom line, which is quantum mechanics (there is a similar story for quantum mechanics on a curved space). We see however a third and dual possibility – noncommutativity in position space which should be interpreted as curvature in momentum space, that is, the dual of gravity. This is an independent physical effect and comes therefore with its own length scale which we denote $\lambda$. These ideas were made precise in the mid 1990s using the quantum group Fourier transform; see Majid (2000). Here we show what is involved on three illustrative examples.

                     Position                                                          Momentum
Gravity              Curved: $\sum_\mu x_\mu^2 = 1/\gamma^2$                            Noncommutative: $[p_i, p_j] = i\hbar\gamma\,\epsilon_{ijk}\, p_k$
Cogravity            Noncommutative: $[x_i, x_j] = 2i\lambda\,\epsilon_{ijk}\, x_k$     Curved: $\sum_\mu p_\mu^2 = 1/\lambda^2$
Quantum mechanics    $[x_i, p_j] = i\hbar\,\delta_{ij}$

Figure 1 Noncommutative spacetime means curvature in momentum space. The equations are for illustration.

1. We consider the "spin space" algebra

$$\mathbb{R}^3_\lambda : \quad [x_i, x_j] = 2i\lambda\,\epsilon_{ij}{}^{k}\, x_k$$

where $\epsilon_{12}{}^{3} = 1$ and where it is convenient to insert a factor 2. This is the enveloping algebra $U(su_2)$, that is, just angular momentum space but now regarded "upside down" as a coordinate algebra (see Hopf Algebras and q-Deformation Quantum Groups). Then a plane wave is of the form

$$\psi_p = e^{ip\cdot x}, \qquad p \in \mathbb{R}^3$$

where we set $\hbar = 1$ for this discussion. The momenta $p_i$ are nothing but local coordinates for the corresponding point $e^{i\lambda p\cdot\sigma} \in SU_2$ where $\sigma$ is the representation by Pauli matrices. It is really elements of this curved space $SU_2$ where momenta live. Here $\mathbb{R}^3_\lambda = U(su_2)$ has dual $\mathbb{C}[SU_2]$ and Hopf algebra Fourier transform (after suitable completion) takes one between these spaces. Thus, in one direction

$$\mathcal{F}(f) = \int_{SU_2} du\, f(u)\, u \;\approx\; \int d^3p\; J(p)\, f(p)\, e^{ip\cdot x}$$

for $f$ a function on $SU_2$. We use the Haar measure on $SU_2$. The local result on the right has $J$ the Jacobian for the change to the local $p$ coordinates and $f$ is written in terms of these. Note that the coproduct in
$\mathbb{C}[SU_2]$ in terms of the $p_i$ generators is an infinite series given by the Campbell–Baker–Hausdorff series, and not the usual linear one (this is why the measure is not the Lebesgue one). The physical content here is in the plane waves themselves, one can use any other momentum coordinates to parametrize them with the corresponding measure and coproduct. Differential operators on $\mathbb{R}^3_\lambda$ are given by the action of elements of $\mathbb{C}[SU_2]$ and are diagonal on these plane waves,

$$f.\psi_p = f(p)\,\psi_p$$

which corresponds under Fourier transform simply to pointwise multiplication in $\mathbb{C}[SU_2]$. For example, the function $\lambda^{-2}(\mathrm{tr} - 2)$ as a function on $SU_2$ will give a rotationally invariant wave operator which is also invariant under inversion in the group. Its value on plane waves is

$$\frac{1}{\lambda^2}\,\mathrm{tr}\left(e^{i\lambda p\cdot\sigma} - 1\right) = \frac{2}{\lambda^2}\left(\cos(\lambda|p|) - 1\right)$$

In the limit $\lambda \to 0$ this gives the usual wave operator on $\mathbb{R}^3$.

It is also possible to put a differential graded algebra (DGA) structure of differential forms on this algebra, the natural one being

$$dx_i = \lambda\sigma_i, \qquad x_i\theta - \theta x_i = i\frac{\lambda^2}{\mu}\, dx_i, \qquad (dx_i)x_j - x_j\, dx_i = i\lambda\,\epsilon_{ij}{}^{k}\, dx_k + i\mu\,\delta_{ij}\,\theta$$

where $\theta$ is the $2 \times 2$ identity matrix which, together with the Pauli matrices $\sigma_i$, completes the basis of left-invariant 1-forms. The 1-form $\theta$ provides a natural time direction, even though there is no time coordinate, and the new parameter $\mu \neq 0$ appears as the freedom to change its normalization. The partial derivatives $\partial^i$ are defined by

$$d\psi(x) = (\partial^i\psi)\, dx_i + (\partial^0\psi)\,\theta$$

and act diagonally on plane waves as

which is of the form of Schrödinger's equation with respect to an auxiliary time variable and for a particle with mass $1/\mu$.

The reader may ask what happens to the Euclidean group of translations and rotations in this context. From the above we find that $U_\lambda(poinc_3) = \mathbb{C}[SU_2] \rtimes U(su_2)$, the semidirect product generated by translations $\partial^i$ and usual rotations. This in turn is the quantum double $D(U(su_2))$ of the classical enveloping algebra, and as such a quantum group with braiding etc. (see Hopf Algebras and q-Deformation Quantum Groups). This quantum double has been identified as part of an effective theory in 2 + 1 quantum gravity in a Euclidean version based on Chern–Simons theory with Lie algebra $poinc_3$ and the spin space algebra proposed as an effective theory for this. The quotient of $\mathbb{R}^3_\lambda$ by an allowed value of the quadratic Casimir $x^2$ (which then makes it a matrix algebra) is called a "fuzzy sphere" and appears as a "world-volume algebra" in certain string theories and reduced matrix models. The noncommutative differential geometry that we have described is due to Batista and the author.

2. We take the same type of construction to obtain the "bicrossproduct model" spacetime algebra

$$\mathbb{R}^{1,3}_\lambda : \quad [t, x_i] = i\lambda\, x_i, \qquad [x_i, x_j] = 0$$

These are the relations of a Lie algebra $b_+$ (say) but again regarded as coordinates on a noncommutative spacetime. Here $\lambda$ is a timescale which can be written as a mass scale $\kappa = 1/\lambda$ instead. We parametrize the plane waves as

$$\psi_{p,p^0} = e^{ip\cdot x}\, e^{ip^0 t}, \qquad \psi_{p,p^0}\,\psi_{p',p'^0} = \psi_{p + e^{-\lambda p^0}p',\; p^0 + p'^0}$$

which identifies the $p^\mu$ as the coordinates of the nonabelian group $B_+ = \mathbb{R} \ltimes \mathbb{R}^3$ with Lie algebra $b_+$. The group law in these coordinates is read off
i pi as usual from the product of plane waves, which
@i ¼ trði ð ÞÞ ¼ i sinðjpjÞ also gives the coproduct of C[Bþ ] on the p . We
2 jpj
have parametrized plane waves in this way
while @ 0 = i(tr  2)=22 is computed as above. (rather than the canonical way by the Lie algebra
Note that  cannot be taken to be zero due to an as before) in order to have a more manage-
anomaly for translation invariance of the DGA. It is able form for this. We do pay a price that in these
in fact a typical feature of noncommutative differ- coordinates group inversion is not simply p ,
ential geometry that there is a 1-form  generating d but
by commutator which can be required as an extra
0
cotangent direction with its associated partial ðp; p0 Þ1 ¼ ðep p; p0 Þ
derivative an induced Hamiltonian. In the present
model we have which is also the action of the antipode S on the
abstract p generators.
X i 2
@0 ¼ i ð@ Þ þ Oð2 Þ In particular, the right-invariant Haar measure on
2 i Bþ in these coordinates is the usual d4 p so the

quantum group Fourier transform reduces to the usual one but normal ordered,
\[ \mathcal{F}(f) = \int_{\mathbb{R}^4} d^4p\, f(p)\, e^{i\vec p\cdot x}\, e^{ip^0 t} \]
(one can also Fourier transform with respect to the left-invariant measure $d^4p\, e^{3\lambda p^0}$ on $B_+$). The inverse is again given in terms of the usual inverse transform if we specify general fields in $\mathbb{R}^{1,3}_\lambda$ by normal ordering of usual functions, which we shall do. As before, the action of elements of $\mathbb{C}[B_+]$ defines differential operators on $\mathbb{R}^{1,3}_\lambda$ and these act diagonally on plane waves. We also have a natural DGA with
\[ (dx_j)x_\mu = x_\mu\,dx_j, \qquad (dt)x_\mu - x_\mu\,dt = i\lambda\,dx_\mu \]
which leads to the partial derivatives
\[ \partial^i\psi = {:}\frac{\partial}{\partial x_i}\psi(x,t){:} = ip^i.\psi \]
\[ \partial^0\psi = {:}\frac{\psi(x, t + i\lambda) - \psi(x,t)}{i\lambda}{:} = \frac{i}{\lambda}\left(1 - e^{-\lambda p^0}\right).\psi \]
for normal-ordered polynomial functions $\psi$ or in terms of the action of the coordinates $p^\mu$ in $\mathbb{C}[B_+]$. These $\partial^\mu$ do respect our implicit $*$-structure (unitarity) on $\mathbb{R}^{1,3}_\lambda$ but in a Hopf algebra sense which is not the usual sense, since the action of the antipode $S$ is not just $-p^\mu$. This can be remedied by using adjusted derivatives $L^{-1/2}\partial^\mu$ where
\[ L\psi = {:}\psi(x, t + i\lambda){:} = e^{-\lambda p^0}.\psi \]
In this case the natural 4D Laplacian is $L^{-1}\left((\partial^0)^2 - \sum_i (\partial^i)^2\right)$, which acts on plane waves as
\[ -\frac{2}{\lambda^2}\left(\cosh(\lambda p^0) - 1\right) + \vec p^{\,2}\, e^{\lambda p^0} \]
where $\vec p^{\,2} = \sum_{i=1}^{3} p_i^2$. This deforms the usual Laplacian in such a way as to remain invariant under the Lorentz group (which now acts nonlinearly on $B_+$ in this model) and under group inversion.

This model may provide the first experimental test for noncommutative spacetime and cogravity. For the analysis of an experiment, we assume the identification of noncommutative waves in the above normal-ordered form with classical ones that a detector might register. In that case one may argue (Amelino-Camelia and Majid 2000) that the dispersion relation for such waves has the classical derivation as $\partial p^0/\partial p^i$ which now computes as propagation speed for a massless particle:
\[ \left|\frac{\partial p^0}{\partial \vec p}\right| = e^{-\lambda p^0} \]
in units where 1 is the usual speed of light. So the prediction is that the speed of light depends on energy. What is remarkable is that even if $\lambda \sim 10^{-44}$ s (the Planck timescale), this prediction could in principle be tested, for example using $\gamma$-ray bursts. These are known in some cases to travel cosmological distances before arriving on Earth, and have a spread of energies from 0.1–100 MeV. According to the above, the relative time delay $\Delta t$ on traveling distance $L$ for frequencies corresponding to $p^0$, $p^0 + \Delta p^0$ is
\[ \Delta t \sim \lambda\,\Delta p^0\,\frac{L}{c} \sim 10^{-44}\,\mathrm{s}\times 100\,\mathrm{MeV}\times 10^{10}\,\mathrm{y} \sim 1\,\mathrm{ms} \]
which is in principle observable by statistical analysis of a large number of bursts correlated with distance (determined, e.g., by using the Hubble telescope to lock in on the host galaxy of each burst). Although the above is only one of a class of predictions, it is striking that even Planck-scale effects are now in principle within experimental reach.

We now explain what happens to the full Poincaré symmetry here. The nonlinear action of the Lorentz group on $B_+$ Fourier transforms to an action on the generators of $\mathbb{R}^{1,3}_\lambda$, which combines with the above action of the $p^\mu$ to generate an entire Poincaré quantum group $U(so_{1,3}){\triangleright\!\!\!\blacktriangleleft}\mathbb{C}[B_+]$. We will say more about its ‘‘bicrossproduct’’ structure in a later section. The above wave operator in momentum space is the natural Casimir in these momentum coordinates. A common mistake in the literature for this model is to suppose that the Casimir relation alone amounts to a physical prediction, whereas in fact the momentum coordinates are arbitrary and have meaning only in conjunction with the plane waves that they parametrize. The deformed Poincaré as an algebra alone is actually isomorphic to the undeformed one by a different choice of generators, so by itself has no physical content; one needs rather the noncommutative spacetime as well. Prior work on the relevant deformed Poincaré algebra either did not consider it acting on spacetime or took it acting on classical (commutative) Minkowski spacetime with inconsistent results (there is no such action as a quantum group).

The above model was introduced by Majid and Ruegg (1994) and later tied up with a dual approach of Woronowicz. There is also a previous ‘‘$\kappa$-Poincaré’’ version of the Hopf algebra alone obtained (Lukierski et al. 1991) in another context (by contraction of $U_q(so_{2,3})$) but with fundamentally different generators and relations and hence different physical content (e.g., the Lorentz

generators there do not close among themselves but mix with momentum).

3. The usual Heisenberg algebra of quantum mechanics is another possible noncommutative (phase) space; one may also take the same algebra and view it as a noncommutative spacetime, so:
\[ \mathbb{R}^{1,3}_\theta : \quad [x_\mu, x_\nu] = i\theta_{\mu\nu} \]
for any antisymmetric tensor $\theta_{\mu\nu}$. This is not a Hopf algebra but it turns out that this model can also be completely solved by Hopf algebra methods, namely the theory of covariant twists. Twist models also include versions of the noncommutative torus studied by Connes, and related $\theta$-spaces, which are nontrivial at the level of $C^*$-algebras. However, at an algebraic level, all covariant structures are automatically provided by applying the twisting functor $\mathcal{T}$ to the desired classical construction (see Hopf Algebras and q-Deformation Quantum Groups). This is not usually appreciated in the physics literature on such models, but see Oeckl (2000).

Thus, consider $H = U(\mathbb{R}^{1,3})$ with generators $p_\mu = -i\partial_\mu$ acting as usual on functions on Minkowski space. It has a cocycle
\[ F = e^{\frac{i}{2}\, p_\mu\otimes\theta^{\mu\nu}p_\nu} \]
which induces a new product $\star$ on functions by $\phi\star\psi = \cdot\,(F^{-1}(\phi\otimes\psi))$. This is just the standard Moyal product, in the present case on $\mathbb{R}^{1,3}$, viewed as a covariant twist using Hopf algebra methods. The Hopf algebra $U(\mathbb{R}^{1,3})$ in principle has a twisted coproduct given by $\Delta_F = F(\Delta(\ ))F^{-1}$ but this does not change as the algebra is commutative.

Next, $H$ also acts covariantly on $\Omega(\mathbb{R}^{1,3})$, the usual algebra of differential forms, and twisting this in the same way gives
\[ \phi(x)\star dx_\mu = \phi\,dx_\mu = (dx_\mu)\phi = (dx_\mu)\star\phi \]
unchanged. This is because no terms higher than $p_\mu\otimes\theta^{\mu\nu}p_\nu$ contribute and then $d(1) = 0$. The associated partial derivatives defined by d are likewise unchanged and act in the usual way as derivations with respect to both the $\star$ product and the undeformed product. The result may look different when the same $\phi(x)$ is expressed as a function of the variables with the $\star$ product. In other words, the only deformation comes from the Moyal product itself, with the rest being automatic. Moreover, the plane waves themselves are unchanged because $(x\cdot k)^{\star n} = (x\cdot k)^n$ due to $\theta$ being antisymmetric. Hence,
\[ \psi_k(x) = e_\star^{ix\cdot k} = e^{ix\cdot k}, \qquad p_\mu\psi_k(x) = k_\mu\psi_k(x) \]
where $p_\mu = -i\partial_\mu$. The wave operator $\partial^\mu\partial_\mu$ is therefore given by the action of $-p^\mu p_\mu$ and has value $-k^\mu k_\mu$ as usual on plane waves. On the other hand,
\[ \psi_k\star\psi_{k'} = e^{-\frac{i}{2}\, k_\mu\theta^{\mu\nu}k'_\nu}\,\psi_{k+k'} \]
or in algebraic terms the twist functor $\mathcal{T}$ applied to the Fourier transform implies also a twisted coproduct or coaddition law for the abstract $k_\mu$ generators, now different from the linear one for the covariance momentum operators $p_\mu$. This leads to some of the more interesting features of the model.

One immediately also has a Poincaré quantum group here, $U_\theta(poinc_{1,3})$, obtained by similarly twisting the classical $U(poinc_{1,3})$. We just view $F$ as living here rather than in the original $H$. The translation sector is unchanged as before but if $M_{\mu\nu}$ are the usual Lorentz generators, then
\[ \Delta_F M_{\mu\nu} = M_{\mu\nu}\otimes 1 + 1\otimes M_{\mu\nu} + \tfrac{1}{2}\left(p_\mu\otimes\theta_\nu{}^{\alpha}p_\alpha - p_\nu\otimes\theta_\mu{}^{\alpha}p_\alpha\right) - \tfrac{1}{2}\left(\theta^{\alpha}{}_{\mu}p_\alpha\otimes p_\nu - \theta^{\alpha}{}_{\nu}p_\alpha\otimes p_\mu\right) \]
using the metric $\eta_{\mu\nu}$ to raise or lower indices. The antipode is also modified according to the theory in Majid (1995). The relations in the Poincaré algebra are not modified (so, e.g., $p^\mu p_\mu$ will remain central). Any construction originally Poincaré covariant becomes covariant under this twisted one after application of the twisting functor. As with the differentials above, the action on $\mathbb{R}^{1,3}_\theta$ is not actually modified but may appear so when functions are expressed in terms of the $\star$ product.

The above model is popular at the time of writing in connection with string theory. Here, an effective description of the endpoints of open strings landing on a fixed 4-brane has been modeled conveniently in terms of the $\star$ product above (Seiberg and Witten 1999). It should be borne in mind, however, that this fixed 4-brane lives in some of the higher dimensions of the string spacetime, so this is not necessarily a prediction of noncommutative spacetime $\mathbb{R}^{1,3}_\theta$.

In fact, a proposal superficially similar to $\mathbb{R}^{1,3}_\theta$ above was already proposed in Snyder (1947). Here
\[ [x_\mu, x_\nu] = i\lambda^2 M_{\mu\nu} \]
where $\lambda$ is our length scale and the $M_{\mu\nu}$ are now operators with the usual commutation rules for the Lorentz algebra with themselves and with $x_\mu$ and the momenta $p_\mu$. The latter obey
\[ [p_\mu, x_\nu] = i\left(\eta_{\mu\nu} - \lambda^2 p_\mu p_\nu\right), \qquad [p_\mu, p_\nu] = 0 \]

so the entire Poincaré algebra is undeformed but the phase-space relations are deformed. Snyder also constructed the orbital angular momentum realization $M_{\mu\nu} = x_\mu p_\nu - x_\nu p_\mu$. This model is not a proposal for a noncommutative spacetime because the algebra does not even close among the $x_\mu$. Rather it is a proposal for ‘‘mixing’’ of position and Lorentz generators. On the other hand (which was the point of view in Snyder (1947)), in any representation of the Poincaré algebra, the $M_{\mu\nu}$ become operators and in some sense numerical. The rotational sector has discrete eigenvalues as usual, so to this extent the spacetime has been discretized. Although not fitting into the methods in this article, it is also of interest that the relations above were motivated by considering $p_\mu$ as coordinates projected from a 5D flat space to de Sitter space and $x_\mu$ as the 5-component of orbital angular momentum in the flat space.

To conclude this section, let us note that there are further models that we have not included for lack of space. One of them is a much-studied $\mathbb{R}^{1,3}_q$ in which $t$ is central but the $x_i$ enjoy complicated $q$-relations best understood as $q$-deformed Hermitian matrices. One of the motivations in the theory was the result in Majid (1990) that $q$-deformation could be used to regularize infinities in quantum field theory as poles at $q = 1$. Another entire class is to use noncommutative geometry and quantum group methods on finite or discrete spaces. Unlike lattice theory where a finite lattice is viewed as approximation, these models are not approximations but exact noncommutative geometries valid even on a few points. The noncommutativity enters into the fact that finite differences are bilocal and hence naturally have different left and right multiplications by functions. Both aspects are mentioned briefly in the overview article (see Hopf Algebras and q-Deformation Quantum Groups). Also, on the experimental front, another large area that we have not had room to cover is the prediction of modified uncertainty relations both in spacetime and phase space (Kempf et al. 1995).

Moreover, for all of the models above, once one has a noncommutative differential calculus one may proceed to gauge theory etc., on noncommutative spacetimes, at least at the level where a connection is a noncommutative (anti-Hermitian) 1-form $\alpha$. Gauge transformations are invertible (unitary) elements $u$ of the noncommutative ‘‘coordinate algebra’’ and the connection and curvature transform as
\[ \alpha \to u^{-1}\alpha u + u^{-1}du \]
\[ F(\alpha) = d\alpha + \alpha\wedge\alpha \to u^{-1}F(\alpha)u \]
The full extent of quantum bundles and gravity (see Quantum Group Differentials, Bundles and Gauge Theory) and quantum field theory is not always possible, although both have been done for covariant twist examples (for functorial reasons) and for small finite sets. For the first two models above, for example, it is not clear at the time of writing how to interpret scattering when the addition of momenta is nonabelian.

Matched Pair Equations

Although we have presented noncommutative spacetime first, the first actual application of quantum group methods to Planck-scale physics was the Planck-scale Hopf algebra obtained by a theory of bicrossproducts. Like the Snyder model, the intention here was to deform phase space itself, but since then bicrossproducts have had many further applications. The main ingredient here is the notion of a pair of groups $(G, M)$, say, acting on each other as we explain now. The mathematics here goes back to the early 1910s in group theory, but also arose in mathematical physics as a toy version of Einstein’s equation in the sense of compatibility between quantization and curvature (see the next section).

By definition, $(G, M)$ are a matched pair of groups if there are left and right actions
\[ M \stackrel{\triangleleft}{\longleftarrow} M\times G \stackrel{\triangleright}{\longrightarrow} G \]
of each group on the set of the other, such that
\[ s\triangleleft e = s, \quad e\triangleright u = u, \quad s\triangleright e = e, \quad e\triangleleft u = e \]
\[ (s\triangleleft u)\triangleleft v = s\triangleleft(uv), \qquad s\triangleright(t\triangleright u) = (st)\triangleright u \]
\[ s\triangleright(uv) = (s\triangleright u)\left((s\triangleleft u)\triangleright v\right) \]
\[ (st)\triangleleft u = \left(s\triangleleft(t\triangleright u)\right)(t\triangleleft u) \]
for all $u, v \in G$, $s, t \in M$. Here $e$ denotes the relevant group unit element. As a first application of such data, one may make a ‘‘double cross product group’’ $G\bowtie M$ with product
\[ (u, s).(v, t) = \left(u(s\triangleright v),\; (s\triangleleft v)t\right) \]
and with $G$, $M$ as subgroups. Since it is built on the direct product space, the bigger group factorizes into these subgroups. Conversely, if $X$ is a group factorization such that the product $G\times M \to X$ is bijective, each group acts on the other by actions $\triangleright$, $\triangleleft$ defined by $su = (s\triangleright u)(s\triangleleft u)$ for $u \in G$ and $s \in M$, where $s$, $u$ are multiplied in $X$ and the product is factorized as something in $G$ and something in $M$. So finite group matched pairs are equivalent to group factorizations. In the Lie group context, the corresponding system of differential equations is equivalent to a local factorization.
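The equivalence between finite matched pairs and group factorizations can be checked by brute force on a small case. The following is a minimal sketch, not taken from the article: it assumes the factorization of the symmetric group $S_4$ into $G = \mathbb{Z}_4$ (generated by a 4-cycle) and $M = S_3$ (the permutations fixing one point), defines the two actions by $su = (s\triangleright u)(s\triangleleft u)$, and tests the matched pair conditions and the double cross product formula stated above. The function names (`factorize`, `act_left`, `act_right`, and so on) are illustrative choices.

```python
from itertools import permutations, product


def compose(p, q):
    """Composition of permutations stored as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


E = (0, 1, 2, 3)  # identity of S_4

# Assumed example: G = Z_4 generated by a 4-cycle.
c = (1, 2, 3, 0)
G = [E]
while compose(G[-1], c) != E:
    G.append(compose(G[-1], c))

# Assumed example: M = S_3, the permutations of {0,1,2,3} that fix the point 3.
M = [p + (3,) for p in permutations(range(3))]


def factorize(x):
    """Write x in S_4 uniquely as x = u*s with u in G and s in M."""
    for u in G:
        s = compose(inverse(u), x)
        if s[3] == 3:
            return u, s
    raise ValueError("element does not factorize")


def act_left(s, u):
    """s |> u, the G-part of the product s*u."""
    return factorize(compose(s, u))[0]


def act_right(s, u):
    """s <| u, the M-part of the product s*u."""
    return factorize(compose(s, u))[1]


def is_matched_pair():
    """Test every matched pair condition for all s, t in M and u, v in G."""
    for s, t in product(M, M):
        for u, v in product(G, G):
            checks = [
                act_right(s, E) == s,
                act_left(E, u) == u,
                act_left(s, E) == E,
                act_right(E, u) == E,
                act_right(act_right(s, u), v) == act_right(s, compose(u, v)),
                act_left(s, act_left(t, u)) == act_left(compose(s, t), u),
                act_left(s, compose(u, v))
                == compose(act_left(s, u), act_left(act_right(s, u), v)),
                act_right(compose(s, t), u)
                == compose(act_right(s, act_left(t, u)), act_right(t, u)),
            ]
            if not all(checks):
                return False
    return True


def double_cross_product_matches():
    """(u,s).(v,t) = (u (s|>v), (s<|v) t) should reproduce multiplication in S_4."""
    for (u, s), (v, t) in product(product(G, M), repeat=2):
        in_x = compose(compose(u, s), compose(v, t))
        pair = (compose(u, act_left(s, v)), compose(act_right(s, v), t))
        if factorize(in_x) != pair:
            return False
    return True


if __name__ == "__main__":
    print("matched pair conditions hold:", is_matched_pair())
    print("double cross product agrees with S_4:", double_cross_product_matches())
```

Neither subgroup is normal in this example, so both actions are nontrivial; a semidirect product would make one of them trivial and would not exercise the cross-coupled conditions.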
There is a nice graphical representation of the matched pair conditions which relates to ‘‘surface integration.’’ Thus, consider squares

[diagram: a square with left edge $s$ and bottom edge $u$]

labeled by elements of $M$ on the left edge and elements of $G$ on the bottom edge. We can fill in the other two edges by thinking of an edge transformed by the other edge as it goes through the square either horizontally or vertically, the two together is the surface transport $\Rightarrow$ across the square. The matched pair equations have the meaning that a square can be subdivided either vertically or horizontally as shown in Figure 2, where the labeling on vertical edges is to be read from top down. The transport operation here is nothing other than normal ordering in the factorizing group. In the Lie setting, it means that the equations can be solved from infinitesimal solutions (a matched pair of Lie algebras) by a simultaneous double integration over the group (i.e., building up a large box from many small ones). If one considers solving the quantum Yang–Baxter equations on groups, they appear in this notation as an equality of surface transport going two ways around a cube, and the classical Yang–Baxter equations as curvature of the underlying higher-order connection.

Also in this notation there is a bicrossproduct quantum group defined in Figure 3, at least when $M$ is finite. The expressions are considered zero unless the juxtaposed edges have the same group labels. In that case, the product is a semidirect product algebra $\mathbb{C}(M)\rtimes\mathbb{C}G$ of functions on $M$ by the group algebra of $G$. The coproduct is the adjoint of this, so is a semidirect coalgebra $\mathbb{C}(M){\blacktriangleright\!\!<}\mathbb{C}G$. Hence the two together are denoted $\mathbb{C}(M){\blacktriangleright\!\!\!\triangleleft}\mathbb{C}G$. The dual needs $G$ finite and has the same form but with vertical and horizontal compositions interchanged, that is, a bicrossproduct $\mathbb{C}M{\triangleright\!\!\!\blacktriangleleft}\mathbb{C}(G)$. Both Hopf algebras have the above labeled squares as basis.

Figure 3 Bicrossproduct Hopf algebra showing horizontal product and vertical coproduct as an ‘‘unproduct.’’

It is possible to generalize both bicrossproducts and double cross products associated to matched pairs to general Hopf algebras $H_1{\blacktriangleright\!\!\!\triangleleft}H_2$ and $H_1\bowtie H_2$, respectively, where $H_1$, $H_2$ are Hopf algebras (see Majid 1990) and to relate the two in general by dualization of one factor. Another general result (Majid 1995) is that $H_1{\blacktriangleright\!\!\!\triangleleft}H_2$ acts covariantly on the algebra $H_1^*$ from the right, or $H_1{\triangleright\!\!\!\blacktriangleleft}H_2$ acts covariantly on $H_2^*$ from the left. A third general result is that bicrossproducts solve the extension problem
\[ H_1 \to H \to H_2 \]
meaning that such a Hopf algebra $H$ subject to some technical requirements (such as an algebra splitting map $H_2 \to H$) is of the form $H \cong H_1{\blacktriangleright\!\!\!\triangleleft}H_2$. The theory was also extended to include cocycle bicrossproducts at the end of the 1980s (by the author). The finite group case, however, was first found by Kac and Paljutkin (1966) in the Russian literature and later rediscovered independently in Takeuchi (1981) and in the course of Majid (1988).
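The finite bicrossproduct described in this section can also be tested numerically. The sketch below is illustrative only and rests on stated assumptions: the basis squares are written as pairs `(s, u)` standing for $\delta_s\otimes u$, and the structure maps are taken in a commonly used textbook form, namely $(\delta_s\otimes u)(\delta_t\otimes v) = \delta_{s\triangleleft u,\,t}\,\delta_s\otimes uv$, $\Delta(\delta_s\otimes u) = \sum_{ab=s}\delta_a\otimes(b\triangleright u)\otimes\delta_b\otimes u$, $\epsilon(\delta_s\otimes u) = \delta_{s,e}$ and $S(\delta_s\otimes u) = \delta_{(s\triangleleft u)^{-1}}\otimes(s\triangleright u)^{-1}$. These formulas and the example matched pair ($\mathbb{Z}_4$ and $S_3$ inside $S_4$) are assumptions of the sketch, not read off from Figure 3.

```python
from collections import Counter
from itertools import permutations, product


def compose(p, q):
    """(p*q)(i) = p[q[i]] for permutations stored as tuples."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


E = (0, 1, 2, 3)
C4 = (1, 2, 3, 0)
G = [E, C4, compose(C4, C4), compose(C4, compose(C4, C4))]  # assumed G = Z_4
M = [p + (3,) for p in permutations(range(3))]              # assumed M = S_3


def factorize(x):
    """Unique factorization x = u*s with u in G, s in M."""
    for u in G:
        s = compose(inverse(u), x)
        if s[3] == 3:
            return u, s
    raise ValueError("element does not factorize")


def left(s, u):   # s |> u
    return factorize(compose(s, u))[0]


def right(s, u):  # s <| u
    return factorize(compose(s, u))[1]


BASIS = [(s, u) for s in M for u in G]      # (s, u) stands for delta_s (x) u
UNIT = Counter({(s, E): 1 for s in M})      # 1 = sum_s delta_s (x) e


def mul(x, y):
    """Product of two basis squares: a basis square, or None for zero."""
    (s, u), (t, v) = x, y
    return (s, compose(u, v)) if right(s, u) == t else None


def coproduct(x):
    """Tensor terms of Delta(x), each with coefficient 1."""
    s, u = x
    return [((a, left(b, u)), (b, u)) for a in M for b in M if compose(a, b) == s]


def counit(x):
    return 1 if x[0] == E else 0


def antipode(x):
    s, u = x
    return (inverse(right(s, u)), inverse(left(s, u)))


def check_hopf_axioms():
    # The coproduct is an algebra map: Delta(xy) = Delta(x) Delta(y).
    for x, y in product(BASIS, repeat=2):
        xy = mul(x, y)
        lhs = Counter(coproduct(xy)) if xy is not None else Counter()
        rhs = Counter()
        for (x1, x2), (y1, y2) in product(coproduct(x), coproduct(y)):
            a, b = mul(x1, y1), mul(x2, y2)
            if a is not None and b is not None:
                rhs[(a, b)] += 1
        if lhs != rhs:
            return False
    for x in BASIS:
        # Coassociativity.
        l = Counter((a1, a2, b) for a, b in coproduct(x) for a1, a2 in coproduct(a))
        r = Counter((a, b1, b2) for a, b in coproduct(x) for b1, b2 in coproduct(b))
        if l != r:
            return False
        # Antipode axiom: sum S(x1) x2 = counit(x) * 1.
        total = Counter()
        for x1, x2 in coproduct(x):
            z = mul(antipode(x1), x2)
            if z is not None:
                total[z] += 1
        if total != (UNIT if counit(x) else Counter()):
            return False
    return True


if __name__ == "__main__":
    print("Hopf algebra checks pass:", check_hopf_axioms())
```

The resulting 24-dimensional algebra is neither commutative nor cocommutative here, because both actions of the matched pair are nontrivial.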

Figure 2 Matched pair condition as a subdivision property.

The Planck-Scale Hopf Algebra

We consider a quantum algebra of observables $H$ and ask when it is a Hopf algebra extending some classical position coordinate algebra $\mathbb{C}[M]$ and some possibly noncommutative momentum coordinate algebra $U(g)$ in the form of a strict extension
\[ \mathbb{C}[M] \to H \to U(g) \]
From the theory above this problem is governed by local solutions of the matched pair equations on $(G, M)$. It requires that $H \cong \mathbb{C}[M]\rtimes U(g)$ as an algebra, that is,

the quantization of a particle moving on orbits in $M$ under some action of $G$ (in an algebraic setting, or one can use von Neumann or $C^*$-algebras etc.). And it requires the classical phase space to be a nonabelian or ‘‘curved’’ group $M\ltimes g^*$. This extends to a coproduct on $H$ which becomes the bicrossproduct Hopf algebra $\mathbb{C}[M]{\blacktriangleright\!\!\!\triangleleft}U(g)$. In this way, the problem which was open at the start of the 1980s of finding true examples of Hopf algebras was given a physical interpretation as being equivalent to finding quantum-mechanical systems reconciled with curvature, and the equations that governed this were the matched pair ones (Majid 1988).

We still have to solve these equations. In the Lie case, they mean a pair of cross-coupled first-order equations on $G\times M$. These can be solved locally as a double-holonomy construction in line with the surface transport point of view, but are nonlinear typically with singularities in the noncompact case. The equations are also symmetric under interchange of $G$, $M$ so Born reciprocity between position and momentum is extended to the quantum system with generally ‘‘curved’’ position and momentum spaces. Moreover, in so far as Einstein’s equation $G_{\mu\nu} = 8\pi T_{\mu\nu}$ is also a compatibility between a quantity in position space and a quantity originating (ultimately) in momentum space, the matched pair equations can be viewed as a toy version of these.

Let us note that the reason to look for $H$ a Hopf algebra in the first place, aside from the reasons already given, is for observer–observed symmetry (this was put forward as a postulate for Planck-scale physics). Thus, $H^*$ is also an algebra of observables of some dual system, in our case $U(m){\triangleright\!\!\!\blacktriangleleft}\mathbb{C}[G]$ or particles in $G$ moving on orbits under $M$. Thus, Born reciprocity is truly implemented in the quantum/curved system by Hopf algebra duality. Put another way, Hopf algebras are the simplest objects after abelian groups that admit Fourier transform (see Hopf Algebras and q-Deformation Quantum Groups) and we require this on phase space if Born reciprocity is to be extended to the quantum/curved system.

The Planck-scale Hopf algebra is the simplest example of these ideas (Majid 1988). Here $G = M = \mathbb{R}$ and the matched pair equations can be solved completely. The general solution is
\[ \hat p = -i\hbar\left(1 - e^{-\gamma x}\right)\frac{\partial}{\partial x}, \qquad \hat x = \frac{i}{\hbar}\left(1 - e^{-\gamma\hbar p}\right)\frac{\partial}{\partial p} \]
for the action of one group with generator $p$ on functions of $x$ in the other group and vice-versa. It has two parameters which we have denoted as $\hbar$ and a background curvature scale $\gamma$, and the corresponding bicrossproduct $\mathbb{C}[p]{\triangleright\!\!\!\blacktriangleleft}\mathbb{C}[x]$ is
\[ [p, x] = -i\hbar\left(1 - e^{-\gamma x}\right), \qquad \Delta x = x\otimes 1 + 1\otimes x \]
\[ \Delta p = p\otimes e^{-\gamma x} + 1\otimes p, \qquad \epsilon x = \epsilon p = 0 \]
\[ Sx = -x, \qquad Sp = -p\,e^{\gamma x} \]
where we should allow power series or take $e^{-\gamma x}$ as an invertible generator.

It is important to note that the matched pair equations here have only this solution and it is necessarily singular at $p = 0$ or $x = 0$. The interpretation in position space is as follows. Consider an infalling particle of mass $m$ with fixed momentum $p = mv_\infty$ (in terms of the velocity at infinity). By definition, $p$ is the free-particle momentum and acts on $\mathbb{R}$ as above. This corresponds to a free-particle Hamiltonian $\hat p^2/2m$ and induces
\[ \dot p = 0 \]
\[ \dot x = \frac{p}{m}\left(1 - e^{-\gamma x}\right) = v_\infty\left(1 - \frac{1}{1 + \gamma x + \cdots}\right) \]
at the classical level. We see that the particle takes an infinite time to reach the origin, which is an accumulation point. This can be compared with the formula in standard radial infalling coordinates
\[ \dot x = v_\infty\left(1 - \frac{1}{1 + \frac{c^2 x}{2GM}}\right) \]
for distance $x$ from the event horizon of a black hole of mass $M$ (here $G$ is Newton’s constant and $c$ the speed of light). So $\gamma \sim c^2/GM$ and for the sake of further discussion we will use this value. With a little more work, one can then see that
\[ \mathbb{C}[x]{\blacktriangleright\!\!\!\triangleleft}\mathbb{C}[p] \;\longrightarrow\; \begin{cases} \mathbb{C}[x]\rtimes\mathbb{C}[p] \quad \text{usual qu. mech.,} & mM \ll m_P^2 \\ \mathbb{C}(X) \quad \text{usual curved geometry,} & mM \gg m_P^2 \end{cases} \]
where $m_P$ is the Planck mass of the order of $10^{-5}$ g and $X = \mathbb{R}\rtimes\mathbb{R}$ is a nonabelian group. In the first limit, the particle motion is not detectably different from usual flat space quantum mechanics outside the Compton wavelength from the origin. In the second limit, the estimate is such that noncommutativity would not show up for length scales much larger than the background curvature scale.

This Hopf algebra is also the simplest way to extend classical position $\mathbb{C}[x]$ and momentum $\mathbb{C}[p]$ in the sense above. In other words, requiring to maintain observer–observed symmetry or Born reciprocity throws up both quantum mechanics (in the form of $\hbar$) and something with the flavor of

gravity (in the form of $\gamma$) and both are required for a nontrivial Hopf algebra. Moreover, the construction necessarily has a self-dual form and indeed the dually paired Hopf algebra is $\mathbb{C}[p]{\blacktriangleright\!\!\!\triangleleft}\mathbb{C}[x]$ with new parameters $\hbar' = 1/\hbar$ and $\gamma' = \hbar\gamma$ if we take the standard pairing $\langle x, p\rangle$ across the two algebras. Hopf algebra duality realized by the quantum group Fourier transform $\mathcal{F}$ takes one between the two models.

Bicrossproduct Poincaré Quantum Groups

Another example from the 1980s in the same family as the Planck-scale Hopf algebra is $G = SU_2$ and $M = B_+$, a nonabelian version of $\mathbb{R}^3$ with Lie algebra $b_+$ of the form
\[ [x_3, x_i] = i\lambda x_i, \qquad [x_i, x_j] = 0 \]
for $i = 1, 2$. The required solution of the matched pair equations was found in Majid (1990) and has a nonlinear action of rotations on $B_+$. The interpretation of $\mathbb{C}[B_+]{\blacktriangleright\!\!\!\triangleleft}U(su_2)$ is of particles moving along orbits which are deformed spheres in $B_+$, and there is a dual model where particles move instead on orbits in $SU_2$ under the action of $b_+$. Moreover, from the general theory of bicrossproducts, we automatically have a covariant action of $\mathbb{C}[B_+]{\blacktriangleright\!\!\!\triangleleft}U(su_2)$ on the auxiliary noncommutative space $\mathbb{R}^3_\lambda = U(b_+)$ with relations as above.

The quantum group here was actually obtained as a Hopf–von Neumann algebra but we limit ourselves to the underlying algebraic version. Also, there is of course nothing stopping one considering this Hopf algebra equally well as $U_\lambda(poinc_3)$, that is, a deformation of the group of motions on $\mathbb{R}^3$, rather than as an algebra of observables. The only difference is to denote the generators of $\mathbb{C}[B_+]$ by the symbols $p_i$, reserving $x_i$ instead for the auxiliary noncommutative space. We lower $i, j, k$ indices using the Euclidean metric. Then the bicrossproduct has the form
\[ [p_i, p_j] = 0, \qquad [M_i, M_j] = i\epsilon_{ij}{}^{k}M_k \]
\[ [M_3, p_j] = i\epsilon_{3j}{}^{k}p_k, \qquad [M_i, p_3] = i\epsilon_{i3}{}^{k}p_k \]
as usual, for $i, j = 1, 2, 3$, and the modified relations
\[ [M_i, p_j] = \frac{i}{2}\,\epsilon_{ij}{}^{3}\left(\frac{1 - e^{-2\lambda p_3}}{\lambda} - \lambda\vec p^{\,2}\right) + i\lambda\,\epsilon_{i}{}^{k3}\,p_j p_k \]
for $i, j = 1, 2$ and $\vec p^{\,2} = p_1^2 + p_2^2$. The coproducts are
\[ \Delta M_i = M_i\otimes e^{-\lambda p_3} + \lambda M_3\otimes p_i + 1\otimes M_i \]
\[ \Delta p_i = p_i\otimes e^{-\lambda p_3} + 1\otimes p_i \]
for $i = 1, 2$ and the usual additive ones for $p_3$, $M_3$. There is also an appropriate counit and antipode. The deformed spheres under the nonlinear rotation in Majid (1990) are constant values of the Casimir for the above algebra. This is
\[ \frac{2}{\lambda^2}\left(\cosh(\lambda p_3) - 1\right) + \vec p^{\,2}\,e^{\lambda p_3} \]
which from the group of motions point of view generates the noncommutative Laplacian when acting on $\mathbb{R}^3_\lambda$. The model here is a Euclidean inhomogeneous one.

The four-dimensional (4D) version $U(so_{1,3}){\triangleright\!\!\!\blacktriangleleft}\mathbb{C}[B_+]$ of this construction (Majid and Ruegg 1994) is again linked to Planck-scale predictions, this time as a generalized symmetry. In terms of translation generators $p^\mu$, rotations $M_i$ and boosts $N_i$ we have
\[ [p^\mu, p^\nu] = 0, \qquad [M_i, M_j] = i\epsilon_{ij}{}^{k}M_k \]
\[ [N_i, N_j] = -i\epsilon_{ij}{}^{k}M_k, \qquad [M_i, N_j] = i\epsilon_{ij}{}^{k}N_k \]
\[ [p^0, M_i] = 0, \qquad [p^i, M_j] = i\epsilon^{i}{}_{j}{}^{k}p_k, \qquad [p^0, N_i] = -ip_i \]
as usual, and the modified relations and coproduct
\[ [p^i, N_j] = -\frac{i}{2}\,\delta^i_j\left(\frac{1 - e^{-2\lambda p^0}}{\lambda} + \lambda\vec p^{\,2}\right) + i\lambda\,p^i p_j \]
\[ \Delta N_i = N_i\otimes 1 + e^{-\lambda p^0}\otimes N_i + \lambda\,\epsilon_{ijk}\,p^j\otimes M_k \]
\[ \Delta p^i = p^i\otimes 1 + e^{-\lambda p^0}\otimes p^i \]
and the usual additive coproducts on $p^0$, $M_i$. This time the Lorentz group orbits in $B_+$ are deformed hyperboloids rather than deformed spheres, and the Casimir that controls this has the same form as above but with the opposite sign in the cosh term, that is, the model is a Lorentzian one. We know from the general theory of bicrossproducts that this Hopf algebra acts on $U(b_+) = \mathbb{R}^{1,3}_\lambda$, the spacetime in the section ‘‘Cogravity,’’ and the Casimir induces the wave operator as we have seen there.

Let us look a bit more closely at the deformed hyperboloids. Because neither group here is compact, one expects from the general theory of bicrossproducts to have limiting accumulation regions. This is visible in the contour plot of $p^0$ against $|\vec p|$ in Figure 4, where the $p^0 > 0$ mass shells are now cups with almost vertical walls, compressed into the vertical tube
\[ |\vec p| < \lambda^{-1} \]
In other words, the 3-momentum is bounded above by the Planck momentum scale (if $\lambda$ is the Planck time). Indeed, the light-cone equation (setting the Casimir to zero) reads $\lambda|\vec p| = 1 - e^{-\lambda p^0}$ so this is

immediate. Nevertheless, this observation is so striking that the bicrossproduct model has been dubbed ‘‘doubly special’’ and spawned the search for other such models. Such accumulation regions are a main discovery of the noncompact bicrossproduct theory visible already in the Planck-scale Hopf algebra. The model further confirms the role of the matched pair equations as a toy version of Einstein’s.

Figure 4 Deformed mass-shell orbits in the bicrossproduct curved momentum space for $\lambda = 1$.

Poisson–Lie T-Duality

We have explained in Section 3 that the matched pair equations are equivalent to a local factorization of Lie groups, with the action and back-reaction created ‘‘equally and oppositely’’ from this. For the two models in the last section, these are $SL_2(\mathbb{C})$ factorizing as $SU_2$ and a 3D $B_+$, and $SO_{2,3}$ locally as $SO_{1,3}$ and a 4D $B_+$. The first of these examples is in fact one of a general family based on the Iwasawa decomposition $G_{\mathbb{C}} = G\bowtie G^*$ where $G$ is a compact Lie group with complexification $G_{\mathbb{C}}$ and $G^*$ a certain solvable group. From this, one may construct a solution $(G, G^*)$ of the matched pair equations and bicrossproduct quantum group
\[ \mathbb{C}[G^*]{\blacktriangleright\!\!\!\triangleleft}U(g) \]
associated to all complex simple Lie algebras. This is again part of the bicrossproduct theory from the 1980s. On the other hand, the Lie algebra $g^*$ here can be identified with the dual of $g$ in which case its Lie algebra corresponds to a Lie coproduct $\delta : g \to g\otimes g$ and makes $(g, \delta)$ into a Lie bialgebra in the sense of Drinfeld. This $\delta$ exponentiates to a Poisson bracket on $G$ making it a ‘‘Poisson–Lie group’’ and the quantization of this is provided by the quantum group coordinate algebras $\mathbb{C}_q[G]$ (see Hopf Algebras and q-Deformation Quantum Groups and Classical r-matrices, Lie Bialgebras, and Poisson Lie Groups). The bicrossproduct quantum groups are nevertheless unrelated to the latter even though they spring from related classical data.

As already discussed, one interpretation here is of quantized particles in $G^*$ moving on orbits under $G$ and vice versa in the dual model. The dual model is equivalent in the sense that the states of one (in the sense of positive-linear functionals) lie in the algebra of observables of the other and we also saw in the Planck-scale example inversion of structure constants reminiscent of T-duality in string theory. Motivated in part by this duality Klimcik (1996) along with Severa in the mid 1990s showed that indeed a $\sigma$-model on $G$ could be constructed in such a way that there was a matching dual $\sigma$-model on $G^*$ in some sense equivalent in terms of solutions to the equations of motion. The Lagrangians here have the usual form
\[ L = E_u\left(u^{-1}\partial_+u,\; u^{-1}\partial_-u\right), \qquad \hat L = \hat E_s\left(s^{-1}\partial_+s,\; s^{-1}\partial_-s\right) \]
where $u : \mathbb{R}^{1,1} \to G$ and $s : \mathbb{R}^{1,1} \to G^*$ are the dynamical fields, except that the inner products $E$, $\hat E$ are not constant. Rather they are obtained by solving nonlinear differential equations on the groups defined through the structure constants of $g$, $g^*$ and the Drinfeld double $D(g)$. At the time, T-duality here was well understood in the case of abelian groups while these Poisson–Lie T-duality models provided the first convincing nonabelian models.

This construction was extended by Beggs and Majid (2001) to a general matched pair $(G, M)$, that is, a $\sigma$-model on $G$ dual to one on $M$. The Poisson–Lie case is the special case where the actions are coadjoint actions and the Lie algebra of $G\bowtie M$ is $D(g)$. The solutions of the equations of motion for the two systems are created ‘‘equally and oppositely’’ from one on the factorizing group. It could be expected that T-duality ideas again play a role in Planck-scale physics.

Other Bicrossproducts

There are also infinite-dimensional factorizations such as the Riemann–Hilbert problem (see Riemann–Hilbert Problem) in the theory of integrable systems and hence infinite-dimensional matched pairs and bicrossproducts linked to
them. Here we mention just one partly infinite example of current interest.

Thus, the diffeomorphisms on the line $\mathbb{R}$ may be factorized into transformations of the form $ax + b$ and diffeomorphisms that fix the origin and have unit differential there. After a (logarithmic) change of generators to arrive at an algebraic picture, one has a bicrossproduct

$$H(1) = U(b_+)\,{\triangleright\!\!\!\blacktriangleleft}\,H_1$$

where $b_+$ is now the two-dimensional (2D) Lie algebra with relations $[x, y] = x$ and $H_1$ is the algebra of polynomials in generators $\delta_n$ and a certain coalgebra as a model of the coordinate algebra of the group of diffeomorphisms that fix the origin with unit differential. The Hopf algebra $H(1)$ was introduced by Connes and Moscovici (1998) although not actually as a bicrossproduct (but motivated by the bicrossproduct theory) as part of a family $H(n)$ useful in cyclic cohomology computations. It has cross relations and coproduct determined by

$$[\delta_n, x] = \delta_{n+1}, \qquad [\delta_n, y] = n\,\delta_n,$$
$$\Delta\delta_1 = \delta_1 \otimes 1 + 1 \otimes \delta_1,$$
$$\Delta x = x \otimes 1 + 1 \otimes x + \delta_1 \otimes y,$$
$$\Delta y = y \otimes 1 + 1 \otimes y$$

which we see has a semidirect product form where $\delta_n \triangleleft x = \delta_{n+1}$, $\delta_n \triangleleft y = n\,\delta_n$. The coalgebra is also a semidirect coproduct by means of a back-reaction of $H_1$ in $B_+$ (expressed as a coaction). From the bicrossproduct theory, we also have a dual model

$$\mathbb{C}[B_+]\,{\blacktriangleright\!\!\!\triangleleft}\,U(\mathrm{diff}_0)$$

where $\mathrm{diff}_0$ is the Lie algebra of the group of diffeomorphisms fixing the origin. As such it could be viewed as in the family of examples in the section “Bicrossproduct Poincaré quantum groups” but now with a 2D $B_+$. We also conclude from the bicrossproduct theory that this acts covariantly on $\mathbb{R}^2_\lambda = U(b_+)$ after introducing the scaling parameter $\lambda$.

Finally, the Hopf algebra $H(1)$ is also part of a family of bicrossproduct Hopf algebras built on rooted trees and related to bookkeeping of overlapping divergences in renormalizable quantum field theories (see Hopf Algebra Structure of Renormalizable Quantum Field Theory). While we have not had room to cover all bicrossproduct quantum groups of interest, it would appear that bicrossproducts are indeed intimately tied up with actual quantum physics.

See also: Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Hopf Algebra Structure of Renormalizable Quantum Field Theory; Hopf Algebras and q-Deformation Quantum Groups; Quantum Group Differentials, Bundles and Gauge Theory; Riemann–Hilbert Problem; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory.

Further Reading

Amelino-Camelia G and Majid S (2000) Waves on noncommutative spacetime and gamma-ray bursts. International Journal of Modern Physics A 15: 4301–4323.
Beggs E and Majid S (2001) Poisson–Lie T-duality for quasitriangular Lie bialgebras. Communications in Mathematical Physics 220: 455–488.
Connes A and Moscovici H (1998) Hopf algebras, cyclic cohomology and the transverse index theory. Communications in Mathematical Physics 198: 199–246.
Kac GI and Paljutkin VG (1966) Finite ring groups. Transactions of the American Mathematical Society 15: 251–294.
Kempf A, Mangano G, and Mann RB (1995) Hilbert space representation of the minimal length uncertainty relation. Physical Review D 52: 1108–1118.
Klimcik C (1996) Poisson–Lie T-duality. Nuclear Physics B (Proc. Suppl.) 46: 116–121.
Lukierski J, Nowicki A, Ruegg H, and Tolstoy VN (1991) q-Deformation of Poincaré algebra. Physics Letters B 268: 331–338.
Majid S (1988) Hopf algebras for physics at the Planck scale. Journal of Classical and Quantum Gravity 5: 1587–1606.
Majid S (1990) Physics for algebraists: non-commutative and non-cocommutative Hopf algebras by a bicrossproduct construction. Journal of Algebra 130: 17–64.
Majid S (1990) Matched pairs of Lie groups associated to solutions of the Yang–Baxter equations. Pacific Journal of Mathematics 141: 311–332.
Majid S (1990) On q-regularization. International Journal of Modern Physics A 5: 4689–4696.
Majid S (1995) Foundations of Quantum Group Theory. Cambridge: Cambridge University Press.
Majid S (2000) Meaning of noncommutative geometry and the Planck-scale quantum group. Springer Lecture Notes in Physics 541: 227–276.
Majid S and Ruegg H (1994) Bicrossproduct structure of the κ-Poincaré group and non-commutative geometry. Physics Letters B 334: 348–354.
Oeckl R (2000) Untwisting noncommutative $\mathbb{R}^d$ and the equivalence of quantum field theories. Nuclear Physics B 581: 559–574.
Seiberg N and Witten E (1999) String theory and noncommutative geometry. Journal of High Energy Physics 9909: 032.
Snyder HS (1947) Quantized space-time. Physical Review D 67: 38–41.
Takeuchi M (1981) Matched pairs of groups and bismash products of Hopf algebras. Communications in Algebra 9: 841.

Bifurcation Theory
M Haragus, Université de Franche-Comté, Besançon, France
G Iooss, Institut Non Linéaire de Nice, Valbonne, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Consider the following equation:

$$F(X, \mu) = 0 \qquad [1]$$

where $X$ is the variable, $\mu$ is a parameter, and $X$, $\mu$, $F$ belong to appropriate (finite- or infinite-dimensional) spaces. The problem of bifurcation theory is to describe the singularities of the set of solutions

$$S_\mu = \{X;\ (X, \mu)\ \text{satisfies}\ F(X, \mu) = 0\}$$

The word “bifurcation” was introduced by H Poincaré (1885) in his study of equilibria of rotating liquid masses.

The simplest example is the study of the real roots $x$ of a quadratic polynomial

$$x^2 + bx + c = 0 \qquad [2]$$

where $\mu$ is represented by the pair of parameters $(b, c) \in \mathbb{R}^2$. As is well known, real roots are determined by the sign of

$$\Delta \stackrel{\text{def}}{=} b^2 - 4c$$

For $\Delta < 0$, there is no real solution of [2], while there are two solutions $x_\pm$ in the region $\Delta > 0$, which merge when the distance between the point $(b, c)$ and the parabola $\Delta = 0$ tends towards 0. It is then clear that a singularity occurs in the structure of the set of solutions of [2] at the crossing of the parabola $\Delta = 0$ or, in other words, a bifurcation occurs in the parameter space $(b, c)$ on the parabola $\Delta = 0$. A point $(\mu_0, x_0) \in \mathbb{R}^3$ is then called a bifurcation point if $\mu_0 = (b, c)$ satisfies $\Delta = 0$, and $x_0 = -b/2$.

In the theory of differential equations, $F(X, \mu)$ often represents a vector field. This study is then concerned with the existence of equilibrium solutions to the differential equation

$$\frac{dX}{dt} = F(X, \mu) \qquad [3]$$

and is therefore referred to as static bifurcation theory. In addition, dynamic bifurcation theory is concerned here with “changes” in the dynamic properties of the solutions of the differential equation as $\mu$ varies. A widely used way to characterize these “changes” is to say that the vector field $F(\cdot, \mu_0)$ is structurally stable if the sets of orbits of the differential equation are homeomorphic for $\mu$ close to $\mu_0$, with homeomorphisms which preserve the orientation of the orbits in time $t$. Then a bifurcation occurs at $\mu = \mu_0$ if $F(\cdot, \mu_0)$ is not structurally stable. It turns out that there is a close link between the stability properties of equilibrium solutions of the differential equation and the type of the bifurcation in static theory.

The tools developed in bifurcation theory are extensively used to solve concrete problems arising in physics and natural sciences. These problems may be modeled by ordinary or partial differential equations, integral equations, but also delay equations or iteration maps, and in all these cases the presence of parameters naturally leads to bifurcation phenomena. They can be regarded as problems of the form [1] or [3], in suitable function spaces, and bifurcation theory allows one to detect solutions and to describe their qualitative properties. During the last decades, a class of problems in which the use of bifurcation theory led to significant progress is concerned with nonlinear waves in partial differential equations, including hydrodynamic problems, nonlinear water waves, elasticity, but also pattern formation, front propagation, or spiral waves in reaction–diffusion type systems.

Examples in One and Two Dimensions

The most complete results in bifurcation theory are available in one and two dimensions. The study of static bifurcations in one dimension is concerned with scalar equations

$$f(x, \mu) = 0 \qquad [4]$$

where $x \in \mathbb{R}$, $\mu \in \mathbb{R}$, and the function $f$ is supposed to be regular enough with respect to $(x, \mu)$. When $f(x_0, \mu_0) = 0$ and the derivative of $f$ with respect to $x$ satisfies $\partial_x f(x_0, \mu_0) \neq 0$, the implicit function theorem gives a unique branch of solutions $x(\mu)$ for $\mu$ close to $\mu_0$, and shows the absence of bifurcation points near $(\mu_0, x_0)$. Bifurcation theory intervenes when

$$\partial_x f(x_0, \mu_0) = 0 \qquad [5]$$

and one cannot apply the implicit function theorem for solving with respect to $x$ near $x_0$. A complete description of the set of solutions near $(x_0, \mu_0)$ can be obtained by looking at the partial derivatives of $f$ with respect to $x$ and $\mu$.
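The quadratic example [2] gives a quick numerical check of these statements. The following is a minimal sketch, not part of the original article: the function names and the sample path through parameter space (fixing b = 2 and decreasing c) are illustrative assumptions. It counts the real roots on either side of the parabola Δ = 0 and confirms that the derivative with respect to x vanishes at the bifurcation point x0 = -b/2, in line with condition [5].

```python
import math


def discriminant(b, c):
    """Delta = b^2 - 4c for the quadratic x^2 + b*x + c = 0 of equation [2]."""
    return b * b - 4.0 * c


def real_roots(b, c):
    """Return the real roots of x^2 + b*x + c = 0 in increasing order."""
    delta = discriminant(b, c)
    if delta < 0:
        return []                      # no real solution
    if delta == 0:
        return [-b / 2.0]              # the two roots have merged at x0 = -b/2
    s = math.sqrt(delta)
    return [(-b - s) / 2.0, (-b + s) / 2.0]


def dfdx(x, b, c):
    """Partial derivative in x of f(x; b, c) = x^2 + b*x + c (independent of c)."""
    return 2.0 * x + b


if __name__ == "__main__":
    b = 2.0                            # hypothetical path: fix b, decrease c
    for c in (1.5, 1.0, 0.5):
        print("c =", c, "Delta =", discriminant(b, c), "roots =", real_roots(b, c))
    # On the parabola Delta = 0 the x-derivative vanishes at x0 = -b/2.
    print("df/dx at x0 =", dfdx(-b / 2.0, b, 1.0))
```

Along this path the number of real roots changes from zero to two exactly when (b, c) crosses the parabola, which is the singularity in the solution set described above.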
For example, if

$$\partial_\mu f(x_0, \mu_0) \neq 0,$$

it is possible to solve with respect to $\mu$ and obtain a regular solution $\mu(x)$ such that $\mu(x_0) = \mu_0$ and $f(x, \mu(x)) \equiv 0$. In addition, if the second order derivative

$$\partial_x^2 f(x_0, \mu_0) \neq 0$$

the picture of the solution set in the plane $(\mu, x)$, also called bifurcation diagram, shows a turning point with a fold opened to the left or to the right depending upon the sign of the product $\partial_\mu f(x_0, \mu_0)\,\partial_x^2 f(x_0, \mu_0)$; see Figure 1. Notice that here the bifurcation point $(\mu_0, x_0) \in \mathbb{R}^2$ corresponds to the appearance of a pair of solutions of [4] “from nowhere”. This is the simplest example of a one-sided bifurcation in which the bifurcating solutions exist for either $\mu > \mu_0$ or $\mu < \mu_0$.

Figure 1 Turning point bifurcation in the case $\partial_\mu f(x_0, \mu_0) > 0$ and $\partial_x^2 f(x_0, \mu_0) < 0$. The solid (dashed) line indicates the branch of stable (unstable) solutions in the differential equation.

A particularly interesting situation arises when the equation possesses a symmetry. For example, assume that in [4] the function $f$ is odd with respect to $x$. This implies that we always have the solution $x = 0$, for any value of the parameter $\mu$. Assume now that $f$ satisfies

$$\partial_x f(0, \mu_0) = 0 \qquad [6]$$

and that

$$\partial^2_{x\mu} f(0, \mu_0) \neq 0, \qquad \partial_x^3 f(0, \mu_0) \neq 0 \qquad [7]$$

Then the point $(\mu_0, 0)$ is a pitchfork bifurcation point, this denomination being related with the bifurcation diagram in the plane $(\mu, x)$; see Figure 2. Notice that here, the bifurcation point $(\mu_0, x_0) \in \mathbb{R}^2$ corresponds to the bifurcation from the origin of a pair of solutions exchanged by the symmetry $x \to -x$, in addition to the persistent “trivial” solution $x = 0$ which is invariant under the above symmetry. Such a bifurcation is also referred to as a symmetry-breaking bifurcation. Similar bifurcation diagrams are found when the equation [4] has a “known” branch of solutions $x(\mu)$ for $\mu$ close to $\mu_0$. This situation arises often in applications where usually this branch consists of trivial solutions $x(\mu) = 0$. Then at a bifurcation point $(\mu_0, x_0)$ a second branch of solutions appears forming either a one-sided bifurcation, or a two-sided bifurcation; see Figure 3.

Figure 2 Supercritical pitchfork bifurcation in the case $\partial^2_{x\mu} f(0, \mu_0) > 0$ and $\partial_x^3 f(0, \mu_0) < 0$. The solid (dashed) lines indicate the branch of stable (unstable) solutions in the differential equation.

Figure 3 Typical bifurcation diagrams in the case of a branch of trivial solutions. One-sided bifurcations: (a) supercritical, (b) subcritical; two-sided bifurcation: (c) transcritical. The solid (dashed) lines indicate the branch of stable (unstable) solutions in the differential equation.

We can now view $f$ as a vector field in the ordinary differential equation

$$\frac{dx}{dt} = f(x, \mu) \qquad [8]$$

and the study above corresponds to looking for equilibrium solutions of [8]. The stability of such a solution is determined by the sign of the derivative $\partial_x f(x, \mu)$ of $f$ at this equilibrium, and it is closely related to the type of the static bifurcation.

In the case of a turning point bifurcation, when $\partial_x^2 f(x_0, \mu_0) \neq 0$, the sign of $\partial_x f(x, \mu)$ is different for the two bifurcating solutions. This means that one solution is attracting (i.e., stable), the other one being repelling (i.e., unstable); see Figure 1. In the case of a pitchfork bifurcation as above, the stability of the trivial solution $x = 0$ changes when $\mu$ crosses $\mu_0$, and the stability of both bifurcating nonzero solutions is the opposite from the stability of the origin on the side of the bifurcation. The bifurcation is called supercritical if the bifurcating solutions lie on the side of the bifurcation point where the basic solution $x = 0$ is unstable and subcritical otherwise; see Figure 2. The situation is the same in the case of one-sided bifurcations for an equation which has a “known” branch of solutions. In the case of a two-sided bifurcation, there is an exchange of stability at the bifurcation point $(\mu_0, x_0)$, solutions on the two branches having opposite stability for $\mu > \mu_0$ and $\mu < \mu_0$, which changes at $(\mu_0, x_0)$. Such a bifurcation is also referred to as transcritical; see Figure 3.

Figure 4 Supercritical Hopf bifurcation.
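The stability rule above (the sign of the derivative of f with respect to x at an equilibrium) can be tried on a model vector field. The sketch below assumes the standard cubic example f(x, μ) = μx - x³, which is odd in x and has a pitchfork point at μ0 = 0; this specific f and the helper names are illustrative choices, not taken from the article.

```python
import math


def f(x, mu):
    """Model vector field, odd in x: f(x, mu) = mu*x - x^3 (assumed example)."""
    return mu * x - x ** 3


def dfdx(x, mu):
    """Derivative of f with respect to x; its sign decides stability."""
    return mu - 3.0 * x ** 2


def equilibria(mu):
    """Real solutions of f(x, mu) = 0, in increasing order."""
    if mu > 0:
        r = math.sqrt(mu)
        return [-r, 0.0, r]            # trivial branch plus the bifurcating pair
    return [0.0]                       # only the trivial branch


def stability(x, mu):
    """Classify an equilibrium of dx/dt = f(x, mu) by the sign of df/dx."""
    s = dfdx(x, mu)
    if s < 0:
        return "stable"
    if s > 0:
        return "unstable"
    return "degenerate"                # the bifurcation point itself


def branches(mu):
    """List (equilibrium, stability) pairs for a given parameter value."""
    return [(x, stability(x, mu)) for x in equilibria(mu)]


if __name__ == "__main__":
    for mu in (-1.0, 0.0, 1.0):
        print("mu =", mu, branches(mu))
```

For this f the nonzero solutions exist only for μ > 0, where x = 0 is unstable, and they are themselves stable, so the example is a supercritical pitchfork in the sense defined above.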
Notice that the study of fixed points or periodic points for maps enters into the above framework. Specifically, the period-doubling process occurring in successive bifurcations of one-dimensional maps is a common phenomenon in physics.

The analysis of bifurcations in two dimensions leads to more complicated scenarios. Consider the differential equation [8] in which now $x \in \mathbb{R}^2$ and $f(x, \mu) \in \mathbb{R}^2$, and assume that $f(x_0, \mu_0) = 0$. The behavior of solutions near $(x_0, \mu_0)$ is determined by the differential $D_x f(x_0, \mu_0) =: L$ of $f$ with respect to $x$, which can be identified with a $2 \times 2$ matrix. For steady solutions, the implicit function theorem ensures the existence of a unique branch of solutions $x(\mu)$ provided $L$ is invertible or, in other words, zero does not belong to the spectrum of $L$. Consequently, the study of bifurcations of steady solutions is concerned with the case when zero belongs to the spectrum of $L$, and can be performed following the strategy described for one dimension, provided that the zero eigenvalue of $L$ is simple. For example, assuming that the second eigenvalue is negative leads in general to a saddle–node bifurcation, where an additional dimension is added to the previous picture of a turning point bifurcation, in which one of the two bifurcating steady solutions is a stable node, while the other one is a saddle. If, in addition, there is a symmetry $S$ commuting with $f$, that is, such that $f(Sx, \mu) = Sf(x, \mu)$, and if, for example, $x_0$ is invariant under $S$, $Sx_0 = x_0$, and the eigenvector $\xi_0$ associated to the zero eigenvalue of $L$ is antisymmetric, $S\xi_0 = -\xi_0$, then there is again a pitchfork bifurcation. The equation possesses a branch of symmetric steady solutions the stability of which changes when crossing the value $\mu_0$ of the parameter, node on one side and saddle on the other, and a pair of solutions is created in a one-sided bifurcation which are exchanged by the symmetry $S$ and have stability opposite to the one of the symmetric solution, just as in the one-dimensional pitchfork bifurcation above.

A new type of bifurcation that arises for vector fields in two dimensions is the so-called Hopf bifurcation. This bifurcation was first understood by Poincaré, and then proved in two dimensions by Andronov (1937) using a Poincaré map, and later in $n$ dimensions by Hopf (1948) by means of a Liapunov–Schmidt-type method. For the differential equation, the absence of the zero eigenvalue in the spectrum of $L$ is not enough to ensure that the vector field $f(\cdot, \mu_0)$ is structurally stable in a neighborhood of $x_0$. This only holds when the spectrum of $L$ does not contain purely imaginary eigenvalues, as asserted by the Hartman–Grobman theorem. We are then left with the case when $L$ has a pair of purely imaginary eigenvalues $\pm i\omega$, $\omega \in \mathbb{R}^*$. Static bifurcation theory gives that the system has a unique branch of equilibria $(x(\mu), \mu)$ for $\mu$ close to $\mu_0$, and typically their stability changes as $\mu$ crosses $\mu_0$. For the differential equation a Hopf bifurcation occurs in which a branch of periodic orbits bifurcates on one side of $\mu_0$, and their stability is opposite to that of the steady solution on this side; see Figure 4. A convenient way to study this bifurcation is through “normal form theory,” which is briefly described below.

Local Bifurcation Theory

There are two aspects of bifurcation theory, local and global theory. As this designation suggests, local theory is concerned with (local) properties of the set of solutions in a neighborhood of a “known” solution, while global theory investigates solutions in the entire space.

An important class of tools in local bifurcation theory consists of reduction methods, among which the Liapunov–Schmidt reduction and the center manifold reduction are often used to investigate static and dynamic bifurcations, respectively. The basic idea is to replace the bifurcation problem by an equivalent problem in lower dimensions, for example, a one- or a two-dimensional problem as the ones above.

Consider again the equation [1] in which $F : \mathcal{X} \times \mathcal{M} \to \mathcal{Y}$ is sufficiently regular, and $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{M}$ are Banach spaces. Assume, without loss of generality,
that $F(0, 0) = 0$, or, in other words, that one solution is known. The equation can be then written as

$$LX + G(X, \mu) = 0$$

in which $L = D_X F(0, 0)$ represents the differential of $F$ with respect to $X$ at $(0, 0)$, and is assumed to have a closed range. The implicit function theorem shows absence of bifurcation if $L$ has a bounded inverse, so that bifurcations are related to the existence of a nontrivial kernel of $L$. The Liapunov–Schmidt reduction then goes as follows.

Let $N(L)$ and $R(L)$ denote the kernel and the range of $L$, respectively, and consider continuous projections $P : \mathcal{X} \to N(L)$ and $Q : \mathcal{Y} \to R(L)$. Then there exists a bounded linear operator $B : R(L) \to (\mathrm{id} - P)\mathcal{X}$, the right inverse of $L$, satisfying $LB = \mathrm{id}$ on $R(L)$ and $BL = \mathrm{id} - P$ on $\mathcal{X}$. For $X \in \mathcal{X}$ one may write

$$X = X_0 + X_1, \qquad X_0 = PX, \qquad X_1 = (\mathrm{id} - P)X$$

and then by projecting with $\mathrm{id} - Q$ and $Q$ the equation becomes

$$(\mathrm{id} - Q)\,G(X_0 + X_1, \mu) = 0$$
$$X_1 + BQ\,G(X_0 + X_1, \mu) = 0$$

The implicit function theorem allows one to solve the second equation for $X_1 = \Phi(X_0, \mu)$ in a neighborhood of the origin. Substitution into the first equation leads to the equation in $(\mathrm{id} - Q)\mathcal{Y}$ for $X_0$ in $P\mathcal{X}$,

$$(\mathrm{id} - Q)\,G(X_0 + \Phi(X_0, \mu), \mu) = 0$$

also called bifurcation equation. This equation completely describes the set of solutions to [1] in a neighborhood of $(0, 0)$, and this problem is then posed in a space of dimension much smaller than the dimension of $\mathcal{X}$.

The basic principle of the Liapunov–Schmidt method has been discovered and used independently by different authors. E Schmidt (1908) used this method for integral equations, while Liapunov used it to study the stability of the zero solution of nonlinear partial differential equations when the linear part has zero eigenvalues (1947), and later in 1960 for the bifurcation problem studied by Poincaré (1885). In working in a Banach space of $t$-periodic functions, the Liapunov–Schmidt method may be used to solve the Hopf bifurcation problem, as did Hopf himself in 1948.

The analog of this reduction procedure for the differential equation [3] is the center manifold reduction. Assuming that $F(0, 0) = 0$, we obtain the differential equation

$$\frac{dX}{dt} = LX + G(X, \mu)$$

Since dynamic bifurcations are related to the existence of purely imaginary spectral values of $L$, the kernel of $L$ alone is not enough to describe this situation. One has to consider the spectral space $\mathcal{Y}_c$ of $L$ associated to the purely imaginary spectrum of $L$. A spectral gap is needed between this part of the spectrum and the rest (always true in finite dimensions), so that the spectral projection $P$ onto $\mathcal{Y}_c$ is well defined. One writes

$$X = X_c + X_h, \qquad X_c = PX, \qquad X_h = (\mathrm{id} - P)X$$

and obtains the decomposed system

$$\frac{dX_c}{dt} = LX_c + P\,G(X_c + X_h, \mu)$$
$$\frac{dX_h}{dt} = LX_h + (\mathrm{id} - P)\,G(X_c + X_h, \mu)$$

The reduction procedure works provided the nonhomogeneous linear equation

$$\frac{dX_h}{dt} = LX_h + f(t)$$

possesses a unique solution in suitably chosen function spaces with weak exponential growth, such that one can then solve the second equation for $X_h = \Phi(X_c)$ in a neighborhood of the origin in these function spaces. This property is always true in finite dimensions, but it has to be checked in infinite dimensions. Different results showing the solvability of this equation are available in both Banach and Hilbert spaces, relying upon additional conditions on the spectrum of $L$, decaying properties of the resolvent of $L$ on the imaginary axis, and regularity properties of the nonlinearity $G$. The map $\Phi$ is then used to construct a map $\Psi : P\mathcal{X} \times \mathcal{M} \to (\mathrm{id} - P)\mathcal{X}$, defined in a neighborhood of the origin, which parametrizes a local center manifold invariant under the flow of the equation. The flow on this center manifold is governed by the reduced equation in $\mathcal{Y}_c$,

$$\frac{dX_c}{dt} = LX_c + P\,G(X_c + \Psi(X_c, \mu), \mu)$$

which completely describes the bifurcation problem. The first proofs of this result were given in finite dimensions by Pliss (1964) and Kelley (1967). Center manifolds in infinite dimensions have been studied in different settings determined by assumptions on the linear part $L$ and the nonlinear part $G$. One typical assumption in infinite dimensions is that the spectrum of $L$ contains only a finite number of purely imaginary eigenvalues, so that the reduced equation above is a differential equation in a finite-dimensional space.

These reduction methods work for a large class of problems and the advantage of such an approach is that one is left with a bifurcation problem in a lower-dimensional space. The methods involved in
solving this reduced bifurcation problem can be very different from one problem to another, and often make use of some additional structure in the problem, such as a gradient-like structure, Hamiltonian structure, or the presence of symmetries, which are preserved by the reduction procedure.

A powerful tool for the analysis of these reduced differential equations is provided by the normal form theory, which goes back to works of Poincaré (1885) and Birkhoff (1927). The idea is to use coordinate transformations to make the expression of the vector field as simple as possible. The transformed vector field is called normal form. There is an extensive literature on normal forms for vector fields in many different contexts, in both finite- and infinite-dimensional cases. Typically the classes of normal forms are characterized in terms of the linear part of the differential equation.

For differential equations of the form

$$\frac{dx}{dt} = Lx + g(x, \mu) \qquad [9]$$

in which $L$ is a matrix and $g$ a sufficiently regular map such that $g(0, 0) = 0$, $D_x g(0, 0) = 0$, as encountered in bifurcation theory, one possible characterization of normal forms makes use of the adjoint matrix $L^*$. Fixing any order $k \geq 2$, there exist polynomials $\Phi$ and $N$ of degree $k$ in $x$ with coefficients which are regular functions of $\mu$, and $\Phi(0, 0) = N(0, 0) = 0$, $D_x\Phi(0, 0) = D_x N(0, 0) = 0$, such that by the change of variables

$$x = y + \Phi(y, \mu)$$

the equation [9] is transformed into the normal form

$$\frac{dy}{dt} = Ly + N(y, \mu) + o(\|y\|^k) \qquad [10]$$

in which the polynomial $N$ is characterized through

$$N(e^{tL^*} y, \mu) = e^{tL^*} N(y, \mu)$$

for all $y$, $\mu$, and $t$, or, equivalently,

$$D_y N(y, \mu)\,L^* y = L^* N(y, \mu)$$

for all $y$ and $\mu$. This characterization allows one to determine the classes of possible normal forms for a given matrix $L$, and also provides an efficient way to compute the normal form for a given vector field $g$. As for the reduction methods, normal form transformations can be made to preserve the additional structure of the problem, such as Hamiltonian structure or symmetries.

As an example, consider a differential equation of the form [9] with $x \in \mathbb{R}^n$ and $\mu \in \mathbb{R}$, which supports a Hopf bifurcation so that $L$ has simple eigenvalues $\pm i\omega$, $\omega > 0$, and no other eigenvalues with zero real part. The center manifold reduction provides a two-dimensional reduced system with linear part having the simple eigenvalues $\pm i\omega$, for which it is convenient to write the normal form in complex variables

$$\frac{dA}{dt} = i\omega A + A\,Q(|A|^2, \mu) + o(|A|^{2k+2})$$

for $A(t) \in \mathbb{C}$, where $Q$ is a complex polynomial of degree $k$ in $|A|^2$ with $Q(0, 0) = 0$, or, equivalently, in polar coordinates $A = re^{i\theta}$,

$$\frac{dr}{dt} = r\,Q_r(r^2, \mu) + o(r^{2k+2})$$
$$\frac{d\theta}{dt} = \omega + Q_i(r^2, \mu) + o(r^{2k+1})$$

$Q_r$ and $Q_i$ being the real and imaginary part of $Q$, respectively. The radial equation for $r$ truncated at order $2k + 1$ decouples and admits a pitchfork bifurcation. The bifurcating steady solutions of this equation then lead first to periodic solutions for the truncated system, which are then shown to persist for the full equation by a standard perturbation analysis.

A situation that occurs in a large class of problems is when the problem possesses a reversibility symmetry, which often comes from some reflection invariance in the physical space, that is, when the vector field $F(\cdot, \mu)$ anticommutes with a symmetry operator $S$. One of the simplest examples is the case of a differential equation [9] when the matrix $L$ has a double eigenvalue in 0, no other eigenvalues with zero real part, and a one-dimensional kernel which is invariant by $S$. In this case, the center manifold reduction provides a two-dimensional reduced reversible system, which can be put in the normal form

$$\frac{da}{dt} = b$$
$$\frac{db}{dt} = \mu - a^2 + o((|a| + |b|)^3)$$

which anticommutes with the symmetry $(a, b) \mapsto (a, -b)$. The above system undergoes a reversible Takens–Bogdanov bifurcation and has for $\mu > 0$ a phase portrait as in Figure 5. There are two equilibria, one a saddle, the other a center, and a family of periodic orbits with the zero-amplitude limit at the center equilibrium, and the infinite-period limit a homoclinic orbit, originating at the saddle point. In concrete problems the bounded orbits of such a reduced system determine the shape of physically interesting solutions of the full system of equations, such as, for example, in water-wave theory where to homoclinic and periodic orbits correspond solitary and periodic waves, respectively.
Figure 5 Phase portrait of the reduced system in a reversible Takens–Bogdanov bifurcation (left) and sketch of the a-component of solutions corresponding to homoclinic and periodic orbits (right).

Figure 6 Phase portrait of the reduced system in absence of reversibility (left) and sketch of the a-component of the solution corresponding to the bounded orbit (right).

Notice that in the absence of the reversibility symmetry, the same type of bifurcation may lead to a completely different phase portrait for the reduced system as, for example, the one in Figure 6 in which the homoclinic and the periodic orbits disappear. This situation often occurs in the presence of a small dissipation in nearly reversible systems.

Global Bifurcation Theory

Most of the existing results in global bifurcation theory concern the static problem [1]. The analysis of global sets of solutions often relies upon topological methods, degree theory, but also variational methods, or analytic function theory. Significant progress in understanding global branches of solutions has been made in the 1970s, in particular, for nonlinear eigenvalue problems and the Hopf bifurcation problem (see, e.g., works by Rabinowitz, Crandall, Dancer, and Alexander, Yorke, Ize, respectively).

A now-classical result in the topological theory of global bifurcations is the following theorem by Rabinowitz (1970), which gives a characterization of global sets of solutions for eigenvalue problems of the form

$$X = F(X, \mu) = \mu L X + H(X, \mu)$$

$H(X, \mu) = o(\|X\|)$, posed for $(X, \mu) \in \mathcal{X} \times \mathbb{R}$, $\mathcal{X}$ being a Banach space. In contrast to local theory where the function $F$ is usually $k$-times differentiable (with a suitable $k$), in the global theory a typical assumption is that $F : \mathcal{X} \times \mathbb{R} \to \mathcal{X}$ is compact. The equation above possesses a “trivial” branch of solutions $(0, \mu)$ for any $\mu$. The bifurcation result asserts that if for some real parameter value $\mu_0$ zero is an eigenvalue of odd multiplicity of the operator $\mathrm{id} - \mu_0 L$, then the set $S$ of nontrivial solutions $(X, \mu)$ possesses a maximal subcontinuum which contains $(0, \mu_0)$ and meets either infinity in $\mathcal{X} \times \mathbb{R}$ or another trivial solution $(0, \mu_1)$, $\mu_1 \neq \mu_0$. In particular, $(\mu_0, 0)$ is a bifurcation point. A local version of this result is often referred to as Krasnoselski's theorem.

Different versions and extensions of these theorems can be found in the literature, as, for example, in the case of a simple eigenvalue, or if the field $F$ is real-analytic when the set of solutions is path-connected. More recent works address the question of lack of compactness, and a number of results are now available for problems with additional structure (gradient-like or Hamiltonian structure), but also for concrete problems, such as the water-wave problem.

See also: Bifurcations in Fluid Dynamics; Bifurcations of Periodic Orbits; Central Manifolds, Normal Forms; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Ginzburg–Landau Equation; Integrable Systems: Overview; Leray–Schauder Theory and Mapping Degree; Singularity and Bifurcation Theory; Stability Theory and KAM; Symmetry and Symmetry Breaking in Dynamical Systems.

Bifurcations in Fluid Dynamics


G Schneider, Universität Karlsruhe, Karlsruhe, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Almost all classical hydrodynamical stability problems are experiments or gedankenexperiments which have been designed to isolate and understand special phenomena that occur in more complicated situations. Examples are the Taylor–Couette problem, Bénard’s problem, Poiseuille flow, or Kolmogorov flow.

The Taylor–Couette problem consists in finding the flow of a viscous incompressible fluid contained in between two coaxial co- or counterrotating cylinders, cf. Figure 1. If the rotational velocity of the inner cylinder is below a certain threshold, the trivial solution, called the Couette flow, is asymptotically stable. At the threshold, this spatially homogeneous solution becomes unstable and bifurcates via a pitchfork bifurcation or a Hopf bifurcation into different spatially periodic patterns, that is, depending on the rotational velocity of the outer cylinder the basic patterns are stationary (called the Taylor vortices) or time-periodic. If the rotational velocity of the inner cylinder is increased further, more complicated patterns occur. The bifurcation scenario is well understood from experiments and analytic investigations.

Bénard’s problem consists in finding the flow of a viscous incompressible fluid contained in between two plates, where the lower plate is heated and the upper plate is kept at a constant temperature, cf. Figure 2. If the temperature difference between the two plates is below a certain threshold, the transport of energy from below to above is made by pure conduction. At this threshold, this spatially homogeneous solution becomes unstable, convection sets in, and spatially periodic patterns such as rolls or hexagons occur. Convection problems play a big role in geophysical applications, that is, in spherical domains such as the earth. The paradigm for an anisotropic pattern-forming system is electro-convection in nematic crystals.

Poiseuille flow consists in finding the flow of a viscous incompressible fluid flowing through a pipe driven by some pressure gradient, cf. Figure 3. In noncircular pipes, the trivial laminar flow becomes unstable at a critical pressure gradient. Experimentally, a direct transition to turbulent flow with large amplitudes is observed, in accordance with the fact that in general a subcritical bifurcation occurs at the instability point of the trivial solution.

Figure 1 The Taylor–Couette problem with the Taylor vortices.

Figure 2 Bénard’s problem with rolls.

Figure 3 Poiseuille flow with the trivial solution.

Kolmogorov flow consists in finding the flow of a viscous incompressible fluid under the action of an external force parallel to the flow direction $x$ and varying periodically in the perpendicular $y$-direction. This gedankenexperiment has been designed by Kolmogorov in 1958 as a simplified model for the Poiseuille flow problem in order to study the nature of turbulence. The trivial solution which is called Kolmogorov flow can become unstable via a long-wave instability along the flow direction.

The inclined-plane problem consists in finding the flow of a viscous liquid running down an inclined plane, cf. Figure 4. The trivial solution, the so-called Nusselt solution, becomes sideband-unstable if the inclination angle $\varphi$ is increased. Then the dynamics is dominated by traveling pulse trains, although the individual pulses are unstable due to the long-wave instability of the flat surface. Time series taken from the motion of the individual pulses indicate the occurrence of chaos directly at the onset of instability.

Figure 4 The inclined-plane problem. The trivial Nusselt solution possesses a flat top surface and a parabolic flow profile.

There are other famous hydrodynamical stability problems, with arbitrarily complicated bifurcation scenarios.

Spectral Analysis of the Trivial Solution

All classical hydrodynamical stability problems are described by the Navier–Stokes equations
\[ \partial_t U = \nu \Delta U - \frac{1}{\rho} \nabla p - (U \cdot \nabla) U + f, \qquad 0 = \nabla \cdot U \tag{1} \]
where $U = U(x, t) \in \mathbb{R}^d$ with $d = 2, 3$ is the velocity field, $p = p(x, t) \in \mathbb{R}$ the pressure field, $f$ some external forcing, $\rho$ the density, and $\nu$ the viscosity. These equations are completed with boundary conditions. In case of Bénard’s problem, the Navier–Stokes equations are coupled to a nonlinear heat equation.

By projecting $U$ onto the space of divergence-free vector fields and by taking the trivial solution as new origin, all problems from the previous section can be written as an evolutionary system
\[ \partial_t U = \Lambda U + N(U) \]
where $U = 0$ corresponds to the trivial solution, $\Lambda$ is a linear operator, and $N$ is a nonlinear operator with $N(U) = O(U^2)$ for $U \to 0$. Most of the examples from the previous section are semilinear, that is, from a functional analytic point of view, the nonlinear operator $N$ can be controlled in terms of the linear operator $\Lambda$.

Since the form of the bifurcating pattern is only slightly influenced by far away boundaries, for instance, the upper and lower end of the rotating cylinders in the Taylor–Couette problem, the problems are considered from a theoretical point of view in unbounded domains, $\Omega = \mathbb{R}^d \times \Sigma$, with $\Sigma \subset \mathbb{R}^m$ the bounded cross section. For instance, the Taylor–Couette problem is considered with two cylinders of infinite length. Then the eigenfunctions of the linear operator $\Lambda$ are given by Fourier modes, that is,
\[ \Lambda \left( e^{ik \cdot x} \varphi_{k,n}(z) \right) = \lambda_n(k) \, e^{ik \cdot x} \varphi_{k,n}(z) \]
with $x \in \mathbb{R}^d$, $k \in \mathbb{R}^d$, $k \cdot x = \sum_{j=1}^{d} k_j x_j$, $z \in \Sigma$, $n \in \mathbb{N}$. If an external control parameter is changed so that the trivial solution becomes unstable, then, independent of the underlying physical problem, the surface $k \mapsto \mathrm{Re}\,\lambda_1(k)$ intersects the plane $\{\mathrm{Re}\,\lambda_1(k) = 0\}$. Generically, this happens first at a nonzero wave vector $k_c \neq 0$ (cf. Figure 5).

Figure 5 Real part of the spectrum in case of an instability at a wave number $k_c \neq 0$. Definition of the small bifurcation parameter $\varepsilon$.

Examples for such an instability are the Taylor–Couette problem, Bénard’s problem, or Poiseuille flow. Very often, due to some conserved quantity in the problem we have $\mathrm{Re}\,\lambda_1(0) = 0$ for all values of the bifurcation parameter. Then, a so-called sideband instability can occur, cf. Figure 6.

Examples for such an instability are the Kolmogorov flow problem or the inclined-plane problem. According to some symmetries in the problem, for instance, reflection along the cylinders in the Taylor–Couette problem or rotational symmetry in Bénard’s problem, the curves in Figure 5 are double or rotationally symmetric.

In case of $\Omega$ being spherically symmetric, we have
\[ \Lambda \left( f_l(r) \varphi_{l,n}(z) \right) = \lambda_l \, f_l(r) \varphi_{l,n}(z) \]

with $r \ge 0$, $z \in S^d$, and $\varphi_{l,n}$, for $l \in \mathbb{N}_0$ and $n = -l, -l+1, \ldots, l-1, l$, being a spherical harmonic, that is, if $\lambda_{l_0}$ is the eigenvalue having first positive real part, then by symmetry, simultaneously $2l_0 + 1$ eigenvalues cross the imaginary axis.

Figure 6 Real part of the spectrum in case of a sideband instability. Definition of the small bifurcation parameter $\varepsilon$.

Reduction of the Dimension

In order to understand the occurrence of the spatially periodic Taylor vortices in the Taylor–Couette problem and of the roll solutions and hexagons in Bénard’s problem, the problems are considered with periodic boundary conditions along the unbounded directions. Then the instability of the trivial solution occurs when at least one eigenvalue crosses the imaginary axis. Generically, this happens by a simple real eigenvalue or a pair of complex-conjugate eigenvalues crossing the imaginary axis (Figure 7).

Figure 7 Generically, a simple real eigenvalue or a pair of complex-conjugate eigenvalues cross the imaginary axis.

Center manifold theory and the Lyapunov–Schmidt reduction allow one to reduce the a priori infinite-dimensional bifurcation problem to a finite-dimensional one. In case of a real eigenvalue $\lambda_1$ crossing the imaginary axis, the solution $u$ can be written as a sum of the weakly unstable mode and the stable modes, that is, $u = c_1 \varphi_1 + u_r$ ($c_1 \in \mathbb{R}$), where $u_r$ lives in the closure of the span of the stable eigenfunctions $\{\varphi_2, \varphi_3, \ldots\}$. For the linearized system all solutions are attracted by the one-dimensional set $E_c = \{u \mid u_r = 0\}$, in which all solutions diverge to infinity.

For the nonlinear system and small bifurcation parameter this attracting structure survives, no longer as a linear space, but as a manifold
\[ M_c = \{ u = c_1 \varphi_1 + h(c_1) \mid h(c_1) \in \mathrm{span}\{\varphi_2, \varphi_3, \ldots\} \} \]
the so-called center manifold which is tangential to $E_c$, that is, $\|h(c_1)\| \le C \|c_1\|^2$ (Figure 8). The dynamics on $M_c$ is no longer trivial due to the nonlinear terms.

Figure 8 The center manifold is invariant under the flow, is tangential to the central subspace $E_c$, and attracts nearby solutions with some exponential rate.

Due to the fact that real problems are considered, $\mathrm{Re}\,\lambda_1(k_c) = 0$ implies $\mathrm{Re}\,\lambda_1(-k_c) = 0$, that is, in case of $2\pi/k_c$-periodic boundary conditions always two eigenvalues cross the imaginary axis simultaneously. For Bénard’s problem in a strip or for the Taylor–Couette problem in case of a bifurcation of fixed points, the reduced system on the center manifold is derived with the ansatz
\[ U = \varepsilon A(\varepsilon^2 t) e^{i k_c x} + \mathrm{c.c.} + O(\varepsilon^2) \]
where $0 < \varepsilon \ll 1$ is the small bifurcation parameter, cf. Figure 5. Then due to $e^{i k_c x} e^{i k_c x} e^{-i k_c x} = e^{i k_c x}$ the complex-valued amplitude $A$ satisfies the so-called Landau equation
\[ \partial_T A = A - \gamma A |A|^2 + O(\varepsilon^2) \]
where the Landau coefficient $\gamma \in \mathbb{R}$ is obtained by classical perturbation analysis (Figure 9).

Figure 9 The dynamics of the Landau equation. Except for the origin, which corresponds to the Couette flow, all solutions converge towards the circle of fixed points, which corresponds to the family of Taylor vortices. The translation invariance of the Taylor–Couette problem is reflected by the rotational symmetry of the reduced system.

The reduced system is symmetric under the $S^1$-symmetry

$A \mapsto A e^{i\phi}$ with $\phi \in \mathbb{R}$, which corresponds to the translation invariance of the original systems. This so-called equivariant bifurcation theory has been applied successfully to convection problems in the plane and on the sphere.
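
The behavior of the Landau equation described above can be illustrated with a minimal numerical sketch. The following is not taken from the article: it drops the higher-order terms, writes the Landau coefficient as `gamma` and assumes it is positive (the supercritical case), and uses a plain explicit Euler scheme with a hypothetical step size. Under these assumptions it checks that a small initial amplitude converges to the circle $|A| = 1/\sqrt{\gamma}$ of fixed points and that rotating the initial datum by a phase simply rotates the solution.

```python
import cmath

def landau_rhs(A, gamma):
    """Right-hand side of dA/dT = A - gamma * A * |A|^2 (higher-order terms dropped)."""
    return A - gamma * A * abs(A) ** 2

def integrate_landau(A0, gamma, dt=0.01, steps=2000):
    """Explicit Euler time stepping up to T = dt * steps; returns the final amplitude."""
    A = complex(A0)
    for _ in range(steps):
        A = A + dt * landau_rhs(A, gamma)
    return A

gamma = 2.0                          # assumed positive Landau coefficient
phase0 = 0.7                         # arbitrary initial phase
A0 = 0.1 * cmath.exp(1j * phase0)    # small perturbation of the trivial state A = 0

A_end = integrate_landau(A0, gamma)

# S^1-equivariance: rotating the initial datum rotates the whole solution.
shift = cmath.exp(1.3j)
A_rot = integrate_landau(A0 * shift, gamma)

print("final modulus:", abs(A_end), "expected:", gamma ** -0.5)
print("final phase:  ", cmath.phase(A_end), "initial phase:", phase0)
print("equivariance defect:", abs(A_rot - shift * A_end))
```

The phase of $A$ is conserved along each trajectory, so every point on the circle is a fixed point; this is the reduced-system counterpart of the family of translated Taylor vortices.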
The stability of time-periodic flows can be analyzed with Floquet multipliers. Bifurcations from a time-periodic solution can lead to quasiperiodic motion in time. Ruelle and Takens (1971) showed that already the next bifurcation leads to chaotic dynamics. Since then, many classical hydrodynamical stability problems have been analyzed with bifurcation theory up to turbulent flows.

It was observed that center manifold theory can also be applied successfully to elliptic PDE problems posed in spatially unbounded cylindrical domains. A famous example is the construction of capillary-gravity solitary waves for the so-called water-wave problem.

Modulation Equations

The analysis of the last section is of no use in case of a sideband instability occurring at the wave number $k_c = 0$, as it happens in the inclined-plane problem or in the Kolmogorov flow problem. Moreover, in case of an instability at a wave vector $k_c \neq 0$, front solutions cannot be described on the basis of the above analysis. In such situations, the method of modulation equations generalizes the role of the finite-dimensional amplitude equations from the last section.

The complex cubic Ginzburg–Landau equation in normal form is given by
\[ \partial_T A = (1 + i\alpha) \partial_X^2 A + A - (1 + i\beta) A |A|^2 \]
where the coefficients $\alpha, \beta \in \mathbb{R}$ are real, and we have $X \in \mathbb{R}$, $T \ge 0$, and $A(X, T) \in \mathbb{C}$. The Ginzburg–Landau equation is a universal amplitude equation that describes slowly varying modulations, in space and time, of the amplitude of bifurcating spatially periodic solutions in pattern-forming systems close to the threshold of the first instability. Whenever the instability drawn in Figure 5 occurs with $d = 1$, that is, for the Taylor–Couette problem and Bénard’s problem in a strip, it can be derived by a multiple scaling ansatz
\[ u(x, t) \approx \varepsilon A(\varepsilon (x - c_g t), \varepsilon^2 t) e^{i(k_c x - \omega_0 t)} + \mathrm{c.c.} \]
For instance, in case of $\alpha = \beta = 0$, the Ginzburg–Landau equation possesses front solutions connecting the stable fixed point $A = 1$ with the unstable fixed point $A = 0$. Such solutions correspond in the Taylor–Couette problem to modulating fronts connecting the stable Taylor vortices with the unstable Couette flow, cf. Figure 10.

Figure 10 The front solution of the Ginzburg–Landau equation modulates the underlying pattern in the original system.

The diffusion operator in the Ginzburg–Landau equation reflects the parabolic shape of $\mathrm{Re}\,\lambda_1$ close to $k = k_c$ in Figure 5. In case of the long-wave instability, as drawn in Figure 6, the second-order differential operator changes into a fourth-order differential operator.

For Kolmogorov flow with $T = \varepsilon^4 t$ and $X = \varepsilon x$ and the amplitude scaled with $\varepsilon$, we obtain that in lowest order $A$ has to satisfy a Cahn–Hilliard equation
\[ \partial_T A = -\sqrt{2}\, \partial_X^2 A - 3 \partial_X^4 A + \kappa\, \partial_X^2 (A^3) \]
where $A(X, T) \in \mathbb{R}$ and $\kappa \in \mathbb{R}$ is a constant (cf. Figure 6).

The Kuramoto–Sivashinsky (KS)-perturbed KdV equation
\[ \partial_T A = \partial_X^3 A - \partial_X (A^2)/2 - \varepsilon (\partial_X^2 + \partial_X^4) A \]
with $A = A(X, T) \in \mathbb{R}$, $X \in \mathbb{R}$, $T \ge 0$, where $0 < \varepsilon \ll 1$ is still a small parameter, can be derived for the inclined-plane problem with $T = \varepsilon^3 t$ and $X = \varepsilon x$ and the amplitude scaled with $\varepsilon^2$.

The theory of modulation equations is nowadays a well-established mathematical tool which allows us to construct special solutions, to prove global existence results for the solutions of pattern-forming systems, and to characterize the attractors in such systems. The method is based on approximation results, showing that solutions of the original systems can be approximated by the modulation equation, and on attractivity results, showing that every solution of the original system develops in such a way that it can be described by the modulation equation.

This method can also be applied to secondary bifurcations describing instabilities of spatially periodic wave trains. Then the so-called phase-diffusion equations, conservation laws, Burgers equations, and again the KS equations occur.

However, this method cannot be applied successfully in all situations. There are counterexamples showing that not every formally derived modulation equation describes the original system in a correct way. Moreover, very often according to some symmetries in the original problem no consistent

multiple scaling analysis is possible, that is, the modulation equations still depend on $\varepsilon$.

Discussion

There is no satisfactory bifurcation analysis for situations where boundary layers play a role. The simplest problem is the flow around some obstacle. The difficulties stem from the fact that, due to the unbounded flow region, there is always continuous spectrum up to the imaginary axis. From the localized obstacle discrete eigenvalues are created (cf. Figure 11). In such a situation, so far there is no mathematical bifurcation theory available.

Figure 11 Spectrum for the flow around an obstacle.

See also: Bifurcation Theory; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Leray–Schauder Theory and Mapping Degree; Multiscale Approaches; Newtonian Fluids and Thermohydraulics; Symmetry and Symmetry Breaking in Dynamical Systems; Turbulence Theories; Variational Methods in Turbulence.

Further Reading

Chandrasekhar S (1961) Hydrodynamic and Hydromagnetic Stability. Oxford: Clarendon.
Chang H-C and Demekhin EA (2002) Complex Wave Dynamics on Thin Films, Studies in Interface Science, vol. 14. Amsterdam: Elsevier.
Chossat P and Iooss G (1994) The Taylor–Couette Problem, Applied Mathematical Sciences, vol. 102. Springer.
Chow S-N and Hale J (1982) Methods of Bifurcation Theory, Grundlehren der Mathematischen Wissenschaften, vol. 251. Berlin: Springer.
Golubitsky M and Schaeffer DG (1985) Singularities and Groups in Bifurcation Theory I, Applied Mathematical Sciences, vol. 51. Berlin: Springer.
Golubitsky M, Stewart I, and Schaeffer DG (1988) Singularities and Groups in Bifurcation Theory II, Applied Mathematical Sciences, vol. 69. Berlin: Springer.
Haken H (1987) Advanced Synergetics. Berlin: Springer.
Henry D (1981) Geometric Theory of Semilinear Parabolic Equations, Lecture Notes in Mathematics, vol. 840. Berlin: Springer.
Mielke A (2002) The Ginzburg–Landau equation in its role as a modulation equation. In: Fiedler B (ed.) Handbook of Dynamical Systems II, pp. 759–834. Amsterdam: North-Holland.
Ruelle D and Takens F (1971) On the nature of turbulence. Communications in Mathematical Physics 20: 167–192.
Temam R (1988) Infinite-Dimensional Dynamical Systems in Mechanics and Physics. Berlin: Springer.

Bifurcations of Periodic Orbits


J-P Françoise, Université P.-M. Curie, Paris VI, Paris, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Bifurcation theory of periodic orbits relates to modeling of quite diverse subjects. It appeared classically in the field of celestial mechanics with the contributions of H Poincaré. Van der Pol (1926, 1927, 1928, 1931) observed the frequency-locking phenomenon in electrical circuits. More recently, Malkin’s theory (Malkin 1952, 1956, Roseau 1966) was used to justify synchronization of weakly coupled oscillators modeling the electrical activity of the cells of the sinusal node in the heart. This article provides the essential mathematical background necessary for existence of frequency locking. Applications can be found, for instance, in Weakly Coupled Oscillators.

The Asymptotic Phase of a Stable Periodic Orbit

Let $\gamma$ be a periodic orbit of a vector field and let $S(\gamma)$ denote the stable manifold of $\gamma$ (resp. $U(\gamma)$ the unstable manifold of $\gamma$). The following theorem can be found, for instance, in Hartman (1964).

Theorem There exist $\alpha$ and $K$ such that $\mathrm{Re}(\lambda_j) < -\alpha$, $j = 1, \ldots, k$ and $\mathrm{Re}(\lambda_j) > \alpha$, $j = k + 1, \ldots$, and for all $x \in S(\gamma)$, there is an asymptotic phase $t_0$ such that for all $t \ge 0$
\[ | \phi_t(x) - \gamma(t - t_0) | < K e^{-\alpha t / T} \]
Similarly, for any $x \in U(\gamma)$, there is a $t_0$ such that for all $t \le 0$,
\[ | \phi_t(x) - \gamma(t - t_0) | < K e^{\alpha t / T} \]

If the periodic orbit is stable, the local stable manifold coincides with an open neighborhood of $\gamma$. In such a case, there is a foliation of this open set

whose leaves are the points with a given asymptotic phase. The asymptotic phase can be considered as a coordinate function $\theta$ defined on the neighborhood $S(\gamma)$.

If we consider now the particular case of a plane system, this function can be completed with the square of the distance function to the orbit into a coordinate system called the ‘‘amplitude–phase’’ system and denoted as $(\rho, \theta)$.

Frequency Locking and Phase Locking

The term ‘‘oscillator’’ has two meanings. A conservative ‘‘oscillator’’ is a plane vector field which displays an open set of periodic orbits. It is said to be isochronous if all orbits have the same period. A dissipative ‘‘oscillator’’ is a planar vector field which displays an attractive limit cycle (attractive periodic orbit).

We consider $m$ dissipative oscillators:
\[ \frac{dx_i}{dt} = f(x_i, y_i), \qquad \frac{dy_i}{dt} = g(x_i, y_i) \tag{1} \]
where $i = 1, \ldots, m$.

The dynamical system obtained by considering the space of all the variables $(x_i, y_i)$, $i = 1, \ldots, m$, displays an invariant torus full of periodic orbits that we denote by $T^m(0)$.

Assume now that the $m$ oscillators are weakly coupled:
\[ \frac{dx_i}{dt} = f(x_i, y_i) + \varepsilon F_i(x, y, \varepsilon), \qquad \frac{dy_i}{dt} = g(x_i, y_i) + \varepsilon G_i(x, y, \varepsilon) \tag{2} \]
where $\varepsilon$ can be considered as small as we wish.

Definition The system [2] has a frequency locking if it displays a family of stable periodic orbits $\gamma_\varepsilon$ for all values of $\varepsilon$ small enough which tends to (in the sense of Hausdorff’s topology) a periodic orbit of [1] contained in the periodic torus $T^m(0)$.

Assume now that [2] has a frequency locking associated with the periodic orbit $\gamma_\varepsilon(t)$. Consider the projections $\gamma_\varepsilon^i(t)$ of $\gamma_\varepsilon(t)$ on the coordinate planes $(x_i, y_i)$, $i = 1, \ldots, m$. Assume that $\varepsilon$ is small enough so that each projection belongs to the open set $S_i$ on which the ‘‘amplitude–phase’’ coordinates of the system [1] are defined. We can write the system [2], restricted to the open set $S = \prod_{i=1}^{m} S_i$, as
\[ \frac{d\rho_i}{dt} = f_i(\rho, \theta, \varepsilon), \qquad \frac{d\theta_i}{dt} = F_i(\rho, \theta, \varepsilon), \quad i = 1, \ldots, m \tag{3} \]

Definition The system [2] has a phase locking if the system induced by [3] on $\gamma_\varepsilon(t)$
\[ \frac{d\theta_i}{dt} = F_i(0, \theta, \varepsilon) \tag{4} \]
has an attractive singular point.

As the attractive singular points are structurally stable, it is enough to assume that the system
\[ \frac{d\theta_i}{dt} = F_i(0, \theta, 0) \tag{5} \]
displays an attractive singular point.

Periodic Orbits of Linear Systems

Consider the linear system
\[ \frac{dx}{dt} = P(t) \cdot x + q(t) \tag{6} \]
where $P$ is a continuous $T$-periodic matrix function and $q$ is a continuous $T$-periodic vector function, $x = (x_1, \ldots, x_n)$. Consider also the two associated homogeneous equations:
\[ \frac{dx}{dt} = P(t) \cdot x \tag{7a} \]
\[ \frac{dx}{dt} = -P^*(t) \cdot x \tag{7b} \]
where $P^*$ denotes the transpose of $P$.

The set of $T$-periodic solutions of [7b] is a vector space; $m$ denotes its dimension. Let $U_j(t)$, $j = 1, \ldots, m$, be a basis of this vector space. This basis is completed by adding $n - m$ solutions $U_j(t)$, $j = m + 1, \ldots, n$, to obtain a basis of $\mathbb{R}^n$. Let $U(t)$ be the matrix whose columns are these vectors; denote by $U_{ij}(t)$ the elements of this matrix.

With the change of variable $x = U^*(0)^{-1} y$, system [6] gets transformed into
\[ \frac{dy}{dt} = Q(t) y + r(t) \tag{8} \]
with $Q(t) = U^*(0) P(t) U^*(0)^{-1}$ and $r(t) = U^*(0) q(t)$. The matrix $V(t) = U^{-1}(0) U(t)$ is such that
\[ \frac{dV}{dt} + Q^*(t) V = 0, \qquad V(0) = I \]

and the $m$ first column vectors of $V(t)$, denoted as $V^j(t)$, $j = 1, \ldots, m$, are $T$-periodic.

Let $X(t)$ be the fundamental solution defined by
\[ \frac{dX}{dt} = Q(t) \cdot X, \qquad X(0) = I \]
then,
\[ X^{-1}(t) = V^*(t) \]
The solution of [8] can be written as
\[ y(t) = X(t) \cdot y(0) + X(t) \cdot \int_0^t X^{-1}(u) r(u) \, du \tag{9} \]
This yields that $T$-periodic solutions of [8] have initial data $y(0)$ given by
\[ (V^*(T) - I) \cdot y(0) = \int_0^T V^*(s) r(s) \, ds \tag{10} \]
Conversely, given a solution $y(0)$ of [10], $T$-periodicity of $P$ and $q$ and uniqueness of solutions of a differential equation imply that $y(0)$ represents the initial data of a $T$-periodic solution of [8]. Hence, the $T$-periodic solutions of [8] are in one-to-one correspondence with the affine space defined by the solutions of [10]. The $m$ first rows of $V^*(T) - I$ are zero and its rank is exactly $n - m$. In the following, assume that the determinant $\Delta$ formed by the $(n - m)$ last rows and last columns of $(V^*(T) - I)$ is not zero.

A necessary and sufficient condition so that [8] displays a $T$-periodic solution is
\[ \int_0^T \sum_{j=1}^{n} V_{jk}(u) r_j(u) \, du = 0, \quad k = 1, \ldots, m \tag{11a} \]
\[ \sum_{j=m+1}^{n} (V_{jk}(T) - \delta_{jk}) y_j(0) = \sum_{j=1}^{n} \int_0^T V_{jk}(s) r_j(s) \, ds, \quad m + 1 \le k \le n \tag{11b} \]
This yields the Fredholm alternative: if the $m$ conditions
\[ \sum_{j=1}^{n} \int_0^T U_{jk}(s) q_j(s) \, ds = 0, \quad k = 1, \ldots, m \tag{12} \]
are satisfied, then [6] displays a family $x_\alpha(t)$ of $T$-periodic solutions depending on $m$ parameters $(\alpha_1, \ldots, \alpha_m)$:
\[ x_\alpha(t) = \alpha_1 \varphi_1(t) + \cdots + \alpha_m \varphi_m(t) + \bar{x}(t) \tag{13} \]
where $\bar{x}(t)$ is a particular $T$-periodic solution and $\varphi_j(t)$ denote $T$-periodic independent solutions of [7a]. To be more specific, one can choose $\bar{x}(t)$ to be the unique solution of [6] such that $y(0)_k = 0$, $k = m + 1, \ldots, n$, and $\varphi_j(t)$ solutions of [7a], such that $y(0)_k = \delta_{jk}$. With these notations, $x_\alpha(t)$ is such that
\[ y(0)_k = \alpha_k, \quad k = 1, \ldots, m \]
and its other initial conditions $y(0)_k = \beta_k$, $k = m + 1, \ldots, n$, are fixed:
\[ \beta_k = \beta_k^0 \]

Malkin’s Theorem for Quasilinear Systems

Consider now nonlinear systems with the perturbation:
\[ \frac{dx}{dt} = P(t) \cdot x + q(t) + \varepsilon f(x, t, \varepsilon) \tag{14} \]
where $f$ is $C^1$ and $T$-periodic in $t$.

Assume that the solutions $y(t, y(0), \varepsilon)$ of [14] exist for all values of $t$, $0 \le t \le T$. The solutions define a differentiable function of their initial data $y(0)$. This is, for instance, true for perturbations of linear systems if $\varepsilon$ is small enough.

Assume that $q$ satisfies the condition [12] and that there is a solution
\[ (\alpha_1^0, \ldots, \alpha_m^0) \]
to the equations
\[ \Phi_k(\alpha) = \sum_{j=1}^{n} \int_0^T U_{jk}(u) f_j(x_\alpha(u), u, 0) \, du = 0, \quad k = 1, \ldots, m \tag{15a} \]
so that
\[ \left. \frac{\partial \Phi_k(\alpha)}{\partial \alpha_j} \right|_{\alpha = \alpha^0}, \quad k = 1, \ldots, m; \ j = 1, \ldots, m \tag{15b} \]
is invertible.

Proceed as in the previous section with the coordinate change $x = U^*(0)^{-1} y$. Equation [14] gets transformed into
\[ \frac{dy}{dt} = Q(t) y + r(t) + \varepsilon F(y, t, \varepsilon) \tag{16} \]
with $F = U^*(0) f(U^*(0)^{-1} \cdot y, t, \varepsilon)$.

Solutions of [16] are uniquely determined by their initial data. We can understand the parameters $(\alpha, \beta)$ as coordinates on the space of solutions. With this viewpoint, for instance, the set of $T$-periodic solutions of [6] is an affine space of dimension $m$

given by the equations = 0 and is parametrized by displays an m-parameter family x (t) of T-periodic
the coordinates . In this space, we pick up a point orbits.
(which corresponds to a particular T-periodic solu- Assume that the solutions y(t, y(0), ) exist for all
tion of [6]): ( = 0 ). T-periodic solutions of [16] are 0  t  T and define a differentiable mapping of the
in one-to-one correspondence with the solutions of initial data y(0). This is, for instance, the case if we
Xn Z T assume that the nonperturbed equation defines a
Ck ð; ; Þ ¼ Vjk ðsÞFj ðyðs; ; ; Þ; s; Þds ¼ 0; flow and if  is small enough.
j¼1 0 Assume also that the different solutions x (t) are
k ¼ 1; . . . ; m ½17a independent in the sense that the mapping

X  7! x ðtÞ
Ck ð; ; Þ ¼ ðVjk ðTÞ  IÞ j
j¼mþ1;...;n is an immersion for any t. In other words, the m
n Z
X T vectors dx (t)=dj are independent.
 Vjk ðsÞrj ðsÞ ds We linearize the solution along the family of
j¼1 0
periodic orbits:
n Z
X T
 Vjk ðsÞFj ðyðs; ; ; Þ; s;Þds ¼ 0;
0 x ¼ x ðtÞ þ 
½23
j¼1

k ¼ m þ 1; . . . ; n ½17b Equation [21] gets transformed into


where k , k = 1, . . . , m and k = yk (0), k = m þ
1, . . . , n parametrize the solutions y(t, , , ) of d

¼ Dfx ðx ðtÞ;tÞ 


þ gðx ðtÞ;t; 0Þ þ Fð
;t;Þ ½24
[14] in this way: dt
X
m
Set, furthermore,
yð0Þ¼U ð0Þ  xð0Þ; xð0Þ ¼ j j ð0Þ þ x
ð0Þ ½18
j¼1
PðtÞ ¼ Dfx ðx ðtÞ; tÞ; rðtÞ ¼ gðx ðtÞ; t; 0Þ
Consider the determinant of the Jacobian matrix
and denote U(t) the fundamental solution of [7b]
of the mapping
described earlier.
ð; Þ 7! Cð; ; Þ ½19
Theorem Assume that there is a solution
0
for  =  , k = k0 ,
k = m þ 1, . . . , n ,  = 0. This is
 
equal to the product of  and the determinant of 01 ; . . . ; 0m
@ k ðÞ
j 0 ½20 of the m equations:
@j ¼
which is nonzero. n Z
X T
The implicit-function theorem shows that the k ðÞ ¼ Ujk ðuÞgj ðx ðuÞ; u; 0Þ du ¼ 0;
j¼1 0
differential equation [14] (and thus [16] as well)
has, for  small enough, a unique T-periodic solution
k ¼ 1; . . . ; m ½25a
which tends to x0 when  tends to 0.
such that

Generalization of Malkin’s Theorem @ k ðÞ


j 0; k ¼ 1; . . . m; j ¼ 1; . . . ; m ½25b
@j ¼
Finally, we consider the most general situation of
the perturbation of a general system (not necessarily
is invertible. Then, for all  sufficiently small, eqn
linear):
[21] has a unique T-periodic solution which tends to
dx x0 when  tends to 0.
¼ f ðx; tÞ þ gðx; t; Þ ½21
dt We show that under the hypothesis of the
where we assume that theorem, we can apply the results proved in the
preceding section. Note that one can prove the
dx theorem for eqn [24] because it reduces to [21] with
¼ f ðx; tÞ ½22
dt the change of variables [23].

Note first that the m conditions [25a] imply that Then, the solutions
(t) depend linearly on . We thus
the m equations, obtain that a priori p () are quadratic functions of :

d
p ð1 ; . . . ; m Þ
¼ Dfx ðx0 ðtÞ; tÞ 
þ gðx0 ðtÞ; t; 0Þ Z
dt 1X T
@ 2 fj @zk @zl
¼ q r Ujp   ds
display a family of T-periodic solutions which 2 qrkl 0 @zk @zl @q @r
depend on m parameters  = (1 , . . . ,m ). From Z " !
X T
1 @ 2 fj @zk  @zl 
(13), one can write þ q Ujp 
l þ
k
qkl 0 2 @zk @zl @q @q


 ðtÞ ¼ 1 1 ðtÞ þ    þ m m ðtÞ þ
ðtÞ ½26 #
@gj @zk
 is a particular T-periodic solution and þ  ds þ    ½28
where
(t) @zk @q
the j (t) are independent T-periodic solutions
of (22a). where the dots represent quantities independent of .
We use then the expression
Lemma 1 A possible choice for the solutions j (t)
is @x (t)=@j j=0 .  2 
d @ zj
We have already assumed that these vectors are dt @q @@r
independent. They are obviously T-periodic solu- X @ 2 fj @zk @zl X @fj @ 2 zk
tions to (22a). ¼   þ
In the following, we will assume that all other periodic kl
@zk @zl @q @r k
@zk @q @@r
solutions of (22a) are linear combinations of these.
As a consequence of what was proved in the This allows one to find the homogeneous quadratic
section on periodic orbits of linear systems, system part as
[24] displays a periodic solution (for  small enough)
if there exists a solution XZ T
@ 2 fj @zk @zl
Ujp   ds
 0  jkl 0 @zk @zl @q @r
0
1 ; . . . ; m  2 
XZ T d @ zj
¼ Ujp ðsÞ ds
to equations 0 ds @q @@r
j
n Z
X T XZ T @fj @ 2 zk
k ðÞ ¼ Ujk ðsÞFj ð
 ðsÞ; s; 0Þ ds ¼ 0;  Ujp ðsÞ ds
j¼1 0 jk 0 @zk @q @r

k ¼ 1; . . . ; m
Integration by parts yields
such that XZ T
@ 2 fj @zk @zl
Ujp   ds
@ k ðÞ jkl 0 @zk @zl @q @r
j 0; k ¼ 1; . . . m; j ¼ 1; . . . ; m
@j ¼ X Z T dUjp  2
@fj @ zk
¼ þ Ujp ðsÞ ds ¼ 0
is invertible. j 0 ds @z k @ q @r

Lemma 2 The quantities k () depend linearly in .


because U is solution to [7a]. This shows that [28]
Proof Observe first that the quantities Fj (
, s, 0) is linear in . Suffices to show that the determinant
depend quadratically of
: of this system does not vanish to have existence and
uniqueness of the solution such that
1 X @ 2 fj
Fj ð
; s; 0Þ ¼ ðx 0 ðsÞ; sÞ
k
l
2 k;l @zk @zl  @ 1 ; . . . ; m
6¼ 0
@1 ; . . . ; m
X @gj
þ ðx0 ðsÞ; s; 0Þ Consider now the coefficient of the linear part:
k
@zk
XZ T  2 
@gj @ fj  @gj @zk
þ ðx 0 ðsÞ; s; 0Þ ½27 Ujp 
l þ  ds
@  kl 0 @zk @zl @zk @q

  2 
and the coefficient d p  XZ T @ fj  @gj @zk
¼ Ujp 
þ  ds
n Z
X T dq ¼0 @zk @zl l @zk @q
kl 0
p ðÞ ¼ Ujp ðuÞgj ðx ðuÞ; u; 0Þ du
j¼1 0
This achieves the proof of the theorem. In the special
We can write case of Hamiltonian systems, in the case of the
Z T  peturbations of an isochronous system, the method
d p @Ujp @gj @zk explained is equivalent to Moser’s averaging theory.
¼  gj þ Ujp  ds
dq 0 @q @zk @q The reader is referred to other articles in this
encyclopedia for a discussion of other aspects of
Note that
synchronization, frequency locking, and phase locking.
d
j X @fj 
¼
r þ gj ðzðtÞ; 0 ; 0Þ See also: Bifurcation Theory; Fractal Dimensions in
dt r
@zr
Dynamics; Integrable Systems: Overview; Isochronous
and we obtain Systems; Leray–Schauder Theory and Mapping Degree;
! Ljusternik–Schnirelman Theory; Singularity and
Z
d p T
@Ujpd
j X @fj  Bifurcation Theory; Symmetry and Symmetry Breaking in
¼  
r Dynamical Systems; Synchronization of Chaos; Weakly
dq 0 @q ds r
@zr
 Coupled Oscillators.
@gj @zk
þUjp  ds
@zk @q
Further Reading
Integration by parts yields Hartman P (1964) Ordinary Differential Equations. New York:
 Z T   ! Wiley.
d p  d @Ujp X @fj Malkin I (1952) Stability Theory of the Motion. Moscou–
¼ 

j þ 

r
dq ¼0 0 ds @q r
@zr Leningrad: Izdat. Gos.
  Malkin I (1956) Some Problems in the Theory of Nonlinear
Z T
@gj @zk Oscillations. Gostekhisdat.
þ Ujp  ds Moser J (1970) Regularization of Kepler’s problem and the
0 @zk @q
averaging method on a manifold. Communication of Pure and
From the equation Applied Mathematics 23: 609–636.
Roseau M (1966) Vibrations non linéaires et théorie de la stabilité,
dUjp X @fk Springer Tracts in Natural Philosophy, vol. 8. Berlin: Springer.
þ U ¼0 Van der Pol B (1926) On relaxation-oscillations. Philosophical
dt @zj kp
k Magazine 3(7): 978–992.
Van der Pol B (1931) Oscillations sinusoidales et de relaxation.
we deduce that L’onde électrique 245–256.
  X @fk @Ujp X @ 2 fk Van der Pol B and Van der Mark J (1927) Frequency
d @Ujp @zr demultiplication. Nature 120: 363–364.
¼ þ Ukp
dt @q k
@zj @q k
@zj @zr @q Van der Pol B and Van der Mark J (1928) The heart beat
considered as a relaxation oscillation, and an electrical model
and thus this shows that of the heart. Philosophical Magazine 6(7): 763–775.

Bi-Hamiltonian Methods in Soliton Theory


M Pedroni, Università di Bergamo, Dalmine (BG), Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

At the end of the 1960s, the theory of integrable systems received a great boost from the discovery (made by Gardner, Greene, Kruskal, and Miura) of the inverse-scattering method (see Integrable Systems: Overview). It allows one to reduce the solution of the (nonlinear) Korteweg–de Vries equation (henceforth simply the KdV equation)

$$u_t = \tfrac{1}{4}\left(u_{xxx} - 6uu_x\right) \qquad [1]$$

to the solution of linear equations. After the KdV equation, many other nonlinear partial differential equations solvable by means of the inverse-scattering method were found. A common feature of such equations is the existence of soliton solutions, that is, solutions in the shape of a solitary wave (with additional interaction properties). For this reason they are called ‘‘soliton equations.’’
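As a concrete illustration of a soliton solution of eqn [1], the following minimal sketch checks numerically that a sech-squared profile satisfies the equation. The profile $u = -2\kappa^2\,\mathrm{sech}^2(\kappa(x + \kappa^2 t))$ is not written in the article: it is the standard one-soliton ansatz rescaled to the normalization of [1], and the function names, the step size `h`, and the sample grid are assumptions made only for this check.

```python
import math

def soliton(x, t, kappa=1.0):
    """Candidate one-soliton profile u = -2 kappa^2 sech^2(kappa (x + kappa^2 t))."""
    theta = kappa * (x + kappa**2 * t)
    return -2.0 * kappa**2 / math.cosh(theta) ** 2

def kdv_residual(x, t, kappa=1.0, h=1e-2):
    """Central finite-difference estimate of u_t - (u_xxx - 6 u u_x)/4 at (x, t)."""
    u = lambda xx, tt: soliton(xx, tt, kappa)
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h**3)
    return u_t - 0.25 * (u_xxx - 6.0 * u(x, t) * u_x)

# Largest residual on a grid of points at a fixed time (hypothetical sample values).
max_residual = max(abs(kdv_residual(0.1 * i, 0.3)) for i in range(-50, 51))
print(max_residual)
```

The residual is not exactly zero because the derivatives are approximated by finite differences; it should shrink as `h` is reduced, which is what one expects if the profile solves [1].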
It was soon observed that the KdV equation can be seen as an infinite-dimensional Hamiltonian system with an infinite sequence of constants of motion in involution; the corresponding (commuting) vector fields are symmetries for the KdV equation, and form the so-called KdV hierarchy. In particular, Zakharov and Faddeev constructed action-angle variables for the KdV equation. These facts showed that the KdV equation is an infinite-dimensional analog of a classical integrable Hamiltonian system (Dubrovin et al. 2001), whose theory was developed during the nineteenth century by Liouville, Jacobi, and many others. Moreover, the infinite-dimensional case suggested methods (such as the existence of a Lax pair) which were also applied successfully to finite-dimensional cases such as the Toda lattices and the Calogero systems. More recently, after the discovery by Witten and Kontsevich of remarkable relations between the KdV hierarchy and matrix models of two-dimensional (2D) quantum gravity, there has been a renewed interest in the study of soliton equations in the community of theoretical physicists. We also mention that the classical versions of the extended $W_n$-algebras of 2D conformal field theory are the (second) Poisson structures of the Gelfand–Dickey hierarchies.

In this article we describe the so-called bi-Hamiltonian formulation of soliton equations. This approach to integrable systems springs from the observation, made by Magri at the end of the 1970s, that the KdV equation can be seen as a Hamiltonian system in two different ways. In the same circle of ideas, there were important works by Adler, Dorfman, Gelfand, Kupershmidt, Wilson, and many others. Thus, the concept of bi-Hamiltonian manifold, which constitutes the geometric setting for the study of bi-Hamiltonian systems, emerged. This notion and its applications to the theory of finite-dimensional integrable systems are discussed in Multi-Hamiltonian Systems.

In the first section of this article, we discuss the Hamiltonian form of soliton equations and, more generally, we present an important class of infinite-dimensional Poisson (also called Hamiltonian) structures, namely those of hydrodynamic type. Then we show how to use the bi-Hamiltonian properties of the KdV equation in order to construct its conserved quantities. We also recall that the KdV equation can be seen as an Euler equation on the dual of the Virasoro algebra. In the third section, we deal with other examples of integrable evolution equations admitting a bi-Hamiltonian representation, that is, the Boussinesq and the Camassa–Holm equations, and we consider the bi-Hamiltonian structures of hydrodynamic type.

Hamiltonian Methods in Soliton Theory

The most famous example of a soliton equation is the KdV equation [1], where $u$ is usually a periodic or rapidly decreasing real function. The choice of the coefficients in the equation has no special meaning, since they can be changed arbitrarily by rescaling $x$, $t$, and $u$. Right after the discovery of the inverse-scattering method for solving the Cauchy problem for the KdV equation, it was realized that this equation can be seen as an infinite-dimensional Hamiltonian system. Indeed, from a geometrical point of view, eqn [1] defines a vector field $X(u) = \tfrac14(u_{xxx} - 6uu_x)$ on $M$, the infinite-dimensional vector space of $C^\infty$ functions from the unit circle $S^1$ to $\mathbb{R}$. (For the sake of simplicity, we consider only the periodic case; the integrals in this article are therefore understood to be taken on $S^1$.) The vector field $X$ associated with the KdV equation is Hamiltonian, that is, it can be factorized as

$$X(u) = \left[-2\partial_x\right]\,\tfrac18\left(-u_{xx} + 3u^2\right)$$

where $dH = \tfrac18(-u_{xx} + 3u^2)$ is the differential of the functional

$$H(u) = \frac18 \int \left(u^3 + \frac12 u_x^2\right) dx$$

that is, the variational derivative $\delta h/\delta u$ of the density $h = \tfrac18\left(u^3 + \tfrac12 u_x^2\right)$, and $P = -2\partial_x$ is a Poisson (or Hamiltonian) operator. This means that the corresponding composition law

$$\{F, G\} = \int dF\, P(dG)\, dx = -2\int dF\, (dG)_x\, dx \qquad [2]$$

between functionals of $u$ has the usual properties of the Poisson bracket, that is, it is $\mathbb{R}$-bilinear and skew-symmetric, and it fulfills the Leibniz rule and the Jacobi identity. In other words, $(M, P)$ is an infinite-dimensional Poisson manifold. Using the Poisson bracket [2], eqn [1] can be written as

$$u_t = \{u, H\} \qquad [3]$$

corresponding to the usual Hamilton equation in $\mathbb{R}^{2n}$

$$\dot z^i = \{z^i, H\}, \qquad i = 1, \ldots, 2n \qquad [4]$$

up to the replacement of $z$ with $u$, and of the discrete index $i$ with the continuous index $x$. More precisely, in the expression $u_t = \{u, H\}$ the symbol $u$ should be replaced by $u^x$ (in analogy with $z^i$), the functional assigning to the generic function $v \in M$ its value at a fixed point $x$, that is, $u^x : v \mapsto v(x)$. In
these notations, the Poisson bracket [2] takes the form

$$\{u^x, u^y\} = -2\,\delta'(x - y)$$

where the $\delta$-function is as usual defined as

$$\int f(y)\,\delta(x - y)\, dy = f(x)$$

so that its derivatives are given by

$$\int f(y)\,\delta^{(k)}(x - y)\, dy = f^{(k)}(x)$$

Another important example is given by the Boussinesq equation

$$u_{tt} = \tfrac13\left(-u_{xxxx} + 4u_x^2 + 4uu_{xx}\right) \qquad [5]$$

describing, like KdV, shallow water (soliton) waves in a nonlinear approximation. It can be obtained from the first-order (in time) system

$$u^1_t = \tfrac23 u^2 u^2_x + u^1_{xx} - \tfrac23 u^2_{xxx}, \qquad u^2_t = 2u^1_x - u^2_{xx} \qquad [6]$$

by taking the derivative of its second equation with respect to $t$, plugging in the first one, and setting $u = u^2$. The system [6] is Hamiltonian, since it can be written as

$$u^1_t = \left(\frac{\delta h}{\delta u^2}\right)_x, \qquad u^2_t = \left(\frac{\delta h}{\delta u^1}\right)_x$$

with $h = (u^1)^2 + \tfrac19 (u^2)^3 - u^1 u^2_x + \tfrac13 (u^2_x)^2$, and

$$\begin{pmatrix} 0 & \partial_x \\ \partial_x & 0 \end{pmatrix} \qquad [7]$$

is easily seen to be a Poisson operator. Thus, the Poisson manifold associated with the Boussinesq equation is the space of periodic $C^\infty$ functions with values in $\mathbb{R}^2$. More generally, one can consider the space $M_n$ of $C^\infty$ functions from the unit circle $S^1$ to $\mathbb{R}^n$. If $P^{ij}$, for $i, j = 1, \ldots, n$, are the entries of a constant skew-symmetric matrix and $u^{i,x}$ assigns to the generic function $v \in M_n$ the value of its $i$th component at a fixed point $x$, then

$$\{u^{i,x}, u^{j,y}\} = P^{ij}\,\delta(x - y)$$

defines a Poisson bracket on $M_n$. One can also let the $P^{ij}$ depend on the $u^k$ in such a way that they form the components of a Poisson tensor on $\mathbb{R}^n$. If $H = \int h\, dx$ is a functional on $M_n$ with density $h$, the associated Hamiltonian vector field gives rise to the following system of partial differential equations:

$$u^i_t = \sum_{j=1}^{n} P^{ij}\,\frac{\delta h}{\delta u^j}, \qquad i = 1, \ldots, n$$

In particular, if $n = 2N$ and

$$[P^{ij}] = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$$

then we have the Hamiltonian formulation of the field equations,

$$q^i_t = \frac{\delta h}{\delta p^i}, \qquad p^i_t = -\frac{\delta h}{\delta q^i}, \qquad i = 1, \ldots, N$$

Another important example of Poisson bracket on $M_n$ is given by

$$\{u^{i,x}, u^{j,y}\} = g^{ij}\,\delta'(x - y) \qquad [8]$$

where $g^{ij}$ are the entries of a constant symmetric matrix. In this case, the Hamiltonian vector field associated with $H = \int h\, dx$ is given by

$$u^i_t = \sum_{j=1}^{n} g^{ij}\,\partial_x\!\left(\frac{\delta h}{\delta u^j}\right), \qquad i = 1, \ldots, n \qquad [9]$$

Notice that this vector field is zero if $H = \int u^k\, dx$, with $k = 1, \ldots, n$. This amounts to saying that such an $H$ is a Casimir function of the Poisson bracket [8], that is, that $\{H, F\} = 0$ for all functionals $F$. A simple example of this class (with $n = 2$) is given by the Poisson structure of the Boussinesq equation, corresponding to the choice $g^{11} = g^{22} = 0$ and $g^{12} = g^{21} = 1$. Suppose now that the matrix with entries $g^{ij}$ is invertible. Then they can be interpreted as the contravariant components of a flat pseudo-Riemannian metric in $\mathbb{R}^n$. A change of coordinates $(u^1, \ldots, u^n) \mapsto (\tilde u^1, \ldots, \tilde u^n)$ in $\mathbb{R}^n$ transforms the Poisson bracket [8] into

$$\{\tilde u^{i,x}, \tilde u^{j,y}\} = g^{ij}(\tilde u)\,\delta'(x - y) + \Gamma^{ij}_k(\tilde u)\,\tilde u^k_x\,\delta(x - y) \qquad [10]$$

where $g^{ij}(\tilde u)$ are the components of the metric in the new coordinates and the $\Gamma^{ij}_k$ are the contravariant Christoffel symbols, related to the usual Christoffel symbols by

$$\Gamma^{ij}_k = -g^{il}\,\Gamma^{j}_{lk} \qquad [11]$$

Conversely, the expression [10] gives a Poisson bracket if the metric defined by $g^{ij}$ is flat and its Christoffel symbols are related to the $\Gamma^{ij}_k$ by [11]. These are the Poisson structures of hydrodynamic type introduced by Dubrovin and Novikov. We will consider them again later.

Bi-Hamiltonian Formulation of the KdV Equation

The KdV equation [1] has many remarkable properties, such as the Lax representation and the existence of a $\tau$-function. In this section, we recall a geometrical feature of KdV, namely, the fact that it
has a second Hamiltonian structure, and we show that the integrability of KdV can be seen as a natural consequence of its double Hamiltonian representation. We have already seen that the KdV vector field $X(u) = \tfrac14(u_{xxx} - 6uu_x)$ can be written as

$$X(u) = P_0\, dH_2$$

where $P_0 = -2\partial_x$ and

$$H_2 = \frac18 \int \left(u^3 + \frac12 u_x^2\right) dx$$

But $X$ admits another Hamiltonian representation:

$$X(u) = P_1\, dH_1$$

where $P_1 = -\tfrac12\partial_{xxx} + 2u\partial_x + u_x$ and

$$H_1 = -\frac14 \int u^2\, dx$$

The important point is that $P_1$ is also a Poisson operator. Moreover, it is compatible with $P_0$, that is, any linear combination of $P_0$ and $P_1$ is still a Poisson operator. Thus, the KdV equation is a bi-Hamiltonian system, that is, it can be seen in two different (but compatible) ways as a Hamiltonian system. Next, we will show how this property can be used to construct an infinite sequence of conserved quantities for the KdV equation, which are in involution with respect to the Poisson brackets $\{\cdot,\cdot\}_0$ and $\{\cdot,\cdot\}_1$ associated with $P_0$ and $P_1$. In particular, the phase space $M$ of KdV is a bi-Hamiltonian manifold, that is, it has two different (but compatible) Poisson structures. Let us rename $X_1 = X$ the KdV vector field. Since $X = P_0\, dH_2 = P_1\, dH_1$, one is naturally led to consider the vector fields

$$X_0 = P_0\, dH_1, \qquad X_2 = P_1\, dH_2$$

Explicitly, $X_0(u) = u_x$ and $X_2(u) = \tfrac{1}{16}\left(u_{xxxxx} - 10uu_{xxx} - 20u_xu_{xx} + 30u^2u_x\right)$. One can check that these vector fields are also bi-Hamiltonian. Indeed, $X_0(u) = P_1\, dH_0$, with $H_0 = \int u\, dx$, and $X_2 = P_0\, dH_3$ with

$$H_3 = -\frac{1}{64} \int \left(u_{xx}^2 + 10uu_x^2 + 5u^4\right) dx$$

(the coefficients of the last two terms are fixed by the requirement $P_0\, dH_3 = X_2$). The functional $H_0$ is a Casimir of $P_0$, that is, $P_0\, dH_0 = 0$, so that the iteration ends on this side, but it can be continued indefinitely from the other side, as shown below. For the time being, let us take for granted that there exists an infinite sequence $\{H_k\}_{k \ge 0}$ of functionals such that $P_1\, dH_k = P_0\, dH_{k+1}$; in other words,

$$\{\cdot, H_k\}_1 = \{\cdot, H_{k+1}\}_0 \qquad [12]$$

Such relations are often called Lenard–Magri relations. Then the functionals $H_k$ are in involution with respect to both Poisson brackets. Indeed, for $k > j$, one has

$$\{H_j, H_k\}_0 = \{H_j, H_{k-1}\}_1 = \{H_{j+1}, H_{k-1}\}_0 = \cdots = \{H_k, H_j\}_0$$

so that, by skew-symmetry, $\{H_j, H_k\}_0 = 0$ for all $j, k \ge 0$, and therefore $\{H_j, H_k\}_1 = 0$ for all $j, k \ge 0$. Hence, these functionals are constants of motion (in involution) for the KdV equation. The Hamiltonian vector fields associated with them are symmetries for the KdV equation; the corresponding evolution equations are called higher-order KdV equations. The set of such equations is the well-known KdV hierarchy. We remark that the existence of a sequence of functionals $\{H_k\}_{k \ge 0}$, fulfilling the Lenard–Magri relations [12] and starting from a Casimir of $P_0$, is equivalent to the existence of a Casimir function $H(\lambda) = \sum_{k \ge 0} H_k \lambda^{-k}$ for the Poisson pencil $P_\lambda = P_1 - \lambda P_0$, where $\lambda$ is a real parameter. A straightforward way (due essentially to Miura, Gardner, and Kruskal) to determine such a Casimir function is to consider the (generalized) Miura map $h \mapsto u = h_x + h^2 - \lambda$. As shown by Kupershmidt and Wilson, it transforms the Poisson structure $\tfrac12\partial_x$ (in the variable $h$) into the Poisson pencil $P_\lambda = -\tfrac12\partial_{xxx} + 2(u + \lambda)\partial_x + u_x$. Given $u$, the Riccati equation

$$h_x + h^2 = u + \lambda \qquad [13]$$

admits a unique solution with the asymptotic expansion $h = z + \sum_{k \ge 1} h_k z^{-k}$, where $z^2 = \lambda$. Moreover, the coefficients $h_k$ are differential polynomials in $u$ (i.e., polynomials in $u$ and its $x$-derivatives) that can be computed by recurrence. Thus, the generalized Miura map can be seen as an invertible transformation. Since the functional $h \mapsto \int h\, dx$ is a Casimir of the Poisson structure $\tfrac12\partial_x$, it follows that if $h(u)$ is the solution of the Riccati equation [13], then $u \mapsto \int h(u)\, dx$ is a Casimir of the Poisson pencil $P_\lambda$. More precisely, one has to introduce the functional $H(\lambda) = z\int h(u)\, dx$, which turns out to be a Laurent series in $\lambda$, because the even coefficients of $h(u)$ are $x$-derivatives. This is the Casimir function we were looking for. Explicitly, one finds that the first terms of $h(u)$ are

$$h_1 = \tfrac12 u, \qquad h_2 = -\tfrac14 u_x, \qquad h_3 = \tfrac18\left(u_{xx} - u^2\right)$$

$$h_4 = -\tfrac{1}{16}\left(u_{xxx} - 4uu_x\right)$$

$$h_5 = \tfrac{1}{32}\left(u_{xxxx} - 6uu_{xx} - 5u_x^2 + 2u^3\right)$$

Obviously, $h_1$ is the density of a Casimir function of $P_0$, while $h_3$ and $h_5$ are (one-half of) the densities of the
two Hamiltonians $H_1$ and $H_2$ of the KdV equation. We conclude this section showing that, as observed by Khesin and Ovsienko (Arnol’d and Khesin 1998), the bi-Hamiltonian structures of KdV have a clear Lie-algebraic origin. Indeed, the second Hamiltonian structure is the Lie–Poisson structure on the dual of the Virasoro algebra, while the first one can be obtained by ‘‘freezing’’ the second one at a suitable point. Let $\mathfrak{X}(S^1)$ be the Lie algebra of vector fields on $S^1$. The Virasoro algebra is the vector space $\mathfrak{g} = \mathfrak{X}(S^1) \oplus \mathbb{R}$ endowed with the Lie-algebra structure

$$\left[\left(f(x)\frac{\partial}{\partial x}, a\right), \left(g(x)\frac{\partial}{\partial x}, b\right)\right] = \left(\left(f'(x)g(x) - g'(x)f(x)\right)\frac{\partial}{\partial x},\ \int f'(x)\,g''(x)\, dx\right) \qquad [14]$$

It is called a central extension of $\mathfrak{X}(S^1)$ since it is obtained by considering the usual commutator between vector fields (up to a sign) and by adding a copy of $\mathbb{R}$, which turns out to be the center of the Virasoro algebra. Equation [14] indeed gives rise to a Lie-algebra structure because the expression $\int f'(x)\,g''(x)\, dx$ defines a 2-cocycle of $\mathfrak{X}(S^1)$. The dual space $\mathfrak{g}^*$ of $\mathfrak{g}$ can be considered as the space of the pairs $(u\, dx \otimes dx, c)$, where $u \in C^\infty(S^1)$ and $c \in \mathbb{R}$. The pairing is obviously given by

$$\left\langle (u\, dx \otimes dx, c), \left(f\frac{\partial}{\partial x}, a\right)\right\rangle = \int u(x)\,f(x)\, dx + ac$$

The Lie–Poisson structure on the dual $\mathfrak{g}^*$ of a Lie algebra $\mathfrak{g}$ is defined as

$$\{F, G\}(X) = \langle X, [dF(X), dG(X)]\rangle \qquad [15]$$

where $F, G \in C^\infty(\mathfrak{g}^*)$ and their differentials at $X \in \mathfrak{g}^*$ are seen as elements of $\mathfrak{g}$. When $\mathfrak{g}$ is the Virasoro algebra and $F(u, c) = \int f(u, c)\, dx$, $G(u, c) = \int g(u, c)\, dx$ are two functionals on $\mathfrak{g}^*$ whose densities $f$ and $g$ are differential polynomials in $u$, one has

$$\{F, G\}(u, c) = \left\langle (u\, dx \otimes dx, c), \left(\left[\left(\frac{\delta f}{\delta u}\right)'\frac{\delta g}{\delta u} - \left(\frac{\delta g}{\delta u}\right)'\frac{\delta f}{\delta u}\right]\frac{\partial}{\partial x},\ \int \left(\frac{\delta f}{\delta u}\right)'\left(\frac{\delta g}{\delta u}\right)'' dx\right)\right\rangle$$

$$= \int u\left[\left(\frac{\delta f}{\delta u}\right)'\frac{\delta g}{\delta u} - \left(\frac{\delta g}{\delta u}\right)'\frac{\delta f}{\delta u}\right] dx + c\int \left(\frac{\delta f}{\delta u}\right)'\left(\frac{\delta g}{\delta u}\right)'' dx \qquad [16]$$

This is (up to rescaling) the second Poisson bracket of KdV. The KdV equation is therefore an Euler equation, that is, it can be obtained from the Euler equations for the rigid body by replacing the Lie algebra of the rotation group with the Virasoro algebra. To be more precise, the Hamiltonian vector field associated with $H_1(u, c) = \tfrac12\left(\int u^2\, dx + c\right)$ is

$$u_t + 3uu_x + cu_{xxx} = 0, \qquad c_t = 0$$

If $c \ne 0$, this is (up to rescaling) the KdV equation [1]. For $c = 0$, we have the Burgers equation (also called dispersionless KdV equation), to be discussed again later on. The first Poisson bracket for the KdV hierarchy can be obtained by ‘‘freezing’’ the Lie–Poisson bracket at the point $\left(\tfrac12\, dx \otimes dx, 0\right)$ of the dual of the Virasoro algebra. This means that instead of [16] one has to consider

$$\{F, G\}_0(u, c) = \left\langle \left(\tfrac12\, dx \otimes dx, 0\right), \left(\left[\left(\frac{\delta f}{\delta u}\right)'\frac{\delta g}{\delta u} - \left(\frac{\delta g}{\delta u}\right)'\frac{\delta f}{\delta u}\right]\frac{\partial}{\partial x},\ \int \left(\frac{\delta f}{\delta u}\right)'\left(\frac{\delta g}{\delta u}\right)'' dx\right)\right\rangle$$

$$= \frac12\int \left[\left(\frac{\delta f}{\delta u}\right)'\frac{\delta g}{\delta u} - \left(\frac{\delta g}{\delta u}\right)'\frac{\delta f}{\delta u}\right] dx \qquad [17]$$

The corresponding Hamiltonian is $H_2 = \tfrac12\int \left(u^3 + cu_x^2\right) dx$. From this (Lie-algebraic) point of view, the compatibility between the two Poisson brackets follows from the fact that the pencil $\{\cdot,\cdot\}_\lambda = \{\cdot,\cdot\} - \lambda\{\cdot,\cdot\}_0$ is obtained from the Lie–Poisson bracket $\{\cdot,\cdot\}$ by applying the translation

$$(u\, dx \otimes dx, c) \mapsto \left(\left(u + \frac{\lambda}{2}\right) dx \otimes dx, c\right)$$

Other Examples

In the previous section, we have presented the bi-Hamiltonian structure of the KdV equation and some of its properties. Now we give two more examples of equations admitting a bi-Hamiltonian formulation: the Boussinesq equation and the Camassa–Holm equation. We have seen in an earlier section that the system [6] associated with the Boussinesq equation [5] is Hamiltonian with respect to the Poisson structure [7] and the Hamiltonian

$$H_1(u^1, u^2) = \int \left((u^1)^2 + \tfrac19 (u^2)^3 - u^1u^2_x + \tfrac13 (u^2_x)^2\right) dx$$
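The two statements made about the Boussinesq system, namely that [6] has the Hamiltonian form given by the density $h$ and that eliminating $u^1$ from [6] yields [5], can be checked with exact arithmetic on trigonometric polynomials. The following is a minimal sketch under stated assumptions: the dictionary representation, the helper names (`add`, `scale`, `mul`, `dx`, `norm`, `random_real_poly`), the period $2\pi$, and the random test data are all choices made for this illustration, and the signs in [5] and [6] are those that follow from the density $h$ as read here.

```python
import random

# A periodic function is stored as {k: c_k}, meaning sum_k c_k * exp(i*k*x).

def add(*polys):
    out = {}
    for p in polys:
        for k, c in p.items():
            out[k] = out.get(k, 0) + c
    return out

def scale(a, p):
    return {k: a * c for k, c in p.items()}

def mul(p, q):
    """Pointwise product of two functions = convolution of their coefficients."""
    out = {}
    for k1, c1 in p.items():
        for k2, c2 in q.items():
            out[k1 + k2] = out.get(k1 + k2, 0) + c1 * c2
    return out

def dx(p, n=1):
    """n-th x-derivative: the mode exp(i*k*x) is multiplied by (i*k)**n."""
    return {k: (1j * k) ** n * c for k, c in p.items()}

def norm(p):
    return max((abs(c) for c in p.values()), default=0.0)

def random_real_poly(modes, rng):
    """Random real-valued trigonometric polynomial with modes -modes..modes."""
    p = {0: complex(rng.uniform(-1, 1))}
    for k in range(1, modes + 1):
        c = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
        p[k], p[-k] = c, c.conjugate()
    return p

rng = random.Random(0)
u1 = random_real_poly(3, rng)
u2 = random_real_poly(3, rng)

# Variational derivatives of h = (u1)^2 + (u2)^3/9 - u1*u2_x + (u2_x)^2/3.
dh_du1 = add(scale(2, u1), scale(-1, dx(u2)))
dh_du2 = add(scale(1 / 3, mul(u2, u2)), dx(u1), scale(-2 / 3, dx(u2, 2)))

# Right-hand sides of the first-order system [6].
u1_t = add(scale(2 / 3, mul(u2, dx(u2))), dx(u1, 2), scale(-2 / 3, dx(u2, 3)))
u2_t = add(scale(2, dx(u1)), scale(-1, dx(u2, 2)))

# Hamiltonian form: u1_t = (dh/du2)_x and u2_t = (dh/du1)_x.
err_hamiltonian = max(norm(add(u1_t, scale(-1, dx(dh_du2)))),
                      norm(add(u2_t, scale(-1, dx(dh_du1)))))

# Elimination of u1: u2_tt = 2 (u1_t)_x - (u2_t)_xx, compared with [5] for u = u2.
u2_tt = add(scale(2, dx(u1_t)), scale(-1, dx(u2_t, 2)))
rhs5 = scale(1 / 3, add(scale(-1, dx(u2, 4)),
                        scale(4, mul(dx(u2), dx(u2))),
                        scale(4, mul(u2, dx(u2, 2)))))
err_elimination = norm(add(u2_tt, scale(-1, rhs5)))

print(err_hamiltonian, err_elimination)
```

Both errors should vanish up to floating-point rounding for any choice of `u1` and `u2`, since the identities are algebraic; working with Fourier coefficients avoids any discretization error in the derivatives.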
A more complicated Poisson structure for this system is

$$P = \begin{pmatrix} A & -3\partial_x^4 + 3u^2\partial_x^2 + 9u^1\partial_x + 3u^1_x \\ B & -6\partial_x^3 + 6u^2\partial_x + 3u^2_x \end{pmatrix} \qquad [18]$$

with

$$A = 2\partial_x^5 - 4u^2\partial_x^3 - 6u^2_x\partial_x^2 + \left(2(u^2)^2 + 6u^1_x - 6u^2_{xx}\right)\partial_x + \left(3u^1_{xx} - 2u^2_{xxx} + 2u^2u^2_x\right)$$

and

$$B = 3\partial_x^4 - 3u^2\partial_x^2 + \left(9u^1 - 6u^2_x\right)\partial_x + \left(6u^1_x - 3u^2_{xx}\right)$$

(here $u^1$ and $u^2$ denote the two components of the field, not powers; the signs of the leading terms are fixed by skew-symmetry of $P$ and by the requirement that the next Lenard–Magri step be Hamiltonian with respect to [7]). It can be obtained by means of the Drinfeld–Sokolov reduction (or also by means of a bi-Hamiltonian reduction) from the Lie–Poisson structure (modified with the cocycle $\partial_x$) on the space of $C^\infty$ maps from $S^1$ to the Lie algebra of $3 \times 3$ traceless matrices. This is the reason why it is a Poisson structure, compatible with [7]. The system [6] can be written as

$$\begin{pmatrix} u^1_t \\ u^2_t \end{pmatrix} = P \begin{pmatrix} \delta h_2/\delta u^1 \\ \delta h_2/\delta u^2 \end{pmatrix}$$

where $h_2 = \tfrac13 u^1$ is the density of a Casimir of the Poisson structure [7]. Thus, the Boussinesq equation is a bi-Hamiltonian system and can be shown to possess, like KdV, an infinite sequence of conserved quantities and symmetries, forming the Boussinesq hierarchy. The KdV and the Boussinesq hierarchies are indeed particular examples of Gelfand–Dickey hierarchies (Dickey 2003). They are hierarchies of systems of $n$ equations with $n$ unknown functions and they are related, via the Drinfeld–Sokolov approach, to the Lie algebra $\mathfrak{sl}(n + 1)$. As shown by Adler, Dickey, and Gelfand, these hierarchies have a bi-Hamiltonian formulation. Also the generalized KdV equations, associated by Drinfeld and Sokolov with an arbitrary affine Kac–Moody Lie algebra, are bi-Hamiltonian (or are obtained as suitable reductions of bi-Hamiltonian systems). Let us consider now the (dispersionless) Camassa–Holm equation

$$u_t - u_{txx} = -3uu_x + 2u_xu_{xx} + uu_{xxx} \qquad [19]$$

which also describes shallow water waves, and possesses remarkable solutions called peakons, since they represent traveling waves with discontinuous first derivative. In order to supply this equation with a (bi-)Hamiltonian structure, one has to perform the change of variable $m = u - u_{xx}$, whose inverse, in the space of period-1 functions, turns out to be given by

$$u(x) = \int_0^x m(y)\,\sinh(y - x)\, dy + \frac{1}{2\sinh(1/2)}\int_0^1 m(y)\,\cosh\!\left(y - x - \frac12\right) dy$$

The Camassa–Holm equation is then bi-Hamiltonian with respect to the Poisson pair

$$P_1 = \partial_{xxx} - \partial_x, \qquad P_2 = 2m\partial_x + m_x$$

Indeed, it can be written as $m_t = P_1\, dH_2 = P_2\, dH_1$, where

$$H_1 = -\frac12\int \left(u^2 + u_x^2\right) dx$$

$$H_2 = \frac12\int \left(u^3 + uu_x^2\right) dx$$

Notice that the Poisson pair of the Camassa–Holm equation can be obtained from that of KdV by moving the cocycle $\partial_{xxx}$ from the second Poisson structure to the first one. Indeed,

$$P_{(a,b,c)} = a\,\partial_{xxx} + b\,\partial_x + c\left(2m\partial_x + m_x\right), \qquad a, b, c \in \mathbb{R} \qquad [20]$$

is a family of pairwise compatible Poisson operators. Moreover, we mention that Misiołek has shown that the Camassa–Holm equation is also an Euler equation on the dual of the Virasoro algebra. We conclude this article with a brief discussion concerning the so-called bi-Hamiltonian structures of hydrodynamic type. They play a relevant role in the theory of Frobenius manifolds, which, in turn, have deep relations with many important topics in contemporary mathematics and physics, such as Gromov–Witten invariants and isomonodromic deformations. As we have seen in the earlier section, a Poisson structure of hydrodynamic type is given, on the space of $C^\infty$ maps from $S^1$ to (an open set of) $\mathbb{R}^n$, by

$$\{u^{i,x}, u^{j,y}\} = g^{ij}(u)\,\delta'(x - y) + \Gamma^{ij}_k(u)\,u^k_x\,\delta(x - y) \qquad [21]$$

where $g^{ij}(u)$ are the contravariant components of a (pseudo-)Riemannian flat metric and the $\Gamma^{ij}_k$ are the (contravariant) Christoffel symbols of the metric. If two Poisson structures of hydrodynamic type are given, it can be shown that they are compatible if and only if the two corresponding metrics form a flat pencil. This means that their linear combinations (with constant coefficients) are still flat (pseudo-)Riemannian metrics, and that the contravariant Christoffel symbols of the linear combinations are the linear combinations of the contravariant Christoffel symbols of the
two metrics. The simplest example is given by the bi-Hamiltonian formulation of the Burgers (or dispersionless KdV) equation,

$$u_t + 3uu_x = 0$$

that we have already encountered. We know that this equation is Hamiltonian with respect to the (Lie–)Poisson operator $2u\partial_x + u_x$, with Hamiltonian function $H_1 = \tfrac12\int u^2\, dx$, and with respect to the Poisson operator $\partial_x$, with Hamiltonian function $H_2 = \tfrac12\int u^3\, dx$. This also means that the bi-Hamiltonian structure of the Burgers equation comes from the family [20]. The first Hamiltonian structure corresponds to the standard metric on $\mathbb{R}$, that is, $du \otimes du$, whereas the second one is given by the metric $(2u)^{-1}\, du \otimes du$.

See also: Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Hamiltonian Fluid Dynamics; Infinite-Dimensional Hamiltonian Systems; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Integrable Systems: Overview; Korteweg–de Vries Equation and Other Modulation Equations; Multi-Hamiltonian Systems; Recursion Operators in Classical Mechanics; Solitons and Kac–Moody Lie Algebras; Toda Lattices; WDVV Equations and Frobenius Manifolds.

Further Reading

Arnol’d VI and Khesin BA (1998) Topological Methods in Hydrodynamics. New York: Springer.
Błaszak M (1998) Multi-Hamiltonian Theory of Dynamical Systems. Berlin: Springer.
Dickey LA (2003) Soliton Equations and Hamiltonian Systems, 2nd edn. River Edge: World Scientific.
Dorfman I (1993) Dirac Structures and Integrability of Nonlinear Evolution Equations. Chichester: Wiley.
Drinfeld VG and Sokolov VV (1985) Lie algebras and equations of Korteweg–de Vries type. Journal of Soviet Mathematics 30: 1975–2036.
Dubrovin BA (1996) Geometry of 2D topological field theories. In: Donagi R et al. (ed.) Integrable Systems and Quantum Groups (Montecatini Terme, 1993), Lecture Notes in Mathematics, vol. 1620, pp. 120–348. Berlin: Springer.
Dubrovin BA, Krichever IM, and Novikov SP (2001) Integrable systems. I. In: Arnol’d VI (ed.) Encyclopaedia of Mathematical Sciences. Dynamical Systems IV, pp. 177–332. Berlin: Springer.
Faddeev LD and Takhtajan LA (1987) Hamiltonian Methods in the Theory of Solitons. Berlin: Springer.
Magri F, Falqui G, and Pedroni M (2003) The method of Poisson pairs in the theory of nonlinear PDEs. In: Conte R et al. (ed.) Direct and Inverse Methods in Nonlinear Evolution Equations, Lecture Notes in Physics, vol. 632, pp. 85–136. Berlin: Springer.
Marsden JE and Ratiu TS (1999) Introduction to Mechanics and Symmetry, 2nd edn. New York: Springer.
Olver PJ (1993) Applications of Lie Groups to Differential Equations, 2nd edn. New York: Springer.

Billiards in Bounded Convex Domains


S Tabachnikov, Pennsylvania State University, University Park, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Billiard Flow and Billiard Ball Map

The billiard system describes the motion of a free particle inside a domain with elastic reflection off the boundary. More precisely, a billiard table is a Riemannian manifold $M$ with a piecewise smooth boundary, for example, a domain in the plane. The point moves along a geodesic line with a constant speed until it hits the boundary. At a smooth boundary point, the billiard ball reflects so that the tangential component of its velocity remains the same, while the normal component changes its sign. This means that both energy and momentum are conserved. In dimension 2, this collision is described by a well-known law of geometrical optics: the angle of incidence equals the angle of reflection. Thus, the theory of billiards has much in common with geometrical optics. If the billiard ball hits a corner, its further motion is not defined.

The billiard reflection law satisfies a variational principle. Let $A$ and $B$ be fixed points in the billiard table and let $AXB$ be a billiard trajectory from $A$ to $B$ with reflection at a boundary point $X$. Then, the position of a variable point $X$ extremizes the length $AXB$. This is the Fermat principle of geometrical optics.

In this article, we discuss billiards in bounded convex domains with smooth boundary, also called Birkhoff billiards. A related article treats billiards in polygons (see Polygonal Billiards).

The billiard flow is defined as a continuous-time dynamical system. The time-$t$ billiard transformation acts on unit tangent vectors to $M$ which constitute the phase space of the billiard flow, and the manifold $M$ is its configuration space. Thus, the billiard flow is the geodesic flow on a manifold with boundary.

It is useful to reduce the dimensions by one and to replace continuous time by discrete time, that is, to replace the billiard flow by a mapping, called the billiard ball map and denoted by $T$. The phase space of the billiard ball map consists of unit tangent vectors $(x, v)$ with the foot point $x$ on the boundary of $M$ and the inward direction $v$. A vector $(x, v)$ moves along the geodesic through $x$ in the direction of $v$ to the next point of its intersection $x_1$ with the boundary $\partial M$, and then $v$ reflects in $\partial M$ to the new
inward vector $v_1$. Then, one has: $T(x, v) = (x_1, v_1)$. For a convex $M$, the map $T$ is continuous. If $M$ is $n$-dimensional, then the dimension of the phase space of the billiard ball map is $2n - 2$.

Equivalently, and more in the spirit of geometrical optics, one considers $L$, the space of oriented geodesics (rays of light) that intersect the billiard table. This space of lines is in one-to-one correspondence with the phase space of the billiard ball map: to an inward unit vector $(x, v)$ there corresponds the oriented line through $x$ in the direction $v$ (Figure 1).

Figure 1 Billiard ball map.

The space of rays $L$ carries a canonical symplectic structure, that is, a closed nondegenerate differential 2-form. In the Euclidean case, this symplectic structure $\omega$ is defined as follows. Given an oriented line $\ell$ in $\mathbb{R}^n$, let $q$ be the unit vector along $\ell$ and $p$ be the vector obtained by dropping the perpendicular from the origin to $\ell$. Then, $\omega = dp \wedge dq = \sum dp_i \wedge dq_i$. This construction identifies $L$ with the cotangent bundle of the unit sphere: $q$ is a unit vector and $p$ is a (co)tangent vector at $q$, and $\omega$ identifies with the canonical symplectic structure of $T^*S^{n-1}$. In the general case of a Riemannian manifold $M$, the symplectic structure on the space of oriented geodesics is obtained from that on $T^*M$ by symplectic reduction.

One has an important result: the billiard ball map preserves the symplectic structure, $T^*(\omega) = \omega$. As a consequence, $T$ is also measure preserving. In the planar case, one has the following explicit formula for this measure. Let $t$ be an arc length parameter along the boundary of the billiard table and let $\alpha \in [0, \pi]$ be the angle made by the unit vector with this boundary. Then, $(\alpha, t)$ are coordinates in the phase space, identified with the cylinder, and the invariant measure is $\sin\alpha\, d\alpha\, dt$.

As a consequence, the total area of the phase space equals $2L$ where $L$ is the perimeter length of the boundary of the billiard table, and the mean free path equals $\pi A/L$, where $A$ is the area of the billiard table. In the general $n$-dimensional case, the mean free path equals

$$\frac{\operatorname{vol}(S^{n-1})\,\operatorname{vol}(M)}{\operatorname{vol}(B^{n-1})\,\operatorname{vol}(\partial M)}$$

where $S^{n-1}$ and $B^{n-1}$ are the unit sphere and the unit disk in Euclidean spaces.

Existence and Nonexistence of Caustics

Given a plane billiard table, a caustic is a curve inside the table such that if a segment of a billiard trajectory is tangent to this curve then so is each reflected segment. Caustics correspond to invariant circles of the billiard ball map (i.e., invariant curves that go around the phase cylinder): such an invariant circle is a one-parameter family of oriented lines, and the respective caustic is their envelope. An envelope may have cusp-like singularities, but if the boundary of the billiard table is a smooth curve with positive curvature then a caustic, sufficiently close to the boundary, is smooth and convex.

One can recover the table from a caustic by the following string construction. Let $\gamma$ be a caustic. Wrap a closed nonstretchable string around $\gamma$, pull it tight at a point and move this point around $\gamma$ to obtain a new curve $\Gamma$. Then, $\gamma$ is a caustic for the billiard inside $\Gamma$. Note that this construction has one parameter, the length of the string.

The following useful ‘‘mirror equation’’ relates various quantities depicted in Figure 2:

$$\frac{1}{a} + \frac{1}{b} = \frac{2k}{\sin\alpha}$$

where $k$ is the curvature of the boundary at the impact point.

Figure 2 String construction and mirror equation.

Do caustics exist for every convex billiard table? This is important to know, in particular, because the existence of a caustic implies that the billiard ball map is not ergodic. The answer is given by a theorem of Lazutkin: if the boundary of the billiard table is sufficiently smooth and its curvature never vanishes, then there exists a collection of smooth caustics in the vicinity of the billiard curve whose union has a positive area. Originally this theorem asked for 553 continuous derivatives; later this was reduced to six. This result uses the techniques of the KAM (Kolmogorov–Arnol’d–Moser) theory. The

crucial fact is that, in appropriate coordinates, the billiard ball map is approximated, near the boundary of the phase cylinder, by the integrable map $(x, y) \mapsto (x + y, y)$.

On the other hand, by a theorem of Mather, if the curvature of a convex smooth billiard curve vanishes at some point, then this billiard ball map has no invariant circles. This result belongs to the well-developed theory of area-preserving twist maps of the cylinder, of which the billiard ball map is an example.

Integrable Billiards

Let a plane billiard table be an ellipse with foci $F_1$ and $F_2$. It has been known since antiquity that a billiard ball shot from $F_1$ reflects to $F_2$. A generalization of this optical property of the ellipse is the following theorem: a billiard trajectory inside an ellipse forever remains tangent to a fixed confocal conic. More precisely, if a segment of a billiard trajectory does not intersect the segment $F_1 F_2$, then no segment of this trajectory intersects $F_1 F_2$ and all of them are tangent to the same ellipse with foci $F_1$ and $F_2$; and if a segment of a trajectory intersects $F_1 F_2$, then all the segments of this trajectory intersect $F_1 F_2$ and are all tangent to the same hyperbola with foci $F_1$ and $F_2$.

It follows that confocal ellipses are the caustics of the billiard inside an ellipse. In particular, a neighborhood of the boundary of such a billiard table is foliated by caustics. A long-standing conjecture, attributed to Birkhoff, asserts that if a neighborhood of a strictly convex smooth boundary of a billiard table is foliated by caustics, then this table is an ellipse. This conjecture remains open. The best result in this direction is a theorem of Bialy: if almost every phase point of the billiard ball map in a strictly convex billiard table belongs to an invariant circle, then the billiard table is a disk.

The multidimensional analogs of the optical properties of an ellipse are as follows. Consider an ellipsoid $M$ in $\mathbb{R}^n$ given by the equation

\[ \frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \cdots + \frac{x_n^2}{a_n^2} = 1 \tag{1} \]

and define the confocal family of quadrics $M_\lambda$ by the equation

\[ \frac{x_1^2}{a_1^2 + \lambda} + \frac{x_2^2}{a_2^2 + \lambda} + \cdots + \frac{x_n^2}{a_n^2 + \lambda} = 1 \]

where $\lambda$ is a real parameter. The topological type of $M_\lambda$ changes as $\lambda$ passes the values $-a_i^2$.

One has the following theorem: a billiard trajectory inside $M$ remains tangent to $n - 1$ fixed confocal quadrics. A similar and closely related result holds for the geodesic curves on $M$: the tangent lines to a fixed geodesic on $M$ are tangent to $n - 2$ other fixed quadrics, confocal with $M$. For a triaxial ellipsoid, this theorem goes back to Jacobi.

Explicit formulas for the integrals of the billiard in an $n$-dimensional ellipsoid [1] are as follows. Let $(x, v)$ be a phase point, a unit inward tangent vector whose foot point $x$ lies on the boundary. The following functions are invariant under the billiard ball map:

\[ F_i(x, v) = v_i^2 + \sum_{j \ne i} \frac{(v_i x_j - v_j x_i)^2}{a_i^2 - a_j^2}, \qquad i = 1, \ldots, n \]

These functions are not independent: $F_1 + \cdots + F_n = 1$. In fact, the integrals $F_i$ Poisson-commute (with respect to the Poisson bracket associated with the symplectic structure in the phase space of the billiard ball map that was described above). According to the Arnol'd–Liouville theorem, this complete integrability of the billiard inside an ellipsoid implies that the phase space is foliated by invariant tori and, in appropriate coordinates, the map on each torus is a parallel translation.

Similar results on complete integrability hold for billiards inside quadrics in spaces of constant positive or negative curvature. In the former case, a quadric is the intersection of a quadratic cone with the unit sphere, and in the latter, with the unit pseudosphere.

Periodic Orbits

Periodic billiard trajectories inside a planar billiard table correspond to inscribed polygons of extremal perimeter length. When counting periodic trajectories, one does not distinguish between polygons obtained from each other by cyclic permutation or by reversing the order of the vertices. In other words, one counts the orbits of the dihedral group $D_n$ acting on $n$-periodic billiard polygons.

An additional topological characteristic of a periodic billiard trajectory is the rotation number, defined as follows. Assume that the boundary $\gamma$ of a billiard table is parametrized by the unit circle and consider a polygon $(x_1, x_2, \ldots, x_n)$ inscribed in $\gamma$. For all $i$, one has $x_{i+1} = x_i + t_i$ with $t_i \in (0, 1)$. Since the polygon is closed, $t_1 + \cdots + t_n \in \mathbb{Z}$. This integer, which takes values from 1 to $n - 1$, is called the rotation number of the polygon and denoted by $\rho$. Changing the orientation of a polygon replaces the

Figure 3 Rotation numbers of periodic trajectories.

rotation number $\rho$ by $n - \rho$. The leftmost 5-periodic trajectory in Figure 3 has $\rho = 1$ and the other three $\rho = 2$.

The following theorem is due to Birkhoff: for every $n \ge 2$ and $\rho \le \lfloor (n - 1)/2 \rfloor$, coprime with $n$, there exist two geometrically distinct $n$-periodic billiard trajectories with the rotation number $\rho$. For example, there are at least two 2-periodic billiard trajectories inside every smooth oval: one is the diameter, the longest chord, and another one is of minimax type, similar to the minor axis of an ellipse.

In higher dimensions, lower bounds on the number of periodic billiard trajectories inside strictly convex domains with smooth boundaries were obtained only recently by Farber and the present author. Here is one of the results: for a generic billiard table in $\mathbb{R}^m$, the number of $n$-periodic trajectories is not less than $(n - 1)(m - 1)$. The proof uses Morse theory to bound from below the number of critical points of the perimeter length function on the space of inscribed $n$-gons and on its quotient space by the dihedral group $D_n$; the main difficulty is in describing the topology of these spaces.

Returning to convex smooth planar billiards, the following conjecture has remained open for a long time: the set of $n$-periodic points of the billiard ball map has zero measure. This is easy for $n = 2$; for $n = 3$ this is a theorem by M Rychlik. The motivation for this question comes from spectral geometry. In particular, according to a theorem of Ivrii, the above conjecture implies the Weyl conjecture on the second term for the spectral asymptotics of the Laplacian in a bounded domain with the Dirichlet or Neumann boundary conditions.

Length Spectrum

The set of lengths of the closed trajectories in a convex billiard $M$ is called the length spectrum of $M$. There is a remarkable relation between the length spectrum and the spectrum of the Laplace operator in $M$ with the Dirichlet boundary condition:

\[ \Delta f = \lambda f, \qquad f|_{\partial M} = 0. \]

From the physical point of view, the eigenvalues $\lambda$ are the eigenfrequencies of the membrane $M$ with a fixed boundary. Roughly speaking, one can recover the length spectrum from that of the Laplacian. More precisely, the following theorem of K Andersson and R Melrose holds:

\[ \sum_{\lambda_i \in \mathrm{spec}\,\Delta} \cos\left(t \sqrt{\lambda_i}\right) \]

is a well-defined generalized function (distribution) of $t$, smooth away from the length spectrum. That is, if $l > 0$ belongs to the singular support of this distribution, then there exists either a closed billiard trajectory of length $l$, or a closed geodesic of length $l$ in the boundary of the billiard table.

This relation between the Laplacian and the length spectrum is due to the fact that geometric optics is not a very accurate description of light. In wave optics, light is considered as electromagnetic waves, and geometric optics gives a realistic approximation only when the wavelength is small. This small-wavelength approximation is based on the assumption that the waves are locally almost harmonic, while their amplitudes change slowly from point to point. The substitution of such a function into the corresponding PDEs gives, in the first approximation, the equations of wave fronts, that is, of geometric optics.

Here is another spectral result concerning a smooth strictly convex plane domain, due to S Marvizi and R Melrose. Let $L_n$ be the supremum and $l_n$ the infimum of the perimeters of simple billiard $n$-gons. Then,

\[ \lim_{n \to \infty} n^k (L_n - l_n) = 0 \]

for any positive $k$. Furthermore, $L_n$ has an asymptotic expansion, as $n \to \infty$,

\[ L_n \sim l + \sum_{i=1}^{\infty} \frac{c_i}{n^{2i}} \]

where $l$ is the length of the boundary of the billiard table and $c_i$ are constants, depending on the curvature of the boundary.

Acknowledgments

This work was partially supported by NSF.

See also: Adiabatic Piston; Hamiltonian Systems: Obstructions to Integrability; Hyperbolic Billiards; Integrable Discrete Systems; Integrable Systems and Algebraic Geometry; Optical Caustics; Integrable Systems: Overview; Polygonal Billiards; Semiclassical Spectra and Closed Orbits; Separatrix Splitting; Stability Theory and KAM.

Further Reading

Chernov N and Markarian R, Theory of Chaotic Billiards (to appear).
Farber M and Tabachnikov S (2002) Topology of cyclic configuration spaces and periodic orbits of multi-dimensional billiards. Topology 41: 553–589.
Gutkin E (2003) Billiard dynamics: a survey with the emphasis on open problems. Regular and Chaotic Dynamics 8: 1–13.
Katok A and Hasselblatt B (1995) Introduction to the Modern Theory of Dynamical Systems. Cambridge: Cambridge University Press.
Kozlov V and Treshchev D (1991) Billiards. A Genetic Introduction to the Dynamics of Systems with Impacts. Providence: American Mathematical Society.
Lazutkin V (1993) KAM Theory and Semiclassical Approximations to Eigenfunctions. Berlin: Springer.
Moser J (1980) Various Aspects of Integrable Hamiltonian Systems. Progress in Mathematics, vol. 8, pp. 233–289. Basel: Birkhäuser.
Siburg KF (2004) The Principle of Least Action in Geometry and Dynamics, Lecture Notes in Mathematics, vol. 1844. Berlin: Springer.
Sinai Ya (1976) Introduction to Ergodic Theory. Princeton: Princeton University Press.
Tabachnikov S (1995) Billiards, Société Math. de France, Panoramas et Synthèses, No 1.
Tabachnikov S (2005) Geometry and Billiards. American Mathematical Society (to appear).

Black Hole Mechanics


A Ashtekar, Pennsylvania State University, University Park, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Over the last 30 years, black holes have been shown to have a number of surprising properties. These discoveries have revealed unforeseen relations between the otherwise distinct areas of general relativity, quantum physics, and statistical mechanics. This interplay, in turn, led to a number of deep puzzles at the very foundations of physics. Some have been resolved while others continue to baffle physicists. The starting point of these fascinating developments was the discovery of laws of black hole mechanics by Bardeen, Bekenstein, Carter, and Hawking. They dictate the behavior of black holes in equilibrium, under small perturbations away from equilibrium, and in fully dynamical situations. While they are consequences of classical general relativity alone, they have a close similarity with the laws of thermodynamics. The origin of this seemingly strange coincidence lies in quantum physics. For further discussion, see Asymptotic Structure and Conformal Infinity; Loop Quantum Gravity; Quantum Geometry and Its Applications; Quantum Field Theory in Curved Spacetime; Stationary Black Holes.

The focus of this article is just on black hole mechanics. The discussion is divided into three parts. In the first, we will introduce the notions of event horizons and black hole regions and discuss properties of globally stationary black holes. In the second, we will consider black holes which are themselves in equilibrium but in surroundings which may be time dependent. Finally, in the third part, we summarize what is known in the fully dynamical situations. For simplicity, all manifolds and fields are assumed to be smooth and, unless otherwise stated, spacetime is assumed to be four dimensional, with a metric of signature $-, +, +, +$, and the cosmological constant is assumed to be zero. An arrow under a spacetime index denotes the pullback of that index to the horizon.

Global Equilibrium

To capture the intuitive notion that a black hole is a region from which signals cannot escape to the asymptotic part of spacetime, one needs a precise definition of future infinity. The standard strategy is to use Penrose's conformal boundary $\mathscr{I}^+$. A black hole region $B$ of a spacetime $(\mathcal{M}, g_{ab})$ is defined as $B = \mathcal{M} \setminus I^-(\mathscr{I}^+)$, where $I^-$ denotes "chronological past." The boundary $\partial B$ of the black hole region is called the "event horizon" and denoted by $E$. Thus, $E$ is the boundary of the past of $\mathscr{I}^+$. It therefore follows that $E$ is a null 3-surface, ruled by future inextendible null geodesics without caustics. If the spacetime is globally hyperbolic, an "instant of time" is represented by a Cauchy surface $M$. The intersection of $B$ with $M$ may have several disjoint components, each representing a black hole at that instant of time. If $M'$ is a Cauchy surface to the future of $M$, the number of disjoint components of $M' \cap B$ in the causal future of $M \cap B$ must be less than or equal to those of $M \cap B$

(see Hawking and Ellis (1973)). Thus, black holes can merge but cannot bifurcate. (By a time reversal, i.e., by replacing $\mathscr{I}^+$ with $\mathscr{I}^-$ and $I^-$ with $I^+$, one can define a white hole region $W$. However, here we will focus only on black holes.)

A spacetime $(\mathcal{M}, g_{ab})$ is said to be stationary (i.e., time independent) if $g_{ab}$ admits a Killing field $t^a$ that represents an asymptotic time translation. By convention, $t^a$ is assumed to be unit at infinity. $(\mathcal{M}, g_{ab})$ is said to be axisymmetric if $g_{ab}$ admits a Killing field $\phi^a$ generating an SO(2) isometry. By convention $\phi^a$ is normalized such that the affine length of its integral curves is $2\pi$. Stationary spacetimes with nontrivial $\mathcal{M} \setminus I^-(\mathscr{I}^+)$ represent black holes which are in global equilibrium. In the Einstein–Maxwell theory in four dimensions, there exists a unique three-parameter family of stationary black hole solutions, generally parametrized by mass $m$, angular momentum $J$, and electric charge $Q$. This is the celebrated Kerr–Newman family. Therefore, in general relativity a great deal of work on black holes has focused on these solutions and perturbations thereof. The Kerr–Newman family is axisymmetric and furthermore, its metric has the property that the 2-flats spanned by the Killing fields $t^a$ and $\phi^a$ are orthogonal to a family of 2-surfaces. This property is called "$t$–$\phi$ orthogonality." These features of Kerr–Newman spacetimes are widely used in black hole physics. Note however that uniqueness fails in higher dimensions, and also in the presence of nonabelian gauge fields or rings of perfect fluids around black holes in four dimensions. In mathematical physics, there is significant literature on the new stationary black hole solutions in Einstein–Yang–Mills–Higgs theories. These are called "hairy black holes." Research on stationary black hole solutions with rings received a boost from a recent discovery that these black holes can violate the Kerr inequality $J \le G m^2$ between angular momentum $J$ and mass $m$.

A null 3-manifold $K$ in $\mathcal{M}$ is said to be a "Killing horizon" if $g_{ab}$ admits a Killing field $K^a$ which is everywhere normal to $K$. On a Killing horizon, one can show that the acceleration of $K^a$ is proportional to $K^a$ itself:

\[ K^a \nabla_a K^b = \kappa K^b \tag{1} \]

The proportionality function $\kappa$ is called "surface gravity." We will show in the next section that if a mild energy condition holds on $K$, then $\kappa$ must be constant. Note that if we rescale $K^a$ via $K^a \to c K^a$, where $c$ is a constant, surface gravity also rescales as $\kappa \to c\kappa$.

In the Kerr–Newman family, the event horizon is a Killing horizon. More generally, if an axisymmetric, stationary black hole spacetime $(\mathcal{M}, g_{ab})$ satisfies the $t$–$\phi$ orthogonality property, its event horizon $E$ is a Killing horizon. (Although one can envisage stationary black holes in which these additional symmetry conditions are not met, this possibility has been ignored in black hole mechanics on stationary spacetimes. Quasilocal horizons, discussed below, do not require any spacetime symmetries.) In these cases, the normalization freedom in $K^a$ is fixed by requiring that $K^a$ have the form

\[ K^a = t^a + \Omega \phi^a \tag{2} \]

on the horizon, where $\Omega$ is a constant, called the "angular velocity of the horizon." The resulting $\kappa$ is called the surface gravity of the black hole. It is remarkable that $\kappa$ is constant for all such black holes, even when their horizon is highly distorted (i.e., far from being spherically symmetric) either due to rotation or due to external matter fields. This is analogous to the fact that the temperature of a thermodynamical system in equilibrium is constant, independently of the details of the system. In analogy with thermodynamics, constancy of $\kappa$ is referred to as the "zeroth law of black hole mechanics."

Next, let us consider an infinitesimal perturbation $\delta$ within the three-parameter Kerr–Newman family. A simple calculation shows that the changes in the Arnowitt–Deser–Misner (ADM) mass $m$, angular momentum $J$, and the total charge $Q$ of the spacetime and in the area $a$ of the horizon are constrained via

\[ \delta m = \frac{\kappa}{8\pi G}\,\delta a + \Omega\,\delta J + \Phi\,\delta Q \tag{3} \]

where the coefficients $\kappa$, $\Omega$, $\Phi$ are black hole parameters, $\Phi = -A_a K^a$ being the electrostatic potential at the horizon. The last two terms, $\Omega\,\delta J$ and $\Phi\,\delta Q$, have the interpretation of "work" required to spin the black hole up by an amount $\delta J$ or to increase its charge by $\delta Q$. Therefore, [3] has a striking resemblance to the first law, $\delta E = T\,\delta S + \delta W$, of thermodynamics if (as the zeroth law suggests) $\kappa$ is made proportional to the temperature $T$, and the horizon area $a$ to the entropy $S$. Therefore, [3] and its generalizations discussed below are referred to as the "first law of black hole mechanics."

In Kerr–Newman spacetimes, the only contribution to the stress–energy tensor comes from the Maxwell field. Bardeen et al. (1973) consider stationary black holes with matter such as perfect fluids in the exterior region and stationary perturbations $\delta$ thereof. Using Einstein's equations, they show that the form [3] of the first law does not change; the only modification is addition of certain matter terms on the right-hand side which can be

interpreted as the work $\delta W$ done on the total system. A generalization in another direction was made by Iyer and Wald (1994) using Noether currents. They allow nonstationary perturbations and, more importantly, drop the restriction to general relativity. Instead, they consider a wide class of diffeomorphism-invariant Lagrangian densities $L(g_{ab}, R_{abcd}, \nabla_a R_{bcde}, \ldots, \Phi^{\ldots}{}_{\ldots}, \nabla_a \Phi^{\ldots}{}_{\ldots}, \ldots)$ which depend on the metric $g_{ab}$, matter fields $\Phi^{\ldots}{}_{\ldots}$, and a finite number of derivatives of the Riemann tensor and matter fields. Finally, they restrict themselves to $\kappa \ne 0$. In this case, on the maximal analytic extension of the spacetime, the Killing field $K^a$ vanishes on a 2-sphere $S_o$ called the bifurcate horizon. Then, [3] is generalized to

\[ \delta m = \frac{\kappa}{2\pi}\,\delta S_{\mathrm{hor}} + \delta W \tag{4} \]

Here $\delta W$ again represents "work terms" and $S_{\mathrm{hor}}$ is given by

\[ S_{\mathrm{hor}} = -2\pi \oint_{S_o} \frac{\delta L}{\delta R_{abcd}}\, n_{ab}\, n_{cd} \tag{5} \]

where $n_{ab}$ is the binormal to $S_o$ (with $n_{ab} n^{ab} = -2$), and the functional derivative inside the integral is evaluated by formally viewing the Riemann tensor as a field independent of the metric. For the Einstein–Hilbert action, this yields $S_{\mathrm{hor}} = a/4G$ and one recovers [3].

These results are striking. However, the underlying assumptions have certain unsatisfactory aspects. First, although the laws are meant to refer just to black holes, one assumes that the entire spacetime is stationary. In thermodynamics, by contrast, one only assumes that the system under consideration is in equilibrium, not the whole universe. Second, in the first law, quantities $a$, $\kappa$, $\Omega$ are evaluated at the horizon while $m$, $J$ are evaluated at infinity and include contributions from possible matter fields outside the black hole. A more satisfactory law of black hole mechanics would involve attributes of the black hole alone. Finally, the notion of the event horizon is extremely global and teleological since it explicitly refers to $\mathscr{I}^+$. An event horizon may well be developing in the very room in which you are sitting today, in anticipation of a gravitational collapse in the center of our galaxy which may occur a billion years hence. This feature makes it impossible to generalize the first law to fully dynamical situations and relate the change in the event horizon area to the flux of energy and angular momentum falling across it. Indeed, one can construct explicit examples of dynamical black holes in which an event horizon $E$ forms and grows in the flat part of a spacetime where nothing happens physically. These considerations call for a replacement of $E$ by a quasilocal horizon which leads to a first law involving only horizon attributes, and which can grow only in response to the influx of energy. Such horizons are discussed in the next two sections.

Local Equilibrium

The key idea here is to drop the requirement that spacetime should admit a stationary Killing field and ask only that the intrinsic horizon geometry be time independent. Consider a null 3-surface $\Delta$ in a spacetime $(\mathcal{M}, g_{ab})$ with a future-pointing normal field $\ell^a$. The pullback $q_{ab} := g_{\underleftarrow{ab}}$ of the spacetime metric to $\Delta$ is the intrinsic, degenerate "metric" of $\Delta$ with signature $0, +, +$. The first condition is that it be "time independent," that is, $\mathcal{L}_\ell q_{ab} = 0$ on $\Delta$. Then by restriction, the spacetime derivative operator $\nabla$ induces a natural derivative operator $D$ on $\Delta$. While $D$ is compatible with $q_{ab}$, that is, $D_a q_{bc} = 0$, it is not uniquely determined by this property because $q_{ab}$ is degenerate. Thus, $D$ has extra information, not contained in $q_{ab}$. The pair $(q_{ab}, D)$ is said to determine the intrinsic geometry of the null surface $\Delta$. This notion leads to a natural definition of a horizon in local equilibrium. Let $\Delta$ be a null, three-dimensional submanifold of $(\mathcal{M}, g_{ab})$ with topology $S \times \mathbb{R}$, where $S$ is compact and without boundary.

Definition 1 $\Delta$ is said to be an "isolated horizon" if it admits a null normal $\ell^a$ such that:
(i) $\mathcal{L}_\ell q_{ab} = 0$ and $[\mathcal{L}_\ell, D] = 0$ on $\Delta$ and
(ii) $-T^a{}_b \ell^b$ is a future pointing causal vector on $\Delta$.

One can show that, generically, this null normal field $\ell^a$ is unique up to rescalings by positive constants.

Both conditions are local to $\Delta$. In particular, $(\mathcal{M}, g_{ab})$ is not required to be asymptotically flat and there is no longer any teleological feature. Since $\Delta$ is null and $\mathcal{L}_\ell q_{ab} = 0$, the area of any of its cross sections is the same, denoted by $a_\Delta$. As one would expect, one can show that there is no flux of gravitational radiation or matter across $\Delta$. This captures the idea that the black hole itself is in equilibrium. Condition (ii) is a rather weak "energy condition" which is satisfied by all matter fields normally considered in classical general relativity. The nontrivial condition is (i). It extracts from the notion of a Killing horizon just a "tiny part" that refers only to the intrinsic geometry of $\Delta$. As a result, every Killing horizon $K$ is, in particular, an isolated horizon. However, a spacetime with an isolated horizon $\Delta$ can admit gravitational radiation and dynamical matter fields away from $\Delta$. In fact, as a family of Robinson–Trautman spacetimes illustrates,

gravitational radiation could even be present arbitrarily close to $\Delta$. Because of these possibilities, there are many nontrivial examples and the transition from event horizons of stationary spacetimes to isolated horizons represents a significant generalization of black hole mechanics. (In fact, the derivation of the zeroth and the first law requires slightly weaker assumptions, encoded in the notion of a "weakly isolated horizon" (Ashtekar et al. 2000, 2001).)

An immediate consequence of the requirement $\mathcal{L}_\ell q_{ab} = 0$ is that there exists a 1-form $\omega_a$ on $\Delta$ such that $D_a \ell^b = \omega_a \ell^b$. Following the definition of $\kappa$ on a Killing horizon, the surface gravity $\kappa_{(\ell)}$ of $(\Delta, \ell)$ is defined as $\kappa_{(\ell)} = \omega_a \ell^a$. Again, under $\ell^a \to c\ell^a$, we have $\kappa_{(c\ell)} = c\,\kappa_{(\ell)}$. Together with Einstein's equations, the two conditions of Definition 1 imply $\mathcal{L}_\ell \omega_a = 0$ and $\ell^a D_{[a} \omega_{b]} = 0$. The Cartan identity relating the Lie and exterior derivative now yields

\[ D_a(\omega_b \ell^b) \equiv D_a \kappa_{(\ell)} = 0 \tag{6} \]

Thus, surface gravity is constant on every isolated horizon. This is the zeroth law, extended to horizons representing local equilibrium. In the presence of an electromagnetic field, Definition 1 and the field equations imply $\mathcal{L}_\ell F_{\underleftarrow{ab}} = 0$ and $\ell^a F_{a\underleftarrow{b}} = 0$. The first of these equations implies that one can always choose a gauge in which $\mathcal{L}_\ell A_{\underleftarrow{a}} = 0$. By the Cartan identity it then follows that the electrostatic potential $\Phi_{(\ell)} := -A_a \ell^a$ is constant on the horizon. This is the Maxwell analog of the zeroth law.

In this setting, the first law is derived using a Hamiltonian framework (Ashtekar et al. 2000, 2001). For concreteness, let us assume that we are in the asymptotically flat situation and the only gauge field present is electromagnetic. One begins by restricting oneself to horizon geometries such that $\Delta$ admits a rotational vector field $\varphi^a$ satisfying $\mathcal{L}_\varphi q_{ab} = 0$. (In fact for black hole mechanics, it suffices to assume only that $\mathcal{L}_\varphi \epsilon_{ab} = 0$, where $\epsilon_{ab}$ is the intrinsic area 2-form on $\Delta$. The same is true on dynamical horizons discussed in the next section.) One then constructs a phase space $\Gamma$ of gravitational and matter fields such that (1) $\mathcal{M}$ admits an internal boundary $\Delta$ which is an isolated horizon; and (2) all fields satisfy asymptotically flat boundary conditions at infinity. Note that the horizon geometry is allowed to vary from one phase-space point to another; the pair $(q_{ab}, D)$ induced on $\Delta$ by the spacetime metric only has to satisfy Definition 1 and the condition $\mathcal{L}_\varphi q_{ab} = 0$.

Let us begin with angular momentum. Fix a vector field $\phi^a$ on $\mathcal{M}$ which coincides with the fixed $\varphi^a$ on $\Delta$ and is an asymptotic rotational symmetry at infinity. (Note that $\phi^a$ is not restricted in any way in the bulk.) Lie derivatives of gravitational and matter fields along $\phi^a$ define a vector field $X(\phi)$ on $\Gamma$. One shows that it is an infinitesimal canonical transformation, that is, satisfies $\mathcal{L}_{X(\phi)} \mathbf{\Omega} = 0$, where $\mathbf{\Omega}$ is the symplectic structure on $\Gamma$. The Hamiltonian $H^{(\phi)}$ generating this canonical transformation is given by

\[ H^{(\phi)} = J^{(\phi)}_\Delta - J^{(\phi)}_\infty, \qquad J^{(\phi)}_\Delta = -\frac{1}{8\pi G} \oint_S (\omega_a \phi^a)\,\epsilon - \frac{1}{4\pi} \oint_S (A_a \phi^a)\,{}^\star F \tag{7} \]

where $J^{(\phi)}_\infty$ is the ADM angular momentum at infinity, $S$ is any cross section of $\Delta$, and $\epsilon$ the area element thereon. The term $J^{(\phi)}_\Delta$ is independent of the choice of $S$ made in its evaluation and interpreted as the "horizon angular momentum." It has numerous properties that support this interpretation. In particular, it yields the standard angular momentum expression in Kerr–Newman spacetimes.

To define horizon energy, one has to introduce a "time-translation" vector field $t^a$. At infinity, $t^a$ must tend to a unit time translation. On $\Delta$, it must be a symmetry of $q_{ab}$. Since $\ell^a$ and $\varphi^a$ are both horizon symmetries, $t^a = c\ell^a + \Omega\varphi^a$ on $\Delta$, for some constants $c$ and $\Omega$. However, unlike $\phi^a$, the restriction of $t^a$ to $\Delta$ cannot be fixed once and for all but must be allowed to vary from one phase-space point to another. In particular, on physical grounds, one expects $\Omega$ to be zero at a phase-space point representing a nonrotating black hole but nonzero at a point representing a rotating black hole. This freedom in the boundary value of $t^a$ introduces a qualitatively new element. The vector field $X(t)$ on $\Gamma$ defined by the Lie derivatives of gravitational and matter fields does not, in general, satisfy $\mathcal{L}_{X(t)} \mathbf{\Omega} = 0$; it need not be an infinitesimal canonical transformation. The necessary and sufficient condition is that $(\kappa_{(c\ell)}/8\pi G)\,\delta a_\Delta + \Omega\,\delta J_\Delta + \Phi_{(c\ell)}\,\delta Q_\Delta$ be an exact variation. That is, $X(t)$ generates a Hamiltonian flow if and only if there exists a function $E^{(t)}_\Delta$ on $\Gamma$ such that

\[ \delta E^{(t)}_\Delta = \frac{\kappa_{(c\ell)}}{8\pi G}\,\delta a_\Delta + \Omega\,\delta J_\Delta + \Phi_{(c\ell)}\,\delta Q_\Delta \tag{8} \]

This is precisely the first law. Thus, the framework provides a deeper insight into the origin of the first law: it is the necessary and sufficient condition for the evolution generated by $t^a$ to be Hamiltonian. Equation [8] is a genuine restriction on the choice of phase-space functions $c$ and $\Omega$, that is, of restrictions to $\Delta$ of evolution fields $t^a$. It is easy to verify that $\mathcal{M}$ admits many such vector fields. Given one, the Hamiltonian $H^{(t)}$ generating the time evolution along $t^a$ takes the form

\[ H^{(t)} = E^{(t)}_\infty - E^{(t)}_\Delta \tag{9} \]

re-enforcing the interpretation of $E^{(t)}_\Delta$ as the horizon energy.

In general, there is a multitude of first laws, one for each vector field $t^a$, the evolution along which preserves the symplectic structure. In the Einstein–Maxwell theory, given any phase-space point, one can choose a canonical boundary value $t^a_o$ exploiting the uniqueness theorem. $E^{(t_o)}_\Delta$ is then called the horizon mass and denoted simply by $m_\Delta$. In the Kerr–Newman family, $H^{(t_o)}$ vanishes and $m_\Delta$ coincides with the ADM mass $m_\infty$. Similarly, if $\phi^a$ is chosen to be a global rotational Killing field, $J^{(\phi)}_\Delta$ equals $J^{(\phi)}_\infty$. However, in more general spacetimes where there are matter fields or gravitational radiation outside $\Delta$, these equalities do not hold; $m_\Delta$ and $J_\Delta$ represent quantities associated with the horizon alone while the ADM quantities represent the total mass and angular momentum in the spacetime, including contributions from matter fields and gravitational radiation in the exterior region. In the first law [8], only the contributions associated with the horizon appear.

When the uniqueness theorem fails, as, for example, in the Einstein–Yang–Mills–Higgs theory, first laws continue to hold but the horizon mass $m_\Delta$ becomes ambiguous. Interestingly, these ambiguities can be exploited to relate properties of hairy black holes with those of the corresponding solitons. (For a summary, see Ashtekar and Krishnan (2004).)

Dynamical Situations

A natural question now is whether there is an analog of the second law of thermodynamics. Using event horizons, Hawking showed that the answer is in the affirmative (see Hawking and Ellis (1973)). Let $(\mathcal{M}, g_{ab})$ admit an event horizon $E$. Denote by $\ell^a$ a geodesic null normal to $E$. Its expansion is defined as $\theta_{(\ell)} := q^{ab} \nabla_a \ell_b$, where $q^{ab}$ is any inverse of the degenerate intrinsic metric $q_{ab}$ on $E$, and determines the rate of change of the area element of $E$ along $\ell^a$. Assuming that the null energy condition and Einstein's equations hold, the Raychaudhuri equation immediately implies that if $\theta_{(\ell)}$ were to become negative somewhere it would become infinite within a finite affine parameter. Hawking showed that, if there is a globally hyperbolic region containing $I^-(\mathscr{I}^+) \cup E$ (that is, if there are no naked singularities), this cannot happen, whence $\theta_{(\ell)} \ge 0$ on $E$. Hence, if a cross section $S_2$ of $E$ is to the future of a cross section $S_1$, we must have $a_{S_2} \ge a_{S_1}$. Thus, in any (i.e., not necessarily infinitesimal) dynamical process, the change $\Delta a$ in the horizon area is always non-negative. This result is known as the "second law of black hole mechanics." As in the first law, the analog of entropy is the horizon area.

It is tempting to ask if there is a local physical process directly responsible for the growth of area. For event horizons, the answer is in the negative since they can grow in a flat portion of spacetime. However, one can introduce quasilocal horizons also in the dynamical situations and obtain the desired result (Ashtekar and Krishnan 2003). These constructions are strongly motivated by earlier ideas introduced by Hayward (1994).

Definition 2 A three-dimensional spacelike submanifold $H$ of $(\mathcal{M}, g_{ab})$ is said to be a "dynamical horizon" if it admits a foliation by compact 2-manifolds $S$ (without boundary) such that:
(i) the expansion $\theta_{(\ell)}$ of one (future directed) null normal field $\ell^a$ to $S$ vanishes and the expansion of the other (future directed) null normal field, $n^a$, is negative; and
(ii) $-T^a{}_b \ell^b$ is a future pointing causal vector on $H$.

One can show that this foliation of $H$ is unique and that $S$ is either a 2-sphere or, under degenerate and physically over-restrictive conditions, a 2-torus. Each leaf $S$ is a marginally trapped surface and referred to as a "cut" of $H$. Unlike event horizons $E$, dynamical horizons $H$ are locally defined and do not display any teleological feature. In particular, they cannot lie in a flat portion of spacetime. Dynamical horizons commonly arise in numerical simulations of evolving black holes as world tubes of apparent horizons. As the black hole settles down, $H$ asymptotes to an isolated horizon $\Delta$, which tightly hugs the asymptotic future portion of the event horizon. However, during the dynamical phase, $H$ typically lies well inside $E$.

The two conditions in Definition 2 immediately imply that the area of cuts of $H$ increases monotonically along the "outward direction" defined by the projection of $\ell^a$ on $H$. Furthermore, this change turns out to be directly related to the flux of energy falling across $H$. Let $R$ denote the "radius function" on $H$ so that the area of any cut $S$ is given by $a_S = 4\pi R^2$. Let $N$ denote the norm of $\partial_a R$ and $\Delta H$, the portion of $H$ bounded by two cross sections $S_1$ and $S_2$. The appropriate energy turns out to be associated with the vector field $N\ell^a$, where $\ell^a$ is normalized such that its projection on $H$ is the unit normal $\hat{r}^a$ to the cuts $S$. In the generic and physically interesting case when $S$ is a 2-sphere, the Gauss and the Codazzi (i.e., constraint) equations imply

\[ \frac{1}{2G}(R_2 - R_1) = \int_{\Delta H} T_{ab}\, N\ell^a \hat{\tau}^b \, d^3V + \frac{1}{16\pi G} \int_{\Delta H} N \left( \sigma_{ab}\sigma^{ab} + 2\,\zeta_a \zeta^a \right) d^3V \tag{10} \]
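The left-hand side of [10] depends only on the areas of the two cuts, through the radius function defined by $a_S = 4\pi R^2$. The sketch below is purely illustrative and rests on assumptions not made in the text: geometric units with $G = c = 1$, hypothetical function names, and the standard Schwarzschild horizon area $16\pi G^2 m^2$, for which $R/2G$ equals the mass $m$. Under these assumptions, a horizon whose cuts grow from the area of a mass-1 Schwarzschild horizon to that of a mass-1.1 horizon has absorbed a total flux of about 0.1.

```python
import math

G = 1.0  # assumption: geometric units, G = c = 1

def area_radius(area):
    """Radius function R of a cut, defined by a_S = 4 * pi * R**2."""
    return math.sqrt(area / (4.0 * math.pi))

def energy_between_cuts(area1, area2, G=G):
    """Left-hand side of [10]: (R2 - R1) / (2 G) for cuts of areas area1, area2."""
    return (area_radius(area2) - area_radius(area1)) / (2.0 * G)

def schwarzschild_area(m, G=G):
    """Horizon area 16 * pi * G**2 * m**2 of a Schwarzschild black hole (c = 1)."""
    return 16.0 * math.pi * (G * m) ** 2

if __name__ == "__main__":
    a1 = schwarzschild_area(1.0)   # initial cut (illustrative value)
    a2 = schwarzschild_area(1.1)   # later cut with larger area
    print("total flux across the horizon segment:", energy_between_cuts(a1, a2))
```

Because the square root is monotonic, the computed flux is positive exactly when the area grows, which matches the monotonic increase of the area of cuts stated above.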

Here $\hat\tau^a$ is the unit normal to $H$, $\sigma_{ab}$ the shear of $\ell^a$ (i.e., the tracefree part of $q_a{}^m q_b{}^n \nabla_m \ell_n$), and $\zeta^a = q^{ab}\hat r^c \nabla_c \ell_b$, where $q^{ab}$ is the projector onto the tangent space of the cuts $S$. The first integral on the right-hand side can be directly interpreted as the flux across $\Delta H$ of matter–energy (relative to the vector field $N\ell^a$). The second term is purely geometric and is interpreted as the flux of energy carried by gravitational waves across $\Delta H$. It has several properties which support this interpretation. Thus, not only does the second law of black hole mechanics hold for a dynamical horizon $H$, but the "cause" of the increase in the area can be directly traced to physical processes happening near $H$.

Another natural question is whether the first law [8] can be generalized to fully dynamical situations, where $\delta$ is replaced by a finite transition. Again, the answer is in the affirmative. We will outline the idea for the case when there are no gauge fields on $H$. As with isolated horizons, to have a well-defined notion of angular momentum, let us suppose that the intrinsic 3-metric on $H$ admits a rotational Killing field $\varphi$. Then, the angular momentum associated with any cut $S$ is given by

\[ J_S^{(\varphi)} = -\frac{1}{8\pi G} \oint_S K_{ab}\, \varphi^a \hat r^b \, d^2V \equiv \frac{1}{8\pi G} \oint_S j^{(\varphi)} \, d^2V \qquad [11] \]

where $K_{ab}$ is the extrinsic curvature of $H$ in $(M, g_{ab})$ and $j^{(\varphi)}$ is interpreted as "the angular momentum density." Now, in the Kerr family, the mass, surface gravity, and the angular velocity can be unambiguously expressed as well-defined functions $\bar m(a, J)$, $\bar\kappa(a, J)$, and $\bar\Omega(a, J)$ of the horizon area $a$ and angular momentum $J$. The idea is to use these expressions to associate mass, surface gravity, and angular velocity with each cut of $H$. Then, a surprising result is that the difference between the horizon masses associated with cuts $S_1$ and $S_2$ can be expressed as the integral of a locally defined flux across the portion $\Delta H$ of $H$ bounded by $S_1$ and $S_2$:

\[ \bar m_2 - \bar m_1 = \frac{1}{8\pi G} \int_{\Delta H} \bar\kappa \, da + \frac{1}{8\pi G} \left\{ \oint_{S_2} \bar\Omega\, j^{\varphi} \, d^2V - \oint_{S_1} \bar\Omega\, j^{\varphi} \, d^2V - \int_{\bar\Omega_1}^{\bar\Omega_2} d\bar\Omega \oint_S j^{\varphi} \, d^2V \right\} \qquad [12] \]

If the cuts $S_2$ and $S_1$ are only infinitesimally separated, this expression reduces precisely to the standard first law involving infinitesimal variations. Therefore, [12] is an integral generalization of the first law.

Let us conclude with a general perspective. On the whole, in the passage from event horizons in stationary spacetimes to isolated horizons and then to dynamical horizons, one considers increasingly more realistic situations. In all the three cases, the analysis has been extended to allow the presence of a cosmological constant $\Lambda$. (The only significant change is that the topology of cuts $S$ of dynamical horizons is restricted to be $S^2$ if $\Lambda > 0$ and is completely unrestricted if $\Lambda < 0$.) In the first two frameworks, results have also been extended to higher dimensions. Since the notions of isolated and dynamical horizons make no reference to infinity, these frameworks can be used also in spatially compact spacetimes. The notion of an event horizon, by contrast, does not naturally extend to these spacetimes. On the other hand, the generalization [4] of the first law [3] is applicable to event horizons of stationary spacetimes in a wide class of theories while so far the isolated and dynamical horizon frameworks are tied to general relativity (coupled to matter satisfying rather weak energy conditions). From a mathematical physics perspective, extension to more general theories is an important open problem.

See also: Asymptotic Structure and Conformal Infinity; Branes and Black Hole Statistical Mechanics; Dirac Fields in Gravitation and Nonabelian Gauge Theory; Geometric Flows and the Penrose Inequality; Loop Quantum Gravity; Minimal Submanifolds; Quantum Field Theory in Curved Spacetime; Quantum Geometry and its Applications; Random Algebraic Geometry, Attractors and Flux Vacua; Shock Wave Refinement of the Friedman–Robertson–Walker Metric; Stationary Black Holes.

Further Reading

Ashtekar A, Beetle C, and Lewandowski J (2001) Mechanics of rotating black holes. Physical Review D 64: 044016 (gr-qc/0103026).
Ashtekar A, Fairhurst S, and Krishnan B (2000) Isolated horizons: Hamiltonian evolution and the first law. Physical Review D 62: 104025 (gr-qc/0005083).
Ashtekar A and Krishnan B (2003) Dynamical horizons and their properties. Physical Review D 68: 104030 (gr-qc/0308033).
Ashtekar A and Krishnan B (2004) Isolated and dynamical horizons and their applications. Living Reviews in Relativity 10: 1–78 (gr-qc/0407042).
Bardeen JW, Carter B, and Hawking SW (1973) The four laws of black hole mechanics. Communications in Mathematical Physics 31: 161.
DeWitt BS and DeWitt CM (eds.) (1972) Black Holes. Amsterdam: North-Holland.
Frolov VP and Novikov ID (1998) Black Hole Physics. Dordrecht: Kluwer.
Hawking SW and Ellis GFR (1973) Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Hayward S (1994) General laws of black hole dynamics. Physical Review D 49: 6467–6474.
Iyer V and Wald RM (1994) Some properties of Noether charge and a proposal for dynamical black hole entropy. Physical Review D 50: 846–864.
Wald RM (1994) Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics. Chicago: University of Chicago Press.

Boltzmann Equation (Classical and Quantum)


M Pulvirenti, Università di Roma "La Sapienza," Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Ludwig Boltzmann (1872) established an evolution equation to describe the behavior of a rarefied gas, starting from the mathematical model of elastic balls and using mechanical and statistical considerations. The importance of this equation is twofold. First, it provides a reduced description (as do the hydrodynamical equations) of the microscopic world. Second, it is also an important tool for the applications, especially for dilute fluids when the hydrodynamical equations fail to hold.

The starting point of the Boltzmann analysis is to abandon the study of the gas in terms of the detailed motion of molecules which constitute it because of their large number. Instead, it is better to investigate a function $f(x, v)$, which is the probability density of a given particle, where $x$ and $v$ denote its position and velocity. Actually, $f(x, v)\,dx\,dv$ is often confused with the fraction of molecules falling in the cell of the phase space of size $dx\,dv$ around $x, v$. The two concepts are not exactly the same, but they are asymptotically equivalent (when the number of particles is diverging) if a law of large numbers holds.

The Boltzmann equation is the following:

\[ (\partial_t + v \cdot \nabla_x) f = Q(f, f) \qquad [1] \]

where $Q$, the collision operator, is defined by eqn [2]:

\[ Q(f, f) = \int_{\mathbb{R}^3} dv_1 \int_{S^2_+} dn \, (v - v_1) \cdot n \, \left[ f(x, v') f(x, v_1') - f(x, v) f(x, v_1) \right] \qquad [2] \]

and

\[ v' = v - n\,[n \cdot (v - v_1)], \qquad v_1' = v_1 + n\,[n \cdot (v - v_1)] \qquad [3] \]

Moreover, $n$ (the impact parameter) is a unit vector and $S^2_+ = \{ n \mid n \cdot (v - v_1) \ge 0 \}$.

Note that $v'$, $v_1'$ are the outgoing velocities after a collision of two elastic balls with incoming velocities $v$ and $v_1$ and centers $x$ and $x + rn$, $r$ being the diameter of the spheres. Obviously, the collision takes place if $n \cdot (v - v_1) \ge 0$. Equations [3] are a consequence of the conservation of total energy, momentum, and angular momentum. Note also that $r$ does not enter in eqn [1] as a parameter.

As fundamental features of eqn [1], we have the conservation in time of the following five quantities

\[ \int dx \int dv \, f(x, v; t)\, v^{\alpha} \qquad [4] \]

with $\alpha = 0, 1, 2$, expressing conservation of the probability, momentum, and energy.

From now on we shall set $\int = \int_{\mathbb{R}^3}$ for notational simplicity.

Moreover, Boltzmann introduced the (kinetic) entropy defined as

\[ H(f) = \int dx \int dv \, f \log f \,(x, v) \qquad [5] \]

and proved the famous H-theorem asserting the decreasing of $H(f(t))$ along the solutions to eqn [1].

Finally, in the case of bounded domains or homogeneous solutions ($f = f(v; t)$ is independent of $x$), the distribution defined for some $\beta > 0$, $\rho > 0$, and $u \in \mathbb{R}^3$ by

\[ M(x, v) = \frac{\rho}{(2\pi/\beta)^{3/2}} \, e^{-(\beta/2)|v - u|^2} \qquad [6] \]

called Maxwellian distribution, is stationary for the evolution given by eqn [1]. In addition, $M$ minimizes $H$ among all distributions with given total mass $\rho$, given mean velocity $u$, and mean energy. The parameter $\beta$ is interpreted as the inverse temperature.

In conclusion, Boltzmann was able to introduce not only an evolutionary equation with the remarkable properties expressing mass, momentum, and energy conservation, but also the trend to the thermal equilibrium. In other words, he tried to reconcile Newton's laws with the second principle of thermodynamics.

The Boltzmann Heuristic Argument

Thus, we want to find an evolution equation for the quantity $f(x, v; t)$. The molecular system we are considering consists of $N$ identical particles of diameter $r$ in the whole space $\mathbb{R}^3$. We denote by $x_1, v_1, \ldots, x_N, v_N$ a state of the system, where $x_i$ and $v_i$ indicate the position and the velocity of the particle $i$. The particles cannot overlap (i.e., the centers of two particles cannot be at a distance smaller than the particle diameter $r$).

The particles are moving freely up to the first instance of contact, that is, the first time when two particles (say particles $i$ and $j$) arrive at a distance $r$. Then the pair interacts when an elastic collision occurs. This means that they change instantaneously

their velocities, according to the conservation of the energy and linear and angular momentum. More precisely, the velocities after a collision with incoming velocities $v$ and $v_1$ are those given by formula [3]. After the first collision, the system evolves by iterating the procedure. Here we neglect triple collisions because they are unlikely. The evolution equation for a tagged particle is then of the form

\[ (\partial_t + v \cdot \nabla_x) f = \mathrm{Coll} \qquad [7] \]

where Coll denotes the variation of $f$ due to the collisions.

We have

\[ \mathrm{Coll} = G - L \qquad [8] \]

where $L$ and $G$ (the loss and gain terms, respectively) are the negative and positive contributions to the variation of $f$ due to the collisions. More precisely, $L\,dx\,dv\,dt$ is the probability of the test particle to disappear from the cell $dx\,dv$ of the phase space because of a collision in the time interval $(t, t + dt)$ and $G\,dx\,dv\,dt$ is the probability to appear in the same time interval for the same reason. Let us consider the sphere of center $x$ with radius $r$ and a point $x + rn$ over the surface, where $n$ denotes the generic unit vector. Consider also the cylinder with base area $dS = r^2\,dn$ and height $|V|\,dt$ along the direction of $V = v_2 - v$.

Then a given particle (say particle 2) with velocity $v_2$ can contribute to $L$ because it can collide with the test particle in the time $dt$, provided it is localized in the cylinder and if $V \cdot n \le 0$. Therefore, the contribution to $L$ due to the particle 2 is the probability of finding such a particle in the cylinder (conditioned to the presence of the first particle in $x$). This quantity is $f_2(x, v, x + nr, v_2)\,|(v_2 - v) \cdot n|\,r^2\,dn\,dv_2\,dt$, where $f_2$ is the joint distribution of two particles. Integrating in $dn$ and $dv_2$, we obtain that the total contribution to $L$ due to any predetermined particle is

\[ r^2 \int dv_2 \int_{S_-} dn \, f_2(x, v, x + nr, v_2)\, |(v_2 - v) \cdot n| \qquad [9] \]

where $S_-$ is the unit hemisphere $(v_2 - v) \cdot n < 0$. Finally, we obtain the total contribution multiplying by the total number of particles:

\[ L = (N - 1)\, r^2 \int dv_2 \int_{S_-} dn \, f_2(x, v, x + nr, v_2)\, |(v_2 - v) \cdot n| \qquad [10] \]

The gain term can be derived analogously by considering that we are looking at particles which have velocities $v$ and $v_2$ after the collisions so that we have to integrate over the hemisphere $S_+ = \{ (v_2 - v) \cdot n > 0 \}$:

\[ G = (N - 1)\, r^2 \int dv_2 \int_{S_+} dn \, f_2(x, v, x + nr, v_2)\, |(v_2 - v) \cdot n| \qquad [11] \]

Combining $G$ and $L$ according to [8], we get

\[ \mathrm{Coll} = (N - 1)\, r^2 \int dv_2 \int dn \, f_2(x, v, x + nr, v_2)\, (v_2 - v) \cdot n \qquad [12] \]

which, however, is not a very useful expression because the time derivative of $f$ is expressed in terms of another object, namely $f_2$. An evolution equation for $f_2$ will involve $f_3$, the joint distribution of three particles, and so on, until we reach the total particle number $N$. Here the basic assumption of Boltzmann enters, namely that two given particles are uncorrelated if the gas is rarefied, that is,

\[ f_2(x, v, x_2, v_2) = f(x, v)\, f(x_2, v_2) \qquad [13] \]

Condition [13], referred to as the propagation of chaos, seems contradictory at first sight: if two particles collide, correlations are created. Even though we could assume eqn [13] at some time, if the test particle collides with particle 2, such an equation cannot be satisfied anymore after the collision.

Before discussing the propagation of chaos hypothesis, we first analyze the size of the collision operator. We remark that, in practical situations for a rarefied gas, the combination $N r^3 \approx 10^{-4}\ \mathrm{cm}^3$ (i.e., the volume occupied by the particles) is very small, while $N r^2 = O(1)$. This implies that $G = O(1)$. Therefore, since we are dealing with a very large number of particles, we are tempted to perform the limit $N \to \infty$ and $r \to 0$ in such a way that $r^2 = O(N^{-1})$. As a consequence, the probability that two tagged particles collide (which is of the order of the surface of a ball, i.e., $O(r^2)$) is negligible. However, the probability that a given particle performs a collision with any one of the remaining $N - 1$ particles (which is $O(N r^2) = O(1)$) is not negligible. Therefore, condition [13] is referring to two preselected particles (say particles 1 and 2), so that it is not unreasonable to conceive that it holds in the limiting situation in which we are working.

However, we cannot insert [13] in [12] because this latter equation refers to instants before and after the collision and, if we know that a collision took place, we certainly cannot invoke eqn [13]. Hence, it is more convenient to assume eqn [13] in the loss term and work over the gain term to keep advantage

of the factorization property which will be assumed only before the collision.

Coming back to eqn [11] for the outgoing pair velocities $v$, $v_2$ (satisfying the condition $(v_2 - v) \cdot n > 0$), we make use of the continuity property

\[ f_2(x, v, x + nr, v_2) = f_2(x, v', x + nr, v_2') \qquad [14] \]

where the pair $v'$, $v_2'$ is pre-collisional. On $f_2$ expressed before the collision, we can reasonably apply condition [13] and obtain

\[ G - L = (N - 1)\, r^2 \int dv_2 \int_{S^2_+} dn \, (v - v_2) \cdot n \, \left[ f(x, v') f(x - nr, v_2') - f(x, v) f(x + nr, v_2) \right] \qquad [15] \]

after a change $n \to -n$ in the gain term, using the notation $S^2_+$ for the hemisphere $\{ n \mid (v_2 - v) \cdot n \le 0 \}$. This transforms the pair $v'$, $v_2'$ from a pre-collisional to a post-collisional pair.

Finally, in the limit $N \to \infty$, $r \to 0$, $N r^2 = \lambda^{-1}$, we find

\[ (\partial_t + v \cdot \nabla_x) f = \lambda^{-1} \int dv_2 \int_{S_+} dn \, (v - v_2) \cdot n \, \left[ f(x, v') f(x, v_2') - f(x, v) f(x, v_2) \right] \qquad [16] \]

The parameter $\lambda$, called mean free path, represents, roughly speaking, the typical length a particle can cover without undergoing any collision. In eqns [1] and [2], we just chose $\lambda = 1$.

Equation [16] (or, equivalently, eqns [1] and [2]) is the Boltzmann equation for hard spheres. Such an equation has a statistical nature, and it is not equivalent to the Hamiltonian dynamics from which it has been derived. Indeed, the H-theorem shows that such an equation is not reversible in time, whereas reversibility is expected of any law of mechanics.

This concludes the heuristic preliminary analysis of the Boltzmann equation. We certainly know that the above arguments are delicate and require a more rigorous and deeper analysis. If we want the Boltzmann equation not to be a phenomenological model, derived by ad hoc assumptions and justified only by its practical relevance, but rather a consequence of a mechanical model, we must derive it rigorously. In particular, the propagation of chaos should be not a hypothesis but the statement of a theorem.

Beyond the Hard Spheres

The heuristic arguments we have developed so far can be extended to different potentials than that of the hard-sphere systems. If the particles interact via a two-body interaction $V = V(r)$, the resulting Boltzmann equation is eqn [1], with

\[ Q(f, f) = \int dv_1 \int_{S^2_+} dn \, B(v - v_1; n) \left[ f' f_1' - f f_1 \right] \qquad [17] \]

where we are using the usual shorthand notation:

\[ f' = f(x, v'), \quad f_1' = f(x, v_1'), \quad f = f(x, v), \quad f_1 = f(x, v_1) \qquad [18] \]

and $B = B(v - v_1; n)$ is a suitable function of the relative velocity $v - v_1$ and the impact parameter $n$, which is proportional to the cross section relative to the potential $V$. Another equivalent, sometimes more convenient, way to express eqn [17] is

\[ Q(f, f) = \int dv_1 \int dv' \int dv_1' \, W(v, v_1 | v', v_1') \left[ f' f_1' - f f_1 \right] \qquad [19] \]

with

\[ W(v, v_1 | v', v_1') = w(v, v_1 | v', v_1') \, \delta(v + v_1 - v' - v_1') \, \delta\!\left( \tfrac{1}{2} \left( v^2 + v_1^2 - (v')^2 - (v_1')^2 \right) \right) \qquad [20] \]

where $w$ is a suitable kernel. All the qualitative properties, such as the conservation laws and the H-theorem, are obviously still valid.

Consequences

The Boltzmann equation provoked a debate involving Loschmidt, Zermelo, and Poincaré, who outlined inconsistencies between the irreversibility of the equation and the reversible character of the Hamiltonian dynamics. Boltzmann argued the statistical nature of his equation and his answer to the irreversibility paradox was that "most" of the configurations behave as expected by the thermodynamical laws. However, he did not have the probabilistic tools for formulating in a precise way the statements of which he had a precise intuition.

Grad (1949) stated clearly the limit $N \to \infty$, $r \to 0$, $N r^2 \to \mathrm{const.}$, where $N$ is the number of particles and $r$ is the diameter of the molecules, in which the Boltzmann equation is expected to hold. This limit is usually called the Boltzmann–Grad limit (B–G limit in the sequel).

The problem of a rigorous derivation of the Boltzmann equation was an open and challenging problem for a long time. Lanford (1975) showed that, although for a very short time, the Boltzmann equation can be derived starting from the mechanical model of the hard-sphere system. The proof has a deep content but is relatively simple from a technical viewpoint.

Existence

The mathematical study of the Boltzmann equation starts with the problem of proving the existence of the solutions. One would like to be able to show that, for all (or at least for a physically significant family of) initial distributions (which are positive and summable functions) with finite momentum, energy, and entropy, there exists a unique solution to eqn [1] with the same mass, momentum, and energy as the initial distribution. Moreover, the entropy should decrease and the solution should approach the right Maxwellian as $t \to \infty$. The problem, in such a generality, is still unsolved, but several results in this direction have been achieved since the pioneering works due to Carleman (1933) for the homogeneous equation. Actually, there are satisfactory results for some special situations, such as homogeneous solutions (independent of $x$) and solutions close to the equilibrium, to the vacuum, or to homogeneous data. The most general result we have up to now is, unfortunately, not constructive. This is due to Di Perna and Lions (1989), who showed the existence of suitable weak solutions to eqn [1]. However, we still do not know whether such solutions, which preserve mass and momentum, and satisfy the H-theorem, are unique and also preserve the energy.

Hydrodynamics

The derivation of hydrodynamical equations from the Boltzmann equation is a problem as old as the equation itself and, in fact, it goes back to Maxwell and Hilbert. Preliminary to the discussion of the hydrodynamic limit, we establish a few properties of the collision kernel.

It is a well-known fact that the only solution to the equation

\[ Q(f, f) = 0 \qquad [21] \]

is a local Maxwellian, namely

\[ f(x, v) := M(x, v) = \frac{\rho(x)}{(2\pi T(x))^{3/2}} \, e^{-|v - u(x)|^2 / 2T(x)} \qquad [22] \]

where the local parameters $\rho$, $u$, and $T$ satisfy the relations

\[ \int M \, dv = \rho \qquad [23] \]

\[ \int v M \, dv = \rho u \qquad [24] \]

\[ \frac{1}{2} \int v^2 M \, dv = \frac{3}{2} \rho T + \frac{1}{2} \rho u^2 \qquad [25] \]

Moreover, the only solutions $h$ to the equation

\[ \int h(v)\, Q(f, f) \, dv = 0 \qquad [26] \]

are the linear combinations of the quantities $(1, v, v^2)$, called collision invariants. The last property obviously corresponds to the mass, momentum, and energy conservation.

With this in mind, consider a change of variables in the Boltzmann equation [1], passing from microscopic to macroscopic variables, $x \to \varepsilon x$, $t \to \varepsilon t$. Here $\varepsilon$ is a small scale parameter expressing the ratio between the typical interparticle distances and the typical distances over which the macroscopic equations are varying. Such a change yields

\[ (\partial_t + v \cdot \nabla_x) f_\varepsilon = \frac{1}{\varepsilon} Q(f_\varepsilon, f_\varepsilon) \qquad [27] \]

We need to allow the small parameter $\varepsilon$ (mean free path or the Knudsen number) to tend to zero. In order to eliminate the singularity on the right-hand side of [27], we multiply both sides by the collision invariants $v^\alpha$ with $\alpha = 0, 1, 2$, integrate in $v$, and obtain the five equations:

\[ \int dv \, v^\alpha (\partial_t + v \cdot \nabla_x) f_\varepsilon = 0 \qquad [28] \]

On the other hand, if $f_\varepsilon$ converges to $f$, as $\varepsilon \to 0$, necessarily $Q(f, f) = 0$ and hence $f = M$. Therefore, we expect that in the limit $\varepsilon \to 0$,

\[ \int dv \, v^\alpha (\partial_t + v \cdot \nabla_x) M = 0 \qquad [29] \]

Equation [29] fixes a relation among the fields $\rho$, $u$, $T$ as functions of $x$ and $t$. A standard computation gives us the Euler equations for compressible gas

\[ \partial_t \rho + \mathrm{div}(\rho u) = 0 \qquad [30] \]

\[ \partial_t u + (u \cdot \nabla) u + \frac{1}{\rho} \nabla p = 0 \qquad [31] \]

\[ \partial_t T + (u \cdot \nabla) T + \tfrac{2}{3}\, T\, \nabla \cdot u = 0 \qquad [32] \]

where the pressure $p$ is related to the density $\rho$ and the temperature $T$ by the perfect gas law

\[ p = \rho T \qquad [33] \]

In order to make the above arguments rigorous, Hilbert (1916) developed a useful tool, called the Hilbert expansion, to control the limiting procedure.
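The moment relations [23]–[25] for the local Maxwellian [22] can be checked by direct quadrature at a single point $x$. The sketch below is a minimal Python/NumPy check under stated assumptions: the function name `maxwellian_moments`, the grid parameters `half_width` and `n`, and the sample values of $\rho$, $u$, and $T$ are all chosen for illustration and do not come from the text.

```python
import numpy as np


def maxwellian_moments(rho, u, T, half_width=10.0, n=81):
    """Approximate mass, momentum, and energy of a Maxwellian by a Riemann sum.

    Hypothetical helper: integrates over a cube of +/- half_width thermal
    widths sqrt(T) around the mean velocity u, with n points per axis.
    """
    u = np.asarray(u, dtype=float)
    sigma = np.sqrt(T)
    axes = [np.linspace(ui - half_width * sigma, ui + half_width * sigma, n)
            for ui in u]
    cell = np.prod([ax[1] - ax[0] for ax in axes])  # volume element dv
    vx, vy, vz = np.meshgrid(*axes, indexing="ij")
    c2 = (vx - u[0])**2 + (vy - u[1])**2 + (vz - u[2])**2
    M = rho / (2.0 * np.pi * T)**1.5 * np.exp(-c2 / (2.0 * T))  # eqn [22]
    mass = M.sum() * cell                                        # eqn [23]
    momentum = np.array([(vx * M).sum(), (vy * M).sum(),
                         (vz * M).sum()]) * cell                 # eqn [24]
    energy = 0.5 * ((vx**2 + vy**2 + vz**2) * M).sum() * cell    # eqn [25]
    return mass, momentum, energy


if __name__ == "__main__":
    rho, u, T = 1.3, np.array([0.2, -0.5, 1.0]), 0.7
    mass, momentum, energy = maxwellian_moments(rho, u, T)
    print("mass     :", mass, "expected", rho)
    print("momentum :", momentum, "expected", rho * u)
    print("energy   :", energy, "expected", 1.5 * rho * T + 0.5 * rho * (u @ u))
```

The internal part of the energy, $\tfrac{3}{2}\rho T$, is $\tfrac{3}{2}$ times the pressure given by the perfect gas law [33], which is how the closure $p = \rho T$ enters the Euler system.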

In the Hilbert expansion, a formal solution to eqn [27] is expressed in the form of a power series:

\[ f_\varepsilon = \sum_{j \ge 0} f_j \, \varepsilon^j \qquad [34] \]

where $f_0$ is the local Maxwellian, with the parameters $\rho$, $u$, $T$ satisfying the Euler equations. All the other coefficients $f_j$ of the expansion can be determined by recurrence, inverting suitable operators. However, the series is not expected to be convergent, so that the way to show the validity of the hydrodynamical limit rigorously is to truncate the expansion and to control the remainder. The first result in this direction was obtained by Caflisch (1980). However, this approach is based on the regularity of the solutions to the Euler equations, which is known to hold only for short times since shocks can be formed. How to approximate the shocks in terms of a kinetic description is still a difficult and open problem.

Note that the hydrodynamical picture of the Boltzmann equation just means that we are looking at the solutions of this equation at a suitable macroscopic scale. The rarefaction hypothesis underlying the Boltzmann description is reflected in the law of perfect gas, which states that the particles, in the local thermal equilibrium, are free.

Stationary Problems

Stationary non-Maxwellian solutions to the Boltzmann equation should describe stationary nonequilibrium states exhibiting nontrivial flows. In spite of the physical relevance of these problems, not many complete mathematical results are, at the moment, available. Among them, there is the traveling-wave problem, which can be formulated in the following way. We look for a solution $f = f(x - ct, v)$, $f : \mathbb{R} \times \mathbb{R}^3 \to \mathbb{R}_+$, constant in form but traveling with a constant velocity $c > 0$, to

\[ (v_1 - c)\, f' = Q(f, f) \qquad [35] \]

where $v_1$ is the first component of $v$ and $f'$ denotes the spatial derivative of $f$. Equation [35] must be complemented by the boundary conditions $f \to M_\pm$ as $x \to \pm\infty$, where $M_\pm$ are the right and left Maxwellians, namely two prescribed equilibrium situations at infinity. The parameters (density, mean velocity, and temperature) of the Maxwellians, however, cannot be chosen arbitrarily. Indeed, the conservations of the mass, momentum, and energy (which are properties of $Q$) imply the conservations (in $x$) of the fluxes of these quantities. Hence, we have to impose five equations that relate the upstream and the downstream values of the densities, mean velocities, and temperatures. Such relations are known in gas dynamics as the Rankine–Hugoniot conditions. A solution of this problem has been found by Caflisch and Nicolaenko (1983) in case of a weak shock (namely, when $M_+$ and $M_-$ are close) by using Hilbert expansion techniques. More recently, Liu and Yu (2004) established also stability and positivity of this solution.

Quantum Kinetic Theory

Uehling and Uhlenbeck (1933) introduced the following kinetic equation for describing a large system of weakly interacting bosons or fermions:

\[ (\partial_t + v \cdot \nabla_x) f = \int dv_1 \int dv' \int dv_1' \, W(v, v_1 | v', v_1') \left\{ (1 \pm f)(1 \pm f_1)\, f' f_1' - (1 \pm f')(1 \pm f_1')\, f f_1 \right\} \qquad [36] \]

Here the signs $+$ and $-$ stand for bosons and fermions, respectively, and

\[ W(v, v_1 | v', v_1') = \left( \hat V(v' - v) \pm \hat V(v' - v_1) \right)^2 \delta(v + v_1 - v' - v_1') \, \delta\!\left( \tfrac{1}{2} \left( v^2 + v_1^2 - (v')^2 - (v_1')^2 \right) \right) \qquad [37] \]

Moreover,

\[ \hat V(p) = 4 \int dx \, e^{i p \cdot x} \, V(x) \qquad [38] \]

where $V$ is the interaction potential. Note that eqn [37] is the expression of the cross section of a quantum scattering in the Born approximation.

The unknown $f = f(x, v; t)$ in eqn [36] is the expected number of molecules falling in the unit (quantum) cell of the phase space. This function is proportional to the one-particle Wigner function, introduced by Wigner (1932) to handle kinetic problems in quantum mechanics, and defined as (setting $\hbar = 1$):

\[ \frac{1}{(2\pi)^3} \int dy \, e^{i y \cdot v} \, \rho\!\left( x + \tfrac{1}{2} y;\; x - \tfrac{1}{2} y \right) \]

where $\rho(x; z)$ is the kernel of a one-particle density matrix. Basically, the Wigner function is an equivalent way to describe a state of a quantum system. For instance, eqn [40] below expresses the equilibrium distributions for bosons and fermions in terms of Wigner functions. In general, the Wigner functions, due to the uncertainty principle, are real but not necessarily positive; however, the integral with respect to x and v gives the probability

distributions of the velocity and the position, respectively. In the kinetic regime, in which we are interested, the scales are mesoscopic, namely the typical quantum oscillations are on a scale much smaller than the characteristic scales of the problem, so that we expect that $f$ should be a genuine probability distribution, since the Heisenberg principle does not play an essential role. However, the interaction occurs on a microscopic scale, so that we expect that the statistics play a role in addition to the quantum rules for the scattering.

In this framework, the entropy functional is

\[ H(f) = \int dx \int dv \left[ f(x, v) \log f(x, v) \mp (1 \pm f(x, v)) \log(1 \pm f(x, v)) \right] \qquad [39] \]

It is decreasing along the solutions to eqn [36] and it is also minimized (among the distributions with given mass, momentum, and energy) by the equilibria

\[ M(v) = \frac{z}{e^{(\beta/2)|v - u|^2} \mp z} \qquad [40] \]

namely the Bose–Einstein and the Fermi–Dirac distributions, respectively. Here $\beta > 0$ and $z > 0$ are the inverse temperature and the activity, respectively. Note that, for the Bose–Einstein distribution, $z < 1$. This creates, in a sense, an inconsistency with eqn [36]. Indeed, assuming $u = 0$ and an initial distribution $f = f_0(v)$ with the density larger than the maximal density allowed by eqn [40], namely

\[ \rho_c := \int dv \, \frac{1}{e^{(\beta/2) v^2} - 1} \qquad [41] \]

it cannot converge to any equilibrium. In order to overcome this difficulty related to the Bose condensation, one can enlarge the definition of the equilibria family by setting

\[ M(v) = \frac{1}{e^{(\beta/2) v^2} - 1} + \mu\, \delta(v) \qquad [42] \]

to take care of excess of mass by means of a condensate component. However, it is not clear whether eqn [36] can actually describe the Bose condensation since its derivation from the Schrödinger equation requires, just from the very beginning, the existence of bosonic quasifree states which can be constructed only if the density is moderate. Further analyses are certainly needed to clarify the situation. A rigorous derivation of the Uehling and Uhlenbeck equation is, up to now, far from being obtained even for short times; nevertheless, such an equation is extensively used in the applications.

Equation [36] concerns a weakly interacting gas of quantum particles. From a mathematical viewpoint, it is expected to be valid in the so-called weak-coupling limit, which consists in scaling space and time and the interaction potential as

\[ x \to \varepsilon x, \qquad t \to \varepsilon t, \qquad V \to \sqrt{\varepsilon}\, V \qquad [43] \]

where $\varepsilon^{-1} = N^{1/3}$ is a parameter diverging when the number of particles $N$ tends to infinity.

We mention, incidentally, that under such a scaling, a classical system is described by a transport equation, called Fokker–Planck–Landau equation, with a diffusion operator in the velocity space.

The B–G limit considered for classical particle systems is different from that considered here for weakly interacting quantum systems. It is actually equivalent to rescaling space and time according to

\[ x \to \varepsilon x, \qquad t \to \varepsilon t \qquad [44] \]

leaving the interaction unscaled but, in order to control the total interaction, we make the density diverge gently, with $\varepsilon^{-1} = N^{1/2}$.

A quantum system under such a scaling is expected to be described by a Boltzmann equation [1] with the collision operator $Q$ computed with the full quantum cross section. Now we do not have any effect of the statistics because in this rarefaction limit these corrections disappear. On the other hand, the cross section is that arising from the analysis of the quantum scattering. Since we do not rescale the interaction, all the other terms in the Born expansion of the cross section play a role. This kind of Boltzmann equation is a good description of a rarefied gas in which quantum effects are not negligible.

See also: Adiabatic Piston; Evolution Equations: Linear and Nonlinear; Gravitational N-Body Problem (Classical); Interacting Particle Systems and Hydrodynamic Equations; Kinetic Equations; Multiscale Approaches; Nonequilibrium Statistical Mechanics: Dynamical Systems Approach; Quantum Dynamical Semigroups.

Further Reading

Balescu R (1978) Equilibrium and Nonequilibrium Statistical Mechanics. Moscow: Mir (distributed by Imported Publications, Chicago, Ill).
Caflisch RE (1980) The fluid dynamical limit of the nonlinear Boltzmann equation. Communications on Pure and Applied Mathematics 33: 651–666.
Caflisch RE and Nicolaenko B (1983) Shock waves and the Boltzmann equation. Nonlinear partial differential equations. Contemporary Mathematics 17: 35–44.
Carleman T (1933) Sur la théorie de l'équation intégro-differentielle de Boltzmann. Acta Mathematica 60: 91–146.
Cercignani C (1998) Ludwig Boltzmann. The Man Who Trusted Atoms. Oxford: Oxford University Press.
Cercignani C, Illner R, and Pulvirenti M (1994) The Mathematical Theory of Dilute Gases. Springer Series in Applied Mathematics, vol. 106. New York: Springer.

Di Perna RJ and Lions P-L (1989) On the Cauchy problem for the Boltzmann equations: Global existence and weak stability. Annals of Mathematics 130: 321–366.
Grad H (1949) On the kinetic theory of rarefied gases. Communications on Pure and Applied Mathematics 2: 331–407.
Hilbert D (1916) Begründung der Kinetischen Gastheorie. Mathematische Annalen 72: 331–407.
Lanford OE III (1975) Time evolution of large classical systems. In: Ehlers J, Hepp K, and Weidenmüller HA (eds.) Lecture Notes in Physics, vol. 38, pp. 1–111. Berlin: Springer.
Liu T-P and Yu S-H (2004) Boltzmann equation: micro–macro decompositions and positivity of shock profiles. Communications in Mathematical Physics 246(1): 133–179.
Spohn H (1994) Quantum kinetic equations. In: Fannes M, Maes C, and Verbeure A (eds.) On Three Levels: Micro, Meso and Macro Approaches in Physics. New York: Plenum.
Uehling EA and Uhlenbeck GE (1933) Transport phenomena in Einstein–Bose and Fermi–Dirac gases. I. Physical Review 43: 552–561.
Wigner EP (1932) On the quantum correction for thermodynamic equilibrium. Physical Review 40: 749–759.

Bose–Einstein Condensates
F Dalfovo, L P Pitaevskii, and S Stringari, Università di Trento, Povo, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

In 1924 the Indian physicist S N Bose introduced a new statistical method to derive the blackbody radiation law in terms of a gas of light quanta (photons). His work, together with de Broglie's contemporary idea of matter–wave duality, led A Einstein to apply the same statistical approach to a gas of N indistinguishable particles of mass m. An amazing result of his theory was the prediction that below some critical temperature a finite fraction of all the particles condenses into the lowest-energy single-particle state. This phenomenon, named Bose–Einstein condensation (BEC), is a consequence of purely statistical effects. For several years this prediction received little attention, until 1938, when F London argued that BEC could be at the basis of the superfluid properties observed in liquid $^4$He below 2.17 K. A strong boost to the investigation of Bose–Einstein condensates was given in 1995 by the observation of BEC in dilute gases confined in magnetic traps and cooled down to temperatures of the order of a few nK. Unlike superfluid helium, these gases allow one to tune the relevant parameters (confining potential, particle density, interactions, etc.), which makes them an ideal testing ground for concepts and theories on BEC.

What Is BEC?

In nature, particles have either integer or half-integer spin. Those having half-integer spin, like electrons, are called fermions and obey the Fermi–Dirac statistics; those having integer spin are called bosons and obey the Bose–Einstein statistics. Let us consider a system of N bosons. In order to introduce the concept of BEC on a general ground, one can start with the definition of the one-body density matrix

$$n^{(1)}(\mathbf{r},\mathbf{r}') = \langle \hat{\Psi}^\dagger(\mathbf{r})\,\hat{\Psi}(\mathbf{r}') \rangle \qquad [1]$$

The quantities $\hat{\Psi}^\dagger(\mathbf{r})$ and $\hat{\Psi}(\mathbf{r})$ are the field operators which create and annihilate a particle at point $\mathbf{r}$, respectively; they satisfy the bosonic commutation relations

$$[\hat{\Psi}(\mathbf{r}), \hat{\Psi}^\dagger(\mathbf{r}')] = \delta(\mathbf{r}-\mathbf{r}'), \qquad [\hat{\Psi}(\mathbf{r}), \hat{\Psi}(\mathbf{r}')] = 0 \qquad [2]$$

If the system is in a pure state described by the N-body wave function $\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_N)$, then the average [1] is taken following the standard rules of quantum mechanics and the one-body density matrix can be written as

$$n^{(1)}(\mathbf{r},\mathbf{r}') = N \int d\mathbf{r}_2 \cdots d\mathbf{r}_N\, \Psi^*(\mathbf{r},\mathbf{r}_2,\ldots,\mathbf{r}_N)\,\Psi(\mathbf{r}',\mathbf{r}_2,\ldots,\mathbf{r}_N) \qquad [3]$$

involving the integration over the $N-1$ variables $\mathbf{r}_2,\ldots,\mathbf{r}_N$. In the more general case of a statistical mixture of pure states, expression [3] must be averaged according to the probability for a system to occupy the different states.

Since $n^{(1)}(\mathbf{r},\mathbf{r}') = (n^{(1)}(\mathbf{r}',\mathbf{r}))^*$, the quantity $n^{(1)}$, when regarded as a matrix function of its indices $\mathbf{r}$ and $\mathbf{r}'$, is Hermitian. It is therefore always possible to find a complete orthonormal basis of single-particle eigenfunctions, $\varphi_i(\mathbf{r})$, in terms of which the density matrix takes the diagonal form

$$n^{(1)}(\mathbf{r},\mathbf{r}') = \sum_i n_i\, \varphi_i^*(\mathbf{r})\,\varphi_i(\mathbf{r}') \qquad [4]$$

The real eigenvalues $n_i$ are subject to the normalization condition $\sum_i n_i = N$ and have the meaning of occupation numbers of the single-particle states $\varphi_i$. BEC occurs when one of these numbers (say, $n_0$) becomes macroscopic, that is, when $n_0 \equiv N_0$ is a number of order N, all the others remaining of order 1.
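
The criterion above can be illustrated numerically. The following is a minimal sketch, assuming a finite-dimensional toy model in which the coordinates $\mathbf{r}$, $\mathbf{r}'$ are replaced by a few discrete indices; the function names, the random orbitals, and the chosen occupations are hypothetical and serve only to show how the occupation numbers and the ratio $N_0/N$ follow from diagonalizing a Hermitian $n^{(1)}$ of the form [4].

```python
import numpy as np

def toy_density_matrix(occupations, seed=0):
    """Build n1[r, r'] = sum_i n_i * conj(phi_i[r]) * phi_i[r'] (the form of eqn [4])
    from random orthonormal orbitals. Purely illustrative, not a physical model."""
    rng = np.random.default_rng(seed)
    dim = len(occupations)
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(a)              # columns of q are orthonormal "orbitals" phi_i
    occ = np.asarray(occupations, dtype=float)
    return (q.conj() * occ) @ q.T       # scales column i by n_i, then sums over i

def occupation_numbers(n1):
    """Return the eigenvalues of a Hermitian one-body density matrix, largest first."""
    if not np.allclose(n1, n1.conj().T):
        raise ValueError("the one-body density matrix must be Hermitian")
    return np.linalg.eigvalsh(n1)[::-1]  # eigvalsh returns ascending order

# Assumed toy occupations: N = 1000 particles, 900 of them in a single orbital.
occ = [900.0, 40.0, 30.0, 20.0, 10.0]
n1 = toy_density_matrix(occ)
vals = occupation_numbers(n1)
total_n = vals.sum()                     # normalization: sum_i n_i = N
largest_fraction = vals[0] / total_n     # N0 / N
print(vals, total_n, largest_fraction)
```

In this toy example one eigenvalue is of order N while the others stay small, which is the situation the text identifies with BEC; in a real calculation the matrix would come from eqn [1] or [3] rather than from prescribed occupations.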
In this case eqn [4] can be conveniently rewritten in the form

$$n^{(1)}(\mathbf{r},\mathbf{r}') = N_0\, \varphi_0^*(\mathbf{r})\,\varphi_0(\mathbf{r}') + \sum_{i\neq 0} n_i\, \varphi_i^*(\mathbf{r})\,\varphi_i(\mathbf{r}') \qquad [5]$$

and the state represented by $\varphi_0(\mathbf{r})$ is called the Bose–Einstein condensate. This definition is rather general, since it applies to any macroscopic ($N \gg 1$) system of indistinguishable bosons independently of mutual interactions and external fields.

The one-body density matrix [1] contains information on important physical observables. By setting $\mathbf{r} = \mathbf{r}'$ one finds the diagonal density of the system

$$n(\mathbf{r}) \equiv n^{(1)}(\mathbf{r},\mathbf{r}) = \langle \hat{\Psi}^\dagger(\mathbf{r})\,\hat{\Psi}(\mathbf{r}) \rangle \qquad [6]$$

with $N = \int d\mathbf{r}\, n(\mathbf{r})$. The off-diagonal components can instead be used to calculate the momentum distribution

$$n(\mathbf{p}) = \langle \hat{\Psi}^\dagger(\mathbf{p})\,\hat{\Psi}(\mathbf{p}) \rangle \qquad [7]$$

where $\hat{\Psi}(\mathbf{p}) = (2\pi\hbar)^{-3/2} \int d\mathbf{r}\, \hat{\Psi}(\mathbf{r}) \exp[i\mathbf{p}\cdot\mathbf{r}/\hbar]$ is the field operator in momentum representation. By inserting this expression for $\hat{\Psi}(\mathbf{p})$ into eqn [7] one finds

$$n(\mathbf{p}) = \frac{1}{(2\pi\hbar)^3} \int d\mathbf{R}\, d\mathbf{s}\; n^{(1)}\!\left(\mathbf{R}+\frac{\mathbf{s}}{2},\, \mathbf{R}-\frac{\mathbf{s}}{2}\right) e^{-i\mathbf{p}\cdot\mathbf{s}/\hbar} \qquad [8]$$

where $\mathbf{s} = \mathbf{r}-\mathbf{r}'$ and $\mathbf{R} = (\mathbf{r}+\mathbf{r}')/2$.

Let us consider a uniform system of N particles in a volume V and take the thermodynamic limit $N, V \to \infty$ with density N/V kept fixed. The eigenfunctions of the density matrix are plane waves and the lowest-energy state has zero momentum, $\mathbf{p} = 0$, and constant wave function $\varphi_0(\mathbf{r}) = V^{-1/2}$. BEC in this state implies a macroscopic number of particles having zero momentum and constant density $N_0/V$. The density matrix only depends on $\mathbf{s} = \mathbf{r}-\mathbf{r}'$ and can be written as

$$n^{(1)}(\mathbf{s}) = \frac{N_0}{V} + \frac{1}{V} \sum_{\mathbf{p}\neq 0} n_{\mathbf{p}}\, e^{i\mathbf{p}\cdot\mathbf{s}/\hbar} \qquad [9]$$

In the $s \to \infty$ limit, the sum on the right vanishes due to destructive interference between different plane waves, but the first term survives. One thus finds that, in the presence of BEC, the one-body density matrix tends to a constant finite value at large distances. This behavior is named off-diagonal long-range order, since it involves the off-diagonal components of the density matrix. Its counterpart in momentum space is the appearance of a singular term at $\mathbf{p} = 0$:

$$n(\mathbf{p}) = N_0\, \delta(\mathbf{p}) + \sum_{\mathbf{p}'\neq 0} n_{\mathbf{p}'}\, \delta(\mathbf{p}-\mathbf{p}') \qquad [10]$$

The sum on the right, integrated over $\mathbf{p}$, gives the number of noncondensed particles ($N - N_0$), and the quantity $N_0/N$ is called the condensate fraction.

If the system is not uniform, the eigenfunctions of the density matrix are no longer plane waves but, provided N is sufficiently large, the concept of BEC is still well defined, being associated with the occurrence of a macroscopic occupation of a single-particle eigenfunction $\varphi_0(\mathbf{r})$ of the density matrix. Thus, the condensed bosons can be described by means of the function $\Psi(\mathbf{r}) = \sqrt{N_0}\,\varphi_0(\mathbf{r})$, which is a classical complex field playing the role of an order parameter. This is the analog of the classical limit of quantum electrodynamics, where the electromagnetic field replaces the microscopic description of photons. The function $\Psi$ may also depend on time and can be written as

$$\Psi(\mathbf{r},t) = |\Psi(\mathbf{r},t)|\, e^{iS(\mathbf{r},t)} \qquad [11]$$

Its modulus determines the contribution of the condensate to the diagonal density [6], while the phase S is crucial in characterizing the coherence and superfluid properties of the system. The order parameter [11], also named macroscopic wave function or condensate wave function, is defined only up to a constant phase factor. One can always multiply this function by the numerical factor $e^{i\alpha}$ without changing any physical property. This reflects the gauge symmetry exhibited by all the physical equations of the problem. Making an explicit choice for the value of the order parameter, and hence for the phase, corresponds to a formal breaking of gauge symmetry.

BEC in Ideal Gases

Once we have defined what a Bose–Einstein condensate is, the next question is when such a condensation occurs in a given system. The ideal Bose gas provides the simplest example. So, let us consider a gas of noninteracting bosons described by the Hamiltonian $\hat{H} = \sum_i \hat{H}_i^{(1)}$, where the Schrödinger equation $\hat{H}^{(1)} \varphi_i(\mathbf{r}) = \epsilon_i\, \varphi_i(\mathbf{r})$ gives the spectrum of single-particle wave functions and energies. One can define an occupation number $n_i$ as the number of particles in the state with energy $\epsilon_i$. Thus, any given state of the many-body system is specified by a set $\{n_i\}$. The mean occupation numbers, $\bar{n}_i$, can be calculated by using the standard rules of statistical mechanics. For instance, by considering a grand canonical ensemble at temperature T, one finds

$$\bar{n}_i = \{\exp[\beta(\epsilon_i - \mu)] - 1\}^{-1} \qquad [12]$$
with $\beta = 1/(k_B T)$. The chemical potential $\mu$ is fixed by the normalization condition $\sum_i \bar{n}_i = N$, where N is the average number of particles in the gas. For $T \to \infty$ the chemical potential is large and negative. It increases monotonically when T is lowered. Let us call $\epsilon_0$ the lowest single-particle level in the spectrum. If at some critical temperature $T_c$ the normalization condition can be satisfied with $\mu \to \epsilon_0$, then the occupation of the lowest state, $\bar{n}_0 = N_0$, becomes of order N and BEC is realized. Below $T_c$ the normalization condition must be replaced with $N = N_0 + N_T$, where $N_T = \sum_{i\neq 0} \bar{n}_i$ is the number of particles out of the condensate, that is, the thermal component of the gas. Whether BEC occurs or not, and the value of $T_c$, depend on the dimensionality of the system and the type of single-particle spectrum.

The simplest case is that of a gas confined in a cubic box of volume $V = L^3$ with periodic boundary conditions, where $\hat{H}^{(1)} = -(\hbar^2/2m)\nabla^2$. The eigenfunctions are plane waves $\varphi_{\mathbf{p}}(\mathbf{r}) = V^{-1/2} \exp[i\mathbf{p}\cdot\mathbf{r}/\hbar]$, with energy $\epsilon_p = p^2/2m$ and momentum $\mathbf{p} = 2\pi\hbar\,\mathbf{n}/L$. Here $\mathbf{n}$ is a vector whose components $n_x, n_y, n_z$ are 0 or positive or negative integers. The lowest eigenvalue has zero energy ($\epsilon_0 = 0$) and zero momentum. The mean occupation numbers are given by $\bar{n}_{\mathbf{p}} = \{\exp[\beta(p^2/2m - \mu)] - 1\}^{-1}$. In the thermodynamic limit ($N, V \to \infty$ with N/V kept constant), one can replace the sum $\sum_{\mathbf{p}}$ with the integral $\int d\epsilon\, \rho(\epsilon)$, where $\rho(\epsilon) = (2\pi)^{-2}\, V\, (2m/\hbar^2)^{3/2} \sqrt{\epsilon}$ is the density of states. In this way, one can calculate the thermal component of the gas as a function of T, finding the critical temperature

$$k_B T_c = \frac{2\pi\hbar^2}{m} \left( \frac{N}{V\,\zeta(3/2)} \right)^{2/3} \qquad [13]$$

where $\zeta$ is the Riemann zeta function and $\zeta(3/2) \simeq 2.612$. For $T > T_c$, one has $\mu < 0$ and $N_T = N$. For $T < T_c$ one instead has $\mu = 0$, $N_T = N - N_0$ and

$$N_0(T) = N\,[1 - (T/T_c)^{3/2}] \qquad [14]$$

The critical temperature turns out to be fully determined by the density N/V and by the mass of the constituents. These results were first obtained by A Einstein in his seminal paper and used by F London in the context of superfluid helium. We notice that the replacement of the sum with an integral in the above derivation is justified only if the thermal energy $k_B T$ is much larger than the energy spacing between single-particle levels, that is, if $k_B T \gg \hbar^2/(2mV^{2/3})$. It is also worth noticing that the above expression for $T_c$ can be written as $\lambda_T^3\, N/V \simeq 2.612$, where $\lambda_T = [2\pi\hbar^2/(m k_B T)]^{1/2}$ is the thermal de Broglie wavelength. This is equivalent to saying that BEC occurs when the mean distance between bosons is of the order of their de Broglie wavelength.

Another interesting case, which is relevant for the recent experiments with BEC in dilute gases confined in magnetic and/or optical traps, is that of an ideal gas subject to harmonic potentials. Let us consider, for simplicity, an isotropic external potential $V_{\rm ext}(\mathbf{r}) = (1/2)\, m\, \omega_{\rm ho}^2\, r^2$. The single-particle Hamiltonian is $\hat{H}^{(1)} = -(\hbar^2/2m)\nabla^2 + V_{\rm ext}(\mathbf{r})$ and its eigenvalues are $\epsilon_{n_x,n_y,n_z} = (n_x + n_y + n_z + 3/2)\,\hbar\omega_{\rm ho}$. The corresponding density of states is $\rho(\epsilon) = (1/2)(\hbar\omega_{\rm ho})^{-3}\, \epsilon^2$. A natural thermodynamic limit for this system is obtained by letting $N \to \infty$ and $\omega_{\rm ho} \to 0$, while keeping the product $N\omega_{\rm ho}^3$ constant. The condition for BEC to occur is that $\mu$ approaches the value $\epsilon_{000} = (3/2)\hbar\omega_{\rm ho}$ from below by cooling the gas down to $T_c$. Following the same procedure as for the uniform gas, one finds

$$k_B T_c = \hbar\omega_{\rm ho}\, [N/\zeta(3)]^{1/3} = 0.94\, \hbar\omega_{\rm ho}\, N^{1/3} \qquad [15]$$

and

$$N_0(T) = N\,[1 - (T/T_c)^3] \qquad [16]$$

Notice that the condensate is not uniform in this case, since it corresponds to the lowest eigenfunction of the harmonic oscillator, which is a Gaussian of width $a_{\rm ho} = [\hbar/(m\omega_{\rm ho})]^{1/2}$. Correspondingly, the condensate in momentum space is also a Gaussian, of width $\propto a_{\rm ho}^{-1}$. This implies that, unlike the gas in a box, here the condensate can be seen both in coordinate and momentum space in the form of a narrow distribution emerging from a wider thermal component. Finally, results [15] and [16] remain valid even for anisotropic harmonic potentials, with trapping frequencies $\omega_x$, $\omega_y$, and $\omega_z$, provided the frequency $\omega_{\rm ho}$ is replaced by the geometric average $(\omega_x \omega_y \omega_z)^{1/3}$.

BEC in Interacting Gases

Actual condensates are made of interacting particles. The full many-body Hamiltonian is

$$\hat{H} = \int d\mathbf{r}\, \hat{\Psi}^\dagger(\mathbf{r})\, \hat{H}_0\, \hat{\Psi}(\mathbf{r}) + \frac{1}{2} \int d\mathbf{r}'\, d\mathbf{r}\; \hat{\Psi}^\dagger(\mathbf{r})\, \hat{\Psi}^\dagger(\mathbf{r}')\, V(\mathbf{r}-\mathbf{r}')\, \hat{\Psi}(\mathbf{r}')\, \hat{\Psi}(\mathbf{r}) \qquad [17]$$

where $V(\mathbf{r}-\mathbf{r}')$ is the particle–particle interaction and $\hat{H}_0 = -(\hbar^2/2m)\nabla^2 + V_{\rm ext}(\mathbf{r})$. Unlike the case of ideal gases, $\hat{H}$ is no longer a sum of single-particle Hamiltonians. However, the general definitions given in the section "What Is BEC?" are still valid. In particular, the one-body density matrix, in the presence of BEC, can be separated as in eqn [5]. One
can write $n^{(1)}(\mathbf{r},\mathbf{r}') = \Psi^*(\mathbf{r})\Psi(\mathbf{r}') + \tilde{n}^{(1)}(\mathbf{r},\mathbf{r}')$, where $\Psi$ is the order parameter of the condensate ($\Psi^*(\mathbf{r})\Psi(\mathbf{r}')$ being of order N), while $\tilde{n}^{(1)}(\mathbf{r},\mathbf{r}')$ vanishes for large $|\mathbf{r}-\mathbf{r}'|$. This is equivalent to saying that the bosonic field operator splits into two parts,

$$\hat{\Psi}(\mathbf{r}) = \Psi(\mathbf{r}) + \delta\hat{\Psi}(\mathbf{r}) \qquad [18]$$

where the first term is a complex function and the second one is the field operator associated with the noncondensed particles. This decomposition is particularly useful when the depletion of the condensate, that is, the fraction of noncondensed particles, is small. This happens when the interaction is weak, but also for particles with arbitrary interaction, provided the gas is dilute. In this case, one can expand the many-body Hamiltonian by treating the operator $\delta\hat{\Psi}$ as a small quantity.

A suitable strategy consists in writing the Heisenberg equation for the evolution of the field operators, $i\hbar\,\partial_t \hat{\Psi} = [\hat{\Psi}, \hat{H}]$, using the many-body Hamiltonian [17]:

$$i\hbar\,\partial_t \hat{\Psi}(\mathbf{r},t) = \left[ \hat{H}_0 + \int d\mathbf{r}'\, \hat{\Psi}^\dagger(\mathbf{r}',t)\, V(\mathbf{r}-\mathbf{r}')\, \hat{\Psi}(\mathbf{r}',t) \right] \hat{\Psi}(\mathbf{r},t) \qquad [19]$$

The zeroth order is thus obtained by replacing the operator $\hat{\Psi}$ with the classical field $\Psi$. In the integral containing the interaction $V(\mathbf{r}-\mathbf{r}')$, this replacement is, in general, a poor approximation when short distances $(\mathbf{r}-\mathbf{r}')$ are involved. In a dilute and cold gas, one can nevertheless obtain a proper expression for the interaction term by observing that, in this case, only binary collisions at low energy are relevant and these collisions are characterized by a single parameter, the s-wave scattering length, a, independently of the details of the two-body potential. This allows one to replace $V(\mathbf{r}-\mathbf{r}')$ in $\hat{H}$ with an effective interaction $V(\mathbf{r}-\mathbf{r}') = g\,\delta(\mathbf{r}-\mathbf{r}')$, where the coupling constant g is given by $g = 4\pi\hbar^2 a/m$. The scattering length can be measured with several experimental techniques or calculated from the exact two-body potential. Using this pseudopotential and replacing the operator $\hat{\Psi}$ with the complex function $\Psi$ in the Heisenberg equation of motion, one gets

$$i\hbar\,\partial_t \Psi(\mathbf{r},t) = \left( -\frac{\hbar^2\nabla^2}{2m} + V_{\rm ext}(\mathbf{r}) + g\,|\Psi(\mathbf{r},t)|^2 \right) \Psi(\mathbf{r},t) \qquad [20]$$

This is known as the Gross–Pitaevskii (GP) equation and it was first introduced in 1961. It has the form of a nonlinear Schrödinger equation, the nonlinearity coming from the mean-field term, proportional to $|\Psi|^2$. It has been derived assuming that N is large while the fraction of noncondensed atoms is negligible. On the one hand, this means that quantum fluctuations of the field operator have to be small, which is true when $n|a|^3 \ll 1$, where n is the particle density. In fact, one can show that, at $T = 0$, the quantum depletion of the condensate is proportional to $(n|a|^3)^{1/2}$. On the other hand, thermal fluctuations also have to be negligible, and this means that the theory is limited to temperatures much lower than $T_c$. Within these limits, one can identify the total density with the condensate density.

The stationary solution of eqn [20] corresponds to the condensate wave function in the ground state. One can write $\Psi(\mathbf{r},t) = \Psi_0(\mathbf{r}) \exp(-i\mu t/\hbar)$, where $\mu$ is the chemical potential. Then the GP equation [20] becomes

$$\left( -\frac{\hbar^2\nabla^2}{2m} + V_{\rm ext}(\mathbf{r}) + g\,|\Psi_0(\mathbf{r})|^2 \right) \Psi_0(\mathbf{r}) = \mu\, \Psi_0(\mathbf{r}) \qquad [21]$$

where $n(\mathbf{r}) = |\Psi_0(\mathbf{r})|^2$ is the particle density. The same equation can be obtained by minimizing, at fixed N, the energy of the system written as a functional of the density:

$$E[n] = \int d\mathbf{r} \left[ \frac{\hbar^2}{2m} |\nabla\sqrt{n}|^2 + n\,V_{\rm ext}(\mathbf{r}) + \frac{g n^2}{2} \right] \qquad [22]$$

The first term on the right corresponds to the quantum kinetic energy coming from the uncertainty principle; it is usually named "quantum pressure" and vanishes for uniform systems.

The next order in $\delta\hat{\Psi}$ gives the excited states of the condensate. In a uniform gas the ground-state order parameter, $\Psi_0$, is a constant and the first-order expansion of $\hat{H}$ was introduced by N Bogoliubov in 1947. In particular, he found an elegant way to diagonalize the Hamiltonian by using simple linear combinations of particle creation and annihilation operators. These are known as Bogoliubov's transformations and lie at the basis of the concept of quasiparticle, one of the most important concepts in quantum many-body theory.

A generalization of Bogoliubov's approach to the case of nonuniform condensates is obtained by considering small deviations around the ground state in the form

$$\Psi(\mathbf{r},t) = e^{-i\mu t/\hbar} \left[ \Psi_0(\mathbf{r}) + u(\mathbf{r})\, e^{-i\omega t} + v^*(\mathbf{r})\, e^{i\omega t} \right] \qquad [23]$$

Inserting this expression into eqn [20] and keeping terms linear in the complex functions u and v, one gets

$$\hbar\omega\, u(\mathbf{r}) = [\hat{H}_0 - \mu + 2g\Psi_0^2(\mathbf{r})]\, u(\mathbf{r}) + g\Psi_0^2(\mathbf{r})\, v(\mathbf{r}) \qquad [24]$$

$$-\hbar\omega\, v(\mathbf{r}) = [\hat{H}_0 - \mu + 2g\Psi_0^2(\mathbf{r})]\, v(\mathbf{r}) + g\Psi_0^2(\mathbf{r})\, u(\mathbf{r}) \qquad [25]$$
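
As a worked illustration of eqns [24] and [25], consider the simplest case of a uniform gas without external potential, where $\Psi_0^2 = n$ is constant, $\mu = gn$ follows from eqn [21], and u and v can be taken as plane waves of wave vector q, so that $\hat{H}_0$ acts as the number $\epsilon_q = \hbar^2 q^2/2m$. The two equations then reduce to a 2×2 eigenvalue problem. The sketch below solves it numerically and compares the result with the root $[\epsilon_q(\epsilon_q + 2gn)]^{1/2}$ obtained by hand from the 2×2 determinant. The function name and the choice of units (energies measured in units of gn) are assumptions made for illustration.

```python
import numpy as np

def bogoliubov_energies(eps_q, gn):
    """Eigenvalues hbar*omega of eqns [24]-[25] for a uniform gas (illustrative sketch).

    eps_q : free-particle energy hbar^2 q^2 / (2m)
    gn    : mean-field energy g*n, equal to mu in the uniform ground state
    """
    # With Psi_0^2 = n and mu = g*n the equations read
    #   hbar*omega * u =  (eps_q + gn) * u + gn * v
    #   hbar*omega * v = -gn * u - (eps_q + gn) * v
    matrix = np.array([[eps_q + gn, gn],
                       [-gn, -(eps_q + gn)]])
    return np.sort(np.linalg.eigvals(matrix).real)  # (negative root, positive root)

gn = 1.0  # assumed units: all energies measured in units of g*n
results = []
for eps_q in (0.01, 1.0, 100.0):
    negative_root, positive_root = bogoliubov_energies(eps_q, gn)
    by_hand = np.sqrt(eps_q * (eps_q + 2.0 * gn))
    results.append((eps_q, negative_root, positive_root, by_hand))
    print(f"eps_q = {eps_q:7.2f}   numerical = {positive_root:.6f}   by hand = {by_hand:.6f}")
```

The eigenvalues come in pairs of opposite sign, reflecting the symmetry of the coupled equations under the exchange of u and v together with $\omega \to -\omega$. For $\epsilon_q \gg gn$ the positive root approaches $\epsilon_q + gn$, while for $\epsilon_q \ll gn$ it scales as $\sqrt{2gn\,\epsilon_q}$, that is, linearly in q.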
These coupled equations allow one to calculate the energies $\varepsilon = \hbar\omega$ of the excitations. They also give the so-called quasiparticle amplitudes u and v, which obey the normalization condition

$$\int d\mathbf{r}\, [u_i^*(\mathbf{r})\, u_j(\mathbf{r}) - v_i^*(\mathbf{r})\, v_j(\mathbf{r})] = \delta_{ij}$$

In a uniform gas, u and v are plane waves and one recovers the famous Bogoliubov spectrum

$$\hbar\omega = \left[ \frac{\hbar^2 q^2}{2m} \left( \frac{\hbar^2 q^2}{2m} + 2gn \right) \right]^{1/2} \qquad [26]$$

where q is the wave vector of the excitations. For large momenta the spectrum coincides with the free-particle energy $\hbar^2 q^2/2m$. At low momenta, it instead gives the phonon dispersion $\omega = cq$, where $c = [gn/m]^{1/2}$ is the Bogoliubov sound velocity. The transition between the two regimes occurs when the excitation wavelength is of the order of the healing length,

$$\xi = [8\pi n a]^{-1/2} = \hbar/(mc\sqrt{2}) \qquad [27]$$

which is an important length scale for superfluidity. When the order parameter is forced to vanish at some point (by an impurity, a wall, etc.), the healing length provides the typical distance over which it recovers its bulk value. In a nonuniform condensate the excitations are no longer plane waves but, at low energy, they still have a phonon-like character, in the sense that they involve a collective motion of the condensate.

The GP equation [20] is the starting point for an accurate mean-field description of BEC in dilute cold gases, which is rigorous at $T = 0$ and for $n|a|^3 \ll 1$. Static and dynamic properties of condensates in different geometries can be calculated by solving the GP equation numerically or by using suitable approximate methods. The inclusion of effects beyond mean field is a highly nontrivial and interesting problem. A rather extreme case is represented by liquid $^4$He, which is a dense system where the interaction between atoms causes a large depletion of the condensate even at $T = 0$ ($N_0/N$ being less than 10%) and thus a full many-body treatment is required for its rigorous description. Nevertheless, even in this case, the general definitions of the section "What Is BEC?" are still useful.

Superfluidity and Coherence

With the word superfluidity, one summarizes a complex of macroscopic phenomena occurring in quantum fluids under particular conditions: persistent currents, equilibrium states at rest in rotating vessels, motion without viscosity, quantized vorticity, and others. These features can also be observed in BEC. The link between BEC and superfluidity is given by the phase of the order parameter [11]. To understand this point, let us consider a uniform system. If $\hat{\Psi}(\mathbf{r},t)$ is a solution of the Heisenberg equation [19] with $V_{\rm ext} = 0$, then

$$\hat{\Psi}'(\mathbf{r},t) = \hat{\Psi}(\mathbf{r}-\mathbf{v}t,\, t)\, \exp\!\left[ \frac{i}{\hbar} \left( m\mathbf{v}\cdot\mathbf{r} - \frac{1}{2} m v^2 t \right) \right] \qquad [28]$$

where $\mathbf{v}$ is a constant vector, is also a solution. This equation gives the Galilean transformation of the field operator and also applies to its condensate component $\Psi$. At equilibrium, the ground-state order parameter is given by $\Psi_0 = \sqrt{n}\, \exp(-i\mu t/\hbar)$, where n is a constant independent of $\mathbf{r}$. In a frame where the condensate moves with velocity $\mathbf{v}$, the order parameter instead takes the form $\Psi_0 = \sqrt{n}\, \exp(iS)$, with $S(\mathbf{r},t) = \hbar^{-1}[m\mathbf{v}\cdot\mathbf{r} - (mv^2/2 + \mu)t]$. The velocity of the condensate can thus be identified with the gradient of the phase S:

$$\mathbf{v}(\mathbf{r},t) = \frac{\hbar}{m} \nabla S(\mathbf{r},t) \qquad [29]$$

This definition is also valid for $\mathbf{v}$ varying slowly in space and time. The modulus of the order parameter plays a minor role in this definition and it is not necessary to assume the gas to be dilute and close to $T = 0$. Indeed, the relation [29] between the velocity field and the phase of the order parameter also applies in the presence of large quantum depletion, as in superfluid $^4$He, and at $T \neq 0$. In this case, n should not be identified with the condensate density. Conversely, in dilute gases at $T = 0$, n is the condensate density and the velocity [29] can be simply obtained by applying the usual definition of the current density operator, $\hat{\mathbf{j}}$, to the order parameter [11].

The velocity [29] describes a potential flow and corresponds to a collective motion of many particles occupying a single quantum state. Being equal to the gradient of a scalar function, it is irrotational ($\nabla \times \mathbf{v}_s = 0$) and satisfies the Onsager–Feynman quantization condition $\oint \mathbf{v}_s \cdot d\mathbf{l} = \kappa\, h/m$, with $\kappa$ a non-negative integer. These conditions are not satisfied by a classical fluid, where the hydrodynamic velocity field, $\mathbf{v}(\mathbf{r},t) = \mathbf{j}(\mathbf{r},t)/n(\mathbf{r},t)$, is the average over many different states and does not correspond to a potential flow.

By using the definition of the phase S and velocity $\mathbf{v}$, together with particle conservation, one can show that the dynamics of a condensate, as far as macroscopic motions are concerned, is governed by the hydrodynamic equations of an irrotational
nonviscous fluid. Within the mean-field theory, this can be easily seen by rewriting the GP equation [20] in terms of the density $n = |\Psi|^2$ and the velocity [29]. Neglecting the quantum pressure term, which involves $\nabla^2\sqrt{n}$ (hence limiting the description to length scales larger than the healing length $\xi$), one gets

$$\frac{\partial}{\partial t} n + \nabla \cdot (\mathbf{v} n) = 0 \qquad [30]$$

and

$$m \frac{\partial}{\partial t} \mathbf{v} + \nabla \left( V_{\rm ext} + \mu(n) + \frac{m v^2}{2} \right) = 0 \qquad [31]$$

with the local chemical potential $\mu(n) = gn$. These equations have the typical structure of the dynamic equations of superfluids at zero temperature and can be viewed as the $T = 0$ case of the more general Landau two-fluid theory.

One of the most striking pieces of evidence of superfluidity is the observation of quantized vortices, that is, vortices obeying the Onsager–Feynman quantization condition. A vast literature is devoted to vortices in superfluid helium and, more recently, vortices have also been produced and studied in condensates of ultracold gases, including nice configurations of many vortices in regular triangular lattices, similar to the Abrikosov lattices in superconductors. Other phenomena, such as the reduction of the moment of inertia, the occurrence of Josephson tunneling through barriers, the existence of thresholds for dissipative processes (Landau criterion), and others, are typical subjects of intense investigation.

Another important consequence of the fact that BEC is described by an order parameter with a well-defined phase is the occurrence of coherence effects; in other words, condensates behave like matter waves. For instance, one can measure the phase difference between two condensates by means of interference. This can be done in coordinate space by confining two condensates in two potential minima, a and b, at a distance d. Let us take d along z and assume that, at $t = 0$, the order parameter is given by the linear combination $\Psi(\mathbf{r}) = \Psi_a(\mathbf{r}) + \exp(i\phi)\,\Psi_b(\mathbf{r})$ with $\Psi_a$ and $\Psi_b$ real and without overlap. Then let us switch off the confining potentials so that the condensates expand and overlap. If the overlap occurs when the density is small enough to neglect interactions, the motion is ballistic and the phase of each condensate evolves as $S(\mathbf{r},t) \simeq m r^2/(2\hbar t)$, so that $\mathbf{v} = \mathbf{r}/t$. This implies a relative phase $\phi + S(x,y,z+d/2) - S(x,y,z-d/2) = \phi + mdz/(\hbar t)$. The total density $n = |\Psi|^2$ thus exhibits periodic modulations along z with wavelength $ht/(md) = 2\pi\hbar t/(md)$. This interference pattern has indeed been observed in condensates of ultracold atoms. In these systems it was also possible to measure the coherence length, that is, the distance $|\mathbf{r}-\mathbf{r}'|$ at which the one-body density matrix vanishes and the phase of the order parameter is no longer well defined. In most situations, the coherence length turns out to be of the order of, or larger than, the size of the condensates. However, interesting situations exist where the coherence length is shorter but the system still preserves some features of BEC (quasicondensates).

Final Remarks

Bose–Einstein condensates of ultracold atoms are easily manipulated by changing and tuning the external potentials. This means, for instance, that one can prepare condensates in different geometries, including very elongated (quasi-1D) or disk-shaped (quasi-2D) condensates. This is conceptually important, since BEC in lower dimensions is not as simple as in three dimensions: thermal and quantum fluctuations play a crucial role, superfluidity must be properly redefined, and very interesting limiting cases can be explored (Tonks–Girardeau regime, Luttinger liquid, etc.). Another possibility is to use laser beams to produce standing waves acting as an external periodic potential (optical lattice). Condensates in optical lattices behave as a sort of perfect crystal, whose properties are the analog of the dynamic and transport properties in solid-state physics, but with controllable spacing between sites, no defects, and tunable lattice geometry. One can investigate the role of phase coherence in the lattice, looking, for instance, at Josephson effects as in a chain of junctions. By tuning the lattice depth one can explore the transition from a superfluid phase to a Mott-insulator phase, which is a nice example of a quantum phase transition. Controlling cold atoms in optical lattices can be a good starting point for applications in quantum engineering, interferometry, and quantum information.

Another interesting aspect of BECs is that the key equation for their description in mean-field theory, namely the GP equation [20], is a nonlinear Schrödinger equation very similar to the ones commonly used, for instance, in nonlinear quantum optics. This opens interesting perspectives in exploiting the analogies between the two fields, such as the occurrence of dynamical and parametric instabilities, the possibility of creating different types of solitons, and the occurrence of nonlinear processes like, for example, higher harmonic generation and mode mixing.

A relevant part of the current research also involves systems made of mixtures of different gases, Bose–Bose or Fermi–Bose, and many activities with ultracold atoms now involve fermionic gases, where BEC can
also be realized by condensing molecules of fermionic pairs. An extremely active line of research now concerns the BCS–BEC crossover, which can be obtained in Fermi gases by tuning the scattering length (and hence the interaction) by means of Feshbach resonances.

Ten years after the first observation of BEC in ultracold gases, it is almost impossible to summarize all the research done in this field. A large amount of work has already been devoted to characterizing the condensates and several new lines have been opened. Rather detailed review articles and books are already available for the interested reader.

See also: Interacting Particle Systems and Hydrodynamic Equations; Quantum Phase Transitions; Quantum Statistical Mechanics: Overview; Renormalization: Statistical Mechanics and Condensed Matter; Superfluids; Variational Techniques for Ginzburg–Landau Energies.

Further Reading

Cornell EA and Wieman CE (2002) Nobel lecture: Bose–Einstein condensation in a dilute gas, the first 70 years and some recent experiments. Reviews of Modern Physics 74: 875.
Dalfovo F, Giorgini S, Pitaevskii LP, and Stringari S (1999) Theory of Bose–Einstein condensation in trapped gases. Reviews of Modern Physics 71: 463.
Griffin A, Snoke DW, and Stringari S (1995) Bose–Einstein Condensation. Cambridge: Cambridge University Press.
Huang K (1987) Statistical Mechanics, 2nd edn. New York: Wiley.
Inguscio M, Stringari S, and Wieman CE (1999) Bose–Einstein Condensation in Atomic Gases, Proceedings of the International School of Physics "Enrico Fermi," Course CXL. Amsterdam: IOS Press.
Ketterle W (2002) Nobel lecture: when atoms behave as waves: Bose–Einstein condensation and the atom laser. Reviews of Modern Physics 74: 1131.
Landau LD and Lifshitz EM (1980) Statistical Physics, Part 1. Oxford: Pergamon Press.
Leggett AJ (2001) Bose–Einstein condensation in the alkali gases: some fundamental concepts. Reviews of Modern Physics 73: 307.
Lifshitz EM and Pitaevskii LP (1980) Statistical Physics, Part 2. Oxford: Pergamon Press.
Pethick CJ and Smith H (2002) Bose–Einstein Condensation in Dilute Gases. Cambridge: Cambridge University Press.
Pitaevskii LP and Stringari S (2003) Bose–Einstein Condensation. Oxford: Clarendon Press.

Bosons and Fermions in External Fields


E Langmann, KTH Physics, Stockholm, Sweden
© 2006 Elsevier Ltd. All rights reserved.

Introduction

In this article we discuss quantum theories which describe systems of nondistinguishable particles interacting with external fields. Such models are of interest also in the nonrelativistic case (in quantum statistical mechanics, nuclear physics, etc.), but the relativistic case has additional, interesting complications: relativistic models are genuine quantum field theories, that is, quantum theories with an infinite number of degrees of freedom, with nontrivial features like divergences and anomalies. Since interparticle interactions are ignored, such models can be regarded as a first approximation to more complicated theories, and they can be studied by mathematically precise methods.

Models of relativistic particles in external electromagnetic fields have received considerable attention in the physics literature, and interesting phenomena like the Klein paradox or particle–antiparticle pair creation in overcritical fields have been studied; see Rafelski et al. (1978) for an extensive review. We will not discuss these physics questions but only describe some prototype examples and a general Hamiltonian framework which has been used in mathematically precise work on such models. The general framework for this latter work is the mathematical theory of Hilbert space operators (see, e.g., Reed and Simon (1975)), but in our discussion we try to avoid presupposing knowledge of that theory. As mentioned briefly in the end, this work has had close relations to various topics of recent interest in mathematical physics, including anomalies, infinite-dimensional geometry and group theory, conformal field theory, and noncommutative geometry.

We restrict our discussion to spin-0 bosons and spin-1/2 fermions, and we will not discuss models of particles in external gravitational fields but only refer the interested reader to DeWitt (2003). We also only mention in passing that external field problems have also been studied using functional integral approaches, and mathematically precise work on this can be found in the extensive literature on determinants of differential operators.

Examples

Consider the Schrödinger equation describing a nonrelativistic particle of mass m and charge e
moving in three-dimensional space and interacting with external vector and scalar potentials $A$ and $\phi$, respectively,

$$ i\partial_t \psi = H\psi, \qquad H = \frac{1}{2m}(-i\nabla + eA)^2 - e\phi \qquad [1] $$

(we set $\hbar = c = 1$, $\partial_t = \partial/\partial t$, and $\psi$, $\phi$, and $A$ can depend on the space and time variables $x \in \mathbb{R}^3$ and $t \in \mathbb{R}$). This is a standard quantum-mechanical model, with $\psi$ the one-particle wave function allowing for the usual probabilistic interpretation. One interesting generalization to the relativistic regime is the Klein–Gordon equation

$$ \left[ (i\partial_t + e\phi)^2 - (-i\nabla + eA)^2 - m^2 \right] \psi = 0 \qquad [2] $$

with a $\mathbb{C}$-valued function $\psi$. There is another important relativistic generalization, the Dirac equation

$$ \left[ (i\partial_t + e\phi) - (-i\nabla + eA)\cdot\boldsymbol{\alpha} - m\beta \right] \psi = 0 \qquad [3] $$

with $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \alpha_3)$ and $\beta$ Hermitian $4 \times 4$ matrices satisfying the relations

$$ \alpha_i\alpha_j + \alpha_j\alpha_i = 2\delta_{ij}, \qquad \alpha_i\beta = -\beta\alpha_i, \qquad \beta^2 = 1 \qquad [4] $$

and $\psi$ a $\mathbb{C}^4$-valued function (we also write $1$ for the identity). These two relativistic equations differ by the transformation properties of $\psi$ under Lorentz transformations: in [2] it transforms like a scalar and thus describes spin-0 particles, and in [3] it transforms like a spinor describing spin-1/2 particles. While these equations are natural relativistic generalizations of the Schrödinger equation, they no longer allow a consistent interpretation of $\psi$ as a one-particle wave function. The physical reason is that, in a relativistic theory, high-energy processes can create particle–antiparticle pairs, and this makes the restriction to a fixed particle number inconsistent. This problem can be remedied by constructing a many-body model allowing for an arbitrary number of particles and antiparticles. The requirement that this many-body model should have a ground state is an important ingredient in this construction.

It is obviously of interest to formulate and study many-body models of indistinguishable particles already in the nonrelativistic case. An important empirical fact is that such particles come in two kinds, bosons and fermions, distinguished by their exchange statistics (we ignore the interesting possibility of exotic statistics). For example, the fermion many-particle version of [1] for suitable $\phi$ and $A$ is a useful model for electrons in a metal. An elegant method to go from the one- to the many-particle description is the formalism of second quantization: one promotes $\psi$ to a quantum field operator with certain (anti-) commutator relations, and this is a convenient way to construct the appropriate many-particle Hilbert space, Hamiltonian, etc. In the nonrelativistic case, this formalism can be regarded as an elegant reformulation of a pedestrian construction of a many-body quantum-mechanical model, which is useful since it provides convenient computational tools. However, this formalism naturally generalizes to the relativistic case where the one-particle model no longer has an acceptable physical interpretation, and one finds that one can nevertheless give a consistent physical interpretation to [2] and [3] provided that $\psi$ is interpreted as a quantum field operator describing bosons and fermions, respectively. This particular exchange statistics of the relativistic particles is a special case of the spin-statistics theorem: integer-spin particles are bosons and half-integer spin particles are fermions. While many structural features of this formalism are present already in the simpler nonrelativistic models, the relativistic models add some nontrivial features typical for quantum field theories.

In the following, we discuss a precise mathematical formulation of the quantum field theory models described above. We emphasize the functorial nature of this construction, which makes manifest that it also applies to other situations, for example, where the bosons and fermions are also coupled to a gravitational background, are considered in other spacetime dimensions than 3 + 1, etc.

Second Quantization: Nonrelativistic Case

Consider a quantum system of indistinguishable particles where the quantum-mechanical description of one such particle is known. In general, this one-particle description is given by a Hilbert space $h$ and one-particle observables and transformations which are self-adjoint and unitary operators on $h$, respectively. The most important observable is the Hamiltonian $H$. We will describe a general construction of the corresponding many-body system.

Example. As a motivating example we take the Hilbert space $h = L^2(\mathbb{R}^3)$ of square-integrable functions $f(x)$, $x \in \mathbb{R}^3$, and the Hamiltonian $H$ in [1]. A specific example for a unitary operator on $h$ is the gauge transformation $(Uf)(x) = \exp(i\chi(x)) f(x)$ with $\chi$ a smooth, real-valued function on $\mathbb{R}^3$.

In this example, the corresponding wave functions for $N$ identical such particles are the $L^2$-functions $f_N(x_1, \ldots, x_N)$, $x_j \in \mathbb{R}^3$. It is obvious how to extend
one-particle observables and transformations to such $N$-particle states: for example, the $N$-particle Hamiltonian corresponding to $H$ in [1] is

$$ H_N = \sum_{j=1}^{N} \left[ \frac{1}{2m}\bigl(-i\nabla_{x_j} + eA(t, x_j)\bigr)^2 - e\phi(t, x_j) \right] \qquad [5] $$

and the $N$-particle gauge transformation $U_N$ is defined through multiplication with $\prod_{j=1}^{N} \exp(i\chi(x_j))$.

For systems of indistinguishable particles it is enough to restrict to wave functions which are even or odd under particle exchanges,

$$ f_N(x_1, \ldots, x_j, \ldots, x_k, \ldots, x_N) = \pm f_N(x_1, \ldots, x_k, \ldots, x_j, \ldots, x_N) \qquad [6] $$

for all $1 \le j < k \le N$, with the upper and lower signs corresponding to bosons and fermions, respectively (this empirical fact is usually taken as a postulate in nonrelativistic many-body quantum physics). It is convenient to define the zero-particle Hilbert space as $\mathbb{C}$ (complex numbers) and to introduce a Hilbert space containing states with all possible particle numbers: this so-called Fock space contains all states

$$ \begin{pmatrix} f_0 \\ f_1(x_1) \\ f_2(x_1, x_2) \\ f_3(x_1, x_2, x_3) \\ \vdots \end{pmatrix} \qquad [7] $$

with $f_0 \in \mathbb{C}$. The definition of $H_N$ and $U_N$ then naturally extends to this Fock space; see below.

General Construction

The construction of Fock spaces and many-particle observables and transformations just outlined in a specific example is conceptually simple. An alternative, more efficient construction method is to use "quantum fields," which we denote as $\psi(x)$ and $\psi^\dagger(x)$, $x \in \mathbb{R}^3$. They can be fully characterized by the following (anti-) commutator relations:

$$ [\psi(x), \psi^\dagger(y)]_{\mp} = \delta^3(x - y), \qquad [\psi(x), \psi(y)]_{\mp} = 0 \qquad [8] $$

where $[a, b]_{\mp} \equiv ab \mp ba$, with the commutator and anticommutator (upper and lower signs, respectively) corresponding to the boson and fermion case, respectively. It is convenient to "smear" these fields with one-particle wave functions and define

$$ \psi(f) = \int_{\mathbb{R}^3} d^3x\, \overline{f(x)}\, \psi(x), \qquad \psi^\dagger(f) = \int_{\mathbb{R}^3} d^3x\, \psi^\dagger(x) f(x) \qquad [9] $$

for all $f \in h$. Then the relations characterizing the field operators can be written as

$$ [\psi(f), \psi^\dagger(g)]_{\mp} = (f, g), \qquad [\psi(f), \psi(g)]_{\mp} = 0 \qquad \forall f, g \in h \qquad [10] $$

where

$$ (f, g) = \int_{\mathbb{R}^3} d^3x\, \overline{f(x)}\, g(x) $$

is the inner product in $h$. The Fock space $\mathcal{F}_{\mp}(h)$ can then be defined by postulating that it contains a normalized vector $\Omega$ called "vacuum" such that

$$ \psi(f)\Omega = 0 \qquad \forall f \in h \qquad [11] $$

and that all $\psi^{(\dagger)}(f)$ are operators on $\mathcal{F}_{\mp}(h)$ such that $\psi^\dagger(f) = \psi(f)^*$, where $*$ is the Hilbert space adjoint. Indeed, from this we conclude that $\mathcal{F}_{\mp}(h)$, as vector space, is generated by

$$ f_1 \wedge f_2 \wedge \cdots \wedge f_N \equiv \psi^\dagger(f_1)\psi^\dagger(f_2)\cdots\psi^\dagger(f_N)\Omega \qquad [12] $$

with $f_j \in h$ and $N = 0, 1, 2, \ldots$, and that the Hilbert space inner product of such vectors is

$$ \langle f_1 \wedge f_2 \wedge \cdots \wedge f_N,\; g_1 \wedge g_2 \wedge \cdots \wedge g_M \rangle = \delta_{N,M} \sum_{P \in S_N} (\pm 1)^{|P|} \prod_{j=1}^{N} (f_j, g_{Pj}) \qquad [13] $$

with $S_N$ the permutation group, with $(+1)^{|P|} = 1$ always, and $(-1)^{|P|} = +1$ and $-1$ for even and odd permutations, respectively. The many-body Hamiltonian $q(H)$ corresponding to the one-particle Hamiltonian $H$ can now be defined by the following relations:

$$ q(H)\Omega = 0, \qquad [q(H), \psi^\dagger(f)] = \psi^\dagger(Hf) \qquad [14] $$

for all $f \in h$ such that $Hf$ is defined. Indeed, this implies that

$$ q(H)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = \sum_{j=1}^{N} f_1 \wedge f_2 \wedge \cdots \wedge (Hf_j) \wedge \cdots \wedge f_N \qquad [15] $$

which defines a self-adjoint operator on $\mathcal{F}_{\mp}(h)$, and it is easy to check that this coincides with our down-to-earth definition of $H_N$ above. Similarly, the many-body transformation $Q(U)$ corresponding to a one-particle transformation $U$ can be defined as

$$ Q(U)\Omega = \Omega, \qquad Q(U)\psi^\dagger(f) = \psi^\dagger(Uf)Q(U) \qquad [16] $$

for all $f \in h$, which implies that

$$ Q(U)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = (Uf_1) \wedge (Uf_2) \wedge \cdots \wedge (Uf_N) \qquad [17] $$
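The relations [10], [11], [14], and [15] can be checked by hand in a finite-dimensional toy model. The following is a minimal sketch, assuming the fermion case with $h = \mathbb{C}^2$ and a Jordan–Wigner matrix realization of the field operators (a standard choice, not taken from this article). It also assumes $q(H) = \sum_{m,n} H_{mn}\psi^\dagger_m\psi_n$ as the explicit form of the operator fixed by [14]. All function and variable names are illustrative.

```python
# Toy model: fermion Fock space over h = C^2, built with Jordan-Wigner matrices.
# All names are illustrative assumptions; only the Python standard library is used.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def lincomb(terms, n):
    """Sum of coeff * matrix over (coeff, matrix) pairs of n x n matrices."""
    out = [[0] * n for _ in range(n)]
    for coeff, mat in terms:
        for i in range(n):
            for j in range(n):
                out[i][j] += coeff * mat[i][j]
    return out

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def vec_close(v, w, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(v, w))

M, DIM = 2, 4
I2, Z, LOWER = [[1, 0], [0, 1]], [[1, 0], [0, -1]], [[0, 1], [0, 0]]

def annihilator(n):
    """Jordan-Wigner matrix for psi_n: Z on modes before n, lowering matrix on mode n."""
    op = [[1]]
    for k in range(M):
        op = kron(op, Z if k < n else LOWER if k == n else I2)
    return op

c = [annihilator(n) for n in range(M)]
c_dag = [dagger(x) for x in c]
ONE, ZERO = kron(I2, I2), [[0] * DIM for _ in range(DIM)]
OMEGA = [1] + [0] * (DIM - 1)                      # vacuum vector

def psi(f):        # psi(f) = sum_n conj(f_n) psi_n, antilinear in f
    return lincomb([(fn.conjugate(), c[n]) for n, fn in enumerate(f)], DIM)

def psi_dag(f):    # psi^dagger(f) = sum_n f_n psi_n^dagger, linear in f
    return lincomb([(fn, c_dag[n]) for n, fn in enumerate(f)], DIM)

def inner(f, g):
    return sum(x.conjugate() * y for x, y in zip(f, g))

def q(A):          # assumed explicit form: q(A) = sum_{m,n} A_mn psi_m^dagger psi_n
    return lincomb([(A[m][n], matmul(c_dag[m], c[n])) for m in range(M) for n in range(M)], DIM)

def bracket(X, Y, sign):   # sign=+1: anticommutator, sign=-1: commutator
    return lincomb([(1, matmul(X, Y)), (sign, matmul(Y, X))], DIM)

def wedge(*fs):    # f_1 ^ ... ^ f_N = psi^dagger(f_1) ... psi^dagger(f_N) Omega, as in [12]
    v = OMEGA
    for h in reversed(fs):
        v = apply(psi_dag(h), v)
    return v

f, g = [1 + 2j, 0.5 - 1j], [-0.3j, 2.0]
H = [[1.0, 0.5 - 0.2j], [0.5 + 0.2j, -0.7]]        # Hermitian one-particle Hamiltonian
Hf, Hg = apply(H, f), apply(H, g)

eq10_ok = (close(bracket(psi(f), psi_dag(g), +1), lincomb([(inner(f, g), ONE)], DIM))
           and close(bracket(psi(f), psi(g), +1), ZERO))
eq11_ok = vec_close(apply(psi(f), OMEGA), [0] * DIM)
eq14_ok = (vec_close(apply(q(H), OMEGA), [0] * DIM)
           and close(bracket(q(H), psi_dag(f), -1), psi_dag(Hf)))
eq15_ok = vec_close(apply(q(H), wedge(f, g)),
                    [x + y for x, y in zip(wedge(Hf, g), wedge(f, Hg))])
print(eq10_ok, eq11_ok, eq14_ok, eq15_ok)
```

The boson case cannot be realized on a finite-dimensional Fock space, which is why the sketch is restricted to fermions.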
This coincides with our previous definition of $U_N$.

While we presented the construction above for a particular example, it is important to note that it does not make reference to what the one-particle formalism actually is. For example, if we had a model of particles on a space $\mathcal{M}$ given by some "nice" manifold of any dimension and with $M$ internal degrees of freedom, we would take $h = L^2(\mathcal{M}) \otimes \mathbb{C}^M$ and replace [9] by

$$ \psi(f) = \int_{\mathcal{M}} d\mu(x) \sum_{j=1}^{M} \overline{f_j(x)}\, \psi_j(x) \qquad [18] $$

and its Hermitian conjugate, with the measure $\mu$ on $\mathcal{M}$ defining the inner product in $h$,

$$ (f, g) = \int_{\mathcal{M}} d\mu(x) \sum_{j} \overline{f_j(x)}\, g_j(x) $$

With that, all formulas after [9] hold true as they stand.

Given any one-particle Hilbert space $h$ with inner product $(\cdot,\cdot)$, observable $H$, and transformation $U$, the formulas above define the corresponding Fock spaces $\mathcal{F}_{\mp}(h)$ and many-body observable $q(H)$ and transformation $Q(U)$. It is also interesting to note that this construction has various beautiful general (functorial) properties: the set of one-particle observables has a natural Lie algebra structure with the Lie bracket given by the commutator (strictly speaking: $i$ times the commutator, but we drop the common factor $i$ for simplicity). The definitions above imply that

$$ [q(A), q(B)] = q([A, B]) \qquad [19] $$

for one-particle observables $A$, $B$, that is, the above-mentioned Lie algebra structure is preserved under this map $q$. In a similar manner, the set of one-particle transformations has a natural group structure preserved by the map $Q$,

$$ Q(U)Q(V) = Q(UV), \qquad Q(U)^{-1} = Q(U^{-1}) \qquad [20] $$

Moreover, if $A$ is self-adjoint, then $\exp(iA)$ is unitary, and one can show that

$$ Q(\exp(iA)) = \exp(iq(A)) \qquad [21] $$

For later use, we note that, if $\{f_n\}_{n \in \mathbb{Z}}$ is some complete, orthonormal basis in $h$, then operators $A$ on $h$ can be represented by infinite matrices $(A_{mn})_{m,n \in \mathbb{Z}}$ with $A_{mn} = (f_m, Af_n)$, and

$$ q(A) = \sum_{m,n} A_{mn}\, \psi^\dagger_m \psi_n \qquad [22] $$

where $\psi^{(\dagger)}_n = \psi^{(\dagger)}(f_n)$ obey

$$ [\psi_m, \psi^\dagger_n]_{\mp} = \delta_{m,n}, \qquad [\psi_m, \psi_n]_{\mp} = 0 \qquad [23] $$

for all $m$, $n$. We also note that, in our definition of $q(A)$, we made a convenient choice of normalization, but there is no physical reason not to choose a different normalization and define

$$ q'(A) = q(A) - b(A) \qquad [24] $$

where $b$ is some linear function mapping self-adjoint operators $A$ to real numbers. For example, one may wish to use another reference vector $\tilde{\Omega}$ instead of $\Omega$ in the Fock space, and then would choose $b(A) = \langle \tilde{\Omega}, q(A)\tilde{\Omega} \rangle$. Then the relations in [19] are changed to

$$ [q'(A), q'(B)] = q'([A, B]) + S_0(A, B) \qquad [25] $$

where $S_0(A, B) = b([A, B])$. However, the C-number term $S_0(A, B)$ in the relations [25] is trivial, since it can be removed by going back to $q(A)$.

Physical Interpretation

The Fock space $\mathcal{F}_{\mp}(h)$ is the direct sum of subspaces of states with different particle numbers $N$,

$$ \mathcal{F}_{\mp}(h) = \bigoplus_{N=0}^{\infty} h^{(N)}_{\mp} \qquad [26] $$

where the zero-particle subspace $h^{(0)}_{\mp} = \mathbb{C}$ is generated by the vacuum $\Omega$, and $h^{(N)}_{\mp}$ is the $N$-particle subspace generated by the states $f_1 \wedge f_2 \wedge \cdots \wedge f_N$, $f_j \in h$. We note that

$$ \mathcal{N} \equiv q(1) \qquad [27] $$

is the "particle-number operator," $\mathcal{N} F_N = N F_N$ for all $F_N \in h^{(N)}_{\mp}$. The field operators obviously change the particle number: $\psi^\dagger(f)$ increases the particle number by one (maps $h^{(N)}_{\mp}$ to $h^{(N+1)}_{\mp}$), and $\psi(f)$ decreases it by one. Since every $f \in h$ can be interpreted as a one-particle state, it is natural to interpret $\psi^\dagger(f)$ and $\psi(f)$ as "creation" and "annihilation" operators, respectively: they create and annihilate one particle in the state $f \in h$. It is important to note that, in the fermion case, [10] implies that $\psi^\dagger(f)^2 = 0$, which is a mathematical formulation of the Pauli exclusion principle: it is not possible to have two fermions in the same one-particle state. In the boson case, there is no such restriction. Thus, even though the formalisms used to describe boson and fermion systems look very similar, they describe dramatically different physics.

Applications

In our example, the many-body Hamiltonian $H_0 \equiv q(H)$ can also be written in the following suggestive form:

$$ H_0 = \int d^3x\, \psi^\dagger(x)\,(H\psi)(x) \qquad [28] $$
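The functorial relations [19]–[21], the particle-number operator [27], and the Pauli principle can likewise be verified numerically. The sketch below again assumes the fermion case with $h = \mathbb{C}^2$ and Jordan–Wigner matrices. It builds $Q(U)$ directly from [16]–[17] on the basis vectors [12] and uses a truncated power series for the matrix exponential. These are illustrative choices, and every name is hypothetical.

```python
from itertools import combinations

# Toy check of [19]-[21], [27] and psi^dagger(f)^2 = 0 for fermions over h = C^2.
# Jordan-Wigner matrices and all names are illustrative assumptions.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def lincomb(terms, n):
    out = [[0] * n for _ in range(n)]
    for coeff, mat in terms:
        for i in range(n):
            for j in range(n):
                out[i][j] += coeff * mat[i][j]
    return out

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def close(A, B, tol=1e-10):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def expm(A, terms=40):
    """Matrix exponential by a truncated power series (adequate for these small matrices)."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    term = result
    for k in range(1, terms):
        term = lincomb([(1 / k, matmul(term, A))], n)
        result = lincomb([(1, result), (1, term)], n)
    return result

M, DIM = 2, 4
I2, Z, LOWER = [[1, 0], [0, 1]], [[1, 0], [0, -1]], [[0, 1], [0, 0]]
c = [kron(LOWER, I2), kron(Z, LOWER)]              # psi_0, psi_1
c_dag = [dagger(x) for x in c]
ZERO = [[0] * DIM for _ in range(DIM)]
OMEGA = [1] + [0] * (DIM - 1)

def psi_dag(f):
    return lincomb([(fn, c_dag[n]) for n, fn in enumerate(f)], DIM)

def q(A):          # [22]: q(A) = sum_{m,n} A_mn psi_m^dagger psi_n
    return lincomb([(A[m][n], matmul(c_dag[m], c[n])) for m in range(M) for n in range(M)], DIM)

def Q(U):
    """Q(U) from [17]: send each basis vector f_i1 ^ ... ^ f_iN to (U f_i1) ^ ... ^ (U f_iN)."""
    out = [[0] * DIM for _ in range(DIM)]
    for N in range(M + 1):
        for S in combinations(range(M), N):
            v, w = OMEGA, OMEGA
            for n in reversed(S):
                v = apply(c_dag[n], v)                                  # basis vector
                w = apply(psi_dag([U[m][n] for m in range(M)]), w)      # its image
            for i in range(DIM):
                for j in range(DIM):
                    out[i][j] += w[i] * v[j]       # v is real, so no conjugation needed
    return out

A = [[0.3, 0.2 + 0.1j], [0.2 - 0.1j, -0.5]]        # Hermitian
B = [[-0.4, 0.7j], [-0.7j, 0.1]]                   # Hermitian
comm_AB = lincomb([(1, matmul(A, B)), (-1, matmul(B, A))], M)
U, V = expm(lincomb([(1j, A)], M)), expm(lincomb([(1j, B)], M))

eq19_ok = close(lincomb([(1, matmul(q(A), q(B))), (-1, matmul(q(B), q(A)))], DIM), q(comm_AB))
eq20_ok = close(matmul(Q(U), Q(V)), Q(matmul(U, V)))
eq21_ok = close(Q(U), expm(lincomb([(1j, q(A))], DIM)))
number_diag = [q(I2)[i][i] for i in range(DIM)]    # eigenvalues of the number operator q(1)
f = [1 + 2j, 0.5 - 1j]
pauli_ok = close(matmul(psi_dag(f), psi_dag(f)), ZERO)
print(eq19_ok, eq20_ok, eq21_ok, number_diag, pauli_ok)
```

In this basis the number operator is diagonal, so `number_diag` lists the particle number of each of the four basis states.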
Similar formulas hold true for other observables and other Hilbert spaces $h = L^2(\mathcal{M}) \otimes \mathbb{C}^n$. It is rather easy to solve the model defined by such a Hamiltonian: all necessary computations can be reduced to one-particle computations. For example, in the static case, where $A$ and $\phi$ are time independent, a main quantity of interest in statistical physics is the free energy

$$ E \equiv -\beta^{-1} \log\bigl(\mathrm{tr}\bigl(\exp(-\beta[H_0 - \mu\mathcal{N}])\bigr)\bigr) \qquad [29] $$

where $\beta > 0$ is the inverse temperature, $\mu$ the chemical potential, and the trace is over the Fock space $\mathcal{F}_{\mp}(h)$. One can show that

$$ E = \mathrm{tr}\bigl(\pm\beta^{-1} \log\bigl(1 \mp \exp(-\beta[H - \mu])\bigr)\bigr) \qquad [30] $$

where the trace is over the one-particle Hilbert space $h$. Thus, to compute $E$, one only needs to find the eigenvalues of $H$.

It is important to mention that the framework discussed here is not only for external field problems but can be equally well used to formulate and study more complicated models with interparticle interactions. For example, while the model with the Hamiltonian $H_0$ above is often too simple to describe systems in nature, it is easy to write down more realistic models. For example, the Hamiltonian

$$ H = H_0 + \frac{e^2}{2} \int d^3x \int d^3y\, \psi^\dagger(x)\psi^\dagger(y)\, |x - y|^{-1}\, \psi(y)\psi(x) \qquad [31] $$

describes electrons in an external electromagnetic field interacting through Coulomb interactions. This illustrates an important point which we would like to stress: the task in quantum theory is twofold, namely to formulate and to solve (exactly or otherwise) models. Obviously, in the nonrelativistic case, it is equally simple to formulate many-body models with and without interparticle interactions, and the latter are simpler only because they are easier to solve: the two tasks of formulating and solving models can be clearly separated. As we will see, in the relativistic case, even the formulation of an external field problem is nontrivial, and one finds that one cannot formulate the model without at least partially solving it. This is a common feature of quantum field theories making them challenging and interesting.

Relativistic Fermion and Boson Systems

We now generalize the formalism developed in the previous section to the relativistic case.

Field Algebras and Quasifree Representations

In the previous section, we identified the field operators $\psi^{(\dagger)}(f)$ with particular Fock space operators. This is analogous to identifying the operators $p_j = -i\partial_{x_j}$ and $q_j = x_j$ on $L^2(\mathbb{R}^M)$ with the generators of the Heisenberg algebra, as usually done. (We recall: the Heisenberg algebra is the star algebra generated by $P_j$ and $Q_j$, $j = 1, 2, \ldots, M < \infty$, with the well-known relations

$$ [P_j, Q_k] = -i\delta_{jk}, \qquad [P_j, P_k] = [Q_j, Q_k] = 0, \qquad P_j^\dagger = P_j, \qquad Q_j^\dagger = Q_j \qquad [32] $$

for all $j$, $k$.) Identifying the Heisenberg algebra with a particular representation is legitimate since, as is well known, all its irreducible representations are (essentially) the same (this statement is made precise by a celebrated theorem due to von Neumann). However, in the case of the algebra generated by the field operators $\psi^{(\dagger)}(f)$, there exist representations which are truly different from the ones discussed in the last section, and such representations are needed to construct relativistic external field problems. It is therefore important to distinguish the fields as generators of an algebra from the operators representing them. We thus define the (boson or fermion) field algebra $\mathcal{A}_{\mp}(h)$ over a Hilbert space $h$ as the star algebra generated by $\Psi^\dagger(f)$, $f \in h$, such that the map $f \to \Psi^\dagger(f)$ is linear and the relations

$$ [\Psi(f), \Psi^\dagger(g)]_{\mp} = (f, g), \qquad [\Psi(f), \Psi(g)]_{\mp} = 0, \qquad \Psi^\dagger(f)^\dagger = \Psi(f) \qquad [33] $$

are fulfilled for all $f, g \in h$, with $\dagger$ the star operation in $\mathcal{A}_{\mp}(h)$. The particular representation of this algebra discussed in the last section will be denoted by $\pi_0$, $\pi_0(\Psi^{(\dagger)}(f)) = \psi^{(\dagger)}(f)$. Other representations $\pi_{P_-}$ can be constructed from any projection operator $P_-$ on $h$, that is, any operator $P_-$ on $h$ satisfying $P_-^* = P_-^2 = P_-$. Writing $\hat{\psi}^{(\dagger)}(f)$ short for $\pi_{P_-}(\Psi^{(\dagger)}(f))$, this so-called quasifree representation is defined by

$$ \hat{\psi}^\dagger(f) = \psi^\dagger(P_+ f) + \psi(\overline{P_- f}), \qquad \hat{\psi}(f) = \psi(P_+ f) \mp \psi^\dagger(\overline{P_- f}) \qquad [34] $$

where $P_+ = 1 - P_-$ and the bar means complex conjugation. It is important to note that, while the star operation $\dagger$ is identical with the Hilbert space adjoint $*$ in the fermion case, we have

$$ \hat{\psi}(f)^* = \hat{\psi}^\dagger(Ff) \quad \text{with} \quad F = P_+ - P_- \quad \text{for bosons} \qquad [35] $$
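Two statements from this section lend themselves to a small numerical check: the reduction of the free energy [29] to the one-particle formula [30], and the claim that the operators in [34] again satisfy the field relations. The sketch below assumes the fermion case (lower signs) with $h = \mathbb{C}^3$, a diagonal one-particle Hamiltonian, componentwise complex conjugation, and a projection $P_-$ with real matrix entries, so that conjugation commutes with $P_-$. These restrictions, the Jordan–Wigner matrices, and all names are assumptions made for illustration.

```python
import math

# Toy checks for fermions over h = C^3 (lower signs in [30] and [34]).
# All names and parameter values are illustrative assumptions.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def lincomb(terms, n):
    out = [[0] * n for _ in range(n)]
    for coeff, mat in terms:
        for i in range(n):
            for j in range(n):
                out[i][j] += coeff * mat[i][j]
    return out

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

M, DIM = 3, 8
I2, Z, LOWER = [[1, 0], [0, 1]], [[1, 0], [0, -1]], [[0, 1], [0, 0]]

def annihilator(n):
    op = [[1]]
    for k in range(M):
        op = kron(op, Z if k < n else LOWER if k == n else I2)
    return op

c = [annihilator(n) for n in range(M)]
c_dag = [dagger(x) for x in c]
ONE = kron(I2, kron(I2, I2))
ZERO = [[0] * DIM for _ in range(DIM)]

# Part 1: free energy. With H diagonal, q(H) - mu*N is diagonal in the Fock basis,
# so the Fock-space trace in [29] is a sum over the diagonal entries.
eps, beta, mu = [-1.0, 0.4, 1.3], 2.0, 0.2
K = lincomb([(eps[n] - mu, matmul(c_dag[n], c[n])) for n in range(M)], DIM)
E_fock = -math.log(sum(math.exp(-beta * K[i][i]) for i in range(DIM))) / beta      # [29]
E_one = sum(-math.log(1 + math.exp(-beta * (e - mu))) / beta for e in eps)         # [30]

# Part 2: quasifree representation [34] with P_- the projection onto a real unit vector u.
u = [0.6, 0.8, 0.0]

def p_minus(f):
    s = sum(uk * fk for uk, fk in zip(u, f))
    return [s * uk for uk in u]

def p_plus(f):
    return [x - y for x, y in zip(f, p_minus(f))]

def conj(f):
    return [x.conjugate() for x in f]

def psi(f):
    return lincomb([(fn.conjugate(), c[n]) for n, fn in enumerate(f)], DIM)

def psi_dag(f):
    return lincomb([(fn, c_dag[n]) for n, fn in enumerate(f)], DIM)

def psi_hat(f):          # fermion case of [34]: the sign in front of psi^dagger is +
    return lincomb([(1, psi(p_plus(f))), (1, psi_dag(conj(p_minus(f))))], DIM)

def psi_hat_dag(f):
    return lincomb([(1, psi_dag(p_plus(f))), (1, psi(conj(p_minus(f))))], DIM)

def anticomm(X, Y):
    return lincomb([(1, matmul(X, Y)), (1, matmul(Y, X))], DIM)

f, g = [1 + 2j, 0.5 - 1j, -0.7j], [0.3, 2j, 1 - 1j]
fg = sum(x.conjugate() * y for x, y in zip(f, g))
quasifree_ok = (close(anticomm(psi_hat(f), psi_hat_dag(g)), lincomb([(fg, ONE)], DIM))
                and close(anticomm(psi_hat(f), psi_hat(g)), ZERO)
                and close(dagger(psi_hat(f)), psi_hat_dag(f)))   # F = 1 for fermions
print(abs(E_fock - E_one), quasifree_ok)
```

The last check reflects the remark around [35]: for fermions the Hilbert space adjoint of $\hat{\psi}(f)$ is $\hat{\psi}^\dagger(f)$, so no grading operator appears.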
Here $F$ is a grading operator, that is, $F^* = F$ and $F^2 = 1$. We stress that the "physical" star operation always is $*$, that is, physical observables $A$ obey $A = A^*$.

The present framework suggests regarding quantization as the procedure of going from a one-particle Hilbert space $h$ to the corresponding field algebra $\mathcal{A}_{\mp}(h)$. Indeed, the Heisenberg algebra is identical with the boson field algebra $\mathcal{A}_{-}(\mathbb{C}^M)$ (since the latter is obviously identical with the algebra of $M$ harmonic oscillators), and thus conventional quantum mechanics can be regarded as boson quantization in the special case where the one-particle Hilbert space is finite dimensional. It is interesting to note that "fermion quantum mechanics" $\mathcal{A}_{+}(\mathbb{C}^M)$ is the natural framework for formulating and studying lattice fermion and spin systems which play an important role in condensed matter physics.

In the following, we elaborate the naive interpretations of the relativistic equations in [2] and [3] as a quantum theory of one particle, and we discuss why they are unphysical. For simplicity, we assume that the electromagnetic fields $\phi$, $A$ are time independent. We then show that quasifree representations as discussed above can provide physically acceptable many-particle theories. We first consider the Dirac case, which is somewhat simpler.

Fermions

One-particle formalism. Recalling that $i\partial_t$ is the energy operator, we define the Dirac Hamiltonian $D$ by rewriting [3] in the following form:

$$ i\partial_t \psi = D\psi, \qquad D = (-i\nabla + eA)\cdot\boldsymbol{\alpha} + m\beta - e\phi \qquad [36] $$

This Dirac Hamiltonian is obviously a self-adjoint operator on the one-particle Hilbert space $h = L^2(\mathbb{R}^3) \otimes \mathbb{C}^4$, but, different from the Schrödinger Hamiltonian in [1], it is not bounded from below: for any $E_0 > -\infty$, one can find a state $f$ such that the energy expectation value $(f, Df)$ is less than $E_0$. This can be easily seen for the simplest case where the external potential vanishes, $A = \phi = 0$. Then the eigenvalues of $D$ can be computed by Fourier transformation, and one finds

$$ E = \pm\sqrt{p^2 + m^2}, \qquad p \in \mathbb{R}^3 \qquad [37] $$

Due to the negative energy eigenvalues we conclude that there is no ground state, and the Dirac Hamiltonian thus describes an unstable system, which is physically meaningless.

To summarize: an (unphysical) one-particle description of relativistic fermions is given by a Hilbert space $h$ together with a self-adjoint Hamiltonian $D$ unbounded from below. Other observables and transformations are given by self-adjoint and unitary operators on $h$, respectively.

Many-body formalism. We now explain how to construct a physical many-body description from these data. To simplify notation, we first assume that $D$ has a purely discrete spectrum (which can be achieved by using a compact space). We can then label the eigenfunctions $f_n$ by integers $n$ such that the corresponding eigenvalues obey $E_n \ge 0$ for $n \ge 0$ and $E_n < 0$ for $n < 0$. Using the naive representation of the fermion field algebra discussed in the last section, we get (we use the notation introduced in [22])

$$ q(D) = \sum_{n \ge 0} |E_n|\, \psi^\dagger_n \psi_n - \sum_{n < 0} |E_n|\, \psi^\dagger_n \psi_n \qquad [38] $$

which is obviously not bounded from below and thus not physically meaningful. However, $\psi^\dagger_n \psi_n = 1 - \psi_n \psi^\dagger_n$, which suggests that we can remedy this problem by interchanging the creation and annihilation operators for $n < 0$. This is possible: it is easy to see that

$$ \hat{\psi}_n \equiv \psi_n \quad \forall n \ge 0 \qquad \text{and} \qquad \hat{\psi}_n \equiv \psi^\dagger_n \quad \forall n < 0 \qquad [39] $$

provides a representation of the algebra in [23]. We thus define

$$ \hat{q}(D) \equiv \sum_{n \in \mathbb{Z}} E_n\, {:}\hat{\psi}^\dagger_n \hat{\psi}_n{:} \qquad [40] $$

with the so-called normal ordering prescription

$$ {:}\hat{\psi}^\dagger_m \hat{\psi}_n{:} \equiv \hat{\psi}^\dagger_m \hat{\psi}_n - \langle \Omega, \hat{\psi}^\dagger_m \hat{\psi}_n \Omega \rangle \qquad [41] $$

where we made use of the freedom of normalization explained after [23] to eliminate unwanted additive constants. We get $\hat{q}(D) = \sum_{n \in \mathbb{Z}} |E_n|\, \psi^\dagger_n \psi_n$, which is manifestly a non-negative self-adjoint operator with $\Omega$ as ground state. We have thus found a physical many-body description for our model. We can now define, for other one-particle observables,

$$ \hat{q}(A) \equiv \sum_{m,n \in \mathbb{Z}} A_{mn}\, {:}\hat{\psi}^\dagger_m \hat{\psi}_n{:} \qquad [42] $$

and, by straightforward computations, we obtain

$$ [\hat{q}(A), \hat{q}(B)] = \hat{q}([A, B]) + S(A, B) \qquad [43] $$

where $S(A, B) = \sum_{m < 0} \sum_{n \ge 0} (A_{mn}B_{nm} - B_{mn}A_{nm})$, that is,

$$ S(A, B) = \mathrm{tr}(P_- A P_+ B P_- - P_- B P_+ A P_-) \qquad [44] $$

with $P_- = \sum_{n < 0} f_n (f_n, \cdot)$ the projection onto the subspace spanned by the negative energy eigenvectors of $D$ and $P_+ = 1 - P_-$. One can show that $\hat{q}(A)$ is no longer defined for all operators but only if

$$ P_- A P_+ \ \text{and} \ P_+ A P_- \ \text{are Hilbert–Schmidt operators} \qquad [45] $$

(we recall that $a$ is a Hilbert–Schmidt operator if $\mathrm{tr}(a^* a) < \infty$). The C-number term $S(A, B)$ in [43] is
often called the Schwinger term and, different from the similar term in [25], it is now nontrivial, that is, it is no longer possible to remove it by a redefinition $\hat{q}'(A) = \hat{q}(A) - b(A)$. This Schwinger term is an example of an anomaly, and it has various interesting implications.

In a similar manner, one can construct the many-body transformations $\hat{Q}(U)$ of unitary operators $U$ on $h$ satisfying the same Hilbert–Schmidt condition as in [45], and one obtains

$$ \hat{Q}(U)\hat{Q}(V) = \chi(U, V)\,\hat{Q}(UV) \qquad [46] $$

with interesting phase-valued functions $\chi$.

More generally, for any one-particle Hilbert space $h$ and Dirac Hamiltonian $D$, the physical representation is given by the quasifree representation $\pi_{P_-}$ in [34] with $P_-$ the projection onto the negative energy subspace of $D$. The results about $\hat{q}$ and $\hat{Q}$ mentioned hold true in any such representation.

Thus the one-particle Hamiltonian $D$ determines which representation one has to use, and one therefore cannot construct the "physical" representation without specific information about $D$. However, not all these representations are truly different: if there is a unitary operator $\mathcal{U}$ on the Fock space $\mathcal{F}_{+}(h)$ such that

$$ \mathcal{U}^* \pi_{P_-^{(1)}}\bigl(\Psi^{(\dagger)}(f)\bigr)\, \mathcal{U} = \pi_{P_-^{(2)}}\bigl(\Psi^{(\dagger)}(f)\bigr) \qquad [47] $$

for all $f \in h$, then the quasifree representations associated with the different projections $P_-^{(1)}$ and $P_-^{(2)}$ are physically equivalent: one could equally well formulate the second model using the representation of the first. Two such quasifree representations are called unitarily equivalent, and a fundamental theorem due to Shale and Stinespring states that two quasifree representations $\pi_{P_-^{(1,2)}}$ are unitarily equivalent if and only if $P_-^{(1)} - P_-^{(2)}$ is a Hilbert–Schmidt operator (a similar result holds true in the boson case).

Bosons

One-particle formalism. As in the Dirac case, the solutions of the Klein–Gordon equation in [2] do not define a physically acceptable one-particle quantum theory with a ground state: the energy eigenvalues in [37] for $A = \phi = 0$ are a consequence of relativistic invariance and thus hold equally in the Klein–Gordon case. However, in this case there is a further problem. To find the one-particle Hamiltonian, one can rewrite the second-order equation in [2] as a system of first-order equations,

$$ i\partial_t \Phi = K\Phi, \qquad \Phi = \begin{pmatrix} \psi \\ \pi^\dagger \end{pmatrix}, \qquad K = \begin{pmatrix} C & i \\ -iB^2 & C \end{pmatrix} \qquad [48] $$

with

$$ B^2 \equiv (-i\nabla + eA)^2 + m^2, \qquad C \equiv -e\phi \qquad [49] $$

Thus, one sees that the natural one-particle Hilbert space for the Klein–Gordon equation is $h = L^2(\mathbb{R}^3) \otimes \mathbb{C}^2$; here, and in the following, we identify $h$ with $h_0 \oplus h_0$, $h_0 = L^2(\mathbb{R}^3)$, and use a convenient $2 \times 2$ matrix notation naturally associated with that splitting. However, the one-particle Hamiltonian is not self-adjoint but rather obeys

$$ K^* = JKJ, \qquad J \equiv \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} \qquad [50] $$

with $*$ the Hilbert space adjoint. It is important to note that $J$ is a grading operator. Thus, we can define a sesquilinear form

$$ (f, g)_J \equiv (f, Jg) \qquad \forall f, g \in h \qquad [51] $$

with $(\cdot,\cdot)$ the standard inner product, and [50] is equivalent to $K$ being self-adjoint with respect to this sesquilinear form; in this case, we say that $K$ is $J$-self-adjoint. Thus, in the Klein–Gordon case, this sesquilinear form takes the role of the Hilbert space inner product and, in particular, not $(\Phi, \Phi)$ but $(\Phi, \Phi)_J$ is preserved under time evolution. However, different from $\Phi^\dagger\Phi$, $\Phi^\dagger J\Phi$ is not positive definite, and it is therefore not possible to interpret it as a probability density as in conventional quantum mechanics. For consistency, one has to require that one-particle transformations $U$ are unitary with respect to $(\cdot,\cdot)_J$, that is, $U^{-1} = JU^*J$. We call such operators $J$-unitary.

To summarize: an (unphysical) one-particle description of relativistic bosons is given by a Hilbert space of the form $h = h_0 \oplus h_0$, the grading operator $J$ in [50], and a $J$-self-adjoint Hamiltonian $K$ of the form given in [48], where $B \ge 0$ and $C$ are self-adjoint operators on $h_0$. Other observables and transformations are given by $J$-self-adjoint and $J$-unitary operators on $h$, respectively.

Many-body formalism. We first consider the quasifree representation $\pi_{P_-^{(0)}}$ of the boson field algebra $\mathcal{A}_{-}(h)$ so that the grading operator in [35] is equal to $J$, that is, $P_-^{(0)} = (1 - J)/2$. Writing $\pi_{P_-^{(0)}}(\Psi^{(\dagger)}(f)) = \psi^{(\dagger)}(f)$, one finds that

$$ q(A)^* = q(JA^*J), \qquad Q(U)^* = Q(JU^*J) \qquad [52] $$

and thus $J$-self-adjoint operators and $J$-unitary operators are mapped to proper observables and transformations. In particular, $q(K)$ is a self-adjoint
operator, which resolves one problem of the one-particle theory. However, $q(K)$ is not bounded from below, and thus $\pi_{P_-^{(0)}}$ is not yet the physical representation.

The physical representation can be constructed using the operators

$$ T = \frac{1}{\sqrt{2}} \begin{pmatrix} B^{1/2} & iB^{-1/2} \\ B^{1/2} & -iB^{-1/2} \end{pmatrix}, \qquad F = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad [53] $$

(for simplicity, we restrict ourselves to the case $C = 0$ and $B > 0$; we use the calculus of self-adjoint operators here) with the following remarkable properties:

$$ T^{-1} = JT^*F, \qquad TKT^{-1} = \begin{pmatrix} B & 0 \\ 0 & -B \end{pmatrix} \equiv \hat{K} \qquad [54] $$

One can check that

$$ \hat{\psi}^\dagger(f) \equiv \psi^\dagger(Tf), \qquad \hat{\psi}(f) \equiv \psi(T^{-1}f) \qquad [55] $$

is a quasifree representation $\pi_{P_-}$ of $\mathcal{A}_{-}(h)$ with $P_- = (1 - F)/2$. With that, the construction of $\hat{q}$ and $\hat{Q}$ is very similar to the fermion case described above (the crucial simplification is that $\hat{K}$ and $F$ now are diagonal). In particular, $\hat{q}(K)$ is a non-negative operator with the ground state $\Omega$, and $\hat{q}(A)$ and $\hat{Q}(U)$ are self-adjoint and unitary for every one-particle observable $A$ and transformation $U$, respectively. One also gets relations as in [43] and [46].

Related Topics of Recent Interest

The impossibility of constructing relativistic quantum-mechanical models played an important role in the early history of quantum field theory, as beautifully discussed in chapter 1 of Weinberg (1995).

The abstract formalism of quasifree representations of fermion and boson field algebras was developed in many papers (see, e.g., Ruijsenaars (1977), Grosse and Langmann (1992), and Langmann (1994) for explicit results on $\hat{Q}$ and $\chi$). A nice textbook presentation with many references can be found in chapter 13 of Gracia-Bondía et al. (2001) (this chapter is rather self-contained but mainly restricted to the fermion case).

Based on the Shale–Stinespring theorem, there has been a considerable amount of work to investigate whether the quasifree representations associated with different external electromagnetic fields $\phi_1, A_1$ and $\phi_2, A_2$ are unitarily equivalent, if and which time-dependent many-body Hamiltonians exist, etc. (see chapter 13 of Gracia-Bondía et al. (2001), and references therein).

The Lie algebra $\mathfrak{g}_2$ of Hilbert space operators satisfying the condition in [45] is an interesting infinite-dimensional Lie algebra with a beautiful representation theory. This subject is closely related to conformal field theory (see, e.g., Kac and Raina (1987) for a textbook presentation and Carey and Ruijsenaars (1987) for a detailed mathematical account within the framework described by us).

It turns out that the mathematical framework discussed in the previous section is sufficient for constructing fully interacting quantum field theories, in particular Yang–Mills gauge theories, in 1 + 1 but not in higher dimensions. The reason is that, in 3 + 1 dimensions, the one-particle observables $A$ of interest do not obey the Hilbert–Schmidt condition in [45] but only the weaker condition

$$ \mathrm{tr}\bigl((a^*a)^n\bigr) < \infty, \qquad a = P_{\mp} A P_{\pm} \qquad [56] $$

with $n = 2$, and the natural analog of $\mathfrak{g}_2$ in 3 + 1 dimensions thus seems to be the Lie algebra $\mathfrak{g}_{2n}$ of operators satisfying this condition with $n = 2$. Various results on the representation theory of such Lie algebras $\mathfrak{g}_{2n>2}$ have been developed (see Mickelsson (1989), where various interesting relations to infinite-dimensional geometry are also discussed).

As mentioned, the Schwinger term $S(A, B)$ in [44] is an example of an anomaly. Mathematically, it is a nontrivial 2-cocycle of the Lie algebra $\mathfrak{g}_2$, and analogs for the Lie algebras $\mathfrak{g}_{2n>2}$ have been found. These cocycles provide a natural generalization of anomalies (in the meaning of particle physics) to operator algebras. They not only shed some interesting light on the latter, but also provide a link to notions and results from noncommutative geometry (see, e.g., Gracia-Bondía et al. (2001)). We believe that this link can provide a fruitful driving force and inspiration to find ways to deepen our understanding of quantum Yang–Mills theories in 3 + 1 dimensions (Langmann 1996).

See also: Anomalies; C*-Algebras and Their Classification; Dirac Fields in Gravitation and Nonabelian Gauge Theory; Dirac Operator and Dirac Field; Gerbes in Quantum Field Theory; Quantum Field Theory in Curved Spacetime; Quantum n-Body Problem; Superfluids; Two-Dimensional Models.

Further Reading

Carey AL and Ruijsenaars SNM (1987) On fermion gauge groups, current algebras and Kac–Moody algebras. Acta Applicandae Mathematicae 10: 1–86.
DeWitt B (2003) The Global Approach to Quantum Field Theory, International Series of Monographs on Physics, vols. 1 and 2, p. 114. New York: Oxford University Press.
Gracia-Bondía JM, Várilly JC, and Figueroa H (2001) Elements of Noncommutative Geometry, Birkhäuser Advanced Texts: Basel Textbooks. Boston: Birkhäuser.
Grosse H and Langmann E (1992) A superversion of quasifree second quantization. Journal of Mathematical Physics 33: 1032–1046.
Kac VG and Raina AK (1987) Bombay Lectures on Highest Weight Representations of Infinite-Dimensional Lie Algebras,
Advanced Series in Mathematical Physics, vol. 2. Teaneck: World Scientific Publishing.
Langmann E (1994) Cocycles for boson and fermion Bogoliubov transformations. Journal of Mathematical Physics 96–112.
Langmann E (1996) Quantum gauge theories and noncommutative geometry. Acta Physica Polonica B 27: 2477–2496.
Mickelsson J (1989) Current Algebras and Groups, Plenum Monographs in Nonlinear Physics. New York: Plenum Press.
Rafelski J, Fulcher LP, and Klein A (1978) Fermions and bosons interacting with arbitrary strong external fields. Physics Reports 38: 227–361.
Reed M and Simon B (1975) Methods of Modern Mathematical Physics. II. Fourier Analysis, Self-Adjointness. New York: Academic Press.
Ruijsenaars SNM (1977) On Bogoliubov transformations for systems of relativistic charged particles. Journal of Mathematical Physics 18: 517–526.
Weinberg S (1995) The Quantum Theory of Fields, vol. I: Foundations. Cambridge: Cambridge University Press.

Boundaries for Spacetimes


S G Harris, St. Louis University, St. Louis, MO, USA This article will consider several of the methods
ª 2006 Elsevier Ltd. All rights reserved. that have been used or proposed for constructing
boundaries for spacetimes, ranging from the ad hoc
(but practical) to the universal. Perhaps the
simplest way to classify these methods is into those which employ or analyze embeddings of the spacetime in question and those that do not.

Introduction

There is a common practice in mathematics of placing a boundary on an object which may not appear to come naturally equipped with one; this is often thought of as adding ideal points to the object. Perhaps the most famous example is the addition of a single "point at infinity" to the complex plane, resulting in the Riemann sphere: this is a boundary point in the sense of providing an ideal endpoint for lines and other endless curves in the plane. Often, there is more than one reasonable way to construct a boundary for a given object, depending on the intent; for instance, the plane is sometimes equipped, not with a single point at infinity, but with a circle at infinity, resulting in a space homeomorphic to a closed disk. Both these boundaries on the plane have useful but different things to tell us about the nature of the plane; the common feature is that, by bringing the infinite reach of the plane within the confines of a more finite object, we are better able to grasp the behavior of the original object.

The general usefulness of the construction of boundaries for an object is to allow behavior of structures in the "completed" object to aid in visualization of behavior in the original object, such as by providing a degree of measurement or other classification of processes at infinity. This utility has not been overlooked for spacetimes. A variety of purposes may be served by various boundary construction methods: providing a locale for singularities (as the spacetime itself is modeled by a smooth manifold with a smooth metric, free of singular points); providing a platform from which to measure global properties such as total energy or angular momentum; displaying in finite form the causal structure at infinity; or providing a compact (or quasicompact) topological envelope for the spacetime while preserving the causal structure.

Boundaries from Embeddings

General

The simplest and most common method of constructing a boundary for a spacetime $M$ is to find a suitable manifold $N$ (of the same dimension) and an appropriate map $\phi : M \to N$ which is a topological embedding, that is, a homeomorphism onto its image $\phi(M)$. We can consider $\bar{M}_\phi$, the closure of $\phi(M)$ in $N$, as the $\phi$-completion of $M$, and $\partial_\phi(M) = \bar{M}_\phi - \phi(M)$ as the $\phi$-boundary. Typically, this embedding is chosen in such a way that curves of interest in $M$ (such as timelike or null geodesics or causal curves of bounded acceleration) which have no endpoints in $M$, do have endpoints in $\partial_\phi(M)$; in other words, if $c : [0, 1) \to M$ is such a curve of interest, then $\lim_{t \to 1} \phi(c(t))$ exists in $N$.

The common practice, initiated by Penrose in 1967, is to choose $N$ to be another spacetime (often called the unphysical spacetime, while $M$ is considered the spacetime of physical interest) and to require the embedding $\phi$ to be a conformal mapping, that is, $\phi$ carries the spacetime metric in $M$ to a scalar multiple of the spacetime metric in $N$. As conformal maps preserve the local causal structure, leaving unchanged the notions of timelike curve or null curve, this means that $\bar{M}_\phi$ inherits from $N$ a causal structure which, locally, is an extension of that of $M$. This allows us to speak of causal relationships within $\bar{M}_\phi$, closely related to those in $M$.

Minkowski Space

The prototypical example is the conformal embedding of Minkowski space into the Einstein static spacetime.
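As a rough numerical illustration of this embedding, the sketch below evaluates the map $(x, t) \mapsto ((x/|x|)\sin\rho, \cos\rho, \tau)$ with $\rho = \tan^{-1}(t + |x|) - \tan^{-1}(t - |x|)$ and $\tau = \tan^{-1}(t + |x|) + \tan^{-1}(t - |x|)$, which are the formulas quoted in the Minkowski Space discussion. The function name `embed` and the choice of sample points are assumptions made for illustration only; the sketch checks where far-away points land, not conformality.

```python
import math

def embed(x, t):
    """Image of the point (x, t) of Minkowski n-space in S^{n-1} x L^1.

    x is a sequence of n-1 spatial coordinates, t the time coordinate.
    Returns (sphere_point, tau), where sphere_point has n components.
    """
    r = math.sqrt(sum(xi * xi for xi in x))
    rho = math.atan(t + r) - math.atan(t - r)
    tau = math.atan(t + r) + math.atan(t - r)
    # On the axis r = 0 we get rho = 0, so sin(rho) = 0 and the direction
    # x/|x| does not matter; a zero vector is used there.
    direction = [xi / r for xi in x] if r > 0 else [0.0] * len(x)
    sphere_point = [d * math.sin(rho) for d in direction] + [math.cos(rho)]
    return sphere_point, tau

# Far along a timelike geodesic, a spacelike geodesic, and a null geodesic.
BIG = 1.0e8
timelike_end = embed([0.5 * BIG, 0.0], BIG)        # approaches (0, 0, 1), tau = pi
spacelike_end = embed([BIG, 0.0], 0.0)             # approaches (0, 0, -1), tau = 0
null_end = embed([BIG, 0.0], BIG + 0.3)            # rho + tau approaches pi
```

With these samples the timelike curve heads toward the point with last sphere coordinate $+1$ and $\tau = \pi$, and the spacelike one toward the point with last sphere coordinate $-1$ and $\tau = 0$, in line with the description of timelike and spacelike infinity in the text.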
Let $\mathbb{R}^n$ denote Euclidean n-space, $S^n$ the unit n-sphere, and $\mathbb{L}^n$ Minkowski n-space, that is, $\mathbb{R}^n$ with metric $ds^2 = dx_1^2 + \cdots + dx_{n-1}^2 - dt^2$ (so $\mathbb{L}^n = \mathbb{R}^{n-1} \times \mathbb{L}^1$). The n-dimensional Einstein static spacetime is the product spacetime $\mathbb{E}^n = S^{n-1} \times \mathbb{L}^1$. Consider $S^{n-1}$ as embedded in $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}^1$. Then the conformal embedding is $\phi : \mathbb{L}^n \to \mathbb{E}^n$, expressed as $\phi : \mathbb{R}^{n-1} \times \mathbb{L}^1 \to S^{n-1} \times \mathbb{L}^1 \subset \mathbb{R}^{n-1} \times \mathbb{R}^1 \times \mathbb{L}^1$ given by $\phi(x, t) = ((x/|x|) \sin\rho, \cos\rho, \tau)$, where $\rho = \tan^{-1}(t + |x|) - \tan^{-1}(t - |x|)$ and $\tau = \tan^{-1}(t + |x|) + \tan^{-1}(t - |x|)$. The boundary $\partial_\phi(\mathbb{L}^n)$ consists of the following: the points $\{\rho + \tau = \pi;\ 0 < \tau < \pi\}$, composed of an $S^{n-2}$ of null lines coming together at the point $i^+ = (0, 1, \pi)$; a similar cone of null lines $\{\rho - \tau = \pi;\ -\pi < \tau < 0\}$ with vertex at $i^- = (0, 1, -\pi)$; and a single limit-point for both cones at $i^0 = (0, -1, 0)$. The $\tau > 0$ null cone is called $\mathcal{I}^+$ (the letter is read "scri" for "script-I"), its counterpart $\mathcal{I}^-$ (Figures 1 and 2). As all future-directed timelike geodesics in $\mathbb{L}^n$ have $i^+$ as an endpoint in $\mathbb{E}^n$, $i^+$ is called future-timelike infinity; similarly, $i^-$ is past-timelike infinity. Every future-directed null geodesic ends up on $\mathcal{I}^+$, which is thus termed future-null infinity, and $\mathcal{I}^-$ is past-null infinity. All spacelike geodesics come to $i^0$, spacelike infinity.

Figure 1 $\mathbb{L}^2$ conformally embedded in $\mathbb{E}^2 = S^1 \times \mathbb{L}^1$.

Figure 2 $\mathbb{L}^3$ conformally embedded in $\mathbb{E}^3 = S^2 \times \mathbb{L}^1$.

For n = 2, this picture produces the familiar diamond representation of $\mathbb{L}^2$ (Figure 3): as $\mathbb{E}^2$ is easily unrolled into another copy of $\mathbb{L}^2$ (metric $d\rho^2 - d\tau^2$), this means that $\phi(\mathbb{L}^2)$ is the region $|\rho| + |\tau| < \pi$ in $\mathbb{L}^2$; timelike curves and null geodesics in the original $\mathbb{L}^2$ are the same as in $\phi(\mathbb{L}^2)$, and their endpoints in the boundary of the diamond are evident. For higher dimensions, the picture is not as visually obvious, since $\mathbb{E}^n$ cannot be unrolled; but the principle of reading the causal structure at infinity of $\mathbb{L}^n$ via its boundary points in $\mathbb{E}^n$ remains the same.

Figure 3 $\mathbb{L}^2$ conformally embedded in unrolled $\mathbb{E}^2$, i.e., $\mathbb{R}^1 \times \mathbb{L}^1 = \mathbb{L}^2$.

Conformal Embeddings

There have been various formulations designed to emulate the conformal mapping of $\mathbb{L}^n$ with respect to spacetimes, which are, in some sense, asymptotically like Minkowski space being conformally mapped into larger spacetimes. A spacetime $M$ with metric $g$ is called asymptotically simple or (alternatively) asymptotically flat if there is a spacetime $N$ with metric $h$, an embedding $\phi : M \to N$, and a scalar function $\Omega$ defined on $N$ with $\phi^* h = (\Omega \circ \phi)^2 g$ (i.e., $\phi$ is conformal with $\Omega^2$ the conformal factor) and $\Omega = 0$ on $\partial_\phi(M)$, $d\Omega \neq 0$ on $\partial_\phi(M)$, and various other restrictions on $\Omega$, depending on the intent. One can define asymptotic symmetries of $M$ by means of motions within $\partial_\phi(M)$, leading to notions of global energy and angular momentum (see Hawking and Ellis (1973) and Wald (1984) for details).

Classifications of Embeddings

As a general rule, there is no uniqueness in the choice of an embedding $\phi$ for a spacetime $M$ to construct a boundary, nor in the topology of the resulting boundary $\partial_\phi(M)$, or even of which curves of interest end up having endpoints in the boundary. In an attempt to categorize which embeddings yield equivalent results and what sort of results there are in terms of endpoints of curves, Scott and Szekeres (1994) formulated what they called the abstract boundary of a spacetime. This depends on a choice of class of "interesting" curves, each characterizable as having either infinite or finite parameter length; typical choices for this class would be timelike geodesics or causal geodesics or timelike curves of bounded acceleration. For instance, a boundary point may be said to represent a singularity with respect to the chosen class of curves if it is the endpoint of one such curve with finite parameter length; nonsingular points are points at infinity. These classifications do not require conformal embeddings, nor even that the target of the embeddings be spacetimes; they accommodate boundaries of a far more general type than Penrose's notion stemming from conformal embeddings.

A somewhat different study of boundaries from embeddings has been formulated by García-Parrado and Senovilla (2003), classifying points at infinity and singularities in $\partial_\phi(M)$ for embeddings $\phi : M \to N$ in which $N$ is a spacetime, $\phi$ preserves the chronology relation $\ll$, and there is also a diffeomorphism $\psi : \phi(M) \to N$ which again preserves $\ll$ (the chronology relation in a spacetime is defined thus: $x \ll y$ if and only if there is a future-directed timelike curve from $x$ to $y$). This scheme applies more generally than to conformal embeddings, but the requirement for chronology-preserving maps in both directions guarantees a strong sensitivity to causality; it amounts to a mild extension of Penrose's notion that is often much easier to construct.

Universal Constructions

B-Boundary

Attempts have been made to formulate boundary concepts specifically for defining singularities as ideal endpoints for finite-length geodesics. The most complete venture in this direction is the b-boundary ("b" for "bundle") of Schmidt (Hawking and Ellis 1973, pp. 276–284). This is a formulation that takes note only of the connection in the linear frames bundle $L(M)$ of a spacetime $M$ (or of any manifold with a linear connection, metric or otherwise); in other words, it takes no particular note of the spacetime metric or even of the causal structure of the spacetime, but only of the notion of parallel translation of tangent vectors along curves. Parallel translation of a frame (a basis for the tangent space) along a curve is used to obtain an ad hoc length for the curve by treating the translated frame as positive-definite orthonormal at each point; whether this length is finite or infinite is independent of the choice of the original frame. The Schmidt construction defines a boundary on $M$ which gives an endpoint for each curve, endless in $M$, which is finite in that sense: Select a positive-definite metric on $L(M)$, give it a boundary by means of Cauchy completion, and then take the appropriate quotient by the bundle group. This has an appealing universality of application, but the problems of putting it into practice are quite formidable. Also, the fact that it takes no special note of the spacetime character of $M$ suggests that it may not be of particular utility for physical insights.
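The "ad hoc length" used in the b-boundary can be made concrete in the simplest possible setting. The sketch below is an illustration under strong assumptions, not the Schmidt construction itself: it works in flat two-dimensional Minkowski space with coordinates (x, t), where parallel translation leaves the coordinate components of a frame unchanged, and it only computes the frame length of a curve (no Cauchy completion of the frame bundle and no quotient). The function names `frame_components`, `frame_length`, and `boosted_frame` are hypothetical.

```python
import math

def frame_components(v, frame):
    """Components (V1, V2) of the vector v in the basis frame = (e1, e2).

    All vectors are (x, t) coordinate pairs; solves V1*e1 + V2*e2 = v.
    """
    (a, c), (b, d) = frame
    det = a * d - b * c
    if det == 0:
        raise ValueError("frame vectors must be linearly independent")
    vx, vt = v
    return ((d * vx - b * vt) / det, (a * vt - c * vx) / det)

def frame_length(velocity, frame, s0, s1, steps=10000):
    """Midpoint-rule value of the integral of sqrt(V1^2 + V2^2) ds.

    In flat space with these coordinates the parallel-translated frame is
    constant along the curve, so the same `frame` is used at every point.
    """
    h = (s1 - s0) / steps
    total = 0.0
    for k in range(steps):
        s = s0 + (k + 0.5) * h
        v1, v2 = frame_components(velocity(s), frame)
        total += math.hypot(v1, v2) * h
    return total

def boosted_frame(beta):
    """Lorentz boost of the coordinate frame with velocity beta, |beta| < 1."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return ((g, g * beta), (g * beta, g))

# A null geodesic x = t = s has zero Lorentzian length, yet its frame
# length is positive and changes only by a finite factor under a boost.
null_velocity = lambda s: (1.0, 1.0)
length_rest = frame_length(null_velocity, ((1.0, 0.0), (0.0, 1.0)), 0.0, 1.0)
length_boost = frame_length(null_velocity, boosted_frame(0.6), 0.0, 1.0)
```

Changing the initial frame rescales the length by a bounded factor, which is the flat-space shadow of the statement that finiteness of this length does not depend on the frame chosen.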

Causal Boundary: Basics

In 1972 Geroch, Kronheimer, and Penrose (GKP) formulated a notion of boundary, the causal boundary, that is specifically adapted to the causal character of a spacetime $M$; indeed, it is defined in such a way that one need know only the chronology relation $\ll$ on $M$ without any further reference to the metric (another way of saying this is that the causal boundary is conformally invariant). Like Schmidt's b-boundary, the causal boundary is a universal construction, not depending on any extraneous choices; however, although it has an obvious clarity in its causal structure, there are subtleties in the choice of an appropriate topology which are perhaps not yet fully resolved. As this boundary construction appears to embody the best hopes for a practical universal construction, it is detailed here in some depth.

The causal boundary construction applies only to strongly causal spacetimes; essentially, this means that the local causal structure at each point is exactly reflective of the global causal structure.

The basic construction of the causal boundary of a spacetime $M$ starts with two separate parts: the future and past (pre-)boundaries of $M$, intended as yielding endpoints for, respectively, future- and past-endless causal curves. Part of the difficulty of the causal boundary is knowing how best to meld these two into one; currently, there are several answers to this conundrum.

The elements of the future causal boundary of $M$ are defined in terms of the past-set operator $I^-$. For a point $x \in M$, the past of $x$ is $I^-(x) = \{y \mid y \ll x\}$; for a set $A \subset M$, $I^-[A] = \bigcup_{x \in A} I^-(x)$. A set $P \subset M$ is called a past set if $I^-[P] = P$; anything of the form $P = I^-[A]$ is a past set, and all past sets have this form. A past set $P$ is an indecomposable past set (IP) if $P$ cannot be written as $P_1 \cup P_2$ for past sets which are proper subsets $P_i \subsetneq P$. IPs come in exactly two varieties: pointlike IPs (PIPs), of the form $I^-(x)$ (Figure 4), and terminal IPs (TIPs), of the form $I^-[c]$ for $c$ a future-endless causal curve (Figure 5). (Of course, any $I^-(x)$ can also be expressed as $I^-[c]$ for $c$ a causal curve ending at $x$.) The future causal boundary of $M$, $\hat\partial(M)$, consists of all the TIPs of $M$; the future causal completion of $M$ is $\hat{M} = \hat\partial(M) \cup M$. But that is just a set; the causal structure of $M$ needs to be extended to $\hat{M}$.

Figure 4 PIP $P = I^-(x)$.

Figure 5 TIP $P = I^-[c]$.

For any $x \in M$ and $P \in \hat\partial(M)$, set $x \ll P$ if and only if $x \in P$; set $P \ll x$ if and only if $P \subset I^-(y)$ for some $y \ll x$ ($y \in M$); and for $P$ and $Q$ in $\hat\partial(M)$, set $P \ll Q$ if and only if $P \subset I^-(y)$ for some $y \in Q$. If we consider this an extension of the relation $\ll$ on $M$, then we end up with a relation which, like that on $M$, is transitive and antireflexive. Furthermore, it has the property that for all $\alpha, \beta \in \hat{M}$, $\alpha \ll \beta$ if and only if for some $x \in M$, $\alpha \ll x \ll \beta$. (One can also amend the chronology relation within $M$ to be more like the definition in the extension; that is not of major import.)

We can also extend the causality relation $\prec$ on $M$ to one on $\hat{M}$ (in $M$, $x \prec y$ if there is a future-directed causal curve from $x$ to $y$): for $x \in M$ and $P, Q \in \hat\partial(M)$, $x \prec P$ for $I^-(x) \subset P$, $P \prec x$ for $P \subset I^-(x)$, and $P \prec Q$ for $P \subset Q$.

The intent is to have the elements of $\hat\partial(M)$ provide future endpoints for future-endless causal curves in $M$; in particular, we want two such curves, $c_1$ and $c_2$, to be assigned the same future endpoint precisely when $I^-[c_1] = I^-[c_2]$. This is accomplished by the simple expedient of defining the future endpoint of a future-endless causal curve $c$ to be $P = I^-[c]$. We do not have a topology on $\hat{M}$ as yet, but it is worth noting that if $P$ is the assigned future endpoint of $c$, then $I^-(P) = I^-[c]$; this is at least the correct causal behavior for a putative future endpoint of $c$.

We can perform all the operations above in the time-dual manner, obtaining the past causal boundary $\check\partial(M)$, consisting of terminal indecomposable future sets (TIFs), and the past causal completion $\check{M} = \check\partial(M) \cup M$. The full causal boundary of $M$ consists of the union of $\hat\partial(M)$ with $\check\partial(M)$ with some sort of identifications to be made.

As an example of the need for identifications, consider $M$ to be $\mathbb{L}^2$ with a closed timelike line segment deleted, say $M = \mathbb{L}^2 - \{(0, t) \mid 0 \le t \le 1\}$. For $\hat\partial(M)$, we have first the boundary elements at infinity: the TIP $i^+ = M$ (the past of the positive time axis) and the set of TIPs making up $\mathcal{I}^+$ (the pasts of null lines going out to infinity in $\mathbb{L}^2$); and then, the boundary elements coming from the deleted points: for each $t$ with $0 < t \le 1$, two IPs emanating from $(0, t)$, that is, $P_t^+$, the past of the null line going pastwards from $(0, t)$ toward $x > 0$, and $P_t^-$, the past of the null line going pastwards from $(0, t)$ toward $x < 0$; and $P_0$, emanating from $(0, 0)$, that is, the past of the negative time axis. Similarly, $\check\partial(M)$ consists of $i^-$, $\mathcal{I}^-$, TIFs $F_t^+$ and $F_t^-$ emanating from $(0, t)$ for $0 \le t < 1$, and the TIF $F_1$ emanating from $(0, 1)$. We probably want to make at least the following identifications: for each $t$ with $0 < t < 1$, $P_t^+ \sim F_t^+$ and $P_t^- \sim F_t^-$; $P_1^+ \sim F_1 \sim P_1^-$; and $F_0^+ \sim P_0 \sim F_0^-$. This results in a two-sided replacement for the deleted segment; for some purposes, it might be deemed desirable to identify the two sides as one, but a universal boundary is probably a good idea, leaving further identifications as optional quotients of the universal object.

How best to define the appropriate identifications in general is a matter of some controversy. GKP defined a somewhat complicated topology on the union $\hat\partial(M) \cup \check\partial(M) \cup M$, then used an identification intended to result in a Hausdorff space. There are significant problems with this approach in some outré spacetimes, as pointed out by Budic and Sachs (1974) and Szabados (1989), both of whom recommended a different set of identifications. But what is of more concern is that the topology prescribed by GKP is not what might be expected in even the simplest of cases, for example, Minkowski space: $\mathbb{L}^n$ needs no identifications among boundary points (no matter whose identification procedure is followed). The GKP topology for $\mathbb{L}^n$, restricted to $\hat\partial(\mathbb{L}^n)$, is not that of a cone ($S^{n-2} \times \mathbb{R}^1$ with a point added), as is the case for $\mathcal{I}^+$ in the conformal embedding into $\mathbb{E}^n$; but, instead, each null line in $\hat\partial(\mathbb{L}^n)$ (not including $i^+$) is an open set, and $i^+$ has no neighborhood in $\hat\partial(\mathbb{L}^n)$ save for the entire boundary. This is a topology bearing no relation at all to that of any embedding.

Future Causal Boundary

Construction An alternative approach, initiated by Harris (1998), is to forego the full causal boundary and concentrate only on $\hat{M}$ and $\check{M}$ separately. There is an advantage to this in that the process of future causal completion (that is to say, forming $\hat{M}$ from $M$) can be made functorial in an appropriate category of "chronological sets": a set $X$ with a relation $\ll$ which is transitive and antireflexive such that it possesses a countable subset $S$ which is "chronologically dense," that is, for any $x, y \in X$ with $x \ll y$, there is some $s \in S$ with $x \ll s \ll y$. Any strongly causal spacetime $M$ is a chronological set, as is $\hat{M}$. The entire construction of the future causal boundary works just as well for a chronological set. The role of a timelike curve in a chronological set is taken by a future chain: a sequence $c = \{x_n\}$ with $x_n \ll x_{n+1}$ for all $n$. For any future chain $c$, $I^-[c]$ is an IP, and any IP can be so expressed; but unlike in spacetimes, $I^-(x)$ may or may not be an IP for $x \in X$. Then, $\hat{X}$ is always future complete in the sense that for any future chain $c$ in $\hat{X}$, there is an element $\alpha \in \hat{X}$ with $I^-(\alpha) = I^-[c]$: for instance, if the chain $c$ lies in $X$ but there is no $x \in X$ with $I^-(x) = I^-[c]$, just let $\alpha = I^-[c]$, which is an element of $\hat\partial(X)$. This yields a functor of future completion from the category of chronological sets to the category of future-complete chronological sets, and the embedding $X \to \hat{X}$ is a universal object in the sense of category theory; this implies that it is categorically unique and is the minimal future-completion process.

However, it is crucial to have more than the chronology relation operating in what is to be a boundary; topology of some sort is needed. This is accomplished by defining what might be called the future-chronological topology for any chronological set, including for $\hat{M}$ when $M$ is a strongly causal spacetime. This topology is defined by means of a limit-operator $\hat{L}$ on sequences: if $X$ is the chronological set, then for any sequence of points $\sigma = \{x_n\}$ in $X$, $\hat{L}(\sigma)$ denotes a subset of $X$ which is the set of limits of $\sigma$. It is explicitly recognized that there may be more than one limit of a sequence, as the space may not be Hausdorff; no attempt is made to remove any non-Hausdorffness, as this is viewed as giving important information on how, possibly, two points in the future causal boundary represent very similar and yet not identical pieces of information about the causal structure at infinity.
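The basic ingredients used above (the chronology relation, future chains, and the past $I^-[c]$ that serves as the future endpoint of $c$) can be tried out in two-dimensional Minkowski space. The following is a minimal sketch under stated assumptions: points are (x, t) pairs, a chain is represented by a finite sample of its points, and the helper names are hypothetical. A finite sample can only confirm membership in $I^-[c]$; the negative answer below is conclusive because the past of that chain is known by hand to be $\{t < x\}$.

```python
def chron(p, q):
    """Chronology relation p << q in L^2 for points given as (x, t)."""
    (x1, t1), (x2, t2) = p, q
    return t2 - t1 > abs(x2 - x1)

def is_future_chain(chain):
    """True when each sampled point chronologically precedes the next."""
    return all(chron(a, b) for a, b in zip(chain, chain[1:]))

def in_past_of_chain(p, chain):
    """True when p << q for some sampled point q, so that p lies in I^-[c]."""
    return any(chron(p, q) for q in chain)

N = 2000
# Two timelike chains running up to future timelike infinity.
axis_chain = [(0.0, float(n)) for n in range(1, N)]
shifted_chain = [(1.0, float(n)) for n in range(1, N)]
# Two chains hugging the null lines t = x and t = x + 1 from below.
null_chain_0 = [(float(n), n - 1.0 / n) for n in range(1, N)]
null_chain_1 = [(float(n), n + 1.0 - 1.0 / n) for n in range(1, N)]

sample = [(x / 2.0, t / 2.0) for x in range(-20, 21) for t in range(-20, 21)]
same_past_at_i_plus = all(
    in_past_of_chain(p, axis_chain) and in_past_of_chain(p, shifted_chain)
    for p in sample
)
```

On this sample both timelike chains have every point in their past, consistent with their sharing the endpoint $i^+$, while the two null chains are told apart by the point (0, 0.5), which lies in the past of the second but not of the first.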

Once the limit operator is in place, the actual topology on $X$ is defined thus: a subset $A \subset X$ is said to be closed if and only if for any sequence $\sigma \subset A$, $\hat{L}(\sigma) \subset A$ (and open sets are complements of closed sets). This yields the elements of $\hat{L}(\sigma)$ as topological limits of $\sigma$.

The definition of $\hat{L}$ is simplest when $X$ has the property that $I^-(x)$ is an IP for any $x \in X$; as this is true for $X$ being either a spacetime $M$ or the future causal completion $\hat{M}$ of a spacetime, the discussion here is restricted to this situation. Let us also make the common assumption that $X$ is past-distinguishing, that is, $I^-(x) = I^-(y)$ implies $x = y$.

Let $\sigma = \{x_n\}$ be a sequence of points in a past-distinguishing chronological set $X$ in which the past of any point is an IP. Then $\hat{L}(\sigma)$ consists of those points $x$ for which (see Figures 6 and 7)

1. for all $y \in I^-(x)$, for $n$ sufficiently large, $y \ll x_n$, and
2. for any IP $P \supsetneq I^-(x)$, there is some $z \in P$ such that for $n$ sufficiently large, $z \not\ll x_n$.

Figure 6 $x \in \hat{L}(\{x_n\})$: for all $y \in I^-(x)$, eventually $y \ll x_n$, and for all IP $P \supsetneq I^-(x)$, there is some $z \in P$ such that eventually $z \not\ll x_n$.

Figure 7 $x \notin \hat{L}(\{x_n\})$: there is some IP $P \supsetneq I^-(x)$ such that for all $z \in P$, $z \ll x_n$ for infinitely many $n$.

Then the future-chronological topology on $X$ has these features:

1. It is a $T_1$ topology, that is, points are closed.
2. If $I^-(x) = I^-[c]$ for a future chain $c = \{x_n\}$, then $x$ is a topological limit of the sequence $\{x_n\}$.
3. If $X = M$, a strongly causal spacetime, then the future-chronological topology is precisely the manifold topology.
4. If $X = \hat{M}$, the future causal completion of a strongly causal spacetime $M$, then the induced topology on $M$ is the manifold topology, $\hat\partial(M)$ is a closed subset of $\hat{M}$, and $M$ is dense in $\hat{M}$. As per property (2), for any future-endless causal curve $c$ in $M$, the point $I^-[c]$ in $\hat\partial(M)$ is the topological endpoint of $c$ in $\hat{M}$.
5. If $X = \hat{\mathbb{L}}^n$, then $X$ is homeomorphic to the conformal image of $\mathbb{L}^n$ in $\mathbb{E}^n$ together with $\mathcal{I}^+$ and $i^+$; in particular, $\hat\partial(\mathbb{L}^n)$ has the topology of a cone.

Examples The future causal boundary with the future-chronological topology can be calculated with a fair degree of success. For instance, if $M$ is conformal to a simple product spacetime $Q \times \mathbb{L}^1$ ($Q$ a Riemannian manifold), then $\hat\partial(M)$ is much like $\hat\partial(\mathbb{L}^n)$ in that it consists of null or timelike lines factored over a particular boundary construction $\partial(Q)$ on $Q$, coming together at a single point $i^+$ (the IP which is all of $M$); if $Q$ is complete, then these are all null lines, and together they may be called $\mathcal{I}^+$.

The elements of $\partial(Q)$ are defined in terms of the Lipschitz-1 functions on $Q$ known as Busemann functions: if $c : [\alpha, \omega) \to Q$ is any endless unit-speed curve (typically, $\omega = \infty$), then the Busemann function $b_c : Q \to \mathbb{R}$ is defined by $b_c(q) = \lim_{s \to \omega} (s - d(c(s), q))$, where $d$ is the distance function in $Q$; this function is either finite for all $q$ or infinite for all $q$. The set $B(Q)$ of finite Busemann functions has an $\mathbb{R}$-action defined by $a \cdot b_c = b_{a \cdot c}$, where $(a \cdot c)(s) = c(s + a)$. Then $\partial(Q) = B(Q)/\mathbb{R}$. For any $P \in \hat\partial(M)$, the boundary of $P$, as a subset of $Q \times \mathbb{L}^1 \cong Q \times \mathbb{R}$, is the graph of a Busemann function (the function is $b_c$ for $P$ generated by a null curve projecting to $c$); and a point $x = (q, t)$ in $M$ can be represented by $\partial(I^-(x))$, which is the graph of the function $t - d(\cdot, q)$. Thus, one could use the function-space topology on $B(Q)$ to topologize $\hat{M}$; in that function-space topology $\hat\partial(M)$ is a cone on $\partial(Q)$, and $\hat{M}$, apart from $i^+$, is the topological product of $\mathbb{R}$ with $Q \cup \partial(Q)$. The future-chronological topology is sometimes different from the function-space topology, allowing more convergent sequences than the function-space topology does. When this happens, the result is non-Hausdorff, revealing pairs of points in $\hat\partial(M)$ which are more closely related to one another than the function-space topology reveals; but it is still the case that $\hat\partial(M)$, apart from $i^+$, is fibered by $\mathbb{R}$ over $\partial(Q)$.

If $Q$ is a warped product $Q = (a, b) \times K$ for a compact manifold $K$ with metric $dr^2 + e^{\varphi(r)} h$ with $h$ a metric on $K$, then one can calculate more precisely: if, for instance, $\varphi$ has a minimum in the interior of $(a, b)$ and has suitable growth on either end, then $\partial(Q)$ represents two copies of $K$ (one for each end of $(a, b) \times K$), the future-chronological topology is the same as the function-space topology, and $\hat{M}$ (apart from $i^+$) is a simple product of $\mathbb{R}$ with $Q \cup \partial(Q)$: $\hat\partial(M)$ is precisely a null cone over two copies of $K$. This applies, for instance, to exterior Schwarzschild, where $K = S^2$; the boundary at one end of exterior Schwarzschild is the usual $\mathcal{I}^+$, and the boundary at the other end is the null cone $\{r = 2m\}$, where exterior attaches to interior Schwarzschild.

Calculations for the future-chronological topology become much easier when $\hat\partial(M)$ is purely spacelike, that is, no $P \in \hat\partial(M)$ is contained in the past of any other element of $\hat{M}$. For instance, if $M$ is conformal to a multiwarped product, $Q_1 \times \cdots \times Q_m \times (a, b)$ with metric $f_1(t)^2 h_1 + \cdots + f_m(t)^2 h_m - dt^2$, where $h_i$ is a Riemannian metric on $Q_i$, then $\hat\partial(M)$ will be purely spacelike if all the Riemannian factors are complete and for each $i$, $\int_{b^-}^{b} 1/f_i(t)\, dt < \infty$; in that case, $\hat\partial(M) \cong Q$, where $Q = Q_1 \times \cdots \times Q_m$ and $\hat{M} \cong Q \times (a, b]$. This applies, for instance, to interior Schwarzschild, where $Q_1 = \mathbb{R}^1$ and $Q_2 = S^2$, yielding the topology of $\mathbb{R}^1 \times S^2$ for the Schwarzschild singularity.

There is a categorical universality for spacelike boundaries and the future-chronological topology. This means that any other reasonable way of future-completing interior Schwarzschild must yield $\mathbb{R}^1 \times S^2$ or a topological quotient of that for the singularity; and if the result is to be past-distinguishing, $\mathbb{R}^1 \times S^2$ is the only possibility.

Of course, all this can be done in the time-dual fashion, using the past-chronological topology on $\check{M}$. It would be desirable to combine the future and past causal boundaries with a suitable topology as well as appropriate identifications. There has been some work in that direction.

Causal Boundary: Revisited

Marolf and Ross (2003) have proposed an identification of TIPs and TIFs that relies on the equivalence relation defined by Szabados. For an IP $P$ and IF $F$, call $(P, F)$ a Szabados pair if $P \subset I^-(x)$ for all $x \in F$, $P$ is maximal among IPs for that property, and dually for $F$ with respect to $P$. For instance, for any $x \in M$, $(I^-(x), I^+(x))$ is a Szabados pair. The Marolf–Ross version of the causal boundary, $\bar\partial(M)$, consists of all Szabados pairs formed of TIPs and TIFs, plus any TIP or TIF that cannot be paired; this produces an appropriate set of identifications within $\hat\partial(M) \cup \check\partial(M)$. The chronology relation on $M$ is extended to $\bar{M} = \bar\partial(M) \cup M$ by treating each point $x$ in $M$ as the Szabados pair $(I^-(x), I^+(x))$ and each unpaired IP $P$ as $(P, \emptyset)$ and unpaired IF $F$ as $(\emptyset, F)$, and then defining $(P, F) \ll (P', F')$ whenever $F \cap P' \neq \emptyset$.

The resulting chronological set is not necessarily either past- or future-distinguishing, but it is (past and future)-distinguishing. The topology they propose places endpoints in $\bar\partial(M)$ for all causal curves which are endless in $M$, but there may be multiple future endpoints for a single future-endless curve. The topology need not be $T_1$: points can fail to be closed. For a product spacetime $M = Q \times \mathbb{L}^1$, the Marolf–Ross topology on $\bar{M}$ is always the function-space topology. As of this writing, there is active research by J L Flores to institute a Marolf–Ross type of identification of $\hat\partial(M)$ with $\check\partial(M)$ using a topology that partakes more of the future- and past-chronological topologies.

See also: Asymptotic Structure and Conformal Infinity; Spacetime Topology, Causal Structure and Singularities.

Further Reading

Budic R and Sachs RK (1974) Causal boundaries for general relativistic space-times. Journal of Mathematical Physics 15: 1302–1309.
García-Parrado A and Senovilla JMM (2003) Causal relationship: a new tool for the causal characterization of Lorentzian manifolds. Classical and Quantum Gravity 20: 625–664.
Geroch RP, Kronheimer EH, and Penrose R (1972) Ideal points in space-time. Proceedings of the Royal Society of London, Series A 327: 545–567.
Harris SG (1998) Universality of the future chronological boundary. Journal of Mathematical Physics 39: 5427–5445.
Harris SG (2000) Topology of the future chronological boundary: universality for spacelike boundaries. Classical and Quantum Gravity 17: 551–603.
Harris SG (2001) Causal boundary for standard static spacetimes. Nonlinear Analysis 47: 2971–2981 (Special Edition: Proceedings of the Third World Congress in Nonlinear Analysis).
Harris SG (2004a) Boundaries on spacetimes: an outline. Classical and Quantum Gravity 359: 65–85.
Harris SG (2004b) Discrete group actions on spacetimes: causality conditions and the causal boundary. Classical and Quantum Gravity 21: 1209–1236.
Harris SG and Dray T (1990) The causal boundary of the trousers space. Classical and Quantum Gravity 7: 149–161.
Hawking SW and Ellis GFR (1973) The Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Marolf D and Ross SF (2003) A new recipe for causal completions. Classical and Quantum Gravity 20: 4085–4118.
Schmidt BG (1972) Local completeness of the b-boundary. Communications in Mathematical Physics 29: 49–54.
Scott SM and Szekeres P (1994) The abstract boundary – a new approach to singularities of manifolds. Journal of Geometry and Physics 13: 223–253.
Boundary Conformal Field Theory 333

Szabados LB (1988) Causal boundary for strongly causal spacetimes. Classical and Quantum Gravity 5: 121–134.
Szabados LB (1989) Causal boundary for strongly causal spacetimes: II. Classical and Quantum Gravity 6: 77–91.
Wald RM (1984) General Relativity. Chicago: University of Chicago Press.

Boundary Conformal Field Theory


J Cardy, Rudolf Peierls Centre for Theoretical Physics, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Boundary conformal field theory (BCFT) is simply the study of conformal field theory (CFT) in domains with a boundary. It gains its significance, first, because, in some ways, it is mathematically simpler: the algebraic and geometric structures of CFT appear in a more straightforward manner; and second, because it has important applications: in string theory in the physics of open strings and D-branes, and in condensed matter physics in boundary critical behavior and quantum impurity models.

This article, however, describes the basic ideas from the point of view of quantum field theory, without regard to particular applications or to any deeper mathematical formulations.

Review of CFT

Stress Tensor and Ward Identities

Two-dimensional CFTs are massless, local, relativistic renormalized quantum field theories. Usually they are considered in imaginary time, that is, on two-dimensional manifolds with Euclidean signature. In this article, the metric is also taken to be Euclidean, although the formulation of CFTs on general Riemann surfaces is also of great interest, especially for string theory. For the time being, the domain is the entire complex plane.

Heuristically, the correlation functions of such a field theory may be thought of as being given by the Euclidean path integral, that is, as expectation values of products of local densities with respect to a Gibbs measure $Z^{-1} e^{-S_E(\{\psi\})} [d\psi]$, where the $\{\psi(x)\}$ are some set of fundamental local fields, $S_E$ is the Euclidean action, and the normalization factor $Z$ is the partition function. Of course, such an object is not in general well defined, and this picture should be seen only as a guide to formulating the basic principles of CFT which can then be developed into a mathematically consistent theory.

In two dimensions, it is useful to use the so-called complex coordinates $z = x_1 + i x_2$, $\bar z = x_1 - i x_2$. In CFT, there are local densities $\phi_j(z, \bar z)$, called primary fields, whose correlation functions transform covariantly under conformal mappings $z \to z' = f(z)$:

$$\langle \phi_1(z_1, \bar z_1)\, \phi_2(z_2, \bar z_2) \cdots \rangle = \prod_j f'(z_j)^{h_j}\, \overline{f'(z_j)}^{\,\bar h_j}\, \langle \phi_1(z_1', \bar z_1')\, \phi_2(z_2', \bar z_2') \cdots \rangle \qquad [1]$$

where $(h_j, \bar h_j)$ (usually real numbers, not complex conjugates of each other) are called the conformal weights of $\phi_j$. These local fields can in general be normalized so that their two-point functions have the form

$$\langle \phi_j(z_j, \bar z_j)\, \phi_k(z_k, \bar z_k) \rangle = \frac{\delta_{jk}}{(z_j - z_k)^{2h_j} (\bar z_j - \bar z_k)^{2\bar h_j}} \qquad [2]$$

They satisfy an algebra known as the operator product expansion (OPE)

$$\phi_i(z_1, \bar z_1) \cdot \phi_j(z_2, \bar z_2) = \sum_k c_{ijk}\, (z_1 - z_2)^{-h_i - h_j + h_k} (\bar z_1 - \bar z_2)^{-\bar h_i - \bar h_j + \bar h_k}\, \phi_k(z_1, \bar z_1) + \cdots \qquad [3]$$

which is supposed to be valid when inserted into higher-order correlation functions in the limit when $|z_1 - z_2|$ is much less than the separations of all the other points. The ellipses denote the contributions of other nonprimary scaling fields to be described below. The structure constants $c_{ijk}$, along with the conformal weights, characterize the particular CFT.

An essential role is played by the energy–momentum tensor, or, in Euclidean field theory language, the stress tensor $T^{\mu\nu}$. Heuristically, it is defined as the response of the partition function to a local change in the metric:

$$T^{\mu\nu}(x) = -(2\pi)\, \delta \ln Z / \delta g_{\mu\nu}(x) \qquad [4]$$

(the factor of $2\pi$ is included so that similar factors disappear in later equations).

The symmetry of the theory under translations and rotations implies that $T^{\mu\nu}$ is conserved, $\partial_\mu T^{\mu\nu} = 0$, and symmetric. Scale invariance implies that it is also traceless, $\Theta \equiv T^\mu_{\ \mu} = 0$. It should be noted that the vanishing of the trace of the stress tensor for a scale invariant classical field theory does not usually survive when quantum corrections are taken into account: indeed, $\Theta \propto \beta(g)$, the renormalization group (RG) beta-function. A quantum field theory is thus only a CFT when this vanishes, that is, at an RG fixed point. In complex coordinates, the components $T_{z\bar z} = T_{\bar z z} = 4\Theta$ vanish, while the conservation equations read

$$\partial_{\bar z} T_{zz} = \partial_z T_{\bar z \bar z} = 0 \qquad [5]$$

Thus, correlators of $T(z) \equiv T_{zz}$ are locally analytic (in fact, globally meromorphic) functions of $z$, while those of $\bar T(\bar z) \equiv T_{\bar z \bar z}$ are antianalytic. It is this property of analyticity which makes CFTs tractable in two dimensions.

Since an infinitesimal conformal transformation $z \to z + \alpha(z)$ induces a change in the metric, its effect on a correlation function of primary fields, given by [1], may also be expressed through an appropriate integral involving an insertion of the stress tensor. This leads to the conformal Ward identity:

$$\int_C \Big\langle T(z) \prod_j \phi_j(z_j, \bar z_j) \Big\rangle\, \alpha(z)\, \frac{dz}{2\pi i} = \sum_j \big( h_j \alpha'(z_j) + \alpha(z_j)\, (\partial/\partial z_j) \big) \Big\langle \prod_j \phi_j(z_j, \bar z_j) \Big\rangle \qquad [6]$$

where $C$ is a contour encircling all the points $\{z_j\}$. (A similar equation holds for the insertion of $\bar T$.) Using Cauchy's theorem, this determines the first few terms in the OPE of $T$ with any primary density:

$$T(z) \cdot \phi_j(z_j, \bar z_j) = \frac{h_j}{(z - z_j)^2}\, \phi_j(z_j, \bar z_j) + \frac{1}{z - z_j}\, \partial_{z_j} \phi_j(z_j, \bar z_j) + O(1) \qquad [7]$$

The other, regular, terms in the OPE generate new scaling fields, which are not in general primary, called descendants. One way of defining a density to be primary is by the condition that the most singular term in its OPE with $T$ is a double pole.

The OPE of $T$ with itself has the form

$$T(z) \cdot T(z_1) = \frac{c/2}{(z - z_1)^4} + \frac{2}{(z - z_1)^2}\, T(z_1) + \cdots \qquad [8]$$

$$T(z) \to f'(z)^2\, T(z') + \frac{c}{12} \{z', z\} \qquad [9]$$

where $\{z', z\} = (f''' f' - \tfrac{3}{2} f''^2)/f'^2$ is the Schwarzian derivative.

Virasoro Algebra

As with any quantum field theory, the local fields can be realized as linear operators acting on a Hilbert space. In ordinary QFT, it is customary to quantize on a constant-time hypersurface. The generator of infinitesimal time translations is the Hamiltonian $\hat H$, which itself is independent of which time slice is chosen, because of time translational symmetry. It is also given by the integral over the hypersurface of the time–time component of the stress tensor. In CFT, because of scale invariance, one may instead quantize on a fixed circle of a given radius. The analog of the Hamiltonian is the dilatation operator $\hat D$, which generates scale transformations. Unlike $\hat H$, the spectrum of $\hat D$ is usually discrete, even in an infinite system. It may also be expressed as an integral over the radial component of the stress tensor:

$$\hat D = \frac{1}{2\pi} \int_0^{2\pi} r\, \hat T_{rr}\, r\, d\theta = \frac{1}{2\pi i} \int_C z\, \hat T(z)\, dz - \frac{1}{2\pi i} \int_C \bar z\, \hat{\bar T}(\bar z)\, d\bar z \equiv \hat L_0 + \hat{\bar L}_0 \qquad [10]$$

where, because of analyticity, $C$ can be any contour encircling the origin.

This suggests that one define other operators

$$\hat L_n \equiv \frac{1}{2\pi i} \int_C z^{n+1}\, \hat T(z)\, dz \qquad [11]$$

and similarly the $\hat{\bar L}_n$. From the OPE [8] then follows the Virasoro algebra $\mathcal{V}$:

$$[\hat L_n, \hat L_m] = (n - m)\, \hat L_{n+m} + \frac{c}{12}\, n(n^2 - 1)\, \delta_{n+m,0} \qquad [12]$$

with an isomorphic algebra $\bar{\mathcal{V}}$ generated by the $\hat{\bar L}_n$.

In radial quantization, there is a vacuum state $|0\rangle$.
Acting on this with the operator corresponding to a
The first term is present because hT(z)T(z1 )i is scaling field gives a state jj i  ^j (0, 0)j0i which is
nonvanishing, and must take the form shown, with c an eigenstate of D̂: in fact,
being some number (which cannot be scaled to ^ 0 jj i ¼ hj jj i;
L ^ j i ¼ h
L j jj i ½13
0 j
unity, since the normalization of T is fixed by its
definition) which is a property of the CFT. It is From the OPE [7], one sees that jLn j i / L̂n jj i,
known as the conformal anomaly number or the and, if j is primary, L̂n jj i = 0 for all n  1.
central charge. This term implies that T is not itself The states corresponding to a given primary field,
primary. In fact, under a finite conformal transfor- and those generated by acting on these with all the
mation z ! z0 = f (z), L̂n with n < 0 an arbitrary number of times, form a
highest-weight representation of $\mathcal{V}$. However, this is not necessarily irreducible. There may be null vectors, which are linear combinations of states at a given level which are themselves annihilated by all the $\hat L_n$ with n > 0. They exist whenever h takes a value from the Kac table:

$$h = h_{r,s} = \frac{(r(m + 1) - sm)^2 - 1}{4m(m + 1)} \qquad [14]$$

with the central charge parametrized as $c = 1 - 6/(m(m + 1))$, and r, s are non-negative integers. These null states should be projected out, giving an irreducible representation $\mathcal{V}_h$.

The full Hilbert space of the CFT is then

$$\mathcal{H} = \bigoplus_{h, \bar h} n_{h,\bar h}\, \mathcal{V}_h \otimes \bar{\mathcal{V}}_{\bar h} \qquad [15]$$

where the non-negative integers $n_{h,\bar h}$ specify how many distinct primary fields of weights $(h, \bar h)$ there are in the CFT.

The consistency of the OPE [3] with the existence of null vectors leads to the fusion algebra of the CFT. This applies separately to the holomorphic and antiholomorphic sectors, and determines how many copies of $\mathcal{V}_c$ occur in the fusion of $\mathcal{V}_a$ and $\mathcal{V}_b$:

$$\mathcal{V}_a \odot \mathcal{V}_b = \sum_c N^c_{ab}\, \mathcal{V}_c \qquad [16]$$

where the $N^c_{ab}$ are non-negative integers.

A particularly important subset of all CFTs consists of the minimal models. These have rational central charge $c = 1 - 6(p - q)^2/pq$, in which case the fusion algebra closes with a finite number of possible values $1 \leq r \leq q$, $1 \leq s \leq p$ in the Kac formula [14]. For these models, the fusion algebra takes the form

$$\mathcal{V}_{r_1,s_1} \odot \mathcal{V}_{r_2,s_2} = {\sum_{r=|r_1-r_2|}^{r_1+r_2-1}}' \; {\sum_{s=|s_1-s_2|}^{s_1+s_2-1}}' \mathcal{V}_{r,s} \qquad [17]$$

where the prime on the sums indicates that they are to be restricted to the allowed intervals of r and s.

There is an important theorem which states that the only unitary CFTs with c < 1 are the minimal models with $p/q = (m + 1)/m$, where m is an integer $\geq 3$.

Modular Invariance

The fusion algebra limits which values of $(h, \bar h)$ might appear in a consistent CFT, but not which ones actually occur, that is, the values of the $n_{h,\bar h}$. This is answered by the requirement of modular invariance on the torus. First consider the theory on an infinitely long cylinder, of unit circumference. This is related to the (punctured) plane by the conformal mapping $z \to (1/2\pi) \ln z \equiv t + ix$. The result is a QFT on the circle $0 \leq x < 1$, in imaginary time t. The generator of infinitesimal time translations is related to that for dilatations in the plane:

$$\hat H = 2\pi \hat D - \frac{\pi c}{6} = 2\pi (\hat L_0 + \hat{\bar L}_0) - \frac{\pi c}{6} \qquad [18]$$

where the last term comes from the Schwarzian derivative in [9]. Similarly, the generator of translations in x, the total momentum operator, is $\hat P = 2\pi (\hat L_0 - \hat{\bar L}_0)$.

A general torus is, up to a scale transformation, a parallelogram with vertices $(0, 1, \tau, 1 + \tau)$ in the complex plane, with the opposite edges identified. We can make this by taking a cylinder of unit circumference and length $\mathrm{Im}\,\tau$, twisting the ends by a relative amount $\mathrm{Re}\,\tau$, and sewing them together. This means that the partition function of the CFT on the torus can be written as

$$Z(\tau, \bar\tau) = \mathrm{tr}\, e^{-(\mathrm{Im}\,\tau) \hat H + i (\mathrm{Re}\,\tau) \hat P} = \mathrm{tr}\, q^{\hat L_0 - c/24}\, \bar q^{\hat{\bar L}_0 - c/24} \qquad [19]$$

using the above expressions for $\hat H$ and $\hat P$ and introducing $q \equiv e^{2\pi i \tau}$.

Through the decomposition [15] of $\mathcal{H}$, the trace sum can be written as

$$Z(\tau, \bar\tau) = \sum_{h, \bar h} n_{h,\bar h}\, \chi_h(q)\, \chi_{\bar h}(\bar q) \qquad [20]$$

where

$$\chi_h(q) \equiv \mathrm{tr}_{\mathcal{V}_h}\, q^{\hat L_0 - c/24} = \sum_N d_h(N)\, q^{h - (c/24) + N} \qquad [21]$$

is the character of the representation of highest weight h, which counts the degeneracy $d_h(N)$ at level N. It is purely an algebraic property of the Virasoro algebra, and its explicit form is known in many cases.

Figure 1 Two equivalent parametrizations of the same torus (vertices 0, 1, τ on the left; 0, 1, –1/τ on the right).

All of this would be less interesting were it not for the observation that the parametrization of the torus through $\tau$ is not unique. In fact, the transformations $S: \tau \to -1/\tau$ and $T: \tau \to \tau + 1$ give the same torus (see Figure 1). Together, these
operations generate the modular group SL(2, Z), and the partition function $Z(\tau, \bar\tau)$ should be invariant under them. T-invariance is simply implemented by requiring that $h - \bar h$ is an integer, but the S-invariance of the right-hand side of [20] places highly nontrivial constraints on the $n_{h,\bar h}$. That this can be satisfied at all relies on the remarkable property of the characters that they transform linearly under S:

$$\chi_h(e^{-2\pi i/\tau}) = \sum_{h'} S_h^{h'}\, \chi_{h'}(e^{2\pi i \tau}) \qquad [22]$$

This follows from applying the Poisson sum formula to the explicit expressions for the characters, which are related to Jacobi theta-functions. In many cases (e.g., the minimal models) this representation is finite dimensional, and the matrix S is symmetric and orthogonal. This means that one can immediately obtain a modular invariant partition function by forming the diagonal sum

$$Z = \sum_h \chi_h(q)\, \chi_h(\bar q) \qquad [23]$$

so that $n_{h,\bar h} = \delta_{h \bar h}$. However, because of various symmetries of the characters, other modular invariants are possible: for the minimal models (and some others) these have been classified. Because of an analogy of the results with the classification of semisimple Lie algebras, the diagonal invariants are called the A-series.

Boundary CFT

In any field theory in a domain with a boundary, one needs to consider how to impose a set of consistent boundary conditions. Since CFT is formulated independently of a particular set of fundamental fields and a Lagrangian, this must be done in a more general manner. A natural requirement is that the off-diagonal component $T_{\parallel\perp}$ of the stress tensor parallel/perpendicular to the boundary should vanish. This is called the conformal boundary condition. If the boundary is parallel to the time axis, it implies that there is no momentum flow across the boundary. Moreover, it can be argued that, under the RG, any uniform boundary condition will flow into a conformally invariant one. For a given bulk CFT, however, there may be many possible distinct such boundary conditions, and it is one task of BCFT to classify these.

To begin with, take the domain to be the upper-half plane, so that the boundary is the real axis. The conformal boundary condition then implies that $T(z) = \bar T(\bar z)$ when z is on the real axis. This has the immediate consequence that correlators of $\bar T$ are those of T, analytically continued into the lower-half plane. The conformal Ward identity, cf. [7], now reads

$$\Big\langle T(z) \prod_j \phi_j(z_j, \bar z_j) \Big\rangle = \sum_j \left( \frac{h_j}{(z - z_j)^2} + \frac{1}{z - z_j}\, \partial_{z_j} + \frac{\bar h_j}{(z - \bar z_j)^2} + \frac{1}{z - \bar z_j}\, \partial_{\bar z_j} \right) \Big\langle \prod_j \phi_j(z_j, \bar z_j) \Big\rangle \qquad [24]$$

In radial quantization, in order that the Hilbert spaces defined on different hypersurfaces be equivalent, one must choose semicircles centered on some point on the boundary, conventionally the origin. The dilatation operator is now

$$\hat D = \frac{1}{2\pi i} \int_S z\, \hat T(z)\, dz - \frac{1}{2\pi i} \int_S \bar z\, \hat{\bar T}(\bar z)\, d\bar z \qquad [25]$$

where S is a semicircle. Using the conformal boundary condition, this can also be written as

$$\hat D = \hat L_0 = \frac{1}{2\pi i} \int_C z\, \hat T(z)\, dz \qquad [26]$$

where C is a complete circle around the origin. As before, one may similarly define the $\hat L_n$, and they satisfy a Virasoro algebra.

Note that there is now only one Virasoro algebra. This is related to the fact that conformal mappings which preserve the real axis correspond to real analytic functions. The eigenstates of $\hat L_0$ correspond to boundary operators $\hat\phi_j(0)$ acting on the vacuum state $|0\rangle$. It is well known that in a renormalizable QFT operators at the boundary require a different renormalization from those in the bulk, and this will in general lead to a different set of conformal weights. It is one of the tasks of BCFT to determine these, for a given allowed boundary condition.

However, there is one feature unique to boundary CFT in two dimensions. Radial quantization also makes sense, leading to the same form [26] for the dilatation operator, if the boundary conditions on the negative and positive real axes are different. As far as the structure of BCFT goes, correlation functions with this mixed boundary condition behave as though a local scaling field were inserted at the origin. This has led to the term "boundary condition changing (bcc) operator," but it must be stressed that these are not local operators in the conventional sense.

The Annulus Partition Function

Just as consideration of the partition function on the torus illuminates the bulk operator content $n_{h,\bar h}$, it
turns out that consistency on the annulus helps classify both the allowed boundary conditions, and the boundary operator content. To this end, consider a CFT in an annulus formed of a rectangle of unit width and height $\delta$, with the top and bottom edges identified (see Figure 2). The boundary conditions on the left and right edges, labeled by a, b, ..., may be different. The partition function with boundary conditions a and b on either edge is denoted by $Z_{ab}(\delta)$.

Figure 2 The annulus, with boundary conditions a and b on either boundary.

One way to compute this is by first considering the CFT on an infinitely long strip of unit width. This is conformally related to the upper-half plane (with an insertion of bcc operators at 0 and $\infty$ if $a \neq b$) by the mapping $z \to (1/\pi) \ln z$. The generator of infinitesimal translations along the strip is

$$\hat H_{ab} = \pi \hat D - \pi c/24 = \pi \hat L_0 - \pi c/24 \qquad [27]$$

Thus, for the annulus,

$$Z_{ab}(\delta) = \mathrm{tr}\, e^{-\delta \hat H_{ab}} = \mathrm{tr}\, q^{\hat L_0 - c/24} \qquad [28]$$

with $q \equiv e^{-\pi\delta}$. As before, this can be decomposed into characters:

$$Z_{ab}(\delta) = \sum_h n^h_{ab}\, \chi_h(q) \qquad [29]$$

but note that now the expression is linear. The non-negative integers $n^h_{ab}$ give the operator content with the boundary conditions (ab): the lowest value of h with $n^h_{ab} > 0$ gives the conformal weight of the bcc operator, and the others give conformal weights of the other allowed primary fields which may also sit at this point.

On the other hand, the annulus partition function may be viewed, up to an overall rescaling, as the path integral for a CFT on a circle of unit circumference, being propagated for (imaginary) time $\delta^{-1}$. From this point of view, the partition function is no longer a trace, but rather the matrix element of $e^{-\hat H/\delta}$ between boundary states:

$$Z_{ab}(\delta) = \langle a|\, e^{-\hat H/\delta}\, |b\rangle \qquad [30]$$

Note that $\hat H$ is the same Hamiltonian that appears in [18], and the boundary states lie in $\mathcal{H}$, [15].

How are these boundary states to be characterized? Using the transformation law [9] the conformal boundary condition applied to the circle implies that $L_n = \bar L_{-n}$. This means that any boundary state $|B\rangle$ lies in the subspace satisfying

$$\hat L_n |B\rangle = \hat{\bar L}_{-n} |B\rangle \qquad [31]$$

Moreover, because of the decomposition [15] of $\mathcal{H}$, $|B\rangle$ is also some linear superposition of states from $\mathcal{V}_h \otimes \bar{\mathcal{V}}_{\bar h}$. This condition can therefore be applied in each subspace. Taking n = 0 in [31] constrains $\bar h = h$. For simplicity, consider only the diagonal CFTs with $n_{h,\bar h} = \delta_{h,\bar h}$. It can then be shown that the solution of [31] is unique and has the following form. The subspace at level N of $\mathcal{V}_h$ has dimension $d_h(N)$. Denote an orthonormal basis by $|h, N; j\rangle$, with $1 \leq j \leq d_h(N)$, and the same basis for $\bar{\mathcal{V}}_h$ by $\overline{|h, N; j\rangle}$. The solution to [31] in this subspace is then

$$|h\rangle\rangle \equiv \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} |h, N; j\rangle \otimes \overline{|h, N; j\rangle} \qquad [32]$$

These are called Ishibashi states. Matrix elements of the translation operator along the cylinder between them are simple:

$$\langle\langle h'|\, e^{-\hat H/\delta}\, |h\rangle\rangle = \sum_{N'=0}^{\infty} \sum_{j'=1}^{d_{h'}(N')} \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} \langle h', N'; j'| \otimes \overline{\langle h', N'; j'|}\; e^{-(2\pi/\delta)(\hat L_0 + \hat{\bar L}_0 - c/12)}\; |h, N; j\rangle \otimes \overline{|h, N; j\rangle} \qquad [33]$$

$$= \delta_{h'h} \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} e^{-(4\pi/\delta)(h + N - (c/24))} \qquad [34]$$

$$= \delta_{h'h}\, \chi_h(e^{-4\pi/\delta}) \qquad [35]$$

Note that the characters which appear are related to those in [29] by the modular transformation S.

The physical boundary states satisfying [29], sometimes called the Cardy states, are linear combinations of the Ishibashi states:

$$|a\rangle = \sum_h \langle\langle h|a\rangle\, |h\rangle\rangle \qquad [36]$$

Equating the two different expressions [29] and [30] for $Z_{ab}$, and using the modular transformation law
[22] and the linear independence of the characters gives the (equivalent) conditions:

$$n^h_{ab} = \sum_{h'} S^h_{h'}\, \langle a|h'\rangle\rangle \langle\langle h'|b\rangle \qquad [37]$$

$$\langle a|h'\rangle\rangle \langle\langle h'|b\rangle = \sum_h S^{h'}_h\, n^h_{ab} \qquad [38]$$

These are called the Cardy conditions. The requirements that the right-hand side of [37] should give a non-negative integer, and that the right-hand side of [38] should factorize in a and b, give highly nontrivial constraints on the allowed boundary states and their operator content.

For the diagonal CFTs considered here (and for the nondiagonal minimal models) a complete solution is possible. It can be shown that the elements $S^h_0$ of S are all non-negative, so one may choose $\langle\langle h|\tilde 0\rangle = (S^h_0)^{1/2}$. This defines a boundary state

$$|\tilde 0\rangle \equiv \sum_h (S^h_0)^{1/2}\, |h\rangle\rangle \qquad [39]$$

and a corresponding boundary condition such that $n^h_{00} = \delta_{h0}$. Then, for each $h' \neq 0$, one may define a boundary state

$$\langle\langle h|\tilde h'\rangle \equiv S^h_{h'} / (S^h_0)^{1/2} \qquad [40]$$

From [37], this gives $n^h_{h'0} = \delta_{h'h}$. For each allowed h' in the torus partition function, there is therefore a boundary state $|\tilde h'\rangle$ satisfying the Cardy conditions. However, there is a further requirement:

$$n^h_{h'h''} = \sum_\ell \frac{S^h_\ell\, S^\ell_{h'}\, S^\ell_{h''}}{S^\ell_0} \qquad [41]$$

should be a non-negative integer. Remarkably, this combination of elements of S occurs in the Verlinde formula, which follows from considering consistency of the CFT on the torus. This states that the right-hand side of [41] is equal to the fusion algebra coefficient $N^h_{h'h''}$. Since these are non-negative integers, the above ansatz for the boundary states is consistent.

We conclude that, at least for the diagonal models, there is a bijection between the allowed primary fields in the bulk CFT and the allowed conformally invariant boundary conditions. For the minimal models, with a finite number of such primary fields, this correspondence has been followed through explicitly.

Example. The simplest example is the diagonal c = 1/2 unitary CFT corresponding to m = 3. The allowed values of the conformal weights are h = 0, 1/2, 1/16, and

$$S = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{\sqrt 2} \\ \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{\sqrt 2} \\ \tfrac{1}{\sqrt 2} & -\tfrac{1}{\sqrt 2} & 0 \end{pmatrix} \qquad [42]$$

from which one finds the allowed boundary states

$$|\tilde 0\rangle = \tfrac{1}{\sqrt 2}\, |0\rangle\rangle + \tfrac{1}{\sqrt 2}\, |\tfrac{1}{2}\rangle\rangle + \tfrac{1}{2^{1/4}}\, |\tfrac{1}{16}\rangle\rangle \qquad [43]$$

$$|\widetilde{\tfrac{1}{2}}\rangle = \tfrac{1}{\sqrt 2}\, |0\rangle\rangle + \tfrac{1}{\sqrt 2}\, |\tfrac{1}{2}\rangle\rangle - \tfrac{1}{2^{1/4}}\, |\tfrac{1}{16}\rangle\rangle \qquad [44]$$

$$|\widetilde{\tfrac{1}{16}}\rangle = |0\rangle\rangle - |\tfrac{1}{2}\rangle\rangle \qquad [45]$$

The nontrivial part of the fusion algebra of this CFT is

$$\mathcal{V}_{1/16} \odot \mathcal{V}_{1/16} = \mathcal{V}_0 + \mathcal{V}_{1/2} \qquad [46]$$

$$\mathcal{V}_{1/16} \odot \mathcal{V}_{1/2} = \mathcal{V}_{1/16} \qquad [47]$$

$$\mathcal{V}_{1/2} \odot \mathcal{V}_{1/2} = \mathcal{V}_0 \qquad [48]$$

from which can be read off the boundary operator content

$$n^h_{\tilde 0 \tilde h} = 1, \qquad n^0_{\widetilde{1/16}\,\widetilde{1/16}} = n^{1/2}_{\widetilde{1/16}\,\widetilde{1/16}} = n^{0}_{\widetilde{1/2}\,\widetilde{1/2}} = n^{1/16}_{\widetilde{1/16}\,\widetilde{1/2}} = 1 \qquad [49]$$

The c = 1/2 CFT is known to describe the continuum limit of the critical Ising model, in which spins $s = \pm 1$ are localized on the sites of a regular lattice. The above boundary conditions may be interpreted as the continuum limit of the lattice boundary conditions s = 1, s = −1 and free, respectively. Note there is a symmetry of the fusion rules which means that one could equally well have interchanged the roles of s = 1 and s = −1 in this correspondence.

Other Topics

Boundary Entropy

The partition function on an annulus of length L and circumference $\beta$ can be thought of as the quantum statistical mechanics partition function for a one-dimensional QFT in an interval of length L, at temperature $\beta^{-1}$. It is interesting to consider this in the thermodynamic limit when $\delta^{-1} = L/\beta$ is large. In that case, only the ground state of $\hat H$ contributes in [30], giving

$$Z_{ab}(L, \beta) \sim \langle a|0\rangle \langle 0|b\rangle\, e^{\pi c L/6\beta} \qquad [50]$$

from which the free energy $F_{ab} = -\beta^{-1} \ln Z_{ab}$ and the entropy $S_{ab} = \beta^2 (\partial F_{ab}/\partial\beta)$ can be obtained. The result is

$$S_{ab} = (\pi c/3\beta) L + s_a + s_b + o(1) \qquad [51]$$
where the first term is the usual extensive contribution. The other two pieces $s_a \equiv \ln(\langle a|0\rangle)$ and $s_b \equiv \ln(\langle b|0\rangle)$ may be identified as the boundary entropy associated with the corresponding boundary states. A similar definition may be made in massive QFTs. It is an unproven but well-verified conjecture that the boundary entropy is a nonincreasing function along boundary RG flows, and is stationary only for conformal boundary states.

Bulk–Boundary OPE

The boundary Ward identity [24] has the implication that, from the point of view of the dependence of its correlators on $z_j$ and $\bar z_j$, a primary field $\phi_j(z_j, \bar z_j)$ may be thought of as the product of two local fields which are holomorphic functions of $z_j$ and $\bar z_j$, respectively. These will satisfy OPEs as $|z_j - \bar z_j| \to 0$, with the appearance of primary fields on the right-hand side being governed by the fusion rules. These fields are localized on the real axis: they are the boundary operators. There is therefore a kind of bulk–boundary OPE:

$$\phi_j(z_j, \bar z_j) = \sum_k d_{jk}\, (\mathrm{Im}\, z_j)^{-h_j - \bar h_j + h_k}\, \phi^b_k(\mathrm{Re}\, z_j) \qquad [52]$$

where the sum on the right-hand side is, in principle, over all the boundary fields consistent with the boundary condition, and the coefficients $d_{jk}$ are analogous to the OPE coefficients in the bulk. As before, they are nonvanishing only if allowed by the fusion algebra: a boundary field of conformal weight $h_k$ is allowed only if $N^{h_k}_{h_j \bar h_j} > 0$.

For example, in the c = 1/2 CFT, the bulk operator with $h = \bar h = 1/16$ goes over into the boundary operator with h = 0, or that with h = 1/2, depending on the boundary condition. The bulk operator with $h = \bar h = 1/2$, however, can only go over into the identity boundary operator with h = 0 (or a descendant thereof).

The fusion rules also apply to the boundary operators themselves. The consistency of these with bulk–boundary and bulk–bulk fusion rules, as well as the modular properties of partition functions, was examined by Lewellen.

Extended Algebras

CFTs may contain other conserved currents apart from the stress tensor, which generate algebras (Kac–Moody, superconformal, W-algebras) which extend the Virasoro algebra. In BCFT, in addition to the conformal boundary condition, it is possible (but not necessary) to impose further boundary conditions relating the holomorphic and antiholomorphic parts of the other currents on the boundary. It is believed that all rational CFTs can be obtained from Kac–Moody algebras via the coset construction. The classification of boundary conditions from this point of view is fruitful and also important for applications, but is beyond the scope of this article.

Stochastic Loewner Evolution

In recent years, there has emerged a deep connection between BCFT and conformally invariant measures on curves in the plane which start at a boundary of a domain. These arise naturally in the continuum limit of certain statistical mechanics models. The measure is constructed dynamically as the curve is extended, using a sequence of random conformal mappings called stochastic Loewner evolution (SLE). In CFT, the point where the curve begins can be viewed as the insertion of a boundary operator. The requirement that certain quantities should be conserved in mean under the stochastic process is then equivalent to this operator having a null state at level two. Many of the standard results of CFT correspond to an equivalent property of SLE.

Acknowledgments

This article was written while the author was a member of the Institute for Advanced Study. He thanks the School of Mathematics and the School of Natural Sciences for their hospitality. The work was supported by the Ellentuck Fund.

See also: Affine Quantum Groups; Eight Vertex and Hard Hexagon Models; Indefinite Metric; Operator Product Expansion in Quantum Field Theory; Quantum Phase Transitions; Stochastic Loewner Evolutions; String Field Theory; Superstring Theories; Symmetries in Quantum Field Theory: Algebraic Aspects; Two-Dimensional Conformal Field Theory and Vertex Operator Algebras.

Further Reading

Affleck I (1997) Boundary condition changing operators in conformal field theory and condensed matter physics. Nuclear Physics B Proceedings Supplement 58: 35.
Cardy J (1984) Conformal invariance and surface critical behavior. Nuclear Physics B 240: 514–532.
Cardy J (1989) Boundary conditions, fusion rules and the Verlinde formula. Nuclear Physics B 324: 581.
di Francesco P, Mathieu P, and Senechal D (1999) Conformal Field Theory. New York: Springer.
Kager W and Nienhuis B (2004) A guide to stochastic Loewner evolution and its applications. Journal of Statistical Physics 115: 1149.
Lawler G (2005) Conformally Invariant Processes in the Plane. American Mathematical Society.
Lewellen DC (1992) Sewing constraints for conformal field theories on surfaces with boundaries. Nuclear Physics B 372: 654.
Petkova V and Zuber JB Conformal Boundary Conditions and What They Teach Us, Lectures given at the Summer School and Conference on Nonperturbative Quantum Field Theoretic
Methods and their Applications, August 2000, Budapest, Hungary, hep-th/0103007.
Verlinde E (1988) Fusion rules and modular transformations in 2D conformal field theory. Nuclear Physics B 300: 360.
Werner W Random Planar Curves and Schramm–Loewner Evolutions, Springer Lecture Notes (to appear), math.PR/0303354.
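
The Ising example of the Boundary CFT section (equations [41]–[49]) can be checked numerically. The following is a minimal sketch, not part of the original article: it takes the matrix S of [42] with the ordering h = 0, 1/2, 1/16, evaluates the right-hand side of the Verlinde formula [41], and computes the Ishibashi coefficients [39]–[40] of the Cardy states. The function names `verlinde_coefficients` and `cardy_coefficients` are hypothetical, chosen only for this illustration.

```python
import math

# Modular S matrix of the c = 1/2 model, eq. [42]; index order: h = 0, 1/2, 1/16.
r = 1 / math.sqrt(2)
S = [[0.5, 0.5, r],
     [0.5, 0.5, -r],
     [r, -r, 0.0]]

def verlinde_coefficients(S):
    """Return N[h][a][b] = sum_l S[h][l] S[l][a] S[l][b] / S[l][0], eq. [41]."""
    n = len(S)
    return [[[sum(S[h][l] * S[l][a] * S[l][b] / S[l][0] for l in range(n))
              for b in range(n)]
             for a in range(n)]
            for h in range(n)]

def cardy_coefficients(S):
    """Return c[a][h] = <<h|a~> = S[h][a] / sqrt(S[h][0]), eqs. [39]-[40]."""
    n = len(S)
    return [[S[h][a] / math.sqrt(S[h][0]) for h in range(n)] for a in range(n)]

if __name__ == "__main__":
    labels = ["0", "1/2", "1/16"]
    N = verlinde_coefficients(S)
    for a in range(3):
        for b in range(a, 3):
            content = [labels[h] for h in range(3) if round(N[h][a][b]) == 1]
            print(f"V_{labels[a]} x V_{labels[b]} ->", " + ".join(content))
    for a, row in enumerate(cardy_coefficients(S)):
        print(f"|{labels[a]}~> coefficients:", [round(x, 4) for x in row])
```

Under these assumptions the computed coefficients reproduce the fusion rules [46]–[48], and the rows returned by `cardy_coefficients` are the coefficients appearing in [43]–[45].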

Boundary Control Method and Inverse Problems of Wave Propagation

M I Belishev, Petersburg Department of Steklov Institute of Mathematics, St. Petersburg, Russia
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Inverse problems are generally positioned as the problems of determination of a system (its structure, parameters, etc.) from its "input → output" correspondence.

The boundary-value inverse problems deal with systems which describe processes (wave, heat, electromagnetic ones, etc.) occurring in media occupying a spatial domain. The process is initiated by a boundary source (input) and is described by a solution of a certain partial differential equation in the domain. Certain additional information about the solution, which can be extracted from measurements on the boundary, plays the role of the output. The objective is to determine the parameters of the medium, in particular the coefficients in the equation, from this information.

The boundary control (BC) method (Belishev 1986) is an approach to the boundary-value inverse problems based on their links with control theory and system theory. The present article describes a version of the BC method which solves the problem of reconstruction of a Riemannian manifold from its boundary spectral or dynamical data.

Forward Problems

Manifold

Let $(\Omega, d)$ be a smooth compact Riemannian manifold with the boundary $\Gamma$, $\dim \Omega \geq 2$; d is the distance determined by the metric tensor g. For $A \subset \Omega$ denote

$$\langle A \rangle^r := \{x \in \Omega \mid d(x, A) \leq r\}, \qquad r \geq 0$$

the hypersurfaces $\Gamma^T := \{x \in \Omega \mid d(x, \Gamma) = T\}$, T > 0, are equidistant to $\Gamma$. In terms of the dynamics of the system, the value

$$T_* := \min\{T > 0 \mid \langle \Gamma \rangle^T = \Omega\} = \max_{\Omega} d(\cdot, \Gamma)$$

means the time needed for waves, moving from $\Gamma$ with the unit speed, to fill $\Omega$.

A point $x \in \Omega$ is said to belong to the set $c_0 \subset \Omega$ if x is connected with $\Gamma$ via more than one shortest geodesic. The set $c := \overline{c_0}$ is called the separation set (cut locus) of $\Omega$ with respect to $\Gamma$. It is a closed set of zero volume. Let $\tau_*(\gamma)$ be the length of the geodesic emanating from $\gamma \in \Gamma$ orthogonally to $\Gamma$ and connecting $\gamma$ with c. The function $\tau_*(\cdot)$ is continuous on $\Gamma$.

For $x \in \Omega \setminus c$ the pair $(\gamma, \tau)$, such that $\tau = d(x, \Gamma) = d(x, \gamma)$, constitutes the semigeodesic coordinates of x. The set of these coordinates

$$\Theta := \{(\gamma, \tau) \mid \gamma \in \Gamma,\ 0 \leq \tau < \tau_*(\gamma)\} \subset \Gamma \times [0, T_*]$$

is called the pattern of $\Omega$. Pictorially, to get the pattern, one needs to slit $\Omega$ along c and then pull it on the cylinder $\Gamma \times [0, T_*]$. The part $\Theta^T := \Theta \cap (\Gamma \times [0, T])$ of the pattern consists of the semigeodesic coordinates of the points $x \in \langle \Gamma \rangle^T \setminus c$ (Figure 1).

Figure 1 Manifold and pattern. (Data from Belishev (1997).)

Dynamical System

Propagation of waves in the manifold is described by a dynamical system $\alpha^T$ of the form

$$u_{tt} - \Delta_g u = h \quad \text{in } \Omega \times (0, T) \qquad [1]$$

$$u|_{t=0} = u_t|_{t=0} = 0 \quad \text{in } \Omega \qquad [2]$$

$$u = f \quad \text{on } \Gamma \times [0, T] \qquad [3]$$

where $\Delta_g$ is the Beltrami–Laplace operator, $0 < T \leq \infty$, f and h are the boundary and volume sources (controls), and $u = u^{f,h}(x, t)$ is the solution (wave).

Set $\mathcal{H} := L_2(\Omega)$; the spaces of the controls are

$$\mathcal{F}^T := L_2(\Gamma \times [0, T]), \qquad \mathcal{G}^T := L_2([0, T]; \mathcal{H})$$
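
As a hedged illustration of the quantities defined in the Manifold subsection, the sketch below takes the simplest case one can compute by hand: a flat rectangle $\Omega = [0, a] \times [0, b]$ with the Euclidean distance (an assumption made only for this example, not a case treated in the article). There $d(x, \Gamma)$ is the smallest of the four wall distances, the filling time is $T_* = \min(a, b)/2$, and a point lies on the cut locus when the smallest wall distance is attained at least twice. The function names are hypothetical.

```python
def dist_to_boundary(x, y, a, b):
    """Euclidean distance from (x, y) to the boundary of the rectangle [0, a] x [0, b]."""
    return min(x, a - x, y, b - y)

def filling_time(a, b, n=200):
    """Approximate T_* = max over Omega of d(., Gamma) on an (n+1) x (n+1) grid."""
    return max(dist_to_boundary(i * a / n, j * b / n, a, b)
               for i in range(n + 1) for j in range(n + 1))

def on_cut_locus(x, y, a, b, tol=1e-12):
    """True if the nearest wall is not unique, i.e. (x, y) has more than one shortest geodesic to Gamma."""
    d = sorted([x, a - x, y, b - y])
    return d[1] - d[0] < tol

if __name__ == "__main__":
    print("T_* for the 2 x 1 rectangle:", filling_time(2.0, 1.0))
    print("(0.3, 0.3) on cut locus:", on_cut_locus(0.3, 0.3, 2.0, 1.0))
    print("(1.0, 0.3) on cut locus:", on_cut_locus(1.0, 0.3, 2.0, 1.0))
```

For a curved manifold the distance function has no such closed form and would have to be computed from the metric; the sketch only makes the definitions of $T_*$ and c concrete.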
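The "input → state" map of the system $\alpha^T$ is realized by the control operator $W^T$:

$$W^T : \mathcal{F}^T \times \mathcal{G}^T \to \mathcal{H}, \qquad W^T\{f, h\} := u^{f,h}(\cdot, T)$$

and its parts

$$W^T_{bd} : \mathcal{F}^T \to \mathcal{H}, \qquad W^T_{vol} : \mathcal{G}^T \to \mathcal{H}$$

$$W^T_{bd} f := u^{f,0}(\cdot, T), \qquad W^T_{vol} h := u^{0,h}(\cdot, T)$$

In the case f = 0 the evolution of the system is governed by the operator $L := -\Delta_g$ defined on the Sobolev class $H^2(\Omega) \cap H^1_0(\Omega)$ of functions vanishing on $\Gamma$, and the semigroup representation

$$u^{0,h}(\cdot, r) = W^r_{vol} h = \int_0^r L^{-1/2} \sin\big[(r - t) L^{1/2}\big]\, h(\cdot, t)\, dt \qquad [4]$$

holds for all $r \geq 0$.

The "input → output" map is implemented by the response operator $R^T : \mathcal{F}^T \to \mathcal{F}^T$,

$$R^T f := \partial_\nu u^{f,0} \quad \text{on } \Gamma \times [0, T]$$

defined on controls $f \in H^1(\Gamma \times [0, T])$ vanishing on $\Gamma \times \{t = 0\}$; here $\nu = \nu(\gamma)$ is the outward normal to $\Gamma$. The normal derivative $\partial_\nu u^{f,0}$ describes the forces appearing on $\Gamma$ as a result of interaction of the wave with the boundary.

The map $C^T : \mathcal{F}^T \to \mathcal{F}^T$, $C^T := (W^T_{bd})^* W^T_{bd}$, which is called the connecting operator, can be represented via the response operator of the system $\alpha^{2T}$:

$$C^T = \tfrac{1}{2} (S^T)^* R^{2T} J^{2T} S^T \qquad [5]$$

$S^T : \mathcal{F}^T \to \mathcal{F}^{2T}$ being the extension of controls from $\Gamma \times [0, T]$ onto $\Gamma \times [0, 2T]$ as odd functions of t with respect to t = T, and $J^{2T} : \mathcal{F}^{2T} \to \mathcal{F}^{2T}$ being the integration

$$(J^{2T} f)(\cdot, t) = \int_0^t f(\cdot, s)\, ds$$

Controllability

Open subsets $\sigma \subset \Gamma$ and $\omega \subset \Omega$ determine the subspaces

$$\mathcal{F}^T_\sigma := \{f \in \mathcal{F}^T \mid \mathrm{supp}\, f \subset \overline{\sigma} \times [0, T]\}$$

$$\mathcal{G}^T_\omega := \{h \in \mathcal{G}^T \mid \mathrm{supp}\, h \subset \overline{\omega} \times [0, T]\}$$

of controls acting from $\sigma$ and $\omega$, respectively. In view of hyperbolicity of the problem [1]–[3], the relation

$$\mathrm{supp}\, u^{f,h}(\cdot, t) \subset \langle \sigma \rangle^t \cup \langle \omega \rangle^t, \qquad t \geq 0 \qquad [6]$$

holds for $f \in \mathcal{F}^T_\sigma$ and $h \in \mathcal{G}^T_\omega$. This means that the waves propagate in $\Omega$ with the speed = 1.

The sets of waves

$$\mathcal{U}^T_\sigma := W^T_{bd} \mathcal{F}^T_\sigma, \qquad \mathcal{U}^T_\omega := W^T_{vol} \mathcal{G}^T_\omega$$

are said to be reachable at time t = T from $\sigma$ and $\omega$, respectively. Denoting

$$\mathcal{H}A := \{y \in \mathcal{H} \mid \mathrm{supp}\, y \subset \overline{A}\}$$

by virtue of [6] one has the embeddings $\mathcal{U}^T_\sigma \subset \mathcal{H}\langle \sigma \rangle^T$ and $\mathcal{U}^T_\omega \subset \mathcal{H}\langle \omega \rangle^T$. The property of the system $\alpha^T$ that plays the key role in inverse problems is that these embeddings are dense:

$$\mathrm{cl}\, \mathcal{U}^T_\sigma = \mathcal{H}\langle \sigma \rangle^T, \qquad \mathrm{cl}\, \mathcal{U}^T_\omega = \mathcal{H}\langle \omega \rangle^T \qquad [7]$$

for any T > 0 (cl denotes the closure in $\mathcal{H}$).

In control theory, relations [7] are interpreted as an approximate controllability of the system in subdomains filled with waves; the name "BC method" is derived from the first one (boundary controllability). This property means that the sets of waves are rich enough: any function supported in the subdomain $\langle \sigma \rangle^T$ reachable for waves excited on $\sigma$ can be approximated with any precision in $\mathcal{H}$-norm by the wave $u^{f,0}(\cdot, T)$ due to appropriate choice of the control f acting from $\sigma$. The proof of [7] relies on the fundamental Holmgren–John–Tataru unique continuation theorem for the wave equation (Tataru 1993).

Laplacian on Waves

If h = 0, so that the system is governed only by boundary controls, its trajectory $\{u^{f,0}(\cdot, t) \mid 0 \leq t \leq T\}$ does not leave the reachable set $\mathcal{U}^T_\Gamma$. In this case, the system possesses one more intrinsic operator $L^T$ which acts in the subspace $\mathrm{cl}\, \mathcal{U}^T_\Gamma$ and is introduced through its graph

$$\mathrm{gr}\, L^T := \mathrm{cl}\, \big\{ \{W^T_{bd} f,\ -W^T_{bd} f_{tt}\} \mid f \in C_0^\infty(\Gamma \times (0, T)) \big\} \qquad [8]$$

(closure in $\mathcal{H} \times \mathcal{H}$). By virtue of the relation $L^T W^T_{bd} f = -\Delta_g W^T_{bd} f$ following from the wave equation [1] and [6], the operator $L^T$ is interpreted as Laplacian on waves filling the subdomain $\langle \Gamma \rangle^T$.

In the case $T > T_*$, one has $\langle \Gamma \rangle^T = \Omega$, $\mathrm{cl}\, \mathcal{U}^T_\Gamma = \mathcal{H}$, and $L^T$ is a densely defined operator in $\mathcal{H}$, satisfying $L^T \subset L$. Using [7], one proves the equality $L^T = L$. This equality and representation [4] imply that

$$W^r_{vol} h = \int_0^r (L^T)^{-1/2} \sin\big[(r - t)(L^T)^{1/2}\big]\, h(\cdot, t)\, dt \qquad [9]$$

for all $r \geq 0$ and any fixed $T > T_*$.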
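
The finite propagation speed expressed by [6] can be observed in a one-dimensional toy version of the system [1]–[3]. The sketch below is an illustration under stated assumptions, not the article's method: it takes $\Omega = [0, 1]$ with the flat metric, applies a boundary control f only at the endpoint x = 0 (so $\sigma = \{0\}$), keeps u = 0 at x = 1 and h = 0, and integrates $u_{tt} - u_{xx} = 0$ with the standard leapfrog scheme at time step equal to the grid spacing. The function name `boundary_wave` and all parameters are hypothetical.

```python
import math

def boundary_wave(f, T, n_cells=200):
    """Leapfrog solution of u_tt - u_xx = 0 on [0, 1] with u(0, t) = f(t), u(1, t) = 0
    and zero initial data; the time step equals the grid spacing dx.
    Returns the grid points xs and the wave u(., T) sampled on them."""
    dx = 1.0 / n_cells
    steps = round(T / dx)
    prev = [0.0] * (n_cells + 1)   # time level n - 1
    curr = [0.0] * (n_cells + 1)   # time level n (zero initial data, eq. [2])
    for n in range(steps):
        nxt = [0.0] * (n_cells + 1)
        for j in range(1, n_cells):
            # discrete wave equation with dt = dx
            nxt[j] = curr[j + 1] + curr[j - 1] - prev[j]
        nxt[0] = f((n + 1) * dx)   # boundary control at x = 0, eq. [3]
        prev, curr = curr, nxt
    xs = [j * dx for j in range(n_cells + 1)]
    return xs, curr

if __name__ == "__main__":
    control = lambda t: math.sin(2 * math.pi * t) ** 2   # vanishes at t = 0
    T = 0.5
    xs, u = boundary_wave(control, T)
    front = max((x for x, v in zip(xs, u) if v != 0.0), default=0.0)
    print("right-most grid point reached by the wave:", front, "(T =", T, ")")
```

In this setting the wave at time T < 1 vanishes identically for x > T, which is the one-dimensional form of the inclusion supp $u^{f,0}(\cdot, T) \subset \langle \sigma \rangle^T$; for x < T the computed values follow the travelling wave f(T − x).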
Spectral Problem

The Dirichlet homogeneous boundary-value problem is to find nontrivial solutions of the system

$$-\Delta_g \varphi = \lambda \varphi \quad \text{in } \Omega \qquad [10]$$

$$\varphi = 0 \quad \text{on } \Gamma \qquad [11]$$

This problem is equivalent to the spectral analysis of the operator L; it has the discrete spectrum $\{\lambda_k\}_{k=1}^\infty$, $0 < \lambda_1 < \lambda_2 \leq \cdots$, $\lambda_k \to \infty$; the eigenfunctions $\{\varphi_k\}_{k=1}^\infty$, $L\varphi_k = \lambda_k \varphi_k$, form an orthonormal basis in $\mathcal{H}$.

Expanding the solutions of the problem [1]–[3] over the eigenfunctions of the problem [10], [11] one derives the spectral representation of waves:

$$u^{f,0}(\cdot, T) = W^T_{bd} f = \sum_{k=1}^\infty (f, s^T_k)_{\mathcal{F}^T}\, \varphi_k(\cdot) \qquad [12]$$

where

$$s^T_k(\gamma, t) := -\lambda_k^{-1/2} \sin\big[(T - t) \lambda_k^{1/2}\big]\, \partial_\nu \varphi_k(\gamma)$$

Thus, for a given control f, the Fourier coefficients of the wave $u^{f,0}$ are determined by the spectrum $\{\lambda_k\}_{k=1}^\infty$ and the derivatives $\{\partial_\nu \varphi_k\}_{k=1}^\infty$.

Inverse Problems

General Setup

The set of pairs $\Sigma := \{\lambda_k; \partial_\nu \varphi_k\}_{k=1}^\infty$ associated with the problem [10], [11] is said to be the Dirichlet spectral data of the manifold $(\Omega, d)$. The spectral (frequency domain) inverse problem is to recover the manifold from its spectral data.

Since the speed of wave propagation is unity, the response operator $R^T$ contains the information not about the entire manifold but only about its part $\langle \Gamma \rangle^{T/2}$. This fact is taken into account in the dynamical (time domain) inverse problem which aims to recover the manifold from the operator $R^{2T}$ given for a fixed $T > T_*$.

If the manifolds $(\Omega', d')$ and $(\Omega'', d'')$ are isometric via an isometry $i : \Omega' \to \Omega''$, then, identifying the boundaries by $i(\gamma) \equiv \gamma$, one gets two manifolds with the common boundary $\Gamma = \partial\Omega' = \partial\Omega''$ which possess identical inverse data: $\Sigma' = \Sigma''$, $R'^{2T} = R''^{2T}$. Such manifolds are called equivalent: they are indistinguishable for the external observer extracting $\Sigma$ or $R^{2T}$ from the boundary measurements. Therefore, these data do not determine the manifold uniquely and both of the inverse problems need to be clarified. The precise formulation is given in the form of two questions:

1. Does the coincidence of the inverse data imply the equivalence of the manifolds?
2. Given the inverse data of an unknown manifold, how to construct a manifold possessing these data?

The BC method gives an affirmative answer to the first question and provides a procedure producing a representative of the class of equivalent manifolds from its inverse data. The method is based on the concepts of model and "coordinatization."

Model

A pair consisting of an auxiliary Hilbert space $\tilde{\mathcal{H}}$ and an operator $\tilde W^T_{bd} : \mathcal{F}^T \to \tilde{\mathcal{H}}$ is said to be a model of the system $\alpha^T$, if $\tilde W^T_{bd}$ is determined by inverse data, and the map $U : W^T_{bd} f \mapsto \tilde W^T_{bd} f$ is an isometry from $\mathrm{Ran}\, W^T_{bd} \subset \mathcal{H}$ onto $\mathrm{Ran}\, \tilde W^T_{bd} \subset \tilde{\mathcal{H}}$. The model is an intermediate object in solving inverse problems. It plays the role of an auxiliary copy of the original dynamical system which an external observer can build from measurements on the boundary. While the genuine wave process inside $\Omega$, initiated by a boundary control, remains inaccessible to direct measurements, its $\tilde{\mathcal{H}}$-representation can be visualized by means of the model control operator $\tilde W^T_{bd}$. This is illustrated by the diagram in Figure 2, where the upper part is invisible for an external observer, whereas the lower part can be extracted from inverse data.

Figure 2 Model of a system. (Data from Belishev (1997).)

Each type of data determines a corresponding model. The spectral model is the pair

$$\tilde{\mathcal{H}} := l_2, \qquad \tilde W^T_{bd} := \{(\cdot, s^T_k)_{\mathcal{F}^T}\}_{k=1}^\infty \qquad [13]$$

(see [12]); the role of isometry U is played by the Fourier transform $F : \mathcal{H} \to \tilde{\mathcal{H}}$, $Fy := \{(y, \varphi_k)_{\mathcal{H}}\}_{k=1}^\infty$. By virtue of [4], the data $\Sigma$ also determine the operator $\tilde W^r_{vol} : L_2([0, r]; \tilde{\mathcal{H}}) \to \tilde{\mathcal{H}}$,

$$\tilde W^r_{vol} = \int_0^r \tilde L^{-1/2} \sin\big[(r - t) \tilde L^{1/2}\big]\, (\cdot)(t)\, dt, \qquad r \geq 0 \qquad [14]$$

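where $\tilde L := U L U^* = \mathrm{diag}\{\lambda_k\}_{k=1}^{\infty}$. Thus, the spectral model allows one to see the Fourier images of invisible waves.

According to [5], the response operator $R^{2T}$ determines the modulus of the control operator

\[ |W^T_{bd}| = \left[(W^T_{bd})^* W^T_{bd}\right]^{1/2} = (C^T)^{1/2} \]

which enters in the polar decomposition $W^T_{bd} = \Phi^T |W^T_{bd}|$. Along with it, the response operator determines the dynamical model

\[ \tilde{\mathcal{H}} := \mathrm{cl}\, \mathrm{Ran}\,(C^T)^{1/2}; \qquad \tilde W^T_{bd} := (C^T)^{1/2} \qquad [15] \]

The correspondence "system → model" is realized by the isometry $U = (\Phi^T)^* : W^T_{bd} f \mapsto |W^T_{bd}| f$. The operator $\tilde L^T := U L^T U^*$, dual to the Laplacian on waves, is determined by its graph

\[ \mathrm{gr}\, \tilde L^T := \mathrm{cl}\, \{ (\tilde W^T_{bd} f,\ -\tilde W^T_{bd} f_{tt}) \mid f \in C_0^{\infty}(\Gamma \times (0, T)) \} \qquad [16] \]

(see [8]) and, therefore, $\tilde L^T$ is also determined by $R^{2T}$. In the case $T > T_*$, the operator $\tilde W^r_{vol} : L_2([0, r]; \tilde{\mathcal{H}}) \to \tilde{\mathcal{H}}$, dual to $W^r_{vol}$, is represented in the form

\[ \tilde W^r_{vol} = \int_0^r (\tilde L^T)^{-1/2} \sin\!\left[(r - t)(\tilde L^T)^{1/2}\right] (\,\cdot\,)(t)\, dt, \qquad r \ge 0 \qquad [17] \]

in accordance with [9]. Thus, the dynamical model visualizes the $(\Phi^T)^*$-images of the waves propagating inside $\Omega$.

Wave Coordinatization

In a general sense, a coordinatization is a correspondence between points $x$ of the studied set $A$ and elements $\tilde x$ of another set $\tilde A$ such that: (i) the elements of $\tilde A$ are accessible and distinguishable; (ii) the map $x \mapsto \tilde x$ is a bijection; and (iii) relations between elements of $\tilde A$ determine those between points of $A$ which are studied (H Weyl). Coordinatization enables one to study $A$ via operations with coordinates $\tilde x \in \tilde A$.

The external observer investigating the manifold probes $\Omega$ with waves initiated by sources on $\Gamma$. The relevant coordinatization of $\Omega$ described below uses such waves and is implemented in three steps.

Step 1 (subdomains) Let $x(\gamma, \tau)$ be the end point of the geodesic of the length $\tau > 0$ emanating from $\gamma \in \Gamma$ in the direction $\nu(\gamma)$, and let $\sigma^\varepsilon_\gamma \subset \Gamma$ be a small neighborhood shrinking to $\gamma$ as $\varepsilon \to 0$. If $\tau \le \tau_*(\gamma)$, then the family of subdomains

\[ \omega^\varepsilon(\gamma, \tau) := \left[\langle\Gamma\rangle^{\tau} \setminus \langle\Gamma\rangle^{\tau - \varepsilon}\right] \cap \langle\sigma^\varepsilon_\gamma\rangle^{\tau} \]

(shaded domain on Figure 3) shrinks to $x(\gamma, \tau)$; if $\tau > \tau_*(\gamma)$, then the family terminates: $\omega^\varepsilon(\gamma, \tau) = \emptyset$ as $\varepsilon < \varepsilon_0(\gamma)$ (the case $\gamma = \gamma'$ in Figure 3). Such behavior of subdomains implies that

\[ \lim_{\varepsilon \to 0} \left\langle \left[\langle\Gamma\rangle^{\tau} \setminus \langle\Gamma\rangle^{\tau - \varepsilon}\right] \cap \langle\sigma^\varepsilon_\gamma\rangle^{\tau} \right\rangle^{r} = \begin{cases} \langle x(\gamma, \tau)\rangle^{r}, & \tau \le \tau_*(\gamma) \\ \emptyset, & \tau > \tau_*(\gamma) \end{cases} \qquad [18] \]

[Figure 3: the boundary $\Gamma$ with points $\gamma$, $\gamma'$ and the neighborhood $\sigma^\varepsilon_\gamma$; the surfaces $\Gamma^{\tau}$ and $\Gamma^{\tau - \varepsilon}$; the points $x(\gamma, \tau)$ and $x(\gamma', \tau)$; the shaded subdomain.]

Figure 3 The subdomains.

Step 2 (wave subspaces) Pass from the subdomains to the corresponding subspaces $\mathcal{H}\langle\Gamma\rangle^{\tau}$, $\mathcal{H}\langle\sigma^\varepsilon_\gamma\rangle^{\tau}$, $\mathcal{H}\langle\omega^\varepsilon(\gamma, \tau)\rangle^{r}$, and represent them via reachable sets by [7]:

\[ \mathcal{H}\langle\Gamma\rangle^{\tau} = \mathrm{cl}\, W^{\tau}_{bd} \mathcal{F}^{\tau}; \qquad \mathcal{H}\langle\sigma^\varepsilon_\gamma\rangle^{\tau} = \mathrm{cl}\, W^{\tau}_{bd} \mathcal{F}^{\tau}_{\sigma^\varepsilon_\gamma} \]

\[ \begin{aligned} \mathcal{H}\langle\omega^\varepsilon(\gamma, \tau)\rangle^{r} &= \mathrm{cl}\, W^{r}_{vol} L_2\!\left([0, r]; \mathcal{H}\omega^\varepsilon(\gamma, \tau)\right) \\ &= \mathrm{cl}\, W^{r}_{vol} L_2\!\left([0, r]; \left[\mathcal{H}\langle\Gamma\rangle^{\tau} \ominus \mathcal{H}\langle\Gamma\rangle^{\tau - \varepsilon}\right] \cap \mathcal{H}\langle\sigma^\varepsilon_\gamma\rangle^{\tau}\right) \\ &= \mathrm{cl}\, W^{r}_{vol} L_2\!\left([0, r]; \left[\mathrm{cl}\, W^{\tau}_{bd} \mathcal{F}^{\tau} \ominus \mathrm{cl}\, W^{\tau - \varepsilon}_{bd} \mathcal{F}^{\tau - \varepsilon}\right] \cap \mathrm{cl}\, W^{\tau}_{bd} \mathcal{F}^{\tau}_{\sigma^\varepsilon_\gamma}\right) \end{aligned} \]

Define

\[ \mathcal{W}^{r}_{(\gamma, \tau)} := \lim_{\varepsilon \to 0} \mathrm{cl}\, W^{r}_{vol} L_2\!\left([0, r]; \left[\mathrm{cl}\, W^{\tau}_{bd} \mathcal{F}^{\tau} \ominus \mathrm{cl}\, W^{\tau - \varepsilon}_{bd} \mathcal{F}^{\tau - \varepsilon}\right] \cap \mathrm{cl}\, W^{\tau}_{bd} \mathcal{F}^{\tau}_{\sigma^\varepsilon_\gamma}\right) \qquad [19] \]

$\mathcal{W}^{r}_{(\gamma, 0)} := \mathcal{W}^{r}_{(\gamma, +0)}$, $r \ge 0$ (the limits in the sense of the strong operator convergence of the projections in $\mathcal{H}$ on the corresponding subspaces). By the definitions, one has $\mathcal{W}^{r}_{(\gamma, \tau)} = \lim_{\varepsilon \to 0} \mathcal{H}\langle\omega^\varepsilon(\gamma, \tau)\rangle^{r}$, whereas [18] leads to the equality

\[ \mathcal{W}^{r}_{(\gamma, \tau)} = \begin{cases} \mathcal{H}\langle x(\gamma, \tau)\rangle^{r}, & \tau \le \tau_*(\gamma) \\ \{0\}, & \tau > \tau_*(\gamma) \end{cases} \qquad [20] \]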

for all $\gamma \in \Gamma$, $\tau \ge 0$, $r \ge 0$. As a result, since any $x \in \Omega$ can be represented as $x = x(\gamma, \tau)$, one attaches to every point of the manifold a family of expanding subspaces $\{\mathcal{W}^{r}_{(\gamma, \tau)} \mid r \ge 0\}$ built out of waves. As is seen from [20], the family is determined by the point $x$ (not dependent on the representation $x = x(\gamma, \tau)$); the subspaces which it consists of coincide with $\mathcal{H}\langle x\rangle^{r}$.

Expressing the distance as

\[ d(x', x'') = 2 \inf\{ r > 0 \mid \mathcal{H}\langle x'\rangle^{r} \cap \mathcal{H}\langle x''\rangle^{r} \ne \{0\} \} \]

in accordance with [20], one can represent

\[ d(x', x'') = 2 \inf\{ r > 0 \mid \mathcal{W}^{r}_{(\gamma', \tau')} \cap \mathcal{W}^{r}_{(\gamma'', \tau'')} \ne \{0\} \} \qquad [21] \]

where $x' = x(\gamma', \tau')$, $x'' = x(\gamma'', \tau'')$, and hence find the distance via the above families.

Step 3 (wave copy) By varying $\gamma \in \Gamma$, $\tau \ge 0$, gather all nonzero families $\{\mathcal{W}^{r}_{(\gamma, \tau)} \mid r \ge 0\} =: \tilde x$ in the set $\tilde\Omega = \{\tilde x\}$. Redenoting $\mathcal{W}^{r}_{\tilde x} := \mathcal{W}^{r}_{(\gamma, \tau)} \in \tilde x$, endow the set with the distance

\[ \tilde d(\tilde x', \tilde x'') := 2 \inf\{ r > 0 \mid \mathcal{W}^{r}_{\tilde x'} \cap \mathcal{W}^{r}_{\tilde x''} \ne \{0\} \} \qquad [22] \]

In view of [21], one has $d(x', x'') = \tilde d(\tilde x', \tilde x'')$, so that the metric space $(\tilde\Omega, \tilde d)$ is an isometric copy of $(\Omega, d)$ by construction. Thus, the correspondence $x \mapsto \tilde x$ ("point ↦ family") is an isometry and satisfies the general principles (i)–(iii) of coordinatization.

The manifold $(\tilde\Omega, \tilde d)$ is the end product of the wave coordinatization. It represents the original manifold as a collection of infinitesimal sources interacting with each other via the waves which they produce.

Solving Inverse Problems

The motivation for the above coordinatization is that the wave copy can be reproduced via any model. Namely, the external observer with the knowledge of $\Sigma$ or $R^{2T}$ ($T > T_*$) can recover $(\tilde\Omega, \tilde d)$ up to isometry by the following procedure:

1. Construct the model corresponding to the given inverse data and determine the operators $\tilde W^{\tau}_{bd}$, $0 \le \tau \le T$, by [13], [15]; then determine $\tilde L$, $\tilde L^T$, and $\tilde W^{r}_{vol}$ by [14] or [16], [17].
2. Replace on the right-hand side of [19] all operators $W$ without tildes by the ones with tildes, and get the subspaces $\tilde{\mathcal{W}}^{r}_{(\gamma, \tau)} = U \mathcal{W}^{r}_{(\gamma, \tau)}$, $\gamma \in \Gamma$, $\tau \ge 0$, $r \ge 0$.
3. Gather all nonzero families $\{\tilde{\mathcal{W}}^{r}_{(\gamma, \tau)} \mid r \ge 0\} =: \hat x$ in the set $\hat\Omega = \{\hat x\}$ and redenote the subspaces as $\tilde{\mathcal{W}}^{r}_{\hat x} := \tilde{\mathcal{W}}^{r}_{(\gamma, \tau)} \in \hat x$; endow the set with the metric $\hat d(\hat x', \hat x'') := 2 \inf\{ r > 0 \mid \tilde{\mathcal{W}}^{r}_{\hat x'} \cap \tilde{\mathcal{W}}^{r}_{\hat x''} \ne \{0\} \}$ (see [22]), and get a sample $(\hat\Omega, \hat d)$ of the wave copy $(\tilde\Omega, \tilde d)$.

This sample is isometric to the original $(\Omega, d)$ by construction. Identifying properly the boundaries $\partial\hat\Omega$ and $\Gamma$, one turns $(\hat\Omega, \hat d)$ into a canonical representative of the class of equivalent manifolds possessing the given inverse data.

If the response operator $R^{2T}$ is given for a fixed $T < T_*$, the above procedure produces the wave copy of the submanifold $(\langle\Gamma\rangle^{T}, d)$. This locality in time is an intrinsic feature and advantage of the BC method: longer time of observation on $\Gamma$ increases the depth of penetration into $\Omega$.

Amplitude Formula

Another variant of the BC method is based on geometrical optics formulas describing the propagation of singularities of the waves.

Let $y \in \mathcal{H}$, and let $\beta$ be the density of the volume in semigeodesic coordinates: $dx = \beta\, d\Gamma\, d\tau$; the function

\[ \tilde y(\gamma, \tau) := \begin{cases} \beta^{1/2}(\gamma, \tau)\, y(x(\gamma, \tau)), & (\gamma, \tau) \in \Theta \\ 0, & \text{otherwise} \end{cases} \]

defined on $\Gamma \times [0, T_*]$ is called the image of $y$. The amplitude formula represents the images of waves initiated by boundary controls in the form

\[ \tilde u^{f,0}(\cdot, T)(\gamma, \tau) = \lim_{t \to T - \tau - 0} \left[(W^T_{bd})^* (I - P^{\tau}) W^T_{bd} f\right](\gamma, t), \qquad 0 < \tau < T \]

where $I$ is the identity operator and $P^{\tau}$ is the projection in $\mathcal{H}$ onto $\mathrm{cl}\, W^{\tau}_{bd} \mathcal{F}^{\tau}$. The formula is derived by the ray method going back to J Hadamard; the derivation uses the controllability [7].

Any model determines the right-hand side of the last relation by the isometry: $(W^T_{bd})^* (I - P^{\tau}) W^T_{bd} = (\tilde W^T_{bd})^* (\tilde I - \tilde P^{\tau}) \tilde W^T_{bd}$, where $\tilde W^T_{bd} = U W^T_{bd}$, $\tilde I$ is the identity operator, and $\tilde P^{\tau} = U P^{\tau} U^*$ is the projection in $\tilde{\mathcal{H}}$ onto $\mathrm{cl}\, \tilde W^{\tau}_{bd} \mathcal{F}^{\tau}$. This leads to the representation

\[ \tilde u^{f,0}(\cdot, T)(\gamma, \tau) = \lim_{t \to T - \tau - 0} \left[(\tilde W^T_{bd})^* (\tilde I - \tilde P^{\tau}) \tilde W^T_{bd} f\right](\gamma, t), \qquad 0 < \tau < T \qquad [23] \]

and makes the amplitude formula a useful tool for solving the inverse problems. The external observer can construct a model via inverse data and then visualize by [23] the wave images on the part $\Theta^T$ of the pattern (see Figure 1). The collection of images $\tilde u^{f,0}$ corresponding to all possible controls $f$ is rich enough for recovering the tensor $g$ on $\Theta^T$ (i.e., the metric tensor in semigeodesic coordinates) and turning the pattern into an isometric copy of the submanifold $(\langle\Gamma\rangle^{T}, d)$. This variant of the method is

more appropriate if one needs to recover unknown coefficients of the wave equation in $\Omega$; it can be realized in terms of numerical algorithms.

Extensions of the Method

Electromagnetic waves are also well suited for coordinatization and for constructing the wave copy $(\tilde\Omega, \tilde d)$. An appropriate version of the amplitude formula also exists for the system governed by the Maxwell equations (see Further Reading). At present (2004), the applicability of the BC method to three-dimensional inverse problems of elasticity theory is still an open question. The following hypothesis concerns the Lamé system: the wave coordinatization procedure (steps 1–3) using the elastic waves instead of the above $u^{f,0}$ gives rise to the copy of $\Omega \subset \mathbb{R}^3$ endowed with the metric $|dx|^2 / c_p^2$, where $c_p = \sqrt{(\lambda + 2\mu)/\rho}$ is the speed of the pressure waves.

The concept of model is used for solving inverse problems for the heat and Schrödinger equations (Avdonin and Belishev, 1995–2004), as well as for the problem of boundary data continuation (Belishev 2001, Kurylev and Lassas 2002). A variant of the BC method allows one to recover not only the manifold but also the Schrödinger type operators on it and/or the dissipative term in the scalar wave equation (Kurylev and Lassas 1993–2003).

An appropriate version of the amplitude formula solves the inverse problem for one-dimensional two-velocity dynamical system which describes the waves consisting of two modes propagating with different speeds and interacting with each other (Belishev, Blagoveschenskii, Ivanov, 1997–2000).

One more variant of coordinatization, going back to the first paper on the BC method, associates with points $x \in \Omega$ the Dirac measures $\delta_x$; then, their images $\tilde\delta_x$ are identified via suitable models. This variant solves inverse problems on graphs and the two-dimensional elliptic Calderon problem. The reader is referred to articles by the present author listed in Further Reading.

Within the scope of the method, one derives some natural analogs of the classical Gelfand–Levitan–Krein–Marchenko equations (Belishev, 1987–2001). Also, an appropriate analog solves the kinematic inverse problem for a class of two-dimensional manifolds (Pestov 2004).

There exists an abstract version of the approach, embedding the BC method into the framework of linear system theory (Belishev 2001). The method is also related to the problem of triangular factorization of operators (Belishev and Pushnitski 1996).

Numerical algorithms for solving two-dimensional spectral and dynamical inverse problems for the wave equation $\rho u_{tt} - \Delta u = 0$ which recover the variable density $\rho$ have been developed and tested (Filippov, Gotlib, Ivanov, 1994–1999).

See also: Dynamical Systems and Thermodynamics; Geophysical Dynamics; Inverse Problem in Classical Mechanics.

Further Reading

Belishev MI (1988) On an approach to multidimensional inverse problems for the wave equation. Soviet Mathematics. Doklady 36(3): 481–484.
Belishev MI (1996) Canonical model of a dynamical system with boundary control in the inverse problem of heat conductivity. St. Petersburg Mathematical Journal 7(6): 869–890.
Belishev MI (1997) Boundary control in reconstruction of manifolds and metrics. Inverse Problems 13(5): R1–R45.
Belishev MI (2001) Dynamical systems with boundary control: models and characterization of inverse data. Inverse Problems 17: 659–682.
Belishev MI (2002) How to see waves under the Earth surface (the BC-method for geophysicists). In: Kabanikhin SI and Romanov VG (eds.) Ill-Posed and Inverse Problems, pp. 67–84. Utrecht/Boston: VSP.
Belishev MI (2003) The Calderon problem for two-dimensional manifolds by the BC-method. SIAM Journal of Mathematical Analysis 35(1): 172–182.
Belishev MI (2004) Boundary spectral inverse problem on a class of graphs (trees) by the BC-method. Inverse Problems 20(3): 647–672.
Belishev MI and Glasman AK (2001) Dynamical inverse problem for the Maxwell system: recovering the velocity in the regular zone (the BC-method). St. Petersburg Mathematical Journal 12(2): 279–319.
Belishev MI and Gotlib VYu (1999) Dynamical variant of the BC-method: theory and numerical testing. Journal of Inverse and Ill-Posed Problems 7(3): 221–240.
Belishev MI, Isakov VM, Pestov LN, and Sharafutdinov VA (2000) On reconstruction of metrics from external electromagnetic measurements. Russian Academy of Sciences. Doklady. Mathematics 61(3): 353–356.
Belishev MI and Ivanov SA (2002) Characterization of data of dynamical inverse problem for two-velocity system. Journal of Mathematical Sciences 109(5): 1814–1834.
Belishev MI and Lasiecka I (2002) The dynamical Lamé system: regularity of solutions, boundary controllability and boundary data continuation. ESAIM COCV 8: 143–167.
Katchalov A, Kurylev Y, and Lassas M (2001) Inverse Boundary Spectral Problems. Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics, vol. 123. Boca Raton, FL: Chapman and Hall/CRC.

Boundary-Value Problems for Integrable Equations


B Pelloni, University of Reading, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Integrable equations are a special class of nonlinear equations arising in the modeling of a wide variety of physical phenomena. It has been argued that integrable PDEs are in a certain, specific sense "universal" models for physical phenomena involving weak nonlinearity. Indeed, integrable equations are obtained by a procedure involving rescaling and an asymptotic expansion from very large classes of nonlinear evolution equations, which preserves integrability while retaining in the limit weakly nonlinear effects. For this reason, integrable equations are a very important class of PDEs. Important examples are the nonlinear Schrödinger (NLS) equation

\[ i q_t + q_{xx} - 2\lambda |q|^2 q = 0, \qquad \lambda = \pm 1 \qquad [1] \]

the Korteweg–deVries (KdV) equation

\[ q_t + q_x \pm q_{xxx} + 6 q q_x = 0 \qquad [2] \]

the modified KdV (mKdV) equation

\[ q_t - q_{xxx} - 6\lambda q^2 q_x = 0, \qquad \lambda = \pm 1 \qquad [3] \]

and the sine-Gordon (sG) equation in light-cone or laboratory coordinates

\[ q_{xt} + \sin q = 0 \quad \text{or} \quad q_{tt} - q_{xx} + \sin q = 0 \qquad [4] \]

A general method for solving the initial-value problem for integrable equations in one space dimension was discovered in 1967, when in a pioneering and much celebrated work (Gardner et al. 1967), the initial-value problem for KdV with decaying initial condition was completely solved. Soon afterwards, it was understood that this method, now known as the "inverse scattering transform," is of more general applicability. Indeed, it can be applied to those nonlinear equations that can be written as the compatibility condition of a pair of linear eigenvalue equations. The method of solution for the Cauchy problem essentially relies on the possibility of expressing the equation through this pair, now called a Lax pair after the work of Lax (1968), who first clarified the connection. Zakharov and Shabat (1972) constructed such a pair for the NLS equation, and in subsequent years the Lax pairs associated with all important integrable equations in one and two spatial variables were constructed. These include the NLS, sG, mKdV, Davey–Stewartson I and II, and Kadomtsev–Petviashvili I and II equations.

There is no universally accepted definition of an integrable PDE, but on account of the above results, the existence of a Lax pair can be taken as the defining property of such equations. In the course of the 1970s, the inverse scattering transform was applied to solve the initial-value (Cauchy) problem for many integrable equations. In principle, there is no obstruction to solving analytically the initial-value problem by the inverse scattering transform as soon as a Lax pair is constructed for the equation, and appropriate decaying initial conditions are prescribed. The solution is then characterized in terms of a certain integral equation. This approach is equivalent to associating with the initial-value problem a classical problem in complex analysis, namely a matrix Riemann–Hilbert problem, defined in the complex spectral space. This point of view is currently taken by many authors as it provides a unifying and very flexible framework for the analysis.

After the success of the inverse scattering transform in solving the Cauchy problem, it was natural to attempt to generalize the approach to boundary-value problems. To describe the difficulties involved in this generalization, consider the case of evolution equations in one space and one time dimension. The independent variables can be denoted by $(x, t)$, with $t > 0$ representing time. While the initial-value problem is posed on the full real line, hence for $x \in (-\infty, \infty)$, the simplest boundary-value problem is posed on a half-line, for $x \in (0, \infty)$. In addition to initial conditions for initial time $t = 0$, it is necessary to prescribe conditions at the boundary $x = 0$. The number of conditions that must be prescribed to obtain a problem which admits a unique solution depends on the particular equation, but for evolution equations it is roughly equal to half the number of $x$-derivatives involved in the equation. For example, for the NLS equation, a well-posed problem is defined as soon as one boundary condition at $x = 0$ is prescribed; hence a typical boundary-value problem for this equation is obtained, for example, when $q(x, 0) = q_0(x)$ and $q(0, t) = g_0(t)$ are prescribed and compatible, so that $q_0(0) = g_0(0)$. It follows that, while $q_{xx}(0, t)$ can be computed from the equation, $q_x(0, t)$ is not immediately known. An even more difficult situation arises for the KdV equation [2] (with the $+$ sign), for which a well-posed problem is again defined as soon as one boundary condition is prescribed, so that there are two unknown boundary values.
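As a quick sanity check on the sign conventions of the NLS equation [1], the following minimal sketch verifies numerically that $q(x, t) = \operatorname{sech}(x)\, e^{it}$ satisfies [1] for $\lambda = -1$ and fails to satisfy it for $\lambda = +1$. The choice of this particular solution, the finite-difference step, the sample points, and the function names `q`, `nls_residual`, and `max_residual` are all assumptions made for illustration; they are not taken from the article.

```python
import cmath
import math


def q(x, t):
    """Candidate solution q(x, t) = sech(x) * exp(i t) (illustrative choice)."""
    return cmath.exp(1j * t) / math.cosh(x)


def nls_residual(x, t, lam, h=1e-3):
    """Central-difference estimate of i q_t + q_xx - 2*lam*|q|^2 q at (x, t)."""
    q_t = (q(x, t + h) - q(x, t - h)) / (2 * h)
    q_xx = (q(x + h, t) - 2 * q(x, t) + q(x - h, t)) / h ** 2
    q0 = q(x, t)
    return 1j * q_t + q_xx - 2 * lam * abs(q0) ** 2 * q0


def max_residual(lam):
    """Largest residual magnitude over a few hypothetical sample points."""
    points = [(-2.0, 0.0), (-0.5, 0.3), (0.0, 1.0), (0.7, 2.0), (1.5, -1.0)]
    return max(abs(nls_residual(x, t, lam)) for x, t in points)


if __name__ == "__main__":
    print("lambda = -1:", max_residual(-1))  # close to zero (discretization error only)
    print("lambda = +1:", max_residual(+1))  # order one: not a solution for this sign
```

The residual for $\lambda = -1$ is limited only by the finite-difference error, whereas for $\lambda = +1$ it equals $-4\operatorname{sech}^3(x)\, e^{it}$ up to that error.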

Because of this simple fact, a straightforward application of the ideas of the inverse scattering transform immediately encounters one crucial difficulty. This transform method yields an integral representation of the solution which involves not only the given boundary conditions $f(t)$, but also the other "unknown" boundary values (in our example for the NLS equation, the function $q_x(0, t)$). The problem of characterizing these unknown boundary values has impeded progress in this direction for over thirty years.

On account of their physical significance, various boundary-value problems for the KdV equation have been considered, and classical PDE techniques (not specific to integrable models) have been used to establish existence and uniqueness results (Bona et al. 2001, Colin and Ghidaglia 2001, Colliander and Kenig 2001). These approaches, and in particular the approach of Colliander and Kenig, are quite general and possibly of wide applicability, and give global existence results in wide functional classes. However, they do not rely on integrability properties. Indeed, none of these results use the integrable structure of the equation in any fundamental or systematic way. However, the fact that these equations are integrable on the full line implies very special properties that should be exploited in the analysis, and it is natural to try to generalize the inverse scattering transform approach.

Such a generalization is sometimes directly possible. For example, it has been used for studying the problem on the half-line for the hyperbolic version of the sG equation [4a] (the first form in [4]), which does not involve unknown boundary values (Fokas 2000, Pelloni). It has also been used to study some specific boundary-value problems for the NLS equation, for example, for homogeneous Dirichlet or Neumann conditions, when it is possible to use even or odd extensions of the problem to the full line (Ablowitz and Segur 1974), or more recently in Degasperis et al. (2001). In the latter case, however, the unknown boundary values are characterized through an integral Fredholm equation, which does not admit a unique solution. Some special cases of boundary-value problems for the KdV equation (Adler et al. 1997, Habibullin 1999) and elliptic sG (Sklyanin 1987) have also been studied via the inverse scattering transform. However, all the examples considered are nongeneric, and it has recently been shown (Fokas, in press) that the boundary conditions chosen fall in the special class of the so-called "linearizable" boundary conditions, for which the problem can be solved as if it were posed on the full line. One cannot hope to use similar methods to solve the problem with generic boundary conditions.

Recently, Fokas (2000) introduced a general methodology to extend the ideas of the inverse scattering transform to boundary-value problems. This methodology provides the tools to analyze boundary-value problems for integrable equations to a considerable degree of generality. We note as a side remark that linear PDEs are trivially integrable, in the sense of admitting a Lax pair (in this case the Lax pair can be found algorithmically, while the construction of the Lax pair associated with a nonlinear equation is by no means trivial). As a consequence of this remark, the extension of the inverse scattering transform also provides a method for solving boundary-value problems for a large variety of linear PDEs of mathematical physics.

What follows is a general description of the approach of Fokas, considering, for the sake of concreteness, the case of an integrable PDE in the two variables $(x, t)$ which vary in the domain $D$ (typically, for an evolution problem, $D = (0, \infty) \times (0, T)$). We assume that $q(x, t)$ denotes the unique solution of a boundary-value problem posed for such an equation.

The method consists of the following steps.

1. Write the PDE as the compatibility condition of a Lax pair. This is a pair of linear ODEs for the function $\psi = \psi(x, t, k)$ involving the solution $q(x, t)$ of the PDE, the derivatives of this solution, and a complex parameter $k$, called the spectral parameter. This can be done algorithmically for linear PDEs, and in this case $\psi(x, t, k)$ is a scalar function. For nonlinear integrable PDEs, $\psi(x, t, k)$ is in general a matrix-valued function.
The equivalence of the PDE with a Lax pair can be reformulated in the language of differential forms, and in this language it is easier to describe the methodology in general. Assume then that $\omega(x, t, k)$ is a differential 1-form expressed in terms of a function $q(x, t)$ and its derivatives, and of a complex variable $k$, and one which is characterized by the property that $d\omega = 0$ if and only if $q(x, t)$ satisfies the given PDE. The closure of the form $\omega$ yields the two important consequences 2(a) and 2(b) below.
2. (a) Since the domain $D$ under consideration is simply connected, the closed form $\omega$ is also exact; hence, it is possible to find the particular 0-form $\psi(x, t, k)$ solving $d\psi = \omega$. In particular, $\psi(x, t, k)$ can be chosen to be sectionally bounded with respect to $k$ by solving either a Riemann–Hilbert problem or a d-bar problem in the complex spectral $k$ plane, and the solution $\psi(x, t, k)$ is then expressed in terms of certain "spectral functions" depending on all the boundary values

of the solution $q(x, t)$ of the PDE. The function $q(x, t)$ can then be expressed in terms of $\psi(x, t, k)$. (b) The integral of $\omega$ along the boundary of the domain $D$ vanishes. This yields an integral constraint between all boundary values of the solution of the PDE, which becomes an algebraic constraint for the spectral functions. The resulting algebraic identity is called the "global relation."
3. The last step is the analysis of the $k$-invariance properties of the global relation. This analysis yields the characterization of the spectral functions in terms only of the given boundary conditions.

The crucial and most difficult step in the solution process is the characterization described above. The analysis required depends on the type of problem under consideration. For nonlinear integrable evolution PDEs posed on the half-line $x > 0$, in general the characterization mentioned in step (3) involves solving a system of nonlinear Volterra integral equations. This is an important difference from the case of the Cauchy problem, where the solution is given by a single integral equation where all the terms are explicitly known.

The method outlined above has been applied successfully to solve a variety of boundary-value problems for linear and integrable nonlinear PDEs. For concreteness, here the focus is on the important case of integrable evolution PDEs in one space dimension, which illustrates clearly the generalities of this method.

Integrable Evolution Equations in One Space Dimension

The crucial property of integrable PDEs which is used in the inverse scattering transform approach to solve the initial-value problem is the fact that they can be written as the compatibility condition of a Lax pair. Many integrable evolution equations of physical significance (such as NLS, KdV, sG, and mKdV) admit a Lax pair of the form

\[ \begin{aligned} \psi_x + i f_1(k)\sigma_3 \psi &= Q(x, t, k)\psi \\ \psi_t + i f_2(k)\sigma_3 \psi &= \tilde Q(x, t, k)\psi \end{aligned} \qquad [5] \]

where $\psi(x, t, k)$ is a $2 \times 2$ matrix, $\sigma_3 = \mathrm{diag}(1, -1)$, $f_i(k)$, $i = 1, 2$, are analytic functions of the complex parameter $k$, and $Q$, $\tilde Q$ are analytic functions of $k$, of the function $q(x, t)$ (and of its complex conjugate $\bar q(x, t)$ for complex-valued problems) and of its derivatives. For example, the NLS equation [1] is equivalent to the compatibility condition of the pair

\[ \begin{aligned} \psi_x + i k \sigma_3 \psi &= Q\psi, \qquad Q = \begin{pmatrix} 0 & q \\ \lambda \bar q & 0 \end{pmatrix} \\ \psi_t + 2 i k^2 \sigma_3 \psi &= \left(2kQ - iQ_x\sigma_3 - i\lambda |q|^2 \sigma_3\right)\psi \end{aligned} \qquad [6] \]

The first step towards a systematic new approach to solving boundary-value problems was the work of Fokas and Its, who associated the boundary-value problem for NLS on the half-line to a single Riemann–Hilbert problem determined by both equations in the Lax pair. The jump determining this Riemann–Hilbert problem has an explicit exponential dependence on both $x$ and $t$. This differs from the classical inverse scattering approach, in which the $x$-part of the Lax pair is used to determine an $x$-transform with $t$-dependent scattering data, and the $t$-part of the Lax pair is then exploited to find the time evolution of these data. The work of Fokas and Its led to the understanding that both equations in the Lax pair [6] must be considered in order to construct a spectral transform appropriate to solve boundary-value problems. Fokas (2000) reviews his systematic way to solve these problems by performing the simultaneous spectral analysis of both equations in the Lax pair. The transform thus obtained, which is a nonlinearization of the Fourier transform, precisely generalizes the inverse scattering transform.

This simultaneous analysis also leads naturally to the identification of the "global relation" which holds between initial and boundary data, and which plays an essential role in deriving an expression for the solution of the problem which does not involve unknown boundary values.

The Riemann–Hilbert problem with explicit $(x, t)$ dependence, the global relation, and the invariance properties of the latter with respect to the spectral parameter are the fundamental ingredients of this systematic approach to solve boundary-value problems for integrable equations.

The steps involved in this method are summarized in the introduction. While steps (1) and (2) can be described generally, and, once the Lax pair is identified, can be performed algorithmically (at least under the assumption that the solution of the PDE exists), the last step is the most difficult part of the analysis, and it needs to be considered separately for each given problem. However, it is this step that yields the effective characterization of the solution.

The results obtained for the particular case of eqn [1] are reviewed in detail in the next section, as they provide an important example, which can be generalized without any conceptual difficulty to eqns [2]–[4].
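The statement that the NLS equation [1] is the compatibility condition of the pair [6] can be checked by a short computation. The following is a sketch, assuming the pair is written as $\psi_x = X\psi$, $\psi_t = T\psi$; the shorthand $X$, $T$ is introduced here for illustration only and does not appear in the article.

\[ X = -ik\sigma_3 + Q, \qquad T = -2ik^2\sigma_3 + 2kQ - iQ_x\sigma_3 - i\lambda|q|^2\sigma_3 . \]

Equating $\psi_{xt}$ and $\psi_{tx}$ gives $X_t - T_x + [X, T] = 0$. Since $Q$ is off-diagonal, $\sigma_3 Q = -Q\sigma_3$ and $Q^2 = \lambda|q|^2 I$, so that

\[ \begin{aligned} X_t - T_x &= Q_t - 2kQ_x + iQ_{xx}\sigma_3 + i\lambda(|q|^2)_x\sigma_3, \\ [X, T] &= 2kQ_x - i\lambda(|q|^2)_x\sigma_3 + 2i\lambda|q|^2\sigma_3 Q . \end{aligned} \]

All $k$-dependent terms cancel in the sum, leaving

\[ Q_t + iQ_{xx}\sigma_3 - 2i\lambda|q|^2 Q\sigma_3 = 0 . \]

The $(1,2)$ entry of this matrix identity reads $q_t - iq_{xx} + 2i\lambda|q|^2 q = 0$, which is [1] after multiplication by $i$; the $(2,1)$ entry is $\lambda$ times its complex conjugate.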

The NLS Equation

As already mentioned, the initial-value problem for NLS was solved, for decaying initial condition, by Zakharov and Shabat, and studied in depth by many others. However, by the mid-1990s only a handful of papers had been written on the solution of the boundary-value problem posed on the half-line, all on a specific example or aspect of the problem, or attempts at solving the problem using general PDE techniques.

For this equation, the approach of Fokas yields the following results. Let the complex-valued function $q(x, t)$ satisfy the NLS equation [1], for $x > 0$ and $t > 0$, with one prescribed initial condition and one prescribed boundary condition. For the sake of concreteness, we select the specific initial and boundary conditions

\[ \begin{aligned} q(x, 0) &= q_0(x) \in S(\mathbb{R}^+) \\ q(0, t) &= g_0(t) \in S(\mathbb{R}^+) \\ q_0(0) &= g_0(0) \end{aligned} \qquad [7] \]

where $S$ denotes the space of Schwartz functions (similar results hold for different choices of boundary conditions, and less restrictive function classes). The solution of this initial boundary-value (IBV) problem can be constructed as follows (Fokas 2000, 2002; in press):

• Given $q_0(x)$, construct the spectral functions $\{a(k), b(k)\}$. These functions are defined by

\[ a(k) = \varphi_2(0, k), \qquad b(k) = \varphi_1(0, k) \]

where the vector $\varphi(x, k)$ with components $\varphi_1(x, k)$ and $\varphi_2(x, k)$ is the following solution of the $x$-problem of the associated Lax pair evaluated at $t = 0$:

\[ \begin{aligned} &\varphi_x + ik\sigma_3\varphi = Q(x, 0, k)\varphi, \qquad 0 < x < \infty, \quad \mathrm{Im}\, k \ge 0 \\ &\varphi(x, k) = e^{ikx}\left[\begin{pmatrix} 0 \\ 1 \end{pmatrix} + o(1)\right] \quad \text{as } x \to \infty \\ &Q(x, 0, k) = \begin{pmatrix} 0 & q_0(x) \\ \lambda \bar q_0(x) & 0 \end{pmatrix} \end{aligned} \]

($\sigma_3$ and $Q(x, t, k)$ are defined after eqns [5] and [6], respectively).

• Given $q_0(x)$ and $g_0(t)$, characterize $g_1(t)$ by the requirement that the spectral functions $\{A(t, k), B(t, k)\}$ satisfy the global relation

\[ B(t, k) - R(k)A(t, k) = e^{4ik^2 t}\, \frac{c(t, k)}{a(k)}, \qquad R(k) = \frac{b(k)}{a(k)}, \qquad t \in [0, T], \quad k \in \bar D \qquad [8] \]

where $D$ denotes the first quadrant of the complex $k$-plane:

\[ D = \{ k \mid \mathrm{Re}\, k > 0,\ \mathrm{Im}\, k > 0 \} \]

$\bar D$ denotes the closure of $D$, and $c(t, k)$ is a function of $k$ analytic in $D$ and of order $O(1/k)$ as $k \to \infty$. The spectral functions are defined by

\[ A(t, k) = e^{2ik^2 t}\, \overline{\Phi_2(t, \bar k)}, \qquad B(t, k) = -e^{2ik^2 t}\, \Phi_1(t, k) \qquad [9] \]

where the vector $\Phi(t, k)$ with components $\Phi_1$ and $\Phi_2$ is the following solution of the $t$-problem of the associated Lax pair evaluated at $x = 0$:

\[ \begin{aligned} &\Phi_t + 2ik^2\sigma_3\Phi = \tilde Q(0, t, k)\Phi, \qquad 0 < t < T, \quad k \in \mathbb{C} \\ &\Phi(0, k) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \\ &\tilde Q(0, t, k) = \begin{pmatrix} -i\lambda|g_0(t)|^2 & 2kg_0(t) + ig_1(t) \\ \lambda\left(2k\bar g_0(t) - i\bar g_1(t)\right) & i\lambda|g_0(t)|^2 \end{pmatrix} \end{aligned} \qquad [10] \]

• Given $a(k)$, $b(k)$ and $A(k)$, $B(k)$, define a $2 \times 2$ matrix Riemann–Hilbert problem. This problem has the distinctive feature that its jump has explicit $(x, t)$ dependence in the exponential form of $\exp\{ikx + 2ik^2 t\}$. Determine $q(x, t)$ in terms of the solution of this Riemann–Hilbert problem by using the fact that these functions are related by the Lax pair. Then the function $q(x, t)$ solves the IBV problem [1]–[7] with $q(x, 0) = q_0(x)$, $q(0, t) = g_0(t)$, and $q_x(0, t) = g_1(t)$.

The above construction can be summarized in the following theorem (Fokas 2002):

Theorem 1 Consider the boundary-value problem for the NLS equation [1] determined by the conditions [7]. Let $a(k)$, $b(k)$ be given by [8], and suppose that there exists a function $g_1(t)$ such that if $A(k)$, $B(k)$ are defined by [9], then the global relation [8] holds.

Let $M(x, t, k)$ be the solution of the $2 \times 2$ Riemann–Hilbert problem with jump on the real and imaginary axes given by

• $M_-(x, t, k) = M_+(x, t, k)J(x, t, k)$ with $M = M_-$ in the second and fourth quadrants of $\mathbb{C}$, $M = M_+$ in the first and third quadrants of $\mathbb{C}$, and $J(x, t, k)$ is defined in terms of $a$, $b$, $A$, $B$ and the exponential $e^{ikx + 2ik^2 t}$;
• $M = I + O(1/k)$ as $k \to \infty$ and has appropriate residue conditions if there are poles.

Then $M(x, t, k)$ exists and is unique, and

\[ q(x, t) = 2i \lim_{k \to \infty} \left(kM(x, t, k)\right)_{12} \]

The result above relies on characterizing the unknown boundary value $g_1(t)$ a priori by requiring that the global relation hold. Recently, substantial progress has been made in this direction in the case of integrable nonlinear evolution equations, in particular of NLS. Namely, Fokas (in press) contains an effective description of the map assigning to each given $q(x,0) = q_0(x)$ and $g_0(t) = q(0,t)$ a unique value for $q_x(0,t)$ (called the Dirichlet to Neumann map) for the NLS, as well as for a version of the Korteweg–deVries and sG equations. We state below the relevant theorem for the case of the NLS equation.

Theorem 2 Let $q(x,t)$ satisfy the NLS equation on the half-line $0 < x < \infty$, $t > 0$ with the initial and boundary conditions [7]. Then $g_1(t) := q_x(0,t)$ is given by
\[
\begin{aligned}
g_1(t) ={}& \frac{g_0(t)}{\pi}\int_{\partial D} e^{-2ik^2t}\big(\Phi_2(t,k) - \Phi_2(t,-k)\big)\,dk \\
&+ \frac{4i}{\pi}\int_{\partial D} e^{-2ik^2t}\,k\,R(k)\,\overline{\Phi_2(t,-\bar k)}\,dk \\
&+ \frac{2i}{\pi}\int_{\partial D} e^{-2ik^2t}\big(k\,[\Phi_1(t,k) - \Phi_1(t,-k)] + i g_0(t)\big)\,dk
\end{aligned}
\]
with $\Phi = (\Phi_1, \Phi_2)$ given by the solution of [10]. The Neumann datum $g_1(t)$ is unique and exists globally in $t$.

This result yields a rigorous proof of the global existence of the solution of boundary-value problems on the half-line for the NLS equation. Therefore, the assumption in Theorem 1 that a suitable function $g_1(t)$ exists can be dropped.

Generalizations and Summary of Results

Results analogous to the ones presented in the previous section can be phrased exclusively in terms of integral equations rather than in terms of Riemann–Hilbert problems, as done for example in Khruslov and Kotlyarov (2003). This is the point of view of the school of Gelfand and Marchenko, and in this setting the functions $\Phi$ are given in the so-called Gelfand–Levitan–Marchenko representation. Results on boundary-value problems for the NLS equation using this representation have been obtained only under additional assumptions on the unknown part of the boundary values. It was only after the idea that the x- and t-parts of the spectral equations should be treated simultaneously that this approach yielded complete results. However, the Gelfand–Levitan–Marchenko representation yields a crucial simplification for deriving the explicit form of the Dirichlet to Neumann map and proving Theorem 2. This representation has now been derived for all equations [1]–[3], see Fokas (in press).

The analysis of the invariance properties of the global relation with respect to $k$ also yields the characterization of all the boundary conditions for which the transform obtained to represent the solution linearizes. For these boundary conditions, called linearizable, the solution can be represented as effectively as for the Cauchy problem. For example, the linearizable boundary conditions for the NLS equation are given by any boundary values that satisfy
\[ g_0(t)\,\bar g_1(t) - \bar g_0(t)\,g_1(t) = 0. \]
An example of a boundary condition satisfying this constraint, encompassing also Dirichlet and Neumann homogeneous conditions, is $q(0,t) - \chi\,q_x(0,t) = 0$, with $\chi$ a non-negative constant.

As mentioned at the beginning of the previous section, the approach described in general can be used to obtain results similar to those given for the NLS equation for many other integrable evolution equations, in particular, mKdV (Boutet de Monvel et al. 2004), sG, and KdV (Fokas 2002). The results obtained are essentially the same as for NLS, starting from the general form [5] of the Lax pair, and include the derivation of the solution representation, the complete characterization of linearizable boundary conditions, and the analysis of the Dirichlet to Neumann map.

The approach above can also be used for studying boundary-value problems posed on finite domains, for $x \in [0,1]$. This has been done for a model for transient stimulated Raman scattering (Fokas and Menyuk 1999), for the sG equation in light-cone coordinates (Pelloni, in press), and for the NLS equation (Fokas and Its 2004). In this case also the method yields a representation of the solution which is suitable for asymptotic analysis. In this respect, the question of soliton generation from boundary data is of some importance, and has been recently considered by various authors (Fokas and Menyuk 1999, Boutet de Monvel and Kotlyarov 2003, Pelloni in press, Boutet de Monvel et al. 2004). The results are, however, still obtained case by case, and no general framework for this problem has been identified yet. For problems on the half-line, solitons may be generated, but not necessarily in correspondence to the singularities that generate solitons for the full-line problem, even when the same singularities are present. For problems posed on finite domains, in some specific cases, at least for the stimulated Raman scattering and the sG equations, it appears that the dominant asymptotic behavior is given by a similarity solution.

In conclusion, the extension of the inverse scattering transform given by Fokas provides the tool for analyzing boundary-value problems specific to nonlinear integrable equations. This tool relies, in an essential way, on the integrability structure of the problem, and yields a full characterization of the solution as well as uniqueness and existence results. The solution representation thus obtained is not always fully explicit, but it is always suitable for asymptotic analysis using standard techniques such as the recent nonlinearization of the classical steepest descent method.

See also: $\bar\partial$ Approach to Integrable Systems; Integrable Discrete Systems; Integrable Systems and the Inverse Scattering Method; Integrable Systems: Overview; Nonlinear Schrödinger Equations; Riemann–Hilbert Methods in Integrable Systems; Separation of Variables for Differential Equations; Sine-Gordon Equation.

Further Reading

Ablowitz MJ and Segur HJ (1974) The inverse scattering transform: semi-infinite interval. Journal of Mathematical Physics 16: 1054.
Adler VE, Gurel B, Gurses M, and Habibullin IT (1997) Journal of Physics A 30: 3505.
Bona J, Sun S, and Zhang BY (2001) A non-homogeneous boundary value problem for the Korteweg–deVries equation. Transactions of the American Mathematical Society 354: 427–490.
Boutet de Monvel A, Fokas AS, and Shepelsky D (2004) The modified KdV equation on the half-line. Journal of the Institute of Mathematics of Jussieu 3: 139–164.
Boutet de Monvel A and Kotlyarov VP (2003) Generation of asymptotic solitons of the nonlinear Schrödinger equation by boundary data. Journal of Mathematical Physics 44: 3185–3215.
Colin T and Ghidaglia J-M (2001) An initial-boundary value problem for the Korteweg–deVries equation posed on a finite interval. Advanced Differential Equations 6(12): 1463–1492.
Colliander JE and Kenig CE (2001) The generalized Korteweg–deVries equation on the half line (http://arxiv.org/abs/math.AP/0111294).
Degasperis A, Manakov S, and Santini PM (2001) The nonlinear Schrödinger equation on the half line. JETP Letters 74(10): 481–485.
Fokas AS (2000) On the integrability of linear and nonlinear PDEs. Journal of Mathematical Physics 41: 4188.
Fokas AS (2002) Integrable nonlinear evolution equations on the half line. Communications in Mathematical Physics 230: 1–39.
Fokas AS (2005) A generalised Dirichlet to Neumann map for certain nonlinear evolution PDEs. Communications on Pure and Applied Mathematics 58: 639–670.
Fokas AS and Its AR (2004) The nonlinear Schrödinger equation on the interval. Journal of Physics A: Mathematical and General 37: 6091–6114.
Fokas AS and Menyuk CR (1999) Integrability and self-similarity in transient stimulated Raman scattering. Journal of Nonlinear Science 9: 1–31.
Gardner GS, Greene JM, Kruskal MD, and Miura RM (1967) Method for solving the Korteweg–de Vries equation. Physical Review Letters 19: 1095.
Habibullin IT (1999) KdV equation on a half-line with the zero boundary condition. Theoretical and Mathematical Fizika 119: 397.
Khruslov E and Kotlyarov VP (2003) Generation of asymptotic solitons in an integrable model of stimulated Raman scattering by periodic boundary data. Mat. Fiz. Anal. Geom. 10(3): 366–384.
Lax PD (1968) Integrals of nonlinear equations of evolution and solitary waves. Communications in Pure and Applied Mathematics 21: 467–490.
Pelloni B (2005) The asymptotic behaviour of the solution of boundary value problems for the Sine–Gordon equation on a finite interval. Journal of Nonlinear Mathematical Physics 12: 518–529.
Sklyanin EK (1987) Boundary conditions for integrable equations. Functional Analysis and its Applications 21: 86–87.
Zakharov VE and Shabat AB (1972) An exact theory of two-dimensional self-focusing and one-dimensional automodulation of waves in a nonlinear medium. Soviet Physics – JETP 34: 62–78.

Braided and Modular Tensor Categories


V Lyubashenko, Institute of Mathematics, Kyiv, Ukraine
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Tensor or monoidal categories are encountered in various branches of modern mathematical physics. The first examples appeared, without being called monoidal categories, as categories of modules over a group or a Lie algebra. The operation of a monoidal product in this case is the usual tensor product $X \otimes_{\mathbb C} Y$ of modules (representations) $X$ and $Y$. These categories are symmetric: the modules $X \otimes Y$ and $Y \otimes X$ are isomorphic; moreover, the permutation isomorphism (the twist) $c : X \otimes Y \to Y \otimes X$, $x \otimes y \mapsto y \otimes x$, is involutive, $c^2 = \mathrm{id}_{X \otimes Y}$. The next examples of monoidal categories were given by categories of representations of supergroups or Lie superalgebras. They are also symmetric: now the symmetry (Koszul's rule) $c : X \otimes Y \to Y \otimes X$, $x \otimes y \mapsto (-1)^{\deg x \cdot \deg y}\, y \otimes x$, is the twist with a sign, which depends on the degree (or parity) $\deg x$ of elements $x \in X$.

The development of the theory of exactly solvable models in statistical mechanics led Drinfeld (1987) to the notion of quantum groups: Hopf algebras $H$ with additional structures (quasitriangular Hopf algebras). $H$-modules also form a monoidal category; however, it is not symmetric, but only braided.

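It means that a canonical braiding isomorphism $c : X \otimes Y \to Y \otimes X$ still exists, but it is not involutive any more, $c^2 \ne \mathrm{id}$. The braiding $c$ satisfies the Yang–Baxter equation
\[ (c \otimes 1)(1 \otimes c)(c \otimes 1) = (1 \otimes c)(c \otimes 1)(1 \otimes c) : X \otimes Y \otimes Z \to Z \otimes Y \otimes X \]
for any three $H$-modules $X$, $Y$, $Z$.

In the above examples, we also have an obvious isomorphism of associativity $a : X \otimes (Y \otimes Z) \to (X \otimes Y) \otimes Z$ of the iterated tensor product. There are, however, monoidal categories of modules where such an isomorphism is nontrivial, namely, modules over quasi-Hopf algebras. These were introduced by Drinfeld (1989a, b) in connection with the Knizhnik–Zamolodchikov equations. These nontrivial associativity isomorphisms $a : X \otimes (Y \otimes Z) \to (X \otimes Y) \otimes Z$ are required to satisfy the pentagon equation of Mac Lane and Stasheff.

Braided monoidal categories also arise in rational conformal field theories (RCFTs), integrable models of statistical mechanics and topological quantum field theories (TQFTs). The common feature of these categories is that they are semisimple abelian with a finite number of simple modules. In other words, such a category $\mathcal C$ is equivalent to the category of finite-dimensional $\mathbb C^n = \mathbb C \times \cdots \times \mathbb C$-modules for some $n$. This equivalence is, however, not monoidal, and the monoidal structure can be rather involved. For instance, from the Ising model one can obtain the monoidal category with two simple objects $1$ and $X$, which obey the monoidal law $1 \otimes 1 = 1$, $1 \otimes X = X \otimes 1 = X$, $X \otimes X = 1 \oplus X$. Clearly, such relations cannot be satisfied by finite-dimensional $\mathbb C$-vector spaces $1$ and $X$, if $\otimes$ meant the usual tensor product $\otimes_{\mathbb C}$ of $\mathbb C$-vector spaces. However, here $\otimes$ means simply a functor $\otimes : \mathcal C \times \mathcal C \to \mathcal C$ with certain properties.

Categories which come from RCFT, integrable models or TQFT often enjoy additional properties. They are rigid: for each object $X$, there exists a dual object $X^\vee$. They are ribbon (balanced): there is a canonical endomorphism $v_X : X \to X$ for each object $X$, which is related to the braiding. They are modular, which is defined as nondegeneracy of a certain matrix. The meaning of modularity is that the ribbon category is suitable for producing a TQFT out of it.

For categories equivalent to the category of $\mathbb C \times \cdots \times \mathbb C$-modules, the ribbon (braided) monoidal structure can be specified by a finite number of complex matrices. For instance, 6j-symbols or q-6j-symbols encode the associativity isomorphism. In this form, modular categories appeared in the work of Moore and Seiberg (1989) on RCFTs. Such categories can be realized as categories of modules over weak Hopf algebras, but we stress again that the monoidal product for such modules does not coincide with the tensor product of vector spaces. So, general features are better seen at the level of category theory, and we now start with precise definitions.

Rigid Monoidal Categories

We recall here the basic definitions of monoidal categories, monoidal functors, and dual objects.

Definition 1 A monoidal category $(\mathcal C, \otimes, a, 1, l, r)$ is a category $\mathcal C$, a functor $\otimes : \mathcal C \times \mathcal C \to \mathcal C$ (called the tensor product), a functorial isomorphism $a : X \otimes (Y \otimes Z) \to (X \otimes Y) \otimes Z$, the associativity isomorphism, a unit object $1$, and two functorial isomorphisms $l : 1 \otimes X \to X$, $r : X \otimes 1 \to X$ such that
\[
\begin{array}{ccccc}
X \otimes (Y \otimes (Z \otimes W)) & \xrightarrow{\ a\ } & (X \otimes Y) \otimes (Z \otimes W) & \xrightarrow{\ a\ } & ((X \otimes Y) \otimes Z) \otimes W \\
{\scriptstyle X \otimes a}\downarrow & & & & \uparrow{\scriptstyle a \otimes W} \\
X \otimes ((Y \otimes Z) \otimes W) & & \xrightarrow{\qquad a \qquad} & & (X \otimes (Y \otimes Z)) \otimes W
\end{array}
\]
commutes (the pentagon equation) and
\[ a_{X,1,Y} = \Big( X \otimes (1 \otimes Y) \xrightarrow{X \otimes l_Y} X \otimes Y \xrightarrow{r_X^{-1} \otimes Y} (X \otimes 1) \otimes Y \Big). \]

Definition 2 A monoidal functor $(F, \phi, f) : (\mathcal C, \otimes) \to (\mathcal D, \otimes)$ is a functor $F : \mathcal C \to \mathcal D$, a functorial isomorphism $\phi = \phi_{X,Y} : F(X) \otimes F(Y) \to F(X \otimes Y) \in \mathcal D$, and an isomorphism $f : 1 \to F1 \in \mathcal D$ such that
\[
\begin{array}{ccccc}
FX \otimes (FY \otimes FZ) & \xrightarrow{1 \otimes \phi} & FX \otimes F(Y \otimes Z) & \xrightarrow{\ \phi\ } & F(X \otimes (Y \otimes Z)) \\
{\scriptstyle a}\downarrow & & & & \downarrow{\scriptstyle Fa} \\
(FX \otimes FY) \otimes FZ & \xrightarrow{\phi \otimes 1} & F(X \otimes Y) \otimes FZ & \xrightarrow{\ \phi\ } & F((X \otimes Y) \otimes Z)
\end{array}
\]
\[
\begin{array}{ccc}
F1 \otimes FX & \xrightarrow{\ \phi\ } & F(1 \otimes X) \\
{\scriptstyle f \otimes 1}\uparrow & & \downarrow{\scriptstyle Fl} \\
1 \otimes FX & \xrightarrow{\ l\ } & FX
\end{array}
\qquad
\begin{array}{ccc}
FX \otimes F1 & \xrightarrow{\ \phi\ } & F(X \otimes 1) \\
{\scriptstyle 1 \otimes f}\uparrow & & \downarrow{\scriptstyle Fr} \\
FX \otimes 1 & \xrightarrow{\ r\ } & FX
\end{array}
\]
commute. A morphism of monoidal functors $\lambda : (F, \phi, f) \to (G, \psi, g)$ is a functorial morphism $\lambda : F \to G$ such that
\[
\begin{array}{ccc}
FX \otimes FY & \xrightarrow{\ \phi\ } & F(X \otimes Y) \\
{\scriptstyle \lambda \otimes \lambda}\downarrow & & \downarrow{\scriptstyle \lambda} \\
GX \otimes GY & \xrightarrow{\ \psi\ } & G(X \otimes Y)
\end{array}
\qquad
g = \big( 1 \xrightarrow{\ f\ } F1 \xrightarrow{\ \lambda\ } G1 \big).
\]
The $f$ datum of a monoidal functor $(F, \phi, f)$ is uniquely determined by the $(F, \phi)$ data, so we can denote a monoidal functor as $(F, \phi)$ or even $F$.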
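The remark in the Introduction that the law $X \otimes X = 1 \oplus X$ cannot be realized by the usual tensor product of vector spaces amounts to a dimension count: a vector space $X$ of dimension $d$ would need $d \cdot d = 1 + d$. The following sketch is an illustration with hypothetical helper names, not something taken from the article. It confirms that no non-negative integer $d$ satisfies this equation and computes the positive real root, the golden ratio.

```python
import math


def integer_solutions(limit=1000):
    """Non-negative integers d <= limit with d*d == 1 + d.

    For d >= 2 one has d*d - d - 1 = d*(d - 1) - 1 >= 1, so the bounded
    search is in fact exhaustive.
    """
    return [d for d in range(limit + 1) if d * d == 1 + d]


def positive_root():
    """Positive real solution of d**2 = 1 + d."""
    return (1 + math.sqrt(5)) / 2


print(integer_solutions())
print(positive_root())
```

The empty list shows that the object $X$ cannot be a vector space with $\otimes$ the ordinary tensor product; any additive and multiplicative notion of dimension for $X$ has to take a non-integer value.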

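The coherence theorem of Mac Lane (1963) states that any monoidal category $\mathcal C$ is equivalent to a strictly monoidal category, in which $X \otimes (Y \otimes Z) = (X \otimes Y) \otimes Z$, $1 \otimes X = X = X \otimes 1$, and the isomorphisms $a$, $l$, $r$ are identity isomorphisms. Thus, in theoretical constructions, one may ignore the associativity isomorphism. It is not always so in practice. For instance, working with quasi-Hopf algebras related with the Knizhnik–Zamolodchikov equation one prefers to keep the original category, which is (a deformation of) the category of modules over a Lie algebra, rather than to replace it with a strict monoidal category that is not a category of modules any more.

Definition 3 A rigid category $\mathcal C$ is a monoidal category in which, to every object $X \in \mathcal C$, dual objects $X^\vee$ and ${}^\vee X \in \mathcal C$ are assigned together with morphisms of evaluation and coevaluation
\[
\begin{aligned}
\mathrm{ev}_X &: X \otimes X^\vee \to 1, & \mathrm{ev}'_X &: {}^\vee X \otimes X \to 1, \\
\mathrm{coev}_X &: 1 \to X^\vee \otimes X, & \mathrm{coev}'_X &: 1 \to X \otimes {}^\vee X,
\end{aligned}
\]
drawn as a cup $\cup$ (evaluations) or a cap $\cap$ (coevaluations) whose two ends are labelled by the corresponding objects. The evaluations and coevaluations are chosen such that the compositions
\[
\begin{aligned}
& X \xrightarrow{r^{-1}} X \otimes 1 \xrightarrow{1 \otimes \mathrm{coev}} X \otimes (X^\vee \otimes X) \xrightarrow{\ a\ } (X \otimes X^\vee) \otimes X \xrightarrow{\mathrm{ev} \otimes 1} 1 \otimes X \xrightarrow{\ l\ } X \\
& X \xrightarrow{l^{-1}} 1 \otimes X \xrightarrow{\mathrm{coev}' \otimes 1} (X \otimes {}^\vee X) \otimes X \xrightarrow{a^{-1}} X \otimes ({}^\vee X \otimes X) \xrightarrow{1 \otimes \mathrm{ev}'} X \otimes 1 \xrightarrow{\ r\ } X \\
& X^\vee \xrightarrow{l^{-1}} 1 \otimes X^\vee \xrightarrow{\mathrm{coev} \otimes 1} (X^\vee \otimes X) \otimes X^\vee \xrightarrow{a^{-1}} X^\vee \otimes (X \otimes X^\vee) \xrightarrow{1 \otimes \mathrm{ev}} X^\vee \otimes 1 \xrightarrow{\ r\ } X^\vee \\
& {}^\vee X \xrightarrow{r^{-1}} {}^\vee X \otimes 1 \xrightarrow{1 \otimes \mathrm{coev}'} {}^\vee X \otimes (X \otimes {}^\vee X) \xrightarrow{\ a\ } ({}^\vee X \otimes X) \otimes {}^\vee X \xrightarrow{\mathrm{ev}' \otimes 1} 1 \otimes {}^\vee X \xrightarrow{\ l\ } {}^\vee X
\end{aligned}
\]
are all identity morphisms.

In a rigid monoidal category $\mathcal C$, there is a pairing
\[ (X \otimes Y) \otimes (Y^\vee \otimes X^\vee) \xrightarrow{\ \sim\ } (X \otimes (Y \otimes Y^\vee)) \otimes X^\vee \xrightarrow{X \otimes \mathrm{ev}_Y \otimes X^\vee} (X \otimes 1) \otimes X^\vee \xrightarrow{r \otimes X^\vee} X \otimes X^\vee \xrightarrow{\mathrm{ev}_X} 1 \]
which induces an isomorphism $j_{+X,Y} : Y^\vee \otimes X^\vee \to (X \otimes Y)^\vee$, such that the above pairing coincides with
\[ (X \otimes Y) \otimes (Y^\vee \otimes X^\vee) \xrightarrow{1 \otimes j_+} (X \otimes Y) \otimes (X \otimes Y)^\vee \xrightarrow{\ \mathrm{ev}\ } 1. \]
The equation
\[ \mathrm{coev}_{X \otimes Y} = \Big( 1 \xrightarrow{\mathrm{coev}_Y} Y^\vee \otimes Y \simeq Y^\vee \otimes 1 \otimes Y \xrightarrow{1 \otimes \mathrm{coev}_X \otimes 1} Y^\vee \otimes X^\vee \otimes X \otimes Y \xrightarrow{j_+ \otimes 1} (X \otimes Y)^\vee \otimes (X \otimes Y) \Big) \]
also holds. Similarly, there is an isomorphism $j_{-X,Y} : {}^\vee Y \otimes {}^\vee X \to {}^\vee (X \otimes Y)$.

Morphisms constructed from braidings and (co)evaluations are often described by tangles. The conventions are listed in Figure 1. The suggested assignment of morphisms in $\mathcal C$ to elementary pictures extends to a unique functor from the category of $\mathcal C$-colored tangles to the category $\mathcal C$ itself. With the above interpretation, these tangles need not be oriented. We shall use the same notation for framed tangles, and the framing will be within the plane.

[Figure 1 lists the elementary pictures assigned to: a morphism $f : X \to Y$; the braiding $c_{X,Y} : X \otimes Y \to Y \otimes X$; the inverse braiding $c^{-1} : X \otimes Y \to Y \otimes X$; the evaluation $\mathrm{ev}_X : X \otimes X^\vee \to 1$; the coevaluation $\mathrm{coev}_X : 1 \to X^\vee \otimes X$.]
Figure 1 Conventions for notation of morphisms from tangles.

The maps $\mathrm{Ob}\,\mathcal C \to \mathrm{Ob}\,\mathcal C$, $X \mapsto X^\vee$, and $X \mapsto {}^\vee X$ extend to contravariant self-equivalences $\mathcal C \to \mathcal C$, $f \mapsto f^t$, and $f \mapsto {}^t f$. For given $f : X \to Y$, the morphisms $f^t : Y^\vee \to X^\vee$ and ${}^t f : {}^\vee Y \to {}^\vee X$ can be defined, respectively, by pictures using the assignment from Figure 1.

[Pictures: the tangle diagrams defining $f^t$ and ${}^t f$.]

We have a monoidal self-equivalence of $\mathcal C$,
\[ (-^{\vee\vee}, j_2) : (\mathcal C, \otimes, 1) \to (\mathcal C, \otimes, 1), \qquad X \mapsto X^{\vee\vee}, \quad f \mapsto f^{tt}, \]
\[ j_{2\,X,Y} = \Big( X^{\vee\vee} \otimes Y^{\vee\vee} \xrightarrow{\ j_+\ } (Y^\vee \otimes X^\vee)^\vee \xrightarrow{(j_+^{-1})^t} (X \otimes Y)^{\vee\vee} \Big). \qquad [1] \]
It is not always true that the two duals $X^\vee$ and ${}^\vee X$ are isomorphic. However, there are canonical isomorphisms
\[ X \to {}^\vee (X^\vee), \qquad X \to ({}^\vee X)^\vee. \]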
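For finite-dimensional vector spaces with a chosen basis, evaluation and coevaluation are given by the Kronecker delta, so the first zigzag identity of Definition 3 and the transpose $f^t$ can be checked in coordinates. The following is a minimal sketch under that assumption: $X = \mathbb C^n$ with its standard basis $e_j$, the dual basis $e^i$ on $X^\vee$, trivial associativity and unit isomorphisms, and illustrative function names that do not come from the article.

```python
def delta(n):
    """n x n Kronecker delta as a nested list."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def zigzag(n):
    """Matrix of X -> X (x) X^v (x) X -> X built from coev and ev.

    coev[i][j] is the coefficient of e^i (x) e_j in coev(1);
    ev[k][i] is the value of ev on e_k (x) e^i.
    Entry [j][k] of the result is the coefficient of e_j in the image of e_k.
    """
    ev, coev = delta(n), delta(n)
    return [[sum(ev[k][i] * coev[i][j] for i in range(n))
             for k in range(n)] for j in range(n)]


def transpose_via_duality(a):
    """Matrix of f^t : Y^v -> X^v for f : X -> Y with matrix a (rows index Y).

    f^t is computed as (1 (x) ev_Y) (1 (x) f (x) 1) (coev_X (x) 1);
    entry [i][s] is the coefficient of e^i in the image of the dual vector y^s.
    """
    m, n = len(a), len(a[0])          # dim Y, dim X
    coev_x, ev_y = delta(n), delta(m)
    return [[sum(coev_x[i][j] * a[r][j] * ev_y[r][s]
                 for j in range(n) for r in range(m))
             for s in range(m)] for i in range(n)]


a = [[1, 2, 3],
     [4, 5, 6]]                       # a linear map C^3 -> C^2
print(zigzag(3) == delta(3))
print(transpose_via_duality(a))
```

In this setting the categorical transpose reduces to the ordinary matrix transpose, which is why the notation $f^t$ is natural.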

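We may replace the category $\mathcal C$ with an equivalent one, such that the canonical isomorphisms $X \to {}^\vee (X^\vee)$ and $X \to ({}^\vee X)^\vee$ become identity morphisms, and the functors $-^\vee$ and ${}^\vee -$ are inverse to each other. We shall assume this to simplify notations. Finally, we denote the iterated duals by $X^{(n\vee)} = X^{\vee \cdots \vee}$ ($n$ times) and $X^{(-n\vee)} = {}^{\vee \cdots \vee} X$ ($n$ times) for $n \ge 0$.

Braided Categories

Here we review the definitions of the braiding isomorphism and further derived isomorphisms. Several basic relations between them are listed. Two important classes of examples of braided categories are given by the categories of modules over quasitriangular Hopf algebras and the categories of tangles.

Definition 4 A braided category $(\mathcal C, c)$ is a monoidal category $\mathcal C$ equipped with a functorial isomorphism $c = c_{X,Y} : X \otimes Y \to Y \otimes X$ (the braiding, or the commutativity isomorphism) such that the two hexagons
\[
\begin{array}{ccccc}
X \otimes (Y \otimes Z) & \xrightarrow{1 \otimes c^{\pm 1}} & X \otimes (Z \otimes Y) & \xrightarrow{\ a\ } & (X \otimes Z) \otimes Y \\
{\scriptstyle a}\downarrow & & & & \downarrow{\scriptstyle c^{\pm 1} \otimes 1} \\
(X \otimes Y) \otimes Z & \xrightarrow{\ c^{\pm 1}\ } & Z \otimes (X \otimes Y) & \xrightarrow{\ a\ } & (Z \otimes X) \otimes Y
\end{array}
\]
commute (one for $c$ and one for $c^{-1}$).

The graphical notation for the braiding $c = (c_{X,Y} : X \otimes Y \to Y \otimes X)$ and for its inverse $c^{-1}$ is given by the two kinds of crossing of strands labelled $X$, $Y$ at the top and $Y$, $X$ at the bottom.

[Pictures: the crossings representing $c$ and $c^{-1}$.]

In a rigid braided category, we can define functorial isomorphisms
\[ u_1^2,\ u_{-1}^2 : X \to X^{\vee\vee}, \qquad u_1^{-2},\ u_{-1}^{-2} : X \to {}^{\vee\vee} X \]
using again the conventions from Figure 1.

[Pictures: the tangle diagrams defining $u_1^2$, $u_{-1}^2$, $u_1^{-2}$ and $u_{-1}^{-2}$.]

These are isomorphisms of monoidal functors (see [1])
\[ u_1^2 : (\mathrm{Id}, c^2) \to (-^{\vee\vee}, j_2), \qquad u_{-1}^2 : (\mathrm{Id}, c^{-2}) \to (-^{\vee\vee}, j_2). \]
In particular, this implies the commutativity of the diagram
\[
\begin{array}{ccc}
X \otimes Y & \xrightarrow{\ c^2\ } & X \otimes Y \\
{\scriptstyle u_1^2 \otimes u_1^2}\downarrow & & \downarrow{\scriptstyle u_1^2} \\
X^{\vee\vee} \otimes Y^{\vee\vee} & \xrightarrow{\ j_2\ } & (X \otimes Y)^{\vee\vee}
\end{array}
\]
The square of the monoidal functor $(-^{\vee\vee}, j_2)$ is
\[ (-^{\vee\vee\vee\vee}, j_4) : (\mathcal C, \otimes, 1) \to (\mathcal C, \otimes, 1), \qquad X \mapsto X^{\vee\vee\vee\vee}, \quad f \mapsto f^{tttt}, \]
where
\[ j_{4\,X,Y} = \Big( X^{\vee\vee\vee\vee} \otimes Y^{\vee\vee\vee\vee} \xrightarrow{\ j_2\ } (X^{\vee\vee} \otimes Y^{\vee\vee})^{\vee\vee} \xrightarrow{\ j_2^{tt}\ } (X \otimes Y)^{\vee\vee\vee\vee} \Big). \]
The natural isomorphism $u_0^4 = u_1^2 u_{-1}^2$ is, in fact, an isomorphism of monoidal functors $u_0^4 : (\mathrm{Id}, \mathrm{id}) \to (-^{\vee\vee\vee\vee}, j_4)$.

Ribbon Categories

Now we define balancing and recall some properties of balanced (ribbon) categories.

Definition 5 Let $\mathcal C$ be a rigid braided category. A balancing $\theta_X : X \to X^{\vee\vee}$ is an isomorphism of monoidal functors $\theta : (\mathrm{Id}, \mathrm{id}, \mathrm{id}) \to (-^{\vee\vee}, j_2, d_2)$ such that $\theta^2 = u_0^4$ and $\theta_X^t = \theta_{X^\vee}^{-1} : X^{\vee\vee\vee} \to X^\vee$. The category $\mathcal C$ equipped with a balancing is called balanced.

We also use the notation $u_0^2 = \theta$. In any balanced category, there exists a canonical ribbon twist $v$. A ribbon twist $v = v_X : X \to X$, $v : \mathrm{Id} \to \mathrm{Id}$, is a self-adjoint ($v_{X^\vee} = v_X^t$) automorphism of the identity functor such that $c^2 = (v_X^{-1} \otimes v_Y^{-1})\, v_{X \otimes Y}$. It can be determined from the equations
\[ u_0^2 = u_1^2 v^{-1} = u_{-1}^2 v : X \to X^{\vee\vee}, \]
\[ \theta^{-1} = u_0^{-2} = u_1^{-2} v^{-1} = u_{-1}^{-2} v : X \to {}^{\vee\vee} X. \]
In particular, its square is given by the canonical isomorphism $v^2 = u_1^{-2} u_1^2$. Conversely, in any rigid braided category with a ribbon twist (called ribbon category) there exists a canonical balancing $u_0^2$ given by the above formulas. Thus, ribbon categories and balanced categories are synonyms. In the case of $X = 1$, we have $v_1 = \mathrm{id}_1$.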
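The ribbon relation $c^2 = (v_X^{-1} \otimes v_Y^{-1})\, v_{X \otimes Y}$ can be tested numerically on a small family of examples. The family used below is not taken from the article; it is a standard construction adopted here as an assumption: simple objects are labelled by $a \in \mathbb Z/n$ with $a \otimes b = a + b$, the braiding on $a \otimes b$ is the scalar $q^{ab}$, and the twist on $a$ is the scalar $q^{a^2}$, where $q = e^{2\pi i/n}$. The helper names `make_structure` and `ribbon_relation_holds` are hypothetical.

```python
import cmath


def make_structure(n, k=1):
    """Braiding and twist scalars for Z/n-graded lines with q = exp(2*pi*i*k/n)."""
    q = cmath.exp(2j * cmath.pi * k / n)

    def braiding(a, b):
        # scalar by which c acts on the tensor product of the lines a and b
        return q ** ((a % n) * (b % n))

    def twist(a):
        # scalar by which v acts on the line a
        return q ** ((a % n) ** 2)

    return braiding, twist


def ribbon_relation_holds(n, tol=1e-9):
    """Check c^2 = (v_a^{-1} v_b^{-1}) v_{a+b} and v_{-a} = v_a for all a, b."""
    braiding, twist = make_structure(n)
    for a in range(n):
        # self-adjointness: the dual of a is -a and must carry the same twist
        if abs(twist(-a) - twist(a)) > tol:
            return False
        for b in range(n):
            double_braiding = braiding(a, b) * braiding(b, a)
            rhs = twist(a + b) / (twist(a) * twist(b))
            if abs(double_braiding - rhs) > tol:
                return False
    return True


print(all(ribbon_relation_holds(n) for n in range(1, 8)))
```

In this family both sides of the relation equal $q^{2ab}$, since $(a+b)^2 - a^2 - b^2 = 2ab$; the code only confirms this identity with floating-point scalars.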

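The following result can be used to simplify notations:

Proposition 1 For any ribbon category $\mathcal C$ there exists a ribbon category $\mathcal D$ equivalent to $\mathcal C$ such that in it
(i) $1^\vee = 1$;
(ii) for any object $X$ we have ${}^\vee X = X^\vee$, $X^{\vee\vee} = X$, and $\theta_X = \mathrm{id}_X : X \to X^{\vee\vee} = X$;
(iii) for any object $X$ we have $\mathrm{ev}_X = \mathrm{ev}'_{X^\vee} : X \otimes X^\vee \to 1$, and $\mathrm{coev}_X = \mathrm{coev}'_{X^\vee} : 1 \to X^\vee \otimes X$.

In the category $\mathcal C = H$-mod, where $H$ is a ribbon Hopf algebra, the equation $X^\vee = {}^\vee X$ is not necessarily satisfied. Nevertheless, $X^\vee$ is canonically isomorphic to ${}^\vee X$. The same holds in any ribbon category. We identify these objects via $\theta = u_0^2 : {}^\vee X \to X^\vee$. This allows us to use the right dual objects in place of the left ones. In that role, the right duals are equipped with the left evaluation and coevaluation, called flipped evaluation and coevaluation, respectively:
\[ \widetilde{\mathrm{ev}} : X^\vee \otimes X \xrightarrow{X^\vee \otimes \theta} X^\vee \otimes X^{\vee\vee} \xrightarrow{\ \mathrm{ev}\ } 1, \]
\[ \widetilde{\mathrm{coev}} : 1 \xrightarrow{\ \mathrm{coev}\ } X^{\vee\vee} \otimes X^\vee \xrightarrow{\theta^{-1} \otimes X^\vee} X \otimes X^\vee. \]
They are often denoted simply $\mathrm{ev}$ and $\mathrm{coev}$ and should be replaced by $\widetilde{\mathrm{ev}}$ and $\widetilde{\mathrm{coev}}$ in applications. In the context of Hopf algebras, $\theta$ is given by the action of a group-like element introduced by Drinfeld.

Hopf Algebras in Braided Categories

Let $\mathcal C$ be a braided monoidal category. A Hopf algebra $H$ in $\mathcal C$ is an object $H \in \mathrm{Ob}\,\mathcal C$ together with an associative multiplication $m : H \otimes H \to H$ and a coassociative comultiplication $\Delta : H \to H \otimes H$, obeying the bialgebra axiom
\[ \big( H \otimes H \xrightarrow{\ m\ } H \xrightarrow{\ \Delta\ } H \otimes H \big) = \big( H \otimes H \xrightarrow{\Delta \otimes \Delta} H \otimes H \otimes H \otimes H \xrightarrow{H \otimes c \otimes H} H \otimes H \otimes H \otimes H \xrightarrow{m \otimes m} H \otimes H \big). \]
Moreover, $H$ has a unit $\eta : 1 \to H$, a counit $\varepsilon : H \to 1$, an antipode $\gamma : H \to H$, and the inverse antipode $\gamma^{-1} : H \to H$. The defining relations for these are the same as in the classical case. Notice, in particular, that the unit is also a morphism. Associativity of multiplication, as well as coassociativity of comultiplication, is formulated with the use of the associativity isomorphism (in the nonstrict case).

Hopf algebras in braided categories have also been called braided groups. Their basic properties are very similar to those of usual Hopf algebras; for example, the antipode is antimultiplicative with respect to the braiding (see, e.g., Majid (1993)). For Hopf algebras in rigid braided categories, there exist integrals in a sense very much similar to the case of ordinary finite-dimensional Hopf algebras, as shown by Bespalov et al. (2000).

Modular Categories

Assume that a braided rigid monoidal category $\mathcal C$ is equivalent as a category (with monoidal structure ignored) to the category of finite-dimensional modules over a finite-dimensional algebra. In particular, $\mathcal C$ is abelian. Then there exists an object $F$ in $\mathcal C$, equipped with a morphism $i_X : X \otimes X^\vee \to F$ for each $X \in \mathrm{Ob}\,\mathcal C$, such that the diagram
\[
\begin{array}{ccc}
X \otimes Y^\vee & \xrightarrow{f \otimes Y^\vee} & Y \otimes Y^\vee \\
{\scriptstyle X \otimes f^t}\downarrow & & \downarrow{\scriptstyle i_Y} \\
X \otimes X^\vee & \xrightarrow{\ i_X\ } & F
\end{array}
\]
is commutative for all morphisms $f : X \to Y$ of $\mathcal C$, and, moreover, $F$ is universal between objects with such properties. Here $f^t : Y^\vee \to X^\vee$ is the transpose of a morphism $f : X \to Y$. In other words, $F$ is a direct limit, called the coend and denoted as $F = \int^{Z \in \mathcal C} Z \otimes Z^\vee$. It can also be defined via an exact sequence
\[ \bigoplus_{f : X \to Y \in \mathcal C} X \otimes Y^\vee \xrightarrow{f \otimes Y^\vee - X \otimes f^t} \bigoplus_{Z \in \mathcal C} Z \otimes Z^\vee \xrightarrow{\oplus\, i_Z} F \to 0. \]

It turns out that the coend $F$ is a Hopf algebra in the braided category $\mathcal C$, when it is equipped with the following operations. The comultiplication in $F$ is uniquely determined by the equation
\[ \big( X \otimes X^\vee \xrightarrow{\ i_X\ } F \xrightarrow{\ \Delta\ } F \otimes F \big) = \big( X \otimes X^\vee = X \otimes 1 \otimes X^\vee \xrightarrow{X \otimes \mathrm{coev}_X \otimes X^\vee} X \otimes X^\vee \otimes X \otimes X^\vee \xrightarrow{i_X \otimes i_X} F \otimes F \big). \]
The counit in $F$ is determined by the equation
\[ \big( X \otimes X^\vee \xrightarrow{\ i_X\ } F \xrightarrow{\ \varepsilon\ } 1 \big) = \big( X \otimes X^\vee \xrightarrow{\ \mathrm{ev}\ } 1 \big). \]
The multiplication $m : F \otimes F \to F$ is defined by a tangle diagram with strands $X$, $X^\vee$, $Y$, $Y^\vee$ at the top and $X$, $Y$, $Y^\vee$, $X^\vee$ at the bottom, and by the following commutative diagram:

[Picture: the tangle diagram for $m$.]
\[
\begin{array}{ccc}
X \otimes X^\vee \otimes (Y \otimes Y^\vee) & \xrightarrow{i_X \otimes i_Y} & F \otimes F \\
{\scriptstyle X \otimes c}\downarrow & & \downarrow{\scriptstyle \exists\, m} \\
X \otimes Y \otimes (X \otimes Y)^\vee & \xrightarrow{\ i_{X \otimes Y}\ } & F
\end{array}
\]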



The unit is given by the morphism
\[ \eta : 1 = 1 \otimes 1^\vee \xrightarrow{\ i_1\ } F. \]
The antipode $\gamma_F : F \to F$ is given by a tangle diagram.

[Picture: the tangle diagram for $\gamma_F$.]

The structure of the coend $F$ as a Hopf algebra can also be found directly from its universal property, as in Majid (1993).

There is a pairing of Hopf algebras $\omega : F \otimes F \to 1$ in $\mathcal C$, given by a tangle diagram.

[Picture: the tangle diagram for $\omega$.]

It induces a homomorphism of Hopf algebras $F \to F^\vee$.

Definition 6 A ribbon category $\mathcal C$, equivalent as a category to the category of finite-dimensional modules over a finite-dimensional algebra, is called modular if the pairing $\omega$ is nondegenerate, that is, the induced morphism $F \to F^\vee$ is invertible.

Examples of nonsemisimple modular categories include $\mathcal C = H$-mod, where $H = u_q(\mathfrak g)$ is a finite-dimensional algebra, a quotient of the quantum universal enveloping algebra $U_q(\mathfrak g)$, and $q$ is a root of unity of odd degree. In these examples, the coalgebra $F$ identifies with the dual Hopf algebra $H^*$, but the multiplication in $F$ differs from that of $H^*$. The explicit formula for the multiplication in $F$ uses the R-matrix for $H$ (see, e.g., Majid (1993)).

A definition of modularity for another type of categories (not necessarily abelian) was given by Turaev (1994).

When the category $\mathcal C$ is modular, the integrals for the Hopf algebra $F$ have especially simple properties. The integral element in $F$ is two sided. It is a morphism $\mu : 1 \to F$ such that
\[ \big( F = F \otimes 1 \xrightarrow{1 \otimes \mu} F \otimes F \xrightarrow{\ m\ } F \big) = \big( F \xrightarrow{\ \varepsilon\ } 1 \xrightarrow{\ \mu\ } F \big) = \big( F = 1 \otimes F \xrightarrow{\mu \otimes 1} F \otimes F \xrightarrow{\ m\ } F \big) \]
and $\mu$ is universal between morphisms with such property. By duality, the integral functional $\lambda : F \to 1$ is also two sided. It satisfies
\[ \big( F \xrightarrow{\ \Delta\ } F \otimes F \xrightarrow{1 \otimes \lambda} F \otimes 1 = F \big) = \big( F \xrightarrow{\ \lambda\ } 1 \xrightarrow{\ \eta\ } F \big) = \big( F \xrightarrow{\ \Delta\ } F \otimes F \xrightarrow{\lambda \otimes 1} 1 \otimes F = F \big) \]
and is universal between morphisms with such property. The integral element and the integral functional are unique up to multiplication by an element of $\mathrm{Aut}_{\mathcal C}\, 1$.

Semisimple Abelian Modular Categories

Reshetikhin and Turaev proposed to construct invariants of 3-manifolds via quantum groups. More precisely, they use certain abelian semisimple ribbon categories obtained from quantum groups at roots of unity as trace quotients. One can forget about the origin of these categories and work simply with semisimple modular categories. We shall describe them as input data for the modular functor construction.

Let $\mathcal C$ be a $\mathbb C$-linear abelian semisimple modular ribbon category. Assume that the number of isomorphism classes of simple objects is finite. Assume also that $1$ is simple and for each simple object $X$ the endomorphism algebra $\mathrm{End}\, X = \mathbb C$. We denote by $S = \{X_i\}_i$ the list of (representatives of isomorphism classes of) all simple objects.

Under these assumptions, many formulas simplify. The coend $F \in \mathcal C$ takes the form
\[ F = \bigoplus_{X \in S} X \otimes X^\vee \in \mathcal C. \]
Any morphism $1 \to F$ is a $\mathbb C$-linear combination of the standard morphisms $\varphi_X$ for $X \in S$,
\[ \varphi_X = \Big( 1 \xrightarrow{\ \mathrm{coev}'\ } X \otimes {}^\vee X \xrightarrow{1 \otimes u_0^2} X \otimes X^\vee \xrightarrow{\ i_X\ } F \Big). \]

[Picture: the tangle diagram for $\varphi_X$.]

The morphisms $\varphi_X$ form a basis of the commutative algebra $\mathrm{Inv}\, F = \mathrm{Hom}_{\mathcal C}(1, F)$. The Grothendieck ring of the category $\mathcal C$ determines the multiplication law in $\mathrm{Inv}\, F$ via the algebra isomorphism $\mathbb C \otimes_{\mathbb Z} K_0(\mathcal C) \to \mathrm{Inv}\, F$, $[X] \mapsto \varphi_X$.

Any morphism $F \to 1$ can be represented as a linear combination of the morphisms
\[ \psi_X : F \xrightarrow{\ \mathrm{pr}_X\ } X \otimes X^\vee \xrightarrow{\ \mathrm{ev}_X\ } 1 \]

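where $X \in S$. The functional $\psi_1 : F \to 1$ satisfies the properties of a two-sided integral $\lambda$ of the braided Hopf algebra $F$.

The Verlinde Formula

The number
\[ \dim_q(X) = \Big( 1 \xrightarrow{\ \mathrm{coev}\ } X^\vee \otimes X \xrightarrow{1 \otimes u_0^2} X^\vee \otimes X^{\vee\vee} \xrightarrow{\ \mathrm{ev}\ } 1 \Big) \]
is called the dimension of an object $X \in \mathrm{Ob}\,\mathcal C$. (The index $q$ reminds us that this number coincides with the q-dimension in the case $\mathcal C = U_q(\mathfrak g)$-mod.) We have $\dim_q(X^\vee) = \dim_q(X)$.

Definition 7 Introduce a biadditive function of two variables $s : \mathrm{Ob}\,\mathcal C \times \mathrm{Ob}\,\mathcal C \to \mathbb C$ on the class of objects of $\mathcal C$: the number $s_{XY}$ is the value of a closed tangle whose components are coloured by $X$ and $Y$ and carry the insertions $u_0^{-2}$ and $u_0^2$.

[Picture: the closed tangle defining $s_{XY}$.]

In particular, its restriction to $S$ is a matrix $s|_S : S \times S \to \mathbb C$, denoted again by $s = (s_{XY})_{X,Y \in S}$ by abuse of notation; here $X$ and $Y$ run over simple objects. Notice that $s_{XY} = s_{YX}$, so the matrix $s$ is symmetric.

Let us consider the $\mathbb C$-algebra $\mathrm{Inv}\, F = \mathrm{Hom}_{\mathcal C}(1, F)$. It has the basis $\varphi_X$, $X \in S$; hence, it is $n$-dimensional, where $n = \mathrm{Card}\, S$. The form $\omega$ on $F$ induces a bilinear form
\[ \omega' : \mathrm{Inv}\, F \otimes \mathrm{Inv}\, F \to \mathrm{Hom}(1, F \otimes F) \xrightarrow{\mathrm{Hom}(1, \omega)} \mathrm{End}\, 1 = \mathbb C. \]
The matrix $(s_{XY})$ is the matrix of the form $\omega'$ in the basis $(\varphi_X)$.

Lemma 1 (The Verlinde formula) For any simple $X \in S$ and any objects $Y$ and $Z$ of $\mathcal C$, we have
\[ s_{X1} = \dim_q(X), \qquad s_{X1}\, s_{X, Y \otimes Z} = s_{XY}\, s_{XZ}. \qquad [2] \]

Proof The first formula is straightforward. Since the tangle obtained by encircling a strand coloured by ${}^\vee X$ with a component coloured by $Y$ is an element of $\mathrm{End}\, {}^\vee X = \mathbb C$, it is a number, and we can move it from the second factor to the first in the following computation:

[Pictures: the chain of tangle diagrams transforming $s_{X1}\, s_{X, Y \otimes Z}$ into $s_{XY}\, s_{XZ}$.]

This proves the second formula. □

Proposition 2 (Criterion of modularity) In the above assumption of semisimplicity, the following conditions are equivalent:
(i) $\mathcal C$ is modular ($\omega$ is nondegenerate);
(ii) the matrix $(s_{XY})_{X,Y \in S}$ is nondegenerate;
(iii) for any $X \in S$ its dimension $\dim_q X$ does not vanish, and there exist numbers $\mu'_Y$, $Y \in S$, such that for all $X \in S$ we have $\sum_{Y \in S} s_{XY}\, \mu'_Y = \delta_{X1}$; and
(iv) for each simple $X \not\simeq 1$ we have $\sum_{Y \in S} s_{XY} \dim_q Y = 0$ and $\dim_q X \ne 0$.

The easy implication (ii) $\Rightarrow$ (iii) can be deduced from the Verlinde formula. If the dimension $\dim_q(X) = s_{X1}$ of a simple object $X$ vanishes, then $s_{XY}^2 = 0$ for all $Y \in \mathrm{Ob}\,\mathcal C$. This contradicts the assumption of nondegeneracy of $(s_{XY})$.

Let us determine the coefficients $\mu_Y$ of the integral element
\[ \mu = \sum_{Y \in S} \mu_Y\, \varphi_Y : 1 \to F \]
of the Hopf algebra $F$. It also has a two-sided integral-functional $\lambda : F \to 1$. The corresponding endomorphism is
\[ \tilde\lambda_Z = \Big( Z \xrightarrow{\ \delta_Z\ } F \otimes Z \xrightarrow{\lambda \otimes Z} 1 \otimes Z = Z \Big). \]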
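The Verlinde formula [2] and conditions (ii) and (iv) of Proposition 2 can be checked numerically on a small example. The data below are an assumption and are not taken from the article: they are the fusion rules and the unnormalized s-matrix commonly quoted for the Ising category with simple objects $1$, $\sigma$, $\psi$, normalized so that $s_{X1} = \dim_q X$. The function names are illustrative.

```python
import math

R2 = math.sqrt(2.0)
SIMPLES = ["1", "sigma", "psi"]

# Assumed s-matrix (standard Ising data, not taken from the article).
S = {("1", "1"): 1.0, ("1", "sigma"): R2, ("1", "psi"): 1.0,
     ("sigma", "1"): R2, ("sigma", "sigma"): 0.0, ("sigma", "psi"): -R2,
     ("psi", "1"): 1.0, ("psi", "sigma"): -R2, ("psi", "psi"): 1.0}

# Assumed fusion rules: FUSION[(Y, Z)] lists the simple summands of Y (x) Z.
FUSION = {("1", "1"): ["1"], ("1", "sigma"): ["sigma"], ("1", "psi"): ["psi"],
          ("sigma", "1"): ["sigma"], ("sigma", "sigma"): ["1", "psi"],
          ("sigma", "psi"): ["sigma"], ("psi", "1"): ["psi"],
          ("psi", "sigma"): ["sigma"], ("psi", "psi"): ["1"]}


def s_with_product(x, y, z):
    """s_{X, Y (x) Z}, extended biadditively over the summands of Y (x) Z."""
    return sum(S[(x, w)] for w in FUSION[(y, z)])


def verlinde_holds(tol=1e-12):
    """Check s_{X1} s_{X,Y(x)Z} = s_{XY} s_{XZ} for all simple X, Y, Z."""
    return all(
        abs(S[(x, "1")] * s_with_product(x, y, z) - S[(x, y)] * S[(x, z)]) < tol
        for x in SIMPLES for y in SIMPLES for z in SIMPLES)


def determinant():
    """Determinant of the 3 x 3 matrix (s_XY), by cofactor expansion."""
    m = [[S[(x, y)] for y in SIMPLES] for x in SIMPLES]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def criterion_iv_holds(tol=1e-12):
    """Check condition (iv): vanishing sums and non-zero dimensions."""
    dims = {x: S[(x, "1")] for x in SIMPLES}
    return all(
        abs(sum(S[(x, y)] * dims[y] for y in SIMPLES)) < tol and abs(dims[x]) > tol
        for x in SIMPLES if x != "1")


print(verlinde_holds(), criterion_iv_holds(), abs(determinant()) > 1e-9)
```

For these data the rows of $s$ divided by $s_{X1}$ are the characters of the fusion ring, which is exactly what the second formula in [2] expresses.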



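Here $Z$ is an arbitrary object of $\mathcal C$, and $\delta_Z$ is the natural coaction. The equation

[Picture: a tangle diagram from $X \otimes X^\vee$ to $Y \otimes Y^\vee$ containing $\mu_Y \varphi_Y$ and $\lambda$] $= \delta_{XY}$ $\qquad [3]$

follows from the properties of the two-sided integral $\lambda$ of the Hopf algebra $F$. Due to uniqueness of integrals, $\lambda$ is proportional to $\psi_1$. In eqn [3], $X$ and $Y$ vary over $S$. The right-hand side is the identity morphism if $X = Y$, and vanishes otherwise. Substituting the definition of $\varphi_Y$, we rewrite the equation as follows:

[Picture: a tangle diagram on $X^\vee \otimes Y$ containing $\mu_Y$, $u_0^2$ and $\tilde\lambda$] $= \delta_{XY}$ $\qquad [4]$

For $X = 1$, we get
\[ \mu_Y \cdot \tilde\lambda_Y = \delta_{1Y} \cdot \mathrm{id}_Y : Y \to Y. \qquad [5] \]
If $Y \not\simeq 1$, then $\tilde\lambda_Y = 0$. So [5] tells essentially that
\[ \mu_1 \cdot \tilde\lambda_1 = \mathrm{id}_1 : 1 \to 1. \qquad [6] \]
Now return to [4] with $X = Y$. If we compose that equation with $\mathrm{coev} : 1 \to Y^\vee \otimes Y$, we obtain
\[ \mu_Y \cdot \tilde\lambda_1 = \dim_q Y, \qquad [7] \]
where the intermediate steps are equalities of closed tangle diagrams on $Y^\vee \otimes Y$, the last of which is the loop with the insertion $u_0^2$ that defines $\dim_q Y$.

[Pictures: the chain of closed tangle diagrams in eqn [7].]

Multiplying both sides of [7] with $\mu_1$, we find
\[ \mu_Y = \mu_1 \cdot \dim_q(Y). \]
The normalization is fixed by eqn [6], which we can write as
\[ 1 = \mu_1 \cdot \tilde\lambda_1 = \mu_1 \sum_{Y \in S} \mu_Y \dim_q(Y) = \mu_1^2 \sum_{Y \in S} (\dim_q(Y))^2. \]
Hence,
\[ (\mu_1)^2 = \Big( \sum_{Y \in S} (\dim_q(Y))^2 \Big)^{-1}. \qquad [8] \]
So, we find $\mu_1$, unique up to a sign.

Conjugation Properties

From the Verlinde formula [2], we conclude that the commutative $\mathbb C$-algebra $\mathrm{Inv}\, F$ possesses homomorphisms
\[ \chi^X : \mathrm{Inv}\, F \to \mathbb C, \qquad \varphi_Y \mapsto (\dim_q(X))^{-1} s_{XY} = s_{XY}/s_{X1}. \]
The matrix $s$ is invertible, so that its columns cannot be proportional. Hence, all $\chi^X$ are different characters. Their number is $n = \mathrm{Card}\, S = \dim_{\mathbb C} \mathrm{Inv}\, F$; hence, there is an isomorphism of $\mathbb C$-algebras
\[ \chi : \mathrm{Inv}\, F \to \mathbb C \times \cdots \times \mathbb C = \mathbb C^n, \qquad x \mapsto (\chi^1(x), \ldots, \chi^n(x)). \]
Now we show that the dimensions $\dim_q(Y)$ are real numbers, so that $\mu_1$ is also a real number. One can introduce in $\mathrm{Inv}\, F$ an antilinear involution,
\[ {}^* : \mathrm{Inv}\, F \to \mathrm{Inv}\, F, \qquad (\varphi_X)^* = \varphi_{X^\vee}, \]
and a scalar (Hermitian) product
\[ (\varphi_X \mid \varphi_Y) = \delta_{XY}, \qquad X, Y \in S. \]
Then $\mathrm{Inv}\, F$ becomes a finite-dimensional commutative Hilbert algebra. Indeed,
\[ (\varphi_X \varphi_Y \mid \varphi_Z) = \dim \mathrm{Hom}(X \otimes Y, Z) = \dim \mathrm{Hom}(X, Y^\vee \otimes Z) = (\varphi_X \mid \varphi_Y^* \varphi_Z). \]
From the theory of finite-dimensional commutative Hilbert algebras, we know that idempotents in the algebra $\mathrm{Inv}\, F$ are self-adjoint (only in that case the scalar product can be positive definite). Hence, $\chi$ is a $*$-morphism, that is, $\chi^X(x^*) = \overline{\chi^X(x)}$. Therefore,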

$s_{XY^\vee}/s_{X1} = \overline{s_{XY}/s_{X1}}$. In the particular case of $X = 1$, we obtain

$$\dim_q(Y) = \dim_q(Y^\vee) = s_{1Y^\vee} = \overline{s_{1Y}} = \overline{\dim_q(Y)}$$

since $s_{11} = 1$. This proves that for any $Y \in \mathcal{C}$ its dimension $\dim_q(Y)$ is a real number.

It is natural to take for $\mu_1$ the positive root of the right-hand side of [8]. Positiveness fixes $\mu_1$ uniquely.

Examples of Semisimple Modular Categories

In their original paper, Reshetikhin and Turaev (1991) use as algebraic input data the representation theory of the quantum deformation $U = U_q(\mathfrak{sl}_2)$ of the Lie algebra $\mathfrak{sl}(2, \mathbb{C})$, where $q$ is a root of unity. They construct the invariant as a trace over $U$-equivariant morphisms, and prove the necessary modularity condition concerning the nondegeneracy of the braided pairing.

The general picture is drawn by Turaev (1994), where 3-manifold invariants and TQFTs are constructed from semisimple modular categories. He shows how to obtain the latter as quotients of certain subcategories of representations of a modular Hopf algebra by the ideal of trace-negligible morphisms.

Finkelberg (1996), based on results of Gelfand and Kazhdan, establishes (via the theory of Kazhdan and Lusztig) an equivalence between two modular categories. The first is the semisimple category $\mathcal{C}$ of integrable modules over an affine Lie algebra $\hat{\mathfrak{g}}$ of positive integer level $k$. The second is a certain subquotient of the category of $U_q(\mathfrak{g})$-modules for $q = \exp(\pi i m^{-1}/(k + h^\vee))$, where $m \in \{1, 2, 3\}$ and $h^\vee$ is the dual Coxeter number of $\mathfrak{g}$. Huang and Lepowsky (1999) describe the rigid braided structure of $\mathcal{C}$ using vertex operators. Bakalov and Kirillov (2001) use geometrical constructions to make $\mathcal{C}$ into a modular category, associated with the Wess–Zumino–Witten (WZW) model. They construct the corresponding WZW modular functor.

Modular Functor and TQFT

Modular categories give rise to a modular functor and a TQFT. The meanings of those differ from author to author, but the common features are the following. Such a TQFT is a functor from the category whose objects are smooth surfaces with additional structures and morphisms are three-dimensional manifolds with additional structures to the category of vector spaces. A modular functor is the restriction of such TQFT to the subcategory whose morphisms are homeomorphisms of surfaces. One of the constructions due to Kerler and Lyubashenko (2001) takes a nonsemisimple modular category as an input and assigns to it a double TQFT functor, that is, a functor between double categories. The target is the 2-category of abelian categories.

See also: Axiomatic Approach to Topological Quantum Field Theory; Hopf Algebras and q-Deformation Quantum Groups; The Jones Polynomial; Knot Invariants and Quantum Gravity; Quantum 3-Manifold Invariants; Symmetries in Quantum Field Theory of Lower Spacetime Dimensions; Topological Quantum Field Theory: Overview; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory; von Neumann Algebras: Subfactor Theory.

Further Reading

Bakalov B and Kirillov A Jr. (2001) Lectures on Tensor Categories and Modular Functors, University Lecture Series, vol. 21. Providence, RI: American Mathematical Society.
Bespalov Y, Kerler T, Lyubashenko VV, and Turaev VG (2000) Integrals for braided Hopf algebras. Journal of Pure and Applied Algebra 148(2): 113–164 (arXiv:math.QA/9709020).
Drinfeld VG (1987) Quantum groups. In: Gleason A (ed.) Proceedings of the International Congress of Mathematicians (Berkeley, 1986), vol. 1, pp. 798–820. Providence, RI: American Mathematical Society.
Drinfeld VG (1989a) Quasi-Hopf algebras. Algebra i Analiz 1(6): 114–148.
Drinfeld VG (1989b) Quasi-Hopf algebras and Knizhnik–Zamolodchikov equations. In: Problems of Modern Quantum Field Theory, pp. 1–13. Berlin–New York: Springer.
Finkelberg M (1996) An equivalence of fusion categories. Geometric and Functional Analysis 6(2): 249–267.
Huang Y-Z and Lepowsky J (1999) Intertwining operator algebras and vertex tensor categories for affine Lie algebras. Duke Mathematical Journal 99(1): 113–134 (arXiv:q-alg/9706028).
Joyal A and Street RH (1991) Tortile Yang–Baxter operators in tensor categories. Journal of Pure and Applied Algebra 71: 43–51.
Kerler T and Lyubashenko VV (2001) Non-Semisimple Topological Quantum Field Theories for 3-Manifolds with Corners, Lecture Notes in Mathematics, vol. 1765, vi + 379 pp. Heidelberg: Springer.
Mac Lane S (1971) Categories for the Working Mathematician, GTM, vol. 5. New York: Springer.
Majid S (1993) Braided groups. Journal of Pure and Applied Algebra 86(2): 187–221.
Majid S (1995) Foundations of Quantum Group Theory. Cambridge: Cambridge University Press.
Moore G and Seiberg N (1989) Classical and quantum conformal field theory. Communications in Mathematical Physics 123: 177–254.
Reshetikhin NY and Turaev VG (1991) Invariants of 3-manifolds via link polynomials and quantum groups. Inventiones Mathematicae 103(3): 547–597.
Turaev VG (1994) Quantum Invariants of Knots and 3-Manifolds, de Gruyter Stud. Math, vol. 18. Berlin–New York: Walter de Gruyter.
360 Brane Construction of Gauge Theories

Brane Construction of Gauge Theories


S L Cacciatori, Università di Milano, Milan, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Branes appear in string theories and M-theory as extended objects which contain some nonperturbative information about the theory, and, apart from gravity, they can couple with gauge fields.

At low energies, M-theory can be approximated with an 11-dimensional N = 1 supergravity, which in fact is unique and contains a graviton field (the metric $g_{\mu\nu}$), a spin-3/2 field (the gravitino) and a gauge field consisting of a 3-form potential field $c$. The gauge field, whose field strength is a 4-form $G = dc$, can then couple electrically with two-dimensional extended objects, called M2 membranes. Moving in spacetime, an M2 membrane describes a three-dimensional world volume $W_3$ so that its coupling to the gauge field is

$$S_2 = k \int_{W_3} c \qquad [1]$$

$k$ representing the charge.

With $c$ we can associate a dual field $\tilde{c}$ such that $d\tilde{c} = {*}G$. It is a 6-form and can then electrically couple with a five-dimensional object, the M5 membrane. However, as $c$ is the true field, we say that M5 couples magnetically with $c$.

In superstring theories, which are related to M-theory by a web of dualities, there are many more objects to be considered. In particular, we will consider type II strings, which at low energies are described by ten-dimensional N = 2 supergravity theories. They contain a Neveu–Schwarz sector consisting of a graviton $g_{\mu\nu}$, a 2-form potential $B_{\mu\nu}$, and a scalar field $\phi$, the dilaton. The content of the Ramond–Ramond fields depends on the chirality of the supercharges.

Type IIA strings are nonchiral (their left and right supercharges having opposite chiralities) and contain only odd-dimensional $p$-form potentials $A^{(p)}$, with $p = 1, 3, 5, 7, 9$.

Type IIB strings are chiral and contain only even-dimensional $p$-form potentials $A^{(p)}$, with $p = 0, 2, 4, 6, 8$.

Proceeding as before, we see that a $(p + 1)$-form potential can couple electrically with a $p$-dimensional object and magnetically with a $(6 - p)$-dimensional object. Such objects in fact exist in type II strings: the D$p$ branes are $p$-dimensional extended objects, with $p = 0, 2, 4, 6, 8$ for IIA strings and $p = -1, 1, 3, 5, 7, 9$ for IIB strings. In particular, D0 and D1 branes are called D-particles and D-strings respectively, whereas D($-1$) branes are instantons, that is, points in spacetime. Concretely, D-branes are extended regions in spacetime where the endpoints of open strings are constrained to live. Mathematically, they are defined imposing Dirichlet conditions (whence the "D" of D-brane) on the ends of the string, along certain spatial directions. Excitation of these string states gives rise to the dynamics of the brane. They correspond to a ten-dimensional U(1) gauge field, whose components, which are tangent to the brane world volume, give rise to a gauge field in $p + 1$ dimensions, whereas the orthogonal components generate deformations of the brane shape. Moreover, if $n$ parallel $p$-branes overlap, the gauge theory on the world volume is enhanced to a U($n$) gauge theory. Closed strings can generate gravitational interactions responsible for wrappings of the brane. However, in the cases when gravitational interaction is negligible, we can use this mechanism to construct $(p + 1)$-dimensional gauge theories, as we will see.

Before explaining how the construction works, let us remember that there are two other interesting objects which often appear. In fact, we have not yet considered the Neveu–Schwarz B-field: this field can couple electrically with a one-dimensional object and magnetically with a five-dimensional object. These are the usual string (also called a fundamental or F-string) and a five-dimensional membrane called NS5 brane.

We will see how supersymmetric gauge theory configurations can be realized geometrically, considering more or less simple configurations of branes. We will also show that quantum corrections, be they exact or perturbative, can be described in this geometrical fashion. To be explicit, we will work with four-dimensional gauge theories, but it is clear that similar constructions can be done in different dimensions.

Gauge Groups on the Branes

A deeper understanding of how D-branes and related world-volume gauge theories work requires the introduction of dualities, but a quite simple heuristic argument can be given, giving up some rigor in favor of intuition.

To set our ideas, let us think of an open string moving in a nearly flat (but ten-dimensional) spacetime. Its trajectory will describe a two-dimensional surface having a boundary traced by the ends of the string (Figure 1). The string can then be described by a map from a two-dimensional surface $\Sigma$, having a
boundary $\Gamma = \partial\Sigma$, to spacetime, say $X^\mu(\sigma, \tau)$ with $\mu = 0, 1, \dots, 9$. Here we chose on $\Sigma$ local coordinates $\sigma^\alpha = (\sigma, \tau)$, where $\sigma \in [0, \pi]$ is a spacelike coordinate and $\tau$ is a timelike one. Then $\sigma = 0, \pi$ individuate the ends of the string and are identified for the closed string. Now, on a given background, the string evolution is usually described as a two-dimensional (supersymmetric) conformal field theory for the fields $X^\mu(\sigma, \tau)$. The action for the bosonic part is the same for both type IIA and IIB strings, and reads

$$S[X] = \frac{1}{4\pi\alpha'} \int_\Sigma d^2\sigma\, \sqrt{h}\, h^{\alpha\beta}\, g_{\mu\nu}(X)\, \frac{\partial X^\mu}{\partial \sigma^\alpha} \frac{\partial X^\nu}{\partial \sigma^\beta} + \frac{1}{4\pi\alpha'} \int_\Sigma B_{\mu\nu}(X)\, \frac{\partial X^\mu}{\partial \sigma^\alpha} \frac{\partial X^\nu}{\partial \sigma^\beta}\, d\sigma^\alpha \wedge d\sigma^\beta \qquad [2]$$

where $g_{\mu\nu}$ and $B_{\mu\nu}$ are the metric and a 2-form potential field for the given spacetime background, and $h_{\alpha\beta}$ is a metric for $\Sigma$. In general, we must also add a scalar field $\phi(X)$, but it will not play any role here. Using conformal invariance, we can reduce $h_{\alpha\beta}$ to the flat metric. Also consider a flat background $g_{\mu\nu}(X) = \eta_{\mu\nu}$ and concentrate for a moment on the B-field.

Figure 1 Strings moving in spacetime (panels: closed string, open string; world-sheet coordinates $\sigma$, $\tau$).

Conceived as a 2-form field over the spacetime, the potential field $B$ is a gauge field: its field strength 3-form $H = dB$ is unchanged under a shift

$$B \to B + dA \qquad [3]$$

generated by the 1-form field $A(X)$. Here $A$ should be a totally unphysical field. However, note that if one considers open strings, the action for the B-field, and then the full action, is shifted by a boundary term

$$S[X] \to S[X] + \frac{1}{4\pi\alpha'} \int_\Gamma A_\mu(X)\, \frac{\partial X^\mu}{\partial \tau}\, d\tau \qquad [4]$$

The boundary $\Gamma$ just describes the timelike world lines of the ends of the string. Thus, the ends of the string carry a U(1) charge and, even though the B-field vanishes, we can have the open-string action

$$S[X] = \frac{1}{4\pi\alpha'} \int_\Sigma \partial_\alpha X^\mu\, \partial^\alpha X_\mu\, d^2\sigma + \int_\Gamma A_\mu(X)\, \partial_\tau X^\mu\, d\tau \qquad [5]$$

Here we conventionally rescaled the $A$ field to normalize the action. To define the equation of motion, however, we must also specify boundary conditions for $X^\mu(\sigma, \tau)$ on $\Gamma$. Let us choose Neumann conditions for $\mu = 0, 1, \dots, p$ and Dirichlet conditions for the remaining directions

$$\partial_\sigma X^a(\Gamma) = 0, \quad a = 0, \dots, p \qquad [6]$$

$$\partial_\tau X^i(\Gamma) = 0, \quad i = p + 1, \dots, 9 \qquad [7]$$

This means that the extrema of the string are bound on a $(p + 1)$-dimensional region (including time): the D$p$ brane. If for $\Sigma$ we consider the full strip $(\sigma, \tau) \in [0, \pi] \times \mathbb{R}$ then the U(1) action reduces to

$$S_A[X] = \int_{-\infty}^{\infty} A_a\, \partial_\tau X^a(\pi, \tau)\, d\tau - \int_{-\infty}^{\infty} A_a\, \partial_\tau X^a(0, \tau)\, d\tau \qquad [8]$$

Thus, only the components of $A_a$ tangent to the brane interact with the ends of the strings. What about the normal components $A_i$?

To understand their meaning, let us proceed to compute the mean momentum transferred by the string, as if it were rigid. Imitating the Hamilton–Jacobi procedures for particles, let us consider the action up to a fixed time, say $\tau = 0$, so that $\Sigma = [0, \pi] \times [-\infty, 0]$. It is then a function of the position $X^\mu(\sigma, 0)$ of the string at the instant $\tau = 0$. To compute the momentum, we must vary the action by changing the position by a constant shift $\delta X^\mu(\sigma) = \epsilon_0^\mu$. The variation will then contain some boundary terms which, for reasons of consistency, we must make vanish.

Before doing such a computation, let us make some further comments. It is plausible to assume that the two ends of the string could be charged for different U(1) fields. To the states of the open string we can in fact add two discrete labels $I, J = 1, \dots, n$, for some integer $n$, called Chan–Paton factors, and referring, respectively, to the two ends of the string. We will indicate the ends of the string as $X^\mu(0, \tau; I)$ and $X^\mu(\pi, \tau; J)$ when we need to specify the states. If the string is in the excited state $(I, J)$, then $X(0, \tau; I)$ can couple with the field $A^{(I)}$ and $X(\pi, \tau; J)$ with $A^{(J)}$. For simplicity, we will now assume that these fields are constant. Note however that $A^{(I)}$ must be intended as a function of $X(0, \tau)$ only, and similarly for $A^{(J)}$. Also, to realize the variation we can vary $X^\mu(\sigma, \tau)$ by a function $\delta X^\mu(\sigma, \tau) = \epsilon^\mu(\tau)$ strictly peaked at $\tau = 0$ so that essentially

$$\partial_\tau \epsilon^\mu(\tau) = \epsilon_0^\mu\, \delta(\tau) \qquad [9]$$

where $\delta(\tau)$ is the Dirac delta function.
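As a small aside on the Chan–Paton labels introduced above, the following Python sketch enumerates the sectors $(I, J)$ of an open string whose two ends carry labels $I, J = 1, \dots, n$. It only illustrates the counting: there are $n^2$ ordered pairs, which is the dimension of the adjoint representation of the U($n$) group that, as stated in the Introduction, lives on $n$ overlapping branes. The function name `chan_paton_sectors` is a hypothetical choice made for this illustration and is not notation from the text.

```python
from itertools import product


def chan_paton_sectors(n):
    """Return all ordered pairs (I, J) with I, J = 1..n.

    I labels the string end at sigma = 0 and J the end at sigma = pi.
    (Hypothetical helper, written only to illustrate the counting.)
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return list(product(range(1, n + 1), repeat=2))


if __name__ == "__main__":
    n = 3
    sectors = chan_paton_sectors(n)
    same_label = [(i, j) for (i, j) in sectors if i == j]
    print(len(sectors))     # number of ordered pairs, n**2
    print(len(same_label))  # pairs with both ends carrying the same label, n
```

The pairs with $I = J$ are the strings that start and end on the same label; the remaining $n^2 - n$ pairs connect two different labels.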
Using the chosen boundary conditions, the variation of the full action contains the boundary terms

$$\delta S_{\rm bound} = \left(A_i^{(J)} - A_i^{(I)}\right) \int_{-\infty}^{0} \partial_\tau \epsilon^i(\tau)\, d\tau + \frac{1}{2\pi\alpha'} \int_0^\pi \epsilon_0^i\, \partial_\sigma X^i(\sigma, 0)\, d\sigma = \frac{\epsilon_0^i}{2\pi\alpha'} \left[ X^i(\pi, 0) - X^i(0, 0) + 2\pi\alpha' \left(A_i^{(J)} - A_i^{(I)}\right) \right] \qquad [10]$$

Imposing the condition of its vanishing gives the physical interpretation for the normal components of the U(1) fields

$$X^i(\pi, 0) - X^i(0, 0) = -2\pi\alpha' \left(A_i^{(J)} - A_i^{(I)}\right) \qquad [11]$$

This means that, up to a constant shift, the fields $A_i^{(K)}$ measure the positions of the ends of the strings in the transverse directions (Figure 2). Equivalently, we can say that the string ends on two different D$p$ branes, parallel but displaced in the transverse directions by a quantity $2\pi\alpha' \left(A_i^{(I)} - A_i^{(J)}\right)$. We are thus also able to interpret the Chan–Paton factors. They mean that the string is living in a background of $n$ parallel branes, stretched between the $I$th and the $J$th brane. On every brane, a U(1) gauge group lives so that the full gauge group is U(1)$^n$. However, when $k$ of the branes overlap, the corresponding set of states become indistinguishable, so that the gauge group can be enhanced to a U($k$) group. In conclusion, $n$ overlapping parallel D$p$ branes carry a $(p + 1)$-dimensional U($n$) gauge theory which breaks into U($k_i$) block factors if the branes separate in stacks of $k_i$ overlapping branes.

Figure 2 Tangential components of $A_a$ appear as gauge modes. Normal components $A_i$ appear as shift modes.

We can say a little bit more about this. If the string excited states represent gauge degrees of freedom, they must become massive to break gauge symmetry when the branes separate. To see this, let us conclude by computing the mean momentum carried by the string. After elimination of the boundary terms, the total variation of the action due to the shift $\delta X^\mu(\sigma, 0) = \epsilon_0^\mu$ becomes

$$\delta S = \frac{1}{2\pi\alpha'} \int_\Sigma \partial_\alpha \epsilon^\mu\, \partial^\alpha X_\mu\, d^2\sigma = \frac{\epsilon_0^\mu}{2\pi\alpha'} \int_0^\pi \partial_\tau X_\mu(\sigma, 0)\, d\sigma \qquad [12]$$

The resulting momentum is

$$P_\mu = \frac{1}{2\pi\alpha'} \int_0^\pi \partial_\tau X_\mu(\sigma, 0)\, d\sigma$$

On the bulk, the fields $X^\mu$ satisfy the standard wave equation in two dimensions, so that the general solution is the sum of a left-moving and a right-moving part, $X^\mu(\sigma, \tau) = X_L^\mu(\tau + \sigma) + X_R^\mu(\tau - \sigma)$. Imposing the boundary conditions, one finds

$$X^a(\sigma, \tau) = X_L^a(\tau + \sigma) + X_L^a(\tau - \sigma) + 2\alpha' p^a \tau + X_0^a \qquad [13]$$

$$X^i(\sigma, \tau) = X_L^i(\tau + \sigma) - X_L^i(\tau - \sigma) - 2\alpha' \left(A^{(J)i} - A^{(I)i}\right) \sigma + X_0^i \qquad [14]$$

Here $X_0^\mu$ and $p^a$ are integration constants and $X_L^i(\tau + \pi) - X_L^i(\tau - \pi) = 0$. A direct computation then shows that $P^a = p^a$ and $P^i = 0$, which is also what intuition suggests: the string can freely move along the branes but is fixed between them in the orthogonal directions. However, if it is stretched between two separated branes (i.e., if $I \neq J$), there is another contribution to the energy. In fact the factor $T := 1/(2\pi\alpha')$ represents the string tension, so that if $\Delta$ is its minimal length, its minimal contribution to the energy will be $E = T\Delta$. This energy must equally contribute to the spectrum of the excited modes, the gauge field bosons. Here, in fact, is where T-duality comes into play, but we will not discuss it. The conclusion is that the spectrum corresponding to the stretched string must satisfy the condition $E \geq T\Delta$, which is as if the string states acquired a mass $T\Delta$, that is,

$$m^2 = \sum_{i=p+1}^{9} \left(A^{(J)i} - A^{(I)i}\right)^2 \qquad [15]$$

This gives us a geometric tool to construct $(p + 1)$-dimensional gauge theories: on $n$ coincident D$p$ branes there exists a U($n$) gauge theory which can be broken separating the branes and thus giving a mass to the gauge bosons. Such a mass is proportional to the distance between the branes (Figure 3).

Before continuing with some examples, let us make two comments. First, the theory obtained in this way is a supersymmetric one, because the
Dirichlet conditions allow the action of supersymmetric transformations of the form $\epsilon_L Q_L + \epsilon_R Q_R$, where $Q_L$ and $Q_R$ are the fermionic left and right supercharge operators and $\epsilon_L$, $\epsilon_R$ are spinors satisfying the brane projection condition $\epsilon_L = \pm\Gamma^0 \Gamma^1 \cdots \Gamma^p \epsilon_R$. Here $\Gamma^\mu$ are the ten-dimensional Dirac matrices and one refers to "antibranes" for the negative sign.

Figure 3 Stretched strings acquire a mass (strings on coincident branes are massless; strings stretched over a distance $\Delta$ are massive).

Second, the gauge group can be converted into an SO($n$) or an Sp($n/2$) (for even $n$), adding an orientifold plane parallel to the branes. The orientifold plane acts on the orthogonal spacetime directions with a $\mathbb{Z}_2$-action

$$X^i \mapsto -X^i \qquad [16]$$

if $X^i = 0$ is the position of the orientifold. It further acts on the string world sheet by reversing the orientation of the spacelike coordinate $\sigma$, making it an unoriented string. The effect is to project out some states from the spectra, thus reducing the gauge group.

Geometric Engineering of Gauge Theories from Branes

To illustrate how brane construction of gauge theories works, we will consider a particular configuration of branes (Witten 1997).

We would like to obtain a four-dimensional U($n$) gauge theory. A possibility could be to take $n$ D3 branes in a type IIB string background. However, such a model would contain too many supersymmetries: in ten dimensions, supersymmetries are generated by two 16-dimensional chiral spinors $\epsilon_L$, $\epsilon_R$ ($\Gamma^0 \cdots \Gamma^9 \epsilon_{L,R} = \epsilon_{L,R}$). From the four-dimensional point of view, each of them represents four four-dimensional spinors giving an N = 8 supersymmetric theory. The projection condition, due to the branes, reduces the number of supersymmetries to four. Supersymmetry not being manifest in nature, it is desirable to have gauge theories with fewer supersymmetries at hand. Because different brane projection conditions can further reduce supersymmetry, we can try to consider the coexistence of more kinds of branes.

One way to do this is to consider $n$ parallel 4-branes ending on an NS5 brane in type IIA string theory (Figure 4), and then analyze the gauge theory restricted to the four-dimensional intersection (here the theory is nonchiral as $\Gamma^0 \cdots \Gamma^9 \epsilon_{L/R} = \pm\epsilon_{L/R}$). What kind of branes can end on other kinds of branes can be established, starting from the fact that strings can end on a brane, and using the dualities tool (Giveon and Kutasov 1999).

Figure 4 D4 branes ending on an NS5 brane. Gauge degrees of freedom are frozen in four dimensions.

Let us fix some conventions. We will indicate with $x = (x^0, x^1, x^2, x^3) \in \mathbb{R}^4$ the coordinates on the intersection, so that $(x; v) = (x; x^4, x^5) \in \mathbb{R}^6$ define the NS5 brane, and $(x, x^6)$, with $x^6 \in [0, \infty)$, the 4-branes. Also $v_I$ will indicate the position of the $I$th 4-brane on the 5-brane, and $y = (x^7, x^8, x^9)$ will collect the remaining coordinates. Finally, we will indicate the product of $\Gamma$-matrices corresponding to given directions by labeling a single $\Gamma$ with the respective coordinates. For example $\Gamma_v = \Gamma^4 \Gamma^5$. With these conventions, the brane projection conditions for D4 and NS5 branes, respectively, read

$$\epsilon_L = \Gamma_x \Gamma_6\, \epsilon_R \qquad [17]$$

$$\epsilon_L = \Gamma_x \Gamma_v\, \epsilon_L, \quad \epsilon_R = \Gamma_x \Gamma_v\, \epsilon_R \qquad [18]$$

These projections reduce supersymmetry to N = 2. After a short manipulation and using for example antichirality of $\epsilon_R$, it is easy to see that the first condition can be substituted by

$$\epsilon_L = \Gamma_x \Gamma_y\, \epsilon_R \qquad [19]$$

In other words, we could add a number of 6-branes in the $(x, y)$ directions, without further reducing supersymmetry. We will consider this possibility later.

On the D4 branes there is a possibly broken U($n$) gauge theory. Here the vector fields $A_\mu$, $\mu = 0, 1, 2, 3, 6$, and the scalar fields $v_I$ and $y$ live. The last ones are set to zero by the Dirichlet conditions, whereas $v_I$ measure the fluctuations of the D4 brane positions over NS5. The O(2) group
of rotations of the $(x^4, x^5)$ coordinates acts on them, which can be broken by an expectation value $\langle v_I \rangle \neq 0$. The SO(3) rotations of $(x^7, x^8, x^9)$ (under which $v_I$ are singlets) do not influence the projection conditions and can then be identified with the R-symmetry group SU(2)$_R$. It could be broken by a nonvanishing expectation value $\langle y \rangle \neq 0$, but as we said it cannot happen in the actual configuration. This highlights an unbroken supersymmetric Coulomb branch.

What is the physics as seen by an observer living on the four-dimensional spacetime $x$? The components $A_\mu$, $\mu = 0, 1, 2, 3$, of the vector fields transform as vectors with respect to the four-dimensional Lorentz group SO(1, 3). They satisfy Neumann boundary conditions on $x^6 = 0$ and then survive as U($n$) gauge vector fields. The $A_6$ component behaves as a scalar with respect to SO(1, 3) but is eliminated by a Dirichlet condition in $x^6 = 0$. The $v$ scalar field will be responsible for the possible breaking of the gauge group.

This seems to be quite a good scenario but actually the situation is unsatisfactory. If a 4-brane extends to the interval $[0, L]$ in the $x^6$ direction, the effective action for the gauge fields goes like this:

$$\frac{1}{g_{D4}^2} \int_0^L dx^6 \int_{\mathbb{R}^4} d^4x\, \mathrm{tr}\, F_{\mu\nu} F^{\mu\nu} \simeq \frac{L}{g_{D4}^2} \int_{\mathbb{R}^4} d^4x\, \mathrm{tr}\, F_{\mu\nu} F^{\mu\nu} \qquad [20]$$

where $\mu, \nu = 0, 1, 2, 3$. Thus, the gauge coupling in four dimensions appears to be $g_4 = g_{D4}/\sqrt{L}$. In our case, where $L$ goes to infinity, the gauge coupling vanishes and the gauge degrees of freedom are frozen. Moreover, an argument similar to the one made for the stretched strings shows that the energy of the D4 brane is very high and makes the mechanism of gauge group breaking difficult. The same is true for the NS5 brane, which also turns out to be extremely massive and does not participate in the dynamics. But this is what we want.

To solve the problem and restore gauge dynamics in four dimensions, one must consider a stack of 4-branes of finite length in the $x^6$ direction. This can be achieved placing in $x^6 = L$ a second NS5 brane parallel to the first one and in the same point in $y$ (Figure 5). In this way, the D4 branes can stretch between the NS5 branes. If $L$ is small enough, the D4 gauge dynamics is restored, also requiring a small value for $g_{D4}$, to ensure the gravitational coupling (and the couplings with the Kaluza–Klein and NS5 modes) to be negligible. However, $L$ must be bigger than the $x^6$ fluctuations in order to avoid quantum corrections.

Figure 5 N = 2 four-dimensional super Yang–Mills theory, with U($n$) gauge group.

What we just obtained is an N = 2 supersymmetric classical U($n$) gauge theory in four dimensions, without matter, and in the Coulomb branch.

Before considering quantization, let us briefly discuss some possible generalizations. For example, matter can be realized attaching to the left-hand side NS5 brane new D4 branes parallel to the previous ones, but extended in the $x^6$ direction from $-\infty$ to 0 (Figure 6). Considering strings stretched between long and short branes, we obtain states whose half-gauge action, associated with the end connected to the long brane, is frozen. The corresponding states thus appear in the fundamental representation and can be interpreted as matter states.

Figure 6 Adding matter.

To consider the Higgs branch, one should be able to break supersymmetry giving an expectation value to $y$. As mentioned above, in the actual configuration this cannot happen because $y$ is set to 0 by Dirichlet conditions. Fortunately, as we said, one can add 6-branes in the $(x, y)$ directions. If we insert such branes to stop the long D4 branes at a large but finite value of $x^6$, say $x^6 = -M$ with $M \gg L$, then long branes have Neumann conditions in the $y$ directions. Thus, fluctuations of the long branes can give an expectation value to $y$, breaking supersymmetry, and subsequently the Higgs branch can be tuned, shifting 4-branes stretched between 6-branes (Figure 7).
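The scalings discussed in this section can be made concrete with a short numerical sketch in Python. It assumes the relation $g_4 = g_{D4}/\sqrt{L}$ read off from eqn [20], and the earlier statement that a string stretched over a distance $\Delta$ acquires a mass $T\Delta$ with $T = 1/(2\pi\alpha')$; the D4 positions $v_I$ are represented as complex numbers $x^4 + i x^5$. The function names, the unit choice $\alpha' = 1$, and the use of exact equality to detect coincident branes are assumptions made for this illustration only.

```python
import math
from collections import Counter


def four_dim_coupling(g_d4, length):
    """Effective coupling g_4 = g_D4 / sqrt(L) for D4 branes of length L in x^6."""
    if length <= 0:
        raise ValueError("L must be positive")
    return g_d4 / math.sqrt(length)


def stretched_string_masses(positions, alpha_prime=1.0):
    """Mass T * |v_I - v_J| for every pair I < J, with T = 1 / (2 pi alpha')."""
    tension = 1.0 / (2.0 * math.pi * alpha_prime)
    masses = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            masses[(i + 1, j + 1)] = tension * abs(positions[i] - positions[j])
    return masses


def unbroken_blocks(positions):
    """Sizes k_i of the stacks of coincident branes, i.e. the U(k_i) factors."""
    return sorted(Counter(positions).values(), reverse=True)


if __name__ == "__main__":
    # The four-dimensional coupling decreases as the D4 branes get longer.
    for length in (1.0, 100.0, 10000.0):
        print(length, four_dim_coupling(1.0, length))

    # Three D4 branes: two coincident at v = 0 and one at v = 1.
    v = [0j, 0j, 1 + 0j]
    print(unbroken_blocks(v))          # stack sizes of coincident branes
    print(stretched_string_masses(v))  # masses of the stretched strings
```

With the sample positions, the string between the two coincident branes stays massless while the strings reaching the third brane have mass $T$, which mirrors the breaking of U(3) into U(2) and U(1) block factors described for separated stacks.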
Figure 7 Permitting Higgs phases.

The details require some careful inspection, but we shall stop our analysis here (Giveon and Kutasov 1999).

More general gauge configurations can be realized by adding more parallel NS5 branes, and thus obtaining product groups. Adding orientifold planes, one can change gauge groups as explained in the previous section (Figure 8).

Figure 8 N = 2 four-dimensional super Yang–Mills theory with U($n_1$) × U($n_2$) gauge group and matter. Strings crossing the central NS5 brane give matter in the $(n_1, n_2)$ representation.

Finally, we can take a further step towards more physical models, constructing N = 1 gauge theories. For example, this can be achieved from the previous N = 2 model, rotating the second NS5 brane from the $(x, v)$ position to the $(x, w)$ position, where $w = (x^8, x^9)$ (Figure 9). Then a new brane projection condition appears ($\epsilon_L = \Gamma_x \Gamma_w \epsilon_R$), breaking supersymmetry down to N = 1.

In this case, one could also obtain chiral matter, adding, for example, orientifold planes.

Quantum Corrections from M-Theory

Up to this point we have considered classical gauge configurations. Quantum corrections could be computed switching on brane fluctuations. However, it is an amusing fact that working with M-theory one can obtain exact quantum results. As an example, let us sketch how the exact Seiberg–Witten solution can be obtained for the N = 2 model described in the previous section, in the simplest case without matter.

The full web of dualities suggests the existence of a unique unifying theory called M-theory. At low energies, M-theory appears as the strong-coupling limit of type IIA strings. In such a limit, D0 branes become the dominant objects and the corresponding states can be interpreted as Kaluza–Klein modes coming from an eleventh dimension $x^{10}$ compactified on a circle $S^1$ (Figure 10).

Thus, M-theory manifests itself as an 11-dimensional supergravity. In particular, it can be shown that there can be only a unique 11-dimensional supergravity. As said, here the nonperturbative objects are two- or five-dimensional membranes.

From the M-theory point of view, the D4 branes considered in our model appear as M5 membranes wrapped on the eleventh direction $S^1$ (Figure 11). Because quantum corrections are no longer negligible, we can no longer think of these branes as stretched in the $x^6$ direction, but $v$ must also be considered. Thus, the M5 membranes will describe, in $\mathbb{R}^{10} \times S^1$, a region $\mathbb{R}^4 \times S$, where $\mathbb{R}^4$ are the $x$ coordinates, and $S$ is a Riemann surface immersed in $Q \times S^1$, $Q$ being spanned by the $(v, x^6)$ coordinates. In fact, supersymmetry constrains the surface to be a holomorphic curve, so that to describe it, it is convenient to collect $v = (x^4, x^5)$ and $(x^6, x^{10})$ into complex coordinates $v = x^4 + i x^5$ and $s = x^6 + i x^{10}$.

To compute quantum fluctuations, let us note that the end of a D4 brane over an NS5 brane is free to move along the $v$ directions. A fully free end of a brane would satisfy a free wave equation. However, as $x^6$ is constrained in all directions but the $v$ ones, it will simply satisfy a Laplace equation in two dimensions: $\nabla_v^2 x^6 = 0$. Let us solve it, for a fixed NS5 brane. It will be (at least for large values of $v$)

$$x^6(v) = k \sum_{i=1}^{n_L} \log\left|v - v_{Li}^{(\alpha)}\right| - k \sum_{i=1}^{n_R} \log\left|v - v_{Ri}^{(\alpha)}\right| \qquad [21]$$

where $n_L$ is the number of D4 branes ending on the left-hand side of the NS5 brane, in the positions $v_{Li}^{(\alpha)}$, and similar for the $R$ index, which refers to
the right-hand side. Here $(\alpha)$ refers to the $\alpha$th NS5 brane, and $k$ is an integration constant.

Figure 9 Going down to N = 1 supersymmetry.

Figure 10 In M-theory one can think as if at any ten-dimensional spacetime point, there is attached an $S^1$ circle of radius $R_{10}$.

Figure 11 D4 branes become M5 membranes in M-theory.

Because $x^6$ is the real part of a holomorphic field, whose imaginary part is compactified on a circle of radius $R_{10}$, we then find

$$s(v) = R_{10} \sum_{i=1}^{n_L} \log\left(v - v_{Li}^{(\alpha)}\right) - R_{10} \sum_{i=1}^{n_R} \log\left(v - v_{Ri}^{(\alpha)}\right) \qquad [22]$$

This describes the quantum fluctuations of the NS5 brane as seen in M-theory. In particular, because of the imaginary part of $s$, the ends of the D4 branes appear as vortices on the NS5 brane. In place of $s$, it is now convenient to introduce a new field $t := \exp(-s/R_{10})$ so that

$$t(v) = \frac{\prod_{i=1}^{n_R} \left(v - v_{Ri}^{(\alpha)}\right)}{\prod_{i=1}^{n_L} \left(v - v_{Li}^{(\alpha)}\right)} \qquad [23]$$

Before continuing, let us look a bit again at the classical limit. In this case, a fixed value of $v$ will correspond to the position of a D4 brane, whereas a fixed value of $s$ will correspond to the fixed position of an NS5 brane. The classical configuration is then

$$\left(s - s^{(1)}\right) \left(s - s^{(2)}\right) \prod_{i=1}^{n} (v - v_i) = 0 \qquad [24]$$

Here $s^{(\alpha)}$ are the positions of the NS5 branes, and the positions $v_i$ of the D4 branes coincide for both the NS5 branes. Also, for large values of $v$, one has $t^{(1)} \sim v^n$ and $t^{(2)} \sim v^{-n}$.

Quantum mechanically, the configuration is determined in terms of $v$ and $t$ by the holomorphic curve $S$, which can be described as an algebraic curve $F(v, t) = 0$, generalizing the classical configuration. As there are two NS5 branes and $n$ D4 branes, $F$ must be a polynomial of degree 2 in $t$,

$$F(v, t) = A_2(v)\, t^2 + A_1(v)\, t + A_0(v) \qquad [25]$$

where $A_a$, $a = 0, 1, 2$, are all polynomials of degree $n$.

Note that values of $v$ such that $A_0$ vanishes give the solution $t = 0$, which corresponds to sending the right-hand side NS5 brane to $\infty$. Similarly, $A_2 = 0$ sends the other NS5 brane to $-\infty$. To avoid these undesirable configurations, we can set $A_0 = A_2 = 1$. For $A_1$, we can take the most general choice, up to a possible shift in $v$, giving the quantum configuration

$$t^2 + \left(v^n + a_{n-2} v^{n-2} + \cdots + a_1 v + a_0\right) t + 1 = 0 \qquad [26]$$

This realizes a quantum-mechanical correspondence between the M5 membrane configurations described by the given polynomials, and the N = 2 super Yang–Mills vacua. But this is also the claimed Seiberg–Witten curve. In particular, M-theory gives a concrete physical meaning for the support Riemann surfaces of the Seiberg–Witten solutions.

To conclude, let us make some further comments. It is clear how the construction can be extended to more involved configurations, for example, with more NS5 branes, or adding matter.

Also, we have seen that the geometrical picture which branes give of gauge theories extends to the quantum level.

A similar construction can be made for the N = 1 model, which also permits a full geometrical proof of the Seiberg duality at both classical and quantum levels.

Finally, we should note that there are also other methods, which work in spacetimes where extra dimensions are compactified. There, the branes wrap around certain singular loci which contain information about gauge symmetries (Lerche 1997).

See also: AdS/CFT Correspondence; Compactification of Superstring Theory; Gauge Theories from Strings; Noncommutative Geometry from Strings; Seiberg–Witten Theory; Supergravity; Superstring Theories; Supersymmetric Particle Models.
Brane Worlds 367

Further Reading

Giveon A and Kutasov D (1999) Brane dynamics and gauge theory. Reviews of Modern Physics 71: 983.
Johnson CV (2003) D-Branes. Cambridge: Cambridge University Press.
Lerche W (1997) Introduction to Seiberg–Witten Theory and Its Stringy Origin. Nucl. Phys. Proc. Suppl. B 55: 83.
Polchinski J (1998) String Theory. Vol. 1: An Introduction to the Bosonic String. Cambridge: Cambridge University Press.
Polchinski J (2004) String Theory. Vol. 2: Superstring Theory and Beyond. Cambridge: Cambridge University Press.
Witten E (1997) Solutions of four-dimensional field theories via M-theory. Nuclear Physics B 500: 3.
Zwiebach B (2004) A First Course in String Theory. Cambridge: Cambridge University Press.

Brane Worlds
R Maartens, Portsmouth University, Portsmouth, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

At high enough energies, Einstein's classical theory of general relativity breaks down, and will be superseded by a quantum gravity theory. The singularities predicted by general relativity in gravitational collapse and in the hot big bang origin of the universe are thought to be artifacts of the classical nature of Einstein's theory, which will be removed by a quantum theory of gravity. Developing a quantum theory of gravity and a unified theory of all the forces and particles of nature are the two main goals of current work in fundamental physics. The problem is that general relativity and quantum field theory cannot simply be molded together. There is as yet no generally accepted (pre-)quantum gravity theory.

The quest for a quantum gravity theory has a long and thus far not very successful history. Many different lines of attack have been developed, each having a different way of dealing with the classical singularities that arise from point particles and smooth spacetime geometry. String theory does away with zero-dimensional point particles, and particles are modeled as different states of new fundamental objects, the one-dimensional strings. It turns out, however, that there is a price to pay: the number of spacetime dimensions must be greater than four for a consistent theory. When fermions are included, which leads to superstring theory, the required number of dimensions is ten (one time and nine space dimensions).

There are in fact five distinct (1+9)-dimensional superstring theories. In the mid-1990s, duality transformations were discovered that relate these superstring theories to each other and to the (1+10)-dimensional supergravity theory. This led to the conjecture that all of these theories arise as different limits of a single theory, which has come to be known as M theory. It was also discovered that extended objects of higher dimension than strings play a fundamental role in the theory. These objects are known as "branes" (from membranes), and the relation between them and strings leads to a new picture of how gravity and matter may be connected in the universe. Roughly speaking, open strings describe the particles of the nongravitational sector, and their ends are attached to branes, while closed strings, which describe the graviton and associated particles of the gravitational sector, can move freely in all dimensions.

Thus, the observable universe could be a (1+3)-surface, a "brane," embedded in a (1+3+d)-dimensional spacetime, the "bulk," with standard-model particles and fields trapped on the brane, while gravity is free to access the bulk. Brane-world models offer a phenomenological way to test some of the novel predictions and corrections to general relativity that are implied by M theory.

Higher-Dimensional Gravity

Brane worlds can be seen as reviving the original higher-dimensional ideas of Kaluza and Klein in the 1920s, but in a new context of quantum gravity. An important consequence of extra dimensions is that the four-dimensional Planck scale $M_p \equiv M_{(4)} = 1.2 \times 10^{19}$ GeV is no longer the fundamental energy scale of gravity. The fundamental scale is instead $M_{(4+d)}$. This can be seen from the modification of the gravitational potential. For an Einstein–Hilbert gravitational action,

\[ S_{\rm gravity} = \frac{1}{2\kappa^2_{(4+d)}} \int d^4x \, d^dy \, \sqrt{-{}^{(4+d)}g} \, \left[ {}^{(4+d)}R - 2\Lambda_{(4+d)} \right] \tag{1} \]

we have the higher-dimensional Einstein field equations,

\[ {}^{(4+d)}G_{AB} \equiv {}^{(4+d)}R_{AB} - \tfrac{1}{2} \, {}^{(4+d)}R \, {}^{(4+d)}g_{AB} = -\Lambda_{(4+d)} \, {}^{(4+d)}g_{AB} + \kappa^2_{(4+d)} \, {}^{(4+d)}T_{AB} \tag{2} \]
where $x^A = (x^a, y^1, \ldots, y^d)$ and $\kappa^2_{(4+d)}$ is the gravitational coupling constant given by

\[ \kappa^2_{(4+d)} = 8\pi G_{(4+d)} = \frac{8\pi}{M^{2+d}_{(4+d)}} \tag{3} \]

The static weak field limit of the field equations leads to the (4+d)-dimensional Poisson equation, whose solution is the gravitational potential

\[ V(r) \propto \frac{\kappa^2_{(4+d)}}{r^{1+d}} \tag{4} \]

In the simplest scenario, we can assume a toroidal configuration for the d extra dimensions, with each compactified on the same length scale $L$. Then on scales $r \lesssim L$, the potential is (4+d)-dimensional, $V \sim r^{-(1+d)}$. By contrast, on scales large relative to $L$, where the extra dimensions do not contribute to variations in the potential, $V$ behaves like a four-dimensional potential, $V \sim L^{-d} r^{-1}$. This means that the usual Planck scale becomes an effective coupling constant, describing gravity on scales much larger than the extra dimensions, and related to the fundamental scale via the volume of the extra dimensions:

\[ M_p^2 \sim M^{2+d}_{(4+d)} L^d \tag{5} \]

Large Extra Dimensions

If the extra-dimensional volume is significantly above the Planck scale, then the true fundamental scale $M_{(4+d)}$ can be much less than the effective scale $M_p$,

\[ L^d \gg M_p^{-d} \;\Rightarrow\; M_{(4+d)} \ll M_p \tag{6} \]

In this case, we understand the weakness of gravity as due to the fact that it "spreads" into extra dimensions, and only a part of it is felt in four dimensions.

A lower limit on $M_{(4+d)}$ is given by null results in table-top experiments to test for deviations from Newton's law in four dimensions, $V \propto r^{-1}$. These experiments currently probe submillimeter scales, and find no detectable deviation, so that

\[ L \lesssim 10^{-1}\ {\rm mm} \sim (10^{-15}\ {\rm TeV})^{-1} \;\Rightarrow\; M_{(4+d)} \gtrsim 10^{(32-15d)/(d+2)}\ {\rm TeV} \tag{7} \]

Stronger bounds can be derived from null results in particle accelerators in some brane-world models, or from constraints imposed by observations of supernovae or of light-element abundance.

Brane worlds, arising in the framework of string theory, thus incorporate the possibility that the fundamental scale is much less than the Planck scale felt in four dimensions. This emerges by virtue of the large size of the extra dimensions. It is not necessary for all extra dimensions to be of equal size for this mechanism to operate. There are string theory solutions (Hořava–Witten solutions) with two (1+9)-branes located at the boundaries of the bulk, at the endpoints of an $S^1/Z_2$ orbifold, that is, a circle folded on itself across a diameter. The orbifold extra dimension is the large one, whereas the other six extra dimensions on the branes are compactified on a very small scale, close to the fundamental scale, and their effect on the dynamics is felt through "moduli" fields, that is, five-dimensional scalar fields.

These solutions can be thought of as effectively five dimensional, with an extra dimension that can be large relative to the fundamental scale. They provide the basis for the Randall–Sundrum 1 (RS1) phenomenological models of five-dimensional gravity. The single-brane Randall–Sundrum 2 (RS2) models with infinite extra dimension arise when the orbifold radius tends to infinity. The RS models are not the only phenomenological realizations of M theory ideas. They were preceded by the brane-world models of Arkani-Hamed, Dimopoulos, and Dvali (ADD), which put forward the idea that a large volume for the compact extra dimensions would lower the fundamental Planck scale $M_{(4+d)}$. If $M_{(4+d)}$ is close to the electroweak scale, $M_{\rm ew}$, then this would address the long-standing "hierarchy" problem, that is, why there is such a large gap between $M_{\rm ew} \sim 1$ TeV and $M_p \sim 10^{16}$ TeV.

In the ADD models, more than one extra dimension is required for agreement with experiments, and there is "democracy" among the equivalent extra dimensions, which, in addition, are flat. By contrast, the RS models have a "preferred" extra dimension, with other extra dimensions treated as ignorable (i.e., stabilized except at energies near the fundamental scale). Furthermore, this extra dimension is curved or "warped" rather than flat: the bulk is a portion of anti-de Sitter (AdS$_5$) spacetime. The RS branes are $Z_2$-symmetric (mirror symmetry), and have a tension, which serves to counter the influence on the brane of the negative bulk cosmological constant. This also means that the self-gravity of the branes is incorporated in the RS models. The novel feature of the RS models compared to previous higher-dimensional models is that the observable three dimensions are protected from the large extra dimension (at low energies) by curvature (warping), rather than straightforward compactification.

The RS brane worlds provide phenomenological models that reflect at least some of the features of
M theory, and that bring exciting new geometric and particle physics ideas into play. The RS2 models also provide a framework for exploring holographic ideas that have emerged in M theory. Roughly speaking, holography suggests that higher-dimensional dynamics may be determined from a knowledge of the fields on a lower-dimensional boundary. The AdS/CFT correspondence is an example in which the classical dynamics of the higher-dimensional AdS gravitational field are equivalent to the quantum dynamics of a conformal field theory (CFT) on the boundary.

Kaluza–Klein Modes

The dilution of gravity via extra dimensions not only weakens gravity, it also broadens the range of graviton modes felt on the brane. The graviton is more than just the four-dimensional massless mode of four-dimensional gravity: other modes, with an effective mass on the brane, arise from the fact that the graviton is a (4+d)-dimensional massless particle. These extra modes on the brane are known as Kaluza–Klein (KK) modes of the graviton.

For simplicity, consider a flat brane with one flat extra dimension, compactified through the identification $y \leftrightarrow y + 2\pi n L$, where $n = 0, 1, 2, \ldots$. The perturbative five-dimensional graviton is defined via

\[ {}^{(5)}\eta_{AB} \to {}^{(5)}\eta_{AB} + h_{AB} \tag{8} \]

where ${}^{(5)}\eta_{AB}$ is the five-dimensional Minkowski metric and $h_{AB}$ is a small transverse traceless perturbation. Its amplitude can be Fourier expanded as

\[ h(x^a, y) = \sum_n e^{iny/L} \, h_n(x^a) \tag{9} \]

where $h_n$ are the amplitudes of the KK modes, that is, the effective four-dimensional modes of the five-dimensional graviton. To see that these KK modes are massive from the brane viewpoint, we start from the five-dimensional wave equation that the massless five-dimensional field $h$ satisfies (in a suitable gauge):

\[ {}^{(5)}\Box h = 0 \;\Rightarrow\; \Box h + \partial_y^2 h = 0 \tag{10} \]

It follows that the KK modes satisfy a four-dimensional Klein–Gordon equation with an effective four-dimensional mass, $m_n$:

\[ \Box h_n = m_n^2 h_n, \qquad m_n = \frac{n}{L} \tag{11} \]

The massless mode, $h_0$, is the usual four-dimensional graviton mode. But there is a tower of massive modes, $L^{-1}, 2L^{-1}, \ldots$, which imprint the effect of the five-dimensional gravitational field on the four-dimensional brane. Compactness of the extra dimension leads to discreteness of the spectrum. For an infinite extra dimension, $L \to \infty$, the separation between the modes disappears and the tower forms a continuous spectrum.

Randall–Sundrum Brane Worlds

RS brane worlds do not rely on compactification to localize gravity at the brane, but on the curvature of the bulk. What prevents gravity from "leaking" into the extra dimension at low energies is a negative bulk cosmological constant,

\[ \Lambda_{(5)} = -\frac{6}{\ell^2} = -6\mu^2 \tag{12} \]

where $\ell$ is the curvature radius of AdS$_5$ and $\mu$ is the corresponding energy scale. The bulk cosmological constant with its repulsive gravity effect acts to "squeeze" the gravitational potential closer to the brane. We can see this clearly in Gaussian normal coordinates $x^A = (x^\mu, y)$ based on the brane at $y = 0$, for which the metric takes the form

\[ {}^{(5)}ds^2 = dy^2 + e^{-2|y|/\ell} \, \eta_{\mu\nu} \, dx^\mu dx^\nu \tag{13} \]

with $\eta_{\mu\nu}$ the Minkowski metric. The exponential warp factor reflects the confining role of the bulk cosmological constant. The $Z_2$-symmetry about the brane at $y = 0$ is incorporated via the $|y|$ term. In the bulk, this metric is a solution of the five-dimensional Einstein equations,

\[ {}^{(5)}G_{AB} = -\Lambda_{(5)} \, {}^{(5)}g_{AB} \tag{14} \]

that is, ${}^{(5)}T_{AB} = 0$ in eqn [2]. The brane is a flat Minkowski spacetime, $g_{AB}(x^\alpha, 0) = \eta_{\mu\nu} \delta^\mu{}_A \delta^\nu{}_B$, with self-gravity in the form of brane tension.

The two RS models are distinguished as follows:

RS1. There are two branes in RS1, at $y = 0$ and $y = L$, with $Z_2$-symmetry identifications

\[ y \leftrightarrow -y, \qquad y + L \leftrightarrow L - y \tag{15} \]

The branes have equal and opposite tensions, $\pm\lambda$, where

\[ \lambda = \frac{3 M_p^2}{4\pi \ell^2} \tag{16} \]

The positive-tension "TeV" brane has fundamental scale $M_{(5)} \sim 1$ TeV. Because of the exponential
warping factor, the effective scale on the negative tension "Planck" brane at $y = L$ is $M_p$. On the positive tension brane,

\[ M_p^2 = M^3_{(5)} \, \ell \left[ 1 - e^{-2L/\ell} \right] \tag{17} \]

So RS1 gives a new approach to the hierarchy problem. Because of the finite separation between the branes, the KK spectrum is discrete.

RS2. In RS2, there is only one, positive-tension, brane. This may be thought of as arising from sending the negative tension brane off to infinity, $L \to \infty$. Then the energy scales are related via

\[ M^3_{(5)} = \frac{M_p^2}{\ell} \tag{18} \]

On the RS2 brane, the negative $\Lambda_{(5)}$ is offset by the positive brane tension $\lambda$. The fine-tuning in eqn [16] ensures that there is zero effective cosmological constant on the brane, so that the brane has the induced geometry of Minkowski spacetime. To see how gravity is localized at low energies, we consider the five-dimensional graviton perturbations of the metric:

\[ {}^{(5)}g_{AB} \to {}^{(5)}g_{AB} + h_{AB}, \qquad h_{Ay} = 0 = h^\mu{}_\mu = \partial_\nu h^{\mu\nu} \tag{19} \]

We split the amplitude $h$ into three-dimensional Fourier modes, and the linearized five-dimensional Einstein equations lead to the wave equation ($y > 0$)

\[ e^{2y/\ell} \left[ \ddot{h} + k^2 h \right] = h'' - \frac{4}{\ell} h' \tag{20} \]

Separability means we can write

\[ h(t, y) = \sum_m \varphi_m(t) \, h_m(y) \tag{21} \]

and the wave equation reduces to

\[ \ddot{\varphi}_m + (m^2 + k^2) \varphi_m = 0 \tag{22} \]

\[ h_m'' - \frac{4}{\ell} h_m' + e^{2y/\ell} m^2 h_m = 0 \tag{23} \]

The zero-mode solution is

\[ \varphi_0(t) = A_{0+} e^{+ikt} + A_{0-} e^{-ikt} \tag{24} \]

\[ h_0(y) = B_0 + C_0 e^{4y/\ell} \tag{25} \]

and the massive KK mode ($m > 0$) solutions are

\[ \varphi_m(t) = A_{m+} \exp\left( +i \sqrt{m^2 + k^2} \, t \right) + A_{m-} \exp\left( -i \sqrt{m^2 + k^2} \, t \right) \tag{26} \]

\[ h_m(y) = e^{2y/\ell} \left[ B_m J_2\left( m\ell e^{y/\ell} \right) + C_m Y_2\left( m\ell e^{y/\ell} \right) \right] \tag{27} \]

where $J_2$, $Y_2$ are Bessel functions.

The boundary condition for the perturbations is $h'(t, 0) = 0$, which implies

\[ C_0 = 0, \qquad C_m = -\frac{J_1(m\ell)}{Y_1(m\ell)} B_m \tag{28} \]

In the RS1 model, we have a further boundary condition, $h'(t, L) = 0$, which leads to a discrete eigenspectrum, namely the masses $m$ that satisfy

\[ J_1\left( m\ell e^{L/\ell} \right) Y_1(m\ell) - Y_1\left( m\ell e^{L/\ell} \right) J_1(m\ell) = 0 \tag{29} \]

The zero mode is normalizable, since

\[ \left| \int_0^\infty B_0 \, e^{-2y/\ell} \, dy \right| < \infty \tag{30} \]

Its contribution to the gravitational potential $V = \tfrac{1}{2} h_{00}$ gives the four-dimensional result, $V \propto r^{-1}$. The contribution of the massive KK modes sums to a correction of the four-dimensional potential. For $r \ll \ell$, one obtains

\[ V(r) \approx \frac{GM}{r} \left( \frac{\ell}{r} \right) = \frac{GM\ell}{r^2} \tag{31} \]

which simply reflects the fact that the potential becomes truly five dimensional on small scales. For $r \gg \ell$,

\[ V(r) \approx \frac{GM}{r} \left( 1 + \frac{2\ell^2}{3r^2} \right) \tag{32} \]

which gives the small correction to four-dimensional gravity at low energies from extra-dimensional effects.

Cosmological Brane Worlds

The RS models contain vacuum (Minkowski) branes. In order to pursue brane-world ideas in cosmology, we need to generalize the RS models to incorporate cosmological branes with matter and radiation on them. The effective field equations on the brane are the vehicle for brane-bound observers to interpret cosmological dynamics. They arise from projecting the five-dimensional field equations onto the brane, via the Gauss–Codazzi equations. These equations involve also the extrinsic curvature $K_{\mu\nu}$ of the brane, which determines how the brane is imbedded in the bulk.

The stress-energy on the brane (tension, matter, radiation) means that there is a jump in $K_{\mu\nu}$ across
the brane. More precisely, the junction conditions across the brane are

\[ g^+_{\mu\nu} - g^-_{\mu\nu} = 0 \tag{33} \]

\[ K^+_{\mu\nu} - K^-_{\mu\nu} = -\kappa^2_{(5)} \left[ T^{\rm brane}_{\mu\nu} - \tfrac{1}{3} T^{\rm brane} g_{\mu\nu} \right] \tag{34} \]

where

\[ T^{\rm brane}_{\mu\nu} = T_{\mu\nu} - \lambda g_{\mu\nu} \tag{35} \]

is the total energy–momentum tensor on the brane and $T^{\rm brane} = g^{\mu\nu} T^{\rm brane}_{\mu\nu}$. The $Z_2$-symmetry means that when approaching the brane from one side and going through it, one emerges into a bulk that looks the same, but with the normal reversed. This implies that

\[ K^-_{\mu\nu} = -K^+_{\mu\nu} \tag{36} \]

so that we can use the junction condition (eqn [34]) to determine the extrinsic curvature:

\[ K_{\mu\nu} = -\tfrac{1}{2} \kappa^2_{(5)} \left[ T_{\mu\nu} + \tfrac{1}{3} (\lambda - T) g_{\mu\nu} \right] \tag{37} \]

where $T = T^\mu{}_\mu$, we have dropped the (+) and we evaluate quantities on the brane by taking the limit $y \to +0$.

Together with the Gauss–Codazzi equations, eqn [37] leads to the induced field equations on the brane:

\[ G_{\mu\nu} = -\Lambda g_{\mu\nu} + \kappa^2 T_{\mu\nu} + 6 \frac{\kappa^2}{\lambda} S_{\mu\nu} - E_{\mu\nu} \tag{38} \]

where

\[ \kappa^2 \equiv \kappa^2_{(4)} = \tfrac{1}{6} \lambda \kappa^4_{(5)} \tag{39} \]

\[ \Lambda \equiv \Lambda_{(4)} = \tfrac{1}{2} \left[ \Lambda_{(5)} + \kappa^2 \lambda \right] \tag{40} \]

\[ S_{\mu\nu} = \tfrac{1}{12} T T_{\mu\nu} - \tfrac{1}{4} T_{\mu\alpha} T^\alpha{}_\nu + \tfrac{1}{24} g_{\mu\nu} \left[ 3 T_{\alpha\beta} T^{\alpha\beta} - T^2 \right] \tag{41} \]

and

\[ E_{\mu\nu} = {}^{(5)}C_{ACBD} \, n^C n^D g_\mu{}^A g_\nu{}^B \tag{42} \]

where $n^A$ is the unit normal to the brane and ${}^{(5)}C_{ACBD}$ is the Weyl tensor in the bulk.

The induced field equations [38] show two key modifications to the standard four-dimensional Einstein field equations arising from extra-dimensional effects.

$S_{\mu\nu} \sim (T_{\mu\nu})^2$ is the high-energy correction term, which is negligible for $\rho \ll \lambda$, but dominant for $\rho \gg \lambda$ (where $\rho$ is the energy density):

\[ \frac{|\kappa^2 S_{\mu\nu}/\lambda|}{|\kappa^2 T_{\mu\nu}|} \sim \frac{|T_{\mu\nu}|}{\lambda} \sim \frac{\rho}{\lambda} \tag{43} \]

$E_{\mu\nu}$, the projection of the bulk Weyl tensor on the brane, encodes corrections from KK or five-dimensional graviton effects. From the brane-observer viewpoint, the energy–momentum corrections in $S_{\mu\nu}$ are local, whereas the KK corrections in $E_{\mu\nu}$ are nonlocal, since they incorporate five-dimensional gravity wave modes. These nonlocal corrections cannot be determined purely from data on the brane. In the perturbative analysis of RS2 which leads to the corrections in the gravitational potential, eqn [32], the KK modes that generate this correction are responsible for a nonzero $E_{\mu\nu}$; this term is what carries the modification to the weak-field field equations.

The effective field equations are not a closed system. One needs to supplement them by five-dimensional equations governing $E_{\mu\nu}$, which are obtained from the five-dimensional Einstein equations.

Cosmological Dynamics

A (1+4)-dimensional spacetime with spatial 4-isotropy (four-dimensional spherical/plane/hyperbolic symmetry) has a natural splitting into hypersurfaces of symmetry, which are (1+3)-dimensional surfaces with 3-isotropy and 3-homogeneity, that is, Friedmann–Robertson–Walker (FRW) surfaces. In particular, the AdS$_5$ bulk of the RS2 brane world, which admits a foliation into Minkowski surfaces, also admits an FRW foliation since it is 4-isotropic. The generalization of AdS$_5$ that preserves 4-isotropy and solves the five-dimensional Einstein equation is Schwarzschild–AdS$_5$, and this bulk therefore admits an FRW foliation. It follows that an FRW cosmological brane world can be embedded in Schwarzschild–AdS$_5$ spacetime.

The black hole in the bulk is felt on the brane via the $E_{\mu\nu}$ term. The bulk black hole gives rise to "dark radiation" on the brane via its Coulomb effect. The FRW brane can be thought of as moving radially along the fifth dimension, with the junction conditions determining the velocity via the Friedmann equation. Thus, one can interpret the expansion of the universe as motion of the brane through the static bulk. In the special case of no black hole and no brane motion, the brane is empty and has Minkowski geometry, that is, the original RS2 brane world is recovered, in different coordinates.

An intriguing aspect of the cosmological metric is that five-dimensional gravitational wave signals can take "shortcuts" through the bulk in traveling
between points A and B on the brane. The travel time for such a graviton signal is less than the time taken for a photon signal (which is stuck to the brane) from A to B.

Cosmological dynamics on the brane are governed by the modified Friedmann equation:

\[ H^2 = \frac{\kappa^2}{3} \rho \left( 1 + \frac{\rho}{2\lambda} \right) + \frac{m}{a^4} + \frac{\Lambda}{3} - \frac{K}{a^2} \tag{44} \]

where $H = \dot{a}/a$ is the Hubble expansion rate, $a(t)$ is the scale factor, $K$ is the curvature index, and $m$ is the mass of the bulk black hole.

The $\rho^2/\lambda$ term is the high-energy term. When $\rho \gg \lambda$, in the early universe, then $H^2 \propto \rho^2$. This means that a given energy density produces a greater rate of expansion than it would in standard four-dimensional gravity. As a consequence, inflation in the early universe is modified in interesting ways, some of which may leave a signature in cosmological observations.

The $m/a^4$ term in eqn [44] is the "dark radiation," so called because it redshifts with expansion like ordinary radiation. But, unlike ordinary radiation, it is not a form of detectable matter, but the imprint on the brane of the gravitational field in the bulk (the Coulomb effect of the bulk black hole). This additional effective relativistic degree of freedom is constrained by nucleosynthesis in the early universe. Any extra radiative energy not thermally coupled to radiation affects the rate of production of light elements, and observed abundances place tight constraints on such extra energy. The dark radiation can be no more than about 3% of the radiation energy density at nucleosynthesis:

\[ \frac{3m}{\kappa^2 \rho_{\rm nuc}} \lesssim 0.03 \tag{45} \]

The other modification to the Hubble rate is via the high-energy correction $\rho/\lambda$. In order to recover the observational successes of general relativity, the high-energy regime where significant deviations occur must take place before nucleosynthesis, that is, cosmological observations impose the lower limit

\[ \lambda > (1\ {\rm MeV})^4 \;\Rightarrow\; M_{(5)} > 10^4\ {\rm GeV} \tag{46} \]

This is much weaker than the limit imposed by table-top experiments, which limit the curvature radius to $\ell \lesssim 0.2$ mm, leading to

\[ \lambda > (100\ {\rm GeV})^4 \;\Rightarrow\; M_{(5)} > 10^8\ {\rm GeV} \tag{47} \]

The high-energy regime during radiation domination is short-lived. Since $\rho^2/\lambda$ decays as $a^{-8}$ during the radiation era, the high-energy correction rapidly becomes negligible, and the universe enters the low-energy four-dimensional regime. However, traces of the high-energy era may be left in the perturbation spectra that leave an imprint in the cosmic microwave background radiation.

In conclusion, simple brane-world models of RS2 type provide a rich phenomenology for exploring some of the ideas that are emerging from M theory. The higher-dimensional degrees of freedom for the gravitational field, and the confinement of standard model fields to the visible brane, lead to a complex but fascinating interplay between gravity, particle physics, and geometry, which enlarges and enriches general relativity in the direction of a quantum gravity theory. High-precision astronomical data mean that cosmology is a potential laboratory for testing and constraining these brane worlds. The models predict extra-dimensional signatures in the cosmic microwave background and other observations, and these predictions can in principle be tested against data.

See also: String Theory: Phenomenology; Supergravity; Superstring Theories.

Further Reading

Brax P and van de Bruck C (2003) Cosmology and brane worlds: a review. Classical and Quantum Gravity 20: R201 (arXiv:hep-th/0303095).
Cavaglia M (2003) Black hole and brane production in TeV gravity: a review. International Journal of Modern Physics A18: 1843 (arXiv:hep-ph/0210296).
Langlois D (2003) Cosmology in a brane-universe. Astrophysics and Space Science 283: 469 (arXiv:astro-ph/0301022).
Maartens R (2004) Brane-world gravity. Living Reviews in Relativity 7: 7 (arXiv:gr-qc/0312059).
Quevedo F (2002) Lectures on string/brane cosmology. Classical and Quantum Gravity 19: 5721 (arXiv:hep-th/0210292).
Rubakov V (2001) Large and infinite extra dimensions. Physics-Uspekhi 44: 871 (arXiv:hep-ph/0104152).
Wands D (2002) String-inspired cosmology. Classical and Quantum Gravity 19: 3403 (arXiv:hep-th/0203107).
Branes and Black Hole Statistical Mechanics 373

Branes and Black Hole Statistical Mechanics


S R Das, University of Kentucky, Lexington, KY, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

In classical general relativity, a black hole is a solution of Einstein's equations with a region of spacetime which is causally disconnected from the asymptotic region at infinity. The boundary of such a region is called the "event horizon." The spacetime around the simplest black hole in three space dimensions is described by the Schwarzschild metric

\[ ds^2 = -\left( 1 - \frac{2GM}{rc^2} \right) dt^2 + \left( 1 - \frac{2GM}{rc^2} \right)^{-1} dr^2 + r^2 d\Omega_2^2 \tag{1} \]

where $G$ is Newton's gravitational constant, $c$ is the velocity of light, and we have used spherical coordinates with $d\Omega_2^2$ the line element on an $S^2$. A nonrotating, uncharged star which is too massive to form a neutron star will eventually collapse, and at late times the metric will be given by [1]. The horizon is a null surface $S^2 \times t$ and the radius of the $S^2$ is $r_{\rm horizon} = 2GM/c^2$. The Schwarzschild solution has generalizations to black holes with charge and angular momentum, and no-hair theorems guarantee that a black hole has no other characteristic property. All these solutions can be generalized to other theories like supergravity in various dimensions.

In 1974, Hawking showed that due to pair production of particles near the horizon, black holes radiate thermally. Hawking's calculation is valid for black holes whose masses are much larger than the Planck mass: for such black holes, the curvature at the horizon is weak and normal semiclassical quantization is valid. Remarkably, the properties of Hawking radiation are quite universal. A black hole can be characterized by an entropy called the Bekenstein–Hawking entropy. The leading result for the entropy $S_{BH}$ for all black holes in any theory with the standard Einstein–Hilbert action is given by

\[ S_{BH} = \frac{A_H}{4G} \tag{2} \]

where $A_H$ denotes the area of the horizon. The temperature $T_H$ is given by

\[ T_H = \frac{\kappa}{2\pi} \tag{3} \]

where $\kappa$ is the surface gravity at the horizon. The principle of detailed balance further ensures that the radiation rate of some species of particle $i$, $\Gamma_i(k)$, in some given momentum range $(k, k + dk)$ is related to the corresponding absorption cross section $\sigma_i(k)$ by

\[ \Gamma_i(k) = \frac{\sigma_i(k)}{e^{\omega/T_H} \pm 1} \, \frac{d^dk}{(2\pi)^d} \tag{4} \]

where $\omega$ is the energy and $d$ denotes the number of spatial dimensions. The $\pm$ sign refers to fermions (bosons), respectively. A nontrivial $k$ dependence of $\sigma_i$ signifies a departure from black-body behavior. Consequently, $\sigma_i(k)$ is often called a grey-body factor. Equations [2] and [3] may be derived by combining Hawking's calculation of the radiation with standard thermodynamic relations. Alternatively, they follow from the leading semiclassical approximations of path-integral formulations of Euclidean gravity based on the standard Einstein–Hilbert action. For an account of black-hole thermodynamics, see Wald (1994).

Unlike usual thermodynamic systems, black holes appear to pose a deep puzzle. In usual systems, thermodynamics is a coarse-grained description of a system which is in a highly degenerate state. Typically, such systems are described in terms of a few macroscopic parameters such as the total energy, the total volume, the total charge. For each set of values of these macroscopic parameters, there are a large number of microscopic states which can be described in terms of the constituents such as atoms or molecules. This degeneracy manifests itself as an entropy $S$ which is related to the number of microscopic states for a given set of values of the macroscopic parameters, $\Omega$, by Boltzmann's relation

\[ S = \log(\Omega) \tag{5} \]

where units have been chosen such that the Boltzmann constant is unity. For a black hole, the macrostates are specified by its mass, charge, and angular momentum. No-hair theorems, however, seem to suggest that there are no other properties and hence no obvious candidate for microstates. In the absence of such a statistical basis, one would be inevitably led to the conclusion that there is loss of information in processes involving black holes.

In a consistent quantum theory of gravity, there would be such a statistical basis since quantum mechanics is unitary. String theory is a strong candidate for a unified theory which contains gravity. Indeed, string theory provides a microscopic description for a class of black holes.
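To get a feel for the scales in eqns [1]–[3], the following Python sketch evaluates the horizon radius, Hawking temperature, and Bekenstein–Hawking entropy of a Schwarzschild black hole. It is only an illustration under stated assumptions: the article works in units with $\hbar = c = k_B = 1$, whereas the sketch restores SI constants, so that $T_H = \hbar\kappa/(2\pi c k_B)$ and $S_{BH}/k_B = A_H c^3/(4G\hbar)$, with the Schwarzschild surface gravity $\kappa = c^4/(4GM)$. The function names and the solar-mass input are illustrative choices, not part of the article.

```python
import math

# SI constants. Restoring hbar, c and k_B is an assumption of this sketch;
# the article sets them to one.
G = 6.67430e-11         # Newton's constant, m^3 kg^-1 s^-2
C = 2.99792458e8        # speed of light, m s^-1
HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J K^-1
M_SUN = 1.98847e30      # solar mass in kg (illustrative input)


def horizon_radius(mass):
    """Schwarzschild horizon radius r = 2GM/c^2, in metres."""
    return 2.0 * G * mass / C**2


def surface_gravity(mass):
    """Surface gravity of a Schwarzschild horizon, kappa = c^4/(4GM), in m s^-2."""
    return C**4 / (4.0 * G * mass)


def hawking_temperature(mass):
    """Eqn [3] with constants restored: T_H = hbar*kappa/(2*pi*c*k_B), in kelvin."""
    return HBAR * surface_gravity(mass) / (2.0 * math.pi * C * K_B)


def bekenstein_hawking_entropy(mass):
    """Eqn [2] with constants restored: S/k_B = A_H c^3/(4 G hbar), dimensionless."""
    area = 4.0 * math.pi * horizon_radius(mass) ** 2
    return area * C**3 / (4.0 * G * HBAR)


if __name__ == "__main__":
    print("r_horizon [m]:", horizon_radius(M_SUN))
    print("T_H [K]:      ", hawking_temperature(M_SUN))
    print("S_BH / k_B:   ", bekenstein_hawking_entropy(M_SUN))
```

For a solar-mass black hole the temperature comes out far below that of the cosmic microwave background while the entropy is enormous, which illustrates why the absence of an obvious set of microstates is such a sharp puzzle.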
Black Hole Solutions in String Theory (d  p  4)-dimensional extended objects as well.


These extended objects are called ‘‘branes.’’
Perturbatively, the basic excitations of string theory
In the type IIB example, there should be two
are fundamental closed and open strings character-
kinds of one-dimensional extended objects
ized by a string tensionp Tffiffiffiffiffiffiffiffiffiffi
s and ffi hence a length scale, which carry electric charge under BMN , B0MN ,
the string length ls = 1= 2Ts . Consistency requires
called the F-string and the D-string, respectively.
that the string should be able to propagate in ten
There are also two kinds of five-dimensional
spacetime dimensions and should be supersym-
branes which carry magnetic charges under
metric at the fundamental level. Formulated in
BMN , B0MN , called the NS 5-brane and D5 brane,
this fashion, there are several consistent string
respectively. Finally, there should be a 3-brane,
theories: type IIA, type IIB, and heterotic string
since the corresponding 5-form field strength is
theory (which contain only closed strings perturba-
self-dual as well as a D7 brane. A similar catalog
tively) and type I theory (which contains both open
can be prepared for other string theories, as well
and closed strings).
as for 11-dimensional supergravity, which is the
At energies much smaller than 1=ls , only the
low-energy limit of M-theory.
massless modes of the string can be excited. For all
The classical solutions for a set of p-branes of the
these string theories, the massless spectrum of closed
strings contains the graviton and the low-energy dynamics is given by the appropriate supersymmetric generalization of general relativity, supergravity. In addition, the closed-string spectrum contains a neutral scalar field, the dilaton $\phi$, whose expectation value gives rise to a dimensionless parameter governing interactions, called the string coupling $g_s$:

\[ g_s = e^{\langle\phi\rangle} \qquad [6] \]

The ten-dimensional gravitational constant is given by

\[ G_{10} = 8\pi^6 g_s^2 l_s^8 \qquad [7] \]

Ten-dimensional supergravity has a wide variety of black hole solutions, the simplest of which is the straightforward generalization of the Schwarzschild solution.

Black p-Brane Solutions

More significantly, there are solutions which are charged with respect to the various gauge fields that appear in the supergravity spectrum. Generically, these charged solutions represent extended objects. For accounts of such solutions, see Maldacena (1996).

Consider, for example, the supergravity which follows from type IIB string theory. This theory has a pair of 2-form gauge fields $B_{MN}$ and $B'_{MN}$ and a 4-form gauge field $A_{MNPQ}$ with a self-dual field strength. Just as an ordinary point electric charge produces a 1-form gauge field, a $(p+1)$-form gauge field may be sourced by an electrically charged $p$-dimensional extended object. The corresponding field strength is a $(p+2)$-form, whose Hodge dual in $d$ spacetime dimensions is a $(d-p-2)$-form. This shows that there should be magnetically charged

same kind generally have inner and outer horizons which have the topology $t \times S^{8-p} \times R^p$. The outer horizon is then associated with a Hawking temperature and a Bekenstein–Hawking entropy. Of particular interest are extremal limits. In this limit, the inner and outer horizons coincide and the mass density is simply proportional to the charge. Given some charge, the extremal solution has the lowest energy. Extremal limits are interesting because in supergravity these correspond to solutions in which some of the supersymmetries (in this case, half of the supersymmetries) are retained – such solutions are called Bogomolny–Prasad–Sommerfield (BPS) saturated solutions. The charge in question appears as a central charge in the extended supersymmetry algebra. This fact may be used to show that such BPS solutions are absolutely stable. Indeed, for the particular solution considered here, the Hawking temperature $T_H \to 0$, so that there is no Hawking radiation, as required by stability. Furthermore, the entropy $S_{BH} \to 0$. The horizon shrinks to a point which appears as a naked null singularity.

Not all of the ten dimensions of string theory need be noncompact. In fact, to describe the real world, one must have a solution of string theory in which six of the dimensions are wrapped up and form a compact space. In principle, however, one can compactify any number of dimensions. In the above example of a p-brane, it is trivial to compactify the directions along which the brane is extended to a $p$-dimensional torus, $T^p$, which can be chosen to be a product of $p$ circles each of radius $R$. At length scales much larger than $R$, the theory then becomes a $(10-p)$-dimensional theory. The p-brane appears as a black hole with a spherical horizon and, since the original $(p+1)$-form gauge field now behaves as an ordinary 1-form gauge field with a nonzero time component, this is an electrically charged black hole.
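The form-degree counting used in this section for electric and magnetic sources can be tabulated mechanically. The following is a minimal Python sketch of that bookkeeping. The function name `brane_duals` and its dictionary keys are illustrative choices, not notation from the article. The magnetic dimension $d-p-4$ is the standard consequence of treating the $(d-p-2)$-form dual field strength as the field strength of a $(d-p-3)$-form potential.

```python
def brane_duals(p: int, d: int = 10) -> dict:
    """Form degrees attached to an electrically charged p-brane in d dimensions.

    Hypothetical helper for illustration only; it encodes the counting
    (p+1)-form potential -> (p+2)-form field strength -> (d-p-2)-form dual.
    """
    if p < 0 or d - p - 4 < 0:
        raise ValueError("need 0 <= p <= d - 4 for a magnetic dual to exist")
    return {
        "electric_brane_dim": p,
        "potential_degree": p + 1,
        "field_strength_degree": p + 2,
        "dual_field_strength_degree": d - p - 2,
        # The dual field strength comes from a (d-p-3)-form potential,
        # which couples to an object with d-p-4 spatial dimensions.
        "magnetic_brane_dim": d - p - 4,
    }


if __name__ == "__main__":
    # Type IIB examples: the 2-form potentials (p = 1) and the 4-form (p = 3).
    for p in (1, 3):
        print(p, brane_duals(p))
```

For $d = 10$ the 2-form potentials pair strings with 5-branes, while the 4-form pairs the 3-brane with itself, in line with its self-dual field strength.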

D1–D5–N System and Five-Dimensional Black Holes

For reasons which will become clear in the next section, it is useful to get extremal black holes with large horizon areas, so that Hawking's semiclassical formulas are valid. It turns out that such solutions involve branes of various types which intersect each other and are suitably wrapped on compact internal spaces. Such black holes then necessarily have different kinds of charges. It turns out that the simplest case is a five-dimensional black hole with three kinds of charges, which is obtained by brane systems wrapped on a compact five-dimensional space. An example is a type IIB solution which has D5 branes which are wrapped on either $T^4 \times S^1$ or $K3 \times S^1$, together with D1 branes wrapped on the $S^1$ as well as some momentum along the $S^1$. From the noncompact five-dimensional point of view, this is a black hole with three kinds of gauge charges: the D5 charge $Q_5$, the D1 charge $Q_1$, and a Kaluza–Klein charge $N$ coming from the momentum $P = N/R$ along the circle of radius $R$.

When the internal space is $T^4 \times S^1$ the five-dimensional Einstein frame metric is given by

\[ ds^2 = -[f(r)]^{-2/3}\left(1 - \frac{r_0^2}{r^2}\right) dt^2 + [f(r)]^{1/3}\left[\frac{dr^2}{1 - r_0^2/r^2} + r^2\, d\Omega_3^2\right] \qquad [8] \]

where

\[ f(r) = \left(1 + \frac{r_0^2 \sinh^2\alpha_1}{r^2}\right)\left(1 + \frac{r_0^2 \sinh^2\alpha_5}{r^2}\right)\left(1 + \frac{r_0^2 \sinh^2\sigma}{r^2}\right) \qquad [9] \]

and the three charges are

\[ Q_1 = \frac{V r_0^2 \sinh 2\alpha_1}{32\pi^4 g_s l_s^6}, \qquad Q_5 = \frac{r_0^2 \sinh 2\alpha_5}{2 g_s l_s^2}, \qquad N = \frac{V R^2}{32\pi^4 l_s^8 g_s^2}\, r_0^2 \sinh 2\sigma \qquad [10] \]

where $V$ is the volume of the $T^4$ and $R$ is the radius of the circle $S^1$.

The ADM mass of the black hole is

\[ M_{ADM} = \frac{R V r_0^2}{32\pi^4 g_s^2 l_s^8}\left[\cosh 2\alpha_1 + \cosh 2\alpha_5 + \cosh 2\sigma\right] \qquad [11] \]

The Bekenstein–Hawking entropy is given by

\[ S_{BH} = \frac{R V r_0^3}{8\pi^3 l_s^8 g_s^2} \cosh\alpha_1 \cosh\alpha_5 \cosh\sigma \qquad [12] \]

while the Hawking temperature is

\[ T_H = \frac{1}{2\pi r_0 \cosh\alpha_1 \cosh\alpha_5 \cosh\sigma} \qquad [13] \]

The extremal limit of this solution is given by

\[ r_0 \to 0; \qquad \alpha_1, \alpha_5, \sigma \to \infty; \qquad Q_1, Q_5, N = \text{fixed} \qquad [14] \]

The extremal solution is a BPS saturated state and retains four of the original supersymmetries. In this limit, the inner and outer horizons coincide. However, the horizon is now a smooth $S^3$ with a finite area in the Einstein frame metric. Consequently, the extremal Bekenstein–Hawking entropy is also finite and may be seen to be

\[ S_{BH}^{\text{3-charge extremal}} = 2\pi\sqrt{Q_1 Q_5 N} \qquad [15] \]

The temperature, however, is zero in this limit, which is consistent with the stability of a BPS saturated state.

The above five-dimensional black hole is in fact a generalization of the Reissner–Nordström black hole. Similar solutions with large horizon areas in the extremal limit can be constructed in four dimensions. One such construction is in the IIB theory wrapped on $T^6$ in which there are four sets of D3 branes which wrap four different $T^3$'s contained in the $T^6$. Black holes with lower supersymmetry may be obtained by replacing the $T^6$ by a Calabi–Yau space.

Duality and Branes

String theory has a rich set of symmetries called duality symmetries which relate different kinds of string theories that are suitably compactified. These symmetries relate different classical solutions. For example, application of these symmetries relates the five-dimensional black holes above to other five-dimensional black holes with different kinds of charges. Furthermore, at the level of supergravity, these various theories may be derived from an as yet unknown 11-dimensional theory called M-theory, whose low-energy limit is 11-dimensional supergravity.

Branes in String Theory

For a given string theory, the perturbative spectrum consists of strings. However, at the nonperturbative

level, there are, in addition, extended objects of other dimensionalities. Duality symmetries imply that these extended objects are as "fundamental" as the strings themselves. Such extended objects are also called branes. For an exhaustive account of branes in string theory, see Johnson (2003).

Like their counterparts in supergravity, branes in string theory are typically charged with respect to some gauge fields. While supergravity solutions are possible with any value of the charge, in string theory the brane charges have to be quantized. Multiple units of the minimum quantum of charge can appear as collections of branes each with unit charge or, alternatively, branes which wrap around compact cycles in space a multiple number of times.

D-Branes

The extended objects in string theory are described in terms of their collective excitations. These are best understood for the class of branes called D-branes in the type II theory, discovered by Polchinski. These are D1, D3, D5, and D7 branes in type IIB and D0, D2, D4, and D6 branes in type IIA theory. Dp branes are characterized by the fact that they couple to, and act as sources for, $(p+1)$-form gauge fields which belong to the Ramond–Ramond sector of the theory. Collective excitations of a $p$-dimensional extended object in field theory are expected to be described by waves on its $(p+1)$-dimensional world volume. The collective coordinate action would be a quantum field theory which has vectors, corresponding to longitudinal oscillations of the brane, and scalars which correspond to transverse oscillations. For D-branes in string theory, the theory of collective excitations is a string field theory of open strings whose endpoints lie on the brane. (This is the origin of the nomenclature D-brane: an open string whose ends are constrained to lie on the brane has a world-sheet description in which the bosonic fields corresponding to transverse target space coordinates have Dirichlet boundary conditions.) The lowest-energy states of open superstrings are ordinary massless gauge fields and their supersymmetric partners so that the low-energy limit of the string field theory is a supersymmetric gauge theory.

The fact that the underlying theory is a string theory has an important consequence. For a system of $N$ parallel D-branes of the same type, one would have open strings which join different branes as well as the same brane. The low-energy theory then becomes a supersymmetric nonabelian gauge theory with gauge group $U(N)$. In a suitable gauge, the off-diagonal gauge fields and their supersymmetric partners (which include scalar fields in the adjoint representation) are the low-energy degrees of freedom of open strings which connect different branes.

The mass density or tension $T_p$ of a single Dp brane is given by

\[ T_p = \frac{1}{g_s (2\pi)^p l_s^{p+1}} \qquad [16] \]

This couples to the $(p+1)$-form gauge field with a charge

\[ \mu_p = g_s T_p \qquad [17] \]

and the Yang–Mills coupling constant for the collective theory on the brane world volume is given by

\[ g^2_{YM,Dp} = (2\pi)^{p-2} g_s l_s^{p-3} \qquad [18] \]

The ground state of a single Dp brane is a BPS state which preserves 16 of the 32 supersymmetries of the original theory. One consequence of this is that two or more parallel Dp branes of the same type form a threshold bound state preserving the same supersymmetries, with no net force between them. As a result, the tension of $N$ parallel Dp branes is simply $N T_p$.

Branes of different dimensionalities can also form bound states. Of particular interest are configurations which can form threshold bound states which preserve some supersymmetries. For example, a set of $N_1$ parallel Dp branes can form a threshold bound state with a set of $N_2$ parallel D(4 + p) branes with all the p branes lying entirely along the (4 + p)-branes. This configuration is also a BPS saturated state preserving eight of the original supersymmetries and would have charges under both $(p+1)$-form and $(p+5)$-form gauge potentials. The BPS nature ensures that the total mass density is the sum of the individual mass densities.

NS Branes

The other extended objects in string theory are called NS branes since they couple to p-form gauge fields which arise from the Neveu–Schwarz/Neveu–Schwarz sector of the world-sheet theory. These are present in all the five string theories and appear in two types. The first is a macroscopic fundamental string which may be wound around a compact direction. The second is called a solitonic 5-brane. While the collective dynamics of a fundamental string is the standard world-sheet description of string theory, the description for the NS 5-brane is rather complicated and not known in full detail. The rest of this article deals exclusively with D-branes.
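The D-brane relations [16]–[18] are simple enough to evaluate directly. The following is a minimal Python sketch, assuming units in which $g_s$ and $l_s$ are plain floating-point numbers; the function names are illustrative and do not come from the article. It also checks two consequences of the formulas as written: the product $T_p\, g^2_{YM,Dp} = 1/[(2\pi)^2 l_s^4]$ is independent of both $p$ and $g_s$, and for $p = 3$ the Yang–Mills coupling is dimensionless, $g^2_{YM} = 2\pi g_s$.

```python
import math


def dp_tension(p: int, g_s: float, l_s: float) -> float:
    """Tension T_p of a single Dp brane, eq. [16]."""
    return 1.0 / (g_s * (2 * math.pi) ** p * l_s ** (p + 1))


def dp_charge(p: int, g_s: float, l_s: float) -> float:
    """Charge mu_p = g_s * T_p under the (p+1)-form field, eq. [17]."""
    return g_s * dp_tension(p, g_s, l_s)


def dp_yang_mills_coupling_sq(p: int, g_s: float, l_s: float) -> float:
    """World-volume Yang-Mills coupling squared, eq. [18]."""
    return (2 * math.pi) ** (p - 2) * g_s * l_s ** (p - 3)


def stack_tension(n: int, p: int, g_s: float, l_s: float) -> float:
    """Tension of n parallel Dp branes: BPS branes exert no net force, so it is n * T_p."""
    return n * dp_tension(p, g_s, l_s)


if __name__ == "__main__":
    g_s, l_s = 0.1, 1.0  # illustrative values only
    for p in (1, 3, 5):
        print(p, dp_tension(p, g_s, l_s), dp_yang_mills_coupling_sq(p, g_s, l_s))
```

Because $\mu_p = g_s T_p$, the charge computed by `dp_charge` does not depend on $g_s$, whereas the tension grows as $1/g_s$ at weak coupling.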

D-Branes and Black Branes

The idea that black holes correspond to highly degenerate states in string theory is quite old and dates back to 't Hooft (1990) and Susskind (1993). In the following two sections we discuss such black holes which are described by D-branes. For reviews see Maldacena (1996), Das and Mathur (2001), and David et al. (2002).

We have so far discussed the string-theoretic branes in two different ways. In the first description, branes are solutions of the low-energy equations of motion – this is the setting in which branes provide conventional descriptions of black holes. In the second description, branes are certain states in the quantum theory of superstrings. More specifically, D-branes are described in terms of states of the open-string field theory which lives on the branes. The first description is necessarily approximate. On the other hand, the second description is exact in principle, although in practice one might not know how to write down and analyze the string-field theory in an exact fashion.

The description in terms of open-string field theory should reduce to the description in terms of a classical solution when the charges and masses become large. If black-hole thermodynamics has a microscopic origin, D-branes should be highly degenerate states in this limit and the entropy should be given by the Boltzmann formula. Furthermore, Hawking radiation should be understood as an ordinary decay process.

For a system of $Q_p$ parallel Dp branes, the mass is $M \sim Q_p/g_s$, while Newton's gravitational constant $G \sim g_s^2$. Gravitational effects are controlled by $GM \sim g_s Q_p$. A semiclassical limit in closed-string theory requires $g_s \to 0$, while a nontrivial gravitational effect in this limit requires $g_s Q_p$ finite, which implies one must have $Q_p \gg 1$. Furthermore, when $g_s Q_p \gg 1$ the typical curvatures are small compared to the string scale and the semiclassical string theory reduces to classical supergravity. This is the limit in which branes are well described as classical solutions.

Similar considerations apply for brane systems with multiple charges. For example, in the D1–D5–N system the classical solution becomes a good description when all the quantities $g_s Q_1$, $g_s Q_5$, and $g_s^2 N$ become large. (The relevant quantity which comes with the momentum has $g_s^2$ rather than $g_s$ because the mass contribution from the momentum is simply $N/R$ without any inverse power of $g_s$.)

However, $g_s$ is the square of the coupling constant of the open-string theory living on the brane – in fact, eqn [18] shows this relation in the low-energy limit. It is well known that in a $U(Q_p)$ gauge theory the real coupling constant is $g_{YM}\sqrt{Q_p} \sim \sqrt{g_s Q_p}$. This means that the semiclassical limit corresponds to a strongly coupled string-field theory which reduces to strongly coupled gauge theory in the low-energy limit and the picture of D-branes as a collection of open strings is not very useful. In fact, known calculational methods in gauge theory or open-string theory are not valid in this regime.

Microscopic Entropy for Two-Charge Systems

The prospects are much better for extremal black holes, which appear as BPS states in string theory. This is because the spectra of BPS states do not depend on the coupling. The degeneracy of such states may therefore be calculated at weak coupling, where techniques are well known and the result can be extrapolated to strong coupling without change.

The simplest BPS state is the ground state of a set of parallel D-branes of the same type. This state is indeed 128-fold degenerate, which would imply a microscopic entropy. This entropy, however, is small and therefore invisible in the corresponding classical solution. Indeed, the classical solution shows that in the extremal limit the horizon area is zero, leading to a vanishing Bekenstein–Hawking entropy.

The next interesting class of states consists of threshold bound states with two kinds of charges. Consider, for example, the D1–D5 system on $T^4 \times S^1$ considered above with no momentum along the D1's. By known duality transformations, this is equivalent to a fundamental IIB string which is wound $Q_5$ times around the $S^1$ and with a net momentum $P = Q_1/2\pi Q_5 R$ (where $R$ is the radius of the $S^1$), with four of the transverse directions compactified on a $T^4$. For this system, it is easy to count the number of states for given values of $Q_1$ and $Q_5$ at weak string coupling by simply enumerating the perturbative oscillator states of the string. For large values of $Q_1$ and $Q_5$, we can alternatively calculate this entropy by using a canonical ensemble of eight massless bosons corresponding to the eight transverse polarizations and their supersymmetric partners – eight massless fermions – moving on the string with some temperature $T$ and a chemical potential $\mu$ for the total momentum.

Consider a noninteracting gas of $f$ massless bosons and $f$ fermions living on a circle with circumference $L$. The average number of left- and right-moving particles with some energy $e$, denoted by $\rho_L$, $\rho_R$, respectively, are

\[ \rho_i(e) = \frac{1}{e^{e/T_i} \pm 1}, \qquad i = L, R \qquad [19] \]

where the $\pm$ sign refers to fermions and bosons, respectively, and we have introduced left- and right-moving temperatures $T_L$, $T_R$. The physical temperature is

\[ \frac{1}{T} = \frac{1}{2}\left(\frac{1}{T_L} + \frac{1}{T_R}\right) \qquad [20] \]

The extensive quantities, such as the energy $E$, momentum $P$, and entropy $S$, then become the sum of left- and right-moving pieces:

\[ E = E_L + E_R, \qquad P = P_L + P_R, \qquad S = S_L + S_R \qquad [21] \]

and the distribution function [19] leads to the following thermodynamic relations:

\[ T_i = \sqrt{\frac{8E_i}{L\pi f}} = \frac{4S_i}{\pi f L}, \qquad i = L, R \qquad [22] \]

Since the total momentum $P = P_R + P_L = E_R - E_L$ is nonzero, the lowest-energy state is clearly the one in which all the particles move in the same direction, for example, right moving. This is a BPS state and corresponds to the extremal solution in supergravity. Then $E = E_R = P = P_R$. This approach to the black hole entropy was initiated by Das and Mathur (1996) and Callan and Maldacena (1996).

For our two-charge system, $f = 8$, $P = 2\pi Q_1/L$, and $L = 2\pi R Q_1 Q_5$. Using [22] we get

\[ S_{\text{micro}}^{\text{2-charge-II}} = 2\pi\sqrt{2 Q_1 Q_5} \qquad [23] \]

This is the microscopic entropy for the fundamental string with momentum in the type II theory. By duality, this is also the microscopic entropy of the D1–D5 system. This is a large number which should agree with the macroscopic entropy calculated from the corresponding classical solution.

The discussion is almost identical for the fundamental heterotic string, except that now we have 24 right-moving bosons, eight left-moving bosons, and eight left-moving fermions, and the BPS state consists only of right movers. If $n_w$ denotes the winding number and $n_p$ the quantized momentum the extremal heterotic string entropy is

\[ S_{\text{micro}}^{\text{2-charge heterotic}} = 4\pi\sqrt{n_p n_w} \qquad [24] \]

The supergravity solution for the D1–D5 system may be obtained by substituting $\sigma = 0$ in eqns [8]–[13]. In the extremal limit, the classical Bekenstein–Hawking entropy vanishes as is clear from the expression [15], in which $N = 0$. This appears to be in contradiction with the fact that the state has a large microscopic entropy.

The key point, however, is that the two-charge solution has a singular horizon where the string frame curvature is large. Consequently, low-energy tree-level supergravity breaks down near the horizon and higher-derivative terms (e.g., higher powers of curvature) become important. This issue has been best studied for the fundamental heterotic string compactified on $T^6$. This is dual to the D1–D5 system in type IIB theory compactified on $K3 \times T^2$. The classical supergravity solution is then a singular black hole in four spacetime dimensions. In one of the first papers on the string-theoretic understanding of black hole thermodynamics, Sen (1995) showed that, for large $n_p$, $n_w$, string-loop effects are small near the horizon so that the only relevant corrections are higher-derivative terms coming from integrating out the massive modes of the string at tree level. Furthermore, a robust scaling argument shows that regardless of the detailed nature of the derivative corrections, the macroscopic entropy defined through the horizon area must be of the form $a\sqrt{n_p n_w}$, where $a$ is a pure number. Finally, one can define a "stretched horizon" as the surface where the curvature becomes of the order of the string scale and the area of the stretched horizon is indeed proportional to $\sqrt{n_p n_w}$. This result gives a strong indication that string theory provides a microscopic basis for black hole thermodynamics, although the coefficient $a$ cannot be determined without more detailed knowledge of higher-derivative terms.

Microscopic Entropy of Extremal Three-Charge System

Brane bound states with three kinds of charge provide examples of black holes whose extremal limits have large horizons with curvatures much smaller than the string scale. In this case, a microscopic count of states in string theory should exactly account for the Bekenstein–Hawking formula, without corrections coming from higher derivatives. This is indeed true, as first found by Strominger and Vafa (1996). In the following, we will outline how this calculation can be done in the D1–D5–N system on $K3 \times S^1$ or $T^4 \times S^1$ following the treatment of Dijkgraaf et al. (1996).

D1 branes can be considered as "instanton strings" in the six-dimensional supersymmetric $U(Q_5)$ gauge theory of D5 branes (actually, these should be called solitonic strings rather than instantons, since the configurations are time independent). The total instanton number is the D1-brane charge $Q_1$. The moduli space of these instantons is then a blown-up version of the

orbifold $(T^4)^{Q_1 Q_5}/S(Q_1 Q_5)$ or $(K3)^{Q_1 Q_5}/S(Q_1 Q_5)$ and is $4Q_1 Q_5$ dimensional. Since any instanton configuration is independent of time $x^0$ and the $S^1$ direction $x^5$, the collective coordinate dynamics is a (1 + 1)-dimensional field theory which lives in the $(x^0, x^5)$ space. At low energies, this flows to a conformal field theory with a central charge $c = 6Q_1 Q_5$ since there are $4Q_1 Q_5$ bosons each contributing 1 to the central charge and an equal number of fermions each contributing 1/2. The BPS state with momentum $N/R$ is a purely right- or left-moving state in this conformal field theory which has a conformal weight $N$. From general principles of conformal invariance, the degeneracy of such states for large $N$ is given by Cardy's formula

\[ d(N) \sim e^{2\pi\sqrt{cN/6}} \qquad [25] \]

so that the microscopic entropy is

\[ S_{\text{micro}}^{\text{3-charge}} = \log d(N) = 2\pi\sqrt{cN/6} \qquad [26] \]

Substituting the value of $c = 6Q_1 Q_5$, this is in exact agreement with the Bekenstein–Hawking entropy of the classical solution given in [15].

Nonextremal Black Holes and Hawking Radiation

The BPS property of ground states of D-brane systems enables us to compute the degeneracy of microstates exactly in the regime of parameters where the state can be reliably described as a black hole solution in the low-energy theory. However, extremal black holes have vanishing temperature and do not radiate. To understand the microscopic origins of Hawking radiation, one has to go away from extremality. Such states are not supersymmetric and an extrapolation of weak-coupling calculations to strong coupling is not a priori justified. Nevertheless, it turns out that for small departures from extremality, weak-coupling results still reproduce semiclassical answers for entropy, temperature, and luminosity.

Near-Extremal Entropy

Nonextremal properties are best understood for the D1–D5–N system on $T^4 \times S^1$. In the orbifold limit, the conformal field theory which describes the low-energy dynamics is equivalent to a gas of strings which are wound around the $S^1$ and which can oscillate along the $T^4$. The total winding number is $k = Q_1 Q_5$ and may be achieved by sets of strings which are multiply wound in various ways. As argued below, entropically the most favored configuration is a single long string wound around $Q_1 Q_5$ times. Thus, the thermodynamics may be analyzed exactly along the lines of the fundamental string in the previous section. The thermodynamic relations are given by [22] with $f = 4$ and $L = 2\pi R Q_1 Q_5$. The extremal state consists entirely of right movers and $E = E_R = N/R$. Substituting these values in [22] yields the correct formula for the microscopic entropy

\[ S_{\text{micro}}^{\text{3-charge}} = 2\pi\sqrt{Q_1 Q_5 N} \qquad [27] \]

The same expression follows if $f = 4Q_1 Q_5$ and $L = 2\pi R$ corresponding to $Q_1 Q_5$ singly wound strings. However, for statistical methods to hold, the entropy must be much larger than the number of flavors. The ratio of the entropy to the number of flavors is $S/f \sim \sqrt{N/Q_1 Q_5}$ for multiple singly wound strings and is not guaranteed to be large when all of $Q_1$, $Q_5$, $N$ are large. On the other hand, this ratio is $S/f \sim \sqrt{Q_1 Q_5 N}$ for the long string. This shows that the long string is always entropically favored.

A departure from the extremal state is achieved by adding a left-moving momentum $2\pi n/L$ as well as a right-moving momentum $2\pi n/L$ to the extremal state, thus adding energy to the system but maintaining the total momentum. For the long string, this yields

\[ S_R = 2\pi\sqrt{Q_1 Q_5 N + n}, \qquad S_L = 2\pi\sqrt{n} \qquad [28] \]

For small departures from extremality, $n \ll N$, the expressions for the total entropy and temperature as a function of the excess energy $\Delta E = 2n/Q_1 Q_5$ agree exactly with the near-extremal Bekenstein–Hawking entropy and the Hawking temperature of the classical solution, as shown by Callan and Maldacena (1996) and by Horowitz and Strominger.

The necessity of the long string appears in another important physical consideration. For statistical mechanics to be valid, the specific heat of the system has to be larger than unity. This implies that for the case considered here the energy gap $\Delta E$ must be larger than $1/RQ_1 Q_5$, which is precisely what the long string yields.

Hawking Radiation

A nonextremal state described above is unstable, since a left mover can annihilate a right mover into a closed-string mode which may leave the brane system and propagate to the asymptotic region. The resulting closed-string state will be in a thermal state whose temperature is the physical temperature of the initial state. This process is the microscopic

description of Hawking radiation. The decay rate is related to the absorption cross section of the corresponding mode by the principle of detailed balance, encoded in eqn [4].

From the point of view of the classical solution, the absorption cross section can be calculated by solving the linearized wave equation in the background geometry and calculating the ratio of the incident and reflected waves. It follows from these calculations that at low energies, absorption (and hence emission) is dominated by massless minimally coupled scalars. In fact, for any spherically symmetric black hole in any number of dimensions, there is a general theorem which ensures that the low-energy limit of this absorption cross section is exactly equal to the horizon area.

In the microscopic model for the three-charge black hole, this absorption cross section may be calculated by the usual rules of quantum mechanics. In the long-string limit and in the approximation that the modes on the long string form a dilute gas, the result has been derived by Das and Mathur (1996):

\[ \sigma(\omega) = \frac{2\pi G_{10} Q_1 Q_5}{V}\, \omega\, \frac{e^{\omega/T} - 1}{(e^{\omega/2T_R} - 1)(e^{\omega/2T_L} - 1)} \qquad [29] \]

where $V$ is the volume of the $T^4$ and $T$ is the physical temperature given by [20]. For a near-extremal hole $T_R \gg T_L$, so that $T \approx 2T_L$. Then in the extreme low-energy limit $\omega \ll T_R$, so that the corresponding Bose factor may be approximated as $1/(e^{\omega/2T_R} - 1) \approx 2T_R/\omega$. The cross section [29] becomes

\[ \sigma = \frac{4\pi Q_1 Q_5 G_{10} T_R}{V} = \frac{4 G_{10} S_R}{(2\pi R) V} = 4 G_5 S_{\text{extremal}} = A_H \qquad [30] \]

where $G_5$ is the five-dimensional Newton's gravitational constant. We have used the relation [22] with $L = 2\pi R Q_1 Q_5$ and $f = 4$. The facts that in the near-extremal limit $S_R$ is simply the extremal entropy and that the extremal entropy reproduces the Bekenstein–Hawking formula have been used as well. Thus, the microscopic cross section exactly reproduces the semiclassical result at low energies. Even more remarkably, the full cross section [29] agrees with the semiclassical answer for the gray-body factor for parameters which correspond to the dilute-gas regime, as shown by Maldacena and Strominger.

It is rather surprising that the results for the microscopic absorption cross section calculated at weak coupling agree with the semiclassical answers, since the relevant process involves states which are not supersymmetric and therefore a naive extrapolation to strong coupling is not a priori justified. There are strong indications, however, that low-energy nonrenormalization theorems are at work. This agreement has been established not only for black holes with finite-horizon areas, but also for other systems with no horizons – most significantly, a set of parallel 3-branes – and forms the basis for Maldacena's conjecture about the AdS/CFT correspondence (see AdS/CFT Correspondence).

Effects of Higher-Derivative Terms

The classical low-energy limit of string theory is supergravity. The effect of the massive modes of the string, as well as of string loops, is to add terms to the supergravity action which involve a higher number of spacetime derivatives, for example, terms containing higher powers of the curvature. In the presence of such terms, the Bekenstein–Hawking formula for black hole entropy [2] receives corrections which can be calculated in a systematic fashion. It turns out that for a class of extremal black holes, this corrected entropy as computed in the modified supergravity is also in exact agreement with a microscopic calculation.

One example of this agreement is provided by four-dimensional extremal black holes in type IIA string theory compactified on a Calabi–Yau manifold. These are obtained by wrapping D4 branes on three different 4-cycles on the Calabi–Yau and having in addition a number of D0 branes. Let $p^A$, $A = 1, \ldots, 3$ denote the three D4 charges and $q_0$ denote the D0 charge. The microscopic entropy of the BPS state can be computed by embedding this in M-theory:

\[ S_{\text{micro}}^{\text{CY black hole}} = 2\pi\sqrt{\frac{1}{6}\, |q_0| \left(C_{ABC}\, p^A p^B p^C + c_{2A}\, p^A\right)} \qquad [31] \]

where $C_{ABC}$ is the intersection number of the 4-cycles and $c_2$ denotes the second Chern class of the Calabi–Yau space. When all the charges $p^A$ are large, the term involving $c_2$ is subdominant. In this case, the result agrees with the Bekenstein–Hawking entropy of the corresponding classical solution.

When the charges are not all large (so that the second term is appreciable), the curvatures of the supergravity solution become large at the horizon and higher-derivative corrections to the action cannot be ignored. In this particular case, it turns out that these higher-derivative corrections are string-loop corrections and can be computed using general properties of $N = 2$ supersymmetry, so that one can compute corrections to near-horizon geometry. Furthermore, one has to now modify the

expression for macroscopic entropy using the formalism of Wald. Putting these together, it is found that the macroscopic entropy following from the modified supergravity is in exact agreement with [31]. This subject is reviewed in Mohaupt (2000).

These methods have also been applied to the problem of two-charge black holes in heterotic string theory on $T^6$ or, equivalently, type IIA on $K3 \times T^2$ (Dabholkar 2004). Recall that in this case the horizon of the usual supergravity solution is singular. It has been found that leading-order higher-derivative corrections smooth out the horizon into an $AdS_2 \times S^2$ spacetime and the modified expression for the macroscopic entropy is again in exact agreement with the microscopic answer [24].

Geometry of Microstates

A satisfactory solution of the information-loss paradox requires a much more detailed understanding of black holes in string theory. The discussion above shows that black holes have microstates which may be described well in the weak-coupling regime. It is interesting to ask whether there is a description of these microstates in the strong-coupling regime in terms of the effective geometry perceived by suitable probes. This question has been answered for the two-charge system in great detail (see Mathur (2004)). It turns out that the D1–D5 microstates can be described by perfectly smooth metrics with no horizons, and they asymptote to the standard two-charge metric discussed above. The location of the erstwhile stretched horizon marks the point where the different microstates start differing from each other significantly. Since each such geometry does not have a horizon, neither does it have any entropy – this is consistent with their identification with nondegenerate microstates. Indeed, the number of such microstates correctly accounts for the microscopic entropy. Whether a similar picture holds for the three-charge system remains to be seen in detail, although there are some indications that this may be true. In this approach, it is not yet fully understood how a horizon emerges and why the entropy scales as the horizon area.

open strings. This is a consequence of the basic duality between open strings and closed strings. Furthermore, the open-string theory lives in a lower-dimensional spacetime. This is a manifestation of the holographic principle. As argued by Maldacena, the presence of a horizon implies that the low-energy limit retains all the modes of the closed strings near the horizon, while it truncates the open-string theory to a gauge theory. Open–closed duality then reduces to gauge–string duality. This provides strong evidence that black holes obey the normal laws of quantum mechanics and hence their time evolution is unitary.

One of the most outstanding problems in the subject is a proper understanding of neutral black holes. Most of the quantitative results described above depend on supersymmetry, which allows extrapolation of weak-coupling answers to the strong-coupling domain. Some of these results can be extended to situations which have small departures from supersymmetry, for example, near-extremal black holes. States corresponding to neutral black holes are, however, far from supersymmetry and known calculational techniques fail. There are good reasons to expect, however, that the general philosophy – in particular the holographic principle – is still valid. Finally, so far string theory has been able to attack problems of eternal black holes. A satisfactory understanding of the information-loss problem requires an understanding of the dynamics of black hole formation and subsequent evaporation. Unfortunately, very little is known about this at the moment.

See also: AdS/CFT Correspondence; Black Hole Mechanics; Supergravity; Superstring Theories.

Glossary

ADM (Arnowitt–Deser–Misner) mass – Mass of a gravitational background which is asymptotically flat.
AdSn (anti-de Sitter space) – A space (or spacetime) with constant negative curvature in n dimensions.
BPS state (Bogomolny–Prasad–Sommerfield state) – In a theory of extended supersymmetry, a state that is invariant under a nontrivial subalgebra of the full
supersymmetry algebra. These states always carry
conserved charges, and supersymmetry determines the
Outlook mass exactly in terms of the charges.
Calabi–Yau space – Complex Kahler manifold with
One key feature of the understanding of black hole
vanishing first Chern class.
statistical mechanics from the dynamics of branes is Compactify (n. compactification) – To consider a field or
the fact that a problem in gravity is mapped to a string theory in a spacetime some of whose spatial
problem in a theory without gravity, for example, dimensions are compact.
open-string field theory. In fact, the closed strings in Dirichlet boundary condition – The boundary condition
the bulk are already contained in the spectrum of the which fixes the value of a field on the boundary.

Duality – Equivalence of systems which appear to be distinct. For string theories, such equivalences relate string theories on different spacetimes as well as theories with different coupling constants.
Einstein–Hilbert action – The standard action for gravity which leads to Einstein's equation, $S = \frac{1}{16\pi G} \int d^d x \, \sqrt{g} \, R$, where R is the Ricci scalar, g denotes the determinant of the metric, and G is Newton's gravitational constant.
Instanton – A classical solution of Euclidean field theory with finite action.
Kaluza–Klein gauge field – In a compactified theory, the gauge field which arises from the metric of the higher-dimensional theory.
K3 – The unique Calabi–Yau manifold in four dimensions having an SU(2) holonomy.
Loop levels – In a Feynman diagram expansion of a field theory, terms which contribute in higher orders of the Planck constant ħ.
Macroscopic entropy – Entropy associated with gravitational backgrounds via the Bekenstein–Hawking formula or its generalization.
Microscopic entropy – Entropy which follows from the degeneracy of states of a system via Boltzmann's relation.
Minimally coupled scalar – A scalar field whose equation of motion is the standard Klein–Gordon equation where the derivatives are covariant derivatives.
Neveu–Schwarz/Neveu–Schwarz states – In type I and II string theories, bosonic closed-string states whose left- and right-moving parts are bosonic.
No-hair theorem – A theorem in general relativity which states that black holes with nonsingular horizons are uniquely characterized by their mass, angular momenta, and charges which can couple to long-range gauge fields.
Orbifold – A coset space M/G where G is a group of discrete symmetries of a manifold M. If G has a fixed point, the space is singular.
p-Form – A fully antisymmetric p-index tensor.
Ramond–Ramond states – In type I and II string theories, bosonic closed-string states whose left- and right-moving parts are fermionic.
Reissner–Nordström black hole – Black hole solution of general relativity with electric Maxwell charge.
$S^n$ – n-Dimensional sphere.
Supergravity – Supersymmetric extension of general relativity.
Supersymmetry – A symmetry between bosons and fermions.
Threshold bound state – A bound state which is marginally bound, that is, the binding energy is zero.
Tree level – In a Feynman diagram expansion of a field theory, terms which contribute to lowest order of the Planck constant ħ.
U(N) – The group of N × N unitary matrices. If the determinant is unity, the subgroup is called SU(N).

Further Reading

Callan CG and Maldacena JM (1996) D-brane approach to black hole quantum mechanics. Nuclear Physics B 472: 591 (arXiv:hep-th/9602043).
Dabholkar A (2004) Exact counting of black hole microstates, arXiv:hep-th/0409148.
Das SR and Mathur SD (1996) Comparing decay rates for black holes and D-branes. Nuclear Physics B 478: 561 (arXiv:hep-th/9606185).
Das SR and Mathur SD (2001) The quantum physics of black holes: results from string theory. Annual Review of Nuclear and Particle Science 50: 153 (arXiv:gr-qc/0105063).
David JR, Mandal G, and Wadia SR (2002) Microscopic formulation of black holes in string theory. Physics Reports 369: 549 (arXiv:hep-th/0203048).
Dijkgraaf R, Moore GW, and Verlinde E (1996) Elliptic genera of symmetric products and second quantized strings. Communications in Mathematical Physics 185: 197 (arXiv:hep-th/9608096).
't Hooft G (1990) The black hole interpretation of string theory. Nuclear Physics B 335: 138.
Johnson C (2003) D-Branes. Cambridge: Cambridge University Press.
Maldacena JM (1996) Black holes in string theory, arXiv:hep-th/9607235.
Maldacena J, Strominger A, and Witten E (1997) Black hole entropy in M-theory. Journal of High Energy Physics 9712: 002 (arXiv:hep-th/9711053).
Mathur SD (2004) Where are the states of a black hole?, arXiv:hep-th/0401115.
Mohaupt T (2000) Black hole entropy, special geometry and strings. Fortschritte der Physik 49: 3 (arXiv:hep-th/0007195).
Sen A (1995) Extremal black holes and elementary string states. Modern Physics Letters A 10: 2081.
Strominger A and Vafa C (1996) Microscopic origin of the Bekenstein–Hawking entropy. Physics Letters B 379: 99 (arXiv:hep-th/9601029).
Susskind L (1993) Some speculations about black hole entropy in string theory, arXiv:hep-th/9309145.
Wald R (1994) Quantum Field Theory in Curved Space-Time and Black Hole Thermodynamics. Chicago, IL: University of Chicago Press.

Breaking Water Waves


A Constantin, Trinity College, Dublin, Republic of Ireland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Watching the sea or a lake, it is often possible to trace a wave as it propagates on the water's surface. One can roughly distinguish two types of breaking waves. All waves break while reaching the shore but certain waves break far from the shore. In the first case, the change in water depth or the presence of an obstacle (e.g., a rock) seems to cause wave breaking, while for certain waves within the second category, these factors appear not to be essential. It is a matter of observation that for many waves that break in the open water a drastic increase in their slope near breaking is noticeable. This leads us to the following mathematical definition: the wave profile gradually steepens as it propagates until it develops a point where the slope is vertical and the wave is said to have broken (Whitham 1980). Throughout this article, we are concerned with wave breaking that is not caused by a drastic change of the topography of the bottom; for a discussion of wave breaking at the beach we refer to Johnson (1997). The governing equations for water waves (see the next section) are too difficult to be dealt with in their full generality. Therefore, to gain some insight, one has to find simpler models that are more tractable mathematically. Investigating the properties of the model, certain predictions can be made. The conclusions reached will reflect reality only to some limited extent. The value of a model depends on the number and the degree of accuracy of physically useful deductions that can be made from it – the “truth” of the model is meaningless as all experiments contain inaccuracies and effects other than those accounted for (while deriving the model) cannot be totally excluded. We intend to discuss the way in which a recent model due to Camassa and Holm (1993) can lead to a better understanding of breaking water waves. First, we survey a few classical nonlinear partial differential equations that model the propagation of water waves over a flat bed (within the confines of the linear theory one cannot cope with the wave breaking phenomenon) and discuss their relevance to the study of breaking waves. We then analyze the breaking of waves within the context of the Camassa–Holm equation: existence of breaking waves, criteria that guarantee that a certain initial shape develops into a breaking wave, specific features of wave breaking (blow-up rate and blow-up set for certain types of breaking waves). We conclude the presentation with a discussion of the way in which solutions to the Camassa–Holm equation can be continued after wave breaking.

The Governing Equations

The water waves that one typically sees propagating on the surface of the sea or on a lake are, as a matter of common experience, approximately two dimensional. That is, the motion is identical in any direction parallel to the crest line. To describe these waves, it suffices to consider a cross section of the flow that is perpendicular to the crest line. Choose Cartesian coordinates (x, y) with the y-axis pointing vertically upwards and the x-axis being the direction of wave propagation, while the origin lies at the mean water level. Let (u(t, x, y), v(t, x, y)) be the velocity field of the flow, let $y = -d$ be the flat bed (for some fixed $d > 0$), and let $y = \eta(t, x)$ be the water's free surface. Homogeneity (constant density) is a physically reasonable assumption for gravity waves (Johnson 1997), and it implies the equation of mass conservation

$$u_x + v_y = 0 \qquad [1]$$

The inviscid setting is realistic since experimental evidence confirms that the length scales associated with an adjustment of the velocity distribution due to laminar viscosity or turbulent mixing are long compared to typical wavelengths. Under the assumption of inviscid flow the equation of motion is Euler's equation

$$u_t + u u_x + v u_y = -P_x, \qquad v_t + u v_x + v v_y = -P_y - g \qquad [2]$$

where P(t, x, y) denotes the pressure and g is the gravitational constant of acceleration. The free surface decouples the motion of the water from that of the air so that (Johnson 1997) the dynamic boundary condition

$$P = P_0 \quad \text{on } y = \eta(t, x) \qquad [3]$$

must hold if we neglect surface tension, where $P_0$ is the (constant) atmospheric pressure. Moreover, since the same particles always form the free surface, we have the kinematic boundary condition

$$v = \eta_t + u \eta_x \quad \text{on } y = \eta(t, x) \qquad [4]$$

On the flat bed we have the kinematic boundary condition

$$v = 0 \quad \text{on } y = -d \qquad [5]$$

expressing the fact that the flow is tangent to the horizontal bed (or, equivalently, that water cannot penetrate the rigid bed). The governing equations for water waves are [1]–[5]. Other than the fact that they are highly nonlinear, a main difficulty in analyzing the governing equations lies in the fact that we deal with a free boundary problem: the free surface $y = \eta(t, x)$ is not specified a priori. In our discussion, we suppose that initially (at time t = 0), a disturbance of the flat surface of still water was created and we analyze the subsequent motion of the water. The balance between the restoring gravity force and the inertia of the system governs the evolution of the mass of water and our primary objective is the behavior of the free surface.

An important category of flows are those of zero vorticity, characterized by the additional assumption

$$u_y = v_x \qquad [6]$$

The vorticity of a flow, $\omega = u_y - v_x$, measures the local spin or rotation of a fluid element. In flows for which [6] holds the local whirl is completely absent and for this reason such flows are called irrotational. Relation [6] ensures the existence of a velocity potential, namely a function $\phi(t, x, y)$ defined up to a constant via

$$\phi_x = u, \qquad \phi_y = v$$

Notice that [1] ensures that $\phi$ is a harmonic function, that is, $(\partial_x^2 + \partial_y^2)\phi = 0$. In this way, the powerful methods of complex analysis become available for the study of irrotational flows. Thus, while most water flows are with vorticity, the study of irrotational flows can be defended mathematically on grounds of beauty. Concerning the physical relevance of irrotational water flows, experimental evidence indicates that for waves entering a region of still water the assumption of irrotational flow is realistic (Johnson 1997). Moreover, as a consequence of Kelvin's circulation theorem (Acheson 1990), a water flow that is irrotational initially has to be irrotational at all later times. It is thus reasonable to consider that water motions starting from rest will remain irrotational at later times.

Nonlinear Model Equations

Starting from the governing equations [1]–[6] one can derive a variety of model equations using the nondimensionalization and scaling approach: a suitable set of nondimensional variables is introduced, which, after scaling, leads to the appearance of parameters. The sizes and relative sizes of these parameters then govern the type of phenomenon that is of interest. An asymptotic expansion in one or several parameters yields an equation that is usually of significance in some region of space/time. The aim of this process is to obtain a simpler model that can be used to gain some understanding and to make some predictions for specific physical processes. This scaling method yields the Korteweg–de Vries (KdV) equation

$$\eta_t + \eta \eta_x + \eta_{xxx} = 0, \qquad t > 0, \; x \in \mathbb{R} \qquad [7]$$

as a model for the unidirectional propagation of shallow water waves over a flat bed (Johnson 1997). In [7] the function $\eta(t, x)$ represents the height of the water's free surface above the flat bed. We would like to emphasize that the “shallow water” regime does not refer to water of insignificant depth – it indicates that the typical wavelength is much larger than the typical depth (e.g., tidal waves are considered to be shallow water waves although they affect the motion of the deep sea). The KdV model admits the solitary wave solutions

$$\eta_c(t, x) = 3c \, \mathrm{sech}^2\!\left( \frac{\sqrt{c}}{2} (x - ct) \right), \qquad c \in \mathbb{R} \qquad [8]$$

For any fixed $c > 0$, the profile $\eta_c$ propagates without change of form at constant speed c on the surface of the water, that is, it represents a traveling wave. Since the profiles [8] of the traveling waves drop rapidly to the undisturbed water level $\eta = 0$ ahead of and behind the crest of the wave, the $\eta_c$ are called solitary waves. Notice that [8] shows that taller solitary waves travel faster. They have other special properties: an initial profile consisting of two solitary waves, with the taller wave initially behind the smaller one, evolves in such a way that the taller wave catches up with the other, there is a period of complicated nonlinear interaction but eventually both solitary waves emerge completely unscathed! This special type of nonlinear interaction (the superposition principle is not valid since KdV is a nonlinear equation) in which solitary waves regain their form upon collision occurs only for special equations, in which case the solitary waves are called solitons. A further interesting property of the KdV model, relevant for the understanding of the interaction of solitons, is the fact that it is completely integrable (McKean 1998): there is a transformation which converts the equation into an infinite sequence of linear ordinary differential equations which can be trivially integrated. Moreover, the KdV solitons $\eta_c$ are stable: an initial profile that is close to the form of a soliton will evolve into a wave that at any later time has a form close to that of a soliton (Benjamin 1972). Despite all these intriguing features of the KdV model, for all initial profiles $x \mapsto \eta(0, x)$ within the Sobolev space $H^1(\mathbb{R})$ of square-integrable functions with a square-integrable distributional derivative, eqn [7] has a unique solution

defined for all times $t \ge 0$ (cf. Kenig et al. (1996)) so that the KdV model cannot be used to shed light on the wave breaking phenomenon.

Whitham (1980) suggested the equation

$$\eta_t + \eta \eta_x + \int_{\mathbb{R}} k(x - y) \, \eta_y(t, y) \, dy = 0 \qquad [9]$$

for the free surface profile $x \mapsto \eta(t, x)$, with the singular kernel

$$k(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \left( \frac{\tanh(\xi)}{\xi} \right)^{1/2} e^{i \xi x} \, d\xi$$

to model wave breaking. It can be shown (see Constantin and Escher (1998) and references therein) that [9] describes wave breaking: there are smooth initial profiles $x \mapsto \eta(0, x)$ such that the resulting unique solution of [9] exists on a maximal time interval [0, T) with

$$\sup_{(t,x) \in [0,T) \times \mathbb{R}} \{ |\eta(t, x)| \} < \infty, \qquad \inf_{x \in \mathbb{R}} \{ \eta_x(t, x) \} \to -\infty \quad \text{as } t \uparrow T$$

(the solution remains bounded but its slope becomes infinite in finite time). However, in contrast to the KdV model, eqn [9] is not integrable and does not possess soliton solutions. As emphasized by Whitham (1980), it is intriguing to find models for water waves which exhibit both soliton interaction and wave breaking. The Camassa–Holm equation

$$\eta_t - \eta_{txx} + 3 \eta \eta_x = 2 \eta_x \eta_{xx} + \eta \eta_{xxx} \qquad [10]$$

was first obtained by Fokas and Fuchssteiner (1981/82) as a nonlinear partial differential equation with infinitely many conservation laws. Camassa and Holm (1993) derived [10] as a model for shallow water waves, established that the equation possesses soliton solutions and found that it is formally integrable (for a discussion of the integrability issues we refer to Constantin (2001) and Lenells (2002)). Moreover, the solitons of [10] are stable (Constantin and Strauss 2000). An astonishing plenitude of structures is tied into the Camassa–Holm equation: [10] is a re-expression of geodesic flow on the diffeomorphism group (Constantin 2000, Kouranbaeva 1999), a property that can be used to show that the least action principle holds in the sense that there is a unique flow transforming a wave profile into a nearby profile within the class of flows that minimize the kinetic energy (see the discussion in Constantin (2000) and Constantin and Kolev (2003)). Interestingly, the Camassa–Holm equation also models wave breaking. More precisely (see the discussion in Constantin (2000)), for any initial data $x \mapsto \eta_0(x) = \eta(0, x)$ in $H^3(\mathbb{R})$ there is a unique solution of [10] defined on some maximal time interval [0, T) and the solution stays uniformly bounded on [0, T) with

$$\lim_{t \uparrow T} \left( \inf_{x \in \mathbb{R}} \{ \eta_x(t, x) \} \, (T - t) \right) = -2 \quad \text{if } T < \infty$$

In addition to this, for a large class of initial data, there is precisely one point where the slope of the wave becomes infinite at breaking time (Constantin 2000): if $\eta_0 \not\equiv 0$ is odd and such that $\eta_0(x) - \eta_0''(x) \le 0$ for all $x \ge 0$, then the corresponding wave $t \mapsto [x \mapsto \eta(t, x)]$ will break in finite time $T < \infty$ and

$$\lim_{t \uparrow T} \eta_x(t, 0) = -\infty$$

whereas

$$|\eta_x(t, x)| \le K + K \, \frac{\cosh(x)}{|\sinh(x)|}, \qquad t \in [0, T), \; x \ne 0$$

for some constant $K > 0$. Thus, the Camassa–Holm model is an integrable infinite-dimensional Hamiltonian system with stable solitons and eqn [10] also admits breaking waves as local solutions (see Constantin and Escher (1998) and McKean (1998) and references therein for further results on wave breaking for the Camassa–Holm equation).

We conclude our discussion by pointing out that it is possible to continue solutions of the Camassa–Holm equation past the breaking time. For this purpose it is convenient to rewrite [10] as the nonlinear nonlocal conservation law

$$\eta_t + \eta \eta_x + \partial_x \int_{\mathbb{R}} \frac{1}{2} e^{-|x - y|} \left( \eta^2 + \frac{1}{2} \eta_x^2 \right) dy = 0 \qquad [11]$$

reminiscent to some extent of the form of [7] and [9] and obtained by formally applying the operator $(1 - \partial_x^2)^{-1}$ to [10] in view of the fact that

$$(1 - \partial_x^2)^{-1} f = P * f \quad \text{for } f \in L^2(\mathbb{R})$$

the kernel of the convolution being

$$P(x) = \tfrac{1}{2} e^{-|x|}, \qquad x \in \mathbb{R}$$

By introducing a new set of independent and dependent variables it is possible to resolve all singularities due to wave breaking in the sense that [11] is transformed into a semilinear system, the unique solution of which can be obtained as a fixed point of a contractive operator (Bressan and Constantin 2005). In terms of [11], a semigroup of global conservative solutions (in the sense that the total energy

$$\frac{1}{2} \int_{\mathbb{R}} (\eta^2 + \eta_x^2) \, dx$$

equals a constant, for almost every time), depending continuously on the initial data $\eta(0, \cdot) \in H^1(\mathbb{R})$, is thus constructed.

See also: Compressible Flows: Mathematical Theory; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Integrable Systems: Overview; Interfaces and Multicomponent Fluids.

Further Reading

Acheson DJ (1990) Elementary Fluid Dynamics. New York: Oxford University Press.
Benjamin TB (1972) The stability of solitary waves. Proceedings of the Royal Society of London Series A 328: 153–183.
Bressan A and Constantin A (2005) Global conservative solutions of the Camassa–Holm equation, Preprints on Conservation Laws 2005-016 (www.math.ntnu.no/conservation/2005/016).
Camassa R and Holm DD (1993) A new integrable shallow water equation with peaked solitons. Physical Review Letters 71: 1661–1664.
Constantin A (2000) Existence of permanent and breaking waves for a shallow water equation: a geometric approach. Annales de l'Institut Fourier (Grenoble) 50: 321–362.
Constantin A (2001) On the scattering problem for the Camassa–Holm equation. Proceedings of the Royal Society of London Series A 457: 953–970.
Constantin A and Escher J (1998) Wave breaking for nonlinear nonlocal shallow water equations. Acta Mathematica 181: 229–243.
Constantin A and Kolev B (2003) Geodesic flow on the diffeomorphism group of the circle. Commentarii Mathematici Helvetici 78: 787–804.
Constantin A and Strauss WA (2000) Stability of peakons. Communications on Pure and Applied Mathematics 53: 603–610.
Fokas AS and Fuchssteiner B (1981/82) Symplectic structures, their Bäcklund transformations and hereditary symmetries. Physica D 4: 47–66.
Gesztesy F and Holden H (2003) Soliton Equations and their Algebro-Geometric Solutions. Cambridge: Cambridge University Press.
Johnson RS (1997) A Modern Introduction to the Mathematical Theory of Water Waves. Cambridge: Cambridge University Press.
Johnson RS (2002) Camassa–Holm, Korteweg–de Vries and related models for water waves. Journal of Fluid Mechanics 455: 63–82.
Kenig CE, Ponce G, and Vega LA (1996) A bilinear estimate with applications to the KdV equation. Journal of the American Mathematical Society 9: 573–603.
Kouranbaeva S (1999) The Camassa–Holm equation as a geodesic flow on the diffeomorphism group. Journal of Mathematical Physics 40: 857–868.
Lenells J (2002) The scattering approach for the Camassa–Holm equation. Journal of Nonlinear Mathematical Physics 9: 389–393.
McKean HP (1979) Integrable systems and algebraic curves. In: Global Analysis, Lecture Notes in Mathematics, vol. 755, pp. 83–200. Berlin: Springer.
McKean HP (1998) Breakdown of a shallow water equation. Asian Journal of Mathematics 2: 867–874.
Whitham GB (1980) Linear and Nonlinear Waves. New York: Wiley.
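
As a numerical illustration of the solitary waves [8] of the KdV equation [7] discussed in this article, the following minimal Python sketch evaluates the profile $\eta_c$ and estimates the residual $\eta_t + \eta \eta_x + \eta_{xxx}$ by central finite differences. The function names `eta_c` and `kdv_residual`, the step size `h`, and the sample points are illustrative assumptions and are not taken from the article; the check is only approximate, since finite differences carry truncation and rounding errors.

```python
import math


def eta_c(t, x, c):
    """Solitary wave [8]: 3c * sech^2((sqrt(c)/2) * (x - c*t)), for c > 0."""
    theta = 0.5 * math.sqrt(c) * (x - c * t)
    return 3.0 * c / math.cosh(theta) ** 2


def kdv_residual(t, x, c, h=1e-3):
    """Finite-difference estimate of eta_t + eta*eta_x + eta_xxx at (t, x).

    The step h is an assumed value chosen to balance truncation and rounding.
    """
    def f(tt, xx):
        return eta_c(tt, xx, c)

    eta_t = (f(t + h, x) - f(t - h, x)) / (2 * h)
    eta_x = (f(t, x + h) - f(t, x - h)) / (2 * h)
    # Standard second-order central stencil for the third derivative.
    eta_xxx = (f(t, x + 2 * h) - 2 * f(t, x + h)
               + 2 * f(t, x - h) - f(t, x - 2 * h)) / (2 * h ** 3)
    return eta_t + f(t, x) * eta_x + eta_xxx


if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0):
        worst = max(abs(kdv_residual(0.3, k / 4.0, c)) for k in range(-40, 41))
        print(f"c = {c}: crest height = {eta_c(0.0, 0.0, c)}, "
              f"max |residual| = {worst:.1e}")
```

The crest height at $x = ct$ is $3c$, so a larger speed c gives a taller wave, in line with the remark that taller solitary waves travel faster.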

BRST Quantization
M Henneaux, Université Libre de Bruxelles, Bruxelles, Belgium
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The BRST symmetry was originally introduced in the seminal papers by Becchi et al. (1976) and Tyutin (1975) for Yang–Mills gauge theories as a tool for controlling the renormalization of the models in a consistent (gauge-independent) way. This symmetry was discovered as a residual symmetry of the gauge-fixed action. It was realized later that, in fact, the BRST construction is quite general, in the sense that it covers arbitrary gauge theories and not just Yang–Mills gauge models. Furthermore, it is intrinsic, in that no gauge choice is actually necessary to define it.

The purpose of this review is to explain the general, intrinsic features of the BRST formalism applicable to “any” gauge theory. The proper setting for discussing these issues is that of homological algebra (Stasheff (1998), and references therein). This article first explains the necessary algebraic material underlying the construction and then illustrates it in the cases of the Hamiltonian BRST formalism and the Lagrangian BRST formalism.

A Result from Homological Algebra

The main result of homological algebra needed in the BRST construction deals with a differential complex C with two gradings. The first grading is an N-degree and is called the “resolution degree,” or “r-degree.” The second grading is a Z-degree and is called the total ghost number. It is denoted by gh. We assume that there are two odd derivations $\delta$ and $s_0$ that have the following properties:

$$r(\delta) = -1, \quad \mathrm{gh}(\delta) = 1; \qquad r(s_0) = 0, \quad \mathrm{gh}(s_0) = 1 \qquad [1]$$

and

$$\delta^2 = 0, \qquad s_0 \delta + \delta s_0 = 0, \qquad s_0^2 = -[\delta, s_1] \qquad [2]$$

for some derivation $s_1$ of r-degree 1 and ghost number 1. The bracket [ , ] is the graded commutator – in this specific case, the anticommutator. We also assume that the homology of $\delta$ vanishes at nonzero value of the r-degree, both in the original complex C,

$$H_k(\delta, C) = 0, \qquad k > 0 \qquad [3]$$

(which is equivalent to $\delta a = 0, \; r(a) > 0 \Rightarrow a = \delta b$) and in the space of derivations,

$$[\delta, \alpha] = 0, \; r(\alpha) \ne 0 \;\Rightarrow\; \alpha = [\delta, \beta] \qquad [4]$$

where $\alpha$ and $\beta$ are both derivations in C. The r-degree of a homogeneous linear operator $\alpha$ is defined through $r(\alpha(x)) = r(\alpha) + r(x)$ for any element $x \in C$ and is negative when $\alpha$ decreases the r-degree.

In $H_0(\delta, C)$, the (odd) derivation $s_0$ defines a differential. The cohomology of $s_0$ modulo $\delta$, denoted $H^k(s_0, H_0(\delta, C))$, is the cohomology of $s_0$ in $H_0(\delta, C)$. It is explicitly defined through the cocycle condition

$$s_0 a = \delta m \qquad [5]$$

with coboundaries of the form

$$s_0 b + \delta n \qquad [6]$$

The central result underlying the BRST construction is:

Theorem 1 Given the above setting, there exists an odd derivation s in C with the following properties:

$$s = \delta + s_0 + s_1 + \cdots \qquad [7]$$

$$r(s_k) = k, \qquad \mathrm{gh}(s_k) = 1 \qquad [8]$$

$$s^2 = 0 \qquad [9]$$

Furthermore, one has

$$H^k(s, C) = H^k(s_0, H_0(\delta, C)) \qquad [10]$$

The proof is straightforward (see, e.g., Henneaux and Teitelboim (1992)). In particular, the proof of [10] is a standard spectral sequence argument with a sequence that collapses after the second step. It is interesting to note that, contrary to $s_0$, which is only a differential modulo $\delta$, s is a true differential. The construction of s provides a model for $H^k(s_0, H_0(\delta, C))$. The differential s is not unique, but this does not affect the subsequent discussion.

In physical applications, the total ghost number is a derived quantity. The primary gradings are the resolution degree and the “filtration degree” called pure ghost number and denoted pgh. It is an N-degree and one has

$$\mathrm{gh} = \mathrm{pgh} - r \qquad [11]$$

The r-degree is known as the antighost or antifield number, depending on the context (see below). When r(x) = 0, one has gh(x) = pgh(x). Since the pure ghost number is non-negative, this implies that

$$H^k(s, C) = 0, \qquad k < 0 \qquad [12]$$

A Geometric Application

Geometric Setting

Theorem 1 is relevant to the following situation. Consider a surface $\Sigma$ in a manifold M, defined by equations

$$f_a = 0 \qquad [13]$$

which may or may not be independent. (We assume for definiteness that the variables in M are bosonic, that is, that M is an ordinary manifold – as opposed to a supermanifold. The graded case can be covered without difficulty by including appropriate sign factors at the relevant places.) Assume that $\Sigma$ is partitioned by orbits generated by vector fields $X_\alpha$ defined everywhere in M, tangent to $\Sigma$ and closing on $\Sigma$ in the Lie bracket,

$$[X_\alpha, X_\beta] = C^{\gamma}{}_{\alpha\beta} X_\gamma + \text{“more”} \qquad [14]$$

where “more” denotes terms that vanish on $\Sigma$. We assume, for simplicity, that the vector fields $X_\alpha$ are linearly independent on $\Sigma$, although this is not necessary. The formalism can be developed in the nonindependent case, but it then requires more variables. We are interested in the quotient space $\Sigma/O$ of the surface $\Sigma$ by the orbits. To guide the geometrical intuition, we shall assume that this quotient space is a smooth manifold (the fiber of the orbits, etc.), and we shall suggestively adopt notations adapted to this best possible case. The approach, being purely algebraic, is in fact more general. (Accordingly, the notations should be understood with a liberal mind.)

The aim here is to describe the algebra of “observables,” that is, the algebra $C^\infty(\Sigma/O)$ of functions on the quotient space $\Sigma/O$. The terminology “observables” anticipates the physical situation discussed below, where the orbits are the “gauge orbits.” In order to describe algebraically the algebra of observables, one observes that this algebra is obtained

through a two-step procedure. First, one restricts the functions on M are annihilated by , they are
functions from M to . Second, one imposes the clearly cycles at r-degree zero. Because the left-
invariance condition along the orbits. To each of these hand side fa of the equations fa = 0 are exact
steps corresponds a separate differential. (equal to ta ), the ideal N coincides with the set
of boundaries in degree zero.
Longitudinal Complex Thus,
The longitudinal complex is associated with the H0 ð; KÞ ¼ C1 ðÞ ½21
second step. One can consider on  an ‘‘exterior
derivative operator D along the gauge orbits.’’ This We see accordingly that  successfully enforces the
operator is defined on functions on  as restriction to the surface  through its homology in
degree zero.
Df ¼ X  ðf ÞC ½15 However, if the equations fa = 0 are not indepen-

where the 1-forms C dual to the X ’s are called dent, this is not the end of the story. Indeed, any
ghosts. In the physical context, the form-degree is identity ZaA fa = 0 on the functions fa leads to a
the pgh described earlier, and so $\mathrm{pgh}(C^\alpha) = 1$. The action of $D$ on the ghosts is given by

$$ D C^\alpha = \tfrac{1}{2}\, C^\alpha{}_{\beta\gamma}\, C^\beta C^\gamma $$ [16]

The longitudinal complex $L$ is the complex of exterior forms along the gauge orbits. In the representation used here, it is given by the space of polynomials in the ghosts $C^\alpha$ with coefficients that are functions on $\Sigma$. The exterior derivative $D$ is defined on this space by extending the formulas [15] and [16] so that it is an odd derivation. One clearly has (on $\Sigma$)

$$ D^2 = 0 $$ [17]

The functions on the quotient space $\Sigma/\mathcal{O}$ are just the elements of the zeroth cohomological group $H^0(D, L)$,

$$ H^0(D, L) = C^\infty(\Sigma/\mathcal{O}) $$ [18]

In general, $H^k(D, L) \neq 0$.

Koszul–Tate Differential $\delta$

The Koszul–Tate differential $\delta$ implements the first step in the reduction procedure. More precisely, it provides an algebraic resolution of the algebra $C^\infty(\Sigma)$ of the smooth functions on the surface $\Sigma$. That algebra can be identified with the quotient algebra

$$ C^\infty(\Sigma) = C^\infty(M)/N $$ [19]

where $N$ is the ideal of functions that vanish on $\Sigma$. The Koszul–Tate complex $K$ is defined by adding one new generator for each equation $f_a = 0$ defining $\Sigma$, denoted $t_a$ and assigned r-degree 1. In the algebra $C^\infty(M) \otimes \Lambda(t_a)$ (where $\Lambda(t_a)$ is the exterior algebra on the $t_a$), one defines $\delta$ through

$$ \delta f = 0 \quad \forall f \in C^\infty(M), \qquad \delta t_a = f_a $$ [20]

and extends it as an odd derivation. It is clear that $r(\delta) = -1$ and that $\delta^2 = 0$. Because the

nontrivial cycle $Z^a_A t_a$ in r-degree 1, $\delta(Z^a_A t_a) = 0$. This is undesirable. To cure this drawback, one introduces further generators $t_A$ in r-degree 2, one for each identity $Z^a_A f_a = 0$, and defines

$$ \delta t_A = Z^a_A t_a, \qquad r(t_A) = 2 $$ [22]

in order to “kill” the unwanted cycles $Z^a_A t_a$. The Koszul complex $K$ is thus enlarged to contain these new (even) variables and redefined as

$$ K = C^\infty(M) \otimes \Lambda(t_a) \otimes S(t_A) $$ [23]

where $S(t_A)$ is the symmetric algebra in the $t_A$. The operator $\delta$ is extended to $K$ as an odd derivation. One has $\delta^2 = 0$, and the property [21] is unaffected by the inclusion of the new generators. Furthermore, by construction,

$$ H_1(\delta, K) = 0 $$ [24]

If there is no “identity on the identities,” we shall assume that the process stops. Otherwise, one needs to introduce further generators in r-degree 3 and possibly higher. When all the appropriate variables are included, there is no homology at higher r-degree. Thus,

$$ H_k(\delta, K) = 0, \qquad k > 0 $$ [25]

Combining $\delta$ with $D$

We now turn to the problem of combining the Koszul–Tate complex with the longitudinal complex, so as to implement the full reduction. To that end, we define $\mathcal{C}$ by adding the ghosts to $K$,

$$ \mathcal{C} = K \otimes \Lambda(C^\alpha) $$ [26]

We then extend the action of the Koszul–Tate differential in the simplest way which preserves all gradings, namely

$$ \delta C^\alpha = 0 $$ [27]
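The way the degree-2 generators remove an unwanted cycle can be seen in a small worked example. It is an illustration only and is not taken from the article: the manifold $M = \mathbb{R}^2$, the deliberately redundant pair of constraint functions, and the use of a single extra generator $t_A$ are all assumptions made for the sketch.

```latex
% Hypothetical example: M = R^2 with coordinates (x, y), and the surface
% \Sigma = \{x = 0\} described redundantly by f_1 = x and f_2 = x.
\begin{align*}
  &\delta t_1 = x, \qquad \delta t_2 = x,
     \qquad Z^a f_a = f_1 - f_2 = 0 \quad\text{with } Z = (1, -1), \\
  &\delta (t_1 - t_2) = x - x = 0
     \qquad\text{(a cycle in r-degree 1)}, \\
  &\delta (g\, t_1 t_2) = g\,(x\, t_2 - t_1\, x) = -\,g\, x\,(t_1 - t_2)
     \qquad (g \in C^\infty(M)), \\
  &\text{so } t_1 - t_2 = \delta(g\, t_1 t_2) \text{ would require } -g\,x = 1,
     \text{ which no smooth } g \text{ satisfies}. \\
  &\text{Adding one even generator } t_A \text{ with }
     \delta t_A = t_1 - t_2,\; r(t_A) = 2
     \text{ gives } \delta^2 t_A = 0, \\
  &\text{and any degree-1 cycle } g_1 t_1 + g_2 t_2
     \;(\text{i.e. } x (g_1 + g_2) = 0 \Rightarrow g_2 = -g_1)
     \text{ equals } \delta(g_1 t_A).
\end{align*}
```

In this toy case the process stops at r-degree 2, since there is only one identity and hence no “identity on the identities.”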
It is clear that the homology of $\delta$ in $\mathcal{C}$ is given by

$$ H_0(\delta, \mathcal{C}) = L, \qquad H_k(\delta, \mathcal{C}) = 0 \quad (k > 0) $$ [28]

One can also extend the longitudinal derivative $D$ to the whole complex $\mathcal{C}$ because the vector fields $X_\alpha$ are defined throughout $M$, and so the definitions [15] and [16] make sense in $\mathcal{C}$. One defines the action of $D$ on the generators $t$ by requiring that

$$ D\delta + \delta D = 0 $$ [29]

This is easily verified to be possible. However, the (odd) derivation so obtained fails to be a differential in $\mathcal{C}$ when the vector fields $X_\alpha$ do not close off the surface $\Sigma$. In that case, the gauge transformations are not integrable off $\Sigma$; one says that they form an “open algebra.” One has then $D^2 = 0$ only on $\Sigma$, or, more precisely,

$$ D^2 = -\delta s_1 - s_1 \delta $$ [30]

for some (odd) derivation $s_1$ (that vanishes in the “closed algebra” case). But this situation is precisely the one discussed earlier, with the Koszul–Tate differential being indeed $\delta$, as anticipated by the notation, and the longitudinal differential $D$ playing the role of $s_0$ (the degrees also match). Applying the theorem discussed there, we can conclude:

Theorem 2 There exists a differential $s$ in $\mathcal{C}$,

$$ s = \delta + D + s_1 + \cdots, \qquad s^2 = 0 $$ [31]

such that

$$ H^0(s, \mathcal{C}) = C^\infty(\Sigma/\mathcal{O}) $$ [32]

This is an immediate consequence of Theorem 1 and eqns [18] and [28]. The differential $s$ is known in the physical applications described below as the BRST differential.

Hamiltonian BRST Construction

As a first application of the above setting, we consider the Hamiltonian description of gauge systems. As already known, gauge systems are characterized in the Hamiltonian description by constraints and, for this reason, are called “constrained Hamiltonian systems.” Furthermore, the gauge transformations generate gauge orbits on the constraint surface and the physical observables are the functions on the quotient space of the constraint surface by the gauge orbits.

A further important feature arises in the Hamiltonian formalism: the gauge transformations are canonical transformations that are generated by the first-class constraints. Assuming that all the second-class constraints have been eliminated and that the bracket being used is the Dirac bracket, one sees that there is a vector field $X_\alpha$ for each constraint function $f_a$, $\alpha \leftrightarrow a$. (The functions $f_a$ are thus assumed to be independent since the vector fields $X_\alpha$ are assumed to be so. If not, further variables are needed, but the analysis proceeds along the same ideas.)

This implies, in turn, that there is a pairing between the ghosts $C^a$ associated with the longitudinal exterior derivative and the generators $t_a$ of the Koszul–Tate complex. This pairing enables one to extend the bracket structure defined on the phase space to the pairs $(C^a, t_a)$ by declaring that these are canonically conjugate. The variables $t_a$ are the momenta conjugate to the ghosts, $[t_a, C^b] = \delta_a^b$. Accordingly, the complex $\mathcal{C}$ relevant to the Hamiltonian situation,

$$ \mathcal{C} = C^\infty(P) \otimes \Lambda(C^a) \otimes \Lambda(t_a) $$ [33]

has a phase-space structure (here, $P \equiv M$ is the manifold obtained after eliminating the second-class constraints, equipped with the Dirac bracket). The space $\mathcal{C}$ is known as the “extended phase space.” The r-degree is called “antighost number” in the Hamiltonian context.

By the general theorem described in the previous section, one knows that the cohomology at gh = 0 of the BRST differential is isomorphic to the algebra of the observables. Thus, there are two alternative ways to describe this physical algebra, either through reduction, by eliminating the redundant (gauge) variables, or cohomologically in an extended space containing additional variables, the ghosts, and their momenta.

There is an additional interesting feature of the BRST construction in the Hamiltonian case: the BRST transformation is a canonical transformation in the extended phase space, in the sense that

$$ sF = [\Omega, F] $$ [34]

for some “BRST generator” $\Omega$ of ghost number 1 ($F, \Omega \in \mathcal{C}$). The nilpotency $s^2 = 0$ of the BRST differential is equivalent to

$$ [\Omega, \Omega] = 0 $$ [35]

That $s$ is canonically generated implies that the cohomological BRST groups come with a natural bracket structure: the Poisson bracket of the extended phase space passes on to the BRST cohomological groups. In particular, $H^0(s, \mathcal{C})$, equipped with this bracket structure, is isomorphic (as Poisson algebra) to the algebra of physical observables.
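The nilpotency condition on the BRST generator can be checked numerically in a finite-dimensional matrix model, where the graded bracket of two odd objects becomes the anticommutator and $[\Omega, \Omega] = 0$ becomes $\Omega^2 = 0$. The following is a minimal sketch under stated assumptions, none of which come from the article: the gauge algebra is taken to be su(2) with generators $T_a = \sigma_a/2$, the ghosts and their momenta are realized as Jordan–Wigner matrices with $\{c^a, b_b\} = \delta^a_b$, and the generator is normalized as $\Omega = c^a T_a - \tfrac{1}{2} f_{ab}{}^{c}\, c^a c^b b_c$ with $f_{ab}{}^{c} = i\,\epsilon_{abc}$, a common textbook convention. All function names are hypothetical helpers.

```python
from itertools import product

# --- tiny dense-matrix helpers (pure Python, nested lists) ---------------
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    # Kronecker product A (x) B
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def add(A, B, scale=1):
    # A + scale * B
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def dagger(A):
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def is_zero(A, tol=1e-12):
    return all(abs(x) < tol for row in A for x in row)

def chain(factors):
    out = [[1]]
    for f in factors:
        out = kron(out, f)
    return out

# --- ghost sector: three fermionic modes via Jordan-Wigner ---------------
I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
LOWER = [[0, 1], [0, 0]]

# b[k]: ghost momenta, c[k]: ghosts, with {c[i], b[j]} = delta_ij
b = [chain([Z] * k + [LOWER] + [I2] * (2 - k)) for k in range(3)]
c = [dagger(m) for m in b]

# --- su(2) generators T_a = sigma_a / 2, [T_a, T_b] = i eps_abc T_c ------
T = [
    [[0, 0.5], [0.5, 0]],
    [[0, -0.5j], [0.5j, 0]],
    [[0.5, 0], [0, -0.5]],
]

def eps(i, j, k):
    # Levi-Civita symbol for indices in {0, 1, 2}
    return (i - j) * (j - k) * (k - i) / 2

def brst_charge(include_ghost_term=True):
    """Omega = c^a T_a - (i/2) eps_abc c^a c^b b_c on an 8 x 2 = 16-dim space."""
    omega = [[0] * 16 for _ in range(16)]
    for a in range(3):
        omega = add(omega, kron(c[a], T[a]))
    if include_ghost_term:
        for i, j, k in product(range(3), repeat=3):
            e = eps(i, j, k)
            if e:
                ccb = matmul(matmul(c[i], c[j]), b[k])
                omega = add(omega, kron(ccb, I2), scale=-0.5j * e)
    return omega

if __name__ == "__main__":
    full = brst_charge()
    naive = brst_charge(include_ghost_term=False)
    print("Omega^2 = 0 with ghost term:   ", is_zero(matmul(full, full)))
    print("Omega^2 = 0 without ghost term:", is_zero(matmul(naive, naive)))
```

Dropping the term cubic in the ghost variables spoils nilpotency in this model, which mirrors the role of the higher-order pieces of $s$ beyond $\delta + D$ when the constraints do not commute.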
Lagrangian BRST Construction

The analysis of the Lagrangian BRST construction, due to Batalin and Vilkovisky (1981) (“antifield formalism”), proceeds in the same way because the covariant description of the space of observables also involves the same geometric ingredients. The surface $\Sigma$ is now the “stationary surface,” that is, the space of solutions to the equations of motion. The space $M$ in which it is embedded is the space of all field histories. The gauge symmetry acts on this space. Furthermore, the gauge vector fields are tangent to $\Sigma$ since a solution is mapped on a solution by a gauge transformation. The integral submanifolds are the gauge orbits. The observables are the functions on the quotient space.

Since the equations of motion follow from an action principle, there are as many equations as there are fields $\varphi^i$. The corresponding generators $t_a$ in the Koszul–Tate complex (at degree 1) are called “antifields conjugate to the fields” and are denoted $\varphi^*_i$. The r-degree is known as “antifield” (or also “antighost”) number. The gauge symmetry of the action implies Noether identities on the equations of motion. These are, therefore, not independent. According to the above general discussion, there are further generators in the Koszul–Tate complex, at degree 2. More precisely, there are as many new generators in degree 2 as there are Noether identities or independent gauge symmetries. These are called antifields conjugate to the ghosts and denoted $C^*_\alpha$. In the longitudinal complex, one has the ghosts $C^\alpha$, with as many ghosts as there are gauge symmetries. Thus, the BRST complex is the space

$$ \mathcal{C} = C^\infty(M) \otimes \Lambda(C^\alpha) \otimes \Lambda(\varphi^*_i) \otimes S(C^*_\alpha) $$ [36]

where $M$ is the space of all field histories. There is now a natural pairing between the original field variables $\varphi^i$ and the antifields $\varphi^*_i$, as well as between the ghosts $C^\alpha$ and the antifields $C^*_\alpha$. One thus defines a bracket in which the fields $\varphi^i$ and the ghosts $C^\alpha$ on the one hand, and the antifields $\varphi^*_i$ and $C^*_\alpha$ on the other, are declared to be conjugate. This bracket is denoted by parentheses,

$$ (\varphi^i, \varphi^*_j) = \delta^i_j, \qquad (C^\alpha, C^*_\beta) = \delta^\alpha_\beta $$ [37]

However, since the bracket pairs variables with degrees that add up to $-1$, it is in fact an “odd bracket,” called the “antibracket.”

The BRST differential is again canonically generated, but this time in the antibracket,

$$ sF = (S, F), \qquad F \in \mathcal{C} $$ [38]

where the generator $S$ is an even function of the fields, the ghosts and the antifields, with gh = 0 (the ghost number is carried by the odd antibracket). The nilpotency $s^2 = 0$ of the BRST differential is equivalent to the crucial “master equation,”

$$ (S, S) = 0 $$ [39]

Because the BRST differential is canonically generated, there is a natural bracket in cohomology. This bracket is not the Poisson bracket of observables (at gh = 0) because it changes the ghost number by one unit. One can, however, relate it to the Poisson bracket of observables (Barnich and Henneaux 1996); furthermore, it plays an important role in the study of the consistent deformations of the action.

Spacetime Locality

In the context of local field theory, one is often interested in a particular class of functions of the field histories, namely the so-called space of local functionals. A local functional is, by definition, the integral of a local n-form (where n is the spacetime dimension). A local n-form reads, in local coordinates,

$$ \omega = f(x)\, d^n x $$ [40]

where $f(x)$ depends on the fields at $x$ as well as on a finite number of their derivatives. When the ghosts and the antifields are included, the local functions depend on them in the same way.

The previous general cohomological result was derived in the space of all function(al)s, without locality restriction. When changing the space of cochains, one may change the cohomology. For instance, a local functional which is BRST-trivial in the space of all functionals may become nontrivial in the space of local functionals. This indeed happens here because the homology of the Koszul–Tate differential usually no longer vanishes at strictly positive r-degree in the space of local functionals, where it is related to local conservation laws. As a result, the analysis of the BRST cohomology in the space of local functionals is an interesting and nontrivial problem. In particular, the cohomological groups $H^k(s)$ in the space of local functionals may not vanish at negative ghost numbers.

BRST Quantization

The quantization of a dynamical system can proceed along different lines. For gauge models, the path-integral approach is most efficiently pursued in the context of the antifield formalism. We shall briefly outline here the general principles underlying the
operator approach, which is based on the Hamiltonian formalism.

In the operator approach, all the variables, including the ghosts and the conjugate momenta, are realized as operators in a space endowed with a nonpositive-definite inner product (because of the ghosts and the gauge modes). Real dynamical variables become formally Hermitian operators. Ignoring anomalies, the BRST generator $\Omega$ becomes an operator that fulfills the conditions

$$ \Omega^* = \Omega, \qquad \Omega^2 = 0 $$ [41]

(which allows for nontrivial solutions $\Omega \neq 0$ because the inner product is not positive definite). The second relation is a consequence of the classical Poisson bracket relation $[\Omega, \Omega] = 0$ and the fact that the graded Poisson bracket of two odd objects becomes the anticommutator.

To remove the ghost and gauge redundancy, which has no physical content, one must impose a condition that selects physical states. The appropriate condition is motivated by the general cohomological result connecting the BRST cohomology with the algebra of physical observables. One imposes the condition

$$ \Omega |\psi\rangle = 0 $$ [42]

Because of [41], states of the form $\Omega|\chi\rangle$ are solutions of [42], but they have a vanishing inner product with any other physical states, including themselves. They are called null states. The physical states are given by the BRST state cohomology. The physical operators are given by the BRST operator cohomology at gh = 0 and induce a well-defined action in the state cohomology. In particular, the Hamiltonian, being gauge invariant in the original theory, is represented by a BRST cohomological class, so that the time evolution maps physical states on physical states.

The whole scheme is (formally) consistent because exact BRST operators have vanishing matrix elements between states annihilated by the BRST operator $\Omega$, while null states $\Omega|\chi\rangle$ are such that $\langle\psi|A\,\Omega|\chi\rangle = 0$ whenever $A$ is a BRST-closed operator, $[A, \Omega] = 0$, and $|\psi\rangle$ a physical state. Problems may arise, however, if the classical relations $[\Omega, \Omega] = 0$ and $[H, \Omega] = 0$ are not satisfied in presence of extra terms of order $\hbar$, that is,

$$ \Omega^2 \neq 0 \quad\text{or}\quad H\Omega - \Omega H \neq 0 $$ [43]

In such cases, one says that there are anomalies. These are usually fatal to the consistency of the theory.

Some Applications

The number of applications of the BRST formalism is so large that it would be out of place to try to be exhaustive here. Some of its main successes are outlined here, with suggestions for “Further reading.”

Renormalization of Gauge Theories

First, there is the original context of perturbative renormalization and anomalies for gauge theories of the Yang–Mills type. The relevant cohomology here is the BRST cohomology in the space of local functionals involving the fields, the ghosts, and the antifields. The antifields are also known in this context as Zinn-Justin sources for the BRST variations of the fields and ghosts, since Zinn-Justin was the first to introduce them (with that meaning). Many authors have contributed to the full computation of the local BRST cohomology. A review is given in Barnich et al. (2000), where extensions to other theories are also indicated.

String Theory

Modern string theory would be inconceivable without the BRST formalism. This started with the pioneering paper by Kato and Ogawa (1983), where the critical dimension of the bosonic string was derived from the condition that $\Omega^2$ should vanish (quantum mechanically), and where it was shown that the string physical states could be identified with the state BRST cohomology. The reader is referred to excellent monographs on modern string theory (see “Further reading”).

Deformations of Gauge Models

The study of consistent deformations of a given gauge theory (i.e., the problem of introducing consistent couplings) is also efficiently dealt with in the BRST context. References to applications may be found in Henneaux (1998).

See also: Anomalies; Batalin–Vilkovisky Quantization; BF Theories; Constrained Systems; Functional Integration in Quantum Physics; Graded Poisson Algebras; Indefinite Metric; Perturbative Renormalization Theory and BRST; Quantum Chromodynamics; Quantum Field Theory: A Brief Introduction; Renormalization: General Theory; String Field Theory; Supermanifolds; Topological Sigma Models.

Further Reading

Barnich G, Brandt F, and Henneaux M (2000) Local BRST cohomology in gauge theories. Physics Reports 338: 439.
Barnich G and Henneaux M (1996) Isomorphisms between the Batalin–Vilkovisky antibracket and the Poisson bracket. Journal of Mathematical Physics 37: 5273.
Batalin IA and Vilkovisky GA (1977) Relativistic S-matrix of dynamical systems with boson and fermion constraints. Physics Letters B69: 309.
Batalin IA and Vilkovisky GA (1981) Gauge algebra and quantization. Physics Letters B102: 27.
Becchi C, Rouet A, and Stora R (1976) Renormalization of gauge theories. Annals of Physics, NY 98: 287.
Fradkin ES and Vilkovisky GA (1975) Quantization of relativistic systems with constraints. Physics Letters B55: 224.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory, vols. 1 and 2. Cambridge: Cambridge University Press.
Henneaux M (1998) Consistent interactions between gauge fields: the cohomological approach. Contemporary Mathematics 219: 93.
Henneaux M and Teitelboim C (1992) Quantization of Gauge Systems. Princeton: Princeton University Press.
Kato M and Ogawa K (1983) Covariant quantization of strings based on BRS invariance. Nuclear Physics B212: 443.
Kugo T and Ojima I (1979) Local covariant operator formalism of nonabelian gauge theories and quark confinement problem. Progress of Theoretical Physics (Suppl.) 66: 1.
Polchinski J (1998) String Theory, vols. 1 and 2. Cambridge: Cambridge University Press.
Stasheff JD (1998) The (secret?) homological algebra of the Batalin–Vilkovisky approach. Contemporary Mathematics 219: 195.
Tyutin IV (1975) Gauge invariance in field theory and statistical physics in the operator formalism. Preprint Lebedev-75-39.
C
C*-Algebras and their Classification
G A Elliott, University of Toronto, Toronto, Canada
© 2006 Elsevier Ltd. All rights reserved.

The study of algebras of Hilbert space operators, closed under the adjoint operation and in the weak operator topology, was begun by John von Neumann shortly after the discovery of quantum mechanics, and partly with the aim of understanding the monolithic ideas proposed by Heisenberg and Schrödinger.

Seventy-five years later, the theory of these algebras has become a monolith in its own right (see von Neumann Algebras: Introduction, Modular Theory and Classification Theory; von Neumann Algebras: Subfactor Theory), with more internal structure and with more external reference to physics and, as it turns out, to other areas of mathematics than could possibly have been imagined at the outset. (The most striking example of an application to mathematics is perhaps the discovery of the Jones knot polynomial (see The Jones Polynomial); note that this has also had repercussions for physics.)

Twenty-five years after the beginning of the theory of von Neumann algebras, as these algebras are now called, Gelfand and Naimark noticed that a second class of algebras of operators on a Hilbert space, closed under the adjoint operation, was worthy of study, namely those closed in the norm topology. Gelfand and Naimark made two important discoveries concerning this class of operator algebras, now called C*-algebras.

First, Gelfand and Naimark showed that, in the commutative case, at least when the C*-algebra is considered only up to isomorphism – with its identity as a concrete algebra of operators suppressed – the information contained in a C*-algebra is purely topological. More precisely, Gelfand and Naimark showed that the category of unital commutative C*-algebras, with unit-preserving algebra homomorphisms (these necessarily preserve the adjoint operation), is equivalent in a contravariant way (i.e., with reversal of arrows) to the category of compact Hausdorff spaces, with continuous maps. The compact space associated with a unital commutative C*-algebra under the Gelfand–Naimark correspondence may be viewed as the space of maximal proper ideals, with a natural topology (the hull-kernel, or Jacobson, topology), and is called the spectrum. This space may also be viewed as the set of (unital, linear, multiplicative) maps from the algebra into the complex numbers, in which case the topology is that of pointwise convergence.

Second, using this result, Gelfand and Naimark proved that arbitrary C*-algebras could be axiomatized in a simple way abstractly, as *-algebras – that is, as algebras over the complex numbers with a conjugate linear anti-automorphism of order 2 – with certain special properties. It is now known that the only property that needs to be assumed is the existence of a (necessarily unique) Banach space norm related to the *-algebra structure by means of the so-called C*-algebra identity:

$$ \|x^* x\| = \|x^*\|\,\|x\| $$ [1]

This is clearly related to – and in fact implies – the normed algebra inequality

$$ \|x y\| \le \|x\|\,\|y\| $$ [2]

One reason that the Gelfand–Naimark axiomatization of C*-algebras is important is that it underlines how natural it is to consider a C*-algebra abstractly, i.e., independently of any particular representation. Indeed, while one of the fundamental phenomena of von Neumann algebra theory (discovered by Murray and von Neumann) is that, essentially – in rather a strong sense – there is only one way to represent a given von Neumann algebra on a Hilbert space (and there is even a canonical way, called the standard representation!), it is an equally fundamental phenomenon of C*-algebra theory that, except in extremely special cases, this is no longer true.

For instance, although the C*-algebra of compact operators on a given Hilbert space has, up to unitary equivalence, only a single irreducible representation – this is what underlies the fact, proved by von Neumann, referred to as the uniqueness of the
Heisenberg commutation relations for a quantum-mechanical system with finitely many degrees of freedom – as soon as one considers a physical system with infinitely many degrees of freedom, one finds that the naturally associated C*-algebra has infinitely many – indeed, uncountably many – unitary equivalence classes of irreducible representations, and it is impossible to parametrize these in any reasonable way.

This striking dichotomy presents itself also in other contexts, more elementary perhaps than the physics of infinitely many degrees of freedom. Consider the dynamical system consisting of a circle and a fixed rotation acting on it. If the rotation is of finite order – i.e., if the angle is a rational multiple of $2\pi$ – then the naturally associated C*-algebra is relatively easy to study. In the case of angle zero, it is the unital commutative C*-algebra with Gelfand–Naimark spectrum the torus. In the general case of a rational angle, the space of unitary equivalence classes of irreducible representations is still naturally parametrized by the torus. (And this is the same as the space of primitive ideals – the kernels of the irreducible representations – with the Jacobson topology.)

In the irrational case – the case of a rotation by an irrational multiple of $2\pi$ (still elementary from a geometrical point of view; note that the calendar is based on such a system!) – the irreducible representations are no longer parametrized up to unitary equivalence by the torus – and the space of primitive ideals consists of a single point – the C*-algebra is simple. (But it is decidedly not simple to study!)

This fundamental dichotomy in the classification of C*-algebras – conjectured by Gaarding and Wightman in the quantum-mechanical setting and by Mackey in the geometrical one – was established by Glimm. Glimm proved (in the setting of separability; most of his results were generalized later to the nonseparable case) that a large number of a priori different ways that a C*-algebra could behave well were in fact one and the same behavior: either all present for a given C*-algebra, or all catastrophically absent!

Some of the properties considered by Glimm, and shown to be equivalent (for a separable C*-algebra), were as follows. First of all, every representation of the C*-algebra on a Hilbert space should be of type I, i.e., should generate a von Neumann algebra of type I. (A von Neumann algebra was said by Murray and von Neumann to be of type I if it contained a minimal projection of central support one, i.e., a projection not contained in a proper direct summand and minimal with this property.) Second, in every irreducible representation (not necessarily injective) on a Hilbert space, the image of the C*-algebra should contain the compact operators. Third, any two irreducible representations with the same kernel should be unitarily equivalent. Fourth, it should be possible to parametrize the unitary equivalence classes of irreducible representations by a real number in a natural way (respecting the natural Borel structure introduced by Mackey).

The first of the equivalent properties listed above, that all representations of a C*-algebra should be of type I, suggested a name for the property – that the C*-algebra itself should be of type I. This property of a C*-algebra, identified by Glimm – or, rather, its opposite, which as mentioned above is much more common (just as irrational numbers are more common than rationals, or systems with infinitely many degrees of freedom are, at least in theory, much more common than those with finitely many degrees of freedom) – is a fundamental unifying principle of nature.

Besides commutative C*-algebras – as mentioned above, just another way of looking at topological spaces (compact Hausdorff spaces, that is) – and besides the C*-algebra associated to a rotation or to a physical system with infinitely many degrees of freedom, what are some of the naturally occurring examples of C*-algebras – of type I or not?

First, let us take a closer look at what arises from a system with infinitely many degrees of freedom – in the fermion case. As shown by Jordan and Wigner, one obtains what, as a C*-algebra, is very easy to describe, namely, just the infinite tensor product in the category of unital C*-algebras of copies of the algebra of $2 \times 2$ matrices over the complex numbers. As it happens, in work earlier than that referred to above, Glimm had considered such infinite tensor product C*-algebras, also allowing the components to be matrix algebras of order different from two. This raised a problem of classification – for those C*-algebras, all of which were simple and not of type I. (The only simple unital C*-algebra of type I is a single matrix algebra, or a finite tensor product of matrix algebras!)

In a pioneering classification paper (the first paper on the classification of C*-algebras being perhaps that of Gelfand and Naimark, in which the commutative case was described), Glimm obtained the classification of infinite tensor products of matrix algebras, showing that it was a direct extension of the classification of finite tensor products, i.e., just of the matrix algebras themselves. As described later by Dixmier, Glimm’s classification was as follows. Given a sequence $n_1, n_2, \ldots$ of natural numbers (equal to one or more), form the infinite product in a natural way – just by keeping track of the total number of times each prime number appears in the
finite products $n_1 \cdots n_k$ (a multiplicity which may be either finite or infinite). Call such a formal infinite product a generalized integer – or, perhaps, a supernatural number! Two (countably) infinite tensor products of matrix algebras are isomorphic (just as in the finite tensor product case) if and only if the corresponding supernatural numbers are equal.

In formulating Glimm’s classification of infinite tensor products of matrix algebras in this way, Dixmier pointed out that each supernatural number determines a subgroup of the rational numbers (those with denominator dividing the supernatural number) and that every subgroup of the rational numbers containing the integers arises in this way. He then gave an alternative derivation of Glimm’s theorem by recovering this subgroup of the rational numbers as a natural invariant of the algebra, namely, as the subgroup generated by the values on projections of the unique normalized trace. (By a trace is meant here a unitarily invariant positive linear functional.) This could even be interpreted as an alternative statement of Glimm’s theorem.

Soon afterwards, Bratteli considered an extension of Glimm’s class of C*-algebras, namely, the inductive limits of arbitrary sequences of finite-dimensional C*-algebras, and gave a classification of these algebras in terms of the embedding multiplicity data in the sequences. This was exactly analogous to the original classification of Glimm, but now vastly more complex, with the multiplicity data of the sequence encoded in what is now called a Bratteli diagram. (Note that a finite-dimensional C*-algebra is just a direct sum of matrix algebras over the complex numbers.) Bratteli diagrams have proved to be very important, and in particular have been shown by Putnam and others to be useful for the study of minimal homeomorphisms of the Cantor set.

Bratteli’s extension of Glimm’s tensor product classification was followed by a corresponding extension by the present author of Dixmier’s approach to Glimm’s result. It was no longer possible to express the appropriate data in terms of traces (even in the case of a unique normalized trace). Instead, the present author recalled the concept of equivalence of projections introduced by Murray and von Neumann forty years earlier, together with the fact, proved by Murray and von Neumann, that equivalence is compatible with addition of orthogonal projections. (Two projections in a *-algebra are equivalent if they are equal to $x^* x$ and $x x^*$ for some element $x$.) The resulting elementary invariant – the set of equivalence classes of projections with the operation of addition whenever defined (whenever the equivalence classes to be added have orthogonal representatives) – one might refer to this as a local abelian semigroup – which was used by Murray and von Neumann to divide von Neumann algebras into what they called types I, II, and III – was shown by the author to determine Bratteli’s algebras up to isomorphism.

Bratteli called his algebras approximately finite-dimensional C*-algebras, or AF algebras. The author referred to his invariant simply as the range of the (abstract) dimension, and pointed out that this structure determined an enveloping ordered abelian group, which he called the dimension group. It was soon noticed that the dimension group was related to the K-group introduced by Grothendieck in algebraic geometry (see K-Theory), and by Atiyah and Hirzebruch (see K-Theory) in topology.

Grothendieck’s K-group was defined for an arbitrary ring with unit, and Atiyah and Hirzebruch in effect considered the special case of the ring of continuous functions on a compact Hausdorff space – in other words, a commutative C*-algebra – in the process showing that the deep phenomenon of Bott periodicity could be expressed in terms of this invariant. The invariant itself (see below) is essentially the same as that of Murray and von Neumann. In the special case that the ring is an AF algebra, the K-group coincides with the dimension group. (The K-group has a natural ordered, or pre-ordered, structure, although this was often suppressed.)

Let us consider the definition of the K-group of a not necessarily unital C*-algebra; it is in this setting that the statement of Bott periodicity attains its simplest form.

First, in the unital case, one constructs the abelian local semigroup (addition just partially defined) of Murray–von Neumann equivalence classes of projections, as described above in the case of an AF algebra. Let us call this the dimension range. As stated above, for AF algebras this is all that needs to be done – the enveloping group of the dimension range is already the K-group. In the general case, one must repeat the construction for the algebra of $2 \times 2$ matrices over the given algebra, with the given algebra considered as embedded as the upper left-hand corner of the matrix algebra. The dimension range of the given algebra then maps naturally into (but not necessarily onto) the dimension range of the matrix algebra. One should then repeat this construction, doubling the order of the matrix algebra at every stage (or, alternatively, increasing it just by one). The enveloping group of the (algebraic) inductive limit of this sequence of local semigroups is then the K-group of the given algebra. (Alternatively, one may just consider immediately the *-algebra of all infinite matrices over the given
C*-algebra with only finitely many nonzero entries, and form the dimension range of this *-algebra – and the enveloping group of this abelian local semigroup, now in fact a semigroup.)

In the case of a nonunital C*-algebra, one adjoins a unit (as may be done, for instance, by representing the C*-algebra faithfully on a Hilbert space, and showing that the C*-algebra obtained by adjoining the identity operator is independent of the representation – actually, one need only check that the *-algebra structure is unique, as the C*-algebra norm on a C*-algebra is always determined by the *-algebra structure). The K-group of the resulting unital C*-algebra then maps naturally into the K-group of the natural one-dimensional quotient, and the kernel of this map is, for reasons that will become clearer later, defined to be the K-group of the nonunital algebra.

Atiyah and Hirzebruch in fact referred to the K-group of the C*-algebra as K_0 – the reason being that there is another very natural group to consider, namely, the K-group of the suspension of the C*-algebra. (The suspension, SA, of a C*-algebra A is defined as the C*-algebra of all continuous functions from the real line R into A which converge to zero at ±∞, with the pointwise *-algebra operations and the supremum norm. It may also be defined as the (unique) C*-algebra tensor product A ⊗ C_0(R), where C_0(R) denotes the suspension of the C*-algebra C of complex numbers.) Denoting the K_0-group of the suspension of a given C*-algebra by K_1, one might expect this process to continue, but in fact it is periodic (K_0, K_1, K_0, K_1, ...). Bott periodicity states that there is a natural isomorphism of K_2 with K_0. (C*-algebras can also be defined with the field of real numbers as scalars, and in this case the period of Bott periodicity is eight.)

Another way of stating Bott periodicity, or, more precisely, of embedding it into the K-theory of C*-algebras, is as follows. Given a short exact sequence of C*-algebras,

\[ 0 \to J \to A \to A/J \to 0 \tag*{[3]} \]

i.e., given a C*-algebra A and a closed two-sided ideal J (the quotient *-algebra is then a C*-algebra with the quotient norm) – A is sometimes referred to as an extension of J by A/J – consider the natural short (not necessarily exact) sequences

\[ K_0(J) \to K_0(A) \to K_0(A/J) \tag*{[4]} \]

and

\[ K_1(J) \to K_1(A) \to K_1(A/J) \tag*{[5]} \]

(K_0 and K_1 are functors!). There exist natural connecting maps K_1(A/J) → K_0(J) and K_0(A/J) → K_1(J) – the first referred to as the index map, and the second (sometimes referred to as the odd-order index map) obtained from this immediately from Bott periodicity (as stated above) – such that the periodic six-term sequence

\[
\begin{array}{ccccc}
K_0(J) & \to & K_0(A) & \to & K_0(A/J) \\
\uparrow & & & & \downarrow \\
K_1(A/J) & \leftarrow & K_1(A) & \leftarrow & K_1(J)
\end{array}
\]

is exact. (The periodicity stated above can also be recovered from this.)

Given that the functor K_0 classifies AF algebras, one might expect the functor K_1 to be useful for classification purposes also. In fact, this is the case. (Indeed, as shown by Brown, the K_1-functor is already important for the theory of AF algebras – in spite of, or even because of (!), the fact that the K_1-group of an AF algebra is zero.) Using the six-term exact sequence of Bott periodicity described above, corresponding to an extension of C*-algebras, together with results of the present author, Brown showed that any extension of one AF algebra by another is again an AF algebra.

A rather large class of simple unital C*-algebras has by now been classified by means of the invariants K_0 and K_1 – together with the class of the unit in K_0, and the order (or pre-order) structure on K_0 – and also taking into account the compact convex set of tracial states on the C*-algebra (a positive linear functional on a C*-algebra is called a trace if it has the same value on x*x and xx* for every element x, and a tracial state if it is a state, that is, has norm 1, or has value 1 on the unit in the case the algebra has a unit). In addition to the set of tracial states, together with its natural topology and convex structure, one should also keep track of the natural pairing between traces and K_0 (any trace on a unital C*-algebra has the same value on two equivalent projections – equal to x*x and xx* for some element x – and hence gives rise to an additive real-valued functional on K_0).

In terms of these invariants (which might, broadly speaking, be called K-theoretical), it has been possible to classify the simple unital C*-algebras (not of type I) arising as inductive limits (i.e., as the completions of increasing unions) of sequences of finite direct sums of matrix algebras over separable commutative C*-algebras, these assumed to have spectra of dimension at most three, on the one hand (work of the present author together with Guihua Gong and Liangqing Li, a culmination of earlier work of these authors together with a number of others), and, on the other hand, it has been possible (work of Kirchberg and Phillips, also based on earlier work by a number of authors) to classify the

C*-algebra tensor products (in a natural sense) of these C*-algebras with what is called the Cuntz C*-algebra O_∞ (see below). In the first of these two cases, the compact convex set of tracial states – always a Choquet simplex – is an arbitrary (metrizable) such space.

In the second case, this space is empty (as it is for O_∞ in particular). In both cases, K_0 and K_1 are arbitrary countable abelian groups, with the proviso that K_0 is not the sum of a torsion group and a cyclic group. In the first case, the order structure on K_0, the class of the unit element, and the pairing of K_0 with the space of traces have certain special properties; as it turns out, these can be expressed in a simple way. (The class of the unit need only be positive and nonzero.) In the second case, the order structure on K_0 is degenerate – every element is positive – and the class of the unit can be arbitrary (including zero!).

Let us just note that the Cuntz C*-algebra O_∞ is the unital C*-algebra generated by an infinite sequence s_1, s_2, ... of isometries with orthogonal ranges (in other words, elements s_i such that s_i* s_i is the unit and s_j* s_i = 0 if j ≠ i). One need not require the C*-algebra to have the universal property with respect to these generators and relations as it is in fact unique (up to an isomorphism preserving these generators). In particular, this C*-algebra is simple. (If one considers a finite sequence of isometries with orthogonal ranges, and assumes in addition that the sum of these is the unit, one also obtains a simple C*-algebra, the Cuntz C*-algebra O_n, n = 2, 3, ....) The K_0-group and K_1-group of O_∞ are, respectively, Z and 0. (The K_0-group and K_1-group of O_n for n = 2, 3, ... are, respectively, Z/(n − 1)Z and 0.)

Both classes of C*-algebras considered in the classification result stated above, although described in rather a concrete way (in terms of inductive limits and tensor products), can also be characterized axiomatically, in a way that makes it clear that they are, in fact, much more general than they seem. (These axiomatizations are due to Lin and to Kirchberg and Phillips. Typically, the abstract axioms are easier to establish in a given case than the inductive limit form described above.)

In view of this, and the fact that one of the axioms is a notion of amenability (the analogous property for C*-algebras of a notion that has also been considered for von Neumann algebras) and since amenable von Neumann algebras (on a separable Hilbert space) have been classified completely (in remarkable work of Connes, together with many others, starting with Murray and von Neumann – and, one must also mention, ending with Haagerup, who settled a particularly stubborn case), it is natural to ask whether the K-theoretical invariants described above might be sufficient to classify all amenable separable C*-algebras, say, those which are simple and unital.

The work of Villadsen has shown that additional invariants must in fact be considered, if one is to deal with arbitrary amenable simple C*-algebras, and this has been confirmed in subsequent work of Rørdam and of Toms. (Villadsen's examples were obtained by removing the condition of low dimension on the spectra of the commutative C*-algebras appearing in the inductive limit decomposition considered above.) The very nature of these authors' work, however, has been to introduce additional invariants, all of which it seems natural to consider as, broadly speaking, K-theoretical. (And all of which, as it happens, are already familiar.)

The question of the classifiability, in terms of simple invariants (K-theoretical in nature, at least in the broad sense, and including the spectrum which is indispensable in the nonsimple case), of all (separable) amenable C*-algebras would therefore still appear to be on the agenda.

Already, in any case, just like the analogous question for von Neumann algebras (now settled), this question would appear to have had a noticeable influence on the development of the subject – not least in underlining the importance of K-theoretical methods, which have proved to be pertinent both in connection with the index theory of differential operators on geometrical structures – from foliations to fractals – and in connection with questions in physics, related to quantum statistical mechanics (see e.g., Quantum Hall Effect), to quantum field theory (e.g., the standard model), and even to string theory and M-theory.

See also: Axiomatic Quantum Field Theory; Bosons and Fermions in External Fields; The Jones Polynomial; K-Theory; Positive Maps on C*-Algebras; Quantum Hall Effect; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory; von Neumann Algebras: Subfactor Theory.

Further Reading

Davidson KR (1996) C*-Algebras by Example. Fields Institute Monographs, 6. Providence, RI: American Mathematical Society.
Dixmier J (1969) Les C*-Algèbres et leurs Représentations, 2nd edn. Paris: Gauthier–Villars.
Elliott GA (1995) The classification problem for amenable C*-algebras. In: Chatterji SD (ed.) Proceedings of the International Congress of Mathematicians, vols. 1, 2, pp. 922–932. (Zürich, 1994). Basel: Birkhäuser.

Evans DE and Kawahigashi Y (1998) Quantum Symmetries on Operator Algebras. Oxford: Oxford University Press.
Fillmore PA (1996) A User's Guide to Operator Algebras. New York: Wiley.
Kadison RV and Ringrose J (1983–92) Fundamentals of the Theory of Operator Algebras (4 volumes). New York: Academic Press.
Lin H (2001) An Introduction to the Classification of Amenable C*-Algebras. Singapore: World Scientific.
Pedersen GK (1979) C*-Algebras and their Automorphism Groups, London Math. Soc. Monographs. London: Academic Press.
Rørdam M (2002) Classification of Nuclear, Simple C*-Algebras, Encyclopaedia of Mathematical Sciences, vol. 126, pp. 1–145. Berlin: Springer.
Sakai S (1971) C*-Algebras and W*-Algebras. Berlin: Springer.

Calibrated Geometry and Special Lagrangian Submanifolds


D D Joyce, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Calibrated Geometry

“Calibrated geometry,” introduced by Harvey and Lawson (1982), is the study of special classes of “minimal submanifolds” N of a Riemannian manifold (M, g), defined using a closed form φ on M called a calibration. For example, if (M, J, g) is a Kähler manifold with Kähler form ω, then complex k-submanifolds of M are calibrated with respect to φ = ω^k/k!. Another important class of calibrated submanifolds are special Lagrangian submanifolds in Calabi–Yau manifolds, which is the focus of the section “Special Lagrangian geometry.”

Calibrations and Calibrated Submanifolds

We begin by defining “calibrations” and “calibrated submanifolds.”

Definition 1 Let (M, g) be a Riemannian manifold. An “oriented tangent k-plane” V on M is a vector subspace V of some tangent space T_xM to M with dim V = k, equipped with an orientation. If V is an oriented tangent k-plane on M then g|_V is a Euclidean metric on V; so, combining g|_V with the orientation on V gives a natural volume form vol_V on V, which is a k-form on V.

Now let φ be a closed k-form on M. φ is said to be a calibration on M, if for every oriented k-plane V on M, φ|_V ≤ vol_V. Here, φ|_V = α · vol_V for some α ∈ R, and φ|_V ≤ vol_V if α ≤ 1. Let N be an oriented submanifold of M with dimension k. Then each tangent space T_xN for x ∈ N is an oriented tangent k-plane. We say that N is a calibrated submanifold if φ|_{T_xN} = vol_{T_xN} for all x ∈ N.

It is easy to show that calibrated submanifolds are automatically “minimal submanifolds.” We prove this in the compact case, but noncompact calibrated submanifolds are locally volume-minimizing as well.

Proposition 2 Let (M, g) be a Riemannian manifold, φ a calibration on M, and N a compact φ-submanifold in M. Then N is volume-minimizing in its homology class.

Proof Let dim N = k, and let [N] ∈ H_k(M, R) and [φ] ∈ H^k(M, R) be the homology and cohomology classes of N and φ. Then

\[ [\varphi]\cdot[N] = \int_{x\in N} \varphi|_{T_xN} = \int_{x\in N} \mathrm{vol}_{T_xN} = \mathrm{Vol}(N) \]

since φ|_{T_xN} = vol_{T_xN} for each x ∈ N, as N is a calibrated submanifold. If N′ is any other compact k-submanifold of M with [N′] = [N] in H_k(M, R), then

\[ [\varphi]\cdot[N] = [\varphi]\cdot[N'] = \int_{x\in N'} \varphi|_{T_xN'} \le \int_{x\in N'} \mathrm{vol}_{T_xN'} = \mathrm{Vol}(N') \]

since φ|_{T_xN′} ≤ vol_{T_xN′} because φ is a calibration. The last two equations give Vol(N) ≤ Vol(N′). Thus, N is volume-minimizing in its homology class. □

Now let (M, g) be a Riemannian manifold with a calibration φ, and let ι : N → M be an immersed submanifold. Whether N is a φ-submanifold depends upon the tangent spaces of N. That is, it depends on ι and its first derivative. So, for N to be calibrated with respect to φ is a first-order partial differential equation on ι. But if N is calibrated then N is minimal, and for N to be minimal is a second-order partial differential equation on ι.

One moral is that the calibrated equations, being first order, are often easier to solve than the minimal submanifold equations, which are second order. So calibrated geometry is a fertile source of examples of minimal submanifolds.

Calibrated Submanifolds and Special Holonomy

A calibration φ on (M, g) is only interesting if there exist plenty of φ-submanifolds N in M, locally or globally. Since φ|_{T_xN} = vol_{T_xN} for each x ∈ N, φ-submanifolds will be abundant only if the family F_φ of calibrated tangent k-planes V with φ|_V = vol_V

is “reasonably large” – say, if F_φ has small codimension in the family of all tangent k-planes V on M. A maximally boring example is the k-form φ = 0, which is a calibration but has no calibrated tangent k-planes, so no φ-submanifolds.

Thus, most calibrations φ will have few or no φ-submanifolds, and only special calibrations φ with F_φ large will have interesting calibrated geometries. Now the field of Riemannian holonomy groups is a natural companion for calibrated geometry, because it gives a simple way to generate interesting calibrations φ which automatically have F_φ large.

Let G ⊂ O(n) be a possible holonomy group of a Riemannian metric. In particular, we can take G to be one of the holonomy groups U(m), SU(m), Sp(m), G_2, or Spin(7) from Berger's classification. Then G acts on the k-forms Λ^k(R^n)* on R^n, so we can look for G-invariant k-forms on R^n. Suppose φ_0 is a nonzero, G-invariant k-form on R^n.

By rescaling φ_0 we can arrange that for each oriented k-plane U ⊂ R^n, we have φ_0|_U ≤ vol_U, and that φ_0|_U = vol_U for at least one such U. Let H be the stabilizer subgroup of this U in G. Then φ_0|_{γ·U} = vol_{γ·U} by G-invariance, so γ · U is a calibrated k-plane for all γ ∈ G. Thus, the family F_0 of φ_0-calibrated k-planes in R^n contains G/H, so it is “reasonably large,” and it is likely that the calibrated submanifolds will have an interesting geometry.

Now let M be a manifold of dimension n, and g a metric on M with Levi-Civita connection ∇ and holonomy group G. Then there is a k-form φ on M with ∇φ = 0, corresponding to φ_0. Hence dφ = 0, and φ is closed. Also, the condition φ_0|_U ≤ vol_U for all oriented k-planes U in R^n implies that φ|_V ≤ vol_V for all oriented tangent k-planes V in M. Thus, φ is a calibration on M. The family F_φ of calibrated tangent k-planes on M fibers over M with fiber F_0; so, it is “reasonably large.”

This gives a general method for finding interesting calibrations on manifolds with reduced holonomy. Here are the most significant examples.

• Let G = U(m) ⊂ O(2m). Then G preserves a 2-form ω_0 on R^{2m}. If g is a metric on M with holonomy U(m), then g is Kähler with complex structure J, and the 2-form ω on M associated to ω_0 is the Kähler form of g. One can show that ω is a calibration on (M, g), and the calibrated submanifolds are exactly the “holomorphic curves” in (M, J). More generally, ω^k/k! is a calibration on M for 1 ≤ k ≤ m, and the corresponding calibrated submanifolds are the complex k-dimensional submanifolds of (M, J).
• Let G = SU(m) ⊂ O(2m). Then G preserves a complex volume form Ω_0 = dz_1 ∧ ··· ∧ dz_m on C^m. Thus, a Calabi–Yau m-fold (M, g) with Hol(g) = SU(m) has a holomorphic volume form Ω. The real part Re Ω is a calibration on M, and the corresponding calibrated submanifolds are called special Lagrangian submanifolds.
• The group G_2 ⊂ O(7) preserves a 3-form φ_0 and a 4-form ∗φ_0 on R^7. Thus, a Riemannian 7-manifold (M, g) with holonomy G_2 comes with a 3-form φ and 4-form ∗φ, which are both calibrations. The corresponding calibrated submanifolds are called associative 3-folds and coassociative 4-folds.
• The group Spin(7) ⊂ O(8) preserves a 4-form Ω_0 on R^8. Thus a Riemannian 8-manifold (M, g) with holonomy Spin(7) has a 4-form Ω, which is a calibration. The Ω-submanifolds are called Cayley 4-folds.

It is an important general principle that to each calibration φ on an n-manifold (M, g) with special holonomy constructed in this way, there corresponds a constant calibration φ_0 on R^n. Locally, φ-submanifolds in M resemble the φ_0-submanifolds in R^n, and have many of the same properties. Thus, to understand the calibrated submanifolds in a manifold with special holonomy, it is often a good idea to start by studying the corresponding calibrated submanifolds of R^n.

In particular, singularities of φ-submanifolds in M will be locally modeled on singularities of φ_0-submanifolds in R^n. (In the sense of geometric measure theory, the tangent cone at a singular point of a φ-submanifold in M is a conical φ_0-submanifold in R^n.) So by studying singular φ_0-submanifolds in R^n, we may understand the singular behavior of φ-submanifolds in M.

Special Lagrangian Geometry

We now focus on one class of calibrated submanifolds, special Lagrangian submanifolds in Calabi–Yau manifolds. Calabi–Yau 3-folds are used to make the spacetime vacuum in string theory, and special Lagrangian 3-folds are the classical versions of A-branes, or supersymmetric 3-cycles, in Calabi–Yau 3-folds. Special Lagrangian geometry aroused great interest amongst string theorists because of its rôle in the SYZ conjecture, providing a geometric basis for “mirror symmetry” of Calabi–Yau 3-folds.

Calabi–Yau Manifolds

Here is our definition of Calabi–Yau manifold. Readers are warned that there are several different definitions of Calabi–Yau manifolds in use in the literature. Ours is unusual in regarding Ω as part of the given structure.
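As a small worked check of the SU(m) example above, the following sketch evaluates the flat forms on the simplest candidate plane, R^m inside C^m, and on its rotations by a phase. The coordinates z_j = x_j + i y_j, the plane U, the family U_θ, and the choice of orientation (the one induced by the coordinates x_1, ..., x_m) are assumptions introduced here for illustration; they are not part of the original text.

```latex
% Flat model on C^m with coordinates z_j = x_j + i y_j (illustrative notation).
\[
\omega_0 = \sum_{j=1}^{m} dx_j \wedge dy_j ,
\qquad
\Omega_0 = dz_1 \wedge \cdots \wedge dz_m .
\]
% On U = R^m = { y_1 = ... = y_m = 0 } we have dz_j|_U = dx_j and dy_j|_U = 0, so
\[
\Omega_0|_U = dx_1 \wedge \cdots \wedge dx_m = \mathrm{vol}_U ,
\qquad
\operatorname{Im}\Omega_0|_U = 0 ,
\qquad
\omega_0|_U = 0 .
\]
% Hence U is calibrated by Re(Omega_0). Rotating every coordinate by the same
% phase, U_theta = e^{i theta} U, gives dz_j|_{U_theta} = e^{i theta} dx_j, so
\[
\operatorname{Re}\Omega_0|_{U_\theta}
  = \cos(m\theta)\, \mathrm{vol}_{U_\theta}
  \le \mathrm{vol}_{U_\theta} ,
\]
% with equality exactly when m*theta is an integer multiple of 2*pi.
```

The last line illustrates the inequality φ_0|_U ≤ vol_U in the definition of a calibration: among the planes U_θ, only those with cos(mθ) = 1 are calibrated by Re Ω_0, while ω_0 vanishes on all of them.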

Definition 3 Let m ≥ 2. A Calabi–Yau m-fold is a quadruple (M, J, g, Ω) such that (M, J) is a compact m-dimensional complex manifold, g a Kähler metric on (M, J) with Kähler form ω, and Ω a holomorphic (m, 0)-form on M called the holomorphic volume form, which satisfies

\[ \omega^m/m! = (-1)^{m(m-1)/2}\,(i/2)^m\, \Omega \wedge \bar{\Omega} \tag*{[1]} \]

The constant factor in [1] is chosen to make Re Ω a calibration. It follows from [1] that g is Ricci-flat, Ω is constant under the Levi-Civita connection, and the holonomy group of g has Hol(g) ⊆ SU(m).

Let (M, J) be a compact, complex manifold, and g a Kähler metric on M, with Ricci curvature R_ab. Define the Ricci form ρ of g by ρ_ac = J_a^b R_bc. Then ρ is a closed real (1, 1)-form on M, with de Rham cohomology class [ρ] = 2π c_1(M) ∈ H^2(M, R), where c_1(M) is the first Chern class of M in H^2(M, Z). The Calabi conjecture specifies which closed (1, 1)-forms can be the Ricci forms of a Kähler metric on M.

The Calabi conjecture Let (M, J) be a compact, complex manifold, and g_0 a Kähler metric on M, with Kähler form ω_0. Suppose that ρ is a real, closed (1, 1)-form on M with [ρ] = 2π c_1(M). Then there exists a unique Kähler metric g on M with Kähler form ω, such that [ω] = [ω_0] ∈ H^2(M, R), and the Ricci form of g is ρ.

Note that [ω] = [ω_0] says that g and g_0 are in the same Kähler class. The conjecture was posed by Calabi in 1954, and was eventually proved by Yau in 1976. Its importance to us is that when the canonical bundle K_M is trivial, so that c_1(M) = 0, we can take ρ ≡ 0, and then g is Ricci-flat. Since K_M is trivial, it has a nonzero holomorphic section, a holomorphic (m, 0)-form Ω. As g is Ricci-flat, it follows that ∇Ω = 0, where ∇ is the Levi-Civita connection of g. Rescaling Ω by a complex constant makes [1] hold, and then (M, J, g, Ω) is a Calabi–Yau m-fold. This proves:

Theorem 4 Let (M, J) be a compact complex m-manifold with K_M trivial. Then every Kähler class on M contains a unique Ricci-flat Kähler metric g. There exists a holomorphic (m, 0)-form Ω, unique up to change of phase Ω ↦ e^{iθ}Ω, such that (M, J, g, Ω) is a Calabi–Yau m-fold.

Using algebraic geometry, one can produce many examples of complex m-folds (M, J) satisfying these conditions, such as the Fermat (m + 2)-tic

\[ \bigl\{ [z_0, \ldots, z_{m+1}] \in \mathbb{CP}^{m+1} : z_0^{m+2} + \cdots + z_{m+1}^{m+2} = 0 \bigr\} \tag*{[2]} \]

Therefore, Calabi–Yau m-folds are very abundant.

Special Lagrangian Submanifolds

Definition 5 Let (M, J, g, Ω) be a Calabi–Yau m-fold. Then Re Ω is a calibration on the Riemannian manifold (M, g). An oriented real m-dimensional submanifold N in M is called a special Lagrangian submanifold (SL m-fold) if it is calibrated with respect to Re Ω.

Here is an alternative definition of SL m-folds. It is often more useful than Definition 5.

Proposition 6 Let (M, J, g, Ω) be a Calabi–Yau m-fold, with Kähler form ω, and N a real m-dimensional submanifold in M. Then N admits an orientation making it into an SL m-fold in M if and only if ω|_N ≡ 0 and Im Ω|_N ≡ 0.

Regard N as an immersed submanifold, with immersion ι : N → M. Then [ω|_N] and [Im Ω|_N] are unchanged under continuous variations of the immersion ι. Thus, [ω|_N] = [Im Ω|_N] = 0 is a necessary condition not just for N to be special Lagrangian, but also for any isotopic submanifold N′ in M to be special Lagrangian. This proves:

Corollary 7 Let (M, J, g, Ω) be a Calabi–Yau m-fold, and N a compact real m-submanifold in M. Then a necessary condition for N to be isotopic to a special Lagrangian submanifold N′ in M is that [ω|_N] = 0 in H^2(N, R) and [Im Ω|_N] = 0 in H^m(N, R).

Deformations of Compact SL m-Folds

The deformation theory of compact special Lagrangian manifolds was studied by McLean (1998), who proved the following result:

Theorem 8 Let (M, J, g, Ω) be a Calabi–Yau m-fold, and N a compact special Lagrangian m-fold in M. Then the moduli space M_N of special Lagrangian deformations of N is a smooth manifold of dimension b_1(N), the first Betti number of N.

Sketch proof. Suppose for simplicity that N is an embedded submanifold. There is a natural orthogonal decomposition TM|_N = TN ⊕ ν, where ν → N is the normal bundle of N in M. As N is Lagrangian, the complex structure J : TM → TM gives an isomorphism J : ν → TN. But the metric g gives an isomorphism TN ≅ T*N. Composing these two gives an isomorphism ν ≅ T*N.

Let T be a small tubular neighborhood of N in M. Then we can identify T with a neighborhood of the zero section in ν. Using the isomorphism ν ≅ T*N, we have an identification between T and a neighborhood of the zero section in T*N. This can be chosen to identify the Kähler form ω on T with the natural symplectic

structure on T*N. Let π : T → N be the obvious projection.

Under this identification, submanifolds N′ in T ⊂ M which are C^1 close to N are identified with the graphs of small smooth sections α of T*N. That is, submanifolds N′ of M close to N are identified with 1-forms α on N. We need to know: which 1-forms α are identified with SL m-folds N′?

Now, N′ is special Lagrangian if ω|_{N′} ≡ Im Ω|_{N′} ≡ 0. But π|_{N′} : N′ → N is a diffeomorphism, so we can push ω|_{N′} and Im Ω|_{N′} down to N, and regard them as functions of α. Calculation shows that

\[ \pi_*(\omega|_{N'}) = d\alpha \quad\text{and}\quad \pi_*(\operatorname{Im}\Omega|_{N'}) = F(\alpha, \nabla\alpha) \]

where F is a nonlinear function of its arguments. Thus, the moduli space M_N is locally isomorphic to the set of small 1-forms α on N such that dα ≡ 0 and F(α, ∇α) ≡ 0.

Now it turns out that F satisfies F(α, ∇α) ≈ d(∗α) when α is small. Therefore, M_N is locally approximately isomorphic to the vector space of 1-forms α with dα = d(∗α) = 0. But by Hodge theory, this is isomorphic to the de Rham cohomology group H^1(N, R), and is a manifold with dimension b_1(N).

To carry out this last step rigorously requires some technical machinery: one must work with certain Banach spaces of sections of T*N, Λ^2 T*N and Λ^m T*N, use elliptic regularity results to prove that the map α ↦ (dα, F(α, ∇α)) has closed image in these Banach spaces, and then use the implicit function theorem for Banach spaces to show that the kernel of the map is what is expected.

Obstructions to Existence of Compact SL m-Folds

Let {(M, J_t, g_t, Ω_t) : t ∈ (−ε, ε)} be a smooth one-parameter family of Calabi–Yau m-folds. Suppose N_0 is an SL m-fold in (M, J_0, g_0, Ω_0). When can we extend N_0 to a smooth family of SL m-folds N_t in (M, J_t, g_t, Ω_t) for t ∈ (−ε, ε)?

By Corollary 7, a necessary condition is that [ω_t|_{N_0}] = [Im Ω_t|_{N_0}] = 0 for all t. Our next result shows that locally, this is also a sufficient condition.

Theorem 9 Let {(M, J_t, g_t, Ω_t) : t ∈ (−ε, ε)} be a smooth one-parameter family of Calabi–Yau m-folds, with Kähler forms ω_t. Let N_0 be a compact SL m-fold in (M, J_0, g_0, Ω_0), and suppose that [ω_t|_{N_0}] = 0 in H^2(N_0, R) and [Im Ω_t|_{N_0}] = 0 in H^m(N_0, R) for all t ∈ (−ε, ε). Then N_0 extends to a smooth one-parameter family {N_t : t ∈ (−δ, δ)}, where 0 < δ ≤ ε and N_t is a compact SL m-fold in (M, J_t, g_t, Ω_t).

This can be proved using similar techniques to Theorem 8. Note that the condition [Im Ω_t|_{N_0}] = 0 for all t can be satisfied by choosing the phases of the Ω_t appropriately, and if the image of H_2(N, Z) in H_2(M, R) is zero, then the condition [ω|_N] = 0 holds automatically.

Thus, the obstructions [ω_t|_{N_0}] = [Im Ω_t|_{N_0}] = 0 in Theorem 9 are actually fairly mild restrictions, and SL m-folds should be considered as pretty stable under small deformations of the Calabi–Yau structure.

Remark The deformation and obstruction theory of compact SL m-folds are extremely well behaved compared to many other moduli space problems in differential geometry. In other geometric problems (such as the deformations of complex structures on a complex manifold, or pseudoholomorphic curves in an almost-complex manifold, or instantons on a Riemannian 4-manifold), the deformation theory often has the following general structure.

There are vector bundles E, F over a compact manifold M, and an elliptic operator P : C^∞(E) → C^∞(F), usually first order. The kernel Ker P is the set of infinitesimal deformations, and the cokernel Coker P the set of obstructions. The actual moduli space M is locally the zeros of a nonlinear map Φ : Ker P → Coker P.

In a generic case, Coker P = 0, and then the moduli space M is locally isomorphic to Ker P, and so is locally a manifold with dimension ind(P). However, in nongeneric situations Coker P may be nonzero, and then the moduli space M may be singular, or have an unexpected dimension.

However, SL m-folds do not follow this pattern. Instead, the obstructions are topologically determined, and the moduli space is always smooth, with dimension given by a topological formula. This should be regarded as a minor mathematical miracle.

Mirror Symmetry and the SYZ Conjecture

Mirror symmetry is a mysterious relationship between pairs of Calabi–Yau 3-folds M, M̂, arising from a branch of physics known as string theory, and leading to some very strange and exciting conjectures about Calabi–Yau 3-folds, many of which have been proved in special cases.

In the beginning (the 1980s), mirror symmetry seemed mathematically completely mysterious. But there are now two complementary conjectural theories, due to Kontsevich and Strominger–Yau–Zaslow, which explain mirror symmetry in a fairly mathematical way. Probably both are true, at some level. The second proposal, due to Strominger, Yau, and Zaslow (1996), is known as the SYZ conjecture. Here is an attempt to state it.

The SYZ conjecture Suppose M and M̂ are mirror Calabi–Yau 3-folds. Then (under some additional conditions), there should exist a compact topological 3-manifold B and surjective, continuous maps f : M → B and f̂ : M̂ → B, such that

(i) There exists a dense open set B_0 ⊂ B, such that for each b ∈ B_0, the fibers f^{-1}(b) and f̂^{-1}(b) are nonsingular special Lagrangian 3-tori T^3 in M and M̂. Furthermore, f^{-1}(b) and f̂^{-1}(b) are in some sense dual to one another.
(ii) For each b ∈ Δ = B∖B_0, the fibers f^{-1}(b) and f̂^{-1}(b) are expected to be singular special Lagrangian 3-folds in M and M̂.

The fibrations f and f̂ are called special Lagrangian fibrations, and the set of singular fibers Δ is called the discriminant. In part (i), the nonsingular fibers of f and f̂ are supposed to be dual tori. What does this mean?

On the topological level, we can define duality between two tori T, T̂ to be a choice of isomorphism H^1(T, Z) ≅ H_1(T̂, Z). We can also define duality between tori equipped with flat Riemannian metrics. Write T = V/Λ, where V is a Euclidean vector space and Λ a lattice in V. Then the dual torus T̂ is defined to be V*/Λ*, where V* is the dual vector space and Λ* the dual lattice. However, there is no notion of duality between nonflat metrics on dual tori.

Strominger, Yau, and Zaslow argue only that their conjecture holds when M, M̂ are close to the “large complex structure limit.” In this case, the diameters of the fibers f^{-1}(b), f̂^{-1}(b) are expected to be small compared to the diameter of the base space B, and away from singularities of f, f̂, the metrics on the nonsingular fibers are expected to be approximately flat. So, part (i) of the SYZ conjecture says that for b ∈ B_0, f^{-1}(b) is approximately a flat Riemannian 3-torus, and f̂^{-1}(b) is approximately the dual flat Riemannian torus.

Mathematical research on the SYZ conjecture has followed two broad approaches. The first could be described as symplectic topological. For this, we treat M, M̂ just as symplectic manifolds and f, f̂ just as Lagrangian fibrations. We also suppose B is a smooth 3-manifold and f, f̂ are smooth maps. Under these simplifying assumptions, Mark Gross, Wei-Dong Ruan, and others have built up a beautiful, detailed picture of how dual SYZ fibrations work at the global topological level.

The second approach could be described as local geometric. Here, we try to take the special Lagrangian condition seriously from the outset, and focus on the local behavior of special Lagrangian submanifolds, and especially their singularities, rather than on global topological questions. In addition, we are interested in what fibrations of generic Calabi–Yau 3-folds might look like.

There is now a well-developed theory of SL m-folds with isolated singularities modeled on cones (Joyce 2003a). This is applied to SL fibrations and the SYZ conjecture in Joyce (2003a, b), leading to the tentative conclusions that for generic Calabi–Yau 3-folds M, special Lagrangian fibrations f : M → B will be only piecewise smooth, and have discriminants Δ of real codimension 1 in B, in contrast to smooth fibrations which have Δ of codimension 2. We also argue that for generic mirrors M, M̂ and f, f̂, the discriminants Δ, Δ̂ cannot be homeomorphic and so do not coincide. This contradicts part (ii) above.

A better way to formulate the SYZ conjecture may be in terms of families of mirror Calabi–Yau 3-folds M_t, M̂_t and fibrations f_t : M_t → B, f̂_t : M̂_t → B for t ∈ (0, ε) which approach the “large complex structure limit” as t → 0. Then we could require the discriminants Δ_t, Δ̂_t of f_t, f̂_t to converge to some common, codimension 2 limit Δ_0 as t → 0.

It is an important, and difficult, open problem to construct examples of special Lagrangian fibrations of compact, holonomy SU(3) Calabi–Yau 3-folds. None are currently known.

See also: Minimal submanifolds; Mirror Symmetry: A Geometric Survey; Moduli Spaces: An Introduction; Riemannian Holonomy Groups and Exceptional Holonomy.

Further Reading

Gross M, Huybrechts D, and Joyce D (2003) Calabi–Yau Manifolds and Related Geometries, Universitext Series. Berlin: Springer.
Harvey R and Lawson HB (1982) Calibrated geometries. Acta Mathematica 148: 47–157.
Joyce DD (2000) Compact Manifolds with Special Holonomy. Oxford: Oxford University Press.
Joyce DD (2003a) Special Lagrangian submanifolds with isolated conical singularities. V. Survey and applications. Journal of Differential Geometry 63: 279–347, math.DG/0303272.
Joyce DD (2003b) Singularities of special Lagrangian fibrations and the SYZ conjecture. Communications in Analysis and Geometry 11: 859–907, math.DG/0011179.
Joyce DD (2003c) U(1)-invariant special Lagrangian 3-folds in C^3 and special Lagrangian fibrations. Turkish Mathematical Journal 27: 99–114, math.DG/0206016.
McLean RC (1998) Deformations of calibrated submanifolds. Communications in Analysis and Geometry 6: 705–747.
Strominger A, Yau S-T, and Zaslow E (1996) Mirror symmetry is T-duality. Nuclear Physics B 479: 243–259, hep-th/9606040.
Calogero–Moser–Sutherland Systems of Nonrelativistic and Relativistic Type 403

Calogero–Moser–Sutherland Systems of Nonrelativistic and Relativistic Type

S N M Ruijsenaars, Centre for Mathematics and Computer Science, Amsterdam, The Netherlands
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Systems of Calogero–Moser–Sutherland (CMS) type form a class of finite-dimensional dynamical systems that are integrable both at the classical and at the quantum level. The CMS systems describe $N$ point particles moving on a line or on a ring, interacting via pair potentials that are specific functions of four types, namely rational (I), hyperbolic (II), trigonometric (III), and elliptic (IV). They occur not only in a nonrelativistic (Galilei-invariant), but also in a relativistic (Poincaré-invariant) setting. Thus, one can distinguish a hierarchy of 16 physically distinct versions (classical/quantum, nonrelativistic/relativistic, type I–IV), the most general one being the quantum relativistic type IV system.

The nonrelativistic systems date back to pioneering work by Calogero, Sutherland, and Moser in the early 1970s. The pair potential structure of the interaction can be encoded in the root system $A_{N-1}$, and there also exist integrable versions for all of the remaining root systems. The classical systems are given by $N$ Poisson commuting Hamiltonians with a polynomial dependence on the particle momenta $p_1, \ldots, p_N$. Accordingly, the quantum versions are described by $N$ commuting Hamiltonians that are partial differential operators.

The relativistic systems were introduced in the mid-1980s, at the classical level by Ruijsenaars and Schneider, and at the quantum level by Ruijsenaars. They converge to the nonrelativistic systems in the limit $c \to \infty$, where $c$ is the speed of light. Again, the systems can be related to the root system $A_{N-1}$, and they admit integrable versions for other root systems. All of the commuting classical Hamiltonians depend exponentially on generalized momenta $p_1, \ldots, p_N$. Hence, the associated commuting quantum Hamiltonians are analytic difference operators.

The above integrable systems can be further generalized by allowing supersymmetry or internal degrees of freedom (“spins”), coupled in quite special ways to retain integrability. In this article, however, the focus is on the 16 versions of the $A_{N-1}$-symmetric CMS systems without internal degrees of freedom. The primary aim is to acquaint the reader with their definition and integrability, and with their most prominent features and interrelationships. Second, we intend to give a rough sketch of the state of the art concerning explicit solutions for the various versions. This involves a concretization of the action-angle maps and eigenfunction transforms that simultaneously diagonalize the commuting dynamics, paying special attention to their remarkable duality properties.

It is beyond the scope of this article to review the hundreds of papers specifically dealing with CMS type systems, let alone the much larger literature where they play some role. Indeed, the systems have been encountered in a great many different contexts and they are related to a host of other integrable systems in various ways. Accordingly, they can be studied from the perspective of various subfields of mathematics and theoretical physics. First, some of these perspectives and relations to seemingly quite different topics will be mentioned before embarking on the far more focused survey.

Staying first within the confines of the CMS type systems, some nonobvious limits yielding other familiar finite-dimensional integrable systems will be mentioned. To begin with, all of the $A_{N-1}$ type systems give rise to systems with a Toda type (exponential “nearest neighbor”) interaction via a suitable limiting transition (basically a strong-coupling limit). This leads to integrable $N$-particle systems with a classical/quantum, nonrelativistic/relativistic, nonperiodic/periodic version; starting from the quantum relativistic periodic Toda system, the remaining seven versions can be obtained by suitable limits.

Next, we recall that the quantum system of $N$ nonrelativistic bosons on the line or ring interacting via a pair potential of $\delta$-function type is soluble via a Bethe ansatz, with the “line version” exhibiting quantum soliton behavior (factorized scattering). It has been shown that there exist scaling limits of eigenfunctions for suitable CMS systems that give rise to the latter Bethe type eigenfunctions for $N = 2$, while convergence for $N > 2$ is plausible, but has not been demonstrated thus far.

Via suitable analytic continuations preserving reality/formal self-adjointness, one can arrive at CMS systems with more than one species of particle (particles and “antiparticles”). Likewise, analytic continuations and appropriate limits of CMS systems associated with root systems other than $A_{N-1}$ lead to a further proliferation of $N$-dimensional integrable systems. Typically, such limits refer either
to the commuting Hamiltonians (the Toda limit being a case in point) or to the joint eigenfunctions (as exemplified by the $\delta$-function system limit); it seems difficult to control both sets of quantities at once.

Starting from the spin type CMS systems, another kind of limit can be taken. Specifically, by “freezing” the particles at equilibrium positions, it is possible to arrive at integrable spin chains of Haldane–Shastry and Inozemtsev type.

At this point, it is expedient to insert a brief remark on finite-dimensional integrable systems. As the term suggests, one may expect that, with due effort, such systems can be “integrated,” or, equivalently, “solved.” But it should be noted that the latter terms (let alone the qualifier “due effort”) have no unambiguous mathematical meaning. Certainly, “solving” involves obtaining explicit information on the action-angle map and joint eigenfunction transform at the classical and quantum level, resp., but a priori it is not at all clear how far one can proceed.

Focusing again on the CMS systems and their relatives, it should be stressed that, in many cases, one is still far removed from a complete solution, especially for the elliptic CMS systems. In this regard the previous remark serves not only as a caveat, but also to make clear why the various vantage points provided by different subfields in mathematics and physics are crucial: typically, they yield complementary insights and distinct representations for solutions, serving different purposes.

To be sure, in first approximation the mathematics involved at the classical and quantum level is symplectic geometry and Hilbert space theory, resp. In point of fact, however, far more ingredients have turned out to be quite natural and useful. On the classical level, these include the theory of groups, Lie algebras and symmetric spaces, linear algebra and spectral theory, Riemann surface theory, and more generally algebraic geometry.

On the quantum level, the viewpoint of harmonic analysis on symmetric spaces is particularly natural and fruitful for the nonrelativistic CMS systems and their arbitrary root-system versions, whereas quantum groups/algebras/symmetric spaces can be tied in with the relativistic systems and their versions for other root systems. (The $c \to \infty$ limit amounts to the $q \to 1$ limit in the quantum group picture.) As a matter of fact, the whole area of special functions and their $q$-analogs is intimately related to the quantum CMS type systems (cf. also the last section of this article). Finally, the occurrence of commuting analytic difference operators in the relativistic ($q \neq 1$) systems leads to largely uncharted territory in the intersection of the theory of Hilbert space eigenfunction expansions and the theory of linear analytic difference equations.

The study of the thermodynamics ($N \to \infty$ limit with temperature $\ge 0$ and density $> 0$ fixed) associated with the trigonometric and elliptic CMS systems and their spin cousins yields its own circle of problems. It was initiated by Sutherland three decades ago, and even though a host of results on partition functions, correlation functions, fractional statistics, strong–weak coupling duality, relations to Yangians, etc., have meanwhile been obtained, many questions are still open. This area also has links with random-matrix theory, but the input from this field is thus far limited to certain discrete couplings.

The above $N$-dimensional integrable systems are related to a great many infinite-dimensional integrable systems, both at the classical and at the quantum level. On the one hand, there are structural analogs that have been used to advantage in the study of CMS systems, including Lax pair and R-matrix formulations, zero-curvature representations, bi-Hamiltonian formalism, Bäcklund transformations, time discretizations, and tools such as Baker–Akhiezer functions, Bethe ansatz, separation of variables, and Baxter-type Q-operators.

On the other hand, there are striking physical similarities between various soliton field theories (a prominent one being the sine-Gordon field theory) and infinite soliton lattices (in particular several Toda type lattices), and the CMS systems for special parameter values. Particularly conspicuous are the ties between the classical CMS systems and the KP and two-dimensional Toda hierarchies. The latter relations actually extend beyond the solitons, including rational and theta function solutions.

CMS systems are relevant in various other contexts not yet mentioned. A prominent one among these is a class of supersymmetric gauge field theories. In this quantum context, the classical CMS systems have surfaced in the description of moduli spaces encoding the vacuum structure (Seiberg–Witten theory). Equally surprising, certain classical CMS systems (with internal degrees of freedom) have found a second application in a quantum context, namely in the description of quantum chaos (level repulsion).

We conclude this introduction by listing additional disparate subjects where connections with CMS type systems have been found. These include the theory of Sklyanin, affine Hecke, Kac–Moody, Virasoro and W-algebras, equations of Knizhnik–Zamolodchikov, Yang–Baxter, Witten–Dijkgraaf–Verlinde–Verlinde, and Painlevé type, Gaudin,
Hitchin, Wess–Zumino, matrix and quasi-exactly solvable models, Dunkl–Cherednik and Polychronakos operators, the quantum Hall effect and quantum transport, two-dimensional Yang–Mills theory, functional equations, integrable mappings, Huygens’ principle, and the bispectral problem.

Classical Nonrelativistic CMS Systems

A system of $N$ nonrelativistic equal-mass $m$ particles on the line interacting via pair potentials can be described by a Hamiltonian

\[ H = \frac{1}{2m}\sum_{j=1}^{N} p_j^2 + \sum_{1\le j<k\le N} V(x_j - x_k), \qquad m > 0 \qquad [1] \]

The CMS systems are defined by four distinct choices of pair potential. The simplest choice reads

\[ V(x) = g^2/mx^2, \qquad g > 0 \qquad \text{(I)} \qquad [2] \]

Hence, the coupling constant $g$ has dimension [action] (the product of [position] and [momentum]). This potential is clearly repulsive. Thus, each initial state in the phase space

\[ \Omega = \{(x,p) \in \mathbb{R}^{2N} \mid x \in G\} \qquad [3] \]

where $G$ is the configuration space

\[ G = \{x \in \mathbb{R}^{N} \mid x_N < \cdots < x_1\} \qquad [4] \]

is a scattering state.

The next level is given by the hyperbolic choice

\[ V(x) = g^2\nu^2/m\sinh^2(\nu x), \qquad \nu > 0 \qquad \text{(II)} \qquad [5] \]

Hence, $\nu$ has dimension [position]$^{-1}$, and the previous system arises by taking $\nu$ to 0. It is clear that [5] yields again a repulsive particle system, so that each state in $\Omega$ given by [3] is a scattering state.

The highest level in the hierarchy is the elliptic level, where

\[ V(x) = g^2\wp(x;\omega,\omega')/m, \qquad \omega,\ -i\omega' > 0 \qquad \text{(IV)} \qquad [6] \]

and $\wp(x;\omega,\omega')$ denotes the Weierstrass $\wp$-function with periods $2\omega$ and $2\omega'$. It is beyond the scope of this article to elaborate on the elliptic regime, even though it is of considerable interest. It reappears in later sections as the most general regime in which integrability holds true. Indeed, a prominent feature of the elliptic case [6] is that it can be specialized both to the hyperbolic case [5] and to the trigonometric case, given by

\[ V(x) = g^2\nu^2/m\sin^2(\nu x) \qquad \text{(III)} \qquad [7] \]

To obtain the hyperbolic specialization, one should take $\omega' = i\pi/2\nu$ and send $\omega$ to $\infty$; then [6] reduces to [5] (up to an additive constant). Likewise, [7] results from [6] by choosing $\omega = \pi/2\nu$ and taking $-i\omega'$ to $\infty$.

The physical picture associated with the trigonometric and elliptic systems is quite different from that of the rational and hyperbolic ones. Of course, the potentials [7] and [6] are again repulsive, but now the internal motion is confined and oscillatory. More specifically, due to energy conservation the phase spaces

\[ \Omega_{\mathrm{III}} = G_{\mathrm{III}} \times \mathbb{R}^{N}, \qquad G_{\mathrm{III}} = \{x_N < \cdots < x_1,\ x_1 - x_N < \pi/\nu\} \qquad [8] \]

\[ \Omega_{\mathrm{IV}} = G_{\mathrm{IV}} \times \mathbb{R}^{N}, \qquad G_{\mathrm{IV}} = \{x_N < \cdots < x_1,\ x_1 - x_N < 2\omega\} \qquad [9] \]

are left invariant by the flow generated by the trigonometric and elliptic $N$-particle Hamiltonian, resp. Alternatively, one may interpret the trigonometric Hamiltonian as describing particles constrained to move on a circle and interacting via the inverse square potential [2]. In this picture, the quantities $2\nu x_1, \ldots, 2\nu x_N$ are viewed as angular positions on the circle, and one needs a suitable quotient of the phase space [8] by a discrete group action to describe a state of the system.

Turning to integrability aspects, we begin by noting that the total momentum Hamiltonian

\[ P = \sum_{j=1}^{N} p_j \qquad [10] \]

obviously Poisson commutes with the above defining Hamiltonians of the systems. For $N = 2$, therefore, integrability is plain. It is possible to write down explicitly the higher commuting Hamiltonians for $N > 2$ as well but, in the nonrelativistic setting, it is more illuminating to characterize them as the power traces or (equivalently) the symmetric functions of a so-called Lax matrix.

The Lax matrix is an $N \times N$ matrix-valued function on the phase space of the system. It plays a pivotal role not only for understanding integrability, but also for setting up an action-angle transformation. The latter issue is discussed again later. Here the more conspicuous features of the Lax matrix will be explained, focusing on the type II system for expository ease. Then one can choose

\[ L_{jj} = p_j, \qquad L_{jk} = ig\nu/\sinh\nu(x_j - x_k), \qquad j,k = 1,\ldots,N, \quad j \neq k \qquad [11] \]

Thus, $L$ is Hermitean and we have

\[ \operatorname{tr} L = P, \qquad \operatorname{tr} L^2 = 2mH \qquad [12] \]
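
The identities [12] are easy to confirm numerically. The following is a minimal sketch in plain Python, assuming an arbitrary sample point in the phase space [3] and arbitrary parameter values; the helper names `lax_matrix`, `hamiltonian`, and `trace_of_square` are illustrative and do not come from the article.

```python
import math

def lax_matrix(x, p, g, nu):
    """Type II Lax matrix [11]: L_jj = p_j, L_jk = i g nu / sinh(nu (x_j - x_k))."""
    n = len(x)
    return [[complex(p[j]) if j == k
             else 1j * g * nu / math.sinh(nu * (x[j] - x[k]))
             for k in range(n)]
            for j in range(n)]

def hamiltonian(x, p, g, nu, m):
    """Hamiltonian [1] with the hyperbolic pair potential [5]."""
    n = len(x)
    kinetic = sum(pj ** 2 for pj in p) / (2 * m)
    potential = sum(g ** 2 * nu ** 2 / (m * math.sinh(nu * (x[j] - x[k])) ** 2)
                    for j in range(n) for k in range(j + 1, n))
    return kinetic + potential

def trace_of_square(L):
    """tr L^2 computed entrywise as sum_{j,k} L_jk L_kj."""
    n = len(L)
    return sum(L[j][k] * L[k][j] for j in range(n) for k in range(n))

# Assumed sample data: positions ordered as in [4], arbitrary momenta and couplings.
x = [1.3, 0.4, -0.9]
p = [0.7, -0.2, 1.1]
g, nu, m = 0.8, 1.5, 2.0

L = lax_matrix(x, p, g, nu)
n = len(x)
trace_L = sum(L[j][j] for j in range(n))
trace_L2 = trace_of_square(L)
energy = hamiltonian(x, p, g, nu, m)

# Largest deviation of L from its own conjugate transpose.
hermitian_defect = max(abs(L[j][k] - L[k][j].conjugate())
                       for j in range(n) for k in range(n))

print("tr L   - P    =", abs(trace_L - sum(p)))
print("tr L^2 - 2mH  =", abs(trace_L2 - 2 * m * energy))
print("Hermitean defect =", hermitian_defect)
```

All three printed quantities should vanish up to rounding error. The check works because $L_{jk}L_{kj} = |L_{jk}|^2 = g^2\nu^2/\sinh^2\nu(x_j - x_k)$ for $j \neq k$, so the off-diagonal entries contribute exactly twice the potential energy times $m$.
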
(The rational Lax matrix results from [11] by taking $\nu \to 0$, and the trigonometric one by taking $\nu \to i\nu$. The elliptic Lax matrix has a similar structure, but it involves an extra “spectral” parameter.)

Although not obvious, it is true that all of the power traces

\[ H_k = \frac{1}{k}\operatorname{tr} L^k, \qquad k = 1,\ldots,N \qquad [13] \]

are in involution (i.e., Poisson commute). One way to understand this involves the so-called Lax pair equation associated with the Hamiltonian flow generated by $H = H_2/m$. This involves a second $N \times N$ matrix function given by

\[ M_{jj} = -\sum_{l \neq j} \frac{ig\nu^2}{m\sinh^2\nu(x_j - x_l)}, \qquad M_{jk} = \frac{ig\nu^2\cosh\nu(x_j - x_k)}{m\sinh^2\nu(x_j - x_k)}, \quad j \neq k \qquad [14] \]

When the positions and momenta in $L$ and $M$ evolve according to the $H$-flow, one has

\[ \dot L_t = [M_t, L_t] \qquad [15] \]

where $[\cdot,\cdot]$ is the matrix commutator. (Indeed, [15] amounts to the Hamilton equations, as is readily checked.) Since $M$ is anti-Hermitean, it is not difficult to derive from this Lax pair equation that the flow is isospectral: $L_t$ is related to $L_0$ by a unitary transformation $L_t = U_t L_0 U_t^*$ obtained from $M_t$, so that the spectrum of $L_t$ is time independent.

This argument already shows the existence of $N$ conserved quantities under the $H$-flow, namely the $N$ eigenvalues of $L$. It is, however, simpler to work with either the power traces $H_k$ given by [13] or with the symmetric functions $S_k$ of $L$, given by

\[ \det(1_N + \lambda L) = \sum_{k=0}^{N} \lambda^k S_k \qquad [16] \]

These Hamiltonians depend only on the eigenvalues of $L$, so they are also conserved under the flow. Note that, by [12],

\[ S_1 = P, \qquad S_2 = \tfrac{1}{2}P^2 - mH \qquad [17] \]

To see why these Hamiltonians are in involution, one can invoke the long-time asymptotics of the $H$-flow. It reads

\[ p(t) \sim \hat p, \quad \hat p_N < \cdots < \hat p_1; \qquad x_j(t) \sim x_j^{+} + t\hat p_j/m, \quad j = 1,\ldots,N, \quad t \to \infty \qquad [18] \]

Accordingly, one gets

\[ L_t \sim \operatorname{diag}(\hat p_1,\ldots,\hat p_N) = L_\infty, \qquad t \to \infty \qquad [19] \]

Since the time evolution is a canonical transformation and the Poisson brackets $\{H_k, H_l\}$ are time independent (by the Jacobi identity), it now readily follows from [19] that they vanish. (Indeed, $H_k$ and $H_l$ reduce to power traces of $L_\infty$, and the asymptotic momenta $\hat p_1, \ldots, \hat p_N$ Poisson commute.)

Quantum Nonrelativistic CMS Systems

The canonical quantization prescription

\[ p_j \to -i\hbar\,\partial/\partial x_j, \qquad j = 1,\ldots,N \qquad [20] \]

($\hbar$ being the Planck constant) gives rise to an unambiguous quantum Hamiltonian

\[ H = -\frac{\hbar^2}{2m}\sum_{j=1}^{N} \partial_j^2 + \sum_{1\le j<k\le N} V(x_j - x_k) \qquad [21] \]

for any classical Hamiltonian [1]. Thus, the defining Hamiltonians of the above systems give rise to well-defined partial differential operators (PDOs), which act on suitable dense subspaces of the Hilbert space $L^2(G_\tau, dx)$, $\tau = \mathrm{I}, \ldots, \mathrm{IV}$, with $G_{\mathrm{I}}$ and $G_{\mathrm{II}}$ given by $G$ in [4], and $G_{\mathrm{III}}$, $G_{\mathrm{IV}}$ by [8] and [9], respectively.

We recall that there is no general result ensuring that a classically integrable system admits an integrable quantum version. More precisely, when one substitutes [20] in $N$ Poisson commuting Hamiltonians, it need not be true that they commute as quantum operators, even when no ordering ambiguities are present. For the power trace Hamiltonians such ambiguities do occur. (For example, [11] gives rise to a term in $H_3$ proportional to $p_1/\sinh^2\nu(x_1 - x_2)$.) On the other hand, no noncommuting factors occur in the quantization of $S_1, \ldots, S_N$. To verify this, one need only note that $S_k$ equals the sum of all $k \times k$ principal minors of $L$, cf. [16]; choosing a diagonal element $p_j$ in a summand, one therefore has no dependence on $x_j$ in the remaining factors, hence no ordering ambiguity.

As a result, the prescription [20] yields $N$ unambiguous operators $S_k(x, -i\hbar\nabla)$, which are moreover formally self-adjoint on $L^2(G_\tau, dx)$ for each of the four cases $\tau = \mathrm{I}, \ldots, \mathrm{IV}$. Although by no means obvious, it is true that these operators do commute. Thus, integrability is preserved under quantization of the above systems. Now the power traces of a matrix can be expressed as polynomials in the symmetric functions (via the Newton
identities), so this yields an ordering ensuring that the quantized power traces commute as well.

Just as the action-angle transformation for a classically integrable system “diagonalizes” all of the Poisson commuting Hamiltonians at once (in the sense that the transformed Hamiltonians depend only on the action variables), one expects that there exists a unitary operator that transforms all of the commuting Hamiltonians to diagonal form. In the classical setting, the existence of this diagonalizing map follows (under suitable technical restrictions) from the Liouville–Arnold theorem, whereas in the quantum context the existence of such a joint eigenfunction transformation is a far more delicate issue. This problem is briefly discussed later again, noting here that the solutions obtained to date vary considerably in completeness and “explicitness” for the four regimes.

Classical Relativistic CMS Systems

The nonrelativistic spacetime symmetry group is the Galilei group. Its Lie algebra is represented by the time translation generator $H$ given by [1], space translation generator $P$ given by [10], and the Galilei boost generator

\[ B = -m\sum_{j=1}^{N} x_j \qquad [22] \]

More precisely, the Poisson brackets are given by

\[ \{H,P\} = 0, \qquad \{H,B\} = P, \qquad \{P,B\} = Nm \qquad [23] \]

so that the last bracket does not vanish (as is the case for the Galilei Lie algebra). This deviation is inconsequential, however, since the constant $Nm$ (central extension) yields trivial Hamilton equations.

The relativistic spacetime symmetry group (Poincaré group) yields a Lie algebra that differs from [23] only in $Nm$ being replaced by $H/c^2$, where $c$ is the speed of light. Clearly, the functions

\[ H = mc^2\sum_{j=1}^{N}\cosh\!\left(\frac{p_j}{mc}\right), \qquad P = mc\sum_{j=1}^{N}\sinh\!\left(\frac{p_j}{mc}\right) \qquad [24] \]

together with $B$ given by [22] give rise to these altered Poisson brackets. Physically, these three generators describe a system of $N$ relativistic free mass-$m$ particles in terms of their rapidities $p_j/mc$.

A natural ansatz to take interaction into account now reads

\[ H = mc^2\sum_{j=1}^{N}\cosh\!\left(\frac{p_j}{mc}\right)V_j(x), \qquad P = mc\sum_{j=1}^{N}\sinh\!\left(\frac{p_j}{mc}\right)V_j(x), \qquad V_j(x) = \prod_{k \neq j} f(x_j - x_k) \qquad [25] \]

Indeed, it is plain that this still entails

\[ \{H,B\} = P, \qquad \{P,B\} = H/c^2 \qquad [26] \]

But to obtain a relativistic particle system, the time and space translations must also commute. The corresponding requirement $\{H,P\} = 0$ yields a severe constraint on the “pair potential” function $f(x)$ in [25] whenever $N > 2$. (For $N = 2$, one gets $\{H,P\} = 0$ irrespective of the choice of $f$.)

As it turns out, the vanishing requirement is satisfied when

\[ f^2(x) = a + b\wp(x) \qquad [27] \]

where $a$, $b$ are constants and $\wp(x)$ is the Weierstrass function already encountered. Taking, for example, $a, b > 0$, one can take the positive square root of the right-hand side of [27]. This choice of $f(x)$ yields the defining Hamiltonian of the relativistic elliptic system (type IV). In the three degenerate cases, it is convenient to choose

\[ f(x) = \begin{cases} (1 + g^2/m^2c^2x^2)^{1/2} & \text{(I)} \\ (1 + \sin^2(\nu g/mc)/\sinh^2(\nu x))^{1/2} & \text{(II)} \\ (1 + \sinh^2(\nu g/mc)/\sin^2(\nu x))^{1/2} & \text{(III)} \end{cases} \qquad [28] \]

It is an elementary exercise to check that this implies

\[ \lim_{c\to\infty}(H - Nmc^2) = H_{\mathrm{nr}}, \qquad \lim_{c\to\infty} P = P_{\mathrm{nr}} \qquad [29] \]

where $H_{\mathrm{nr}}$ and $P_{\mathrm{nr}}$ are the above nonrelativistic time and space translation generators. Hence, the defining Hamiltonians of the relativistic systems reduce to their nonrelativistic counterparts in the limit $c \to \infty$.

The special character of the function [27] makes itself felt not only in ensuring Poincaré invariance, but also in entailing integrability. To begin with, note that the functions

\[ S_{\pm N} = \exp\!\Big(\pm\beta\sum_{j=1}^{N} p_j\Big), \qquad \beta = 1/mc \qquad [30] \]
commute with $H$ and $P$, so that integrability for $N = 3$ is plain. More generally, the Hamiltonians

\[ S_{\pm l} = \sum_{\substack{I \subset \{1,\ldots,N\} \\ |I| = l}} \exp\!\Big(\pm\beta\sum_{j \in I} p_j\Big) \prod_{\substack{j \in I \\ k \notin I}} f(x_j - x_k), \qquad l = 1,\ldots,N \qquad [31] \]

can be shown to mutually commute. Clearly, one has

\[ S_{-l} = S_{-N}S_{N-l}, \qquad l = 1,\ldots,N-1 \qquad [32] \]

and

\[ H = (S_1 + S_{-1})/2m\beta^2, \qquad P = (S_1 - S_{-1})/2\beta \qquad [33] \]

As anticipated by the notation, the functions $S_1, \ldots, S_N$ may be viewed as the symmetric functions of a Lax matrix. More precisely, in the elliptic case this is true up to multiplicative constants that depend on a spectral parameter occurring in the Lax matrix. As before, only the Lax matrix for the type II system is specified here. In this case, one can dispense with the spectral parameter and choose

\[ L_{jk} = e_j C_{jk} e_k, \qquad j,k = 1,\ldots,N \qquad [34] \]

where

\[ e_j = \exp(\nu x_j + \beta p_j/2)\prod_{l \neq j} f(x_j - x_l)^{1/2} \qquad [35] \]

\[ C_{jk} = \exp(-\nu(x_j + x_k))\,\frac{\sinh(i\beta\nu g)}{\sinh\nu(x_j - x_k + i\beta g)} \qquad [36] \]

In [35], $f(x)$ is the type II function given by [28]. The matrix $C$ arises from Cauchy’s matrix $1/(w_j - z_k)$ via a suitable substitution, and Cauchy’s identity

\[ \det\!\left(\frac{1}{w_j - z_k}\right)_{j,k=1}^{N} = \prod_{j=1}^{N}\frac{1}{w_j - z_j}\prod_{1\le j<k\le N}\frac{(w_j - w_k)(z_j - z_k)}{(w_j - z_k)(z_j - w_k)} \qquad [37] \]

ensures that [34] yields the Hamiltonians $S_l$ of [31].

To conclude this section, we point out that the relation

\[ L = 1_N + \beta L_{\mathrm{nr}} + O(\beta^2), \qquad \beta \to 0 \qquad [38] \]

where $L_{\mathrm{nr}}$ denotes the nonrelativistic Lax matrix [11], can be used to deduce the involutivity of the nonrelativistic Hamiltonians from that of their relativistic counterparts.

Quantum Relativistic CMS Systems

When the canonical quantization prescription [20] is applied to the classical Hamiltonians [31] with $f(x) = 1$, one obtains commuting quantum operators whose action is exemplified by

\[ \exp\!\left(\mp\frac{i\hbar}{mc}\frac{d}{dx}\right)F(x) = F\!\left(x \mp \frac{i\hbar}{mc}\right) \qquad [39] \]

That is, the operators act on functions that have an analytic continuation in $x_1, \ldots, x_N$ from the real line $\mathbb{R}$ to a strip around $\mathbb{R}$ in the complex plane $\mathbb{C}$, whose width is at least $2\hbar/mc$.

Operators of this type are called analytic difference operators (henceforth A$\Delta$Os). The choice $f(x) = 1$ amounts to the free case $g = 0$ in [28]. For $g \neq 0$, however, the canonical quantization exemplified by [39] yields noncommuting A$\Delta$Os. Thus, the factor ordering following from [31] would entail that integrability breaks down at the quantum level.

As mentioned before, there is no general result guaranteeing that a different ordering that preserves integrability exists. Even so, this is true in the present case. Specifically, the function $f(x)$ can be factorized as $f_+(x)f_-(x)$, and then the A$\Delta$Os

\[ S_{\pm l} = \sum_{\substack{I \subset \{1,\ldots,N\} \\ |I| = l}} \prod_{\substack{j \in I \\ k \notin I}} f_\mp(x_j - x_k) \cdot \exp\!\Big(\mp i\hbar\beta\sum_{j \in I}\partial_j\Big) \cdot \prod_{\substack{j \in I \\ k \notin I}} f_\pm(x_j - x_k) \qquad [40] \]

do commute. In the elliptic case [27], this factorization involves the Weierstrass $\sigma$-function, and commutativity can be encoded in a sequence of functional equations satisfied by the $\sigma$-function. For the type I–III systems the pertinent factorization of [28] is given by

\[ f_\pm(x) = \begin{cases} (1 \pm i\beta g/x)^{1/2} & \text{(I)} \\ (\sinh\nu(x \pm i\beta g)/\sinh\nu x)^{1/2} & \text{(II)} \\ (\sin\nu(x \pm i\beta g)/\sin\nu x)^{1/2} & \text{(III)} \end{cases} \qquad [41] \]

(Here one has $g > 0$, and the choice of square root is such that $f_\pm(x) \to 1$ for $g \downarrow 0$.)

The nonrelativistic limit $c \to \infty$ of the quantum Hamiltonians [33] can be determined by expanding $S_1$ and $S_{-1}$ in a power series in $\beta = 1/mc$. In this way, one obtains once more [29], except for a small, but crucial change in $H_{\mathrm{nr}}$: instead of the coupling constant dependence $g^2$ in the potential energy, one gets $g(g - \hbar)$. The extra term arises from the action of the term linear in $\beta$ in the expansion of the exponential on the term linear in $\beta$ in the expansion of the functions $f_\pm(x)$.

From the perspective of the nonrelativistic quantum CMS systems, the change $g^2 \to g(g - \hbar)$ appears ad hoc. As it transpires, however, the different
dependence on $g$ ensures that the eigenfunctions of $H_{\mathrm{nr}}$ depend on $g$ in a far simpler way. This will become clear shortly.

Action-Angle Transforms and Duality

Under certain technical assumptions, any integrable system given by $N$ independent Poisson commuting Hamiltonians $S_1(x,p), \ldots, S_N(x,p)$ on a $2N$-dimensional phase space admits local canonical transformations to action-angle variables. Like the spectral theorem on the quantum level, this structural result is of limited practical value. Indeed, just as the spectral theorem yields no concrete information concerning eigenfunctions, bound-state energies, scattering, etc., associated with a given self-adjoint Hamiltonian, the Liouville–Arnold theorem only yields general insight in the type of motion that can occur and the geometric character of the local maps (in terms of invariant tori).

To fully comprehend (“solve”) a given integrable system, one should render the associated action-angle map as concrete as possible. For the CMS type systems, a complete solution to this problem has only been achieved for the systems of type I–III. The motion in the trigonometric systems is oscillatory, so that a closeup via the action-angle transform involves extensive geometric constructions. By contrast, the type I and II systems are scattering systems, and here the action-angle map can be tied in with the classical wave maps (Møller transformations).

We now sketch some salient features of the action-angle maps for systems of type I and II. In all cases the map (denoted $\Phi$) is a canonical transformation from the phase space $\Omega$ (eqn [3]) with 2-form $dx \wedge dp$ to the phase space

\[ \hat\Omega = \{(\hat x,\hat p) \in \mathbb{R}^{2N} \mid \hat p \in G\} \qquad [42] \]

with 2-form $d\hat x \wedge d\hat p$. Thus, the actions $\hat p_1, \ldots, \hat p_N$ vary over $G$ given by [4] and the “angles” $\hat x_1, \ldots, \hat x_N$ over $\mathbb{R}$. Consequently, $\hat\Omega$ amounts to $\Omega$ with $x$ and $p$ interchanged.

As should be the case, the transformed commuting Hamiltonians

\[ \hat S_k = S_k \circ \Phi^{-1}, \qquad k = 1,\ldots,N \qquad [43] \]

depend only on the action vector $\hat p$. To be specific, they arise from $S_k(x,p)$ by taking $g = 0$ (no interaction, hence no $x$ dependence) and substituting $p \to \hat p$. Indeed, the actions $\hat p_k$ are the $t \to \infty$ limits of the momenta $p_k(t)$, where the $t$ dependence refers to the defining Hamiltonian of the system.

As it happens, the Lax matrix $L$ is of decisive importance to concretize the action-angle map $\Phi$, and in particular to reveal its hidden duality properties. The starting point is a commutation relation of $L(x,p)$ with a diagonal matrix $A(x)$ given by

\[ A(x) = \operatorname{diag}(d(x_1),\ldots,d(x_N)), \qquad d(y) = \begin{cases} y & \text{(I)} \\ \exp(2\nu y) & \text{(II)} \end{cases} \qquad [44] \]

Obviously, the symmetric functions $\check D_k(x)$ of $A(x)$ yield an integrable system on $\Omega$, so the Hamiltonians

\[ D_k(\hat x,\hat p) = (\check D_k \circ \Phi^{-1})(\hat x,\hat p), \qquad k = 1,\ldots,N \qquad [45] \]

yield an integrable system on the action-angle phase space $\hat\Omega$. The crux of the matter is now that these systems are familiar: they are also systems of type I and II!

To be specific, let us denote the dual systems just described by a caret, and the nonrelativistic/relativistic systems by a suffix nr/rel, resp. Then the duality properties alluded to are given by

\[ \hat{\mathrm{I}}_{\mathrm{nr}} \simeq \mathrm{I}_{\mathrm{nr}}, \qquad \hat{\mathrm{I}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{nr}}, \qquad \widehat{\mathrm{II}}_{\mathrm{nr}} \simeq \mathrm{I}_{\mathrm{rel}}, \qquad \widehat{\mathrm{II}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{rel}} \qquad [46] \]

and $\Phi^{-1}$ serves as the action-angle map for the dual systems.

In order to sketch why this state of affairs holds true for the $\mathrm{II}_{\mathrm{rel}}$ system, recall that its Lax matrix is given by [34]. From this, one readily checks the commutation relation

\[ \coth(i\beta\nu g)[A, L] = 2e \otimes e - (AL + LA) \qquad [47] \]

Since $L$ is Hermitean, there exists a unitary $U$ diagonalizing $L$. It can now be shown that the spectrum of $L$ is positive and nondegenerate, and that $U^*e$ has nonzero components. The gauge ambiguity in $U$ (given by a permutation matrix and diagonal phase matrix) can, therefore, be fixed by requiring

\[ U^*LU = \operatorname{diag}(\exp(\beta\hat p_1),\ldots,\exp(\beta\hat p_N)), \qquad \hat p_N < \cdots < \hat p_1 \qquad [48] \]

\[ (U^*e)_j > 0, \qquad j = 1,\ldots,N \qquad [49] \]

A suitable reparametrization of $U^*e$ then yields the “angle” vector $\hat x$.

As a consequence, $U^*AU$ becomes a function of $\hat x$ and $\hat p$. In detail, one finds

\[ (U^*AU)(\hat x,\hat p) = L(\beta/2, 2\nu; \hat p, \hat x)^{T} \qquad [50] \]

where $L(\nu, \beta; x, p)$ is given by [34] and $T$ denotes the transpose. Therefore, the “dual Lax matrix” $U^*AU$ is essentially equal to $L$, explaining the self-duality $\widehat{\mathrm{II}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{rel}}$ announced above.
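
The commutation relation [47] can be checked entrywise: with $a_j = \exp(2\nu x_j)$ it states that $\coth(i\beta\nu g)(a_j - a_k)L_{jk} = 2e_je_k - (a_j + a_k)L_{jk}$. The following plain-Python sketch evaluates both sides for the type II relativistic Lax matrix [34]–[36]. The sample point, the parameter values, and the helper names `f_type_II` and `relativistic_lax` are assumptions made for illustration, not part of the article.

```python
import cmath
import math

def f_type_II(y, g, nu, beta):
    """Type II function from [28], with nu g / (m c) written as nu * beta * g."""
    return math.sqrt(1.0 + (math.sin(nu * beta * g) / math.sinh(nu * y)) ** 2)

def relativistic_lax(x, p, g, nu, beta):
    """Return (L, e) with L_jk = e_j C_jk e_k as in [34]-[36]."""
    n = len(x)
    e = []
    for j in range(n):
        v_j = 1.0
        for l in range(n):
            if l != j:
                v_j *= f_type_II(x[j] - x[l], g, nu, beta)
        e.append(math.exp(nu * x[j] + beta * p[j] / 2) * math.sqrt(v_j))
    s = cmath.sinh(1j * beta * nu * g)
    L = [[e[j] * e[k] * math.exp(-nu * (x[j] + x[k])) * s
          / cmath.sinh(nu * (x[j] - x[k] + 1j * beta * g))
          for k in range(n)]
         for j in range(n)]
    return L, e

# Assumed sample data; beta * nu * g must not be a multiple of pi.
x = [1.1, 0.2, -0.8]
p = [0.3, -0.5, 0.9]
g, nu, beta = 0.7, 1.2, 0.5

L, e = relativistic_lax(x, p, g, nu, beta)
n = len(x)
a = [math.exp(2 * nu * xj) for xj in x]  # diagonal of A(x) for type II, eq. [44]
coth = cmath.cosh(1j * beta * nu * g) / cmath.sinh(1j * beta * nu * g)

# Entrywise residual of [47]: coth(i beta nu g)[A, L] - (2 e (x) e - (AL + LA)).
residual = max(abs(coth * (a[j] - a[k]) * L[j][k]
                   - (2 * e[j] * e[k] - (a[j] + a[k]) * L[j][k]))
               for j in range(n) for k in range(n))

hermitian_defect = max(abs(L[j][k] - L[k][j].conjugate())
                       for j in range(n) for k in range(n))

print("residual of [47]  =", residual)
print("Hermitean defect  =", hermitian_defect)
```

Both numbers should be zero up to rounding. Behind the check is the addition formula $\cosh s\,\sinh u + \sinh s\,\cosh u = \sinh(u + s)$ with $s = i\beta\nu g$ and $u = \nu(x_j - x_k)$, which cancels the denominator of $C_{jk}$.
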
With the action-angle transform under explicit control, much more can be said about the solutions to Hamilton’s equations for each of the commuting Hamiltonians, both as regards finite times and as regards long-time asymptotics (scattering). It is beyond the scope of this article to enlarge on this, but it is worth mentioning that the scattering reveals the solitonic character of the particles. Indeed, the set of asymptotic momenta $\hat p_1, \ldots, \hat p_N$ is conserved under the scattering and the asymptotic position shifts are factorized in terms of pair shifts. A quite remarkable feature of the type I systems is that the shifts actually vanish (“billiard ball” scattering).

Eigenfunction Transforms and Duality

Both at the relativistic and at the nonrelativistic level the commuting quantum Hamiltonians $S_1, \ldots, S_N$ are formally self-adjoint on the Hilbert space $L^2(G_\tau, dx)$, $\tau = \mathrm{I}, \ldots, \mathrm{IV}$. Thus, it may be expected that it is possible to construct a unitary eigenfunction transform

\[ \mathcal{F}_\tau : L^2(G_\tau, dx) \to L^2(\hat G_\tau, d\mu_\tau(p)), \qquad \tau = \mathrm{I},\ldots,\mathrm{IV} \qquad [51] \]

diagonalizing $S_k$ as multiplication by a real-valued function $M_k(p)$. Here $\hat G_\tau$ encodes the joint spectrum and $d\mu_\tau(p)$ is a suitable measure on $\hat G_\tau$.

Obviously, this expectation is borne out in the free case $g = 0$. Then, $\mathcal{F}_\tau$ is basically Fourier transformation, its kernel consisting of a sum of joint eigenfunctions

\[ \exp(ix \cdot \sigma(p)/\hbar), \qquad \sigma \in S_N \qquad [52] \]

with $\sigma$ ranging over the permutation group $S_N$. For $\tau = \mathrm{I}, \mathrm{II}$, one can take $G_\tau = \hat G_\tau = G$ (eqn [4]) and $d\mu_\tau(p) = dp$. Here one gets

\[ M_k(p) = \sum_{1\le i_1<\cdots<i_k\le N} \begin{cases} p_{i_1}\cdots p_{i_k} \\ \exp(\beta p_{i_1})\cdots\exp(\beta p_{i_k}) \end{cases} \qquad [53] \]

in the nonrelativistic and relativistic case, resp. For

existence of joint eigenfunctions has been shown, but also because in the relativistic case the unitarity of $\mathcal{F}_{\mathrm{II}}$ and $\mathcal{F}_{\mathrm{IV}}$ already breaks down for $N = 2$ when $g$ increases beyond a critical value, cf. [57] below. It is quite likely that this happens for $N > 2$ as well, but this is not readily apparent from the current fragmentary knowledge on joint eigenfunctions for $N > 2$.

The only two cases where the $g > 0$ joint eigenfunction transform is of an elementary nature are the $\mathrm{III}_{\mathrm{nr}}$ and $\mathrm{III}_{\mathrm{rel}}$ cases. Indeed, the joint eigenfunctions describing the internal motion are of the form

\[ \Psi_n(x) = W(x)^{1/2}P_n(x), \qquad n \in \mathbb{N}^{N-1} \qquad [54] \]

Here,

\[ W(x) = \prod_{1\le j<k\le N} w(x_j - x_k) \qquad [55] \]

is a positive weight function on $G_{\mathrm{III}}$ and the $P_n(x)$ are multivariable orthogonal polynomials. Thus, $P_n(x)$ is a finite linear combination of the above free boson states, with $p$ in [52] a linear function of $n$. For the $\mathrm{III}_{\mathrm{nr}}$ case, these eigenfunctions were already found by Sutherland. (Here, the functions $P_n(x)$ amount to polynomials, often called the Jack polynomials, which arose in a statistics context.) The $\mathrm{III}_{\mathrm{rel}}$ polynomials may be viewed as the special $A_{N-1}$ case of Macdonald’s orthogonal $q$-polynomials for arbitrary root systems, with

\[ q = \exp(-2\beta\hbar\nu) \qquad [56] \]

(Note that $q$ converges to 1 both in the nonrelativistic limit $c \to \infty$ and in the classical limit $\hbar \to 0$.)

For the $\mathrm{II}_{\mathrm{nr}}$ case, the joint eigenfunctions were found and studied a couple of decades ago by Heckman and Opdam, yielding a multivariable hypergeometric transform. Indeed, for $N = 2$, the eigenfunctions can be expressed in terms of the hypergeometric function ${}_2F_1$, as has been known since the early days of quantum mechanics. Likewise, the arbitrary-$N$ $\mathrm{I}_{\mathrm{nr}}$ joint eigenfunction trans-
 = III, IV, one needs to take into account periodic form (studied in detail by de Jeu) can be viewed as a
boundary conditions on the walls of G , yielding a multivariable Hankel transform, the N = 2 kernel
discrete joint spectrum after the center-of-mass being essentially a Hankel function.
motion is omitted. (With the above choices of GIII Much less is known concerning IVnr eigenfunc-
and GIV , cf. [8] and [9], the center-of-mass motion is tions, and a fortiori for the associated transform
a free motion along the line, so the total momentum IV . For N = 2 the time-independent Schrödinger
still varies continuously.) Of course, the diagona- equation amounts to the Lamé equation. Hence,
lized Sk are once more given by [53], since the kernel solutions are Lamé functions that can be studied in
of  consists of free boson states. particular via Fuchs theory (regular singularities). A
Taking next g > 0, the above expectation has not far more explicit form of the eigenfunctions dates
been confirmed for all of the eight regimes involved. back to work by Hermite in the nineteenth century.
This is not only because in some cases not even the More precisely, provided the g dependence of the
defining Hamiltonian is changed from g2 to g(g  h) To conclude, we mention that the soliton scatter-
(a change already encountered above), Hermite’s ing behavior at the classical level is preserved under
results apply to couplings g = l h, l = 2, 3, 4, . . . His quantization in all cases where this can be checked.
eigenfunctions have a structure that is nowadays That is, no new momenta are created in the
referred to as the Bethe ansatz. For the same g values scattering process and the S-matrix is factorized as
and arbitrary N, Hnr eigenfunctions of Bethe ansatz a product of pair S-matrices. Moreover, for the type
type were found and studied by Felder and I cases, the S-matrix is a momentum-independent
Varchenko, but even for these g values much (but g-dependent) phase, as a quantum analog of the
remains to be done to achieve a complete under- classical billiard ball scattering.
standing of the IV transform.
A quite different approach, due to Komori and See also: Bethe Ansatz; Classical r-Matrices, Lie
Takemura, does yield rather detailed information on Bialgebras, and Poisson Lie Groups; Functional
IV for arbitrary g > 0. The key feature of their Equations and Integrable Systems; Integrable Discrete
Systems; Integrable Systems and Algebraic Geometry;
strategy is to view the IVnr case as a perturbation of
Integrable Systems in Random Matrix Theory; Integrable
the IIInr case. This entails, however, that the validity
Systems: Overview; Isochronous Systems; Ordinary
of their results is restricted to large imaginary period Special Functions; q-Special Functions; Quantum
of the }-function. Calogero–Moser Systems; Seiberg–Witten Theory;
For the IVrel system, there are only rather Separation of Variables for Differential Equations;
complete results on IV for N = 2. More specifically, Sine-Gordon Equation; Toda Lattices.
the eigenfunction transform is known to be unitary
for
g 2 ½0; 
h þ = ½57 Further Reading
and a dense set in a corresponding parameter space. Babelon O, Bernard D, and Talon M (2003) Introduction to
(For g outside this interval, unitarity is violated.) Classical Integrable Systems. Cambridge: Cambridge Univer-
The kernel of IV involves eigenfunctions of Bethe sity Press.
ansatz structure. For g = lh, l = 2, 3, . . . and arbitrary Calogero F (1971) Solution of the one-dimensional N-body
problem with quadratic and/or inversely quadratic pair
N, Bethe ansatz type Hrel eigenfunctions were found potentials. Journal of Mathematical Physics 12: 419–436.
by Billey, generalizing the Felder–Varchenko results Calogero F (2001) Classical Many-Body Problems Amenable to
mentioned above. Exact Treatments. Berlin: Springer.
It remains to discuss the Irel and IIrel systems. To van Diejen JF and Vinet L (eds.) (2000) Calogero–Moser–
this end, we first recall the classical dualities [46]. It Sutherland Models. Berlin: Springer.
Fock V, Gorsky A, Nekrasov N, and Rubtsov V (2000) Duality in
is natural to expect that these dualities are still integrable systems and gauge theories. Journal of High Energy
present at the quantum level. For the Inr case, this is Physics 7(28): 1–39.
readily confirmed: the transform is indeed invariant Marshakov A (1999) Seiberg–Witten Theory and Integrable
under interchange of x and p. In fact, the N = 2 Systems. Singapore: World Scientific.
center-of-mass Hankel transform even depends only Moser J (1975) Three integrable Hamiltonian systems connected
with isospectral deformations. Advances in Mathematics
on (x1  x2 )(p1  p2 ), so that self-duality is manifest 16: 197–220.
in this case. Olshanetsky MA and Perelomov AM (1981) Classical integrable
More generally, for N = 2 the expected dualities finite-dimensional systems related to Lie algebras. Physics
[46] are indeed present. The IInr 2 F1 transform Reports 71: 313–400.
satisfies the Irel analytic difference equation in p1  Olshanetsky MA and Perelomov AM (1983) Quantum integrable
systems related to Lie algebras. Physics Reports 94: 313–404.
p2 due to the contiguous relations obeyed by 2 F1 . The Ruijsenaars SNM (1987) Complete integrability of relativistic
IIrel transform is only unitary when g is restricted by Calogero–Moser systems and elliptic function identities.
[57], and it is indeed self-dual in the same sense as the Communications in Mathematical Physics 110: 191–213.
action-angle map (Ruijsenaars). Ruijsenaars SNM (1999) Systems of Calogero–Moser type. In:
Turning finally to the case N > 2, the multi-variable Semenoff G and Vinet L (eds.) Proceedings of the 1994 Banff
Summer School Particles and Fields, pp. 251–352. Berlin:
hypergeometric transform II does have the expected Springer.
duality property. More specifically, its inverse diag- Ruijsenaars SNM and Schneider H (1986) A new class of
onalizes the commuting Irel AOs (Chalykh). For IIrel integrable systems and its relation to solitons. Annals of
with N > 2 and g = l h, l = 2, 3, . . . , Chalykh also Physics (NY) 170: 370–405.
finds elementary joint eigenfunctions with the Sutherland B (1972) Exact results for a quantum many-body
problem in one dimension II. Physical Review A
expected self-duality. To date, no Hilbert space results 5: 1372–1376.
for the N > 2 IIrel case have been obtained.
412 Canonical General Relativity

Canonical General Relativity


C Rovelli, Université de la Méditerranée et Centre de Physique Théorique, Marseilles, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Lagrangian formulations of general relativity (GR) were found by Hilbert and by Einstein himself, almost immediately after the discovery of the theory. The construction of Hamiltonian formulations of GR, on the other hand, has taken much longer, and has required decades of theoretical research.

The first such formulations were developed by Dirac and by Bergmann and his collaborators, in the 1950s. Their cumbersome formalism was simplified by the introduction of new variables: first by Arnowitt, Deser, and Misner in the 1960s and then by Ashtekar in the 1980s. A large number of variants and improvements of these formalisms have been developed by many other authors. Most likely the process is not over, and there is still much to learn about the canonical formulation of GR.

A number of reasons motivate the study of canonical GR. In general, the canonical formalism can be an important step towards quantum theory; it allows the identification of the physical degrees of freedom, and the gauge-invariant states and observables of the theory; and it is an important tool for analyzing formal aspects of the theory such as its Cauchy problem. All these issues are highly nontrivial, and present open problems, in GR.

In turn, the structural peculiarity and the conceptual novelty of GR have motivated re-analyses and extensions of the canonical formalism itself.

The following sections discuss the source of the peculiar difficulty of canonical GR, and summarize the formulations of the theory that are most commonly used.

The Origin of the Difficulties

The reason for the complexity of the Hamiltonian formulation of GR is not so much in the intricacy of its nonlinear field equations; rather, it must be found in the conceptual novelty introduced by GR at the very foundation of the structure of mechanics.

The dynamical systems considered before GR can be formulated in terms of states evolving in time. One assumes that a time variable $t$ can be measured by a physical clock, and that certain observable quantities $A$ of the system can be measured at every instant of time. If we know the state $s$ of the system at some initial time, the theory predicts the value $A(t)$ of these quantities for any given later instant of time $t$. The space of the possible initial states $s$ is the phase space $\Gamma_0$. Observables are real functions on $\Gamma_0$. Infinitesimal time evolution can be represented as a vector field in $\Gamma_0$. This vector field is determined by the Hamiltonian, which is also a function on $\Gamma_0$. The integral lines $s(t)$ of this vector field determine the time evolution $A(t) = A(s(t))$ of the observables.

This conceptual structure is very general. It can be easily adapted to special-relativistic systems. However, it is not general enough for general-relativistic systems. GR is not formulated as the evolution of states and observables in a preferred time variable which can be measured by a physical clock. Rather, it is formulated as the relative (common) evolution of many observable quantities. Accordingly, in GR there is no quantity playing the same role as the conventional Hamiltonian. In fact, the canonical Hamiltonian density that one obtains from a Legendre transformation from a Lagrangian vanishes identically in GR.

The origin of this peculiar behavior of the theory is the following. The field equations are written as evolution equations in a time coordinate $t$. However, they are invariant under arbitrary changes of $t$. That is, if we replace $t$ with an arbitrary function $t' = t'(t)$ in a solution of the field equations, we obtain another solution. This underdetermination does not lead to a lack of predictivity in GR, because we do not interpret the variable $t$ as the measurable reading of a physical clock, as we do in non-general-relativistic theories. Rather, we interpret $t$ as a nonobservable mathematical parameter, void of physical significance. Accordingly, the notions of “state at a given time” and “value of an observable at a given time” are very unnatural in GR.

A Hamiltonian formulation of GR requires a version of the canonical formalism sufficiently general to deal with this broader notion of evolution. Generalizations of the Hamiltonian formalism have been developed by many authors, such as Dirac (see below), Souriau, Arnold, Witten, and many others. The first step in this direction was taken by Lagrange himself: Lagrange gave a time-independent interpretation of the phase space as the space $\Gamma$ of the solutions of the equations of motion (modulo gauges). As we shall see, however, consensus is still lacking on a fully satisfactory formalism.

Dirac Theory of Constrained Systems

Dirac has developed a Hamiltonian theory for mechanical systems with constraints, precisely in
view of its application to GR. Dirac’s theory is beautiful, finds vast applications, and is still commonly taken as the basis to discuss Hamiltonian GR, although GR does not fit very naturally into Dirac’s scheme. In the following, only the part of Dirac’s theory relevant for GR is summarized.

Consider a Lagrangian system with Lagrangian variables $q^i$, with $i = 1, \ldots, n$. Call $v^i$ the corresponding velocities. Let the system be defined by the Lagrangian $L(q^i, v^i)$. The momenta are defined as functions of $q^i$ and $v^i$ by $p_i(q^i, v^i) = \partial L(q^i, v^i)/\partial v^i$. The canonical Hamiltonian $H(q^i, p_i) = v^i(q^i, p_i)\,p_i - L(q^i, v^i(q^i, p_i))$ (summation over repeated indices is understood) is obtained by inverting the function $p_i(q^i, v^i)$ and expressing the velocities as functions of the momenta, $v^i(q^i, p_i)$. The phase space $\Gamma_0$ is the space of the variables $(q^i, p_i)$. Infinitesimal time evolution is given by the vector field $V = v^i(q^i, p_i)\,\partial/\partial q^i + f_i(q^i, p_i)\,\partial/\partial p_i$, where velocities and forces are given by the Hamilton equations $v^i = \partial H/\partial p_i$ and $f_i = -\partial H/\partial q^i$.

More formally, the 2-form $\omega = dp_i \wedge dq^i$ endows $\Gamma_0$ with a symplectic structure. In the presence of such a structure, every function $A$ determines a vector field $V_A$, defined by $i_{V_A}\omega = -dA$. By integrating this field, we have a flow in $\Gamma_0$, called the flow generated by $A$. Time evolution is the flow generated by the Hamiltonian. Given two functions $A$ and $B$, their Poisson brackets are defined by the function $\{A, B\} = V_B(A) = -V_A(B)$. Therefore, the time evolution of an observable $A$ satisfies $dA/dt = \{A, H\}$. A dynamical system is completely characterized by the set $(\Gamma_0, \omega, \mathcal{A}, H)$, where $\mathcal{A} = (A_1, \ldots, A_N)$ is the ensemble of the observables.

A constrained system, in the sense of Dirac, is a system for which the image of the function $v^i \to p_i(q^i, v^i)$ is smaller than $\mathbb{R}^n$. We can characterize the image $I$ of the map $(q^i, v^i) \to (q^i, p_i)$ with a set of equations on $\Gamma_0$

$$C_\alpha(q^i, p_i) = 0 \qquad [1]$$

where $\alpha = 1, \ldots, m_0$. These are called the primary constraints.

The “constraint surface” $C$ is the largest subspace of $I$ which is preserved by time evolution. It can be characterized by adding additional constraints, still of the form [1], with $\alpha = m_0 + 1, \ldots, m$. These additional constraints, called secondary constraints, can be computed as the Poisson brackets of the primary constraints with the Hamiltonian (plus the Poisson brackets of these secondary constraints with the Hamiltonian, and so on, until the Poisson brackets of all the constraints with the Hamiltonian vanish on $C$). We say that an equation holds weakly if it holds on $C$.

A constrained system is “first class” if the Poisson brackets of the constraints among themselves vanish weakly. Maxwell theory and GR are first-class constrained systems. In a first-class constrained system, the constraints generate flows that preserve $C$ and foliate it into “orbits.” The space of these orbits is called the physical phase space $\Gamma_{ph}$ (see Figure 1).

Figure 1 The structure of a first-class constrained system. $\Gamma_0$: phase space; $C$: constraint surface; $\Gamma_{ph}$: physical phase space (space of the orbits); $i$: imbedding of $C$ in $\Gamma_0$; $\pi$: projection to orbit space (sending each point into its orbit).

This flow is interpreted as a “gauge” transformation, namely as a change of mathematical description of the same physical state. As first observed by Dirac, such an interpretation is necessary if we demand a deterministic physical evolution, for the following reason. A first-class constrained system is a system in which the time evolution $q^i(t)$ of the Lagrangian variables is not completely determined by the equations of motion. (The relation between constraints and underdetermination of the evolution is simple to understand. In a Lagrangian system, the number of equations of motion is equal to the number of Lagrangian variables. If one of these equations is a constraint (between the initial velocities and initial coordinates), then one evolution equation is missing.) To recover a deterministic physical evolution, we must interpret two “mathematical” states that can evolve from the same initial data as describing the same “physical” state. As shown by Dirac, the transformations generated by the constraints are precisely the ones that implement such an identification.

It follows that the physical states must be identified with the equivalence classes of the points of $C$ under the gauge transformations generated by the constraints, namely with the orbits of their flow. It is easy to show that (locally) there is a unique symplectic 2-form $\omega_{ph}$ on $\Gamma_{ph}$ such that its pullback to $C$ is equal to the pullback of $\omega$ to $C$ ($i^*\omega = \pi^*\omega_{ph}$, see Figure 1). Physical observables $A_{ph}$ are functions on $C$ that are gauge invariant, namely constant on
the orbits. That is, they are functions on $\Gamma_{ph}$. The Hamiltonian is a physical observable. The dynamical system $(\Gamma_{ph}, \omega_{ph}, \mathcal{A}_{ph}, H)$, where $\mathcal{A}_{ph}$ is the ensemble of the physical observables, is a complete description of the physical system, called the gauge-invariant formulation, with no more constraints or gauges.

For instance, the phase space of Maxwell theory is coordinatized by the Maxwell potential $A_\mu(x)$, $\mu = 0, 1, 2, 3$, and its conjugate momentum $E^\mu(x)$. Since the time derivative of $A_0$ does not appear in the Maxwell action, the primary constraint is

$$E^0(x) = 0 \qquad [2]$$

The secondary constraint turns out to be the Gauss law,

$$\partial_a E^a(x) = 0 \qquad [3]$$

where $a = 1, 2, 3$. The first generates arbitrary transformations of $A_0$, while the second generates the time-independent gauge transformations $\delta A_a(x) = \partial_a \lambda(x)$. The pair $(A_0, E^0)$ can be dropped altogether, since it is formed by a pure gauge variable and a variable constrained to vanish. The (gauge-invariant) Hamiltonian is $H = \frac{1}{8\pi}\int d^3x\,(E_a E^a + B_a B^a)$, where $B^a = \epsilon^{abc}\partial_b A_c$ is the magnetic field and $E^a$ is easily recognized as the electric field. $E^a$ and $B^a$ are the physical observables.

General Structure of GR Constraints

GR fits into Dirac theory with a certain difficulty. Since the constraints are the generators of the gauge invariances, it is easy to determine their structure in GR. The gauge invariances of GR are given by the coordinate transformations $x^\mu \to x'^\mu = f^\mu(x)$, where $x = (\mathbf{x}, t)$. Accordingly, we have four primary constraints $\pi_\mu = 0$, analogous to [2], and four secondary constraints $C_\mu(x) = 0$, analogous to [3]. These are usually separated into the three “momentum” constraints

$$C_a(x) = 0 \qquad [4]$$

which generate fixed-time spatial coordinate transformations, and the “Hamiltonian” constraint

$$C(x) = 0 \qquad [5]$$

which generates changes in the $t$ coordinate.

The metric $g_{\mu\nu}(x)$ that represents the gravitational field in Einstein’s original formulation has ten independent components per point. Each first-class constraint indicates that one Lagrangian variable is a gauge degree of freedom. The physical degrees of freedom of GR are therefore $(10 - 4 - 4) = 2$ per point. In the linearized theory, these are the two degrees of freedom that describe the two polarizations of a gravitational wave of given momentum. Formulations of GR in which there are additional gauge invariances (such as Cartan’s tetrad formulation, see below) have, accordingly, more constraints.

Since the Hamiltonian generates evolution in the Lagrangian evolution parameter $t$, and since such evolution can be obtained as a gauge transformation, it follows that the Hamiltonian is a constraint in GR. The vanishing of the Hamiltonian is a characteristic feature of general-relativistic systems. The Hamiltonian structure of GR is therefore determined by its phase space and its constraints. The gauge-invariant formulation of the theory is given just by the set $(\Gamma_{ph}, \omega_{ph}, \mathcal{A}_{ph})$ and no Hamiltonian. The physical interpretation of this structure is discussed in the last section.

ADM Formalism

In Einstein’s formulation, the Lagrangian variable of GR is the metric field $g_{\mu\nu}(x, t)$ (here we use the signature $[-, +, +, +]$). Arnowitt, Deser, and Misner have introduced the following change of variables:

$$q_{ab} = g_{ab}, \qquad N = 1/\sqrt{-g^{00}}, \qquad N^a = q^{ab} g_{b0} \qquad [6]$$

where $q^{ab}$ is the inverse of the three-dimensional metric $q_{ab}$, used henceforth to raise and lower space indices $a, b = 1, 2, 3$. This is equivalent to writing the invariant interval in the form

$$ds^2 = -N^2\,dt^2 + q_{ab}\,(dx^a + N^a dt)(dx^b + N^b dt)$$

These variables have an interesting geometric interpretation. Consider a family of spacelike (“ADM”) surfaces $\Sigma_t$ defined by $t = $ constant. $q_{ab}$ is the 3-metric induced on the surface. $N$ is called the “lapse” function and $N^a$ is called the “shift” function. Their geometrical interpretation is illustrated in Figure 2.

When written in terms of these variables, the action of GR takes the form

$$S[q_{ab}, N, N^a] = \int d^4x\,\sqrt{q}\,N\,\left[R + k_{ab}k^{ab} - k^2\right]$$

where $q = \det q_{ab}$ and $R$ are the determinant and the Ricci scalar of the metric $q_{ab}$;

$$k_{ab} = \frac{1}{2N}\left(\partial_t q_{ab} - D_a N_b - D_b N_a\right)$$

is the extrinsic curvature of the constant time surface; and $D_a$ is the covariant derivative of $q_{ab}$. This action is independent of the time derivatives of
$N$ and $N^a$. The conjugate momenta $\pi$ and $\pi_a$ of these quantities are therefore the primary constraints, and the pairs $(\pi, N)$ and $(\pi_a, N^a)$ can be taken out of the phase space as for the pair $(E^0, A_0)$ in the Maxwell example. We can therefore take the 3-metric $q_{ab}(x)$ and its conjugate momentum $p^{ab}(x)$ as the canonical variables of GR. The momentum is related to the “velocity” $\partial_t q_{ab}$ by

$$p^{ab} = \sqrt{q}\,\left(k^{ab} - k\,q^{ab}\right)$$

where $k = k_{ab}q^{ab}$.

The secondary constraints [4] and [5] turn out to be

$$C_a = \sqrt{q}\,D_b\!\left(\frac{1}{\sqrt{q}}\,p^b{}_a\right) = 0 \qquad [7]$$

and

$$C = \frac{1}{\sqrt{q}}\left(p_{ab}p^{ab} - \frac{1}{2}p^2\right) - \sqrt{q}\,R = 0 \qquad [8]$$

where $p = p^{ab}q_{ab}$.

If the two fields $q_{ab}(x, t)$ and $p^{ab}(x, t)$ satisfy the Hamilton equations

$$\frac{\partial q_{ab}(x, t)}{\partial t} = \{q_{ab}(x, t), H(t)\} \qquad [9]$$

$$\frac{\partial p^{ab}(x, t)}{\partial t} = \{p^{ab}(x, t), H(t)\} \qquad [10]$$

where

$$H(t) = \int d^3x\,\left( N(x, t)\,C[q_{ab}(x, t), p^{ab}(x, t)] + N^a(x, t)\,C_a[q_{ab}(x, t), p^{ab}(x, t)] \right)$$

with arbitrary functions $N(x, t)$, $N^a(x, t)$, then the metric $g_{\mu\nu}(x, t)$, defined from $q_{ab}$, $N$, $N^a$ by eqn [6], is the general solution of the vacuum Einstein equation $\mathrm{Ricci}[g] = 0$. Therefore, these equations provide a Hamiltonian form of the Einstein field equation.

Figure 2 The geometrical interpretation of the lapse $N(x, t)$ and shift $N^a(x, t)$ fields. Two ADM surfaces, $\Sigma_t$ and $\Sigma_{t+dt}$, defined by the values $t$ and $t + dt$, are displayed. $N(x, t)\,dt$ is the proper length of the vector joining the two surfaces, normal to the first surface at $(x, t)$. This is the proper time elapsed between the two surfaces for an observer at rest on the first surface at $(x, t)$. The quantity $dx^a = N^a(x, t)\,dt$ is the shift (the displacement) between the endpoint of this vector and the point $(x, t + dt)$ having the same spatial coordinates as $(x, t)$.

Tetrad Formalism

The tetrad formalism, developed by Cartan, Weyl, and Schwinger, has definite advantages with respect to the metric formalism. It allows the coupling of fermion fields to GR and is, therefore, needed to couple the standard model to GR. In the tetrad formalism, the gravitational field is represented by four covariant fields $e^I_\mu(x)$, where $I, J, \ldots = 0, 1, 2, 3$ are flat Lorentz indices raised and lowered with the Minkowski metric $\eta_{IJ} = \mathrm{diag}[-1, +1, +1, +1]$. The relation with the metric formalism is given by

$$g_{\mu\nu} = \eta_{IJ}\,e^I_\mu\,e^J_\nu$$

In this formulation, GR has an additional local SO(3,1) gauge invariance, given by local Lorentz transformations on the $I$ indices. The corresponding canonical formalism is usually defined in a gauge in which $e_i^0 = 0$ (equivalently, $e^0_a = 0$), where $i, j, \ldots = 1, 2, 3$ are flat three-dimensional indices raised and lowered with $\delta_{ij} = \mathrm{diag}[+1, +1, +1]$. In this gauge, the Lorentz group is reduced to the local SO(3) group of spatial transformations, and the ADM variables are defined by

$$e^I_\mu = \begin{pmatrix} N & N^i \\ 0 & e^i_a \end{pmatrix} \qquad [11]$$

where $N^i = e^i_a N^a$. This is equivalent to writing the invariant interval in the form

$$ds^2 = -N^2\,dt^2 + \left(e_{ia}\,dx^a + N_i\,dt\right)\left(e^i_b\,dx^b + N^i\,dt\right)$$

The reduced canonical variables can be taken to be the field $e^i_a(x)$ that represents the “triad” of the ADM surface, and its conjugate momentum $p^a_i(x)$. Their relation with the three-dimensional metric variables is given by transforming internal indices into tangent indices with the triad field $e^i_a$ and its inverse $e^a_i$. In particular,

$$q_{ab} = \delta_{ij}\,e^i_a\,e^j_b \qquad [12]$$

$$p^{ab} = e^b_i\,p^{ai} \qquad [13]$$

Also, for later reference,

$$k^i_a \equiv e^{ib}k_{ab} = \frac{2}{\det e}\left(p^i_a - \frac{1}{2}\,e^i_a\,p\right) \qquad [14]$$

where $p = e^i_a p^a_i$.

The momentum and Hamiltonian constraints are the same as in the ADM formulation, with $q_{ab}$ and $p^{ab}$ expressed in terms of the triad variables. The additional constraint that generates the internal rotations is

$$G_i = \epsilon_{ijk}\,e^j_a\,p^{ak} = 0 \qquad [15]$$
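
As a numerical sanity check of eqns [6], [11], and [12], the following is a minimal sketch in plain Python. It builds the tetrad in the gauge of [11] from a lapse, a shift, and a triad, forms $g_{\mu\nu} = \eta_{IJ} e^I_\mu e^J_\nu$, and then recovers the ADM variables through [6]. The sample values and all helper names (`tetrad_from_adm`, `metric_from_tetrad`, `adm_from_metric`, and so on) are assumptions made for illustration only and do not come from the article.

```python
import math

# Minkowski metric eta_IJ = diag[-1, +1, +1, +1]
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(m)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tetrad_from_adm(lapse, shift_up, triad):
    """Tetrad e[mu][I] laid out as in eqn [11]; triad[i][a] holds e^i_a."""
    shift_int = [sum(triad[i][a] * shift_up[a] for a in range(3))
                 for i in range(3)]                 # N^i = e^i_a N^a
    rows = [[lapse] + shift_int]                    # mu = 0 row: (N, N^i)
    for a in range(3):                              # mu = a rows: (0, e^i_a)
        rows.append([0.0] + [triad[i][a] for i in range(3)])
    return rows


def metric_from_tetrad(e):
    """g_{mu nu} = eta_IJ e^I_mu e^J_nu."""
    return matmul(matmul(e, ETA), transpose(e))


def adm_from_metric(g):
    """Eqn [6]: returns (q_ab, N, N^a) from a 4x4 metric g_{mu nu}."""
    q = [row[1:] for row in g[1:]]                  # q_ab = g_ab
    g_inv = inverse(g)
    lapse = 1.0 / math.sqrt(-g_inv[0][0])           # N = 1 / sqrt(-g^{00})
    q_inv = inverse(q)
    shift_up = [sum(q_inv[a][b] * g[b + 1][0] for b in range(3))
                for a in range(3)]                  # N^a = q^{ab} g_{b0}
    return q, lapse, shift_up


# Arbitrary sample data (assumed values; the triad only needs to be invertible).
N, N_up = 1.5, [0.2, -0.1, 0.3]
triad = [[1.0, 0.2, 0.0],
         [0.0, 1.1, 0.3],
         [0.1, 0.0, 0.9]]

g = metric_from_tetrad(tetrad_from_adm(N, N_up, triad))
q, N_rec, N_up_rec = adm_from_metric(g)
q_from_triad = matmul(transpose(triad), triad)      # eqn [12]

print("recovered lapse:", N_rec)
print("recovered shift:", N_up_rec)
```

The recovered lapse and shift should agree with the inputs up to rounding, and the spatial block of $g_{\mu\nu}$ should coincide with $\delta_{ij} e^i_a e^j_b$. The dense Gauss-Jordan inverse is used only to keep the sketch free of external dependencies; any linear-algebra library would serve equally well.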
Ashtekar Formalism

The Ashtekar formalism simplifies the form of the constraints and casts GR in a form having the same kinematics as Yang–Mills theory. With its variants, it is widely used in nonperturbative quantum gravity, in particular in the loop formulation (see Loop Quantum Gravity). It can be obtained from the tetrad canonical formalism by the canonical transformation

$$A^i_a = \tfrac{1}{2}\,\epsilon^i{}_{jk}\,\omega_a^{jk} + \mathrm{i}\,k^i_a \qquad [16]$$

$$E^a_i = \det e\;e^a_i \qquad [17]$$

where $\omega^{ij} = \omega^{ij}_a\,dx^a$ is the (torsion-free) spin connection of the triad 1-form field $e^i = e^i_a\,dx^a$, determined by the Cartan equation

$$de^i + \omega^i{}_k \wedge e^k = 0$$

The “electric” field $E$ is real, while the Sen–Ashtekar connection $A^i = A^i_a\,dx^a$ is complex and satisfies the reality condition

$$A^i + \bar{A}^i = 2\,\Gamma^i[e] \qquad [18]$$

where $\Gamma^i[e] = \tfrac{1}{2}\,\epsilon^i{}_{jk}\,\omega^{jk}$ denotes the real spin-connection term of [16].

The connection $A^i$ has a simple geometrical interpretation. It is the pullback $A^i_a = \omega^{(+)\,0i}_a$ on the $t = 0$ ADM surface of the self-dual part

$$\omega^{(+)\,IJ}_\mu = \frac{1}{2}\left(\omega^{IJ}_\mu - \frac{\mathrm{i}}{2}\,\epsilon^{IJ}{}_{KL}\,\omega^{KL}_\mu\right)$$

of the four-dimensional torsion-free spin connection $\omega^{IJ}_\mu$ determined by the tetrad field $e^I_\mu$.

In terms of these fields, the constraint equations can be written in the form

$$G_i = D_a E^a_i = 0 \qquad [19]$$

$$C_a = F^i_{ab}\,E^b_i = 0 \qquad [20]$$

$$C = \epsilon_{ijk}\,F^i_{ab}\,E^{aj}E^{bk} = 0 \qquad [21]$$

where $D_a$ is the covariant derivative and $F_{ab}$ is the curvature defined by the connection $A$. The first of these constraints is the nonabelian version of the Gauss law [3]: it is the gauge constraint of Yang–Mills theory. The constraints are polynomial in the canonical variables.

These equations are often written using a basis $\tau_i$ in the su(2) Lie algebra, and defining the su(2) connection $A = A^i\tau_i$ and the su(2)-valued vector field $E^a = E^{ai}\tau_i$. In terms of these fields the constraints can be written in the form

$$G = D_a E^a = 0$$

$$C_a = \mathrm{tr}[F_{ab}E^b] = 0$$

$$C = \mathrm{tr}[F_{ab}E^aE^b] = 0$$

where the trace is on su(2).

A variant of this formalism commonly used in quantum gravity is obtained by replacing [16] with the Barbero connection

$$A^i_a = \tfrac{1}{2}\,\epsilon^i{}_{jk}\,\omega_a^{jk} + \gamma\,k^i_a \qquad [22]$$

where $\gamma$ is an arbitrary complex number, called the Immirzi parameter. In terms of this connection, [21] is replaced by

$$C = \epsilon_{ijk}\,F^i_{ab}\,E^{aj}E^{bk} + \frac{1 + \gamma^2}{4}\,\det e\,\left(k_{ab}k^{ab} - k^2\right) = 0$$

where $e^i_a$ and $k_{ab}$ are given as functions of $E$ and $A$ by [22] and [17]. The choice $\gamma = 1$, with the constraints [19]–[21], gives the canonical formulation of Euclidean GR.

All the formulations described extend readily to matter couplings. The structure of the constraints remains the same, with additional constraints corresponding to matter gauge invariances, if any. The GR constraints are modified by the addition of matter terms. In particular, the Hamiltonian constraint $C$ and the momentum constraint $C_a$ are modified by the addition of terms determined by the energy density and the momentum density of the matter, respectively. In the Ashtekar formulation, a fermion field modifies the Gauss law constraint by the addition of a torsion term.

Evolution

In the gauge-invariant canonical structure of GR, there is no explicit time flow generated by a Hamiltonian. If the formalism is utilized just in order to express the Einstein equation in first-order canonical form, this is not a difficulty, because evolution in the coordinate time is generated by the constraints. On the other hand, if we are interested in understanding the structure of states, observables, and evolution of GR, the situation appears to be puzzling. An additional complication arises from the fact that virtually no gauge-invariant observable $A_{ph}$ is known explicitly as a function on the phase space. These issues become especially relevant when the canonical formalism is taken as a starting point for quantization. How is physical evolution represented in canonical GR?

The first relevant observation is that the gauge-invariant phase space $\Gamma_{ph}$ is better understood as a phase space in the sense of Lagrange: namely as the space $\Gamma$ of the solutions of the equations of motion modulo gauges, rather than a space of instantaneous states. Recall that in GR the notion of “instantaneous state” is rather unnatural.

In the ADM formulation, for instance, an orbit on the constraint surface of GR can be understood as the ensemble of all possible values that the variables
(qab (x), pab (x)) can take on arbitrary spacelike ADM explored. Among these: definitions of the physical
surfaces embedded in a given solution of the symplectic structure directly on the space of the
Einstein equation. Motion along the orbit (which solutions of the field equations; generalization of the
has dimension 4 × ∞³) corresponds to arbitrary initial and final surfaces to boundaries of compact
deformations of the surface. spacetime regions; construction of ‘‘evolving con-
Physical applications of classical GR deal with stants of motion,’’ namely families of gauge-invar-
relations between ‘‘partial observables.’’ A partial iant observables depending on a clock time
observable is any variable physical quantity that can parameter; multisymplectic formalisms that treat
be measured, even if its value cannot be determined space and time derivatives on a more equal footing;
from the knowledge of the physical state. An example and others. Many of these techniques are attempts
of partial observable in nonrelativistic mechanics is to overcome the unequal way in which time and
given precisely by the nonrelativistic time t. Partial space dependence are treated in the conventional
observables are represented in GR as functions on Γ0. Hamiltonian formalism.
A physical state in Γph determines an orbit in C, and GR has deeply modified our understanding of
therefore a set of relations between partial observables space and time. An extension of the canonical
(see Figure 1). That is, it determines the possible values formalism of mechanics, compatible with such a
that the partial observables can take ‘‘when’’ and modification, is needed, but consensus on the way
‘‘where’’ other partial observables have given values. (or even the possibility) of formulating a fully
All physical predictions of classical GR can be satisfactory general-relativistic extension of Hamil-
expressed in this form. tonian mechanics is still lacking.
One of the partial observables can be selected to
play the role of a physical clock time, and evolution See also: Asymptotic Structure and Conformal Infinity;
can be expressed in terms of such clock time. In Constrained Systems; General Relativity: Overview;
general, it is difficult – if not impossible – to find a Loop Quantum Gravity; Quantum Cosmology; Quantum
Geometry and its Applications; Spin Foams;
clock time observable in terms of which evolution is
Wheeler–De Witt Theory.
a proper conventional Hamiltonian evolution. Mat-
ter couplings partially simplify the task. For
instance, if the motion of planet Earth is coupled
Further Reading
to GR, then proper time along this motion from a
significative event on Earth, which is a partial Arnowitt R, Deser S, and Misner CW (1962) The dynamics of
observable, can be a convenient clock time. In pure general relativity. In: Witten L (ed.) Gravitation: An Introduc-
gravity, the ‘‘York time’’ defined as the trace of the tion to Current Research, p. 227. New York: Wiley.
Ashtekar A (1991) Non-Perturbative Canonical Gravity. Singapore:
extrinsic curvature TY = k, on ADM surfaces where World Scientific.
k is spatially constant, has been extensively and Bergmann P (1989) The canonical formulation of general
effectively used as a clock time in formal analysis of relativistic theories: the early years, 1930–1959. In: Howard D
the theory. A Hamiltonian that generates evolution and Stachel J (eds.) Einstein and the History of General
in a given clock time T can be formally obtained by Relativity. Boston: Birkhäuser.
Dirac PAM (1950) Generalized Hamiltonian dynamics. Canadian
solving the Hamiltonian constraint with respect to a Journal of Mathematical Physics 2: 129–148.
momentum PT conjugate to T. Such ‘‘reparametriza- Dirac PAM (1958) The theory of gravitation in Hamiltonian form.
tions’’ of the relative evolution of the partial Proceedings of the Royal Society of London, Series A 246: 333.
observables can be useful to analyze equations and Dirac PAM (1964) Lectures on Quantum Mechanics. New York:
to help intuition, but they are by no means necessary Belfer Graduate School of Science, Yeshiva University.
Gotay MJ, Isenberg J, Marsden JE, and Montgomery R (1998)
to have a well-defined interpretation of the theory. Momentum maps and classical relativistic fields. Part 1:
Another possibility to introduce a preferred time Covariant field theory. Archives: physics/9801019.
flow is to consider asymptotically flat solutions of Hanson A, Regge T, and Teitelboim C (1976) Constrained
the field equations. In this case, one can define a Hamiltonian Systems. Rome: Academia Nazionale dei Lincei.
nonvanishing Hamiltonian, given by a boundary Henneaux M and Teitelboim C (1972) Quantization of Gauge
Systems. Princeton: Princeton University Press.
integral at spacial infinity. This Hamiltonian gen- Isham CJ (1993) Canonical quantum gravity and the problem of
erates evolution in an asymptotic Minkowski time. time. In: Ibort LA and Rodriguez MA (eds.) Recent Problems in
This choice is convenient for describing observations Mathematical Physics, Salamanca, Dordrecht: Kluwer Academic.
performed from a large distance on isolated gravita- Lagrange JL (1808) Mémories de la première classe des sciences
tional systems. Many general-relativistic physical mathematiques et physiques. Paris: Institute de France.
Rovelli C (2004) Quantum Gravity. Cambridge: Cambridge
observations do not belong to this category. University Press.
Various other techniques to define a fully gen- Souriau JM (1969) Structure des Systemes Dynamics. Paris:
erally covariant canonical formalism have been Dunod.
418 Capacities Enhanced by Entanglement

Capacities Enhanced by Entanglement

P Hayden, McGill University, Montreal, QC, Canada

© 2006 Elsevier Ltd. All rights reserved.

Introduction

Shared entanglement between a sender and receiver can significantly improve the usefulness of a quantum channel for the communication of either classical or quantum data. Superdense coding and teleportation provide the most well-known examples of this improvement; free entanglement doubles the classical capacity of a noiseless quantum channel and makes it possible for a noiseless classical channel to send quantum data. In fact, the entanglement-assisted classical and quantum capacities of a quantum channel are in many senses simpler and better behaved than their unassisted counterparts (Holevo 1998, Schumacher and Westmoreland 1997, Devetak 2005). Most importantly, these capacities can be calculated using simple formulas and finite optimization procedures (Bennett et al. 1999, 2002). No such finite procedure is known for either of the unassisted capacities. Moreover, the entanglement-assisted classical and quantum capacities are related by a simple factor of 2. The unassisted capacities, in contrast, have completely different formulas. In fact, the simple factor of 2 generalizes to a statement known as the quantum reverse Shannon theorem, which governs the rate at which one quantum channel can simulate another (Bennett et al. 2005). The answer is given by the ratio of the entanglement-assisted capacities.

Notation

Quantum systems will be denoted by $A$, $B$, and so on as well as their variants such as $A'$ and $\hat{A}$. The choice of letter will generally indicate which party holds a given system, with $A$ reserved for the sender, Alice, and $B$ for the receiver, Bob. Given a quantum system $C$, $C^{\otimes n}$ will often be written as $C^n$. These symbols will be used to denote both the Hilbert space of the quantum system and the set of density operators on that system. Thus, a quantum channel $\mathcal{N} : A' \to B$ refers to a trace-preserving, completely positive (TPCP) map from the operators on the Hilbert space of $A'$ to those of $B$. $\mathrm{id}_C$ refers to the identity channel on $C$. The map $\mathcal{N} \otimes \mathrm{id}_C$ will frequently be abbreviated to $\mathcal{N}$ in order to simplify long expressions. Likewise, the density operator $|\varphi\rangle\langle\varphi|$ of a pure quantum state $|\varphi\rangle$ will be abbreviated to $\varphi$. $\pi^C$ will refer to the maximally mixed state on $C$ and $\pi_d$ to the maximally mixed state on a specified d-dimensional quantum system. For a given quantum state $\varphi^{AB}$ on the composite system $AB$, $\varphi^A = \operatorname{tr}_B \varphi^{AB}$ and

$$H(A)_\varphi = H(\varphi^A) = -\operatorname{tr}(\varphi^A \log_2 \varphi^A) \qquad [1]$$

is the von Neumann entropy of $\varphi^A$, while

$$H(A|B)_\varphi = -I_c(A\rangle B) = H(AB)_\varphi - H(B)_\varphi \qquad [2]$$

is its conditional entropy and

$$I(A;B)_\varphi = H(A)_\varphi + H(B)_\varphi - H(AB)_\varphi \qquad [3]$$

its mutual information.

Entanglement-Assisted Classical and Quantum Capacities

The entanglement-assisted classical capacity of a quantum channel $\mathcal{N} : A' \to B$ is the optimal rate at which classical information can be communicated through the channel while in addition making use of an unlimited number of maximally entangled states. The formal definition proceeds as follows. Alice and Bob are assumed to share $nS$ ebits in the form of a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank $2^{nS}$. Conditioned on her message $m \in \{1, 2, \ldots, 2^{nR}\}$, Alice will apply an encoding operation $\mathcal{E}_m : \tilde{A} \to A'^n$. Bob’s decoding is given by a POVM $\{\Lambda_m\}_{m=1}^{2^{nR}}$ on the composite system $B^n\tilde{B}$. The procedure is said to have maximum probability of error $\epsilon$ if

$$\min_m \operatorname{tr}\bigl[\Lambda_m (\mathcal{N}^{\otimes n} \circ \mathcal{E}_m)(\Phi)\bigr] \ge 1 - \epsilon \qquad [4]$$

Figure 1 Circuit representation of the elements of an entanglement-assisted classical code for the channel $\mathcal{N}$. Alice encodes message $m$ by applying the operation $\mathcal{E}_m$ to her half of the shared entanglement. Bob decodes by applying the POVM $\{\Lambda_{m'}\}$ on the output of the channel and his half of the shared entanglement.

These elements, illustrated in Figure 1, consisting of the shared entanglement, as well as the encoding and decoding operations meeting the criterion of eqn [4], are called a $(2^{nR}, 2^{nS}, n, \epsilon)$ entanglement-assisted classical code for the channel $\mathcal{N}$. A rate $R$ is said to be achievable if there exists a choice of $S \ge 0$ and a sequence of entanglement-assisted classical codes $(2^{nR}, 2^{nS}, n, \epsilon_n)$ with $\epsilon_n \to 0$. The entanglement-assisted
classical capacity $C_E(\mathcal{N})$ of $\mathcal{N}$ is defined to be the supremum over all achievable rates.

Theorem 1 (Bennett et al. 1999, 2002). The entanglement-assisted classical capacity $C_E$ of a quantum channel $\mathcal{N} : A' \to B$ is given by

$$C_E(\mathcal{N}) = \max I(A;B)_\sigma \qquad [5]$$

where the maximization is over states $\sigma^{AB} = \mathcal{N}(\varphi^{AA'})$ arising from the channel by acting on the $A'$ half of any pure state $|\varphi\rangle^{AA'}$.

The theorem bears a strong formal resemblance to Shannon’s noisy coding theorem for the classical capacity of a classical noisy channel. There the capacity formula is also given by an optimization of the mutual information, but over joint distributions between the input and output alphabets arising from the action of the channel. Such a joint distribution cannot exist in general for a quantum channel because the no-cloning theorem excludes the possibility of the input and output existing simultaneously. Equation [5] instead refers to a natural substitute for the joint input–output distribution: a quantum state arising from the quantum channel acting on half of an entangled pure state.

Another point worth stressing is that, unlike the known formulas for the unassisted classical and quantum capacities of a quantum channel, eqn [5] refers to only a single use of $\mathcal{N}$ instead of the limit of many uses, $\mathcal{N}^{\otimes n}$. The formula can therefore readily be used to evaluate $C_E$ for any channel of interest.

Consider, for example, the d-dimensional depolarizing channel

$$\mathcal{D}_p(\rho) = (1 - p)\rho + p\,\pi_d \qquad [6]$$

that with probability $p$ completely randomizes the input but otherwise leaves the input invariant. For such channels, the maximum is achieved by choosing a maximally entangled state for $|\varphi\rangle^{AA'}$, yielding

$$C_E(\mathcal{D}_p) = 2\log_2 d - h_{d^2}\!\left(1 - p\,\frac{d^2 - 1}{d^2}\right) \qquad [7]$$

where for any $0 \le q \le 1$ and integer $r \ge 1$,

$$h_r(q) = -q \log_2 q - (1 - q)\log_2 \frac{1 - q}{r - 1} \qquad [8]$$

is the Shannon entropy of the distribution $(q, (1-q)/(r-1), \ldots, (1-q)/(r-1))$.

Entanglement assistance also simplifies the relationship between the classical and quantum capacities of a channel. Proceeding as before to formally define the quantum capacity, Alice and Bob are again assumed to share a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank $2^{nS}$. Alice’s encoding operation will be a TPCP map $\mathcal{E} : \hat{A}\tilde{A} \to A'^n$ acting on an input system $\hat{A}$ and her half of the shared entanglement, $\tilde{A}$. Bob’s decoding will likewise be a TPCP map $\mathcal{D} : B^n\tilde{B} \to \hat{B}$ acting on the output of the channel, $B^n$, and his half of the shared entanglement, $\tilde{B}$. $\hat{A}$ and $\hat{B}$ are assumed to be isomorphic quantum systems of some fixed dimension $2^{nQ}$. The procedure is said to have subspace fidelity $1 - \epsilon$ if

$$\langle\varphi|^{\hat{B}}\,(\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E})\bigl(\Phi^{\tilde{A}\tilde{B}} \otimes \varphi^{\hat{A}}\bigr)\,|\varphi\rangle^{\hat{B}} \ge 1 - \epsilon \qquad [9]$$

for all $|\varphi\rangle^{\hat{A}} \in \hat{A}$. These elements, illustrated in Figure 2, are together called a $(2^{nQ}, 2^{nS}, n, \epsilon)$ entanglement-assisted quantum code for the channel $\mathcal{N}$. A rate $Q$ is said to be achievable if there exists a choice of $S \ge 0$ and a sequence of entanglement-assisted quantum codes $(2^{nQ}, 2^{nS}, n, \epsilon_n)$ with $\epsilon_n \to 0$. The entanglement-assisted quantum capacity $Q_E(\mathcal{N})$ of $\mathcal{N}$ is defined to be the supremum over all achievable rates.

Figure 2 Circuit representation of the elements of an entanglement-assisted quantum code for the channel $\mathcal{N}$. $\mathcal{E}$ is Alice’s encoding operation, which acts on both her input state and her half of the shared entanglement. Bob decodes using a quantum operation $\mathcal{D}$ acting on the output of the channel and his half of the shared entanglement.

There is considerable freedom in the definition of the entanglement-assisted quantum capacity. It could, for example, be defined as the largest amount of maximal entanglement that can be generated using the channel, minus the entanglement consumed during the protocol itself. Alternatively, the fidelity criterion eqn [9] could be strengthened to require that $\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E}$ preserve not only pure states on $\hat{A}$ but any entanglement between $\hat{A}$ and a reference system. All of these variants yield the same capacity formula:

$$Q_E(\mathcal{N}) = \tfrac{1}{2} C_E(\mathcal{N}) \qquad [10]$$

This equivalence is a direct consequence of the existence of the teleportation and superdense coding protocols. When maximal entanglement is available, teleportation converts the ability to send classical data into the ability to send quantum data at half the classical rate. Conversely, by consuming
maximal entanglement, superdense coding converts the ability to send quantum data into the ability to send classical data at double the quantum rate.

Sketch of Proof

The proof of a capacity theorem can usually be broken into two parts, achievability and optimality. The achievability part demonstrates the existence of a sequence of codes reaching the prescribed rate while the optimality part shows that it is impossible to do better.

The main idea in the achievability proof can be understood by studying the special case where $\varphi^{A'} = \pi^{A'}$. Let $d_{A'} = \dim A'$ and $\{U_j\}_{j=1}^{d_{A'}^{2n}}$ be a set of Weyl operators for $A'^n$. The relevant property of these operators is that averaging over them implements the constant map: for all density operators $\rho$,

$$\frac{1}{d_{A'}^{2n}} \sum_{j=1}^{d_{A'}^{2n}} U_j \rho\, U_j^\dagger = \pi^{A'^n} \qquad [11]$$

Consider the state $\sigma_j$ that arises if Alice acts with $U_j$ on the $A'^n$ half of a rank-$d_{A'}^n$ maximally entangled state $|\varphi\rangle^{AA'^n}$ and then sends the $A'^n$ half of the resulting state through $\mathcal{N}^{\otimes n}$. (Note that here $A'^n$ also plays the role of $\tilde{A}$.) The entropy of the resulting state is

$$H(\sigma_j) = H\bigl(\mathcal{N}^{\otimes n}\bigl((U_j \otimes I_{\tilde{B}})\,\varphi\,(U_j^\dagger \otimes I_{\tilde{B}})\bigr)\bigr) \qquad [12]$$

$$= H\bigl(\mathcal{N}^{\otimes n}(\varphi)\bigr) \qquad [13]$$

since $U_j$ does not change the local density operator on $A'^n$.

On the other hand, if Alice selects a value of $j$ from the uniform distribution, then the resulting average input state to the channel will be

$$\pi^{A'^n} \otimes \pi^{A} = \varphi^{A'^n} \otimes \varphi^{A} \qquad [14]$$

and the corresponding average output state will be $\mathcal{N}^{\otimes n}(\varphi^{A'^n}) \otimes \varphi^A$, which has entropy

$$H\bigl(\mathcal{N}^{\otimes n}(\varphi^{A'^n})\bigr) + H(\varphi^A) \qquad [15]$$

Therefore, the Holevo quantity of the ensemble of output states, defined as the entropy of the average state minus the average of the entropies of the individual output states, will be equal to

$$H(\varphi^A) + H\bigl(\mathcal{N}^{\otimes n}(\varphi^{A'^n})\bigr) - H\bigl(\mathcal{N}^{\otimes n}(\varphi^{AA'^n})\bigr) \qquad [16]$$

This is precisely the quantity $I(A;B)$ for the state $\mathcal{N}^{\otimes n}(\varphi^{AA'^n})$ since the channel $\mathcal{N}^{\otimes n}$ transforms the $A'^n$ system into $B^n$. Moreover, if Bob is given the $A$ part of the maximally entangled state, then this is the Holevo quantity of an ensemble of states that can be produced by Alice acting on half of a shared entangled state and then sending her half through the channel. Invoking the Holevo–Schumacher–Westmoreland (HSW) theorem for the classical capacity (Holevo 1998, Schumacher and Westmoreland 1997) therefore completes the proof; using coding, the Holevo quantity is an achievable communication rate.

The proof that eqn [5] is optimal involves a series of entropy manipulations similar to the optimality proofs for the unassisted classical and quantum capacities. From the point of view of quantum information, the truly unusual part of the proof is the demonstration that it is unnecessary to consider multiple copies of $\mathcal{N}$ (Cerf and Adami 1997). Specifically, let

$$f(\mathcal{N}) = \max I(A;B)_\sigma \qquad [17]$$

where the maximization is defined as in Theorem 1. Techniques analogous to those used for the unassisted capacities yield the upper bound

$$C_E(\mathcal{N}) \le \lim_{n \to \infty} \frac{1}{n} f(\mathcal{N}^{\otimes n}) \qquad [18]$$

Unlike the unassisted case, however, a relatively easy argument shows that

$$f(\mathcal{N}_1 \otimes \mathcal{N}_2) = f(\mathcal{N}_1) + f(\mathcal{N}_2) \qquad [19]$$

(The analogous statement is an important conjecture for the classical capacity and is known to be false for the quantum capacity (DiVincenzo et al. 1998).) As a result, $C_E(\mathcal{N}) \le f(\mathcal{N})$, which is the optimality part of Theorem 1.

To see the origin of eqn [19], it will be helpful to invoke Stinespring’s theorem to write $\mathcal{N}_j = \operatorname{tr}_{E_j} \mathcal{U}_j^{B_j E_j}$, where $\mathcal{U}_j : A'_j \to B_j E_j$ is an isometry. Fix a state $|\varphi\rangle^{AA'_1A'_2}$ and let $\sigma = (\mathcal{U}_1 \otimes \mathcal{U}_2)(\varphi)$. Equation [19] follows from the fact that

$$I(A; B_1B_2)_\sigma \le I(AB_2E_2; B_1)_\sigma + I(AB_1E_1; B_2)_\sigma \qquad [20]$$

Simply redefining $A$ to be $AB_2E_2$ shows that the first term of the right-hand side is upper bounded by $f(\mathcal{N}_1)$. The second term, likewise, is upper bounded by $f(\mathcal{N}_2)$. Equation [20] is itself equivalent to the inequality

$$H(B_1B_2|E_1E_2)_\sigma + H(B_1B_2)_\sigma \le H(B_1|E_1)_\sigma + H(B_2|E_2)_\sigma + H(B_1)_\sigma + H(B_2)_\sigma \qquad [21]$$

The inequality $H(B_1B_2) \le H(B_1) + H(B_2)$ holds by the subadditivity of the von Neumann entropy.
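The subadditivity step invoked above is easy to spot-check numerically. The following Python sketch is an illustration only and is not part of the original proof: it draws random mixed states on a bipartite system $B_1B_2$ and evaluates $H(B_1) + H(B_2) - H(B_1B_2)$, which is the mutual information of eqn [3] and should never be negative. The helper names, the dimensions, and the way random states are generated are all assumptions made for this example; only NumPy is used.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits, H(rho) = -tr(rho log2 rho), as in eqn [1]."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # convention: 0 log 0 = 0
    return float(-np.sum(eigvals * np.log2(eigvals)))

def random_density_matrix(dim, rng):
    """Hypothetical sampler: G G^dagger / tr(G G^dagger), G complex Gaussian."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def reduced_states(rho, d1, d2):
    """Partial traces of a state on B1 (dimension d1) tensor B2 (dimension d2)."""
    t = rho.reshape(d1, d2, d1, d2)
    rho_b1 = np.einsum('ijkj->ik', t)  # trace out B2
    rho_b2 = np.einsum('ijil->jl', t)  # trace out B1
    return rho_b1, rho_b2

def subadditivity_gap(rho, d1, d2):
    """H(B1) + H(B2) - H(B1 B2); subadditivity says this is nonnegative."""
    rho_b1, rho_b2 = reduced_states(rho, d1, d2)
    return (von_neumann_entropy(rho_b1) + von_neumann_entropy(rho_b2)
            - von_neumann_entropy(rho))

rng = np.random.default_rng(0)
gaps = [subadditivity_gap(random_density_matrix(6, rng), 2, 3) for _ in range(100)]
print("smallest gap over 100 random states:", min(gaps))
```

A numerical check of this kind proves nothing, but it is a convenient guard against index errors when the same partial-trace helpers are reused for the conditional entropies that enter the rest of the argument.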
Repeated applications of the strong subadditivity inequality, moreover, lead to the inequality

$$H(B_1B_2|E_1E_2) \le H(B_1|E_1) + H(B_2|E_2) \qquad [22]$$

Together, they prove eqn [20] and, thence, eqn [19]. The intuitive meaning of this “single-letterization” is unclear, but regardless, it is interesting to note that the proof involved invoking a pair of purifying environment systems, $E_1$ and $E_2$, and studying the entropy relationships between the true outputs of the channel and the environment’s share.

The Quantum Reverse Shannon Theorem

A strong argument can be made that the entanglement-assisted capacity of a quantum channel is the most important capacity of that channel and that all the other capacities are, in some sense, of less significance. The fact that it is unnecessary to distinguish between the classical and quantum entanglement-assisted capacities because they are related by a factor of 2 is a hint in that direction, as is the simple, single-letter formula for $C_E(\mathcal{N})$.

A more general argument can be made by considering the problem of having one channel simulate another. Indeed, the quantum capacity of a quantum channel is simply the optimal rate at which that channel can simulate the noiseless channel $\mathrm{id}_2$ on a single qubit. Likewise, the classical capacity of a quantum channel is its optimal rate for simulation of a qubit dephasing channel

$$\rho \mapsto |0\rangle\langle 0|\rho|0\rangle\langle 0| + |1\rangle\langle 1|\rho|1\rangle\langle 1| \qquad [23]$$

In this spirit, the fact that $C_E(\mathcal{N}) = 2Q_E(\mathcal{N})$ can be re-expressed in the form

$$Q_E(\mathcal{N}) = \frac{C_E(\mathcal{N})}{C_E(\mathrm{id}_2)} \qquad [24]$$

Equivalently, when entanglement is free, the optimal rate at which $\mathcal{N}$ can simulate a noiseless qubit channel is given by the ratio between the entanglement-assisted classical capacities of $\mathcal{N}$ and $\mathrm{id}_2$. The quantum reverse Shannon theorem generalizes this statement to the simulation of arbitrary channels in the presence of free entanglement.

Suppose that Alice and Bob would like to use $\mathcal{N}_1 : A' \to B$ to simulate another channel $\mathcal{N}_2 : A' \to B$. Fix an input state $\varphi^{A'}$ and let $|\varphi\rangle^{AA'^n}$ be a purification of $(\varphi^{A'})^{\otimes n}$. As always, assume that Alice and Bob share a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank $2^{nS}$. Alice’s encoding operation will be a TPCP map $\mathcal{E} : \tilde{A}A'^n \to A'^m$ acting on $n$ copies of the input system $A'$ and her half of the shared entanglement, $\tilde{A}$. Bob’s decoding will likewise be a TPCP map $\mathcal{D} : B^m\tilde{B} \to B^n$ acting on $m$ copies of the output of the channel, and his half of the shared entanglement, $\tilde{B}$. This procedure is said to $\epsilon$-simulate $\mathcal{N}_2^{\otimes n}$ on $(\varphi^{A'})^{\otimes n}$ if

$$F\Bigl(\mathcal{N}_2^{\otimes n}\bigl(\varphi^{AA'^n}\bigr),\ (\mathcal{D} \circ \mathcal{N}_1^{\otimes m} \circ \mathcal{E})\bigl(\Phi^{\tilde{A}\tilde{B}} \otimes \varphi^{AA'^n}\bigr)\Bigr) \ge 1 - \epsilon \qquad [25]$$

where $F$ is the mixed state fidelity $F(\rho, \sigma) = \bigl(\operatorname{tr}\sqrt{\rho^{1/2}\sigma\rho^{1/2}}\bigr)^2$. The entire procedure, illustrated in Figure 3, is said to be a $(2^{nS}, m, n, \epsilon)$ entanglement-assisted simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. A rate $R$, measured in copies of $\mathcal{N}_2$ per copy of $\mathcal{N}_1$, is said to be achievable for $\varphi^{A'}$ if there exists a choice of $S \ge 0$ and a sequence of $(2^{nS}, m_n, n, \epsilon_n)$ entanglement-assisted simulations with $n/m_n \to R$ while $\epsilon_n \to 0$.

Figure 3 Circuit representation of an entanglement-assisted simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. (a) The simulation circuit, with Alice’s encoding operation $\mathcal{E}$ acting on $n$ copies of $A'$ and Bob’s decoding operation producing $n$ copies of $B$. (b) The circuit that the protocol is intended to simulate. As stated, the quantum reverse Shannon theorem allows the simulation circuit to depend on the density operator of the input state restricted to $A'^n$.

The quantum reverse Shannon theorem states that the entanglement-assisted capacity completely governs the achievable simulation rates.

Theorem 2 (Winter 2004, Bennett et al. 2005). Given two channels $\mathcal{N}_1 : A' \to B$ and $\mathcal{N}_2 : A' \to B$, $R$ is an achievable simulation rate for $\mathcal{N}_2$ by $\mathcal{N}_1$ and all input states $\varphi^{A'}$ if and only if

$$R \le \frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)} \qquad [26]$$

Note that the form of eqn [26] ensures that the simulation is asymptotically reversible: if a channel $\mathcal{N}_1$ is used to simulate $\mathcal{N}_2$ and the simulation is then used to simulate $\mathcal{N}_1$ again, then the overall rate becomes

$$\frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)} \cdot \frac{C_E(\mathcal{N}_2)}{C_E(\mathcal{N}_1)} = 1 \qquad [27]$$

Thus, in the presence of free entanglement and for a known input density operator of the form $(\varphi^{A'})^{\otimes n}$, a single parameter, the entanglement-assisted classical capacity, suffices to completely characterize the asymptotic properties of a quantum channel.
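Because eqns [7], [10], and [26] are all closed-form, the rates they describe can be tabulated directly. The sketch below is an illustration rather than part of the source: it evaluates $C_E$ for depolarizing channels, halves it to obtain $Q_E$, and forms the capacity ratio that bounds the simulation rate in Theorem 2. The function names and the sample parameters (p = 0.1 and p = 0.3 on a qubit) are hypothetical choices made for the example.

```python
import math

def h_r(q, r):
    """Shannon entropy in bits of (q, (1-q)/(r-1), ..., (1-q)/(r-1)), eqn [8]."""
    total = 0.0
    if q > 0.0:
        total -= q * math.log2(q)
    if q < 1.0:  # skipped at q = 1, where the term is 0 log 0 = 0
        total -= (1.0 - q) * math.log2((1.0 - q) / (r - 1))
    return total

def ce_depolarizing(p, d):
    """Entanglement-assisted classical capacity of D_p in dimension d, eqn [7]."""
    q = 1.0 - p * (d * d - 1) / (d * d)
    return 2.0 * math.log2(d) - h_r(q, d * d)

def qe_depolarizing(p, d):
    """Entanglement-assisted quantum capacity via Q_E = C_E / 2, eqn [10]."""
    return 0.5 * ce_depolarizing(p, d)

def simulation_rate(ce_1, ce_2):
    """Upper limit of eqn [26]: copies of N2 simulated per copy of N1."""
    return ce_1 / ce_2

# Two qubit depolarizing channels with illustrative noise levels.
c1 = ce_depolarizing(0.1, 2)
c2 = ce_depolarizing(0.3, 2)
print("C_E:", c1, c2)
print("Q_E:", qe_depolarizing(0.1, 2), qe_depolarizing(0.3, 2))
print("rate N1 -> N2:", simulation_rate(c1, c2))
print("round trip:", simulation_rate(c1, c2) * simulation_rate(c2, c1))
```

The limiting cases are a useful sanity check: at p = 0 the channel is noiseless and the formula returns $2\log_2 d$, the superdense coding rate, while at p = 1 the output is independent of the input and the capacity vanishes.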
Moreover, since two channels that are asymptotically equivalent without free entanglement will surely remain equivalent if free entanglement is permitted, eqn [26] gives essentially the only possible nontrivial, single-parameter asymptotic characterization of quantum channels. This is the sense in which the entanglement-assisted capacity should be regarded as the most important capacity of a quantum channel.

The proof of the quantum reverse Shannon theorem is quite involved, but some of its features can be understood without much work. First, note that by the optimality statement of the entanglement-assisted classical capacity, the desired simulation can exist only if eqn [26] holds. Otherwise, composing the simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$ with a sequence of codes achieving $C_E(\mathcal{N}_2)$ would result in a sequence of codes beating the capacity formula for $\mathcal{N}_1$.

Similarly, note that one method to simulate a channel $\mathcal{N}_1$ using $\mathcal{N}_2$ is to first use $\mathcal{N}_2$ to simulate the noiseless channel and then use the simulated noiseless channel to simulate $\mathcal{N}_1$. Since the achievable rates for the first step are characterized by the entanglement-assisted capacity theorem, proving the achievability part of Theorem 2 reduces to finding protocols for simulating a general noisy quantum channel $\mathcal{N}_2$ by a noiseless one. That perhaps sounds like a strange goal, but nonetheless is the difficult part of the quantum reverse Shannon theorem.

It is likely that the quantum reverse Shannon theorem can be extended to cover other types of inputs than the known tensor power states $(\varphi^{A'})^{\otimes n}$. The most desirable form of the theorem would be one valid for all possible input density operators on $A'^n$, providing a single simulation procedure dependent only on the channels and not the input state. It is known that without modifying the form of the free entanglement, this most ambitious form of the theorem fails, but it is conjectured that the full-strength theorem does hold provided very large amounts of entanglement are supplied in the form of the so-called embezzling states (van Dam and Hayden 2003).

Relationships between Protocols

There is another sense in which the entanglement-assisted capacity can be viewed as the fundamental capacity of a quantum channel: an efficient protocol for achieving the entanglement-assisted capacity can be converted into protocols achieving the unassisted quantum and classical capacities, or at least very close variants thereof.

An efficient protocol in this case refers to one that does not waste entanglement. Suppose that $\mathcal{N} : A' \to B$ can be written $\operatorname{tr}_E \mathcal{U}^{BE}$ for some isometry $\mathcal{U}^{BE}$. Let $|\varphi\rangle^{AA'}$ be a pure state and $|\sigma\rangle^{ABE} = \mathcal{U}^{BE}|\varphi\rangle^{AA'}$ the corresponding purified channel output state. Careful analysis of the entanglement-assisted classical communication protocol achieving the rate $I(A;B)$ leads to an entanglement-assisted quantum communication protocol consuming entanglement at the rate $(1/2)I(A;E)$ ebits per use of $\mathcal{N}$ and yielding communication at the rate of $(1/2)I(A;B)$ qubits per use of $\mathcal{N}$. The protocol achieving this goal is known as the “father” (Devetak et al. 2004).

If the entanglement consumed in the father were actually supplied by quantum communication from Alice to Bob, then the net rate of quantum communication produced by the resulting protocol would be $(1/2)I(A;B) - (1/2)I(A;E)$ qubits from Alice to Bob, that is, the total produced minus the total consumed.

This quantity, how much more information $B$ has about $A$ than $E$ does, can be simplified using an interesting identity. Since $|\sigma\rangle^{ABE}$ is pure,

$$I(A;E) = H(A) + H(E) - H(AE) \qquad [28]$$

$$= H(A) + H(AB) - H(B) \qquad [29]$$

Expanding $I(A;B)$ and canceling terms then reveals that

$$\tfrac{1}{2}I(A;B) - \tfrac{1}{2}I(A;E) = -H(A|B) = I_c(A\rangle B) \qquad [30]$$

where the function $I_c$ is known as the coherent information. After optimizing over input states and multiple channel uses, this is precisely the formula for the unassisted quantum capacity of a quantum channel (Devetak 2005). Thus, the net rate of qubit communication for the protocol derived from the father exactly matches the rates necessary to achieve the unassisted quantum capacity. The only caveat is that the protocol derived from the father uses quantum communication catalytically, meaning that some communication needs to be invested in order to get a gain of $I_c(A\rangle B)$. For the unassisted quantum capacity, no investment is necessary. Nonetheless, detailed analysis of the situation reveals that the amount of catalytic communication required can be reduced to an amount sublinear in the number of channel uses, meaning the rate of required investment can be made arbitrarily small. In this sense, the father protocol essentially generates the optimal protocols for the unassisted quantum capacity.

Protocols achieving the unassisted classical capacity can be constructed in a similar way. In this case, one starts from an ensemble $\mathcal{E} = \{p_j, \mathcal{N}(\rho_j^{A'})\}$ of states generated by the channel. Achievability of
the unassisted classical capacity formula follows from achievability of rates of the form

$$\chi(\mathcal{E}) = H\Bigl(\sum_j p_j\, \mathcal{N}(\rho_j^{A'})\Bigr) - \sum_j p_j\, H\bigl(\mathcal{N}(\rho_j^{A'})\bigr) \qquad [31]$$

for arbitrary ensembles of output states. Consider the channel

$$\tilde{\mathcal{N}}(\rho) = \sum_j \langle j|\rho|j\rangle\, \mathcal{N}(\rho_j) \qquad [32]$$

and input state $|\varphi\rangle^{AA'} = \sum_j \sqrt{p_j}\, |j\rangle^A |j\rangle^{A'}$. If $\sigma = \tilde{\mathcal{N}}(\varphi)$, then $I(A;B)_\sigma$ is equal to $\chi(\mathcal{E})$. Thus, there are protocols consuming entanglement that achieve the classical communication rate $\chi(\mathcal{E})$ for the modified channel $\tilde{\mathcal{N}}$. Because the channel $\tilde{\mathcal{N}}$ includes an orthonormal measurement which destroys all entanglement between $A$ and $B$, however, it can be argued that any entanglement used in such a protocol could be replaced by shared randomness, which could then in turn be eliminated by a standard derandomization argument. The net result is a procedure for choosing rate $\chi(\mathcal{E})$ codes for the channel $\mathcal{N}$ consisting of states of the form $\rho_{j_1} \otimes \cdots \otimes \rho_{j_n}$, which is the essence of the achievability proof for the unassisted classical capacity.

This may seem like an unnecessarily cumbersome and even circular approach to the unassisted classical capacity given that the proof sketched above for the entanglement-assisted classical capacity itself invokes the unassisted result in the form of the HSW theorem. The approach becomes more satisfying when one learns that simple and direct proofs of the father protocol exist that completely bypass the HSW theorem (Abeyesinghe et al. 2005). Thus, the entanglement-assisted communication protocols can be easily transformed into their unassisted analogs, confirming the central place of entanglement-assisted communication in quantum information theory.

Acknowledgments

The author is grateful to the inventors of the quantum reverse Shannon theorem for letting him discuss their results prior to their publication and to Jon Yard for a careful reading of the manuscript. This work has been supported by the Canadian Institute for Advanced Research, the Canada Research Chairs program, and Canada’s NSERC.

See also: Capacity for Quantum Information; Channels in Quantum Information Theory; Entanglement; Finite Weyl Systems; Quantum Channels: Classical Capacity; Quantum Entropy.

Further Reading

Abeyesinghe A, Devetak I, Hayden P, and Winter A (2005) Fully quantum Slepian–Wolf (in preparation).
Bennett CH, Devetak I, Harrow AW, Shor PW, and Winter A (2005) The quantum reverse Shannon theorem (in preparation).
Bennett CH, Shor PW, Smolin JA, and Thapliyal AV (1999) Entanglement-assisted classical capacity of noisy quantum channels. Physical Review Letters 83: 3081 (arXiv.org:quant-ph/9904023).
Bennett CH, Shor PW, Smolin JA, and Thapliyal AV (2002) Entanglement-assisted capacity of a quantum channel and the reverse Shannon theorem. IEEE Transactions on Information Theory 48(10): 2637 (arXiv.org:quant-ph/0106052).
Cerf N and Adami C (1997) Von Neumann capacity of noisy quantum channels. Physical Review A 56: 3470 (arXiv.org:quant-ph/9609024).
Devetak I (2005) The private classical capacity and quantum capacity of a quantum channel. IEEE Transactions on Information Theory 51(1): 44 (arXiv.org:quant-ph/0304127).
Devetak I, Harrow AW, and Winter A (2004) A family of quantum protocols. Physical Review Letters 93: 230504 (arXiv.org:quant-ph/0308044).
DiVincenzo DP, Smolin JA, and Shor PW (1998) Quantum channel capacity of very noisy channels. Physical Review A 57: 830 (arXiv.org:quant-ph/9706061).
Holevo AS (1998) The capacity of the quantum channel with general signal states. IEEE Transactions on Information Theory 44: 269–273.
Schumacher B and Westmoreland MD (1997) Sending classical information via noisy quantum channels. Physical Review A 56: 131–138.
van Dam W and Hayden P (2003) Universal entanglement transformation without communication. Physical Review A 67: 060302 (arXiv.org:quant-ph/0201041).
Winter A (2004) Extrinsic and intrinsic data in quantum measurements: asymptotic convex decomposition of positive operator valued measures. Communications in Mathematical Physics 244(1): 157 (arXiv.org:quant-ph/0109050).
424 Capacity for Quantum Information

Capacity for Quantum Information


D Kretschmann, Technische Universität Braunschweig, Braunschweig, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Any processing of quantum information, be it storage or transfer, can be represented as a quantum channel: a completely positive and trace-preserving map that transforms states (density matrices) on the sender's end of the channel into states on the receiver's end. Very often, the channel $S$ that sender and receiver (conventionally called Alice and Bob, respectively) would like to implement is not readily available, typically due to detrimental noise effects, limited technology, or insufficient funding. They may then try to simulate $S$ with some other channel $T$, which they happen to have at their disposal. The quantum channel capacity $Q(T, S)$ of $T$ with respect to $S$ quantifies how well this simulation can be performed, in the limit of long input strings, so that Alice and Bob can take advantage of collective pre- and post-processing (cf. Figure 1). Higher capacities may result if Alice and Bob are allowed to use additional resources in the process, such as classical side channels or a supply of maximally entangled pairs shared between them.

Figure 1 Equipped with collective encoding and decoding operations (and perhaps some auxiliary resources), $n = 3$ instances of the channel $T$ simulate $m = 2$ instances of the channel $S$. The transmission rate of the above scheme is 2/3. Capacity is the largest such rate, in the limit of long messages and optimal encoding and decoding. [Diagram labels: Encoding, $T\,T\,T \approx S\,S$, Resources, Decoding.]

Quantum capacity thus gives the ultimate benchmarks for the simulation of one quantum channel by another and for the optimal use of auxiliary resources. Together with the compression rate of a quantum source (see Source Coding in Quantum Information Theory), it lies at the heart of quantum information theory.

In a very typical scenario, Alice and Bob would like to implement the ideal (noiseless) quantum channel $S = \mathrm{id}$: they are interested in sending quantum states undistorted over some distance, or want to store them safely for some period of time, so that all the precious quantum correlations are preserved. The capacity $Q(T) \equiv Q(T, \mathrm{id})$ is then the maximal number of qubit transmissions per use of the channel, taken in the limit of long messages and using collective encoding and decoding schemes asymptotically eliminating all transmission errors. This is what is generally called the quantum capacity of the channel $T$, and it is our main focus in this article. Little is known so far about the quantum capacity for the simulation of other (nonideal) channels (cf. the section "Related capacities").

In remarkable contrast to the classical setting, quantum channel capacities are very much affected by additional resources. This leads to unexpected and fascinating applications such as teleportation and dense coding. But it also results in a bewildering variety of inequivalent channel capacities, which still hold many challenges for future research.

Notation

A quantum channel which transforms input systems on a Hilbert space $\mathcal{H}_A$ into output systems on a (possibly different) Hilbert space $\mathcal{H}_B$ is represented (in Schrödinger picture) by a completely positive and trace-preserving linear map $T : \mathcal{B}_*(\mathcal{H}_A) \to \mathcal{B}_*(\mathcal{H}_B)$, where $\mathcal{B}_*(\mathcal{H})$ denotes the space of trace class operators on the Hilbert space $\mathcal{H}$ (see Channels in Quantum Information Theory). We write $\mathcal{A}$ instead of $\mathcal{B}_*(\mathcal{H}_A)$ to streamline the presentation, and $\mathcal{A}^{\otimes n}$ for the $n$-fold tensor product $\mathcal{B}_*(\mathcal{H}_A)^{\otimes n}$.

It is evident that the definition of channel capacity requires the comparison of different quantum channels. A suitable distance measure is the norm of complete boundedness (or cb-norm, for short), denoted by $\|\cdot\|_{cb}$. For two channels $T$ and $S$, the distance $(1/2)\|T - S\|_{cb}$ can be defined as the largest difference between the overall probabilities in two statistical quantum experiments differing only by exchanging one use of $S$ by one use of $T$. These experiments may involve entangling the systems on which the channels act with arbitrary further systems; hence the cb-norm remains a valid distance measure if the given channel is only part of a larger system. Equivalently, we may set $\|T\|_{cb} := \sup_n \|T \otimes \mathrm{id}_n\|$, where $\|R\| := \sup_{\|\varrho\|_1 \le 1} \|R(\varrho)\|_1$
denotes the norm of linear operators, and $\|\varrho\|_1 := \operatorname{tr}\sqrt{\varrho^{*}\varrho}$ is the trace norm on the space of trace-class operators $\mathcal{B}_*(\mathcal{H})$.

We use base two logarithms throughout, and we write $\operatorname{ld} x := \log_2 x$ and $\exp_2 x := 2^x$.

Quantum Channel Capacity

The intuitive concept underlying quantum channel capacity is made rigorous in the following definition:

Definition 1 A positive number $R$ is called achievable rate for the quantum channel $T : \mathcal{A} \to \mathcal{B}$ with respect to the quantum channel $S : \mathcal{A}' \to \mathcal{B}'$ iff for any pair of integer sequences $(n_\nu)_{\nu\in\mathbb{N}}$ and $(m_\nu)_{\nu\in\mathbb{N}}$ with $\lim_{\nu\to\infty} n_\nu = \infty$ and $\lim_{\nu\to\infty} m_\nu/n_\nu \le R$ we have

$$\lim_{\nu\to\infty} \inf_{D,E} \left\| D\, T^{\otimes n_\nu} E - S^{\otimes m_\nu} \right\|_{cb} = 0 \qquad [1]$$

the infimum taken over all encoding channels $E$ and decoding channels $D$ with suitable domain and range. The channel capacity $Q(T, S)$ of $T$ with respect to $S$ is defined to be the supremum of all achievable rates. The quantum capacity is the special case $Q(T) := Q(T, \mathrm{id}_2)$, with $\mathrm{id}_2$ being the ideal qubit channel.

In this article, we mainly concentrate on channels between finite-dimensional systems. This is enough to bring out the basic ideas. Many of the concepts and results discussed here can be generalized to Gaussian channels, which play a central role as building blocks for quantum optical communication lines (Holevo and Werner 2001, Eisert and Wolf).

There is considerable freedom in the definition of quantum channel capacity, at least for ideal reference channels (Kretschmann and Werner 2004). In particular, the encoding channels $E$ in eqn [1] may always be restricted to isometric embeddings.

In addition, it is not necessary to check an infinite number of pairs of sequences $(n_\nu)_{\nu\in\mathbb{N}}$ and $(m_\nu)_{\nu\in\mathbb{N}}$ when testing a given rate $R$, as Definition 1 would suggest. Instead, it is enough to find one such pair which achieves the rate $R$ infinitely often, $\lim_{\nu\to\infty} m_\nu/n_\nu = R$.

Without affecting the capacity, the cb-norm $\|T\|_{cb}$ may be replaced by the unstabilized operator norm $\|T\|$ or by fidelity measures, which are in general much easier to compute. In particular, one might choose the minimum fidelity,

$$F(T) := \min_{\|\psi\|=1} \langle\psi|\, T(|\psi\rangle\langle\psi|)\, |\psi\rangle \qquad [2]$$

or even the average fidelity,

$$\bar{F}(T) := \int \langle\psi|\, T(|\psi\rangle\langle\psi|)\, |\psi\rangle \, d\psi \qquad [3]$$

Unfortunately, this equivalence is restricted to capacities with noiseless reference channel $S = \mathrm{id}$. In the vicinity of other (nonideal) channels, equivalence of the stabilized and unstabilized error criteria may be lost. Of course, the comparison of channels is ultimately based on the comparison of a state to its image, and here the pure states are the worst case. Hence, the remarkable insensitivity of the quantum capacity to the choice of the error criterion stems from the observation that the comparison between an arbitrary state and a pure state is rather insensitive to the criterion used.

Instead of requiring the error quantity in eqn [1] to approach zero in the large block limit $\nu \to \infty$, one might feel tempted to impose that the errors vanish completely for some sufficiently large block length, since this is the standard setup in the theory of quantum error correction (see Quantum Error Correction and Fault Tolerance). While it is true that errors can always be assumed to vanish exponentially in eqn [1], requiring perfect correction may completely change the picture: if a channel has some small positive probability for depolarization, the same also holds for its tensor powers, and no such channel allows the perfect transmission of even one qubit. Hence, the capacity for perfect correction will vanish for such channels, while the standard capacity (in accordance with Definition 1) will be close to maximal, $Q(T) \approx 1$. The existence of perfect error-correcting codes thus gives lower bounds on the channel capacity, but is not required for a positive transfer rate.

In the other extreme, one might sometimes feel inclined to tolerate (small) finite errors in the transmission. For some $\varepsilon > 0$, we define $Q_\varepsilon(T)$ exactly like the quantum capacity in Definition 1, but require only that the error quantity in eqn [1] falls below $\varepsilon$ for some sufficiently large $\nu$. Obviously, $Q_\varepsilon(T) \ge Q(T)$ for any quantum channel $T$. We also have $\lim_{\varepsilon\to 0} Q_\varepsilon(T) = Q(T)$ (Kretschmann and Werner 2004). In the classical setting, even a strong converse is known: if $\varepsilon > 0$ is small enough, one cannot achieve bigger rates by allowing small errors, that is, $C_\varepsilon(T) = C(T)$. It is still undecided whether an analogous property holds for the quantum capacity $Q(T)$.

Related Capacities

This article is chiefly concerned with the quantum capacity of a quantum channel. A variety of other
capacities have been derived from Definition 1 by either amending the channel $S$ to be simulated, or allowing Alice and Bob to make use of additional resources. Their interrelations are reviewed in Bennett et al. (2004).

Much interest has been devoted to the hybrid problem of transmitting classical information undistorted over noisy quantum channels. The classical capacity $C(T)$ of a quantum channel $T$ is discussed in the article Quantum Channels: Classical Capacity of this Encyclopedia. It is obtained by choosing the ideal one-bit channel rather than the one-qubit channel as the standard of reference in Definition 1. Encoding channels $E$ and decoding channels $D$ are then restricted to preparations and measurements, respectively. Since a quantum channel can also be employed to send classical information, we have $C(T) \ge Q(T)$. There are, obviously, examples in which this inequality is strict: the entanglement-breaking channel $T(\varrho) = \sum_j \langle j|\varrho|j\rangle\, |j\rangle\langle j|$ is composed of a measurement in the orthonormal basis $\{|j\rangle\}_j$, followed by a preparation of the corresponding basis states. It destroys all the entanglement between the sender and a reference system, implying $Q(T) = 0$. Yet all the basis states $|j\rangle$ are transmitted undistorted, which is enough to guarantee that $C(T) = 1$.

Definition 1 also applies to purely classical channels, and thus to the setting of Shannon's information theory. A classical channel $T$ between two $d$-level systems is completely specified by the $d \times d$ matrix $(T_{xy})_{x,y=1}^{d}$ of transition probabilities. For these channels the cb-norm difference is just (twice) the maximal error probability:

$$\|\mathrm{id} - T\|_{cb} = 2 \sup_x \{1 - T_{xx}\}$$

which is the standard error criterion for classical information transfer.

Dense coding and teleportation suggest that entanglement is a powerful resource for information transfer. It doubles the classical channel capacity of a noiseless channel, and it allows quantum information to be sent over purely classical channels. Surprisingly, the entanglement-assisted capacities are often simpler and better behaved than their unassisted counterparts. Unlike the classical and quantum capacities proper, they are relatively easy to calculate using finite optimization procedures, and there has recently been significant progress in understanding the simulation rates for nonideal channels in this scenario (see Capacities Enhanced by Entanglement).

The quantum channel capacity is unaffected by entanglement-breaking side channels. In particular, classical forward communication alone cannot enhance it. However, unlike in the purely classical case, both the quantum and classical channel capacity (but not the entanglement-assisted capacity) may increase under classical feedback.

Elementary Properties

The capacity of a composite channel $T_1 T_2$ cannot be bigger than the capacity of the channel with the smallest bandwidth. This in turn suggests that simulating a concatenated channel is in general easier than simulating any of the individual channels. These relations are known as bottleneck inequalities:

$$Q(T_1 T_2, S) \le \min\{Q(T_1, S),\, Q(T_2, S)\} \qquad [4]$$

$$Q(T, S_1 S_2) \ge \max\{Q(T, S_1),\, Q(T, S_2)\} \qquad [5]$$

Instead of running $T_1$ and $T_2$ in succession, we may also run them in parallel. In this case, the capacity can be shown to be superadditive,

$$Q(T_1 \otimes T_2, S) \ge Q(T_1, S) + Q(T_2, S) \qquad [6]$$

For the standard ideal channels, we even have additivity. The same holds true if both $S$ and one of the channels $T_1$, $T_2$ are noiseless, the third channel being arbitrary. However, results on the activation of bound-entangled states seem to suggest that the inequality in eqn [6] may be strict for some channels (see Entanglement).

Finally, the two-step coding inequality tells us that by using an intermediate channel in the coding process we cannot increase the transmission rate:

$$Q(T_1, T_2) \ge Q(T_1, T_3)\, Q(T_3, T_2) \qquad [7]$$

Applying eqn [7] twice with $T_2 = \mathrm{id}$ and $T_3 = \mathrm{id}$ immediately yields upper and lower bounds on the channel capacity with nonideal reference channel,

$$\frac{Q(T_1)}{Q(T_2)} \ge Q(T_1, T_2) \ge Q(T_1)\, Q(\mathrm{id}, T_2) \qquad [8]$$

The evaluation of the lower bound in eqn [8] then requires efficient protocols for simulating a noisy channel $T_2$ with a noiseless resource.

There are special cases in which the quantum channel capacity can be evaluated relatively easily, the most relevant one being the noiseless channel $\mathrm{id}_n$, where by the subscript $n$ we denote the dimension of the underlying Hilbert space. In this case, we have

$$Q(\mathrm{id}_n, \mathrm{id}_m) = \frac{\operatorname{ld} n}{\operatorname{ld} m} \qquad [9]$$

The lower bound $Q(\mathrm{id}_n, \mathrm{id}_m) \ge \operatorname{ld} n / \operatorname{ld} m$ is immediate from counting dimensions. To establish the upper bound, we use the fact that a noiseless quantum channel cannot simulate itself with a rate
exceeding unity: $Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$. This is just the upper bound we want to prove for the special case $n = m$, and it can be extended to the general case with the help of the two-step coding inequality [7]: $Q(\mathrm{id}_m, \mathrm{id}_n)\, Q(\mathrm{id}_n, \mathrm{id}_m) \le Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$, implying $Q(\mathrm{id}_n, \mathrm{id}_m) \le 1/Q(\mathrm{id}_m, \mathrm{id}_n) \le \operatorname{ld} n / \operatorname{ld} m$, where in the last step we have applied the lower bound with the roles of $n$ and $m$ interchanged.

Combining eqn [9] with the two-step coding inequality [7], we see that for any channel $T$

$$Q(T, \mathrm{id}_n) = \frac{\operatorname{ld} m}{\operatorname{ld} n}\, Q(T, \mathrm{id}_m) \qquad [10]$$

which shows that quantum channel capacities relative to noiseless channels of different dimensionality only differ by a constant factor. Fixing the dimensionality of the reference channel then only corresponds to a choice of units. Conventionally, the ideal qubit channel $\mathrm{id}_2$ is chosen as a standard of reference, as in Definition 1 above, thereby fixing the unit "bit."

The upper bound on the capacity of ideal channels can also be obtained from a general upper bound on quantum capacities (Holevo and Werner 2001), which has the virtue of being easily calculated in many situations. It involves the transposition map, which we denote by $\Theta$, defined as matrix transposition with respect to some fixed orthonormal basis. The transposition is positive but not completely positive, and thus does not describe a physical channel (see Channels in Quantum Information Theory). We have $\|\Theta\|_{cb} = d$ for a $d$-level system. For any channel $T$ and small $\varepsilon > 0$,

$$Q(T) \le Q_\varepsilon(T) \le \operatorname{ld} \|T\Theta\|_{cb} =: Q_\Theta(T) \qquad [11]$$

where $Q_\varepsilon$ is the finite error capacity introduced in the section "Quantum channel capacity."

The upper bound $Q_\Theta(T)$ has some remarkable properties, which make it a capacity-like quantity in its own right. For example, it is exactly additive,

$$Q_\Theta(S \otimes T) = Q_\Theta(S) + Q_\Theta(T) \qquad [12]$$

for any pair $S$, $T$ of channels, and it satisfies the bottleneck inequality:

$$Q_\Theta(ST) \le \min\{Q_\Theta(S),\, Q_\Theta(T)\}$$

Moreover, it coincides with the quantum capacity on ideal channels, $Q_\Theta(\mathrm{id}_n) = Q(\mathrm{id}_n) = \operatorname{ld} n$, and it vanishes whenever $T\Theta$ is completely positive. In particular, if $\mathrm{id} \otimes T$ maps any entangled state to a state with positive partial transpose, we have $Q_\Theta(T) = 0$.

State–Channel Duality

Quantum capacity is closely related to the distillable entanglement, which is the optimal rate $m/n$ at which $n$ copies of a given bipartite quantum state $\varrho$ shared between Alice and Bob can be asymptotically converted into $m$ maximally entangled qubit pairs (see Entanglement). Similar to the quantum capacity, the definition involves the large block limit $n, m \to \infty$ and an optimization over all conceivable distillation protocols. These may consist of several rounds of local quantum operations and (forward or two-way) classical communication. The one-way and two-way distillable entanglement of $\varrho$ will be denoted by $D_1(\varrho)$ and $D_2(\varrho)$, respectively.

Suppose that Alice and Bob are connected by a quantum channel $T$ and run such a one-way distillation protocol on (many copies of) the state $\varrho_T := (T \otimes \mathrm{id})\,|\Omega\rangle\langle\Omega|$, where $|\Omega\rangle := (1/\sqrt{d_A}) \sum_i |i, i\rangle$ is maximally entangled on $\mathcal{H}_A \otimes \mathcal{H}_{A'}$. If the distillation yields maximally entangled qubits at positive rate $R$, Alice may apply the standard teleportation scheme to send arbitrary quantum states to Bob undistorted at that same rate $R$. Like the distillation protocol itself, teleportation requires classical forward communication, which however does not affect the channel capacity (cf. the section "Related capacities"). Thus, $Q(T) \ge D_1(\varrho_T)$. If two-way distillation is allowed, we have $Q_2(T) \ge D_2(\varrho_T)$ for the capacity $Q_2(T)$ assisted by two-way classical side communication.

Conversely, if Alice and Bob use a bipartite quantum state $\varrho$ shared between them as a substitute for the maximally entangled state $|\Omega\rangle$ in the standard teleportation protocol, they will implement some noisy quantum channel $T_\varrho$. If this channel allows quantum information to be transferred at nonvanishing rate $R$, Alice may share maximally entangled states with Bob at that same rate $R$. Consequently, $D_1(\varrho) \ge Q(T_\varrho)$ and $D_2(\varrho) \ge Q_2(T_\varrho)$.

These relations (Bennett et al. 1996) allow channel capacities to be bounded in terms of distillable entanglement and vice versa. If the two maps $T \mapsto \varrho_T$ and $\varrho \mapsto T_\varrho$ are mutually inverse, we even have $D_1(\varrho) = Q(T_\varrho)$ and $D_2(\varrho) = Q_2(T_\varrho)$. In this case, the duality $\varrho \leftrightarrow T_\varrho$ is the physical implementation of Jamiolkowski's isomorphism between bipartite states and channels (see Channels in Quantum Information Theory). This has been shown (Horodecki et al. 1999) to hold for isotropic states, which are invariant under the group of all $U \otimes \bar{U}$ transformations, where $\bar{U}$ is the complex conjugate of the unitary $U$. The corresponding channels are partly depolarizing.

In general, $T_{\varrho_T} \ne T$. However, the so-called conclusive teleportation allows us to implement $T$ at least probabilistically, resulting in the relation

$$\frac{1}{d_A^{2}}\, Q(T) \le D_1(\varrho_T) \le Q(T) \qquad [13]$$
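
The dual state $\varrho_T = (T \otimes \mathrm{id})|\Omega\rangle\langle\Omega|$ used in this section can be written out explicitly for small systems. The following is a minimal sketch in plain Python, assuming the channel is supplied through Kraus operators $K_i$ with $T(\varrho) = \sum_i K_i \varrho K_i^{*}$ (a representation not used in this article); the helper names `kron`, `matmul`, `dagger`, `scale`, and `choi_state`, and the dephasing example with parameter `p`, are illustrative choices only.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices stored as nested lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def matmul(a, b):
    """Plain matrix product of nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a nested-list matrix."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def scale(c, m):
    """Multiply every entry of matrix m by the scalar c."""
    return [[c * x for x in row] for row in m]

def choi_state(kraus_ops, d):
    """Return rho_T = (T (x) id)|Omega><Omega| for a channel on a d-level system.

    Assumption: the channel is given by Kraus operators K_i (d x d nested lists).
    """
    # |Omega> = d^{-1/2} sum_i |i,i>, stored as a column vector of length d*d
    omega = [[0.0] for _ in range(d * d)]
    for i in range(d):
        omega[i * d + i][0] = 1.0 / math.sqrt(d)
    projector = matmul(omega, dagger(omega))
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    rho = [[0.0] * (d * d) for _ in range(d * d)]
    for k in kraus_ops:
        k_ext = kron(k, identity)  # the channel acts on the first tensor factor
        term = matmul(matmul(k_ext, projector), dagger(k_ext))
        rho = [[rho[i][j] + term[i][j] for j in range(d * d)]
               for i in range(d * d)]
    return rho

# Illustrative inputs: the ideal qubit channel and a dephasing channel
# (hypothetical example, flip probability p).
p = 0.25
identity_2 = [[1, 0], [0, 1]]
pauli_z = [[1, 0], [0, -1]]
rho_id = choi_state([identity_2], 2)
rho_deph = choi_state([scale(math.sqrt(1 - p), identity_2),
                       scale(math.sqrt(p), pauli_z)], 2)
trace_deph = sum(rho_deph[i][i] for i in range(4))
```

For the ideal channel the result is the projector onto $|\Omega\rangle$ itself; for the dephasing example only the off-diagonal entries between $|0,0\rangle$ and $|1,1\rangle$ shrink, by the factor $1 - 2p$.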

The duality [13] can be applied to show that both the unassisted and the two-way quantum capacities are continuous in any open set of channels having nonvanishing capacities (Horodecki and Nowakowski 2005).

Coding Theorems

Computing channel capacities straight from Definition 1 is a tricky business. It involves optimization in systems of asymptotically many tensor factors, and can only be performed in special cases, like the noiseless channels in the section "Elementary properties." Coding theorems aspire to reduce this problem to an optimization over a low-dimensional space. They usually come in two parts: the converse provides an upper bound on the channel capacity (typically in terms of some entropic expression), while the direct part consists of a coding scheme that attains this bound. By Shannon's celebrated coding theorem, the classical capacity of a classical noisy channel can be obtained from a maximization of the mutual information over all joint input–output distributions.

For the quantum channel capacity, the relevant entropic quantity is the coherent information,

$$I_c(T, \varrho) := H\big(T(\varrho)\big) - H\big(T \otimes \mathrm{id}\,(|\psi_\varrho\rangle\langle\psi_\varrho|)\big) \qquad [14]$$

where $H$ denotes the von Neumann entropy, $H(\varrho) = -\operatorname{tr} \varrho \operatorname{ld} \varrho$, and $\psi_\varrho \in \mathcal{H}_A \otimes \mathcal{H}_{A'}$ is a purification of the density operator $\varrho \in \mathcal{A}$. The coherent information does not increase under quantum operations, $I_c(S\,T, \varrho) \le I_c(T, \varrho)$ for any quantum channel $S$ and state $\varrho \in \mathcal{A}$. This is the data processing inequality (Barnum et al. 1998), which shows that the regularized coherent information provides an upper bound on the quantum channel capacity: if Alice and Bob have a coding scheme for the channel $T$ with capacity $Q(T)$, $n$ channel uses allow them to share a maximally entangled state of size $\exp_2 n\,Q(T)$. The coherent information of this state equals $n\,Q(T)$, and by the data processing inequality it was no smaller prior to Bob's decoding.

Recently, Devetak (2005) developed a coding scheme to show that this bound is in fact attainable. Different proofs were outlined by Lloyd and Shor.

Theorem 1 For every quantum channel $T$,

$$Q(T) = \lim_{n\to\infty} \frac{1}{n} \max_{\varrho} I_c(T^{\otimes n}, \varrho) \qquad [15]$$

Unlike the classical or quantum mutual information, coherent information is strictly superadditive for some channels (DiVincenzo et al. 1998). Hence, taking the limit $n \to \infty$ in eqn [15] is indeed required, and in general the evaluation of the capacity formula [15] still demands the solution of asymptotically large variational problems. This should be contrasted with the entanglement-assisted capacities $C_E(T) = 2 Q_E(T)$ (where a simple nonregularized coding theorem is known to hold, see Capacities Enhanced by Entanglement) and the capacity for classical information $C(T)$ (where additivity is conjectured but not proved, see Quantum Channels: Classical Capacity). Even a maximization of the single-shot coherent information $I_c(T, \varrho)$ appears to be a difficult optimization problem, since this quantity is neither convex nor concave and may have multiple local maxima (Shor 2003). Thus, even for simple-looking systems like the qubit depolarizing channel, so far we only have upper and lower bounds on the quantum channel capacity, but do not yet know how to compute its exact value.

We now sketch Devetak's proof of Theorem 1, assuming only some familiarity with Holevo–Schumacher–Westmoreland (HSW) random codes for the classical channel capacity (see Quantum Channels: Classical Capacity). It is easily seen from Stinespring's dilation theorem (see Channels in Quantum Information Theory) that a noiseless quantum channel provides perfect security against eavesdropping. This is one of the characteristic traits of quantum mechanics and lies at the heart of quantum cryptography. In his proof, Devetak showed a way to turn this around and upgrade coding schemes for private classical information to quantum channel codes.

The relation between quantum information transfer over a channel $T : \mathcal{A} \to \mathcal{B}$ and privacy against eavesdropping is best understood in terms of the companion channel $T_E : \mathcal{A} \to \mathcal{E}$. $T_E$ arises from a given Stinespring isometry $V : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$ of $T \equiv T_B$ by interchanging the roles of the output system $B$ and the environment $E$:

$$T_B(\varrho) = \operatorname{tr}_E V \varrho V^{*} \;\leftrightarrow\; T_E(\varrho) = \operatorname{tr}_B V \varrho V^{*} \qquad [16]$$

The channel $T_E$ describes the information flow into the environment $E$, a system we assume to be under complete control of a potential eavesdropper, Eve say. The setup for private classical information transfer (including the definition of rates and capacity) is then exactly the same as for the classical channel capacity (see Quantum Channels: Classical Capacity), but the protocols now have to satisfy the additional requirement that $T_E$ releases (almost) no information to the environment. This can be achieved by randomizing over $\nu_E \approx \exp_2 n\,\chi(T_E, \{p_i, \varrho_i\})$ code words of a standard HSW code of total size $\exp_2 n\,\chi(T_B, \{p_i, \varrho_i\})$, where $\{p_i, \varrho_i\}$ is the quantum ensemble from which a set of random code words
$\{\sigma_{k,l}\}_{k=1,\,l=1}^{\nu_B,\,\nu_E}$ is generated. The appearance of the Holevo bound

$$\chi(T, \{p_i, \varrho_i\}) := H\Big(\sum_i p_i\, T(\varrho_i)\Big) - \sum_i p_i\, H\big(T(\varrho_i)\big) \qquad [17]$$

in the dimension of both these code spaces can be understood from the size of the relevant typical subspaces (Devetak and Winter 2004).

The randomization guarantees that the remaining $\nu_B \approx \exp_2 n\big(\chi(T_B) - \chi(T_E)\big)$ code words are almost indistinguishable to Eve:

$$\Big\| \frac{1}{\nu_E} \sum_{l=1}^{\nu_E} T_E^{\otimes n}\big(\sigma_{kl} - \sigma_{jl}\big) \Big\|_1 \le \varepsilon, \quad \forall\, j, k = 1, \ldots, \nu_B \qquad [18]$$

The net transfer rate for private classical information is then $R \approx \chi(T_B) - \chi(T_E)$, which is just the total transfer rate for the channel Alice $\to$ Bob reduced by the transfer rate Alice $\to$ Eve.

Remarkably, if $\varrho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ is a decomposition of $\varrho \in \mathcal{A}$ into pure states, the private transfer rate exactly equals the coherent information,

$$I_c(T_B, \varrho) = H\big(T_B(\varrho)\big) - H\big(T_E(\varrho)\big) = \chi(T_B) - \chi(T_E) \qquad [19]$$

The so-called entropy exchange

$$H\big(T_E(\varrho)\big) = H\big(T_B \otimes \mathrm{id}\,(|\psi_\varrho\rangle\langle\psi_\varrho|)\big)$$

quantifies the extent to which a formerly pure ancilla state becomes mixed via interaction with the signal states. Equation [19] then nicely reflects the intuition that for high-rate quantum information transfer the signal states should not entangle too much with the environment. In fact, for an almost noiseless channel the entropy exchange nearly vanishes, and the optimized coherent information almost attains the maximal value 1, while for nearly depolarizing channels we have $I_c(T_B, \varrho) \approx -H(\varrho) \le 0$.

So far, we have sketched a protocol for private classical information transfer. Devetak's coherentification makes it possible to pass from the transmission of classical messages to the transmission of coherent superpositions. This technique has also been applied to obtain entanglement distillation protocols from secret key distillation, and offers a unified view on the secret classical resources and their quantum counterparts (Devetak and Winter 2004, Devetak et al. 2004).

In order to transfer quantum information, Alice will only need to send one half of a maximally entangled state of dimensionality $\exp_2 n\, I_c(T_B, \varrho)$. As described in the previous section, teleportation then allows her to transfer arbitrary quantum states from a subspace of that size.

Given a set of pure state code words $\{|\varphi_{kl}\rangle\}_{k=1,\,l=1}^{\nu_B,\,\nu_E}$ of a private classical information protocol, for entanglement transfer Alice prepares the input state

$$|\Psi\rangle_{A'A} = \frac{1}{\sqrt{\nu_B}} \sum_{k=1}^{\nu_B} |k\rangle_{A'} \otimes \frac{1}{\sqrt{\nu_E}} \sum_{l=1}^{\nu_E} |\varphi_{kl}\rangle_A \qquad [20]$$

where $A'$ denotes a reference system that Alice keeps in her lab. On his share of the resulting output state $|\Psi'\rangle_{A'BE}$ Bob will then employ the corresponding measurement operators $\{M_{kl}\}_{k,l=1}^{\nu_B,\,\nu_E}$ to implement the coherent measurement

$$V_M |\varphi\rangle_B := \sum_{kl} \sqrt{M_{kl}}\, |\varphi\rangle_B \otimes |kl\rangle_{B_1 B_2}$$

which places the measurement outcomes into some reference system $B_1 \otimes B_2$. Any measurement which identifies the output with high probability only slightly disturbs the output state, and thus Bob's coherent measurement leaves the total system in an approximation of the state

$$|\Psi''\rangle = \frac{1}{\sqrt{\nu_B \nu_E}} \sum_{k=1,\,l=1}^{\nu_B,\,\nu_E} |k\rangle_{A'}\, |k\rangle_{B_1}\, |l\rangle_{B_2}\, |\varphi'_{kl}\rangle_{BE} \qquad [21]$$

in which Eve and Bob are still entangled. A completely depolarizing channel $T_E$ would directly yield a factorized output state $\vartheta_B \otimes \vartheta_E$ here. Although the randomization in eqn [18] does not necessarily result in complete depolarization, there is a controlled unitary operation which Bob may apply to effectively decouple Eve's system, resulting in the output state $(1/\sqrt{\nu_B}) \sum_k |kk\rangle_{A'B_1} \otimes \vartheta_E$, which is the maximally entangled state of size $\nu_B \approx \exp_2 n\, I_c(T_B, \varrho)$ required for teleportation. The direct part of the capacity theorem then follows by applying the above coding scheme to large blocks and maximizing over (pure) input ensembles, concluding the proof.

Devetak's proof of the coding theorem seems to indicate that the private classical capacity $C_p(T)$ equals the quantum capacity $Q(T)$ for every quantum channel $T$. However, for the coherentification protocol, we have restricted the private coding schemes to pure state input ensembles, and thus we can only conclude that $Q(T) \le C_p(T)$. The existence of bound-entangled states with positive one-way distillable secret key rate (Horodecki et al. 2005) implies that this inequality can be strict. A general procedure does exist to retrieve (almost) all the information from the output of a noisy quantum channel that releases (almost) no information to the environment. But this requires a stronger form of privacy than eqn [18].
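
As a small numerical illustration of the coherent information in eqn [14], the sketch below evaluates it for a qubit depolarizing channel at the maximally mixed input. It assumes the parameterization $T(\varrho) = (1-p)\,\varrho + p\,\mathbb{1}/2$, which is one common convention and is not fixed by this article; the function names are hypothetical. By Theorem 1, any such single-copy value is only a lower bound on $Q(T)$, not the capacity itself.

```python
import math

def entropy(eigenvalues):
    """Base-2 entropy of a list of eigenvalues (zero entries are skipped)."""
    return -sum(x * math.log2(x) for x in eigenvalues if x > 0)

def coherent_info_depolarizing(p):
    """I_c(T, rho) of eqn [14] at rho = 1/2 for T(rho) = (1-p) rho + p*1/2.

    Assumption: this depolarizing parameterization; p lies in [0, 1].
    """
    # T(1/2) = 1/2 has eigenvalues (1/2, 1/2), so H(T(rho)) = 1
    output_entropy = entropy([0.5, 0.5])
    # The purification of rho = 1/2 is maximally entangled, and
    # (T (x) id)|Omega><Omega| = (1-p)|Omega><Omega| + p*1/4,
    # with eigenvalues 1 - 3p/4 (once) and p/4 (three times).
    joint_eigenvalues = [1 - 3 * p / 4, p / 4, p / 4, p / 4]
    return output_entropy - entropy(joint_eigenvalues)

# Values over a few noise levels: p = 0 is the ideal channel,
# p = 1 is completely depolarizing.
values = {p: coherent_info_depolarizing(p) for p in (0.0, 0.1, 0.5, 1.0)}
```

At $p = 0$ the function returns 1, the maximal value for a qubit, and at $p = 1$ it returns $-1 = -H(\varrho)$, in line with the two limiting cases described after eqn [19].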

Quantum Channels with Memory

This article has so far been restricted to memoryless quantum channels, in which successive channel inputs are acted on independently. Messages of $n$ symbols are then processed by the tensor product channel $T^{\otimes n}$, as in Definition 1 and illustrated in Figure 1. In many real-world applications, the assumption of having uncorrelated noise cannot be justified, and memory effects need to be taken into account. For a quantum channel $T$ with register input $\mathcal{A}$ and register output $\mathcal{B}$, such effects are conveniently modeled (Bowen and Mancini 2004) by introducing an additional memory system $\mathcal{M}$, so that now $T : \mathcal{M} \otimes \mathcal{A} \to \mathcal{B} \otimes \mathcal{M}$ is a completely positive and trace-preserving map with two input systems and two output systems. Long messages with $n$ signal states will then be processed by the concatenated channel $T_n : \mathcal{M} \otimes \mathcal{A}^{\otimes n} \to \mathcal{B}^{\otimes n} \otimes \mathcal{M}$. In such a concatenation, the memory system is passed on from one channel application to the next, and thus introduces (classical or quantum) correlations between consecutive register inputs.

Remarkably, this relatively simple model can be shown (Kretschmann and Werner 2005) to encompass every reasonable physical process: every stationary channel $S : \mathcal{A}_\infty \to \mathcal{B}_\infty$ which turns an infinite string of input states (on the quasilocal algebra $\mathcal{A}_\infty$) into an infinite string of output states on $\mathcal{B}_\infty$ and satisfies the causality constraint is in fact a concatenated memory channel. Causality here means that the outputs of the stationary channel $S$ at given time $t_0$ do not depend on inputs at times $t > t_0$.

Figure 2 illustrates the structure theorem for causal stationary quantum channels. In general, it produces not only the memory channel $T$ with memory algebra $\mathcal{M}$, but also a map $R$ describing the influence of input states in the remote past. Intuitively, such a map is often not needed, because memory effects decrease in time: the memory channel $T$ is called forgetful if outputs at a large time $t$ depend only weakly on the memory initialization at time zero. In fact, memory effects can be shown to die out even exponentially. The set of these channels is open and dense in the set of quantum memory channels. Hence, generic memory channels are forgetful.

Figure 2 By the structure theorem, a causal automaton $S$ can be decomposed into a chain of concatenated memory channels $T$ plus some input initializer $R$. Evaluation with the partial trace tr means that the corresponding output is ignored. [Diagram labels: $S = R\,T\,T$, tr, Time.]

The capacity of memory channels is defined in complete analogy to the memoryless case, replacing the $n$-fold tensor product $T^{\otimes n}$ in Definition 1 by the $n$-fold concatenation $T_n$. The coding theorems for (private) classical and quantum information can then be extended from the memoryless case to the very important class of forgetful channels (Kretschmann and Werner 2005).

Nonforgetful channels call for universal coding schemes, which apply irrespective of the initialization of the input memory. Such schemes are presently known only for very special cases.

Acknowledgments

The author thanks the members of the quantum information group at TU Braunschweig for their careful reading of the manuscript and many helpful suggestions. He also gratefully acknowledges the funding from Deutsche Forschungsgemeinschaft (DFG).

See also: Capacities Enhanced by Entanglement; Channels in Quantum Information Theory; Entanglement; Positive Maps on C*-Algebras; Quantum Channels: Classical Capacity; Quantum Error Correction and Fault Tolerance; Source Coding in Quantum Information Theory.

Further Reading

Barnum H, Nielsen MA, and Schumacher B (1998) Information transmission through a noisy quantum channel. Physical Review A 57: 4153 (quant-ph/9702049).
Bennett CH, Devetak I, Shor PW, and Smolin JA (2004) Inequalities and separations among assisted capacities of quantum channels, quant-ph/0406086.
Bennett CH, DiVincenzo DP, Smolin JA, and Wootters WK (1996) Mixed-state entanglement and quantum error correction. Physical Review A 54: 3824 (quant-ph/9604024).
Bowen G and Mancini S (2004) Quantum channels with a finite memory. Physical Review A 69: 012306 (quant-ph/0305010).
Devetak I (2005) The private classical information capacity and quantum information capacity of a quantum channel. IEEE Transactions on Information Theory 51: 44 (quant-ph/0304127).
Devetak I, Harrow AW, and Winter A (2004) A family of quantum protocols. Physical Review Letters 93: 230504 (quant-ph/0308044).
Devetak I and Winter A (2004) Relating quantum privacy and quantum coherence: an operational approach. Physical Review Letters 93: 080501 (quant-ph/0307053).
DiVincenzo DP, Shor PW, and Smolin JA (1998) Quantum channel capacities of very noisy channels. Physical Review A 57: 830 (quant-ph/9706061).
Eisert J and Wolf MM Gaussian quantum channels. In Cerf N, Leuchs G, and Polzik E (eds.) Quantum Information with Continuous Variables of Atoms and Light. London: Imperial College Press (in preparation) (quant-ph/0505151).
Holevo AS and Werner RF (2001) Evaluating capacities of bosonic Gaussian channels. Physical Review A 63: 032312 (quant-ph/9912067).
Horodecki M, Horodecki P, and Horodecki R (1999) General teleportation channel, singlet fraction, and quasidistillation. Physical Review A 60: 1888 (quant-ph/9807091).
Horodecki P and Nowakowski ML (2005) Simple test for quantum channel capacity, quant-ph/0503070.
Horodecki K, Pankowski L, Horodecki M, and Horodecki P (2005) Low dimensional bound entanglement with one-way distillable cryptographic key, quant-ph/0506203.
Kretschmann D and Werner RF (2004) Tema con variazioni: quantum channel capacity. New Journal of Physics 6: 26 (quant-ph/0311037).
Kretschmann D and Werner RF (2005) Quantum channels with memory. Physical Review A 72: 062323 (quant-ph/0502106).
Shor PW (2003) Capacities of quantum channels and how to find them. Mathematical Programming 97: 311 (quant-ph/0304102).

Capillary Surfaces
R Finn, Stanford University, Stanford, CA, USA
© 2006 Elsevier Ltd. All rights reserved.

Historical and Conceptual Background
A capillary surface is the interface separating two fluids that lie adjacent to each other and do not mix. Examples of such surfaces are the upper surface of liquid partially filling a vertical cylinder (capillary tube), the surface of a liquid drop resting in equilibrium on a tabletop (sessile drop) and the surface of a liquid drop hanging from a ceiling (pendent drop); further instances are the surface of a falling raindrop, the bounding surface of the liquid in the fuel tank of a spaceship, and the interface formed by a fluid mass rotating within another fluid. This last example extends to the problem of rotating stars.

Interfaces separating fluids and solids share some of the physical attributes of capillary surfaces, and the study of wetted portions of rigid "support surfaces" becomes essential for describing global behavior of capillary configurations. However, some significant distinctions appear that change the formal structure of the problems, and must be accounted for in the theory.

Phenomena governed by capillarity pervade all of daily life, and most are so familiar as to escape special notice. By contrast, throughout the eighteenth century and presumably earlier, great attention centered on the rise of liquid in a narrow glass circular-cylindrical tube dipped vertically into a liquid reservoir (Figure 1); this striking event had a dramatic impact that confounded intuition. Clarification of the behavior became one of the major problems challenging the scientific world of the time, and was not achieved during that period. The term "capillary," adapted from the Latin "capillus" for hair, was applied to the phenomenon since it was observed only for tubes with very fine openings; the more general usage adopted in the definition above derives from the recognition of a class of phenomena with a common physical basis.

Figure 1 Capillary tube in infinite reservoir, in downward gravity field. (Labels in the figure: a, γ, g, u0.)

The first recorded observations concerning capillarity seem due to Aristotle c. 350 BC. He wrote that "a broad flat body, even of heavy material, will float on water, however a narrow thin one such as a needle will always sink." Any reader with access to a needle and a glass of water will have little difficulty refuting the assertion. Remarkably, the error in reasoning seems not to have been pointed out for almost 2000 years, until Galileo addressed the problem in his Discorsi, about 1600. The only substantive studies till that time are apparently those of Leonardo da Vinci a hundred years earlier. Leonardo introduced reasoning close in spirit to that of current literature; however, the Calculus was not available to him, and he was not in a position to develop his ideas in quantitative ways.

Young's Contribution
The later discovery of the Calculus provided a driving impetus guiding many new studies during the eighteenth century. But despite the enormity of that weapon, it did not on its own suffice, and initial quantitative success had to await two initiatives
taken by Thomas Young in 1805. Young based his studies on the concept of surface tension that had been introduced by von Segner half a century earlier. Segner hypothesized that every curve on a fluid/fluid interface S experiences on both its sides an orthogonal force σ per unit length, which (for given temperature) depends only on the materials and is directed into the tangent planes on the respective sides. The presence of such forces can be indicated by simple experiments. They become clearly evident in the case of thin (soap) films spanning a frame, in which case there is an easily observed orthogonal pull on the frame, see the section "Dual interpretation of σ: distinction between fluids and solids."

Young made two basic conceptual contributions (Y1, Y2):

Y1. Relation of pressure jump across a free interface to mean curvature and surface tension.
Consider a piece of surface S in the shape of a spherical bowl of radius R, separating two immiscible fluid media, as in Figure 2. In equilibrium, any pressure difference δp across S must be balanced by a tension σ on its rim Γ. If S projects to a disk of (small) radius r on the plane tangent to S at the symmetry point, we are led to
\[ \pi r^2\,\delta p \simeq 2\pi r\,\sigma \sin\vartheta \qquad [1] \]
where ϑ is inclination of S at the rim, relative to the plane. We thus find at the base point
\[ \delta p = 2\sigma\,\frac{d\sin\vartheta}{dr} = 2\sigma\,\frac{1}{R} \qquad [2] \]
Young then went on to consider a general S, without symmetry hypothesis. Letting 1/R1, 1/R2 denote the planar curvatures at a point in S of two normal sections in orthogonal directions, he asserted that
\[ \delta p = 2\sigma\,\frac{1}{2}\left(\frac{1}{R_1}+\frac{1}{R_2}\right) \equiv 2\sigma H \qquad [3] \]
where H is the mean curvature of S at the point. Although Young provided no formal justification for this step, we can establish it with the aid of a general formula from differential geometry that was not known in his lifetime:
\[ \int_S 2H\,\mathbf{N}\,dS = \oint_\Gamma \mathbf{n}\,ds \qquad [4] \]
where N is a unit normal on S, and n is unit conormal (as indicated in Figure 2) on Γ. Multiplying both sides of [4] by σ, the right-hand side becomes the net surface tension force on S. Since that must equal the net balancing pressure force, we obtain
\[ \int_S (\delta p - 2\sigma H)\,\mathbf{N}\,dS = 0 \qquad [5] \]
Letting the diameter of S tend to zero, the assertion follows.

Figure 2 Pressure change across fluid element, balanced by surface tension.

We emphasize here the implicit assumption above, that σ is a constant depending only on the particular materials, and not on the shape of S. This author knows of no source in which that is clearly established, although experiments and experience provide some a posteriori justification. See the further comments under Y2, and later in sections "Gauss' contribution: the energy method" and "Dual interpretation of σ: distinction between fluids and solids."

Figure 3 Young diagram; balance of tangential forces. Residual normal force remains.

Y2. The capillary contact angle.
Young asserted that there are surface tensions for solid/fluid interfaces analogous to those just introduced, and again depending only on the materials. This assertion is erroneous, as was suggested in writings of Bikerman and of others, and more recently established in a definitive example by Finn. Using his premise, Young attempted to characterize the contact angle γ made by the fluid surface with a rigid boundary, by requiring that the net tangential component of the three surface tension vectors vanish at the triple interface; this leads to the often employed but incorrect "Young diagram," see Figure 3, and the relation
\[ \cos\gamma = \frac{\sigma_1 - \sigma_2}{\sigma_0} \qquad [6] \]
for cos γ in terms of the magnitudes of the three "surface tensions." Young concluded that the contact angle depends only on the materials, and in no other way on the conditions of the problem. This basic assertion is by a fortuitous accident correct, as follows from the contribution by Gauss described below; it underlies all modern theory.

Using Y1 and Y2, Young produced the first verifiable prediction for the rise height u0 in the circular capillary tube of Figure 1. He assumed the interface to be spherical, so that H is constant and a = cos γ/H. He assumed vanishing outside pressure. According to classic laws of hydrostatics, δp = ρgu0 = 2σH by Y1, where ρ is fluid density; there follows the celebrated relation, presented entirely in words in his 1805 article:
\[ u_0 \approx \frac{2\cos\gamma}{\kappa a}; \qquad \kappa = \frac{\rho g}{\sigma} \qquad [7] \]
Young scorned the mathematical method, and made a point of deriving and publishing his results on capillarity without use of any mathematical symbols. This personal idiosyncrasy causes his publications to be something of a challenge to read.

The Laplace Contribution
In 1806, Laplace published the first analytical expression for the mean curvature of a surface u(x, y), and showed that the expression can be written as a divergence. He obtained the equation
\[ \operatorname{div} Tu \equiv 2H; \qquad Tu \equiv \frac{\nabla u}{\sqrt{1+|\nabla u|^2}} \qquad [8] \]
Thus, if H is known from geometrical or physical considerations, as it is for the capillary tube in the example just considered, one finds a second-order (nonlinear) equation for the surface height of any solution as a graph. The equation is elliptic for any function u(x, y) inserted into the coefficients, however not uniformly so; the particular nonuniformity leads to some striking and unusual behavior of its solutions, as we shall see. With the aid of [8], Laplace improved the Young estimate [7] to
\[ u_0 \approx \frac{2\cos\gamma}{\kappa a} - \left(\frac{1}{\cos\gamma} - \frac{2}{3}\,\frac{1-\sin^3\gamma}{\cos^3\gamma}\right) a \qquad [9] \]
Both Young and Laplace proposed their formulas for "narrow tubes", but neither gave any quantitative indication of what "narrow" should signify. Note that whenever 0 ≤ γ < π/2, [9] becomes negative when the nondimensional Bond Number B = κa² exceeds 8; since u is known to be positive in the indicated range for γ, [9] provides no information in that case, whereas [7] is still of some value. Nevertheless, [9] is asymptotically exact and consists of the first two terms of the formal expansion in powers of a; that was first proved by D Siegel in 1980, almost 200 years following the discovery of the formulas. In 1968, P Concus extended the formal expansion for the height to the entire traverse 0 < r < a. F Brulois (1981) and independently E Miersemann (1994) proved the expansion to be asymptotic to every order. Explicit bounds for the rise height above and below, making quantitative the notion of "narrow," were obtained by Finn.

Laplace supplied the first detailed mathematical investigations into the behavior of capillary surfaces, applying his ideas to many specific examples. His underlying motivation apparently derived at least partly from astronomical problems, and he published his contributions in two "Suppléments" to the tenth volume of his Mécanique Céleste.

Gauss' Contribution: The Energy Method
Young and Laplace both based their reasonings on force-balance arguments, which at best were unclear and at worst conceptually wrong. In 1830, Gauss took up the problem anew from a variational point of view, using the Johann Bernoulli principle of virtual work. To do so, he attempted to characterize both surface energies and bulk fluid energies in terms of postulated particle attractions and repulsions. In an astonishing 30 pages, he essentially introduced foundations of modern potential theory, of measure theory, and of thermodynamics. He ended up with elaborate expressions that could not readily be applied, and which at least to some extent he did not use. He asserted, for example, that the bulk internal energy would be proportional to volume, which for an incompressible fluid is constant under admissible deformations, and on that basis he ignored the bulk energy term completely. His procedures then led him, in an independent and more convincing way, to the identical equation and boundary condition that had been produced by his predecessors. It must, of course, be remarked that his justification for ignoring the bulk energy term would not be correct for a compressible liquid (see the section "Compressibility"), and it is open to some
question for the central motivating problem of a capillary tube dipped into an infinite liquid bath, in which event there is no volume constraint.

The material that follows is guided by the ideas of Gauss; however, I have found it advantageous to replace his elaborate hypotheses on particle attractions and repulsions by a simpler phenomenological reasoning as to the nature of the energy terms to be expected.

To fix ideas, we consider a semi-infinite cylinder of general section Ω and of homogeneous material, closed at the bottom, situated vertically in a downward gravity field g per unit mass, and partly filled with an incompressible liquid of density ρ covering the bottom (a more exact discussion, taking account of compressibility, is indicated below in the section "Compressibility"). We assume an equilibrium fluid configuration with the liquid bounded above by an ideally thin interface S : u(x, y) (see Figure 4). We distinguish the energy terms that occur:

Figure 4 Liquid in cylindrical capillary tube, of general section Ω. Reproduced with permission from the American Institute of Aeronautics and Astronautics.

1. Surface energy. This is the energy required to create the surface interface S. We can characterize it by noting that fluid particles within or exterior to the liquid are attracted equally to neighboring particles in all directions; however, at the surface S there is a differential attraction, to particles of the exterior medium (such as air) above, or to the liquid below (see Figure 5). Thus, particles in the interface are pulled orthogonally to S. In general, for a liquid–gas interface, significant work will be done only on the liquid and those particles will be pulled toward the liquid; otherwise, the liquid would evaporate across the interface and disappear. The work done in that (infinitesimal) motion is proportional to the area of S, so that for the surface energy E_S we obtain
\[ E_S = \sigma \int_\Omega \sqrt{1+|\nabla u|^2}\,dx \qquad [10] \]

Figure 5 Attractions on a fluid element: (1) interior to the fluid; (2) on the surface interface.

The constant σ has the dimensions of force per unit length, and turns out to be the surface tension of the interface. We note from [10] its dual interpretation as areal energy density on S, arising from formation of that surface. This alternative interpretation lends conceptual support to the supposition that σ is constant on S. See the section "Dual interpretation of σ: distinction between fluids and solids."

Implicit in the above discussion are deep premises about the nature of the forces acting within the fluid. Essentially these forces must be perceptible only at infinitesimal distances, and grow rapidly with decreasing distance. Forces both of attraction and of repulsion must be present. The recognition of the need for such forces can be traced back to Newton. Quantitative postulates as to their precise nature were introduced by van der Waals in the late nineteenth century, and the topic remains still in active study. Since these forces appear at molecular distance levels, their introduction leads inevitably to questions of statistical mechanics. Additionally, our discussion of work done in forming the surface implicitly assumes a compressible transition layer there, in conflict with our treatment of S as an ideally thin interface bounding an incompressible fluid. In these senses, it is striking that [10], which is in accord with classical constructions, could be obtained via global qualitative postulates concerning a continuum in static equilibrium, in which the specific nature of the forces is not introduced. Rayleigh measured the thickness of the surface interface between water and air to be of molecular size, thus providing experimental justification for the procedure adopted.

2. Wetting energy. A similar discussion applies at the interface separating the liquid and solid at the cylinder walls; however, this time the net attraction can be in either direction, as particles from neither medium can migrate significantly into the other. For the wetting energy E_W, we write, with Σ the boundary of Ω,
\[ E_W = -\sigma\beta \oint_\Sigma u\,ds \qquad [11] \]
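As a numerical illustration of the surface energy [10], the following sketch evaluates the area integral for a spherical-cap interface over a circular section of radius a and compares it with the closed-form cap area. The spherical profile, the angle gamma at which the cap meets the wall, the value sigma = 0.072 N/m (roughly water against air) and all function names are assumptions chosen for this example; they are not taken from the article.

```python
import math


def cap_radius(a, gamma):
    """Sphere radius R of a cap over a disk of radius a meeting the wall at angle gamma (assumed profile)."""
    return a / math.cos(gamma)


def surface_energy_numeric(sigma, a, gamma, n=20000):
    """Midpoint-rule value of sigma * integral of sqrt(1 + |grad u|^2) over the disk r < a."""
    R = cap_radius(a, gamma)
    dr = a / n
    area = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        # du/dr for the cap u(r) = const + R - sqrt(R^2 - r^2)
        slope = r / math.sqrt(R * R - r * r)
        area += math.sqrt(1.0 + slope * slope) * 2.0 * math.pi * r * dr
    return sigma * area


def surface_energy_exact(sigma, a, gamma):
    """Closed form: sigma times the spherical-cap area 2*pi*R*(R - sqrt(R^2 - a^2))."""
    R = cap_radius(a, gamma)
    return sigma * 2.0 * math.pi * R * (R - math.sqrt(R * R - a * a))


if __name__ == "__main__":
    # Illustrative values only: 1 mm tube radius, 60 degree angle at the wall.
    sigma, a, gamma = 0.072, 1.0e-3, math.radians(60.0)
    print(surface_energy_numeric(sigma, a, gamma))
    print(surface_energy_exact(sigma, a, gamma))
```

The angle is kept away from zero in the example because the integrand becomes singular at the wall when the cap is a full hemisphere, and the simple midpoint rule then converges slowly.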

We designate β as the relative adhesion coefficient of the liquid–gas–solid configuration. We assume that the cylinder walls are of homogeneous material, so that β will be constant. In general, β is a difference of factors that apply on the walls at the two interfaces, with the liquid and with the external medium.

3. Gravitational energy. The work done in lifting an amount of liquid ρ δΩ δh against the gravity field from the base level to a height h in a vertical tube of small section δΩ is ρgh δh δΩ. Thus, the work done in filling that tube up to the surface height u is (ρgu²/2) δΩ, and the total gravitational energy is
\[ E_G = \frac{\rho g}{2} \int_\Omega u^2\,dx \qquad [12] \]

4. Volume constraint. In the configuration considered the volume is to be unvaried during admissible deformations; we take account of the constraint by introducing a Lagrange parameter λ, and an additional "energy" term
\[ E_V = \sigma\lambda \int_\Omega u\,dx \qquad [13] \]

According to the principle of virtual work, the sum E of the above energies must remain unvaried in any deformation that respects all mechanical constraints other than the volume constraint. We choose a deformation u → u + εη, with η smooth in the closure of Ω, which determines a functional E(ε). From E′(0) = 0 follows
\[ \int_\Omega \left\{ \nabla\eta\cdot\frac{\nabla u}{\sqrt{1+|\nabla u|^2}} + \eta\,(\kappa u + \lambda) \right\} dx - \beta \oint_\Sigma \eta\,ds = 0 \qquad [14] \]
from which
\[ \int_\Omega \eta\,\bigl(-\operatorname{div} Tu + \kappa u + \lambda\bigr)\,dx + \oint_\Sigma \eta\,(\nu\cdot Tu - \beta)\,ds = 0 \qquad [15] \]
with Tu ≡ ∇u/√(1 + |∇u|²), and with ν the unit exterior normal on Σ. Choosing first η to have compact support in Ω, the boundary term vanishes, and the "fundamental lemma" of the calculus of variations yields
\[ \operatorname{div} Tu = \kappa u + \lambda; \qquad \kappa = \frac{\rho g}{\sigma} \qquad [16] \]
throughout Ω. Thus, the area integral in [15] vanishes for any η. We are therefore free to choose η as we wish on the boundary, and the fundamental lemma now yields ν · Tu = β on Σ. We now note that for any liquid surface u(x, y) there holds
\[ \nu\cdot Tu = \cos\gamma \qquad [17] \]
on Σ, where γ is the angle between the cylinder wall and the surface S, measured within the liquid. Since β is assumed to be constant, that is so also for γ. It is a physical constant: the contact angle, that must be measured in an independent experiment, and cannot be prescribed in advance or calculated within the scope of the theory.

The constant β, originally introduced as a general proportionality constant, is now characterized as β = cos γ. We thus see that a physical surface of the form envisaged is possible only if −1 ≤ β ≤ 1. Physically, one expects that if β < −1 the liquid will separate from the walls, while, if β > 1, the liquid will spread over the walls as a thin film.

Equation [16] and boundary condition [17] provide a nonlinear second-order equation that is elliptic for any function u(x, y), and also a nonlinear transversality condition on the boundary, for determining the surface interface S. The expression div Tu is exactly twice the mean curvature of the surface S. If κ ≠ 0 then λ can be eliminated by addition of a constant to u. The problem [16]–[17] for the fluid in a vertical cylindrical capillary tube of general section Ω becomes thus a geometrical one: to find a surface whose mean curvature is a prescribed function of position in space, and which meets the cylindrical boundary walls in a prescribed angle γ.

In the absence of gravity, [16] takes the form
\[ \operatorname{div} Tu = 2H \qquad [18] \]
for a surface of constant mean curvature H. The constant H is determined by integrating [18] over Ω, and using [17]:
\[ 2H = \frac{|\Sigma|\cos\gamma}{|\Omega|} \qquad [19] \]
where |Σ| and |Ω| denote the respective perimeter and area, and thus H is independent of volume. From the known uniqueness up to an additive constant of the solutions of [18], [17] it follows that the shape of the solution surface is independent of volume. That result holds also for [16], [17] in view of the possibility to eliminate λ from the equation by addition of a constant, and the uniqueness of the solutions of the resulting equation.

Equations [16]–[17] or [18]–[17] are appropriate for determining capillary surfaces that are graphs
u(x, y) over a base domain Ω. More generally, any surface S in 3-space satisfies the equation
\[ \Delta \mathbf{x} = 2H\,\mathbf{N} \qquad [20] \]
where H is its scalar mean curvature and N is a unit normal vector on S. Here Δ is the "intrinsic Laplacian" in the metric of S. This is the appropriate relation to be applied in situations for which the physical surface folds over itself and cannot be expressed globally as a graph. The formal simplicity of [20] is deceptive; the challenges arising from the nonlinearity in the equation can be formidable, and very little general theory is as yet available.

Dual Interpretation of σ: Distinction between Fluids and Solids
We have already remarked the duality in connection with eqn [10] above. It can be made explicit with a simple experiment proposed by Dupré. One makes a rigid frame with a sliding bar of length l, as in Figure 6, and dips the frame into soap solution. On lifting the frame from the solution the opening will be filled with a soap film, and one finds a force F = 2σl on the bar, directed orthogonal to the bar (the factor 2 appears since the film has two sides). The work done in sliding the bar a distance δx is F δx = 2σl δx, which can also be written F δx = 2σ δA with δA an element of area. In this sense, the two interpretations of σ are formally equivalent, for fluid/fluid interfaces.

Figure 6 Dupré apparatus for exhibiting surface tension.

The equivalence cannot be extended to solid/fluid interfaces. Consider a rigid spherical ball of generic material and radius R, freely floating in an infinite liquid bath in a gravity-free environment, see Figure 7a. It can be shown that the unique symmetric solution to the problem is a horizontal surface, as in the figure. A variational procedure as above shows that if e0, e1, e2 are the interfacial energy densities associated with the three interfaces, then
\[ \cos\gamma = \frac{e_1 - e_2}{e_0} \qquad [21] \]
in formal analogy with the Young relation [6]. But e1, e2 cannot be interpreted as interfacial forces whose net tangential component cancels that of e0. To do so would lead to a net downward force σV on the ball (see Figure 7b), contradicting the supposed equilibrium state.

Figure 7 (a) Floating spherical ball; presumed "Young" forces. (b) Normal and vertical components of Young forces; contradiction to presumed equilibrium.

Mathematical and Physical Predictions: Experiments
In the following sections, we study the kinds of behavior imposed on a surface S by the requirement that it appear as solution of one of the indicated equations and boundary conditions. Some of these properties are quite surprising in the context of classically expected behavior of solutions of equations of mathematical physics. The mathematical predictions were, however, corroborated in certain cases experimentally, as we discuss below.

Uniqueness and Nonuniqueness
We begin by considering uniqueness questions. We start with a semi-infinite capillary tube, closed at the bottom, to be partially filled with a prescribed volume of (incompressible) liquid making contact angle γ on the container walls (Figure 8a). If κ ≥ 0, any solution is uniquely determined. That is a quite general theorem, valid for a wide class of domains Ω including all piecewise smooth domains (at the corners of which data of the form [17] cannot be prescribed); formally, data can be omitted on any boundary set of linear Hausdorff measure zero. In this result, no growth conditions need be imposed near the boundary (note that such a statement would be false for solutions of the Laplace equation under Dirichlet boundary conditions).

Next we consider a sessile liquid drop on a horizontal plate (Figure 8b). Again the solution is uniquely determined by the volume and by γ, although the known proof differs greatly from that of the other case.

We now consider a smooth deformation of the base plane, depending on a parameter t, which carries it into the cylinder; that can be done in such a way that the supporting surface is at all times "bowl-shaped," as in Figure 8c. Since the bowl formation tends to restrict the possible deformations
of the fluid consistent with smooth contact with the supporting rigid surface, one might expect that the corresponding capillary surface S(t), arising from the identical fluid mass, will for each t be uniquely determined.

Figure 8 Support configurations: (a) capillary tube, general section; (b) horizontal plate; (c) convex surface appearing during deformation of horizontal plate to capillary tube; and (d) Nonuniqueness of configuration appearing during convex deformation. Reproduced from Mathematics Intelligencer 24(3) 2002 21–33 with permission from Springer-Verlag Heidelberg.

That is however not true, even for symmetric configurations. We can see that from the configuration of Figure 8d, consisting of a vertical circular cylinder whose base is a 45° cone. We assume a contact angle γ = 45° and adjust the radius so that a horizontal surface lying just below the cylinder/cone juncture provides the prescribed volume. This is a formal solution surface. Now fill the configuration with a larger volume, so that the contact line will lie above the juncture. The upper surface will no longer be flat, in view of the 45° contact angle, and takes an appearance as indicated in the figure. Finally, we decrease the fluid volume, keeping all other parameters unchanged. As noted above, the upper surface moves rigidly downward, and it is clear that if the original surface is close enough to the juncture line, then the prescribed volume will be attained before the contact line reaches the juncture. Thus, uniqueness fails.

In this construction as just described, the bounding surface is not smooth; however, one sees easily that the procedure continues to work if the edge and vertex are smoothed locally. In fact, one can carry the procedure to a striking conclusion; by appropriate smoothing, one can construct a bounding surface admitting an entire continuum of distinct solution interfaces, all with the same contact angle and enclosing the same fluid volume (Gulliver and Hildebrandt; Finn). This can be done for any gravity field. Figure 9 illustrates seven members of the family of interfaces, in the particular case κ = 0.

Figure 9 Seven spherical capillary interfaces in an "exotic" container of homogeneous material in zero gravity. All interfaces bound the same volume and have the same sum of free surface and wetting energies. If all pressures above the interfaces are the same, then the pressures below them successively increase as the curvature vectors of the vertical sections change from upwardly to downwardly directed. Reproduced from Mathematics Intelligencer 24(3) 2002 21–33 with permission from Springer-Verlag Heidelberg.

The question immediately arises as to which if any of the continuum of surfaces will be seen in an experiment. In fact, it can be proved that none of the indicated surfaces is mechanically stable (Finn, Concus and Finn, Wente). Since the indicated family includes all symmetric surfaces that are stationary for the energy functional, we find that any stable stationary configuration must be asymmetric. Thus, we have obtained an example of symmetry breaking, in which all conditions of the problem are symmetric, but for which all physically acceptable solutions are asymmetric.

These results were subjected to computational test by M Callahan using the Surface Evolver software, to experimental test by M Weislogel in a drop tower, and to experimental test by S Lucid in the Mir Space Station. The results of the latter experiment are compared in Figure 10 with the computer calculations. In both cases, both a local minimizer (potato chip) and a presumed global minimizer (spoon) were observed.

The seven surface interfaces indicated in Figure 9 all provide the same sum of surface and wetting energy, and bound the same volume of fluid. They all satisfy an eqn [18] with constant H, in accordance with hypotheses of incompressibility and vanishing gravity. Thus, formally, all configurations have identical mechanical energy. The surfaces
are all spherical caps; however, the radii R of the caps vary considerably. According to Y1 above, the pressure change across each interface is δp = 2σ/R. Since one may assume the outer region to be a vacuum with zero pressure for all caps, we find that the pressures within the fluids vary greatly among the configurations. One would thus expect that work is done within the fluid in passing from one configuration to another, a circumstance we have excluded by hypothesis when determining the family. From this point of view, the (customary) hypothesis of incompressibility that was used in determining the family is put into significant question; we examine this point in some detail in the section "Compressibility."

Figure 10 Symmetry breaking in exotic container, g = 0. Below: calculated presumed global minimizer (spoon) and local minimizer (potato chip). Above: experiment on Mir: symmetric insertion of fluid (center); spoon (left); potato chip (right). This is a grayscale version of a color figure reproduced from Journal of Fluid Mechanics, 224: 383–94, (1991) with permission of Cambridge University Press.

Discontinuous Dependence I
Capillary surfaces can exhibit striking discontinuous dependence on the defining data. As initial example, we consider the behavior of a solution of [18]–[17] at a protruding corner point P of the domain Ω of definition. For simplicity, we assume the corner bounded locally by straight segments, meeting in an opening angle 2α < π, thus forming locally a wedge domain. In anticipation of material to follow, we assume contact angles γ1 and γ2 on the respective sides, 0 ≤ γ1, γ2 ≤ π. One can show that a necessary condition for a solution surface over a domain Ωδ as in Figure 11 to have a continuous normal vector up to P is that the data point (γ1, γ2) lie in the closure of the rectangle R of Figure 12. (This figure includes also additional material anticipating the section "Drops in wedges").

Figure 11 Wedge domain. Reproduced from Finn R "Capillary Surface Interfaces" in Notices of AMS 46 No.7 (1999) with permission of the American Mathematical Society.

Figure 12 Domain R of data yielding continuous normal to capillary surface in wedge of opening 2α < π. The symbols D and I are clarified in the section "Behavior at a corner point." Reproduced from "Capillary Wedges Revisited" in SIAM J. Math. Anal. 27 No.1 (1996) 56–69 with permission from SIAM. (Figure labels: axes γ1 and γ2 from 0 to π; regions R (continuous), D1+, D1− (no graph), D2+, D2−.)

For data points interior to R, this criterion also suffices for the existence of at least one such solution surface, for any prescribed H; such surfaces can in fact be produced explicitly as spherical caps (planes if H = 0). It remains to discuss what can occur with data arising from the remaining four subregions of the square.

If (γ1, γ2) ∈ D1±, then there is no solution to [18]–[17] in any neighborhood of the corner point P. On the other hand, an explicit solution for any H > 0 can be found as a lower spherical cap on the segment γ1 + γ2 = π − 2α that separates D1+ from R (see Figure 13, which indicates the equatorial circle). Correspondingly, if H < 0 then an explicit solution can be found on the separation line between D1− and R. Thus, there is a
discontinuous change in behavior in crossing from R to either of the D1± regions.

Figure 13 Construction of solution as lower hemisphere; γ1 + γ2 = π − 2α, H > 0. Reproduced from "Capillary Wedges Revisited" in SIAM J. Math. Anal. 27 No.1 (1996) 56–69 with permission from SIAM.

This behavior was put to experimental test by W Masica, who considered the case 0 < γ1 = γ2 = γ < π/2 near the crossing point γ = γcr with D1+, for which α + γcr = π/2. He partially filled a regular hexagonal cylinder of acrylic plastic, successively with two different liquids, making respective contact angles greater or less than γcr with the plastic. For each liquid, Masica then allowed the cylinder to fall in a 132 m drop tower. Figure 14 compares the two configurations after about 5 s of free fall. In the case γ > γcr he obtained the spherical-cap solution, which in this case covers the entire base domain Ω and appears as an explicit solution of [18]–[17]. When γ < γcr, the liquid rose to the top of the cylinder near the edges, filling out the edges over the corner points. The surface interface S does not cover Ω, but instead folds back over itself, doubly covering a portion of Ω. Thus, a physical surface appears as it must, but it is not a solution of [18] over Ω.

Discontinuous Dependence II
About 1970, M Miranda raised informally the question, whether a capillary tube Z0, whose section Ω0 lies strictly interior to a section Ω1 of a tube Z1, will raise liquid from an infinite reservoir in a downward directed gravity field to a higher level over Ω0 than will Z1 over that subdomain of its section. That is true if both cylinders are circular, and in the intervening years its correctness was established in a number of other cases of particular interest.

Finn and Kosmodem'yanskii, Jr. showed, however, by example that the assertion fails in a large range of cases, and in fact can fail with arbitrarily large height differences, uniformly over Ω0. Beyond that, the construction exhibits a strikingly discontinuous change of behavior, under perturbations of a disk as inner domain. Perhaps more remarkably, the assertion can hold with the inner domain a disk, but with discontinuous reversal of behavior as the disk is perturbed to neighboring disks. That was shown in a form of the example given later by Finn, and illustrated in Figure 15. Here the outer domain Ω1 is polygonal, with sides that extend to be tangent to a unit disk Ω0, as indicated. The angle γ is to be chosen so that 0 ≤ π/2 − γ ≤ αmin, where αmin is the smallest of the interior vertex half-angles of Ω1. In view of the assumed infinite fluid reservoir, there is no volume constraint, and the governing equation [16] takes the form
\[ \operatorname{div} Tu = \kappa u; \qquad \kappa = \frac{\rho g}{\sigma} > 0 \qquad [22] \]
Taking at first the inner domain to be Ω0, it can be shown that for the corresponding solutions u0 and u1 of [22], there holds u0 > u1 over Ω0 for
Figure 15 Discontinuous reversal of limiting height behavior. All


sides of the polygonal domain 1 are tangent to the unit disk 0 .
(a) (b) For the corresponding solution heights u 0 in 0 , u " in the disk "
Figure 14 Liquid in hexagonal cylinder, during free fall in drop of radius 1 þ e, and u 1 in 1 , there holds u 1  u 0 < 0, for any
tower: (a) a þ g > p=2; (b) a þ g < p=2. downward gravity. But lim ! 0 (u 1  u " ) = þ1, for any e > 0.

any $\kappa > 0$, and thus the Miranda question has a positive answer for that configuration. But if we replace $\Omega_0$ by a concentric disk $\Omega_\varepsilon \subset \Omega_1$ of radius $1 + \varepsilon$, we find

$$\left| \left\{ \inf_{\Omega_\varepsilon} u^1(x;\kappa) - \sup_{\Omega_\varepsilon} u^\varepsilon(x;\kappa) \right\} - \frac{2\varepsilon\cos\gamma}{\kappa(1+\varepsilon)} \right| < \frac{1 - \sin\omega}{\cos\gamma} + (1+\varepsilon)\,\frac{1 - \sin\gamma}{\cos\gamma} \qquad [23]$$

where $\omega = \arccos(\cos\gamma / \sin\alpha)$, and $u^\varepsilon$ is the solution of [22], [17] in $\Omega_\varepsilon$. Since $\kappa$ does not appear on the right side of [23], there follows in particular that for any $\varepsilon > 0$, there holds

$$\lim_{\kappa \to 0} \left\{ \inf_{\Omega_\varepsilon} u^1(x;\kappa) - \sup_{\Omega_\varepsilon} u^\varepsilon(x;\kappa) \right\} = \infty \qquad [24]$$

In particular, a negative answer to Miranda’s question appears for all gravity sufficiently small. But as observed above, a positive answer occurs in $\Omega_0$, for any positive gravity. Thus, the limiting behavior as $\kappa \to 0$ changes discontinuously, as $\varepsilon \to 0$. We find that the two limiting procedures cannot be interchanged: for any $x \in \Omega_0$, we obtain

$$\lim_{\varepsilon \to 0} \lim_{\kappa \to 0} \left( u^1(x;\kappa) - u^\varepsilon(x;\kappa) \right) = +\infty, \qquad \lim_{\kappa \to 0} \lim_{\varepsilon \to 0} \left( u^1(x;\kappa) - u^\varepsilon(x;\kappa) \right) \le \text{const.} < 0 \qquad [25]$$

Existence Questions I

For the general equation [20] there is an established literature on existence of surfaces containing a prescribed space curve. There is very little literature relating to the capillarity boundary condition that the solution surface $S$ meet a prescribed ‘‘support’’ surface $W$ in a prescribed angle $\gamma$. The existence of at least one such surface interior to a prescribed sufficiently smooth closed space domain was proved by Almgren, and then Taylor proved smoothness at the contact curve. These are abstract theorems that are basic for the theory but in general do not provide specific information in particular cases of interest.

Special interest attaches to the nonparametric cases [16] or [18] with boundary condition [17], especially in view of the discontinuous behavior properties described above. These cases were studied in depth by a number of authors, with results that put the above examples into some perspective.

M Emmer proved the existence of a unique solution of [16]–[17] for any compact $\Omega$ having Lipschitz boundary with Lipschitz constant $L$ such that $\sqrt{1 + L^2}\,\cos\gamma < 1 - \varepsilon$ for some $\varepsilon > 0$. Finn and Gerhardt (F and G) extended this condition, and showed in particular that solutions exist in general in piecewise smooth $\Omega$. This result contrasts with the zero-gravity case [18] discussed in the section ‘‘Existence questions II,’’ for which solutions fail to exist when $\sqrt{1 + L^2}\,\cos\gamma > 1$ at a protruding corner (see the section ‘‘Discontinuous dependence I’’).

However, in the cases $\sqrt{1 + L^2}\,\cos\gamma > 1$ studied by F and G the solution $u(x)$ is necessarily unbounded in the corner. This condition is equivalent to $\alpha < |\gamma - \pi/2|$ at the corner. Concus and Finn showed that if $\alpha \ge |\gamma - \pi/2|$ in a neighborhood $\Omega_\delta$ of a corner with rectilinear sides, as indicated in Figure 11, then the solution $u(x)$ satisfies

$$|u(x;\kappa)| < \frac{2}{\kappa\delta} + \delta \qquad [26]$$

independent of $\alpha$, $\gamma$ in the range considered. Here it is assumed that [16] is normalized so that $\lambda = 0$; when $\kappa \ne 0$ this can always be achieved by adding a constant to $u$. On the other hand, if $\alpha < |\gamma - \pi/2|$, then

$$u(x;\kappa) \approx \frac{\cos\vartheta - \sqrt{k^2 - \sin^2\vartheta}}{k\kappa r} \qquad [27]$$

where $k = \sin\alpha / \cos\gamma$ and $\vartheta$ is polar angle relative to a bisector at the vertex; hence $u$ becomes unbounded as $O(1/r)$. Thus, the behavior changes discontinuously as the configuration for which $\alpha = |\gamma - \pi/2|$ is crossed.

This prediction was corroborated by T Coburn in a ‘‘kitchen sink’’ experiment in the Medical School at Stanford University. Coburn formed a wedge using two sheets of acrylic plastic, resting on a glass plate, and inserted a drop of distilled water at the base of the wedge. Initially, the wedge was opened sufficiently that $\alpha + \gamma \ge \pi/2$, and he obtained the configuration of Figure 16a, with the maximum height slightly lower than that indicated by [26]. By closing down the angle slightly, the liquid rose to over ten times that height, as shown in Figure 16b. This experiment was later repeated by Weislogel under laboratory conditions; it incidentally establishes the contact angle of water and acrylic plastic in the Earth’s atmosphere as $80^\circ \pm 2^\circ$.

The indicated procedure provides in general a very accurate way to measure contact angles, when the angle is not far from $\pi/2$. For $\gamma$ near zero or $\pi$ in the Earth’s gravity field, the discontinuity is confined to a microscopic neighborhood of the vertex, and can be difficult to observe. This technical difficulty was addressed by Fischer and Finn, who introduced ‘‘canonical proboscis’’ domains, the theory of which was further developed by Finn and

Leise and by Finn and Marek. For such domains the change in behavior is not strictly discontinuous, but it is nearly so, and it extends over large portions of the cylinder section, so that it is easily observable. Concus, Finn, and Weislogel conducted space experiments, demonstrating the feasibility of the method as a means for measuring contact angles in general ranges.

Figure 16 Distilled water in wedges formed by acrylic plastic plates; g > 0. (a) $\alpha + \gamma > \pi/2$; (b) $\alpha + \gamma < \pi/2$. Reproduced from P Concus and R Finn, ‘‘On Capillary Free Surfaces in a Gravitational Field’’ in Acta Math 132 (1974) 207–223 with permission of Institut Mittag-Loeffler.

In [26]–[27] no growth conditions at the corner are imposed; the estimates hold for every solution defined in $\Omega_\delta$ and assuming the prescribed data on the side walls, with no data prescribed at the vertex. The formula [27] is the initial term of a formal asymptotic expansion of the solution, in powers of $r$. Miersemann obtained the complete expansion, asymptotic to every order, when $\alpha < |\gamma - \pi/2|$. He obtained somewhat less complete information in the bounded case [26].

Chen, Finn, and Miersemann provided a form of [27] that is applicable for any data $(\gamma_1, \gamma_2)$ on the respective sides of the wedge, that arise from the $D_1^\pm$ regions of Figure 12. Lancaster and Siegel and independently Chen, Finn, and Miersemann showed that if $-2\alpha \le \gamma_1 + \gamma_2 - \pi \le 2\alpha$, then every solution is bounded at the vertex. This result holds also for the zero gravity eqn [18].

In the case of [18], Concus and Finn showed that in the $D_1^\pm$ regions no solution exists, regardless of $H$. Again, this result holds without growth conditions.

From these considerations and from remarks in the section ‘‘Discontinuous dependence I’’ follows that for data in $D_2^\pm$, all solutions either of [18] or of [16] are bounded but have discontinuous derivatives at the vertex $P$. Extrapolating from the behavior of particular computed solutions, Concus and Finn conjectured that all solutions of [18] or of [16] that arise from data in $D_2^\pm$ are discontinuous at $P$. A number of attempts to prove or to disprove this conjecture have till now been unsuccessful.

An existence theorem for [16]–[17] alternative to that of Emmer was obtained independently by Ural’tseva, using a very different approach. This procedure yielded smoothness estimates up to the boundary, but required a hypothesis of boundary smoothness, so that the result does not mesh with the discontinuous dependence behavior as does that of Emmer. Later versions of the existence result, again under boundary smoothness requirements, were given by Gerhardt, Spruck, and Simon and Spruck. In the procedure introduced by Emmer, the boundary trace is shown to exist only in a very weak sense (which, however, suffices for a uniqueness proof). The later work can be adapted to show that the Emmer solutions are smooth on the smooth parts of $\partial\Omega$. None of the above procedures provides existence for the zero gravity case [18]. As we shall see in the following section, that is not an accident of the methods, but reflects subtle properties of the equations.

Existence Questions II

We consider here the zero-gravity case [18], over a domain $\Omega$ bounded by a piecewise smooth curve $\Sigma$, under the boundary condition [17]. Integrating [18] over $\Omega$ and using [17], we find $2H|\Omega| = |\Sigma| \cos\gamma$. Let $\Omega^* \subset \Omega$, $\Sigma^* = \Sigma \cap \partial\Omega^*$, $\Gamma = \Omega \cap \partial\Omega^*$. The same procedure over $\Omega^*$, using that $|Tu| < 1$ for any $u(x, y)$, leads to the bound

$$\Phi[\Omega^*; \gamma] > 0 \qquad [28]$$

where $\Phi$ is defined by

$$\Phi[\Omega^*; \gamma] \equiv |\Gamma| - |\Sigma^*| \cos\gamma + 2H|\Omega^*| \qquad [29]$$

The inequality [28] must hold for any choice of $\Omega^*$. This provides a necessary condition for existence of a solution to [18]–[17] in $\Omega$. E Giusti showed that when $\Omega^*$ is interpreted in a generalized sense as a Caccioppoli set, the condition [28] becomes also sufficient for existence.

It is easy to give specific examples of convex analytic domains $\Omega$, in which subdomains $\Omega^*$ can be found such that [28] fails. Thus, the general existence results for [16] do not carry over to [18], regardless of local domain smoothness. Nevertheless, in many cases of interest (e.g., a circular disk or an ellipse that is not too eccentric), solutions of [18]–[17] do exist for any $\gamma$ and are well behaved.

Finn investigated the condition [28] in general by showing the existence of a system of arcs $\{\Gamma\} \subset \Omega$

that minimize $\Phi$. All such arcs are circular of radius $1/2H$, and meet $\Sigma$ either at smooth points in an angle $\gamma$, or else at a reentrant corner point in an angle $\ge \gamma$, measured on the side of $\Gamma$ opposite to that into which the curvature vector points (Figure 17). All minimizing configurations are bounded by arcs of that form, although not all such configurations minimize. In a typical situation one will encounter only a finite number of such arcs, in which case only a finite number of cases need be examined. If $\Phi > 0$ in each such case, then a solution of [18]–[17] exists for the given $\Omega$ and $\gamma$. It may occur that no such arcs exist; we then observe that since $\Phi[\emptyset; \gamma] = \Phi[\Omega; \gamma] = 0$, $\Phi$ cannot become nonpositive for any $\Omega^* \subset \Omega$ unless a minimizing $\Gamma$ can be found in $\Omega$, contradicting the assumed nonexistence of minimizers. Thus, the criterion is then vacuously satisfied, and we conclude that a solution of [18]–[17] exists.

Figure 17 Extremal configuration for the functional $\Phi$.

One has, of course, to ask what happens physically in cases for which $\Phi[\Omega^*; \gamma] \le 0$ for some $\Omega^*$ as above. The possible modes of behavior were studied in particular cases by Tam and later by deLazzer, Langbein, Dreyer, and Rath; Finn and Neel characterized the general case. Formally, the fluid rises to infinity throughout domains $\Omega^*$ of the form indicated, but with $H$ replaced by a value $H^* < H$; on the opposite side of the circular arcs $\Gamma$, the fluid is asymptotic to the vertical cylinders over $\Gamma$. In a physical situation, the fluid will rise to the top of the container in a nearly cylindrical region adjacent to a portion of the container walls, approximating the indicated behavior and partially wetting the top of the container. One sees that behavior in Figure 14b, in which the fluid fills out regions adjacent to the corners. An analogous configuration would still be observed if the corners were smoothed locally. If insufficient fluid is available, a portion of the base $\Omega$ could become unwetted.

Behavior at a Corner Point

Lancaster and Siegel (L and S) studied the behavior of the limits (which they designate by $Ru$) of bounded solutions of [16] or of [18] along radial segments tending to a corner point $P$ of a domain $\Omega$. These limits can exhibit remarkable idiosyncratic behavior. For simplicity of exposition, we restrict ourselves here to rectilinear boundary segments at $P$, and assume constant boundary angles $\gamma_1, \gamma_2 \ne 0, \pi$ on the two sides. L and S prove first that the limits $Ru$ exist and vary continuously with direction of approach; then they show the existence of ‘‘fan’’ regions of directions adjacent to those of the sides, in which the limits are constant independent of direction, see Figure 18. They obtain that if the opening angle $2\alpha$ at $P$ satisfies $2\alpha < \pi$, then for data in the rectangle $R$ of Figure 12 the fans overlap (see Figure 18a), so that the solution is necessarily continuous at $P$. For data in $D_2^+$, the solution decreases from the $\gamma_1$ side $\Sigma_1$ to the $\gamma_2$ side $\Sigma_2$ (‘‘D’’ behavior), subject to the Concus–Finn conjecture (see the section ‘‘Existence questions I’’), with the reverse behavior (‘‘I’’) in $D_2^-$. Concus and Finn showed that if $2\alpha < \pi$ then in $D_1^\pm$ there is no bounded solution of [16]–[17] or [18]–[17] as a graph. For [16]–[17], unbounded solutions do however exist for such data (see the section ‘‘Existence questions I’’).

Figure 18 (a) Fan domains $APA'$ and $BPB'$ of constant limiting values; $2\alpha < \pi$ so that the fans overlap when data are in $R$. (b) $2\alpha > \pi$; case 1. Fans $APA'$ and $BPB'$ of constant radial limits appear. Limiting value changes strictly monotonically as approach direction changes from $A'P$ to $B'P$. (c) $2\alpha > \pi$; case 2. In addition to the two fans adjacent to the sides of the wedge, a half plane of constant radial limits appears.
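The case distinctions quoted for a protruding corner ($2\alpha < \pi$) can be collected in a small helper. The following Python sketch is illustrative only: the function name `corner_regime` and its return labels are hypothetical, and the boundary of $D_1^-$ is taken as $\gamma_1 + \gamma_2 = \pi + 2\alpha$ by symmetry with the stated boundary $\gamma_1 + \gamma_2 = \pi - 2\alpha$ of $D_1^+$, which is an assumption here. It separates only the $D_1^\pm$ regions from the band in which every solution is bounded at the vertex; it does not distinguish $R$ from $D_2^\pm$.

```python
import math


def corner_regime(alpha, gamma1, gamma2):
    """Sort contact-angle data at a protruding wedge corner (2*alpha < pi).

    Returns "D1+" or "D1-" when the data lie in a D1 region, and
    "bounded" otherwise (the band containing R and the D2 regions).
    All angles are in radians. Labels are illustrative.
    """
    if not 0.0 < 2.0 * alpha < math.pi:
        raise ValueError("expected a protruding corner, 0 < 2*alpha < pi")
    for g in (gamma1, gamma2):
        if not 0.0 <= g <= math.pi:
            raise ValueError("contact angles must lie in [0, pi]")
    s = gamma1 + gamma2 - math.pi
    if s < -2.0 * alpha:   # below the line gamma1 + gamma2 = pi - 2*alpha
        return "D1+"
    if s > 2.0 * alpha:    # assumed mirror-image boundary of D1-
        return "D1-"
    return "bounded"


# Regular hexagonal section as in Masica's experiment: half-angle 60 degrees,
# so the critical equal contact angle is 90 - 60 = 30 degrees.
alpha_hex = math.radians(60.0)
print(corner_regime(alpha_hex, math.radians(20.0), math.radians(20.0)))
print(corner_regime(alpha_hex, math.radians(40.0), math.radians(40.0)))
```

For equal contact angles $\gamma_1 = \gamma_2 = \gamma$ the test $\gamma_1 + \gamma_2 < \pi - 2\alpha$ reduces to $\alpha + \gamma < \pi/2$, the condition under which the liquid climbed the edges in the drop-tower experiment.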

If $2\alpha > \pi$, then the fans do not overlap, and in fact continuity at $P$ cannot in general be expected. Outside the indicated fan regions adjacent to the wedge sides, the limit values either change strictly monotonically with angle of approach, as in Figure 18b, or else they do so except for approaches within a third, central fan, which covers a full half-space, and interior to which the limiting values again remain constant, see Figure 18c. L and S give an example under which that behavior actually occurs. Remarkably, in the example the prescribed data are the same on both boundary segments. The solution is nevertheless discontinuous at $P$, with an interval in which the radial limit increases, another interval in which it decreases, two fans of constant limit adjacent to the sides, and a fan of breadth $\pi$ in-between.

General conditions for continuity at a reentrant corner ($2\alpha > \pi$) have not yet been established. L and S give a sufficient condition, depending on a hypothesis of symmetry. Since no such hypothesis is needed when $2\alpha < \pi$, one might at first expect it to be superfluous. However, Shi and Finn showed that by introducing an asymmetric domain perturbation that in an asymptotic sense can be arbitrarily small, the solution can be made discontinuous at $P$. That can be done without affecting any other hypotheses of the L and S theorem.

In as yet unpublished work, D Shi characterized all possible behaviors at a reentrant corner, subject to the validity of the Concus–Finn conjecture at a protruding corner. If $\kappa \ge 0$ then all solutions of [16] or of [18] in a neighborhood of $P$ in $\Omega$ are bounded at $P$. The further behavior depends on the particular data, and is indicated in Figure 19. Note the analogy with Figure 12, although the interpretations in the figures differ in detail. Here the symbol I denotes strictly increasing from the side $\Sigma_1$ to $\Sigma_2$, except on the fan regions of constant limits; ID denotes constancy on a fan adjacent to $\Sigma_1$, then strictly increasing, then constancy on a fan of opening $\pi$, then strictly decreasing, then constancy on a fan adjacent to $\Sigma_2$. D and DI are defined analogously. All cases can be realized in particular configurations.

Figure 19 $\pi < 2\alpha < 2\pi$. Possible modes of behavior. Reproduced with permission from the Pacific Journal of Mathematics.

Drops in Wedges

Closely related to the material just discussed is the question of the possible configurations of a connected drop of liquid placed into a wedge formed by intersecting plates of possibly differing materials, in the absence of gravity. Thus, one has distinct contact angles $\gamma_1$, $\gamma_2$ on the two plates. Finn and McCuan showed that if $(\gamma_1, \gamma_2) \in R$ then the only possibility is that the drop surface $S$ is part of a sphere. For data in $D_1^\pm$, no such drop can exist, barring exotically singular behavior at the vertex points where the edge of the wedge meets $S$.

For data in $D_2^\pm$ the situation is less clear. Concus, Finn, and McCuan (CFM) showed that local behavior exhibiting such data is indeed possible; however, they conjectured that such behavior cannot occur for simple drops. In conjunction with the above results, they were led to the conjecture that the free surface $S$ of any liquid drop in a planar wedge, that meets the wedge in exactly two vertices and the wedge faces in constant contact angles $\gamma_1$, $\gamma_2$, is necessarily spherical. Here it is supposed only that $0 \le \gamma_1, \gamma_2 \le \pi$.

The behavior of a drop of prescribed volume, as the data move from the midpoint of $R$ to the $D$ regions along parallels to the sides of $R$, is displayed in Figure 20. As one moves into the $D_2^\pm$ regions, the drop detaches from one side of the wedge and becomes a spherical cap resting on a single planar surface, in accord with the above conjecture. As $D_1^-$ is approached, the liquid becomes a drop of very large radius that fills out a long thin region in the wedge, and disappears to infinity as the boundary of $R$ is crossed. However, as $D_1^+$ is entered, the configuration transforms smoothly into a spherical liquid bridge, connecting the two faces of the wedge without contacting the wedge line.

Stability Questions

A number of authors, for example, Langbein, Vogel, Finn and Vogel, Steen, and Zhou, have studied the

stability of liquid drops trapped between parallel plates, forming an annular liquid bridge joining the plates under the capillarity boundary condition of prescribed contact angles $\gamma_1$, $\gamma_2$ on the respective plates. These studies consider the effects of disturbances within the fluid, assuming the plates are rigid and perfectly parallel. CFM show that from the point of view of physical prediction, the results of these studies may be open to some question. Specifically, they show that unless the drop is initially of spherical form, then infinitesimal tilting of one of the plates always results in a discontinuous transition of the drop form. Depending on the particular data, the transition can be to a spherical drop; however, it can also occur that the tilting causes the entire fluid to disappear to infinity in the wedge.

Figure 20 (A) Drop configurations in wedge with opening angle $2\alpha = 50^\circ$, for three data positions on the line $\gamma_1 = \gamma_2 = \gamma$: (a) $\gamma = 70^\circ$ (in $R$, near $D_1^+$); (b) $\gamma = 90^\circ$ (in $R$, near $D_1^-$); (c) $\gamma = 110^\circ$ (in $D_1^-$). The first two cases yield edge blobs, the third a spherical tube that does not contact the edge line. (B) Drop configurations in a wedge of opening angle $2\alpha = 50^\circ$, for three data choices in $R$, on the line $\gamma_1 = \pi - \gamma_2 = \gamma$; (a) $\gamma = 70^\circ$ (near $D_2^+$); (b) $\gamma = 90^\circ$ (center of $R$); (c) $\gamma = 35^\circ$ (near $D_2^-$). As $D_2^-$ is entered, original boundary conditions can no longer be satisfied by spherical drop, but configuration changes smoothly into drop on single plane, with prescribed data for that plane. Reproduced with permission from Concus P, Finn R and McCuan J (2001) Liquid bridges, edge blobs, and Scherk-type capillary surfaces. Indiana University Mathematics Journal 50: 411–441.

CFM proved that if a connected liquid mass with spherical outer surface $S$ cuts off areas $|W_1|$, $|W_2|$ from plates $\Pi_1$, $\Pi_2$ which it meets in angles $\gamma_1$, $\gamma_2$, as in Figure 20, then

$$-\sum_{j=1}^{2} |W_j| \cos\gamma_j + |S| = \frac{3|V|}{R} \qquad [30]$$

where $|S|$ denotes area of the spherical free surface interface, $|V|$ the enclosed volume, and $R$ the radius. An immediate consequence is that the mechanical energy $E$ of the configuration is

$$E = \frac{3\sigma|V|}{R} \qquad [31]$$

where $\sigma$ is surface tension. Using this result, they show that if a spherical liquid mass meets two wedge faces in angles $\gamma_1$, $\gamma_2$ in the absence of gravity, then the configuration has smaller mechanical energy than does any connected liquid mass of the same volume that meets only one of the faces in the contact angle for that face. In turn, the drop on a single face has smaller energy than does a spherical ball of the same volume that meets no face. Note that in all zero-gravity cases for which stability relative to plate tilting can be expected, the liquid mass must be spherical.

Compressibility

Until very recently, all literature on capillarity was based on a hypothesis that the body of the fluid is incompressible. Indeed, from the point of view of macroscopic mechanical measurements, most liquids are nearly incompressible. But all liquids are also to some extent compressible, and this property was even conceptually essential in our characterization in the section ‘‘Gauss’ contribution: the energy method’’ of the surface energy, even for the nominally incompressible case. It is as yet unclear to what extent the compressibility properties of the bulk liquid will influence the physical predictions of the theory. In this connection, see the remarks at the end of the section ‘‘Uniqueness and nonuniqueness.’’

The Equations I

Finn derived two possible equations extending [16] and [17], arising from different modelings. Both characterize equilibrium points as stationary points for the mechanical energy, and both are based on a hypothesized pressure–density relation $\rho = \rho_0 + \chi(p - p_0)$. The first equation takes account of the change in density with height, arising from

the gravity field. For a container consisting of a semi-infinite vertical cylinder, closed at the bottom, one obtains

$$\operatorname{div} Tu = \frac{\rho_0 g}{\sigma}\,u + \chi g\,(1 - \cos\omega) + \lambda \qquad [32]$$

where $\omega$ is the angle between the upward directed surface normal and the vertical axis, and $\lambda$ is to be determined by a volume constraint. Athanassenas and Finn proved that for a general smooth domain $\Omega$, prescribed $\gamma$, and prescribed fluid mass $M$ subject to the restriction

$$M < \rho_0 |\Omega| / (\chi g) \qquad [33]$$

there exists exactly one solution of [32] achieving the boundary data $\gamma$.

The condition [33] is necessary for existence with the prescribed mass.

The methods used for this theorem do not permit regularity conditions to be relaxed to allow domains with corner points. An approximation procedure yields an existence theorem for such cases; however, the uniqueness proof then fails. It can be replaced by a weaker result, estimating the difference between two eventual solutions: Let $u$, $v$ be solutions of [32] in a piecewise smooth domain $\Omega$, and suppose $\nu \cdot Tu \le \nu \cdot Tv$ on $\Sigma = \partial\Omega$ except at the corner points, where no data are prescribed. Then

$$u \le v + \chi\sigma/\rho_0 \qquad [34]$$

throughout $\Omega$.

Note that in this result, no growth condition is imposed at the corner points. It can happen that both $u$ and $v$ are unbounded at a corner point; nevertheless, [34] holds uniformly over $\Omega$.

The solutions of [32] emulate many of the characteristics of solutions of [16]. Notably, there is again a dichotomy of behavior, depending on opening angle $2\alpha$ at a corner point, with all solutions either bounded, or unbounded with growth like $1/r$.

The Equations II

If in addition to taking account of the change of density with height, one accounts for the energy change due to expansion or contraction of volume elements with changing density, one is led to the equation

$$\operatorname{div} Tu = \frac{\rho_0 - \chi p_0}{\chi\sigma}\left(e^{\chi g u} - 1\right) + \chi g\,(1 - \cos\omega) + \lambda \qquad [35]$$

Here the changes from the incompressible case are much more significant than for [32]. In order to ensure stable behavior of solutions, it seems appropriate to impose the condition $\rho_0 > \chi p_0$. The general existence theorem above can no longer be expected; it is possible to give explicit examples of analytic domains, and constant data $\gamma$, for which no solution of the problem exists. Thus, even in a large downward gravity field, the solutions can emulate the behavior of solutions of [18]. That can happen, however, only for data $\gamma$ exceeding $\pi/2$. The condition [33] is again necessary for existence.

For eqn [35], $\lambda$ cannot be eliminated by addition of a constant to the solution, and its determination creates a new level of difficulty toward solution of the physical existence question. Athanassenas and Finn proved unique existence of solutions of [35], [17] for a capillary tube of general smooth section $\Omega$ dipped into an infinite liquid bath (which corresponds to $\lambda = 0$), when $0 \le \gamma \le \pi/2$. If $\gamma > \pi/2$ then solutions do not always exist; it can happen that the surface moves down to the bottom of the tube, regardless of the depth of immersion. Under a hypothesis of radial symmetry, Finn and Luli were able to prove the existence of solutions with prescribed mass in a semi-infinite cylinder closed at the bottom, in the range $0 \le \gamma < \pi$, and uniqueness if $0 \le \gamma \le \pi/2$. Note that in this case, values $\gamma > \pi/2$ are not excluded. For large enough mass, the surface will always cover the base of the tube.

Closing Remarks

This brief survey is intended only as a general indication of the current state of the theory; much material of interest could not be included. Nor have we addressed hysteresis effects on contact angle. Detailed references to the material discussed and also to further information can be found in the articles listed below. More recent publications can be located by following links in MathSciNet or Zentralblatt.

Acknowledgment

I owe a special debt of thanks to my colleague Paul Concus, who read the material in detail and provided many effectual suggestions, leading to a much-improved exposition.

See also: Compressible Flows: Mathematical Theory; Interfaces and Multicomponent Fluids; Newtonian Fluids and Thermohydraulics.

Further Reading

References for text material and for further reading are cited in the expository articles:

Finn R (2002a) Milan Journal of Mathematics 70: 1–23.
Finn R (2002b) Mathematical Intelligencer 24: 21–33.
446 Cauchy Problem for Burgers-Type Equations

Cartan Model see Equivariant Cohomology and the Cartan Model

Cauchy Problem for Burgers-Type Equations

G M Henkin, Université P.-M. Curie, Paris VI, Paris, France

© 2006 Elsevier Ltd. All rights reserved.

Burgers Type Equations

We consider here two types of equations: the scalar partial differential equations (PDEs) of the form

$$\frac{\partial f}{\partial t} + \varphi(f)\,\frac{\partial f}{\partial x} = \varepsilon\,\frac{\partial^2 f}{\partial x^2}, \qquad \varepsilon > 0 \qquad [1]$$

$f = f(x, t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$, and the scalar difference–differential equations of the form

$$\frac{\partial F}{\partial t} + \varphi(F)\,\frac{F(x, t) - F(x - \varepsilon, t)}{\varepsilon} = 0, \qquad \varepsilon > 0 \qquad [2]$$

$F = F(x, t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$.

Equation [1] for the case of linear $f \mapsto \varphi(f)$ was called the Burgers equation by Hopf (1950), who justified this by the remark: ‘‘equation

$$\frac{\partial f}{\partial t} + f\,\frac{\partial f}{\partial x} = \varepsilon\,\frac{\partial^2 f}{\partial x^2}$$

was first introduced by J. M. Burgers (1940) as a simplest model to the differential equations of fluid flow’’. In fact, eqn [1] for linear $\varphi(f)$ was introduced earlier in 1915 by Bateman. Equation [1] for general $\varphi(f)$ appeared later in very different models, for example, in the model for displacement of oil by water, in a model of road traffic, etc.

For $\varphi(f) = a + b \cdot f$, Hopf and Cole studied [1] on the basis of the substitution

$$f = -\frac{1}{b}\left(a + 2\varepsilon\,\frac{\partial g}{\partial x}\Big/ g\right)$$

reducing [1] to the heat equation

$$\frac{\partial g}{\partial t} = \varepsilon\,\frac{\partial^2 g}{\partial x^2}$$

This transformation (often called the Hopf–Cole transform) appeared for the first time in 1906 in the book of Forsyth ‘‘Theory of differential equations.’’

Equation [2] first appeared for $\varphi(F) = a + b \cdot F$, $\varepsilon = 1$, $x = n \in \mathbb{Z}$, in Levi, Ragnisco, Bruschi (1983) as a semidiscrete equation reducible to the linear equation

$$\frac{dG_n(t)}{dt} = a\left(G_{n-1}(t) - G_n(t)\right)$$

by the substitution

$$F(n, t) = -\frac{a}{b}\,\frac{G_n(t) - G_{n-1}(t)}{G_n(t)}$$

Equation [2] for general $\varphi(F)$ was introduced by Henkin, Polterovich (1991) for the description of a Schumpeterian evolution of industry. For any $\varepsilon > 0$, one can consider [2] as the family of difference–differential equations, depending on the parameter $\theta = \{x/\varepsilon\} \in [0, 1)$, where $\{x/\varepsilon\}$ denotes the fractional part of $x/\varepsilon$. For physical applications of [1] (see Gelfand (1959), Landau and Lifschitz (1968), Lax (1973)), the inviscid case ($\varepsilon = +0$) is the most interesting. But, for some special physical models and for some social and biological applications (see Henkin, Polterovich (1991), Serre (1999)), the interesting case concerns eqn [2] with $\varepsilon = 1$ and $x \in \mathbb{Z}$.

The results considered in this article concern mainly the Cauchy problem for eqns [1] and [2] with initial data $f(x, 0)$, $F(x, 0)$ satisfying the conditions

$$f(x, 0) \to \alpha^\pm, \quad x \to \pm\infty; \qquad \int_{-\infty}^{0} |f(x, 0) - \alpha^-|\,dx + \int_{0}^{\infty} |\alpha^+ - f(x, 0)|\,dx < \infty \qquad [3]$$

and correspondingly

$$F(k\varepsilon + \theta\varepsilon, 0) \to \alpha^\pm, \quad k \to \pm\infty; \qquad \sum_{k=-\infty}^{0} |F(k\varepsilon + \theta\varepsilon, 0) - \alpha^-| + \sum_{k=0}^{\infty} |\alpha^+ - F(k\varepsilon + \theta\varepsilon, 0)| < \infty \qquad [4]$$

where $\alpha^- \le \alpha^+$, $\theta \in [0, 1)$ and the mapping $\theta \mapsto \{F(k\varepsilon + \theta\varepsilon, 0) - \alpha^{\operatorname{sgn} k},\ k \in \mathbb{Z}\} \in l^1$ is smooth.

The standard classical questions concerning Cauchy problems [1], [3] and [2], [4], namely those relating to existence, uniqueness, regularity, and conservation laws, are well established (see Oleinik (1959) and Serre (1999)). This section formulates only those which are essential for the study of the asymptotic behavior of solutions $f(x,t)$ and $F(x,t)$ when $t \to \infty$ or $\varepsilon \to 0$, and of the relation between vanishing viscosity and difference scheme approximations for inviscid Burgers type equations.

One can see that the asymptotic behavior of solutions of [2], [4] when $\varepsilon \to +0$ is not the same as the asymptotic behavior of [1], [3] when $\varepsilon \to +0$, in spite of the fact that in the limiting case $\varepsilon = +0$ both [1] and [2] look identical. It can be explained by the fact that eqn [2] can be interpreted as a semidiscrete approximation of the nonconservative (nonphysical) equation

\[ \frac{\partial F}{\partial t} + \varphi(F)\frac{\partial F}{\partial x} = \frac{\varepsilon}{2}\,\varphi(F)\frac{\partial^2 F}{\partial x^2} \]

However, the problem [2], [4] can be naturally transformed into a conservative (physical) initial problem. Indeed, the substitution

\[ f = \int_0^F \frac{dy}{\varphi(y)} \]

(under condition of integrability of $1/\varphi(y)$) transforms [2] into the equation

\[ \frac{\partial f(x,t)}{\partial t} + \frac{\psi(f(x,t)) - \psi(f(x-\varepsilon,t))}{\varepsilon} = 0 \qquad [5] \]

where $\psi'(f) = \varphi(F)$. Equation [5] is the so-called monotone one-sided semidiscrete approximation of the conservative viscous equation,

\[ \frac{\partial f}{\partial t} + \varphi(F)\frac{\partial f}{\partial x} = \frac{\varepsilon}{2}\,\frac{\partial}{\partial x}\left(\varphi(F)\frac{\partial f}{\partial x}\right) \qquad [6] \]

where

\[ f(x,0) \to \int_0^{\alpha^\pm} \frac{dy}{\varphi(y)}, \quad x \to \pm\infty \]

The results on finite-difference approximations for nonlinear conservation laws (see A. Harten, J. Hyman, P. Lax (1976)) explain both the similarity of behavior of [6] and [5] as well as some difference in the behavior of [1] and [2].

For further exposition the following assumption is useful:

Assumption 1 Let $\varphi$ in [1], [2] be a positive and continuously differentiable function on the interval $[\alpha^-, \alpha^+]$. Let $\varphi'$ have only isolated zeros.

From references one can deduce the following general properties of Cauchy problems [1], [3] and [2], [4].

Theorem 0 Under Assumption 1, we have:

(i) There exists a unique (weak) solution $f(x,t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$, of the problem [1], [3]; this solution is necessarily smooth for $t > 0$; besides, it satisfies the following conservation laws for $t > 0$:

\[ f(x,t) \to \alpha^-, \quad x \to -\infty \]
\[ f(x,t) \to \alpha^+, \quad x \to +\infty \]
\[ \frac{d}{dt}\left[\int_0^{\infty} (\alpha^+ - f(x,t))\,dx - \int_{-\infty}^{0} (f(x,t) - \alpha^-)\,dx\right] = \int_{\alpha^-}^{\alpha^+} \varphi(y)\,dy \]

Moreover, if the initial value $f(x,0)$ is nondecreasing as a function of $x$, then the solution $f(x,t)$ is nondecreasing as a function of $x$ for all $t \ge 0$.

(ii) There exists a unique solution $F(x,t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$, of the problem [2], [4]; this solution is smooth for $t > 0$; besides, it satisfies the following conservation laws for $t > 0$ and $\theta \in [0,1)$:

\[ F(k\varepsilon + \theta\varepsilon, t) \to \alpha^-, \quad k \to -\infty \]
\[ F(k\varepsilon + \theta\varepsilon, t) \to \alpha^+, \quad k \to +\infty \]
\[ \frac{d}{dt}\left[\sum_{k=1}^{\infty} \int_{F(k\varepsilon+\theta\varepsilon,t)}^{\alpha^+} \frac{dy}{\varphi(y)} - \sum_{k=-\infty}^{0} \int_{\alpha^-}^{F(k\varepsilon+\theta\varepsilon,t)} \frac{dy}{\varphi(y)}\right] = \alpha^+ - \alpha^- \]

Moreover, if for some $\theta \in [0,1)$ the initial value $F(k\varepsilon + \theta\varepsilon, 0)$ is nondecreasing as a function of $k \in \mathbb{Z}$, then the solution $F(k\varepsilon + \theta\varepsilon, t)$ is also nondecreasing as a function of $k \in \mathbb{Z}$ for all $t \ge 0$ and the same $\theta$.

Gelfand's Problem and Iljin–Oleinik Theorem

The main results considered in this article are related to the following problem, formulated explicitly by Gelfand (1959): to find the asymptotic ($t \to \infty$) of the solution $f(x,t)$ of the eqn [1] with the initial condition

\[ f(x,0) = \begin{cases} \alpha^\pm, & \text{if } \pm x > \pm x^\pm \\ f^0(x), & \text{if } x \in [x^-, x^+] \end{cases} \qquad [7] \]

where $\alpha^- \le \alpha^+$.

Gelfand found a solution to this problem for the inviscid case $\varepsilon = +0$ with initial conditions $f(x,0) = \alpha^-$ if $x < 0$, and $f(x,0) = \alpha^+$ if $x \ge 0$ (see below), and remarked that it would be interesting to prove that the main term of the asymptotic ($t \to \infty$) of $f(x,t)$ satisfying [1], [7] coincides with the solution of [1], [7] for $\varepsilon = +0$.

Gelfand’s problem admits natural extension for N-wave has been obtained by Dafermos (1977)
eqn [2] with the initial conditions and Liu (1978).
For the case of a general ’(f ), in particular, for
Fðx; 0Þ ¼  ; if x > x the case of nonincreasing ’(f ), we need the notion
½8
Fðx; 0Þ ¼ F0 ðxÞ; if x 2 ½x ; xþ  of shock profile. Following Serre (1999), three
definitions can be introduced.
Let us Rintroduce, for u 2 [ , þ ], the function
u Definition The initial problem [1], [3] (correspond-
(u) =   ’(y)dy. Let the function ˆ (u), u 2
[ , þ ], be upper bound of the convex set ingly, [2], [4]) admits ( , þ )-shock profile ( < þ )
if there exists a traveling-wave solution of this equation,
fðu; vÞ: v  ðuÞ; u 2 ½ ; þ g that is, of the form f = f̃(x  ct) (correspondingly,
F = F̃(x  Ct)), such that f̃(x) !  when x ! 1
By Assumption 1, the set s = {u 2 [ , þ ]:
(correspondingly, F̃(x) !  when x ! 1).
(u) < ˆ (u)} is the finite union of intervals,
s= ( , 0 ) [ (1 , 1 ) [   (L , þ ), where  = 0  From the results of Gelfand (1959) and Oleinik
0  1 < 1    L  L = þ . (1959), it follows that initial problem [1], [3] admits
Let us define the function f̂(x, t) by ( , þ )-shock profile iff
8  Z þ
< ; if x < ’ð Þ  t 1
^f ðx; tÞ ¼ ð ^0 Þð1Þ ðx=tÞ; if ’ð Þ  t  x  ’ðþ Þ  t c¼ þ ’ðyÞdy
: þ    
 ; if x > ’ðþ Þ  t Z u
1
0 < ’ðyÞdy; 8u 2 ð ; þ Þ ½9
where in the case ˆ (u)  l , u 2 (l , l ), l = 0, u    
0
1, . . . , L; also, by definition, ( ˆ )( 1) (l ) = [l , l ]. From the results of Henkin and Polterovich
Theorem 1 (Gelfand) The solution f (x, t) of the (1991) and Belenky (1990), it follows that initial
problem [1], [7] for the case " = þ0 and initial problem [2], [4] admits ( , þ )-shock profile iff
conditions f (x, 0) =  , if x > 0, has the explicit Z þ
form: f (x, t) = f̂(x, t). 1 1 dy
¼ þ 
C     ’ðyÞ
The analogous statement is valid also for the Z u
1 dy
problem [2], [8] if, in the construction above, one > 
; 8u 2 ð ; þ Þ ½10
u  u  ’ðyÞ
takes
Z u In the case " = þ0, the equality in [9] and [10] is
dy
ðuÞ ¼ called the Rankine–Hugoniot condition, the inequal-
0 ’ðyÞ ity in [9] and [10] is called the entropy condition (or
instead of (u), u 2 [ , þ ]. the Gelfand–Oleinik condition).
The Gelfand problem for [1], [3] and [1], [7] with Definition For initial problem [1], [3] (correspond-
monotonic ’(f ) was solved by Iljin and Oleinik ingly, [2], [4]) admitting ( , þ )-shock profile and
(1960). In the case  = þ , the solution of this for " = þ0, we will call by shock waves the weak
problem follows from an earlier work of Lax (1957). solutions of [1], [3] (correspondingly, [2], [5], [4]) of
For the case of linear ’(f ), the solution of this problem the form
follows from an earlier work of Hopf (1950).

For semidiscrete initial problems [2], [4] and [2], ~f  ðx  ctÞ ¼  ; if x  ct
[8], the analog of the asymptotic results of Hopf and

Iljin–Oleinik have been obtained and applied by ~ ðx  CtÞ ¼  ;
F if x  Ct
Henkin and Polterovich (1991).
The case of increasing ’(f ) has been studied in where c, C satisfy Rankine–Hugoniot and entropy
detail. In this case, for both initial problems [1], [3] conditions [9], [10].
and [2], [4], there is uniform convergence of solutions Definition The ( , þ )-shock profile for [1] (cor-
f (x, t) and F(x, t) to the so-called rarefaction profile respondingly, for [2]) is called strict if in addition to
  [9], [10] we have the Lax (1954) condition:
 ; x > ’ð Þt
gðx=tÞ ¼ ð1Þ
’ ðx=tÞ; x 2 ½’ð Þ  t; ’ðþ Þ  t ’ðþ Þ < c < ’ð Þ ½11
t ! 1 (see Iljin and Oleinik (1960) and Henkin and correspondingly
and Polterovich (1991)). More precise result in
this case about convergence to the so-called ’ðþ Þ < C < ’ð Þ ½12

The ( , þ )-shock profile for [1] or [2] is called The values of d0 and D0 are determined by
semicharacteristic if one of the inequalities in [11] or Z d0 Z 1
[12] is strict and the other is an equality. This profile ðf ðx; 0Þ   Þ dx þ ðf ðx; 0Þ  þ Þ dx ¼ 0
is called characteristic if both inequalities in [11] or 1 d0
Z D0 Z
[12] are equalities. 1
ðFðx; 0Þ   Þ dx þ ðFðx; 0Þ  þ Þ dx ¼ 0
One can check (Iljin and Oleinik 1960, Henkin and 1 D0
Polterovich 1991) that if in addition to Assumption 1
the function ’ on [ , þ ] is nonconstant and
nonincreasing then eqn [1] (correspondingly, [2])
admits a strict ( , þ )-shock profile. Remarks
The main result of Iljin–Oleinik (1960) for eqn [1] (i) The statements of Theorem 2 give a positive
and analogous statement of Henkin and Polterovich answer to Gelfand’s question for the case of
(1991) for eqn [2] can be presented as follows. initial problem [1], [3] and [2], [4], admitting
Theorem 2 strict shock profiles.
(ii) For linear ’(f ) = a þ bf , a > 0, a þ bþ > 0,
(i) Let the initial problem [1], [3] admit a strict b < 0, the traveling waves f̃, F̃ for [1], [3] and
( , þ )-shock profile f̃. Let f (x, t), x 2 R, t 2 [2], [4] can be found explicitly:
Rþ , be a solution of [1], [3]. Then there exists
d0 2 R þ  
~f ¼  þ
sup jf ðx; tÞ  ~f ðx  ct  d0 Þj ! 0; t ! 1 ½13 1 þ expfpðx  ctÞg
x2R b ð  þ Þb
c ¼ a þ ðþ þ  Þ; p ¼
The value of d0 is determined uniquely by relation 2 2"
þ 
Z 1 ~¼ þ   
F
ff ðx; 0Þ  ~f ðx  d0 Þg dx ¼ 0 1 þ expfPðx  CtÞg
1 
a þ bþ  a þ b
(ii) Let the initial problem [2], [4] admit a strict C ¼ b ln 
; P ¼ ln
a þ b " a þ bþ
( , þ )-shock profile F̃. Let F(x, t), x 2 R, t 2
Rþ be a solution of [2], [4]. Then there exists where
continuous function D0 (),  2 [0, 1), such that    
 a þ bþ
~  Ct  D0 ðfx="gÞj ! 0;
sup jFðx; tÞ  Fðx b ¼ ð þ b Þ 1 
a þ b
x2R ½14
t!1 (iii) For initial problems [1], [7] and [2], [8], þ >
The function D0 (),  2 [0, 1], is determined  , the asymptotic convergence statements
uniquely from relation [13]–[15] admit the precise asymptotic esti-
mates (see Iljin and Oleinik (1960) for [1], [7]:
X
1
~  D0 ÞÞg ¼ 0
fðFðn; 0ÞÞ  ðFðn
k¼1 sup jf ðx; tÞ  ~f ðx  ct  d0 Þj ¼ Oðet Þ
x2R ½16
where  > 0; " > 0
Z A
dy ~<A
ðFÞ ¼ ; F < A; F ~  Ct  D0 ðfx="gÞÞj ¼ Oðet Þ
sup jFðx; tÞ  Fðx
F ’ðyÞ
x2R
(iii) If in conditions (i) and (ii), we take " = þ0 then  > 0; " > 0 ½17
there exist d0 , D0 such that 8 > 0, we have

sup jþ  f ðx; tÞj f ðx; tÞ ¼  for  x > ðct þ d0 Þ


xctþd0 þ t  t0 ; " ¼ þ0
½18
þ sup j  f ðx; tÞj ! 0; t!1 Fðx; tÞ ¼  for  x > ðCt þ D0 Þ
xctþd0 
þ
½15 t  t0 ; " ¼ þ0
sup j  Fðx; tÞj
xCtþD0 þ

þ sup j  Fðx; tÞj ! 0; t!1 Theorem 2(i) is proved basing on the following
xCtþD0  idea. Let f satisfy the initial problem [1], [3] and let

f̃(x  ct þ d0 ) be ( , þ )-shock profile for [1], and, correspondingly,


satisfying condition [13]. Put X
1
Z x ~
fðFðk" þ "  D0 Þ  ðFðk" þ "; 0ÞÞg ¼ 1
ðx; tÞ ¼ ff ðy; tÞ  ~f ðy  ct  d0 Þgdy 1
1
So, the crucial argument, related to conservation
The function (x, t) satisfies the nonlinear parabolic law, does not hold.
equation
One can extend the important Theorems 2(i), 2(ii)
@ @ @  2 for the case of nonstrict shock profiles in two different
þ ’ð~f þ ð1  Þf Þ ¼" 2 ways: by changing conditions of these theorems or by
@t @x @x
changing conclusions of these theorems.
where (x, t) is some smooth function of (x, t) with The first method (started by Mei, Matsumura, and
values in [0, 1]. Nishihara in 1994) was completed by the following
Besides, by conservation law of Theorem 0(i), we L1 -asymptotic stability result (Serre 2004).
have (x, t) ! 0, x ! 1, 8t  0.
Estimates basing on maximum principle and Theorem 3 (Freistühler–Serre). Let eqns [1], [2]
appropriate comparison statements give that admit ( , þ )-shock profiles and f̃, F̃ – the corre-
(x, t) ) 0, x 2 R, t ! 1. It implies that sponding train-wave solutions of [1], [2]. Let
f (x, t), F(n, t), x 2 R, n 2 Z, t 2 Rþ be solutions of
f ðx; tÞ  ~f ðx  ct  d0 Þ ) 0; x 2 R; t ! 1 eqns [1], [2] with such initial conditions that
Z 1
Theorem 2(ii) is proved in a similar way. Let F(n, jf ðx; 0Þ  ~f ðxÞjdx < 1
t) satisfy the initial problem [2], [4] with x = n 2 1
Z, " = 1,  = {x} = 0, and let F̃(n  Ct  D0 ) be X
1
~
jFðn; 0Þ  FðnÞj <1
( , þ )-shock profile for [2], satisfying condition 1
[14]. Put
Then
X
n Z 1
ðn; tÞ ¼ ~  Ct  D0 ÞÞg
fðFðn; tÞÞ  ðFðn
1
jf ðx; tÞ  ~f ðx  ct  d0 Þjdx ! 0
1

Then function (n, t) satisfies the semidiscrete and, correspondingly,


parabolic equation
X
1
~  Ct  D0 Þj ! 0;
jFðn; tÞ  Fðn t!1
dðn; tÞ
¼ ’ðð1Þ ððFÞ 1
dt
~ where constants d0 and D0 are calculated from the
þ ð1  ÞðFÞÞÞððn  1; tÞ  ðn; tÞÞ
same relations as in Theorem 2.
where (n, t) is some function with values in [0, 1]. Remark For the inviscid case " = þ0, the state-
Besides, by conservation law of Theorem 0(ii), we ment of Theorem 3 is still valid for equations
have admitting strict shock profiles, but generally is not
ðn; tÞ ! 0; n ! 1; 8 t  0 valid for equations admitting only nonstrict shock
profiles (see Serre (2004)).
Estimates, basing on generalized maximum prin-
The second method permits, keeping initial con-
ciple and comparison statements, give that
ditions [3], [4], to localize the positions of viscous
(n, t) ) 0, n 2 Z, t ! 1. It implies that
shock waves for generalized Burgers equations
~  Ct þ D0 Þ ) 0;
Fðn; tÞ  Fðn n 2 Z; t ! 1 (see the next section).

Remark For the cases of nonstrict shock profiles Asymptotic Behavior of Solutions of
(characteristic or semicharacteristic) the statements Generalized Burgers Equations
of Theorem 2 are not valid. The reason is that,
under initial conditions [3], [4] for any d0 and D0 , The main current interest and the main difficulty in
we have the study of Gelfand’s problem for generalized
Z 1 Burgers equations consist in the following question
ff ðx; oÞ  ~f ðx  d0 Þgdx ¼ 1 formulated explicitly for initial problem [1], [3] by
1 Liu et al. (1998): ‘‘In the Cauchy problem there is

the question of determining the location of viscous show, on the contrary, that characteristic shock
shock waves’’. A similar question and related profiles and, as a consequence, the behavior of
conjecture were formulated by Henkin and Potter- initial problems [1], [3] and [2], [4] as in Theorem
ovich (1999) for the initial problem [2], [4]. 4 are rather a rule than an exception.
For solving this problem, it is important to solve it (ii) The statement of Theorem 4(i) (and also of
first for the Burgers type equations admitting Theorem 5(i)) below) disprove the Gelfand hope
nonstrict shock profiles. that the main term of asymptotic (t ! 1) of
f (x, t), satisfying [1], [7], coincides with the
Theorem 4 (Henkin–Shananin–Tumanov).
solution of [1], [7] for = þ0 with the same
(i) Let the initial problem [1], [3] admit the nonstrict initial condition. Indeed, in conditions of Theorem
( , þ )-shock profile [9] and f̃(x  ct) be a 4, we have ’( ) = c or ’(þ ) = c, but ’0 ( ) 6¼
corresponding traveling-wave solution. Let ’0 (þ ); then for any > 0 the traveling wave
f̃(x  ct  0 ln t  d0 ) for [1], [3], concentrated
’0 ð Þ 6¼ 0; if ’ð Þ ¼ c
near the point x (t) = ct þ 0 ln t þ d0 , moves
’0 ðþ Þ 6¼ 0; if ’ðþ Þ ¼ c away (t ! 1) from the shockwave for [1], [7] for
= þ0, concentrated near the point x0 (t) = ct þ
Let f (x, t) be a solution of [1], [3]. Then there
o( ln t), where o( ln t)= ln t ! 0, t ! 1.
exist constants 0 and d0 such that
(iii) Theorem 4 (and also Theorem 5 below) also
sup jf ðx; tÞ  ~f ðx  ct  0 ln t  d0 Þj ! 0; t!1 illustrate another interesting phenomenon: for
x2R
the case ’0 ( ) 6¼ ’0 (þ ), one has asymptotic
where convergence of the solution of [1], [3] (corre-
spondingly of [2], [4]) to the traveling
ðþ   Þ  0
8 wave f̃(x  ct  0 ln t  d0 ) (correspondingly
0 þ
< 1=’ ð Þ;
> if ’ð Þ > c ¼ ’ðþ Þ F̃(x  Ct  0 ln t  D0 )), which does not
0 
¼ 1=’ ð Þ; if ’ð Þ ¼ c > ’ðþ Þ satisfy eqn [1] or correspondingly eqn [2]. Such
>
:
1=’0 ð Þ  1=’0 ðþ Þ; if ’ð Þ ¼ c ¼ ’ðþ Þ a phenomenon was first discovered by Liu and
Yu (1997) in the special boundary-value pro-
(ii) Let the initial problem [2], [4] with = 1 admit the
blem for the classical Burgers equations, if
nonstrict ( , þ )-shock profile [10] and F̃(n  Ct)
u(x, t) satisfies the following conditions:
be a corresponding traveling-wave solution. Let
’0 ð Þ 6¼ 0; if ’ð Þ ¼ C if ut þ u  ux ¼ uxx ; uð0; tÞ ¼ 1; uð1; tÞ ¼ 1;
x
’0 ðþ Þ 6¼ 0; if ’ðþ Þ ¼ C uðx; 0Þ ¼ th ; then
2
Let F(n, t) be a solution of [2], [4]. Let 1
juðx; tÞ þ th ðx  lnð1 þ tÞÞj ! 0; t ! 1; x  0
def 2
Fðn; 0Þ¼ Fðn; 0Þ  Fðn  1; 0Þ  0
Theorem 4 is proved in basing on the following
Then there exist constants 0 and D0 such that idea. Let f (x, t) satisfy [1], [3] and F(n, t) satisfy [2],
~  Ct  0 ln t  D0 Þj ! 0;
sup jFðn; tÞ  Fðn [4]. Let f̃(x  ct) be the traveling wave for [1], [3]
n2Z and F̃(n  Ct) be the traveling wave for [2], [4].
t!1 Suppose that ’( ) > c = C = ’(þ ). Let dA (t) and
DA (t), A > 0 be functions such that
where
Z ctþApffit
ðþ   Þ  0 ~
8
C=ð2’0 ðþ ÞÞ; if ’ð Þ > C ¼ ’ðþ Þ pffi ff ðx; tÞ  f ðx  ct  dA ðtÞÞgdx ¼ 0 ½19
>
> ctA t
>
< C=ð2’0 ð ÞÞ; if ’ð Þ ¼ C > ’ðþ Þ
¼ and, correspondingly,
>
> ðC=2Þ½1=’0 ðþ Þ
>
:
þ1=’0 ð Þ; 
if ’ð Þ ¼ C ¼ ’ð Þþ pffi
X t
½CtþA
~  Ct  DA ðtÞÞÞg
fðFðk; tÞÞ  ðFðk
pffi
k¼½CtA t
Remarks pffiffi pffiffi pffiffi
þ ðCt þ A t  ½Ct þ A tÞððFðCt þ A t þ 1; tÞÞ
(i) One could think that nonstrict shock profiles pffiffi
~
 ðFð½Ct þ A t þ 1  Ct þ DA ðtÞÞ ¼ 0
as in Theorem 4 can appear only in exceptional
cases. But Proposition 2 and Theorem 5 below ½20

The relations [9], [20] can be called ‘‘localized initial problems [1], [3] and [2], [4] and some
conservation law.’’ The proof contains two difficult partial results which confirm this conjecture. To
parts. pffiffiffi simplify formulation we admit the following.
The first part consists in
pproving
ffiffiffiffi that for A > 2 c
Assumption 2 Let ˆ (u) and (u)ˆ be upper bounds of
(correspondingly, A > 2 C) the following asymp-
the convex hulls for the graphs of
totics are valid:
Z u
 ln t ðuÞ ¼  ’ðyÞdy
dA ðtÞ ¼ þ d0 þ oð1Þ; t ! 1 
ð
 þ Þ’0 ðþ Þ
C ln t and
DA ðtÞ ¼ þ D0 þ oð1Þ; t ! 1 Z
2ð  þ Þ’0 ðþ Þ
 u
dy
½21 ðuÞ ¼
 ’ðyÞ
where d0 , D0 are independent of A. respectively, with u 2 [ , þ ]. We suppose that
The second part gives the following convergence
statements: s ¼ fu 2 ½ ; þ  : ðuÞ < ^ðuÞg
Z
x ¼ ð ; 0 Þ [ ð1 ; 1 Þ [    ðL ; þ Þ
sup ~
pffi pffi pffi ff ðy; tÞ  f ðy  ct  dA ðtÞÞg
x2½ctA t;ctþA t ctA t
where
dy ! 0; t!1
  ¼  0 < 0 <  1 < 1 <    <  L < L ¼  þ
X n
sup fðFðk; tÞÞ
pffi pffi or, correspondingly,
x2½CtA t;CtþA t k¼½CtApffit
^
~  Ct  DA ðtÞÞÞg ! 0; t ! 1
 ðFðk S ¼ fu 2 ½ ; þ  : ðuÞ < ðuÞg
¼ ð ; b0 Þ [ ða1 ; b1 Þ [    ðaM ; þ Þ
The precise a priori estimates of local solutions of
[1], [2] play an important role in the proof. An where
example of such an estimate, also useful for further
results, is given below.  ¼ a0 < b0 < a1 < b1 <    < aM < bM ¼ þ
Proposition 1 Let, in eqn [2], Cffi = ’(0) > 0, =
pffiffiffiffiffi In addition, we suppose that ’0 (l ) 6¼ 0, ’0 (l ) 6¼
1, 0  ’0 (0) < 0 , x̄ def
= (x  Ct)= Ct . Let the func- 0, l = 0, 1, . . . , L or, correspondingly, ’0 (am ) 6¼
tion F(x, t), defined in the domain 0 = {(x, t): a1 < 0, ’0 (bm ) 6¼ 0, m = 0, 1, . . . , M.
x̄ < a2 }, a2 > 0, satisfy eqn [2],
Proposition 2 (Weinberger 1990, Henkin and
def
Fðx; tÞ¼ Fðx; tÞ  Fðx  1; tÞ  0 Polterovich 1999). Under Assumptions 1, 2, one has:
 (i) If u 2 [ , þ ] n s and, correspondingly, u 2
jFðx; tÞj  pffiffiffiffiffiffi ; ðx; tÞ 2 0 ; t  t0
Ct [ , þ ] n S, then following functions are well
defined:
Then
8
> l ; if x < ’ðl Þ  t
B >
> ð1Þ
Fðx; tÞ  ; ðx; tÞ 2 0 ; t  t0
x < ’ ð x=t Þ; if ’ðl Þ  t  x
Ct gl ¼  ’ðlþ1 Þ  t
where t >
>
>
: lþ1 ; if x > ’ðlþ1 Þ  t;
    l ¼ 0; 1; . . . ; L
1 0 
B ¼ B 0 a2 þ þ ð1 þ lnð1 þ a2 ÞÞ
d C and, correspondingly,
d ¼ minðx  a1 ; a2  x
Þ 8
> b ; if x < ’ðbm Þ  t
> mð1Þ
and B0 is an absolute constant.
x > <’ ðx=tÞ; if ’ðbm Þ  t  x
Gl ¼  ’ðamþ1 Þ  t
It is interesting to compare a priori estimate of t >
>
>
: am ; if x > ’ðamþ1 Þ  t;
Proposition 1 with some similar (but less precise) m ¼ 0; 1; . . . ; M
estimates in the theory of classical quasilinear
parabolic equations (Ladyzhenskaya et al. 1968). (ii) For any interval (l , l ) s and, correspond-
We will formulate now the general conjecture ingly, (am , bm ) S there exist traveling waves
concerning asymptotic behavior of solutions of f̃l (x  cl t) for [1] with overfall (l , l ) and,

correspondingly, F̃m (x  Cm t) for [2] with over- (iii) For any solution F(n", t), n 2 Z, t 2 Rþ , of initial
fall (am , bm ), where problem [2], [4], there exist shift-functions m (t):
Z l
1  þ
m ln t þ Oð1Þ  m ðtÞ  m ln t þ Oð1Þ
cl ¼ ’ðyÞdy
l  l l 0   þ
l ¼ 0; 1; . . . ; L
m  m < 1;
cl ¼ ’ðl Þ; l ¼ 0; . . . ; L  1
such that
cl ¼ ’ðl Þ; l ¼ 1; . . . ; L
~
sup jFðn"; tÞ  Fðn"; t; 0 ; 1 ; . . . ; M Þj ! 0;
and, correspondingly, n2Z
Z bm t!1
1 dy
C1
m ¼
bm  am am ’ðyÞ (iv) Moreover, in (iii) one can take
Cm ¼ ’ðbm Þ; m ¼ 0; . . . ; M  1
 þ
m ¼ m
Cm ¼ ’ðam Þ; m ¼ 1; . . . ; M Cm
¼
Conjecture (Henkin and Polterovich 1994, 1999, ðbm  am Þ
8
Henkin and Shananin 2004). Let > 1
>
> 0 ; if m ¼ 0 < M; ’ða0 Þ 6¼ ’ðb0 Þ
>
> ’ ðb mÞ
>
>
~f ðx; t; 0 ; . . . ; L Þ < 1 1

 ; if 0 < m < M
X
L L1

X X
L1 >
> ’0 ðam Þ ’0 ðbm Þ
x >
>
¼ ~f ðx  c t  " ðtÞÞ þ g  l >
> 1
l l l l >
: 0 ; if m ¼ M > 0; ’ðaM Þ 6¼ ’ðbM Þ
l¼0 l¼0
t l¼0 ’ ðam Þ
X
L
The main result confirming formulated conjec-
 l ; L1
l¼1
tures is the following.
~
Fðn"; t; 0 ; . . . ; M Þ Theorem 5 (Henkin and Shananin). Conjecture
X
M X
M 1
n" (i) for L = 1 and corresponding conjecture (iii) for
¼ ~ m ðn"  Cm t  "m ðtÞÞ þ
F Gm M = 1 are true, that is,for solution of initial problem
m¼0 m¼0
t [1], [3] there exist shift functions l (t) = O (ln t) such
X
M 1 X
M that for t ! 1 we have
 bm  am ; M1 8
m¼0 m¼1 < ~f 0 ðx  c0 t  "0 ðtÞÞ; if x  c0 t
f ðx; tÞ7! ’ð1Þ ðx=tÞ; if c0 t  x  c1 t
Then under Assumptions 1, 2, the following state- :~
ments are valid: f 1 ðx  c1 t  "1 ðtÞÞ; if x  c1 t

(i) For any solution f (x, t), x 2 R, t 2 Rþ , of ini- and for solution of initial problem [2], [4] there exist
tial problem [1], [3], there exist shift-functions l (t): shift functions m (t) = O(ln t) such that for t ! 1
we have
l ln t þ Oð1Þ  l ðtÞ  lþ ln t þ Oð1Þ 8
0  l  lþ < 1; l ¼ 0; 1; . . . ; L > ~ ðn"  C0 t  "0 ðtÞÞ; if n"  C0 t
F
>
< 0ð1Þ
’ ðn"=tÞ; if C0 t  n"
such that Fðn"; tÞ7!
>
>  C1 t
:~
F1 ðn"  C1 t  "1 ðtÞÞ; if n"  C1 t
sup jf ðx; tÞ  ~f ðx; t; 0 ; 1 ; . . . ; L Þj ! 0;
x2R
The proof of Theorem 5 is of the same nature as
t!1 the proof of Theorem 4.
(ii) Moreover, in (i) one can take Remarks
l ¼ lþ (i) The proof of stronger Conjectures (ii) and (iv)
"
¼ for L = 1 or M = 1 are in preparation.
ð8
l  l Þ
> 1 (ii) The numerical results, Rykova and Spivak (pre-
>
> ; if l ¼ 0 < L; ’ð0 Þ 6¼ ’ð0 Þ
>
> 0
’ ð Þ print, 2004), confirm conjecture (iii) for M = 2.
>
< 1 l 1 (iii) The results of Weinberger (1990) and Henkin

 0 ; if 0 < l < L
>
> ’0 ð Þ
l ’ ð lÞ and Polterovich (1999) confirm convergence
>
> 1
>
> statements of Conjectures (i), (iii) for all L and
: 0 ; if l ¼ L > 0; ’ðL Þ 6¼ ’ðL Þ
’ ðl Þ M, but only on the intervals of rarefaction

profiles: $x \in [\varphi(\beta_l)t, \varphi(\alpha_{l+1})t]$ or, correspondingly, $x \in [\varphi(b_m)t, \varphi(a_{m+1})t]$, $t > 0$.

The problem of finding asymptotics ($t \to \infty$) of solutions of (viscous) conservation laws has been posed originally not only for generalized Burgers equations but also for systems of conservation laws in one spatial variable (see Gelfand (1959)). In this direction many important results on existence and asymptotic stability of viscous shock profiles (continuous and discrete) have been obtained and applied (see Benzoni-Gavage (2004), Lax (1973), Serre (1999), Zumbrun and Howard (1998) and references therein). Results of the type of Theorems 4, 5 have not yet been obtained for systems of conservation laws.

It is also very interesting to study the asymptotic behavior of scalar (viscous) conservation laws in several spatial variables (continuous or discrete), based on the asymptotic properties of Burgers type equations. In this direction there have been several important results and problems (see Bauman and Phillips (1986), Henkin and Polterovich (1991), Hoff and Zumbrun (2000), Serre (1999), Weinberger (1990), and references therein).

Further Reading

Bauman P and Phillips D (1986b) Large-time behavior of solutions to a scalar conservation law in several space dimensions. Transactions of the American Mathematical Society 298: 401–419.
Belenky V (1990) Diagram of growth of a monotonic function and a problem of their reconstruction by the diagram. Preprint, CEMI Academy of Science, Moscow, 1–44 (in Russian).
Benzoni-Gavage S (2002a) Stability of semi-discrete shock profiles by means of an Evans function in infinite dimension. J. Dyn. Diff. Equations 14: 613–674.
Burgers JM (1940) Application of a model system to illustrate some points of the statistical theory of free turbulence. Proc. Acad. Sci. Amsterdam 43: 2–12.
Dafermos CM (1977) Characteristics in hyperbolic conservation laws. A study of structure and the asymptotic behavior of solutions. In: Knops RJ (ed.) Nonlinear Analysis and Mechanics: Heriot–Watt Symposium, vol. 17, pp. 1–58. Research Notes in Mathematics, London: Pitman.
Gelfand IM (1959) Some problems in the theory of quasilinear equations. Usp. Mat. Nauk 14: 87–158 (in Russian). ((1963) American Mathematical Society Translations 33).
Harten A, Hyman JM, and Lax PD (1976) On finite-difference approximations and entropy conditions for shocks. Communications on Pure and Applied Mathematics 29: 297–322.
Henkin GM and Polterovich VM (1991) Schumpeterian dynamics as a nonlinear wave theory. Journal of Mathematical Economics 20: 551–590.
Henkin GM and Polterovich VM (1999) A difference-differential analogue of the Burgers equation and some models of economic development. Discrete and Continuous Dynamical Systems 5: 697–728.
Henkin GM and Shananin AA (2004) Asymptotic behavior of solutions of the Cauchy problem for Burgers type equations. Journal de Mathématiques Pures et Appliquées 83: 1457–1500.
Henkin GM, Shananin AA, and Tumanov AE (2005) Estimates for solutions of Burgers type equations and some applications. Journal de Mathématiques Pures et Appliquées 84: 717–752.
Hoff D and Zumbrun K (2000) Asymptotic behavior of multidimensional viscous shock fronts. Indiana University Mathematics Journal 49: 427–474.
Hopf E (1950) The partial differential equation $u_t + uu_x = \mu u_{xx}$. Communications on Pure and Applied Mathematics 3: 201–230.
Iljin AM and Oleinik OA (1960) Asymptotic behavior of the solutions of the Cauchy problem for some quasilinear equations for large values of time. Mat. Sbornik 51: 191–216 (in Russian).
Ladyzhenskaya OA, Solonnikov VA, and Ural'ceva NN (1968) Linear and Quasilinear Equations of Parabolic Type. Amer. Math. Soc. Transl. Monogr. vol. 23. Providence, RI.
Landau LD and Lifschitz EM (1968) Fluid Mechanics. Elmsford, NY: Pergamon.
Lax PD (1954) Weak solutions of nonlinear hyperbolic equation and their numerical computation. Communications on Pure and Applied Mathematics 7: 159–193.
Lax PD (1957) Hyperbolic systems of conservation laws, II. Communications on Pure and Applied Mathematics 10: 537–566.
Lax PD (1973) Hyperbolic systems of conservation laws and the mathematical theory of shock waves. Conference Board of the Mathematical Science, Monograph 11. SIAM.
Levi D, Ragnisco O, and Bruschi M (1983) Continuous and discrete matrix Burgers Hierarchies. Nuovo Cimento 74: 33–51.
Liu T-P (1978) Invariants and asymptotic behavior of solutions of a conservation law. Proceedings of the American Mathematical Society 71: 227–231.
Liu T-P, Matsumura A, and Nishihara K (1998) Behaviors of solutions for the Burgers equation with boundary corresponding to rarefaction waves. SIAM Journal on Mathematical Analysis 29: 293–308.
Liu T-P and Yu S-H (1997) Propagation of stationary viscous Burgers shock under the effect of boundary. Archive for Rational Mechanics and Analysis 139: 57–92.
Oleinik OA (1959) Uniqueness and stability of the generalized solution of the Cauchy problem for a quasi-linear equation. Usp. Mat. Nauk 14: 165–170. ((1963) American Mathematical Society Translations 33).
Serre D (1999) Systems of Conservation Laws, I. Cambridge: Cambridge University Press.
Serre D (2004) $L^1$-stability of nonlinear waves in scalar conservation laws. In: Dafermos C and Feireisl E (eds.) Handbook of Differential Equations, pp. 473–553. Elsevier.
Weinberger HF (1990) Long-time behavior for a regularized scalar conservation law in the absence of genuine nonlinearity. Annales de l'Institut Henri Poincaré (C) Analyse Non Linéaire.
Zumbrun K and Howard D (1998) Pointwise semigroup methods and stability of viscous shock waves. Indiana University Mathematics Journal 47: 63–185.
Cellular Automata 455

Cellular Automata
M Bruschi, Università di Roma "La Sapienza", Rome, Italy
F Musso, Università "Roma Tre", Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

What is a Cellular Automaton?

Cellular automata (CAs) were first introduced by J von Neumann in his investigation of "complexity," following an inspired suggestion by S Ulam. In the last 50 years they have been investigated and used in a number of fields, and researchers have used such widely different terminologies that it is now difficult even to give a precise general definition of a CA. Thus, some definitions and approximations are in order.

First a broad definition:

1. there is a number of cells (boxes);
2. at any (discrete) time step, any cell can present itself in a certain "state" among a finite number of different states;
3. the state of any cell can change (evolve) from a time step to the subsequent time step; and
4. there is a rule (evolution law, EL) which determines this transition.

Note that the number of cells can be finite or infinite; the cells can be arranged on a line, on a surface, in the ordinary three-dimensional (3D) space, or possibly in a hyperspace (in any case, the cells can be numbered); the different states of a cell can be denoted by integer numbers but, in different contexts of application of CAs, different imaginative pictures have also been used (e.g., different colors, dead and living cells, number of balls in a box, etc.); the evolution of a CA proceeds in finite time steps (time is also discrete); the EL, provided that it is effective on any possible configuration of a given CA (computability), is otherwise completely arbitrary (indeed, there are not only deterministic and probabilistic ELs, but also those that "evolve" in time, following a meta-EL, which in turn can be deterministic or probabilistic).

Consider some examples of CAs.

Example 1 (CA1) Consider a linear array of seven boxes (cells; one can number them $c(i)$, $i = 1, 2, \ldots, 7$). Each box can be empty or it can contain a ball (so there are just two states for each cell). Given a configuration of this CA at time $t$, what happens at time $t + 1$ (EL)?

(i) the state of the first box $c(1)$ never changes;
(ii) for each other box $c(i)$, $i = 2, 3, \ldots, 7$:
(iia) if the box is empty and the box on its left is empty then put a ball in the box;
(iib) if there is a ball in the box and also there is a ball in the box on its left then empty the box.

An example of the evolution of such a rather trivial CA is given in Figure 1.

Figure 1 A seven time-step evolution of CA1 starting from a given ID ($t = 0$). Note that a stable configuration has been reached at $t = 6$. (Columns: cells $c(1), \ldots, c(7)$; rows: $t = 0, \ldots, 7$.)

A more precise notation can now be established. First, let us denote the state of a cell at time $t$ by a "state function," say $S$. According to point 2 of the broad definition above, the number of possible states is arbitrary but finite: denote this number by the positive integer $M$ ($M > 1$). Then $S$ takes values in a finite set, say $Z_M = Z/MZ = \{0, 1, 2, \ldots, M-1\}$ (in plain words, we have denoted the $M$ states for the CA by the first $M$ non-negative integers). Different cells can be labeled with a progressive number: $c(n)$, $n = n_1, n_1 + 1, \ldots, n_2 - 1, n_2$; possibly, in case of an infinite number of cells, one has $n_1 \to -\infty$ and/or $n_2 \to +\infty$. In the case of $n_1 = -\infty$, $n_2 = +\infty$, one speaks of a unidimensional CA. Of course, the field $S$ depends on $n$ as well as on time (remember that, for a CA, "time" is a discrete variable: $t = 0, 1, 2, \ldots$). The field $S(n, t)$ describes the CA completely. If the EL is deterministic, then one can determine (compute) $S(n, t)$ step by step for $t > 0$ from the initial configuration $S(n, 0)$ (initial datum, ID). Consider only static ELs, namely those that do not change in time. A further distinction can be made: there are ELs such that the future state of the generic cell, $S(n, t+1)$, depends on the whole current configuration of the CA (these are called nonlocal ELs) and there are ELs for which $S(n, t+1)$ depends only on

the current state of a finite number, say $N$, of cells (local ELs):

\[ \{S(n + k_i, t)\},\quad i = 1, 2, \ldots, N;\ k_i \in \mathbb{Z} \ \Longrightarrow\ S(n, t+1) \qquad [1] \]

Note that, in principle, the set of cells that determine, according to the EL, the future state of the generic cell $n$ could depend on $n$, namely one can have $N = N(n)$, as well as $k_i = k_i(n)$, $i = 1, 2, \ldots, N(n)$ (see CA2 below). In any case, such a set of cells is called the interaction set (IS). Moreover, the distance from the cell $n$ of the farthest cell in the IS is called the range $R$ (of the interaction): $R = \max(|k_i|)$. If IS $\equiv \{c(n-R), c(n-R+1), \ldots, c(n), \ldots, c(n+R-1), c(n+R)\}$, then this IS is called a neighborhood of range $R$. It is, moreover, clear that, for a unidimensional CA, there exists at least one infinite subset of cells that have the same state. If there is only one such subset, then it is called the vacuum set and the state of its cells is called the vacuum state: let $V$ denote the value of this state ($0 \le V < M$, $S(n, t) \to V$ as $n \to \infty$).

$S(n, t) \ne V$ be called population set (PS), then PS is a finite set at each time.

Of course, one can easily devise an EL for which this is not true; nevertheless, the EL itself is still valid (computable), for instance,

Example 3 (CA3) This is a unidimensional CA, namely there are infinitely many cells on a line ($n \in \mathbb{Z}$). The cells have $M$ states and $V = 0$; the EL reads:

the state of each cell cycles in the set of available states ($0 \to 1, 1 \to 2, \ldots, M-2 \to M-1, M-1 \to 0$)

Note that the range $R$ is zero, there is a vacuum excitation; nevertheless, the EL is effective.

Deterministic, static, and local ELs that do not give rise to vacuum excitation are called normal ELs (NELs).

Since $M$, $N$ are finite for an NEL, one can give the NEL itself as a table, considering every possible configuration of the IS and specifying the outcome for each configuration (note that there are $M^N$ possible configurations).
Example 2 (CA2) An example of CA with
n-dependent IS (M = 2, R = 3, V = 0). This is the Example 4 (CA4) n 2 Z, M = 2, V = 0, IS  {c(n),
c(n  1), c(n þ 2)}, N = 3, R = 2. The EL is:
EL: the cell c(n) changes its state (0 ! 1, 1 ! 0) iff
Sðn; tÞ 0 0 0 0 1 1 1 1
(i) n is even and at least one of the two cells on its
left is not in the vacuum state; Sðn  1; tÞ 0 0 1 1 0 0 1 1
½2
Sðn þ 2; tÞ 0 1 0 1 0 1 0 1
(ii) n is odd and one or three of the three cells on its
Sðn; t þ 1Þ 0 1 1 0 1 1 0 1
right are not in the vacuum state.
An example of the evolution of such a CA is given
An example of the evolution of such a CA is given
in Figure 2. in Figure 3.
However, these NELs can also be given in an
Usually, only a subclass of ELs is considered for
alternative representation (more useful in view of the
which the phenomenon of vacuum excitation
cannot occur. Namely, during the evolution of extensions of the concept of CA itself, see below).
Namely, an NEL can be given as a discrete-time
the CA, an infinite subset of the vacuum set
EL for the state function S(n, t) in the finite field
cannot change its state in just one time step. In
ZM = {0, 1, 2, . . . , M  1}.
other words: if the set of cells starting from the
first cell and ending with the last one for which

Figure 2 Three hundred and eighty time steps of CA2, starting


from a random chosen initial configuration. Note the left–right Figure 3 Four hundred and sixty-one time steps of CA4,
asymmetry due to the asymmetry of its IS and EL. starting from a random chosen PS of 50 cells.
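The table form of an NEL lends itself to a direct implementation. The following is a minimal sketch, assuming the table [2] of CA4 and a finite population set padded with vacuum cells; the names `TABLE`, `R`, `V`, and `ca4_step` are illustrative and do not come from the article.

```python
# Lookup table [2] of CA4: (S(n,t), S(n-1,t), S(n+2,t)) -> S(n,t+1)
TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 1,
}
R = 2  # range of the interaction set {c(n), c(n-1), c(n+2)}
V = 0  # vacuum state

def ca4_step(cells):
    """Evolve a finite population by one time step.

    The population is surrounded by vacuum; since the PS can grow by at most
    R cells per side in one step, the result is R cells wider on each side.
    """
    padded = [V] * (2 * R) + list(cells) + [V] * (2 * R)
    return [TABLE[(padded[i], padded[i - 1], padded[i + 2])]
            for i in range(R, len(padded) - R)]

if __name__ == "__main__":
    row = [1]
    for _ in range(5):
        print("".join(".#"[s] for s in row).center(30))
        row = ca4_step(row)
```

Because the all-vacuum configuration (0, 0, 0) maps to 0, no vacuum excitation occurs and the population set stays finite, as required for an NEL.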

For example, the NEL above for CA4 can be expressed as follows:

\[ S(n, t + 1) \stackrel{2}{=} S(n - 1, t) + S(n, t) + S(n + 2, t) + S(n, t) S(n + 2, t) + S(n - 1, t) S(n, t) S(n + 2, t) \qquad [3] \]

Here and in the following, the symbol $\stackrel{M}{=}$ denotes a congruence mod M.
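That the polynomial form [3] reproduces the table [2] can be checked by brute force over the eight configurations of the IS. The sketch below is only a verification aid; `TABLE_2` and `nel_poly` are names chosen here.

```python
from itertools import product

# Table [2] of CA4: (S(n,t), S(n-1,t), S(n+2,t)) -> S(n,t+1)
TABLE_2 = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 1,
}

def nel_poly(s_n, s_left, s_right2, M=2):
    """Right-hand side of [3], reduced mod M."""
    return (s_left + s_n + s_right2
            + s_n * s_right2
            + s_left * s_n * s_right2) % M

# Collect every configuration on which the two representations disagree.
mismatches = [cfg for cfg in product(range(2), repeat=3)
              if nel_poly(*cfg) != TABLE_2[cfg]]

if __name__ == "__main__":
    print("mismatches:", mismatches)
```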
Another example is the following.

Example 5 (CA5) $n \in \mathbb{Z}$, M = 3, N = 3, V = 0, R = 1, IS $\equiv \{c(n - 1), c(n), c(n + 1)\}$. The NEL is:

\[ S(n, t + 1) \stackrel{3}{=} S(n - 1, t) + S(n, t) + S(n + 1, t) + 2 S(n - 1, t) S(n + 1, t) \qquad [4] \]

An example of the evolution of such a CA is given in Figure 4.

Figure 5 A class-1 CA: every ID rapidly evolves to periodic structures; M = 3, V = 0, R = 2, EL: $S(n, t + 1) \stackrel{3}{=} S(n, t) + S(n - 1, t) S(n + 2, t)$.

Classification of ELs

Considering a CA with given M > 1, N ≥ 1, the number L of possible deterministic, static ELs is

\[ L(M, N) = M^{(M^N)} \qquad [5] \]

Of course, this number can be very large even for relatively small values of M and N. Nevertheless, it is a finite positive integer, so that, for given M, N, one could denote every EL by an integer number and investigate the typical behavior of each EL. A considerable reduction of this number is obtained if one limits attention to totalistic ELs, namely to those whose outcome depends only on the global configuration of the IS, often just on

\[ \sigma(n, t) = \sum_{i=1}^{N} S(n + k_i, t), \quad i = 1, 2, \ldots, N;\quad k_i \in \mathbb{Z} \qquad [6] \]

Figure 4 Four hundred and sixty-one time steps of CA5, starting from a randomly chosen PS of 50 cells.

Deep and extensive computer investigations have been carried out for unidimensional CAs with small values of M, N. Surprisingly enough, it seems that the typical behavior of all these CAs can be (roughly and heuristically) classified in just four classes (Wolfram 2002):

• Class 1 (simple): possibly after a complicated transient, simple patterns emerge.
• Class 2 (fractalic): possibly after a transient, overall regular nested structures are obtained.
• Class 3 (chaotic): complicated but seemingly random behavior.
• Class 4 (complex): possibly after a transient, localized structures emerge that interact in complex ways.

Due to the looseness of the above definitions, perhaps a better way to distinguish between classes is to train one’s eye. Consider some examples of CAs for each class: the typical behavior of class-1 CA is shown in Figures 5 and 6, of class-2 CA in Figures 7 and 8, of class-3 CA in Figures 4 and 9, of class-4 CA in Figures 10 and 11. Note, however, that often one has ‘‘mixed type’’ CA: for example, CA4 is of class 1 on the right and of class 2 on the left (see Figure 3); Figure 12 exhibits a CA where the typical behaviors of classes 2 and 3 are superimposed.

Extensions

The concept of a CA is so simple that many extensions of the above-sketched definition of a CA can be easily devised. A (nonexhaustive) survey of such extensions follows.
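Returning briefly to the count [5]: the remark that every EL can be denoted by an integer can be made concrete as follows. The sketch assumes a Wolfram-style numbering (the IS configuration, read as a base-M number, selects a base-M digit of the rule number); this convention and the function names are assumptions made here, not part of the article. The count of totalistic ELs uses the fact that the sum in [6] takes values in 0, ..., N(M − 1).

```python
from itertools import product

def count_els(M, N):
    """Number of deterministic, static ELs, eq. [5]."""
    return M ** (M ** N)

def count_totalistic(M, N):
    """ELs depending only on the sum [6], which takes N*(M-1)+1 values."""
    return M ** (N * (M - 1) + 1)

def table_from_number(number, M, N):
    """Decode an integer 0 <= number < L(M, N) into a lookup table.

    Assumed convention: the configuration, read as a base-M number,
    is the index of the base-M digit of `number` giving the outcome.
    """
    table = {}
    for cfg in product(range(M), repeat=N):
        index = 0
        for s in cfg:
            index = index * M + s
        table[cfg] = (number // M ** index) % M
    return table

if __name__ == "__main__":
    print(count_els(2, 3), count_totalistic(2, 3))
    print(table_from_number(110, 2, 3))
```

With M = 2 and N = 3 this yields 256 ELs, of which 16 are totalistic; under the assumed numbering, number 110 decodes to the same table as the polynomial EL quoted for Figure 10.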

Figure 6 A class-1 CA, a random ID vanishes after 337 time steps, M = 5, V = 0, R = 2, EL: $S(n, t + 1) \stackrel{5}{=} S(n - 1, t) S(n - 2, t) + S(n + 1, t) S(n + 2, t) + S(n - 1, t) S(n + 1, t) + S(n - 2, t) S(n + 2, t)$.

Figure 7 A class-2 CA: Sierpinski triangles appear; M = 2, V = 0, R = 1, EL: $S(n, t + 1) \stackrel{2}{=} S(n - 1, t) + S(n + 1, t)$.

Figure 8 A class-2 CA: a double fractal structure appears; M = 4, V = 0, R = 2, EL: $S(n, t + 1) \stackrel{4}{=} S(n - 2, t) + S(n, t) + S(n + 2, t)$.

Figure 9 A class-3 CA: M = 5, V = 0, R = 2, EL: $S(n, t + 1) \stackrel{5}{=} 2 S(n - 1, t) + S(n + 1, t) + S(n, t)(S(n + 1, t) + S(n + 2, t)) + S(n - 1, t) S(n + 1, t)$.

Vector CA

In this extension, the state function S(n, t) is considered as a ‘‘vector,’’ namely $S(n, t) \equiv (S_1(n, t), S_2(n, t), \ldots, S_L(n, t))$, L being a positive integer. Each component $S_l(n, t)$ (l = 1, 2, ..., L) takes values in a finite ring, say $\mathbb{Z}_{M_l} = \{0, 1, 2, \ldots, M_l - 1\}$, and evolves, according to some EL, interacting with the other components. Of course, one can give separately the time evolution for each component; however, it is also possible to give a global representation of a vector CA, introducing a global function $\tilde{S}(n, t)$ that takes values in the finite ring $\mathbb{Z}_M$, $M = M_1 M_2 \cdots M_L$; for example,

\[ \tilde{S}(n, t) = S_L(n, t) + \sum_{l=1}^{L-1} \left( S_l(n, t) \prod_{k>l} M_k \right) \qquad [7] \]

Thus, in a sense, vector CAs are still usual CAs with a complicated EL.

Example 6 (CA6) A two-component vector CA:

\[ S_1(n, t + 1) \stackrel{M_1}{=} S_1(n, t) S_1(n + 1, t) + (M_1 - 1) S_2(n - 1, t) S_2(n, t) + c_1 \qquad [8] \]

\[ S_2(n, t + 1) \stackrel{M_2}{=} S_1(n - 1, t) S_2(n, t) + S_1(n, t) S_2(n + 1, t) + c_2 \qquad [9] \]
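The packing [7] is a mixed-radix encoding, and the component ELs [8] and [9] can be iterated directly. The following is a hedged sketch: it assumes a finite ring of cells with periodic boundary (the article works on the infinite line) and uses the parameter values M1 = 2, M2 = 3, c1 = c2 = 1 quoted for Figure 13; `pack`, `unpack`, and `ca6_step` are names introduced here.

```python
from math import prod

def pack(components, moduli):
    """Global state [7]: S~ = S_L + sum_{l<L} S_l * prod_{k>l} M_k."""
    return sum(s * prod(moduli[l + 1:]) for l, s in enumerate(components))

def unpack(value, moduli):
    """Inverse of pack: recover (S_1, ..., S_L) from the global state."""
    comps = []
    for m in reversed(moduli):
        value, s = divmod(value, m)
        comps.append(s)
    return tuple(reversed(comps))

def ca6_step(s1, s2, m1=2, m2=3, c1=1, c2=1):
    """One step of [8] and [9] on a ring of len(s1) cells (periodic boundary assumed)."""
    size = len(s1)
    new1 = [(s1[n] * s1[(n + 1) % size]
             + (m1 - 1) * s2[n - 1] * s2[n] + c1) % m1 for n in range(size)]
    new2 = [(s1[n - 1] * s2[n]
             + s1[n] * s2[(n + 1) % size] + c2) % m2 for n in range(size)]
    return new1, new2

if __name__ == "__main__":
    s1, s2 = [0, 1, 0, 0, 1, 1], [2, 0, 1, 0, 0, 2]
    for _ in range(4):
        print([pack((a, b), (2, 3)) for a, b in zip(s1, s2)])
        s1, s2 = ca6_step(s1, s2)
```

For L = 2 the packing reduces to M2·S1 + S2, so the printed rows take values in {0, ..., 5}.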

The global behavior of this CA can be expressed, for example, through the global state function

\[ \tilde{S}(n, t) = M_2 S_1(n, t) + S_2(n, t) \qquad [10] \]

Obviously, $\tilde{S} \in \mathbb{Z}_M$ with $M = M_1 M_2$. Figure 13 represents the global behavior of this CA with $M_1 = 2$, $M_2 = 3$, $c_1 = 1$, $c_2 = 1$, V = 0. Note that this CA can be considered as an extension of the celebrated quadratic map.

Figure 10 A class-4 CA (Wolfram CA 110): M = 2, V = 0, R = 1, EL: $S(n, t + 1) \stackrel{2}{=} S(n, t) + S(n + 1, t) + S(n, t) S(n + 1, t) + S(n - 1, t) S(n, t) S(n + 1, t)$.

Figure 11 A class-4 CA. Note the interacting moving structures on the left and on the right; note also the apparently chaotic behavior in the center; M = 2, V = 0, R = 2, EL: $S(n, t + 1) \stackrel{2}{=} S(n, t) + S(n + 1, t) + S(n - 1, t) S(n + 2, t)$.

Figure 12 A mixed-class CA: a fractalic structure is superimposed on a chaotic one; M = 4, V = 0, R = 2, EL: $S(n, t + 1) \stackrel{2}{=} S(n, t)(S(n - 2, t) + S(n + 2, t)) + S(n - 1, t) S(n + 1, t)$.

Figure 13 Global behavior of the vector CA6.

Multidimensional CA

Up to now we have considered CAs with a finite number of cells (finite CAs) or with an infinite number of cells arranged on a line (unidimensional CAs). Now we consider CAs with cells arranged on a surface, usually a plane (bidimensional CAs), or on 3-space (tridimensional CAs), or even on a hyperspace (multidimensional CAs). In any case, if the number of cells is finite, the evolution of such CAs, according to an NEL, must end up in a final cycle: this is due to the finiteness of the ‘‘phase space’’ (thus, these CAs should be classified as class 1; however, note that, if the ‘‘phase space’’ is large enough, the dynamics of

such CAs can still be very rich). Usually, one considers an infinite number of cells tessellating the whole s-space, s = 2, 3, ... (e.g., squares or hexagons on the plane, cubic cells in 3-space). The changes in the previous notation and definitions are plain: for example, for a bidimensional CA, the state function depends now on two discrete ‘‘space’’ variables ($S(n_1, n_2, t)$, $n_1 \in \mathbb{Z}$, $n_2 \in \mathbb{Z}$); furthermore, there is a greater freedom in choosing a neighborhood of range R. The two most-used neighborhoods of range 1 are:

Neumann neighborhood: the cell together with its north, east, south, and west neighbors;
Moore–Conway neighborhood: the cell together with its eight surrounding neighbors. [11]

The most famous (and interesting) bidimensional CA is ‘‘Life’’, introduced by J H Conway, which is discussed next.

Example 7 (CA ‘‘Life’’; Moore–Conway neighborhood, V = 0, M = 2). A cell in the vacuum state 0 is called ‘‘dead’’; a cell in the state 1 is called ‘‘alive.’’ The EL is as follows:

(i) If a cell is dead at time t, it comes alive at time t + 1 if and only if exactly three of its eight neighbors are alive at time t (reproduction).
(ii) If a cell is alive at time t, it dies at time t + 1 if and only if fewer than two (loneliness) or more than three (overcrowding) neighbors are alive at time t.

Clearly, this is a totalistic NEL. Now considering the explicit form of $\sigma$ (see [6]):

\[ \sigma(n_1, n_2, t) = -S(n_1, n_2, t) + \sum_{k_1=-1}^{1} \sum_{k_2=-1}^{1} S(n_1 + k_1, n_2 + k_2, t) \qquad [12] \]

the above EL can be simply expressed as:

\[ S(n_1, n_2, t + 1) = \delta_{3,\sigma} + \delta_{2,\sigma}\, S(n_1, n_2, t) \qquad [13] \]

where $\delta_{3,\sigma}$ is the Kronecker symbol.

Life is a class-4 CA; it exhibits a rich variety of interesting structures: stable structures, oscillators (periodic structures), gliders and ships (moving structures), emitters and absorbers (namely, structures that, after a time period, reconstitute themselves, but meanwhile have emitted or absorbed moving structures). These structures are essential to prove that Life can be used to construct a universal Turing machine (see below). One can get a rough idea of such ‘‘richness’’ from Figure 14.

As in the previous case of vector CA, one could object that multidimensional CAs are also not true extensions of the unidimensional CAs. Indeed, since the whole set of cells is still a countable set, one could number the cells with just one discrete ‘‘space’’ variable (say $n \in \mathbb{Z}$). For example, in the case of a square tessellation of the plane, we could enumerate the cells in the plane starting from the origin as follows:

\[
\begin{array}{rrrrrrr}
 & & & 22 & \rightarrow & & \\
 & & & 21 & 20 & 19 & 18 \\
-13 & -12 & -11 & 4 & 5 & 6 & 17 \\
-14 & -9 & -10 & 3 & 2 & 7 & 16 \\
-15 & -8 & -1 & 0 & 1 & 8 & 15 \\
-16 & -7 & -2 & -3 & 10 & 9 & 14 \\
-17 & -6 & -5 & -4 & 11 & 12 & 13 \\
-18 & -19 & -20 & -21 & & & \\
 & & \leftarrow & -22 & & &
\end{array}
\qquad [14]
\]

Thus, any multidimensional CA could in principle be viewed as a unidimensional one. Of course, one has to pay a price for this: ISs and ELs that are simple for a multidimensional CA become cumbersome for its unidimensional version and vice versa.

Higher Time Derivatives

Up to now, we have considered CAs whose evolved state S(t + 1) depends only on the state S(t), namely the state of the CA itself at the previous time step. In other words, the EL involves just the first (discrete) time derivative (1 CA). One can easily extend all the previous definitions to consider higher-order discrete time derivatives (K CA). Of course, the ID and the IS for such a CA involve the state of the CA at K subsequent time steps.

An example of a unidimensional 2 CA is given below.

Example 8 (CA7) M = 3, V = 0, R = 1. The EL is:

\[ S(n, t + 1) \stackrel{3}{=} S(n - 1, t) + S(n, t - 1) + S(n + 1, t) \qquad [15] \]

An example of the evolution of such a CA is given in Figure 15.
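Returning to Life (Example 7), the totalistic form [12] and [13] translates almost literally into code. The sketch below assumes a finite rectangular grid whose outside is permanently in the vacuum state (the article's CA lives on the infinite plane); `sigma` and `life_step` are names chosen here.

```python
def sigma(grid, i, j):
    """Eq. [12]: sum over the 3x3 block around (i, j) minus the cell itself.

    Cells outside the finite grid are treated as vacuum (assumption).
    """
    rows, cols = len(grid), len(grid[0])
    total = -grid[i][j]
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            a, b = i + di, j + dj
            if 0 <= a < rows and 0 <= b < cols:
                total += grid[a][b]
    return total

def life_step(grid):
    """Eq. [13]: S(t+1) = delta(3, sigma) + delta(2, sigma) * S(t)."""
    new = []
    for i in range(len(grid)):
        row = []
        for j in range(len(grid[0])):
            s = sigma(grid, i, j)
            row.append(int(s == 3) + int(s == 2) * grid[i][j])
        new.append(row)
    return new

def place(cells, rows, cols):
    """Build a grid with the listed (row, column) cells alive."""
    grid = [[0] * cols for _ in range(rows)]
    for i, j in cells:
        grid[i][j] = 1
    return grid

if __name__ == "__main__":
    g = place([(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)], 6, 6)  # a glider
    for _ in range(4):
        g = life_step(g)
    print("\n".join("".join(".#"[s] for s in row) for row in g))
```

Since the two Kronecker symbols in [13] are never both 1, the sum always stays in {0, 1}.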

Figure 14 CA ‘‘Life’’: (a) Time 0. Near the lower border, five stable structures (from left to right: a ‘‘block’’, a ‘‘boat’’, a ‘‘ship’’, a ‘‘loaf’’, a ‘‘beehive’’); near the left border, three ‘‘blinkers’’ (period-2 oscillators); near the right corner, a symmetric structure that, in one time step, evolves into a ‘‘pulsar’’ (a period-3 oscillator); in the upper-left corner, a ‘‘glider’’ (a moving structure); in the upper-right corner, a ‘‘medium weight spaceship’’ (another moving structure); in the center, a configuration that vanishes in a few time steps. (b) Time 1. The structures on the lower border are unchanged; the blinkers, the glider, and the spaceship are in an intermediate state; on the right border, the pulsar starts to pulse. (c) Time 2. The three blinkers on the left border are again in their original configurations (periodic structure with period 2); the pulsar, the glider, and the spaceship are in another intermediate state. (d) Time 3. The pulsar is in its second state, the glider and the spaceship in their third; the structure in the center is going to vanish. (e) Time 4. The pulsar has completed its pulsation (period-3 oscillator, see Figure 14b); the structure in the center has vanished; the glider and the spaceship have recovered their original configurations (see Figure 14a), but meanwhile they have moved by one cell in four time steps (1/4 of the highest velocity attainable by a moving structure in a CA of range 1). The glider is moving downward and to the right, the spaceship horizontally to the left. (f) Time 60. The spaceship has almost completed its crossing; the glider has reached the center and is on a collision route with the pulsar.

Figure 15 CA7, clearly a class-2 CA.

It is plain that taking a suitable continuum limit of a K CA one gets a partial differential equation of order K for the evolution. However, there are also special and interesting CAs, called ‘‘filter’’ CAs, that in a suitable continuum limit end up in integral evolution equations. For a filter unidimensional CA, the evolved state at the cell n, S(n, t + 1), depends also on the (already) evolved states of the cells on its left (or right): for example, an NEL of the type

\[ S(n, t + 1) \stackrel{M}{=} F\big(S(n + k_i, t),\, S(n - \tilde{k}_j, t + 1)\big), \quad i = 1, 2, \ldots, N;\ k_i \in \mathbb{Z};\quad j = 1, 2, \ldots, \tilde{N};\ \tilde{k}_j \in \mathbb{N} \qquad [16] \]

is still valid (computable). Extensions to K CAs or vector CAs or multidimensional CAs are plain. Very often filter CAs exhibit a class-4 behavior with particle-like structures moving and interacting in a complex way; see the following example and examples in the next section.

Example 9 (CA8) M = 2, V = 0, R = 2. The EL is:

\[ S(n, t + 1) \stackrel{2}{=} S(n - 1, t - 1) S(n - 2, t) + S(n, t) + S(n + 1, t) S(n + 2, t) \qquad [17] \]

An example of the evolution of such a CA is given in Figure 16.

Invertible CA

For most of the ELs there is a loss of information in the course of the evolution (see, e.g., Figures 5 and 6). Indeed, different definitions of ‘‘CA entropy’’ have been introduced to measure the ‘‘randomness’’ in the behavior of a given CA. However, since CAs are important in physical

modeling as well as in cryptography and data compression, there is great interest in a special subclass of CAs which are ‘‘invertible’’ (time reversible). Namely, for an ‘‘invertible’’ CA following a given EL and starting from an arbitrary ID, there exists an ‘‘inverse’’ EL such that one can recover the ID from the evolved states.

Invertible CAs can be easily devised in the case of K CA (K > 1). For example, if K = 2, 3, ..., one can consider ELs of the form

\[ S(n, t + 1) \stackrel{M}{=} S(n, t - K + 1) + F\big(S(n + k_i^j, t - j)\big) \qquad [18a] \]

where

\[ i = 1, 2, \ldots, N^j;\quad k_i^j \in \mathbb{Z};\quad j = 0, 1, 2, \ldots, K - 2 \qquad [18b] \]

and F is an arbitrary polynomial function.

It is then clear that the inverse EL reads

\[ \tilde{S}(n, \tilde{t} + 1) \stackrel{M}{=} \tilde{S}(n, \tilde{t} - K + 1) + (M - 1)\, F\big(\tilde{S}(n + k_i^j, \tilde{t} + j - K + 2)\big) \qquad [19] \]

Indeed, if an arbitrary ID evolves according to the EL [18], then applying the inverse EL [19] to K subsequent evolved states (taken in reversed order), eventually the original ID is recovered (in reversed order) (see the following example).

Example 10 (CA9) A 6 CA: M = 2, V = 0, R = 1. The EL is:

\[ S(n, t + 1) \stackrel{2}{=} S(n, t - 5) + S(n, t - 3) + S(n + 1, t - 2) + S(n - 1, t - 1) + S(n, t - 2) S(n + 1, t - 2) + S(n, t) S(n - 1, t) \qquad [20] \]

The inverse EL, according to [19], reads (Figure 17)

\[ \tilde{S}(n, \tilde{t} + 1) \stackrel{2}{=} \tilde{S}(n, \tilde{t} - 5) + \tilde{S}(n, \tilde{t} - 1) + \tilde{S}(n + 1, \tilde{t} - 2) + \tilde{S}(n - 1, \tilde{t} - 3) + \tilde{S}(n, \tilde{t} - 2) \tilde{S}(n + 1, \tilde{t} - 2) + \tilde{S}(n, \tilde{t} - 4) \tilde{S}(n - 1, \tilde{t} - 4) \qquad [21] \]

Figure 16 CA8, a ‘‘filter’’ CA. Note the emergence of particle-like structures moving to the left and to the right and interacting in complex ways.

Figure 17 CA9, a 6 CA: (a) a 50 time-step evolution from a peculiar ID; (b) a 50 time-step evolution of the inverse EL, starting from the last six configurations of Figure 17a (taken in inverse order); the ID of Figure 17a is recovered (in inverse order).
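The recovery of the ID by the inverse EL can be checked numerically for CA9. The sketch below implements [20] and, through the lag substitution of [19], the inverse EL [21]. It assumes a ring of 40 cells with periodic boundary and a random ID, neither of which is taken from the article; all function names are illustrative.

```python
import random

M, K = 2, 6  # CA9: two states, six consecutive time steps enter the EL

def nonlinear_part(row_at_lag, n, size):
    """F of [20]; row_at_lag(j) returns the configuration at time t - j."""
    s = lambda j, shift: row_at_lag(j)[(n + shift) % size]  # periodic ring (assumption)
    return (s(3, 0) + s(2, 1) + s(1, -1)
            + s(2, 0) * s(2, 1) + s(0, 0) * s(0, -1)) % M

def step(last_rows, inverse=False):
    """One step of [20], or of the inverse EL [21], from the last K rows (oldest first)."""
    size = len(last_rows[0])
    if inverse:
        # [19]: an argument taken at lag j in the forward EL is read at lag K - 2 - j
        row_at_lag = lambda j: last_rows[-1 - (K - 2 - j)]
        factor = M - 1
    else:
        row_at_lag = lambda j: last_rows[-1 - j]
        factor = 1
    oldest = last_rows[-K]
    return [(oldest[n] + factor * nonlinear_part(row_at_lag, n, size)) % M
            for n in range(size)]

def evolve(initial_rows, steps, inverse=False):
    rows = [list(r) for r in initial_rows]
    for _ in range(steps):
        rows.append(step(rows[-K:], inverse))
    return rows

rng = random.Random(0)
initial = [[rng.randrange(M) for _ in range(40)] for _ in range(K)]
forward = evolve(initial, 50)
# Feed the last K evolved rows, in reversed order, to the inverse EL.
backward = evolve(forward[:-K - 1:-1], 50, inverse=True)
recovered = backward[:-K - 1:-1]

if __name__ == "__main__":
    print("ID recovered:", recovered == initial)
```

The structure mirrors Figure 17: 50 forward steps, then 50 steps of the inverse EL started from the last six configurations taken in reversed order.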

Of course, more complicated invertible ELs can be devised. Invertible ELs can also be easily devised for ‘‘filter’’ CA; for example, if an NEL for a ‘‘filter’’ CA reads

\[ S(n, t + 1) \stackrel{M}{=} S(n, t) + F\big(S(n + k_i, t),\, S(n - \tilde{k}_j, t + 1)\big) \qquad [22] \]

where $k_i$ and $\tilde{k}_j$ are positive integers ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, \tilde{N}$) and F is an arbitrary (polynomial) function, then it is invertible and the inverse NEL reads

\[ \tilde{S}(n, \tilde{t} + 1) \stackrel{M}{=} \tilde{S}(n, \tilde{t}) + (M - 1)\, F\big(\tilde{S}(n + k_i, \tilde{t} + 1),\, \tilde{S}(n - \tilde{k}_j, \tilde{t})\big) \qquad [23] \]

Note that [22] is computable starting from $n = -\infty$, whereas [23] is computable starting from $n = +\infty$.

Example 11 (CA10) A 1.5 CA, M = 2, V = 0, R = 3. The EL is:

\[ S(n, t + 1) \stackrel{2}{=} S(n, t) + S(n - 3, t + 1) S(n - 2, t + 1) + S(n + 2, t) S(n + 3, t) + S(n - 2, t + 1) S(n - 1, t + 1) + S(n + 1, t) S(n + 2, t) \qquad [24] \]

Note that this EL is of the form [22]; therefore, it is invertible (see Figure 18a). According to [23], the inverse EL reads:

\[ \tilde{S}(n, \tilde{t} + 1) \stackrel{2}{=} \tilde{S}(n, \tilde{t}) + \tilde{S}(n + 3, \tilde{t} + 1) \tilde{S}(n + 2, \tilde{t} + 1) + \tilde{S}(n - 2, \tilde{t}) \tilde{S}(n - 3, \tilde{t}) + \tilde{S}(n + 2, \tilde{t} + 1) \tilde{S}(n + 1, \tilde{t} + 1) + \tilde{S}(n - 1, \tilde{t}) \tilde{S}(n - 2, \tilde{t}) \qquad [25] \]

Figure 18 CA10: (a) 230 time-step evolution, then the inverse EL is applied for 230 further time steps in order to recover the initial configuration. (b) Collisions between different kinds of particle-like coherent moving structures. The last collision (on the right) is a solitonic one: the interaction produces just a phase shift, preserving number, shape, and velocities of the involved ‘‘particles.’’ (c) ‘‘Particles’’ moving with different velocities and interacting in complex ways (solitonic collisions, particle creations and annihilations). (d) A particle goes through a nonhomogeneous medium and undergoes refraction by the medium itself.

This CA exhibits a very rich dynamics: any complex ID rapidly decays in a great variety of coherent particle-like structures, steady or moving to the right or

to the left with different velocities. The interactions between different particles may be solitonic (the particles emerge unchanged but shifted) or annihilation–creation phenomena can occur (see Figures 18a–d).

Applications of CAs

CAs as Universal Constructors and Turing Machines

In the 1950s, von Neumann, who contributed to the development of the first computer (ENIAC), decided to work out a mathematical theory of automata. Such a theory was aimed at answering the following question: is it possible to build an automaton such that it allows universal computation (i.e., it embodies a universal Turing machine) and, moreover, it is able to build (in order of decreasing generality)

1. an arbitrary automaton (universal constructor);
2. a copy of itself (self-reproducing); and
3. an automaton that is itself a universal Turing machine (constructor)?

The last question von Neumann intended to address was whether, in the process of automata self-reproduction (if possible), a process of evolution could take place, that is, whether a simpler automaton could generate a more complex one.

In the beginning, the idea of von Neumann was to describe, using mathematical axioms, an automaton moving inside a warehouse and selecting various elementary spare parts (e.g., ‘‘muscles,’’ switches, rigid girders) and then assembling them into a new automaton. While this original idea was very realistic, it was also very difficult to pursue, so that von Neumann, following a suggestion by Ulam, decided to consider his questions in the more abstract framework of CAs.

The particular CA he considered is an infinite square CA with 29 possible states. The transition rule depends on the cell to update and its north, east, south, and west neighbor cells (the von Neumann neighborhood). Among the 29 possible states there is one state that is ‘‘quiescent’’ (the vacuum state). von Neumann proved the existence of a configuration of 50 000 cells immersed in a sea of quiescent states that embodies a universal Turing machine and that is a universal constructor. An infinite one-dimensional ‘‘tape’’ is used to store a description of the automaton to build. The universal constructor reads the description on the tape, develops a ‘‘constructing arm’’ that builds the configuration described on the tape in an unoccupied part of the cellular space, makes a copy of the tape, and finally attaches it to the newly built automaton and retracts the constructing arm. When the tape stores a description of the universal constructor itself, the automaton self-reproduces. The total size of the self-reproducing automaton amounts to 200 000 cells. (Some computer simulations of von Neumann’s self-reproducing automaton are available on the web.)

Since von Neumann’s CA is a very complex one, it led researchers to think that a CA able to simulate a universal Turing machine should also be quite complex. The perspective changed completely after the introduction of CA Life. Conway was looking for a simple CA with a possibly rich dynamics; however, it was subsequently realized that Life was much more complicated than anyone could have thought. Finally, thanks to the development of faster computers that allowed visualization of the evolution of quite large populations and through the contribution of a large number of researchers, it was proved that a universal Turing machine could be embedded in Life.

The discovery that even a simple CA such as Life could incorporate a universal Turing machine led to the question whether it could be possible to build a universal Turing machine inside a simple one-dimensional CA. This is indeed the case: up to now, the simplest CA capable of universal computation is the W110 CA (see Figure 10), as proved recently by Cook after a conjecture formulated by Wolfram in 1985.

CAs for Computer Simulations

One of the major applications of CAs is the computer simulation of various dynamical processes. Even if CAs were not invented for this purpose, they possess peculiarities that make them particularly suitable for this task. The main advantage of using a CA for a dynamical simulation is its completely discrete nature, which allows exact simulations on a computer. Thus, any spurious effect due to rounding errors is ruled out. Another advantage is that the EL of a CA can be seen as a function between finite sets. For this reason, one can specify the EL through a ‘‘lookup table’’ (see [2]): then, when running the simulations, the computer has only to access the table instead of computing the function every time, shortening the computation time considerably. Another great advantage of CAs in computer simulations is that, by their very nature (at least for local ELs), they can be implemented on parallel machines. These two concepts are at the basis of dedicated computers for CA simulations developed by Toffoli, Margolus, and co-workers (CAM series). The possibility to use efficiently parallel computers for CA simulation could prove

to be fundamental when computer speeds approach saturation. Moreover, CAs themselves can mimic parallel computations; see, for example, Figure 19, where a nonlocal CA ‘‘computes’’ very efficiently the celebrated Collatz–Ulam 3n + 1 map.

Figure 19 A CA that ‘‘computes’’ the 3n + 1 Collatz–Ulam map. The ID for the CA is the initial number for the iterated map (binary notation, order $2^{300}$, randomly chosen, displayed on the left vertical axis). The CA, according to the Collatz conjecture, ends up in the final stable configuration (horizontal line on the right for the CA, 1 → 4 → 2 → 1 for the map).

CAs in Physics

Since Newton, physics has been described through differential equations and continuous functions. However, such a mathematical description is not fit for simulation on a computer, and some discretizations must be considered. First, one has to discretize space and time, passing from differential equations to (finite systems of) finite difference equations; second, one has to round off the values of the functions to store them in the memory of the computer. The main drawback of this procedure is that in chaotic systems such approximations can rapidly lead to great differences between the real and the simulated behavior. As already noticed, this problem does not appear in CA. Thus, one would like to use this good characteristic of CAs in physical modeling, taking due account of the continuous nature of the physics involved. This requires attention and ingenuity in constructing reliable CA models for physical processes. For example, this goal has been achieved in the so-called lattice gas automata (LGAs).

LGAs are CA models for the microscopic dynamics of fluids and gases. The thermodynamic limit of these CAs yields the correct continuous functions for the macroscopic quantities (density, pressure, viscosity, etc.).

The first step toward LGAs was the discovery that the HPP model developed in the 1970s by Hardy, Pomeau, and De Pazzis was in fact a CA. The HPP model describes the behavior of a fluid (or a gas) in a plane. The configuration space is given by a bidimensional square lattice and the particles are described by arrows lying on the edges of the lattice and pointing to some vertex (see Figure 20a). The particles are assumed to be all identical and with the same velocity, and particles on the same edge with the same direction are not allowed (exclusion principle). The EL prescribes that particles move with unitary velocity along the edges in the direction pointed by the arrow (free flight) unless there are exactly two particles on the edges connected to a given vertex and they point in opposite directions (collision); in this case they are replaced by two arrows pointing outward on the previously empty edges (see Figure 20b). Clearly, the EL conserves the number and the momentum of the particles.

The HPP model can be described algebraically. The admissible particle velocities are just

\[ c_1 = +\hat{x}, \quad c_2 = +\hat{y}, \quad c_3 = -\hat{x}, \quad c_4 = -\hat{y} \qquad [26] \]

Figure 20 (a) An example of configuration for the HPP model. (b) Head-on collisions (panel ‘‘Collisions’’) and three-particle collisions (panel ‘‘Free flight’’) in the HPP model.
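The HPP rule described above (free flight, plus rotation of a head-on pair) is short enough to simulate directly. The following is a minimal sketch under stated assumptions: occupation numbers are stored per vertex and per velocity of [26], the lattice is a small torus (periodic boundary), and the collision is applied before the streaming step; the data layout and the names `hpp_step`, `particle_number`, and `momentum` are choices made here, not the article's.

```python
import random

C = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # velocities c1..c4 of [26]

def hpp_step(n):
    """One HPP update: head-on collisions, then free flight.

    n[j][y][x] is 1 if a particle with velocity C[j] sits at vertex (x, y).
    """
    height, width = len(n[0]), len(n[0][0])
    out = [[[0] * width for _ in range(height)] for _ in range(4)]
    for y in range(height):
        for x in range(width):
            cell = [n[j][y][x] for j in range(4)]
            # Exactly two particles in opposite directions: rotate the pair by 90 degrees.
            if cell == [1, 0, 1, 0]:
                cell = [0, 1, 0, 1]
            elif cell == [0, 1, 0, 1]:
                cell = [1, 0, 1, 0]
            for j, (dx, dy) in enumerate(C):
                if cell[j]:
                    out[j][(y + dy) % height][(x + dx) % width] = 1
    return out

def particle_number(n):
    return sum(v for plane in n for row in plane for v in row)

def momentum(n):
    px = sum(C[j][0] * v for j in range(4) for row in n[j] for v in row)
    py = sum(C[j][1] * v for j in range(4) for row in n[j] for v in row)
    return px, py

rng = random.Random(1)
state = [[[rng.randrange(2) for _ in range(8)] for _ in range(8)] for _ in range(4)]
before = (particle_number(state), momentum(state))
for _ in range(20):
    state = hpp_step(state)
after = (particle_number(state), momentum(state))

if __name__ == "__main__":
    print(before, after)
```

Comparing `before` and `after` illustrates the statement that the EL conserves the number and the momentum of the particles.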

Accordingly, only four bits $n_j(x, t)$, j = 1, 2, 3, 4, are required to denote the presence (1) or the absence (0) of a particle with velocity $c_j$ pointing to vertex x at time t. The dynamical rule for HPP can be written in the form

\[ n_j(x + c_j, t + 1) = n_j(x, t) + \omega_j(x, t) \qquad [27] \]

where the term $n_j(x, t)$ on the right-hand side accounts for the free flight of particles, while $\omega_j(x, t)$ modifies the trajectories in the case of collisions. The $\omega_j$ are determined by the state of the system according to the following rules:

\[ \omega_1 = -n_1 (1 - n_2) n_3 (1 - n_4) + (1 - n_1) n_2 (1 - n_3) n_4 \qquad [28a] \]
\[ \omega_2 = -n_2 (1 - n_3) n_4 (1 - n_1) + (1 - n_2) n_3 (1 - n_4) n_1 \qquad [28b] \]
\[ \omega_3 = -n_3 (1 - n_4) n_1 (1 - n_2) + (1 - n_3) n_4 (1 - n_1) n_2 \qquad [28c] \]
\[ \omega_4 = -n_4 (1 - n_1) n_2 (1 - n_3) + (1 - n_4) n_1 (1 - n_2) n_3 \qquad [28d] \]

It is plain that eqns [27] and [28] can be interpreted as the EL for a CA.

In the thermodynamic limit, the equations governing the dynamics of the macroscopic quantities of the fluid are given by the continuity equation and by anisotropic Navier–Stokes equations. The anisotropy in the Navier–Stokes equations is due to the fact that the invariance group of the square lattice is too small. This problem was solved by Frisch, Hasslacher, and Pomeau in 1986, with the introduction of the FHP model. It turns out that a hexagonal lattice has enough symmetries to recover the isotropic Navier–Stokes equations in the thermodynamic limit. So, the FHP model is an example of a model where, even if the microscopic dynamics is almost a caricature of the real dynamics, the thermodynamic limit gives rise to the correct physical equations.

CAs have been used to simulate many other physical processes (unfortunately, there is no space here for a sufficiently elaborate description). The principal fields of application are: percolation theory, magnetism, diffusion phenomena, sandpiles, models of earthquakes, crystal growth, etc.

The most intriguing aspect of some even simple CAs (e.g., CA8, CA10: see Figures 16 and 18) is their very rich particle-like dynamics. For instance, the existence of solitonic collisions suggested that the techniques recently developed to find and treat ‘‘integrable’’ nonlinear dynamical systems (nonlinear continuous and discrete evolution equations, many-body problems) could profitably be extended to find ‘‘integrable’’ CAs. Indeed, many such CAs have been found that exhibit ‘‘solitons’’ and are endowed with nontrivial conservation laws (of course, this is very important in physical modeling). Moreover, the above-cited similarity between certain CA behaviors and elementary particle physics phenomena suggests that the fundamental structure of reality (at the Planck level) could indeed be that of a CA (cells of Planck length, discrete time flow): attempts to construct this underlying CA physics have been pursued.

Other Applications

CAs exhibit a great plasticity, which makes them well suited to model systems in a wide range of fields. This is mainly due to the fact that CAs with very simple rules can also simulate universal Turing machines, so that they can exhibit a very rich and complicated overall dynamics (in principle, one could simulate any dynamical system using a simple CA). There is another reason for the wide applicability of CA modeling even outside of physics: namely, it is well known that algorithms, not differential equations, are better instruments to schematize dynamical processes for complex and organized systems. Since simple algorithms can be naturally implemented on CAs, the latter are very useful for realizing simple models and simulations in many fields: biology, economics, ecology, neural networks, traffic models, etc.

Moreover, applications of CAs in informatics and specifically in cryptography and data compression have been investigated.

See also: Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Generic Properties of Dynamical Systems; Integrable Systems: Overview.

Further Reading

Berlekamp ER, Conway JH, and Guy R (1982) Winning Ways for Your Mathematical Plays. London: Academic Press.
Boghosian BM (1999) Lattice gases and cellular automata. Future Generation Computer Systems 16: 171–185.
Boon JP, Dab D, Kapral R, and Lawniczak A (1996) Lattice gas automata for reactive systems. Physics Reports 273: 55–147.
Burks AW (ed.) (1970) Essays on Cellular Automata. Urbana: University of Illinois Press.
Chopard B and Droz M (1998) Cellular Automata Modeling of Physical Systems. Cambridge: Cambridge University Press.
Doolen G (ed.) (1990) Lattice Gas Methods for Partial Differential Equations. New York: Addison-Wesley.
Gardner M (1983) Wheels, Life, and Other Mathematical Amusements. New York: W H Freeman.
Central Manifolds, Normal Forms 467

Jackson EA (1990) Perspectives of Nonlinear Dynamics. Cambridge: Cambridge University Press.
Toffoli T and Margolus N (1987) Cellular Automata Machines: A New Environment for Modeling. Cambridge: The MIT Press.
von Neumann J (1966) In: Burks AW (ed.) Theory of Self-Reproducing Automata. Urbana: University of Illinois Press.
Wolfram S (2002) A New Kind of Science. Champaign: Wolfram Media.

Central Manifolds, Normal Forms


P Bonckaert, Universiteit Hasselt, Diepenbeek, Belgium
© 2006 Elsevier Ltd. All rights reserved.

Introduction

We consider differentiable dynamical systems generated by a diffeomorphism or a
vector field on a manifold. We restrict to the finite-dimensional case, although
some of the ideas can also be developed in the general case (Vanderbauwhede and
Iooss 1992). We also restrict to the behavior near a stationary point or a
periodic orbit of a flow.

Let the origin 0 of $\mathbb{R}^n$ be a stationary point of a $C^1$ vector field
$X$, that is, $X(0) = 0$. We consider the linear approximation $A = dX(0)$ of $X$
at 0 and its spectrum $\sigma(A)$, which we decompose as
$\sigma(A) = \sigma_s \cup \sigma_c \cup \sigma_u$, where $\sigma_s$ resp.
$\sigma_c$ resp. $\sigma_u$ consists of those eigenvalues with real part $< 0$
resp. $= 0$ resp. $> 0$. If $\sigma_c = \emptyset$ then there is no central
manifold, and the stationary point 0 is called hyperbolic. Let $E^s$, $E^c$, and
$E^u$ be the linear $A$-invariant subspaces corresponding to $\sigma_s$ resp.
$\sigma_c$ resp. $\sigma_u$. Then $\mathbb{R}^n = E^s \oplus E^c \oplus E^u$. We
look for corresponding $X$-invariant manifolds in the neighborhood of 0, in the
form of graphs of maps. More precisely:

Theorem 1 Let the vector field $X$ above be of class $C^r$
($1 \le r < \infty$). There exist map germs
$\phi_{ss} : (E^s, 0) \to E^c \oplus E^u$,
$\phi_{sc} : (E^s \oplus E^c, 0) \to E^u$,
$\phi_{uu} : (E^u, 0) \to E^s \oplus E^c$,
$\phi_{cu} : (E^c \oplus E^u, 0) \to E^s$, and
$\phi_c : (E^c, 0) \to E^s \oplus E^u$ of class $C^r$ such that the graphs of
these maps are invariant for the flow of $X$. Moreover, their linear
approximation at 0 is zero, that is, their graphs are tangent to, respectively,
$E^s$, $E^s \oplus E^c$, $E^u$, $E^c \oplus E^u$, and $E^c$. If $X$ is of class
$C^\infty$ then $\phi_{ss}$ and $\phi_{uu}$ are also of class $C^\infty$. If $X$
is analytic then $\phi_{ss}$ and $\phi_{uu}$ are also analytic.

The graph of $\phi_c$ is called the (local) central (or, center) manifold of $X$
at 0 and it is often denoted by $W^c$. Thus, it is an invariant manifold of $X$
tangent at 0 to the generalized eigenspace of $dX(0)$ corresponding to the
eigenvalues having zero real part.

(Non) uniqueness, Smoothness

Most proofs in the literature (Vanderbauwhede 1989) use a cutoff in order to
construct globally defined objects, and then obtain the invariant graph as the
solution of some fixed-point problem of a contraction in an appropriate function
space. Although this solution is unique for the globalized problem, this is not
the case at the germ level: another cutoff may produce a different germ of a
central manifold. In other words, locally a central manifold might not be
unique, as is easily seen on the planar example
$x^2\,\partial/\partial x - y\,\partial/\partial y$. On the other hand, the
$\infty$-jet of the map $\phi_c$, in case of a $C^\infty$ vector field, is
unique, so if there exists an analytic central manifold then it is unique; in
the foregoing example, it is the $x$-axis. But for the (polynomial) example
$(x - y^2)\,\partial/\partial x + y^2\,\partial/\partial y$ one can calculate
that the $\infty$-jet of $x = \phi_c(y)$ is given by
$j_\infty \phi_c(y) = \sum_{n \ge 1} n!\, y^{n+1}$, which has a vanishing radius
of convergence, so there is no analytic central manifold. On the other hand, by
the Borel theorem we can choose a $C^\infty$-representative for $\phi_c$. This
can be generalized in the planar case:

Proposition 1 If $n = 2$ and if $X$ is $C^\infty$ and if the $\infty$-jet of $X$
in the direction of the central manifold is nonzero, then this central manifold
is $C^\infty$. In particular, if $X$ is analytic then the central manifold is
either an analytic curve of stationary points or is a $C^\infty$ curve along
which $X$ has a nonzero jet.

For proofs and additional reading, the reader is referred to Aulbach (1992). In
general, a central manifold is not necessarily $C^\infty$ (van Strien 1979,
Arrowsmith and Place 1990): for the system in $\mathbb{R}^3$ given by

$$(x^2 - z^2)\,\frac{\partial}{\partial x} + (y + x^2 - z^2)\,\frac{\partial}{\partial y} + 0 \cdot \frac{\partial}{\partial z}$$

one can find a $C^k$ central manifold for every $k$ but there is no $C^\infty$
central manifold. Indeed, in this case the domain of definition of $\phi_c$
shrinks to zero when $k$ tends to infinity.
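The non-uniqueness in the planar example x^2 d/dx - y d/dy can be made concrete with a small numerical check. The sketch below is illustrative only: the family y = c exp(1/x) for x < 0, glued to y = 0 for x >= 0, is a standard choice of invariant curves tangent to the x-axis, and the function names and sample points are assumptions introduced here rather than taken from the text. A graph y = h(x) is invariant for this vector field when h'(x) x^2 + h(x) = 0, and the code evaluates this residual with a central finite difference.

```python
import math


def center_manifold(x: float, c: float) -> float:
    """Candidate invariant curve y = h_c(x) for x' = x**2, y' = -y."""
    # For x < 0 the curve is y = c * exp(1/x), which is flat at x = 0,
    # so gluing it to y = 0 for x >= 0 gives a smooth curve.
    return c * math.exp(1.0 / x) if x < 0 else 0.0


def invariance_residual(x: float, c: float, step: float = 1e-6) -> float:
    """Residual of the invariance equation h'(x) * x**2 + h(x) = 0."""
    # Central finite difference for h'(x); the sample points used below
    # stay away from x = 0, where the formula for h switches branch.
    dh = (center_manifold(x + step, c) - center_manifold(x - step, c)) / (2 * step)
    return dh * x**2 + center_manifold(x, c)


if __name__ == "__main__":
    sample_points = (-1.0, -0.5, -0.2, 0.3)
    for c in (0.0, 1.0, -2.5):
        worst = max(abs(invariance_residual(x, c)) for x in sample_points)
        print(f"c = {c:5.1f}: max |residual| = {worst:.2e}")
```

Under these assumptions every value of c gives a residual at the level of discretization and rounding error, so each curve is a local central manifold; only c = 0, the x-axis, is analytic.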

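The factorial growth of the jet coefficients in the polynomial example (x - y^2) d/dx + y^2 d/dy follows from a one-line recursion. Writing the central manifold as x = h(y) = sum_{n >= 2} a_n y^n, invariance of the graph reads h'(y) y^2 = h(y) - y^2, which gives a_2 = 1 and a_{n+1} = n a_n. The following sketch (the function name and the truncation order are illustrative assumptions) computes the coefficients with exact integers and prints the ratios a_{n+1}/a_n, whose unbounded growth is what forces a zero radius of convergence.

```python
from math import factorial


def center_manifold_coefficients(order: int) -> list[int]:
    """Taylor coefficients a_2, ..., a_order of x = h(y) for x' = x - y**2, y' = y**2."""
    # Matching powers of y in h'(y) * y**2 = h(y) - y**2 gives
    # a_2 = 1 (coefficient of y**2) and a_{n+1} = n * a_n for n >= 2.
    coeffs = [1]  # a_2
    for n in range(2, order):
        coeffs.append(n * coeffs[-1])  # a_{n+1} = n * a_n
    return coeffs


if __name__ == "__main__":
    coeffs = center_manifold_coefficients(10)
    for n, a_n in enumerate(coeffs, start=2):
        # The recursion reproduces a_n = (n - 1)!, i.e. the series sum n! y^(n+1).
        assert a_n == factorial(n - 1)
        print(f"a_{n} = {a_n}")
    # Successive ratios a_{n+1} / a_n = n grow without bound.
    ratios = [b // a for a, b in zip(coeffs, coeffs[1:])]
    print("ratios:", ratios)
```

Exact integer arithmetic is used so that the comparison with the factorial is not blurred by rounding.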
Central Manifold Reduction

The importance of a central manifold lies in the principle of central manifold
reduction, which roughly says that for local bifurcation phenomena it is enough
to study the behavior on the central manifold, that is, if two vector fields,
restricted to their central manifolds, have homeomorphic integral curve
portraits, and if the dimensions of $E^s$ and $E^u$ are equal, then the two
vector fields have homeomorphic integral curve portraits in $\mathbb{R}^n$, at
least locally near 0. Let us be more precise:

Theorem 2 Let $m$ be the dimension of $E^c$. There exists $p$,
$0 \le p \le n - m$, such that $X$ is locally $C^0$-conjugate to

$$X' = \sum_{i=1}^{m} \tilde{X}_i(z_1, \ldots, z_m)\,\frac{\partial}{\partial z_i} + \sum_{i=m+1}^{m+p} z_i\,\frac{\partial}{\partial z_i} - \sum_{i=m+p+1}^{n} z_i\,\frac{\partial}{\partial z_i}$$

where $(z_1, \ldots, z_m)$ is a coordinate system on a central manifold,
$(z_1, \ldots, z_n)$ is a coordinate system on $\mathbb{R}^n$ extending
$(z_1, \ldots, z_m)$ and $\sum_{i=1}^{m} \tilde{X}_i\,\partial/\partial z_i$ is
the restriction of $X$ to a central manifold. Moreover, if

$$Y = \sum_{i=1}^{m} \tilde{Y}_i(z_1, \ldots, z_m)\,\frac{\partial}{\partial z_i} + \sum_{i=m+1}^{m+p} z_i\,\frac{\partial}{\partial z_i} - \sum_{i=m+p+1}^{n} z_i\,\frac{\partial}{\partial z_i}$$

and if $\sum_{i=1}^{m} \tilde{Y}_i\,\partial/\partial z_i$ is $C^0$-equivalent
(resp. $C^0$-conjugate) to $\sum_{i=1}^{m} \tilde{X}_i\,\partial/\partial z_i$
then $X$ is $C^0$-equivalent (resp. $C^0$-conjugate) to $Y$.

For a proof and further reading (a generalization) see Palis and Takens (1977).

In case that more smoothness than just $C^0$ is needed, we have the principle of
normal linearization along the central manifold. More concretely, let $x$ denote
a coordinate in the central manifold and let $y$ be a complementary variable,
that is, let $X = X_c\,\partial/\partial x + X_h\,\partial/\partial y$. We define
the normally linear part along the central manifold by

$$NX := X_c(x, 0)\,\frac{\partial}{\partial x} + \frac{\partial X_h}{\partial y}(x, 0) \cdot y\,\frac{\partial}{\partial y}$$

Under certain nonresonance conditions (Takens 1971, Bonckaert 1997) on the real
parts of the eigenvalues of $dX(0)$, there exists a $C^r$ local conjugacy between
$X$ and $NX$ for each $r \in \mathbb{N}$ (assuming $X$ to be of class
$C^\infty$). If there are resonances, then one can conjugate with the so-called
seminormal or renormal form containing higher-order terms (see Bonckaert (1997,
2000) and references therein; here one can also find results for cases where
extra constraints should be respected, like symmetry, reversibility, or
invariance of some given foliation etc.).

Parameters

Having an eigenvalue with zero real part is ungeneric, so in bifurcation
problems we consider $p$-parameter families $X_\lambda$ near, say,
$\lambda = 0$. With respect to the results above, we remark that such a family
can be considered as a vector field near $(0, 0) \in \mathbb{R}^n \times
\mathbb{R}^p$ tangent to the leaves $\mathbb{R}^n \times \{\lambda\}$. In fact,
the parameter direction $\mathbb{R}^p$ is contained in $E^c$. In all the results
mentioned, this structure “of being a family” is respected. For example, in
Theorem 2 we replace $\tilde{X}_i(z_1, \ldots, z_m)$ by
$\tilde{X}_i(z_1, \ldots, z_m, \lambda)$. Hence, if $\tilde{X}_\lambda$ is a
versal unfolding of $\tilde{X}_0$ then $X_\lambda$ is a versal unfolding of
$X_0$. By this, the search for versal unfoldings is reduced to the unfolding of
singularities whose linear approximation at 0 has a purely imaginary spectrum.

Diffeomorphisms, Periodic Orbits

A completely analogous theory can be developed for fixed points of
diffeomorphisms $f : (\mathbb{R}^n, 0) \to \mathbb{R}^n$. Here we split up the
spectrum of the linear part $L = df(0)$ at 0 as
$\sigma(L) = \sigma_s \cup \sigma_c \cup \sigma_u$, where $\sigma_s$ resp.
$\sigma_c$ resp. $\sigma_u$ consists of those eigenvalues with modulus $< 1$
resp. $= 1$ resp. $> 1$. This theory can be applied to the time-$t$ map of a
vector field (and will give the same invariant manifolds) and to the Poincaré
map of a transversal section of a periodic orbit of a vector field (Chow et al.
1994).

Normal Forms

The general idea of a normal form is to put a (complicated) system into a form
“as simple as possible” by means of a change of coordinates. This idea was
already developed to a great extent by H Poincaré. Simple examples are:
(1) putting a square matrix into Jordan form, (2) the flow box theorem
(Arrowsmith and Place 1990) near a nonsingular point. Depending on the context
and on the purpose of the simplification, this concept may vary greatly. It
depends on the kind of changes of coordinates that are tolerated (linear,
polynomial, formal series, smooth, analytic) and on the possible structures that
must be preserved (e.g., symplectic, volume-preserving, symmetric, reversible
etc.). Let us restrict to local normal forms, that is, in the vicinity of a
stationary point of a vector field or a diffeomorphism (the latter can be
applied to the Poincaré map of a periodic orbit). We concentrate on the
simplification of the Taylor series. The general idea is to apply consecutive
polynomial changes of variables; at each step we simplify terms of a degree
higher than in the step before. The ideal simplification would be to put all
higher-order terms to zero, which would (at least at the level of formal series)
linearize the system. But as soon as there are resonances (see below), this is
impossible: the planar system
$x\,\partial/\partial x + (2y + x^2)\,\partial/\partial y$ cannot be formally
linearized.

Setting

Let $X$ be a $C^{r+1}$ vector field defined on a neighborhood of
$0 \in \mathbb{R}^n$, and denote $A = dX(0)$ (its linear approximation at 0).
The Taylor expansion of $X$ at 0 takes the form

$$X(x) = A \cdot x + \sum_{k=2}^{r} X_k(x) + O(|x|^{r+1})$$

where $X_k \in H^k$, the space of vector fields whose components are homogeneous
polynomials of degree $k$. The classical formal normal-form theorem is as
follows. We define the operator $L_A$ on $H^k$ by putting
$L_A h(x) = dh(x) \cdot A \cdot x - A \cdot h(x)$; one calls $L_A$ the
homological operator. One checks that $L_A(H^k) \subset H^k$. One also denotes
this by $\operatorname{ad} A(h)(x)$: see further in the Lie algebra setting. Let
$R^k$ be the range of $L_A$, that is, $R^k = L_A(H^k)$. Let $G^k$ denote any
complementary subspace to $R^k$ in $H^k$. The formal normal-form theorem states,
under the above settings:

Theorem 3 (Chow et al. 1994, Dumortier 1991) There exists a composition of near
identity changes of variables of the form

$$x = y + \psi^k(y) \qquad [1]$$

where the components of $\psi^k$ are homogeneous polynomials of degree $k$, such
that the vector field $X$ is transformed into

$$Y(y) = A \cdot y + \sum_{k=2}^{r} g_k(y) + O(|y|^{r+1})$$

where $g_k \in G^k$, $k = 2, \ldots, r$.

Sometimes this theorem is applied to the restriction of a vector field to its
central manifold, for reasons explained in the last section. This is the reason
why we did not assume $X$ to be $C^\infty$; in the latter case one can let
$r \to \infty$ and obtain a normal form on the level of formal Taylor series
(also called $\infty$-jets). Using a theorem of Borel, we infer the existence of
a $C^\infty$ change of variables $\Phi$ such that the Taylor series of
$\Phi_*(X)$ is $A \cdot y + \sum_{k=2}^{\infty} g_k(y)$. For practical
computations, it is often appropriate to first simplify the linear part $A$ and
to diagonalize it whenever possible. Hence, it is convenient to use a
complexified setting and to use complex polynomials or power series. One can
show that all involved changes of variables preserve the property of “being a
complex system coming from a real system,” that is, at the final stage we can
return to a real system (see, e.g., Arrowsmith and Place (1990) for a more
precise mathematical description).

Hence, we can assume that $A$ is an upper triangular matrix. Let the eigenvalues
be $\lambda_1, \ldots, \lambda_n$. It can be calculated that the eigenvalues of
$L_A$, as an operator $H^k \to H^k$, are then the numbers
$\langle \lambda, \alpha \rangle - \lambda_j$ where $\alpha \in \mathbb{N}^n$,
$\sum_{j=1}^{n} \alpha_j = k$ and $1 \le j \le n$. Hence, if these are all
nonzero then $R^k = H^k$, and then we have an ideal simplification, that is, all
$g_k$ equal to zero. However, if such a number is zero, that is,

$$\langle \lambda, \alpha \rangle - \lambda_j = 0 \qquad [2]$$

it is called a resonance between the eigenvalues. In such a case, we have to
choose a complementary space $G^k$. From linear algebra it follows that one can
always choose

$$G^k = \ker(L_{A^*}) \qquad [3]$$

where $A^*$ is the adjoint operator. But this choice [3] is not unique and is,
from the computational point of view, not always optimal, especially if there
are nilpotent blocks. This fact has been exploited by many authors. A typical
example is the case where $A = y\,\partial/\partial x$. On the other hand, if $A$
is semisimple we can choose the complementary space to be $\ker(L_A)$, so
$L_A g_k = 0$; we can assume $A$ to be the (complex)
diagonal$[\lambda_1, \ldots, \lambda_n]$. In that case we can be more explicit
as follows. Let $e_j = \partial/\partial x_j$ denote the standard basis on
$\mathbb{C}^n$. For a monomial one can calculate that

$$L_A(x^\alpha e_j) = (\langle \lambda, \alpha \rangle - \lambda_j)\, x^\alpha e_j \qquad [4]$$

If the latter is zero, then the monomial is called resonant. This implies that
the normal form can be chosen so that it only contains resonant monomials.

Putting a system into normal form not only simplifies the original system, it
also gives more geometric insight on the Taylor series. To be more precise,
suppose (for simplicity, this can be generalized (Dumortier 1997)) that $A$ is
semisimple. One can calculate that the condition $L_A g_k = 0$ implies:
$\exp(-At)\, g_k(\exp(At)x) = g_k(x)$ for all $t \in \mathbb{R}$. This means that
$g_k$ is invariant for the one-parameter group $\exp(At)$. A typical example in
the plane is: $A$ has eigenvalues $i$, $-i$. Note that the (only) resonances are
$\langle (i, -i), (p + 1, p) \rangle - i = 0$ and
$\langle (i, -i), (p, p + 1) \rangle + i = 0$ for all $p \in \mathbb{N}$. We
suppose that the original system was real, that is, on $\mathbb{R}^2$; we can
choose linear coordinates such that for $z = x + iy$, $\bar{z} = x - iy$ the
linear part is $A = $ diagonal$[i, -i]$. Applying the remarks above, we conclude
that the normal form only contains the monomials
$(z\bar{z})^p z\,\partial/\partial z$ and
$(z\bar{z})^p \bar{z}\,\partial/\partial \bar{z}$. The geometric interpretation
here is that these monomials are invariant for rotations around $(0, 0)$. This
can also be seen on the real variant of this: the Taylor series of the (real)
normalized system has the form

$$(1 + f(x^2 + y^2))\left(x\,\frac{\partial}{\partial y} - y\,\frac{\partial}{\partial x}\right) + g(x^2 + y^2)\left(x\,\frac{\partial}{\partial x} + y\,\frac{\partial}{\partial y}\right)$$

and is invariant for rotations.

Warning: the dynamic behavior of a formal normal form in the central manifold
can be very different from that of the original vector field, since we are only
looking at the formal level. A trivial example is (take $f = g = 0$ in the
foregoing example)
$X(x, y) = (x\,\partial_y - y\,\partial_x) - \exp(-1/x^2)\,\partial/\partial x$,
where orbits near $(0, 0)$ spiral to $(0, 0)$, whereas the normal form is just a
linear rotation. This difference is due to the so-called flat terms, that is,
the difference between the transformed vector field and a $C^\infty$-realization
of its normalized Taylor series (or polynomial). In case of analyticity of $X$,
one can ask for analyticity of the normalizing transformation $\Phi$.
Generically, this is not the case in many situations. The precise meaning of
this “genericity condition” is too elaborate to explain in this brief review
article. We provide some suggestions for further reading in the next section.
One could roughly say that, in the central manifold, the normal form has too
much symmetry and is too poor to model more complicated dynamics of the system,
which can be “hidden in the flat terms.” To quote Il’yashenko (1981): “In the
theory of normal forms of analytic differential equations, divergence is the
rule and convergence the exception . . . .”

In many applications, we want to preserve some extra structure, such as a
symplectic structure, a volume form, some symmetry, reversibility, some
projection etc.; the case of a projection is important since it includes vector
fields depending on a parameter. Sometimes a superposition of these structures
appears (e.g., a family of volume-preserving systems). We would like the
normal-form procedure to respect this structure at each step. One can often
formulate this in terms of vector fields belonging to some Lie subalgebra
$L_0$. The idea is then to use changes of variables like [1], where $\psi^k$ is
then generated by a vector field in $L_0$. This will guarantee that all changes
of variables are “compatible” with the extra structure. Unlike the general case
where we could work with monomials as in [4], we will have to consider vector
fields $h_k$ in $L_0$ whose components are homogeneous polynomials of degree
$k$. If this can be done, one says that $L_0$ respects the grading by the
homogeneous polynomials. In order to fix ideas, suppose that $L_0$ are the
divergence-free planar vector fields. Note that a monomial
$x^i y^j\,\partial/\partial x$ is not divergence free. We can instead use time
mappings of homogeneous vector fields of the form
$a(q + 1)\,x^{p+1} y^q\,\partial/\partial x - a(p + 1)\,x^p y^{q+1}\,\partial/\partial y$.
Up to terms of higher order we can use the time-one map of $h_k$ instead of
$x + h_k(x)$. In case that one asks for a $C^\infty$-realization of the
normalizing transformation, we need an extra assumption on the extra structure,
that is, on $L_0$, called the Borel property: denote by $J_{\infty, 0}$ the set
of formal series such that each truncation is the Taylor polynomial of an
element of $L_0$. The extra assumption is: each element of $J_{\infty, 0}$ must
be the Taylor series of a $C^\infty$ vector field in $L_0$. It can be proved
(Broer 1981) that the following structures respect the grading and satisfy the
Borel property: being an $r$-parameter family, respecting a volume form on
$\mathbb{R}^n$, being a Hamiltonian vector field ($n$ even), and being
reversible for a linear involution.

One could consider other types of grading of the Lie algebras involved.

This method, using the framework of the so-called filtered Lie algebras, is
explained and developed systematically in a more general and abstract context
in Broer (1981).

In nonlocal bifurcations, such as near a homoclinic loop, for example, it is
not enough to perform central manifold reduction near the singularity: a
simplified smooth model in a full neighborhood of the singularity is often
needed, for example, in order to compute Poincaré maps.

Let us start with the “purely” hyperbolic case (i.e., $\dim E^c = 0$). First we
compute the formal normal form such as the above. If there are no resonances
[2] then we can formally linearize the vector field $X$. If $X$ is $C^\infty$
then a classical theorem of Sternberg (1958) states that this linearization can
be realized by a $C^\infty$ change of variables (i.e., no more flat terms
remaining). In case there are resonances, we must allow nonlinear terms: the
resonant monomials. In this case, too, $X$ can be reduced to this normal form by
a $C^\infty$ change of variables. Using the same methods, it is also possible to
reduce to a polynomial normal form, but this time using $C^k$ ($k < \infty$)
changes of variables. More precisely, if $k$ is a given number and if we write
the vector field as $X = X^N + R^N$, where $X^N$ is the Taylor polynomial up to
order $N$ (which can be assumed to be in normal form) and where
$R^N(x) = O(|x|^{N+1})$, then for $N$ sufficiently large there is a $C^k$ change
of variables conjugating $X$ to $X^N$ near 0. The number $N$ depends on the
spectrum of $A = dX(0)$. An elegant proof of these facts can be found in
Il’yashenko and Yakovenko (1991). For the case when extra structure must be
preserved, see Bonckaert (1997), which also deals with the partially hyperbolic
case ($\dim E^c \ge 1$). As already remarked above, the case of a
parameter-dependent family can be regarded as a partially hyperbolic stationary
point preserving this extra structure.

The question of an analytic normal form, also in the hyperbolic case, leads to
convergence questions and calls upon the so-called small-divisor problems. The
classical results are due to Poincaré and Siegel. Let us summarize them; they
are formulated in the complex analytic setting:

Theorem 4
(i) If the convex hull of the spectrum of $A$ does not contain
    $0 \in \mathbb{C}$ then $X$ can locally be put into normal form by an
    analytic change of variables. Moreover, this normal form is polynomial.
(ii) If the spectrum $\{\lambda_1, \ldots, \lambda_n\}$ of $A$ satisfies the
    condition that there exist $C > 0$ and $\nu > 0$ such that for any
    $m \in \mathbb{N}^n$ with $\sum_j m_j \ge 2$:

    $$|\langle (\lambda_1, \ldots, \lambda_n), m \rangle - \lambda_j| \ge \frac{C}{|m|^{\nu}} \qquad [5]$$

    for $1 \le j \le n$ then $X$ can be locally linearized by an analytic change
    of variables.

Note that case (i) contains the case where 0 is a hyperbolic source or sink.
This case (i) in Theorem 4 can be extended if there are parameters: if $X$
depends analytically on a parameter $\varepsilon \in \mathbb{C}^p$ near
$\varepsilon = 0$ then the change of variables is also analytic in
$\varepsilon$; moreover, the normal form is then a polynomial in the space
variables whose coefficients are analytically dependent on the parameter
$\varepsilon$.

For case (ii) this is surely not the case, since the condition [5] is fragile:
a small distortion of the parameter generically causes resonances, be it of a
high order. To fix ideas, consider $n = 2$ and suppose
$\lambda_1 < 0 < \lambda_2$. By a generic but arbitrarily small perturbation, we
can have that the ratio of these eigenvalues becomes a negative rational number
$-p/q$, which gives a resonance of the form [2] with $j = 1$ and
$\alpha = (q + 1, p)$, so [5] is violated. So analytic linearization, or even a
polynomial analytic normal form, is ungeneric for families of such hyperbolic
stationary points. The search for analytic normal forms, that is, simplified
models, for families is still under investigation. A first simplification is
obtained via the stable and unstable manifold from Theorem 1, that is, the
graphs of $\phi_{ss}$ and $\phi_{uu}$. When $X$ is analytic near 0 then these
manifolds are also analytic. So, up to an analytic change of variables, we can
assume that $E^s$ and $E^u$ are invariant, which gives a simplification of the
expression of $X$. Moreover, there is analytic dependence on parameters.

For local diffeomorphisms there are completely similar theorems pertaining to
all the cases considered above.

Concluding Remarks

The concept of central manifold can be extended to more general invariant sets
(see Chow et al. (2000) and references therein). It can also be extended to the
infinite-dimensional case and can be applied to partial differential equations
(Vanderbauwhede and Iooss 1992).

Concerning the generic divergence of normalizing transformations, the reader is
referred to Broer and Takens (1989), Bruno (1989), Il’yashenko (1981), and
Il’yashenko and Pyartli (1986). Although the power series giving the
normalizing transformation generally diverges, the study of the dynamics is
often performed by truncating the normal form at a certain order. Recently,
Iooss and Lombardi (2005) considered the question as to what an optimal
truncation is. It is shown, in case $dX(0)$ is semisimple, that the order of the
normal form can be optimized so that the remainder satisfies some estimate
shrinking exponentially fast to zero as a function of the radius of the domain.

Concerning normal forms preserving the Hamiltonian structure, see Birkhoff
(1966) and Siegel and Moser (1995) for a starting point; this is an extended
subject on its own, sometimes called Birkhoff normal form, and it would require
another review article.

Further simplifications of the normal form can sometimes be obtained by taking
into account nonlinear terms (instead of just $A$) in order to obtain
reductions of higher-order terms (see Gaeta (2002) and especially the
references therein).

Applications of normal forms and central manifolds to bifurcation theory have
been explained in Dumortier (1991).

See also: Averaging Methods; Bifurcation Theory; Dynamical Systems and
Thermodynamics; Dynamical Systems in Mathematical Physics: An Illustration from
Water Waves; Finite Group Symmetry Breaking; Korteweg–de Vries Equation and
Other Modulation Equations; Multiscale Approaches; Normal Forms and
Semiclassical Approximation; Symmetry and Symmetry Breaking in Dynamical
Systems.

Further Reading

Arrowsmith D and Place C (1990) Dynamical Systems. Cambridge: Cambridge
University Press.
Aulbach B (1992) One-dimensional center manifolds are $C^\infty$. Results in
Mathematics 21: 3–11.
Birkhoff GD (1966) Dynamical Systems. With an addendum by Jürgen Moser. American
Mathematical Society Colloquium Publications, vol. IX. Providence, RI: American
Mathematical Society.
Bonckaert P (1997) Conjugacy of vector fields respecting additional properties.
Journal of Dynamical and Control Systems 3: 419–432.
Bonckaert P (2000) Symmetric and reversible families of vector fields near a
partially hyperbolic singularity. Ergodic Theory and Dynamical Systems 20:
1627–1638.
Broer H (1981) Formal normal forms for vector fields and some consequences for
bifurcations in the volume preserving case. In: Dynamical Systems and
Turbulence, Warwick 1980, vol. 898, Lecture Notes in Mathematics. New York:
Springer.
Broer H and Takens F (1989) Formally symmetric normal forms and genericity.
Dynamics Reported. A Series in Dynamical Systems and their Applications 2:
11–18.
Bruno AD (1989) Local Methods in Nonlinear Differential Equations. New York:
Springer.
Chow S-N, Li C, and Wang D (1994) Normal Forms and Bifurcations of Planar Vector
Fields. Cambridge: Cambridge University Press.
Chow S-N, Liu W, and Yi Y (2000) Center manifolds for invariant sets. Journal of
Differential Equations 168: 355–385.
Dumortier F (1991) Local study of planar vector fields: singularities and their
unfoldings. In: Van Groesen E and De Jager EM (eds.) Structures in Dynamics,
Studies in Mathematical Physics, vol. 2, pp. 161–241. Amsterdam: Elsevier.
Gaeta G (2002) Poincaré normal and renormalized forms. Acta Applicandae
Mathematicae 70(1–3): 113–131 (symmetry and perturbation theory).
Il’yashenko YS (1981) In the theory of normal forms of analytic differential
equations violating the conditions of Bryuno divergence is the rule and
convergence the exception. Moscow University Mathematical Bulletin 36(2): 11–18.
Il’yashenko YS and Pyartli AS (1986) Materialization of resonances and
divergence of normalizing series for polynomial differential equations. Journal
of Mathematical Sciences 32(3): 300–313.
Il’yashenko YS and Yakovenko SY (1991) Finitely smooth normal forms of local
families of diffeomorphisms and vector fields. Russian Mathematical Surveys 46:
1–43.
Iooss G and Lombardi E (2005) Polynomial normal forms with exponentially small
remainder for analytic vector fields. Journal of Differential Equations 212:
1–61.
Palis J and Takens F (1977) Topological equivalence of normally hyperbolic
dynamical systems. Topology 16(4): 335–345.
Siegel CL and Moser JK (1971) Lectures on Celestial Mechanics, (reprint 1995).
Berlin: Springer.
Sternberg S (1958) On the structure of local homeomorphisms of Euclidean
n-space. II. American Journal of Mathematics 80: 623–631.
Takens F (1971) Partially hyperbolic fixed points. Topology 10: 133–147.
Vanderbauwhede A (1989) Center manifolds, normal forms and elementary
bifurcations. In: Kirchgraber U and Walther O (eds.) Dynamics Reported, vol. 2,
pp. 89–169. New York: Wiley.
Vanderbauwhede A and Iooss G (1992) Center manifold theory in infinite
dimensions. In: Jones CKRT et al. (eds.) Dynamics Reported, vol. 1, New Series,
pp. 125–163. Berlin: Springer.

Channels in Quantum Information Theory


M Keyl, Università di Pavia, Pavia, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Consider a typical quantum system such as a string of ions in a trap. To
“process” the quantum information the ions carry, we have to perform in general
many steps of a quite different nature. Typical examples are: free time
evolution (including unwanted but unavoidable interactions with the
environment), controlled time evolution (e.g., the application of a “quantum
gate” in a quantum computer), preparations and measurements. Each processing
step can be described by a channel which transforms input systems into output
systems of a possibly different type (e.g., a measurement transforms quantum
systems into classical information).

Systems, States, and Algebras

To get a unified mathematical description of systems of different physical
nature, it is useful to consider $C^*$-algebras (which are, in our case, always
finite dimensional): quantum systems can be represented in terms of the algebra
$\mathcal{B}(\mathcal{H})$ of (bounded) operators on the Hilbert space
$\mathcal{H} = \mathbb{C}^d$; for classical information we have to choose the
set $\mathcal{C}(X)$ of (continuous), complex-valued functions on the finite
alphabet $X$; and the tensor product of both
$\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ describes hybrid systems which
are half-classical and half-quantum. Assume now that $\mathcal{A}$ is one of
these algebras. Effects (i.e., yes/no measurements on the system in question)
are then described by $A \in \mathcal{A}$ satisfying $0 \le A \le \mathbf{1}$,
states are positive, normalized linear functionals
$\omega : \mathcal{A} \to \mathbb{C}$, and the probability to get the result
“yes” during an $A$ measurement on a system in the state $\omega$ is given by
$\omega(A)$. Since $\mathcal{A}$ is assumed to be finite dimensional, each state
$\omega$ on $\mathcal{B}(\mathcal{H})$ is represented by a density operator
$\rho$, that is, $\omega(A) = \operatorname{tr}(\rho A)$. Likewise, a state
$\omega$ on $\mathcal{C}(X)$ has the form $\omega(A) = \sum_x A(x)\, p_x$, where
$(p_x)_{x \in X}$ denotes a probability distribution on $X$, and a state
$\omega$ on $\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ is described by a
sequence $(\rho_x)_{x \in X}$ of positive (trace-class) operators on
$\mathcal{H}$ with $\sum_x \operatorname{tr}(\rho_x) = 1$ such that
$\omega(A) = \sum_x \operatorname{tr}(\rho_x A_x)$. Here
we have used the fact that an element
$A \in \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ can be represented in a
canonical way by a sequence $(A_x)_{x \in X}$ of operators on $\mathcal{H}$. The
set of states will be denoted in the following by $\mathcal{S}(\mathcal{A})$ and
the set of effects by $\mathcal{E}(\mathcal{A})$.

Completely Positive Maps

Our aim is now to get a mathematical object which can be used to describe a
channel. To this end, consider two $C^*$-algebras, $\mathcal{A}$, $\mathcal{B}$,
describing the input and output system, respectively, and an effect
$A \in \mathcal{B}$ of the output system. If we invoke first a channel which
transforms $\mathcal{A}$ systems into $\mathcal{B}$ systems, and measure $A$
afterwards on the output systems, we end up with a measurement of an effect
$T(A)$ on the input systems. Hence, we get a map
$T : \mathcal{E}(\mathcal{B}) \to \mathcal{E}(\mathcal{A})$ which completely
describes the channel (note that the direction of the mapping arrow is reversed
compared to the natural ordering of processing). Alternatively, we can look at
the states and interpret a channel as a map
$T^* : \mathcal{S}(\mathcal{A}) \to \mathcal{S}(\mathcal{B})$ which transforms
$\mathcal{A}$ systems in the state $\rho \in \mathcal{S}(\mathcal{A})$ into
$\mathcal{B}$ systems in the state $T^*(\rho)$. To distinguish between both
maps, we can say that $T$ describes the channel in the Heisenberg picture and
$T^*$ in the Schrödinger picture. On the level of the statistical
interpretation, both points of view should coincide of course, that is, the
probabilities $(T^*\rho)(A)$ and $\rho(TA)$ to get the result “yes” during an
$A$ measurement on $\mathcal{B}$ systems in the state $T^*\rho$, respectively a
$TA$ measurement on $\mathcal{A}$ systems in the state $\rho$, should be the
same. Since $(T^*\rho)(A)$ is linear in $A$, we see immediately that $T$ must be
an affine map, that is,
$T(\lambda_1 A_1 + \lambda_2 A_2) = \lambda_1 T(A_1) + \lambda_2 T(A_2)$ for each
convex linear combination $\lambda_1 A_1 + \lambda_2 A_2$ of effects in
$\mathcal{B}$, and this in turn implies that $T$ can be extended naturally to a
linear map, which we will identify in the following with the channel itself,
that is, we say that $T$ is the channel.

Let us now change slightly our point of view and start with a linear operator
$T : \mathcal{A} \to \mathcal{B}$. To be a channel, $T$ must map effects to
effects, that is, $T$ has to be positive: $T(A) \ge 0 \;\forall A \ge 0$ and
bounded from above by $\mathbf{1}$, that is, $T(\mathbf{1}) \le \mathbf{1}$. In
addition, it is natural to require that two channels in parallel are again a
channel. More precisely, if two channels $T : \mathcal{A}_1 \to \mathcal{B}_1$
and $S : \mathcal{A}_2 \to \mathcal{B}_2$ are given, we can consider the map
$T \otimes S$ which associates to each
$A \otimes B \in \mathcal{A}_1 \otimes \mathcal{A}_2$ the tensor product
$T(A) \otimes S(B) \in \mathcal{B}_1 \otimes \mathcal{B}_2$. It is natural to
assume that $T \otimes S$ is a channel which converts composite systems of type
$\mathcal{A}_1 \otimes \mathcal{A}_2$ into $\mathcal{B}_1 \otimes \mathcal{B}_2$
systems. Hence, $S \otimes T$ should be positive as well.

Definition 1 Consider two observable algebras $\mathcal{A}$, $\mathcal{B}$ and a
linear map $T : \mathcal{A} \to \mathcal{B} \subset \mathcal{B}(\mathcal{H})$.
(i) $T$ is called positive if $T(A) \ge 0$ holds for all positive
    $A \in \mathcal{A}$.
(ii) $T$ is called completely positive (CP) if
    $T \otimes \mathrm{Id} : \mathcal{A} \otimes \mathcal{B}(\mathbb{C}^n) \to \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathbb{C}^n)$
    is positive for all $n \in \mathbb{N}$. Here $\mathrm{Id}$ denotes the
    identity map on $\mathcal{B}(\mathbb{C}^n)$.
(iii) $T$ is called unital if $T(\mathbf{1}) = \mathbf{1}$ holds.

Consider now the map $T^* : \mathcal{B}^* \to \mathcal{A}^*$ which is dual to
$T$, that is, $T^*\rho(A) = \rho(TA)$ for all $\rho \in \mathcal{B}^*$ and
$A \in \mathcal{A}$. It is called the Schrödinger-picture representation of the
channel $T$, since it maps states to states provided $T$ is unital. (Complete)
positivity can be defined in the Schrödinger picture as in the Heisenberg
picture, and we immediately see that $T$ is (completely) positive iff $T^*$ is.

It is natural to ask whether the distinction between positivity and complete
positivity is really necessary, that is, whether there are positive maps which
are not CP. If at least one of the algebras $\mathcal{A}$ or $\mathcal{B}$ is
classical, the answer is no: each positive map is CP in this case. If both
algebras are quantum however, complete positivity is not implied by positivity
alone. The most prominent example for this fact is the transposition map.

If item (ii) holds only for a fixed $n \in \mathbb{N}$, the map $T$ is called
$n$-positive. This is obviously a weaker condition than complete positivity.
However, $n$-positivity implies $m$-positivity for all $m \le n$, and for
$\mathcal{A} = \mathcal{B}(\mathbb{C}^d)$ complete positivity is implied by
$n$-positivity, provided $n \ge d$ holds.

Let us consider now the question whether a channel should be unital or not. We
have already mentioned that $T(\mathbf{1}) \le \mathbf{1}$ must hold since
effects should be mapped to effects. If $T(\mathbf{1})$ is not equal to
$\mathbf{1}$, we get $\rho(T\mathbf{1}) = T^*\rho(\mathbf{1}) < 1$ for the
probability to measure the effect $\mathbf{1}$ on systems in the state
$T^*\rho$, but this is impossible for channels which produce an output with
certainty, because $\mathbf{1}$ is the effect which is always true. In other
words, if a CP map is not unital, it describes a channel which sometimes
produces no output at all and $T(\mathbf{1})$ is the effect which measures
whether we have got an output. We will assume henceforth that channels are
unital if nothing else is explicitly stated.

Quantum Channels

In this section we will discuss some basic properties of CP maps which transform
quantum systems into quantum systems, in particular the Stinespring theorem,
which constitutes the most important structural result. For a more detailed
presentation, including generalizations to more general input/

output algebras the reader should consult the textbook by Paulsen (2002).

The Stinespring Theorem

Hence consider channels between quantum systems, i.e., $\mathcal{A} = \mathcal{B}(\mathcal{H}_1)$ and $\mathcal{B} = \mathcal{B}(\mathcal{H}_2)$. A fairly simple example (not necessarily unital) is given in terms of an operator $V : \mathcal{H}_1 \to \mathcal{H}_2$ by $\mathcal{B}(\mathcal{H}_1) \ni A \mapsto VAV^* \in \mathcal{B}(\mathcal{H}_2)$. A second example is the restriction to a subsystem, which is given in the Heisenberg picture by $\mathcal{B}(\mathcal{H}) \ni A \mapsto A \otimes \mathbb{1}_{\mathcal{K}} \in \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$. Finally, the composition $S \circ T = ST$ of two channels is again a channel. The following theorem says that each channel can be represented as a composition of these two examples [7].

Theorem 2 (Stinespring dilation theorem). Every completely positive map $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ has the form

$T(A) = V^*(A \otimes \mathbb{1}_{\mathcal{K}})V$   [1]

with an additional Hilbert space $\mathcal{K}$ and an operator $V : \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$. Both (i.e., $\mathcal{K}$ and $V$) can be chosen such that the span of all $(A \otimes \mathbb{1})V\phi$ with $A \in \mathcal{B}(\mathcal{H}_1)$ and $\phi \in \mathcal{H}_2$ is dense in $\mathcal{H}_1 \otimes \mathcal{K}$. This particular decomposition is unique (up to unitary equivalence) and is called the minimal decomposition.

By introducing a family $|\chi_j\rangle\langle\chi_j|$ of one-dimensional projectors with $\sum_j |\chi_j\rangle\langle\chi_j| = \mathbb{1}$, we can define the "Kraus operators" $\langle\psi, V_j\phi\rangle = \langle\psi \otimes \chi_j, V\phi\rangle$. In terms of these, we can rewrite eqn [1] in the following form (Kraus 1983):

Corollary 3 (Kraus form). Every CP map $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ can be written in the form

$T(A) = \sum_{j=1}^{N} V_j^* A V_j$   [2]

with operators $V_j : \mathcal{H}_2 \to \mathcal{H}_1$.

To get a third representation of channels, consider the Stinespring form [1] of $T$ and a vector $\psi \in \mathcal{K}$ such that $U(\phi \otimes \psi) = V(\phi)$ can be extended to a unitary map $U : \mathcal{H} \otimes \mathcal{K} \to \mathcal{H} \otimes \mathcal{K}$. It is then easy to see that the dual $T^*$ of $T$ can be written as:

Corollary 4 (Ancilla form). Assume that $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ is a channel. Then there is a Hilbert space $\mathcal{K}$, a pure state $\rho_0$, and a unitary map $U : \mathcal{H} \otimes \mathcal{K} \to \mathcal{H} \otimes \mathcal{K}$ such that

$T^*(\rho) = \mathrm{tr}_{\mathcal{K}}\big(U(\rho \otimes \rho_0)U^*\big)$   [3]

holds.

This representation of a channel has a (seemingly) very nice physical interpretation, because we can look at eqn [3] as the unitary interaction of the system with an unobservable environment, which is initially in the state $\rho_0$. The problem, however, is that there is a great arbitrariness in the choice of $U$ and $\rho_0$. This is the weakness of the ancilla form compared to the Stinespring representation.

Finally, let us state a related result. It characterizes all decompositions of a given completely positive map into completely positive summands. By analogy with results for states on abelian algebras (i.e., probability measures), we will call it a Radon–Nikodym theorem (see Arveson (1969) for a proof).

Theorem 5 (Radon–Nikodym theorem). Let $T_x : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$, $x \in X$, be a family of CP maps and let $V : \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$ be the Stinespring operator of $T = \sum_x T_x$; then there are uniquely determined positive operators $F_x$ in $\mathcal{B}(\mathcal{K})$ with $\sum_x F_x = \mathbb{1}$ and

$T_x(A) = V^*(A \otimes F_x)V$   [4]

The Jamiołkowski Isomorphism

The subject of this section is a relation between CP maps and states of bipartite systems, first discovered by Jamiołkowski (1972), which is very useful in translating properties of bipartite systems into properties of positive maps and vice versa.

The idea is based on the following setup. Alice and Bob share a bipartite system in a maximally entangled state

$\psi = \frac{1}{\sqrt{d}} \sum_{\alpha=1}^{d} e_\alpha \otimes e_\alpha \in \mathcal{H} \otimes \mathcal{H}$   [5]

(where $e_1, \ldots, e_d$ denote an orthonormal basis of $\mathcal{H}$). Alice applies to her subsystem a channel $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$ while Bob does nothing. At the end of the processing, the overall system ends up in a state

$R_T = (T \otimes \mathrm{Id})|\psi\rangle\langle\psi|$   [6]

Mathematically, eqn [6] makes sense if $T$ is only linear but not necessarily positive or CP (but then $R_T$ is not positive either). If we denote the space of all linear maps from $\mathcal{B}(\mathcal{H})$ into $\mathcal{B}(\mathcal{H}')$ by $\mathcal{L}$, we get a map

$\mathcal{L} \ni T \mapsto R_T \in \mathcal{B}(\mathcal{K} \otimes \mathcal{H})$   [7]

which is easily shown to be linear (i.e., $R_{\lambda T + \mu S} = \lambda R_T + \mu R_S$ for all $\lambda, \mu \in \mathbb{C}$ and all $T, S \in \mathcal{L}$). Furthermore, this map is bijective, hence a linear isomorphism.
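The construction in eqns [5] and [6] can be checked numerically in small dimensions. The following is a minimal sketch, assuming NumPy is available; the helper names (`matrix_unit`, `jamiolkowski_operator`, `kraus_map`, `min_eigenvalue`) and the particular Kraus operators are illustrative choices, not part of the article. It builds $R_T$ for a map given in Kraus form and for the transposition map, and compares the smallest eigenvalues, in line with the remark that $R_T$ fails to be positive when $T$ is not CP.

```python
import numpy as np


def matrix_unit(d, a, b):
    """Return |e_a><e_b| as a d x d matrix."""
    E = np.zeros((d, d), dtype=complex)
    E[a, b] = 1.0
    return E


def jamiolkowski_operator(T, d):
    """Return R_T = (T (x) Id)|psi><psi| for a linear map T on d x d matrices.

    Uses |psi><psi| = (1/d) sum_{a,b} |e_a><e_b| (x) |e_a><e_b|, cf. eqn [5].
    """
    terms = [
        np.kron(T(matrix_unit(d, a, b)), matrix_unit(d, a, b))
        for a in range(d)
        for b in range(d)
    ]
    return sum(terms) / d


def kraus_map(kraus_ops):
    """CP map A -> sum_j V_j^* A V_j, as in the Kraus form (eqn [2])."""
    return lambda A: sum(V.conj().T @ A @ V for V in kraus_ops)


def min_eigenvalue(R):
    """Smallest eigenvalue of the Hermitian part of R."""
    return float(np.linalg.eigvalsh((R + R.conj().T) / 2).min())


# Illustrative CP map (hypothetical choice): two Kraus operators with
# V0^* V0 + V1^* V1 = 1, so the map is unital in the Heisenberg picture.
g = 0.3
V0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - g)]])
V1 = np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])
R_cp = jamiolkowski_operator(kraus_map([V0, V1]), d=2)

# Transposition: positive but not completely positive.
R_transpose = jamiolkowski_operator(lambda A: A.T, d=2)

print(min_eigenvalue(R_cp))         # non-negative up to rounding
print(min_eigenvalue(R_transpose))  # approximately -0.5
```

For the transposition map the operator $R_T$ is the swap operator divided by $d$, whose eigenvalues are $\pm 1/d$; this is why a negative eigenvalue appears for $d = 2$.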

Theorem 6 The map defined in eqns [7] and [6] is a linear isomorphism. The inverse map is given by

$\mathcal{B}(\mathcal{H} \otimes \mathcal{H}') \ni \rho \mapsto T_\rho \in \mathcal{L}$   [8]

with

$\langle e'_\mu, T_\rho(\sigma) e'_\nu \rangle = d \, \mathrm{tr}\big(\rho \, |e'_\nu\rangle\langle e'_\mu| \otimes \sigma^T\big)$   [9]

where $e'_1, \ldots, e'_{d'} \in \mathcal{H}'$ denote an (arbitrary) orthonormal basis of $\mathcal{H}'$ and the transposition of $\sigma$ is defined with respect to the basis $e_\alpha$, $\alpha = 1, \ldots, d$, used to define $\psi$ in [5].

From the definition of $R_T$ in eqn [6], it is obvious that $R_T$ is positive if $T$ is CP. To see that the converse is also true is not as trivial (because a transposition is involved), but it requires only a short calculation, which is omitted here. Hence, we get:

Corollary 7 The operator $R_T$ is positive iff the map $T$ is CP.

Examples

Let us return now to the general case (i.e., arbitrary input and output algebras) and discuss several examples.

Channels Under Symmetry

It is often useful to consider channels with special symmetry properties. To be more precise, consider a group $G$ and two unitary representations $\pi_1$, $\pi_2$ on the Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. A channel $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ is called covariant (with respect to $\pi_1$ and $\pi_2$) if

$T[\pi_1(U) A \pi_1(U)^*] = \pi_2(U) T[A] \pi_2(U)^* \quad \forall A \in \mathcal{B}(\mathcal{H}_1) \;\; \forall U \in G$   [10]

holds. The general structure of covariant channels is governed by a fairly powerful variant of Stinespring's theorem (Keyl and Werner 1999).

Theorem 8 Let $G$ be a group with finite-dimensional unitary representations $\pi_j : G \to U(\mathcal{H}_j)$ and $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ a $\pi_1, \pi_2$-covariant channel.
(i) Then there is a finite-dimensional unitary representation $\tilde{\pi} : G \to U(\mathcal{K})$ and an operator $V : \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$ with $V\pi_2(U) = \pi_1(U) \otimes \tilde{\pi}(U)V$ and $T(A) = V^*(A \otimes \mathbb{1})V$.
(ii) If $T = \sum_\alpha T^\alpha$ is a decomposition of $T$ in CP and covariant summands, there is a decomposition $\mathbb{1} = \sum_\alpha F^\alpha$ of the identity operator on $\mathcal{K}$ into positive operators $F^\alpha \in \mathcal{B}(\mathcal{K})$ with $[F^\alpha, \tilde{\pi}(g)] = 0$ such that $T^\alpha(X) = V^*(X \otimes F^\alpha)V$.

The most prominent examples of covariant channels arise with $\mathcal{H}_1 = \mathcal{H}_2 = \mathbb{C}^d$, $G = U(d)$ and $\pi_1(U) = \pi_2(U) = U$. All channels of this type are of the form

$T(A) = (1 - \vartheta)A + \vartheta d^{-1} \mathrm{tr}(A)\mathbb{1}$ with $\vartheta \in [0, d^2/(d^2 - 1)]$   [11]

and are known as "depolarizing channels." They often serve as a standard model for noise. Two particular cases are the ideal channel arising with $\vartheta = 0$, and the completely depolarizing channel ($\vartheta = 1$), which erases all information. If we choose $\pi_2(U) = \bar{U}$ (where the bar denotes complex conjugate) instead of $\pi_2(U) = U$, we get

$T(A) = \frac{\vartheta}{d+1}\big[\mathrm{tr}(A)\mathbb{1} + A^T\big] + \frac{1 - \vartheta}{d-1}\big[\mathrm{tr}(A)\mathbb{1} - A^T\big], \quad \vartheta \in [0, 1]$   [12]

If we map these channels to states of bipartite systems (using the Jamiołkowski isomorphism from the last section), we get "isotropic states" from eqn [11] and "Werner states" from [12].

Classical Channels

The classical analog to a quantum operation is a channel $T : \mathcal{C}(X) \to \mathcal{C}(Y)$ which describes the transmission or manipulation of classical information. As already mentioned in the subsection "Completely positive maps," positivity and complete positivity are equivalent in this case. Hence, we have to assume only that $T$ is positive and unital. Obviously, $T$ is characterized by its matrix elements $T_{xy} = \delta_y(Te_x)$, where $\delta_y \in \mathcal{C}^*(X)$ denotes the Dirac measure at $y \in Y$ and $e_x \in \mathcal{C}(X)$ is the canonical basis in $\mathcal{C}(X)$. More precisely, $\delta_y$ and $e_x$ denote, respectively, the probability distribution and the function on $X$ given by

$\delta_y = (\delta_{xy})_{x \in X}$ and $e_x(y) = \delta_{xy}$   [13]

We will keep this notation up to the end of this article. Positivity and normalization of $T$ imply that $0 \leq T_{xy} \leq 1$ and

$1 = \delta_y(\mathbb{1}) = \delta_y(T(\mathbb{1})) = \delta_y\Big[T\Big(\sum_x e_x\Big)\Big] = \sum_x T_{xy}$   [14]

holds. Hence the family $(T_{xy})_{x \in X}$ is a probability distribution on $X$, and $T_{xy}$ is, therefore, the transition probability to get the information $x \in X$ at the output side of the channel if $y \in Y$ was sent.
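The matrix description of a classical channel in eqns [13] and [14] is simply a column-stochastic matrix. The sketch below, assuming NumPy and using hypothetical helper names and a made-up example matrix, stores $T_{xy}$ as `T[x, y]`, checks the conditions $0 \leq T_{xy} \leq 1$ and $\sum_x T_{xy} = 1$, and applies the channel both to functions (Heisenberg picture) and to probability distributions (Schrödinger picture).

```python
import numpy as np


def is_classical_channel(T, tol=1e-12):
    """Check 0 <= T[x, y] <= 1 and sum_x T[x, y] = 1 for every y (eqn [14])."""
    T = np.asarray(T, dtype=float)
    in_range = np.all(T >= -tol) and np.all(T <= 1.0 + tol)
    normalized = np.allclose(T.sum(axis=0), 1.0, atol=tol)
    return bool(in_range and normalized)


def heisenberg(T, f):
    """Map a function f on X to the function (Tf)(y) = sum_x T[x, y] f(x) on Y."""
    return np.asarray(T, dtype=float).T @ np.asarray(f, dtype=float)


def schroedinger(T, p):
    """Map a distribution p on Y to the distribution sum_y T[x, y] p(y) on X."""
    return np.asarray(T, dtype=float) @ np.asarray(p, dtype=float)


# Hypothetical example with |X| = 3 output letters and |Y| = 2 input letters.
T = np.array([[0.7, 0.1],
              [0.2, 0.3],
              [0.1, 0.6]])

print(is_classical_channel(T))         # True
print(heisenberg(T, [1.0, 1.0, 1.0]))  # unital: the constant function 1 is preserved
print(schroedinger(T, [1.0, 0.0]))     # distribution of x when y = 0 was sent
```

The two pictures are dual to each other: for any distribution `p` on $Y$ and function `f` on $X$, `p @ heisenberg(T, f)` equals `schroedinger(T, p) @ f`, which is the classical counterpart of $(T^*\rho)(A) = \rho(TA)$.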

   
Observables

Let us consider now a channel which transforms quantum information $\mathcal{B}(\mathcal{H})$ into classical information $\mathcal{C}(X)$. Since positivity and complete positivity are again equivalent, we just have to look at a positive and unital map $E : \mathcal{C}(X) \to \mathcal{B}(\mathcal{H})$. With the canonical basis $e_x$, $x \in X$, of $\mathcal{C}(X)$, we get a family $E_x = E(e_x)$, $x \in X$, of positive operators $E_x \in \mathcal{B}(\mathcal{H})$ with $\sum_{x \in X} E_x = \mathbb{1}$. Hence, the $E_x$ form a positive operator valued (POV) measure, i.e., an observable. If, on the other hand, a POV measure $E_x \in \mathcal{B}(\mathcal{H})$, $x \in X$, is given, we can define a quantum-to-classical channel $E : \mathcal{C}(X) \to \mathcal{B}(\mathcal{H})$ by

$E(f) = \sum_{x \in X} f(x) E_x$   [15]

This shows that the observable $E_x$, $x \in X$, and the channel $E$ can be identified.

Preparations

Let us now exchange the role of $\mathcal{C}(X)$ and $\mathcal{B}(\mathcal{H})$; in other words, let us consider a channel $R : \mathcal{B}(\mathcal{H}) \to \mathcal{C}(X)$ with a classical input and a quantum output algebra. In the Schrödinger picture, we get a family of density matrices $\rho_x := R^*(\delta_x) \in \mathcal{B}^*(\mathcal{H})$, $x \in X$, where $\delta_x \in \mathcal{C}^*(X)$ denotes again the Dirac measure on $X$. Hence, we get a parameter-dependent preparation that can be used to encode the classical information $x \in X$ into the quantum information $\rho_x \in \mathcal{B}^*(\mathcal{H})$.

Instruments

An observable describes only the statistics of measuring results, but does not contain information about the state of the system after the measurement. To get a description which fills this gap, we have to consider channels which operate on quantum systems and produce hybrid systems as output, that is, $T : \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X) \to \mathcal{B}(\mathcal{K})$. Following Davies (1976), we will call such an object an instrument. From $T$ we can derive the subchannel

$\mathcal{C}(X) \ni f \mapsto T(\mathbb{1} \otimes f) \in \mathcal{B}(\mathcal{K})$   [16]

which is the observable measured by $T$, that is, $\mathrm{tr}(T(\mathbb{1} \otimes e_x)\rho)$ is the probability to measure $x \in X$ on systems in the state $\rho$. On the other hand, we get for each $x \in X$ a quantum channel (which is not unital)

$\mathcal{B}(\mathcal{H}) \ni A \mapsto T_x(A) = T(A \otimes e_x) \in \mathcal{B}(\mathcal{K})$   [17]

It describes the operation performed by the instrument $T$ if $x \in X$ was measured. More precisely, if a measurement on systems in the state $\rho$ gives the result $x \in X$, we get (up to normalization) the state $T_x^*(\rho)$ after the measurement, while

$\mathrm{tr}\big(T_x^*(\rho)\big) = \mathrm{tr}\big(T_x^*(\rho)\mathbb{1}\big) = \mathrm{tr}\big(\rho\, T(\mathbb{1} \otimes e_x)\big)$   [18]

is (again) the probability to measure $x \in X$ on $\rho$. The instrument $T$ can be expressed in terms of the operations $T_x$ by

$T(A \otimes f) = \sum_x f(x) T_x(A)$   [19]

Hence, we can identify $T$ with the family $T_x$, $x \in X$. Finally, we can consider the second marginal of $T$

$\mathcal{B}(\mathcal{H}) \ni A \mapsto T(A \otimes \mathbb{1}) = \sum_{x \in X} T_x(A) \in \mathcal{B}(\mathcal{K})$   [20]

It describes the operation we get if the outcome of the measurement is ignored.

The best-known example of an instrument is a von Neumann–Lüders measurement associated with a PV measure given by a family of projections $E_x$, $x = 1, \ldots, d$; for example, the eigenprojections of a self-adjoint operator $A \in \mathcal{B}(\mathcal{H})$. It is defined as the channel

$T : \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X) \to \mathcal{B}(\mathcal{H})$ with $X = \{1, \ldots, d\}$ and $T_x(A) = E_x A E_x$   [21]

Hence, we get the final state $\mathrm{tr}(E_x\rho)^{-1} E_x \rho E_x$ if we measure the value $x \in X$ on systems initially in the state $\rho$; this is well known from quantum mechanics.

Parameter-Dependent Operations

Let us change now the role of $\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ and $\mathcal{B}(\mathcal{K})$; in other words, consider a channel $T : \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ with hybrid input and quantum output. It describes a device which changes the state of a system depending on the additional classical information. As for an instrument, $T$ decomposes into a family of (unital!) channels $T_x : \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H})$ such that we get $T^*(\rho \otimes p) = \sum_x p_x T_x^*(\rho)$ in the Schrödinger picture. Physically, $T$ describes a parameter-dependent operation: depending on the classical information $x \in X$, the quantum information $\rho \in \mathcal{B}(\mathcal{K})$ is transformed by the operation $T_x$.

Finally, we can consider a channel $T : \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X) \to \mathcal{B}(\mathcal{K}) \otimes \mathcal{C}(Y)$ with hybrid input and output to get a parameter-dependent instrument: similarly to the above discussion, we can define a family of instruments $T_y : \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X) \to \mathcal{B}(\mathcal{K})$, $y \in Y$, by the equation $T^*(\rho \otimes p) = \sum_y p_y T_y^*(\rho)$. Physically, $T$ describes the following device: it receives the classical information $y \in Y$ and a quantum system in the state $\rho \in \mathcal{B}^*(\mathcal{K})$ as input. Depending on $y$, a measurement with the instrument $T_y$ is performed, which in turn produces the measuring value $x \in X$ and leaves the quantum system in the state (up to normalization) $T_{y,x}^*(\rho)$, with $T_{y,x}$ given as in eqn [17] by $T_{y,x}(A) = T_y(A \otimes e_x)$.
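The instrument formalism of eqns [16] to [21] can be made concrete for a von Neumann–Lüders measurement. The following is a minimal sketch, assuming NumPy; all function names and the example state are illustrative and not taken from the article. It evaluates the measured observable, the outcome probabilities, the post-measurement states, and the second marginal for a projective qubit measurement.

```python
import numpy as np


def observable(projections, f):
    """Measured observable T(1 (x) f) = sum_x f(x) E_x (eqns [16] and [19])."""
    return sum(fx * E for fx, E in zip(f, projections))


def outcome_probability(rho, E):
    """Probability tr(rho E_x) to obtain the outcome x in the state rho."""
    return float(np.real(np.trace(rho @ E)))


def post_measurement_state(rho, E):
    """Normalized state E_x rho E_x / tr(E_x rho); assumes a nonzero probability."""
    return (E @ rho @ E) / outcome_probability(rho, E)


def second_marginal(projections, A):
    """Operation T(A (x) 1) = sum_x E_x A E_x with the outcome ignored (eqn [20])."""
    return sum(E @ A @ E for E in projections)


# Projective measurement in the computational basis of a qubit.
E0 = np.array([[1.0, 0.0], [0.0, 0.0]])
E1 = np.array([[0.0, 0.0], [0.0, 1.0]])
projections = [E0, E1]

# Example state |+><+| (an arbitrary choice for illustration).
rho = np.array([[0.5, 0.5], [0.5, 0.5]])

print([outcome_probability(rho, E) for E in projections])  # [0.5, 0.5]
print(post_measurement_state(rho, E0))                     # the projector E0
print(second_marginal(projections, np.array([[1.0, 2.0], [3.0, 4.0]])))
```

The second marginal removes the off-diagonal entries of its argument in the measurement basis, which is the familiar decoherence effect of a measurement whose result is discarded.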

See also: Capacities Enhanced by Entanglement; Capacity for Quantum Information; Entanglement; Optimal Cloning of Quantum States; Positive Maps on C*-Algebras; Quantum Channels: Classical Capacity; Quantum Dynamical Semigroups; Quantum Entropy; Quantum Spin Systems; Source Coding in Quantum Information Theory.

Further Reading

Arveson W (1969) Subalgebras of C*-algebras. Acta Mathematica 123: 141–224.
Davies EB (1976) Quantum Theory of Open Systems. London: Academic Press.
Jamiołkowski A (1972) Linear transformations which preserve trace and positive semidefiniteness of operators. Reports on Mathematical Physics 3: 275–278.
Keyl M and Werner RF (1999) Optimal cloning of pure states, testing single clones. Journal of Mathematical Physics 40: 3283–3299.
Kraus K (1983) States Effects and Operations. Berlin: Springer.
Paulsen VI (2002) Completely Bounded Maps and Dilations. Cambridge: Cambridge University Press.
Stinespring WF (1955) Positive functions on C*-algebras. Proceedings of the American Mathematical Society 6: 211–216.

Chaos and Attractors


R Gilmore, Drexel University, Philadelphia, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Chaos is a type of behavior that can be exhibited by a large class of physical systems and their mathematical models. These systems are deterministic. They are modeled by sets of coupled nonlinear ordinary differential equations (ODEs):

$\dot{x}_i = \frac{dx_i}{dt} = f_i(x; c)$   [1]

called dynamical systems. The coordinates $x$ designate points in a state space or phase space. Typically, $x \in \mathbb{R}^n$ or some n-dimensional manifold for some $n \geq 3$, and $c \in \mathbb{R}^k$ are called control parameters. They describe parameters that can be controlled in physical systems, such as pumping rates in lasers or flow rates in chemical mixing reactions. The most important mathematical property of dynamical systems is the uniqueness theorem, which states that there is a unique trajectory through every point at which $f(x; c)$ is continuous and Lipschitz and $f(x; c) \neq 0$. In particular, two distinct periodic orbits cannot have any points in common.

The properties of dynamical systems are governed, in lowest order, by the number, stability, and distribution of their fixed points, defined by $\dot{x}_i = f_i(x; c) = 0$. It can happen that a dynamical system has no stable fixed points and no stable limit cycles ($x(t) = x(t + T)$, some $T > 0$, all $t$). In such cases, if the solution is bounded and recurrent but not periodic, it represents an unfamiliar type of attractor. If the system exhibits "sensitivity to initial conditions" ($|x(t) - y(t)| \sim e^{\lambda t}|x(0) - y(0)|$ for $|x(0) - y(0)| = \epsilon$ and $\lambda > 0$ for most $x(0)$), the solution set is called a "chaotic attractor." If the attractor has fractal structure, it is called a "strange attractor."

Tools to study strange attractors have been developed that depend on three types of mathematics: geometry, dynamics, and topology.

Geometric tools attempt to study the metric relations among points in a strange attractor. These include a spectrum of fractal dimensions. These real numbers are difficult to compute, require very long, very clean data sets, provide a number without error estimates for which there is no underlying statistical theory, and provide very little information about the attractor.

Dynamical tools include estimation of Lyapunov exponents and a Lyapunov dimension. They include globally averaged exponents and local Lyapunov exponents. These are eigenvalues related to the different stretching ($\lambda > 0$) and squeezing ($\lambda < 0$) eigendirections in the phase space. To each globally averaged Lyapunov exponent $\lambda_i$, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, there corresponds a "partial dimension" $\epsilon_i$, $0 \leq \epsilon_i \leq 1$, with $\epsilon_i = 1$ if $\lambda_i \geq 0$. The Lyapunov dimension is the sum of the partial dimensions, $d_L = \sum_{i=1}^{n} \epsilon_i$. That the partial dimension $\epsilon_i = 1$ for $\lambda_i \geq 0$ indicates that the flow is smooth in the stretching ($\lambda_i > 0$) and flow directions and fractal in the squeezing ($\lambda_i < 0$) directions with $\epsilon_i < 1$. Dynamical indices provide some useful information about a strange attractor. In particular, they can be used to estimate some fractal properties of a strange attractor, but not vice versa.

Topological tools are very powerful for a restricted class of dynamical systems. These are dynamical systems in three dimensions ($n = 3$). For such systems there are three Lyapunov exponents $\lambda_1 > \lambda_2 > \lambda_3$, with $\lambda_1 > 0$ describing the stretching direction and responsible for "sensitivity to initial conditions," $\lambda_2 = 0$ describing the direction of the flow, and $\lambda_3 < 0$ describing the squeezing direction

and responsible for "recurrence." Strange attractors are generated by dissipative dynamical systems, which satisfy the additional condition $\lambda_1 + \lambda_2 + \lambda_3 < 0$. For such attractors, $\epsilon_1 = \epsilon_2 = 1$ and $\epsilon_3 = \lambda_1/|\lambda_3|$ by the Kaplan–Yorke conjecture, so that $d_L = 2 + \epsilon_3 = 2 + \lambda_1/|\lambda_3|$.

A number of tools from classical topology have been exploited to probe the structure of strange attractors in three dimensions. These include the Gauss linking number, the Euler characteristic, the Poincaré–Hopf index theorem, and braid theory. More recent topological contributions include several definitions for entropy, the development of a theory for knot holders or braid holders (also called branched manifolds), the Birman–Williams theorem for these objects, and relative rotation rates, a topological index for individual periodic orbits and orbit pairs.

Three-dimensional strange attractors are remarkably well understood; those in higher dimensions are not. As a result, the description that follows is largely restricted to strange attractors with $d_L < 3$ that exist in $\mathbb{R}^3$ or other three-dimensional manifolds (e.g., $\mathbb{R}^2 \times S^1$). The obstacle to progress in higher dimensions is the lack of a higher-dimensional analog of the Gauss linking number for orbit pairs in $\mathbb{R}^3$.

Overview

The program described below has two objectives:

1. classify the global topological structure of strange attractors in $\mathbb{R}^3$; and
2. determine the "perestroikas" (changes) that such attractors can undergo as experimental conditions or control parameters change.

Four levels of structure are required to complete this program. Each is topological and discretely quantifiable. This provides a beautiful interaction between a rigidity of structure, demanded by topological constraints, and freedom within this rigidity. These four levels of structure are:

1. basis sets of orbits,
2. branched manifolds or knot holders,
3. bounding tori, and
4. embeddings of bounding tori.

Branched Manifolds: Stretching and Squeezing

A strange attractor is generated by the repetition of two mechanisms: stretching and squeezing. Stretching occurs in the directions identified by the positive Lyapunov exponents and squeezing occurs in the directions identified by the negative Lyapunov exponents. In $\mathbb{R}^3$ there is one stretching direction and one squeezing direction.

A simple stretch-and-squeeze mechanism that nature appears to be very fond of is illustrated in Figure 1. In this illustration, a cube of initial conditions at (a) is advected by the flow in a short time to (b). During this process, the cube is deformed by being stretched ($\lambda_1 > 0$). It also shrinks in a transverse direction ($\lambda_3 < 0$). During the initial phase of this deformation, two nearby points typically separate exponentially in time. If they were to continue to separate exponentially for all times, the invariant set would not be bounded. Therefore, this separation cannot continue indefinitely, and in fact it must somehow reverse itself after some time because the motion is recurrent. The mechanism shown in Figure 1 involves folding, which begins between (b) and (c) and continues through to (d). Squeezing occurs where points from distant parts of the attractor approach each other exponentially, as at (d). Finally, the cube, shown deformed at (d), returns to the neighborhood of initial conditions (a). This process repeats itself and builds up the strange attractor. As can be inferred from this figure, the strange attractor constructed by the repetitive process is smooth in the expanding ($\lambda_1$) and flow ($\lambda_2 = 0$) directions but fractal in the squeezing ($\lambda_3$) direction. The attractor's fractal dimension is $\epsilon_1 + \epsilon_2 + \epsilon_3 = 2 + \epsilon_3 = 2 + \lambda_1/|\lambda_3|$.

Figure 1 A common stretch-and-fold mechanism generates many experimentally observed strange attractors. Panels (a) to (d); labeled regions: stretch, squeeze, boundary layer. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

Figure 1 summarizes the boundedness and recurrence conditions that were introduced to define strange attractors, and illustrates one stretching and squeezing mechanism that occurs repetitively to build up the fractal structure of the strange attractor

and to organize all the (unstable) periodic orbits in it in a unique way. The particular mechanism shown in Figure 1 is called a stretch-and-fold mechanism. Other mechanisms involve stretch and roll, and tear and squeeze.

The stretch-and-squeeze mechanisms are well summarized by the cartoons shown in Figure 2. On the left, a cube of initial conditions (top) is deformed under the flow. The flow is downward. Stretching occurs in one direction (horizontal) and shrinking occurs in a transverse direction (perpendicular to the page). In the limit of extreme shrinking ($\lambda_3 \to$ "$-\infty$"), the dynamics of the stretching part of the flow is represented by the two-dimensional surface shown on the bottom left. This surface fails to be a manifold because of the singularity, called a splitting point. This singularity represents an initial condition that flows to an unstable fixed point with at least one stable direction. On the right (squeezing), two distant cubes of initial conditions (top) in the flow are deformed and brought to each other's proximity under the flow (middle). In the limit of extreme dissipation, two two-dimensional surfaces representing inflows are joined at a branch line to a single surface representing an outflow. This surface fails to be a manifold because of the branch line, which is a singularity of a different kind. Points below the branch line in this representation of the flow (on the outflow side of the branch line) have two preimages above the branch line, one in each inflow sheet. This structure generates positive entropy.

Figure 2 Left: The stretch mechanism is modeled by a two-dimensional surface with a splitting point singularity. Right: The squeeze mechanism is modeled by a two-dimensional surface with a branch line singularity. Labeled regions: stretch, shrink, squeeze, boundary layer, branch line, flow direction. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

A beautiful theorem of Birman and Williams justifies the use of the two cartoons shown at the bottom of Figure 2 to characterize strange attractors in $\mathbb{R}^3$. As preparation for the theorem, Birman and Williams introduced an important identification for the nongeneric or atypical points that "are not sensitive to initial conditions"

$x \sim y$ if $|x(t) - y(t)| \to 0$ as $t \to \infty$   [2]

That is, two points in a strange attractor are identified if they have asymptotically the same future. In practice, this amounts to projecting the flow down along the stable ($\lambda_3 < 0$) direction onto a two-dimensional surface described by the stretching ($\lambda_1 > 0$) and the flow ($\lambda_2 = 0$) directions. This surface is not a manifold because of lower-dimensional singularities: splitting points and branch lines. The two-dimensional surface has many names, for example, knot holder (because it holds the periodic orbits that exist in abundance in strange attractors), braid holders, templates, branched manifolds. The flow, restricted to this surface, is called a semiflow. Under the semiflow, points in the branched manifold have a unique future but do not have a unique past. The degree of nonuniqueness is measured by the topological entropy of the dynamical system. The Birman–Williams theorem is:

Theorem Assume that a flow $\Phi_t$
(i) on $\mathbb{R}^3$ is dissipative ($\lambda_1 > 0$, $\lambda_2 = 0$, $\lambda_3 < 0$ and $\lambda_1 + \lambda_2 + \lambda_3 < 0$);
(ii) generates a hyperbolic strange attractor (the eigenvectors of the local Lyapunov exponents $\lambda_1, \lambda_2, \lambda_3$ span everywhere on the attractor).
Then the projection [2] maps the strange attractor SA to a branched manifold BM and the flow $\Phi_t$ on SA to a semiflow $\hat{\Phi}_t$ on BM in $\mathbb{R}^3$. The periodic orbits in SA under $\Phi_t$ correspond 1:1 with the periodic orbits in BM under $\hat{\Phi}_t$, with perhaps one or two specified exceptions. On any finite subset of orbits the correspondence can be taken via isotopy.

The beauty of this theorem is that it guarantees that a flow $\Phi_t$ that generates a (fractal) strange attractor SA can be continuously deformed to a new flow $\hat{\Phi}_t$ on a simple two-dimensional structure BM. During this deformation, periodic orbits are neither created nor destroyed. The uniqueness theorem for ODEs is satisfied during the deformation, so orbit segments do not pass through each other. As a result, the topological organization of all the

unstable periodic orbits in the strange attractor is the same as the topological organization of all the unstable periodic orbits in the branched manifold. In fact, the branched manifold (knot holder) defines the topological organization of all the unstable periodic orbits that it supports. Topological organization is defined by the Gauss linking number and the relative rotation rates, another braid index.

The significance of this theorem is that strange attractors can be characterized, in fact classified, by their branched manifolds. Figure 3 shows a branched manifold "for a figure-8 knot" as well as the figure-8 knot itself (dark curve). If a constant current is sent through a conducting wire tied into the shape of a figure-8 knot, a discrete countable set of magnetic field lines will be closed. These closed field lines can be deformed onto the two-dimensional surface shown in Figure 3. Each of the eight branches of this branched manifold can be named. One way to do this specifies the two branch lines that are joined by the branch in the sense of the flow (e.g., (aα) and (αβ), but not (aβ)). Every closed field line can be labeled by a symbol sequence that is unique up to cyclic permutation. This symbol sequence provides a symbolic name for the orbit. For example, (aα)(αβ)(βb)(ba) is a period-4 orbit.

The structure of a branched manifold is determined in part by a transition matrix $T$. The matrix element $T_{ij}$ is 1 if the transition from branch $i$ to branch $j$ is allowed, 0 otherwise. The transition matrix for the figure-8 branched manifold is shown in Figure 3.

Figure 3 Figure-8 knot (dark curve) and the figure-8 branched manifold, with branch lines labeled a, b, α, β. Transition matrix for the eight branches of the figure-8 branched manifold is also shown. Flow direction is shown by arrows. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

      ab  aα  αβ  αa  ba  bβ  βα  βb
ab     0   0   0   0   1   1   0   0
aα     0   0   1   1   0   0   0   0
αβ     0   0   0   0   0   0   1   1
αa     1   1   0   0   0   0   0   0
ba     1   1   0   0   0   0   0   0
bβ     0   0   0   0   0   0   1   1
βα     0   0   1   1   0   0   0   0
βb     0   0   0   0   1   1   0   0

The Birman–Williams theorem is stronger than its statement suggests. More systems satisfy the statement of the theorem than do the assumptions of the theorem. The figure-8 knot, and its attendant magnetic field, is not dissipative; in fact, it is not even a dynamical system, yet the closed loops can be isotoped to the figure-8 knot holder. There are other ways in which the Birman–Williams theorem is stronger than its statement suggests.

It is apparent from Figure 3 that the figure-8 branched manifold can be built up Lego© fashion from the two basic building blocks shown in Figure 2. This is more generally true. Every branched manifold can be built up, Lego© fashion, from the stretch (with a splitting point singularity) and the squeeze (with a branch line singularity) building blocks, subject to the following two conditions:

1. outputs flow to inputs and
2. there are no free ends.

The figure-8 branched manifold is built up from four stretch and four squeeze building blocks. As a result, there are eight branches and four branch lines.

Two often-studied strange attractors are shown in Figures 4 and 5. Figure 4 shows the details of the Rössler dynamical system. A similar spectrum of features is shown in Figure 5 for the Lorenz equations. The knot holder in Figure 5e is obtained from the caricature in Figure 5d by twisting the right-hand lobe by $\pi$ radians.

Branched manifolds can be used to characterize all three-dimensional strange attractors. Branched manifolds that classify the strange attractors generated by four familiar sets of equations (for some control parameter values) are shown in Figure 6. The sets of equations, and one set of parameter values that generate strange attractors, are presented in Table 1.

The beauty of this topological classification of strange attractors is that it is apparent, just by inspection, that there is no smooth change of variables that will map any of these systems to any of the others for the parameter values shown.

Branched manifolds can be described algebraically. In Figure 7 we provide the algebraic

Figure 4 The Rössler dynamical system. (a) Rössler equations: $dx/dt = -y - z$, $dy/dt = x + ay$, $dz/dt = b + z(x - c)$. (b) Time series z(t) and x(t) generated by these equations, and (c) projection of the strange attractor onto the x–y plane. (d) Caricature of the flow and (e) knot holder derived directly from the caricature. Control parameter values (a, b, c) = (2.0, 4.0, 0.398). The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.
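The Rössler equations of Figure 4a are easy to explore numerically. The sketch below, assuming NumPy, integrates them with a fixed-step fourth-order Runge–Kutta scheme and also computes the fixed points defined by $\dot{x}_i = 0$. The function names, step size, and initial condition are illustrative choices. The parameter assignment a = 0.398, b = 2.0, c = 4.0 is an assumption: it reads the three values quoted in the caption in a different order, which is the assignment under which the linear growth rate in the x–y plane stays small.

```python
import numpy as np


def rossler_rhs(state, a, b, c):
    """Right-hand side of the Rossler equations from Figure 4a."""
    x, y, z = state
    return np.array([-y - z, x + a * y, b + z * (x - c)])


def rk4_step(rhs, state, dt, *params):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = rhs(state, *params)
    k2 = rhs(state + 0.5 * dt * k1, *params)
    k3 = rhs(state + 0.5 * dt * k2, *params)
    k4 = rhs(state + dt * k3, *params)
    return state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(rhs, state0, dt, n_steps, *params):
    """Return an array of shape (n_steps + 1, dim) with the sampled trajectory."""
    traj = np.empty((n_steps + 1, len(state0)))
    traj[0] = state0
    for i in range(n_steps):
        traj[i + 1] = rk4_step(rhs, traj[i], dt, *params)
    return traj


def rossler_fixed_points(a, b, c):
    """Solve -y - z = 0, x + a*y = 0, b + z*(x - c) = 0 for (x, y, z)."""
    disc = c * c - 4.0 * a * b
    if disc < 0.0:
        return []
    roots = [(c - np.sqrt(disc)) / 2.0, (c + np.sqrt(disc)) / 2.0]
    return [np.array([x, -x / a, x / a]) for x in roots]


# Assumed parameter assignment (see the note above) and an arbitrary start point.
A, B, C = 0.398, 2.0, 4.0
trajectory = integrate(rossler_rhs, np.array([1.0, 1.0, 0.0]), 0.01, 2000, A, B, C)
fixed_points = rossler_fixed_points(A, B, C)
print(trajectory.shape)  # (2001, 3)
print(fixed_points)
```

Plotting `trajectory[:, 0]` against `trajectory[:, 1]` after discarding an initial transient gives a projection onto the x–y plane of the kind described in panel (c); a longer run and a smaller step may be needed for a clean picture.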

Panel (a) of Figure 5, Lorenz equations:

\[
\frac{dx}{dt} = -\sigma x + \sigma y, \qquad \frac{dy}{dt} = Rx - y - xz, \qquad \frac{dz}{dt} = -bz + xy
\]

Figure 5 (a) Lorenz equations. (b) Time series x(t) and z(t) generated by these equations, and (c) projection of the strange attractor onto the x–y plane. (d) Caricature of the flow and (e) knot holder derived directly from the caricature by rotating the right-hand lobe by π radians. Control parameter values (R, σ, b) = (26.0, 10.0, 8/3). The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.
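
Time series like those in panel (b) of Figure 5 can be produced by integrating the Lorenz equations numerically. The following is a minimal sketch, assuming a fixed-step fourth-order Runge–Kutta scheme. The step size, initial condition, and function names are illustrative choices and are not taken from the source; only the equations and the parameter values (R, σ, b) = (26.0, 10.0, 8/3) come from the figure.

```python
def lorenz(state, R=26.0, sigma=10.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations as written in Figure 5(a)."""
    x, y, z = state
    return (-sigma * x + sigma * y, R * x - y - x * z, -b * z + x * y)


def rk4_step(f, state, dt):
    """Advance `state` by one classical Runge-Kutta step of size dt."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(f, state, dt, steps):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory


# Illustrative run: 5000 steps of size 0.01 from an arbitrary initial point.
trajectory = integrate(lorenz, (1.0, 1.0, 1.0), 0.01, 5000)
x_series = [p[0] for p in trajectory]
z_series = [p[2] for p in trajectory]
```

Plotting `x_series` and `z_series` against the step index gives scalar time series of the kind shown in the figure; the early part of the run is a transient and would normally be discarded before any analysis.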

description of two branched manifolds. Figure 7a shows the branched manifold that describes experimental data generated by many physical systems. The mechanism is a simple stretch-and-fold deformation with zero global torsion that generates a typical Smale horseshoe. There are two branches. The diagonal elements of the matrix identify the local torsion of the flow through the corresponding branch, measured in units of π. Branch 0 has no local torsion, and branch 1 shows a half-twist and has local torsion +1. The off-diagonal matrix elements are twice the linking number of the period-1 orbits in the corresponding pair of branches. Since the period-1 orbits in these two branches do not link, the off-diagonal matrix elements are 0. The period-1 orbits in the branches labeled 1 and 2 in Figure 7b have linking number +1, so the off-diagonal matrix elements are T(1, 2) = T(2, 1) = 2 × (+1). The array identifies the order (above, below) in which the two branches are joined at the branch line: the smaller the value, the closer to the viewer. These two pieces of information, four integers in Figure 7a and eight in Figure 7b, serve to determine the topological organization of all the unstable periodic orbits in any strange attractor with either branched manifold.

The periodic orbits are identified by a repeating symbol sequence of least period p, which is unique up to cyclic permutation. The symbol sequence consists of a string of integers, sequentially identifying the branches through which the orbit passes. For a branched manifold with two branches, there are two symbols. The number of orbits of period p, N(p), obeys the recursion relation

\[
p\,N(p) = 2^{p} - \sum_{\substack{k \mid p \\ 1 \le k \le p/2}} k\,N(k) \qquad [3]
\]

Table 2 shows the number of orbits of period p ≤ 20 for the branched manifolds with two and three branches shown in Figure 7. The number of orbits of period p grows exponentially with p, and the limit $h_T = \lim_{p \to \infty} \log(N(p))/p$ defines the topological entropy $h_T$ for the branched manifold. The limits are ln 2 and ln 3 for the branched manifolds with two and three branches, respectively. The linking numbers of orbits up to period 5 in the Smale horseshoe branched manifold are shown in Table 3, which identifies each of the orbits by its symbol sequence (e.g., 00111).

Figure 6 Branched manifolds for four standard sets of equations: (a) Rössler equations, (b) periodically driven Duffing equations, (c) periodically driven van der Pol equations, and (d) Lorenz equations. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

Figure 7 Branched manifolds are described algebraically. The diagonal matrix elements describe the twist of each branch. The off-diagonal matrix elements are twice the linking number of the period-1 orbits in each of the two branches. The array describes the order in which the branches are connected at the branch line. (a) Smale horseshoe branched manifold: matrix rows (0 0), (0 1); array (0 −1). (b) Beginning of a "gâteau roulé" (jelly roll) branched manifold: matrix rows (0 0 0), (0 1 2), (0 2 2); array (0 −2 −1).

Table 1 Four sets of equations that generate strange attractors

Dynamical system   ODEs                          Parameter values
Rössler            ẋ = −y − z                    (a, b, c) = (2.0, 4.0, 0.398)
                   ẏ = x + ay
                   ż = b + z(x − c)
Duffing            ẋ = y                         (δ, A, ω) = (0.4, 0.4, 1.0)
                   ẏ = −δy − x³ + x + A sin(ωt)
van der Pol        ẋ = by + (c − dy²)x           (b, c, d, A, ω) = (0.7, 1.0, 10.0, 0.25, π/2)
                   ẏ = −x + A sin(ωt)
Lorenz             ẋ = −σx + σy                  (R, σ, b) = (26.0, 10.0, 8/3)
                   ẏ = Rx − y − xz
                   ż = −bz + xy

Table 2 Number of orbits of period p on the branched manifolds with two and three branches, shown in Figure 7. The integers N₃(p) are constructed by replacing $2^p$ by $3^p$ in eqn [3]

Period p   Two branches N₂(p)   Three branches N₃(p)
1          2                    3
2          1                    3
3          2                    8
4          3                    18
5          6                    48
6          9                    116
7          18                   312
8          30                   810
9          56                   2184
10         99                   5880
11         186                  16104
12         335                  44220
13         630                  122640
14         1161                 341484
15         2182                 956576
16         4080                 2690010
17         7710                 7596480
18         14532                21522228
19         27594                61171656
20         52377                174336264
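
The recursion relation in eqn [3] is easy to evaluate directly. The sketch below is one way to do it, assuming the sum runs over the proper divisors k of p (those with 1 ≤ k ≤ p/2) and that the three-branch count is obtained by replacing 2^p with 3^p, as the caption of Table 2 states. The function name `count_orbits` and its parameters are illustrative, not from the source.

```python
import math


def count_orbits(p_max, n_branches=2):
    """Return {p: N(p)} for 1 <= p <= p_max using eqn [3].

    p * N(p) = n_branches**p - sum of k * N(k) over divisors k of p
    with 1 <= k <= p/2.
    """
    counts = {}
    for p in range(1, p_max + 1):
        shorter = sum(k * counts[k] for k in range(1, p // 2 + 1) if p % k == 0)
        # The difference is always divisible by p, so integer division is exact.
        counts[p] = (n_branches ** p - shorter) // p
    return counts


two_branches = count_orbits(20, n_branches=2)
three_branches = count_orbits(20, n_branches=3)

# Finite-period estimates of the topological entropy, log(N(p)) / p.
entropy_two = {p: math.log(n) / p for p, n in two_branches.items()}
```

For prime p the sum reduces to the k = 1 term, so for example N(19) = (2^19 − 2)/19 = 27 594 with two branches. The finite-period estimates `entropy_two[p]` approach ln 2 slowly from below, since N(p) behaves like 2^p/p for large p.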
Table 3 Linking numbers of orbits to period 5 in the Smale horseshoe branched manifold with zero global torsion

Orbit  Symbol    0   1  2₁  3₁  3₁  4₁  4₂  4₂  5₁  5₁  5₂  5₂  5₃  5₃
0      0         0   0   0   0   0   0   0   0   0   0   0   0   0   0
1      1         0   0   1   1   1   2   1   1   2   2   2   2   1   1
2₁     01        0   1   1   2   2   3   2   2   4   4   3   3   2   2
3₁     011       0   1   2   2   3   4   3   3   5   5   5   5   3   3
3₁     001       0   1   2   3   2   4   3   3   5   5   4   4   3   3
4₁     0111      0   2   3   4   4   5   4   4   8   8   7   7   4   4
4₂     0011      0   1   2   3   3   4   3   4   5   5   5   5   4   4
4₂     0001      0   1   2   3   3   4   4   3   5   5   5   5   4   4
5₁     01111     0   2   4   5   5   8   5   5   8  10   9   9   5   5
5₁     01101     0   2   4   5   5   8   5   5  10   8   8   8   5   5
5₂     00111     0   2   3   5   4   7   5   5   9   8   6   7   5   5
5₂     00101     0   2   3   5   4   7   5   5   9   8   7   6   5   5
5₃     00011     0   1   2   3   3   4   4   4   5   5   5   5   4   5
5₃     00001     0   1   2   3   3   4   4   4   5   5   5   5   5   4
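
Linking numbers such as those tabulated above can also be estimated numerically for any pair of disjoint closed curves in R³ by discretizing the Gauss linking integral, Link(A, B) = (1/4π) ∮∮ (r_A − r_B) · (dr_A × dr_B) / |r_A − r_B|³. The following is a minimal sketch, assuming each curve is given as a closed polygon of sample points and using a midpoint rule on every pair of segments. The test curves (two unit circles forming a Hopf link, and a third circle placed far away) and all function names are illustrative and are not data from the source; the sign of the result depends on the orientation chosen for each curve.

```python
import math


def circle(center, u, v, n):
    """n points on the unit circle center + cos(t) * u + sin(t) * v."""
    points = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        points.append(tuple(c + math.cos(t) * a + math.sin(t) * b
                            for c, a, b in zip(center, u, v)))
    return points


def gauss_linking(curve_a, curve_b):
    """Midpoint-rule estimate of the Gauss linking integral of two closed polygons."""
    total = 0.0
    na, nb = len(curve_a), len(curve_b)
    for i in range(na):
        a0, a1 = curve_a[i], curve_a[(i + 1) % na]
        ra = [(p + q) / 2.0 for p, q in zip(a0, a1)]   # segment midpoint
        da = [q - p for p, q in zip(a0, a1)]           # segment vector
        for j in range(nb):
            b0, b1 = curve_b[j], curve_b[(j + 1) % nb]
            rb = [(p + q) / 2.0 for p, q in zip(b0, b1)]
            db = [q - p for p, q in zip(b0, b1)]
            r = [p - q for p, q in zip(ra, rb)]
            cross = (da[1] * db[2] - da[2] * db[1],
                     da[2] * db[0] - da[0] * db[2],
                     da[0] * db[1] - da[1] * db[0])
            dist = math.sqrt(sum(c * c for c in r))
            total += sum(p * q for p, q in zip(r, cross)) / dist ** 3
    return total / (4.0 * math.pi)


ring_a = circle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 200)
ring_b = circle((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 200)  # threads ring_a
ring_c = circle((5.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 200)  # far from ring_a

linked = gauss_linking(ring_a, ring_b)
unlinked = gauss_linking(ring_a, ring_c)
```

Because the exact value is an integer, the estimate is rounded to the nearest integer in practice. The quadrature degrades when the two curves pass close to each other, which is one reason counting signed crossings in a projection is the alternative mentioned for experimental data.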

Tables of linking numbers have been used successfully to identify mechanisms that nature uses to generate chaotic data. This analysis procedure is called topological analysis. Segments of data are identified that closely approximate unstable periodic orbits existing in the strange attractor. These data segments are then embedded in $\mathbb{R}^3$. Each orbit is given a trial identification (symbol sequence). Their pairwise linking numbers are computed either by counting signed crossings or using the time-parametrized data segments and estimating the integers numerically using the Gauss linking integral

\[
\mathrm{Link}(A, B) = \frac{1}{4\pi} \oint \oint \frac{\bigl(\mathbf{r}_A(t_1) - \mathbf{r}_B(t_2)\bigr) \cdot \bigl(d\mathbf{r}_A(t_1) \times d\mathbf{r}_B(t_2)\bigr)}{\lvert \mathbf{r}_A(t_1) - \mathbf{r}_B(t_2) \rvert^{3}}
\]

This table of experimental integers is compared with the table of linking numbers for orbits with the same symbolic name on a trial branched manifold. This procedure serves to identify the branched manifold and refine the symbolic identifications of the experimental orbits, if necessary. The procedure is vastly overdetermined. For example, the linking numbers of only three low-period orbits serve to identify the four pieces of information required to specify a branched manifold with two branches. Since six or more surrogate periodic orbits can typically be extracted from experimental data, providing $\binom{6}{2} = 15$ or more linking numbers, this topological analysis procedure has built-in self-consistency checks, unlike analysis procedures based on geometric and dynamical tools.

Basis Sets of Orbits

A branched manifold determines the topological organization of all the periodic orbits that it supports. Whenever a low-dimensional strange attractor is subjected to topological analysis, it is always the case that fewer periodic orbits are present and identified than are allowed by the branched manifold that classifies it. This is the case for strange attractors generated by experimental data as well as strange attractors generated by ODEs. The full spectrum occurs only in the hyperbolic limit, which has never been seen.

The orbits that are present are organized exactly as in the hyperbolic limit – that is, as determined by the underlying branched manifold. As control parameters change, the strange attractor undergoes perestroikas. New orbits are created and/or old orbits are annihilated in direct or inverse period-doubling and saddle–node bifurcations. The orbits that are present are always organized as determined by the branched manifold. Orbits are not created or annihilated independently of each other. Rather, there is a partial order ("forcing order") involved in orbit creation and annihilation. This partial order is poorly understood for general branched manifolds. It is much better understood for the two-branch Smale horseshoe branched manifold.

The forcing diagram for this branched manifold is shown in Figure 8 for orbits up to period 8. It is typically the case that the existence of one orbit in a strange attractor forces the presence of a spectrum of additional orbits. Forcing is transitive, so if orbit A forces orbit B (A ⇒ B) and B forces C, then A forces C: if A ⇒ B and B ⇒ C then A ⇒ C. For this reason, it is sufficient to show only the first-order forcing in this figure. The orbits shown are labeled by their period and the order in which they are created in a particular highly dissipative limit of the dynamics: the logistic map (U-sequence order in Figure 8). For example, $5_2$ describes the second (pair) of period-5 orbits created in the
Figure 8 (a) Forcing diagram for orbits up to period 8 in the Smale horseshoe branched manifold. (b) The sequence ("universal order") in which orbits are created in the highly dissipative limit, which is the logistic map. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

Panel (a), "Forcing of horseshoe orbits to period 8": the vertical axis is entropy, with ticks from 0.35 to 0.70 and ln 2 marked at the top; the horizontal axis is U-sequence order. Orbits along the bottom are grouped as period doubled, well ordered, and other finite order.

Panel (b), U-sequence order: 2₁ 4₁ 8₁ 6₁ 8₂ 7₁ 5₁ 7₂ 8₃ 3₁ 6₂ 8₄ 7₃ 8₅ 5₂ 8₆ 7₄ 8₇ 6₃ 8₈ 7₅ 4₂ 8₉ 7₆ 8₁₀ 6₄ 8₁₁ 7₇ 8₁₂ 5₃ 8₁₃ 7₈ 8₁₄ 6₅ 8₁₅ 7₉ 8₁₆. The orbits 2₁, 4₁, 8₁ are marked PD (period doubled). Well-ordered (WO) orbits are marked with f = 1/2, 3/7, 2/5, 3/8, 1/3, 2/7, 1/4, 1/5, 1/6, 1/7, 1/8, and QOD orbits with f = 2/5, 1/3, 1/4, 1/5, 1/6.

logistic map in the transition from simple, nonchaotic behavior to fully chaotic (hyperbolic) behavior.

The orbits in the forcing diagram are organized according to their one-dimensional entropy (horizontal axis, U-sequence order) and their two-dimensional entropy (vertical axis). Nonchaotic ("laminar") behavior occurs at the lower left of this figure, where both entropies are zero. Fully chaotic behavior occurs at the upper right, where both entropies are ln 2. As control parameters change, a dynamical system that can exhibit chaos generated by a stretch-and-fold mechanism follows a path in the forcing diagram from the lower left to the upper right. Each such path is a "route to chaos." The Smale horseshoe mechanism exhibits many different routes to chaos: each follows a different path in the forcing diagram.

The state of a strange attractor at any stage in its route to chaos can be specified by a "basis set of orbits." This is a set of orbits whose presence forces the existence of all other orbits that can concurrently be found in the attractor, up to any finite period. The basis set of orbits can be constructed algorithmically. The algorithm is as follows:

1. Write down all the orbits that are present in order of increasing two-dimensional entropy from left to right.
2. For orbits with the same two-dimensional entropy, order by increasing one-dimensional entropy.
3. Remove the "highest" (rightmost) orbit from this list, together with all the orbits that it forces. This is the first basis orbit.
4. Of the orbits remaining, again remove the rightmost and all the orbits that it forces. This is the second basis orbit.
5. Continue until all orbits have been removed.

For any finite period, the above algorithm terminates because there is only a finite number of orbits. For example, if the orbit $5_2$ is present as well as all orbits with lower one-dimensional entropy, the basis set is $8_7R$, $7_6$, $7_4F$, $8_6F$, $8_8$, $5_2$. As control parameters change, a strange attractor undergoes perestroikas that are quantitatively determined by changes in the basis sets of orbits.
Bounding Tori

As experimental conditions or control parameters change, strange attractors can undergo "grosser" perestroikas than those that can be described by a change in the basis set of orbits. This occurs when new orbits are created that cannot be contained on the initial branched manifold – for example, when orbits are created that must be described by a new symbol. This is seen experimentally in the transition from horseshoe type dynamics to gâteau roulé type dynamics. This involves the addition of a third branch to the branched manifold with two branches, as shown in Figures 7a and 7b. Strange attractors can undergo perestroikas described by the addition of new branches to, or deletion of old branches from, a branched manifold. These perestroikas are in a very real sense "grosser" than the perestroikas that can be described by changes in the basis sets of orbits on a fixed branched manifold.

There is a structure that provides constraints on the allowed bifurcations of branched manifolds (creation/annihilation of branches), which is analogous to the constraints that a branched manifold provides on the bifurcations and topological organization of the periodic orbits that can exist on it. This structure is called a bounding torus.

Bounding tori are constructed as follows. The semiflow on a branched manifold is "inflated" or "blown up" to a flow on a thin open set in $\mathbb{R}^3$ containing this branched manifold. The boundary of this open set is a two-dimensional surface. Such surfaces have been classified. They are uniquely tori of genus g; g = 0 (sphere), g = 1 (tire tube), g = 2, 3, . . . . The torus of genus g has Euler characteristic $\chi = 2 - 2g$. The flow is into this surface. The flow, restricted to the surface, exhibits a singularity wherever it is normal to the surface. At such singularities the stability is determined by the local Lyapunov exponents: $\lambda_1 > 0$ and $\lambda_3 < 0$, since the flow direction ($\lambda_2 = 0$) is normal to the surface. As a result, all singularities are saddles; so, by the Poincaré–Hopf theorem, the number of singularities is strongly related to the genus. The number is $2(g - 1)$.

The flow, restricted to the genus-g surface, can be put into canonical form and these canonical forms can be classified. The classification involves projection of the genus-g torus onto a two-dimensional surface. The planar projection consists of a disk with outer boundary and g interior holes. All singularities can be placed on the interior holes. The flow on the interior holes without singularities is in the same direction as the flow on the exterior boundary. Interior holes with singularities have an even number of them: 4, 6, . . . . Some canonical forms are shown in Figure 9.

Poincaré sections have been used to simplify the study of flows in low-dimensional spaces by effectively reducing the dimension of the dynamics. In three dimensions, a Poincaré surface of section for a strange attractor is a minimal two-dimensional surface with the property that all points in the attractor intersect this surface transversally an infinite number of times under the flow. The Poincaré surface need not be connected and in fact is often not connected. The Poincaré section for the flow in a genus-g torus consists of the union of $g - 1$ disjoint disks ($g \ge 3$) or is a single disk (g = 1). The locations of the disks are determined algorithmically, as shown in Figure 9. The interior circles without singularities are labeled by capital letters A, B, C, . . . and those with singularities are labeled with lowercase letters a, b, c, . . . . The components of the global Poincaré surface of section are numbered sequentially 1, 2, . . . , $g - 1$, in the order they are encountered when traversing the outer boundary in the direction of the flow, starting from any point on that boundary. Each component of the global Poincaré surface of section connects (in the projection) an interior circle without singularities to the exterior boundary. There is one component between each successive encounter of the flow with

Figure 9 Three inequivalent canonical forms of genus 8 are shown. Each is identified by a "period-7 orbit" and its dual. Panel labels: (a) ABCBDED, abbacca; (b) ABCDCBE, abccbaa; (c) ABCBDBE, abbccaa. Reprinted figure with permission from Physical Review E, 69, 056206, 2004. Copyright (2004) by the American Physical Society.
holes that have singularities. Heavy lines are used to show the location of the seven components of the global Poincaré surface of section for each of the three inequivalent genus-8 canonical forms shown in Figure 9. The structure of the flow is summarized by a transition matrix. For the canonical form shown in Figure 9c the transition matrix is

\[
T = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

where $T_{i,j} = 1$ if the flow can proceed directly from component i to component j, 0 otherwise.

Bounding tori, dressed with flows, can be labeled. In fact, two dual labeling schemes are possible. Following the outer boundary in the direction of the flow, one encounters the $g - 1$ components of the global Poincaré surface of section sequentially, the interior holes without singularities at least once each, and the interior holes with singularities at least twice each. The canonical form (genus-g torus dressed with a flow) on the genus-8 bounding torus shown in Figure 9a can be labeled by the sequence in which the holes without singularities are encountered (ABCBDED) or the order in which the holes with singularities are encountered (abbacca). Both sequences contain $g - 1$ symbols. These labels are unique up to cyclic permutation.

Symbol sequences for canonical forms for bounding tori act in many ways like symbol sequences for periodic orbits on branched manifolds. Although there is a 1:1 correspondence between bounded closed two-dimensional surfaces in $\mathbb{R}^3$ and genus g, the number of canonical forms grows rapidly with g, as shown in Table 4. In fact, the number, N(g), grows exponentially and can even be assigned an entropy:

\[
\lim_{g \to \infty} \frac{\ln(N(g))}{g - 1} = \ln 3 \qquad [5]
\]

In some sense, canonical forms that constrain branched manifolds within them behave like branched manifolds that constrain periodic orbits on them.

Every strange attractor that has been studied in $\mathbb{R}^3$ has been described by a canonical bounding torus that contains it. This classification is shown in Table 5.

Branched manifold perestroikas are constrained by bounding tori as follows. Each branch line of any branched manifold can be moved into one of the $g - 1$ components of the global Poincaré surface of section. Any branched manifold contained in a genus-g bounding torus ($g \ge 3$) must have at least one branch between each pair of components of the global Poincaré surface of section between which the flow is allowed, as summarized by the canonical form's transition matrix. New branches can only be added in a way that is consistent with the canonical form's transition matrix, continuity requirements, and the no intersection condition.

Table 4 Number of canonical bounding tori as a function of genus g

g    N(g)      g    N(g)      g    N(g)
3    1         9    15        15   2211
4    1         10   28        16   5549
5    2         11   67        17   14290
6    2         12   145       18   36824
7    5         13   368       19   96347
8    6         14   870       20   252927
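
The growth rate in eqn [5] can be compared with the tabulated counts. The short sketch below takes the values of N(g) from Table 4 and evaluates two finite-g indicators: the quantity ln(N(g))/(g − 1) that appears in eqn [5], and the ratio of successive counts N(g)/N(g − 1). The variable names are illustrative, and nothing here goes beyond arithmetic on the table.

```python
import math

# N(g) for genus 3..20, copied from Table 4.
canonical_forms = {
    3: 1, 4: 1, 5: 2, 6: 2, 7: 5, 8: 6,
    9: 15, 10: 28, 11: 67, 12: 145, 13: 368, 14: 870,
    15: 2211, 16: 5549, 17: 14290, 18: 36824, 19: 96347, 20: 252927,
}

# Finite-g version of the left-hand side of eqn [5].
entropy_estimates = {g: math.log(n) / (g - 1) for g, n in canonical_forms.items()}

# Ratio of successive counts, which should approach 3 if eqn [5] holds.
growth_ratios = {
    g: canonical_forms[g] / canonical_forms[g - 1]
    for g in canonical_forms
    if g - 1 in canonical_forms
}

limit = math.log(3)
```

At g = 20 the entropy estimate is still well below ln 3 ≈ 1.0986, while the ratio of successive counts is already above 2.6. The convergence to the limit in eqn [5] is therefore slow over the range of the table.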

Table 5 All known strange attractors of dimension $d_L < 3$ are bounded by one of the standard dressed tori. Dual labels for the bounding tori depend on $g - 1$ symbols describing holes with or without singularities

Strange attractor                          Holes w/o singularities    Holes with singularities    Genus
Rössler, Duffing, Burke and Shaw           A                                                      1
Various lasers, gâteau roulé               A                                                      1
Neuron with subthreshold oscillations      A                                                      1
Shaw–van der Pol                           A                                                      1
Lorenz, Shimizu–Morioka, Rikitake          AB                         aa                          3
C₂ covers of Rössler                       AB                         a²                          3
C₂ cover of Lorenz (a)                     ABCD                       a⁴                          5
C₂ cover of Lorenz (b)                     ABCB                       abba                        5
2 → 1 image of figure-8 branched manifold  ABCB                       ab(ab)⁻¹                    5
Figure-8 branched manifold                 AEBECEDE                   a²b²c²d²                    9
Cₙ covers of Rössler                       AB···N                     aⁿ                          n + 1
Cₙ cover of Lorenz (a)                     AB···(2N)                  a²ⁿ                         2n + 1
Cₙ cover of Lorenz (b)                     (AZ)(BZ)···(NZ)            a²b²···n²                   2n + 1
Multispiral attractors                     A(B···M)N(B···M)⁻¹         (ab···m)(ab···m)⁻¹          2m + 1

(a) Rotation axis through origin.
(b) Rotation axis through one focus.
In the simplest case, g = 1, a third branch can be added to a branched manifold with two branches only if its local torsion differs by ±1 from the adjacent branch. In addition, the ordering of the new branch must be consistent with the continuity and no intersection (ODE uniqueness theorem) requirements.

Embeddings of Bounding Tori

The last level of topological structure needed for the classification of strange attractors in $\mathbb{R}^3$ describes their embeddings in $\mathbb{R}^3$. The classification using genus-g bounding tori is intrinsic – that is, the canonical form shows how the flow looks from inside the torus. Strange attractors, and the tori that bound them, are actually embedded in $\mathbb{R}^3$. For a complete classification, we must specify not only the canonical form but also how this form sits in $\mathbb{R}^3$. This program has not yet been completed, but we illustrate it with the genus-1 bounding torus in Figure 10. Figure 10a shows the canonical form, and Figures 10b and 10c show two different embeddings of it in $\mathbb{R}^3$. The embedding in Figure 10b is unknotted. The embedding in Figure 10c is knotted like a figure-8 knot. Extrinsic embeddings of genus-1 tori are described by tame knots in $\mathbb{R}^3$, and tame knots can be used as "centerlines" for extrinsically embedded genus-1 tori. Higher-genus ($g \ge 3$) canonical forms – intrinsic genus-g tori dressed with a canonical flow – have a larger (but discrete) variety of extrinsic embeddings in $\mathbb{R}^3$.

The Embedding Question

The mechanism that nature uses to generate chaotic behavior in physical systems is not directly observable, and must be deduced by examining the data that are generated. Typically, the data consist of a single scalar time series that is discretely recorded: $x_i$, i = 1, 2, . . . . In order to exhibit a strange attractor, a mapping of the data into $\mathbb{R}^N$ must also be constructed. If the attractor is low dimensional ($d_L < 3$), one can hope that a mapping into $\mathbb{R}^3$ can be constructed that exhibits no self-intersections or other degeneracies. Such a map is called an embedding. Once an embedding in $\mathbb{R}^3$ is available, a topological analysis can be carried out. The analysis reveals the mechanism that underlies the creation of the embedded strange attractor.

But how do you know that the mechanism that generates the observed, embedded strange attractor has anything to do with the mechanism nature used to generate the experimental data?

If the embedding is contained in a genus-1 bounding torus, then the topological mechanism that generates the data, as defined by some unknown branched manifold $BM_{EXP}$, and the topological mechanism that is identified from the embedded strange attractor, $BM_{EMB}$, are identical up to three degrees of freedom: parity, global torsion, and the knot type. As a result, in this case (genus-1) a topological analysis of embedded data does reveal nature's hidden secrets.
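
The text does not say which mapping of the scalar series into R^N is used. One common choice is a time-delay embedding, in which each point is built from the series sampled at fixed lags; it is shown here purely as an assumed example. The function name `delay_embed`, the lag parameter `delay`, and the sample data are hypothetical.

```python
def delay_embed(series, delay, dim=3):
    """Map a scalar series x_i to vectors (x_i, x_{i+delay}, ..., x_{i+(dim-1)*delay}).

    This is a time-delay embedding, an assumed choice of mapping into R^dim;
    other constructions (for example, using derivatives of the series) exist.
    """
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this delay and dimension")
    return [
        tuple(series[i + k * delay] for k in range(dim))
        for i in range(n_vectors)
    ]


# Toy scalar series standing in for a discretely recorded measurement.
samples = list(range(10))
points = delay_embed(samples, delay=2, dim=3)
```

Whether a given lag produces a map free of self-intersections has to be checked on the data; the sketch only builds the candidate points in R³ on which a topological analysis would then be attempted.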

Figure 10 (a) Canonical form for genus-1 bounding torus. Extrinsic embeddings of the torus into $\mathbb{R}^3$ that are (b) unknotted and (c) knotted like the figure-8 knot.

See also: Ergodic theory; Fractal dimensions in dynamics; Generic Properties of Dynamical Systems; Gravitational N-body Problem (Classical); Homeomorphisms and Diffeomorphisms of the Circle; Homoclinic phenomena; Inviscid Flows; Lyapunov Exponents and Strange Attractors; Nonequilibrium Statistical Mechanics (Stationary): Overview; Random Algebraic Geometry, Attractors and Flux Vacua; Random Matrix Theory in Physics; Regularization for Dynamical Zeta Functions; Singularity and Bifurcation Theory; Symmetry and Symmetry Breaking in Dynamical Systems; Synchronization of Chaos.

Further Reading

Abraham R and Shaw CD (1992) Dynamics: The Geometry of Behavior, Studies in Nonlinearity, 2nd edn. Reading, MA: Addison-Wesley.
Eckmann J-P and Ruelle D (1985) Ergodic theory of chaos and strange attractors. Reviews of Modern Physics 57(3): 617–656.
Gilmore R (1998) Topological analysis of chaotic dynamical systems. Reviews of Modern Physics 70(4): 1455–1529.
Gilmore R and Lefranc M (2002) The Topology of Chaos, Alice in Stretch and Squeezeland. New York: Wiley.
Gilmore R and Letellier C (2006) The Symmetry of Chaos, Alice in the Land of Mirrors. Oxford: Oxford University Press.
Gilmore R and Pei X (2001) The topology and organization of unstable periodic orbits in Hodgkin–Huxley models of receptors with subthreshold oscillations. In: Moss F and Gielen S (eds.) Handbook of Biological Physics, Neuro-informatics, Neural Modeling, vol. 4, pp. 155–203. Amsterdam: North-Holland.
Ott E (1993) Chaos in Dynamical Systems. Cambridge: Cambridge University Press.
Solari HG, Natiello MA, and Mindlin GB (1996) Nonlinear Physics and Its Mathematical Tools. Bristol: IoP Publishing.
Tufillaro NB, Abbott T, and Reilly J (1992) An Experimental Approach to Nonlinear Dynamics and Chaos. Reading, MA: Addison-Wesley.

Characteristic Classes
P B Gilkey, University of Oregon, Eugene, OR, USA
R Ivanova, University of Hawaii Hilo, Hilo, HI, USA
S Nikčević, SANU, Belgrade, Serbia and Montenegro
© 2006 Elsevier Ltd. All rights reserved.

Vector Bundles

Let $\mathrm{Vect}_k(M, F)$ be the set of isomorphism classes of real ($F = \mathbb{R}$) or complex ($F = \mathbb{C}$) vector bundles of rank k over a smooth connected m-dimensional manifold M. Let

\[
\mathrm{Vect}(M, F) = \bigcup_k \mathrm{Vect}_k(M, F)
\]

Principal Bundles – Examples

Let H be a Lie group. A fiber bundle

\[
\pi : P \to M
\]

with fiber H is said to be a principal bundle if there is a right action of H on P which acts transitively on the fibers, that is, if $P/H = M$. If H is a closed subgroup of a Lie group G, then the natural projection $G \to G/H$ is a principal H bundle over the homogeneous space $G/H$. Let O(k) and U(k) denote the orthogonal and unitary groups, respectively. Let $S^k$ denote the unit sphere in $\mathbb{R}^{k+1}$. Then we have natural principal bundles:

\[
O(k) \subset O(k+1) \to S^{k}, \qquad U(k) \subset U(k+1) \to S^{2k+1}
\]

Let $\mathbb{RP}^k$ and $\mathbb{CP}^k$ denote the real and complex projective spaces of lines through the origin in $\mathbb{R}^{k+1}$ and $\mathbb{C}^{k+1}$, respectively. Let

\[
\mathbb{Z}_2 = \{\pm \mathrm{Id}\} \subset O(k), \qquad S^1 = \{\lambda \cdot \mathrm{Id} : |\lambda| = 1\} \subset U(k)
\]

One has $\mathbb{Z}_2$ and $S^1$ principal bundles:

\[
\mathbb{Z}_2 \to S^{k-1} \to \mathbb{RP}^{k-1}, \qquad S^1 \to S^{2k-1} \to \mathbb{CP}^{k-1}
\]

Frames

A frame $s := (s_1, \ldots, s_k)$ for $V \in \mathrm{Vect}_k(M, F)$ over an open set $O \subset M$ is a collection of k smooth sections to $V|_O$ so that $\{s_1(P), \ldots, s_k(P)\}$ is a basis for the fiber $V_P$ of V over any point $P \in O$. Given such a frame s, we can construct a local trivialization which identifies $O \times F^k$ with $V|_O$ by the mapping

\[
(P; \lambda_1, \ldots, \lambda_k) \to \lambda_1 s_1(P) + \cdots + \lambda_k s_k(P)
\]

Conversely, given a local trivialization of V, we can take the coordinate frame

\[
s_i(P) = P \times (0, \ldots, 0, 1, 0, \ldots, 0)
\]

Thus, frames and local trivializations of V are equivalent notions.

Simple Covers

An open cover $\{O_\alpha\}$ of M, where $\alpha$ ranges over some indexing set A, is said to be a simple cover if any finite intersection $O_{\alpha_1} \cap \cdots \cap O_{\alpha_k}$ is either empty or contractible.

Simple covers always exist. Put a Riemannian metric on M. If M is compact, then there exists a uniform $\delta > 0$ so that any geodesic ball of radius $\delta$ is geodesically convex. The intersection of geodesically convex sets is either geodesically convex (and hence contractible) or empty. Thus, covering M by a finite number of balls of radius $\delta$ yields a simple cover. The argument is similar even if M is not compact, where an infinite number of geodesic balls is used and the radii are allowed to shrink near $\infty$.

Transition Cocycles

Let Hom(F, k) be the set of linear transformations of $F^k$ and let $GL(F, k) \subset \mathrm{Hom}(F, k)$ be the group of all invertible linear transformations.

Let $\{s_\alpha\}$ be frames for a vector bundle V over some open cover $\{O_\alpha\}$ of M. On the intersection $O_\alpha \cap O_\beta$, one may express $s_\alpha = \psi_{\alpha\beta} s_\beta$, that is

\[
s_{\alpha,i}(P) = \sum_{1 \le j \le k} \psi^{j}_{\alpha\beta,i}(P)\, s_{\beta,j}(P)
\]
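
The relation between two frames on an overlap can be checked concretely at a single point. The sketch below is an illustration under stated assumptions: the fiber is F² with F = Q (exact rational arithmetic), each frame is stored as a 2 × 2 matrix whose rows are the frame vectors in some fixed reference basis, and the change-of-frame matrix is then the product of one frame matrix with the inverse of the other. It verifies two identities that follow from this definition: the change of frame from a frame to itself is the identity, and changes of frame compose. The frames, the index names, and the helper functions are hypothetical.

```python
from fractions import Fraction


def to_fractions(rows):
    """Convert a matrix of integers to exact rationals."""
    return [[Fraction(x) for x in row] for row in rows]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def transition(frame_a, frame_b):
    """Matrix T with frame_a = T * frame_b (rows of each frame are its vectors)."""
    return matmul(frame_a, inverse(frame_b))


# Three hypothetical frames at one point of a triple overlap.
s_alpha = to_fractions([[1, 0], [0, 1]])
s_beta = to_fractions([[1, 1], [0, 1]])
s_gamma = to_fractions([[2, 0], [1, -1]])

identity = [[1, 0], [0, 1]]
t_alpha_alpha = transition(s_alpha, s_alpha)
t_alpha_beta = transition(s_alpha, s_beta)
t_composed = matmul(transition(s_alpha, s_gamma), transition(s_gamma, s_beta))
```

Here `t_alpha_alpha` equals the identity and `t_composed` equals `t_alpha_beta`, because the inverse of the intermediate frame cancels in the product. Exact fractions are used so that these equalities hold without rounding tolerance.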
Characteristic Classes 489

The maps $\phi_{\alpha\beta} : O_\alpha \cap O_\beta \to GL(F,k)$ satisfy

$$\phi_{\alpha\alpha} = \mathrm{Id} \ \text{on } O_\alpha, \qquad \phi_{\alpha\beta} = \phi_{\alpha\gamma}\,\phi_{\gamma\beta} \ \text{on } O_\alpha \cap O_\beta \cap O_\gamma$$ [1]

Let G be a Lie group. Maps belonging to a collection $\{\phi_{\alpha\beta}\}$ of smooth maps from $O_\alpha \cap O_\beta$ to G which satisfy eqn [1] are said to be transition cocycles with values in G; if $G \subset GL(F,k)$, they can be used to define a vector bundle by making appropriate identifications.

Reducing the Structure Group
If G is a subgroup of GL(F, k), then V is said to have a G-structure if we can choose frames so the transition cocycles belong to G; that is, we can reduce the structure group to G.
Denote the subgroup of orientation-preserving linear maps by

$$GL^+(\mathbb{R},k) := \{A \in GL(\mathbb{R},k) : \det(A) > 0\}$$

If $V \in \mathrm{Vect}_k(M,\mathbb{R})$, then V is said to be orientable if we can choose the frames so that

$$\phi_{\alpha\beta} \in GL^+(\mathbb{R},k)$$

Not every real vector bundle is orientable; the first Stiefel–Whitney class $sw_1(V) \in H^1(M;\mathbb{Z}_2)$, which is defined later, vanishes if and only if V is orientable. In particular, the Möbius line bundle over the circle is not orientable.
Similarly, a real (resp. complex) bundle V is said to be Riemannian (resp. Hermitian) if we can reduce the structure group to the orthogonal group $O(k) \subset GL(\mathbb{R},k)$ (resp. to the unitary group $U(k) \subset GL(\mathbb{C},k)$).
We can use a partition of unity to put a positive-definite symmetric (resp. Hermitian symmetric) fiber metric on V. Applying the Gram–Schmidt process then constructs orthonormal frames and shows that the structure group can always be reduced to O(k) (resp. to U(k)); if V is a real vector bundle, then the structure group can be reduced to the special orthogonal group SO(k) if and only if V is orientable.

Lifting the Structure Group
Let $\rho$ be a representation of a Lie group H to GL(F, k). One says that the structure group of V can be lifted to H if there exist frames $\{s_\alpha\}$ for V and smooth maps $\tilde\phi_{\alpha\beta} : O_\alpha \cap O_\beta \to H$, so $\rho\,\tilde\phi_{\alpha\beta} = \phi_{\alpha\beta}$, where eqn [1] holds for $\tilde\phi$.

Spin Structures
For $k \ge 3$, the fundamental group of SO(k) is $\mathbb{Z}_2$. Let Spin(k) be the universal cover of SO(k) and let

$$\rho : \mathrm{Spin}(k) \to SO(k)$$

be the associated double cover; set $\mathrm{Spin}(2) = S^1$ and let $\rho(\lambda) = \lambda^2$. An oriented bundle V is said to be spin if the transition functions can be lifted from SO(k) to Spin(k); this is possible if and only if the second Stiefel–Whitney class of V, which is defined later, vanishes. There can be inequivalent spin structures, which are parametrized by the cohomology group $H^1(M;\mathbb{Z}_2)$.

The Tangent Bundle of Projective Space
The tangent bundle $T\mathbb{RP}^m$ of real projective space is orientable if and only if m is odd; $T\mathbb{RP}^m$ is spin if and only if $m \equiv 3 \bmod 4$. If $m \equiv 3 \bmod 4$, there are two inequivalent spin structures on this bundle as $H^1(\mathbb{RP}^m;\mathbb{Z}_2) = \mathbb{Z}_2$.
The tangent bundle $T\mathbb{CP}^m$ of complex projective space is always orientable; $T\mathbb{CP}^m$ is spin if and only if m is odd.

Principal and Associated Bundles
Let H be a Lie group and let

$$\phi_{\alpha\beta} : O_\alpha \cap O_\beta \to H$$

be a collection of smooth functions satisfying the compatibility conditions given in eqn [1]. We define a principal bundle P by gluing $O_\alpha \times H$ to $O_\beta \times H$ using $\phi$:

$$(P,h) \sim (P, \phi_{\alpha\beta}(P)h) \quad \text{for } P \in O_\alpha \cap O_\beta$$

Because right multiplication and left multiplication commute, right multiplication gives a natural action of H on P:

$$(P,h)\cdot\tilde h := (P, h\cdot\tilde h)$$

The natural projection $P \to P/H = M$ is an H fiber bundle.
Let $\rho$ be a representation of H to GL(F, k). For $\xi \in P$, $v \in F^k$, and $h \in H$, define a gluing

$$(\xi, v) \sim (\xi\cdot h^{-1}, \rho(h)v)$$

The associated vector bundle is then given by

$$P \times_\rho F^k := P \times F^k/\sim$$

Clearly, $\{\rho(\phi_{\alpha\beta})\}$ are the transition cocycles of the vector bundle $P \times_\rho F^k$.
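The cocycle conditions of eqn [1] (identity on each chart, and compatibility on triple overlaps) can be checked numerically in a small case. The following is a minimal sketch, assuming the tautological line bundle over $\mathbb{CP}^2$ with the three standard charts $O_i = \{z_i \ne 0\}$ and the frames $s_i = z/z_i$; these choices, the helper names `phi`, `rho`, and `cocycle_defect`, and the representation $h \mapsto h^n$ are illustrative assumptions and do not come from the text.

```python
import cmath
import random

def phi(i, j, z):
    """Transition function with s_i = phi(i, j) * s_j for the frames s_i = z / z[i]."""
    return z[j] / z[i]

def rho(h, n):
    """Hypothetical representation of the nonzero complex numbers on C: h -> h**n."""
    return h ** n

def cocycle_defect(z, n=1):
    """Largest violation of phi_aa = 1 and phi_ab = phi_ac * phi_cb at the point z."""
    worst = 0.0
    idx = range(len(z))
    for a in idx:
        worst = max(worst, abs(rho(phi(a, a, z), n) - 1))
        for b in idx:
            for c in idx:
                lhs = rho(phi(a, b, z), n)
                rhs = rho(phi(a, c, z), n) * rho(phi(c, b, z), n)
                worst = max(worst, abs(lhs - rhs))
    return worst

random.seed(0)
# Homogeneous coordinates of a point lying in all three charts (a triple overlap).
z = [complex(random.uniform(0.5, 2.0), random.uniform(0.5, 2.0)) for _ in range(3)]

defect_taut = cocycle_defect(z, n=1)     # cocycle of the line bundle itself
defect_assoc = cocycle_defect(z, n=-2)   # cocycle rho(phi) of an associated bundle

# The transition functions depend only on the point [z], not on its representative.
lam = 1.3 * cmath.exp(0.7j)
scale_defect = abs(phi(0, 1, z) - phi(0, 1, [lam * w for w in z]))
```

Both defects vanish up to rounding, which illustrates that composing a cocycle with a representation again gives a cocycle, as in the associated-bundle construction.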

Frame Bundles
If V is a vector bundle, the associated principal GL(F, k) bundle is the bundle of all frames; if V is given an inner product on each fiber, then the associated principal O(k) or U(k) bundle is the bundle of orthonormal frames. If V is an oriented Riemannian vector bundle, the associated principal SO(k) bundle is the bundle of oriented orthonormal frames.

Direct Sum and Tensor Product
Fiber-wise direct sum (resp. tensor product) defines the direct sum (resp. tensor product) of vector bundles:

$$\oplus : \mathrm{Vect}_k(M,F) \times \mathrm{Vect}_n(M,F) \to \mathrm{Vect}_{k+n}(M,F)$$
$$\otimes : \mathrm{Vect}_k(M,F) \times \mathrm{Vect}_n(M,F) \to \mathrm{Vect}_{kn}(M,F)$$

The transition cocycles of the direct sum (resp. tensor product) of two vector bundles are the direct sum (resp. tensor product) of the transition cocycles of the respective bundles.
The set of line bundles $\mathrm{Vect}_1(M,F)$ is a group under $\otimes$. The unit in the group is the trivial line bundle $\mathbf{1} := M \times F$; the inverse of a line bundle L is the dual line bundle $L^* := \mathrm{Hom}(L,F)$ since

$$L \otimes L^* = \mathbf{1}$$

Pullback Bundle
Let $\pi : V \to M$ be the projection associated with $V \in \mathrm{Vect}_k(M,F)$. If f is a smooth map from N to M, then the pullback bundle $f^*V$ is the vector bundle over N which is defined by setting

$$f^*V := \{(P,v) \in N \times V : f(P) = \pi(v)\}$$

The fiber of $f^*V$ over P is the fiber of V over f(P). Let $\{s_\alpha\}$ be local frames for V over an open cover $\{O_\alpha\}$ of M. For $P \in f^{-1}(O_\alpha)$, define

$$\{f^*s_\alpha\}(P) := (P, s_\alpha(f(P)))$$

This gives a collection of frames for $f^*V$ over the open cover $\{f^{-1}(O_\alpha)\}$ of N. Let

$$f^*\phi_{\alpha\beta} := \phi_{\alpha\beta} \circ f$$

be the pullback of the transition functions. Then

$$\{f^*s_\alpha\}(P) = (P, \phi_{\alpha\beta}(f(P))\,s_\beta(f(P))) = \{(f^*\phi_{\alpha\beta})(f^*s_\beta)\}(P)$$

This shows that the pullbacks of the transition functions for V are the transition functions of the pullback $f^*(V)$.

Homotopy
Two smooth maps $f_0$ and $f_1$ from N to M are said to be homotopic if there exists a smooth map $F : N \times I \to M$ so that $f_0(P) = F(P,0)$ and so that $f_1(P) = F(P,1)$. If $f_0$ and $f_1$ are homotopic maps from N to M, then $f_0^*V$ is isomorphic to $f_1^*V$.
Let [N, M] be the set of all homotopy classes of smooth maps from N to M. The association $V \to f^*V$ induces a natural map

$$[N,M] \times \mathrm{Vect}_k(M,F) \to \mathrm{Vect}_k(N,F)$$

If M is contractible, then the identity map is homotopic to the constant map c. Consequently, $V = \mathrm{Id}^*V$ is isomorphic to $c^*V = M \times F^k$. Thus, any vector bundle over a contractible manifold is trivial. In particular, if $\{O_\alpha\}$ is a simple cover of M and if $V \in \mathrm{Vect}(M,F)$, then $V|_{O_\alpha}$ is trivial for each $\alpha$. This shows that a simple cover is a trivializing cover for every $V \in \mathrm{Vect}(M,F)$.

Stabilization
Let $\mathbf{1} \in \mathrm{Vect}_1(M,F)$ denote the isomorphism class of the trivial line bundle $M \times F$ over an m-dimensional manifold M. The map $V \to V \oplus \mathbf{1}$ induces a stabilization map

$$s : \mathrm{Vect}_k(M,F) \to \mathrm{Vect}_{k+1}(M,F)$$

which induces an isomorphism

$$\mathrm{Vect}_k(M,\mathbb{R}) = \mathrm{Vect}_{k+1}(M,\mathbb{R}) \ \text{for } k > m$$
$$\mathrm{Vect}_k(M,\mathbb{C}) = \mathrm{Vect}_{k+1}(M,\mathbb{C}) \ \text{for } 2k > m$$ [2]

These values of k comprise the stable range.

The K-Theory
The direct sum and tensor product make Vect(M, F) into a semiring; we denote the associated ring defined by the Grothendieck construction by KF(M). If $V \in \mathrm{Vect}(M,F)$, let $[V] \in KF(M)$ be the corresponding element of K-theory; KF(M) is generated by formal differences $[V_1] - [V_2]$; such formal differences are called virtual bundles.
The Grothendieck construction (see K-theory) introduces nontrivial relations. Let $S^m$ denote the standard sphere in $\mathbb{R}^{m+1}$. Since

$$T(S^m) \oplus \mathbf{1} = (m+1)\mathbf{1}$$

we can easily see that $[TS^m] = m[\mathbf{1}]$ in $K\mathbb{R}(S^m)$, despite the fact that $T(S^m)$ is not isomorphic to $m\mathbf{1}$ for $m \ne 1, 3, 7$.
Let L denote the nontrivial real line bundle over $\mathbb{RP}^k$. Then $T\mathbb{RP}^k \oplus \mathbf{1} = (k+1)L$, so

$$[T\mathbb{RP}^k] = (k+1)[L] - [\mathbf{1}]$$
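The relation $T(S^m) \oplus \mathbf{1} = (m+1)\mathbf{1}$ rests on the pointwise splitting $\mathbb{R}^{m+1} = T_xS^m \oplus \mathbb{R}x$, where the second summand is the (trivial) normal line. As a rough numerical illustration, the sketch below splits each constant vector $e_i$ at a random point x of the sphere and glues it back; the function names `split` and `glue` and the choice m = 4 are hypothetical and only serve the example.

```python
import math
import random

def split(x, w):
    """Split w in R^(m+1) at the unit vector x into (tangent part, normal coefficient)."""
    t = sum(xi * wi for xi, wi in zip(x, w))
    v = [wi - t * xi for xi, wi in zip(x, w)]
    return v, t

def glue(x, v, t):
    """Inverse map: (v, t) in T_x S^m (+) R is sent to v + t*x in R^(m+1)."""
    return [vi + t * xi for vi, xi in zip(v, x)]

random.seed(1)
m = 4
x = [random.gauss(0.0, 1.0) for _ in range(m + 1)]
norm = math.sqrt(sum(c * c for c in x))
x = [c / norm for c in x]          # a point of S^m

max_tangent_defect = 0.0           # how far the tangent parts are from being orthogonal to x
max_roundtrip_defect = 0.0         # how far glue(split(e_i)) is from e_i
for i in range(m + 1):
    e = [1.0 if j == i else 0.0 for j in range(m + 1)]   # constant section e_i
    v, t = split(x, e)
    max_tangent_defect = max(max_tangent_defect,
                             abs(sum(a * b for a, b in zip(v, x))))
    back = glue(x, v, t)
    max_roundtrip_defect = max(max_roundtrip_defect,
                               max(abs(a - b) for a, b in zip(back, e)))
```

The m + 1 constant sections therefore trivialize the sum of the tangent bundle and the normal line, even though the tangent bundle alone need not be trivial.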

The map V ! Rank(V) extends to a surjective respectively. The topology on these spaces is the
map from KF(M) to Z. We denote the associated weak or inductive topology. The Grassmannians are
ideal of virtual bundles of virtual rank 0 by called classifying spaces. The isomorphisms of
eqn [4] are compatible with the inclusions of eqn [5]
f
KFðMÞ :¼ kerðRankÞ and we have
In the stable range, V ! [V]  k[ l ] identifies ½M; Grk ðF; 1Þ ¼ Vectk ðM; FÞ ½6
g
Vectk ðM; RÞ ¼ KRðMÞ if k > m
½3
g
Vectk ðM; CÞ ¼ KCðMÞ if 2k > m
Spaces with Finite Covering Dimension
These groups contain nontrivial torsion. Let L be the A metric space X is said to have a covering
nontrivial real line bundle over RPk . Then dimension at most m if, given any open cover {U  }
g
KRðRP k
Þ ¼ Z  f½L  ½ l g=2 ðkÞ Zf½L  ½ l g of X, there exists a refinement {O } of the cover so
that any intersection of more than m þ 1 of the {O }
where (k) is the Adams number. is empty. For example, any manifold of dimension
m has covering dimension at most m. More
Classifying Spaces generally, any m-dimensional cell complex has
covering dimension at most m.
Let Grk (F, n) be the Grassmannian of k-dimensional The isomorphisms of [2]–[4], and [6] continue to
subspaces of Fn . By mapping a k-plane in Fn to the hold under the weaker assumption that M is a metric
corresponding orthogonal projection on , we can space with covering dimension at most m.
identify Grk (F, n) with the set of orthogonal projec-
tions of rank k:
f
2 HomðFn Þ:
2 ¼
;
 ¼
; trð
Þ ¼ kg Characteristic Classes of Vector
There is a natural associated tautological k-plane Bundles
bundle The Cohomology of Grk (F, 1)
Vk ðF; nÞ 2 Vectk ðGrk ðF; nÞ; FÞ The cohomology algebras of the Grassmannians are
whose fiber over a k-plane is the k-plane itself: polynomial algebras on suitably chosen generators:

Vk ðF; nÞ :¼ fð
; xÞ 2 HomðFn Þ  Fn :
x ¼ xg H  ðGrk ðR; 1Þ; Z2 Þ ¼ Z2 ½sw1 ; . . . ; swk 
½7
H  ðGrk ðC; 1Þ; ZÞ ¼ Z½c1 ; . . . ; ck 
Let [M, Grk (F, n)] denote the set of homotopy
equivalence classes of smooth maps f from M to
Grk (F, n). Since [f1 ] = [f2 ] implies that f1 V is The Stiefel–Whitney Classes
isomorphic to f2 V, the association
Let V 2 Vectk (M, R). We use eqn [6] to find
f ! f  Vk ðF; nÞ 2 Vectk ðM; FÞ  : M ! Grk (R, 1) which classifies V; the map 
induces a map is uniquely determined up to homotopy and, using
eqn [7], one sets
½M; Grk ðF; nÞ ! Vectk ðM; FÞ
swi ðVÞ :¼  swi 2 H i ðM; Z2 Þ
This map defines a natural equivalence of functors
in the stable range: The total Stiefel–Whitney class is then defined by
½M; Grk ðR; þ kÞ ¼ Vectk ðM; RÞ for > m swðVÞ ¼ 1 þ sw1 ðVÞ þ    þ swk ðVÞ
½4
½M; Grk ðC; þ kÞ ¼ Vectk ðM; CÞ for 2 > m The Stiefel–Whitney class has the properties:
The natural inclusion of Fn in Fnþ1 induces natural 1. If f : X1 ! X2 , then f  (sw(V)) = sw(f  V).
inclusions 2. sw(V W) = sw(V)sw(W).
3. If L is the Möbius bundle over S1 , then sw1 (L)
Grk ðF; nÞ  Grk ðF; n þ 1Þ
½5 generates H1 (S1 ; Z2 ) = Z2 .
Vk ðF; nÞ  Vk ðF; n þ 1Þ
The cohomology algebra of real projective space
Let Grk (F, 1) and Vk (F, 1) be the direct limit is a truncated polynomial algebra:
spaces under these inclusions; these are the infinite-
dimensional Grassmannians and classifying bundles, H  ðRPk ; Z2 Þ ¼ Z2 ½x=xkþ1 ¼ 0
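Because $H^*(\mathbb{RP}^k;\mathbb{Z}_2)$ is a truncated polynomial algebra, Stiefel–Whitney classes of the tangent bundle reduce to binomial coefficients mod 2. The sketch below assumes the relation $T\mathbb{RP}^k \oplus \mathbf{1} = (k+1)L$ and the product rule, which together give $sw(T\mathbb{RP}^k) = (1+x)^{k+1}$ with $x = sw_1(L)$; the function names are illustrative. For k = 1 the class $x^2$ is already zero in the truncated ring, so the sketch reports $\mathbb{RP}^1 = S^1$ as having vanishing $sw_2$ as well.

```python
from math import comb

def sw_real_projective(k):
    """Coefficients of (1 + x)^(k+1) mod 2 in Z_2[x]/(x^(k+1)); entry i is sw_i(TRP^k)."""
    return [comb(k + 1, i) % 2 for i in range(k + 1)]

def sw1_vanishes(k):
    return sw_real_projective(k)[1] == 0

def sw2_vanishes(k):
    sw = sw_real_projective(k)
    # For k = 1 there is no degree-2 class at all, since x^2 = 0.
    return len(sw) <= 2 or sw[2] == 0

orientable = [k for k in range(1, 13) if sw1_vanishes(k)]
spin = [k for k in orientable if sw2_vanishes(k)]
```

With these conventions, `orientable` lists the odd k and `spin` lists k = 1 together with the values $k \equiv 3 \bmod 4$.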

Since $T\mathbb{RP}^k \oplus \mathbf{1} = (k+1)L$, one has

$$sw(T\mathbb{RP}^k) = (1+x)^{k+1} = 1 + (k+1)x + \frac{(k+1)k}{2}x^2 + \cdots$$ [8]

Orientability and Spin Structures
The Stiefel–Whitney classes have real geometric meaning. For example, $sw_1(V) = 0$ if and only if V is orientable; if $sw_1(V) = 0$, then $sw_2(V) = 0$ if and only if V admits a spin structure. With reference to the discussion on the tangent bundle of projective space, eqn [8] yields

$$sw_1(T\mathbb{RP}^k) = \begin{cases} 0 & \text{if } k \equiv 1 \bmod 2 \\ x & \text{if } k \equiv 0 \bmod 2 \end{cases}$$

Thus, $\mathbb{RP}^k$ is orientable if and only if k is odd. Furthermore,

$$sw_2(T\mathbb{RP}^k) = \begin{cases} 0 & \text{if } k \equiv 3 \bmod 4 \\ x^2 & \text{if } k \equiv 1 \bmod 4 \end{cases}$$

Thus, $T\mathbb{RP}^k$ is spin if and only if $k \equiv 3 \bmod 4$.

Chern Classes
Let $V \in \mathrm{Vect}_k(M,\mathbb{C})$. We use eqn [6] to find $\Phi : M \to \mathrm{Gr}_k(\mathbb{C},\infty)$ which classifies V; the map $\Phi$ is uniquely determined up to homotopy and, using eqn [7], one sets

$$c_i(V) := \Phi^* c_i \in H^{2i}(M;\mathbb{Z})$$

The total Chern class is then defined by

$$c(V) := 1 + c_1(V) + \cdots + c_k(V)$$

The Chern class has the properties:
1. If $f : X_1 \to X_2$, then $f^*(c(V)) = c(f^*V)$.
2. $c(V \oplus W) = c(V)c(W)$.
3. Let L be the classifying line bundle over $S^2 = \mathbb{CP}^1$. Then $\int_{S^2} c_1(L) = 1$.
The cohomology algebra of complex projective space also is a truncated polynomial algebra

$$H^*(\mathbb{CP}^k;\mathbb{Z}) = \mathbb{Z}[x]/x^{k+1}$$

where $x = c_1(L)$ and L is the complex classifying line bundle over $\mathbb{CP}^k = \mathrm{Gr}_1(\mathbb{C},k+1)$. If $T_c\mathbb{CP}^k$ is the complex tangent bundle, then

$$c(T_c\mathbb{CP}^k) = (1+x)^{k+1}$$

The Pontrjagin Classes
Let V be a real vector bundle over a topological space X of rank r = 2k or r = 2k + 1. The Pontrjagin classes $p_i(V) \in H^{4i}(X;\mathbb{Z})$ are characterized by the properties:
1. $p(V) = 1 + p_1(V) + \cdots + p_k(V)$.
2. If $f : X_1 \to X_2$, then $f^*(p(V)) = p(f^*V)$.
3. $p(V \oplus W) = p(V)p(W)$ mod elements of order 2.
4. $\int_{\mathbb{CP}^2} p_1(T\mathbb{CP}^2) = 3$.
We can complexify a real vector bundle V to construct an associated complex vector bundle $V_\mathbb{C}$. We have

$$p_i(V) := (-1)^i c_{2i}(V_\mathbb{C})$$

Conversely, if V is a complex vector bundle, we can construct an underlying real vector bundle $V_\mathbb{R}$ by forgetting the underlying complex structure. Modulo elements of order 2, we have

$$\sum_i (-1)^i p_i(V_\mathbb{R}) = c(V)\,c(V^*)$$

Let $T\mathbb{CP}^k$ be the real tangent bundle of complex projective space. Then, consistently with property 4,

$$p(T\mathbb{CP}^k) = (1 + x^2)^{k+1}$$

Line Bundles
Tensor product makes $\mathrm{Vect}_1(M,F)$ into an abelian group. One has natural equivalences of functors which are group homomorphisms:

$$sw_1 : \mathrm{Vect}_1(M,\mathbb{R}) \to H^1(M;\mathbb{Z}_2)$$
$$c_1 : \mathrm{Vect}_1(M,\mathbb{C}) \to H^2(M;\mathbb{Z})$$

A real line bundle L is trivial if and only if it is orientable or, equivalently, if $sw_1(L)$ vanishes. A complex line bundle L is trivial if and only if $c_1(L) = 0$. There are nontrivial vector bundles of rank k > 1 with vanishing Stiefel–Whitney classes. For example, $sw_i(TS^k) = 0$ for i > 0 despite the fact that $TS^k$ is trivial if and only if k = 1, 3, 7.

Curvature and Characteristic Classes
de Rham Cohomology
We can replace the coefficient group $\mathbb{Z}$ by $\mathbb{C}$ at the cost of losing information concerning torsion. Thus, we may regard $p_i(V) \in H^{4i}(M;\mathbb{C})$ if V is real or $c_i(V) \in H^{2i}(M;\mathbb{C})$ if V is complex. Let M be a smooth manifold. Let $C^\infty\Lambda^pM$ be the space of smooth p-forms and let

$$d : C^\infty\Lambda^pM \to C^\infty\Lambda^{p+1}M$$

be the exterior derivative. The de Rham cohomology groups are then defined by

$$H^p_{deR}(M) := \frac{\ker(d : C^\infty\Lambda^pM \to C^\infty\Lambda^{p+1}M)}{\mathrm{im}(d : C^\infty\Lambda^{p-1}M \to C^\infty\Lambda^pM)}$$

The de Rham theorem identifies the topological cohomology groups $H^p(M;\mathbb{C})$ with the de Rham cohomology groups $H^p_{deR}(M)$ which are given differential geometrically.
Given a connection on V, the Chern–Weil theory enables us to compute Pontrjagin and Chern classes in de Rham cohomology in terms of curvature.

Connections
Let V be a vector bundle over M. A connection

$$\nabla : C^\infty(V) \to C^\infty(T^*M \otimes V)$$

on V is a first-order partial differential operator which satisfies the Leibniz rule, that is, if s is a smooth section to V and if f is a smooth function on M,

$$\nabla(fs) = df \otimes s + f\nabla s$$

If X is a tangent vector field, we define

$$\nabla_X s = \langle X, \nabla s\rangle$$

where $\langle\,,\rangle$ denotes the natural pairing between the tangent and cotangent spaces. This generalizes to the bundle setting the notion of a directional derivative and has the properties:
1. $\nabla_{fX}s = f\nabla_X s$.
2. $\nabla_X(fs) = X(f)s + f\nabla_X s$.
3. $\nabla_{X_1+X_2}s = \nabla_{X_1}s + \nabla_{X_2}s$.
4. $\nabla_X(s_1+s_2) = \nabla_X s_1 + \nabla_X s_2$.

The Curvature 2-Form
Let $\omega_p$ be a smooth p-form. The connection can be extended to a map

$$\nabla : C^\infty(\Lambda^pM \otimes V) \to C^\infty(\Lambda^{p+1}M \otimes V)$$

by defining

$$\nabla(\omega_p \otimes s) = d\omega_p \otimes s + (-1)^p\omega_p \wedge \nabla s$$

In contrast to ordinary exterior differentiation, $\nabla^2$ need not vanish. We set

$$\Omega(s) := \nabla^2 s$$

This is not a second-order partial differential operator; it is a zeroth-order operator, that is,

$$\Omega(fs) = ddf \otimes s - df \wedge \nabla s + df \wedge \nabla s + f\nabla^2 s = f\Omega(s)$$

The curvature operator $\Omega$ can also be computed locally. Let $(s_i)$ be a local frame. Expand

$$\nabla s_i = \sum_j \omega_i^j s_j$$

to define the connection 1-form $\omega$. One then has

$$\nabla^2 s_i = \left(d\omega_i^j - \omega_i^k \wedge \omega_k^j\right) s_j$$

and so

$$\Omega_i^j = d\omega_i^j - \omega_i^k \wedge \omega_k^j$$

If $\tilde s_i = g_i^j s_j$ is another local frame, we compute

$$\tilde\omega = dg\,g^{-1} + g\omega g^{-1} \quad\text{and}\quad \tilde\Omega = g\Omega g^{-1}$$

Although the connection 1-form $\omega$ is not tensorial, the curvature is an invariantly defined 2-form-valued endomorphism of V.

Unitary Connections
Let $(\,,)$ be a nondegenerate Hermitian inner product on V. We say that $\nabla$ is a unitary connection if

$$(\nabla s_1, s_2) + (s_1, \nabla s_2) = d(s_1, s_2)$$

Such connections always exist and, relative to a local orthonormal frame, the curvature is skew-symmetric, that is,

$$\Omega + \Omega^* = 0$$

Thus, $\Omega$ can be regarded as a 2-form-valued element of the Lie algebra of the structure group, O(V) in the real setting or U(V) in the complex setting.

Projections
We can always embed V in a trivial bundle $\mathbf{1}^\nu$ of dimension $\nu$; let $\pi_V$ be the orthogonal projection on V. We project the flat connection to V to define a natural connection on V. For example, if M is embedded isometrically in the Euclidean space $\mathbb{R}^\nu$, this construction gives the Levi-Civita connection on the tangent bundle TM. The curvature of this connection is then given by

$$\Omega = \pi_V\,d\pi_V\,d\pi_V$$

Let $V_P$ be the fiber of V over a point $P \in M$. The inclusion $i : V \subset \mathbf{1}^\nu$ defines the classifying map $f : M \to \mathrm{Gr}_k(\mathbb{R},\nu)$ where we set

$$f(P) = i(V_P)$$
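The projection formula for the curvature can be tested on the unit sphere $S^2 \subset \mathbb{R}^3$, where the projected connection on the tangent bundle is the Levi-Civita connection. The sketch below is a finite-difference check under stated assumptions: spherical coordinates $(\theta, \varphi)$, the projection $\pi = I - nn^T$ onto the tangent plane, and the reading of $d\pi\,d\pi$ evaluated on $(\partial_\theta, \partial_\varphi)$ as the commutator of the two partial derivatives of $\pi$. All function names and the step size are illustrative.

```python
import math

def normal(theta, phi):
    """Unit normal (equal to the position vector) on the unit sphere."""
    return [math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta)]

def projection(theta, phi):
    """Orthogonal projection of R^3 onto the tangent plane of S^2: pi = I - n n^T."""
    n = normal(theta, phi)
    return [[(1.0 if i == j else 0.0) - n[i] * n[j] for j in range(3)]
            for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(3)] for i in range(3)]

def d_projection(theta, phi, direction, h=1e-5):
    """Central finite difference of pi in the theta (0) or phi (1) direction."""
    dt, dp = (h, 0.0) if direction == 0 else (0.0, h)
    plus = projection(theta + dt, phi + dp)
    minus = projection(theta - dt, phi - dp)
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(3)]
            for i in range(3)]

def curvature_component(theta, phi):
    """<e1, Omega(d_theta, d_phi) e2> for Omega = pi dpi ^ dpi, with e1, e2 orthonormal."""
    pi = projection(theta, phi)
    a = d_projection(theta, phi, 0)
    b = d_projection(theta, phi, 1)
    omega = matmul(pi, sub(matmul(a, b), matmul(b, a)))
    e1 = [math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi),
          -math.sin(theta)]
    e2 = [-math.sin(phi), math.cos(phi), 0.0]
    w = matvec(omega, e2)
    return sum(x * y for x, y in zip(e1, w))

theta0, phi0 = 1.1, 0.4
value = curvature_component(theta0, phi0)   # compare with sin(theta0)
```

The computed component agrees with $\sin\theta$, the density of the area form $\sin\theta\,d\theta \wedge d\varphi$, so on an orthonormal frame the curvature component equals 1, as expected for the unit sphere.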

Chern–Weil Theory Other Characteristic Classes


Let r be a Riemannian connection on a real vector The Chern character is defined by the exponential
bundle V of rank k. We set function. There are other characteristic classes
  which appear in the index theorem that are defined
1 using other generating functions that appear in
pðÞ :¼ det I þ 
2 index theory. Let x := (x1 , . . . ) be a collection of
indeterminates. Let s (x) be the th elementary
Let T denote the transpose matrix of differential symmetric function;
form. Since  þ T = 0, the polynomials of odd Y
degree in  vanish and we may expand ð1 þ x Þ ¼ 1 þ s1 ðxÞ þ s2 ðxÞ þ   

pðÞ ¼ 1 þ p1 ðÞ þ    þ pr ðÞ
For a diagonal matrix A := diag( 1 , .ffi. . ), denote the
pffiffiffiffiffiffi
where k = 2r or k = 2r þ 1 and the differential forms normalized eigenvalues by xj := 1j =2 . Then
pi () 2 C1 4i (M) are forms of degree 4i. pffiffiffiffiffiffiffi !
Changing the gauge (i.e., the local frame) replaces 1
 by gg1 and hence p() is independent of the cðAÞ ¼ det 1 þ A ¼ 1 þ s1 ðxÞ þ   
2
local frame chosen. One can show that dpi () = 0;
let [pi ()] denote the corresponding element of de Thus, the Chern class corresponds in a certain sense
Rham cohomology. This is independent of the to the elementary symmetric functions.
particular connection chosen and [pi ()] represents Let f (x) be a symmetric polynomial or more
pi (V) in H 4i (M; C). generally a formal power series which is symmetric.
Similarly, let V be a complex vector bundle of We can express f (x) = F(s1 (x), . . . ) in terms of the
rank k with a Hermitian connection r. Set elementary symmetric functions and define
pffiffiffiffiffiffiffi ! f () = F(c1 (), . . . ) by substitution. For example,
1 the Chern character is defined by the generating
cðÞ :¼ det I þ 
2 function
¼ 1 þ c1 ðÞ þ    þ ck ðÞ X
k
f ðxÞ :¼ ex
Again ci () is independent of the local gauge and ¼1
dci () = 0. The de Rham cohomology class [ci ()] The Todd class is defined using a different
represents ci (V) in H 2i (M; C). generating function:
Y
tdðxÞ :¼ x ð1  ex Þ1
The Chern Character

The total Chern character is defined by the formal ¼ 1 þ td1 ðxÞ þ   


sum
If V is a real vector bundle, we can define
pffiffiffiffiffi
chðÞ :¼ trðe 1=2 Þ some additional
pffiffiffiffiffiffi
ffi characteristic classes similarly. Let
pffiffiffiffiffiffiffi { 11 , . . . } be the nonzero eigenvalues of a
X ð 1Þ
¼ trð Þ skew-symmetric matrix A. We set xj =  j =2

ð2 Þ ! and define the Hirzebruch polynomial L and the Â
¼ ch0 ðÞ þ ch1 ðÞ þ    genus by
Y x
Let ch(V) = [ch()] denote the associated de Rham LðxÞ :¼

tanhðx Þ
cohomology class; it is independent of the particular
connection chosen. We then have the relations ¼ 1 þ L1 ðxÞ þ L2 ðxÞ þ   
Y x
^
AðxÞ :¼
chðV WÞ ¼ chðVÞ þ chðWÞ 2 sinhðð1=2Þx Þ

chðV WÞ ¼ chðVÞchðWÞ ^ 1 ðxÞ þ A
¼1þA ^ 2 ðxÞ þ   
The Chern character extends to a ring isomorph- The generating functions
ism from KU(M) Q to H e (M; Q), which is a
natural equivalence of functors; modulo torsion, x x
and
K theory and cohomology are the same functors. tanhðxÞ 2 sinhðð1=2ÞxÞ

are even functions of x, so the ambiguity in the If M is an even-dimensional manifold, let em (M) :=
choice of sign in the eigenvalues plays no role. This em (TM). If we reverse the local orientation of M,
defines characteristic classes then em (M) changes sign. Consequently, em (M) is a
measure rather than an m-form; we can use the
Li ðVÞ 2 H 4i ðM; CÞ and ^ i ðVÞ 2 H 4i ðM; CÞ
A Riemannian measure on M to regard em (M) as a
scalar. Let Rijkl be the components of the curvature of
Summary of Formulas the Levi-Civita connection with respect to some local
orthonormal frame field; we adopt the convention
We summarize below some of the formulas in terms that R1221 = 1 on the standard sphere S2 in R3 . If
of characteristic classes: "I,J := (eI , eJ ) is the totally antisymmetric tensor, then
pffiffiffiffiffiffiffi
1tr()
1. c1 () = , X "I;J Ri    Rim1 im jm jm1
2 1 i2 j2 j1
e2n :¼
1 ð8 Þn n!
2. c2 () = 2 {tr(2 )  tr()2 }, I; J
8
1 Let R := Rijji and ij := Rikkj be the scalar curvature
3. p1 () =  2 tr(2 ),
8   and the Ricci tensor, respectively. Then
c21  2c2
4. ch(V) = k þ c1 þ þ    (V), 1
 2  e2 ¼ R
2 4
c1 (c1 þ c2 ) c1 c2
5. td(V)= 1 þ þ þ þ   (V), 1
2 12 24 e4 ¼ ðR2  4jj2 þ jRj2 Þ
  32 2
p1 7p21  4p2
6. Â(V) = 1  þ þ    (V),
24 5760
  Characteristic Classes of Principal
p1 7p2  p21
7. L(V) = 1 þ þ þ    (V), Bundles
3 45
8. td(V W) = td(V)td(W), Let g be the Lie algebra of a compact Lie group G.
9. Â(V W) = Â(V)Â(W), Let : P ! M be a principal G bundle over M. For

2 P, let
10. L(V W) = L(V)L(W).
V
:¼ ker  : T
P ! T
M and H
:¼ V ?

The Euler Form


be the vertical and horizontal distributions of the
So far, this article has dealt with the structure groups projection , respectively. We assume that the metric
O(k) in the real setting and U(k) in the complex on P is chosen to be G-invariant and such that
setting. There is one final characteristic class which  : H
! T
M is an isometry; thus, is a Rieman-
arises from the structure group SO(k). Suppose k = 2n nian submersion. If F is a tangent vector field on M,
is even. While a real antisymmetric matrix A of shape let HF be the corresponding vertical lift. Let V be
2n  2n cannot be diagonalized, it can be put in block orthogonal projection on the distribution V. The
off 2-diagonal form with blocks, curvature is defined by
 
0  ðF1 ; F2 Þ ¼ V ½HðF1 Þ; HðF2 Þ
 0 the horizontal distribution H is integrable if and only if
The top Pontrjagin class pn (A) = x21    x2n is a perfect the curvature vanishes. Since the metric is G-invariant,
square. The Euler class (F1 , F2 ) is invariant under the group action. We may
use a local section s to P over a contractible coordinate
e2n ðAÞ :¼ x1    xn chart O to split 1 O = O  G. This permits us to
is the square root of pn . If V is an oriented vector identify V with TG and to regard  as a g-valued
bundle of dimension 2n, then 2-form. If we replace the section s by a section s̃, then
˜ = gg1 changes by the adjoint action of G on g.
e2n ðVÞ 2 H 2n ðM; CÞ If V is a real or complex vector bundle over M,
is a well-defined characteristic class satisfying we can put a fiber metric on V to reduce the
e2n (V)2 = pn (V). structure group to the orthogonal group O(r) in the
If V is the underlying real oriented vector bundle real setting or the unitary group U(r) in the complex
of a complex vector bundle W, setting. Let P V be the associated frame bundle. A
Riemannian connection r on V induces an invariant
e2n ðVÞ ¼ cn ðWÞ splitting of TP V = V H and defines a natural

metric on $P_V$; the curvature $\Omega$ of the connection $\nabla$ defined here agrees with the definition given previously.
Let Q(G) be the algebra of all polynomials on g which are invariant under the adjoint action. If $Q \in Q(G)$, then $Q(\Omega)$ is well defined. One has $dQ(\Omega) = 0$. Furthermore, the de Rham cohomology class $Q(P) := [Q(\Omega)]$ is independent of the particular connection chosen. We have

$$Q(U(k)) = \mathbb{C}[c_1, \ldots, c_k]$$
$$Q(SU(k)) = \mathbb{C}[c_2, \ldots, c_k]$$
$$Q(O(2k)) = \mathbb{C}[p_1, \ldots, p_k]$$
$$Q(O(2k+1)) = \mathbb{C}[p_1, \ldots, p_k]$$
$$Q(SO(2k)) = \mathbb{C}[p_1, \ldots, p_k, e_k]/(e_k^2 = p_k)$$
$$Q(SO(2k+1)) = \mathbb{C}[p_1, \ldots, p_k]$$

Thus, for this category of groups, no new characteristic classes ensue. Since the invariants are Lie-algebra theoretic in nature,

$$Q(\mathrm{Spin}(k)) = Q(SO(k))$$

Other groups, of course, give rise to different characteristic rings of invariants.

Acknowledgments
Research of P Gilkey was partially supported by the MPI (Leipzig, Germany), that of R Ivanova by the UHH Seed Money Grant, and of S Nikčević by MM 1646 (Serbia), DAAD (Germany), and Dierks von Zweck Stiftung (Essen, Germany).

See also: Cohomology Theories; Gerbes in Quantum Field Theory; Instantons: Topological Aspects; K-Theory; Mathai-Quillen Formalism; Riemann Surfaces.

Further Reading
Besse AL (1987) Einstein manifolds. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], p. 10. Berlin: Springer-Verlag.
Bott R and Tu LW (1982) Differential forms in algebraic topology. Graduate Texts in Mathematics, p. 82. New York–Berlin: Springer-Verlag.
Chern S (1944) A simple intrinsic proof of the Gauss–Bonnet formula for closed Riemannian manifolds. Annals of Mathematics 45: 747–752.
Chern S (1945) On the curvatura integra in a Riemannian manifold. Annals of Mathematics 46: 674–684.
Conner PE and Floyd EE (1964) Differentiable periodic maps. Ergebnisse der Mathematik und ihrer Grenzgebiete, N.F., Band 33. New York: Academic Press; Berlin–Göttingen–Heidelberg: Springer-Verlag.
de Rham G (1950) Complexes à automorphismes et homéomorphie différentiable (French). Ann. Inst. Fourier Grenoble 2: 51–67.
Eguchi T, Gilkey PB, and Hanson AJ (1980) Gravitation, gauge theories and differential geometry. Physics Reports 66: 213–393.
Eilenberg S and Steenrod N (1952) Foundations of Algebraic Topology. Princeton, NJ: Princeton University Press.
Greub W, Halperin S, and Vanstone R (1972) Connections, Curvature, and Cohomology. Vol. I: De Rham Cohomology of Manifolds and Vector Bundles. Pure and Applied Mathematics, vol. 47. New York–London: Academic Press.
Hirzebruch F (1956) Neue topologische Methoden in der algebraischen Geometrie (German). Ergebnisse der Mathematik und ihrer Grenzgebiete (N.F.), Heft 9. Berlin–Göttingen–Heidelberg: Springer-Verlag.
Husemoller D (1966) Fibre Bundles. New York–London–Sydney: McGraw-Hill.
Karoubi M (1978) K-theory. An introduction. Grundlehren der Mathematischen Wissenschaften, Band 226. Berlin–New York: Springer-Verlag.
Kobayashi S (1987) Differential Geometry of Complex Vector Bundles. Publications of the Mathematical Society of Japan, 15. Kanô Memorial Lectures, 5. Princeton, NJ: Princeton University Press; Tokyo: Iwanami Shoten.
Milnor JW and Stasheff JD (1974) Characteristic Classes. Annals of Mathematics Studies, No. 76. Princeton, NJ: Princeton University Press; Tokyo: University of Tokyo Press.
Steenrod NE (1951) The Topology of Fibre Bundles. Princeton Mathematical Series, vol. 14. Princeton, NJ: Princeton University Press.
Steenrod NE (1962) Cohomology Operations. Lectures by NE Steenrod written and revised by DBA Epstein. Annals of Mathematics Studies, No. 50. Princeton, NJ: Princeton University Press.
Stong RE (1968) Notes on Cobordism Theory. Mathematical Notes. Princeton, NJ: Princeton University Press; Tokyo: University of Tokyo Press.
Weyl H (1939) The Classical Groups. Their Invariants and Representations. Princeton, NJ: Princeton University Press.

Chern–Simons Models: Rigorous Results


A N Sengupta, Louisiana State University, Baton Rouge, LA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction
The relationship between topological invariants and functional integrals from quantum Chern–Simons theory discovered by Witten (1989) raised several challenges for mathematicians. Most of the tremendous amount of mathematical activity generated by Witten's discovery has been concerned primarily with issues that arise after one has accepted the functional integral as a formal object. This has left, as an important challenge, the task of giving rigorous meaning to the functional integrals themselves and of rigorously deriving their relation to topological invariants. The present article will discuss efforts to put the functional integral itself on a rigorous basis.

Chern–Simons Functional Integrals
We shall describe here the typical Chern–Simons functional integral. For the purposes of this article, we will confine ourselves to a simpler setting rather than the most general possible one. In fact, we shall work with fields over three-dimensional Euclidean space $\mathbb{R}^3$ (instead of a general 3-manifold).
The typical Chern–Simons functional integral is of the form

$$\int_{\mathcal{A}} e^{i(k/4\pi)S_{CS}(A)}\,W_{C_1,R_1}(A)\cdots W_{C_n,R_n}(A)\,DA$$ [1]

Our objective in this section will be to specify what the terms in this formal integral mean. Very briefly, the integration is with respect to a formal "Lebesgue measure" on $\mathcal{A}$, an infinite-dimensional space of geometric objects A called connections over $\mathbb{R}^3$ with values in the Lie algebra LG of a group G. In the first term in the integrand, in the exponent, k is a real number, and $S_{CS}(A)$ is the Chern–Simons action for the connection A. Each term $W_{C_i,R_i}(A)$ is a Wilson loop observable, the trace in some representation $R_i$ of the holonomy of the connection A around the loop $C_i$. The entire integral, formal though it may be, provides an invariant associated with the system of loops $C_1, \ldots, C_n$.
Let G be a compact Lie group; for ease of exposition, let us take G to be a closed, connected subgroup of U(n). Thus, each element of G is an $n \times n$ complex matrix g with $g^*g = I$, the identity. The Lie algebra LG consists of all $n \times n$ matrices A which are skew-Hermitian, that is, satisfy $A^* = -A$, and for which $e^{tA} \in G$ for all real numbers t. On LG there is a convenient inner product given by

$$\langle A, B\rangle = \mathrm{tr}(AB^*)$$

This inner product is invariant under the conjugation action of the group G on its Lie algebra LG.
By a connection over $\mathbb{R}^3$ we shall mean a $C^\infty$ 1-form with values in LG. The set of all connections is an affine (in our case, actually a linear) space $\mathcal{A}$. If $A \in \mathcal{A}$, then define

$$S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}\left(A \wedge dA + \tfrac{2}{3}A \wedge A \wedge A\right)$$ [2]

This is, up to constant multiple, the Chern–Simons action functional.
Let A be a connection and consider a piecewise smooth path

$$C : [0,1] \to \mathbb{R}^3$$

With this one can associate a G-valued path $[0,1] \to G : t \mapsto g(t) \in G$ satisfying the differential equation

$$g'(t)g(t)^{-1} = -A(C'(t))$$

subject to the initial condition g(0) = I, the identity. The path $t \mapsto g(t)$ describes parallel transport along C by the connection A. If C is a loop then the final value g(1) is the holonomy of A around C. If R is a representation of G on some finite-dimensional vector space then the trace of R(g(1)) is the Wilson loop observable:

$$W_{C,R}(A) = \mathrm{tr}(R(g(1)))$$ [3]

Thus, we have specified the meaning of the terms appearing in the formal integral [1], where $C_1, \ldots, C_n$ of eqn [1] form a link (a family of nonintersecting, embedded loops) in $\mathbb{R}^3$ and $R_1, \ldots, R_n$ are finite-dimensional representations of G. Witten showed that, at least for suitable values of k, integrals of this form ought to produce topological invariants, which he identified, for the link.
The integral [1] is problematic for several reasons. First, there is no reasonable and useful analog of Lebesgue measure on an infinite-dimensional space. Even if one were to regularize this measure in some simple way, one would run into the problem that the measure would not live on the space of smooth connections, and so the integrand would become meaningless.
There are several different approaches to a mathematical interpretation of [1]. The approach that is often taken in practice is to simply ignore the analytical problem and define the value of the integral [1] to be what Witten's calculations have given. One approach, used, for instance, by Bar-Natan (1995), is to expand the integrand in a series and relate each individual integral in this expansion separately to topological invariants. Discrete approximation procedures to the continuum integral have also been explored. In the abelian case, infinite-dimensional oscillatory integral techniques have been used to understand the functional integral. Fröhlich and King (1999) showed the possibility of interpreting parallel transport using ideas from stochastic differential equations. Such an approach has been used successfully in the case of two-dimensional Yang–Mills theory, where the functional integral actually corresponds to integration with respect to a measure. In this article, we focus on a method of understanding the normalized Chern–Simons functional integral in terms of infinite-dimensional distribution theory and examining some ideas for understanding Wilson loop expectation values in this setting.

Infinite-Dimensional Distributions
Let $(x_0, x_1, x_2)$ denote the usual coordinates on $\mathbb{R}^3$. Gauge symmetry, an issue which will not be examined here, may be used to simplify the problem of the Chern–Simons integral. In particular, one

need only focus on connections which vanish in the $x_2$-direction, that is, connections of the form $A = A_0\,dx_0 + A_1\,dx_1$. For such $A$, the triple wedge-product term in the Chern–Simons action disappears, and we are left with the quadratic expression:

$$S_{CS}(A) = \int_{\mathbb{R}^3} \operatorname{tr}(A \wedge dA) \qquad [4]$$

This is good, since the functional integral now involves a quadratic exponent and so stands a good chance of rigorous realization, just as Gaussian measure can be given rigorous meaning in infinite dimensions. However, in the Chern–Simons situation, there is no hope of actually getting a measure, not even a complex measure.

The next best thing to a measure is a distribution or "generalized function." A distribution over a space $Y$ is a continuous linear functional on a topological vector space of functions on $Y$. Thus, the objective is to realize the Chern–Simons functional integral as a continuous linear functional on some space of test functions over $\mathcal{A}$ (more precisely, on an extension of $\mathcal{A}$). Before turning to the specific case of the Chern–Simons integral, let us examine some elements of the theory of infinite-dimensional distributions, inasmuch as they are relevant to our needs.

Let us consider a Hilbert space $E_0$, and a positive Hilbert–Schmidt operator $T$ on $E_0$. For each integer $p \ge 0$, let $E_p = T^p(E_0)$, which is a Hilbert space with the inner product $\langle x, y\rangle_p = \langle T^{-p}x, T^{-p}y\rangle$. Then we have the chain of inclusions

$$E = \bigcap_{p \ge 1} E_p \subset \cdots \subset E_2 \subset E_1 \subset E_0 \qquad [5]$$

with each inclusion $E_{p+1} \to E_p$ being Hilbert–Schmidt. Let $E_{-p} = E_p'$ be the topological dual of $E_p$, the space of continuous linear functionals on $E_p$, and let $E'$ be the topological dual of $E$, where the latter is given the topology generated by all the norms $\|\cdot\|_p$. Then we have the inclusions

$$E_0 \simeq E_0' \subset E_{-1} \subset E_{-2} \subset \cdots \subset E' = \bigcup_{p \ge 0} E_{-p} \qquad [6]$$

For each $x \in E$ there is the evaluation map $\hat{x} : E' \to \mathbb{R} : \phi \mapsto \phi(x)$. A very special case of a general theorem of Minlos guarantees that on the dual $E'$ there is a measure $\mu$ on the sigma algebra generated by all the functions $\hat{x}$ such that each $\hat{x}$ is a Gaussian random variable of mean zero and variance $|x|_0^2$, that is,

$$\int_{E'} e^{it\hat{x}}\,d\mu = e^{-t^2 |x|_0^2/2}$$

for all $x \in E$ and $t \in \mathbb{R}$. This measure $\mu$ is the standard Gaussian measure on $E'$ for the infinite-dimensional nuclear space $E$.

The inner products $\langle\cdot,\cdot\rangle_p$ give rise to a nuclear space structure on function spaces over $E$. Let $U$ be the algebra of functions on $E'$ generated by the exponentials $e^{\alpha\hat{x}}$, with $x$ running over $E$ and $\alpha$ over $\mathbb{C}$. For each $p \ge 0$, there is an inner product $\langle\langle\cdot,\cdot\rangle\rangle_p$ on $U$ such that

$$\left\langle\left\langle e^{\hat{x} - |x|_p^2/2},\; e^{\hat{y} - |y|_p^2/2} \right\rangle\right\rangle_p = e^{\langle x, y\rangle_p} \qquad [7]$$

For $p = 0$ the left-hand side coincides with the $L^2(\mu)$ inner product. Let $[E]_p$ be the Hilbert space completion of $U$ in the $\langle\langle\cdot,\cdot\rangle\rangle_p$ inner product. Then

$$\cdots \subset [E]_3 \subset [E]_2 \subset [E]_1 \subset [E]_0 = L^2(E', \mu) \qquad [8]$$

Let $[E] = \bigcap_{p \ge 0} [E]_p$, equipped with the topology from all the norms $\|\cdot\|_p$, and $[E]'$ its topological dual. Elements of $[E]'$, being continuous linear functionals on the "test function space" $[E]$, are called distributions over $E$, in the language of white-noise analysis.

A fundamental tool in the study of infinite-dimensional distributions is the S-transform. This generalizes the traditional Segal–Bargmann transform from the $L^2$-setting to the context of distributions. Let $E_c$ be the complexification of $E$. The inner product $\langle\cdot,\cdot\rangle_0$ on $E$ extends to a complex-bilinear pairing $E_c \times E_c \to \mathbb{C} : (z, w) \mapsto z \cdot w$. The evaluation pairing $E' \times E \to \mathbb{R}$ also extends naturally to the complexifications. For $\Phi$ a distribution belonging to $[E]'$, define a function $S\Phi$ on $E_c$ by

$$S\Phi(z) = \Phi(c_z)$$

for all $z \in E_c$. Here $c_z$ is the coherent state function on $E'$ given by $c_z(\phi) = e^{\phi(z) - (1/2)\, z \cdot z}$. A fundamental and useful result in white-noise analysis, due originally to Potthoff and Streit, specifies the range of the transform $S$ and allows reconstruction of a distribution $\Phi$ from the function $S\Phi$. Briefly, the range of $S$ consists of functions which are holomorphic, in an appropriate sense, and have at most quadratic exponential growth. In particular, this theorem implies that a function of the form $z \mapsto e^{a\, z \cdot z}$, for any constant $a$, is in the range of $S$.

Rigorous Realization of Chern–Simons Integrals

We return to the Chern–Simons context. As mentioned earlier, gauge symmetry may be invoked to reduce the space of connections to the smaller space:

$$E = X \oplus X \qquad [9]$$

where $X = \mathcal{S}(\mathbb{R}^3) \otimes LG$ is the space of rapidly decreasing functions with values in the Lie algebra $LG$. Let

$$T_1 = \left(-\frac{d^2}{dx^2} + \frac{x^2}{4}\right)^{-1}$$

as a linear operator on $L^2(\mathbb{R}^3)$, $T_2 = T_1^{\otimes 3} \otimes I$ the induced operator on $L^2(\mathbb{R}^3) \otimes LG$, and $T = T_2 \oplus T_2$. Then, as described in the preceding section, we have the space $E$ and its dual $E'$. There is then the standard Gaussian measure $\mu$ on $E'$, and the space $[E]'$ of distributions over $E'$.

The normalized Chern–Simons integral may be viewed as a linear functional

$$\Phi_{CS} : F \mapsto \frac{1}{N} \int_E e^{i(k/4\pi) S_{CS}(A)}\, F(A)\, DA \qquad [10]$$

where $N$ is a "normalizing" factor. Rigorous meaning can be given to this by first formally working out what the S-transform of $\Phi_{CS}$ ought to be. Calculation shows that $S\Phi_{CS}$ is indeed a holomorphic function on $E_c$ of quadratic growth. The Potthoff–Streit theorem then implies that $\Phi_{CS}$ does exist as a distribution in the space $[E]'$. Let us examine this in some more detail.

As before, we take $A$ to be of the form $A = A_0\,dx_0 + A_1\,dx_1$, with the component $A_2$ equal to 0. Integration by parts shows that

$$\frac{k}{4\pi} S_{CS}(A) = -\frac{k}{2\pi} \int_{\mathbb{R}^3} \operatorname{tr}(A_0\, \partial_2 A_1)\, d\mathrm{vol} \qquad [11]$$

A formal computation reveals that $S(\Phi_{CS})(j)$ should be given by

$$\exp\left[\frac{2\pi i}{k} \operatorname{tr}\left(j_0\, \partial_2^{-1} j_1\right)\right] \qquad [12]$$

where $j = (j_0, j_1)$, and

$$\partial_2^{-1} f(x) = \frac{1}{2} \int ds\, \left[1_{(-\infty, x_2]}(s) - 1_{[x_2, \infty)}(s)\right] f(x_0, x_1, s)$$

The Potthoff–Streit criterion implies the existence of a distribution $\Phi_{CS}$, whose S-transform is given by the above expression.

The distribution $\Phi_{CS}$ is, however, not a sufficiently powerful object to allow determination of the Wilson loop expectations that one would really like to have. For instance, $\Phi_{CS}$ does not live on the space of smooth connections and so the meaning of parallel transport needs to be defined. The state of knowledge, at the rigorous level, is still evolving, with progress reported by A. Hahn. We describe some ideas for the Wilson loop expectations in the following.

The strategy for defining parallel transport along a path is to smear out the path by means of bump functions and essentially replace the path by a path of test functions in $E$. The description given here is mainly for the case of abelian $G$. Choose first a $C^\infty$ non-negative bump function $\psi$ on $\mathbb{R}^3$, vanishing outside the unit ball and having $L^1$ norm equal to 1. For $\varepsilon > 0$, let $\psi^\varepsilon$ be the scaled bump function given by $\psi^\varepsilon(x) = \varepsilon^{-3} \psi(x/\varepsilon)$. Next, for a smooth loop $[0, 1] \to \mathbb{R}^3 : t \mapsto l(t) = (l_0(t), l_1(t), l_2(t))$, let $l^\varepsilon(t) = \psi^\varepsilon(\,\cdot - l(t))$, the scaled bump function centered now at the path point $l(t)$. Now consider a generalized connection $A = (A_0, A_1) \in E'$. Set

$$B^{l,\varepsilon}_A(t) = A_0(l^\varepsilon(t))\, l'(t)_0 + A_1(l^\varepsilon(t))\, l'(t)_1 \qquad [13]$$

The equation of parallel transport can be reformulated as a differential equation for a matrix-valued path $t \mapsto P^{l,\varepsilon}_A(t)$ satisfying

$$\frac{d}{dt} P^{l,\varepsilon}_A(t) + B^{l,\varepsilon}_A(t)\, P^{l,\varepsilon}_A(t) = 0 \qquad [14]$$

and the initial condition $P^{l,\varepsilon}_A(0) = I$. With this smearing, one can consider functions of the form

$$W^\varepsilon(L; A) = \prod_{i=1}^{n} \operatorname{tr}\left(P^{l_i,\varepsilon}(A)\right) \qquad [15]$$

for a link $L$ consisting of loops $l_1, \ldots, l_n$, instead of the classical Wilson loop variable.

At this stage, it would be natural to consider taking $\varepsilon \downarrow 0$ in $\Phi_{CS}(W^\varepsilon(L))$. However, this is still problematic. A further regularization is needed, roughly corresponding to the geometric notion of framing. In the definition of $\Phi_{CS}$, an alteration is made to the quadratic form $Q(j, j)$ in the exponent which appears in the expression for $S(\Phi_{CS})$, replacing it with $Q(j, \sigma_s j)$, where $\{\sigma_s\}_{s>0}$ is a family of suitable diffeomorphisms of $\mathbb{R}^3$, with $\sigma_0$ being the identity. In a sense, this splits a single loop $l$ into $l$ and a neighboring loop $\sigma_s l$. At the end, one has to take $s \downarrow 0$. The resulting limiting value is the expected link-invariant. We shall not go into the case of nonabelian $G$, which is more complex and on which work is still in progress.

Infinite-dimensional distributions can be used to formulate a rigorous theory for normalized Chern–Simons functional integrals. The more specific questions raised by the Wilson-loop integrals in this setting open up new problems for further developments in the distribution theory, connecting geometry, topology, and infinite-dimensional analysis.

Acknowledgments

This research is supported by US NSF grant DMS-0201683.

See also: BF Theories; Feynman Path Integrals; Fractional Quantum Hall Effect; Knot Theory and Physics; Large-N and Topological Strings; Large-N Dualities; Quantum 3-Manifold Invariants; Quantum Hall Effect; Spin Foams; String Field Theory; Topological Quantum Field Theory: Overview; Twistor Theory: Some Applications.

Further Reading

Albeverio S, Hahn A, and Sengupta AN (2003) Chern–Simons theory, Hida distributions, and state models. Infinite Dimensional Analysis Quantum Probability and Related Topics 6: 65–81.
Albeverio S and Schäfer J (1994) Abelian Chern–Simons theory and linking numbers via oscillatory integrals. Journal of Mathematical Physics (N.Y.) 36 (suppl. 5): 2135–2169.
Albeverio S and Sengupta A (1997) A mathematical construction of the non-Abelian Chern–Simons functional integral. Communications in Mathematical Physics 186: 563–579.
Altschuler D and Freidel L (1997) Vassiliev Knot invariants and Chern–Simons perturbation theory to all orders. Communications in Mathematical Physics 187: 261–287.
Atiyah M (1990) The Geometry and Physics of Knot Polynomials. Cambridge: Cambridge University Press.
Bar-Natan D (1995) Perturbative Chern–Simons theory. Journal of Knot Theory and its Ramifications 4: 503.
Chern S-S and Simons J (1974) Characteristic forms and geometric invariants. Annals of Mathematics 99: 48–69.
Fröhlich J and King C (1989) The Chern–Simons theory and Knot polynomials. Communications in Mathematical Physics 126: 167–199.
Kondratiev Yu, Leukert P, Potthoff J, Streit L, and Westerkamp W (1996) Generalized functionals in Gaussian spaces – the characterization theorem revisited. Journal of Functional Analysis 141 (suppl. 2): 301–318.
Kuo H-H (1996) White Noise Distribution Theory. Boca Raton, FL: CRC Press.
Landsman NP, Pflaum M, and Schlichenmaier M (2001) Quantization of Singular Symplectic Quotients. Basel–Boston–Berlin: Birkhäuser.
Leukert P and Schäfer J (1996) A rigorous construction of Abelian Chern–Simons path integrals using White Noise analysis. Rev. Math. Phys. 8 (suppl. 3): 445–456.
Sen Samik, Sen Siddhartha, Sexton JC, and Adams DH (2000) Geometric discretization scheme applied to the Abelian Chern–Simons theory. Physical Review E 61: 3174–5185.
Simon B (1971) Distributions and their Hermite expansions. Journal of Mathematical Physics (N.Y.) 12: 140–148.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 121: 351–399.

Classical Groups and Homogeneous Spaces


S Gindikin, Rutgers University, Piscataway, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Classical groups are Lie groups corresponding to the three classical geometries – linear, metric, and symplectic. Let us start with the complex field $\mathbb{C}$. We consider the linear space $\mathbb{C}^n$ and the group $GL(n; \mathbb{C})$ of its automorphisms – nondegenerate (invertible) linear transformations. The complex linear metric space is the space $\mathbb{C}^n$ endowed with a nondegenerate symmetric bilinear form; the orthogonal group $O(n; \mathbb{C})$ is the subgroup in $GL(n; \mathbb{C})$ of automorphisms of this structure. If, for $n = 2l$, we replace the symmetric form by a nondegenerate skew-symmetric form, we obtain the linear symplectic space and the group $Sp(l; \mathbb{C})$ of its automorphisms – the symplectic group.

A fundamental observation of nineteenth-century geometry was that the transfer from the complex field to the real one gives not only three corresponding groups for $\mathbb{R}$ but a much richer collection of real forms of complex classical groups: unitary, pseudounitary, pseudoorthogonal, etc. (see below). Classical geometries correspond to homogeneous manifolds with classical groups of transformations. Geometers understood that this produces a very rich world of non-Euclidean geometries, including the first example of non-Euclidean geometry – hyperbolic geometry. Some classical algebraic theories obtain a geometrical interpretation through such an approach (see below the consideration of the cone of symmetric positive forms). Among the classical manifolds are Minkowski space, Grassmannians, and multidimensional analogs of the disk and the half-plane. A substantial part of this theory is a matrix geometry, which serves as a background for matrix analysis. A rich geometry on classical manifolds with many symmetries is a background for a rich multidimensional analysis with many explicit formulas. Classical geometries, starting with Minkowski geometry, have appeared in some problems of mathematical physics.

A crucial technical fact is the embedding of the classical groups in the class of semisimple Lie groups; it gives a very strong unified method to work with semisimple groups and corresponding geometries – the method of roots. Nevertheless, some special realizations and constructions for classical groups can also be very useful. A very impressive example is the twistors of Penrose, where an initial construction is the realization of points of four-dimensional Minkowski space as lines in three-dimensional complex projective space. We mention below some general facts about semisimple groups and homogeneous manifolds, but the focus will be on special possibilities for the classical groups. The class of simple Lie groups contains, besides the classical groups, only a finite number of exceptional groups which are also very interesting and are connected, in particular, with noncommutative and nonassociative geometries; they have applications to mathematical physics.

Complex Groups and Homogeneous Manifolds

Complex Classical Groups

The complete linear group $GL(n; \mathbb{C})$ is the group of nondegenerate matrices $g$ of order $n$ ($\det g \ne 0$) and the special linear group $SL(n; \mathbb{C})$ is its subgroup of matrices with determinant equal to 1 (the unimodular condition). The unimodular condition kills the one-dimensional center, leaving only a finite center. We realize the direct products of several copies of complete linear groups with different dimensions, for example, $GL(k; \mathbb{C}) \times GL(l; \mathbb{C})$, as the groups of blockdiagonal nondegenerate matrices. The letter $S$ always means that we take matrices with determinant 1. So the notation $S(L(k; \mathbb{C}) \times L(l; \mathbb{C}))$ means that we take blockdiagonal matrices with blocks of sizes $k$, $l$ and with determinant 1.

Let $I$ be a nondegenerate symmetric matrix of order $n$; then the orthogonal group $O(n; \mathbb{C})$ is the subgroup in $GL(n; \mathbb{C})$ of matrices preserving the corresponding symmetric form so that

$$g^\top I g = I$$

These matrices can have determinant $\pm 1$. The special orthogonal group $SO(n; \mathbb{C})$ is the subgroup of orthogonal matrices with determinant 1. Different $I$'s give isomorphic orthogonal groups since they are all linearly equivalent. If we take as $I$ the unit matrix $E = E_n$, then we obtain the group of orthogonal matrices in the classical sense: $g^\top g = E$.

If $n = 2l$ and we replace in this definition the symmetric matrix $I$ by a nondegenerate skew-symmetric matrix $J$, we obtain the symplectic group $Sp(l; \mathbb{C})$. Again, different $J$'s give isomorphic groups. The typical example of $J$ is

$$J = \begin{pmatrix} 0 & E_l \\ -E_l & 0 \end{pmatrix}$$

It is convenient then to represent matrices $g$ as

$$g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

where the blocks $A, B, C, D$ are matrices of order $l$. Then the symplectic condition is that $A^\top D - C^\top B = E$ and the matrices $A^\top C$ and $D^\top B$ are symmetric. If $C = 0$ then $D = (A^\top)^{-1}$ and $A^{-1} B$ is a symmetric matrix. In this way, we have in $Sp(l; \mathbb{C})$ a subgroup $P$ of blocktriangular matrices of a very simple structure; it is an example of the subgroups which are called parabolic.

There are two principal classes of homogeneous spaces with complex semisimple Lie groups: flag manifolds and Stein manifolds.

Flag Manifolds

These homogeneous spaces $F = G/P$ with semisimple (in our case with classical) groups $G$ have parabolic subgroups $P$ as the isotropy subgroups. The group $G = GL(n; \mathbb{C})$ acts transitively on the flag manifolds $F(n_1, \ldots, n_r)$, $0 < n_1 < \cdots < n_r < n$, whose elements are $(n_1, \ldots, n_r)$-flags – sequences of embedded subspaces in $\mathbb{C}^n$ of the dimensions $(n_1, \ldots, n_r)$. The isotropy subgroup $P = P(n_1, \ldots, n_r)$ is the subgroup of blocktriangular matrices with the diagonal blocks of sizes $k_1, \ldots, k_{r+1}$, $k_j = n_j - n_{j-1}$, $n_0 = 0$, $n_{r+1} = n$. The flag manifolds are compact complex manifolds. The matrices proportional to the unit matrix $E_n$ act trivially and we can consider instead of the action of $G = GL(n; \mathbb{C})$ the transitive action of $G = SL(n; \mathbb{C})$.

Let us pay particular attention to two extremal cases. The first one is the case of the maximal flag manifold when we have the sequence of all integers $(1, 2, 3, \ldots, n - 1)$ – complete flags; the subgroup $P$ in this case is called a Borel subgroup. Another case is minimal flag manifolds with $r = 1$ (for them the unipotent radical of the parabolic subgroup is commutative). Then in the case of $SL(n; \mathbb{C})$ the sequence has only one element $n_1 = k < n$ and we have Grassmannian manifolds $Gr_{\mathbb{C}}(k; n) = F(k)$ of $k$-dimensional subspaces in $\mathbb{C}^n$. If $k = 1$ or $k = n - 1$, we obtain the dual realizations of the complex projective space $\mathbb{C}P^{n-1}$. We can interpret points of $Gr_{\mathbb{C}}(k; n)$ also as $(k - 1)$-dimensional planes in $\mathbb{C}P^{n-1}$.

We can define points of the projective space $\mathbb{C}P^{n-1}$ by homogeneous coordinates – as the equivalence classes ($z \sim cz$, $z \in \mathbb{C}^n \setminus \{0\}$, $c \in \mathbb{C} \setminus \{0\}$). For the Grassmannians we can similarly use matrix homogeneous coordinates (Stiefel's coordinates): classes of $(k \times n)$-matrices $Z \in \mathrm{Mat}(k, n)$ of the maximal rank $k$ relative to the equivalence

$$Z \sim uZ, \quad u \in GL(k; \mathbb{C})$$

The rows of a matrix $Z$ correspond to a basis in the subspace with the homogeneous coordinate $Z$; left multiplication by a matrix $u$ replaces this basis, but does not change the subspace. The group $GL(n; \mathbb{C})$ acts by right multiplications:

$$Z \mapsto Zg$$

and this action preserves the equivalence classes. Suppose $k \le n - k$ and the left $k$-minor of $Z$ is not zero. Such matrices give the dense coordinate chart $\mathbb{C}^{k(n-k)}$: we can pick in the equivalence classes the representatives $(E_k, z)$, $z \in \mathrm{Mat}(k, n - k)$, and consider the matrices $z$ as (inhomogeneous) local coordinates. In the inhomogeneous coordinates the

action of the group has a matrix fractional linear form: let

$$g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \quad A \in \mathrm{Mat}(k),\; D \in \mathrm{Mat}(n - k),\; B \in \mathrm{Mat}(k, n - k),\; C \in \mathrm{Mat}(n - k, k)$$

Then we have the transformation in inhomogeneous coordinates:

$$z \mapsto (A + zC)^{-1}(B + zD)$$

The condition $C = 0$ defines the parabolic subgroup which has affine action in inhomogeneous coordinates which is transitive in the coordinate chart. In such a way the Grassmannian is a compactification of $\mathbb{C}^{k(n-k)}$ (realized as a space of $k \times (n - k)$ matrices). If $n = 2k$, we can consider it as the compactification of the space of square matrices $z$ of order $k$ with the flat generalized conformal structure defined by translations of the isotropy cone $\{\det z = 0\}$.

There are similar constructions of flag manifolds for other classical groups. We will consider only the minimal flag manifolds. For $O(2k; \mathbb{C})$ we consider the isotropic Grassmannian $Gr^I_{\mathbb{C}}(k; 2k)$ of isotropic $k$-subspaces relative to the symmetric form $I$. We take the matrix realization of $Gr_{\mathbb{C}}(k; 2k)$, using Stiefel's homogeneous coordinates, and add the matrix equation

$$Z I Z^\top = 0$$

which is well defined in the homogeneous coordinates (compatible with the equivalence classes) and defines isotropic subspaces relative to $I$. This matrix cone is preserved by the subgroup $O(2k; \mathbb{C}) \subset GL(2k; \mathbb{C})$ corresponding to the matrix $I$. If we take the symmetric matrix

$$I = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix}$$

then in inhomogeneous coordinates ($z$ is a square $k$-matrix) this equation is transformed into the condition that the matrix $z$ is skew-symmetric. So, in a natural sense, the isotropic Grassmannian is the compactification of the linear space of skew-symmetric matrices $\mathrm{Alt}(k) = \mathbb{C}^N$, $N = k(k - 1)/2$.

A similar construction makes sense for the symplectic group: if we replace the symmetric form $I$ with the skew-symmetric form $J$, we obtain the equation of the matrix cone representing the Lagrangian Grassmannian $Gr^L_{\mathbb{C}}(k; 2k)$ of Lagrangian subspaces in $2k$-dimensional linear symplectic space. If we were to choose $J$ as above, then in the (inhomogeneous) coordinate chart we obtain the condition that the matrix $z$ is symmetric. Thus, we have the (dense) coordinate chart on the Lagrangian Grassmannian $\mathbb{C}^N = \mathrm{Sym}(k)$, $N = k(k + 1)/2$ – the linear space of symmetric matrices.

There is one more type of minimal flag manifold for the orthogonal group $SO(n; \mathbb{C})$ – the quadric $Q$ in the projective space:

$$I(z) = z I z^\top = 0$$

where rows $z \in \mathbb{C}^n \setminus \{0\}$ represent, in homogeneous coordinates, points in $\mathbb{C}P^{n-1}$. If $I = E_n$ we have the equation $(z_1)^2 + \cdots + (z_n)^2 = 0$. This quadric is the complex compact conformal flat manifold $C\mathbb{C}^N$, $N = n - 2$; it is the compactification of $\mathbb{C}^N$ endowed with the flat conformal structure corresponding to the quadratic isotropic cone. The parabolic group is generated by linear conformal transformations and translations. On the quadric $Q$ the conformal structure is defined by intersections of tangent spaces with $Q$. Evidently, this structure is invariant relative to the natural action of $SO(n; \mathbb{C})$.

Classical Stein Manifolds

Such homogeneous complex manifolds $X = G/H$ have complex reductive isotropy subgroups $H$. Contrary to flag manifolds, which are compact, these manifolds are Stein and there are many holomorphic functions on them. The typical examples for $G = GL(n; \mathbb{C})$ are homogeneous spaces $S(k_1, \ldots, k_{r+1})$, $n = k_1 + \cdots + k_{r+1}$, for which the isotropy subgroups are blockdiagonal matrices with the blocks of sizes $k_1, \ldots, k_{r+1}$. Then points of the manifold can be realized as generic sets of subspaces $L_j \subset \mathbb{C}^n$, $\dim L_j = k_j$, $1 \le j \le r + 1$ or, what is equivalent, generic sets of $(k_j - 1)$-dimensional planes in $\mathbb{C}P^{n-1}$. Since the isotropy subgroup of such a homogeneous space is a subgroup of the parabolic subgroup $P(n_1, \ldots, n_r)$, $k_j = n_j - n_{j-1}$, we have the natural fibering $S(k_1, \ldots, k_{r+1}) \to F(n_1, \ldots, n_r)$ (it is simple to see this geometrically: the $i$th subspace of a flag in the base is the direct sum of the first $i$ subspaces representing a point in the fiber). This is a convenient tool to apply complex analysis on $S$ to the compact manifold $F$ where there are no nontrivial holomorphic functions. Let us emphasize that such a connection exists only for special classes of classical Stein manifolds.

Let us pay special attention to the subclass of symmetric Stein manifolds. For such manifolds $X$, the isotropy subgroup $H$ is fixed relative to a holomorphic involutive automorphism of $G$. Complex semisimple Lie groups $G$ (including classical ones) are symmetric Stein manifolds relative to the action of their square $G \times G$ by left and right multiplications.
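The matrix fractional-linear formula $z \mapsto (A + zC)^{-1}(B + zD)$ for the Grassmannian chart can be checked on a small case. The following is a minimal sketch in exact rational arithmetic for $Gr_{\mathbb{C}}(2; 4)$; the helper names (`chart_action`, `via_stiefel`) and the sample matrices `g`, `h`, `z` are illustrative choices, not taken from the article. It compares the formula with right multiplication $Z \mapsto Zg$ on the Stiefel representative $(E_k, z)$ and checks compatibility with the product $gh$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def chart_action(z, g):
    """Apply z -> (A + zC)^{-1} (B + zD), with g split into blocks A, B, C, D."""
    k = len(z)
    A = [row[:k] for row in g[:k]]
    B = [row[k:] for row in g[:k]]
    C = [row[:k] for row in g[k:]]
    D = [row[k:] for row in g[k:]]
    return matmul(inverse(matadd(A, matmul(z, C))), matadd(B, matmul(z, D)))

def via_stiefel(z, g):
    """Same map computed as Z -> Zg on Z = (E_k, z), then renormalized to (E_k, z')."""
    k = len(z)
    Z = [[Fraction(int(i == j)) for j in range(k)] + list(z[i]) for i in range(k)]
    W = matmul(Z, g)
    left = [row[:k] for row in W]
    right = [row[k:] for row in W]
    return matmul(inverse(left), right)

# Sample data (arbitrary invertible 4 x 4 matrices and a point of the chart).
g = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 2, 0], [0, 1, 0, 3]]
h = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]
z = [[1, 2], [3, 4]]

z1 = chart_action(z, g)
z2 = chart_action(z1, h)
```

Because $Z \mapsto Zg$ is a right action, applying `g` and then `h` should agree with applying the single matrix `matmul(g, h)`, and the sketch lets one confirm this for the sample data. The formula is only defined where $A + zC$ is invertible, that is, where the image stays inside the coordinate chart; `inverse` raises an error otherwise.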

Classical Stein manifolds for $SL(n; \mathbb{C})$ considered above are symmetric if $r = 1$ and we have the manifold of pairs of subspaces of complementary dimensions intersecting only in $\{0\}$. The simplest example is the manifold of pairs of different points of the projective line $\mathbb{C}P^1$. Let us point out again that the transition to the generic pairs of points transforms the compact complex manifold without nonconstant holomorphic functions into a Stein manifold with a large collection of holomorphic functions.

Some other examples of symmetric Stein manifolds are connected with classical geometry and linear algebra. The affine hyperboloid in $\mathbb{C}^n$,

$$Q(z) = 1$$

is a symmetric space for $G = O(n; \mathbb{C})$, $H = O(n - 1; \mathbb{C})$. We can compare it with the projective quadric $Q(z) = 0$ which is a minimal flag manifold. Let us remark that there is a duality here: it is possible to interpret points of the hyperboloid of dimension $n$ as generic hyperplane sections of the projective quadric of dimension $n - 1$.

The space $X$ of complex symmetric matrices of order $n$ with determinant 1 is symmetric for the group $SL(n; \mathbb{C})$ which acts by the changes of variables in the corresponding quadratic forms:

$$z \mapsto g^\top z g, \quad g \in SL(n; \mathbb{C})$$

The transitive action reflects the possibility of transforming such a form into a sum of squares. The isotropy subgroup is $SO(n; \mathbb{C})$.

The Stein symmetric manifold $X = SO(n; \mathbb{C})/S(O(k; \mathbb{C}) \times O(n - k; \mathbb{C}))$ is realized as the manifold of $k$-dimensional subspaces in $\mathbb{C}^n$ on which the restriction of the principal symmetric form $I$ is nondegenerate.

Isomorphisms in Small Dimensions

Isomorphisms of classical groups in small dimensions produce isomorphisms of some classical homogeneous manifolds. Such isomorphisms were very important in the history of geometry; below are a few examples. We will consider local isomorphisms (up to a finite center). We have $SL(2; \mathbb{C}) \cong SO(3; \mathbb{C})$. Let us realize $\mathbb{C}^3$ as the space of symmetric matrices $z$ of order 2. Then, as we remarked above, the two-dimensional submanifold $X$ of matrices with determinant 1 is the symmetric Stein manifold for the group $SL(2; \mathbb{C})$. On the other hand, we can take $\det z$ as the quadratic symmetric form $I$ in $\mathbb{C}^3$; then $X$ is the hyperboloid for this form and the action of $SL(2; \mathbb{C})$ on symmetric matrices gives the orthogonal transformations relative to this form $I$.

Similarly, we can interpret the local isomorphism $SO(4; \mathbb{C}) \cong SL(2; \mathbb{C}) \times SL(2; \mathbb{C})$. We realize $\mathbb{C}^4$ as the space of square matrices $z$ of order 2 with the symmetric quadratic form $I(z, z) = \det(z)$. Then left and right multiplications of $z$ by unimodular matrices ($z \mapsto uzv$, $u, v \in SL(2; \mathbb{C})$) induce orthogonal transformations for the form $I$, and any orthogonal transformation can be represented in such a form (one can see it by a calculation of dimensions).

The local isomorphism $SL(4; \mathbb{C}) \cong SO(6; \mathbb{C})$ has a slightly more complicated nature. Let us consider the Grassmannian $Gr_{\mathbb{C}}(2; 4)$ of lines in the projective space $\mathbb{C}P^3$ with $2 \times 4$ matrices $Z$ as matrix homogeneous coordinates. Let $p_{ij}$, $i < j$, be the minors of $Z$ with $i$th and $j$th columns. They are called Plücker coordinates on $Gr_{\mathbb{C}}(2; 4)$: the equivalence class of $Z$ is defined by the sequence of six numbers $p = (p_{ij},\ 1 \le i < j \le 4) \ne (0, \ldots, 0)$ up to a constant factor. Thus, we have an embedding of $Gr_{\mathbb{C}}(2; 4)$ in the projective space $\mathbb{C}P^5$. The image will be the quadric

$$p_{12} p_{34} - p_{13} p_{24} + p_{14} p_{23} = 0$$

Thus, we have the isomorphism of two flag manifolds, and the action of $SL(4; \mathbb{C})$ on the Grassmannian becomes orthogonal transformations of the four-dimensional quadric in $\mathbb{C}P^5$. The Plücker coordinates can be defined for any Grassmannian, but in other cases they do not produce isomorphisms with other flag manifolds; nevertheless, they realize Grassmannians as intersections of quadrics in projective spaces.

Compact Classical Homogeneous Manifolds

Compact classical groups $U(n)$, $SU(n)$, $O(n)$, $SO(n)$, $Sp(l)$ are maximal compact subgroups in the corresponding classical complex groups $GL(n; \mathbb{C})$, $SL(n; \mathbb{C})$, $O(n; \mathbb{C})$, $SO(n; \mathbb{C})$, $Sp(l; \mathbb{C})$. This condition defines them up to an isomorphism. They are fixed subgroups of some antiholomorphic involutive automorphisms. The unitary groups $U(n)$ and $SU(n)$ are the groups of unitary matrices ($g^* g = E$) and, correspondingly, of unitary matrices with determinant 1. As the compact orthogonal group we can take the intersection $U(n) \cap O(n; \mathbb{C})$. For the standard form $I$, it will be the group of real orthogonal matrices: $g^\top g = E$ (so the involution in $O(n; \mathbb{C})$ is the conjugation $g \mapsto \bar{g}$). Similarly, we can take $Sp(l) = SU(2l) \cap Sp(l; \mathbb{C})$ (then the involution is $g \mapsto J \bar{g} J^{-1}$).

Compact classical groups act on compact homogeneous Riemannian manifolds. There are two mechanisms connecting compact and complex homogeneous manifolds. We observe the first possibility in the case of flag manifolds which are

compact. We considered them so far relative to the real Grassmannian GrR (k; n) of k-subspaces in Rn
action of complex (noncompact) groups. It turns out can be defined as SO(n)=S(O(k)  O(n  k)). This
that on the flag manifold F = G=P the maximal representation corresponds to the characterization
compact subgroup U  G continues to be transitive: of subspaces by orthonormal bases. The considera-
so we can consider flag manifolds also as being tion of arbitrary bases defines the action of the
homogeneous with compact groups. Then F = U=C, larger group GL(n; R) on GrR (k; n). Relative to this
where C is the centralizer of a torus in U. There is a action, the real Grassmannian is not symmetric since
Kähler metric on F, invariant relative to U. Thus, G the isotropy subgroup is parabolic and is not
is the group of all automorphisms of F as the involutive. Such a possibility to extend the group is
complex manifold, but U is the group of its typical for a class of compact symmetric manifolds
automorphisms as the Kähler manifold. It defines called symmetric R-spaces. They are real forms of
two sides of geometry of flag manifolds: complex Hermitian compact symmetric manifolds (minimal
and Kähler. Flag manifolds are the only compact flag manifolds). Let us also mention compact
homogeneous Kähler manifolds with semisimple Lie symmetric spaces SU(n)=SO(n), which is the compact
groups (the class of all compact Kähler manifolds form of the space of unimodular symmetric matrices
also contains locally flat compact manifolds – and can be presented by the submanifold of unitary
toruses). In the example considered above we have matrices in it. Also, all compact Lie groups G are
F(n1 , . . . , nr ) = SU(n)=S(U(k0 )     U(kr )). In the lan- symmetric spaces relative to the action of G  G.
guage of Stiefel (homogeneous) coordinates, we fix a
positive Hermitian form in Cn and characterize
subspaces by orthonormal bases. For r = 1 we have
Noncompact Riemannian
Grassmannians GrC (k; n), in particular the projec-
Symmetric Manifolds
tive space CPn1 which we consider relative to the
action of the unitary groups. Relative to this action This class of symmetric manifolds has the strongest
they are Hermitian symmetric spaces. In the case of connections with classical mathematics. Let us
minimal flag manifolds for other groups the action consider noncompact real semisimple Lie groups –
of maximal compact subgroups also defines on them real forms of complex semisimple Lie groups. They
the structure of compact Hermitian symmetric spaces. Let us emphasize that relative to noncompact groups of biholomorphic automorphisms $G$, the minimal flag manifolds (including the Grassmannians) are not symmetric.

In the case of homogeneous Stein manifolds $X = G/H$, the picture is different: the maximal compact subgroups have no open orbits. There are totally real orbits which are the compact forms of $X$: $X_{\mathbb{R}} = G_{\mathbb{R}}/H_{\mathbb{R}}$, where $G_{\mathbb{R}}$ and $H_{\mathbb{R}}$ are compact forms of $G$ and $H$, respectively. This is the canonical embedding of compact homogeneous manifolds in their complexifications. An important special case is the embedding of compact symmetric manifolds in their complexifications, the Stein symmetric manifolds.

For compact symmetric manifolds $X = U/K$ the groups $U$, $K$ are compact Lie groups and the elements of $K$ are fixed by an involutive automorphism $\theta$ such that $K$ contains the connected component of the subgroup of all fixed elements of $\theta$. The possibility of connecting several symmetric manifolds with one involution is illustrated by the next example. The sphere $S^{n-1} \subset \mathbb{R}^n$ is the symmetric space $SO(n)/SO(n-1)$; the real projective space $\mathbb{RP}^{n-1}$ is $SO(n)/O(n-1)$. Here $SO(n-1)$ is the connected component of $O(n-1)$ and $S^{n-1}$ is a double covering of $\mathbb{RP}^{n-1}$. A few more examples, the

correspond to antiholomorphic involutions in complex groups.

Among the real forms of $SL(n, \mathbb{C})$ there are the real and quaternionic unimodular groups $SL(n, \mathbb{R})$, $SL(n, \mathbb{H})$ and the pseudounitary groups $SU(p, q)$ of complex matrices preserving a Hermitian form $H$ of signature $(p, q)$. The complex orthogonal group has as real forms, in particular, the pseudoorthogonal groups $SO(p, q)$ of real matrices preserving a quadratic form of signature $(p, q)$.

Let $G$ be a real simple Lie group and $K$ be its maximal compact subgroup. Then $X = G/K$ is a Riemann symmetric manifold of noncompact type; $K$ is defined by an involutive automorphism of $G$. Therefore, in the irreducible situation there is a correspondence between noncompact Riemann symmetric manifolds and real simple noncompact Lie groups. The $K$-orbits on $X$ are parametrized by points of the orbit on $X$ of a maximal abelian subgroup $A$, the Cartan subgroup of the symmetric space $X$. Its dimension $l$ is an important invariant of $X$, its rank. The algebraic base for the geometry of $X$ is the Iwasawa decomposition

\[ G = KAN \]

where $N$ is a maximal unipotent subgroup (in a natural sense compatible with $A$). Then the parabolic subgroup $P = AN$ is transitive on $X$.
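To make the Iwasawa decomposition concrete, here is a minimal sketch for the smallest case $G = SL(2, \mathbb{R})$, with $K = SO(2)$, $A$ the positive diagonal matrices of determinant 1, and $N$ the upper unitriangular matrices. The function names `iwasawa_sl2` and `matmul2` and the sample matrix are illustrative assumptions, not notation from the article; the factors are obtained by normalizing the first column of $g$ (Gram–Schmidt).

```python
import math

def matmul2(x, y):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2))
        for i in range(2)
    )

def iwasawa_sl2(g):
    """Split g in SL(2, R) as g = k * a * n (rotation, positive diagonal, unipotent)."""
    (a, b), (c, d) = g
    r = math.hypot(a, c)                                 # length of the first column of g
    k = ((a / r, -c / r), (c / r, a / r))                # rotation, element of K = SO(2)
    diag = ((r, 0.0), (0.0, 1.0 / r))                    # element of A
    n = ((1.0, (a * b + c * d) / (r * r)), (0.0, 1.0))   # element of N
    return k, diag, n

g = ((2.0, 1.0), (3.0, 2.0))                             # sample matrix with det g = 1
k, diag, n = iwasawa_sl2(g)
product = matmul2(matmul2(k, diag), n)                   # should reproduce g up to rounding
```

Since $K$ fixes the base point of $X = G/K$, the factors `diag` and `n` alone already determine the point $gK$, which is the transitivity of $P = AN$ in this small case.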

Symmetric Cones

Let us start with $X = GL(n, \mathbb{R})/O(n)$. This manifold corresponds to the classical theory of quadratic forms: $X$ can be realized as the manifold $\mathrm{Sym}_+(n)$ of symmetric positive matrices $x \succ 0$ of order $n$ (corresponding to positive quadratic forms). Then the transitivity of $GL(n, \mathbb{R})$ on $X$ corresponds to the possibility of transforming positive forms to a sum of squares. The sufficiency of triangular matrices for such transformations corresponds to the transitivity on $X = \mathrm{Sym}_+(n)$ of the parabolic subgroup $P$ of (upper) triangular matrices with positive diagonal elements. So $A$ is the group of diagonal matrices with positive elements, and the submanifold of diagonal matrices in $X$ parametrizes $K$-orbits. The general fact about $A$-parametrization becomes, in this example, the classical fact about the reduction of quadratic forms to diagonal form by orthogonal transformations.

There are complex and quaternionic versions of this picture. The symmetric manifold $X = GL(n, \mathbb{C})/U(n)$ is realized as the manifold $\mathrm{Herm}_+(n)$ of positive complex Hermitian matrices (forms), and similarly classical facts of linear algebra on Hermitian quadratic forms are transformed into geometrical statements on symmetric spaces. Let us emphasize that we consider here the group $GL(n, \mathbb{C})$ as a real group. The same situation exists with the manifold $\mathrm{Herm}_+(\mathbb{H}, n)$ of positive quaternionic Hermitian matrices, which is the symmetric manifold for the real group $GL(n, \mathbb{H})$.

These three manifolds can be included in an impressive geometrical structure. They are all convex homogeneous cones $V$ in linear spaces $\mathbb{R}^N$ which are self-dual ($V = V^*$) relative to a bilinear form $\langle\,,\rangle$. Let us recall that

\[ V^* = \{x;\ \langle x, y\rangle > 0,\ y \in \overline{V} \setminus 0\} \]

Here $\overline{V}$ is the closure of $V$. So these three symmetric manifolds are linear homogeneous self-dual cones.

There is only one more type of classical homogeneous self-dual cones, the quadratic (Lorentzian) cones

\[ L_n = \{x \in \mathbb{R}^{n+1};\ x_1^2 - x_2^2 - \cdots - x_{n+1}^2 > 0,\ x_1 > 0\} \]

which are also called future light cones (the condition $x_1 < 0$ defines the past light cone). The group of linear automorphisms of this cone is $SO(1, n) \times \mathbb{R}_+$; the first factor is the Lorentz group.

There is also one exceptional 27-dimensional cone; it is possible to interpret this cone as the cone of positive Hermitian matrices of third order over Cayley numbers. There is a very nice structural theory of homogeneous self-dual cones; it is convenient to develop this theory in the language of Jordan algebras (Faraut and Koranyi 1994). Such cones participate as elements of explicit constructions of other classes of symmetric spaces (see below).

Following Siegel, it is possible to connect with homogeneous self-dual cones multidimensional versions of Euler integrals ($\Gamma$- and $B$-functions) (Faraut and Koranyi 1994). They have many applications, including those to integral formulas for complex symmetric domains.

Riemann Symmetric Manifolds of Rank 1

The first example of non-Euclidean geometry is connected with the Riemann symmetric manifolds of rank 1, the hyperbolic spaces; $X = SO(1, n)/O(n)$ is the hyperbolic space of dimension $n$. It can be realized as the upper sheet of the two-sheeted hyperboloid:

\[ x_0^2 - x_1^2 - \cdots - x_n^2 = 1, \quad x_0 > 0 \]

Pseudoorthogonal linear transformations from $SO(1, n)$ preserve this surface; they play the role of hyperbolic motions. The equivalent realization is in the real ball $x_1^2 + \cdots + x_n^2 < 1$ relative to the projective transformations preserving this ball.

Another example of a Riemann symmetric manifold of rank 1 is the complex hyperbolic symmetric space $X = SU(1, n)/U(n)$. It can similarly be realized either as the hyperboloid

\[ |z_0|^2 - |z_1|^2 - \cdots - |z_n|^2 = 1 \]

in $\mathbb{C}^{n+1}$ relative to pseudounitary linear transformations or as the complex ball $|z_1|^2 + \cdots + |z_n|^2 < 1$ relative to complex projective transformations preserving it. There are also quaternionic hyperbolic spaces which are realized as the quaternionic balls in the quaternionic projective spaces. These three series exhaust all classical Riemann symmetric manifolds of rank 1 of noncompact type. There is only one exceptional symmetric manifold of rank 1: it has dimension 16 and can be interpreted as a two-dimensional ball for Cayley numbers.

Classical Symmetric Domains in $\mathbb{C}^n$ (Cartan Domains)

Riemann symmetric manifolds of noncompact type which admit an invariant complex structure also have an invariant Hermitian form corresponding to the Riemann metric. For this reason, we will call them noncompact Hermitian symmetric manifolds (we considered above the compact Hermitian symmetric manifolds). They are Stein manifolds, but as opposed to symmetric Stein manifolds, which we considered above, they are homogeneous relative to real groups. The condition for a Riemann symmetric

manifold $X = G/K$ to be Hermitian is that $K$ has a one-dimensional center. All Hermitian symmetric manifolds of noncompact type can be realized as bounded domains in $\mathbb{C}^n$ (but, of course, not all their holomorphic automorphisms extend to $\mathbb{C}^n$). In the case of classical manifolds, these domains are called Cartan's domains: Cartan gave their explicit matrix realizations.

The nature of groups of holomorphic automorphisms of symmetric domains $X = G/K \subset \mathbb{C}^N$ is explained by Cartan's duality. Each such domain (Hermitian symmetric manifold of noncompact type) admits an embedding in a Hermitian symmetric manifold of compact type $X_{\mathbb{C}}$ such that the complexification $G_{\mathbb{C}}$ of $G$ is the group of holomorphic automorphisms of $X_{\mathbb{C}}$ (correspondingly, $X$ is an open $G$-orbit on $X_{\mathbb{C}}$). Moreover, $X$ lies inside a (Zariski open) coordinate chart $\mathbb{C}^N$, which is an orbit of a parabolic subgroup.

The simplest example is the complex ball $\mathbb{CB}^n$ (complex hyperbolic space) embedded in the complex projective space $\mathbb{CP}^n$. The affine chart $\mathbb{C}^n$ is the orbit of the parabolic subgroup of affine transformations. Let us consider more complicated examples.

Let $X_{\mathbb{C}}$ be the Grassmannian $\mathrm{Gr}_{\mathbb{C}}(k; n)$, $q = n - k \ge k$; we will use matrix homogeneous coordinates $Z$ ($k \times n$ matrices) for the description of the symmetric domain. Then $G_{\mathbb{C}} = SL(n, \mathbb{C})$. Let us take its real form $G = SU(k, q)$, $k + q = n$. We fix a Hermitian form $H$ of signature $(k, q)$ and realize $G$ as the group of matrices preserving $H$:

\[ g H g^* = H \]

Then $X = X_{k,q} = SU(k, q)/S(U(k) \times U(q))$ can be realized as the domain in the Grassmannian

\[ Z H Z^* \succ 0 \]

so that this Hermitian matrix of order $k$ must be positive. It is essential that this condition is invariant relative to multiplication of $Z$ by nondegenerate matrices $u$ on the left and, therefore, it is a well-defined condition in homogeneous coordinates.

Let us specify the choice of $H$:

\[ H_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_q \end{pmatrix} \]

Then the corresponding domain $X_1$ is defined in inhomogeneous coordinates $Z = (E_k, z)$, $z \in \mathrm{Mat}(k, q)$, by the condition

\[ E_k - z z^* \succ 0 \]

This matrix ball lies completely in the coordinate chart $\mathbb{C}^{kq}$. Its rank is equal to $\min(k, q)$. Thus, we have the realization of this Hermitian symmetric space as a bounded domain in $\mathbb{C}^N$, $N = kq$. In the case $k = 1$, we have the usual (scalar) complex ball. Let us remark that the edge of the boundary (Shilov's boundary) is the compact symmetric space

\[ z z^* = E_k \]

with the group of automorphisms $S(U(k) \times U(q))$ (the isotropy subgroup of $X$). For $k = q$ the edge coincides with the set of unitary matrices $U(k)$.

Different forms $H$ of signature $(k, q)$ are linearly equivalent and they correspond to different (biholomorphically equivalent) realizations of this Hermitian symmetric space. Let us, to begin with, set $k = q$; the inhomogeneous matrix coordinates are square matrices of order $k$. Let us take the form

\[ H_2 = \begin{pmatrix} 0 & iE_k \\ -iE_k & 0 \end{pmatrix} \]

Then, in inhomogeneous matrix coordinates, we have the domain $X_2$:

\[ \frac{1}{i}(z - z^*) \succ 0 \]

(complex matrices with positive skew-Hermitian parts). This domain (but not its boundary) lies in the chart. It has the structure of the tube domain $T = \mathbb{R}^n + iV$, $n = k^2$, corresponding to the symmetric cone of positive Hermitian matrices (we take the space of Hermitian matrices as a real form of $\mathbb{C}^n$). The group of affine transformations of the tube domain

\[ z \mapsto u z u^* + a, \quad u \in GL(k, \mathbb{C}),\ a \in \mathrm{Herm}(k) \]

is transitive on $X_2$; it is the parabolic subgroup in $SU(k, q)$.

The biholomorphic equivalence of the realizations of $X$ corresponding to different $H$ is induced by the equivalence of these forms. We have

\[ H_2 = \sigma H_1 \sigma^*, \quad \sigma = \frac{\sqrt{2}}{2} \begin{pmatrix} E_k & -iE_k \\ -iE_k & E_k \end{pmatrix} \]

Then the transform $Z \mapsto Z\sigma$ transforms $X_2$ into $X_1$. In inhomogeneous coordinates it is the fractional linear matrix transform

\[ z \mapsto i(z + iE_k)^{-1}(z - iE_k) \]

It is the matrix version of the classical Cayley transform. Similarly, we can write down the inverse transform.

If $q \ne k$, then there is also an analog of the tube realization. Let $r = q - k > 0$ and

\[ H_2 = \begin{pmatrix} 0 & iE_k & 0 \\ -iE_k & 0 & 0 \\ 0 & 0 & -E_r \end{pmatrix} \]
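As a side check on the square case $k = q$ treated above, the following sketch takes $k = 1$, where $E_k = 1$ and all blocks are scalars. It verifies numerically that $\sigma H_1 \sigma^* = H_2$ and that the Cayley transform $z \mapsto i(z + i)^{-1}(z - i)$ sends the upper half-plane into the unit disc. The helper names (`matmul`, `dagger`, `cayley`) and the sample points are assumptions made for this illustration only.

```python
import math

def matmul(x, y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(x[i][m] * y[m][j] for m in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def dagger(x):
    """Conjugate transpose."""
    return [[x[j][i].conjugate() for j in range(len(x))]
            for i in range(len(x[0]))]

s = math.sqrt(2) / 2
sigma = [[s, -1j * s], [-1j * s, s]]    # the matrix sigma for k = 1
H1 = [[1, 0], [0, -1]]                  # form defining the unit disc
H2 = [[0, 1j], [-1j, 0]]                # form defining the upper half-plane

lhs = matmul(matmul(sigma, H1), dagger(sigma))   # should equal H2 up to rounding

def cayley(z):
    """Scalar Cayley transform z -> i (z + i)^(-1) (z - i)."""
    return 1j * (z - 1j) / (z + 1j)

center = cayley(1j)             # the point i is sent to the center of the disc
inside = cayley(0.3 + 2j)       # a point with positive imaginary part
on_edge = cayley(1.5)           # a real point lands on the unit circle
```

In this scalar case the edge $z z^* = E_k$ is the unit circle, and real points of the half-plane boundary are mapped onto it, in line with the remark on Shilov's boundary.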

Let us represent the inhomogeneous coordinates as $z = (E_k, w, u)$, $w \in \mathrm{Mat}(k)$, $u \in \mathrm{Mat}(k, r)$. Then the domain $X_2$ is defined by the condition

\[ \frac{1}{i}(w - w^*) - u u^* \succ 0 \]

This is an example of Siegel domains of the second kind (Pyatetskii-Shapiro 1969). This domain has a transitive group of affine transformations:

\[ (w, u) \mapsto \left(w + a + i\,u b^* + \tfrac{i}{2}\, b b^*,\ u + b\right), \quad a \in \mathrm{Herm}(k),\ b \in \mathrm{Mat}(k, r) \]

\[ (w, u) \mapsto (c w c^*,\ c u), \quad c \in GL(k, \mathbb{C}) \]

This class of symmetric domains in Grassmannians is called Cartan's domains of the first class.

There are similar constructions for minimal flag domains (compact Hermitian symmetric spaces) with other groups. Let us consider the Lagrangian Grassmannian $\mathrm{GrL}_{\mathbb{C}}(k; 2k)$ corresponding to the form $J$ above. Here $G_{\mathbb{C}} = Sp(k, \mathbb{C})$. Its real form $G = Sp(k, \mathbb{R})$ can be realized as the subgroup of complex symplectic matrices preserving a Hermitian form $H$ of signature $(k, k)$. In other words, we intersect the domains from the last example with the Lagrangian Grassmannians. We consider the coordinate chart whose inhomogeneous coordinates are symmetric matrices $z \in \mathrm{Sym}(k)$. For $H_1$ we have the domain of symmetric matrices $z$ with the condition

\[ E_k - z \bar{z} \succ 0, \quad z = z^{\top} \]

This bounded realization is called Siegel's disk. For $H_2$ the real form is the group of real symplectic matrices and $X_2$ is the domain

\[ \Im z = \frac{1}{2i}(z - \bar{z}) \succ 0, \quad z = z^{\top} \]

of complex symmetric matrices with positive imaginary parts; it is called Siegel's half-plane. This is the third class of Cartan's domains. There are Siegel domains of the second kind connected with the cones of positive symmetric matrices; some of them are homogeneous, but they are never symmetric.

There are two more series of classical minimal flag manifolds: the isotropic Grassmannians and quadrics. They both contain the dual bounded symmetric domains (Cartan's domains of the second and fourth classes, respectively). Some of these domains in the isotropic Grassmannians admit realizations as tubes with the cone of positive Hermitian quaternionic matrices and others as Siegel domains of the second kind corresponding to the same cones.

Symmetric domains in quadrics can be realized as tube domains with the Lorentzian (light) cones. The corresponding tubes are called the future (past) tube, depending on which light cone was taken. Let us consider this construction. The group of holomorphic automorphisms of these domains is $G = SO(2, n)$, the conformal extension of the Lorentz group. To realize this group, let us fix a real symmetric matrix $Q$ of signature $(2, n)$; the group is the group of linear transformations preserving simultaneously the quadratic symmetric and Hermitian forms with this matrix $Q$:

\[ g^{\top} Q g = Q, \quad g^* Q g = Q \]

The standard realization corresponds to the diagonal matrix $Q$ with the diagonal $(1, 1, -1, \ldots, -1)$. Cartan's domains of the fourth class are connected components of the manifold

\[ Z Q Z^{\top} = 0, \quad Z Q Z^* > 0 \]

where the rows $Z$ are homogeneous coordinates in the projective space $\mathbb{CP}^{n+1}$. In other words, we consider a domain on the quadric in the projective space (which is the complex flat conformal space $\mathbb{CC}^n$). For the standard $Q$ the domain will lie in the coordinate chart; thus it is the bounded realization. For the tube realization, we take

\[ Q = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & I_{1,n-1} \end{pmatrix}, \quad I_{1,n-1} = \mathrm{diag}(1, -1, \ldots, -1) \]

so that the last block is the matrix of the form $q$ below and $Q$ has signature $(2, n)$. Let $Z = (z_0, z_1, w_1, \ldots, w_n)$, $w = u + iv$, $q(s, t) = s_1 t_1 - s_2 t_2 - \cdots - s_n t_n$, and consider the affine chart $\mathbb{C}^{n+1} = \{z_0 = 1\}$. We have

\[ Z Q Z^{\top} = 2 z_1 + q(w, w) = 0 \]

\[ Z Q Z^* = 2\,\Re z_1 + q(w, \bar{w}) > 0 \]

The first condition gives $2\,\Re z_1 = q(v, v) - q(u, u)$, and then the second condition gives the final description of the considered set in $\mathbb{C}^n_w$:

\[ q(v, v) = v_1^2 - v_2^2 - \cdots - v_n^2 > 0, \quad w = u + iv \]

as the union of the future and the past tubes ($T_\pm = \{\pm v_1 > 0\}$). The edge $\mathbb{R}^n$ of these tubes ($v = 0$) has the structure of the Minkowski space corresponding to the form $q$. The parabolic subgroup is the affine conformal group of the Minkowski space. It includes the Poincaré group and is transitive on the tubes. The complete group of holomorphic automorphisms of the tubes, $G = SO(2, n)$, is the group of all (not only affine) conformal transformations of the Minkowski space. The complete edge of these symmetric domains in the quadric $\mathbb{CC}^n$ is the conformal compactification of the Minkowski space (a compact symmetric R-space with the compact group $S(O(2) \times O(n))$ on which the noncompact group $SO(2, n)$ also acts).
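The reduction of the two conditions on $Z$ to the tube condition $q(v, v) > 0$ can be checked numerically. The sketch below assumes that the lower block of the tube-realization matrix $Q$ is the Gram matrix $\mathrm{diag}(1, -1, \ldots, -1)$ of $q$, as the formulas $2z_1 + q(w, w)$ and $2\,\Re z_1 + q(w, \bar w)$ require; the function names and the sample vectors are illustrative assumptions.

```python
def tube_matrix(n):
    """Q = hyperbolic block [[0, 1], [1, 0]] plus the Gram matrix diag(1, -1, ..., -1) of q."""
    size = n + 2
    Q = [[0] * size for _ in range(size)]
    Q[0][1] = Q[1][0] = 1
    for j in range(n):
        Q[j + 2][j + 2] = 1 if j == 0 else -1
    return Q

def q_real(v):
    """q(v, v) = v1^2 - v2^2 - ... - vn^2 for a real vector v."""
    return v[0] ** 2 - sum(x * x for x in v[1:])

def forms(u, v):
    """Return (Z Q Z^T, Z Q Z^*) for Z = (1, z1, w), w = u + iv, with z1 solving Z Q Z^T = 0."""
    n = len(u)
    w = [complex(a, b) for a, b in zip(u, v)]
    q_ww = w[0] * w[0] - sum(c * c for c in w[1:])       # bilinear value q(w, w)
    Z = [1, -q_ww / 2] + w                               # affine chart z0 = 1
    Q = tube_matrix(n)
    size = n + 2
    bilinear = sum(Z[i] * Q[i][j] * Z[j]
                   for i in range(size) for j in range(size))
    hermitian = sum(Z[i] * Q[i][j] * Z[j].conjugate()
                    for i in range(size) for j in range(size))
    return bilinear, hermitian.real

u = [0.3, -1.2, 0.5]
v = [2.0, 0.5, -1.0]                                     # q(v, v) = 2.75, v1 > 0: future tube
bilinear, hermitian = forms(u, v)                        # hermitian should equal 2 * q(v, v)
```

On the quadric the Hermitian form reduces to $2q(v, v)$, so its positivity depends only on the imaginary part $v$, which is why the domain is a tube over the light cone.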

In addition to the four Cartan classes of classical domains there are two exceptional symmetric domains, in dimensions 27 and 16 (dual to the two exceptional compact Hermitian symmetric spaces of these dimensions). The first of them can be realized as the tube domain corresponding to the exceptional cone of positive Hermitian matrices of order 3 over Cayley numbers (of dimension 27), and the other can be realized as a Siegel domain of the second kind associated with the eight-dimensional future tube. It is possible, using the $\Gamma$-function of self-dual homogeneous cones, to write explicit Bergman and Cauchy–Szegö integral formulas.

Noncompact Symmetric R-Spaces

There are several other interesting noncompact symmetric manifolds. Let us mention the noncompact symmetric R-spaces which are real forms of complex symmetric domains. The typical example is the domain of real square matrices $x \in \mathrm{Mat}(k)$:

\[ E_k - x x^{\top} \succ 0 \]

The condition is that this symmetric matrix is positive. It is the Riemann symmetric space with the group $G = SO(k, k)$. It can be embedded in the real Grassmannian $\mathrm{Gr}_{\mathbb{R}}(k; 2k)$ with the matrix homogeneous coordinates $X \in \mathrm{Mat}_{\mathbb{R}}(k, 2k)$ and the group $SL(2k, \mathbb{R})$ acting on $X$ by right multiplication. Let

\[ I_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_k \end{pmatrix} \]

and let $SO(k, k)$ be the subgroup of matrices preserving the quadratic form $I_1$: $g I_1 g^{\top} = I_1$. This group will preserve the domain $X I_1 X^{\top} \succ 0$ and, in the inhomogeneous coordinates $X = (E_k, x)$, $x \in \mathrm{Mat}_{\mathbb{R}}(k)$, it will be exactly the same as the domain above. The group $SO(k, k)$ acts by matrix fractional linear transformations. This domain is a real form of Siegel's ball. If we replace the form by

\[ I_2 = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix} \]

then we realize our symmetric manifold as the domain

\[ x + x^{\top} \succ 0 \]

So, the symmetric part of the matrix $x$ must be positive. This realization is homogeneous relative to the linear automorphisms $x \mapsto a x a^{\top} + b$, $a \in GL(k, \mathbb{R})$, $b = -b^{\top}$. A similar construction exists for rectangular matrices.

Geometry of Isomorphisms in Small Dimensions

We connected above several local isomorphisms of complex classical groups with some geometrical facts. Let us mention now several similar examples for real groups. We start from isomorphisms of symmetric cones. The cone $\mathrm{Sym}_+(2)$ of symmetric positive matrices of second order is (linearly) isomorphic to the future light cone $L_2$. The comparison of the groups of automorphisms gives the local isomorphism

\[ SL(2, \mathbb{R}) \cong SO(1, 2) \]

This isomorphism corresponds also to the isomorphism of two classical realizations of the hyperbolic plane, those of Poincaré and Klein. Let us also mention that the isomorphism $SL(2, \mathbb{R}) \cong SU(1, 1)$ corresponds to the holomorphic equivalence of the disk and the upper half-plane. The isomorphism $\mathrm{Herm}_+(2) = L_3$ corresponds to the presentation of any Hermitian matrix of order 2 in Pauli's coordinates,

\[ z = \begin{pmatrix} t - x_1 & x_2 + i x_3 \\ x_2 - i x_3 & t + x_1 \end{pmatrix} \]

Then $\det z = t^2 - x_1^2 - x_2^2 - x_3^2$. Comparing the groups of automorphisms, we obtain

\[ SL(2, \mathbb{C}) \cong SO(1, 3) \]

Similarly, in the quaternionic case, the isomorphism of the cones $\mathrm{Herm}_+(2, \mathbb{H})$ and $L_5$ gives the isomorphism

\[ SL(2, \mathbb{H}) \cong SO(1, 5) \]

The linear isomorphism of cones produces the holomorphic isomorphism of the corresponding tubes and of their groups of holomorphic automorphisms. So each of these three isomorphisms automatically gives one more isomorphism. Let us give it for the first two cones:

\[ Sp(2, \mathbb{R}) \cong SO(2, 3), \quad SU(2, 2) \cong SO(2, 4) \]

We have just compared the descriptions of automorphisms of the classical tubes from above.

Considering $\det(x)$ as the quadratic form of signature $(2, 2)$ on $\mathrm{Mat}(2) \simeq \mathbb{R}^4$, we obtain

\[ SO(2, 2) \cong SL(2, \mathbb{R}) \times SL(2, \mathbb{R}) \]

Each of the local isomorphisms in the complex case has different real forms which admit some geometrical interpretations. We mentioned above two real forms of the isomorphism $SL(4, \mathbb{C}) \cong SO(6, \mathbb{C})$. The isomorphism for $SU(2, 2)$ admits another interpretation in the language of Plücker's coordinates: points of the quadric in $\mathbb{RP}^5$ of signature $(2, 4)$ can be interpreted as (complex) lines in $\mathbb{CP}^3$ which lie on a Hermitian quadric of signature $(2, 2)$ (Gindikin

1983). The isomorphism above for the group $SL(2, \mathbb{H})$ also corresponds to Hopf's fibering of $\mathbb{CP}^3$ by complex lines over the sphere $S^4$, or to the isomorphism of $S^4$ and the quaternionic projective line $\mathbb{HP}^1$. In all these cases, isomorphisms of homogeneous manifolds intertwine the actions of locally isomorphic groups.

Pseudo-Riemann Symmetric Manifolds

We obtain the next broad class of homogeneous manifolds if we preserve the conditions that the group $G$ is a real semisimple one and the isotropy subgroup $H$ is involutive, but remove the restriction that $H$ must be (maximal) compact. Such symmetric manifolds are often called semisimple pseudo-Riemann symmetric manifolds (since there are also pseudo-Riemann symmetric manifolds whose groups are not semisimple). This class of spaces contains the symmetric Stein manifolds $X_{\mathbb{C}} = G_{\mathbb{C}}/H_{\mathbb{C}}$. Each semisimple symmetric manifold $X = G/H$ admits a complexification as a symmetric Stein manifold. Each real semisimple Lie group $G$ is symmetric relative to the group $G \times G$.

The simplest family of semisimple symmetric manifolds is the family of all hyperboloids of all signatures

\[ H_{p,q} = \{x_1^2 + \cdots + x_p^2 - x_{p+1}^2 - \cdots - x_n^2 = 1\} \]

with the groups $SO(p, q)$. Their complexifications are complex hyperboloids. There are two types of Riemann manifolds in these families: the compact ones (spheres) and the noncompact ones (two-sheeted hyperboloids); all others are pseudo-Riemann.

The Cartan duality holds for pseudo-Hermitian symmetric manifolds: they are domains in compact Hermitian symmetric manifolds (minimal flag manifolds) $Z = G_{\mathbb{C}}/P_{\mathbb{C}}$. They are open orbits of real forms $G$ of the groups of holomorphic automorphisms $G_{\mathbb{C}}$. We construct examples of such manifolds if we consider one of the above-described realizations of noncompact Hermitian symmetric manifolds (through matrix homogeneous coordinates) and replace the condition of positivity with the condition that the symmetric (Hermitian) matrix in the definition has a fixed nondegenerate signature $(i, k - i)$. We can call such pseudo-Hermitian symmetric manifolds satellites of Hermitian ones.

Correspondingly, we can consider nonconvex tubes, for example, the set $T$ of symmetric matrices whose imaginary parts have signature $(i, n - i)$. This domain is linear homogeneous, but it is not symmetric; to obtain the symmetric manifold we need to extend the nonconvex tube by a manifold of smaller dimension (which plays the role of infinity).

There are pseudo-Hermitian symmetric manifolds which are not satellites of Hermitian ones. Let us give an interesting example. The group $SL(2p, \mathbb{R})$ has two open orbits on the Grassmannian $\mathrm{Gr}_{\mathbb{C}}(p; 2p)$ which are both pseudo-Hermitian symmetric spaces. Let us consider as above the Stiefel coordinates $Z \in \mathrm{Mat}_{\mathbb{C}}(p, 2p)$ and let $Z = X + iY$. Then the orbits are defined by the conditions

\[ \det \begin{pmatrix} X \\ Y \end{pmatrix} \gtrless 0 \]

In the intersection with the coordinate chart $Z = (E, z)$, $z \in \mathrm{Mat}_{\mathbb{C}}(p)$, $z = x + iy$, we have the conditions

\[ \det y \gtrless 0 \]

Therefore, we obtain (nonconvex) tube domains in $\mathbb{C}^N = \mathrm{Mat}_{\mathbb{C}}(p)$, $N = p^2$, corresponding to the nonconvex homogeneous cones $V_\pm$ of real matrices with positive (negative) determinants. These tubes do not coincide with the symmetric manifolds, which also include some sets of small dimension outside of the coordinate chart (at "infinity"). There are other homogeneous nonconvex cones such that the corresponding tube domains are Zariski open parts of pseudo-Hermitian symmetric spaces (D'Atri and Gindikin 1993). Among these cones are the cones of nondegenerate skew-symmetric matrices and of skew-Hermitian quaternionic matrices. We again observe strong connections with classical mathematics. Not all pseudo-Hermitian symmetric manifolds admit such tube realizations of dense parts. Analysis in pseudo-Hermitian symmetric manifolds is very interesting: instead of holomorphic functions we consider there $\bar\partial$-cohomology of some degree.

Geometric relations between different symmetric manifolds are usually important for analytic applications since they can produce some nontrivial integral transformations. In a broad sense, such transforms are considered in integral geometry (Gelfand et al. 2003). An important example is the duality between some compact Hermitian symmetric manifolds (when points in one of them are interpreted as submanifolds in another one). The simplest example is the projective duality between dual copies of projective spaces or, more generally, the realization of points of Grassmannians as projective planes. Such a duality can induce a duality between orbits of real forms of groups. In a special case, it can be a duality between Hermitian and pseudo-Hermitian symmetric manifolds.

Here is one important example. Let us consider in the projective space $\mathbb{CP}^{2k-1}$ the domain $D$ which in

homogeneous coordinates (rows $z = (z_0, z_1, \ldots, z_n)$, $n = 2k - 1$) is defined by the inequality $z H z^* > 0$, where $H$ is a Hermitian form of signature $(k, k)$, for example,

\[ |z_0|^2 + \cdots + |z_{k-1}|^2 - |z_k|^2 - \cdots - |z_n|^2 > 0 \]

This domain is $(k - 1)$-pseudoconcave and it contains $(k - 1)$-dimensional complex compact cycles, namely $(k - 1)$-dimensional planes. The manifold of these planes is exactly the domain $X$ in the Grassmannian $\mathrm{Gr}_{\mathbb{C}}(k; 2k)$ (of projective $(k - 1)$-planes) which is the noncompact Hermitian symmetric space, the orbit of the group $SU(k, k)$ (see above).

This picture is the geometrical basis for a deep analytic construction. In the domain $D$ the spaces of $(k - 1)$-dimensional $\bar\partial$-cohomology are infinite dimensional for some coefficients. Their integration over $(k - 1)$-planes (the Penrose transform) gives sections of the corresponding vector bundles on $X$. The images are described by differential equations, the generalized massless equations. The basic twistor theory corresponds to $k = 2$, when $X$ is isomorphic to the four-dimensional future tube (see above).

Similar dual realizations of Hermitian symmetric manifolds exist only in special cases. The twistor realization of the four-dimensional future tube was possible since the Grassmannian $\mathrm{Gr}_{\mathbb{C}}(2; 4)$ is isomorphic to the quadric in $\mathbb{CP}^5$. This does not work for the future tubes of bigger dimensions, but there is another possibility (Gindikin 1998). Let the quadric $Q^{n-1} \subset \mathbb{CP}^n$ be defined in homogeneous coordinates by the equation

\[ \Box(z) = (z_0)^2 - (z_1)^2 - \cdots - (z_n)^2 = 0 \]

and let $\zeta \cdot z$ be the corresponding bilinear form. As already mentioned, the set of (nondegenerate) hyperplane sections

\[ \zeta \cdot z = 0, \quad \zeta \in \mathbb{C}^{n+1}, \quad \Box(\zeta) = 1 \]

of $Q^{n-1}$ is the corresponding hyperboloid $H_n$. Thus, we have the duality between a flag manifold (the quadric $Q^{n-1}$) and a symmetric Stein manifold (the hyperboloid $H_n$) with the same group $SO(n + 1, \mathbb{C})$; they have different dimensions.

The group $SO(1, n)$ has two orbits on $Q^{n-1}$: the real quadric $Q_{\mathbb{R}} = \{z \in Q^{n-1};\ \Im(z) = 0\}$ and its complement $X = Q^{n-1} \setminus Q_{\mathbb{R}}$. Hyperplane sections which do not intersect $Q_{\mathbb{R}}$ (lie in $X$) correspond to those $\zeta \in H_n$ for which

\[ \Box(\Re(\zeta)) > 0 \]

This set has two connected components $D_\pm$ which are biholomorphically equivalent to the future and past tubes $T_\pm$ of dimension $n$. Let us emphasize that their group of automorphisms is $SO(2, n)$ in spite of the fact that this group acts neither on $X$ nor on $H_n$. Such an extension of the symmetry group is a very interesting phenomenon. It happens for several other symmetric manifolds, but is not a general fact. This geometrical construction gives a possibility to construct a multidimensional version of the Penrose transform from $(n - 2)$-dimensional $\bar\partial$-cohomology with different coefficients into solutions of massless equations on the future (past) tubes.

The last duality is connected with a general geometrical construction. We mentioned that each of the Riemann symmetric manifolds $X = G/K$ admits a canonical embedding in the symmetric Stein manifold $X_{\mathbb{C}} = G_{\mathbb{C}}/K_{\mathbb{C}}$. It turns out that $X$ has in $X_{\mathbb{C}}$ a canonical Stein neighborhood, the complex crown $\Xi(X)$, such that many analytic objects on $X$ can be holomorphically extended to the crown (Gindikin 2002). For example, all solutions of all invariant differential equations on $X$ (which are elliptic) admit such a holomorphic extension. In the last example, $D_+$ is the crown of the Riemann symmetric space which is defined, in $H_n$, by the condition $\Im(\zeta) = 0$, $\Re(\zeta_0) > 0$.

Symmetric manifolds are distinguished from most other homogeneous manifolds by a very rich geometry which is a background for deep analytic considerations. There are several important nonsymmetric homogeneous manifolds. We already mentioned flag manifolds and Stein homogeneous manifolds with complex semisimple Lie groups which can be nonsymmetric. Pseudo-Hermitian symmetric manifolds are open orbits of real groups on compact Hermitian symmetric spaces. It turns out that open orbits on other flag manifolds also produce interesting homogeneous manifolds. Let $F = G_{\mathbb{C}}/P_{\mathbb{C}}$ be a flag manifold. Flag domains are open orbits of a real form $G$ on $F$. Of course, pseudo-Hermitian symmetric manifolds are a special case of this construction. Let us consider a simple example with $G_{\mathbb{C}} = SL(3, \mathbb{C})$ and $P$ the triangular group. Then points of $F$ are pairs {a point $z$ and a line $l$ passing through it}. Let $G = SU(2, 1)$; it has two open orbits on $\mathbb{CP}^2$: the complex ball $D$ and its complement $D^C$. On $F$, the group $G$ has three open orbits (flag domains): in the first $z \in D$ and $l$ is arbitrary; in the second $l \subset D^C$; in the third $z \in D^C$ and $l$ intersects $D$. They are all 1-pseudoconcave. All three discrete series of unitary representations of $SU(2, 1)$ are realized in the one-dimensional $\bar\partial$-cohomology of these flag domains with coefficients in line bundles. For arbitrary semisimple Lie groups, all discrete series of representations can also be realized in $\bar\partial$-cohomology of flag domains. Crowns of Riemann symmetric spaces which we just mentioned parametrize cycles (complex compact submanifolds)

in flag domains. A general version of the Penrose transform connects, through integration along cycles, cohomology in flag domains with holomorphic solutions of some differential equations in crowns (generalized massless equations).

See also: Combinatorics: Overview; Compact Groups and their Representations; Lie Groups: General Theory; Pseudo-Riemannian Nilpotent Lie Groups; Several Complex Variables: Compact Manifolds; Stability of Minkowski Space; Symmetry Classes in Random Matrix Theory; Twistor Theory: Some Applications; Twistors.

Further Reading

Akhiezer D (1990) Homogeneous complex manifolds. In: Gindikin S and Henkin G (eds.) Several Complex Variables IV, vol. 10, Encyclopaedia of Mathematical Science. New York: Springer.
D'Atri J and Gindikin S (1993) Siegel domain realization of pseudo-Hermitian symmetric manifolds. Geometriae Dedicata 46: 91–126.
Faraut J and Koranyi A (1994) Analysis on Symmetric Cones. Oxford: Oxford University Press.
Gelfand I, Gindikin S, and Graev M (2003) Selected Topics in Integral Geometry. Providence, RI: American Mathematical Society.
Gindikin S (1983) The complex universe of Roger Penrose. Mathematical Intelligencer 5(1): 27–35.
Gindikin S (1998) SO(1; n)-twistors. Journal of Geometry and Physics 26: 26–36.
Gindikin S (2002) Some remarks on complex crowns of real symmetric spaces. Acta Mathematica Applicata 73(1–2): 95–101.
Helgason S (1978) Differential Geometry, Lie Groups and Symmetric Spaces. New York: Academic Press.
Helgason S (1994) Geometric Analysis on Symmetric Spaces. Providence, RI: American Mathematical Society.
Onishchik A and Vinberg E (1993) Lie Groups and Lie Algebras I. Foundations of Lie Theory. In: Onishchik A (ed.) Lie Groups and Lie Algebras. Encyclopaedia of Mathematical Sciences, vol. 20. New York: Springer.
Pyatetskii-Shapiro I (1969) Automorphic Functions and Geometry of Classical Domains. Amsterdam: Gordon and Breach.

Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups


M A Semenov-Tian-Shansky, Steklov Institute of Mathematics, St. Petersburg, Russia, and Université de Bourgogne, Dijon, France

© 2006 Elsevier Ltd. All rights reserved.

Introduction

The notion of "classical r-matrices" has emerged as a by-product of the quantum inverse scattering method (which was developed mainly by L D Faddeev and his team in their work at the Steklov Mathematical Institute in Leningrad); it has given a new insight into the study of Hamiltonian structures associated with classical integrable systems solvable by the classical inverse scattering method and its generalizations. Important classification results for classical r-matrices are due to Belavin and Drinfeld. Based on the initial results of Sklyanin, Drinfeld introduced the important concepts of "Poisson Lie groups" and "Lie bialgebras" which arise as a semiclassical approximation in the study of quantum groups.

A Poisson group is a Lie group $G$ equipped with a Poisson bracket such that the multiplication $m : G \times G \to G$ is a Poisson mapping. A Poisson bracket on $G$ with this property is called multiplicative. More explicitly, let $\lambda_x$, $\rho_x$ be the left and right translation operators in $C^\infty(G)$ by an element $x \in G$, $\lambda_x \varphi(y) = \varphi(xy)$, $\rho_x \varphi(y) = \varphi(yx)$. Multiplication in $G$ is a Poisson mapping if, for any $\varphi, \psi \in C^\infty(G)$, we have

\[ \{\varphi, \psi\}(xy) = \{\lambda_x \varphi, \lambda_x \psi\}(y) + \{\rho_y \varphi, \rho_y \psi\}(x) \qquad [1] \]

Note that in general, multiplicative brackets are neither left nor right invariant; in other words, for fixed $x$ the translation operators $\lambda_x$, $\rho_x$ do not preserve Poisson brackets.

Multiplicative Poisson brackets naturally arise in the study of integrable systems which admit the so-called "zero-curvature representation." The study of zero-curvature equations, and in particular, of the Poisson properties of the associated monodromy map, was the main source of nontrivial examples (associated with classical r-matrices, classical Yang–Baxter equations, and factorizable Lie bialgebras). The special class of multiplicative Poisson brackets encountered in this context is closely related to factorization problems in Lie groups (in particular, the matrix Riemann problem); these problems represent the key tools in constructing solutions of zero-curvature equations.

The equivalent definition of Poisson Lie groups uses the dual language of "Hopf algebras." Let $A = F(G)$ be the commutative algebra of (smooth) functions on a Lie group $G$ equipped with the standard coproduct $\Delta : A \to A \otimes A$,

\[ \Delta\varphi(x, y) = \varphi(xy), \quad \varphi \in F(G),\ x, y \in G \]

(as usual, we identify the (topological) tensor product $F(G) \otimes F(G)$ with $F(G \times G)$). The multiplicative
512 Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups

Poisson bracket on $G$ equips $F(G)$ with the structure of a Poisson–Hopf algebra, that is
\[ \Delta\{\varphi, \psi\} = \{\Delta\varphi, \Delta\psi\} \tag*{[2]} \]
Equation [2] is the starting point for the study of relations between Poisson groups and quantum groups. Following the general philosophy of deformation quantization, we can look for a deformation $A_h$ of the commutative Hopf algebra $A$ with the deformation germ determined by the Poisson structure on $A$ satisfying eqn [2]. The fundamental theorem (conjectured by Drinfeld and proved by Etingof and Kazhdan) asserts that any Poisson algebra associated with a Poisson group admits a formal quantization (in the category of Hopf algebras).

Poisson Groups and Lie Bialgebras

Let $G$ be a Lie group with Lie algebra $\mathfrak{g}$ equipped with a multiplicative Poisson bracket. Any Poisson bracket is bilinear in differentials of functions; it is convenient to express it by means of right- or left-invariant differentials. For $\varphi \in F(G)$ set
\[ \langle \nabla\varphi(x), X \rangle = (d/dt)_{t=0}\, \varphi(e^{tX}x), \]
\[ \langle \nabla'\varphi(x), X \rangle = (d/dt)_{t=0}\, \varphi(x e^{tX}), \]
\[ X \in \mathfrak{g}; \quad \nabla\varphi(x), \nabla'\varphi(x) \in \mathfrak{g}^* \]
Let us define the Poisson operator $\eta : G \to \mathrm{Hom}(\mathfrak{g}^*, \mathfrak{g})$ by setting
\[ \{\varphi, \psi\}(x) = \langle \eta(x)\nabla\varphi(x), \nabla\psi(x) \rangle \tag*{[3]} \]
For a finite-dimensional Lie algebra, we can identify $\mathrm{Hom}(\mathfrak{g}^*, \mathfrak{g})$ with $\mathfrak{g} \otimes \mathfrak{g}$; the skew symmetry of the Poisson bracket implies that $\eta \in \mathfrak{g} \wedge \mathfrak{g}$. By an abuse of language, the same identification is traditionally used for infinite-dimensional algebras (e.g., for loop algebras) as well. Of course, in the latter case, the corresponding Poisson tensors are represented by singular kernels which do not lie in the algebraic tensor product and should be regarded as distributions.

Multiplicativity of the Poisson bracket on $G$ implies a functional equation for $\eta$
\[ \eta(xy) = (\mathrm{Ad}\, x \otimes \mathrm{Ad}\, x) \cdot \eta(y) + \eta(x) \tag*{[4]} \]
which means that $\eta$ is a 1-cocycle on $G$ (with values in $\mathfrak{g} \wedge \mathfrak{g}$). By setting
\[ \delta(X) = \left(\frac{d}{dt}\right)_{t=0} \eta(e^{tX}), \quad X \in \mathfrak{g} \]
we conclude from eqn [4] that $\delta : \mathfrak{g} \to \mathfrak{g} \wedge \mathfrak{g}$ is a 1-cocycle on $\mathfrak{g}$, that is,
\[ \delta([X, Y]) = [X \otimes I + I \otimes X, \delta(Y)] - [Y \otimes I + I \otimes Y, \delta(X)] \]
Equation [4] implies that $\eta(e) = 0$, that is, a multiplicative Poisson structure is identically zero at the unit element. Its linearization at this point induces the structure of a Lie algebra on the cotangent space $T_e^*G \simeq \mathfrak{g}^*$; namely, for any $\xi, \xi' \in \mathfrak{g}^*$, choose $\varphi, \varphi' \in F(G)$ in such a way that $\nabla_e\varphi = \xi$, $\nabla_e\varphi' = \xi'$, and set
\[ [\xi, \xi']_* = \nabla_e\{\varphi, \varphi'\} \tag*{[5]} \]
It is easy to see that $\langle [\xi, \xi']_*, X \rangle = \langle \xi \wedge \xi', \delta(X) \rangle$, which proves that the bracket is well defined, while eqn [5] implies the Jacobi identity.

Definition 1 Let $\mathfrak{g}$, $\mathfrak{g}^*$ be a pair of linear spaces set in duality; $(\mathfrak{g}, \mathfrak{g}^*)$ is called a Lie bialgebra if both $\mathfrak{g}$ and $\mathfrak{g}^*$ are Lie algebras and the mapping $\delta : \mathfrak{g} \to \mathfrak{g} \otimes \mathfrak{g}$ which is dual to the commutator map $[\ ,\ ]_* : \mathfrak{g}^* \otimes \mathfrak{g}^* \to \mathfrak{g}^*$ is a 1-cocycle on $\mathfrak{g}$.

Thus if $G$ is a Poisson–Lie group, the pair $(\mathfrak{g}, \mathfrak{g}^*)$ is a Lie bialgebra (called the “tangent Lie bialgebra” of $G$). Poisson–Lie groups form a category in which the morphisms are Lie group homomorphisms that are also Poisson mappings. A morphism $(\mathfrak{g}, \mathfrak{g}^*) \to (\mathfrak{h}, \mathfrak{h}^*)$ in the category of Lie bialgebras is a Lie algebra homomorphism $\mathfrak{g} \to \mathfrak{h}$ such that the dual map $\mathfrak{h}^* \to \mathfrak{g}^*$ is again a Lie algebra homomorphism. It is easy to see that morphisms of Poisson groups induce morphisms of their tangent bialgebras. The converse is also true.

Theorem 1
(i) Let $(\mathfrak{g}, \mathfrak{g}^*)$ be a Lie bialgebra, $G$ a connected, simply connected Lie group with Lie algebra $\mathfrak{g}$. There is a unique multiplicative Poisson bracket on $G$ such that $(\mathfrak{g}, \mathfrak{g}^*)$ is its tangent Lie bialgebra.
(ii) Morphisms of Lie bialgebras induce Poisson mappings of the corresponding Poisson groups.

Basically, the theorem asserts that a Poisson tensor is uniquely restored from the infinitesimal cocycle on the corresponding Lie algebra; moreover, the obstruction for the Jacobi identity vanishes globally if this is true for its infinitesimal part at the unit element of the group.

It is important to observe that Lie bialgebras possess a remarkable symmetry: if $(\mathfrak{g}, \mathfrak{g}^*)$ is a Lie bialgebra, the same is true for $(\mathfrak{g}^*, \mathfrak{g})$. Hence, the dual group $G^*$ (which corresponds to $\mathfrak{g}^*$) also carries a multiplicative Poisson bracket. The duality theory for Lie bialgebras, based on the key notion of the Drinfeld double, is discussed in the next section.
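The 1-cocycle condition on the cobracket can be checked numerically in the smallest nontrivial case. The following is only a sketch under stated assumptions, not part of the original exposition: it takes $\mathfrak{g} = sl(2)$ in its $2 \times 2$ matrix realization, models $\mathfrak{g} \otimes \mathfrak{g}$ inside $\mathrm{Mat}(2) \otimes \mathrm{Mat}(2)$ through the Kronecker product, and assumes the cobracket $\delta(H) = 0$, $\delta(E) = \tfrac12 E \wedge H$, $\delta(F) = \tfrac12 F \wedge H$ with the convention $a \wedge b = a \otimes b - b \otimes a$. All helper names are hypothetical.

```python
import numpy as np
from itertools import product

# Standard basis of sl(2) as 2x2 matrices: [H, E] = 2E, [H, F] = -2F, [E, F] = H.
E = np.array([[0.0, 1.0], [0.0, 0.0]])
F = np.array([[0.0, 0.0], [1.0, 0.0]])
H = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)


def comm(a, b):
    """Matrix commutator [a, b]."""
    return a @ b - b @ a


def wedge(a, b):
    """a ^ b = a (x) b - b (x) a, realized as a 4x4 matrix (assumed convention)."""
    return np.kron(a, b) - np.kron(b, a)


def coords(X):
    """Coordinates (e, f, h) of a traceless 2x2 matrix X = e*E + f*F + h*H."""
    return X[0, 1], X[1, 0], X[0, 0]


def delta(X):
    """Assumed cobracket, extended linearly from its values on E, F, H."""
    e, f, _h = coords(X)  # delta(H) = 0, so the H-coordinate does not contribute
    return 0.5 * e * wedge(E, H) + 0.5 * f * wedge(F, H)


def act(X, a):
    """Adjoint action of X on g (x) g: [X (x) I + I (x) X, a]."""
    return comm(np.kron(X, I2) + np.kron(I2, X), a)


def cocycle_defect(X, Y):
    """delta([X, Y]) - X.delta(Y) + Y.delta(X); zero iff the cocycle condition holds."""
    return delta(comm(X, Y)) - act(X, delta(Y)) + act(Y, delta(X))


max_defect = max(
    np.abs(cocycle_defect(X, Y)).max() for X, Y in product((E, F, H), repeat=2)
)
print("largest cocycle defect over basis pairs:", max_defect)
```

Since both sides of the cocycle identity are bilinear in $X$ and $Y$, checking it on pairs of basis elements is enough.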

Classical r-Matrices and Special Classes of Lie Bialgebras

The general classification problem for Lie bialgebras is unfeasible (e.g., classification of abelian Lie bialgebras includes classification of all Lie algebras). In applications, one mainly deals with important special classes of Lie bialgebras, of which factorizable Lie bialgebras are probably the most important. In a sense, this class may be regarded as exhaustive, since (as explained below) any Lie bialgebra is canonically embedded into a factorizable one. Various other special classes discussed in the literature are “coboundary bialgebras,” “triangular bialgebras,” and “quasitriangular bialgebras.”

The Lie bialgebra $(\mathfrak{g}, \mathfrak{g}^*, \delta)$ is called a coboundary bialgebra if the cobracket $\delta$ is a trivial 1-cocycle on $\mathfrak{g}$, that is,
\[ \delta(X) = [X \otimes I + I \otimes X, r] \quad \text{for all } X \in \mathfrak{g} \tag*{[6]} \]
the constant element $r \in \mathfrak{g} \wedge \mathfrak{g}$ is called the “classical r-matrix.” If $\mathfrak{g}$ is semisimple, $H^1(\mathfrak{g}, V) = 0$ for any $\mathfrak{g}$-module $V$ by the classical Whitehead theorem, and hence all Lie bialgebra structures on $\mathfrak{g}$ are of coboundary type. The associated Lie bracket on $\mathfrak{g}^*$ is given by the formula
\[ [\xi, \xi']_* = \mathrm{ad}^*_{\mathfrak{g}}\, r\xi \cdot \xi' - \mathrm{ad}^*_{\mathfrak{g}}\, r\xi' \cdot \xi \tag*{[7]} \]
where we identified $r \in \mathfrak{g} \wedge \mathfrak{g}$ with a skew-symmetric linear operator $r : \mathfrak{g}^* \to \mathfrak{g}$. The restrictions imposed on $r$ by the Jacobi identity are formulated in terms of the so-called “Yang–Baxter tensor” $[[r, r]] \in \mathfrak{g} \wedge \mathfrak{g} \wedge \mathfrak{g}$, which is a quadratic expression in $r$. To define it, let us mark different factors in tensor products, for example, $\mathfrak{g} \otimes \mathfrak{g} \otimes \mathfrak{g}$, by fixed numbers 1, 2, 3, ... which indicate their place; for simplicity, we assume that $\mathfrak{g}$ is embedded in an associative algebra $A$ with a unit. The embeddings are defined as
\[ i_{12}, i_{23}, i_{13} : \mathfrak{g} \otimes \mathfrak{g} \to A \otimes A \otimes A \]
by setting $i_{12}(X \otimes Y) = X \otimes Y \otimes I$, and similarly in other cases. For $a \in \mathfrak{g} \otimes \mathfrak{g}$, we put $i_{12}(a) = a_{12}$, etc. Set
\[ [[r, r]] = [r_{12}, r_{13}] + [r_{12}, r_{23}] + [r_{13}, r_{23}] \tag*{[8]} \]
The commutators in the RHS are computed in the associative algebra $A \otimes A \otimes A$; it is easy to check that the result does not depend on the choice of the embedding $\mathfrak{g} \hookrightarrow A$.

Proposition 1 The Jacobi identity for $[\ ,\ ]_*$ is valid if and only if $[[r, r]]$ is ad $\mathfrak{g}$-invariant, that is, if
\[ [X \otimes I \otimes I + I \otimes X \otimes I + I \otimes I \otimes X, [[r, r]]] = 0 \quad \text{for all } X \in \mathfrak{g} \]

A coboundary Lie bialgebra with $[[r, r]] \in (\wedge^3 \mathfrak{g})^{\mathfrak{g}}$ is called “quasitriangular”; it is called “triangular” if $r$ satisfies the classical Yang–Baxter equation $[[r, r]] = 0$. (Both terms come from another name of the classical Yang–Baxter equation, the “classical triangle equation.”)

When a Lie algebra $\mathfrak{g}$ admits a nondegenerate invariant inner product, the class of quasitriangular Lie bialgebra structures on $\mathfrak{g}$ admits an important specialization. Let $\mathfrak{g} \otimes \mathfrak{g} \simeq \mathfrak{g} \otimes \mathfrak{g}^*$ be the natural isomorphism induced by the inner product. Let $I \in \mathfrak{g} \otimes \mathfrak{g}^*$ be the canonical element; its image $t \in \mathfrak{g} \otimes \mathfrak{g}$ under this isomorphism is called the “tensor Casimir element.” Clearly, $t \in (S^2\mathfrak{g})^{\mathfrak{g}}$ and, moreover, $[t_{12}, t_{23}] \in (\wedge^3\mathfrak{g})^{\mathfrak{g}}$. When $\mathfrak{g}$ is semisimple, the mapping $(S^2\mathfrak{g})^{\mathfrak{g}} \to (\wedge^3\mathfrak{g})^{\mathfrak{g}} : s \mapsto [s_{12}, s_{23}]$ is an isomorphism; in particular, if $\mathfrak{g}$ is simple, both spaces are one dimensional and generated by a tensor Casimir (which is unique up to a scalar multiple). A Lie bialgebra $(\mathfrak{g}, r)$ is called factorizable if $r \in \mathfrak{g} \wedge \mathfrak{g}$ satisfies the modified classical Yang–Baxter equation
\[ [[r, r]] = c\,[t_{12}, t_{23}], \quad c = \text{const} \neq 0 \tag*{[9]} \]
The convenient normalization is $c = 1/4$ (it can be achieved by an appropriate normalization of $r$). Instead of dealing with the modified Yang–Baxter equation, we may relax the antisymmetry condition imposed on $r$. Set $r_\pm = r \pm (1/2)t \in \mathfrak{g} \otimes \mathfrak{g}$. Since $t$ is ad $\mathfrak{g}$-invariant, the symmetric part of $r_\pm$ drops out from the cobracket; on the other hand, one has $[[r_\pm, r_\pm]] = 0$. Regarding $r_\pm$ as a linear operator, $r_\pm \in \mathrm{Hom}(\mathfrak{g}^*, \mathfrak{g})$, we get the following important result:

Proposition 2 Let $(\mathfrak{g}, \mathfrak{g}^*)$ be a factorizable Lie bialgebra.
(i) The mappings $r_\pm \in \mathrm{Hom}(\mathfrak{g}^*, \mathfrak{g})$ are Lie algebra homomorphisms; moreover, $r_+^* = -r_-$.
(ii) The combined mapping
\[ i_r : \mathfrak{g}^* \to \mathfrak{g} \oplus \mathfrak{g} : X \mapsto (r_+X, r_-X) \]
is a Lie algebra embedding.
(iii) Any $X \in \mathfrak{g}$ admits a unique decomposition $X = X_+ - X_-$ with $(X_+, X_-) \in \mathrm{Im}\, i_r$.

The additive decomposition in a factorizable Lie bialgebra gives rise to a multiplicative factorization problem in the associated Lie group. Namely, $i_r$ may be extended to a Lie group embedding $i_r : G^* \to G \times G$, and any $x \in G$ which is sufficiently close to the unit element admits a decomposition $x = x_+ x_-^{-1}$ with $(x_+, x_-) \in \mathrm{Im}\, i_r$.

Any Lie bialgebra $(\mathfrak{g}, \mathfrak{g}^*)$ admits a canonical embedding into a larger Lie bialgebra (called its
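“double”) which is already factorizable. Namely, set $\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}^*$ as a linear space and equip it with the natural inner product,
\[ \langle\langle (X, F), (X', F') \rangle\rangle = \langle F, X' \rangle + \langle F', X \rangle \tag*{[10]} \]

Theorem 2
(i) There exists a unique structure of the Lie algebra on $\mathfrak{d}$ such that: (a) $\mathfrak{g}, \mathfrak{g}^* \subset \mathfrak{d}$ are Lie subalgebras. (b) The inner product [10] is invariant.
(ii) Let $P_{\mathfrak{g}}$, $P_{\mathfrak{g}^*}$ be the projection operators onto $\mathfrak{g}, \mathfrak{g}^* \subset \mathfrak{d}$ parallel to the complementary subalgebra. Set $r_{\mathfrak{d}+} = P_{\mathfrak{g}}$, $r_{\mathfrak{d}-} = -P_{\mathfrak{g}^*}$; then $(\mathfrak{d}, r_{\mathfrak{d}})$ is a factorizable Lie bialgebra.
(iii) The inclusion map $(\mathfrak{g}, \mathfrak{g}^*) \to (\mathfrak{d}, \mathfrak{d}^*)$ is a homomorphism of Lie bialgebras and the dual inclusion map $(\mathfrak{g}^*, \mathfrak{g}) \to (\mathfrak{d}, \mathfrak{d}^*)$ is an antihomomorphism.

Conversely, let $\mathfrak{a}$ be a Lie algebra equipped with a nondegenerate invariant inner product, $\mathfrak{a}_\pm \subset \mathfrak{a}$ its Lie subalgebras such that (i) $\mathfrak{a}_\pm$ are isotropic with respect to the inner product, (ii) $\mathfrak{a} = \mathfrak{a}_+ \dotplus \mathfrak{a}_-$ as a linear space. The triple $(\mathfrak{a}, \mathfrak{a}_+, \mathfrak{a}_-)$ is called a “Manin triple.” Let $P_\pm$ be the projection operators onto $\mathfrak{a}_\pm$ in this decomposition. Set $r_\pm = \pm P_\pm$. Then $(\mathfrak{a}, r_\pm)$ is a factorizable Lie bialgebra; moreover, $\mathfrak{a}_+$ and $\mathfrak{a}_-$ are set into duality by the inner product in $\mathfrak{a}$ and inherit the structure of a Lie bialgebra, and $\mathfrak{a}$ is their double.

If $(\mathfrak{g}, \mathfrak{g}^*)$ is itself a factorizable Lie bialgebra, its double admits a simple explicit description. Set $\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}$ (direct sum of Lie algebras); let us equip $\mathfrak{d}$ with the inner product
\[ \langle\langle (X, X'), (Y, Y') \rangle\rangle = \langle X, Y \rangle - \langle X', Y' \rangle \]
Let $\mathfrak{g}^\delta \subset \mathfrak{d}$ be the diagonal subalgebra; we identify $\mathfrak{g}^*$ with the embedded subalgebra $i_r(\mathfrak{g}^*) \subset \mathfrak{d}$.

Proposition 3
(i) $(\mathfrak{d}, \mathfrak{g}^\delta, i_r(\mathfrak{g}^*))$ is a Manin triple.
(ii) As a Lie algebra, $\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}$ is isomorphic to the double of $\mathfrak{g}$.

Key examples of factorizable Lie bialgebras are associated with semisimple Lie algebras and their loop algebras.

1. Let $\mathfrak{k}$ be a compact semisimple Lie algebra, $\mathfrak{g} = \mathfrak{k}_{\mathbb{C}}$ its complexification regarded as a real Lie algebra, $\sigma \in \mathrm{Aut}\, \mathfrak{g}$ the Cartan involution which fixes $\mathfrak{k}$, and $\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$ the associated Cartan decomposition. Fix a real split Cartan subalgebra $\mathfrak{a} \subset \mathfrak{p}$ and the associated Iwasawa decomposition $\mathfrak{g} = \mathfrak{k} \dotplus \mathfrak{a} \dotplus \mathfrak{n}$; put $\mathfrak{s} = \mathfrak{a} \dotplus \mathfrak{n}$. Let $B$ be the complex Killing form on $\mathfrak{g}$; let us equip $\mathfrak{g}$ with the real inner product $(X, Y) = \mathrm{Im}\, B(X, Y)$, then $(\mathfrak{g}, \mathfrak{k}, \mathfrak{s})$ is a Manin triple. Hence, any compact semisimple Lie group $K$ carries a natural Poisson structure; its double $G = D(K)$ is the complex group $G = K_{\mathbb{C}}$ (regarded as a real Lie group). The associated factorization problem in $G$ is the Iwasawa decomposition $G = KAN$, which exists globally.

2. Let $\mathfrak{g}$ be a real split semisimple Lie algebra, $\mathfrak{h}$ its Cartan subalgebra, and $\Delta_+$ a system of positive roots. Fix an invariant inner product on $\mathfrak{g}$ which is positive on $\mathfrak{h}$, and let $\{e_\alpha;\ \alpha \in \pm\Delta_+\}$ be the root vectors normalized in such a way that $(e_\alpha, e_{-\alpha}) = 1$. Let
\[ \mathfrak{n}_\pm = \bigoplus_{\alpha \in \pm\Delta_+} \mathbb{R} \cdot e_\alpha \]
Fix an orthonormal basis $\{H_i\}$ in $\mathfrak{h}$; let $P_\pm$, $P_0$ be the projection operators onto $\mathfrak{n}_\pm$, $\mathfrak{h}$ in the Bruhat decomposition $\mathfrak{g} = \mathfrak{n}_- \dotplus \mathfrak{h} \dotplus \mathfrak{n}_+$. The standard Lie bialgebra structure on $\mathfrak{g}$ is given by the r-matrices $r_\pm = \pm(P_\pm + \tfrac12 P_0)$. In tensor notation,
\[ r_\pm = \sum_{\alpha \in \Delta_+} e_\alpha \wedge e_{-\alpha} \pm \frac12 \sum_i H_i \otimes H_i \tag*{[11]} \]
Let $\mathfrak{b}_\pm = \mathfrak{h} \dotplus \mathfrak{n}_\pm$ be the opposite Borel subalgebras; the inner product in $\mathfrak{g}$ sets them into duality, and $(\mathfrak{b}_+, \mathfrak{b}_-)$ is a Lie sub-bialgebra in $(\mathfrak{g}, \mathfrak{g}^*)$. Let $G$ be the connected, simply connected Lie group associated with $\mathfrak{g}$, $B_\pm = HN_\pm$ its opposite Borel subgroups which correspond to $\mathfrak{b}_\pm$. Let $p : B_\pm \to B_\pm/N_\pm \simeq H$ be the canonical projection. The associated factorization problem in $G$, $g = b_+ b_-^{-1}$, $(b_+, b_-) \in B_+ \times B_-$, $p(b_+) = p(b_-)^{-1}$, is closely related to the Bruhat decomposition; it is solvable for all $g$ in the open Bruhat cell $B_+N_- \subset G$.

3. Let $L\mathfrak{g} = \mathfrak{g} \otimes \mathbb{C}((z))$ be the loop algebra of a finite-dimensional semisimple Lie algebra $\mathfrak{g}$; as usual, we denote the ring of formal Laurent series by $\mathbb{C}((z))$. Put $L\mathfrak{g}_+ = \mathfrak{g} \otimes \mathbb{C}[[z]]$, $L\mathfrak{g}_- = \mathfrak{g} \otimes z^{-1}\mathbb{C}[z^{-1}]$. Fix an invariant inner product on $\mathfrak{g}$ and equip $L\mathfrak{g}$ with the inner product
\[ \langle\langle X, Y \rangle\rangle = \mathrm{Res}_{z=0} \langle X(z), Y(z) \rangle\, dz \]
Then $(L\mathfrak{g}, L\mathfrak{g}_+, L\mathfrak{g}_-)$ is a Manin triple. The associated classical r-matrix is called the “rational r-matrix”; in tensor notation, it is represented by a singular kernel
\[ r(z, z') = \frac{t}{z - z'} \]
where $t \in \mathfrak{g} \otimes \mathfrak{g}$ is the tensor Casimir; this kernel is essentially the Cauchy kernel.

4. Let us assume that $\mathfrak{g} = sl(n)$; in this case, the loop algebra $L\mathfrak{g}$ admits a nontrivial decomposition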
associated with the so-called “elliptic r-matrix.” Set
\[ I_1 = \mathrm{diag}(1, \varepsilon, \ldots, \varepsilon^{n-1}), \quad I_2 = \begin{pmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & & \ddots & 1 \\ 1 & 0 & \cdots & 0 \end{pmatrix}, \quad \varepsilon = e^{2\pi i/n} \tag*{[12]} \]
Put $\mathbb{Z}_n^2 = \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z}$; for $a = (a_1, a_2) \in \mathbb{Z}_n^2$, set $I_a = I_1^{a_1} I_2^{a_2}$; matrices $I_a$ define an irreducible projective representation of $\mathbb{Z}_n^2$ (they form the so-called “finite Heisenberg group”). Let us denote the elliptic curve of modulus $\tau$ by $E = \mathbb{C}/(\mathbb{Z} + \tau\mathbb{Z})$ and let $P \to E$ be the n-dimensional holomorphic vector bundle with flat connection and with monodromies given by
\[ z \mapsto z + 1 : h_1 = \mathrm{Ad}\, I_1, \quad z \mapsto z + \tau : h_2 = \mathrm{Ad}\, I_2 \]
Let $G_E \subset L\mathfrak{g}$ be the subspace of Laurent expansions at zero of the global meromorphic sections of $P$ with a unique pole at $0 \in E$. Then $(L\mathfrak{g}, L\mathfrak{g}_+, G_E)$ is again a Manin triple. The associated classical r-matrix is the kernel of a singular integral operator which associates a meromorphic section of $P$ to its principal part at 0. Explicitly, it is given by
\[ r(z - z') = \frac{1}{n} \sum_{a,b=0}^{n-1} \zeta\!\left(\frac{z - z'}{n} - a - b\tau\right) (\mathrm{Ad}\, I_{a,b} \otimes I) \cdot t \]
where $\zeta$ is the Weierstrass zeta function.

5. Let $\mathfrak{g}$ be an arbitrary semisimple Lie algebra again. Let us equip the loop algebra $L\mathfrak{g}$ with the inner product
\[ \langle\langle X, Y \rangle\rangle_0 = \mathrm{Res}_{z=0} \langle X(z), Y(z) \rangle\, z^{-1} dz \]
Set $N_+ = \mathfrak{n}_+ \dotplus \mathfrak{g} \otimes z\mathbb{C}[[z]]$, $N_- = \mathfrak{n}_- \dotplus \mathfrak{g} \otimes z^{-1}\mathbb{C}[z^{-1}]$. We have $L\mathfrak{g} = N_+ \dotplus \mathfrak{h} \dotplus N_-$, where we identify $\mathfrak{h}, \mathfrak{n}_\pm \subset \mathfrak{g}$ with the corresponding subalgebras of constant loops in $L\mathfrak{g}$. Let $P_\pm$, $P_0$ be the projection operators onto $N_\pm$, $\mathfrak{h}$ in this decomposition and $r_\pm = \pm(P_\pm + (1/2)P_0)$. The classical r-matrices $r_\pm$ define on $L\mathfrak{g}$ the structure of a factorizable Lie bialgebra. The associated tensor kernels are called the trigonometric classical r-matrices.

Classical r-matrices described above are associated with factorization problems in the infinite-dimensional loop groups: matrix Riemann problems or matrix Cousin problems (in the elliptic case). Belavin and Drinfeld have given a complete classification of factorizable Lie bialgebra structures for semisimple Lie algebras; in the loop algebra case, the problem they solved consists of classification of all meromorphic solutions of the classical Yang–Baxter equation. In other words, we assume that the distribution kernel associated with the classical r-matrix is represented by a meromorphic function (of two complex variables). Up to an equivalence, any such solution depends only on one variable and belongs to the rational, trigonometric, or elliptic type (in the last case, the underlying Lie algebra is necessarily $sl(n)$). Classification of solutions in the elliptic case is completely rigid; in the trigonometric case, the moduli space is finite dimensional and admits an explicit description. In the rational case, the classification is somewhat less explicit (it has been completed by A Stolin under some nondegeneracy condition). Contrary to the popular belief, there are many other structures of a factorizable Lie bialgebra on loop algebras, for which the associated r-matrices are given by more singular distribution kernels.

Poisson Lie Groups

If the tangent Lie bialgebra of a Poisson Lie group is of coboundary type, the cocycle $\eta$ is also trivial, $\eta(g) = r - \mathrm{Ad}\, g \otimes \mathrm{Ad}\, g \cdot r$. Hence, the Poisson bracket on $G$ is given by
\[ \{\varphi, \psi\} = \langle r, \nabla'\varphi \wedge \nabla'\psi \rangle - \langle r, \nabla\varphi \wedge \nabla\psi \rangle, \quad r \in \mathfrak{g} \wedge \mathfrak{g} \tag*{[13]} \]
where $\nabla\varphi, \nabla'\varphi \in \mathfrak{g}^*$ are left and right differentials of $\varphi \in C^\infty(G)$. This is the so-called “Sklyanin bracket.” Let us assume that $G$ is a matrix group; its affine ring is generated by evaluation functions $\varphi_{ij}$ which assign to $L \in G$ its matrix coefficients, $\varphi_{ij}(L) = L_{ij}$. The Poisson bracket on $G$ is completely determined by its values on $\varphi_{ij}$. Explicitly, we get
\[ \{\varphi_{ij}, \varphi_{km}\}(L) = [r, L \otimes L]_{ikjm} \tag*{[14]} \]
the commutator in the RHS is in $\mathrm{Mat}(n^2)$. By a variation of language, evaluation functions and their values on a generic element $L \in G$ are denoted by the same letter; using tensor notation to suppress matrix indices, we get
\[ \{L_1, L_2\} = [r, L_1L_2], \quad L_1 = L \otimes I, \quad L_2 = I \otimes L \tag*{[15]} \]
In the case of loop algebras, these Poisson bracket relations take the form
\[ \{L_1(\lambda), L_2(\mu)\} = [r(\lambda, \mu), L_1(\lambda)L_2(\mu)] \]
Let us assume that $G$ is factorizable and the associated factorization problem is globally solvable. The Poisson bracket on the dual group $G^* \simeq$
$i_r(G^*) \subset G \times G$ may be characterized in terms of the matrix coefficients of $(h_+, h_-) = i_r(h)$, or of their quotient $h = h_+ h_-^{-1}$. Explicitly, we get
\[ \{h_\pm^1, h_\pm^2\} = [r, h_\pm^1 h_\pm^2], \quad \{h_+^1, h_-^2\} = [r_+, h_+^1 h_-^2] \tag*{[16]} \]
\[ \{h^1, h^2\} = r h^1 h^2 + h^1 h^2 r - h^2 r_+ h^1 - h^1 r_- h^2, \quad r = \tfrac12 (r_+ + r_-) \tag*{[17]} \]
The key question in the geometry of Poisson groups consists in the description of symplectic leaves in $G$, $G^*$. This question is already nontrivial when $G^*$ is abelian (and hence may be identified with the dual of the Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$). The Poisson bracket on $\mathfrak{g}^*$ is linear; this is the well-known Lie–Poisson (alias, Beresin–Kirillov–Kostant) bracket. Its symplectic leaves coincide with the orbits of the coadjoint representation of $G$ in $\mathfrak{g}^*$. The natural way to prove this fundamental result (which goes back to Lie) is to consider first the natural action of $G$ on the cotangent bundle $T^*G \simeq G \times \mathfrak{g}^*$; this action is Hamiltonian, and the coadjoint orbits arise as a result of Hamiltonian reduction associated with this action. The generalization of the theory of coadjoint orbits to the case of arbitrary Poisson groups starts with the notion of symplectic double, which is the nonlinear analog of the cotangent bundle.

Let $D$ be the double of $(G, G^*)$; assume for simplicity that $D = G \cdot G^*$ globally and hence the associated factorization problem is always solvable. Let $r_{\mathfrak{d}} = (1/2)(P_{\mathfrak{g}} - P_{\mathfrak{g}^*})$. Set
\[ \{\varphi, \psi\}_\pm = \langle r_{\mathfrak{d}} \nabla\varphi, \nabla\psi \rangle \pm \langle r_{\mathfrak{d}} \nabla'\varphi, \nabla'\psi \rangle \tag*{[18]} \]
The bracket $\{\ ,\ \}_-$ is the usual Sklyanin bracket which defines the structure of a Poisson group on $D$, while $\{\ ,\ \}_+$ is nondegenerate and defines a symplectic structure on $D$. Let us denote the copies of $D$ equipped with the bracket $\{\ ,\ \}_\pm$ by $D_\pm$. The bracket on $D_+$ is not multiplicative, but it is covariant with respect to the action of $D_-$ by left and right translations; in other words, the natural mappings $D_- \times D_+ \to D_+$ and $D_+ \times D_- \to D_+$, associated with multiplication in $D$, preserve Poisson brackets. Since $G, G^* \subset D_-$ are Poisson subgroups, natural actions $G \times D_+ \to D_+$ and $G^* \times D_+ \to D_+$ by left and right translations are Poisson mappings. Consider the natural projections
\[ G^* \simeq D/G \xleftarrow{\ \pi\ } D_+ \xrightarrow{\ \pi'\ } G\backslash D \simeq G^*, \qquad G \simeq D/G^* \xleftarrow{\ p\ } D_+ \xrightarrow{\ p'\ } G^*\backslash D \simeq G \]
onto the space of left and right coset classes. It is easy to see that functions on $D_+$ which are constant on each projection fiber are closed with respect to the Poisson bracket. This means that the quotient spaces inherit the Poisson structure. Moreover, the maps $\pi$, $\pi'$ and $p$, $p'$ form the so-called “dual pairs,” that is, the algebras of functions which are constant on the fibers of $\pi$ and $\pi'$ (or of $p$ and $p'$) are mutual centralizers of one another in the big Poisson algebra $F(D_+)$. Since $D = G \cdot G^* = G^* \cdot G$, we have $D/G^* \simeq G$, $D/G \simeq G^*$; it is easy to check that the quotient Poisson structure induced on $G$, $G^*$ coincides with the original one. Applying the fundamental theorem on dual pairs of Poisson mappings (going back to S. Lie), we conclude that symplectic leaves in $G$ and $G^*$, respectively, coincide with the orbits of $G^*$ (respectively, $G$) in these quotient spaces. The actions $G \times G^* \to G^*$, $G^* \times G \to G$ are called “dressing transformations.” Unit elements in $G$ and $G^*$ are fixed points of dressing transformations; their linearizations at the tangent spaces $T_eG^* \simeq \mathfrak{g}^*$, $T_eG \simeq \mathfrak{g}$ coincide with the coadjoint actions of $G$ and $G^*$, respectively.

When $D \neq G \cdot G^*$ (i.e., the factorization problem in $D$ is not always solvable), dressing actions are still well defined as global transformations of the quotient spaces; in this case $G$, $G^*$ may be identified with open cells in $D/G^*$, $D/G$, respectively, which means that the dressing action on $G$, $G^*$ is, in general, incomplete.

If the group $G$ is factorizable, symplectic leaves in the dual group $G^*$ admit a nice uniform description: since in this case $D = G \times G$ and $G \subset D$ is the diagonal subgroup, the quotient $D/G$ may be modeled on $G$ itself. The quotient Poisson bracket in this realization coincides with [17], while the dressing action coincides with conjugation in $G$ (and is independent of $r$). Hence, symplectic leaves in $D/G$ coincide with conjugacy classes in $G$; the equivalence of this model with $G^*$ (equipped with the bracket [16]) is provided by the factorization map. The description of symplectic leaves in $G$ is more subtle (and already crucially depends on the choice of $r$!); for semisimple Lie groups with the standard Poisson structure, it is related to the geometry of double Bruhat cells. For loop groups with rational, trigonometric, or elliptic r-matrices, the dressing action is associated with auxiliary factorization problems in the loop group. Roughly speaking, symplectic leaves correspond to rational loops with prescribed singularities. Many important examples have been described in connection with integrable lattice systems, although a complete classification theorem is still not available. For $\mathfrak{g} = sl(2)$, the elliptic Manin triple described earlier leads to the Poisson structure on the group of “elliptic loops” with values in $SL(2)$; its simplest symplectic leaves (corresponding to loops with simple poles) are associated with a remarkable Poisson algebra, the Sklyanin algebra (with four generators and two Casimir functions), which admits an interesting explicit quantization.
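The Sklyanin brackets [13]–[17] and the description of symplectic leaves above rest on a factorizable r-matrix, that is, on a solution of eqn [9]. As a sanity check, the following sketch verifies eqns [8] and [9] numerically for $sl(2)$. It is an illustration under stated assumptions rather than part of the original text: the inner product is taken to be $\mathrm{tr}(XY)$, so that $t = E \otimes F + F \otimes E + \tfrac12 H \otimes H$; the skew r-matrix is normalized as $r = \tfrac12 (E \otimes F - F \otimes E)$; and the associative algebra $A$ is $\mathrm{Mat}(2)$. All function names are hypothetical.

```python
import numpy as np

# Standard basis of sl(2) in the 2x2 matrix realization (A = Mat(2)).
E = np.array([[0.0, 1.0], [0.0, 0.0]])
F = np.array([[0.0, 0.0], [1.0, 0.0]])
H = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)


def comm(x, y):
    """Matrix commutator [x, y]."""
    return x @ y - y @ x


def embed(terms, i, j):
    """Image of sum_k c_k * a_k (x) b_k in slots i < j of A (x) A (x) A (8x8 matrix)."""
    total = np.zeros((8, 8))
    for c, a, b in terms:
        factors = [I2, I2, I2]
        factors[i], factors[j] = a, b
        total += c * np.kron(np.kron(factors[0], factors[1]), factors[2])
    return total


def yang_baxter(terms):
    """Yang-Baxter tensor [[r, r]] = [r12, r13] + [r12, r23] + [r13, r23], eqn [8]."""
    r12, r13, r23 = embed(terms, 0, 1), embed(terms, 0, 2), embed(terms, 1, 2)
    return comm(r12, r13) + comm(r12, r23) + comm(r13, r23)


def total_action(X):
    """X (x) I (x) I + I (x) X (x) I + I (x) I (x) X."""
    return (embed([(1.0, X, I2)], 0, 1)
            + embed([(1.0, X, I2)], 1, 2)
            + embed([(1.0, I2, X)], 1, 2))


# Tensors as lists of (coefficient, first factor, second factor).
t = [(1.0, E, F), (1.0, F, E), (0.5, H, H)]      # tensor Casimir for tr(XY) (assumed)
r = [(0.5, E, F), (-0.5, F, E)]                  # skew r-matrix (assumed normalization)
r_plus = r + [(0.5 * c, a, b) for c, a, b in t]  # r_+ = r + t/2
r_minus = r + [(-0.5 * c, a, b) for c, a, b in t]  # r_- = r - t/2

ybt = yang_baxter(r)
rhs = 0.25 * comm(embed(t, 0, 1), embed(t, 1, 2))  # c * [t12, t23] with c = 1/4

checks = {
    "[[r+, r+]] = 0": np.allclose(yang_baxter(r_plus), 0),
    "[[r-, r-]] = 0": np.allclose(yang_baxter(r_minus), 0),
    "[[r, r]] = (1/4) [t12, t23]": np.allclose(ybt, rhs),
    "[[r, r]] is nonzero": not np.allclose(ybt, 0),
    "[[r, r]] is ad-invariant": all(
        np.allclose(comm(total_action(X), ybt), 0) for X in (E, F, H)
    ),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

With these normalizations $r_+ = E \otimes F + \tfrac14 H \otimes H$, and the last check is the invariance condition that guarantees the Jacobi identity for the dual bracket.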

Dressing action is a nontrivial example of a Poisson group action. In general, such actions are not Hamiltonian in the usual sense; the appropriate generalization is provided by the notion of the nonabelian moment map. Let $G \times M \to M$ be an action of a Poisson group $G$ on a Poisson manifold $M$, $\mathfrak{g} \to \mathrm{Vect}\, M$ the associated homomorphism of Lie algebras. A mapping $\mu : M \to G^*$ is called the nonabelian moment map associated with this action, if for any $X \in \mathfrak{g}$ and $\varphi \in F(M)$, we have
\[ X \cdot \varphi = \langle \mu^{-1}\{\mu, \varphi\}_M, X \rangle \]
In this case, $G \times M \to M$ is a fortiori a Poisson map. Both dressing actions $G \times G^* \to G^*$ and $G^* \times G \to G$ admit nonabelian moment maps, which are just the identity maps $\mu = \mathrm{id}_{G^*}$ and $\mu^* = \mathrm{id}_G$. For compact Poisson groups, the nonabelian moment map has good convexity properties, which generalize the convexity properties of the ordinary moment map for Hamiltonian group actions.

The general theory of homogeneous Poisson spaces has some peculiarities. Typically, the $G$-covariant Poisson structure on a given homogeneous space is not unique (when it exists); this is true already for principal homogeneous spaces (a simple example is provided by the symplectic double $D_+$). Let $G$ be a Poisson Lie group, $(\mathfrak{g}, \mathfrak{g}^*)$ its tangent Lie bialgebra, $\mathfrak{d}$ its double, $U$ its Lie subgroup, $\mathfrak{u} = \mathrm{Lie}\, U$. A subalgebra $\mathfrak{l} \subset \mathfrak{d}$ is called Lagrangian if it is isotropic with respect to the canonical inner product in $\mathfrak{d}$. The general classification result, according to Drinfeld, asserts that there is a bijection between $G$-covariant Poisson structures on $G/U$ and the set of all Lagrangian subalgebras $\mathfrak{l} \subset \mathfrak{d}$ such that $\mathfrak{l} \cap \mathfrak{g} = \mathfrak{u}$. Various nontrivial examples arise, notably in the study of integrable systems. For instance, the geometric proof of the factorization theorem for lattice zero-curvature equations, which is stated in the following section, uses a different Poisson structure on the double (the so-called “twisted symplectic double”).

Applications to Integrable Systems

The definition of Poisson–Lie groups was motivated by key examples which arise in the theory of integrable systems. In applications, one often deals with nonlinear differential equations which may be written in the form of the so-called “lattice zero curvature equations”
\[ \frac{dL_m}{dt} = L_m M_m - M_{m+1} L_m, \quad m \in \mathbb{Z} \tag*{[19]} \]
where $L_m$, $M_m$ are matrices, possibly depending on an additional parameter (or, more generally, abstract linear operators). Equations [19] give the compatibility conditions for the auxiliary linear system
\[ \psi_{m+1} = L_m \psi_m, \quad \frac{d\psi_m}{dt} = M_m \psi_m, \quad m \in \mathbb{Z} \tag*{[20]} \]
The use of finite-difference operators associated with a one-dimensional lattice, as in [20], is particularly well suited for the study of “multiparticle” lattice models. Let us assume that the “potential” $L_m$ in [20] is periodic, $L_{m+N} = L_m$; the period $N$ may be interpreted as the number of copies of an “elementary” system. It is natural to presume that “Lax matrices” $L_m$ in [19] are elements of a matrix Lie group $G$ (or of a loop group, if they depend on an extra parameter). The auxiliary linear problem [20] leads to a family of dynamical systems on $G^N$ which remain integrable for any $N$. Let $T : G^N \to G$ be the “monodromy map” which assigns to the set $L_1, \ldots, L_N$ of local Lax matrices their ordered product $T_L = L_N L_{N-1} \cdots L_1$. Let us assume that $G$ is equipped with the Sklyanin bracket associated with a factorizable r-matrix $r$. Then $T$ is a Poisson map. Let $I(G)$ be the algebra of central functions on $G$; for $\varphi \in I(G)$, set $H_\varphi = \varphi \circ T$. All functions $H_\varphi$, $\varphi \in I(G)$, are in involution with respect to the product Poisson bracket on $G^N$ and give rise to lattice zero-curvature equations of the same form as [19]; for a given $\varphi$, we may choose the M-matrix in either of the two forms:
\[ M_m^\pm = r_\pm\!\left( \psi_m \nabla\varphi(T_L)\, \psi_m^{-1} \right), \quad \psi_m = \overleftarrow{\prod_{1 \le k < m}} L_k \]
Let $L_m(t)$, $m = 1, \ldots, N$, be the integral curve of this equation which starts at $L_m^0$. The construction of this curve reduces to the factorization problem associated with the chosen r-matrix. Explicitly, we get
\[ L_m(t) = g_{m+1}(t)_+^{-1} L_m^0\, g_m(t)_+ = g_{m+1}(t)_-^{-1} L_m^0\, g_m(t)_- \]
where $(g_m(t)_+, g_m(t)_-)$ is the curve in $G^*$ which solves the factorization problem
\[ g_m(t)_+ g_m(t)_-^{-1} = \psi_m^0 \exp\!\left( t \nabla\varphi(T(L^0)) \right) (\psi_m^0)^{-1}, \quad \psi_m^0 = \psi_m(L^0) \]
This result exhibits the double role of the r-matrix. On the one hand, it serves to define the Poisson structure on $G^N$ which is adapted to the study of lattice zero-curvature equations; in particular, the dynamical flow associated with these equations is automatically confined to symplectic leaves in $G^N$. (In applications, $G$ is usually a loop group equipped with a factorizable r-matrix; despite the fact that $\dim G = \infty$, it admits plenty of finite-dimensional symplectic leaves.) In its second incarnation, the r-matrix serves to define the factorization problem which solves these zero-curvature equations. In the loop
group case, this is a matrix Riemann problem; its explicit solution is based on the study of the spectral curve associated with the “monodromy matrix” $T_L$ and uses the technique of algebraic geometry.

The monodromy map $T : G^N \to G$ may be regarded as a nonabelian moment map associated with an action of the dual Lie algebra $\mathfrak{g}^*$ on the phase space. This action actually extends to an action of the (local) Lie group $G^*$ which transforms solutions into solutions again. This is the prototype “dressing” action (originally defined by Zakharov and Shabat in their study of zero-curvature equations related to Riemann–Hilbert problems). Dressing provides an effective tool to produce new solutions of zero-curvature equations from the “trivial” ones; it was also the first nontrivial example of a Poisson group action.

See also: Affine Quantum Groups; Bicrossproduct Hopf Algebras and Noncommutative Spacetime; Bi-Hamiltonian Methods in Soliton Theory; Deformations of the Poisson Bracket on a Symplectic Manifold; Functional Equations and Integrable Systems; Hamiltonian Fluid Dynamics; Hopf Algebras and q-Deformation Quantum Groups; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Integrable Systems: Overview; Lie, Symplectic and Poisson Groupoids, and their Lie Algebroids; Multi-Hamiltonian Systems; Poisson Reduction; Recursion Operators in Classical Mechanics; Toda Lattices; Yang–Baxter Equations.

Further Reading

Babelon O, Bernard D, and Talon M (2003) Introduction to Classical Integrable Systems. Cambridge: Cambridge University Press.
Belavin AA and Drinfel’d VG (1984) Triangle equations and simple Lie algebras. In: Mathematical Physics Reviews, vol. 4, Soviet Scientific Reviews Section C Mathematical Physics Reviews, pp. 93–165. Chur: Harwood Academic Publishers. Reprinted in 1998, Classic Reviews in Mathematics and Mathematical Physics, vol. 1. Amsterdam: Harwood Academic Publishers.
Chari V and Pressley A (1995) A Guide to Quantum Groups. Cambridge: Cambridge University Press.
Drinfeld VG (1987) Quantum groups. In: Proceedings of the International Congress of Mathematicians (Berkeley, Calif., 1986), vol. 1, pp. 798–820. Providence, RI: American Mathematical Society.
Etingof P and Schiffmann O (1998) Lectures on Quantum Groups. Boston: International Press.
Frenkel E, Reshetikhin N, and Semenov-Tian-Shansky MA (1998) Drinfeld–Sokolov reduction for difference operators and deformations of W-algebras. I. The case of Virasoro algebra. Communications in Mathematical Physics 192(3): 605–629.
Lu J-H (1991) Momentum mappings and reduction of Poisson actions. In: Symplectic Geometry, Groupoids, and Integrable Systems (Berkeley, CA, 1989), Mathematical Sciences Research Institute Publications, vol. 20, pp. 209–226. New York: Springer.
Lu J-H and Weinstein A (1990) Poisson–Lie groups, dressing transformations, and Bruhat decompositions. Journal of Differential Geometry 31(2): 501–526.
Reshetikhin N (2000) Characteristic systems on Poisson–Lie groups and their quantization. In: Integrable Systems: From Classical to Quantum (Montréal, QC, 1999), CRM Proceedings Lecture Notes, vol. 26, pp. 165–188. Providence, RI: American Mathematical Society.
Reshetikhin NY and Semenov-Tian-Shansky MA (1990) Central extensions of quantum current groups. Letters in Mathematical Physics 19(2): 133–142.
Reyman AG and Semenov-Tian-Shansky MA (1994) Group-theoretical methods in the theory of finite-dimensional integrable systems. In: Encyclopaedia of Mathematical Sciences, Dynamical Systems VII, ch. 2, vol. 16, pp. 116–225. Berlin: Springer.
Semenov-Tian-Shansky MA (1994) Lectures on R-matrices, Poisson–Lie groups and integrable systems. In: Babelon O, Cartier P, and Kosmann-Schwarzbach Y (eds.) Lectures on Integrable Systems (Sophia-Antipolis, 1991), pp. 269–317. River Edge: World Scientific.
Terng C-L and Uhlenbeck K (1998) Poisson actions and scattering theory for integrable systems. In: Terng C-L and Uhlenbeck K (eds.) Surveys in Differential Geometry: Integrable Systems, Surveys in Differential Geometry IV (a supplement to the Journal of Differential Geometry), pp. 315–402. Boston: International Press.

Clifford Algebras and Their Representations


A Trautman, Warsaw University, Warsaw, Poland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Introductory and Historical Remarks

Clifford (1878) introduced his “geometric algebras” as a generalization of Grassmann algebras, complex numbers, and quaternions. Lipschitz (1886) was the first to define groups constructed from “Clifford numbers” and use them to represent rotations in a Euclidean space. Cartan discovered representations of the Lie algebras $so_n(\mathbb{C})$ and $so_n(\mathbb{R})$, $n > 2$, that do not lift to representations of the orthogonal groups. In physics, Clifford algebras and spinors appear for the first time in Pauli’s nonrelativistic theory of the “magnetic electron.” Dirac (1928), in his work on the relativistic wave equation of the electron, introduced matrices that provide a representation of the Clifford algebra of Minkowski space. Brauer and Weyl (1935) connected the Clifford and Dirac ideas with Cartan’s spinorial representations of Lie algebras; they found, in any number of dimensions, the spinorial, projective representations of the orthogonal groups.

Clifford algebras and spinors are implicit in Euclid’s solution of the Pythagorean equation $x^2 - y^2 + z^2 = 0$, which is equivalent to

\[ \begin{pmatrix} y - x & z \\ z & y + x \end{pmatrix} = 2 \begin{pmatrix} p \\ q \end{pmatrix} \begin{pmatrix} p & q \end{pmatrix} \qquad [1] \]

and gives $x = q^2 - p^2$, $y = p^2 + q^2$, $z = 2pq$. If the numbers appearing in [1] are real, then this equation can be interpreted as providing a representation of a vector $(x, y, z) \in \mathbb{R}^3$, null with respect to a quadratic form of signature (1, 2), as the “square” of a spinor $(p, q) \in \mathbb{R}^2$. The pure spinors of Cartan (1938) provide a generalization of this observation to higher dimensions.

Multiplying the square matrix in [1] on the left by a real, $2 \times 2$ unimodular matrix, on the right by its transpose, and taking the determinant, one arrives at the exact sequence of group homomorphisms:

\[ 1 \to \mathbb{Z}_2 \to \mathrm{SL}_2(\mathbb{R}) = \mathrm{Spin}^0_{1,2} \to \mathrm{SO}^0_{1,2} \to 1 \]

Multiplying the same matrix by

\[ \varepsilon = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \qquad [2] \]

on the left and computing the square of the product, one obtains

\[ \begin{pmatrix} z & x + y \\ x - y & -z \end{pmatrix}^2 = (x^2 - y^2 + z^2) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]

This equation is an illustration of the idea of representing a quadratic form as the square of a linear form in a Clifford algebra. Replacing $y$ by $iy$, one arrives at complex spinors, the Pauli matrices,

\[ \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = i\varepsilon, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

$\mathrm{Spin}_3 = \mathrm{SU}_2$, etc.

This article reviews Clifford algebras, the associated groups, and their representations, for quadratic spaces over complex or real numbers. These notions have been generalized by Chevalley (1954) to quadratic spaces over arbitrary number fields.

Notation

If $S$ is a vector space over $K = \mathbb{R}$ or $\mathbb{C}$, then $S^*$ denotes its dual, that is, the vector space over $K$ of all $K$-linear maps from $S$ to $K$. The value of $\omega \in S^*$ on $s \in S$ is sometimes written as $\langle s, \omega \rangle$. The transpose of a linear map $f : S_1 \to S_2$ is the map $f^* : S_2^* \to S_1^*$ defined by $\langle s, f^*(\omega) \rangle = \langle f(s), \omega \rangle$ for every $s \in S_1$ and $\omega \in S_2^*$. If $S_1$ and $S_2$ are complex vector spaces, then a map $f : S_1 \to S_2$ is said to be semilinear if it is $\mathbb{R}$-linear and $f(is) = -if(s)$. The complex conjugate of a finite-dimensional complex vector space $S$ is the complex vector space $\bar{S}$ of all semilinear maps from $S^*$ to $\mathbb{C}$. There is a natural semilinear isomorphism (complex conjugation) $S \to \bar{S}$, $s \mapsto \bar{s}$ such that $\langle \omega, \bar{s} \rangle = \overline{\langle s, \omega \rangle}$ for every $\omega \in S^*$. The space $\bar{\bar{S}}$ can be identified with $S$ and then $\bar{\bar{s}} = s$. The spaces $(\bar{S})^*$ and $\overline{S^*}$ are identified. If $f : S_1 \to S_2$ is a complex-linear map, then there is the complex-conjugate map $\bar{f} : \bar{S}_1 \to \bar{S}_2$ given by $\bar{f}(\bar{s}) = \overline{f(s)}$ and the Hermitian conjugate map $f^\dagger \stackrel{\mathrm{def}}{=} \bar{f}^* : \bar{S}_2^* \to \bar{S}_1^*$. A linear map $A : S \to \bar{S}^*$ such that $A^\dagger = A$ is said to be Hermitian. $K(N)$ denotes, for $K = \mathbb{R}$, $\mathbb{C}$ or $\mathbb{H}$, the set of all $N$ by $N$ matrices with elements in $K$.

Real, Complex, and Quaternionic Structures

A real structure on a complex vector space $S$ is a complex-linear map $C : S \to \bar{S}$ such that $\bar{C}C = \mathrm{id}_S$. A vector $s \in S$ is said to be real if $\bar{s} = C(s)$. The set of all real vectors is a real vector space; its real dimension is the same as the complex dimension of $S$.

A complex-linear map $C : S \to \bar{S}$ such that $\bar{C}C = -\mathrm{id}_S$ defines on $S$ a quaternionic structure; a necessary condition for such a structure to exist is that the complex dimension $m$ of $S$ be even, $m = 2n$, $n \in \mathbb{N}$. The space $S$ with a quaternionic structure can be made into a right vector space over the field $\mathbb{H}$ of quaternions. In the context of quaternions, it is convenient to represent the imaginary unit of $\mathbb{C}$ as $\sqrt{-1}$. Multiplication on the right by the quaternion unit $i$ is realized as the multiplication (on the left) by $\sqrt{-1}$. If $j$ and $k = ij$ are the other two quaternion units and $s \in S$, then one puts $sj = \bar{C}(\bar{s})$ and $sk = sij$.

A real vector space $S$ can be complexified by forming the tensor product $\mathbb{C} \otimes_{\mathbb{R}} S = S \oplus iS$. The realification of a complex vector space $S$ is the real vector space having $S$ as its set of vectors so that $\dim_{\mathbb{R}} S = 2 \dim_{\mathbb{C}} S$. The complexification of a realification of $S$ is the “double” $S \oplus \bar{S}$ of the original space.

Inner-Product Spaces and Their Groups

Definitions: quadratic and symplectic spaces. A bilinear map $B : S \times S \to K$ on a vector space $S$ over $K$ is said to make $S$ into an inner-product space. To save on notation, one also writes $B : S \to S^*$ so that $\langle s, B(t) \rangle = B(s, t)$ for all $s, t \in S$. The group of automorphisms of an inner-product space,

\[ \mathrm{Aut}(S, B) = \{ R \in \mathrm{GL}(S) \mid R^* \circ B \circ R = B \} \]

is a Lie subgroup of the general linear group $\mathrm{GL}(S)$. An inner-product space $(S, B)$ is said here to be

quadratic (resp., symplectic) if $B$ is symmetric (resp., antisymmetric and nonsingular). A quadratic space is characterized by its quadratic form $s \mapsto B(s, s)$. For $K = \mathbb{C}$, a Hermitian map $A : S \to \bar{S}^*$ defines a Hermitian scalar product $A(s, t) = \langle \bar{s}, A(t) \rangle$.

An orthogonal space is defined here as a quadratic space $(S, B)$ such that $B : S \to S^*$ is an isomorphism. The group of automorphisms of an orthogonal space is the orthogonal group $\mathrm{O}(S, B)$. The group of automorphisms of a symplectic space is the symplectic group $\mathrm{Sp}(S, B)$. The dimension of a symplectic space is even. If $S = K^{2n}$ is a symplectic space over $K = \mathbb{R}$ or $\mathbb{C}$, then its symplectic group is denoted by $\mathrm{Sp}_{2n}(K)$. Two quaternionic symplectic groups appear in the list of spin groups of low-dimensional spaces:

\[ \mathrm{Sp}_2(\mathbb{H}) = \{ a \in \mathbb{H}(2) \mid a^\dagger a = I \} \]

and

\[ \mathrm{Sp}_{1,1}(\mathbb{H}) = \{ a \in \mathbb{H}(2) \mid a^\dagger \sigma_z a = \sigma_z \} \]

Here $a^\dagger$ denotes the matrix obtained from $a$ by transposition and quaternionic conjugation.

Contractions, frames, and orthogonality. From now on, unless otherwise specified, $(V, g)$ is a quadratic space of dimension $m$. Let $\wedge V = \bigoplus_{p=0}^{m} \wedge^p V$ be its exterior (Grassmann) algebra. For every $v \in V$ and $w \in \wedge V$ there is the contraction $g(v) \,\lrcorner\, w$ characterized as follows. The map $V \times \wedge V \to \wedge V$, $(v, w) \mapsto g(v) \,\lrcorner\, w$, is bilinear; if $x \in \wedge^p V$, then $g(v) \,\lrcorner\, (x \wedge w) = (g(v) \,\lrcorner\, x) \wedge w + (-1)^p x \wedge (g(v) \,\lrcorner\, w)$ and $g(v) \,\lrcorner\, v = g(v, v)$.

A frame $(e_\mu)$ in a quadratic space $(V, g)$ is said to be a quadratic frame if $\mu \neq \nu$ implies $g(e_\mu, e_\nu) = 0$.

For every subset $W$ of $V$ there is the orthogonal subspace $W^\perp$ containing all vectors that are orthogonal to every element of $W$.

If $(V, g)$ is a real orthogonal space, then there is an orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, in $V$ such that $k$ frame vectors have squares equal to $-1$, $l$ frame vectors have squares equal to $1$ and $k + l = m$. The pair $(k, l)$ is the signature of $g$. The quadratic form $g$ is said to be neutral if the orthogonal space $(V, g)$ admits two maximal totally null subspaces $W$ and $W'$ such that $V = W \oplus W'$. Such a space $V$ is $2n$-dimensional, either complex or real with $g$ of signature $(n, n)$. A Lorentzian space has maximal totally null subspaces of dimension 1 and a Euclidean space, characterized by a definite quadratic form, has no null subspaces. The Minkowski space is a Lorentzian space of dimension 4.

If $(V, g)$ is a complex orthogonal space, then an orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, can be chosen in $V$ so that, defining $g_{\mu\nu} = g(e_\mu, e_\nu)$, one has $g_{\mu\mu} = (-1)^{\mu+1}$ and, if $\mu \neq \nu$, then $g_{\mu\nu} = 0$.

If $A : S \to \bar{S}^*$ is a Hermitian isomorphism, then there is a (pseudo)unitary frame $(e_\alpha)$ in $S$ such that the matrix $A_{\bar{\alpha}\beta} = A(e_\alpha, e_\beta)$ is diagonal, has $p$ 1’s and $q$ $-1$’s on the diagonal, $p + q = \dim S$. If $p = q$, then $A$ is said to be neutral. $A$ is definite if either $p$ or $q = 0$.

Algebras

Definitions. An algebra over $K$ is a vector space $A$ over $K$ with a bilinear map $A \times A \to A$, $(a, b) \mapsto ab$, which is distributive with respect to addition. The algebra is associative if $(ab)c = a(bc)$ holds for all $a, b, c \in A$. It is commutative if $ab = ba$ for all $a, b \in A$. An element $1_A$ is the unit of $A$ if $1_A a = a 1_A = a$ holds for every $a \in A$.

From now on, unless otherwise specified, the bare word algebra denotes a finite-dimensional, associative algebra over $K = \mathbb{R}$ or $\mathbb{C}$, with a unit element. If $S$ is an $N$-dimensional vector space over $K$, then the set $\mathrm{End}\, S$ of all endomorphisms of $S$ is an $N^2$-dimensional algebra over $K$, the product being defined by composition; if $f, g \in \mathrm{End}\, S$, then one writes $fg$ instead of $f \circ g$; the unit of $\mathrm{End}\, S$ is the identity map $I$. By definition, homomorphisms of algebras map units into units. The map $K \to A$, $a \mapsto a 1_A$ is injective and one identifies $K$ with its image in $A$ by this map so that the unit can be represented by $1 \in K \subset A$. A set $B \subset A$ is said to generate $A$ if every element of $A$ can be represented as a linear combination of products of elements of $B$. For example, if $V$ is a vector space over $K$, then its tensor algebra

\[ \mathcal{T}(V) = \bigoplus_{p=0}^{\infty} \otimes^p V \]

is an (infinite-dimensional) algebra over $K$ generated by $K \oplus V$. The algebra of all $N \times N$ matrices with entries in an algebra $A$ is denoted by $A(N)$. Its unit element is the unit matrix $I$. In particular, $\mathbb{R}(N)$, $\mathbb{C}(N)$, and $\mathbb{H}(N)$ are algebras over $\mathbb{R}$. The algebra $\mathbb{R}(2)$ is generated by the set $\{\sigma_x, \sigma_z\}$. As a vector space, the algebra $\mathbb{R}(2)$ is spanned by the set $\{I, \sigma_x, \varepsilon, \sigma_z\}$.

The direct sum $A \oplus B$ of the algebras $A$ and $B$ over $K$ is an algebra over $K$ such that its underlying vector space is $A \oplus B$ and the product is defined by $(a, b) \cdot (a', b') = (aa', bb')$ for every $a, a' \in A$ and $b, b' \in B$. Similarly, the product in the tensor product algebra $A \otimes_K B$ is defined by

\[ (a \otimes b) \cdot (a' \otimes b') = aa' \otimes bb' \qquad [3] \]
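The product rule [3] can be checked numerically in the special case where both factors are matrix algebras. The following Python sketch assumes that the tensor product of matrices is modelled by the Kronecker product, so that products of elements of $\mathbb{R}(2)$ and $\mathbb{R}(3)$ land in $\mathbb{R}(6)$; the helper names `matmul`, `kron`, and `tensor_product_rule_holds`, as well as the sample matrices, are illustrative choices and are not taken from the article.

```python
def matmul(a, b):
    """Multiply two square matrices of equal size, given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker product of square matrices (assumed model of the tensor product)."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def tensor_product_rule_holds(a, a2, b, b2):
    """Check (a (x) b)(a2 (x) b2) == (a a2) (x) (b b2), i.e. rule [3]."""
    return matmul(kron(a, b), kron(a2, b2)) == kron(matmul(a, a2), matmul(b, b2))


# Arbitrary integer sample matrices: A1, A2 in R(2) and B1, B2 in R(3).
A1 = [[1, 2], [3, 4]]
A2 = [[0, 1], [-1, 5]]
B1 = [[1, 0, 2], [0, 3, 0], [4, 0, 1]]
B2 = [[2, 1, 0], [1, 0, 1], [0, 1, 2]]

RULE_OK = tensor_product_rule_holds(A1, A2, B1, B2)
PRODUCT_SIZE = len(kron(A1, B1))  # number of rows of the Kronecker product
```

Integer entries are used so that the comparison of both sides is exact rather than subject to floating-point rounding.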

For example, if $A$ is an algebra over $\mathbb{R}$, then the tensor product algebra $\mathbb{R}(N) \otimes_{\mathbb{R}} A$ is isomorphic to $A(N)$ and

\[ K(N) \otimes_K K(N') = K(NN') \qquad [4] \]

for $K = \mathbb{R}$ or $\mathbb{C}$ and $N, N' \in \mathbb{N}$. There are isomorphisms of algebras over $\mathbb{R}$:

\[ \begin{aligned} \mathbb{C} \otimes_{\mathbb{R}} \mathbb{C} &= \mathbb{C} \oplus \mathbb{C} \\ \mathbb{C} \otimes_{\mathbb{R}} \mathbb{H} &= \mathbb{C}(2) \\ \mathbb{H} \otimes_{\mathbb{R}} \mathbb{H} &= \mathbb{R}(4) \end{aligned} \qquad [5] \]

An algebra over $\mathbb{R}$ can be complexified by complexifying its underlying vector space; it follows from [5] that $\mathbb{C}(2)$ is the complex algebra obtained by complexification of the real algebra $\mathbb{H}$.

The center of an algebra $A$ is the set

\[ Z(A) = \{ a \in A \mid ab = ba \ \ \forall\, b \in A \} \]

The center is a commutative subalgebra containing $K$. An algebra over $K$ is said to be central if its center coincides with $K$. The algebras $\mathbb{R}(N)$ and $\mathbb{H}(N)$ are central over $\mathbb{R}$. The algebra $\mathbb{C}(N)$ is central over $\mathbb{C}$, but not over $\mathbb{R}$.

Simplicity and representations. Let $B_1$ and $B_2$ be subsets of the algebra $A$. Define $B_1 B_2 = \{ b_1 b_2 \mid b_1 \in B_1, b_2 \in B_2 \}$. A vector subspace $B$ of $A$ is said to be a left (resp., right) ideal of $A$ if $AB \subset B$ (resp., $BA \subset B$). A two-sided ideal, or simply an ideal, is a left and right ideal. An algebra $A \neq \{0\}$ is said to be simple if its only two-sided ideals are $\{0\}$ and $A$. For example, the algebras $\mathbb{R}(N)$ and $\mathbb{H}(N)$ are simple over $\mathbb{R}$; the algebra $\mathbb{C}(N)$ is simple when considered as an algebra over both $\mathbb{R}$ and $\mathbb{C}$; every associative, finite-dimensional simple algebra over $\mathbb{R}$ or $\mathbb{C}$ is isomorphic to one of them.

A representation of an algebra $A$ over $K$ in a vector space $S$ over $K$ is a homomorphism of algebras $\rho : A \to \mathrm{End}\, S$. If $\rho$ is injective, then the representation is said to be faithful. For example, the regular representation $\rho : A \to \mathrm{End}\, A$ of an algebra $A$, defined by $\rho(a)b = ab$ for all $a, b \in A$, is faithful. A vector subspace $T$ of the vector space $S$ carrying a representation $\rho$ of $A$ is said to be invariant for $\rho$ if $\rho(a)T \subset T$ for every $a \in A$; it is proper if distinct from both $\{0\}$ and $S$. For example, a left ideal of $A$ is invariant for the regular representation. Given an invariant subspace $T$ of $\rho$ one can reduce $\rho$ to $T$ by forming the representation $\rho_T : A \to \mathrm{End}\, T$, where $\rho_T(a)s = \rho(a)s$ for every $a \in A$ and $s \in T$. A representation is irreducible if it has no proper invariant subspaces. A linear map $F : S_1 \to S_2$ is said to intertwine the representations $\rho_1 : A \to \mathrm{End}\, S_1$ and $\rho_2 : A \to \mathrm{End}\, S_2$ if $F\rho_1(a) = \rho_2(a)F$ holds for every $a \in A$. If $F$ is an isomorphism, then the representations $\rho_1$ and $\rho_2$ are said to be equivalent, $\rho_1 \sim \rho_2$. The following two propositions are classical:

Proposition (A)
(i) An algebra over $K$ is simple if and only if it admits a faithful irreducible representation in a vector space over $K$. Such a representation is unique, up to equivalence.
(ii) The complexification of a central simple algebra over $\mathbb{R}$ is a central simple algebra over $\mathbb{C}$.

For real algebras, one often considers complex representations, that is, representations in complex vector spaces. Two such representations $\rho_1 : A \to \mathrm{End}\, S_1$ and $\rho_2 : A \to \mathrm{End}\, S_2$ are said to be complex equivalent if there is a complex isomorphism $F : S_1 \to S_2$ intertwining the representations; they are real equivalent if there is an isomorphism among the realifications of $S_1$ and $S_2$, intertwining the representations. For example, $\mathbb{C}$, considered as an algebra over $\mathbb{R}$, has two complex-inequivalent representations in $\mathbb{C}$: the identity representation and its complex conjugate. The realifications of these representations, given by $i \mapsto \varepsilon$ and $i \mapsto -\varepsilon$, respectively, are real equivalent: they are intertwined by $\sigma_z$. The real algebra $\mathbb{H}$, being central simple, has only one, up to complex equivalence, representation in $\mathbb{C}^2$: every such representation is equivalent to the one given by

\[ i \mapsto \sigma_x / \sqrt{-1}, \quad j \mapsto \sigma_y / \sqrt{-1}, \quad k \mapsto \sigma_z / \sqrt{-1} \]

This representation extends to an injective homomorphism of algebras $i : \mathbb{H}(N) \to \mathbb{C}(2N)$ which is used to define the quaternionic determinant of a matrix $a \in \mathbb{H}(N)$ as $\det_{\mathbb{H}}(a) = \det i(a)$, so that $\det_{\mathbb{H}}(a) \geq 0$ and $\det_{\mathbb{H}}(ab) = \det_{\mathbb{H}}(a) \det_{\mathbb{H}}(b)$ for every $a, b \in \mathbb{H}(N)$. In particular, if $q \in \mathbb{H}$ and $\lambda, \mu \in \mathbb{R}$, then $\det_{\mathbb{H}}(q) = \bar{q}q$ and

\[ \det\nolimits_{\mathbb{H}} \begin{pmatrix} \lambda & -\bar{q} \\ q & \mu \end{pmatrix} = (\lambda\mu + \bar{q}q)^2 \qquad [6] \]

There are quaternionic unimodular groups $\mathrm{SL}_N(\mathbb{H}) = \{ a \in \mathbb{H}(N) \mid \det_{\mathbb{H}}(a) = 1 \}$. For example, the group $\mathrm{SL}_1(\mathbb{H})$ is isomorphic to $\mathrm{SU}_2$ and $\mathrm{SL}_2(\mathbb{H})$ is a noncompact, 15-dimensional Lie group, one of the spin groups in six dimensions.

Antiautomorphisms and inner products. An automorphism of an algebra $A$ is a linear isomorphism $\alpha : A \to A$ such that $\alpha(ab) = \alpha(a)\alpha(b)$. An invertible element $c \in A$ defines an inner automorphism $\mathrm{Ad}(c) \in \mathrm{GL}(A)$, $\mathrm{Ad}(c)a = cac^{-1}$. Complex conjugation in $\mathbb{C}$, considered as an algebra over $\mathbb{R}$, is an automorphism that is not inner. An antiautomorphism of an

algebra $A$ is a linear isomorphism $\beta : A \to A$ such that $\beta(ab) = \beta(b)\beta(a)$ for all $a, b \in A$. An (anti)automorphism $\beta$ is involutive if $\beta^2 = \mathrm{id}$. For example, conjugation of quaternions defines an involutive antiautomorphism of $\mathbb{H}$.

Let $\rho : A \to \mathrm{End}\, S$ be a representation of an algebra with an involutive antiautomorphism $\beta$. There is then the contragredient representation $\check{\rho} : A \to \mathrm{End}\, S^*$ given by $\check{\rho}(a) = (\rho(\beta(a)))^*$. If, moreover, $A$ is central simple and $\rho$ is faithful irreducible, then there is an isomorphism $B : S \to S^*$ intertwining $\rho$ and $\check{\rho}$ which is either symmetric, $B^* = B$, or antisymmetric, $B^* = -B$. It defines on $S$ the structure of an inner-product space. This structure extends to $\mathrm{End}\, S$: there is a symmetric isomorphism $B \otimes B^{-1} : \mathrm{End}\, S \to (\mathrm{End}\, S)^* = \mathrm{End}\, S^*$ given, for every $f \in \mathrm{End}\, S$, by $(B \otimes B^{-1})(f) = BfB^{-1}$.

Let $K^\times = K \setminus \{0\}$ be the multiplicative group of the field $K$. Given a simple algebra $A$ with an involutive antiautomorphism $\beta$, one defines $N(a) = \beta(a)a$ and the group

\[ G(\beta) = \{ a \in A \mid N(a) \in K^\times \} \]

Let $\rho : A \to \mathrm{End}\, S$ be the faithful irreducible representation as above, then, for $a \in G(\beta)$ and $s, t \in S$, one has

\[ B(\rho(a)s, \rho(a)t) = N(a)B(s, t) \]

If $a \in G(\beta)$ and $\lambda \in K^\times$, then $\lambda a \in G(\beta)$ and the norm $N$ satisfies $N(\lambda a) = \lambda^2 N(a)$. The inner product $B$ is invariant with respect to the action of the group

\[ G_1(\beta) = \{ a \in G(\beta) \mid N(a) = 1 \} \]

Proposition (B) Let $A$ be a central simple algebra over $K$ with an involutive antiautomorphism $\beta$ and a faithful irreducible representation $\rho$ so that

\[ \check{\rho}(a) = B\rho(a)B^{-1} \]

The map $h : A \times A \to K$ defined by

\[ h(a, b) = \mathrm{tr}\, \rho(\beta(a)b) \]

is bilinear, symmetric, and nondegenerate. The map $\rho$ is an isometry of the quadratic space $(A, h)$ on its image in the quadratic space $(\mathrm{End}\, S, B \otimes B^{-1})$.

Graded Algebras

Definitions. An algebra $A$ is said to be $\mathbb{Z}$-graded (resp., $\mathbb{Z}_2$-graded) if there is a decomposition of the underlying vector space $A = \bigoplus_{p \in \mathbb{Z}} A^p$ (resp., $A = A^0 \oplus A^1$) such that $A^p A^q \subset A^{p+q}$. In a $\mathbb{Z}_2$-graded algebra, it is understood that $p + q$ is reduced mod 2. If $a \in A^p$, then $a$ is said to be homogeneous of degree $p$. The exterior algebra $\wedge V$ of a vector space $V$ is $\mathbb{Z}$-graded. Every $\mathbb{Z}$-graded algebra becomes $\mathbb{Z}_2$-graded when one reduces the degree of every element mod 2. A graded isomorphism of graded algebras is an isomorphism that preserves the grading. A $\mathbb{Z}_2$-grading of $A$ is characterized by the involutive automorphism $\alpha$ such that, if $a \in A^p$, then $\alpha(a) = (-1)^p a$. From now on, grading means $\mathbb{Z}_2$-grading unless otherwise specified. The elements of $A^0$ (resp., $A^1$) are said to be even (resp., odd). It is often convenient to denote the graded algebra as

\[ A^0 \to A \qquad [7] \]

Given such an algebra over $K$ and $N \in \mathbb{N}$, one constructs the graded algebra $A^0(N) \to A(N)$. Two graded algebras over $K$, $A^0 \to A$ and $A'^0 \to A'$, are said to be of the same type if there are integers $N$ and $N'$ such that the algebras $A^0(N) \to A(N)$ and $A'^0(N') \to A'(N')$ are graded isomorphic. The property of being of the same type is an equivalence relation in the set of all graded algebras over $K$.

Given an algebra $A$, one constructs two “canonical” graded algebras as follows:

1. the double algebra
\[ A \to A \oplus A \]
graded by the “swap” automorphism, $\alpha(a_1, a_2) = (a_2, a_1)$ for $a_1, a_2 \in A$;
2. the algebra
\[ A \oplus A \to A(2) \]
is defined by declaring the diagonal (resp., antidiagonal) elements of $A(2)$ to be even (resp., odd).

The real algebra $\mathbb{R}(2)$ has also another grading, given by the involutive automorphism $\alpha$ such that $\alpha(a) = \varepsilon a \varepsilon^{-1}$, where $a \in \mathbb{R}(2)$ and $\varepsilon$ is as in [2]. In this case, [7] reads

\[ \mathbb{C} \to \mathbb{R}(2) \]

There are also graded algebras over $\mathbb{R}$:

\[ \mathbb{R} \to \mathbb{C}, \quad \mathbb{C} \to \mathbb{H}, \quad \text{and} \quad \mathbb{H} \to \mathbb{C}(2) \]

The grading of the last algebra can be defined by declaring the Pauli matrices and $iI$ to be odd.

Super Lie algebras. A super Lie algebra is a graded algebra $A$ such that the product $(a, b) \mapsto [a, b]$ is super anticommutative, $[a, b] = -(-1)^{pq}[b, a]$, and satisfies the super Jacobi identity,

\[ [a, [b, c]] = [[a, b], c] + (-1)^{pq}[b, [a, c]] \]

for every $a \in A^p$, $b \in A^q$ and $c \in A$. To every graded associative algebra $A$ there corresponds a super Lie algebra $\mathrm{GL}A$: its underlying vector space and grading are as in $A$ and the product, for $a \in A^p$

and $b \in A^q$, is given as the supercommutator $[a, b] = ab - (-1)^{pq}ba$.

Supercentrality and graded simplicity. A graded algebra $A$ over $K$ is supercentral if $Z(A) \cap A^0 = K$. The algebra $\mathbb{R} \to \mathbb{C}$ is supercentral, but the real ungraded algebra $\mathbb{C}$ is not central.

A subalgebra $B$ of a graded algebra $A$ is said to be a graded subalgebra if $B = (B \cap A^0) \oplus (B \cap A^1)$. A graded ideal of $A$ is an ideal that is a graded subalgebra. A graded algebra $A \neq \{0\}$ is said to be graded simple if it has no graded ideals other than $\{0\}$ and $A$. The double algebra of a simple algebra is graded simple, but not simple.

The graded tensor product. Let $A$ and $B$ be graded algebras; the tensor product of their underlying vector spaces admits a natural grading, $(A \otimes B)^p = \bigoplus_q A^q \otimes B^{p-q}$. The product defined in [3] makes $A \otimes B$ into a graded algebra. There is another “super” product in the same graded vector space given by

\[ (a \otimes b) \cdot (a' \otimes b') = (-1)^{pq} aa' \otimes bb' \]

for $a' \in A^p$ and $b \in B^q$. The resulting graded algebra is referred to as the graded tensor product and denoted by $A \mathbin{\hat{\otimes}} B$. For example, if $V$ and $W$ are vector spaces, then the Grassmann algebra $\wedge(V \oplus W)$ is isomorphic to $\wedge V \mathbin{\hat{\otimes}} \wedge W$.

Clifford Algebras

Definitions: The Universal Property and Grading

The Clifford algebra associated with a quadratic space $(V, g)$ is the quotient algebra

\[ \mathrm{C}\ell(V, g) = \mathcal{T}(V) / \mathcal{J}(V, g) \qquad [8] \]

where $\mathcal{J}(V, g)$ is the ideal in the tensor algebra $\mathcal{T}(V)$ generated by all elements of the form $v \otimes v - g(v, v) 1_{\mathcal{T}(V)}$, $v \in V$.

The Clifford algebra is associative with a unit element denoted by 1. One denotes by $\kappa$ the canonical map of $\mathcal{T}(V)$ onto $\mathrm{C}\ell(V, g)$ and by $ab$ the product of two elements $a, b \in \mathrm{C}\ell(V, g)$ so that $\kappa(P \otimes Q) = \kappa(P)\kappa(Q)$ for $P, Q \in \mathcal{T}(V)$. The map $\kappa$ is injective on $K \oplus V$, and one identifies this subspace of $\mathcal{T}(V)$ with its image under $\kappa$. With this identification, for all $u, v \in V$, one has

\[ uv + vu = 2g(u, v) \]

Clifford algebras are characterized by their universal property described in the following proposition.

Proposition (C) Let $A$ be an algebra with a unit $1_A$ and let $f : V \to A$ be a Clifford map, that is, a linear map such that $f(v)^2 = g(v, v) 1_A$ for every $v \in V$. There then exists a homomorphism $\hat{f} : \mathrm{C}\ell(V, g) \to A$ of algebras with units, an extension of $f$, so that $f(v) = \hat{f}(v)$ for every $v \in V$.

As a corollary, one obtains

Proposition (D) If $f$ is an isometry of $(V, g)$ into $(W, h)$, then there is a homomorphism of algebras $\mathrm{C}\ell(f) : \mathrm{C}\ell(V, g) \to \mathrm{C}\ell(W, h)$ extending $f$ so that there is the commutative diagram

\[ \begin{array}{ccc} \mathrm{C}\ell(V, g) & \xrightarrow{\ \mathrm{C}\ell(f)\ } & \mathrm{C}\ell(W, h) \\ \uparrow & & \uparrow \\ V & \xrightarrow{\quad f \quad} & W \end{array} \]

For example, the isometry $v \mapsto -v$ extends to the involutive main automorphism $\alpha$ of $\mathrm{C}\ell(V, g)$, defining its $\mathbb{Z}_2$-grading:

\[ \mathrm{C}\ell(V, g) = \mathrm{C}\ell^0(V, g) \oplus \mathrm{C}\ell^1(V, g) \]

The algebra $\mathrm{C}\ell(V, g)$ admits also an involutive canonical antiautomorphism $\beta$ characterized by $\beta(1) = 1$ and $\beta(v) = v$ for every $v \in V$.

The Vector Space Structure of Clifford Algebras

Referring to proposition (C), let $A = \mathrm{End}(\wedge V)$ and, for every $v \in V$ and $w \in \wedge V$, put $f(v)w = v \wedge w + g(v) \,\lrcorner\, w$, then $f : V \to \mathrm{End}(\wedge V)$ is a Clifford map and the map

\[ i : \mathrm{C}\ell(V, g) \to \wedge V \qquad [9] \]

given by $i(a) = \hat{f}(a) 1_{\wedge V}$ is an isomorphism of vector spaces. This proves

Proposition (E) As a vector space, the algebra $\mathrm{C}\ell(V, g)$ is isomorphic to the exterior algebra $\wedge V$.

If $V$ is $m$-dimensional, then $\mathrm{C}\ell(V, g)$ is $2^m$-dimensional. The linear isomorphism [9] defines a $\mathbb{Z}$-grading of the vector space underlying the Clifford algebra: if $i(a_k) \in \wedge^k V$, then $a_k$ is said to be of Grassmann degree $k$. Every element $a \in \mathrm{C}\ell(V, g)$ decomposes into its Grassmann components, $a = \sum_{k \in \mathbb{Z}} a_k$. The Clifford product of two elements of Grassmann degrees $k$ and $l$ decomposes as follows: $a_k b_l = \sum_{p \in \mathbb{Z}} (a_k b_l)_p$, and $(a_k b_l)_p = 0$ if $p < |k - l|$ or $p \equiv k - l + 1 \bmod 2$ or $p > m - |m - k - l|$.

One often uses [9] to identify the vector spaces $\wedge V$ and $\mathrm{C}\ell(V, g)$; this having been done, one can write, for every $v \in V$ and $a \in \mathrm{C}\ell(V, g)$,

\[ va = v \wedge a + g(v) \,\lrcorner\, a \qquad [10] \]

so that $[v, a] = 2g(v) \,\lrcorner\, a$, where $[\ ,\ ]$ is the supercommutator. It defines a super Lie algebra structure in the vector space $K \oplus V$. The quadratic form defined by $g$ need not be nondegenerate; for example, if it is the

0-form, then [10] shows that the Clifford and exterior multiplications coincide and $\mathrm{C}\ell(V, 0)$ is isomorphic, as an algebra, to the Grassmann algebra.

Complexification of Real Clifford Algebras

Proposition (F) If $(V, g)$ is a real quadratic space, then the algebras $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(\mathbb{C} \otimes V, \mathbb{C} \otimes g)$ are isomorphic, as graded algebras over $\mathbb{C}$.

From now on, through the end of the article, one assumes that $(V, g)$ is an orthogonal space over $K = \mathbb{R}$ or $\mathbb{C}$.

The Clifford algebra associated with the orthogonal space $\mathbb{C}^m$ is denoted by $\mathrm{C}\ell_m$. The Clifford algebra associated with the orthogonal space $(\mathbb{R}^{k+l}, g)$, where $g$ is of signature $(k, l)$, is denoted by $\mathrm{C}\ell_{k,l}$, so that $\mathbb{C} \otimes \mathrm{C}\ell_{k,l} = \mathrm{C}\ell_{k+l}$.

Relations between Clifford Algebras in Spaces of Adjacent Dimensions

Consider an orthogonal space $(V, g)$ over $K$ and the one-dimensional orthogonal space $(K, h_1)$, having a unit vector $w \in K$, $h_1(w, w) = \varepsilon$, where $\varepsilon = 1$ or $-1$. The map $V \ni v \mapsto vw \in \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$ satisfies $(vw)^2 = -\varepsilon g(v, v)$ and extends to the isomorphism of algebras $\mathrm{C}\ell(V, -\varepsilon g) \to \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$. This proves

Proposition (G) There are isomorphisms of algebras: $\mathrm{C}\ell_m \to \mathrm{C}\ell^0_{m+1}$ and $\mathrm{C}\ell_{k,l} \to \mathrm{C}\ell^0_{k+1,l}$.

Consider the orthogonal space $(K^2, h)$ with a neutral $h$ such that, for $\lambda, \mu \in K$, one has $\langle (\lambda, \mu), h(\lambda, \mu) \rangle = \lambda\mu$. The map

\[ K^2 \to K(2), \quad (\lambda, \mu) \mapsto \begin{pmatrix} 0 & \lambda \\ \mu & 0 \end{pmatrix} \]

has the Clifford property and establishes the isomorphisms represented by the horizontal arrows in the diagram

\[ \begin{array}{ccc} \mathrm{C}\ell(K^2, h) & \to & K(2) \\ \uparrow & & \uparrow \\ \mathrm{C}\ell^0(K^2, h) & \to & K \oplus K \end{array} \qquad [11] \]

Proposition (H) If $(K^2, h)$ is neutral and $(V, g)$ is over $K$, then the algebra $\mathrm{C}\ell(V \oplus K^2, g \oplus h)$ is isomorphic to the algebra $\mathrm{C}\ell(V, g) \otimes K(2)$. Specifically, there are isomorphisms

\[ \begin{aligned} \mathrm{C}\ell_{k+1,l+1} &= \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(2) \\ \mathrm{C}\ell_{m+2} &= \mathrm{C}\ell_m \otimes \mathbb{C}(2) \end{aligned} \qquad [12] \]

The Chevalley Theorem and the Brauer–Wall Group

If $(V, g)$ and $(W, h)$ are quadratic spaces over $K$, then their sum is the quadratic space $(V \oplus W, g \oplus h)$ characterized by $g \oplus h : V \oplus W \to V^* \oplus W^*$ so that $(g \oplus h)(v, w) = (g(v), h(w))$. By noting that the map $V \oplus W \ni (v, w) \mapsto v \otimes 1 + 1 \otimes w \in \mathrm{C}\ell(V, g) \mathbin{\hat{\otimes}} \mathrm{C}\ell(W, h)$ has the Clifford property, Chevalley proved

Proposition (I) The algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ is isomorphic to the algebra $\mathrm{C}\ell(V, g) \mathbin{\hat{\otimes}} \mathrm{C}\ell(W, h)$.

The type of the (graded) algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ depends only on the types of $\mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(W, h)$. The Chevalley theorem (I) shows that the set of types of Clifford algebras over $K$ forms an abelian group for a multiplication induced by the graded tensor product. The unit of this Brauer–Wall group of $K$ is the type of the algebra $\mathrm{C}\ell(K^2, h)$ described in [11]; for a full account with proofs, see Wall (1963).

The Volume Element and the Centers

Let $e = (e_\mu)$ be an orthonormal frame in $(V, g)$. The volume element associated with $e$ is

\[ \eta = e_1 e_2 \cdots e_m \]

If $\eta'$ is the volume element associated with another orthonormal frame $e'$ in the same orthogonal space, then either $\eta' = \eta$ ($e$ and $e'$ are of the same orientation) or $\eta' = -\eta$ ($e$ and $e'$ are of opposite orientation). For $K = \mathbb{C}$, one has $\eta^2 = 1$; for $K = \mathbb{R}$ and $g$ of signature $(k, l)$ one has

\[ \eta^2 = (-1)^{(1/2)(k-l)(k-l+1)} \qquad [13] \]

It is convenient to define $\iota \in \{1, i\}$ so that $\eta^2 = \iota^2$. For every $v \in V$ one has $v\eta = (-1)^{m+1}\eta v$. The structure of the centers of Clifford algebras is as follows:

Proposition (J) If $m$ is even, then $Z(\mathrm{C}\ell(V, g)) = K$ and $Z(\mathrm{C}\ell^0(V, g)) = K \oplus K\eta$. If $m$ is odd, then $Z(\mathrm{C}\ell(V, g)) = K \oplus K\eta$ and $Z(\mathrm{C}\ell^0(V, g)) = K$. The graded algebra $\mathrm{C}\ell(V, g)$ is supercentral for every $m$.

The Structure of Clifford Algebras

The complex case. Using [4] one obtains from [11] and [12] the isomorphisms of algebras

\[ \mathrm{C}\ell^0_{2n+1} = \mathrm{C}\ell_{2n} = \mathbb{C}(2^n) \qquad [14] \]

\[ \mathrm{C}\ell_{2n+1} = \mathrm{C}\ell^0_{2n+2} = \mathbb{C}(2^n) \oplus \mathbb{C}(2^n) \qquad [15] \]

for $n = 0, 1, 2, \ldots$. Therefore, there are only two types of complex Clifford algebras, represented by $\mathbb{C} \to \mathbb{C} \oplus \mathbb{C}$ and $\mathbb{C} \oplus \mathbb{C} \to \mathbb{C}(2)$: the Brauer–Wall group of $\mathbb{C}$ is $\mathbb{Z}_2$.
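The isomorphism $\mathrm{C}\ell_{2n} = \mathbb{C}(2^n)$ of [14] can be illustrated numerically. The Python sketch below is not the construction used in the article: it assumes a frame in which every vector squares to $+1$ (over $\mathbb{C}$ such a frame always exists, although it differs from the frame convention adopted earlier in the article), and it builds $2n$ matrices of size $2^n$ as Kronecker products of Pauli-type matrices whose sign conventions are chosen for the sketch only. It then checks the Clifford relations, checks that the volume element anticommutes with every frame vector when $m = 2n$ is even, and checks that the $2^{2n}$ ordered products are pairwise trace-orthogonal, hence linearly independent, so that they span $\mathbb{C}(2^n)$. All function names are hypothetical.

```python
from itertools import combinations

I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]  # sign convention assumed for this sketch only
SZ = [[1, 0], [0, -1]]


def matmul(a, b):
    """Multiply two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker product of square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def scaled(c, a):
    return [[c * x for x in row] for row in a]


def added(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def generators(n):
    """2n anticommuting matrices of size 2**n, each squaring to the identity."""
    gens = []
    for j in range(n):
        for pauli in (SX, SY):
            g = [[1]]
            for factor in [SZ] * j + [pauli] + [I2] * (n - j - 1):
                g = kron(g, factor)
            gens.append(g)
    return gens


def ordered_product(mats, size):
    out = identity(size)
    for m in mats:
        out = matmul(out, m)
    return out


def trace_pairing(a, b):
    """tr(a^dagger b), a Hermitian scalar product on matrices."""
    return sum(complex(x).conjugate() * y
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def check_complex_clifford(n):
    size = 2 ** n
    gens = generators(n)
    one = identity(size)
    # Clifford relations for an orthonormal frame with all squares +1.
    relations = all(
        added(matmul(a, b), matmul(b, a)) == scaled(2 if mu == nu else 0, one)
        for mu, a in enumerate(gens) for nu, b in enumerate(gens))
    # For even m = 2n the volume element anticommutes with every vector.
    volume = ordered_product(gens, size)
    anticommutes = all(
        matmul(g, volume) == scaled(-1, matmul(volume, g)) for g in gens)
    # Ordered products over all subsets of the frame: 2**(2n) matrices.
    basis = [ordered_product([gens[i] for i in subset], size)
             for p in range(2 * n + 1)
             for subset in combinations(range(2 * n), p)]
    independent = all(
        trace_pairing(a, b) == (size if r == s else 0)
        for r, a in enumerate(basis) for s, b in enumerate(basis))
    return relations, anticommutes, len(basis), independent


RESULT = check_complex_clifford(2)  # the case m = 4, matrices in C(4)
```

All matrix entries are $0$, $\pm 1$, or $\pm i$, so the equality tests are exact. Counting $2^{2n}$ independent matrices inside the $2^{2n}$-dimensional algebra $\mathbb{C}(2^n)$ is what shows that the representation is onto.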

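The real case. In view of proposition (I) and $\mathrm{C}\ell_{1,1} = \mathbb{R}(2)$, the algebra $\mathrm{C}\ell_{k,l}$ is of the same type as $\mathrm{C}\ell_{k-l,0}$ if $k > l$ and of the same type as $\mathrm{C}\ell_{0,l-k}$ if $k < l$. Since $\mathrm{C}\ell_{k,l} \mathbin{\hat{\otimes}} \mathrm{C}\ell_{l,k} = \mathrm{C}\ell_{k+l,k+l}$, the type of $\mathrm{C}\ell_{l,k}$ is the inverse of the type of $\mathrm{C}\ell_{k,l}$. The algebra $\mathrm{C}\ell^0_{4,0} \to \mathrm{C}\ell_{4,0}$ is isomorphic to $\mathbb{H} \oplus \mathbb{H} \to \mathbb{H}(2)$: if $x = (x_1, x_2, x_3, x_4) \in \mathbb{R}^4 \subset \mathrm{C}\ell_{4,0}$, and $q = ix_1 + jx_2 + kx_3 + x_4 \in \mathbb{H}$, then an isomorphism is obtained from the Clifford map $f$,

\[ f(x) = \begin{pmatrix} 0 & -\bar{q} \\ q & 0 \end{pmatrix} \qquad [16] \]

In view of [13], the volume element $\eta$ satisfies $\eta^2 = 1$. By replacing $-\bar{q}$ with $\bar{q}$ in [16], one shows that $\mathrm{C}\ell_{0,4}$ is also isomorphic to $\mathbb{H}(2)$. The map $\mathbb{R}^4 \oplus \mathbb{R}^{k+l} \to \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$ given by $(x, y) \mapsto f(x) \otimes 1 + \eta \otimes y$ has the Clifford property and establishes the isomorphism of algebras $\mathrm{C}\ell_{k+4,l} = \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$. Since, similarly, $\mathrm{C}\ell_{k,l+4} = \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$, one obtains the isomorphism

\[ \mathrm{C}\ell_{k+4,l} = \mathrm{C}\ell_{k,l+4} \]

Therefore,

\[ \mathrm{C}\ell_{k+8,l} = \mathrm{C}\ell_{k+4,l+4} = \mathrm{C}\ell_{k,l+8} = \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(16) \]

and the algebras $\mathrm{C}\ell_{k,l}$, $\mathrm{C}\ell_{k+8,l}$, and $\mathrm{C}\ell_{k,l+8}$ are all of the same type. This double periodicity of period 8 is subsumed by saying that real Clifford algebras can be arranged on a “spinorial chessboard.” The type of $\mathrm{C}\ell^0_{k,l} \to \mathrm{C}\ell_{k,l}$ depends only on $k - l \bmod 8$; the eight types have the following low-dimensional algebras as representatives: $\mathrm{C}\ell_{1,0}$, $\mathrm{C}\ell_{2,0}$, $\mathrm{C}\ell_{3,0}$, $\mathrm{C}\ell_{4,0} = \mathrm{C}\ell_{0,4}$, $\mathrm{C}\ell_{0,3}$, $\mathrm{C}\ell_{0,2}$, and $\mathrm{C}\ell_{0,1}$. The Brauer–Wall group of $\mathbb{R}$ is $\mathbb{Z}_8$, generated by the type of $\mathrm{C}\ell^0_{1,0} \to \mathrm{C}\ell_{1,0}$, that is, by $\mathbb{R} \to \mathbb{C}$. Bearing in mind the isomorphism $\mathrm{C}\ell_{k,l} = \mathrm{C}\ell^0_{k+1,l}$ and abbreviating $\mathbb{C} \to \mathbb{R}(2)$ to $\mathbb{C} \to \mathbb{R}$, etc., one can arrange the types of real Clifford algebras in the form of a “spinorial clock”:

\[ \begin{array}{ccccc} \mathbb{R} & \xrightarrow{\ 7\ } & \mathbb{R} \oplus \mathbb{R} & \xrightarrow{\ 0\ } & \mathbb{R} \\ {\scriptstyle 6}\,\uparrow & & & & \downarrow\,{\scriptstyle 1} \\ \mathbb{C} & & & & \mathbb{C} \\ {\scriptstyle 5}\,\uparrow & & & & \downarrow\,{\scriptstyle 2} \\ \mathbb{H} & \xleftarrow{\ 4\ } & \mathbb{H} \oplus \mathbb{H} & \xleftarrow{\ 3\ } & \mathbb{H} \end{array} \qquad [17] \]

Proposition (K) Recipe for determining $\mathrm{C}\ell^0_{k,l} \to \mathrm{C}\ell_{k,l}$:
(i) find the integers $\mu$ and $\nu$ such that $k - l = 8\mu + \nu$ and $0 \leq \nu \leq 7$;
(ii) from the spinorial clock, read off $A^0_\nu \xrightarrow{\nu} A_\nu$ and compute the real dimensions, $\dim A^0_\nu = 2^{\nu_0}$ and $\dim A_\nu = 2^{\nu_1}$; and
(iii) form $\mathrm{C}\ell^0_{k,l} = A^0_\nu(2^{(1/2)(k+l-1-\nu_0)})$ and $\mathrm{C}\ell_{k,l} = A_\nu(2^{(1/2)(k+l-\nu_1)})$.

The spinorial clock is symmetric with respect to the reflection in the vertical line through its center; this is a consequence of the isomorphism of algebras $\mathrm{C}\ell_{k,l+2} = \mathrm{C}\ell_{l,k} \otimes \mathbb{R}(2)$.

Note that the “abstract” algebra $\mathrm{C}\ell_{k,l}$ carries, in general, less information than the Clifford algebra defined in [8], which contains $V$ as a distinguished vector subspace with the quadratic form $v \mapsto v^2 = g(v, v)$. For example, the algebras $\mathrm{C}\ell_{8,0}$, $\mathrm{C}\ell_{4,4}$, and $\mathrm{C}\ell_{0,8}$ are all graded isomorphic.

Theorem on Simplicity

From general theory (Chevalley 1954) or by inspection of [14], [15], and [17], one has

Proposition (L) Let $m$ be the dimension of the orthogonal space $(V, g)$ over $K$.
(i) If $m$ is even (resp., odd), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) over $K$ is central simple.
(ii) If $K = \mathbb{C}$ and $m$ is odd (resp., even), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) is the direct sum of two isomorphic complex central simple algebras.
(iii) If $K = \mathbb{R}$ and $m$ is odd (resp., even), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) when $\eta^2 = 1$ is the direct sum of two isomorphic central simple algebras and when $\eta^2 = -1$ is simple with a center isomorphic to $\mathbb{C}$.

Representations

The Pauli, Cartan, Dirac, and Weyl Representations

Odd dimensions. Let $(V, g)$ be of dimension $m = 2n + 1$ over $K$. From propositions (A) and (L) it follows that the central simple algebra $\mathrm{C}\ell^0(V, g)$ has a unique, up to equivalence, faithful, and irreducible representation in the complex $2^n$-dimensional vector space $S$ of Pauli spinors. By putting $\sigma(\eta) = \iota I$ it is extended to a Pauli representation $\sigma : \mathrm{C}\ell(V, g) \to \mathrm{End}\, S$. Given an orthonormal frame $(e_\mu)$ in $V$, Pauli endomorphisms (matrices if $S$ is identified with $\mathbb{C}^{2^n}$) are defined as $\sigma_\mu = \sigma(e_\mu) \in \mathrm{End}\, S$. The representations $\sigma$ and $\sigma \circ \alpha$ are complex inequivalent. For $K = \mathbb{C}$ none of them is faithful; their direct sum is the faithful Cartan representation of $\mathrm{C}\ell(V, g)$ in $S \oplus S$. For $K = \mathbb{R}$ and $(1/2)(k - l - 1)$ even, the representations $\sigma$ and $\sigma \circ \alpha$ are real equivalent and faithful. On computing $\check{\sigma}(\eta)$ one finds that the contragredient representation $\check{\sigma}$ is equivalent to $\sigma$ for $n$ even and to $\sigma \circ \alpha$ for $n$ odd.

Even dimensions. Similarly, for $(V, g)$ of dimension $m = 2n$ over $K$, the central simple algebra $\mathrm{C}\ell(V, g)$ has a unique, up to equivalence, faithful, and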

irreducible representation $\gamma : \mathrm{C}\ell(V, g) \to \mathrm{End}\, S$ in the $2^n$-dimensional complex vector space $S$ of Dirac spinors. The Dirac endomorphisms (matrices) are $\gamma_\mu = \gamma(e_\mu)$. Put $\Gamma = \iota\gamma(\eta)$ so that $\Gamma^2 = I$: the matrix $\Gamma$ generalizes the familiar $\gamma_5$. The Dirac representation restricted to $\mathrm{C}\ell^0(V, g)$ decomposes into the sum $\gamma_+ \oplus \gamma_-$ of two irreducible representations in the vector spaces

\[ S_\pm = \{ s \in S \mid \Gamma s = \pm s \} \]

of Weyl (chiral) spinors. The elements of $S_+$ are said to be of opposite chirality with respect to those of $S_-$. The transpose $\Gamma^*$ defines a similar split of $S^*$. The representations $\gamma_+$ and $\gamma_-$ are never complex-equivalent, but they are real equivalent and faithful for $K = \mathbb{R}$ and $(1/2)(k - l)$ odd.

The representations $\gamma \circ \alpha$ and $\check{\gamma}$ are both equivalent to $\gamma$. It is convenient to describe simultaneously the properties of the transpositions of the Pauli and Dirac matrices; let $\gamma_\mu$ be either the Pauli matrices for $V$ of dimension $2n + 1$ or the Dirac matrices for $V$ of dimension $2n$. There is a complex isomorphism $B : S \to S^*$ such that

\[ \gamma_\mu^* = (-1)^n B \gamma_\mu B^{-1} \qquad [18] \]

In the case of the Dirac matrices, the factor $(-1)^n$ in [18] implies that this equation also holds for $\Gamma$ in place of $\gamma_\mu$. The isomorphism $B$ preserves (resp., changes) the chirality of Weyl spinors for $n$ even (resp., odd). Every matrix of the form $B\gamma_{\mu_1} \cdots \gamma_{\mu_p}$, where

\[ 1 \leq \mu_1 < \cdots < \mu_p \leq 2n \qquad [19] \]

is either symmetric or antisymmetric, depending on $p$ and the symmetry of $B$. A simple argument, based on counting the number of such products of one symmetry, leads to the equation

\[ B^* = (-1)^{(1/2)n(n+1)} B \]

valid in dimensions $2n$ and $2n + 1$.

Inner products on spinor spaces. Let $S$ be the complex vector space of Dirac or Pauli spinors associated with $(V, g)$ over $K$. The isomorphism $B : S \to S^*$ defines on $S$ an inner product $B(s, t) = \langle s, B(t) \rangle$, $s, t \in S$, which is orthogonal for $m \equiv 0, 1, 6$, or $7 \bmod 8$ and symplectic for $m \equiv 2, 3, 4$, or $5 \bmod 8$. For $m \equiv 0 \bmod 4$, this product restricts to an inner product on the space of Weyl spinors that is orthogonal for $m \equiv 0 \bmod 8$ and symplectic for $m \equiv 4 \bmod 8$. For $m \equiv 2 \bmod 4$, the map $B$ defines the isomorphisms $B : S_\pm \to S_\mp^*$.

Example. One of the most used representations $\gamma : \mathrm{C}\ell_{3,1} \to \mathbb{C}(4)$ is given by the Dirac matrices

\[ \gamma_1 = \begin{pmatrix} 0 & \sigma_x \\ -\sigma_x & 0 \end{pmatrix}, \quad \gamma_2 = \begin{pmatrix} 0 & \sigma_y \\ -\sigma_y & 0 \end{pmatrix}, \quad \gamma_3 = \begin{pmatrix} 0 & \sigma_z \\ -\sigma_z & 0 \end{pmatrix}, \quad \gamma_4 = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \qquad [20] \]

Charge Conjugation and Majorana Spinors

Throughout this section and next, one assumes $K = \mathbb{R}$ so that, given a representation $\rho : \mathrm{C}\ell(V, g) \to \mathrm{End}\, S$, one can form the complex- (“charge”) conjugate representation $\bar{\rho} : \mathrm{C}\ell(V, g) \to \mathrm{End}\, \bar{S}$ defined by $\bar{\rho}(a) = \overline{\rho(a)}$ and the Hermitian conjugate representation $\rho^\dagger : \mathrm{C}\ell(V, g) \to \mathrm{End}\, \bar{S}^*$, where $\rho^\dagger(a) = \overline{\check{\rho}(a)}$.

Even dimensions. The representations $\gamma$ and $\bar{\gamma}$ are equivalent: there is an isomorphism $C : S \to \bar{S}$ such that

\[ \bar{\gamma}_\mu = C \gamma_\mu C^{-1} \qquad [21] \]

The automorphism $\bar{C}C$ is in the commutant of $\gamma$; it is, therefore, proportional to $I$ and, by a change of scale, one can achieve $\bar{C}C = I$ for $k - l \equiv 0$ or $6 \bmod 8$ and $\bar{C}C = -I$ for $k - l \equiv 2$ or $4 \bmod 8$. The spinor $s_c = C^{-1}\bar{s} \in S$ is the charge conjugate of $s \in S$. If $\psi : V \to S$ is a solution of the Dirac equation

\[ (\gamma^\mu(\partial_\mu - iqA_\mu) - \kappa)\psi = 0 \]

for a particle of electric charge $q$, then $\psi_c$ is a solution of the same equation with the opposite charge. Since

\[ \bar{\Gamma} = \iota^2 C \Gamma C^{-1} \]

charge conjugation preserves (resp., changes) the chirality of Weyl spinors for $(1/2)(k - l)$ even (resp., odd).

If $\bar{C}C = I$, then

\[ \mathrm{Re}\, S = \{ s \in S \mid s_c = s \} \]

is a real vector space of dimension $2^n$, the space of Dirac–Majorana spinors. The representation $\gamma$ is real: restricted to $\mathrm{Re}\, S$ and expressed with respect to a frame in this space, it is given by real $2^n \times 2^n$ matrices. For $k - l \equiv 0 \bmod 8$ the representations $\gamma_+$ and $\gamma_-$ are both real: in this case there are Weyl–Majorana spinors.

Odd dimensions. On computing $\bar{\sigma}(\eta)$ one finds that the conjugate representation $\bar{\sigma}$ is equivalent to $\sigma$

(resp.,   ) if
2 = 1 (resp.,
2 = 1). There is an  of dimension 2(m) , where (m) is the mth Radon–
isomorphism C : S !  S such that Hurwitz number given by
m= 1 2 3 4 5 6 7 8
 = ð1Þð1=2Þðklþ1Þ C C1 ½22
ðmÞ = 1 2 2 3 3 3 3 4
 = I (resp., CC
and CC  =  I) for k  l 1 or 7 mod 8
(resp., k  l 3 or 5 mod 8). For k  l 1 mod 8, the and (m þ 8) = (m) þ 4. The matrices  2 R(2(m) ),
restriction of the Pauli representation to C‘0k, l is real  = 1, . . . , m, defining these representations satisfy
and the Pauli matrices are pure imaginary; for k  l  v þ v  = 2v I
7 mod 8, the Pauli representations of C‘k, l are both real
and so are the Pauli matrices. In both these cases there and can be chosen so as to be antisymmetric. In all
are Pauli–Majorana spinors. dimensions other than m 3 mod 4 the representa-
tions are faithful.
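The table of Radon–Hurwitz numbers and the rule χ(m + 8) = χ(m) + 4 quoted in proposition (M) can be turned into a small function. The sketch below is illustrative only: the function names are hypothetical, and the closed form 8a + 2^b − 1 used as a cross-check is the classical Hurwitz–Radon count, which is not stated in this article. The helper `max_m_for_sphere` implements the rule, used later in the article for vector fields on spheres, of taking the largest m with N = 2^χ(m) p, p odd.

```python
# chi(m) for m = 1..8, copied from the table in the text.
CHI_TABLE = (1, 2, 2, 3, 3, 3, 3, 4)


def chi(m: int) -> int:
    """Radon-Hurwitz number chi(m), extended by chi(m + 8) = chi(m) + 4."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    periods, index = divmod(m - 1, 8)
    return CHI_TABLE[index] + 4 * periods


def rep_dimension(m: int) -> int:
    """Dimension 2**chi(m) of the irreducible real representation in proposition (M)."""
    return 2 ** chi(m)


def two_adic_valuation(n: int) -> int:
    """Exponent of the largest power of 2 dividing n (n > 0)."""
    return (n & -n).bit_length() - 1


def max_m_for_sphere(N: int) -> int:
    """Largest m such that N = 2**chi(m) * p with p odd (N positive and even)."""
    if N < 2 or N % 2:
        raise ValueError("N must be a positive even integer")
    v = two_adic_valuation(N)
    m = 1
    while chi(m + 1) <= v:  # chi is nondecreasing and takes every positive value
        m += 1
    return m


def hurwitz_radon_minus_one(N: int) -> int:
    """Classical closed form (assumption, not from the text): N = odd * 2**(4a + b)."""
    a, b = divmod(two_adic_valuation(N), 4)
    return 8 * a + 2 ** b - 1
```

For example, `rep_dimension(7)` is 8 and `rep_dimension(8)` is 16, matching the representations in R^8 and R^16 mentioned in the text.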
For m 2 and 4 mod 8 (resp., m 1, 3, and
Hermitian Scalar Products and Multivectors 5 mod 8) the representations  are the realifications of
For m = k þ l odd and C as in [22], the map the corresponding Dirac (resp., Pauli) representations.

 :S ! 
A = BC S intertwines the representations y In dimensions m 0 and 6 mod 8 (resp.,
and  (resp.,   ) for k even (resp., odd), m 7 mod 8) the Dirac (resp., Pauli) representations
themselves are real.
y = ð1Þk A A1
By rescaling of B, the map A can be made Inductive Construction
Hermitian. The corresponding Hermitian form
of Representations
s 7! A(s, s) is definite if and only if k or l = 0;
otherwise, it is neutral. An inductive construction of the Pauli
For m = k þ l even, the representations y and representations
are equivalent and one can define a Hermitian
isomorphism A : S ! 

S so that  : C‘n1;n ! Rð2n1 Þ; n = 1; 2; . . .
and of the Dirac representations
y = A  A1 ½23
: C‘n;n ! Rð2n Þ; n = 1; 2; . . .
The isomorphism A0 = A intertwines the represen-
tations y and  ; it can also be made Hermitian is as follows.
by rescaling. The Hermitian form A(s, s) is definite 1. In dimension 1, put 1 = 1.
for k = 0 and A0 (s, s) is definite for l = 0; otherwise, 2. Given  2 R(2n1 ),  = 1, . . . , 2n  1, define
these forms are neutral. For example, in the familiar !
representation [20], one has A = 4 , a neutral form. 0 
For p = 0, 1, . . . , m = 2n, two spinors s and t 2 S  = for  = 1; . . . ; 2n  1
 0
define the p-vector with components
and
A1 ...p ðs; tÞ = hs; A 1 . . . p ti ½24
!
0 I
where the indices are as in [19]. The Hermiticity of 2n =
A and [23] imply I 0

A1 ...p ðs; tÞ = ð1Þð1=2Þpðp1Þ A1 ...p ðt; sÞ 3. Given  2 R(2n ),  = 1, . . . , 2n, define  = 
for  = 1, . . . , 2n, and 2nþ1 = 1


2n .
In view of y = (1)k AA1 , the map A defines,
for k even, a nondegenerate Hermitian scalar All entries of these matrices are either 0, 1, or 1;
product on the spaces S whereas A(s, t) = 0 if s therefore, they can be used to construct representa-
and t are Weyl spinors of opposite chiralities. For k tions of Clifford algebras of orthogonal spaces over
odd, A changes the chirality. any commutative field of characteristic 6¼ 2.
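The inductive construction in steps 1 to 3 can be checked numerically. The sketch below is a minimal illustration, not the author's code: the function names are hypothetical, and the placement of the minus sign in the last Dirac matrix (upper-right block −I) is an assumption, since signs are lost in the printed matrices. With that choice, (σ1, iσ2, σ3) in dimension 3 reproduces the usual Pauli matrices, as the text states.

```python
import numpy as np


def dirac_from_pauli(sigmas):
    """Step 2: from 2n-1 matrices of size 2^(n-1) build 2n matrices of size 2^n."""
    d = sigmas[0].shape[0]
    zero, eye = np.zeros((d, d), dtype=int), np.eye(d, dtype=int)
    gammas = [np.block([[zero, s], [s, zero]]) for s in sigmas]
    # Assumed sign placement: upper-right block -I, lower-left block +I.
    gammas.append(np.block([[zero, -eye], [eye, zero]]))
    return gammas


def pauli_from_dirac(gammas):
    """Step 3: keep the 2n matrices and append their ordered product."""
    product = np.eye(gammas[0].shape[0], dtype=int)
    for g in gammas:
        product = product @ g
    return gammas + [product]


def build(n):
    """Return (Dirac matrices in dimension 2n, Pauli matrices in dimension 2n+1)."""
    sigmas = [np.array([[1]])]  # step 1: in dimension 1, sigma_1 = 1
    gammas = None
    for _ in range(n):
        gammas = dirac_from_pauli(sigmas)
        sigmas = pauli_from_dirac(gammas)
    return gammas, sigmas


def clifford_signs(mats):
    """Check pairwise anticommutation and return s_mu with M_mu^2 = s_mu * I."""
    eye = np.eye(mats[0].shape[0], dtype=int)
    signs = []
    for i, a in enumerate(mats):
        square = a @ a
        s = int(square[0, 0])
        assert s in (1, -1) and np.array_equal(square, s * eye)
        signs.append(s)
        for b in mats[i + 1:]:
            assert not np.any(a @ b + b @ a)
    return signs
```

Calling `clifford_signs(build(3)[0])` returns three signs +1 and three signs −1, the pattern expected for a quadratic form of neutral signature (n, n) with n = 3.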
By induction, one has  = (1)þ1  . Therefore,
the isomorphisms appearing in [18] are
B = 2 4


2n for both m = 2n and 2n þ 1.
The Radon–Hurwitz Numbers
By multiplying some of the matrices  or  by the
Proposition (M) For every integer m > 0, the imaginary unit, one obtains complex representations
algebra C‘m, 0 has an irreducible real representation of the Clifford algebras associated with the quadratic

forms of other signatures. For example, in dimension fields on odd-dimensional spheres can be constructed
3, (1 , i2 , 3 ) are the Pauli matrices. In dimension 4, with the help of the representation  described in
multiplying 2 by i one obtains the Dirac matrices for g proposition (M). Given a positive even integer N, let
of signature (1, 3), in the ‘‘chiral representation’’: m be the largest integer such that N = 2(m) p, where
    p is an odd integer. Consider the unit sphere
0 x 0 y
1 ¼ ; 2 ¼ SN1 = fx 2 RN j jjxjj = 1g of dimension N  1. For
x 0 y 0 v 2 Rm , put 0 (v) = (v)  I, where I 2 R(p) is the
    ½25
0 z 0 I unit matrix. Since (v) is antisymmetric, so is the
3 ¼ ; 4 ¼
z 0 I 0 matrix 0 (v) 2 R(N). Therefore, for every x 2 SN1 ,
the vector 0 (v)x is orthogonal to x. The map
To obtain the real Majorana representation one uses x 7! 0 (v)x defines a vector field on SN1 that
the following fact: vanishes nowhere unless v = 0 : the (N1)-sphere
Proposition (N) If the matrix C 2 R(2n ) is such admits a set of m tangent vector fields which are
that C2 = I and [21] holds, then the matrices linearly independent at every point. Using methods of
(I + iC) γ_μ (I + iC)^{-1}, μ = 1, . . . , 2n, are real. algebraic topology, it has been shown that this
method gives the maximum number of linearly
For the matrices [25], one can take C = 1 3 4 to independent tangent vector fields on spheres.
obtain If m = 1, 3, or 7, then m þ 1 = 2(m) and, for these
! ! values of m, the sphere Sm is parallelizable. More-
0 x I 0
0
1 = ; 0
2 = over, one can then introduce in Rmþ1 the structure
x 0 0 I of an algebra Am as follows. Put 0 = I. If e0 2 Rmþ1
! ! is a unit vector and e =  (e0 ), then (e0 , e1 , . . . , em )
0 z 0 I is anPorthonormal framePin Rmþ1 . The product of
30 = ; 40 = x= m m
 = 0 x e and y =  = 0 y e is defined to be
z 0 I 0
X
m

The real representations described in proposition x


y= x yv  ðev Þ
;v = 0
(M) can be obtained by the following direct inductive
construction. Consider the following seven real anti- so that e0 is the unit element for this product.
symmetric and anticommuting 8  8 matrices: Defining Re x= x0 e0 , Im x = x  Re x, x  = Rex  Im x,
one has x
x = e0 jjxjj2 and x

(x
y)= (
x
x)
y, so that
1 ¼ z  I  ";  2 ¼ z  "  x
x
y= 0 implies x = 0 or y = 0: Am is a normed
 3 ¼ z  "  z ;  4 ¼ x  "  I algebra without zero divisors. The algebras A1 and
½26 A3 are isomorphic to C and H, respectively, and A7
5 ¼ x  x  ";  6 ¼ x  z  "
is, by definition, the algebra O of octonions
7 ¼ "  I  I discovered by Graves and Cayley. The algebra O is
nonassociative; its multiplication table is obtained
For m = 4, 5, 6, and 7 the matrices 1 , . . . , m gener-
with the help of [26].
ate the representations of C‘m, 0 in R8 . The eight
matrices  = x   ,  = 1, . . . , 7, and 8 = "  I 
I  I give the required representation of C‘8, 0 in Spinor Groups
R16 . By dropping the first factor in 1 , 2 , 3 , one
obtains the matrices generating a representation of Let (V, g) be a quadratic space over K. If u 2 V is
C‘3, 0 in R 4 , etc. The symmetric matrix not null, then it is invertible as an element of
 = 1


8 = z  I  I  I anticommutes with all C‘(V, g) and the map v 7! uvu1 is a reflection in
the s and 2 = I. If the matrices  2 R(2(m) ) the hyperplane orthogonal to u. The orthogonal
correspond to a representation of C‘m, 0 , then the group O(V, g) = O(V, g) = fR 2 GL(V) j R  g 
m þ 8 matrices   1 , . . . ,   m , 1  I, . . . , 8  I R = gg is generated by the set of all such reflections.
generate the required representation of C‘mþ8, 0 . A spinor group G is a subset of C‘(V, g) that is a
group with respect to multiplication induced by the
product in the algebra, with a homomorphism
 : G ! GL(V) whose image contains the connected
Vector Fields on Spheres
component SO0 (V, g) of the group of rotations of
and Division Algebras
(V, g). In the case of real quadratic spaces, one
It is known that even-dimensional spheres have no considers also spinor groups that are subsets of C 
nowhere-vanishing tangent vector fields. All such C‘(V, g) with similar properties. By restriction, every

representation of C‘(V, g) or C  C‘(V, g) gives u1 . . . u2p v1 . . . v2q such that u2i = 1 and v2j = 1.
spinor representations of the spinor groups it The connected groups Spinm:0 and Spin0, m are
contains. isomorphic and denoted by Spinm . Since Spin0k, l
G1 (), the Hermitian form A and the bilinear form
Pin Groups
B are invariant with respect to the action of this
group. Moreover, for k þ l even, from [24] and
It is convenient to define a unit vector v 2 V [28] there follows the transformation law of
C‘(V, g) to be such that v2 = 1 for V complex and multivectors formed from pairs of spinors,
v2 = 1 or 1 for V real. The group Pin(V, g) is
A1

p ð ðaÞs; ðaÞtÞ
defined as the subgroup of Cpin(V, g) consisting of
products of all finite sequences of unit vectors. = Av1 ...vp ðs; tÞRv11 ða1 Þ . . . Rvpp ða1 Þ
f
Defining now the twisted adjoint representation Ad
f 1
Consider Spin0 (V, g) and assume that either V is
by Ad(a)v = α(a)va , one obtains the exact sequence complex of dimension ≥ 2 or real with k or l ≥ 2.
e
Ad Then there are two unit orthogonal vectors
1 ! Z2 ! PinðV; gÞ!OðV; gÞ ! 1 ½27
e1 , e2 2 V such that (e1 , e2 )2 = 1. The vector
If dimV is even, then the adjoint representation u(t) = e1 cos t þ e2 sin t is obtained from e1 by rotation
Ad(a)v = ava1 also yields an exact sequence like in the plane span fe1 , e2 g by the angle t 2 R. The
[27]; if it is odd, then the image of Ad is SO(V, g) and curve t 7! e1 u(t), 0  t  , connects the elements
the kernel is the four-element group f1, 1,
, 
g. 1 and 1 of Spin0 (V, g). Its image in SO0 (V, g), that
Given an orthonormal frame (e ) in (V, g) and is, the curve t 7! Ad(e1 u(t)), 0  t  , is closed:
a 2 Pin(V, g), one defines the orthogonal matrix Ad(1) = Ad(1). This fact is often expressed by
R(a) = (Rv (a)) by saying that ‘‘a spinor undergoing a rotation by 2
f v
changes sign.’’ There is no homomorphism – not
AdðaÞe  = ev R ðaÞ ½28 even a continuous map – f : SO0 (V, g) ! Spin0 (V, g)
If (V, g) is complex, then the algebras C‘(V, g) and such that Ad  f = id.
C‘(V, g) are isomorphic; this induces an iso-
morphism of the groups Pin(V, g) and Pin(V, g). Spinc Groups
If V = Cm , then this group is denoted by Pinm (C). If
V = Rkþl and g of signature (k, l), then one writes For the purposes of physics, to describe charged
Pin(V, g) = Pink, l . A similar notation is used for the fermions, and in the theory of the Seiberg–Witten
groups spin, see below. invariants, one needs the Spinc groups that are spinorial
extensions of the real orthogonal groups by the group U1
of ‘‘phase factors.’’ Assume V to be real and g of
Spin Groups
signature (k, l) so that the sequence [29] can be
The spin group Spin(V, g) = Pin(V, g) \ C‘0 (V, g) is written as
generated by products of all sequences of an even
1 ! Z2 ! Spink;l ! SOk;l ! 1
number of unit vectors. Since the algebras C‘0 (V, g)
and C‘0 (V, g) are isomorphic, so are the groups Define the action of Z2 = f1, 1g in Spink, l  U1 so
Spin(V, g) and Spin(V, g). Since (a) = a for a 2 that (1)(a, z) = ( a,  z). The quotient (Spink, l 
Spin(V, g), the twisted adjoint representation U1 )=Z2 = Spinck, l yields the extensions
reduces to the adjoint representation and yields the
exact sequence 1 ! U1 ! Spinck;l ! SOk;l ! 1
Ad
1 ! Z2 ! SpinðV; gÞ ! SOðV; gÞ ! 1 ½29 and
1 ! Spink;l ! Spinck;l ! U1 ! 1
For V = Cm , the spin group is denoted by Spinm (C).
Since Spinm (C) G1 (), the bilinear form B is For example, Spin3 = SU2 and Spinc3 = U2 .
invariant with respect to the action of this group.

Spin0 Groups Spin Groups in Dimensions ≤ 6


The connected component Spin0 (V, g) of the group The connected components of spin groups
Spin(V, g) coincides with Spin(V, g) if either the associated with orthogonal spaces of dimension ≤ 6 are
quadratic space (V, g) is complex or real and kl = 0. isomorphic to classical groups. They can be
In signature (k, l), the connected group Spin0k, l is explicitly described starting from the following
generated in C‘0k, l by all products of the form observations.

Consider the four-dimensional vector space See also: Dirac Operator and Dirac Field; Index
(of twistors) T over K, with a volume element Theorems; Relativistic Wave Equations Including Higher
vol 2 ^4 T. The six-dimensional vector space Spin Fields; Spinors and Spin Coefficients; Twistors.
V = ^2 T has a scalar product g defined by
g(u, v)vol = 2u ^ v for u, v 2 V. The quadratic form
g(u, u) is the Pfaffian, Pf(u). If u 2 V is represented Further Reading
by the corresponding isomorphism T  ! T and a 2
End T, then Pf(aua ) = det aPf(u). The last for- Adams JF (1981) Spin (8), triality, F4 and all that. In: Hawking
SW and Roček M (eds.) Superspace and Supergravity.
mula shows Spin0 (V, g) = SL(T), so that Spin6 (C) =
Cambridge: Cambridge University Press.
SL4 (C). For K = R, the Pfaffian is of signature (3, 3), so Atiyah MF, Bott R, and Shapiro A (1964) Clifford modules.
that Spin03, 3 = SL4 (R). A non-null vector v 2 V defines Topology 3(suppl. 1): 3–38.
a symplectic form on T  . The five-dimensional vector Baez JC (2002) The octonions. Bulletin of the American
space v? V is invariant with respect to the symplec- Mathematical Society 39: 145–205.
Brauer R and Weyl H (1935) Spinors in n dimensions. American
tic group Sp(T  , u) = Spin0 (v? , Pfjv? ). This shows that
Journal of Mathematics 57: 425–449.
Spin5 (C) = Sp4 (C) and Spin02, 3 = Sp4 (R). Spin groups Budinich P and Trautman A (1988) The Spinorial Chessboard,-
for other signatures in real dimensions 6 and 5 are Trieste Notes in Physics. Berlin: Springer.
obtained by considering appropriate real subspaces of Cartan É (1938) Théorie des spineurs. Actualités Scientifiques et
C6 and C5 , respectively. For example, [6] is used to Industrielles, No. 643 et 701. Paris: Hermann (English
transl.:The Theory of Spinors. Paris: Hermann, 1966).
show that Spin01, 5 = SL2 (H).
Chevalley C (1954) The Algebraic Theory of Spinors. New York:
Spin groups in dimensions 4 and lower are Columbia University Press.
similarly obtained from the observation that det is Clifford WK (1878) Applications of Grassmann’s extensive
a quadratic form on the four-dimensional space K(2) algebra. American Journal of Mathematics 1: 350–358.
and C‘0 (K(2), det) = K(2)  K(2). Clifford WK (1882) On the classification of geometric algebras.
In: Tucker R (ed.) Mathematical Papers by William Kingdon
Several spin groups are listed below.
Clifford, pp. 397–401. London: Macmillan.
The complex spin groups Dirac PAM (1928) The quantum theory of the electron.
Proceedings of the Royal Society of London A 117: 610–624.
Spin2 ðCÞ = C ; Spin3 ðCÞ = SL2 ðCÞ Eckmann B (1942) Gruppentheoretische Beweis des Satzes von
Hurwitz–Radon über die Komposition quadratischer Formen.
Spin4 ðCÞ = SL2 ðCÞ  SL2 ðCÞ Commentarii Mathematici Helvetici 15: 358–366.
Spin5 ðCÞ = Sp4 ðCÞ Karoubi M (1968) Algèbres de Clifford et K-théorie. Annales
Scientifiques de l’École Normale Superieure 4ème sér 1: 161–270.
Spin6 ðCÞ = SL4 ðCÞ Lipschitz RO (1886) Untersuchungen über die Summen von
Quadraten. Berlin: Max Cohen und Sohn.
The real, compact spin groups Lounesto P (2001) Clifford Algebras and Spinors, 2nd edn.
London Math. Soc. Lecture Note Series, vol. 286. Cambridge:
Spin2 = U1 ; Spin3 = SU2
Cambridge University Press.
Spin4 = SU2  SU2 ; Spin5 = Sp2 ðHÞ Pauli W (1927) Zur Quantenmechanik des magnetischen
Elektrons. Z. Physik 43: 601–623.
Spin6 = SU4
Penrose R and MacCallum MAH (1973) Twistor theory: an
The groups Spin0k, l for 1 4 k 4 l and k þ l  6 approach to the quantisation of fields and space-time. Physics
Report 6C(4): 241–316.
Spin01;1 = R ; Spin01;2 = SL2 ðRÞ Porteous IR (1995) Clifford Algebras and the Classical Groups,
Cambridge Studies in Advanced Mathematics, vol. 50. Cam-
Spin01;3 = SL2 ðCÞ bridge: Cambridge University Press.
Postnikov MM (1986) Lie groups and Lie algebras. Mir: Moscow.
Spin02;2 = SL2 ðRÞ  SL2 ðRÞ Sudbery A (1987) Division algebras (pseudo)orthogonal groups
and spinors. Journal of Physics A17: 939–955.
Spin01;4 = Sp1;1 ðHÞ Trautman A (1997) Clifford and the ‘‘square root’’ ideas.
Spin02;3 = Sp4 ðRÞ; Spin01;5 = SL2 ðHÞ Contemporary Mathematics 203: 3–24.
Trautman A and Trautman K (1994) Generalized pure spinors.
Spin02;4 = SU2;2 Journal of Geometry and Physics 15: 1–22.
Wall CTC (1963) Graded Brauer groups. Journal für die Reine
Spin03;3 = SL4 ðRÞ und Angewandte Mathematik 213: 187–199.

Cluster Expansion
R Kotecký, Charles University, Prague, Czech Republic, and the University of Warwick, UK

© 2006 Elsevier Ltd. All rights reserved.

Introduction

The method of cluster expansions in statistical physics provides a systematic way of computing power series for thermodynamic potentials (logarithms of partition functions) as well as correlations. It originated from the works of Mayer and others devoted to expansions for dilute gas.

Mayer Expansion

Consider a system of interacting particles with Hamiltonian

$$H_N(p_1,\dots,p_N; r_1,\dots,r_N) = \sum_{i=1}^{N} \frac{p_i^2}{2m} + \sum_{i,j=1}^{N} \Phi(r_i - r_j) \tag{1}$$

where $\Phi$ is a stable and regular pair potential. Namely, we assume that there exists $B \ge 0$ such that

$$\sum_{i,j=1}^{N} \Phi(r_i - r_j) \ge -BN \tag{2}$$

for all $N = 2, 3, \dots$ and all $(r_1,\dots,r_N) \in \mathbb{R}^{3N}$, and that

$$C(\beta) = \int \left| e^{-\beta\Phi(r)} - 1 \right| d^3r < \infty \tag{3}$$

for some $\beta > 0$ (and hence all $\beta > 0$). Basic thermodynamic quantities are given in terms of the grand-canonical partition function

$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{z^N}{N!} \int_{\mathbb{R}^{3N}\times V^N} e^{-\beta H_N}\, \frac{\prod d^3p_i \prod d^3r_i}{h^{3N}} = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \int_{V^N} e^{-\beta \sum_{i,j}\Phi(r_i - r_j)} \prod d^3r_i \tag{4}$$

In the second expression we absorbed the factor resulting from the integration over momenta into the (configurational) activity $\lambda = (2\pi m/\beta h^2)^{3/2} z$. In particular, the pressure $p$ and the density $\rho$ are defined by the thermodynamic limits (with $V \to \infty$ in the sense of Van Hove)

$$p(\beta,\lambda) = \frac{1}{\beta} \lim_{V\to\infty} \frac{1}{|V|} \log Z(\beta,\lambda,V) \tag{5}$$

$$\rho(\beta,\lambda) = \lim_{V\to\infty} \frac{1}{|V|}\, \lambda \frac{\partial}{\partial\lambda} \log Z(\beta,\lambda,V) \tag{6}$$

Mayer series are the expansions of $p$ and $\rho$ in powers of $\lambda$:

$$\beta p(\beta,\lambda) = \sum_{n=1}^{\infty} b_n \lambda^n \tag{7}$$

and

$$\rho(\beta,\lambda) = \sum_{n=1}^{\infty} n\, b_n \lambda^n \tag{8}$$

Mayer's idea for a systematic computation of the coefficients $b_n$ was based on a reformulation of the partition function $Z(\beta,\lambda,V)$ in terms of cluster integrals. Introducing the function

$$f(r) = e^{-\beta\Phi(r)} - 1 \tag{9}$$

and using $\mathcal{G}[N]$ to denote the set of all graphs on $N$ vertices $\{1,\dots,N\}$, we get

$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \int_{V^N} \prod_{i,j=1}^{N} \bigl(1 + f(r_i - r_j)\bigr) \prod d^3r_i = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{g\in\mathcal{G}[N]} w(g) \tag{10}$$

where

$$w(g) = \int_{V^N} \prod_{\{i,j\}\in g} f(r_i - r_j) \prod d^3r_i \tag{11}$$

Observing that the weight $w$ is multiplicative in the connected components (clusters) $g_1,\dots,g_k$ of the graph $g$,

$$w(g) = \prod_{\ell=1}^{k} w(g_\ell) \tag{12}$$

we can rewrite

$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{\{g_\ell\}} \prod_{\ell} w(g_\ell) \tag{13}$$

with the sum running over all disjoint collections $\{g_\ell\}$ of connected graphs with vertices in $\{1,\dots,N\}$. A straightforward exponential expansion can be used to show that, at least in the sense of formal power series,

$$\log Z(\beta,\lambda,V) = \sum_{n=1}^{\infty} \frac{\lambda^n}{n!} \sum_{g\in\mathcal{C}[n]} w(g) \tag{14}$$

where $\mathcal{C}[n]$ is the set of all connected graphs on $n$ vertices. Using $b_n^{(V)}$ to denote the coefficients

$$b_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g\in\mathcal{C}[n]} w(g) \tag{15}$$

and observing that the limits $\lim_{V\to\infty} (1/|V|)\, w(g)$ of cluster integrals exist, we get $b_n = \lim_{V\to\infty} b_n^{(V)}$. The convergence of Mayer series can be controlled directly by combinatorial estimates on the coefficients $b_n^{(V)}$. As a result, the radius of convergence of the series [7] and [8] can be proved to be at least $(C(\beta)\, e^{2\beta B+1})^{-1}$. A less direct proof is based on the linear integral Kirkwood–Salsburg equations in a suitable Banach space of correlation functions.

Similar combinatorial methods are available also for evaluation of the coefficients of the virial expansion of pressure in powers of gas density,

$$\beta p(\beta,\rho) = \sum_{n=1}^{\infty} \beta_n \rho^n \tag{16}$$

obtained by inverting [8] (notice that $b_1 = 1$) and inserting it into [7]. One obtains $\beta_n = \lim_{V\to\infty} \beta_n^{(V)}$ with

$$\beta_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g\in\mathcal{B}[n]} w(g) \tag{17}$$

where $\mathcal{B}[n] \subset \mathcal{C}[n]$ is the set of all 2-connected graphs on $\{1,\dots,n\}$; namely, those graphs that cannot be split into disjoint subgraphs by erasing one vertex (and all adjacent edges). The radius of convergence of the virial expansion turns out to be no less than $(C(\beta)\, e\, (e^{2\beta B} + 1))^{-1}$.

Abstract Polymer Models

An application of the ideas of Mayer expansions to lattice models is based on a reformulation of the partition function in terms of a polymer model, a formulation akin to [13] above. Namely, the partition function is rewritten as a sum over collections of pairwise compatible geometric objects, called polymers. Most often, compatibility means simply their disjointness.

While the reformulation of a "physical partition function" in terms of a polymer model (including the definition of compatibility) depends on particularities of a given lattice model and on the considered region of parameters (high-temperature, low-temperature, large external fields, etc.), the essence and results of cluster expansion may be conveniently formulated in terms of an abstract polymer model.

Let $G = (V, E)$ be any (possibly infinite) countable graph and suppose that a map $w : V \to \mathbb{C}$ is given. Vertices $v \in V$ are called abstract polymers, with two abstract polymers connected by an edge in the graph $G$ called incompatible. We shall refer to $w(v)$ as the weight of the abstract polymer $v$. For any finite $W \subset V$, we consider the induced subgraph $G[W]$ of $G$ spanned by $W$ and define

$$Z_W(w) = \sum_{I\subset W} \prod_{v\in I} w(v) \tag{18}$$

Here the sum runs over all collections $I$ of compatible abstract polymers or, in other words, over all independent sets $I$ of vertices in $W$ (no two vertices in $I$ are connected by an edge).

The partition function $Z_W(w)$ is an entire function in $w = \{w(v)\}_{v\in W} \in \mathbb{C}^{|W|}$ and $Z_W(0) = 1$. Hence, it is nonvanishing in some neighborhood of the origin $w = 0$ and its logarithm is, on this neighborhood, an analytic function yielding a convergent Taylor series

$$\log Z_W(w) = \sum_{X\in\mathcal{X}(W)} a_W(X)\, w^X \tag{19}$$

Here, $\mathcal{X}(W)$ is the set of all multi-indices $X : W \to \{0, 1, \dots\}$ and $w^X = \prod_v w(v)^{X(v)}$. Inspecting the formula for $a_W(X)$ in terms of the corresponding derivatives of $\log Z_W(w)$, it is easy to show that the Taylor coefficients $a_W(X)$ actually do not depend on $W$: $a_W(X) = a_{\operatorname{supp} X}(X)$, where $\operatorname{supp} X = \{v \in V : X(v) \ne 0\}$. As a result, there exist coefficients $a(X)$ such that

$$\log Z_W(w) = \sum_{X\in\mathcal{X}(W)} a(X)\, w^X \tag{20}$$

for every finite $W \subset V$.

The coefficients $a(X)$ can be obtained explicitly. One can pass from [18] to [20] in a similar way as passing from [10] to [13]. The starting point is to replace the restriction to compatible collections of abstract polymers in the sum [18] by the factor $\prod_{v,v'\in W} (1 + F(v,v'))$ with

$$F(v,v') = \begin{cases} 0 & \text{if } v \text{ and } v' \text{ are compatible} \\ -1 & \text{otherwise } (v \text{ and } v' \text{ connected by an edge from } G) \end{cases} \tag{21}$$

and to expand the product afterwards. The resulting formula is

$$a(X) = (X!)^{-1} \sum_{H\subset G(X)} (-1)^{|E(H)|} \tag{22}$$

Here, $G(X)$ is the graph with $|X| = \sum |X(v)|$ vertices induced from $G[\operatorname{supp} X]$ by replacing each of its vertices $v$ by the complete graph on $|X(v)|$ vertices, and $X!$ is the multifactorial $X! = \prod_{v\in\operatorname{supp} X} X(v)!$. The sum is over all connected subgraphs $H \subset G(X)$ spanned by the set of vertices of $G(X)$, and $|E(H)|$ is the number of edges of the graph $H$.
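The definitions [18] and [22] are easy to test on a small graph. The following sketch is a brute-force illustration, with hypothetical function names and exponential running time, so it is only meant for a handful of polymers: it evaluates Z_W(w) as a sum over independent sets, computes a(X) by summing signs over connected spanning subgraphs of G(X), and compares a truncated series [20] with the exact logarithm.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial, prod


def partition_function(edges, w):
    """Z_W(w) of [18]: sum over independent sets I of prod_{v in I} w(v)."""
    incompatible = {frozenset(e) for e in edges}
    total = 0.0
    for k in range(len(w) + 1):
        for subset in combinations(range(len(w)), k):
            if all(frozenset(p) not in incompatible for p in combinations(subset, 2)):
                total += prod(w[v] for v in subset)
    return total


def blown_up_graph(edges, X):
    """G(X): X[v] copies of v; two copies are adjacent if they come from
    the same vertex or from two incompatible vertices."""
    incompatible = {frozenset(e) for e in edges}
    nodes = [v for v, mult in enumerate(X) for _ in range(mult)]
    g_edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
               if nodes[i] == nodes[j] or frozenset((nodes[i], nodes[j])) in incompatible]
    return len(nodes), g_edges


def is_connected(n, edge_subset):
    """Union-find test that the n vertices form one component."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for i, j in edge_subset:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components == 1


def ursell(edges, X):
    """a(X) of [22], as an exact fraction (X must be nonzero)."""
    n, g_edges = blown_up_graph(edges, X)
    total = 0
    for k in range(len(g_edges) + 1):
        for subset in combinations(g_edges, k):
            if is_connected(n, subset):
                total += (-1) ** k
    return Fraction(total, prod(factorial(x) for x in X))


def log_series(edges, w, order):
    """Partial sum of [20] over multi-indices X with 0 < |X| <= order."""
    total = 0.0
    for X in product(range(order + 1), repeat=len(w)):
        if 0 < sum(X) <= order:
            total += float(ursell(edges, X)) * prod(wv ** x for wv, x in zip(w, X))
    return total
```

For a single polymer, `ursell([], (n,))` reproduces the Taylor coefficients of log(1 + w), and for two compatible polymers `ursell([], (1, 1))` vanishes because G(X) has no connected spanning subgraph.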

A useful property of the coefficients $a(X)$ is their alternating sign,

$$(-1)^{|X|+1} a(X) \ge 0 \tag{23}$$

More important than an explicit form of the coefficients $a(X)$ are the convergence criteria for the series [20]. One way to proceed is to find direct combinatorial bounds on the coefficients as expressed by [22]. While doing so, one has to take into account the cancellations arising from the presence of terms of opposite signs in [22]. Indeed, disregarding them would lead to a failure since, as is easy to verify, the number of connected graphs on $|X|$ vertices is bounded from below by $2^{(|X|-1)(|X|-2)/2}$. An alternative approach is to prove the convergence of [20] on polydisks $D_{W,R} = \{w : |w(v)| \le R(v) \text{ for } v \in W\}$ by induction in $|W|$, once a proper condition on the set of radii $R = \{R(v);\ v \in V\}$ is formulated. The most natural for the inductive proof (leading at the same time to the strongest claim) turns out to be the Dobrushin condition:

There exists a function $r : V \to [0, 1)$ such that, for each $v \in V$,

$$R(v) \le r(v) \prod_{v'\in\mathcal{N}(v)} \bigl(1 - r(v')\bigr) \tag{24}$$

Here $\mathcal{N}(v)$ is the set of vertices $v' \in V$ adjacent in the graph $G$ to the vertex $v$.

Using $\mathcal{X}$ to denote the set of all multi-indices $X : V \to \{0, 1, \dots\}$ with finite $|X| = \sum |X(v)|$ and saying that $X \in \mathcal{X}$ is a cluster if the graph $G(\operatorname{supp} X)$ is connected, we can summarize the cluster expansion claim for an abstract polymer model in the following way:

Theorem (Cluster expansion). There exists a function $a : \mathcal{X} \to \mathbb{R}$ that is nonvanishing only on clusters, so that for any set of radii $R$ satisfying the condition [24] with a sequence $\{r(v)\}$, the following holds true:

(i) For every finite $W \subset V$, and any contour weight $w \in D_{W,R}$, one has $Z_W(w) \ne 0$ and
$$\log Z_W(w) = \sum_{X\in\mathcal{X}(W)} a(X)\, w^X$$

(ii) $\sum_{X\in\mathcal{X}:\ \operatorname{supp} X \ni v} |a(X)|\, |w|^X \le -\log(1 - r(v))$.

Notice that we have obtained not only the absolute convergence of the Taylor series of $\log Z_W$ in the closed polydisk $D_{W,R}$, but also the bound (ii) (uniform in $W$) on the sum over all terms containing a fixed vertex $v$. Such a bound turns out to be very useful in applications of cluster expansions. It yields, eventually, bounds on various error terms, avoiding the need for an explicit evaluation of the number of clusters of "given size."

The restriction to compatible collections of polymers can actually be relaxed. Namely, replacing [18] by

$$Z_W(w) = \sum_{W'\subset W} \prod_{v\in W'} w(v) \prod_{v,v'\in W'} U(v,v') \tag{25}$$

with $U(v,v') \in [0, 1]$ (soft repulsive interaction), and the condition [24] by

$$R(v) \le r(v) \prod_{v'\ne v} \frac{1 - r(v')}{1 - U(v,v')\, r(v')} \tag{26}$$

one can prove that the partition function $Z_W(w)$ does not vanish on the polydisk $D_{W,R}$, which implies that the power series of $\log Z_W(w)$ converges absolutely on $D_{W,R}$.

Polymers that arise in typical applications are geometric objects endowed with a "support" in the considered lattice, say $\mathbb{Z}^d$, $d \ge 1$, and their weights satisfy the condition of translation invariance. Cluster expansions then yield an explicit power series for the pressure (resp. free energy) in the thermodynamic limit as well as its finite-volume approximation. To formulate it for an abstract polymer model, we assume that for each $x \in \mathbb{Z}^d$, an isomorphism $\tau_x : G \to G$ is given and that with each abstract polymer $v \in V$ a finite set $\Lambda(v) \subset \mathbb{Z}^d$ is associated so that $\Lambda(\tau_x(v)) = \Lambda(v) + x$ for every $v \in V$ and every $x \in \mathbb{Z}^d$. For any finite $W \subset V$ and any multi-index $X$, let $\Lambda(W) = \bigcup_{v\in W} \Lambda(v)$ and $\Lambda(X) = \Lambda(\operatorname{supp}(X))$. On the other hand, for any finite $\Lambda \subset \mathbb{Z}^d$, let $W(\Lambda) = \{v \in V : \Lambda(v) \subset \Lambda\}$. Assuming also that the weight $w : V \to \mathbb{C}$ is translation invariant, that is, $w(v) = w(\tau_x(v))$ for every $v \in V$ and every $x \in \mathbb{Z}^d$, we get an explicit expression for the "pressure" of the abstract polymer model in the thermodynamic limit

$$p = \lim_{\Lambda\to\infty} \frac{1}{|\Lambda|} \log Z_{W(\Lambda)}(w) = \sum_{X:\ \Lambda(X)\ni 0} \frac{a(X)\, w^X}{|\Lambda(X)|} \tag{27}$$

In addition, the finite-volume approximation can be explicitly evaluated, yielding

$$\log Z_{W(\Lambda)}(w) = p\,|\Lambda| - \sum_{X:\ \Lambda(X)\cap\Lambda^c \ne \emptyset} \frac{|\Lambda(X)\cap\Lambda|}{|\Lambda(X)|}\, a(X)\, w^X \tag{28}$$

Using the claim (ii), the second term can be bounded by $\text{const}\cdot|\partial\Lambda|$.

Cluster Expansions for Lattice Models

There is a variety of applications of cluster expansions to lattice models. As noticed above, the first step is always to rewrite the model in terms of a polymer representation.
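Before turning to concrete lattice models, the Dobrushin condition [24] and claim (i) of the theorem can be probed on a toy polymer model. The sketch below is illustrative only. The choice of a 5-cycle as incompatibility graph, the constant r(v) = 1/3 and all names are assumptions made for the example. The lower bound |Z_W(w)| ≥ ∏_v (1 − r(v)) used as a check is not stated in the text; it follows from (i) and (ii) because |log Z_W(w)| is at most the sum over v of the left-hand side of (ii).

```python
import cmath
import random
from itertools import combinations
from math import prod


def dobrushin_radii(neighbors, r):
    """Right-hand side of [24]: R(v) = r(v) * prod_{v' in N(v)} (1 - r(v'))."""
    return {v: r[v] * prod(1 - r[u] for u in neighbors[v]) for v in neighbors}


def partition_function(neighbors, w):
    """Z_W(w) of [18] as a sum over independent sets (weights may be complex)."""
    vertices = list(neighbors)
    total = 0
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            if all(b not in neighbors[a] for a, b in combinations(subset, 2)):
                total += prod(w[v] for v in subset)
    return total


# Toy model (assumption): five polymers, each incompatible with its two
# neighbors on a cycle.
n = 5
neighbors = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
r = {v: 1 / 3 for v in neighbors}
R = dobrushin_radii(neighbors, r)
lower_bound = prod(1 - r[v] for v in neighbors)

# Sample complex weights in the closed polydisk |w(v)| <= R(v).
random.seed(0)
samples = [
    {v: R[v] * random.random() * cmath.exp(2j * cmath.pi * random.random())
     for v in neighbors}
    for _ in range(200)
]
worst = min(abs(partition_function(neighbors, w)) for w in samples)

# Value at the negative real corner w(v) = -R(v) of the polydisk.
corner = partition_function(neighbors, {v: -R[v] for v in neighbors})
```

With r = 1/3 each radius equals (1/3)(2/3)^2 = 4/27, and none of the sampled points brings |Z_W(w)| below the bound (2/3)^5.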

High-Temperature Expansions

Let us illustrate this point in the simplest case of the Ising model. Its partition function in volume $\Lambda \subset \mathbb{Z}^d$, with free boundary conditions and vanishing external field, is

$$Z_\Lambda(\beta) = \sum_{\sigma_\Lambda} \exp\Bigl\{ \beta \sum_{\substack{x,y\in\Lambda \\ |x-y|=1}} \sigma_x \sigma_y \Bigr\} \tag{29}$$

Using the identity

$$e^{\beta\sigma_x\sigma_y} = \cosh\beta + \sigma_x\sigma_y \sinh\beta \tag{30}$$

it can be rewritten in the form

$$Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|B(\Lambda)|} \sum_{B} (\tanh\beta)^{|B|} \tag{31}$$

Here, the sum runs over all subsets $B$ of the set $B(\Lambda)$ of all bonds in $\Lambda$ (pairs of nearest-neighbor sites from $\Lambda$) such that each site is contained in an even number of bonds from $B$. Using $\Lambda(B)$ to denote the set of sites contained in bonds from $B$, we say that $B_1, B_2 \subset B(\Lambda)$ are disjoint if $\Lambda(B_1) \cap \Lambda(B_2) = \emptyset$. Splitting now $B$ into a collection $\mathcal{B} = \{B_1,\dots,B_k\}$ of its connected components, called (high-temperature) polymers, and using $\mathcal{B}(\Lambda)$ to denote the set of all polymers in $\Lambda$, we get

$$Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|B(\Lambda)|} \sum_{\mathcal{B}\subset\mathcal{B}(\Lambda)} \prod_{B\in\mathcal{B}} (\tanh\beta)^{|B|} \tag{32}$$

with the sum running over all collections $\mathcal{B}$ of mutually disjoint polymers. This expression is exactly of the form [18], once we define compatibility of polymers by their disjointness. Introducing the weights

$$w(B) = (\tanh\beta)^{|B|} \tag{33}$$

and taking the set $\mathcal{B}(\Lambda)$ of all polymers in $\Lambda$ for $W$, we get the polymer representation $Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|B(\Lambda)|} Z_{\mathcal{B}(\Lambda)}(w)$.

To apply the cluster expansion theorem, we have to find a function $r$ such that the right-hand side of [24] is positive and thus yields the radius of a polydisk of convergence. Taking $r(B) = \epsilon^{|B|}$ with a suitable $\epsilon$, we get

$$\prod_{B'\in\mathcal{N}(B)} \bigl(1 - r(B')\bigr) \ge e^{-2|B|} \tag{34}$$

which allows us to choose $R(B) = r(B)\, e^{-2|B|} = (\epsilon e^{-2})^{|B|}$. Indeed, to verify [34] we just notice that the number of polymers of size $n$ containing a fixed site is bounded by $\kappa^n$ with a suitable constant $\kappa$. Thus,

$$\sum_{B':\ \Lambda(B')\ni x} \epsilon^{|B'|} \le \sum_{n=1}^{\infty} \epsilon^n \kappa^n \le 1 \tag{35}$$

once $\epsilon$ is sufficiently small, and thus

$$\sum_{B'\in\mathcal{N}(B)} \epsilon^{|B'|} \le |\Lambda(B)| \le |B| \tag{36}$$

yielding [34] ($1 - t > e^{-2t}$ for $t < 1/2$). To have $w \in D_{W,R}$ (for any $W$) it is, for $R(B) = (\epsilon e^{-2})^{|B|}$, sufficient to take $\beta \le \beta_0$ with $\tanh\beta_0 = \epsilon e^{-2}$.

As a consequence, for $\beta \le \beta_0$ we can use the cluster expansion theorem to obtain a convergent power series in powers of $\tanh\beta$. In particular, using $\Lambda(X) = \bigcup_{B\in\operatorname{supp} X} \Lambda(B)$, we get the pressure by the explicit formula

$$\beta p(\beta) = \log 2 + d \log(\cosh\beta) + \sum_{X:\ \Lambda(X)\ni x} \frac{a(X)}{|\Lambda(X)|}\, w^X \tag{37}$$

for any fixed $x \in \mathbb{Z}^d$ (by translation invariance of the contributing terms, the choice of $x$ is irrelevant). The function $\beta p(\beta)$ is analytic on the region $\beta \le \beta_0$ since it is obtained as a uniformly absolutely convergent series of analytic terms $(\tanh\beta)^{|X|}$.

This type of high-temperature cluster expansion can be extended to a large class of models with Boltzmann factor in the form $\exp\{-\beta \sum_A U_A(\sigma)\}$, where $\sigma = (\sigma_x;\ x \in \mathbb{Z}^d)$ is the configuration with a priori on-site probability distribution $\nu(d\sigma_x)$ and $U_A$, for any finite $A \subset \mathbb{Z}^d$, are the multi-site interactions (depending only on $(\sigma_x;\ x \in A)$). Using the Mayer trick we can rewrite

$$\exp\Bigl\{ -\beta \sum_A U_A(\sigma) \Bigr\} = \prod_A \bigl(1 + f_A(\sigma)\bigr) \tag{38}$$

with $f_A(\sigma) = \exp\{-\beta U_A(\sigma)\} - 1$. Expanding the product we get a polymer representation with polymers $\mathcal{A}$ consisting of connected collections $\mathcal{A} = (A_1,\dots,A_k)$ with weights

$$w(\mathcal{A}) = \int \prod_{A\in\mathcal{A}} f_A(\sigma) \prod_{x\in\cup_{A\in\mathcal{A}} A} \nu(d\sigma_x) \tag{39}$$

Under appropriate bounds on the interactions $U_A$ and for $\beta$ small enough, using $\Lambda(\mathcal{A})$ to denote the set $\bigcup_{A\in\mathcal{A}} A$, we get

$$\sum_{\mathcal{A}:\ \Lambda(\mathcal{A})\ni x} |w(\mathcal{A})| \le 1 \tag{40}$$

This assumption allows us, as before in the case of the high-temperature Ising model, to apply the cluster expansion theorem, yielding an explicit series expansion for the pressure.

Correlations

Cluster expansions can be applied for evaluation of the decay of correlations. Let us consider, for the class of models discussed above, the expectation

$$\langle\Psi\rangle_\Lambda = \frac{1}{Z_\Lambda} \int \Psi(\sigma)\, e^{-\beta H_\Lambda(\sigma)} \prod_{x\in\Lambda} \nu(d\sigma_x) \tag{41}$$
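with $H_\Lambda(\phi) = \sum_A U_A(\phi)$ and a function $\psi$ depending only on variables $\phi_x$ on sites $x$ from a finite set $S \subset \Lambda \subset \mathbb{Z}^d$.

A convenient way of evaluating the expectation starts with the introduction of the modified partition function

$$Z_{\Lambda,\psi}(\alpha) = Z_\Lambda + \alpha Z_{\Lambda,\psi} = Z_\Lambda \big(1 + \alpha \langle\psi\rangle_\Lambda\big) \qquad [42]$$

Clearly,

$$\langle\psi\rangle_\Lambda = \frac{d \log Z_{\Lambda,\psi}(\alpha)}{d\alpha}\bigg|_{\alpha=0} \qquad [43]$$

Thus, one may get an expression for the expectation $\langle\psi\rangle_\Lambda$ by forming a polymer representation of $Z_{\Lambda,\psi}(\alpha)$ and isolating terms linear in $\alpha$ in the corresponding cluster expansion. For the first step, in the just cited high-temperature case with general multi-site interactions, we first enlarge the original set $\mathcal{A}(\Lambda)$ of all polymers in $\Lambda$ (consisting of connected collections $\mathcal{A} = (A_1, \dots, A_k)$) to $W_S(\Lambda) = \mathcal{A}(\Lambda) \cup \mathcal{A}_S(\Lambda)$, where $\mathcal{A}_S(\Lambda)$ is the set of all collections $(\mathcal{A}_1, \dots, \mathcal{A}_k)$ of polymers such that each of them intersects the set $S$ (the polymers $(\mathcal{A}_1, \dots, \mathcal{A}_k)$ are "glued" by $S$ into a single entity). Compatibility is defined as before by disjointness; in addition, any two collections from $\mathcal{A}_S(\Lambda)$ are declared to be incompatible, and any polymer $\mathcal{A}$ from $\mathcal{A}(\Lambda)$ intersecting $S$ is considered to be incompatible with any collection from $\mathcal{A}_S(\Lambda)$. Defining now $w_\psi(\mathcal{A}) = w(\mathcal{A})$ for $\mathcal{A} \in \mathcal{A}(\Lambda)$ and

$$w_\psi(\mathcal{A}) = \alpha \int \psi(\phi)\, e^{-\beta H(\phi)} \prod_{x \in \cup_{A \in \mathcal{A}_1 \cup \dots \cup \mathcal{A}_k} A \,\cup\, S} \nu(d\phi_x) \qquad [44]$$

for $\mathcal{A} = (\mathcal{A}_1, \dots, \mathcal{A}_k) \in \mathcal{A}_S(\Lambda)$, we get $Z_{\Lambda,\psi}(\alpha)$ exactly in the form [18],

$$Z_{\Lambda,\psi}(\alpha) = \sum_{I \subset W_S(\Lambda)} \prod_{\mathcal{A} \in I} w_\psi(\mathcal{A}) \qquad [45]$$

As a result, we have

$$\log Z_{\Lambda,\psi}(\alpha) = \sum_{X \in \mathcal{X}(W_S(\Lambda))} a(X)\, w_\psi^X \qquad [46]$$

which makes it easy to isolate the terms linear in $\alpha$: namely, the terms with multi-indices $X$ with $\operatorname{supp} X \cap \mathcal{A}_S(\Lambda)$ consisting of a single collection, say $\mathcal{A}_0$, that occurs with multiplicity one, $X(\mathcal{A}_0) = 1$. Explicitly, using

$$\mathcal{X}_{S,\mathcal{A}_0}(\Lambda) = \{X \in \mathcal{X}(W_S(\Lambda)) : \operatorname{supp} X \cap \mathcal{A}_S(\Lambda) = \{\mathcal{A}_0\},\ X(\mathcal{A}_0) = 1\} \qquad [47]$$

we get

$$\langle\psi\rangle_\Lambda = \sum_{\mathcal{A}_0 \in \mathcal{A}_S(\Lambda)} \ \sum_{X \in \mathcal{X}_{S,\mathcal{A}_0}(\Lambda)} a(X)\, w_\psi^X \qquad [48]$$

It is easy to show that, for sufficiently small $\beta$, the series on the right-hand side is absolutely convergent even if we extend $\mathcal{A}_S(\Lambda)$ to $\mathcal{A}_S = \cup_\Lambda \mathcal{A}_S(\Lambda)$ and $\mathcal{X}_{S,\mathcal{A}_0}(\Lambda)$ to $\mathcal{X}_{S,\mathcal{A}_0} = \cup_\Lambda \mathcal{X}_{S,\mathcal{A}_0}(\Lambda)$. As a result, we have an explicit expression for the limiting expectation $\langle\psi\rangle$ in terms of an absolutely convergent power series. This can be immediately applied to show that $|\langle\psi\rangle_\Lambda - \langle\psi\rangle|$ decays exponentially in the distance between $S$ and the complement of $\Lambda$. Indeed, it suffices to find a suitable bound on $\sum_X |a(X)|\,|w|^X$ with the sum running over all clusters $X$ reaching from the set $S$ to $\Lambda^c$. To this end one does not need to evaluate explicitly the number of clusters of given "diameter" $\operatorname{diam}(X) = \sum_{\mathcal{A}} X(\mathcal{A}) \operatorname{diam}(\Lambda(\mathcal{A})) = m$ with $m \ge \operatorname{dist}(S, \Lambda^c)$. The needed estimate is actually already contained in the condition (ii) from the cluster expansion theorem. It just suffices to choose a suitable $K$ and assume that $\beta$ is small enough to assure validity of [40] in a stronger form, $\sum_{\mathcal{A} :\, \Lambda(\mathcal{A}) \ni x} |w(\mathcal{A})|\, K^{|\Lambda(\mathcal{A})|} \le 1$, yielding eventually

$$\sum_{X :\, \operatorname{diam}(X) \ge \operatorname{dist}(S, \Lambda^c)} |a(X)|\,|w|^X \le K^{-\operatorname{dist}(S, \Lambda^c)}\, |S| \sum_{X :\, \cup_{\mathcal{A} \in \operatorname{supp} X} \Lambda(\mathcal{A}) \ni x} |a(X)|\,|w|^X K^{\sum_{\mathcal{A}} X(\mathcal{A}) |\Lambda(\mathcal{A})|} \le |S|\, K^{-\operatorname{dist}(S, \Lambda^c)} \qquad [49]$$

Exponential decay of correlations $\langle\psi_1; \psi_2\rangle_\Lambda = \langle\psi_1 \psi_2\rangle_\Lambda - \langle\psi_1\rangle_\Lambda \langle\psi_2\rangle_\Lambda$ (and of the limiting $\langle\psi_1; \psi_2\rangle$) in the distance between the supports of $\psi_1$ and $\psi_2$ can be established in a similar way by isolating terms proportional to $\alpha_1 \alpha_2$ in the cluster expansion of $\log Z_{\Lambda,\psi_1,\psi_2}(\alpha_1, \alpha_2)$ with

$$Z_{\Lambda,\psi_1,\psi_2}(\alpha_1, \alpha_2) = Z_\Lambda \big(1 + \alpha_1 \langle\psi_1\rangle_\Lambda + \alpha_2 \langle\psi_2\rangle_\Lambda + \alpha_1 \alpha_2 \langle\psi_1 \psi_2\rangle_\Lambda\big) \qquad [50]$$

The resulting claim can be readily generalized to one about the decay of the correlation $\langle\psi_1; \dots; \psi_k\rangle$ in terms of the shortest tree connecting the supports $S_1, \dots, S_k$ of the functions $\psi_1, \dots, \psi_k$.

Low-Temperature Expansions

Finally, in some models with symmetries, we can apply cluster expansion also at low temperatures. Let us illustrate it again in the case of the Ising model. This time, we take the partition function $Z^+_\Lambda(\beta)$ with plus boundary conditions. First, let us define for each nearest-neighbor bond $\langle x, y\rangle$ its dual as the $(d-1)$-dimensional closed unit hypercube orthogonal to the segment from $x$ to $y$ and bisecting it at its center. For a given configuration $\sigma_\Lambda$, we consider the boundary of the regions of constant spins consisting of the union $\partial(\sigma_\Lambda)$ of all hypercubes that are dual to nearest-neighbor bonds $\langle x, y\rangle$ for which $\sigma_x \ne \sigma_y$. The contours corresponding to $\sigma_\Lambda$ are now defined as the connected components of $\partial(\sigma_\Lambda)$. Notice that, under the fixed boundary condition, there is a one-to-one correspondence between configurations $\sigma_\Lambda$ and sets $\Gamma$ of mutually compatible (disconnected) contours in $\Lambda$.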
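The correspondence between configurations and boundaries described above can be checked by brute force on a small example. The following is a minimal sketch, assuming a two-dimensional $n \times n$ box with all spins outside it fixed to $+1$; the function names `boundary_bonds` and `check_correspondence` are hypothetical and not part of the original text. Each dual face is represented by the nearest-neighbor bond it bisects, so the set of unlike bonds stands in for $\partial(\sigma_\Lambda)$.

```python
from itertools import product


def boundary_bonds(spins, n):
    """Set of nearest-neighbor bonds <x, y> with sigma_x != sigma_y.

    `spins` maps sites (x, y) of the n x n box to +1 or -1; every site
    outside the box carries the plus boundary condition (+1). Each bond
    stands for the dual face that bisects it.
    """
    def spin(x, y):
        return spins[(x, y)] if 0 <= x < n and 0 <= y < n else 1

    bonds = set()
    for x in range(n):
        for y in range(n):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                neighbor = (x + dx, y + dy)
                if spin(x, y) != spin(*neighbor):
                    bonds.add(frozenset(((x, y), neighbor)))
    return frozenset(bonds)


def check_correspondence(n):
    """True if distinct configurations always give distinct boundaries."""
    sites = [(x, y) for x in range(n) for y in range(n)]
    seen = set()
    for values in product((1, -1), repeat=len(sites)):
        boundary = boundary_bonds(dict(zip(sites, values)), n)
        if boundary in seen:
            return False  # two configurations share one boundary
        seen.add(boundary)
    return len(seen) == 2 ** len(sites)


if __name__ == "__main__":
    print(check_correspondence(3))
```

The check enumerates all $2^{n^2}$ configurations, so it is only practical for very small boxes; it illustrates the injectivity of the map from configurations to boundaries and is not a proof.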
Observing that the number of faces in $\partial(\sigma_\Lambda)$ is just the sum of the areas $|\gamma|$ of the contours $\gamma \in \Gamma$, we get the polymer representation

$$Z^+_\Lambda(\beta) = e^{\beta |E(\Lambda)|} \sum_{\Gamma} \exp\Big\{-\beta \sum_{\gamma \in \Gamma} |\gamma|\Big\} \qquad [51]$$

where the sum is over all collections $\Gamma$ of disjoint contours in $\Lambda$. Here $E(\Lambda)$ is the set of all bonds $\langle x, y\rangle$ with at least one endpoint $x, y$ in $\Lambda$.

The condition [24] with $r(\gamma) = \epsilon^{|\gamma|}$ yields a similar bound on the weights $w(\gamma) = e^{-\beta|\gamma|}$ as in the high-temperature expansion. Verifying it, for $\beta$ sufficiently large, boils down to estimating the number of contours of size $n$ that contain a fixed site.

As a result, we can employ the cluster expansion theorem to get

$$\log Z^+_\Lambda(\beta) = \beta |E(\Lambda)| + \sum_{X :\, X \in \mathcal{X}(\mathcal{C}(\Lambda))} a(X)\, w^X \qquad [52]$$

with an explicit formula for the limit

$$p(\beta) = d\beta + \sum_{X :\, A(X) \ni 0} \frac{a(X)}{|A(X)|}\, w^X \qquad [53]$$

Here, $A(X)$ is the set of sites attached to contours from $\operatorname{supp} X$,

$$A(X) = \cup_{\gamma \in \operatorname{supp} X} A(\gamma) \qquad [54]$$

with

$$A(\gamma) = \{x \in \mathbb{Z}^d \mid \operatorname{dist}(x, \gamma) \le 1/2\} \qquad [55]$$

As a consequence of the fact that [53] is, for large $\beta$, an absolutely convergent sum of analytic terms $a(X) w^X = a(X)\, e^{-\beta \sum_\gamma X(\gamma) |\gamma|}$ (considered as functions of $\beta$), the function $p(\beta)$ is, for large $\beta$, analytic in $\beta$.

The fact that one can explicitly express the difference $\log Z^+_\Lambda(\beta) - |\Lambda|\, p(\beta)$ (cf. [28]) has found numerous applications in situations where one needs an accurate evaluation of the influence of the boundary of the region $\Lambda$ on the partition function. One such example is the study of the microscopic behavior of interfaces. The main idea is to use the explicit expression in the form

$$Z^+_\Lambda(\beta) = \exp\{p(\beta)|\Lambda|\} \exp\Big\{-\sum_{X :\, A(X) \cap \Lambda^c \ne \emptyset} \frac{|A(X) \cap \Lambda|}{|A(X)|}\, a(X)\, w^X\Big\} = \exp\{p(\beta)|\Lambda|\} \prod_{X :\, A(X) \cap \Lambda^c \ne \emptyset} (1 + f_X) \qquad [56]$$

Noticing that

$$f_X = \exp\Big\{-\frac{|A(X) \cap \Lambda|}{|A(X)|}\, a(X)\, w^X\Big\} - 1$$

does not vanish only if $A(X) \cap \Lambda \ne \emptyset$, we can expand the product to obtain "decorations" of the boundary $\partial\Lambda$ by clusters $f_X$. In the case of an interface these clusters can be incorporated into the weight of the interface, while on a fixed boundary they yield a "wall free energy."

The possibility of the (low-temperature) polymer representation of the partition function in terms of contours is based on the $+ \leftrightarrow -$ symmetry of the Ising model. In the absence of such a symmetry, cluster expansions can still be used, but in the framework of Pirogov–Sinai theory (see Pirogov–Sinai Theory).

Bibliographical Notes

Cluster expansions originated from the works of Ursell, Yvon, Mayer, and others and were first studied in terms of formal power series. The combinatorial and enumeration problems considered in this framework were summarized in Uhlenbeck and Ford (1962). For related topics in modern language, see Bergeron et al. (1998). The convergence results for Mayer and virial expansions for dilute gas were first proved in the works of Penrose, Lebowitz, Groeneveld, and Ruelle (see Ruelle (1969) for a detailed survey). General polymer models on a lattice were discussed by Gruber and Kunz (1971) (see also Simon (1993) for a discussion of high-temperature and low-temperature cluster expansions of lattice models). Abstract polymer models were introduced in Kotecký and Preiss (1986). An elegant proof of a general claim presented by Dobrushin (1996) was further extended and summarized by Scott and Sokal (2005). We follow their reformulation of the Dobrushin condition. Cluster expansions with a view to applications in quantum field theory are reviewed in Brydges (1986).

See also: Phase Transitions in Continuous Systems; Pirogov–Sinai Theory; Wulff Droplets.

Further Reading

Bergeron F, Labelle G, and Leroux P (1998) Combinatorial Species and Tree-Like Structures, Coll. Encyclopaedia of Mathematics and Its Applications, vol. 67. Cambridge: Cambridge University Press.
Brydges DC (1986) A short course on cluster expansions. In: Osterwalder K and Stora R (eds.) Critical Phenomena, Random Systems, Gauge Theories, pp. 129–183. Les Houches, Session XLIII, 1984. Amsterdam/New York: Elsevier.
Dobrushin RL (1996) Estimates of semi-invariants for the Ising model at low temperatures. In: Dobrushin RL, Minlos RA, Shukin MA, and Vershik AM (eds.) Topics in Statistical and Theoretical Physics, pp. 59–81. Providence, RI: American Mathematical Society.
Gruber C and Kunz H (1971) General properties of polymer systems. Communications in Mathematical Physics 22: 133–161.
Kotecký R and Preiss D (1986) Cluster expansion for abstract polymer models. Communications in Mathematical Physics 103: 491–498.
Ruelle D (1969) Statistical Mechanics: Rigorous Results, The Mathematical Physics Monograph Series. Reading, MA: Benjamin.
Scott AD and Sokal AD (2005) The repulsive lattice gas, the independent-set polynomial, and the Lovász local lemma. Journal of Statistical Physics 118: 1151–1261.
Simon B (1993) The Statistical Mechanics of Lattice Gases, Princeton Series in Physics, vol. 1. Princeton: Princeton University Press.
Uhlenbeck GE and Ford GW (1962) The theory of linear graphs with applications to the theory of the virial development of the properties of gases. In: de Boer J and Uhlenbeck GE (eds.) Studies in Statistical Mechanics, vol. I. Amsterdam: North-Holland.

Coherent States
S T Ali, Concordia University, Montreal, QC, Canada
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Very generally, a family of coherent states is a set of continuously labeled quantum states, with specific mathematical and physical properties, in terms of which arbitrary quantum states can be expressed as linear superpositions. Since coherent states are continuously labeled, they form overcomplete sets of vectors in the Hilbert space of states. Originally these states were introduced into physics by Schrödinger (1926), as a family of quantum states in terms of which the transition from quantum to classical mechanics could be conveniently studied. These states have the minimal uncertainty property, in the sense that they saturate the Heisenberg uncertainty relations. The name coherent state was applied when these states were rediscovered in the context of quantum optical radiation by Glauber, Klauder, and Sudarshan. It was demonstrated that in these states the correlation functions of the quantum optical field factorize as they do in classical optics, so that the optical field has a near-classical behavior, with the optical beam being coherent. In this article, we shall refer to these originally studied coherent states as canonical coherent states (CCS).

The canonical coherent states, apart from their use in quantum optics, have also been found to be extremely useful in computations in atomic and molecular physics, in quantum statistical mechanics, and in certain areas of mathematics and mathematical physics, including harmonic analysis, symplectic geometry, and quantization theory. Their wide applicability has prompted the search for other families of states sharing similar mathematical and physical properties. These other families of states are usually called generalized coherent states, even when there is no link to optical coherence in such studies.

Some Properties of CCS

In addition to the minimal uncertainty property, the canonical coherent states have a number of analytical and group-theoretical properties which are taken as starting points in looking for generalizations. We now define the canonical coherent states mathematically and enumerate a few of these properties.

Suppose that the vectors $|0\rangle, |1\rangle, \dots, |n\rangle, \dots$ correspond to quantum states of $0, 1, \dots, n, \dots$ excitons, respectively. The Hilbert space of these states, in which they form an orthonormal basis, is often known as Fock space. The canonical coherent states are then defined in terms of this basis, for each complex number $z$, by the analytic expansion:

$$|z\rangle = e^{-|z|^2/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, |n\rangle \qquad [1]$$

The states $|z\rangle$ are normalized to unity: $\langle z|z\rangle = 1$. They satisfy the formal eigenvalue equation

$$a|z\rangle = z|z\rangle \qquad [2]$$

where $a$ is the annihilation operator for excitons, which acts on the basis vectors (Fock states) $|n\rangle$ as follows:

$$a|n\rangle = \sqrt{n}\, |n-1\rangle \qquad [3]$$

Its adjoint $a^\dagger$ has the action

$$a^\dagger|n\rangle = \sqrt{n+1}\, |n+1\rangle \qquad [4]$$

and

$$[a, a^\dagger] = a a^\dagger - a^\dagger a = I \qquad [5]$$

$I$ being the identity operator on Fock space. Introducing the self-adjoint operators $Q$ and $P$, of position and momentum, respectively,

$$Q = \frac{a + a^\dagger}{\sqrt{2}}, \qquad P = \frac{a - a^\dagger}{i\sqrt{2}} \qquad [6]$$

it is possible to demonstrate the minimal uncertainty property referred to above (we take $\hbar = 1$):

$$\langle\Delta Q\rangle \langle\Delta P\rangle = \tfrac{1}{2} \qquad [7]$$

where for any observable $A$,

$$\langle\Delta A\rangle = \big[\langle z|A^2|z\rangle - \langle z|A|z\rangle^2\big]^{1/2}$$

is its dispersion in the state $|z\rangle$.
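The three properties just listed (unit norm, the eigenvalue equation [2], and the minimal uncertainty product [7]) can be checked numerically on a truncated Fock space. The following is a minimal sketch, assuming a cutoff at `dim` basis states; the function names and the cutoff are illustrative choices, not part of the original text, and truncation introduces a small error that is negligible only when $|z|^2$ is much smaller than the cutoff.

```python
import math


def coherent_state(z, dim=60):
    """Coefficients <n|z>, n = 0..dim-1, of the expansion [1] (truncated)."""
    amplitude = math.exp(-abs(z) ** 2 / 2)
    coeffs, term = [], complex(1.0)  # term holds z**n / sqrt(n!)
    for n in range(dim):
        coeffs.append(amplitude * term)
        term = term * z / math.sqrt(n + 1)
    return coeffs


def lower(c):
    """Annihilation operator [3]: (a c)_n = sqrt(n + 1) * c_{n+1}."""
    return [math.sqrt(n + 1) * c[n + 1] for n in range(len(c) - 1)] + [0j]


def raise_(c):
    """Creation operator [4]: (a† c)_n = sqrt(n) * c_{n-1} (top level cut off)."""
    return [0j] + [math.sqrt(n) * c[n - 1] for n in range(1, len(c))]


def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def apply_Q(c):
    """Position operator Q = (a + a†) / sqrt(2) from [6]."""
    return [(x + y) / math.sqrt(2) for x, y in zip(lower(c), raise_(c))]


def apply_P(c):
    """Momentum operator P = (a - a†) / (i sqrt(2)) from [6]."""
    return [(x - y) / (1j * math.sqrt(2)) for x, y in zip(lower(c), raise_(c))]


def dispersion(op, c):
    """Dispersion [<A^2> - <A>^2]^(1/2) of the operator `op` in the state c."""
    mean = inner(c, op(c)).real
    mean_sq = inner(c, op(op(c))).real
    return math.sqrt(mean_sq - mean ** 2)


if __name__ == "__main__":
    z = 1.2 + 0.7j
    c = coherent_state(z)
    print("norm squared:", inner(c, c).real)
    print("max |a c - z c|:", max(abs(x - z * y) for x, y in zip(lower(c), c)))
    print("dispersion product:", dispersion(apply_Q, c) * dispersion(apply_P, c))
```

For moderate $|z|$ the printed norm should be close to 1, the eigenvalue residual close to 0, and the dispersion product close to 1/2, up to floating-point and truncation error.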
One can also prove the resolution of the identity,

$$\int_{\mathbb{C}} |z\rangle\langle z|\, \frac{dq\, dp}{2\pi} = I \qquad [8]$$

where $z = (1/\sqrt{2})(q - ip)$ has been written in terms of its real and imaginary parts $(1/\sqrt{2})q$ and $(1/\sqrt{2})p$, respectively. The above operator integral is to be understood in the weak sense, as will be explained later. Equation [8] incorporates the mathematical fact that the set of vectors $|z\rangle$ is overcomplete in the Hilbert space. Indeed, using [8] any vector $|\phi\rangle$ in the Hilbert space can be written as a linear (integral) superposition of these states:

$$|\phi\rangle = \int_{\mathbb{C}} \Phi(z)\, |z\rangle\, \frac{dq\, dp}{2\pi}$$

where $\Phi$ is the component function, $\Phi(z) = \langle z|\phi\rangle$. Thus, the coherent states $|z\rangle$ form a continuously labeled total set of vectors in the Hilbert space and since this space is separable, they are an overcomplete set.

Analytic properties of the vectors $|z\rangle$ emerge when the scalar product $\langle\phi|z\rangle$ is taken with respect to an arbitrary vector $|\phi\rangle$ in Fock space. From [1] it is clear that

$$F(z) = \langle\phi|z\rangle = e^{-|z|^2/2} f(z)$$

where $f$ is an entire analytic function in the complex variable $z$. Moreover, the mapping $\phi \mapsto f$ is an isometric embedding of the Fock space onto the Hilbert space of analytic functions, with respect to the norm

$$\|f\| = \Big[\int_{\mathbb{C}} |f(z)|^2\, d\nu(z, \bar{z})\Big]^{1/2} \qquad [9]$$

defined by the measure $d\nu(z, \bar{z}) = (1/2\pi)\, e^{-|z|^2}\, dq\, dp$.

Group-theoretical properties of the CCS can be demonstrated by noting that

$$|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}\, |0\rangle \quad \text{and} \quad a|0\rangle = 0$$

using which [1] can be recast into the form

$$|z\rangle = e^{-|z|^2/2}\, e^{z a^\dagger} |0\rangle = U(z)|0\rangle, \qquad U(z) = e^{z a^\dagger - \bar{z} a} \qquad [10]$$

The vectors $|z\rangle$ and the unitary operator $U(z)$ can be reexpressed in terms of the real variables $q, p$ and the operators $Q, P$ as

$$|z\rangle = |q, p\rangle = U(q, p)|0\rangle, \qquad U(q, p) = e^{i(pQ - qP)} \qquad [11]$$

The operators $U(q, p)$ realize a (projective) unitary, irreducible representation of the Weyl–Heisenberg group, which is the group whose Lie algebra has the generators $Q$, $P$, and $I$, obeying the commutation relations $[Q, P] = iI$. The existence of the resolution of the identity [8] is the statement of the fact that this representation is square integrable (a notion which will be elaborated upon in the section "Some Examples"), which gives us the next paradigm for building coherent states, namely by the action, on a fixed vector, of the unitary operators of a square-integrable representation of a locally compact group.

The above range of properties, which are enjoyed by the CCS, cannot all be expected to hold when looking for generalizations. It then becomes necessary to adopt one or other of these properties as the starting point and to proceed from there. In so doing, it is best first to set down a general definition of coherent states, involving a minimal mathematical structure. Motivated more by possible applications to physics, we do this in the following section.

General Definition

Let $\mathfrak{H}$ be an abstract, separable Hilbert space over the complexes, $X$ a locally compact space and $d\nu$ a measure on $X$. Let $|x, i\rangle$ be a family of vectors in $\mathfrak{H}$, defined for each $x$ in $X$ and $i = 1, 2, 3, \dots, N$, where $N$ is usually a finite integer, although it could also be infinite. We assume that this set of vectors possesses the following properties:

1. For each $i$, the mapping $x \mapsto |x, i\rangle$ is weakly continuous, that is, for each vector $|\phi\rangle$ in $\mathfrak{H}$, the function $\Phi_i(x) = \langle x, i|\phi\rangle$ is continuous (in the topology of $X$).
2. For each $x$ in $X$, the vectors $|x, i\rangle$, $i = 1, 2, \dots, N$, are linearly independent.
3. The resolution of the identity

$$\sum_{i=1}^{N} \int_X |x, i\rangle\langle x, i|\, d\nu(x) = I_{\mathfrak{H}} \qquad [12]$$

holds in the weak sense on the Hilbert space $\mathfrak{H}$, that is, for any two vectors $|\phi\rangle, |\psi\rangle$ in $\mathfrak{H}$, the following equality holds:

$$\sum_{i=1}^{N} \int_X \langle\phi|x, i\rangle\langle x, i|\psi\rangle\, d\nu(x) = \langle\phi|\psi\rangle$$

A set of vectors $|x, i\rangle$ satisfying the above three properties is called a family of generalized vector coherent states. In case $N = 1$, the set is called a family of generalized coherent states. Sometimes the resolution of the identity condition is replaced by a weaker
condition, with the vectors $|x, i\rangle$ simply forming a total set in $\mathfrak{H}$ and the functions $\Phi_i(x) = \langle x, i|\phi\rangle$, as $|\phi\rangle$ runs through $\mathfrak{H}$, forming a reproducing kernel Hilbert space. Alternatively, the identity on the right-hand side of [12] could also be replaced by a bounded, positive operator $T$ with bounded inverse. In this case, the term frame is also used for the family of generalized coherent states. For physical applications, however, the resolution of the identity condition is always assumed to hold, although the measure $d\nu$ could be of a very general nature (possibly also singular). The objective in all these cases is to ensure that an arbitrary vector $|\phi\rangle$ be expressible as a linear (integral) combination of these vectors. Indeed, [12] is immediately seen to imply that

$$|\phi\rangle = \sum_{i=1}^{N} \int_X \Phi_i(x)\, |x, i\rangle\, d\nu(x) \qquad [13]$$

where $\Phi_i(x) = \langle x, i|\phi\rangle$.

Associated to a family of generalized coherent states on a Hilbert space $\mathfrak{H}$, there is an intrinsic isomorphism between this space and a Hilbert space of (in general, vector valued) continuous functions over $X$. Using this isomorphism, it is always possible to look upon coherent states as a family of continuous functions which are square integrable with respect to the measure $d\nu$. To demonstrate this, we note that, in view of [12], for each vector $|\phi\rangle$ in $\mathfrak{H}$, the vector-valued function $\boldsymbol{\Phi}(x)$ on $X$, with components $\Phi_i(x) = \langle x, i|\phi\rangle$, $i = 1, 2, \dots, N$, satisfies the norm condition

$$\sum_{i=1}^{N} \int_X |\Phi_i(x)|^2\, d\nu(x) = \|\phi\|^2_{\mathfrak{H}}$$

This means that the set of vectors $\boldsymbol{\Phi}$, as $|\phi\rangle$ runs through $\mathfrak{H}$, is a closed subspace of the Hilbert space $L^2_{\mathbb{C}^N}(X, d\nu)$ of $N$-vector-valued functions on $X$. Let us denote this subspace by $\mathfrak{H}_K$ and note that this space is a reproducing kernel Hilbert space with a matrix-valued kernel $K(x, y)$ having matrix elements

$$K(x, y)_{ij} = \langle x, i|y, j\rangle, \qquad i, j = 1, 2, \dots, N \qquad [14]$$

and enjoying the properties

$$K(x, y)_{ij} = \overline{K(y, x)_{ji}}, \qquad K(x, x)_{ii} > 0 \qquad [15]$$

and

$$\sum_{\ell=1}^{N} \int_X K(x, z)_{i\ell}\, K(z, y)_{\ell j}\, d\nu(z) = K(x, y)_{ij} \qquad [16]$$

If $e^i$, $i = 1, 2, \dots, N$, are the vectors constituting the canonical basis of $\mathbb{C}^N$, then for each $x$ in $X$ and $i = 1, 2, \dots, N$, the vector-valued function $\boldsymbol{\xi}^i_x$ on $X$, defined by $\boldsymbol{\xi}^i_x(y) = K(y, x)\, e^i$, is the image in $\mathfrak{H}_K$ of the generalized vector coherent state $|x, i\rangle$, under the above-mentioned isometry. The vectors $\boldsymbol{\xi}^i_x$ span the space $\mathfrak{H}_K$ and for an arbitrary element $\boldsymbol{\Phi}$ of this Hilbert space, the reproducing property [16] of the kernel implies the relation

$$\int_X K(x, y)\, \boldsymbol{\Phi}(y)\, d\nu(y) = \boldsymbol{\Phi}(x) \qquad [17]$$

Conversely, given any reproducing kernel Hilbert space, with a kernel satisfying the relations [15] and [16], generalized coherent states can be constructed as above in terms of this kernel. Mathematically, therefore, generalized coherent states are just the set of vectors naturally defined by the kernel in a reproducing kernel Hilbert space.

Some Examples

We present in this section some of the more commonly used types of coherent states, as illustrations of the general structure given above.

A large class of generalizations of the canonical coherent states [1] is obtained by a simple modification of their analytic structure. Let $x_1 \le x_2 \le \dots \le x_n \le \dots$ be an infinite sequence of positive numbers ($x_1 \ne 0$). Define $x_n! = x_1 x_2 \cdots x_n$ and by convention set $x_0! = 1$. In the same Fock space in which the CCS were described, we now define the related deformed or nonlinear coherent states via the analytic expansion

$$|z\rangle = \mathcal{N}(|z|^2)^{-1/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{x_n!}}\, |n\rangle \qquad [18]$$

The normalization factor $\mathcal{N}(|z|^2)$ is chosen so that $\langle z|z\rangle = 1$. These generalized coherent states are overcomplete in the Fock space and satisfy a resolution of the identity of the type

$$\int_{\mathcal{D}} |z\rangle\langle z|\, \mathcal{N}(|z|^2)\, d\nu(z, \bar{z}) = I \qquad [19]$$

$\mathcal{D}$ being an open disk in the complex plane of radius $L$, the radius of convergence of the series $\sum_{n=0}^{\infty} z^n/\sqrt{x_n!}$. (In the case of the CCS, $L = \infty$.) The measure $d\nu$ is generically of the form $d\theta\, d\lambda(r)$ (for $z = re^{i\theta}$), where $d\lambda$ is related to the $x_n!$ through the moment condition

$$\frac{x_n!}{2\pi} = \int_0^L r^{2n}\, d\lambda(r), \qquad n = 0, 1, 2, \dots \qquad [20]$$

This means that once the quantities $x_n!$ are specified, the measure $d\lambda$ is to be determined by solving the
moment problem [20], which of course may not always have a solution. This puts a constraint on the type of sequences $\{x_n\}$ which may be used in the construction.

Once again, we see that for an arbitrary vector $|\phi\rangle$ in the Fock space, the function $F(z) = \langle\phi|z\rangle$, of the complex variable $z$, is of the form $F(z) = \mathcal{N}(|z|^2)^{-1/2} f(z)$, where $f$ is an analytic function on the domain $\mathcal{D}$. The reproducing kernel associated to these coherent states is

$$K(\bar{z}, z') = \langle z|z'\rangle = \big[\mathcal{N}(|z|^2)\, \mathcal{N}(|z'|^2)\big]^{-1/2} \sum_{n=0}^{\infty} \frac{(\bar{z} z')^n}{x_n!} \qquad [21]$$

By analogy with [2], one can define a generalized annihilation operator $A$ by its action on the vectors $|z\rangle$,

$$A|z\rangle = z|z\rangle \qquad [22]$$

and its adjoint operator $A^\dagger$. These act on the Fock states $|n\rangle$ as follows:

$$A|n\rangle = \sqrt{x_n}\, |n-1\rangle, \qquad A^\dagger|n\rangle = \sqrt{x_{n+1}}\, |n+1\rangle \qquad [23]$$

Depending on the exact values of the quantities $x_n$, these two operators, together with the identity $I$ and all their commutators, could generate a wide range of algebras including various deformed quantum algebras. The term nonlinear, as often applied to these generalized coherent states, comes again from quantum optics, where many such families of states are used in studying the interaction between the radiation field and atoms, and the strength of the interaction itself depends on the frequency of radiation. Of course, these coherent states will not in general have either the group-theoretical or the minimal uncertainty properties of the CCS.

The following is an example of generalized coherent states of the above type, built over the unit disk, $\mathcal{D} = \{z \in \mathbb{C} \mid |z| < 1\}$: on the Fock space, we define the states

$$|z\rangle = (1 - r^2)^{\kappa} \sum_{n=0}^{\infty} \Big[\frac{(2\kappa)_n}{n!}\Big]^{1/2} z^n\, |n\rangle, \qquad r = |z| \qquad [24]$$

where $\kappa = 1, 3/2, 2, 5/2, \dots$, and

$$(a)_m = \frac{\Gamma(a + m)}{\Gamma(a)} = a(a + 1)(a + 2) \cdots (a + m - 1)$$

Comparing [24] with [18] we see that $x_n = n/(2\kappa + n - 1)$ so that $\lim_{n \to \infty} x_n = 1$. Thus, the infinite sum is convergent for any $z$ lying in the unit disk. These generalized coherent states arise from representations of the group SU(1, 1) belonging to the discrete series, each irreducible representation being labeled by a specific value of the index $\kappa$. The associated Hilbert space of functions, analytic on the unit disk, is a subspace of $L^2(\mathcal{D}, d\nu_\kappa)$, with

$$d\nu_\kappa(z, \bar{z}) = (2\kappa - 1)\, \frac{(1 - r^2)^{2\kappa - 2}}{\pi}\, r\, dr\, d\theta, \qquad z = re^{i\theta}$$

which can be obtained by solving the moment problem [20]. The resolution of the identity satisfied by these states is

$$\frac{2\kappa - 1}{\pi} \int_{\mathcal{D}} |z\rangle\langle z|\, \frac{r\, dr\, d\theta}{(1 - r^2)^2} = I \qquad [25]$$

The associated generalized creation and annihilation operators are

$$A|n\rangle = \sqrt{\frac{n}{2\kappa + n - 1}}\, |n-1\rangle, \qquad A^\dagger|n\rangle = \sqrt{\frac{n + 1}{2\kappa + n}}\, |n+1\rangle \qquad [26]$$

so that, clearly, $[A, A^\dagger] \ne I$.

Operators $A$ and $A^\dagger$ of the general type defined in [23] are also known as ladder operators. When such operators appear as generators of representations of Lie algebras, their eigenvectors (see [22]) are usually called Barut–Girardello coherent states. As an example, the representation of the Lie algebra of SU(1, 1) on the Fock space is generated by the three operators $K_+$, $K_-$, and $K_3$, which satisfy the commutation relations

$$[K_3, K_\pm] = \pm K_\pm, \qquad [K_-, K_+] = 2K_3 \qquad [27]$$

They act on the vectors $|n\rangle$ as follows:

$$K_-|n\rangle = \sqrt{n(2\kappa + n - 1)}\, |n-1\rangle, \qquad K_+ = K_-^\dagger, \qquad K_3|n\rangle = (\kappa + n)|n\rangle \qquad [28]$$

Thus, $K_-|0\rangle = 0$ and

$$|n\rangle = \frac{1}{\sqrt{n!\,(2\kappa)_n}}\, K_+^n |0\rangle$$

The Barut–Girardello coherent states $|z\rangle$ are now defined as the formal eigenvectors of the ladder operator $K_-$:

$$K_-|z\rangle = z|z\rangle, \qquad z \in \mathbb{C} \qquad [29]$$

They have the analytic form

$$|z\rangle = \sqrt{\frac{|z|^{2\kappa - 1}}{I_{2\kappa - 1}(2|z|)}}\ \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!\,(2\kappa + n - 1)!}}\, |n\rangle \qquad [30]$$
where $I_\nu(x)$ is the order-$\nu$ modified Bessel function of the first kind. These coherent states satisfy the resolution of the identity,

$$\frac{2}{\pi} \int_{\mathbb{C}} |z\rangle\langle z|\, K_{2\kappa - 1}(2r)\, I_{2\kappa - 1}(2r)\, r\, dr\, d\theta = I, \qquad z = re^{i\theta} \qquad [31]$$

where again, $K_\nu(x)$ is the order-$\nu$ modified Bessel function of the second kind.

A nonanalytic extension of the expression [18] is often used to define generalized coherent states associated to physical Hamiltonians having pure point spectra. These coherent states, known as Gazeau–Klauder coherent states, are labeled by action–angle variables. Suppose that we are given the physical Hamiltonian $H = \sum_{n=0}^{\infty} E_n |n\rangle\langle n|$, with $E_0 = 0$, that is, it has the energy eigenvalues $E_n$ and eigenvectors $|n\rangle$, which we assume to form an orthonormal basis for the Hilbert space of states $\mathfrak{H}$. Let us write the eigenvalues as $E_n = \omega \epsilon_n$ by introducing a sequence of dimensionless quantities $\{\epsilon_n\}$ ordered as: $0 = \epsilon_0 < \epsilon_1 < \epsilon_2 < \cdots$. Then, for all $J \ge 0$ and $\gamma \in \mathbb{R}$, the Gazeau–Klauder coherent states are defined as

$$|J, \gamma\rangle = \mathcal{N}(J)^{-1/2} \sum_{n=0}^{\infty} \frac{J^{n/2}\, e^{-i\epsilon_n \gamma}}{\sqrt{\epsilon_n!}}\, |n\rangle \qquad [32]$$

where again $\mathcal{N}$ is a normalization factor, which turns out to be dependent on $J$ only. These coherent states satisfy the temporal stability condition

$$e^{-iHt} |J, \gamma\rangle = |J, \gamma + \omega t\rangle \qquad [33]$$

and the action identity

$$\langle J, \gamma|H|J, \gamma\rangle_{\mathfrak{H}} = \omega J \qquad [34]$$

While these generalized coherent states do form an overcomplete set in $\mathfrak{H}$, the resolution of the identity is generally not given by an integral relation of the type [12].

For the second set of examples of generalized coherent states, we take the group-theoretical structure of the CCS as the point of departure. Let $G$ be a locally compact group and suppose that it has a continuous, irreducible representation on a Hilbert space $\mathfrak{H}$ by unitary operators $U(g)$, $g \in G$. This representation is called square integrable if there exists a nonzero vector $|\psi\rangle$ in $\mathfrak{H}$ for which the integral

$$c(\psi) = \int_G |\langle\psi|U(g)\psi\rangle|^2\, d\mu(g) \qquad [35]$$

converges. Here $d\mu$ is a Haar measure of $G$, which for definiteness, we take to be the left-invariant measure. (The value of the above integral is independent of whether the left- or the right-invariant measure is used, so we could just as well have used the right-invariant measure.) A vector $|\psi\rangle$, satisfying [35], is said to be admissible, and it can be shown that the existence of one such vector guarantees the existence of an entire dense set of such vectors in $\mathfrak{H}$. Moreover, if the group $G$ is unimodular, that is, if the left- and the right-invariant measures coincide, then the existence of one admissible vector implies that every vector in $\mathfrak{H}$ is admissible. Given a square-integrable representation and an admissible vector $|\psi\rangle$, let us define the vectors

$$|g\rangle = \frac{1}{\sqrt{c(\psi)}}\, U(g)|\psi\rangle \qquad [36]$$

for all $g$ in the group $G$. These vectors are to be seen as the analogs of the canonical coherent states [11], written there in terms of the representation of the Weyl–Heisenberg group. Next, it can be shown that the resolution of the identity

$$\int_G |g\rangle\langle g|\, d\mu(g) = I_{\mathfrak{H}} \qquad [37]$$

holds on $\mathfrak{H}$. Thus, the vectors $|g\rangle$ constitute a family of generalized coherent states. The functions $F(g) = \langle g|\phi\rangle$ for all vectors $|\phi\rangle$ in $\mathfrak{H}$ are square integrable with respect to the measure $d\mu$ and the set of such functions, which in fact are continuous in the topology of $G$, forms a closed subspace of $L^2(G, d\mu)$. Furthermore, the mapping $\phi \mapsto F$ is a linear isometry between $\mathfrak{H}$ and $L^2(G, d\mu)$ and under this isometry the representation $U$ gets mapped to a subrepresentation of the left regular representation of $G$ on $L^2(G, d\mu)$.

A typical example of the above construction is provided by the affine group, $G_{\mathrm{Aff}}$. This is the group of all $2 \times 2$ matrices of the type

$$g = \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \qquad [38]$$

$a$ and $b$ being real numbers with $a \ne 0$. We shall also write $g = (b, a)$. This group is nonunimodular, with the left-invariant measure being given by $d\mu(b, a) = (1/a^2)\, db\, da$. (The right-invariant measure is $(1/a)\, db\, da$.) The affine group has a unitary irreducible representation on the Hilbert space $L^2(\mathbb{R}, dx)$. Vectors in $L^2(\mathbb{R}, dx)$ are measurable functions $\phi(x)$ of the real variable $x$ and the (unitary) operators $U(b, a)$ of this representation act on them in the manner

$$(U(b, a)\phi)(x) = \frac{1}{\sqrt{|a|}}\, \phi\Big(\frac{x - b}{a}\Big) \qquad [39]$$
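The action [39] can be explored numerically. The following is a minimal sketch, assuming a Gaussian test function and a simple midpoint rule on a finite interval; the names `compose`, `U`, and `l2_norm_sq`, the integration window, and the step count are illustrative choices, not part of the original text. It checks that the operators compose according to the matrix product in [38], which in the notation $g = (b, a)$ reads $(b_1, a_1)(b_2, a_2) = (b_1 + a_1 b_2,\ a_1 a_2)$, and that they preserve the $L^2$ norm.

```python
import math


def compose(g1, g2):
    """Group law of the affine group for g = (b, a), from the matrices [38]."""
    (b1, a1), (b2, a2) = g1, g2
    return (b1 + a1 * b2, a1 * a2)


def U(g, phi):
    """Return the function U(b, a)phi defined by [39]."""
    b, a = g
    return lambda x: phi((x - b) / a) / math.sqrt(abs(a))


def l2_norm_sq(f, lo=-40.0, hi=40.0, steps=20000):
    """Midpoint-rule approximation of the integral of |f|^2 over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(abs(f(lo + (k + 0.5) * h)) ** 2 for k in range(steps)) * h


if __name__ == "__main__":
    phi = lambda x: math.exp(-x * x / 2)  # Gaussian test vector (an assumption)
    g1, g2 = (1.5, -2.0), (-0.7, 0.5)

    # Representation property: U(g1) U(g2) = U(g1 g2), checked pointwise.
    lhs = U(g1, U(g2, phi))
    rhs = U(compose(g1, g2), phi)
    print(max(abs(lhs(x) - rhs(x)) for x in (-3.0, -0.4, 0.0, 1.1, 2.5)))

    # Unitarity: the norm of U(g1)phi equals the norm of phi.
    print(l2_norm_sq(phi), l2_norm_sq(U(g1, phi)))
```

The finite integration window is adequate only for rapidly decaying test functions such as the Gaussian used here; for slowly decaying functions or large dilations it would have to be widened.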
If $\psi$ is a function in $L^2(\mathbb{R}, dx)$ such that its Fourier transform $\hat{\psi}$ satisfies the condition

$$\int_{\mathbb{R}} \frac{|\hat{\psi}(k)|^2}{|k|}\, dk < \infty \qquad [40]$$

then it can be shown to be an admissible vector, that is,

$$c(\psi) = \int_{G_{\mathrm{Aff}}} |\langle\psi|U(b, a)\psi\rangle|^2\, \frac{db\, da}{a^2} < \infty$$

Thus, following the general construction outlined above, the vectors

$$|b, a\rangle = \frac{1}{\sqrt{c(\psi)}}\, U(b, a)\psi, \qquad (b, a) \in G_{\mathrm{Aff}} \qquad [41]$$

define a family of generalized coherent states and one has the resolution of the identity

$$\int_{G_{\mathrm{Aff}}} |b, a\rangle\langle b, a|\, \frac{db\, da}{a^2} = I \qquad [42]$$

on $L^2(\mathbb{R}, dx)$.

In the signal-analysis literature a vector $\psi$ satisfying the admissibility condition [40] is called a mother wavelet and the generalized coherent states [41] are called wavelets. Signals are then identified with vectors $|\phi\rangle$ in $L^2(\mathbb{R}, dx)$ and the function

$$F(b, a) = \langle b, a|\phi\rangle \qquad [43]$$

is called the continuous wavelet transform of the signal $\phi$.

There exist alternative ways of constructing generalized coherent states using group representations. For example, the Perelomov method is based on the observation that the vector $|0\rangle$, appearing in the construction of the canonical coherent states in [10] and [11] using the representation of the Weyl–Heisenberg group, is invariant up to a phase, under the action of its center. Consequently, the coherent states $|z\rangle$, as written in [10], are labeled, not by elements of the group itself, but only by the points in the quotient space of the group by its (central) phase subgroup. Generally, let $G$ be a locally compact group and $U$ a unitary irreducible representation of it on the Hilbert space $\mathfrak{H}$. We do not assume $U$ to be square integrable. We fix a vector $|\psi\rangle$ in $\mathfrak{H}$, of unit norm, and denote by $H$ the subgroup of $G$ consisting of all elements $h$ for which

$$U(h)|\psi\rangle = e^{i\omega(h)} |\psi\rangle \qquad [44]$$

where $\omega$ is a real-valued function of $h$. Let $X = G/H$ be the left-coset space and $x$ an arbitrary element in $X$. Choosing a coset representative $g(x) \in G$, for each coset $x$, we define the vectors

$$|x\rangle = U(g(x))|\psi\rangle \qquad [45]$$

in $\mathfrak{H}$. The dependence of these vectors on the specific choice of the coset representative $g(x)$ is only through a phase. Thus, if instead of $g(x)$ we took a different representative $g(x)' \in G$ for the same coset $x$, then since $g(x)' = g(x)h$ for some $h \in H$, in view of [44] we would have $U(g(x)')|\psi\rangle = e^{i\omega(h)} |x\rangle$. Hence, quantum mechanically, both $|x\rangle$ and $U(g(x)')|\psi\rangle$ represent the same physical state and in particular, the projection operator $|x\rangle\langle x|$ depends only on the coset. Vectors $|x\rangle$, defined in this manner, are called Gilmore–Perelomov coherent states. Since $U$ is assumed to be irreducible, the linear span of all these vectors as $x$ runs through $G/H$ is dense in $\mathfrak{H}$. In this definition of generalized coherent states, no resolution of the identity is postulated. However, if $X$ carries an invariant measure, under the natural action of $G$, and if the formal operator $B$ defined as

$$B = \int_X |x\rangle\langle x|\, d\mu(x)$$

is bounded, then it is necessarily a multiple of the identity and a resolution of the identity is again retrieved.

The Perelomov construction can be used to define coherent states for any locally compact group. On the other hand, there exist other constructions of generalized coherent states, using group representations, which generalize the notion of square integrability to homogeneous spaces of the group. Briefly, in this approach one starts with a unitary irreducible representation $U$ and attempts to find a vector $|\psi\rangle$, a subgroup $H$ and a section $\sigma : G/H \to G$ such that

$$\int_{G/H} |x\rangle\langle x|\, d\mu(x) = T \qquad [46]$$

where $|x\rangle = U(\sigma(x))|\psi\rangle$, $T$ is a bounded, positive operator with bounded inverse and $d\mu$ is a quasi-invariant measure on $X = G/H$. It is not assumed that $|\psi\rangle$ be invariant up to a phase under the action of $H$ and clearly, the best situation is when $T$ is a multiple of the identity. Although somewhat technical, this general construction is of enormous versatility for semidirect product groups of the type $\mathbb{R}^n \rtimes K$, where $K$ is a closed subgroup of $GL(n, \mathbb{R})$. Thus, it is useful for many physically important groups, such as the Poincaré or the Euclidean group, which do not have square-integrable representations in the sense of the earlier definition (see eqn [35]). The integral condition [46] ensures that any vector $|\phi\rangle$ in $\mathfrak{H}$ can be written in terms of the $|x\rangle$. Indeed, it
is easy to see that one has the integral representation of a vector,

$$|\phi\rangle = \int_X \Psi(x)\,|x\rangle\,d\mu(x), \qquad \Psi(x) = \langle x|T^{-1}\phi\rangle$$

in terms of the generalized coherent states.

The canonical coherent states satisfy the minimal uncertainty relation [7]. It is possible to build families of coherent states by generalizing from this condition. To do this, one typically starts with two self-adjoint generators in the Lie algebra of a particular group representation and then looks for appropriate eigenvectors of a complex combination of these two generators. For two self-adjoint operators $B$ and $C$ on a Hilbert space $\mathfrak{H}$, satisfying the commutation relation $[B, C] = iD$ and any normalized vector $\phi$ in $\mathfrak{H}$, one can prove the Heisenberg uncertainty relation

$$(\Delta B)^{2}(\Delta C)^{2} \ge \frac{\langle D\rangle^{2}}{4} \qquad [47]$$

where $\langle X\rangle = \langle\phi|X\phi\rangle$ and $(\Delta X)^{2} = \langle X^{2}\rangle - \langle X\rangle^{2}$, for any operator $X$ on $\mathfrak{H}$. More generally, one can prove the Schrödinger–Robertson uncertainty relation

$$(\Delta B)^{2}(\Delta C)^{2} \ge \frac{1}{4}\left[\langle D\rangle^{2} + \langle F\rangle^{2}\right] \qquad [48]$$

where $\langle F\rangle = \langle BC + CB\rangle - 2\langle B\rangle\langle C\rangle$ measures the correlation between $B$ and $C$ in the state $\phi$. If $\langle F\rangle = 0$, the above relation reduces to the Heisenberg uncertainty relation. On the other hand, if $\langle D\rangle = 0$, the Heisenberg uncertainty relations become redundant. Suppose now that $B$ and $C$ are two self-adjoint elements of the Lie algebra in the unitary irreducible representation of a Lie group and we look for states $|\phi\rangle$ which minimize the uncertainty relation [48], that is, for which the equality holds. It turns out that such states can be found by considering the linear combination $B + i\lambda C$, for a fixed complex number $\lambda$, and solving the formal eigenvalue equation

$$[B + i\lambda C]\,|z,\lambda\rangle = z\,|z,\lambda\rangle \quad\text{with}\quad z = \langle B\rangle + i\lambda\langle C\rangle \qquad [49]$$

Solutions to this equation for which $|\lambda| \neq 1$ are called squeezed states, since in this case $\Delta B \neq \Delta C$. Generally, the states $|z,\lambda\rangle$ are known as intelligent states. As an example, for the operators $Q$ and $P$ in [6], for which one has

$$(\Delta Q)^{2}(\Delta P)^{2} \ge \frac{1}{4}\left[1 + \langle F\rangle^{2}\right]$$

taking the combination $Q + i\lambda P$, one obtains the minimal uncertainty states,

$$|z,\lambda\rangle = N(z,\lambda)^{-1/2}\, e^{-w(a^{\dagger})^{2}/2}\, e^{(z/\sqrt{2})(1+w)a^{\dagger}}\,|0\rangle \qquad [50]$$

$N(z,\lambda)$ being a normalization constant and $w = (1-\lambda)/(1+\lambda)$. The case $\lambda = -1$ does not lead to any solutions, while $\lambda = 1$ gives the canonical coherent states [10]. For real $\lambda \neq 1$ the above states are the well-known squeezed states of quantum optics.

Our final example is that of a family of vector coherent states, which will be obtained essentially by replacing the complex variable $z$ in [18] by a matrix variable. We choose the domain $\Omega = \mathbb{C}^{2\times 2}$ (all $2\times 2$ complex matrices), equipped with the measure

$$d\nu(Z, Z^{\dagger}) = \frac{e^{-\mathrm{tr}[Z^{\dagger}Z]}}{\pi^{4}} \prod_{k,j=1}^{2} dx_{kj}\wedge dy_{kj}$$

where $Z$ is an element of $\Omega$ and $z_{kj} = x_{kj} + i y_{kj}$ are its entries. One can then prove the matrix orthogonality relation

$$\int_{\Omega} Z^{k} Z^{\dagger\ell}\, d\nu(Z, Z^{\dagger}) = \frac{1}{2}\int_{\Omega} \mathrm{tr}[Z^{k} Z^{\dagger\ell}]\, d\nu(Z, Z^{\dagger})\, I_{2} = b(k)\,\delta_{k\ell}\, I_{2}, \qquad k, \ell = 0, 1, 2, \ldots, \infty \qquad [51]$$

$I_{2}$ being the $2\times 2$ identity matrix and

$$b(k) = \frac{(k+3)!}{2(k+1)(k+2)}, \quad k = 1, 2, 3, \ldots; \qquad b(0) = 1 \qquad [52]$$

Consider the Hilbert space $\widetilde{\mathfrak{H}} = L^{2}_{\mathbb{C}^{2}}(\Omega, d\nu)$ of square integrable, two-component vector-valued functions on $\Omega$ and in it consider the vectors $|\Psi^{i}_{k}\rangle$, $i = 1, 2$, $k = 0, 1, 2, \ldots, \infty$, defined by the $\mathbb{C}^{2}$-valued functions,

$$\Psi^{i}_{k}(Z^{\dagger}) = \frac{1}{\sqrt{b(k)}}\, Z^{\dagger k}\chi^{i} \qquad [53]$$

where the vectors $\chi^{i}$, $i = 1, 2$, form an orthonormal basis of $\mathbb{C}^{2}$. By virtue of [51], the vectors $|\Psi^{i}_{k}\rangle$ constitute an orthonormal set in $\widetilde{\mathfrak{H}}$, that is,

$$\langle \Psi^{i}_{k}|\Psi^{j}_{\ell}\rangle_{\widetilde{\mathfrak{H}}} = \delta_{k\ell}\,\delta_{ij}$$

Denote by $\mathfrak{H}_{K}$ the Hilbert subspace of $\widetilde{\mathfrak{H}}$ generated by this set of vectors. This can be shown to be a reproducing kernel Hilbert space of analytic
functions in the variable $Z^{\dagger}$, with the matrix valued kernel $K : \Omega\times\Omega \to \mathbb{C}^{2\times 2}$:

$$K(Z'^{\dagger}, Z) = \sum_{i=1}^{2}\sum_{k=0}^{\infty} \Psi^{i}_{k}(Z'^{\dagger})\,\Psi^{i}_{k}(Z^{\dagger})^{\dagger} = \sum_{k=0}^{\infty} \frac{Z'^{\dagger k} Z^{k}}{b(k)} \qquad [54]$$

Vector coherent states in $\mathfrak{H}_{K}$ are then naturally associated to this kernel and are given by

$$|Z, i\rangle = \sum_{j=1}^{2}\sum_{k=0}^{\infty} \frac{\chi^{j\dagger} Z^{k}\chi^{i}}{\sqrt{b(k)}}\,|\Psi^{j}_{k}\rangle, \qquad\text{that is,}\qquad |Z, i\rangle(Z'^{\dagger}) = K(Z'^{\dagger}, Z)\,\chi^{i} \qquad [55]$$

for $i = 1, 2$ and all $Z$ in $\Omega$. They satisfy the resolution of the identity

$$\sum_{i=1}^{2}\int_{\Omega} |Z, i\rangle\langle Z, i|\, d\nu(Z, Z^{\dagger}) = I_{\mathfrak{H}_{K}} \qquad [56]$$

The expression for the $|Z, i\rangle$ in [55], involving the sum, should be compared to [18], of which it is a direct analog.

Some Applications of Coherent States

Generalized coherent states have many applications in physics, signal analysis, and mathematics, of which we mention a few here. As an example of an application of deformed coherent states, we take

$$x_{n} = \left(\frac{q^{n} - q^{-n}}{q - q^{-1}}\right)^{1/2}, \qquad q > 0 \qquad [57]$$

in the definition of these states in [18]. It is then easy to see that the operators $A$ and $A^{\dagger}$, defined in [23], satisfy the q-deformed commutation relation

$$AA^{\dagger} - qA^{\dagger}A = q^{-N} \qquad [58]$$

where $N$ is the usual number operator, which acts on the Fock states as $N|n\rangle = n|n\rangle$. Clearly, in the limit as $q \to 1$, these q-deformed coherent states go over to the canonical coherent states, with the operators $A$ and $A^{\dagger}$ becoming the usual annihilation and creation operators $a$ and $a^{\dagger}$, respectively. The operators $A$ and $A^{\dagger}$ and the commutation relation [58] describe a system of q-deformed oscillators, which have been used to describe, for example, the vibrations of polyatomic molecules. The potential energy between the atoms of such a molecule has anharmonic terms, leading to a deformation of the usual oscillator algebra, generated by the operators $a$ and $a^{\dagger}$.

As already mentioned, generalized coherent states are widely used in signal analysis. The wavelet transform $F(b, a) = \langle b, a|\phi\rangle$, introduced in [43], is a time–frequency transform, in which the parameter $b$ is identified with time and $1/a$ with frequency. Wavelet transforms are used extensively to analyze, encode, and reconstruct signals arising in many different branches of physics, engineering, seismography, electronic data processing, etc. Similarly, the canonical coherent states, as written in [11], give rise to the transform $F(q, p) = \langle q, p|\phi\rangle$. Again, if $q$ is interpreted as time and $p$ as frequency, then this is just the windowed Fourier transform, also used extensively in signal processing. More general wavelets, from higher-dimensional affine groups, are used to analyze higher-dimensional signals, while wavelet-like transforms from other groups have been used to study signals exhibiting different geometries. In particular, wavelet transforms from spherical geometries have been applied to the study of brain signals and to astrophysical data.

Our final example is taken from quantization theory. A quantization technique is a method for performing the transition from a given classical mechanical system to its quantum counterpart. Many methods have been developed to accomplish this and the use of coherent states is one of them. Suppose that we are given a family of coherent states $|x\rangle$ in a Hilbert space $\mathfrak{H}$, where the set $X$ from which $x$ is taken is a classical phase space. This means that $X$ is a symplectic manifold with an associated 2-form $\omega$, which defines a Poisson bracket on the set of observables of the classical system, which are real-valued functions on $X$. There is a natural measure $d\omega$, defined on $X$ by the 2-form $\omega$. Let us assume that the coherent states $|x\rangle$ satisfy a resolution of the identity with respect to this measure:

$$\int_{X} |x\rangle\langle x|\, d\omega(x) = I_{\mathfrak{H}}$$

In this case, the coherent states may be used to quantize the observables of the classical system in the following way: let $f$ be a real-valued function on $X$, representing a classical observable and suppose that the formal operator

$$\hat{f} = \int_{X} f(x)\,|x\rangle\langle x|\, d\omega(x) \qquad [59]$$

is well defined as a self-adjoint operator on $\mathfrak{H}$. Then we may take the operator $\hat{f}$ to be the quantized observable corresponding to the classical observable $f$. Suppose that we have two such operators, $\hat{f}$ and $\hat{g}$,
corresponding to the two classical observables $f$ and $g$, which have the Poisson bracket $\{f, g\}$, defined via the 2-form $\omega$. We then check if the quantization condition

$$\widehat{\{f, g\}} = \frac{2\pi}{ih}\,[\hat{f}, \hat{g}] \qquad [60]$$

where $h$ is Planck's constant, is satisfied. Generally this will be the case for a certain number of classical observables. This method of quantization has been most successfully used for manifolds $X$ which have a (complex) Kähler structure. Over such a manifold, one can define a Hilbert space of analytic functions, which has a reproducing kernel and hence a naturally associated set of coherent states. As a specific example, we take the case of canonical coherent states [11]. We can identify the complex plane $\mathbb{C}$ with the phase space $\mathbb{R}^{2}$ of a free classical particle having a single degree of freedom. The measure $d\omega$ in this case is just $(1/2\pi)\,dq\,dp$. If we now quantize the classical observables $f(q, p) = q$ and $f(q, p) = p$, of position and momentum, respectively, using the canonical coherent states, we obtain the two operators

$$Q = \int_{\mathbb{R}^{2}} q\,|q, p\rangle\langle q, p|\,\frac{dq\,dp}{2\pi}, \qquad P = \int_{\mathbb{R}^{2}} p\,|q, p\rangle\langle q, p|\,\frac{dq\,dp}{2\pi} \qquad [61]$$

It can be verified that these two operators satisfy the canonical commutation relations $[Q, P] = iI_{\mathfrak{H}}$, as required.

See also: Solitons and Kac–Moody Lie Algebras; Wavelets: Mathematical Theory.

Further Reading

Ali ST, Antoine J-P, and Gazeau J-P (2000) Coherent States, Wavelets and Their Generalizations. New York: Springer.
Ali ST and Engliš M (2005) Quantization methods – a guide for physicists and analysts. Reviews in Mathematical Physics 17: 391–490.
Brif C (1997) SU(2) and SU(1,1) algebra eigenstates: a unified analytic approach to coherent and intelligent states. International Journal of Theoretical Physics 36: 1651–1682.
Klauder JR and Sudarshan ECG (1968) Fundamentals of Quantum Optics. New York: Benjamin.
Klauder JR and Skagerstam BS (1985) Coherent States – Applications in Physics and Mathematical Physics. Singapore: World Scientific.
Perelomov AM (1986) Generalized Coherent States and their Applications. Berlin: Springer.
Schrödinger E (1926) Der stetige Übergang von der Mikro- zur Makromechanik. Naturwissenschaften 14: 664–666.
Sivakumar S (2000) Studies on nonlinear coherent states. Journal of Optics B: Quantum Semiclass. Opt. 2: R61–R75.
Zhang W-M, Feng DH, and Gilmore RG (1990) Coherent states: theory and some applications. Reviews of Modern Physics 62: 867–927.

Cohomology Theories
U Tillmann, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The origins of cohomology theory are found in topology and algebra at the beginning of the last century but since then it has become a tool of nearly every branch of mathematics. It's a way of life! Naturally, this article can only give a glimpse at the rich subject. We take here the point of view of algebraic topology and discuss only the cohomology of spaces.

Cohomology reflects the global properties of a manifold, or more generally of a topological space. It has two crucial properties: it only depends on the homotopy type of the space and is determined by local data. The latter property makes it in general computable.

To illustrate the interplay between the local and global structure, consider the Euler characteristic of a compact manifold; as will be explained below, cohomology is a refinement of the Euler characteristic. For simplicity, assume that the manifold $M$ is a surface and that we have chosen a way of dividing the surface into triangles. The Euler characteristic is then defined to be

$$\chi(M) = F - E + V$$

where $F$ denotes the number of faces, $E$ the number of edges, and $V$ the number of vertices in the triangulation. Remarkably, this number does not depend on the triangulation. Yet, this simple, easy to compute number can already distinguish the different types of closed, oriented surfaces: for the sphere we have $\chi = 2$, the torus $\chi = 0$, and in general for any surface $M_{g}$ of genus $g$

$$\chi(M_{g}) = 2 - 2g$$
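The count F − E + V is easy to reproduce by machine. The following minimal Python sketch is an illustration only and not part of the original article: the function names `euler_characteristic` and `torus_triangulation` are hypothetical, and the two triangulations (the boundary of a tetrahedron for the sphere, and a grid of squares with opposite sides identified for the torus) are assumptions chosen for the example.

```python
from itertools import combinations

def euler_characteristic(triangles):
    """Return (V, E, F, chi) for a triangulated surface given as vertex triples."""
    faces = {frozenset(t) for t in triangles}
    edges = {frozenset(e) for t in faces for e in combinations(sorted(t), 2)}
    vertices = {v for t in faces for v in t}
    V, E, F = len(vertices), len(edges), len(faces)
    return V, E, F, F - E + V

def torus_triangulation(n=3):
    """Hypothetical helper: n x n grid of squares with opposite sides glued,
    each square cut into two triangles (n >= 3 keeps the triangles non-degenerate)."""
    def v(i, j):
        return (i % n) * n + (j % n)
    triangles = []
    for i in range(n):
        for j in range(n):
            a, b, c, d = v(i, j), v(i + 1, j), v(i, j + 1), v(i + 1, j + 1)
            triangles += [(a, b, d), (a, d, c)]
    return triangles

# Sphere: the four faces of a tetrahedron on vertices 0..3.
sphere = list(combinations(range(4), 3))

print(euler_characteristic(sphere))                 # (4, 6, 4, 2)
print(euler_characteristic(torus_triangulation()))  # (9, 27, 18, 0)
```

A finer grid for the torus, for instance `torus_triangulation(4)`, changes V, E, and F but should leave χ = 0, in line with the independence of the triangulation stated above.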
The Euler characteristic also tells us something about the geometry and analysis of the manifold. For example, the total curvature of a closed surface is equal to $2\pi$ times its Euler characteristic. This is the Gauss–Bonnet theorem and an analogous result holds in higher dimensions. Another striking result is the Poincaré–Hopf theorem which equates the Euler characteristic with the total index of a vector field and thus gives strong restrictions on what kind of vector fields can exist on a manifold. This interplay between global analysis and topology has been one of the most exciting and fruitful research areas and is most powerfully expressed in the celebrated Atiyah–Singer index theorem, which determines the analytic index of an elliptic operator, such as the Dirac operator on a spin manifold, in terms of cohomology classes.

Chain Complexes and Homology

There are several different geometric definitions of the cohomology of a topological space. All share some basic algebraic structure which we will explain first.

A "chain complex" $(C_{*}, \partial_{*})$

$$\cdots \longrightarrow C_{i+1} \xrightarrow{\ \partial_{i+1}\ } C_{i} \xrightarrow{\ \partial_{i}\ } C_{i-1} \longrightarrow \cdots \xrightarrow{\ \partial_{1}\ } C_{0} \qquad [1]$$

is a collection of vector spaces (or $R$-modules more generally) $C_{i}$, $i \ge 0$, and linear maps ($R$-module maps) $\partial_{i} : C_{i} \to C_{i-1}$ with the property that for all $i$

$$\partial_{i}\circ\partial_{i+1} = 0 \qquad [2]$$

The scalar fields one tends to consider are the rationals $\mathbb{Q}$, reals $\mathbb{R}$, complex numbers $\mathbb{C}$, or a primary field $\mathbb{Z}_{p}$, while the most important ring $R$ is the ring of integers $\mathbb{Z}$ though we will also consider localizations such as $\mathbb{Z}[1/p]$, which has the effect of suppressing any $p$-primary torsion information.

Of particular interest are the elements in $C_{i}$ that are mapped to zero by $\partial_{i}$, the $i$-dimensional "cycles," and those that are in the image of $\partial_{i+1}$, the $i$-dimensional "boundaries." Because of [2], every boundary is a cycle, and we may define the quotient vector space ($R$-module), the $i$th-dimensional homology,

$$H_{i}(C_{*}, \partial_{*}) := \frac{\ker \partial_{i}}{\mathrm{im}\, \partial_{i+1}} \qquad [3]$$

$(C_{*}, \partial_{*})$ is "exact" if all its cycles are boundaries. Homology thus measures to what extent the sequence [1] fails to be exact.

Simplicial Homology

A triangulation of a surface gives rise to its "simplicial" chain complex: Taking coefficients in $\mathbb{Z}$, $C_{2}$, $C_{1}$, $C_{0}$ are the free abelian groups generated by the set of faces, edges, and vertices, respectively; $C_{i} = \{0\}$ for $i \ge 3$. The map $\partial_{2}$ assigns to a triangle the sum of its edges; $\partial_{1}$ maps an edge to the sum of its endpoints. If we are working with $\mathbb{Z}_{2}$ coefficients, this defines for us a chain complex as [2] is clearly satisfied; in general, one needs to keep track of the orientations of the triangles and edges and take sums with appropriate signs (cf. [6] below). An easy calculation shows that for an oriented, closed surface $M_{g}$ of genus $g$, we have

$$H_{0}(M_{g}; \mathbb{Z}) = \mathbb{Z}, \qquad H_{1}(M_{g}; \mathbb{Z}) = \mathbb{Z}^{2g}, \qquad H_{2}(M_{g}; \mathbb{Z}) = \mathbb{Z}, \qquad H_{i}(M_{g}; \mathbb{Z}) = 0 \ \text{ for } i \ge 3 \qquad [4]$$

Note that the Euler characteristic can be recovered as the alternating sum of the rank of the homology groups:

$$\chi(M) = \sum_{i=0}^{\dim M} (-1)^{i}\, \mathrm{rk}\, H_{i}(M; \mathbb{Z}) \qquad [5]$$

Every smooth manifold $M$ has a triangulation, so that its simplicial homology can be defined just as above. More generally, simplicial homology can be defined for any simplicial space, that is, a space that is built up out of points, edges, triangles, tetrahedra, etc. Formula [5] remains valid for any compact manifold or simplicial space.

Singular Homology

Let $X$ be any topological space, and let $\Delta^{n}$ be the oriented $n$-simplex $[v_{0}, \ldots, v_{n}]$ spanned by the standard basis vectors $v_{i}$ in $\mathbb{R}^{n+1}$. The set of singular $n$-chains $S_{n}(X)$ is the free abelian group on the set of continuous maps $\sigma : \Delta^{n} \to X$. The boundary of $\sigma$ is defined by the alternating sum of the restriction of $\sigma$ to the faces of $\Delta^{n}$:

$$\partial_{n}(\sigma) := \sum_{i=0}^{n} (-1)^{i}\, \sigma|_{[v_{0}, \ldots, \hat{v}_{i}, \ldots, v_{n}]} \qquad [6]$$

One easily checks that the boundary of a boundary is zero, and hence $(S_{*}(X), \partial_{*})$ defines a chain complex. Its homology is by definition the singular homology $H_{*}(X; \mathbb{Z})$ of $X$. For any simplicial space, the inclusion of the simplicial chains into the singular chains induces an isomorphism of homology groups. In particular, this implies that the simplicial homology of a manifold, and hence its Euler characteristic, do not depend on its triangulation.

If in the definition of simplicial and singular homology we take free $R$-modules (where $R$ may
also be a field) instead of free abelian groups, we get the homology $H_{*}(X; R)$ of $X$ with coefficients in $R$. The "universal-coefficient theorem" describes the homology with arbitrary coefficients in terms of the homology with integer coefficients. In particular, if $R$ is a field of characteristic zero,

$$\dim H_{n}(X; R) = \mathrm{rk}\, H_{n}(X; \mathbb{Z})$$

Basic Properties of Singular Homology

While simplicial homology (and the more efficient cellular homology which we will not discuss) is easier to compute and easier to understand geometrically, singular homology lends itself more easily to theoretical treatment.

1. Homotopy invariance. Any continuous map $f : X \to Y$ induces a map on homology $f_{*} : H_{*}(X; R) \to H_{*}(Y; R)$ which only depends on the homotopy class of $f$.

In particular, a homotopy equivalence $f : X \to Y$ induces an isomorphism in homology. So, for example, the inclusion of the circle $S^{1}$ into the punctured plane $\mathbb{C}\setminus\{0\}$ is a homotopy equivalence, and thus

$$H_{i}(\mathbb{C}\setminus\{0\}; R) \simeq H_{i}(S^{1}; R) = \begin{cases} R & \text{for } i = 0, 1 \\ 0 & \text{for } i \ge 2 \end{cases}$$

For the one point space we have $H_{0}(\mathrm{pt}; R) = R$. Define reduced homology by $\tilde{H}_{*}(X; R) := \ker(H_{*}(X; R) \to H_{*}(\mathrm{pt}; R))$.

2. Dimension axiom. $\tilde{H}_{i}(\mathrm{pt}; R) = 0$ for all $i$.

More generally, it follows immediately from the definition of simplicial homology that the homology of any $n$-dimensional manifold is zero in dimensions larger than $n$.

We mentioned in the introduction that homology depends only on local data. This is made precise by the

3. Mayer–Vietoris theorem. Let $X = A \cup B$ be the union of two open subspaces. Then the following sequence is exact:

$$\cdots \to H_{n}(A\cap B; R) \to H_{n}(A; R)\oplus H_{n}(B; R) \to H_{n}(X; R) \xrightarrow{\ \partial\ } H_{n-1}(A\cap B; R) \to \cdots \to H_{0}(X; R) \to 0$$

On the level of chains, the first map is induced by the diagonal inclusion, while the second map takes the difference between the first and second summands. Finally, $\partial$ takes a cycle $c = a + b$ in the chains of $X$ that can be expressed as the sum of a chain $a$ in $A$ and $b$ in $B$ to $\partial c := \partial_{n} a = -\partial_{n} b$. For example, consider two cones, $A$ and $B$, on a space $X$ and identify them at the base $X$ to define the suspension $\Sigma X$ of $X$. Then $\Sigma X = A \cup B$ with $A, B \simeq \mathrm{pt}$ and $A \cap B \simeq X$. The boundary map $\partial$ is then an isomorphism:

$$\tilde{H}_{n}(X; R) \simeq H_{n+1}(\Sigma X; R) \quad \text{for all } n \ge 0 \qquad [7]$$

From this one can easily compute the homology of a sphere. First note that

$$\tilde{H}_{0}(X; \mathbb{Z}) = \mathbb{Z}^{k-1}$$

where $k$ is the number of connected components in $X$. Also, $S^{n} \simeq \Sigma S^{n-1} \simeq \cdots \simeq \Sigma^{n} S^{0}$. Thus, by [7],

$$H_{n}(S^{n}; \mathbb{Z}) \simeq \mathbb{Z} \quad\text{and}\quad \tilde{H}_{*}(S^{n}; \mathbb{Z}) = 0 \ \text{ for } * \neq n \qquad [8]$$

If $Y$ is a subspace of $X$, relative homology groups $H_{*}(X, Y; R)$ can be defined as the homology of the quotient complex $S_{*}(X)/S_{*}(Y)$. When $Y$ has a good neighborhood in $X$ (i.e., it is a neighborhood deformation retract in $X$), then, by the "excision theorem,"

$$H_{*}(X, Y; R) \simeq \tilde{H}_{*}(X/Y; R)$$

where $X/Y$ denotes the quotient space of $X$ with $Y$ identified to a point. There is a long exact sequence

$$\cdots \to H_{n}(Y; R) \to H_{n}(X; R) \to H_{n}(X, Y; R) \xrightarrow{\ \partial\ } H_{n-1}(Y; R) \to \cdots \to H_{0}(X, Y; R) \to 0$$

This and the Mayer–Vietoris sequence give two ways of breaking up the problem of computing the homology of a space into computing the homology of related spaces. An iteration of this process leads to the powerful tool of spectral sequences (see Spectral Sequences).

Relation to Homotopy Groups

Let $\pi_{1}(X, x_{0})$ denote the fundamental group of $X$ relative to the base point $x_{0}$. Its elements are the based homotopy classes of based maps from a circle to $X$.

$$\text{If } X \text{ is connected, then } H_{1}(X; \mathbb{Z}) \text{ is the abelianization of } \pi_{1}(X, x_{0}) \qquad [9]$$

Indeed, every map from a (triangulated) sphere to $X$ defines a cycle and hence gives rise to a homology class. This defines the Hurewicz map $h : \pi_{*}(X; x_{0}) \to H_{*}(X; \mathbb{Z})$. In general there is no good description of its image. However, if $X$ is $k$-connected with $k \ge 1$, then $h$ induces an isomorphism in dimension $k + 1$ and an epimorphism in dimension $k + 2$.

Though [9] indicates that homology cannot distinguish between all homotopy types, the fundamental group is in a sense the only obstruction to this. A simple form of the "Whitehead theorem" states:
Theorem If a map $f : X \to Y$ between two simplicial complexes with trivial fundamental groups induces an isomorphism on all homology groups, then it is a homotopy equivalence.

Warning: This does not imply that two simply connected spaces with isomorphic homology groups are homotopy equivalent! The existence of the map $f$ inducing this isomorphism is crucial and counterexamples can easily be constructed.

Dual Chain Complexes and Cohomology

The process of dualizing itself cannot be expected to yield any new information. Nevertheless, the cohomology of a space, which is obtained by dualizing its simplicial chain complex, carries important additional structure: it possesses a product, and moreover, when the coefficients are a primary field, it is an algebra over the rich Steenrod algebra. As with homology we start with the algebraic setup.

Every chain complex $(C_{*}, \partial_{*})$ gives rise to a dual chain complex $(C^{*}, \partial^{*})$ where $C^{i} = \hom_{R}(C_{i}, R)$ is the dual $R$-module of $C_{i}$; because of [2], the composition of two dual boundary morphisms $\partial^{i+1} : C^{i} \to C^{i+1}$ is trivial. Hence we may define the $i$th dimensional cohomology group as

$$H^{i}(C^{*}, \partial^{*}) := \frac{\ker \partial^{i+1}}{\mathrm{im}\, \partial^{i}} \qquad [10]$$

Evaluation $(\sigma, \phi) \mapsto \phi(\sigma)$ descends to a dual pairing

$$H_{n}(C_{*}, \partial_{*}) \otimes_{R} H^{n}(C^{*}, \partial^{*}) \to R$$

and when $R$ is a field, this identifies the cohomology groups as the duals of the homology groups. More generally, the universal-coefficient theorem relates the two. A simple version states: let $(C_{*}, \partial_{*})$ be a chain complex of free abelian groups (such as the simplicial or singular chain complexes) with finitely generated homology groups. Then,

$$H^{i}(C^{*}, \partial^{*}) \simeq H_{i}^{\mathrm{free}}(C_{*}, \partial_{*}) \oplus H_{i-1}^{\mathrm{tor}}(C_{*}, \partial_{*}) \qquad [11]$$

where $H_{*}^{\mathrm{tor}}$ denotes the torsion subgroup of $H_{*}$ and $H_{*}^{\mathrm{free}}$ denotes the quotient group $H_{*}/H_{*}^{\mathrm{tor}}$.

Singular Cohomology

The dual $S^{*}(X)$ of the singular chain complex of a space $X$ carries a natural pairing, the cup product, $\cup : S^{p}(X) \otimes S^{q}(X) \to S^{p+q}(X)$ defined by

$$(\phi_{1} \cup \phi_{2})(\sigma) := \phi_{1}(\sigma|_{[v_{0}, \ldots, v_{p}]})\, \phi_{2}(\sigma|_{[v_{p}, \ldots, v_{p+q}]})$$

This descends to a multiplication on cohomology groups and makes $H^{*}(X; R) := \bigoplus_{n \ge 0} H^{n}(X; R)$ into an associative, graded commutative ring: $u \cup v = (-1)^{\deg u \,\deg v}\, v \cup u$.

The "Künneth theorem" gives some geometric intuition for the cup product. A simple version states: for spaces $X$ and $Y$ with $H^{*}(Y; R)$ a finitely generated free $R$-module, the cup product defines an isomorphism of graded rings

$$H^{*}(X; R) \otimes_{R} H^{*}(Y; R) \to H^{*}(X \times Y; R)$$

For example, for a sphere, all products are trivial for dimension reasons. Hence,

$$H^{*}(S^{n}; \mathbb{Z}) = \Lambda^{*}(x) \qquad [12]$$

is an exterior algebra on one generator $x$ of degree $n$. On the other hand, the cohomology of the $n$-dimensional torus $T^{n}$ is an exterior algebra on $n$ degree-1 generators,

$$H^{*}(T^{n}; \mathbb{Z}) = \Lambda^{*}(x_{1}, \ldots, x_{n}) \qquad [13]$$

The dual pairing can be generalized to the slant or cap product

$$\cap : H_{n}(X; R) \otimes_{R} H^{i}(X; R) \to H_{n-i}(X; R)$$

defined on the chain level by the formula $(\sigma, \phi) \mapsto \phi(\sigma|_{[v_{0}, \ldots, v_{i}]})\, \sigma|_{[v_{i}, \ldots, v_{n}]}$.

Steenrod Algebra

The cup product on the chain level is homotopy commutative, but not commutative. Steenrod used this defect to define operations

$$\mathrm{Sq}^{i} : H^{n}(X; \mathbb{Z}_{2}) \to H^{n+i}(X; \mathbb{Z}_{2})$$

for all $i \ge 0$ which refine the cup-squaring operation: when $n = i$, then $\mathrm{Sq}^{n}(x) = x \cup x$. These are natural group homomorphisms which commute with suspension. Furthermore, they satisfy the Cartan and Adem relations

$$\mathrm{Sq}^{n}(x \cup y) = \sum_{i+j=n} \mathrm{Sq}^{i}(x) \cup \mathrm{Sq}^{j}(y)$$

$$\mathrm{Sq}^{i}\,\mathrm{Sq}^{j} = \sum_{k=0}^{[i/2]} \binom{j-k-1}{i-2k}\, \mathrm{Sq}^{i+j-k}\,\mathrm{Sq}^{k} \quad \text{for } i < 2j$$

The mod-2 Steenrod algebra $\mathcal{A}$ is then the free $\mathbb{Z}_{2}$-algebra generated by the Steenrod squares $\mathrm{Sq}^{i}$, $i \ge 0$, subject only to the Adem relations. With the help of Adem's relations, Serre and Cartan found a $\mathbb{Z}_{2}$-basis for $\mathcal{A}$:

$$\{\mathrm{Sq}^{I} := \mathrm{Sq}^{i_{1}} \cdots \mathrm{Sq}^{i_{n}} \mid i_{j} \ge 2 i_{j+1} \text{ for all } j\}$$
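The Adem relation and the admissibility condition defining the Serre–Cartan basis are both mechanical to evaluate. The sketch below is an illustration under stated assumptions, not part of the original article: the names `adem` and `is_admissible` are hypothetical, a pair `(a, b)` stands for the monomial Sq^a Sq^b, and Sq^0 is treated as the identity.

```python
from math import comb

def adem(i, j):
    """Expand Sq^i Sq^j (0 < i < 2j) by the Adem relation, coefficients mod 2.

    Returns a list of pairs (a, b), each standing for Sq^a Sq^b;
    b == 0 means the single square Sq^a (Sq^0 is taken to be the identity).
    """
    if not 0 < i < 2 * j:
        raise ValueError("the Adem relation applies only for 0 < i < 2j")
    terms = []
    for k in range(i // 2 + 1):
        # keep the term only if the binomial coefficient is odd
        if comb(j - k - 1, i - 2 * k) % 2 == 1:
            terms.append((i + j - k, k))
    return terms

def is_admissible(sequence):
    """Check the basis condition i_j >= 2 * i_(j+1) for consecutive entries."""
    return all(a >= 2 * b for a, b in zip(sequence, sequence[1:]))

print(adem(1, 1))  # []            i.e. Sq^1 Sq^1 = 0
print(adem(1, 2))  # [(3, 0)]      i.e. Sq^1 Sq^2 = Sq^3
print(adem(2, 2))  # [(3, 1)]      i.e. Sq^2 Sq^2 = Sq^3 Sq^1
print(adem(2, 3))  # [(5, 0), (4, 1)]
```

Every pair returned by `adem` satisfies the admissibility condition, which is how repeated use of the relation rewrites an arbitrary monomial in terms of the basis above.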
The Steenrod algebra is also a Hopf algebra with a commutative comultiplication $\Delta : \mathcal{A} \to \mathcal{A} \otimes \mathcal{A}$ induced by

$$\Delta(\mathrm{Sq}^{n}) := \sum_{i+j=n} \mathrm{Sq}^{i} \otimes \mathrm{Sq}^{j}$$

The Cartan relation implies that the mod-2 cohomology of a space is compatible with the comultiplication, that is, $H^{*}(X; \mathbb{Z}_{2})$ is an algebra over the Hopf algebra $\mathcal{A}$. There are odd primary analogs of the Steenrod algebra based on the reduced $p$th power operations

$$P^{i} : H^{n}(X; \mathbb{Z}_{p}) \to H^{n+2i(p-1)}(X; \mathbb{Z}_{p})$$

with similar properties to $\mathcal{A}$.

One of the most striking applications of the Steenrod algebra can be found in the work of Adams on the "vector fields on spheres problem": for each $n$, find the greatest number $k$, denoted $K(n)$, such that there is a $k$-field on the $(n-1)$-sphere $S^{n-1}$. Recall that a $k$-field is an ordered set of $k$ pointwise linearly independent tangent vector fields. If we write $n$ in the form $n = 2^{4a+b}(2s+1)$ with $0 \le b < 4$, Adams proved that $K(n) = 2^{b} + 8a - 1$. In particular, when $n$ is odd, $K(n) = 0$. We give an outline of the proof for this special case in the next section.

The failure of associativity of the cup product at the chain level gives rise to secondary operations, the so-called "Massey products."

Cohomology of Smooth Manifolds

A smooth manifold $M$ of dimension $n$ can be triangulated by smooth simplices $\sigma : \Delta^{n} \to M$. If $M$ is compact, oriented, without boundary, the sum of these simplices defines a homology cycle $[M]$, the fundamental class of $M$. The most remarkable property of the cohomology of manifolds is that they satisfy "Poincaré duality": taking cap product with $[M]$ defines an isomorphism:

$$D := [M]\cap\; : H^{k}(M; \mathbb{Z}) \to H_{n-k}(M; \mathbb{Z}) \quad \text{for all } k \qquad [14]$$

In particular, for connected manifolds, $H^{n}(M; \mathbb{Z}) \simeq \mathbb{Z}$; and every map $f : M' \to M$ between oriented, compact closed manifolds of the same dimension has a degree: $f^{*} : H^{n}(M; \mathbb{Z}) \to H^{n}(M'; \mathbb{Z})$ is multiplication by an integer $\deg(f)$, the degree of $f$. For smooth maps, the degree is the number of points in the inverse image of a generic point $p \in M$ counted with signs:

$$\deg(f) = \sum_{p' \in f^{-1}(p)} \mathrm{sign}(p')$$

where $\mathrm{sign}(p')$ is $+1$ or $-1$ depending on whether $f$ is orientation preserving or reversing in a neighborhood of $p'$. For example, a complex polynomial of degree $d$ defines a map of the two-dimensional sphere to itself of degree $d$: a generic point has $d$ points in its inverse image and the map is locally orientation preserving. On the other hand, a map of $S^{n-1}$ induced by a reflection of $\mathbb{R}^{n}$ reverses orientation and has degree $-1$. Thus, as degrees multiply on composing maps, the antipodal map $x \mapsto -x$ has degree $(-1)^{n}$. As an application we prove:

Every tangent vector field on an even-dimensional sphere $S^{n-1}$ has a zero.

Proof Assume $v(x)$ is a vector field which is nonzero for all $x \in S^{n-1}$. Then $x$ is perpendicular to $v(x)$, and after rescaling, we may assume that $v(x)$ has length 1. The function $F(x, t) = \cos(t)\,x + \sin(t)\,v(x)$ is a well-defined homotopy from the identity map ($t = 0$) to the antipodal map ($t = \pi$). But this is impossible as homotopic maps induce the same map in (co)homology and we have already seen that the degree of the identity map is 1 while the degree of the antipodal map is $(-1)^{n} = -1$ when $n$ is odd.

It is well known that two self-maps of a sphere of any dimension are homotopic if and only if they have the same degree, that is, $\pi_{n}(S^{n}) \simeq \mathbb{Z}$ for $n \ge 1$.

When $M$ is not orientable, $[M]$ still defines a cycle in homology with $\mathbb{Z}_{2}$-coefficients, and $[M]\cap$ defines an isomorphism between the cohomology and homology with $\mathbb{Z}_{2}$ coefficients.

As $[M]$ represents a homology class, so does every other closed (orientable) submanifold of $M$. It is however not the case that every homology class can be represented by a submanifold or linear combinations of such.

Cohomology is a contravariant functor. Poincaré duality however allows us to define, for any $f : M' \to M$ between oriented, compact, closed manifolds of arbitrary dimensions, a "transfer" or "Umkehr map,"

$$f_{!} := D^{-1} f_{*} D' : H^{*}(M'; \mathbb{Z}) \to H^{*-c}(M; \mathbb{Z})$$

which lowers the degree by $c = \dim M' - \dim M$. It satisfies the formula

$$f_{!}(f^{*}(x) \cup y) = x \cup f_{!}(y)$$

for all $x \in H^{*}(M; \mathbb{Z})$ and $y \in H^{*}(M'; \mathbb{Z})$. When $f$ is a covering map then $f_{!}$ can be defined on the chain level by

$$f_{!}(x)(\sigma) := \sum_{f(\tilde{\sigma}) = \sigma} x(\tilde{\sigma})$$

where $x \in C^{*}(M')$ and $\sigma \in C_{*}(M)$.
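The description of the degree as a signed count of preimages of a generic point can be tried out numerically in the simplest case, a self-map of the circle. The sketch below is a one-dimensional stand-in for the sphere maps discussed in this section and is not part of the original article: the names `preimage_counts` and `wiggly_double_cover`, the sample count, and the particular example map are all assumptions made for illustration. The map is described by a lift, a real function whose values are read modulo 2π.

```python
import math

TWO_PI = 2 * math.pi

def preimage_counts(lift, p, samples=20000):
    """Count preimages of the angle p under the circle map theta -> lift(theta) mod 2*pi.

    `lift` is assumed to satisfy lift(2*pi) - lift(0) = 2*pi*k for an integer k.
    Returns (signed, unsigned): crossings counted with orientation sign, and without.
    """
    signed = unsigned = 0
    previous = math.floor((lift(0.0) - p) / TWO_PI)
    for k in range(1, samples + 1):
        current = math.floor((lift(TWO_PI * k / samples) - p) / TWO_PI)
        signed += current - previous        # +1 for an upward crossing, -1 for a downward one
        unsigned += abs(current - previous)
        previous = current
    return signed, unsigned

def wiggly_double_cover(theta):
    # Hypothetical example: winds twice around the circle but is not monotone.
    return 2 * theta + 1.5 * math.sin(3 * theta)

print(preimage_counts(wiggly_double_cover, 1.0))  # (2, 2)
print(preimage_counts(wiggly_double_cover, 2.0))  # (2, 4)
```

The number of preimages depends on the chosen point (two for the angle 1.0, four for 2.0), while the signed count stays at 2, the degree of this map.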
de Rham Cohomology

If $x_{1}, \ldots, x_{n}$ are the local coordinates of $\mathbb{R}^{n}$, define an algebra $\Omega^{*}$ to be the algebra generated by symbols $dx_{1}, \ldots, dx_{n}$ subject to the relations $dx_{i}\,dx_{j} = -dx_{j}\,dx_{i}$ for all $i, j$. We say $dx_{i_{1}} \cdots dx_{i_{q}}$ has degree $q$. The differential forms on $\mathbb{R}^{n}$ are the algebra

$$\Omega^{*}(\mathbb{R}^{n}) := \{C^{\infty} \text{ functions on } \mathbb{R}^{n}\} \otimes_{\mathbb{R}} \Omega^{*}$$

The algebra $\Omega^{*}(\mathbb{R}^{n}) = \bigoplus_{q=0}^{n} \Omega^{q}(\mathbb{R}^{n})$ is naturally graded by degree. There is a differential operator $d : \Omega^{q}(\mathbb{R}^{n}) \to \Omega^{q+1}(\mathbb{R}^{n})$ defined by

1. if $f \in \Omega^{0}(\mathbb{R}^{n})$, then $df = \sum (\partial f/\partial x_{i})\, dx_{i}$
2. if $\omega = \sum f_{I}\, dx_{I}$, then $d\omega = \sum df_{I}\, dx_{I}$

$I$ stands here for a multi-index. For example, in $\mathbb{R}^{3}$ the differential assigns to 0-forms (= functions) the gradient, to 1-forms the curl, and to 2-forms the divergence. An easy exercise shows that $d^{2} = 0$ and the $q$th de Rham cohomology of $\mathbb{R}^{n}$ is the vector space

$$H^{q}_{\mathrm{de\,R}}(\mathbb{R}^{n}) = \frac{\ker\, d : \Omega^{q}(\mathbb{R}^{n}) \to \Omega^{q+1}(\mathbb{R}^{n})}{\mathrm{im}\, d : \Omega^{q-1}(\mathbb{R}^{n}) \to \Omega^{q}(\mathbb{R}^{n})}$$

More generally, the de Rham complex $\Omega^{*}(M)$ and its cohomology $H^{*}_{\mathrm{de\,R}}(M)$ can be defined for any smooth manifold $M$.

Let $\sigma$ be a smooth, singular, real $(q+1)$-chain on $M$, and let $\omega \in \Omega^{q}(M)$. Stokes theorem then says

$$\int_{\partial\sigma} \omega = \int_{\sigma} d\omega$$

and therefore integration defines a pairing between the $q$th singular homology and the $q$th de Rham cohomology of $M$. This pairing is perfect and thus de Rham cohomology is isomorphic to singular cohomology with real coefficients:

$$H^{*}_{\mathrm{de\,R}}(M) \simeq (H_{*}(M; \mathbb{R}))^{*} \simeq H^{*}(M; \mathbb{R})$$

Let $\Omega^{*}_{c}(M)$ denote the subcomplex of compactly supported forms and $H^{*}_{c}(M)$ its cohomology. Integration with respect to the first $i$ coordinates defines a map

$$\Omega^{*}_{c}(\mathbb{R}^{n}) \to \Omega^{*-i}_{c}(\mathbb{R}^{n-i})$$

which induces an isomorphism in cohomology; note in particular $H^{n}_{c}(\mathbb{R}^{n}) = \mathbb{R}$. More generally, when $E \to M$ is an $i$-dimensional orientable, real vector bundle over a compact, orientable manifold $M$, integration over the fiber gives the "Thom isomorphism":

$$H^{*}_{c}(E) \simeq H^{*-i}_{c}(M) \simeq H^{*-i}_{\mathrm{de\,R}}(M)$$

For orientable fiber bundles $F \to M' \xrightarrow{\ f\ } M$ with compact, orientable fiber $F$, integration over the fiber provides another definition of the transfer map

$$f_{!} : H^{*}_{\mathrm{de\,R}}(M') \to H^{*-i}_{\mathrm{de\,R}}(M)$$

Hodge Decomposition

Let $M$ be a compact oriented Riemannian manifold of dimension $n$. The Hodge star operator, $*$, associates to every $q$-form an $(n-q)$-form. For $\mathbb{R}^{n}$ and any orthonormal basis $\{e_{1}, \ldots, e_{n}\}$, it is defined by setting

$$*(e_{1} \wedge \cdots \wedge e_{q}) := \pm\, e_{q+1} \wedge \cdots \wedge e_{n}$$

where one takes $+$ if the orientation defined by $\{e_{1}, \ldots, e_{n}\}$ is the same as the given one, and $-$ otherwise. Using local coordinate charts this definition can be extended to $M$. Clearly, $*$ depends on the chosen metric and orientation of $M$. If $M$ is compact, we may define an inner product on the $q$-forms by

$$(\omega, \omega') := \int_{M} \omega \wedge *\omega'$$

With respect to this inner product $*$ is an isometry. Define the codifferential via

$$\delta := (-1)^{nq+n+1} * d\, * : \Omega^{q}(M) \to \Omega^{q-1}(M)$$

and the Laplace–Beltrami operator via

$$\Delta := \delta d + d\delta$$

The codifferential satisfies $\delta^{2} = 0$ and is the adjoint of the differential. Indeed, for $q$-forms $\omega$ and $(q+1)$-forms $\omega'$:

$$(d\omega, \omega') = (\omega, \delta\omega') \qquad [15]$$

It follows easily that $\Delta$ is self-adjoint, and furthermore,

$$\Delta\omega = 0 \ \text{ if and only if } \ d\omega = 0 \text{ and } \delta\omega = 0 \qquad [16]$$

A form $\omega$ satisfying $\Delta\omega = 0$ is called "harmonic." Let $\mathcal{H}^{q}$ denote the subspace of all harmonic $q$-forms. It is not hard to prove the "Hodge decomposition theorem":

$$\Omega^{q} = \mathcal{H}^{q} \oplus \mathrm{im}\, d \oplus \mathrm{im}\, \delta$$

Furthermore, by adjointness [15], a form $\omega$ is closed only if it is orthogonal to $\mathrm{im}\, \delta$. On calculating the de Rham cohomology we can also ignore the summand $\mathrm{im}\, d$ and find that:

Each de Rham cohomology class on a compact oriented Riemannian manifold $M$ contains a unique harmonic representative, that is, $H^{q}_{\mathrm{de\,R}}(M) \simeq \mathcal{H}^{q}$.

Warning: This is an isomorphism of vector spaces and in general does not extend to an isomorphism of algebras.

Examples

We list the cohomology of some important examples.
Projective Spaces

Let $\mathbb{RP}^n$ be real projective space of dimension n. Then,

$$H^*(\mathbb{RP}^n; \mathbb{Z}_2) = \mathbb{Z}_2[x]/(x^{n+1})$$

is a stunted polynomial ring on a generator x of degree 1.

Similarly, let $\mathbb{CP}^n$ and $\mathbb{HP}^n$ denote complex and quaternionic projective space of real dimensions 2n and 4n, respectively. Then,

$$H^*(\mathbb{CP}^n; \mathbb{Z}) = \mathbb{Z}[y]/(y^{n+1})$$
$$H^*(\mathbb{HP}^n; \mathbb{Z}) = \mathbb{Z}[z]/(z^{n+1})$$

are stunted polynomial rings with deg(y) = 2 and deg(z) = 4.

Lie Groups

Let G be a compact, connected Lie group of rank l, that is, the dimension of the maximal torus of G is l. Then,

$$H^*(G; \mathbb{Q}) \simeq \Lambda_{\mathbb{Q}}[a_{2d_1-1}, a_{2d_2-1}, \ldots, a_{2d_l-1}]$$

where $|a_i| = i$ and $d_1, \ldots, d_l$ are the fundamental degrees of G, which are known for all G. Often this structure lifts to the integral cohomology. In particular we have:

$$H^*_{\mathrm{free}}(SO(2k+1); \mathbb{Z}) \simeq \Lambda_{\mathbb{Z}}[a_3, a_7, \ldots, a_{4k-1}]$$
$$H^*_{\mathrm{free}}(SO(2k); \mathbb{Z}) \simeq \Lambda_{\mathbb{Z}}[a_3, a_7, \ldots, a_{4k-5}, a_{2k-1}]$$
$$H^*(U(k); \mathbb{Z}) \simeq \Lambda_{\mathbb{Z}}[a_1, a_3, \ldots, a_{2k-1}]$$

Classifying Spaces

For any group G there exists a classifying space BG, well defined up to homotopy. Classifying spaces are of central interest to geometers and topologists because the set of isomorphism classes of principal G-bundles over a space X is in one-to-one correspondence with the set of homotopy classes of maps from X to BG. In particular, every cohomology class $c \in H^*(BG; R)$ defines a characteristic class of principal G-bundles E over X: if E corresponds to the map $f_E : X \to BG$, then $c(E) := f_E^*(c)$.

BG can be constructed as the space of G-orbits of a contractible space EG on which G acts freely. Thus, for example,

$$B\mathbb{Z} = \mathbb{R}/\mathbb{Z} \simeq S^1$$
$$B\mathbb{Z}_2 = (\lim_{n\to\infty} S^n)/\mathbb{Z}_2 \simeq \mathbb{RP}^\infty$$
$$BS^1 = (\lim_{n\to\infty} S^{2n+1})/S^1 \simeq \mathbb{CP}^\infty$$

and more generally, infinite Grassmannian manifolds are classifying spaces for linear groups. When G is a compact connected Lie group,

$$H^*(BG; \mathbb{Q}) \simeq \mathbb{Q}[x_{2d_1}, \ldots, x_{2d_l}]$$

with $d_i$ as above and $|x_i| = i$. In particular,

$$H^*(BSO(2k+1); \mathbb{Z}[1/2]) \simeq \mathbb{Z}[1/2][p_1, p_2, \ldots, p_k]$$
$$H^*(BSO(2k); \mathbb{Z}[1/2]) \simeq \mathbb{Z}[1/2][p_1, p_2, \ldots, p_{k-1}, e_k]$$
$$H^*(BU(k); \mathbb{Z}) \simeq \mathbb{Z}[c_1, c_2, \ldots, c_k]$$

where the Pontryagin, Euler, and Chern classes have degree $|p_i| = 4i$, $|e_k| = 2k$, and $|c_i| = 2i$, respectively.

Moduli Spaces

Let $\mathcal{M}_g^n$ be the space of Riemann surfaces of genus g with n ordered, marked points. There are naturally defined classes $\kappa_i$ and $e_1, \ldots, e_n$ of degree 2i and 2, respectively. By Harer–Ivanov stability and the recent proof of the Mumford conjecture (Madsen–Weiss, preprint 2004), there is an isomorphism up to degree $* < 3g/2$ of the rational cohomology of $\mathcal{M}_g^n$ with

$$\mathbb{Q}[\kappa_1, \kappa_2, \ldots] \otimes \mathbb{Q}[e_1, \ldots, e_n]$$

The rational cohomology vanishes in degrees $* > 4g - 5$ if n = 0, and $* > 4g - 4 + n$ if n > 0. Though the stable part of the cohomology is now well understood, the structure of the unstable part, as proposed by Faber (Viehweg 1999), remains conjectural.

Generalized Cohomology Theories

The three basic properties of singular homology, appropriately dualized, hold of course also for cohomology. Furthermore, they (essentially) determine (co)homology uniquely as a functor from the category of simplicial spaces and continuous functions to the category of abelian groups. If we drop the dimension axiom (2), we are left with homotopy invariance (1), and the Mayer–Vietoris sequence (3). Abelian group valued functors satisfying (1) and (3) are the so-called "generalized (co)homology theories."
K-theory and cobordism theory are two well-known examples but there are many more.

K-Theory

The geometric objects representing elements in complex K-theory $K^0(X)$ are isomorphism classes of finite dimensional complex vector bundles E over X. Vector bundles E, E' can be added to form a new bundle $E \oplus E'$ over X, and $K^0(X)$ is just the group completion of the arising monoid. Thus, for example, for the point space we have $K^0(\mathrm{pt}) = \mathbb{Z}$. Tensor product of vector bundles $E \otimes E'$ induces a multiplication on K-theory making $K^*(X)$ into a graded commutative ring.

In many ways K-theory is easier than cohomology. In particular, the groups are 2-periodic: all even degree groups are isomorphic to the reduced K-theory group $\tilde{K}^0(X) := \operatorname{coker}(K^0(\mathrm{pt}) = \mathbb{Z} \to K^0(X))$, and all odd degree groups are isomorphic to $K^1(X) := \tilde{K}^0(\Sigma X)$.

The theory of characteristic classes gives a close relation between the two cohomology theories. The Chern character map, a rational polynomial in the Chern classes, defines

$$\mathrm{ch} : K^0(X) \otimes_{\mathbb{Z}} \mathbb{Q} \to H^{\mathrm{even}}(X; \mathbb{Q}) := \bigoplus_{k \geq 0} H^{2k}(X; \mathbb{Q})$$

an isomorphism of rings. Thus, the K-theory and cohomology of a space carry the same rational information. But they may have different torsion parts. This became an issue in string theory when D-brane charges, which had formerly been thought of as differential forms (and hence cohomology classes), were later reinterpreted more naturally as K-theory classes (Witten 1998).

There are real and quaternionic K-theory groups which are 8-periodic.

Cobordism Theory

The geometric objects representing an element in the oriented cobordism group $\Omega_n^{SO}(X)$ are pairs (M, f) where M is a smooth, orientable n-dimensional manifold and $f : M \to X$ is a continuous map. Two pairs (M, f) and (M', f') represent the same cobordism class if there exists a pair (W, F) where W is an (n + 1)-dimensional, smooth, oriented manifold with boundary $\partial W = M \cup M'$ such that $F : W \to X$ restricts to f and f' on the boundary $\partial W$. Disjoint union and Cartesian product of manifolds define an addition and multiplication so that $\Omega_*^{SO}(X)$ is a graded, commutative ring.

Similarly, unoriented, complex, or spin cobordism groups can be defined.

Elliptic Cohomology

Quillen proved that complex cobordism theory is universal for all complex oriented cohomology theories, that is, those cohomology theories that allow a theory of Chern classes. In a complex oriented theory, the first Chern class of the tensor product of two line bundles can be expressed in terms of the first Chern class of each of them via a two-variable power series: $c_1(E \otimes E') = F(c_1(E), c_1(E'))$. F defines a formal group law and Quillen's theorem asserts that the one arising from complex cobordism theory is the universal one.

Vice versa, given a formal group law, one may try to construct a complex oriented cohomology theory from it. In particular, an elliptic curve gives rise to a formal group law and an elliptic cohomology theory. Hopkins et al. have described and studied an inverse limit of these elliptic theories, which they call the theory of topological modular forms, tmf, as the theory is closely related to modular forms. In particular, there is a natural map from the groups $\mathrm{tmf}^{-2n}(\mathrm{pt})$ to the group of modular forms of weight n over $\mathbb{Z}$. After inverting a certain element (related to the discriminant), the theory becomes periodic with period $24^2 = 576$.

Witten (1998) showed that the purely theoretically constructed elliptic cohomology theories should play an important role in string theory: the index of the Dirac operator on the free loop space of certain manifolds should be interpreted as an element of it. But unlike for ordinary cohomology, K-theory, and cobordism theory, we do not (yet) know a good geometric object representing elements in this theory, without which its use for geometry and analysis remains limited. Segal speculated some 20 years ago that conformal field theories should define such geometric objects. Though progress has been made, the search for a good geometric interpretation of elliptic cohomology (and tmf) remains an active and important research area.

Infinite Loop Spaces

Brown's representability theorem implies that for each (reduced) generalized cohomology theory $h^*$ we can find a sequence of spaces $E_*$ such that $h^n(X)$ is the set of homotopy classes $[X, E_n]$ from the space X to $E_n$ for all n. Recall that the Mayer–Vietoris sequence implies that $h^n(X) \simeq h^{n+1}(\Sigma X)$. The suspension functor $\Sigma$ is adjoint to the based loop space functor $\Omega$, which takes a space X to the space of based maps from the circle to X. Hence,

$$h^n(X) = [X, E_n] = [\Sigma X, E_{n+1}] = [X, \Omega E_{n+1}]$$
and it follows that every generalized cohomology theory is represented by an infinite loop space

$$E_0 \simeq \Omega E_1 \simeq \cdots \simeq \Omega^n E_n \simeq \cdots$$

Vice versa, any such infinite loop space gives rise to a generalized cohomology theory.

One may think of infinite loop spaces as the abelian groups up to homotopy in the strongest sense. Indeed, ordinary cohomology with integer coefficients is represented by

$$\mathbb{Z} \simeq \Omega S^1 \simeq \Omega^2 \mathbb{CP}^\infty \simeq \cdots \simeq \Omega^n K(n, \mathbb{Z}) \simeq \cdots$$

where by definition the Eilenberg–MacLane space $K(n, \mathbb{Z})$ has trivial homotopy groups for all dimensions not equal to n and $\pi_n K(n, \mathbb{Z}) = \mathbb{Z}$. Complex K-theory is represented by

$$\mathbb{Z} \times BU \simeq \Omega(U) \simeq \Omega^2(BU) \simeq \Omega^3(U) \simeq \cdots$$

This is Bott's celebrated "periodicity theorem." Finally, oriented cobordism theory is represented by

$$\Omega^\infty MSO := \lim_{n \to \infty} \Omega^n \mathrm{Th}(\gamma_n)$$

where $\gamma_n \to BSO_n$ is the universal n-dimensional vector bundle over the Grassmannian manifold of oriented n-planes in $\mathbb{R}^\infty$, and $\mathrm{Th}(\gamma_n)$ denotes its Thom space.

Symmetric monoidal categories are a good source of infinite loop spaces. Indeed every infinite loop space can be constructed from such a category: the symmetric monoidal structure gives the corresponding homotopy abelian group structure. For example, the category of finite-dimensional, complex vector spaces and their isomorphisms gives rise to $\mathbb{Z} \times BU$. To give another example, in quantum field theory, one considers the (d + 1)-dimensional cobordism category with objects the compact, oriented d-dimensional manifolds, and their (d + 1)-dimensional cobordisms as morphisms. Disjoint union of manifolds makes this category into a symmetric monoidal category. The associated infinite loop space and hence generalized cohomology theory has recently been identified as a (d + 1)-dimensional slice of oriented cobordism theory (Galatius et al. preprint 2005).

See also: Characteristic Classes; Equivariant Cohomology and the Cartan Model; Functional Equations and Integrable Systems; Index Theorems; Intersection Theory; K-Theory; Moduli Spaces: An Introduction; Riemann Surfaces; Spectral Sequences.

Further Reading

Adams F (1978) Infinite Loop Spaces. Annals of Mathematical Studies 90. Princeton University Press.
Bott R and Tu L (1982) Differential Forms in Algebraic Topology. Springer.
Galatius, Madsen, Tillmann, Weiss (2005).
Hatcher A (2002) Algebraic Topology (http://www.math.cornell.edu). Cambridge: Cambridge University Press.
Madsen, Weiss (2004).
Mosher R and Tangora M (1968) Cohomology Operations and Applications in Homotopy Theory. Harper and Row.
Viehweg (1999) Aspects of Mathematics E33.
Witten (1998) Journal of High Energy Physics 12.
Witten (1998) Springer Lecture Notes in Mathematics, vol. 1326.

Combinatorics: Overview
C Krattenthaler, Universität Wien, Vienna, Austria
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Combinatorics is a vast field which enters statistical physics in a crucial way. There, it is particularly the enumerative problems that are of importance. Therefore, in this article, we shall mainly concentrate on the enumerative aspects of combinatorics. We first recall the basic terminology, in particular the basic combinatorial objects and numbers, together with the simplest facts about them. We then provide introductions into the most important techniques of enumeration: the generating function technique, Redfield–Pólya theory, methods of solving functional equations of combinatorial origin, methods of asymptotic enumeration, the theory of heaps, and the transfer matrix method. The subsequent sections then discuss specific problem circles with relation to statistical physics more closely. We discuss lattice path problems, explain Kasteleyn's method of enumerating perfect matchings and tilings, present the fundamental theorems on nonintersecting paths, and provide an introduction into the research field involving vicious walkers, plane partitions, rhombus tilings, alternating sign matrices, six-vertex configurations, and fully packed loop configurations. Finally, we explain how one should treat binomial and hypergeometric series, which frequently arise in enumeration problems.
Basic Combinatorial Terminology

In this section we review basic combinatorial notions and facts. The reader can find a more detailed treatment and further results, for example, in chapter 1 of Stanley (1986).

The basic combinatorial choice problems and their solutions are: there are $2^n$ subsets of an n-element set. There are $\binom{n}{k}$ k-element subsets of an n-element set. Given an alphabet $A = \{a_1, a_2, \ldots\}$, a word is a (finite or infinite) sequence of elements of A. Usually, a finite word is written in the form $w_1 w_2 \ldots w_n$ (with $w_i \in A$). Out of the letters {1, 2, ..., k}, one can build $k^n$ words of length n. Out of the letters {1, 2, ..., k}, one can build $\binom{n+k-1}{n}$ weakly increasing sequences of length n. The number of permutations of an n-element set is n!. The set of permutations of {1, 2, ..., n} is denoted by $S_n$. The number of permutations of an n-element set with exactly k cycles is the Stirling number of the first kind, s(n, k). These numbers are given as the expansion coefficients of falling factorials,

$$x(x-1)\cdots(x-n+1) = \sum_{k=0}^{n} (-1)^{n-k} s(n,k)\, x^k$$

or in form of the double (formal) power series

$$\sum_{n,k \geq 0} (-1)^{n-k} s(n,k)\, x^k \frac{y^n}{n!} = (1+y)^x$$

A partition of a set is a collection of pairwise disjoint subsets the union of which is the complete set. The subsets in the collection are called the blocks of the partition. The total number of partitions of an n-element set is the Bell number $B_n$. These numbers are given by

$$\sum_{n \geq 0} B_n \frac{x^n}{n!} = e^{e^x - 1}$$

The number of partitions of an n-element set into exactly k blocks is the Stirling number of the second kind, S(n, k). These numbers are given by

$$\sum_{n,k \geq 0} S(n,k)\, x^k \frac{y^n}{n!} = e^{x(e^y - 1)}$$

or, explicitly, by

$$S(n,k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^n$$

A composition of a positive integer n is a representation of n as a sum $n = s_1 + s_2 + \cdots + s_k$ of positive integers $s_i$, where the order of the summands matters. The total number of compositions of n is $2^{n-1}$. The number of compositions of n with exactly k summands is $\binom{n-1}{k-1}$. A partition of a positive integer n is a representation of n as a sum $n = \lambda_1 + \lambda_2 + \cdots + \lambda_k$ of positive integers $\lambda_i$, where the order of the summands does not matter. Thus, we may assume that the summands are ordered, $n \geq \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k > 0$. This is the motivation to write partitions most often in the form of tuples $(\lambda_1, \lambda_2, \ldots, \lambda_k)$ the entries of which are weakly decreasing. The summands of a partition are called the parts of the partition. Let p(n) denote the number of partitions of n. These numbers are given by

$$\sum_{n=0}^{\infty} p(n)\, x^n = \frac{1}{\prod_{i=1}^{\infty} (1 - x^i)}$$

If p(n, k) denotes the number of partitions of n into at most k parts, then we have

$$\sum_{n=0}^{\infty} p(n,k)\, x^n = \frac{1}{\prod_{i=1}^{k} (1 - x^i)}$$

Finally, if p(n, k, m) denotes the number of partitions of n into at most k parts, all of which are at most m, then

$$\sum_{n \geq 0} p(n,k,m)\, x^n = \frac{(1 - x^{k+m})(1 - x^{k+m-1}) \cdots (1 - x^{m+1})}{(1 - x^k)(1 - x^{k-1}) \cdots (1 - x)}$$

The expression on the right-hand side is called the q-binomial coefficient, and is denoted by $\left[{k+m \atop k}\right]_x$.

Partitions are frequently encoded in terms of their Ferrers diagrams. The Ferrers diagram of a partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_\ell)$ is an array of cells with $\ell$ left-justified rows and $\lambda_i$ cells in row i. For example, the diagram in Figure 1 is the Ferrers diagram of the partition (3, 3, 2).

A lattice path P in $\mathbb{Z}^d$ (where $\mathbb{Z}$ denotes the set of integers) is a path in the d-dimensional integer lattice $\mathbb{Z}^d$ which uses only points of the lattice, that is, it is a sequence $(P_0, P_1, \ldots, P_l)$, where $P_i \in \mathbb{Z}^d$ for all i. The vectors $\overrightarrow{P_0 P_1}, \overrightarrow{P_1 P_2}, \ldots, \overrightarrow{P_{l-1} P_l}$ are called the steps of P. The number of steps, l, is called the length of P. Figure 2 shows a lattice path in $\mathbb{Z}^2$ of length 11.

Figure 1 A Ferrers diagram.
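The counting formulas of this section can be checked numerically for small parameters. The following is a minimal Python sketch, not part of the original article: the function names (`stirling1`, `stirling2`, `bell`, `p_box`, `q_binomial`) are illustrative choices, and the recurrences used for `stirling1`, `p_box`, and `q_binomial` are standard ones assumed here rather than taken from the text. Only `stirling2` follows the explicit formula given above.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def stirling1(n, k):
    """Number of permutations of an n-element set with exactly k cycles."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    # Element n is either a fixed point (a new cycle) or is inserted
    # directly after one of the other n - 1 elements in an existing cycle.
    return stirling1(n - 1, k - 1) + (n - 1) * stirling1(n - 1, k)


def stirling2(n, k):
    """Set partitions of an n-element set into k blocks (explicit sum)."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)


def bell(n):
    """Bell number: total number of partitions of an n-element set."""
    return sum(stirling2(n, k) for k in range(n + 1))


@lru_cache(maxsize=None)
def p_box(n, k, m):
    """p(n, k, m): partitions of n into at most k parts, each at most m."""
    if n == 0:
        return 1
    if n < 0 or k == 0 or m == 0:
        return 0
    # Either no part equals m, or one part equal to m is removed.
    return p_box(n, k, m - 1) + p_box(n - m, k - 1, m)


def q_binomial(n, k):
    """Coefficient list (lowest degree first) of the q-binomial [n over k]_x."""
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    first = q_binomial(n - 1, k - 1)
    second = q_binomial(n - 1, k)  # enters multiplied by x^k
    coeffs = [0] * (k * (n - k) + 1)
    for i, c in enumerate(first):
        coeffs[i] += c
    for i, c in enumerate(second):
        coeffs[i + k] += c
    return coeffs


# Compare the q-binomial [k+m over k]_x with direct partition counts.
k, m = 2, 3
print(q_binomial(k + m, k))
print([p_box(n, k, m) for n in range(k * m + 1)])
```

The two printed lists should coincide, which is the statement that the q-binomial coefficient is the generating function for partitions fitting in a k by m box.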
Figure 2 A Motzkin path.

A Dyck path is a lattice path in the integer plane $\mathbb{Z}^2$ consisting of up-steps (1, 1) and down-steps (1, −1), which starts at the origin, never passes below the x-axis, and ends on the x-axis. See Figure 3 for an example.

Figure 3 A Dyck path.

The number of Dyck paths of length 2n is the Catalan number

$$C_n = \frac{1}{n+1} \binom{2n}{n}$$

The generating function (see the next section for an introduction to the theory of generating functions) for these numbers is

$$\sum_{n=0}^{\infty} C_n x^n = \frac{1 - \sqrt{1 - 4x}}{2x} \qquad [1]$$

The reader is referred to exercise 6.19 in Stanley (1999) for countless occurrences of the Catalan numbers.

A Motzkin path is a lattice path in the integer plane $\mathbb{Z}^2$ consisting of up-steps (1, 1), level steps (1, 0), and down-steps (1, −1), which starts at the origin, never passes below the x-axis, and ends on the x-axis. The path in Figure 2 is in fact a Motzkin path. The number of Motzkin paths of length n is the Motzkin number

$$M_n = \sum_{k \geq 0} \frac{1}{k+1} \binom{2k}{k} \binom{n}{2k}$$

The generating function for these numbers is

$$\sum_{n=0}^{\infty} M_n x^n = \frac{1 - x - \sqrt{1 - 2x - 3x^2}}{2x^2} \qquad [2]$$

The reader is referred to exercise 6.38 in Stanley (1999) for numerous occurrences of the Motzkin numbers.

A Schröder path is a lattice path in the integer plane $\mathbb{Z}^2$ consisting of horizontal steps (1, 0), vertical steps (0, 1), and diagonal steps (1, 1), which starts at the origin, never passes below the diagonal x = y, and ends on the diagonal x = y. See Figure 4 for an example.

Figure 4 A Schröder path.

The number of Schröder paths from the origin to (n, n) is the (large) Schröder number

$$S_n = \sum_{k \geq 0} \frac{1}{k+1} \binom{2k}{k} \binom{n+k}{2k}$$

The generating function for these numbers is

$$\sum_{n=0}^{\infty} S_n x^n = \frac{1 - x - \sqrt{1 - 6x + x^2}}{2x} \qquad [3]$$

The reader is referred to exercise 6.39 in Stanley (1999) for numerous occurrences of the Schröder numbers.

There is another famous sequence of numbers which we have not touched on yet, the Fibonacci numbers $F_n$. They are given by

$$F_n = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n+1} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n+1}$$

with generating function

$$\sum_{n=0}^{\infty} F_n x^n = \frac{1}{1 - x - x^2} \qquad [4]$$

They also occur in numerous places. For example, the number $F_n$ counts all paths on the integers $\mathbb{Z}$ from 0 to n with steps +1 and +2.

An undirected graph G consists of vertices and edges. An edge is a two-element subset of the vertices, which, however, is thought of as a line or curve connecting the two vertices. See Figure 5a for an example. The usual notation for a graph G is G = (V, E), where V is the set of vertices and E is the set of edges of G. A graph is planar if it is
embedded in the plane (sphere) in such a way that the curves which mark the edges do not intersect in their interiors. There can be several different ways to embed the same graph in the plane (or in another surface). When we speak of a planar graph then we assume the graph already to be embedded in a given way. For example, the graph in Figure 5a is not a planar graph, by its drawing. However, there is a different embedding which is planar (namely, all embeddings which put the vertex $v_3$ above the vertex $v_5$ and leave the other vertices as they are). A tree is a graph without any cycles.

Figure 5 (a) An undirected graph. (b) A directed graph.

A directed graph (or digraph) G consists of vertices and arcs (which are sometimes also called directed edges). An arc is a pair of vertices, which, however, is thought of as an arrow pointing from the first vertex of the pair to the second. See Figure 5b for an example. The usual notation for a directed graph G is again G = (V, E), where V is the set of vertices and E is the set of arcs of G. All other notions explained for undirected graphs have analogous meanings for directed graphs.

Graphs can be labeled, in which case each vertex is assigned a label, or unlabeled. The (undirected) graph in Figure 5a is labeled, whereas the (directed) graph in Figure 5b is unlabeled.

Generating Functions

Generating functions are the very basic tools of enumeration. For introductions to this technique, from different points of view, the reader is referred to Bergeron et al. (1998), Flajolet and Sedgewick (chapter 1 in the reference listed in "Further reading" section), and Stanley (1998, chapter 1; 1999, chapter 4).

Let A be a set of (unlabeled) objects. Each object a in A has a certain size, |a|, which is a non-negative integer. Let us also assume that there is only a finite number of objects from A of a given size. Let $a_n$ be the number of objects from A of size n. The (ordinary) generating function for A is the formal power series

$$F_A(x) = \sum_{a \in A} x^{|a|} = \sum_{n=0}^{\infty} a_n x^n$$

("Formal" means that x is just an indeterminate, not a real or complex number. One can compute with formal power series in the same way as with analytic series, only that convergence issues do not arise, respectively that "convergence" has a different meaning; cf. Stanley (1998, section 1.1).) Typical examples are Sets (the collection containing all "unlabeled sets," that is, all objects of the form {•, •, ..., •}, including the empty set, where the size of {•, •, ..., •} is the number of •'s), Sequences (the collection containing all "unlabeled sequences," that is, all objects of the form (•, •, ..., •), including the empty sequence), Cycles (unlabeled cycles), with respective generating functions

$$F_{\mathrm{Sets}}(x) = F_{\mathrm{Sequences}}(x) = \frac{1}{1-x}, \qquad F_{\mathrm{Cycles}}(x) = \frac{x}{1-x} \qquad [5]$$

or Trees (unlabeled trees).

If A and B are two sets of objects, one can define several other sets of objects using them. The union of A and B, written $A \cup B$, has as a groundset the disjoint union of A and B, and the size of an element from A is its size in A, while the size of an element from B is its size in B. We have

$$F_{A \cup B}(x) = F_A(x) + F_B(x) \qquad [6]$$

The product of A and B, written $A \times B$, has as a groundset the set of pairs $A \times B$, and the size of an element (a, b) from $A \times B$ is the sum of the sizes of a (in A) and of b (in B). We have

$$F_{A \times B}(x) = F_A(x) \cdot F_B(x) \qquad [7]$$

The substitution of two sets A and B of objects can only be defined in certain circumstances, and only in certain more restrictive circumstances can the generating function for the substitution be computed by substituting the generating functions for A and B. Let us assume that any object a from A of size n, by its structure, has n atoms (nodes). For example, if A is a certain set of trees, where the size of a tree is the number of leaves in the tree, then we may take, as the atoms, the leaves of the tree. In this situation, the substitution of B in A, denoted by A(B), is the set of objects which arises by replacing the atoms of objects from A by objects from B in all possible ways. The size of an object from A(B) is the sum of the sizes of the objects from B that it
contains. In order that A(B) contains only a finite number of objects of a given size, we must assume that B contains no elements of size 0. If, in addition, the atoms of any element a from A inherit an order (e.g., if A is a set of binary trees, then the leaves of a binary tree are ordered in a natural way from "left" to "right"), then we have

$$F_{A(B)}(x) = F_A(F_B(x)) \qquad [8]$$

However, this equation is not true in general. The general formula comes out of Redfield–Pólya theory (see [21] and [24]) and requires the notion of cycle index series. For example, if B is the set of connected (unlabeled) graphs, A is Sets, so that A(B) is the set of all (connected and disconnected) graphs, then [8] is not true, but what is true is

$$F_{\mathrm{Sets}(B)}(x) = \exp\left( F_B(x) + \tfrac{1}{2} F_B(x^2) + \tfrac{1}{3} F_B(x^3) + \cdots \right) \qquad [9]$$

This holds, in fact, for any set B of unlabeled objects. (This is seen by combining [24], [17], and [21].)

Next we deal with the enumeration of labeled objects. Let A be a set of labeled objects, again, each object a with a certain size |a| which is a non-negative integer. "Labeled" means that each object of size n, by its structure, comes with n atoms (nodes) which are labeled 1, 2, ..., n. For example, A may be the set of all labeled graphs, where the size of a graph is the number of its vertices, and where the vertices are labeled 1, 2, ..., n. Again, we assume that there is only a finite number of objects from A of a given size. Let $a_n$ be the number of objects from A of size n. The exponential generating function for A is the formal power series

$$E_A(x) = \sum_{a \in A} \frac{x^{|a|}}{|a|!} = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$$

Typical examples are Sets (the collection containing all "labeled sets," that is, all objects of the form {1, 2, ..., n}, including the empty set), Permutations, Cycles (labeled cycles), with respective generating functions

$$E_{\mathrm{Sets}}(x) = \exp(x) \qquad [10]$$
$$E_{\mathrm{Permutations}}(x) = \frac{1}{1-x} \qquad [11]$$
$$E_{\mathrm{Cycles}}(x) = \log \frac{1}{1-x} \qquad [12]$$

or Trees (labeled trees). The explicit form of the generating function for Trees is discussed in the section "Solving equations for generating functions: the Lagrange inversion formula and the kernel method."

If A and B are two sets of objects, one defines again several other sets of objects using them. The union of A and B, written $A \cup B$, has as a groundset the disjoint union of A and B, and the size of an element from A is its size in A, while the size of an element from B is its size in B. We have

$$E_{A \cup B}(x) = E_A(x) + E_B(x) \qquad [13]$$

To define the product of A and B, written $A \times B$, we cannot simply take $A \times B$ as a groundset; we must also say something about the labeling of the objects. So, as a groundset we take all pairs (a, b) with $a \in A$ and $b \in B$, but labeled in all possible ways by 1, 2, ..., |a| + |b| such that the order of labels assigned to a respects the original order of labels of a, and the same for b. The size of such an element (a, b) is again the sum of the sizes of a (in A) and of b (in B). We have

$$E_{A \times B}(x) = E_A(x) \cdot E_B(x) \qquad [14]$$

Since, in the labeled world, objects come automatically with atoms, the substitution of two sets A and B of objects can now always be defined. The substitution of B in A, denoted by A(B), is the set of objects which arises by replacing the atoms of objects from A by objects from B in all possible ways, and labeling the substituted objects in all possible ways by 1, 2, ..., $\sum_b |b|$ (the sum being over the objects from B which were put in the places of the atoms) that are consistent with the original labelings of the objects from B. The size of an object from A(B) is the sum of the sizes of the objects from B that it contains. In order that A(B) contains only a finite number of objects of a given size, we must assume that B contains no elements of size 0. Then we have

$$E_{A(B)}(x) = E_A(E_B(x)) \qquad [15]$$

An example of a composition is

$$\mathrm{Permutations} = \mathrm{Sets}(\mathrm{Cycles})$$

Thus, from [15] we have

$$E_{\mathrm{Permutations}}(x) = E_{\mathrm{Sets}}(E_{\mathrm{Cycles}}(x))$$

corresponding to the identity

$$\frac{1}{1-x} = \exp\left( \log \frac{1}{1-x} \right)$$

Another manifestation of the composition rule is, for example, the fact (which is sometimes called the "exponential principle") that, if one takes the log of the partition function for some maps, the result is the partition function for the connected maps among them.
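The composition rule [15] with A = Sets can be tested on truncated power series. The sketch below is an illustration under stated assumptions, not the article's method: the helpers `series_exp` and `series_log` are hypothetical names, they use the standard recurrence obtained from differentiating E = exp(B), and labeled graphs are chosen here as a stand-in for the "maps" mentioned in the text. It checks that exponentiating the series for labeled cycles gives the series for permutations, and that taking the logarithm of the series for all labeled graphs yields the counts of connected labeled graphs.

```python
from fractions import Fraction
from math import comb, factorial


def series_exp(b, order):
    """Coefficients of exp(B(x)) up to x^order; requires b[0] == 0.

    Differentiating E = exp(B) gives E' = B'E, that is,
    n * e_n = sum_{k=1..n} k * b_k * e_{n-k}.
    """
    e = [Fraction(1)] + [Fraction(0)] * order
    for n in range(1, order + 1):
        acc = sum((k * b[k] * e[n - k] for k in range(1, n + 1)), Fraction(0))
        e[n] = acc / n
    return e


def series_log(e, order):
    """Coefficients of log(E(x)) up to x^order; requires e[0] == 1.

    Solves the same recurrence for b_n instead of e_n.
    """
    b = [Fraction(0)] * (order + 1)
    for n in range(1, order + 1):
        tail = sum((k * b[k] * e[n - k] for k in range(1, n)), Fraction(0))
        b[n] = e[n] - tail / n
    return b


ORDER = 6

# E_Cycles(x) = log 1/(1-x): there are (n-1)! labeled cycles of size n,
# so the coefficient of x^n is (n-1)!/n! = 1/n.
cycles = [Fraction(0)] + [Fraction(1, n) for n in range(1, ORDER + 1)]
permutations = series_exp(cycles, ORDER)  # should be 1/(1-x)

# All labeled graphs on n vertices: 2^(n choose 2) of them.
all_graphs = [Fraction(2 ** comb(n, 2), factorial(n)) for n in range(ORDER + 1)]
connected = series_log(all_graphs, ORDER)
connected_counts = [int(connected[n] * factorial(n)) for n in range(1, ORDER + 1)]

print(permutations)
print(connected_counts)
```

Exact rational arithmetic (`Fraction`) is used so that the comparison with integer counts is not affected by rounding.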
All of the above can be generalized to a weighted setting. Namely, if A is a set of objects (labeled or unlabeled), and if $w : A \to R$ is a weight function from A into some ring R, then all of the above remains true, if we replace the definitions of $F_A(x)$ and $E_A(x)$ above by the weighted sums

$$F_A(x) = \sum_{a \in A} w(a)\, x^{|a|}$$

and

$$E_A(x) = \sum_{a \in A} w(a)\, \frac{x^{|a|}}{|a|!}$$

respectively, if in the definition of the union of A and B we define the weight of an object to be its weight in A, respectively B, if in the definition of the product of A and B we define the weight of an object (a, b) to be the product of the weights of a and b, and if in the definition of the substitution we define the weight of an object in A(B) as the product of the weights of the objects from B that were put in place of the atoms.

Redfield–Pólya Theory of Colored Enumeration

The natural and uniform environment for the separate treatment of generating functions for unlabeled and labeled objects in the last section is the theory for counting colored objects founded by Redfield and Pólya, in the modern treatment through cycle index series due to Joyal. We refer the reader to Bergeron et al. (1998, appendix 1), de Bruijn (1981), and Stanley (1999, chapter 7) for further reading.

Let A be a set of labeled objects with the constraint that there is only a finite number of objects of a given size. The cycle index series for A is the formal multivariable series

$$Z_A(x_1, x_2, \ldots) = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{\sigma \in S_n} \mathrm{fix}_\sigma(A)\, x_1^{c_1(\sigma)} x_2^{c_2(\sigma)} x_3^{c_3(\sigma)} \cdots \qquad [16]$$

where $\mathrm{fix}_\sigma(A)$ is the number of objects a from A that remain invariant when the labels are permuted according to the permutation $\sigma$ (in particular, if $\sigma \in S_n$, the size of a must be n in order that $\sigma$ can be applied to the labels), and where $c_i(\sigma)$ denotes the number of cycles of length i of $\sigma$.

In most cases, it is difficult to obtain compact expressions for the cycle index series. However, for our familiar families of objects, compact expressions are available:

$$Z_{\mathrm{Sets}}(x_1, x_2, \ldots) = \exp\left( x_1 + \frac{x_2}{2} + \frac{x_3}{3} + \cdots \right) \qquad [17]$$
$$Z_{\mathrm{Permutations}}(x_1, x_2, \ldots) = \prod_{i=1}^{\infty} \frac{1}{1 - x_i} \qquad [18]$$
$$Z_{\mathrm{Cycles}}(x_1, x_2, \ldots) = \sum_{i=1}^{\infty} \frac{\varphi(i)}{i} \log \frac{1}{1 - x_i} \qquad [19]$$

where $\varphi(i)$ is the Euler totient function (the number of positive integers $j \leq i$ relatively prime to i).

What makes the cycle index series so fundamental is the fact that the generating functions from the last section are specializations of it. Namely, the exponential generating function for A is equal to

$$E_A(x) = Z_A(x, 0, 0, \ldots) \qquad [20]$$

If, given the set of labeled objects A, we produce a set of unlabeled objects $\tilde{A}$ by taking all the objects from A but forgetting the labels, then the ordinary generating function for $\tilde{A}$ is another specialization of the cycle index series,

$$F_{\tilde{A}}(x) = Z_A(x, x^2, x^3, \ldots) \qquad [21]$$

The cycle index series satisfies the following properties with respect to union, product and composition of sets of objects:

$$Z_{A \cup B}(x_1, x_2, \ldots) = Z_A(x_1, x_2, \ldots) + Z_B(x_1, x_2, \ldots) \qquad [22]$$
$$Z_{A \times B}(x_1, x_2, \ldots) = Z_A(x_1, x_2, \ldots) \cdot Z_B(x_1, x_2, \ldots) \qquad [23]$$
$$Z_{A(B)}(x_1, x_2, \ldots) = Z_A\big( Z_B(x_1, x_2, x_3, \ldots),\ Z_B(x_2, x_4, x_6, \ldots),\ Z_B(x_3, x_6, x_9, \ldots),\ \ldots \big) \qquad [24]$$

Similar to the theory of generating functions surveyed in the last section, one can also develop a weighted version of the cycle index series. Given a set of labeled objects A, where each object a is assigned a weight w(a), one changes the definition [16] insofar as $\mathrm{fix}_\sigma(A)$ gets replaced by the weighted sum $\sum_{\sigma(a) = a} w(a)$, where $\sigma(a)$ means the object arising from a by permuting the labels according to $\sigma$. Then all the above formulas remain true in this weighted setting.

Cycle index series are instrumental in the enumeration of colored objects. The basic situation is that we are given a set $\tilde{A}$ of unlabeled objects so that every object of size n comes with n atoms (nodes). For example, we may think of $\tilde{A}$ as the set of cycles. We are now going to color each atom by a
Combinatorics: Overview 559

color from the set of colors $C$. The question that we pose is: how many different colored objects of a given size are there? In our example, if $C$ consists of the two colors “black” and “white,” then we are asking the question of how many necklaces one can make out of $n$ pearls that can be black or white. In terms of generating functions, we want to compute

$$\tilde A(x) = \sum_{c} x^{|c|}$$

where the sum is over all colored objects $c$ that one can obtain by coloring the objects from $\tilde A$.

The central result of Redfield–Pólya theory is that, if $A$ is the set of labeled objects that one obtains from $\tilde A$ by labeling the objects of $\tilde A$ in all possible ways, then

$$\tilde A(x) = Z_A(|C|x, |C|x^2, |C|x^3, \dots)$$

There is again a weighted version. One allows the objects $a$ from $\tilde A$ to have weight $w(a) \in R$. Moreover, one assumes a weight function $f : C \to R$ on the colors with values in the ring $R$. One defines the weight of a colored object obtained by coloring the atoms of $a$ to be $w(a)$ multiplied by the product of all $f(\gamma)$, where $\gamma$ ranges over all the colors of the atoms (including repetitions of colors). Let $\tilde A(w, f)$ denote the sum of all the weights of all colored objects obtained from $\tilde A$. Then

$$\tilde A(w, f) = Z_A\Bigl(\sum_{c \in C} f(c),\ \sum_{c \in C} f(c)^2,\ \sum_{c \in C} f(c)^3,\ \dots\Bigr)$$

We remark that these results cover also the case of enumeration of objects under a group action. This includes the enumeration of objects on which we impose certain symmetries. See Bergeron et al. (1998, appendix 1), de Bruijn (1981), and Stanley (1999, chapter 7) for more details. The enumeration of asymmetric objects is the subject of an ongoing research program (cf. Labelle and Lamathe (2004)).

Solving Equations for Generating Functions: The Lagrange Inversion Formula and the Kernel Method

In this section, we describe two methods to solve functional equations for generating functions. The Lagrange inversion makes it possible (in some situations) to find explicit expressions for the coefficients of an implicitly given series. The kernel method (and its extensions), on the other hand, is a powerful method to obtain an explicit expression for an implicitly given function. We refer the reader to Flajolet and Sedgewick (section VII.5 of the reference in the “Further reading” section) for further reading.

In many situations it will happen that, when we apply the methods from the last section, we end up with a functional equation for the generating function $f(x) = \sum_{n=0}^{\infty} f_n x^n$ that we wanted to compute. For example, if $t_n$ denotes the number of labeled rooted trees with $n$ nodes, and if we write $T(x) = \sum_{n=1}^{\infty} t_n x^n / n!$, then, by applying a straightforward decomposition of a tree into its root and its set of subtrees attached to the root, we obtain the equation

$$T(x) = x \exp(T(x)) \qquad [25]$$

How does one solve such an equation? As a matter of fact, for $T(x)$, there is no expression in terms of known functions. However, the Lagrange inversion formula enables one to find the coefficients $t_n / n!$ of $T(x)$ explicitly. The theorem reads as follows.

Theorem Let $g(x)$ be a formal Laurent series containing only a finite number of negative powers of $x$, and let $f(x)$ be a formal power series without constant term. If we expand $g(x)$ in powers of $f(x)$,

$$g(x) = \sum_{k} c_k f^k(x) \qquad [26]$$

then the coefficients $c_n$ are given by

$$c_n = \frac{1}{n} [x^{-1}]\, g'(x) f^{-n}(x) \quad \text{for } n \neq 0 \qquad [27]$$

or, alternatively, by

$$c_n = [x^{-1}]\, g(x) f'(x) f^{-n-1}(x) \qquad [28]$$

Here, $[x^n] h(x)$ denotes the coefficient of $x^n$ in the power series $h(x)$.

With this theorem in hand, eqn [25] is easy to solve. We write it in the form

$$T(x) \exp(-T(x)) = x \qquad [29]$$

We want to know the coefficients in the expansion $T(x) = \sum_{n=0}^{\infty} t_n x^n / n!$. Since, by [29], $T(x)$ is the compositional inverse of $x \exp(-x)$, substitution of $x \exp(-x)$ instead of $x$ gives

$$x = \sum_{n=0}^{\infty} \frac{t_n}{n!} \bigl(x \exp(-x)\bigr)^n$$

This equation is in the form [26] with $f(x) = x \exp(-x)$ and $g(x) = x$. Hence, by [27], we obtain

$$\frac{t_n}{n!} = \frac{1}{n} [x^{-1}] \bigl(x \exp(-x)\bigr)^{-n} = \frac{1}{n} [x^{n-1}] \exp(nx) = \frac{n^{n-1}}{n!}$$

and, thus, $t_n = n^{n-1}$.
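The computation above can be checked mechanically for small $n$. The following is a minimal sketch, not part of the original text: it evaluates $t_n = n!\cdot\frac{1}{n}[x^{n-1}]\exp(nx)$ with exact rational arithmetic and compares the result with a brute-force count of labeled rooted trees. All function names are illustrative choices made for this sketch.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def exp_coefficient(c, k):
    """Coefficient of x^k in exp(c*x), namely c^k / k!, as an exact fraction."""
    return Fraction(c ** k, factorial(k))


def rooted_trees_lagrange(n):
    """t_n obtained from [27]: t_n / n! = (1/n) [x^(n-1)] exp(n x)."""
    return int(factorial(n) * Fraction(1, n) * exp_coefficient(n, n - 1))


def reaches_root(v, parent, root, n):
    """Follow parent pointers from v; in a tree the root is reached within n steps."""
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return v == root


def rooted_trees_brute_force(n):
    """Count labeled rooted trees on {0, ..., n-1} by testing every parent map."""
    count = 0
    for root in range(n):
        others = [v for v in range(n) if v != root]
        for parents in product(range(n), repeat=len(others)):
            parent = dict(zip(others, parents))
            # A parent map encodes a rooted tree iff every vertex leads to the root.
            if all(reaches_root(v, parent, root, n) for v in others):
                count += 1
    return count
```

Calling `rooted_trees_lagrange(n)` and `rooted_trees_brute_force(n)` for $n = 1, \dots, 5$ should give the same values; the brute-force count grows like $n^n$ and is only meant for very small $n$.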
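The necklace question posed earlier in connection with Redfield–Pólya theory can also be made concrete. For cycles colored with $|C|$ colors and counted up to rotation, the cycle index substitution is known to yield the classical closed form $\frac{1}{n}\sum_{d \mid n} \varphi(d)\,|C|^{n/d}$, where $\varphi$ is Euler's totient function. This closed form is a standard consequence and is not spelled out in the text above. The sketch below, with illustrative function names, compares it with a direct enumeration of rotation classes.

```python
from itertools import product
from math import gcd


def euler_phi(d):
    """Euler's totient: number of k in 1..d that are coprime to d."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)


def necklaces_formula(n, colors):
    """(1/n) * sum over divisors d of n of phi(d) * colors^(n/d)."""
    total = sum(
        euler_phi(d) * colors ** (n // d)
        for d in range(1, n + 1)
        if n % d == 0
    )
    return total // n


def necklaces_brute_force(n, colors):
    """Count colorings of an n-cycle up to rotation by canonical representatives."""
    seen = set()
    for word in product(range(colors), repeat=n):
        # The lexicographically smallest rotation represents the whole class.
        canonical = min(word[i:] + word[:i] for i in range(n))
        seen.add(canonical)
    return len(seen)
```

With two colors (black and white pearls), both functions can be compared for small $n$; reflections are not identified here, only rotations.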

The second method to solve functional equations which we explain in this section is the kernel method. We illustrate the method by an example. Let us consider the problem of counting Dyck paths of length $2n$ (see the section “Basic combinatorial terminology”). Rather than attempting to arrive at a solution of the problem directly, we consider the more general problem of counting the number $a_{n,k}$ of paths consisting of steps $(1, 1)$ and $(1, -1)$, which start at the origin, never drop below $y = 0$, have length $n$, and end at height $k$. We then form the bivariate generating function $F(u, x) = \sum_{n,k \ge 0} a_{n,k} x^n u^k$. We then have the functional equation

$$F(u, x) = 1 + xuF(u, x) + \frac{x}{u}\bigl(F(u, x) - F(0, x)\bigr) \qquad [30]$$

since a path can be empty (this explains the term 1), it can end by a step $(1, 1)$ (this explains the term $xuF(u, x)$), or it can end by a step $(1, -1)$. The latter can only happen if the path before that last step did not end at height 0. The generating function for these paths is $F(u, x) - F(0, x)$, and this explains the third term in eqn [30]. In fact, we may replace [30] by

$$F(u, x) = 1 + xuF(u, x) + \frac{x}{u}\bigl(F(u, x) - F_1(x)\bigr) \qquad [31]$$

because [31] implies that $F_1(x) = F(0, x)$.

The idea of the kernel method is to get rid of the unknown series $F(u, x)$. This is possible because $F(u, x)$ occurs linearly in [31], which can be rewritten as

$$F(u, x)\Bigl(1 - xu - \frac{x}{u}\Bigr) = 1 - \frac{x}{u} F_1(x) \qquad [32]$$

We simply equate the coefficient of $F(u, x)$ in this equation to zero,

$$1 - xu - \frac{x}{u} = 0$$

solve this for $u$,

$$u = \frac{1 - \sqrt{1 - 4x^2}}{2x}$$

(the other solution for $u$ makes no sense in [31]), and substitute this back in [32], to obtain

$$F_1(x) = \frac{1 - \sqrt{1 - 4x^2}}{2x^2}$$

the familiar generating function [2] for the Catalan numbers. Now, by substituting this result in [31], we can even compute the full series $F(u, x)$.

While this was certainly a complicated, and unusual, way to compute the Catalan numbers, this approach generalizes when one considers paths with different step sets (see section VII.5 of the Flajolet and Sedgewick reference in the “Further reading” section). In a more general situation, one has a functional equation

$$P\bigl(F(u, x), F_1(x), \dots, F_k(x), x, u\bigr) = 0 \qquad [33]$$

where $F(u, x)$ appears linearly, as well as the unknown series $F_1(x), \dots, F_k(x)$, whereas $x$ and $u$ appear rationally. It is clear that one can apply the same technique, namely collecting all the terms involving $F(u, x)$, equating the coefficient of $F(u, x)$ to zero, solving for $u$ and substituting back in [33]. If there is more than one function $F_i(x)$, then this will only give one equation for the $F_i(x)$. However, when equating the coefficient of $F(u, x)$ to zero, which is a polynomial equation, there can be more solutions. (That was actually also the case in our example, although only one solution could be used.) All these solutions can be substituted in [33] to give many more equations for the $F_i(x)$. The kernel method will work if we have enough equations to determine the unknown functions $F_i(x)$ (see the Flajolet and Sedgewick reference, section VII.5 for further details).

In the variant of the “obstinate kernel method,” more equations are produced in more sophisticated ways. The method has been largely extended by Bousquet-Mélou and co-workers to cover equations of the form [33], where $P$ is a polynomial such that eqn [33] determines all involved series uniquely. This extension covers in particular the so-called quadratic method due to Brown, which is of great significance in the work of Tutte on the enumeration of maps. We refer the reader to Bousquet-Mélou and Jehanne (2005) and the references given there for these extensions.

Extracting Asymptotic Information from Generating Functions

There is powerful machinery available to extract the asymptotic behavior of the coefficients of a power series out of analytic properties of the power series. We describe the corresponding methods, singularity analysis and the saddle point method, in this section. The survey by Odlyzko (1995) and the Flajolet and Sedgewick reference in “Further reading” are excellent sources for further reading, which, in particular, contain several other methods which we cannot cover here for reasons of limited space.

Let us suppose that we are interested in the asymptotic behavior of the sequence $(f_n)_{n \ge 0}$ of real (or complex) numbers as $n$ tends to infinity. Let us suppose that the power series $f(z) = \sum_{n=0}^{\infty} f_n z^n$ converges in some neighborhood of the origin. (If this series converges only at $z = 0$, then either one has to try to scale, that is, for example, look at the

power series $f(z) = \sum_{n=0}^{\infty} f_n z^n / n!$ instead, or one must apply methods other than singularity analysis or the saddle point method. In the latter case, depending on the nature of the coefficients $f_n$, this may be the Euler–Maclaurin or the Poisson summation formulas, the Mellin transform technique, or other direct methods. The reader is referred to Odlyzko (1995) and the Flajolet and Sedgewick reference.) The idea is then to consider $f(z)$ as a complex function in $z$ (and extend the range of $f$ beyond the disk of convergence about the origin), and to study the singularities of $f(z)$. (The point at infinity can also be a singularity.) The upshot is that the singularities of $f(z)$ with smallest modulus dictate the asymptotic behavior of the coefficients $f_n$. These singularities of smallest modulus are called the dominant singularities.

If there is an infinite number of dominant singularities, then one has to try the circle method. We refer the reader to Andrews (1976) and Ayoub (1963) for details of this method.

If there is a finite number of dominant singularities, then there can be again two different situations, depending on whether these are “small” or “large” singularities. Roughly speaking, a singularity is small if the function $f(z)$ grows at most polynomially when $z$ approaches the singularity, otherwise it is “large.” A typical example of a small singularity is $z = 1/4$ in $(1 - 4z)^{-1/2}$, whereas a typical example of a large singularity is $z = \infty$ in $\exp(z)$ or $z = 1$ in $\exp(1/(1 - z))$.

The method to apply for small singularities is the method of singularity analysis as developed by Flajolet and Odlyzko. (Singularity analysis implies Darboux’s method, which occurs frequently in the literature, and, thus, supersedes it.) For the sake of simplicity, we consider first only the case of a unique dominant singularity. We shall address the issue of several dominant singularities shortly. Furthermore, we assume the singularity to be $z = 1$, again for the sake of simplicity of presentation. The general result can then be obtained by rescaling $z$.

The basic idea is the transfer principle:

$$\text{If } f(z) = \sigma(z) + O(\tau(z)) \text{ as } z \to 1, \text{ then } f_n = \sigma_n + O(\tau_n) \text{ as } n \to \infty \qquad [34]$$

where $\sigma(z) = \sum_{n=0}^{\infty} \sigma_n z^n$ is a linear combination of standard functions of the form $(1 - z)^{-\alpha}$, or logarithmic variants, and $\tau(z) = \sum_{n=0}^{\infty} \tau_n z^n$ also lies in the scale (see sections VI.3,4 of the Flajolet and Sedgewick reference for the exact statement). The expansion for $f(z)$ in [34] is called the singular expansion of $f(z)$. For the above-mentioned standard functions, we have

$$[z^n]\,(1 - z)^{-\alpha}\Bigl(\frac{1}{z}\log\frac{1}{1 - z}\Bigr)^{\beta} \sim \frac{n^{\alpha - 1}}{\Gamma(\alpha)} (\log n)^{\beta} \Bigl(1 + \frac{C_1}{1!}\frac{\beta}{\log n} + \frac{C_2}{2!}\frac{\beta(\beta - 1)}{(\log n)^2} + \cdots\Bigr) \qquad [35]$$

where $[z^n] g(z)$ denotes the coefficient of $z^n$ in $g(z)$, and where

$$C_k = \Gamma(\alpha)\, \frac{d^k}{ds^k} \frac{1}{\Gamma(s)} \Big|_{s = \alpha}$$

If $\alpha$ is a nonpositive integer, then this expansion has to be taken with care (cf. section VI.2 of the Flajolet and Sedgewick reference).

To see how this works, consider the example $f_n = \sum_{k=0}^{n} \binom{2k}{k}$. We have

$$\sum_{n=0}^{\infty} f_n z^n = \frac{1}{(1 - z)\sqrt{1 - 4z}}$$

The function on the right-hand side extends analytically to the complex plane $\mathbb{C}$ slit along the real ray from $1/4$ to $\infty$; its singularities are at $z = 1$ and $z = 1/4$. The dominant singularity is $z = 1/4$. We determine the singular expansion of $f(z)$ about $z = 1/4$,

$$f(z) = \frac{4}{3}(1 - 4z)^{-1/2} - \frac{4}{9}(1 - 4z)^{1/2} + \frac{4}{27}(1 - 4z)^{3/2} + O\bigl((1 - 4z)^{5/2}\bigr)$$

(We stopped the expansion after three terms. The farther we go, the more terms of the asymptotic expansion for $f_n$ we can compute.) Hence, we obtain

$$f_n = 4^n \Bigl(\frac{4}{3}\frac{n^{-1/2}}{\Gamma(1/2)}\Bigl(1 - \frac{1}{8n} + \frac{1}{128n^2}\Bigr) - \frac{4}{9}\frac{n^{-3/2}}{\Gamma(-1/2)}\Bigl(1 + \frac{3}{8n}\Bigr) + \frac{4}{27}\frac{n^{-5/2}}{\Gamma(-3/2)} + O\bigl(n^{-7/2}\bigr)\Bigr)$$

$$\phantom{f_n} = \frac{4^n}{\sqrt{\pi n}} \Bigl(\frac{4}{3} + \frac{1}{18n} + \frac{59}{288n^2} + O\Bigl(\frac{1}{n^3}\Bigr)\Bigr)$$

If there are several small dominant singularities (but only a finite number of them), then one simply applies the above procedure for all of them and, to obtain the desired asymptotic expansion, one adds up the corresponding contributions.

The method to apply for large singularities is the saddle point method. For the following considerations, we assume that $f(z)$ is analytic in $|z| < R \le \infty$. At the heart of the saddle point method lies Cauchy’s formula

$$f_n = [z^n] f(z) = \frac{1}{2\pi i} \int_{C} \frac{f(z)}{z^{n+1}}\, dz \qquad [36]$$

for writing the $n$th coefficient in the power series expansion of $f(z)$. Here, $C$ is some simple closed contour around the origin that stays in the range $|z| < R$. The idea is to exploit the fact that we are free to deform the contour. The aim is to choose a contour such that the main contribution to the integral in [36] comes from a very tiny part of the contour, whereas the contribution of the rest is negligible. This will be possible if we put the contour through a saddle point of the integrand $f(z)/z^{n+1}$. Under suitable conditions, the main contribution will then come from the small passage of the path through the saddle point, and the contribution of the rest will be negligible.

In practice, the saddle point method is not always straightforward to apply, but has to be adapted to the specific properties of the function $f(z)$ that we are encountering. We refer the reader to the corresponding chapters in the Flajolet and Sedgewick reference and Odlyzko (1995) for more details. There is one important exception though, namely the Hayman admissible functions. We will not reproduce the definition of Hayman admissibility because it is cumbersome (cf. section VIII.5 in the Flajolet and Sedgewick reference and definition 12.4 of Odlyzko (1995)). However, in many applications, it is not even necessary to go back to it because of the closure properties of Hayman admissible functions. Namely, it is known (cf. Odlyzko (1995), theorem 12.8) that $\exp(p(z))$ is Hayman admissible in $|z| < \infty$ for any polynomial $p(z)$ with real coefficients as long as the coefficients $a_n$ of the Taylor series of $\exp(p(z))$ are positive for all sufficiently large $n$ (thus, e.g., $\exp(z)$ is Hayman admissible), and it is known that, if $f(z)$ and $g(z)$ are Hayman admissible in $|z| < R \le \infty$, then $\exp(f(z))$ and $f(z)g(z)$ are also (thus, e.g., $\exp(\exp(z) - 1)$ is Hayman admissible).

The central result of Hayman’s theory is the following: if $f(z) = \sum_{n \ge 0} f_n z^n$ is Hayman admissible in $|z| < R$, then

$$f_n \sim \frac{f(r_n)}{r_n^{\,n} \sqrt{2\pi b(r_n)}} \quad \text{as } n \to \infty \qquad [37]$$

where $r_n$ is the unique solution for large $n$ of the equation $a(r) = n$ in $(R_0, R)$, with $a(r) = r f'(r)/f(r)$, $b(r) = r a'(r)$, and a suitably chosen constant $R_0 > 0$.

This result covers only the first term in the asymptotic expansion. There is an even more sophisticated theory due to Harris and Schoenfeld, which allows one to also find a complete asymptotic expansion. We refer the reader to section VIII.5 of the Flajolet and Sedgewick reference and Odlyzko (1995) for more details.

Methods for the asymptotic analysis of multivariable generating functions are also available (see the corresponding chapters in Flajolet and Sedgewick, Odlyzko (1995) and the recent important development surveyed in the Pemantle and Wilson reference listed in “Further reading”). We add that both the method of singularity analysis and Hayman’s theory of admissible functions have been made largely automatic, and that this has been implemented in the Maple program gdev (see “Further reading”).

The Theory of Heaps

The theory of heaps, developed by Viennot, is a geometric rendering of the theory of the partial commutation monoid of Cartier and Foata, which is now most often called the Cartier–Foata monoid. Its importance stems from the fact that several objects which appear in statistical physics, such as Motzkin paths, animals, respectively polyominoes, or Lorentzian triangulations (see the Viennot and James reference in “Further reading” and the references therein), are in bijection with heaps.

Informally, a heap is what we would imagine. We take a collection of “pieces,” say $B_1, B_2, \dots$, and put them one upon the other, sometimes also sideways, to form a “heap,” see Figure 6.

Figure 6 A heap of pieces.

There, we imagine that pieces can only move vertically, so that the heap in Figure 6 would indeed form a stable arrangement. Note that we allow several copies of a piece to appear in a heap. (This means that they differ only by a vertical translation.) For example, in Figure 6 there appear two copies of $B_2$. Under these assumptions, there are pieces which can move past each other, and others which cannot. For example, in Figure 6, we can move the piece $B_6$ higher up, thus moving it higher than $B_1$ if we wish. However, we cannot move $B_7$ higher than $B_6$,

because $B_6$ blocks the way. On the other hand, we can move $B_7$ past $B_1$ (thus taking $B_6$ with us). Thus, a rigorous way to introduce heaps is by beginning with a set $B$ of pieces (in our example, $B = \{B_1, B_2, \dots, B_7\}$), and we declare which pieces can be moved past another and which cannot. We indicate this by a symmetric relation $R$: we write $aRb$ to indicate that $a$ cannot move past $b$ (and vice versa). When we consider a word $a_1 a_2 \dots a_n$ of pieces, $a_i \in B$, we think of it as putting first $a_1$, then putting $a_2$ on top of it (and, possibly, moving it past $a_1$), then putting $a_3$ on top of what we already have, etc. We declare two words to be equivalent if one arises from the other by commuting adjacent letters which are not in relation. A heap is then an equivalence class of words under this equivalence relation. What we have described just now is indeed the original definition of Cartier and Foata.

The class of heaps which occurs most frequently in applications is the class of heaps of monomers and dimers, which we now introduce. Let $B = M \cup D$, where $M = \{m_0, m_1, \dots\}$ is the set of monomers and $D = \{d_1, d_2, \dots\}$ is the set of dimers. We think of a monomer $m_i$ as a point, symbolized by a circle, with $x$-coordinate $i$, see Figure 7. We think of a dimer $d_i$ as two points, symbolized by circles, with $x$-coordinates $i - 1$ and $i$ which are connected by an edge, see Figure 7. We impose the relations $m_i R m_i$, $m_i R d_i$, $m_i R d_{i+1}$, $i = 0, 1, \dots$, $d_i R d_j$, $i - 1 \le j \le i$, and extend $R$ to a symmetric relation. Figure 8 shows two heaps of monomers and dimers.

Figure 7 Monomers and dimers.

Figure 8 Two heaps of monomers and dimers.

For example, Motzkin paths are in bijection with heaps of monomers and dimers. To see this, given a Motzkin path, we read the steps of the path from the beginning to the end. Whenever we read a level-step at height $h$, we make it into a monomer with $x$-coordinate $h$; whenever we read a down-step from height $h$ to height $h - 1$, we make it into a dimer whose endpoints have $x$-coordinates $h - 1$ and $h$. Up-steps are ignored. Figure 9 shows an example. In the figure, the heap is not in “standard” fashion, in the sense that the $x$-axis is not shown as a horizontal line but as a vertical line (cf. Figure 7). But it could be easily transformed into “standard” fashion by a simple reflection with respect to a line of slope 1.

Figure 9 Bijection between Motzkin paths and heaps of monomers and dimers.

Lattice animals on the triangular lattice and on the quadratic lattice are also in bijection with heaps, this time with heaps consisting entirely of dimers. Given an animal, one simply replaces each vertex of the animal by a dimer, see Figures 10 and 11. While in the case of animals on the triangular lattice this gives a constraintless bijection (see Figure 10), in the case of the quadratic lattice this sets up a bijection with heaps of dimers in which two dimers of the same type can never be placed directly one over the other (see Figure 11). For example, two dimers $d_5$, one placed directly over the other (as they occur in Figure 10), are forbidden under this rule.

Figure 10 Bijection between animals and heaps of dimers.

Figure 11 Bijection between animals and heaps of dimers.

Next we make heaps into a monoid by introducing a composition of heaps. (A monoid is a set with an associative binary operation and a unit element.) Intuitively, given two heaps $H_1$ and $H_2$, the composition of $H_1$ and $H_2$, the heap $H_1 \circ H_2$, is the heap which results

by putting $H_2$ on top of $H_1$. In terms of words, the composition of two heaps is the equivalence class of the concatenation $uw$, where $u$ is a word from the equivalence class of $H_1$, and $w$ is a word from the equivalence class of $H_2$. The composition of the two heaps in Figure 8 is shown in Figure 12.

Figure 12 The composition of the heaps in Figure 8.

Given pieces $B$ with relation $R$, let $H(B, R)$ be the set of all heaps consisting of pieces from $B$, including the empty heap, the latter denoted by $\emptyset$. It is easy to see that the composition makes $(H(B, R), \circ)$ into a monoid with unit $\emptyset$.

For the statement of the main theorem in the theory of heaps, we need two more terms. A trivial heap is a heap consisting of pieces all of which are pairwise unrelated. Figure 13a shows a trivial heap consisting of monomers and dimers. A pyramid is a heap with exactly one maximal (= topmost) element. Figure 13b shows a pyramid consisting of monomers and dimers. Finally, if $H$ is a heap, then we write $|H|$ for the number of pieces in $H$.

Figure 13 (a) A trivial heap. (b) A pyramid.

In applications, heaps will have weights, which are defined by introducing a weight $w(B)$ for each piece $B$ in $B$, and by extending the weight $w$ to all heaps $H$ by letting $w(H)$ denote the product of all weights of the pieces in $H$ (multiplicities of pieces included).

Let $M$ be a subset of the pieces $B$. Then, the generating function for all heaps with maximal pieces contained in $M$ is given by

$$\sum_{\substack{H \in H(B, R) \\ \text{maximal pieces} \subseteq M}} w(H) = \frac{\sum_{T \in T(B \setminus M, R)} (-1)^{|T|} w(T)}{\sum_{T \in T(B, R)} (-1)^{|T|} w(T)} \qquad [38]$$

where $T(B, R)$ denotes the set of all trivial heaps with pieces from $B$. In particular, the generating function for all heaps is given by

$$\sum_{H \in H(B, R)} w(H) = \frac{1}{\sum_{T \in T(B, R)} (-1)^{|T|} w(T)} \qquad [39]$$

Furthermore, if $P(B, R)$ denotes the set of all pyramids with pieces from $B$, then

$$\sum_{P \in P(B, R)} \frac{w(P)}{|P|} = \log\Bigl(\sum_{H \in H(B, R)} w(H)\Bigr) \qquad [40]$$

where $|P|$ is the number of pieces of $P$. (As the reader will have guessed, this is a consequence of the “exponential principle” mentioned in the section “generating functions.”)

The Transfer Matrix Method

The transfer matrix method (cf. Stanley (1986), chapter 4 for further reading) applies whenever we are able to build the combinatorial objects that we are interested in by moving on a finite number of states in a step-by-step fashion, where the current step does not depend on the previous ones. (In statistical language, we are considering a finite-state Markov chain.) For example, Motzkin paths which are constrained to stay between two parallel lines, say between $y = 0$ and $y = K$, can be described in such a way: the states are the heights $0, 1, \dots, K$, and, if we are in state $h$, then in the next step we are allowed to move to states $h + 1$, $h$, or $h - 1$, except that from state 0 we cannot move to $-1$ (there is no state $-1$), and when we are in state $K$ we cannot move to $K + 1$ (there is no state $K + 1$).

For describing the general situation, let $G = (V, E)$ be a directed graph with vertex set $V$ and edge set $E$. Let $w_n(u, v)$ denote the number of walks of length $n$ from vertex $u$ to vertex $v$ along edges of $G$. To compute these numbers, we consider the adjacency matrix of $G$, $A(G)$. By definition, using our notation, $A(G) = (w_1(u, v))_{u, v \in V}$. Obviously, $(w_n(u, v))_{u, v \in V} = (A(G))^n$. Thus,

$$\Bigl(\sum_{n=0}^{\infty} w_n(u, v)\, x^n\Bigr)_{u, v \in V} = \sum_{n=0}^{\infty} (A(G))^n x^n = (I_n - A(G)x)^{-1}$$

where $I_n$ is the identity matrix (of the same size as $A(G)$). In other words, the generating functions $\sum_{n=0}^{\infty} w_n(u, v) x^n$ for the walk numbers between $u$ and $v$ form the entries of a matrix which is the inverse matrix of $I_n - A(G)x$. By elementary linear algebra,

$$\sum_{n=0}^{\infty} w_n(u, v)\, x^n = \frac{(-1)^{\#u + \#v} \det(I_n - A(G)x)_{v, u}}{\det(I_n - A(G)x)} \qquad [41]$$

where $\det(I_n - A(G)x)_{v, u}$ is the minor of $I_n - A(G)x$ with the row indexed by $v$ and the column indexed

by $u$ omitted, and where $\#u$ denotes the row number of $u$ and similarly for $\#v$. A weighted version could also be developed in the same way, where we put a weight $w(e)$ on each edge, and the weight of a walk is the product of the weights of all its edges.

In particular, the expression [41] is a rational function in $x$. Then, by the basic theorem on rational generating functions (cf. Stanley (1986), section 4.1), the number $w_n(u, v)$ can be expressed as a sum $\sum_{i=1}^{d} P_i(n) \gamma_i^n$, where the $\gamma_i$’s are the different roots of the polynomial $\det(xI_n - A(G))$, and $P_i(n)$ is a polynomial of degree less than the multiplicity of the root $\gamma_i$. (The $P_i(n)$’s depend on $u$ and $v$, whereas the $\gamma_i$’s do not.) If there exists a unique root $\gamma_j$ with maximal modulus, then this implies that, asymptotically as $n \to \infty$, $w_n(u, v) \sim P_j(n) \gamma_j^n$.

Lattice Paths

Recall from the section on basic combinatorial terminology that a lattice path $P$ in $\mathbb{Z}^d$ is a path in the $d$-dimensional integer lattice $\mathbb{Z}^d$ which uses only points of the lattice, that is, it is a sequence $(P_0, P_1, \dots, P_l)$, where $P_i \in \mathbb{Z}^d$ for all $i$. The vectors $\overrightarrow{P_0 P_1}, \overrightarrow{P_1 P_2}, \dots, \overrightarrow{P_{l-1} P_l}$ are called the steps of $P$. The number of steps, $l$, is called the length of $P$.

The enumeration of lattice paths has always been an intensively studied topic in statistics, because of their importance in the study of random walks, of rank order statistics for nonparametric testing, and of queueing processes. The reader is referred to Feller (1957) and particularly Mohanty’s (1979) book, which is a rich source for enumerative results on lattice paths, albeit in a statistical language. We review the most important results in this section. Most of these concern two-dimensional lattice paths, that is, the case $d = 2$.

To begin with, we consider paths in the integer plane $\mathbb{Z}^2$ consisting of horizontal and vertical unit steps in the positive direction. Clearly, the number of all (unrestricted) paths from the origin to $(n, m)$ is the binomial coefficient $\binom{n+m}{n}$. By the reflection principle, which is commonly attributed to D. André (see, e.g., Comtet (1974) p. 22), it follows that the number of paths from the origin to $(n, m)$ which do not pass above the line $y = x + t$, where $m \le n + t$, is given by

$$\binom{n+m}{n} - \binom{n+m}{n+t+1} \qquad [42]$$

Roughly, the reflection principle sets up a bijection between the paths from the origin to $(n, m)$ which do pass above the line $y = x + t$ and all paths from $(-t - 1, t + 1)$ to $(n, m)$, by reflecting the path portion between the origin and the last touching point on $y = x + t + 1$ in this latter line. Thus, the result of the enumeration problem is the number of all paths from $(0, 0)$ to $(n, m)$, which is given by the binomial coefficient $\binom{n+m}{n}$, minus the number of all paths from $(-t - 1, t + 1)$ to $(n, m)$, which is given by the binomial coefficient $\binom{n+m}{n+t+1}$, whence the formula [42].

If one considers more generally paths bounded by the line $my = nx + t$, no compact formula is known. It seems that the most conceptual way to approach this problem is through the so-called kernel method (see the section on solving equations for generating functions), which, in combination with the saddle point method, allows one also to obtain strong asymptotic results. There is one special instance, however, which has a “nice” formula. The number of all lattice paths from the origin to $(n, m)$ which never pass above $x = \mu y$, where $\mu$ is a positive integer, is given by

$$\frac{n - \mu m + 1}{n + m + 1} \binom{n + m + 1}{m} \qquad [43]$$

The most elegant way to prove this formula is by means of the cycle lemma of Dvoretzky and Motzkin (see Mohanty (1979), p. 9 where the cycle lemma occurs under the name of “penetrating analysis”).

Iteration of the reflection principle shows that the number of paths from the origin to $(n, m)$ which stay between the lines $y = x + t$ and $y = x + s$ (being allowed to touch them), where $t \ge 0 \ge s$ and $n + t \ge m \ge n + s$, is given by the finite (!) sum (see, e.g., Mohanty (1979), p. 6)

$$\sum_{k \in \mathbb{Z}} \Bigl(\binom{n+m}{n - k(t - s + 2)} - \binom{n+m}{n - k(t - s + 2) + t + 1}\Bigr) \qquad [44]$$

The enumeration of lattice paths restricted to regions bounded by hyperplanes has also been considered for other regions, such as quadrants, octants, and rectangles, as well as in higher dimensions. A general result due to Gessel and Zeilberger, and Biane, independently, on the number of lattice paths in a chamber (alcove) of an (affine) reflection group (see Krattenthaler (2003) for the corresponding references and pointers to further results) shows how far one can go when one uses the reflection principle. In particular, this result covers [42] and [44], the enumeration of lattice paths in quadrants, octants, rectangles, and many other results that have

appeared (before and after) in the literature. We present a particularly elegant (and frequently occurring) special case. (In reflection group language, it corresponds to the reflection group of “type $A_{n-1}$.” See Humphreys (1990) for terminology and information on reflection groups.)

Let $A = (a_1, a_2, \dots, a_d)$ and $E = (e_1, e_2, \dots, e_d)$ be points in $\mathbb{Z}^d$ with $a_1 \ge a_2 \ge \dots \ge a_d$ and $e_1 \ge e_2 \ge \dots \ge e_d$. The number of all paths from $A$ to $E$ in the integer lattice $\mathbb{Z}^d$, which consist of positive unit steps and which stay in the region $x_1 \ge x_2 \ge \dots \ge x_d$, equals

$$\Bigl(\sum_{i=1}^{d} (e_i - a_i)\Bigr)!\ \det_{1 \le i, j \le d}\Bigl(\frac{1}{(e_i - a_j - i + j)!}\Bigr) \qquad [45]$$

The counting problem of the theorem is equivalent to numerous other counting problems. It was originally formulated as an $n$-candidate ballot problem, but it is as well equivalent to counting the number of standard Young tableaux of a given shape. In the case that all $a_j$’s are equal, the determinant does in fact evaluate into a closed-form product. In Young tableaux theory, a particular way to write the result is known as the hook-length formula (see, e.g., Stanley (1999), corollary 7.21.6).

We return to lattice paths in the plane, mentioning some more closely related results. The first is a result of Mohanty (1979, section 4.2), which expresses the number of all lattice paths from the origin to $(n, m)$ which touch the line $y = x + t$ exactly $r$ times, never crossing it, as the difference

$$\binom{n + m - r}{n + t - 1} - \binom{n + m - r}{n + t}, \quad r \ge 1 \qquad [46]$$

Not forbidding that the paths cross the bounding line, we arrive at the problem of counting the lattice paths from the origin to $(n, m)$, which cross the main diagonal $y = x$ exactly $r$ times, the answer being

$$\begin{cases} \dfrac{m - n + 2r + 1}{m + n + 1} \dbinom{m + n + 1}{n - r} & \text{if } m > n \\[2ex] \dfrac{2r + 2}{n} \dbinom{2n}{n - r - 1} & \text{if } m = n \end{cases} \qquad [47]$$

Next, we give the number of lattice paths from the origin to $(n, n)$ which have $2r$ steps on one side of the line $y = x$, as

$$\binom{2r}{r} \binom{2n - 2r}{n - r} \qquad [48]$$

a result due to Sparre Andersen. We refer the reader to Mohanty (1979, chapter 3) for further results in this direction.

Enumerating lattice paths with a fixed number of maximal straight pieces (which correspond to runs) is intimately connected to another basic enumeration problem concerning lattice paths: the enumeration of lattice paths having a fixed number of turns. An effective way to attack the latter problem is by means of two-rowed arrays (see the survey article by Krattenthaler (1997), where in particular analogs of the reflection principle for two-rowed arrays are developed). These imply formulas for the number of lattice paths with fixed starting points and endpoints and a fixed number of north–east (respectively east–north) turns, for unrestricted paths, as well as for paths bounded by lines. (A north–east turn in a lattice path is a point where the direction changes from “north” to “east.” An east–north turn is defined analogously.) In particular, analogs of [42]–[44] are known when the number of north–east (respectively east–north) turns is fixed.

These formulas imply for example (see again Krattenthaler (1997, section 3.5)) that the number of lattice paths from the origin to $(n, n)$ which never pass above the line $y = x + t$ and have exactly $2r$ maximal straight pieces is given by

$$2\binom{n - 1}{r - 1}^2 - \binom{n + t - 1}{r - 2}\binom{n - t - 1}{r} - \binom{n + t - 1}{r - 1}\binom{n - t - 1}{r - 1} \qquad [49]$$

with a similar result for the case of $2r + 1$ maximal straight pieces. (If $t = 0$, the numbers in [49] become

$$\frac{1}{n}\binom{n}{r}\binom{n}{r - 1}$$

and they are known as the Narayana numbers.) Furthermore, they imply that the number of lattice paths from the origin to $(n, n)$ which never pass above the line $y = x + t$ and never below the line $y = x - t$ and have exactly $2r$ maximal straight pieces is given by

$$\sum_{k = -\infty}^{\infty} \Bigl(2\binom{n - 2kt - 1}{r + k - 1}\binom{n + 2kt - 1}{r - k - 1} - \binom{n - 2kt + t - 1}{r + k - 2}\binom{n + 2kt - t - 1}{r - k} - \binom{n - 2kt + t - 1}{r + k - 1}\binom{n + 2kt - t - 1}{r - k - 1}\Bigr) \qquad [50]$$

with a similar result for the case of $2r + 1$ maximal straight pieces.

The most general boundary for lattice paths that one can imagine is the restriction that it stays
between two given (fixed) paths. Let us assume that the horizontal steps of the upper (fixed) path are at heights $a_1 \le a_2 \le \dots \le a_n$, whereas the horizontal steps of the lower (fixed) path are at heights $b_1 \le b_2 \le \dots \le b_n$, $a_i \ge b_i$, $i = 1, 2, \dots, n$. Then the number of all paths from $(0, b_1)$ to $(n, a_n)$ satisfying the property that for all $i = 1, 2, \dots, n$ the height of the $i$th horizontal step is between $b_i$ and $a_i$ is given by the determinant

\[
\det_{1 \le i,j \le n} \binom{a_i - b_j + 1}{j - i + 1} \tag{51}
\]

In the statistical literature, this formula is often known as "Steck's formula," but it is actually a special case of a much more general theorem due to Kreweras. A generalization of [51] to higher-dimensional paths was given by Handa and Mohanty (see Mohanty (1979, section 2.4)).

Next, we consider three-step lattice paths in the integer plane $\mathbb{Z}^2$, that is, paths consisting of up-steps $(1, 1)$, level steps $(1, 0)$, and down-steps $(1, -1)$. The particular problem that we are interested in is to count such three-step paths starting at $(0, r)$ and ending at $(\ell, s)$, which do not pass below the x-axis and do not pass above the horizontal line $y = K$. Furthermore, we assign the weight 1 to an up-step, the weight $b_h$ to a level-step at height $h$, and the weight $\lambda_h$ to a down-step from height $h$ to $h - 1$. The weight $w(P)$ of a path $P$ is defined as the product of the weights of all its steps. Then we have the following result, which can be obtained by the transfer matrix method described in the last section.

Define the sequence $(p_n(x))_{n \ge 0}$ of polynomials by

\[
x p_n(x) = p_{n+1}(x) + b_n p_n(x) + \lambda_n p_{n-1}(x) \qquad \text{for } n \ge 1 \tag{52}
\]

with initial conditions $p_0(x) = 1$ and $p_1(x) = x - b_0$. Furthermore, define $(S p_n(x))_{n \ge 0}$ to be the sequence of polynomials which arises from the sequence $(p_n(x))$ by replacing $\lambda_i$ by $\lambda_{i+1}$ and $b_i$ by $b_{i+1}$, $i = 0, 1, 2, \dots$, everywhere in the three-term recurrence [52] and in the initial conditions. Finally, given a polynomial $p(x)$ of degree $n$, we denote the corresponding reciprocal polynomial $x^n p(1/x)$ by $p^*(x)$.

With the weight $w$ defined as before, the generating function $\sum_P w(P) x^{\ell(P)}$, where the sum is over all three-step paths which start at $(0, r)$, terminate at height $s$, do not pass below the x-axis, and do not pass above the line $y = K$, is given by

\[
\begin{cases}
\dfrac{x^{s-r} \, p_r^*(x) \, S^{s+1} p_{K-s}^*(x)}{p_{K+1}^*(x)}, & r \le s \\[2ex]
\lambda_r \cdots \lambda_{s+1} \, \dfrac{x^{r-s} \, p_s^*(x) \, S^{r+1} p_{K-r}^*(x)}{p_{K+1}^*(x)}, & r \ge s
\end{cases} \tag{53}
\]

The sequence of polynomials $(p_n(x))_{n \ge 0}$ is in fact a sequence of orthogonal polynomials (cf. Koekoek and Swarttouw (1998) and Szegő (1959)).

We remark that in the case that $r = s = 0$ there is also an elegant expression for the generating function due to Flajolet (see section V.2 of the Flajolet and Sedgewick reference in "Further reading") in terms of a continued fraction.

In order to solve our problem, we just have to extract the coefficient of $x^\ell$ in [53]. By a partial fraction expansion, a formula of the type

\[
\sum_{m} c_m \rho_m^{\ell} \tag{54}
\]

results, where the $\rho_m$'s are the zeroes of $p_{K+1}(x)$, and the $c_m$'s are some coefficients, only a finite number of them being nonzero.

It should be noted that, because of the many available parameters (the $b_n$'s and $\lambda_n$'s), by appropriate specializations one can also obtain numerous results about enumerating three-step paths according to various statistics, such as the number of touchings on the bounding lines, etc.

There are two important special cases in which a completely explicit solution in terms of elementary functions can be given.

The first case occurs for $b_i = 0$ and $\lambda_i = 1$ for all $i$. In this case, the polynomials $p_n(x)$ defined by the three-term recurrence [52] are Chebyshev polynomials of the second kind, $p_n(x) = U_n(x/2)$. (The Chebyshev polynomial of the second kind $U_n(x)$ is defined by $U_n(\cos t) = \sin((n+1)t)/\sin t$ (see Koekoek and Swarttouw (1998) for almost exhaustive information on these polynomials and, more generally, on hypergeometric orthogonal polynomials)). The result which is then obtained from the general theorem (clearly, the zeros of $U_n(x)$ are $x = \cos(k\pi/(n+1))$, $k = 1, 2, \dots, n$, and therefore the partial fraction expansion of [53] is easily determined) is that the number of lattice paths from $(0, r)$ to $(\ell, s)$ with only up- and down-steps, which always stay between the x-axis and the line $y = K$, is given by (see also Feller (1957, chapter XIV, eqn [5.7]))

\[
\frac{2}{K+2} \sum_{k=1}^{K+1} \left( 2\cos\frac{\pi k}{K+2} \right)^{\ell} \sin\frac{\pi k (r+1)}{K+2} \, \sin\frac{\pi k (s+1)}{K+2} \tag{55}
\]

a formula which goes back to Lagrange.

The second case occurs for $b_i = 1$ and $\lambda_i = 1$ for all $i$. In this case, the polynomials $p_n(x)$ defined by the three-term recurrence [52] are again Chebyshev polynomials of the second kind, $p_n(x) = U_n((x-1)/2)$. The result which is then
obtained from the general theorem is that the number of three-step lattice paths from $(0, r)$ to $(\ell, s)$, which always stay between the x-axis and the line $y = K$, is given by

\[
\frac{2}{K+2} \sum_{k=1}^{K+1} \left( 2\cos\frac{\pi k}{K+2} + 1 \right)^{\ell} \sin\frac{\pi k (r+1)}{K+2} \, \sin\frac{\pi k (s+1)}{K+2} \tag{56}
\]

Perfect Matchings and Tilings

In this section we consider the problem of counting the perfect matchings of a graph. For an introduction to the problem and to methods for solving it, as well as for a report on recent developments, we refer the reader to Propp (1999).

Let $G = (V, E)$ be a finite loopless graph with vertex set $V$ and edge set $E$. A matching (also called 1-factor in graph theory) is a subset of the edges with the property that no two edges share a vertex. A matching is perfect if it covers all the vertices. Let $M(G)$ denote the number of perfect matchings of the graph $G$. More generally, we could assign a weight $w(e)$ to each edge $e$ of the graph and define the weight of a matching to be the product of the weights of all its edges. Let $M_w(G)$ denote the sum of the weights of all perfect matchings of the graph $G$.

Kasteleyn's method for determining $M(G)$, respectively $M_w(G)$, makes use of determinants and Pfaffians. Recall that the Pfaffian $\operatorname{Pf}(A)$ of a triangular array $A = (a_{i,j})_{1 \le i < j \le 2n}$ is defined by

\[
\operatorname{Pf}(A) = \sum_{m} (\operatorname{sgn} m) \prod_{\{i,j\} \in m} a_{i,j} \tag{57}
\]

where the sum is over all perfect matchings of the complete graph on vertices $\{1, 2, \dots, 2n\}$, and where the product is over all edges $\{i, j\}$, $i < j$, of $m$. The sign $\operatorname{sgn} m$ of $m$ is $(-1)^{\#\text{crossings of } m}$, where a crossing is a pair $(\{i, j\}, \{k, l\})$ of edges such that $i < k < j < l$. Usually, one extends the triangular array $A$ to a matrix by setting $a_{j,i} = -a_{i,j}$, $i < j$, and $a_{i,i} = 0$ for all $i$. Then, abusing notation, we identify the triangular array with the skew-symmetric matrix $A = (a_{i,j})_{1 \le i,j \le 2n}$. The Pfaffian satisfies the following useful properties:

\[
\operatorname{Pf}(B^t A B) = \det(B) \operatorname{Pf}(A)
\]

and

\[
\operatorname{Pf}(A)^2 = \det(A) \tag{58}
\]

The latter equality shows in particular that Pfaffians are very close to determinants. They do, in fact, generalize determinants since

\[
\operatorname{Pf} \begin{pmatrix} 0 & B \\ -B^t & 0 \end{pmatrix} = (-1)^{n(n-1)/2} \det B \tag{59}
\]

for any $n \times n$ matrix $B$.

Thus, given a graph with vertices $v_1, v_2, \dots, v_{2n}$, specializing $a_{i,j}$ to the weight of the edge between $v_i$ and $v_j$, if it exists, and setting $a_{i,j} = 0$ otherwise in the definition of the Pfaffian, we obtain almost $M_w(G)$; the only difference is that there could be signs in front of the individual terms of the sum, whereas in $M_w(G)$ the sign in front of each term must be $+$. (The object obtained by omitting the sign in [57] is called Hafnian. Unfortunately, in contrast to the Pfaffian, it does not have any nice properties and it is therefore extremely difficult to compute.) Kasteleyn's idea is to circumvent this problem by orienting the edges of the graph, defining signed weights of the edges, in such a way that the Pfaffian of the array with signed weights produces exactly $M_w(G)$.

More precisely, given a (weighted) graph $G$ with vertices $v_1, v_2, \dots, v_{2n}$, we make it into an oriented (weighted) graph $\vec{G}$. That is, if there is an edge between $v_i$ and $v_j$, $e_{i,j}$ say, we orient it either from $v_i$ to $v_j$ or the other way. Now we define the signed adjacency matrix $A(\vec{G})$ of $\vec{G}$ by letting its $(i, j)$-entry be $+w(e_{i,j})$ if there is an edge from $v_i$ to $v_j$ oriented that way, $-w(e_{i,j})$ if there is an edge from $v_j$ to $v_i$ oriented that way, and 0 if there is no edge between $v_i$ and $v_j$. Such an orientation is called Pfaffian if

\[
\operatorname{Pf}(A(\vec{G})) = M_w(G)
\]

Clearly, the question remains whether a Pfaffian orientation can be found for a given graph. In general, this is an open question. However, Kasteleyn shows that for planar graphs such a Pfaffian orientation can always be found. Moreover, he shows that any orientation of a planar graph which has the property that around any face bounded by $4k$ edges an odd number of edges is oriented in either direction and that around any face bounded by $4k + 2$ edges an even number of edges is oriented in either direction is Pfaffian.

For bipartite graphs (i.e., for graphs in which the set of vertices can be split into two disjoint sets such that every edge connects a vertex of one of these sets to a vertex of the other), the situation is even nicer. This is because for a bipartite graph $G$ in which both parts of the bipartition of the vertices are of the same size (otherwise, there is no perfect matching), any signed
adjacency matrix $A(\vec{G})$ has the block form of the matrix on the left-hand side of [59] and, hence, the Pfaffian reduces to a determinant. More precisely, let $G$ be a bipartite graph with vertex set $V = U \cup W$, $U = \{u_1, u_2, \dots, u_n\}$ and $W = \{w_1, w_2, \dots, w_n\}$, with edges connecting some $u_i$ to some $w_j$. Given a Pfaffian orientation $\vec{G}$, we build the signed bipartite adjacency matrix $B(\vec{G}) = (b_{i,j})_{1 \le i,j \le n}$ of $\vec{G}$ by setting $b_{i,j} = +w(e_{i,j})$ if there is an edge from $u_i$ to $w_j$ oriented that way, $-w(e_{i,j})$ if there is an edge from $w_j$ to $u_i$ oriented that way, and 0 if there is no edge between $u_i$ and $w_j$. Then we have

\[
\det(B(\vec{G})) = \pm M_w(G)
\]

In particular, this holds for any bipartite planar graph. See Robertson et al. (1999) for a structural description about which (not necessarily planar) bipartite graphs admit a Pfaffian orientation.

Kasteleyn's construction in the planar case has been generalized to graphs on surfaces of any genus $g$ in Dolbilin et al. (1996), Galluccio and Loebl (1999), and Tesler (2000), independently. As predicted by Kasteleyn, the solution is in terms of a linear combination of $4^g$ Pfaffians.

With the help of his method, Kasteleyn computed the number of dimer coverings of an $m \times n$ rectangle. (A dimer is a $2 \times 1$ rectangle. Thus, this is equivalent to counting the number of perfect matchings on the $m \times n$ grid graph. The formula was independently found by Temperley and Fisher.) The result is

\[
\prod_{i=1}^{m} \prod_{j=1}^{n} \left| 2\cos\frac{\pi i}{m+1} + 2\sqrt{-1}\,\cos\frac{\pi j}{n+1} \right|^{1/2}
\]

For even $m$ and $n$, the formula can be rewritten as

\[
\prod_{i=1}^{m/2} \prod_{j=1}^{n/2} \left( 4\cos^2\frac{\pi i}{m+1} + 4\cos^2\frac{\pi j}{n+1} \right)
\]

There is a similar rewriting if one of $m$ or $n$ is odd. (If both $m$ and $n$ are odd, there is no dimer covering.)

For further reading and references see Dimer Problems and Kuperberg (1998).

Nonintersecting Paths

Let $G = (V, E)$ be a directed acyclic graph with vertices $V$ and directed edges $E$. Furthermore, we are given a function $w$ which assigns a weight $w(x)$ to every vertex or edge $x$. Let us define the weight $w(P)$ of a walk $P$ in the graph by $\prod_e w(e) \prod_v w(v)$, where the first product is over all edges $e$ of the walk $P$ and the second product is over all vertices $v$ of $P$. We denote the set of all walks in $G$ from $u$ to $v$ by $\mathcal{P}(u \to v)$, and the set of all families $(P_1, P_2, \dots, P_n)$ of walks, where $P_i$ runs from $u_i$ to $v_i$, $i = 1, 2, \dots, n$, by $\mathcal{P}(\mathbf{u} \to \mathbf{v})$, with $\mathbf{u} = (u_1, u_2, \dots, u_n)$ and $\mathbf{v} = (v_1, v_2, \dots, v_n)$. The symbol $\mathcal{P}^+(\mathbf{u} \to \mathbf{v})$ stands for the set of all families $(P_1, P_2, \dots, P_n)$ in $\mathcal{P}(\mathbf{u} \to \mathbf{v})$ with the additional property that no two walks share a vertex. We call such families of walk(er)s "vicious walkers" or, alternatively, "nonintersecting paths." The weight $w(\mathbf{P})$ of a family $\mathbf{P} = (P_1, P_2, \dots, P_n)$ of walks is defined as the product $\prod_{i=1}^{n} w(P_i)$ of all the weights of the walks in the family. Finally, given a set $M$ with weight function $w$, we write $GF(M; w)$ for the generating function $\sum_{x \in M} w(x)$.

We need two further notations before we are able to state the Lindström–Gessel–Viennot theorem. (For references and historical remarks, we refer the reader to footnote 5 in Krattenthaler (2005a).) As earlier, the symbol $S_n$ denotes the symmetric group of order $n$. Given a permutation $\sigma \in S_n$, we write $\mathbf{u}_\sigma$ for $(u_{\sigma(1)}, u_{\sigma(2)}, \dots, u_{\sigma(n)})$. Then

\[
\sum_{\sigma \in S_n} (\operatorname{sgn} \sigma) \cdot GF(\mathcal{P}^+(\mathbf{u}_\sigma \to \mathbf{v}); w) = \det_{1 \le i,j \le n} \big( GF(\mathcal{P}(u_j \to v_i); w) \big) \tag{60}
\]

Most often, this theorem is applied in the case where the only permutation $\sigma$ for which vicious walks exist is the identity permutation, so that the sum on the left-hand side reduces to a single term that counts all families $(P_1, P_2, \dots, P_n)$ of vicious walks, the $i$th walk $P_i$ running from $u_i$ to $v_i$, $i = 1, 2, \dots, n$. This case occurs, for example, if for any pair of walks $(P, Q)$ with $P$ running from $u_a$ to $v_d$ and $Q$ running from $u_b$ to $v_c$, $a < b$ and $c < d$, it is true that $P$ and $Q$ must have a common vertex. Explicitly, in that case we have

\[
GF(\mathcal{P}^+(\mathbf{u} \to \mathbf{v}); w) = \det_{1 \le i,j \le n} \big( GF(\mathcal{P}(u_j \to v_i); w) \big) \tag{61}
\]

If the starting points or/and the endpoints are not fixed, then the corresponding number is given by a Pfaffian, a result obtained by Okada and Stembridge (see Bressoud (1999) for references). For a set $A$ of starting points, let $\mathcal{P}^+(A \to \mathbf{v})$ denote the set of all families $(P_1, P_2, \dots, P_{2n})$ of nonintersecting lattice paths, where $P_i$ runs from some point of $A$ to $v_i$, $i = 1, 2, \dots, 2n$. Furthermore, let us suppose that the elements of $A = \{u_1, u_2, \dots\}$ are ordered in such a way that for any pair of walks $(P, Q)$ with $P$ running from $u_a$ to $v_d$ and $Q$ running from $u_b$ to $v_c$, $a < b$ and $c < d$, it is true that $P$ and $Q$ must have a common vertex. (This is the same condition as the one which makes [61] valid, with the only difference that, here,
the number of $u_i$'s could be larger than the number of $v_i$'s.) Then,

\[
GF(\mathcal{P}^+(A \to \mathbf{v}); w) = \operatorname{Pf}_{1 \le i,j \le 2n} \left( \sum_{a<b} \Big( GF(\mathcal{P}(u_a \to v_i); w)\, GF(\mathcal{P}(u_b \to v_j); w) - GF(\mathcal{P}(u_b \to v_i); w)\, GF(\mathcal{P}(u_a \to v_j); w) \Big) \right) \tag{62}
\]

If the number of paths is odd, then one can use the same formula by adding an artificial point to the endpoints and to the set of starting points $A$. There is also a theorem by Okada and Stembridge which covers the case that starting points and endpoints vary. Refinements when the number of turns is fixed can be found in Krattenthaler (1997).

Vicious Walkers, Plane Partitions, Rhombus Tilings, and Fully Packed Loop Configurations

In this section we describe the interrelations between four frequently appearing objects in statistical mechanics and combinatorics: vicious walkers, plane partitions, rhombus tilings, and fully packed loop configurations.

Given a lattice, vicious walkers, as introduced by Fisher (1984), are particles which move on lattice sites in such a way that two particles never occupy the same lattice site. Models of vicious walkers have been the object of numerous studies from various points of view. Rather than attempt the impossible task of providing a complete overview of references, we refer the reader to the basic reference Fisher (1984) and to Krattenthaler (2005a) for further pointers to the literature.

Most of the known results apply for vicious walkers on the line. There are in fact two different models: in the random turns vicious walker model, $n$ walkers move on the integral points of the real line in such a way that at each tick of the clock exactly one walker moves to the right or to the left, whereas in the lock step vicious walker model $n$ walkers move on the integral points of the real line in such a way that at each tick of the clock each walker moves to the right or to the left.

The first model is equivalent to a model of one walker in $\mathbb{Z}^n$ ($\mathbb{Z}$ denoting the set of integers) which at each tick of the clock moves a positive or negative unit step in the direction of one of the coordinate axes, always staying in the wedge $x_1 > x_2 > \dots > x_n$. This point of view was already put forward by Fisher (1984). However, this problem belongs to the problem of counting paths in chambers of reflection groups discussed in the section "Lattice paths."

The second model could also be realized as a single walker model (cf. Krattenthaler (2003)). However, most often it is realized as a model of $n$ paths in the plane consisting of steps $(1, 1)$ and $(1, -1)$ with the property that no two paths have a point in common. In this picture, the x-axis becomes the time line: an up-step $(1, 1)$ of the $k$th path from $(t - 1, y)$ to $(t, y + 1)$ means that the $k$th particle moves to the left at time $t$, whereas a down-step $(1, -1)$ of the $k$th path from $(t - 1, y)$ to $(t, y - 1)$ means that the $k$th particle moves to the right at time $t$.

The reader should consult Figure 14a for an example. (The labelings should be ignored at this point.) Clearly, what we encounter here is a particular instance of the nonintersecting paths of the last section. Therefore, for fixed starting points and endpoints, formula [61] applies, whereas if the starting points vary and the endpoints are fixed, it is formula [62] that applies.

At this point, the links to the other objects, semistandard tableaux and plane partitions (cf. Bressoud (1999)), emerge. A filling of the cells of the Ferrers diagram of $\lambda$ with elements of the set $\{1, 2, \dots\}$, which is weakly increasing along rows and strictly increasing along columns is called a (semistandard) tableau of shape $\lambda$. Figure 14b shows such a semistandard tableau of shape $(4, 3, 2)$. In fact, vicious walkers and semistandard tableaux are equivalent objects. To see this, first label down-steps by the x-coordinate of their endpoint, so that a step from $(a - 1, b)$ to $(a, b - 1)$ is labeled by $a$, see Figure 14a. Then, out of the labels of the $j$th path, form the $j$th column of the corresponding tableau.

Figure 14 (a) Vicious walkers, running from starting points A1, ..., A4 to endpoints E1, ..., E4, with labeled down-steps. (b) A tableau, with rows 2 3 4 6 / 4 4 6 / 5 6.
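
The labeling rule just described can be tried out on the tableau of Figure 14b. The following is a minimal sketch, not taken from the references: it assumes that the $j$th walker starts at height $2j$ (the convention $(0, 2i)$ used later in this section) and that there are 6 time steps (the largest label in the tableau); all function names are illustrative.

```python
def columns_of(rows):
    """Columns of a tableau given by its rows (rows of weakly decreasing length)."""
    width = len(rows[0]) if rows else 0
    return [[row[j] for row in rows if j < len(row)] for j in range(width)]


def walkers_from_columns(columns, n_steps):
    """Walker j (j = 1, 2, ...) starts at height 2j (an assumption) and takes a
    down-step exactly at the times listed in column j, an up-step otherwise."""
    paths = []
    for j, column in enumerate(columns, start=1):
        heights = [2 * j]
        for t in range(1, n_steps + 1):
            heights.append(heights[-1] - 1 if t in column else heights[-1] + 1)
        paths.append(heights)
    return paths


def columns_from_walkers(paths):
    """Label every down-step by the x-coordinate of its endpoint;
    the labels of the jth path form the jth column."""
    return [[t for t in range(1, len(h)) if h[t] == h[t - 1] - 1] for h in paths]


def are_vicious(paths):
    """True if no two walkers ever occupy the same site at the same time."""
    return all(len(set(sites)) == len(paths) for sites in zip(*paths))


def is_semistandard(rows):
    """Rows weakly increasing, columns strictly increasing."""
    rows_weak = all(a <= b for row in rows for a, b in zip(row, row[1:]))
    cols_strict = all(a < b for col in columns_of(rows) for a, b in zip(col, col[1:]))
    return rows_weak and cols_strict


figure_14b = [[2, 3, 4, 6], [4, 4, 6], [5, 6]]  # rows of the tableau in Figure 14b
paths = walkers_from_columns(columns_of(figure_14b), n_steps=6)
print(are_vicious(paths))                                     # True
print(columns_from_walkers(paths) == columns_of(figure_14b))  # True
```

Under these assumptions, a filling whose rows fail to be weakly increasing produces walkers that collide, which is the correspondence between the row condition and the nonintersection condition.
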
See Figure 14b. The resulting array of numbers is indeed a semistandard tableau. This can be readily seen, since the entries are trivially strictly increasing along columns, and they are weakly increasing along rows because the paths do not touch each other. Thus, problems of enumerating vicious walkers can be translated into tableau enumeration problems, and vice versa.

The significance of semistandard tableaux lies particularly in the representation theory for classical groups, see Classical Groups and Homogeneous Spaces and Compact Groups and Their Representations. Namely, the irreducible characters for $GL(n, \mathbb{C})$ and $SL(n, \mathbb{C})$, the Schur functions, are generating functions for semistandard tableaux of a given shape. If the entries of the $i$th row of a semistandard tableau are required to be at least $2i - 1$, then one speaks of symplectic tableaux, and the irreducible characters for $Sp(2n, \mathbb{C})$ are generating functions for symplectic tableaux of a given shape. We refer the reader to Krattenthaler et al. (2000) for more information on these topics.

Objects which are very close to semistandard tableaux are plane partitions. According to MacMahon, a plane partition of shape $\lambda$ is a filling of the Ferrers diagram of $\lambda$ with non-negative integers which is weakly decreasing along rows and columns. See Figure 15b for an example of a plane partition of shape $(3, 3, 3)$. In particular, semistandard tableaux and plane partitions of rectangular shape are actually equivalent. For, let $T$ be a semistandard tableau of rectangular shape. Then, from each element of the $i$th row we subtract $i$. Finally, the obtained array is rotated by 180°. As a result, we obtain a plane partition. See Figure 15 for a semistandard tableau and a plane partition which correspond to each other under these transformations.

On the other hand, plane partitions can also be realized as three-dimensional objects, by interpreting each entry in the array as a pile of unit cubes of the size of the entry. For example, the plane partition in Figure 15 corresponds to the pile of cubes in Figure 16a. But then, forgetting the three-dimensional view, by embedding the picture in a minimally bounding hexagon, and by filling the emerging empty regions by rhombi of unit length in the unique way this is possible, we obtain a rhombus tiling of a hexagon in which opposite sides have the same length, see Figure 16b.

Figure 16 (a) A plane partition; three-dimensional view. (b) A rhombus tiling.

From the rhombus tiling, there is then again an elegant way to go to nonintersecting paths: we mark the mid-points of the edges along two opposite sides, see Figure 17a. Now we draw lattice paths which connect points on different sides, by "following" along the other lozenges, as indicated in Figure 17a by the dashed lines. Clearly, the resulting paths are nonintersecting, that is, no two paths have a common vertex. If we slightly distort the underlying lattice, we get orthogonal paths with horizontal and vertical steps in the positive direction, see Figure 17b.

Rhombus tilings, on their part, are equivalent to perfect matchings of hexagonal graphs. To see this, one places the tiling on the underlying triangular grid, see Figure 18a. Then one places a bond into each rhombus, so that it connects the mid-points of the two triangles out of which the rhombus is composed, see Figure 18b. Finally, one forgets the contour of the tiling, but instead one introduces all the other edges which connect mid-points of adjacent triangles of the triangular grid, see Figure 18c. Thus, one arrives at a perfect matching of the hexagonal graph consisting of the edges connecting mid-points of triangles.

Because of these various connections, enumeration problems for vicious walkers, plane partitions, tableaux, rhombus tilings can be approached by the different methods which are available for the various objects: the determinant theorem from the section "Nonintersecting paths," together with determinant evaluation techniques (cf. the survey Krattenthaler (2005b)), apply, as well as the "Kasteleyn method" from the section "Perfect

Figure 15 (a) A semistandard tableau, with rows 1 1 2 / 3 3 3 / 4 5 5. (b) A plane partition, with rows 2 2 1 / 1 1 1 / 1 0 0.

Figure 17 (a) A rhombus tiling. (b) A family of nonintersecting paths.
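
The passage from a rectangular semistandard tableau to a plane partition (subtract $i$ from the $i$th row, then rotate by 180°) can be checked on the pair shown in Figure 15. The sketch below is only an illustration; the function names are assumptions, and the code covers rectangular shapes only.

```python
def tableau_to_plane_partition(tableau):
    """Subtract i from every entry of row i (rows numbered from 1),
    then rotate the array by 180 degrees."""
    shifted = [[entry - i for entry in row] for i, row in enumerate(tableau, start=1)]
    return [row[::-1] for row in reversed(shifted)]


def is_plane_partition(array):
    """Non-negative entries, weakly decreasing along rows and along columns."""
    rows_ok = all(a >= b for row in array for a, b in zip(row, row[1:]))
    cols_ok = all(a >= b for upper, lower in zip(array, array[1:])
                  for a, b in zip(upper, lower))
    return rows_ok and cols_ok and all(x >= 0 for row in array for x in row)


figure_15a = [[1, 1, 2], [3, 3, 3], [4, 5, 5]]   # the semistandard tableau of Figure 15a
figure_15b = tableau_to_plane_partition(figure_15a)
print(figure_15b)                      # [[2, 2, 1], [1, 1, 1], [1, 0, 0]]
print(is_plane_partition(figure_15b))  # True
```

The output agrees with the plane partition of shape $(3, 3, 3)$ in Figure 15b.
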
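
The determinant theorem from the section "Nonintersecting paths" can likewise be tested on small instances of the lock step vicious walker model. The sketch below, whose function names and sample configurations are assumptions made for illustration, counts families of walkers with steps $(1, 1)$ and $(1, -1)$ once by the determinant [61] and once by exhaustive enumeration. It assumes that all starting heights have the same parity, so that two walkers cannot change order without meeting.

```python
from fractions import Fraction
from itertools import product
from math import comb


def single_path_count(n_steps, start, end):
    """Number of paths with steps (1, 1) and (1, -1) from (0, start) to (n_steps, end)."""
    diff = end - start
    if (n_steps + diff) % 2 != 0 or abs(diff) > n_steps:
        return 0
    return comb(n_steps, (n_steps + diff) // 2)


def determinant(matrix):
    """Exact determinant of an integer matrix by elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


def count_by_determinant(n_steps, starts, ends):
    """Determinant of the matrix whose (i, j)-entry counts paths from start j to end i."""
    return determinant([[single_path_count(n_steps, s, e) for s in starts] for e in ends])


def count_by_brute_force(n_steps, starts, ends):
    """Try all step sequences; keep families in which no site is ever shared."""
    count = 0
    per_walker = list(product((1, -1), repeat=n_steps))
    for steps in product(per_walker, repeat=len(starts)):
        positions = list(starts)
        vicious = True
        for t in range(n_steps):
            positions = [p + steps[k][t] for k, p in enumerate(positions)]
            if len(set(positions)) < len(positions):
                vicious = False
                break
        if vicious and positions == list(ends):
            count += 1
    return count


print(count_by_determinant(2, (2, 4), (2, 4)))   # 3
print(count_by_brute_force(2, (2, 4), (2, 4)))   # 3
```

The brute-force count grows exponentially in the number of steps and walkers and serves only as a cross-check for tiny cases.
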

Figure 18 (a) A rhombus tiling. (b) Bonds in rhombi. (c) A perfect matching of a hexagonal graph.

matchings and tilings," and also methods from character theory for the classical groups. All of these methods have been applied extensively (see the surveys by Kenyon (2003), Propp (1999), and Krattenthaler et al. (2000)), the first and third more frequently for exact enumeration, while the second particularly for asymptotic studies. It should be noted that methods from random matrix theory also apply in certain situations, see Johansson (2002). See Growth Processes in Random Matrix Theory and Random Matrix Theory in Physics.

In fact, we missed mentioning a further object, from statistical physics, which in some cases is equivalent to vicious walkers, etc.: fully packed loop configurations. (Fully packed loop configurations are in bijection with six-vertex configurations, see the next section.) If one imposes certain "connectivity constraints" on fully packed loop configurations, then one can construct bijections with rhombus tilings and, hence, with nonintersecting paths and with the other objects discussed in this section. The reader is referred to Di Francesco et al. (2004) and references therein.

Having explained the various connections, we cite some fundamental results in the area. (We refer the reader to Bressoud (1999) and Stanley (1999, chapter 7).) MacMahon proved that the number of all plane partitions contained in an $a \times b \times c$ box (when viewed in three dimensions) is equal to

\[
\prod_{i=1}^{a} \prod_{j=1}^{b} \prod_{k=1}^{c} \frac{i+j+k-1}{i+j+k-2} \tag{63}
\]

Thus, the number of rhombus tilings of a hexagon with side lengths $a, b, c, a, b, c$ is given by the same number, as well as the number of all vicious walkers $(P_1, P_2, \dots, P_a)$, where $P_i$ runs from $(0, 2i)$ to $(b + c, b - c + 2i)$, $i = 1, 2, \dots, a$. More generally, the number of semistandard tableaux of shape $\lambda$ with entries at most $m$ is given by the hook-content formula

\[
\prod_{u \in \lambda} \frac{c(u) + m}{h(u)} \tag{64}
\]

where $u$ ranges over all the cells of the Ferrers diagram of $\lambda$, with $c(u)$ being the content of $u$, defined as the difference of the column number and the row number of $u$, and with $h(u)$ being the hook length of $u$, defined as the number of cells in the hook of $u$, the latter consisting of the cells to the right of $u$ in the same row and below $u$ in the same column, including $u$. Thus, this also gives a formula for the number of all vicious walkers $(P_1, P_2, \dots, P_a)$, where $P_i$ runs from $(0, 2i)$ to $(N, h_i)$. See Krattenthaler et al. (2000, section 2) for details. There it is also explained that a Schur function summation formula, together with an analog of the hook-content formula for special orthogonal characters, proves that the number of all vicious walkers $(P_1, P_2, \dots, P_a)$, where $P_i$ runs from $(0, 2i)$ for $N$ steps is given by

\[
\prod_{1 \le i \le j \le N} \frac{a+i+j-1}{i+j-1} \tag{65}
\]

The reader is referred to the references given in this section for many more results, in particular, on the enumeration of plane partitions with symmetry, the enumeration of rhombus tilings of regions other than hexagons, and the enumeration of vicious walkers with various starting points and endpoints, under various constraints.

Six-Vertex Model and Alternating-Sign Matrices

An alternating-sign matrix is a square matrix of 0's, 1's and $-1$'s for which the sum of entries in each row and in each column is 1 and the nonzero entries of each row and of each column alternate in sign. For instance,

\[
\begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & -1 & 1 & 0 \\
0 & 1 & 0 & -1 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]
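
The two defining conditions are easy to test mechanically. The sketch below (function names are illustrative assumptions) checks them for the $5 \times 5$ example and counts all $n \times n$ alternating-sign matrices for $n \le 3$ by exhaustive search, comparing the counts with the product $\prod_{i=0}^{n-1} (3i+1)!/(n+i)!$ that the text gives as formula [66].

```python
from itertools import product
from math import factorial, prod


def is_alternating_sign_matrix(matrix):
    """Entries in {-1, 0, 1}; every row and column sums to 1 and its
    nonzero entries alternate in sign."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    lines = [list(row) for row in matrix] + [list(col) for col in zip(*matrix)]
    for line in lines:
        if any(x not in (-1, 0, 1) for x in line) or sum(line) != 1:
            return False
        nonzero = [x for x in line if x != 0]
        if any(a == b for a, b in zip(nonzero, nonzero[1:])):
            return False
    return True


def count_by_exhaustive_search(n):
    """Brute force over all 3^(n*n) fillings; only feasible for very small n."""
    return sum(
        1
        for entries in product((-1, 0, 1), repeat=n * n)
        if is_alternating_sign_matrix([entries[i * n:(i + 1) * n] for i in range(n)])
    )


def count_by_product_formula(n):
    """The product of (3i + 1)! / (n + i)! over i = 0, ..., n - 1."""
    return prod(factorial(3 * i + 1) for i in range(n)) // prod(
        factorial(n + i) for i in range(n)
    )


example = [
    [0, 0, 1, 0, 0],
    [1, 0, -1, 1, 0],
    [0, 1, 0, -1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]
print(is_alternating_sign_matrix(example))                   # True
print([count_by_exhaustive_search(n) for n in (1, 2, 3)])    # [1, 2, 7]
print([count_by_product_formula(n) for n in (1, 2, 3)])      # [1, 2, 7]
```
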
The matrix displayed above is a $5 \times 5$ alternating-sign matrix. Zeilberger proved that the number of $n \times n$ alternating-sign matrices is given by

\[
\prod_{i=0}^{n-1} \frac{(3i+1)!}{(n+i)!} \tag{66}
\]

and he went on to prove the finer version that the number of $n \times n$ alternating-sign matrices with the (unique) 1 in the first row in position $j$ is given by

\[
\frac{\binom{n+j-2}{n-1} \binom{2n-j-1}{n-1}}{\binom{3n-2}{n-1}} \prod_{i=0}^{n-1} \frac{(3i+1)!}{(n+i)!} \tag{67}
\]

The first number is also equal to the number of totally symmetric self-complementary plane partitions contained in the $(2n) \times (2n) \times (2n)$ box, but there is no intrinsic explanation why this is so. We refer the reader to Bressoud (1999) for an exposition of these results, and for pointers to the literature containing further unexplained connections between alternating-sign matrices and plane partitions.

While the first result was achieved by a brute-force constant-term approach, the second result is based on the observation that alternating-sign matrices are in bijection with configurations in the six-vertex model on the square grid under domain-wall boundary conditions. This then allowed one to use a formula due to Izergin for the partition function for these six-vertex configurations. Similar formulas for variations of the model have been found by Kuperberg, and by Razumov and Stroganov (see Razumov and Stroganov (2005) and references therein).

A configuration in the six-vertex model is an orientation of edges of a 4-regular graph (i.e., at each vertex there meet exactly four edges) such that at each vertex two edges are oriented towards the vertex and two are oriented away from the vertex. Thus, there are six possible vertex configurations, giving the name of the model, see Figure 19. To go from one object to the other, one uses the translation between local configurations at a vertex and entries in alternating-sign matrices indicated in the figure. An example of the correspondence can be found in Figure 20.

Figure 19 The six vertex configurations (with the corresponding matrix entries 0, 0, 0, 0, −1, 1).

Figure 20 (a) An alternating-sign matrix, with rows 0 1 0 / 1 −1 1 / 0 1 0. (b) A six-vertex configuration.

Another manifestation of alternating-sign matrices and six-vertex configurations are fully packed loop configurations. A fully packed loop configuration on a graph is a collection of edges such that each vertex is incident to exactly two edges. One obtains a fully packed loop configuration out of a six-vertex configuration by dividing the square lattice into its even and odd sublattice denoted by A and B, respectively. Instead of arrows, only those edges are drawn that, on sublattice A, point inward and, on sublattice B, point outward. The reader is referred to de Gier (2005) and Di Francesco et al. (2004) for further reading.

The story of alternating-sign matrices and their connection to the six-vertex model is given a vivid account in Bressoud (1999), with further important results by Kuperberg, Okada, Razumov and Stroganov, referenced in Razumov and Stroganov (2005).

Fully packed loop configurations seem to play an important role in the explicit form of the ground-state vectors of certain Hamiltonians in the dense O(1) loop model. The corresponding conjectures are surveyed in de Gier (2005). There is important progress on these conjectures by Di Francesco and Zinn–Justin (2005, and references therein).

Binomial Sums and Hypergeometric Series

When dealing with enumerative problems, it is inevitable to deal with binomial sums, that is, sums in which the summands are products/quotients of binomial coefficients and factorials, such as, for example,

\[
\sum_{k=0}^{n} \binom{2k}{k} \binom{2n-2k}{n-k}
\]

In most cases, the right environment in which one should work is the theory of (generalized) hypergeometric series. These are defined as follows:

\[
{}_rF_s\!\left[ \begin{matrix} a_1, \dots, a_r \\ b_1, \dots, b_s \end{matrix} ; z \right] = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_r)_k}{(b_1)_k \cdots (b_s)_k} \, \frac{z^k}{k!}
\]

where $(\alpha)_k = \alpha(\alpha + 1)(\alpha + 2) \cdots (\alpha + k - 1)$ for $k > 0$, and $(\alpha)_0 = 1$. The symbol $(\alpha)_k$ is called the Pochhammer symbol or shifted factorial. For in-depth treatments of the subject, we refer the reader
to Andrews et al. (1999), Gasper and Rahman (2004), and Slater (1966).

Hypergeometric series can be characterized as series in which the quotient of the $(k+1)$st by the $k$th summand is a rational function in $k$. This is also the way to convert binomial sums into their hypergeometric form (respectively to see if this is possible; in most cases it is): form the quotient of the $(k+1)$st by the $k$th summand and read off the parameters $a_1, \dots, a_r$, $b_1, \dots, b_s$, and the argument $z$ from the factorization of the numerator and the denominator polynomials of the rational function, out of these form the corresponding hypergeometric series, and multiply the series by the summand for $k = 0$. This is, in fact, a completely routine task, and, indeed, computer algebra programs such as Maple and Mathematica do this automatically.

The reason why hypergeometric series are much more fundamental than the binomial sums themselves is that there are hundreds of ways to write the same sum using binomial coefficients and factorials, whereas there is just one hypergeometric form, that is, hypergeometric series are a kind of normal form for binomial sums. In particular, given a specific binomial sum, it is a hopeless enterprise to scan through all the identities available in the literature for this sum. There may be an identity for it, but perhaps written differently. On the contrary, given a specific hypergeometric series, the list of available identities which apply to this series is usually not large, and tables of such identities can be set up in a systematic way. This has been done (cf. Slater (1966); the most comprehensive table available to this date is contained in the manual of the Mathematica package HYP – see ‘‘Further reading’’), and scanning through these tables is largely facilitated by the use of the Mathematica package HYP.

We give here some of the most important identities for hypergeometric series. Aside from the binomial theorem, the most important summation formulas are:

the Gauß ${}_2F_1$-summation formula

$${}_2F_1\left[\begin{matrix} a, b \\ c \end{matrix}; 1\right] = \frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)}$$

provided $\Re(c-a-b) > 0$,

the Pfaff–Saalschütz summation formula

$${}_3F_2\left[\begin{matrix} a, b, -n \\ c, 1+a+b-c-n \end{matrix}; 1\right] = \frac{(c-a)_n\,(c-b)_n}{(c)_n\,(c-a-b)_n}$$

provided $n$ is a non-negative integer, and

the Dougall summation formula

$${}_7F_6\left[\begin{matrix} a, a/2+1, b, c, d, 1+2a-b-c-d+n, -n \\ a/2, 1+a-b, 1+a-c, 1+a-d, -a+b+c+d-n, a+1+n \end{matrix}; 1\right] = \frac{(1+a)_n\,(1+a-b-c)_n\,(1+a-b-d)_n\,(1+a-c-d)_n}{(1+a-b)_n\,(1+a-c)_n\,(1+a-d)_n\,(1+a-b-c-d)_n}$$

provided $n$ is a non-negative integer.

Some of the most important transformation formulas are

the Euler transformation formula

$${}_2F_1\left[\begin{matrix} a, b \\ c \end{matrix}; z\right] = (1-z)^{c-a-b}\; {}_2F_1\left[\begin{matrix} c-a, c-b \\ c \end{matrix}; z\right]$$

provided $|z| < 1$,

the Kummer transformation formula

$${}_3F_2\left[\begin{matrix} a, b, c \\ d, e \end{matrix}; 1\right] = \frac{\Gamma(e)\,\Gamma(d+e-a-b-c)}{\Gamma(e-a)\,\Gamma(d+e-b-c)}\; {}_3F_2\left[\begin{matrix} a, d-b, d-c \\ d, d+e-b-c \end{matrix}; 1\right]$$

provided both series converge,

and the Whipple transformation formulas

$${}_4F_3\left[\begin{matrix} a, b, c, -n \\ e, f, 1+a+b+c-e-f-n \end{matrix}; 1\right] = \frac{(e-a)_n\,(f-a)_n}{(e)_n\,(f)_n}\; {}_4F_3\left[\begin{matrix} -n, a, 1+a+c-e-f-n, 1+a+b-e-f-n \\ 1+a+b+c-e-f-n, 1+a-e-n, 1+a-f-n \end{matrix}; 1\right] \qquad [68]$$

where $n$ is a non-negative integer, and

$${}_7F_6\left[\begin{matrix} a, 1+\frac{a}{2}, b, c, d, e, -n \\ \frac{a}{2}, 1+a-b, 1+a-c, 1+a-d, 1+a-e, 1+a+n \end{matrix}; 1\right] = \frac{(1+a)_n\,(1+a-d-e)_n}{(1+a-d)_n\,(1+a-e)_n}\; {}_4F_3\left[\begin{matrix} 1+a-b-c, d, e, -n \\ 1+a-b, 1+a-c, -a+d+e-n \end{matrix}; 1\right] \qquad [69]$$

provided $n$ is a non-negative integer.
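As a minimal sketch of the conversion into hypergeometric form described above (it is not the Maple, Mathematica, or HYP implementation), the following Python code evaluates terminating hypergeometric series in exact rational arithmetic. The function names are illustrative assumptions. For the example sum $\sum_k \binom{2k}{k}\binom{2n-2k}{n-k}$, the quotient of consecutive summands is $(k+\frac12)(k-n)/((k+\frac12-n)(k+1))$, so the sum equals $\binom{2n}{n}\,{}_2F_1[\frac12, -n; \frac12-n; 1]$. The code compares this with the direct sum and spot-checks the Pfaff–Saalschütz formula for one arbitrarily chosen set of parameters.

```python
from fractions import Fraction
from math import comb


def pochhammer(a, k):
    """Shifted factorial (a)_k = a (a+1) ... (a+k-1), with (a)_0 = 1."""
    result = Fraction(1)
    for j in range(k):
        result *= a + j
    return result


def hyper(upper, lower, z, n_terms):
    """Sum the first n_terms terms of rFs[upper; lower; z] exactly.

    Intended for terminating series (some upper parameter equal to -n),
    where n_terms = n + 1 already gives the full value.
    """
    total = Fraction(0)
    for k in range(n_terms):
        term = Fraction(z) ** k / pochhammer(Fraction(1), k)  # z^k / k!
        for a in upper:
            term *= pochhammer(a, k)
        for b in lower:
            term /= pochhammer(b, k)
        total += term
    return total


def binomial_sum(n):
    """Direct evaluation of sum_k C(2k, k) C(2n-2k, n-k)."""
    return sum(comb(2 * k, k) * comb(2 * n - 2 * k, n - k) for k in range(n + 1))


def binomial_sum_hypergeometric(n):
    """Same sum via its hypergeometric form: (k = 0 summand) times a 2F1."""
    half = Fraction(1, 2)
    return comb(2 * n, n) * hyper([half, Fraction(-n)], [half - n], 1, n + 1)


def pfaff_saalschuetz_sides(a, b, c, n):
    """Left- and right-hand sides of the Pfaff-Saalschuetz formula."""
    lhs = hyper([a, b, Fraction(-n)], [c, 1 + a + b - c - n], 1, n + 1)
    rhs = (pochhammer(c - a, n) * pochhammer(c - b, n)
           / (pochhammer(c, n) * pochhammer(c - a - b, n)))
    return lhs, rhs


if __name__ == "__main__":
    for n in range(6):
        print(n, binomial_sum(n), binomial_sum_hypergeometric(n))
    print(pfaff_saalschuetz_sides(Fraction(1, 3), Fraction(2, 5), Fraction(7, 4), 4))
```

Exact fractions are used instead of floating point so that both sides of an identity can be compared with `==`; this only works for terminating series.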

Since about 1990, for the verification of binomial and hypergeometric series, there are automatic tools available. The book by Petkovšek et al. (1996) is an excellent introduction to these aspects. The philosophy is as follows. Suppose we are given a binomial or hypergeometric series $S(n) = \sum_k F(n, k)$. The Gosper–Zeilberger algorithm (see ‘‘Further reading’’) (cf. Petkovšek et al. (1996); a simplified version was presented in the reference Zeilberger in ‘‘Further reading’’) will find a linear recurrence

$$A_0(n) S(n) + A_1(n) S(n+1) + \cdots + A_d(n) S(n+d) = C(n) \qquad [70]$$

for some $d$, where the coefficients $A_i(n)$ are polynomials in $n$, and where $C(n)$ is a certain function in $n$, with proof!

If, for example, we suspected that $S(n) = \mathrm{RHS}(n)$, where $\mathrm{RHS}(n)$ is some closed-form expression, then we just have to verify that $\mathrm{RHS}(n)$ satisfies the recurrence [70] and check $S(n) = \mathrm{RHS}(n)$ for sufficiently many initial values of $n$ to have a proof for the identity $S(n) = \mathrm{RHS}(n)$ for all $n$. On the other hand, if $\mathrm{RHS}(n)$ was a different sum, then we would apply the algorithm to find a recurrence for $\mathrm{RHS}(n)$. If it turns out to be the same recurrence then, again, a check of $S(n) = \mathrm{RHS}(n)$ for a few initial values will provide a full proof of $S(n) = \mathrm{RHS}(n)$ for all $n$.

Even in the case that we do not have a conjectured expression $\mathrm{RHS}(n)$, this is not the end of the story. Given a recurrence of the type [70], the Petkovšek algorithm (see ‘‘Further reading’’) (cf. Petkovšek et al. (1996)) is able to find a closed-form solution (where ‘‘closed form’’ has a precise meaning), respectively tell that there is no closed-form solution.

The fascinating point about both algorithms is that neither do we have to know what the algorithm does internally nor do we have to check that. For the Petkovšek algorithm, this is obvious anyway because, once the computer says that a certain expression is a solution of [70], it is a routine matter to check that. This is less obvious for the Gosper–Zeilberger algorithm. However, what the Gosper–Zeilberger algorithm does is, for a given sum $S(n) = \sum_k F(n, k)$, it finds polynomials $A_0(n), A_1(n), \dots, A_d(n)$ and an expression $G(n, k)$ (which is, in fact, a rational multiple of $F(n, k)$), such that

$$A_0(n) F(n, k) + A_1(n) F(n+1, k) + \cdots + A_d(n) F(n+d, k) = G(n, k+1) - G(n, k) \qquad [71]$$

for some $d$. Because of the properties of $F(n, k)$ and $G(n, k)$, which are part of the theory, this is an identity which can be directly verified by clearing all common factors and checking the remaining identity between rational functions in $n$ and $k$. However, we may now sum both sides of [71] over $k$ to obtain a recurrence of the form [70].

Algorithms for multiple sums are also available (see ‘‘Further reading’’). They follow ideas by Wilf and Zeilberger (1992) (of which a simplified version is presented in a Mohammed and Zeilberger preprint (see ‘‘Further reading’’)); however, they run more quickly into capacity problems. Schneider (2005) is currently developing a very promising new algorithmic approach to the automatic treatment of multisums. See q-Special Functions and Statistical Mechanics and Combinatorial Problems.

See also: Classical Groups and Homogeneous Spaces; Compact Groups and Their Representations; Dimer Problems; Growth Processes in Random Matrix Theory; Ordinary Special Functions; q-Special Functions; Saddle Point Problems; Statistical Mechanics and Combinatorial Problems.

Further Reading

http://algo.inria.fr This site includes, among its libraries, the Maple program gdev.
Andrews GE (1976) The Theory of Partitions, Encyclopedia of Mathematics and Its Applications, vol. 2. (reprinted by Cambridge University Press, Cambridge, 1998). Reading: Addison–Wesley.
Andrews GE, Askey RA, and Roy R (1999) In: Rota GC (ed.) Special Functions, Encyclopedia of Mathematics and Its Applications, vol. 71. Cambridge: Cambridge University Press.
Ayoub R (1963) An Introduction to the Analytic Theory of Numbers. Mathematical Surveys, vol. 10, Providence, RI: American Mathematical Society.
Bergeron F, Labelle G, and Leroux P (1998) Combinatorial Species and Tree-Like Structures. Cambridge: Cambridge University Press.
Bousquet-Mélou M and Jehanne A (2005), Polynomial equations with one catalytic variable, algebraic series, and map enumeration. Preprint, arXiv:math.CO/0504018.
Bressoud DM (1999) Proofs and Confirmations – The Story of the Alternating Sign Matrix Conjecture. Cambridge: Cambridge University Press.
de Bruijn NG (1964) Pólya’s theory of counting. In: Beckenbach EF (ed.) Applied Combinatorial Mathematics, New York: Wiley, (reprinted by Krieger, Malabar, Florida, 1981).
Comtet L (1974) Advanced Combinatorics. Dordrecht: Reidel.
Dolbilin NP, Mishchenko AS, Shtan’ko MA, Shtogrin MI, and Zinoviev YuM (1996) Homological properties of dimer configurations for lattices on surfaces. Functional Analysis and its Application 30: 163–173.
Feller W (1957) An Introduction to Probability Theory and Its Applications, vol. 1, 2nd edn. New York: Wiley.
Fisher ME (1984) Walks, walls, wetting and melting. Journal of Statistical Physics 34: 667–729.
Flajolet P and Sedgewick R, Analytic Combinatorics, book project, available at http://algo.inria.fr.
Di Francesco P, Zinn-Justin P and Zuber J.-B. (2004), Determinant formulae for some tiling problems and application to fully packed loops, Preprint, arXiv:math-ph/0410002.
Di Francesco P and Zinn-Justin P (2005), Quantum Knizhnik–Zamolodchikov equation, generalized Razumov–Stroganov sum rules and extended Joseph polynomials. Preprint, arXiv:math-ph/0508059.

Galluccio A and Loebl M (1999) On the theory of Pfaffian orientations I. Perfect matchings and permanents. Electronic Journal of Combinatorics 6: Article #R6, 18 pp.
http://www.fmf.uni-lj.si – website of Faculty of Mathematics of University of Ljubljana. A Mathematica implementation by Marko Petkovšek is available here.
Gasper G and Rahman M (2004) Basic Hypergeometric Series, 2nd edn. Encyclopedia of Mathematics and Its Applications, vol. 96. Cambridge: Cambridge University Press.
de Gier J (2005) Loops matchings and alternating-sign matrices. Discrete Mathematics 365–388.
Humphreys JE (1990) Reflection Groups and Coxeter Groups. Cambridge: Cambridge University Press.
Johansson K (2002) Non-intersecting paths, random tilings and random matrices. Probability Theory and Related Fields 123: 225–280.
Kenyon R (2003) An Introduction to the Dimer Model, Lecture Notes for a Short Course at the ICTP, 2002; arXiv:math.CO/0310326.
Koekoek R and Swarttouw RF, The Askey–scheme of hypergeometric orthogonal polynomials and its q-analogue, TU Delft, The Netherlands, 1998; on the www: http://aw.twi.tudelft.nl.
Krattenthaler C (1997) The enumeration of lattice paths with respect to their number of turns. In: Balakrishnan N (ed.) Advances in Combinatorial Methods and Applications to Probability and Statistics, pp. 29–58. Boston: Birkhäuser.
Krattenthaler C (2003), Asymptotics for random walks in alcoves of affine Weyl groups. Preprint, arXiv:math.CO/0301203.
Krattenthaler C (2005a), Watermelon configurations with wall interaction: exact and asymptotic results. Preprint, arXiv:math.CO/0506323.
Krattenthaler C (2005b) Advanced determinant calculus: a complement. Linear Algebra Applications 411: 68–166.
Krattenthaler C, Guttmann AJ, and Viennot XG (2000) Vicious walkers, friendly walkers and Young tableaux II: with a wall. Journal of Physics A: Mathematical and General 33: 8835–8866.
Kuperberg G (1998) An exploration of the permanent-determinant method. Electronic Journal of Combinatorics 5: Article #R46, 34 pp.
Labelle G and Lamathe C (2004) A shifted asymmetry index series. Advances in Applied Mathematics 32: 576–608.
Mohammed M and Zeilberger D (2005) Multi-variable Zeilberger and Almkvist–Zeilberger algorithms and the sharpening of Wilf–Zeilberger theory. Advanced Applications in Mathematics (to appear).
Mohanty SG (1979) Lattice Path Counting and Applications. New York: Academic Press.
Odlyzko AM (1995) Asymptotic enumeration methods. In: Graham RL, Grötschel M, and Lovász L (eds.) Handbook of Combinatorics, pp. 1063–1229. Amsterdam: Elsevier.
Pemantle R and Wilson MC, Twenty combinatorial examples of asymptotics derived from multivariate generating functions. Preprint, available at http://www.cs.auckland.ac.nz.
Petkovšek M, Wilf H, and Zeilberger D (1996) A = B. Wellesley: Peters AK.
http://www.mat.univie.ac.at – Website of Faculty of Mathematics, University of Vienna. It provides the manual of the Mathematica package HYP.
Propp J (1999) Enumeration of matchings: problems and progress. In: Billera L, Björner A, Greene C, Simion R, and Stanley RP (eds.) New Perspectives in Algebraic Combinatorics, Mathematical Sciences Research Institute Publications, vol. 38, pp. 255–291. Cambridge: Cambridge University Press.
Razumov AV and Stroganov YG (2005) Enumeration of quarter-turn symmetric alternating-sign matrices of odd order. Preprint, arXiv:math-ph/0507003.
Robertson N, Seymour PD, and Thomas R (1999) Permanents, Pfaffian orientations, and even directed circuits. Annals of Mathematics 150(2): 929–975.
Schneider C (2005) A new Sigma approach to multi-summation. Advances in Applied Mathematics 34(4): 740–767.
Slater LJ (1966) Generalized Hypergeometric Functions. Cambridge: Cambridge University Press.
Stanley RP (1986) Enumerative Combinatorics, Pacific Grove, CA: Wadsworth & Brooks/Cole, (reprinted by Cambridge University Press, Cambridge, 1998).
Stanley RP (1999) Enumerative Combinatorics, vol. 2. Cambridge: Cambridge University Press.
Szegő G (1959) Orthogonal Polynomials, American Mathematical Society Colloquium Publications, vol. 23. New York. Providence RI: American Mathematical Society.
Tesler G (2000) Matchings in graphs on non-oriented surfaces. Journal of Combinatorial Theory Series B 78: 198–231.
http://www.risc.uni.linz.ac.at – website of RISC (Research Institute for Symbolic Computation). Mathematica implementations written by Peter Paule and Markus Schorn, and Axel Riese and Kurt Wegschaider are available here.
http://www.math.rutgers.edu – website of Department of Mathematics, Rutgers University. Computer implementations written by D Zeilberger are available here.
Viennot X and James W Heaps of segments, q-Bessel functions in square lattice enumeration and applications in quantum gravity. Preprint.
Wilf HS and Zeilberger D (1992) An algorithmic proof theory for hypergeometric (ordinary and ‘‘q’’) multisum/integral identities. Inventiones Mathematicae 108: 575–633.
Zeilberger D (2005) Deconstructing the Zeilberger algorithm. Journal of Difference Equations and Applications 11: 851–856.
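
To make the verification step around identity [71] of the Combinatorics overview concrete, here is a minimal Python sketch for one small case chosen for illustration and not taken from the article: $F(n,k) = \binom{n}{k}/2^n$ with $d = 1$, $A_0(n) = -1$, $A_1(n) = 1$, and $G(n,k) = -\binom{n}{k-1}/2^{n+1}$. The helper names are hypothetical. The code checks [71] term by term in exact arithmetic and then checks the recurrence $S(n+1) - S(n) = 0$ obtained by summing over $k$.

```python
from fractions import Fraction
from math import comb


def binom(n, k):
    """Binomial coefficient, taken to be 0 outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def F(n, k):
    """Summand F(n, k) = C(n, k) / 2^n (illustrative choice)."""
    return Fraction(binom(n, k), 2 ** n)


def G(n, k):
    """Certificate G(n, k) = -C(n, k-1) / 2^(n+1), a rational multiple of F."""
    return Fraction(-binom(n, k - 1), 2 ** (n + 1))


def telescoping_identity_holds(n, k):
    """Check -F(n, k) + F(n+1, k) == G(n, k+1) - G(n, k)."""
    return F(n + 1, k) - F(n, k) == G(n, k + 1) - G(n, k)


def S(n):
    """S(n) = sum over all k of F(n, k); the support is 0 <= k <= n."""
    return sum(F(n, k) for k in range(n + 1))


if __name__ == "__main__":
    ok = all(telescoping_identity_holds(n, k)
             for n in range(10) for k in range(-2, 13))
    print("identity [71] holds on the tested range:", ok)
    print("S(0) =", S(0), "and S(n+1) - S(n) == 0:",
          all(S(n + 1) - S(n) == 0 for n in range(10)))
```

A finite check like this is only a sanity test. The proof itself rests on the fact that, after dividing by $F(n,k)$, identity [71] becomes an identity between rational functions in $n$ and $k$.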

Compact Groups and Their Representations


A Kirillov, University of Pennsylvania, Philadelphia, PA, USA
A Kirillov, Jr., Stony Brook University, Stony Brook, NY, USA
© 2006 Elsevier Ltd. All rights reserved.

In this article, we describe the structure and representation theory of compact Lie groups. Throughout the article, $G$ is a compact real Lie group with Lie algebra $\mathfrak{g}$. Unless otherwise stated, $G$ is assumed to be connected. The word ‘‘group’’ will always mean a ‘‘Lie group’’ and the word ‘‘subgroup’’ will mean a closed Lie subgroup. The notation $\mathrm{Lie}(H)$ stands for the Lie algebra of a Lie group $H$. We assume that the reader is familiar with the basic facts of the theory of Lie groups and Lie algebras, which can be found in Lie Groups: General Theory, or in the books listed in the bibliography.

Examples of Compact Lie Groups

Examples of compact groups include

• finite groups,
• quotient groups $T^n = \mathbb{R}^n/\mathbb{Z}^n$, or more generally, $V/L$, where $V$ is a finite-dimensional real vector space and $L$ is a lattice in $V$, that is, a discrete subgroup generated by some basis in $V$ – groups of this type are called ‘‘tori’’; it is known that every commutative connected compact group is a torus;
• unitary groups $U(n)$ and special unitary groups $SU(n)$, $n \ge 2$;
• orthogonal groups $O(n)$ and $SO(n)$, $n \ge 3$; and
• the groups $U(n, \mathbb{H})$, $n \ge 1$, of unitary quaternionic transformations, which are isomorphic to $Sp(n) := Sp(n, \mathbb{C}) \cap SU(2n)$.

The groups $O(n)$ have two connected components, one of which is $SO(n)$. The groups $SU(n)$ and $Sp(n)$ are connected and simply connected.

The groups $SO(n)$ are connected but not simply connected: for $n \ge 3$, the fundamental group of $SO(n)$ is $\mathbb{Z}_2$. The universal cover of $SO(n)$ is a simply connected compact Lie group denoted by $Spin(n)$. For small $n$, we have isomorphisms: $Spin(3) \simeq SU(2)$, $Spin(4) \simeq SU(2) \times SU(2)$, $Spin(5) \simeq Sp(2)$, and $Spin(6) \simeq SU(4)$.

Relation to Semisimple Lie Algebras and Lie Groups

Reductive Groups

A Lie algebra $\mathfrak{g}$ is called

• ‘‘simple’’ if it is nonabelian and has no ideals different from $\{0\}$ and $\mathfrak{g}$ itself;
• ‘‘semisimple’’ if it is a direct sum of simple ideals; and
• ‘‘reductive’’ if it is a direct sum of semisimple and commutative ideals.

We call a connected Lie group $G$ ‘‘simple’’ or ‘‘semisimple’’ if $\mathrm{Lie}(G)$ has this property.

Theorem 1 Let $G$ be a connected compact Lie group and $\mathfrak{g} = \mathrm{Lie}(G)$. Then
(i) The Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$ is reductive: $\mathfrak{g} = \mathfrak{a} \oplus \mathfrak{g}'$, where $\mathfrak{a}$ is abelian and $\mathfrak{g}' = [\mathfrak{g}, \mathfrak{g}]$ is semisimple.
(ii) The group $G$ can be written in the form $G = (A \times K)/Z$, where $A$ is a torus, $K$ is a connected, simply connected compact semisimple Lie group, and $Z$ is a finite central subgroup in $A \times K$.
(iii) If $G$ is simply connected, it is a product of simple compact Lie groups.

The proof of these results is based on the fact that the Killing form of $\mathfrak{g}$ is negative semidefinite.

Example 1 The group $U(n)$ contains as the center the subgroup $C$ of scalar matrices. The quotient group $U(n)/C$ is simple and isomorphic to $SU(n)/\mathbb{Z}_n$. The presentation of Theorem 1 in this case is

$$U(n) = (T^1 \times SU(n))/\mathbb{Z}_n = (C \times SU(n))/(C \cap SU(n))$$

For the group $SO(4)$ the presentation is $(SU(2) \times SU(2))/\{\pm(1 \times 1)\}$.

This theorem effectively reduces the study of the structure of connected compact groups to the study of simply connected compact simple Lie groups.

Complexification of a Compact Lie Group

Recall that for a real Lie algebra $\mathfrak{g}$, its complexification is $\mathfrak{g}_{\mathbb{C}} = \mathfrak{g} \otimes \mathbb{C}$ with obvious commutator. It is also well known that $\mathfrak{g}_{\mathbb{C}}$ is semisimple or reductive iff $\mathfrak{g}$ is semisimple or reductive, respectively. There is a subtlety in the case of simple algebras: it is possible that a real Lie algebra is simple, but its complexification $\mathfrak{g}_{\mathbb{C}}$ is only semisimple. However, this problem never arises for Lie algebras of compact groups: if $\mathfrak{g}$ is a Lie algebra of a real compact Lie group, then $\mathfrak{g}$ is simple if and only if $\mathfrak{g}_{\mathbb{C}}$ is simple.

The notion of complexification for Lie groups is more delicate.

Definition 1 Let $G$ be a connected real Lie group with Lie algebra $\mathfrak{g}$. A complexification of $G$ is a connected complex Lie group $G_{\mathbb{C}}$ (i.e., a complex manifold with a structure of a Lie group such that group multiplication is given by a complex analytic map $G_{\mathbb{C}} \times G_{\mathbb{C}} \to G_{\mathbb{C}}$), which contains $G$ as a closed subgroup, and such that $\mathrm{Lie}(G_{\mathbb{C}}) = \mathfrak{g}_{\mathbb{C}}$. In this case, we will also say that $G$ is a real form of $G_{\mathbb{C}}$.

It is not obvious why such a complexification exists at all; in fact, for arbitrary real group it may not exist. However, for compact groups we do have the following theorem.

Theorem 2 Let $G$ be a connected compact Lie group. Then it has a unique complexification $G_{\mathbb{C}} \supset G$. Moreover, the following properties hold:
(i) The inclusion $G \subset G_{\mathbb{C}}$ is a homotopy equivalence. In particular, $\pi_1(G) = \pi_1(G_{\mathbb{C}})$ and the quotient space $G_{\mathbb{C}}/G$ is contractible.
(ii) Every complex finite-dimensional representation of $G$ can be uniquely extended to a complex analytic representation of $G_{\mathbb{C}}$.

Since the Lie algebra of a compact Lie group $G$ is reductive, we see that $G_{\mathbb{C}}$ must be reductive; if $G$ is semisimple or simple, then so is $G_{\mathbb{C}}$. The natural question is whether every complex reductive group can be obtained in this way. The following theorem gives a partial answer.

Theorem 3 Every connected complex semisimple Lie group $H$ has a compact real form: there is a compact real subgroup $K \subset H$ such that $H = K_{\mathbb{C}}$. Moreover, such a compact real form is unique up to conjugation.

Example 2
(i) The unitary group $U(n)$ is a compact real form of the group $GL(n, \mathbb{C})$.
(ii) The orthogonal group $SO(n)$ is a compact real form of the group $SO(n, \mathbb{C})$.
(iii) The group $Sp(n)$ is a compact real form of the group $Sp(n, \mathbb{C})$.
(iv) The universal cover of $GL(n, \mathbb{C})$ has no compact real form.

These results have a number of important applications. For example, they show that the study of representations of a semisimple complex group $H$ can be replaced by the study of representations of its compact form; in particular, every representation is completely reducible (this argument is known as Weyl’s unitary trick).

Classification of Simple Compact Lie Groups

Theorem 1 essentially reduces such classification to classification of simply connected simple compact groups, and Theorems 2 and 3 reduce it to the classification of simple complex Lie algebras. Since the latter is well known, we get the following result.

Theorem 4 Let $G$ be a connected, simply connected simple compact Lie group. Then $\mathfrak{g}_{\mathbb{C}}$ must be a simple complex Lie algebra and thus can be described by a Dynkin diagram of one of the following types: $A_n$, $B_n$, $C_n$, $D_n$, $E_6$, $E_7$, $E_8$, $F_4$, $G_2$.
Conversely, for each Dynkin diagram in the above list, there exists a unique, up to isomorphism, simply connected simple compact Lie group whose Lie algebra is described by this Dynkin diagram.

For types $A_n, \dots, D_n$, the corresponding compact Lie groups are well-known classical groups shown in the table below:

Type     A_n, n ≥ 1     B_n, n ≥ 2     C_n, n ≥ 3     D_n, n ≥ 4
Group    SU(n + 1)      Spin(2n + 1)   Sp(n)          Spin(2n)

The restrictions on $n$ in this table are made to avoid repetitions which appear for small values of $n$. Namely, $A_1 = B_1 = C_1$, which gives $SU(2) = Spin(3) = Sp(1)$; $D_2 = A_1 \cup A_1$, which gives $Spin(4) = SU(2) \times SU(2)$; $B_2 = C_2$, which gives $Spin(5) = Sp(2)$; and $A_3 = D_3$, which gives $SU(4) = Spin(6)$. Other than that, all entries are distinct.

Exceptional groups $E_6, \dots, G_2$ also admit explicit geometric and algebraic descriptions which are related to the exceptional nonassociative algebra $\mathbb{O}$ of the so-called octonions (or Cayley numbers). For example, the compact group of type $G_2$ can be defined as a subgroup of $SO(7)$ which preserves an almost-complex structure on $S^6$. It can also be described as the subgroup of $GL(7, \mathbb{R})$ which preserves one quadratic and one cubic form, or, finally, as a group of all automorphisms of $\mathbb{O}$.

Maximal Tori

Main Properties

In this section, $G$ is a compact connected Lie group.

Definition 2 A ‘‘maximal torus’’ in $G$ is a maximal connected commutative subgroup $T \subset G$.

The following theorem lists the main properties of maximal tori.

Theorem 5
(i) For every element $g \in G$, there exists a maximal torus $T \ni g$.
(ii) Any two maximal tori in $G$ are conjugate.
(iii) If $g \in G$ commutes with all elements of a maximal torus $T$, then $g \in T$.
(iv) A connected subgroup $H \subset G$ is a maximal torus iff the Lie algebra $\mathrm{Lie}(H)$ is a maximal abelian subalgebra in $\mathrm{Lie}(G)$.

Example 3 Let $G = U(n)$. Then the set $T$ of diagonal unitary matrices is a maximal torus in $G$; moreover, every maximal torus is of this form after a suitable unitary change of basis. In particular, this implies that every element in $G$ is conjugate to a diagonal matrix.

Example 4 Let $G = SO(3)$. Then the set $D$ of diagonal matrices is a maximal commutative subgroup in $G$, but not a torus. Here $D$ consists of four elements and is not connected.

Maximal Tori and Cartan Subalgebras

The study of maximal tori in compact Lie groups is closely related to the study of Cartan subalgebras in reductive complex Lie algebras. For convenience of readers, we briefly recall the appropriate definitions

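here; details can be found in Serre (2001) or in Lie Groups: General Theory.

Definition 3 Let $\mathfrak{a}$ be a complex reductive Lie algebra. A Cartan subalgebra $\mathfrak{h} \subset \mathfrak{a}$ is a maximal commutative subalgebra consisting of semisimple elements.

Note that for general Lie algebras Cartan subalgebra is defined in a different way; however, for reductive algebras the definition given above is equivalent to the standard one.

A choice of a Cartan subalgebra gives rise to the so-called root decomposition: if $\mathfrak{h} \subset \mathfrak{a}$ is a Cartan subalgebra in a complex reductive Lie algebra, then we can write

$$\mathfrak{a} = \mathfrak{h} \oplus \Big( \bigoplus_{\alpha \in R} \mathfrak{a}_\alpha \Big) \qquad [1]$$

where

$$\mathfrak{a}_\alpha = \{ x \in \mathfrak{a} \mid \operatorname{ad} h . x = \langle \alpha, h \rangle x \ \ \forall h \in \mathfrak{h} \}$$
$$R = \{ \alpha \in \mathfrak{h}^* - \{0\} \mid \mathfrak{a}_\alpha \neq 0 \} \subset \mathfrak{h}^*$$

The set $R$ is called the ‘‘root system’’ of $\mathfrak{a}$ with respect to Cartan subalgebra $\mathfrak{h}$; elements $\alpha \in R$ are called ‘‘roots.’’ We will also frequently use elements $\alpha^\vee \in \mathfrak{h}$ defined by $\langle \alpha^\vee, \beta \rangle = 2(\alpha, \beta)/(\alpha, \alpha)$ where $( \cdot , \cdot )$ is a nondegenerate invariant bilinear form on $\mathfrak{a}^*$ and $\langle \cdot , \cdot \rangle$ is the pairing between $\mathfrak{a}$ and $\mathfrak{a}^*$. It can be shown that so defined $\alpha^\vee$ does not depend on the choice of the form $( \cdot , \cdot )$.

Theorem 6 Let $G$ be a connected compact Lie group with Lie algebra $\mathfrak{g}$, and let $T \subset G$ be a maximal torus in $G$, $\mathfrak{t} = \mathrm{Lie}(T) \subset \mathfrak{g}$. Let $\mathfrak{g}_{\mathbb{C}}$, $G_{\mathbb{C}}$ be the complexification of $\mathfrak{g}$, $G$ as in Theorem 2.
Let $\mathfrak{h} = \mathfrak{t}_{\mathbb{C}} \subset \mathfrak{g}_{\mathbb{C}}$. Then $\mathfrak{h}$ is a Cartan subalgebra in $\mathfrak{g}_{\mathbb{C}}$, and the corresponding root system $R \subset i\mathfrak{t}^*$. Conversely, every Cartan subalgebra in $\mathfrak{g}_{\mathbb{C}}$ can be obtained as $\mathfrak{t}_{\mathbb{C}}$ for some maximal torus $T \subset G$.

Weights and Roots

Let $G$ be semisimple. Recall that the root lattice $Q \subset i\mathfrak{t}^*$ is the abelian group generated by roots $\alpha \in R$, and let the coroot lattice $Q^\vee \subset i\mathfrak{t}$ be the abelian group generated by coroots $\alpha^\vee$, $\alpha \in R$. Define also the weight and coweight lattices by

$$P = \{ \lambda \mid \langle \alpha^\vee, \lambda \rangle \in \mathbb{Z} \ \ \forall \alpha \in R \} \subset i\mathfrak{t}^*$$
$$P^\vee = \{ t \mid \langle t, \alpha \rangle \in \mathbb{Z} \ \ \forall \alpha \in R \} \subset i\mathfrak{t},$$

where $\langle \cdot , \cdot \rangle$ is the pairing between $\mathfrak{t}$ and the dual vector space $\mathfrak{t}^*$.

It follows from the definition of root system that we have inclusions

$$Q \subset P \subset i\mathfrak{t}^*, \qquad Q^\vee \subset P^\vee \subset i\mathfrak{t} \qquad [2]$$

Both $P$, $Q$ are lattices in $i\mathfrak{t}^*$; thus, the index $(P : Q)$ is finite. It can be computed explicitly: if $\alpha_i$ is a basis of the root system, then the fundamental weights $\omega_i$ defined by

$$\langle \alpha_i^\vee, \omega_j \rangle = \delta_{ij}$$

form a basis of $P$. The simple roots $\alpha_i$ are related to fundamental weights $\omega_j$ by the Cartan matrix $A$: $\alpha_i = \sum_j A_{ij} \omega_j$. Therefore, $(P : Q) = (P^\vee : Q^\vee) = |\det A|$.

Definitions of $P$, $Q$, $P^\vee$, $Q^\vee$ also make sense when $\mathfrak{g}$ is reductive but not semisimple. However, in this case they are no longer lattices: $\operatorname{rk} Q < \dim \mathfrak{t}^*$, and $P$ is not discrete.

We can now give more precise information about the structure of the maximal torus.

Lemma 1 Let $T$ be a compact connected commutative Lie group, and $\mathfrak{t} = \mathrm{Lie}(T)$ its Lie algebra. Then the exponential map is surjective and preimage of unit is a lattice $L \subset \mathfrak{t}$. There is an isomorphism of Lie groups

$$\exp : \mathfrak{t}/L \to T$$

In particular, $T \simeq \mathbb{R}^r/\mathbb{Z}^r = T^r$, $r = \dim T$.

Let $X(T) \subset i\mathfrak{t}^*$ be the lattice dual to $(2\pi i)^{-1} L$:

$$X(T) = \{ \lambda \in i\mathfrak{t}^* \mid \langle \lambda, l \rangle \in 2\pi i \mathbb{Z} \ \ \forall l \in L \} \qquad [3]$$

It is called the ‘‘character lattice’’ for $T$ (see the subsection ‘‘Examples of representations’’).

Theorem 7 Let $G$ be a compact connected Lie group, and let $T \subset G$ be a maximal torus in $G$. Then $Q \subset X(T) \subset P$. Moreover, the group $G$ is uniquely determined by the Lie algebra $\mathfrak{g}$ and the lattice $X(T) \subset i\mathfrak{t}^*$ which can be any lattice between $Q$ and $P$.

Corollary For a given complex semisimple Lie algebra $\mathfrak{a}$, there are only finitely many (up to isomorphism) compact connected Lie groups $G$ with $\mathfrak{g}_{\mathbb{C}} = \mathfrak{a}$.
The largest of them is the simply connected group, for which $T = \mathfrak{t}/2\pi i Q^\vee$, $X(T) = P$; the smallest is the so-called ‘‘adjoint group,’’ for which $T = \mathfrak{t}/2\pi i P^\vee$, $X(T) = Q$.

Example 5 Let $G = U(n)$. Then $i\mathfrak{t}$ = {real diagonal matrices}. Choosing the standard basis of matrix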

units in $i\mathfrak{t}$, we identify $i\mathfrak{t} \simeq \mathbb{R}^n$, which also allows us to identify $i\mathfrak{t}^* \simeq \mathbb{R}^n$. Under this identification,

$$Q = \Big\{ (\lambda_1, \dots, \lambda_n) \ \Big|\ \lambda_i \in \mathbb{Z},\ \sum \lambda_i = 0 \Big\}$$
$$P = \{ (\lambda_1, \dots, \lambda_n) \mid \lambda_i \in \mathbb{R},\ \lambda_i - \lambda_j \in \mathbb{Z} \}$$
$$X(T) = \mathbb{Z}^n$$

Note that $Q$, $P$ are not lattices: $Q \simeq \mathbb{Z}^{n-1}$, $P \simeq \mathbb{R} \times \mathbb{Z}^{n-1}$.

Now let $G = SU(n)$. Then $i\mathfrak{t}^* = \mathbb{R}^n/\mathbb{R} \cdot (1, \dots, 1)$, and $Q$, $P$ are the images of $Q$, $P$ for $G = U(n)$ in this quotient. In this quotient they are lattices, and $(P : Q) = n$. The character lattice in this case is $X(T) = P$, since $SU(n)$ is simply connected. The adjoint group is $PSU(n) = SU(n)/C$, where $C = \{ \lambda \cdot \mathrm{id} \mid \lambda^n = 1 \}$ is the center of $SU(n)$.

Weyl Group

Let us fix a maximal torus $T \subset G$. Let $N(T) \subset G$ be the normalizer of $T$ in $G$: $N(T) = \{ g \in G \mid gTg^{-1} = T \}$. For any $g \in N(T)$ the transformation $A(g) : t \mapsto gtg^{-1}$ is an automorphism of $T$. According to Theorem 5, this automorphism is trivial iff $g \in T$. So in fact, it is the quotient group $N(T)/T$ which acts on $T$.

Definition 4 The group $W = N(T)/T$ is called the ‘‘Weyl group’’ of $G$.

Since the Weyl group acts faithfully on $\mathfrak{t}$ and $\mathfrak{t}^*$, it is common to consider $W$ as a subgroup in $GL(\mathfrak{t}^*)$. It is known that $W$ is finite.

The Weyl group can also be defined in terms of Lie algebra $\mathfrak{g}$ and its complexification $\mathfrak{g}_{\mathbb{C}}$.

Theorem 8 The Weyl group coincides with the subgroup in $GL(i\mathfrak{t}^*)$ generated by reflections $s_\alpha : x \mapsto x - \frac{2(\alpha, x)}{(\alpha, \alpha)} \alpha$, $\alpha \in R$, where, as before, $( \cdot , \cdot )$ is a nondegenerate invariant bilinear form on $\mathfrak{g}^*$.

Theorem 9
(i) Two elements $t_1, t_2 \in T$ are conjugate in $G$ iff $t_2 = w(t_1)$ for some $w \in W$.
(ii) There exists a natural homeomorphism of quotient spaces $G/\mathrm{Ad}\,G \simeq T/W$, where $\mathrm{Ad}\,G$ stands for action of $G$ on itself by conjugation. (Note, however, that these quotient spaces are not manifolds: they have singularities.)
(iii) Let us call a function $f$ on $G$ central if $f(hgh^{-1}) = f(g)$ for any $g, h \in G$. Then the restriction map gives an isomorphism

$$\{\text{continuous central functions on } G\} \simeq \{W\text{-invariant continuous functions on } T\}$$

Example 6 Let $G = U(n)$. The set of diagonal unitary matrices is a maximal torus, and the Weyl group is the symmetric group $S_n$ acting on diagonal matrices by permutations of entries. In this case, Theorem 9 shows that if $f(U)$ is a central function of a unitary matrix, then $f(U) = \tilde{f}(\lambda_1, \dots, \lambda_n)$, where $\lambda_i$ are eigenvalues of $U$ and $\tilde{f}$ is a symmetric function in $n$ variables.

Representations of Compact Groups

Basic Notions

By a representation of $G$ we understand a pair $(\pi, V)$, where $V$ is a complex vector space and $\pi$ is a continuous homomorphism $G \to \mathrm{Aut}(V)$. This notation is often shortened to $\pi$ or $V$. In this article, we only consider finite-dimensional (f.d.) representations; in this case, the homomorphism $\pi$ is automatically smooth and even real-analytic.

We associate to any f.d. representation $(\pi, V)$ of $G$ the representation $(\pi_*, V)$ of the Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$ which is just the derivative of the map $\pi : G \to \mathrm{Aut}\,V$ at the unit point $e \in G$. In terms of the exponential map, we have the following commutative diagram:

$$\begin{array}{ccc} G & \xrightarrow{\ \pi\ } & \mathrm{Aut}\,V \\ \exp \uparrow & & \uparrow \exp \\ \mathfrak{g} & \xrightarrow{\ \pi_*\ } & \mathrm{End}\,V \end{array}$$

Choosing a basis in $V$, we can write the operators $\pi(g)$ and $\pi_*(X)$ in matrix form and consider $\pi$ and $\pi_*$ as matrix-valued functions on $G$ and $\mathfrak{g}$. The diagram above means that

$$\pi(\exp X) = e^{\pi_*(X)} \qquad [4]$$

Recall that if $G$ is connected, simply connected, then every representation of $\mathfrak{g}$ can be uniquely lifted to a representation of $G$. Thus, classification of representations of connected simply connected Lie groups is equivalent to the classification of representations of Lie algebras.

Let $(\pi_1, V_1)$ and $(\pi_2, V_2)$ be two representations of the same group $G$. An operator $A \in \mathrm{Hom}(V_1, V_2)$ is called an ‘‘intertwining operator,’’ or simply an ‘‘intertwiner,’’ if $A \circ \pi_1(g) = \pi_2(g) \circ A$ for all $g \in G$. Two representations are called ‘‘equivalent’’ if they admit an invertible intertwiner. In this case, using an appropriate choice of bases, we can write $\pi_1$ and $\pi_2$ by the same matrix-valued function.

Let $(\pi, V)$ be a representation of $G$. If all operators $\pi(g)$, $g \in G$, preserve a subspace $V_1 \subset V$, then the restrictions $\pi_1(g) = \pi(g)|_{V_1}$ define a ‘‘subrepresentation’’ $(\pi_1, V_1)$ of $(\pi, V)$. In this case, the quotient space $V_2 = V/V_1$ also has a canonical structure of a representation, called the ‘‘quotient representation.’’
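
Theorem 8 and Example 6 can be connected by a small computation. Assuming the standard realization of the roots of $U(n)$ as the vectors $e_i - e_j$ in $\mathbb{R}^n$ with the usual dot product (a choice made here for illustration), the sketch below generates the group produced by the reflections $s_\alpha$ for the simple roots $\alpha_i = e_i - e_{i+1}$ and counts its elements. The count should agree with $|S_n| = n!$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def reflect(alpha, x):
    """s_alpha(x) = x - 2 (alpha, x) / (alpha, alpha) * alpha."""
    ax = sum(a * b for a, b in zip(alpha, x))
    aa = sum(a * a for a in alpha)
    coeff = Fraction(2 * ax, aa)
    return tuple(xi - coeff * ai for xi, ai in zip(x, alpha))


def simple_roots(n):
    """Simple roots e_i - e_{i+1} of type A_{n-1}, as vectors in R^n."""
    roots = []
    for i in range(n - 1):
        v = [0] * n
        v[i], v[i + 1] = 1, -1
        roots.append(tuple(v))
    return roots


def weyl_group(n):
    """Group generated by the simple reflections.

    Each element is stored as the tuple of images of the standard
    basis vectors e_1, ..., e_n, i.e. as the columns of its matrix.
    """
    basis = tuple(tuple(Fraction(int(i == j)) for j in range(n)) for i in range(n))
    roots = simple_roots(n)
    seen = {basis}
    frontier = [basis]
    while frontier:
        element = frontier.pop()
        for alpha in roots:
            # Compose with s_alpha: apply the reflection to every image vector.
            new = tuple(reflect(alpha, v) for v in element)
            if new not in seen:
                seen.add(new)
                frontier.append(new)
    return seen


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, len(weyl_group(n)), factorial(n))
```

Each simple reflection swaps two adjacent coordinates, so the generated group acts by permutations of entries, in line with Example 6.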

A representation $(\pi, V)$ is called ‘‘reducible’’ if it has a nontrivial (different from $V$ and $\{0\}$) subrepresentation. Otherwise it is called ‘‘irreducible.’’
We call a representation $(\pi, V)$ ‘‘unitary’’ if $V$ is a Hilbert space and all operators $\pi(g)$, $g \in G$, are unitary, that is, given by unitary matrices in any orthonormal basis. We use the short term ‘‘unirrep’’ for a ‘‘unitary irreducible representation.’’

Main Theorems
The following simple but important result was one of the first discoveries in representation theory. It holds for representations of any group, not necessarily compact.
Theorem 10 (Schur lemma). Let $(\pi_i, V_i)$, $i = 1, 2$, be any two irreducible finite-dimensional representations of the same group $G$. Then any intertwiner $A : V_1 \to V_2$ is either invertible or zero.
Corollary 1 If $V$ is an irreducible f.d. representation, then any intertwiner $A : V \to V$ is scalar: $A = c \cdot \mathrm{id}$, $c \in \mathbb{C}$.
Corollary 2 Every irreducible representation of a commutative group is one dimensional.
The following theorem is one of the fundamental results of the representation theory of compact groups. Its proof is based on the technique of invariant integrals on a compact group, which will be discussed in the next section.
Theorem 11
(i) Any f.d. representation of a compact group is equivalent to a unitary representation.
(ii) Any f.d. representation is completely reducible: it can be decomposed into a direct sum
$$V = \bigoplus_i n_i V_i$$
where $V_i$ are pairwise nonequivalent unirreps. The numbers $n_i \in \mathbb{Z}_+$ are called ‘‘multiplicities.’’

Examples of Representations
The representation theory looks rather different for abelian (i.e., commutative) and nonabelian groups. Here we consider the two simplest examples of both kinds.
Our first example is a one-dimensional compact connected Lie group. Topologically, it is a circle, which we realize as the set $\mathbb{T} \simeq U(1)$ of all complex numbers $t$ with absolute value 1.
Every unirrep of $\mathbb{T}$ is one dimensional; thus, it is just a continuous multiplicative map $\pi$ of $\mathbb{T}$ to itself. It is well known that every such map has the form
$$\pi_k(t) = t^k \quad \text{for some } k \in \mathbb{Z}$$
The collection of all unirreps of $\mathbb{T}$ is itself a group, called the ‘‘Pontrjagin dual’’ of $\mathbb{T}$ and denoted by $\widehat{\mathbb{T}}$. This group is isomorphic to $\mathbb{Z}$.
By Theorem 11, any f.d. representation $\pi$ of $\mathbb{T}$ is equivalent to a direct sum of one-dimensional unirreps. So, an equivalence class of $\pi$ is defined by the multiplicity function $\mu$ on $\widehat{\mathbb{T}} = \mathbb{Z}$ taking non-negative values:
$$\pi \simeq \sum_{k \in \mathbb{Z}} \mu(k) \cdot \pi_k$$
The many-dimensional case of a compact connected abelian Lie group can be treated in a similar way. Let $T$ be a torus, that is, an abelian compact group, $\mathfrak{t} = \mathrm{Lie}(T)$. Then every irreducible representation of $T$ is one dimensional and thus is defined by a group homomorphism $\chi : T \to \mathbb{T}^1 = U(1)$. Such homomorphisms are called ‘‘characters’’ of $T$. One easily sees that such characters themselves form a group (the Pontrjagin dual of $T$). If we denote by $L$ the kernel of the exponential map $\mathfrak{t} \to T$ (see Lemma 1), one easily sees that every character has the form
$$\chi_\lambda(\exp(t)) = e^{\langle t, \lambda \rangle}, \quad t \in \mathfrak{t}, \ \lambda \in X(T)$$
where $X(T) \subset i\mathfrak{t}^*$ is the lattice defined by [3]. Thus, we can identify the group of characters $\widehat{T}$ with $X(T)$. In particular, this shows that $\widehat{T} \simeq \mathbb{Z}^{\dim T}$.
The second example is the group $G = SU(2)$, the simplest connected, simply connected nonabelian compact Lie group. Topologically, $G$ is a three-dimensional sphere since the general element of $G$ is a matrix of the form
$$g = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix}, \quad a, b \in \mathbb{C}, \ |a|^2 + |b|^2 = 1$$
Let $V$ be a two-dimensional complex vector space, realized by column vectors $\binom{u}{v}$. The group $G$ acts naturally on $V$. This action induces the representation $\Pi$ of $G$ in the space $S(V)$ of all polynomials in $u, v$. It is infinite dimensional, but has many f.d. subrepresentations. In particular, let $S^k(V)$, or simply $S^k$, be the space of all homogeneous polynomials of degree $k$. Clearly, $\dim S^k = k + 1$.
It turns out that the corresponding f.d. representations $(\Pi_k, S^k)$, $k \ge 0$, are irreducible, pairwise nonequivalent, and exhaust the set $\widehat{G}$ of all unirreps.
Some particular cases are of special interest:
1. $k = 0$. The space $V_0$ consists of constant functions and $\Pi_0$ is the trivial one-dimensional representation: $\Pi_0(g) \equiv 1$.
2. $k = 1$. The space $V_1$ is identical to $V$ and $\Pi_1$ is just the tautological representation $\pi(g) \equiv g$.
3. $k = 2$. The space $V_2$ is spanned by monomials $u^2$, $uv$, $v^2$. The remarkable fact is that this

representation is equivalent to a real one. Namely, in the new basis
$$x = \frac{u^2 + v^2}{2}, \quad y = \frac{u^2 - v^2}{2i}, \quad z = iuv$$
we have
$$\Pi_2 \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} = \begin{pmatrix} \mathrm{Re}(a^2 + b^2) & 2\,\mathrm{Im}(ab) & \mathrm{Im}(b^2 - a^2) \\ 2\,\mathrm{Im}(a\bar{b}) & |a|^2 - |b|^2 & 2\,\mathrm{Re}(a\bar{b}) \\ \mathrm{Im}(a^2 + b^2) & -2\,\mathrm{Re}(ab) & \mathrm{Re}(a^2 - b^2) \end{pmatrix}$$
This formula defines a homomorphism $\Pi_2 : SU(2) \to SO(3)$. It can be shown that this homomorphism is surjective, and its kernel is the subgroup $\{\pm 1\} \subset SU(2)$:
$$1 \to \{\pm 1\} \hookrightarrow SU(2) \xrightarrow{\ \Pi_2\ } SO(3) \to 1$$
The simplest way to see it is to establish the equivalence of $\Pi_2$ with the adjoint representation of $G$ in $\mathfrak{g}$. The corresponding intertwiner is
$$S^2 \ni (\beta + i\gamma)u^2 + 2i\alpha\, uv + (\beta - i\gamma)v^2 \ \longleftrightarrow\ \begin{pmatrix} i\alpha & \beta + i\gamma \\ -\beta + i\gamma & -i\alpha \end{pmatrix} \in \mathfrak{g}$$
Note that $SU(2)$ and $SO(3)$ are the only compact groups associated with the Lie algebra $\mathfrak{sl}(2, \mathbb{C})$.
The group $G$ contains the subgroup $H$ of diagonal matrices, isomorphic to $\mathbb{T}^1$. Consider the restriction of $\Pi_n$ to $\mathbb{T}^1$. It splits into the sum of unirreps $\pi_k$ as follows (the sum has $n + 1 = \dim S^n$ terms):
$$\mathrm{Res}^G_{\mathbb{T}^1}\, \Pi_n = \sum_{s=0}^{n} \pi_{n-2s}$$
The characters $\pi_k$ which enter this decomposition are called the weights of $\Pi_n$. The collection of all weights (together with multiplicities) forms a multiset in $\widehat{\mathbb{T}}$ denoted by $P(\Pi_n)$ or $P(S^n)$.
Note the following features of this multiset:
1. $P(\Pi_n)$ is invariant under the reflection $k \mapsto -k$.
2. All weights of $\Pi_n$ are congruent modulo 2.
3. Nonequivalent unirreps have different multisets of weights.
Below we show how these features are generalized to all compact connected Lie groups.

Fourier Transform
Haar Measure and Invariant Integral
The important feature of compact groups is the existence of the so-called ‘‘invariant integral,’’ or ‘‘average.’’
Theorem 12 For every compact Lie group $G$, there exists a unique measure $dg$ on $G$, called the ‘‘Haar measure,’’ which is invariant under left shifts $L_g : h \mapsto gh$ and satisfies $\int_G dg = 1$.
In addition, this measure is also invariant under right shifts $h \mapsto hg$ and under the involution $h \mapsto h^{-1}$.
Invariance of the Haar measure implies that for every integrable function $f(g)$, we have
$$\int_G f(g)\, dg = \int_G f(hg)\, dg = \int_G f(gh)\, dg = \int_G f(g^{-1})\, dg$$
For a finite group $G$, the integral with respect to the Haar measure is just averaging over the group:
$$\int_G f(g)\, dg = \frac{1}{|G|} \sum_{g \in G} f(g)$$
For compact connected Lie groups, the Haar measure is given by a differential form of top degree which is invariant under right and left translations.
For a torus $T^n = \mathbb{R}^n / \mathbb{Z}^n$ with real coordinates $\theta_k \in \mathbb{R}/\mathbb{Z}$ or complex coordinates $t_k = e^{2\pi i \theta_k}$, the Haar measure is $d^n\theta := d\theta_1\, d\theta_2 \cdots d\theta_n$ or
$$d^n t := \prod_{k=1}^{n} \frac{dt_k}{2\pi i\, t_k}$$
In particular, consider a central function $f$ (see Theorem 9). Since every conjugacy class contains elements of the maximal torus $T$ (see Theorem 5), such a function is determined by its values on $T$, and the integral of a central function can be reduced to integration over $T$. The resulting formula is called the ‘‘Weyl integration formula.’’ For $G = U(n)$ it looks as follows:
$$\int_{U(n)} f(g)\, dg = \frac{1}{n!} \int_T f(t) \prod_{i<j} |t_i - t_j|^2\, d^n t$$
where $T$ is the maximal torus consisting of diagonal matrices
$$t = \mathrm{diag}(t_1, \ldots, t_n), \quad t_k = e^{2\pi i \theta_k}$$
and $d^n t$ is defined above.
The Weyl integration formula for an arbitrary compact group $G$ can be found in Simon (1996) or Bump (2004, section 18).
The main applications of the Haar measure are the proof of the complete reducibility theorem (Theorem 11) and the orthogonality relations (see below).

Orthogonality Relations and Peter–Weyl Theorem
Let $V_1$, $V_2$ be unirreps of a compact group $G$. Taking any linear operator $A : V_1 \to V_2$ and averaging the expression $A(g) := \pi_2(g^{-1}) \circ A \circ \pi_1(g)$ over

$G$, we get an intertwining operator $\langle A \rangle = \int_G A(g)\, dg$. Comparing this fact with the Schur lemma, one obtains the following fundamental results.
Let $(\pi, V)$ be any unirrep of a compact group $G$. Choose any orthonormal basis $\{v_k,\ 1 \le k \le \dim V\}$ in $V$ and denote by $t^V_{kl}$, or $t^\pi_{kl}$, the function on $G$ defined by
$$t^V_{kl}(g) = (\pi(g) v_l, v_k)$$
The functions $t^V_{kl}$ are called ‘‘matrix elements’’ of the unirrep $(\pi, V)$.
Theorem 13 (Orthogonality relations)
(i) The matrix elements $t^V_{kl}$ are pairwise orthogonal and have norm $(\dim V)^{-1/2}$ in $L^2(G, dg)$.
(ii) The matrix elements corresponding to equivalent unirreps span the same subspace in $L^2(G, dg)$.
(iii) The matrix elements of two nonequivalent unirreps are orthogonal.
(iv) The linear span of all matrix elements of all unirreps is dense in $C(G)$, $C^\infty(G)$, and in $L^2(G, dg)$ (generalized Peter–Weyl theorem).
In particular, this theorem implies that the set $\widehat{G}$ of equivalence classes of unirreps is countable. For an f.d. representation $(\pi, V)$ we introduce the character of $\pi$ as a function
$$\chi_\pi(g) = \mathrm{tr}\, \pi(g) = \sum_{k=1}^{\dim V} t^\pi_{kk}(g) \qquad [5]$$
It is obviously a central function on $G$.
Remark Traditionally, in representation theory the word ‘‘character’’ has two different meanings: (1) a multiplicative map from a group to $U(1)$, and (2) the trace of a representation operator $\pi(g)$. For one-dimensional representations both notions coincide.
From the orthogonality relations we get the following result.
Corollary The characters of unirreps of $G$ form an orthonormal basis in the subspace of central functions in $L^2(G, dg)$.

Noncommutative Fourier Transform
The noncommutative Fourier transform on a compact group $G$ is defined as follows. Let $\widehat{G}$ denote the set of equivalence classes of unirreps of $G$. Choose for any $\lambda \in \widehat{G}$ a representation $(\pi_\lambda, V_\lambda)$ of class $\lambda$ and an orthonormal basis in $V_\lambda$. Denote by $d(\lambda)$ the dimension of $V_\lambda$.
We introduce the Hilbert space $L^2(\widehat{G})$ as the space of matrix-valued functions on $\widehat{G}$ whose value at a point $\lambda \in \widehat{G}$ belongs to $\mathrm{Mat}_{d(\lambda)}(\mathbb{C})$. The norm is defined as
$$\|F\|^2_{L^2(\widehat{G})} = \sum_{\lambda \in \widehat{G}} d(\lambda) \cdot \mathrm{tr}\big(F(\lambda) F(\lambda)^*\big)$$
For a function $f$ on $G$ define its Fourier transform $\tilde{f}$ as a matrix-valued function on $\widehat{G}$:
$$\tilde{f}(\lambda) = \int_G f(g^{-1})\, \pi_\lambda(g)\, dg$$
Note that in the case $G = \mathbb{T}^1$ this transform associates to a function $f$ the set of its Fourier coefficients. In general this transform keeps some important features of Fourier coefficients.
Theorem 14
(i) For a function $f \in L^1(G, dg)$ the Fourier transform $\tilde{f}$ is a well-defined and bounded (by matrix norm) function on $\widehat{G}$.
(ii) For a function $f \in L^1(G, dg) \cap L^2(G, dg)$ the following analog of the Plancherel formula holds:
$$\|f\|^2_{L^2(G, dg)} := \int_G |f(g)|^2\, dg = \sum_{\lambda \in \widehat{G}} d(\lambda) \cdot \mathrm{tr}\big(\tilde{f}(\lambda) \tilde{f}(\lambda)^*\big) =: \|\tilde{f}\|^2_{L^2(\widehat{G})}$$
(iii) The following inversion formula expresses $f$ in terms of $\tilde{f}$:
$$f(g) = \sum_{\lambda \in \widehat{G}} d(\lambda) \cdot \mathrm{tr}\big(\tilde{f}(\lambda)\, \pi_\lambda(g)\big)$$
(iv) The Fourier transform sends the convolution to the matrix multiplication:
$$\widetilde{f_1 * f_2} = \tilde{f}_1 \cdot \tilde{f}_2$$
where the convolution product is defined by
$$(f_1 * f_2)(h) = \int_G f_1(hg)\, f_2(g^{-1})\, dg$$
Note the special case of the inversion formula for $g = e$:
$$f(e) = \sum_{\lambda \in \widehat{G}} d(\lambda) \cdot \mathrm{tr}\big(\tilde{f}(\lambda)\big),$$
or
$$\delta(g) = \sum_{\lambda \in \widehat{G}} d(\lambda) \cdot \chi_\lambda(g)$$
where $\delta(g)$ is Dirac's delta-function: $\int_G f(g)\, \delta(g)\, dg = f(e)$. Thus, we get a presentation of Dirac's delta-function as a linear combination of characters.
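As a small sanity check of the Corollary on characters and of the delta-function presentation, the following sketch works them out for the finite group $S_3$, where the Haar integral is just the average over the six group elements. The choice of $S_3$, its standard character table, and all names in the code are illustrative assumptions and are not taken from the article.

```python
from fractions import Fraction

# Illustrative assumption: G = S3 with conjugacy classes
# (identity, transpositions, 3-cycles) of sizes 1, 3, 2.
class_sizes = [1, 3, 2]
order = sum(class_sizes)  # |G| = 6

# Standard character table of S3, one value per conjugacy class.
characters = {
    "trivial": [1, 1, 1],
    "sign": [1, -1, 1],
    "standard": [2, 0, -1],
}


def inner(chi, psi):
    """Haar inner product (1/|G|) * sum_g chi(g) psi(g); these characters are real."""
    return sum(Fraction(n * a * b, order) for n, a, b in zip(class_sizes, chi, psi))


# Gram matrix of the characters: should be the identity (orthonormality).
gram = {(p, q): inner(characters[p], characters[q])
        for p in characters for q in characters}


def delta(class_index):
    """sum over unirreps of d(lambda) * chi_lambda(g), with d(lambda) = chi_lambda(e)."""
    return sum(chi[0] * chi[class_index] for chi in characters.values())


# With the normalized Haar measure, delta(e) = |G| and delta vanishes elsewhere.
delta_values = [delta(i) for i in range(len(class_sizes))]
```

With the normalization $\int_G dg = 1$, the delta-function of a finite group takes the value $|G|$ at the identity and zero elsewhere, which is what `delta_values` reproduces.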

Classification of Finite-Dimensional Representations
In this section, we give a classification of unirreps of a connected compact Lie group $G$.

Weight Decomposition
Let $G$ be a connected compact group with maximal torus $T$, and let $(\pi, V)$ be a f.d. representation of $G$. Restricting it to $T$ and using complete reducibility, we get the following result.
Theorem 15 The vector space $V$ can be written in the form
$$V = \bigoplus_{\lambda \in X(T)} V_\lambda, \qquad V_\lambda = \{v \in V \mid \pi_*(t) v = \langle \lambda, t \rangle v \ \ \forall t \in \mathfrak{t}\} \qquad [6]$$
where $X(T)$ is the character group of $T$ defined by [3].
The spaces $V_\lambda$ are called ‘‘weight subspaces,’’ vectors $v \in V_\lambda$ – ‘‘weight vectors’’ of weight $\lambda$. The set
$$P(V) = \{\lambda \in X(T) \mid V_\lambda \ne \{0\}\} \qquad [7]$$
is called the ‘‘set of weights’’ of $\pi$, or the ‘‘spectrum’’ of $\mathrm{Res}^G_T\, \pi$, and
$$\mathrm{mult}_{(\pi, V)}(\lambda) := \dim V_\lambda$$
is called the ‘‘multiplicity’’ of $\lambda$ in $V$.
The next theorem easily follows from the definition of the Weyl group.
Theorem 16 For any f.d. representation $V$ of $G$, the set of weights with multiplicities is invariant under the action of the Weyl group:
$$w(P(V)) = P(V), \qquad \mathrm{mult}_{(\pi, V)}(\lambda) = \mathrm{mult}_{(\pi, V)}(w(\lambda))$$
for any $w \in W$.

Classification of Unirreps
Recall that $R$ is the root system of $\mathfrak{g}_{\mathbb{C}}$. Assume that we have chosen a basis of simple roots $\alpha_1, \ldots, \alpha_r \subset R$. Then $R = R_+ \cup R_-$; roots $\alpha \in R_+$ can be written as a linear combination of simple roots with positive coefficients, and $R_- = -R_+$.
A (not necessarily f.d.) representation of $\mathfrak{g}_{\mathbb{C}}$ is called a ‘‘highest-weight representation’’ if it is generated by a single vector $v \in V_\lambda$ (the highest-weight vector) such that $\mathfrak{g}_\alpha v = 0$ for all positive roots $\alpha \in R_+$.
It can be shown that for every $\lambda \in X(T)$, there is a unique irreducible highest-weight representation of $\mathfrak{g}_{\mathbb{C}}$ with highest weight $\lambda$, which is denoted $L(\lambda)$. However, this representation can be infinite dimensional; moreover, it may not be possible to lift it to a representation of $G$.
Definition 5 A weight $\lambda \in X(T)$ is called ‘‘dominant’’ if $\langle \lambda, \alpha_i^\vee \rangle \in \mathbb{Z}_+$ for any simple root $\alpha_i$. The set of all dominant weights is denoted by $X_+(T)$.
Theorem 17
(i) All weights of $L(\lambda)$ are of the form $\mu = \lambda - \sum n_i \alpha_i$, $n_i \in \mathbb{Z}_+$.
(ii) Let $\lambda \in X_+$. Then the irreducible highest-weight representation $L(\lambda)$ is f.d. and lifts to a representation of $G$.
(iii) Every irreducible f.d. representation of $G$ is of the form $L(\lambda)$ for some $\lambda \in X_+$.
Thus, we have a bijection $\{\text{unirreps of } G\} \leftrightarrow X_+$.
Example 7 Let $G = SU(2)$. There is a unique simple root $\alpha$ and a unique fundamental weight $\omega$, related by $\alpha = 2\omega$. Therefore, $X_+ = \mathbb{Z}_+ \cdot \omega$ and unirreps are indexed by non-negative integers. The representation with highest weight $k \cdot \omega$ is precisely the representation $\Pi_k$ constructed in the subsection ‘‘Examples of representations.’’
Example 8 Let $G = U(n)$. Then $X = \mathbb{Z}^n$, and $X_+ = \{(\lambda_1, \ldots, \lambda_n) \in \mathbb{Z}^n \mid \lambda_1 \ge \cdots \ge \lambda_n\}$. Such objects are well known in combinatorics: if we additionally assume that $\lambda_n \ge 0$, then such dominant weights are in bijection with partitions with at most $n$ parts. They can also be described by ‘‘Young diagrams’’ with at most $n$ rows (see Fulton and Harris (1991)).

Explicit Construction of Representations
In addition to the description of unirreps as highest-weight representations, they can also be constructed in other ways. In particular, they can be defined analytically as follows. Let $B = H N_+$ be the Borel subgroup in $G_{\mathbb{C}}$; here $H = \exp \mathfrak{h}$, $N_+ = \exp \sum_{\alpha \in R_+} (\mathfrak{g}_{\mathbb{C}})_\alpha$. For $\lambda \in \mathfrak{h}^*$, let $\chi_\lambda : B \to \mathbb{C}^\times$ be a multiplicative map defined by
$$\chi_\lambda(hn) = e^{\langle h, \lambda \rangle} \qquad [8]$$
Theorem 18 (Cartan–Borel–Weil). Let $\lambda \in X(T)$. Denote by $V(\lambda)$ the space of complex-analytic functions on $G_{\mathbb{C}}$ which satisfy the following transformation property:
$$f(gb) = \chi_\lambda^{-1}(b)\, f(g), \quad g \in G_{\mathbb{C}}, \ b \in B$$
The group $G_{\mathbb{C}}$ acts on $V(\lambda)$ by left shifts:
$$(\pi(g) f)(h) = f(g^{-1} h) \qquad [9]$$


Then
(i) $V(\lambda) \ne \{0\}$ iff $-\lambda \in X_+$.
(ii) If $-\lambda \in X_+$, the representation of $G$ in $V(\lambda)$ is equivalent to $L(w_0(\lambda))$, where $w_0 \in W$ is the unique element of the Weyl group which sends $R_+$ to $R_-$.
This theorem can also be reformulated in more geometric terms: the spaces $V(\lambda)$ are naturally interpreted as spaces of global sections of appropriate line bundles on the ‘‘flag variety’’ $\mathcal{B} = G_{\mathbb{C}}/B = G/T$.
For classical groups, irreducible representations can also be constructed explicitly as the subspaces in tensor powers $(\mathbb{C}^n)^{\otimes k}$, transforming in a certain way under the action of the symmetric group $S_k$.

Characters and Multiplicities
Characters
Let $(\pi, V)$ be a f.d. representation of $G$ and let $\chi_\pi$ be its character as defined by [5]. Since $\chi_\pi$ is central, and every element in $G$ is conjugate to an element of $T$, $\chi_\pi$ is completely determined by its restriction to $T$, which can be computed from the weight decomposition [6]:
$$\chi_\pi|_T = \sum_{\lambda \in X(T)} \dim V_\lambda \cdot e^\lambda = \sum_{\lambda \in X(T)} \mathrm{mult}_\pi \lambda \cdot e^\lambda \qquad [10]$$
where $e^\lambda$ is the function on $T$ defined by $e^\lambda(\exp(t)) = e^{\langle t, \lambda \rangle}$, $t \in \mathfrak{t}$. Note that $e^{\lambda + \mu} = e^\lambda e^\mu$ and that $e^0 = 1$.

Weyl Character Formula
Theorem 19 (Weyl character formula). Let $\lambda \in X_+$. Then
$$\chi_{L(\lambda)} = \frac{A_{\lambda + \rho}}{A_\rho}, \qquad A_\mu = \sum_{w \in W} \varepsilon(w)\, e^{w(\mu)}$$
where, for $w \in W$, we denote $\varepsilon(w) = \det w$ considered as a linear map $\mathfrak{t}^* \to \mathfrak{t}^*$, and $\rho = \frac{1}{2} \sum_{\alpha \in R_+} \alpha$.
In particular, computing the value of the character at the point $t = 0$ by L'Hôpital's rule, it is possible to deduce the following formula for the dimension of irreducible representations:
$$\dim L(\lambda) = \prod_{\alpha \in R_+} \frac{\langle \alpha^\vee, \lambda + \rho \rangle}{\langle \alpha^\vee, \rho \rangle} \qquad [11]$$
Example 9 Let $G = SU(2)$. Then the Weyl character formula gives, for the irreducible representation $\Pi_k$ with highest weight $k \cdot \omega$,
$$\chi_{\Pi_k} = \frac{x^{k+1} - x^{-(k+1)}}{x - x^{-1}} = x^k + x^{k-2} + \cdots + x^{-k}, \qquad x = e^\omega$$
which implies $\dim \Pi_k = k + 1$.
The Weyl character formula is equivalent to the following formula for weight multiplicities, due to Kostant:
$$\mathrm{mult}_{L(\lambda)}\, \mu = \sum_{w \in W} \varepsilon(w)\, K\big(w(\lambda + \rho) - \rho - \mu\big)$$
where $K$ is Kostant's partition function: $K(\tau)$ is the number of ways of writing $\tau$ as a sum of positive roots (with repetitions).
For classical Lie groups such as $G = U(n)$, there are more explicit combinatorial formulas for weight multiplicities; for $U(n)$, the answer can be written in terms of the number of ‘‘Young tableaux’’ of a given shape. Details can be found in Fulton and Harris (1991).

Tensor Product Multiplicities
Let $(\pi, V)$ be a f.d. representation of $G$. By complete reducibility, one can write $V = \bigoplus n_\lambda L(\lambda)$. The coefficients $n_\lambda$ are called multiplicities; finding them is an important problem in many applications. In particular, a special case of this is finding the multiplicities in the tensor product of two unirreps:
$$L(\lambda) \otimes L(\mu) = \bigoplus_\nu N^\nu_{\lambda\mu}\, L(\nu)$$
Characters provide a practical tool for computing multiplicities: since characters of unirreps are linearly independent, multiplicities can be found from the condition that $\chi_V = \sum n_\lambda \chi_{L(\lambda)}$. In particular,
$$\chi_{L(\lambda)}\, \chi_{L(\mu)} = \sum_\nu N^\nu_{\lambda\mu}\, \chi_{L(\nu)}$$
Example 10 For $G = SU(2)$, tensor product multiplicities are given by
$$\Pi_n \otimes \Pi_m = \bigoplus_l \Pi_l$$
where the sum is taken over all $l$ such that $|m - n| \le l \le m + n$ and $m + n + l$ is even.
For $G = U(n)$, there is an algorithm for finding the tensor product multiplicities, formulated in the language of Young tableaux (Littlewood–Richardson rule). There are also tables and computer programs for computing these multiplicities; some of them are listed in the bibliography.

See also: Classical Groups and Homogeneous Spaces; Combinatorics: Overview; Equivariant Cohomology and the Cartan Model; Finite Group Symmetry Breaking; Lie Groups: General Theory; Ljusternik–Schnirelman Theory; Noncommutative Geometry and the Standard Model; Optimal Cloning of Quantum States; Ordinary Special Functions; Quasiperiodic Systems; Symmetry Classes in Random Matrix Theory.

Further Reading
Bump D (2004) Lie Groups. New York: Springer.
Bröcker T and tom Dieck T (1995) Representations of Compact Lie Groups, Graduate Texts in Mathematics, vol. 98. New York: Springer.
Fulton W and Harris J (1991) Representation Theory. New York: Springer.
Knapp A (2002) Lie Groups beyond an Introduction, 2nd edn. Boston: Birkhäuser.
LiE: A Computer algebra package for Lie group computations, available from http://young.sp2mi.univ-poitiers.fr
McKay WG, Patera J, and Rand DW (1990) Tables of Representations of Simple Lie Algebras, vol. I. Exceptional Simple Lie Algebras. Montreal: CRM.
Serre J-P (2001) Complex Semisimple Lie Algebras. Berlin: Springer.
Simon B (1996) Representations of Finite and Compact Groups. Providence, RI: American Mathematical Society.
Zelobenko DP (1973) Compact Lie Groups and Their Representations. Providence, RI: American Mathematical Society.
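The character computations for $SU(2)$ described in this article (the Weyl character formula and the tensor product rule, with $x = e^\omega$) can be checked mechanically. The sketch below stores a character as a dictionary mapping the exponent of $x$ to its coefficient; this encoding and the function names `character`, `multiply` and `decompose` are illustrative assumptions, not part of the article or of any package listed above.

```python
def character(k):
    """Character of the (k+1)-dimensional unirrep: x^k + x^(k-2) + ... + x^(-k)."""
    return {k - 2 * s: 1 for s in range(k + 1)}


def multiply(p, q):
    """Product of two Laurent polynomials in x, dropping zero coefficients."""
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            out[a + b] = out.get(a + b, 0) + ca * cb
    return {w: c for w, c in out.items() if c != 0}


def decompose(chi):
    """Write chi as a sum of irreducible characters by peeling off highest weights."""
    remaining = dict(chi)
    result = {}
    while any(c != 0 for c in remaining.values()):
        top = max(w for w, c in remaining.items() if c != 0)
        if top < 0:
            raise ValueError("input is not a character of SU(2)")
        mult = remaining[top]
        result[top] = mult
        for w, c in character(top).items():
            remaining[w] = remaining.get(w, 0) - mult * c
    return result


# Weyl character formula for k = 4: (x - 1/x) * chi_4 = x^5 - x^(-5).
weyl_numerator = multiply({1: 1, -1: -1}, character(4))

# Tensor product of the unirreps with highest weights 2 and 3.
product_decomposition = decompose(multiply(character(2), character(3)))
```

The decomposition relies only on the linear independence of irreducible characters: the largest exponent still present must be a highest weight, so its character is subtracted and the process repeats.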

Compactification of Superstring Theory


M R Douglas, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction
Superstring theories and M-theory, at present the best candidate quantum theories which unify gravity, Yang–Mills fields, and matter, are directly formulated in ten and eleven spacetime dimensions. To obtain a candidate theory of our four-dimensional universe, one must find a solution of one of these theories whose low-energy physics is well described by a four-dimensional effective field theory (EFT), containing the well-established standard model (SM) of particle physics coupled to Einstein's general relativity (GR). The standard paradigm for finding such solutions is compactification, along the lines originally proposed by Kaluza and Klein in the context of higher-dimensional general relativity. One postulates that the underlying $D$-dimensional spacetime is a product of four-dimensional Minkowski spacetime with a $(D - 4)$-dimensional compact and small Riemannian manifold $K$. One then finds that low-energy physics effectively averages over $K$, leading to a four-dimensional EFT whose field content and Lagrangian are determined in terms of the topology and geometry of $K$.
Of the huge body of prior work on this subject, the part most relevant for string/M-theory is supergravity compactification, as in the limit of low energies, small curvatures and weak coupling, the various string theories and M-theory reduce to ten- and eleven-dimensional supergravity theories. Many of the qualitative features of string/M-theory compactification, and a good deal of what is known quantitatively, can be understood simply in terms of compactification of these field theories, with the addition of a few crucial ingredients from string/M-theory. Thus, most of this article will restrict attention to this case, leaving many ‘‘stringy’’ topics to the articles on conformal field theory, topological string theory, and so on. We also largely restrict attention to compactifications based on Ricci-flat compact spaces. There is an equally important class in which $K$ has positive curvature; these lead to anti-de Sitter (AdS) spacetimes and are discussed in the article on AdS/CFT (see AdS/CFT Correspondence).
After a general review, we begin with compactification of the heterotic string on a three complex dimensional Calabi–Yau manifold. This was the first construction which led convincingly to the SM, and remains one of the most important examples. We then survey the various families of compactifications to higher dimensions, with an eye on the relations between these compactifications which follow from superstring duality. We then discuss some of the phenomena which arise in the regimes of large curvature and strong coupling. In the final section, we bring these ideas together in a survey of the various known four-dimensional constructions.

General Framework
Let us assume we are given a $D$- ($= d + k$) dimensional field theory $\mathcal{T}$. A compactification is then a $D$-dimensional spacetime which is topologically the product of a $d$-dimensional spacetime with a $k$-dimensional manifold $K$, the compactification or ‘‘internal’’ manifold, carrying a Riemannian metric and with definite expectation values for all other fields in $\mathcal{T}$. These must solve the equations of motion, and preserve $d$-dimensional Poincaré invariance (or, perhaps, another $d$-dimensional symmetry group).

The most general metric ansatz for a Poincaré invariant compactification is
$$G_{IJ} = \begin{pmatrix} f\, \eta_{\mu\nu} & 0 \\ 0 & G_{ij} \end{pmatrix}$$
where the tangent space indices are $0 \le I < d + k = D$, $0 \le \mu < d$, and $1 \le i \le k$. Here $\eta_{\mu\nu}$ is the Minkowski metric, $G_{ij}$ is a metric on $K$, and $f$ is a real-valued function on $K$ called the ‘‘warp factor.’’
As the simplest example, consider pure $D$-dimensional GR. In this case, Einstein's equations reduce to Ricci flatness of $G_{IJ}$. Given our metric ansatz, this requires $f$ to be constant, and the metric $G_{ij}$ on $K$ to be Ricci flat. Thus, any $K$ which admits such a metric, for example, the $k$-dimensional torus, will lead to a compactification.
Typically, if a manifold admits a Ricci-flat metric, it will not be unique; rather there will be a moduli space of such metrics. Physically, one then expects to find solutions in which the choice of Ricci-flat metric is slowly varying in $d$-dimensional spacetime. General arguments imply that such variations must be described by variations of $d$-dimensional fields, governed by an EFT. Given an explicit parametrization of the family of metrics, say $G_{ij}(\phi^\alpha)$ for some parameters $\phi^\alpha$, in principle the EFT could be computed explicitly by promoting the parameters to $d$-dimensional fields, substituting this parametrization into the $D$-dimensional action, and expanding in powers of the $d$-dimensional derivatives. In pure GR, we would find the four-dimensional effective Lagrangian
$$\mathcal{L}_{\rm EFT} = \int d^k y\, \sqrt{\det G(\phi)}\, R^{(4)} + \sqrt{\det G(\phi)}\, G^{ik}(\phi)\, G^{jl}(\phi)\, \frac{\partial G_{ij}}{\partial \phi^\alpha} \frac{\partial G_{kl}}{\partial \phi^\beta}\, \partial_\mu \phi^\alpha\, \partial^\mu \phi^\beta + \cdots \qquad [1]$$
While this is easily evaluated for $K$ a symmetric space or torus, in general a direct computation of $\mathcal{L}_{\rm EFT}$ is impossible. This becomes especially clear when one learns that the Ricci-flat metrics $G_{ij}$ are not explicitly known for the examples of interest. Nevertheless, clever indirect methods have been found that give a great deal of information about $\mathcal{L}_{\rm EFT}$; this is much of the art of superstring compactification. However, in this section, let us ignore this point and continue as if we could do such computations explicitly.
Given a solution, one proceeds to consider its small perturbations, which satisfy the linearized equations of motion. If these include exponentially growing modes (often called ‘‘tachyons’’), the solution is unstable. (Note that this criterion is modified for AdS compactifications.) The remaining perturbations can be divided into massless fields, corresponding to zero modes of the linearized equations of motion on $K$, and massive fields, the others. General results on eigenvalues of Laplacians imply that the masses of massive fields depend on the diameter of $K$ as $m \sim 1/\mathrm{diam}(K)$, so at energies far smaller than $m$, they cannot be excited (this is not universal; given strong negative curvature on $K$, or a rapidly varying warp factor, one can have perturbations of small nonzero mass). Thus, the massive fields can be ‘‘integrated out,’’ to leave an EFT with a finite number of fields. In the classical approximation, this simply means solving their equations of motion in terms of the massless fields, and using these solutions to eliminate them from the action. At leading order in an expansion around a solution, these fields are zero and this step is trivial; nevertheless, it is useful in making a systematic definition of the interaction terms in the EFT.
As we saw in pure GR, the configuration space parametrized by the massless fields in the EFT is the moduli space of compactifications obtained by deforming the original solution. Thus, from a mathematical point of view, low-energy EFT can be thought of as a sort of enhancement of the concept of moduli space, and a dictionary can be set up between mathematical and physical languages. To give its next entry, there is a natural physical metric on moduli space, defined by restriction from the metric on the configuration space of the theory $\mathcal{T}$; this becomes the sigma-model metric for the scalars in the EFT. Because the theories $\mathcal{T}$ arising from string theory are geometrically natural, this metric is also natural from a mathematical point of view, and one often finds that much is already known about it. For example, the somewhat fearsome two-derivative terms in eqn [1] are (perhaps) less so when one realizes that this is an explicit expression for the Weil–Petersson metric on the moduli space of Ricci-flat metrics. In any case, knowing this dictionary is essential for taking advantage of the literature.
Another important entry in this dictionary is that the automorphism group of a solution translates into the gauge group in the EFT. This can be either continuous, leading to the gauge symmetry of Maxwell and Yang–Mills theories, or discrete, leading to discrete gauge symmetry. For example, if the metric on $K$ has continuous isometry group $G$, the resulting EFT will have gauge symmetry $G$, as in the original example of Kaluza and Klein with $K \cong S^1$ and $G \cong U(1)$. Mathematically, these phenomena of ‘‘enhanced symmetry’’ are often treated using the languages of equivariant theories (cohomology, K-theory, etc.), stacks, and so on.

To give another example, obstructed deformations (solutions of the linearized equations which do not correspond to elements of the tangent space of the true moduli space) correspond to scalar fields which, while massless, appear in the effective potential in a way which prevents giving them expectation values. Since the quadratic terms $V''$ are masses, this dependence must be at cubic or higher order.
While the preceding concepts are general and apply to compactification of all local field theories, string and M-theory add some particular ingredients to this general recipe. In the limits of small curvatures and weak coupling, string and M-theory are well described by the ten- and 11-dimensional supergravity theories, and thus the string/M-theory discussion usually starts with Kaluza–Klein compactification of these theories, which we denote I, IIa, IIb, HE, HO and M. Let us now discuss a particular example.

Calabi–Yau Compactification of the Heterotic String
Contact with the SM requires finding compactifications to $d = 4$ either without supersymmetry, or with at most $N = 1$ supersymmetry, because the SM includes chiral fermions, which are incompatible with $N > 1$. Let us start with the $E_8 \times E_8$ heterotic string or ‘‘HE’’ theory. This choice is made rather than HO because only in this case can we find the SM fermion representations as subrepresentations of the adjoint of the gauge group.
Besides the metric, the other bosonic fields of the HE supergravity theory are a scalar $\Phi$ called the dilaton, Yang–Mills gauge potentials for the group $G \equiv E_8 \times E_8$, and a 2-form gauge potential $B$ (often called the ‘‘Neveu–Schwarz’’ or ‘‘NS’’ 2-form) whose defining characteristic is that it minimally couples to the heterotic string world-sheet. We will need their gauge field strengths below: for Yang–Mills, this is a 2-form $F^a_{IJ}$ with $a$ indexing the adjoint of $\mathrm{Lie}\, G$, and for the NS 2-form this is a 3-form $H_{IJK}$. Denoting the two Majorana–Weyl spinor representations of $SO(1, 9)$ as $S$ and $C$, the fermions are the gravitino $\psi_I \in S \otimes V$, a spin 1/2 ‘‘dilatino’’ $\lambda \in C$, and the adjoint gauginos $\chi^a \in S$. We use $\Gamma^I$ to denote Dirac matrices contracted with a ‘‘zehnbein,’’ satisfying $\{\Gamma_I, \Gamma_J\} = 2 G_{IJ}$, and $\Gamma^{IJ} = (1/2)[\Gamma^I, \Gamma^J]$, etc.
A local supersymmetry transformation with parameter $\epsilon$ is then
$$\delta\psi_I = D_I \epsilon + \tfrac{1}{8} H_{IJK} \Gamma^{JK} \epsilon \qquad [2]$$
$$\delta\lambda = \partial_I \Phi\, \Gamma^I \epsilon - \tfrac{1}{12} H_{IJK} \Gamma^{IJK} \epsilon \qquad [3]$$
$$\delta\chi^a = F^a_{IJ} \Gamma^{IJ} \epsilon \qquad [4]$$
We now assume $N = 1$ supersymmetry. An unbroken supersymmetry is a spinor $\epsilon$ for which the left-hand side is zero, so we seek compactifications with a unique solution of these equations.
We first discuss the case $H = 0$. Setting $\delta\psi_\mu$ in eqn [2] to zero, we find that the warp factor $f$ must be constant. The vanishing of $\delta\psi_i$ requires $\epsilon$ to be a covariantly constant spinor. For a six-dimensional $M$ to have a unique such spinor, it must have $SU(3)$ holonomy; in other words, $M$ must be a Calabi–Yau manifold. In the following, we use basic facts about their geometry.
The vanishing of $\delta\lambda$ then requires constant dilaton $\Phi$, while the vanishing of $\delta\chi^a$ requires the gauge field strength $F$ to solve the hermitian Yang–Mills equations,
$$F^{2,0} = F^{0,2} = F^{1,1} = 0$$
By the theorem of Donaldson and Uhlenbeck–Yau, such solutions are in one-to-one correspondence with $\mu$-stable holomorphic vector bundles with structure group $H$ contained in the complexification of $G$. Choose such a bundle $E$; by the general discussion above, the commutant of $H$ in $G$ will be the automorphism group of the connection on $E$ and thus the low-energy gauge group of the resulting EFT. For example, since $E_8$ has a maximal $E_6 \times SU(3)$ subgroup, if $E$ has structure group $H = SL(3)$, there is an embedding such that the unbroken gauge symmetry is $E_6 \times E_8$, realizing one of the standard grand unified groups $E_6$ as a factor.
The choice of $E$ is constrained by anomaly cancellation. This discussion (Green et al. 1987) modifies the Bianchi identity for $H$ to
$$dH = \mathrm{tr}\, R \wedge R - \frac{1}{30} \sum_a F^a \wedge F^a \qquad [5]$$
where $R$ is the matrix of curvature 2-forms. The normalization of the $F \wedge F$ term is such that if we take $E \cong TK$ the holomorphic tangent bundle of $K$, with isomorphic connection, then using the embedding we just discussed, we obtain a solution of eqn [5] with $H = 0$.
Thus, we have a complete solution of the equations of motion. General arguments imply that supersymmetric Minkowski solutions are stable, so the small fluctuations consist of massless and massive fields. Let us now discuss a few of the massless fields. Since the EFT has $N = 1$ supersymmetry, the massless scalars live in chiral multiplets, which are local coordinates on a complex Kähler manifold.
First, the moduli of Ricci-flat metrics on $K$ will lead to massless scalar fields: the complex structure

moduli, which are naturally complex, and Kähler moduli, which are not. However, in string compactification the latter are complexified to the periods of the 2-form $B + iJ$ integrated over a basis of $H_2(K, \mathbb{Z})$, where $J$ is the Kähler form and $B$ is the NS 2-form. In addition, there is a complex field pairing the dilaton (actually, $\exp(-\Phi)$) and the “model-independent axion,” the scalar dual in d = 4 to the 2-form $B_{\mu\nu}$. Finally, each complex modulus of the holomorphic bundle $E$ will lead to a chiral multiplet. Thus, the total number of massless uncharged chiral multiplets is $1 + h^{1,1}(K) + h^{2,1}(K) + \dim H^1(K, \mathrm{End}(E))$.

Massless charged matter will arise from zero modes of the gauge field and its supersymmetric partner spinor $\chi^a$. It is slightly easier to discuss the spinor, and then appeal to supersymmetry to get the bosons. Decomposing the spinors of SO(6) under SU(3), one obtains $(0, p)$ forms, and the Dirac equation becomes the condition that these forms are harmonic. By the Hodge theorem, these are in one-to-one correspondence with classes in Dolbeault cohomology $H^{0,p}(K, V)$, for some bundle $V$. The bundle $V$ is obtained by decomposing the spinor into representations of the holonomy group of $E$. For $H = SU(3)$, the decomposition of the adjoint under the embedding of $SU(3) \times E_6$ in $E_8$,

$$248 = (8, 1) + (1, 78) + (3, 27) + (\bar{3}, \overline{27}) \qquad [6]$$

implies that charged matter will form “generations” in the $27$, of number $\dim H^{0,1}(K, E)$, and “antigenerations” in the $\overline{27}$, of number $\dim H^{0,1}(K, \bar{E}) = \dim H^{0,2}(K, E)$. The difference in these numbers is determined by the Atiyah–Singer index theorem to be

$$N_{\mathrm{gen}} \equiv N_{27} - N_{\overline{27}} = \tfrac{1}{2} c_3(E)$$

In the special case of $E \cong TK$, these numbers are separately determined to be $N_{27} = b^{1,1}$ and $N_{\overline{27}} = b^{2,1}$, so their difference is $\chi(K)/2$, half the Euler number of K. In the real world, this number is $N_{\mathrm{gen}} = 3$, and matching this under our assumptions so far is very constraining.

Substituting these zero modes into the ten-dimensional Yang–Mills action and integrating, one can derive the d = 4 EFT. For example, the cubic terms in the superpotential, usually called Yukawa couplings after the corresponding fermion–boson interactions in the component Lagrangian, are obtained from the cubic product of zero modes

$$\int_K \Omega \wedge \mathrm{tr}(\phi_1 \wedge \phi_2 \wedge \phi_3)$$

where $\Omega$ is the holomorphic 3-form, $\phi_i \in H^{0,1}(K, \mathrm{Rep}\,E)$ are the zero modes, and tr arises from decomposing the $E_8$ cubic group invariant.

Note the very important fact that this expression only depends on the cohomology classes of the $\phi_i$ (and $\Omega$). This means the Yukawa couplings can be computed without finding the explicit harmonic representatives, which is not possible (we do not even know the explicit metric). More generally, one expects to be able to explicitly compute the superpotential and all other holomorphic quantities in the effective Lagrangian solely from “topological” information (the Dolbeault cohomology ring, and its generalizations within topological string theory). On the other hand, computing the Kähler metric in an N = 1 EFT is usually out of reach as it would require having explicit normalized zero modes. Most results for this metric come from considering closely related compactifications with extended supersymmetry, and arguing that the breaking to N = 1 supersymmetry makes small corrections to this.

There are several generalizations of this construction. First, the necessary condition to solve eqn [5] is that the left-hand side be exact, which requires

$$c_2(E) = c_2(TK) \qquad [7]$$

This allows for a wide variety of $E$'s to be used, so that $N_{\mathrm{gen}} = 3$ can be attained with many more K's. This class of models is often called “(0, 2) compactification” to denote the world-sheet supersymmetry of the heterotic string in these backgrounds. One can also use bundles with larger structure group; for example, $H = SL(4)$ leads to unbroken $SO(10) \times E_8$, and $H = SL(5)$ leads to unbroken $SU(5) \times E_8$.

The subsequent breaking of the grand unified group to the SM gauge group is typically done by choosing K with nontrivial $\pi_1$, so that it admits a flat line bundle $W$ with nontrivial holonomy (usually called a “Wilson line”). One then uses the bundle $E \otimes W$ in the above discussion, to obtain the commutant of $H \otimes W$ as gauge group. For example, if $\pi_1(K) \cong \mathbb{Z}_5$, one can use $W$ whose holonomy is an element of order 5 in SU(5), to obtain as commutant the SM gauge group $SU(3) \times SU(2) \times U(1)$.

Another generalization is to take the 3-form $H \neq 0$. This discussion begins by noting that, for supersymmetry, we still require the existence of a unique spinor $\epsilon$; however, it will no longer be covariantly constant in the Levi-Civita connection. One way to structure the problem is to note that the right-hand side of eqn [2] takes the form of a connection with torsion; the resulting equations have been discussed mathematically in (Li and Yau 2004).

Another recent approach to these compactifications (Gauntlett 2004) starts out by arguing that $\epsilon$ cannot vanish on K, so it defines a weak SU(3) structure, a local reduction of the structure group of

$TK$ to SU(3) which need not be integrable. This structure must be present in all N = 1, d = 4 supersymmetric compactifications and there are hopes that it will lead to a useful classification of the possible local structures and corresponding partial differential equations (PDEs) on K.

Higher-Dimensional and Extended Supersymmetric Compactifications

While there are similar quasirealistic constructions which start from the other string theories and M-theory, before we discuss these, let us give an overview of compactifications with $N \ge 2$ supersymmetry in four dimensions, and in higher dimensions. These are simpler analog models which can be understood in more depth; their study led to one of the most important discoveries in string/M-theory, the theory of superstring duality.

As before, we require a covariantly constant spinor. For Ricci-flat K with other background fields zero, this requires the holonomy of K to be one of trivial, SU(n), Sp(n), or the exceptional holonomies $G_2$ or Spin(7). In Table 1 we tabulate the possibilities with spacetime dimension d greater than or equal to 3, listing the supergravity theory, the holonomy type of K, and the type of the resulting EFT: dimension d, total number of real supersymmetry parameters Ns, and the number of spinor supercharges N (in d = 6, since left- and right-chirality Majorana spinors are inequivalent, there are two numbers).

The structure of the resulting supergravity EFTs is heavily constrained by Ns. We now discuss the various possibilities.

Table 1 String/M-theories, holonomy groups and the resulting supersymmetry

Theory       Holonomy   d     Ns    N
M, II        Torus      Any   32    Max
M            SU(2)      7     16    1
             SU(3)      5     8     1
             G2         4     4     1
             Sp(4)      3     6     3
             SU(4)      3     4     2
             Spin(7)    3     2     1
IIa          SU(2)      6     16    (1, 1)
             SU(3)      4     8     2
             G2         3     4     2
IIb          SU(2)      6     16    (0, 2)
             SU(3)      4     8     2
             G2         3     4     2
HE, HO, I    Torus      Any   16    Max/2
             SU(2)      6     8     1
             SU(3)      4     4     1
             G2         3     2     1

Ns = 32

Given the supersymmetry algebra, if such a supergravity exists, it is unique. Thus, toroidal compactifications of d = 11 supergravity, IIa and IIb supergravity lead to the same series of maximally supersymmetric theories. Their structure is governed by the exceptional Lie algebra $E_{11-d}$; the gauge charges transform in a fundamental representation of this algebra, while the scalar fields parametrize a coset space $G/H$, where $G$ is the maximally split real form of the Lie group $E_{11-d}$, and $H$ is a maximal compact subgroup of $G$. Nonperturbative duality symmetries lead to a further identification by a maximal discrete subgroup of $G$.

Ns = 16

This supergravity can be coupled to maximally supersymmetric super Yang–Mills theory, which given a choice of gauge group $G$ is unique. Thus, these theories (with zero cosmological constant and thus allowing super-Poincaré symmetry) are uniquely determined by the choice of $G$.

In d = 10, the choices $E_8 \times E_8$ and $\mathrm{Spin}(32)/\mathbb{Z}_2$, which arise in string theory, are almost uniquely determined by the Green–Schwarz anomaly cancellation analysis. Compactification of these HE, HO and type I theories on $T^n$ produces a unique theory with moduli space

$$\mathbb{R}^+ \times SO(n, n+16; \mathbb{Z}) \backslash SO(n, n+16; \mathbb{R}) / SO(n; \mathbb{R}) \times SO(n+16; \mathbb{R}) \qquad [8]$$

In Kaluza–Klein (KK) reduction, this arises from the choice of metric $g_{ij}$, the antisymmetric tensor $B_{ij}$ and the choice of a flat $E_8 \times E_8$ or $\mathrm{Spin}(32)/\mathbb{Z}_2$ connection on $T^n$, while a more unified description follows from the heterotic string world-sheet analysis. Here the group $SO(n, n+16)$ is defined to preserve an even self-dual quadratic form $\eta$ of signature $(n, n+16)$; for example, $\eta = (E_8) \oplus (E_8) \oplus I \oplus I \oplus I$, where $I$ is the form of signature (1,1) and $E_8$ is the Cartan matrix. In fact, all such forms are equivalent under orthogonal integer similarity transformation; so, the resulting EFT is unique. It has a rank $16 + 2n$ gauge group, which at generic points in moduli space is $U(1)^{16+2n}$, but is enhanced to a nonabelian group $G$ at special points. To describe $G$, we first note that a point $p$ in moduli space determines an n-dimensional subspace $V_p$ of $\mathbb{R}^{16+2n}$, and an orthogonal subspace $V_p^\perp$ (of varying dimension). Lattice points of length squared 2 contained in $V_p^\perp$ then correspond to roots of the Lie algebra of $G_p$.
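The correspondence between lattice points of length squared 2 and roots can be illustrated for a single $E_8$ summand. The sketch below is only an illustration under stated assumptions: it uses the standard coordinate description of the $E_8$ root lattice (not given in the text), ignores the hyperbolic summands $I$ and the moduli-dependent subspace $V_p^\perp$, and simply counts the norm-2 vectors. The function name `e8_norm2_vectors` is hypothetical.

```python
from itertools import combinations, product

def e8_norm2_vectors():
    """Enumerate the length-squared-2 vectors of the E8 root lattice.

    Assumption (standard model of the lattice, not taken from the article):
    vectors in R^8 whose coordinates are all integers or all half-integers,
    with even coordinate sum. Coordinates are stored doubled so that
    everything stays an exact integer.
    """
    vectors = set()
    # Integer type: +-e_i +- e_j for i < j (doubled coordinates are +-2).
    for i, j in combinations(range(8), 2):
        for si, sj in product((2, -2), repeat=2):
            v = [0] * 8
            v[i], v[j] = si, sj
            vectors.add(tuple(v))
    # Half-integer type: all coordinates +-1/2 with an even number of minus
    # signs (doubled coordinates are +-1).
    for signs in product((1, -1), repeat=8):
        if signs.count(-1) % 2 == 0:
            vectors.add(signs)
    return vectors

def length_squared(doubled):
    """Length squared of a vector given in doubled coordinates."""
    total = sum(x * x for x in doubled)
    assert total % 4 == 0
    return total // 4

roots = e8_norm2_vectors()
num_roots = len(roots)
all_norm_two = all(length_squared(v) == 2 for v in roots)
dimension = num_roots + 8  # roots plus the rank-8 Cartan subalgebra
print(num_roots, all_norm_two, dimension)
```

The count of norm-2 vectors plus the rank reproduces the dimension 248 of the adjoint appearing in eqn [6]. At a generic point of moduli space none of these vectors lies in $V_p^\perp$, which is why only the abelian group survives there.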

The other compactifications with Ns = 16 are M-theory on K3 and its further toroidal reductions, and IIb on K3. M-theory compactification to d = 7 is dual to heterotic on $T^3$, with the same moduli space and enhanced gauge symmetry. As we discuss at the end of the section “Stringy and quantum corrections,” the extra massless gauge bosons of enhanced gauge symmetry are M2 branes wrapped on 2-cycles with topology $S^2$. For such a cycle to have zero volume, the integral of the Kähler form and holomorphic 2-form over the cycle must vanish; expressing this in a basis for $H^2(K3, \mathbb{R})$ leads to exactly the same condition we discussed for enhanced gauge symmetry above. The final result is that all such K3 degenerations lead to one of the two-dimensional canonical singularities, of types A, D or E, and the corresponding EFT phenomenon is the enhanced gauge symmetry of corresponding Dynkin type A, D, or E.

IIb on K3 is similar, but reducing the self-dual Ramond–Ramond (RR) 4-form potential on the 2-cycles leads to self-dual tensor multiplets instead of Maxwell theory. The moduli space is eqn [8] but with n = 5, not n = 4, incorporating periods of RR potentials and the $SL(2, \mathbb{Z})$ duality symmetry of IIb theory.

One may ask if the Ns = 16 I/HE/HO theories in d = 8 and d = 9 have similar duals. For d = 8, these are obtained by a pretty construction known as “F-theory.” Geometrically, the simplest definition of F-theory is to consider the special case of M-theory on an elliptically fibered Calabi–Yau, in the limit that the Kähler modulus of the fiber becomes small. One check of this claim for d = 8 is that the moduli space of elliptically fibered K3s agrees with eqn [8] with n = 2.

Another definition of F-theory is the particular case of IIb compactification using Dirichlet 7-branes and orientifold 7-planes. This construction is T-dual to the type I theory on $T^2$, which provides its simplest string theory definition. As discussed in Polchinski (1998), one can think of the open strings giving rise to type I gauge symmetry as living on 32 Dirichlet 9-branes (or D9-branes) and an orientifold nineplane. T-duality converts Dirichlet and orientifold p-branes to $(p - 1)$-branes; thus this relation follows by applying two T-dualities.

These compactifications can also be parametrized by elliptically fibered Calabi–Yaus, where K is the base, and the branes correspond to singularities of the fibration. The relation between these two definitions follows fairly simply from the duality between M-theory on $T^2$ and IIb string on $S^1$. There is a partially understood generalization of this to d = 9.

Finally, these constructions admit further discrete choices, which break some of the gauge symmetry. The simplest to explain is in the toroidal compactification of I/HE/HO. The moduli space of theories we discussed uses flat connections on the torus which are continuously connected to the trivial connection, but in general the moduli space of flat connections has other components. The simplest example is the moduli space of flat $E_8 \times E_8$ connections on $S^1$, which has a second component in which the holonomy exchanges the two $E_8$'s. On $T^3$, there are connections for which the holonomies cannot be simultaneously diagonalized. This structure and the M-theory dual of these choices are discussed in (de Boer et al. 2002).

Ns = 8, d < 6

Again, the gravity multiplet is uniquely determined, so the most basic classification is by the gauge group $G$. The full low-energy EFT is determined by the matter content and action, and there are two types of matter multiplets. First, vector multiplets contain the Yang–Mills fields, fermions and $6 - d$ scalars; their action is determined by a prepotential which is a $G$-invariant function of the fields. Since the vector multiplets contain massless adjoint scalars, a generic vacuum in which these take nonzero distinct vacuum expectation values (VEVs) will have $U(1)^r$ gauge symmetry, the commutant of $G$ with a generic matrix (for d < 5, while there are several real scalars, the potential forces these to commute in a supersymmetric vacuum). Vacua with this type of gauge symmetry breaking, which does not reduce the rank of the gauge group, are usually said to lie on a “Coulomb branch” of the moduli space. To summarize, this sector can be specified by $n_V$, the number of vector multiplets, and the prepotential $\mathcal{F}$, a function of the $n_V$ VEVs which is cubic in d = 5, and holomorphic in d = 4.

Hypermultiplets contain scalars which parametrize a quaternionic Kähler manifold, and partner fermions. Thus, this sector is specified by a $4n_H$ real dimensional quaternionic Kähler manifold. The $G$ action comes with triholomorphic moment maps; if nontrivial, VEVs in this sector can break gauge symmetry and reduce it in rank. Such vacua are usually said to lie on a “Higgs branch.”

The basic example of these compactifications is M-theory on a Calabi–Yau 3-fold ($CY_3$). Reduction of the 3-form leads to $h^{1,1}(K)$ vector multiplets, whose scalar components are the CY Kähler moduli. The CY complex structure moduli pair with periods of the 3-form to produce $h^{2,1}(K)$ hypermultiplets. Enhanced gauge symmetry then appears when the

$CY_3$ contains ADE singularities fibered over a curve, from the same mechanism involving wrapped M2 branes we discussed under Ns = 16. If degenerating curves lead to other singularities (e.g., the ODP or “conifold”), it is possible to obtain extremal transitions which translate physically into Coulomb–Higgs transitions. Finally, singularities in which surfaces degenerate lead to nontrivial fixed-point theories.

Reduction on $S^1$ leads to IIa on $CY_3$, with the spectrum above plus a “universal hypermultiplet” which includes the dilaton. Perhaps the most interesting new feature is the presence of world-sheet instantons, which correct the metric on vector multiplet moduli space. This metric satisfies the restrictions of special geometry and thus can be derived from a prepotential.

The same theory can be obtained by compactification of IIb theory on the mirror $CY_3$. Now vector multiplets are related to the complex structure moduli space, while hypermultiplets are related to Kähler moduli space. In this case, the prepotential derived from variation of complex structure receives no instanton corrections, as we discuss in the next section.

Finally, one can compactify the heterotic string on $K3 \times T^{6-d}$, but this theory follows from toroidal reduction of the d = 6 case we discuss next.

Ns = 8, d = 6

These supergravities are similar to d < 6, but there is a new type of matter multiplet, the self-dual tensor (in d < 6 this is dual to a vector multiplet). Since fermions in d = 6 are chiral, there is an anomaly cancellation condition relating the numbers of the three types of multiplets (Aspinwall 1996, section 6.6),

$$n_H - n_V + 29\,n_T = 273 \qquad [9]$$

One class of examples is the heterotic string compactified on K3. In the original perturbative constructions, to satisfy eqn [7], we need to choose a vector bundle with $c_2(V) = \chi(K3) = 24$. The resulting degrees of freedom are a single self-dual tensor multiplet and a rank-16 gauge group. More generally, one can introduce $N_{5B}$ heterotic 5-branes, which generalize eqn [7] to $c_2(E) + N_{5B} = c_2(TK)$. Since this brane carries a self-dual tensor multiplet, this series of models is parametrized by $n_T$. They are connected by transitions in which an $E_8$ instanton shrinks to zero size and becomes a 5-brane; the resulting decrease in the dimension of the moduli space of $E_8$ bundles on K3 agrees with eqn [9].

Another class of examples is F-theory on an elliptically fibered $CY_3$. These are related to M-theory on an elliptically fibered $CY_3$ in the same general way we discussed under Ns = 16. The relation between F-theory and the heterotic string on K3 can be seen by lifting M-theory-heterotic duality; this suggests that the two constructions are dual only if the $CY_3$ is a K3 fibration as well. Since not all elliptically fibered $CY_3$s are K3 fibered, the F-theory construction is more general.

We return to d = 4 and Ns = 4 in the final section. The cases of Ns < 4 which exist in $d \le 3$ are far less studied.

Stringy and Quantum Corrections

The D-dimensional low-energy effective supergravity actions on which we based our discussion so far are only approximations to the general story of string/M-theory compactification. However, if Planck's constant is small, K is sufficiently large, and its curvature is small, then they are controlled approximations.

In M-theory, as in any theory of quantum gravity, corrections are controlled by the Planck scale parameter $M_P^{D-2}$, which sits in front of the Einstein term of the D-dimensional effective Lagrangian, and plays the role of $\hbar$. In general, this is different from the four-dimensional Planck scale, which satisfies $M_{P4}^2 = \mathrm{Vol}(K)\,M_P^{D-2}$. After taking the low-energy limit $E \ll M_P$, the remaining corrections are controlled by the dimensionless parameters $l_P/R$, where $R$ can be any characteristic length scale of the solution: a curvature radius, the length of a nontrivial cycle, and so on.

In string theory, one usually thinks of the corrections as a double series expansion in $g_s$, the dimensionless (closed) string coupling constant, and $\alpha'$, the inverse string tension parameter, of dimensions (length)$^2$. The ten-dimensional Planck scale is related to these parameters as $M_P^8 = 1/g_s^2(\alpha')^4$, up to a constant factor that depends on conventions. Besides perturbative corrections, which have power-like dependence on these parameters, there can be world-sheet and “brane” instanton corrections. For example, a string world sheet can wrap around a topologically nontrivial spacelike 2-cycle $\Sigma$ in K, leading to an instanton correction to the effective action which is suppressed as $\exp(-\mathrm{Vol}(\Sigma)/2\pi\alpha')$. More generally, any p-brane wrapping a p-cycle can produce a similar effect. As for which terms in the effective Lagrangian receive corrections, this depends largely on the number and symmetries of the fermion zero modes on the instanton world volumes.

Let us start by discussing some cases in which one can argue that these corrections are not present.
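Before turning to those cases, the two scale relations quoted in this section can be made concrete with a small numerical sketch. It assumes all quantities are expressed in a single consistent unit system and drops convention-dependent constant factors; the function names are hypothetical and chosen only for illustration.

```python
import math

def worldsheet_instanton_suppression(vol_sigma: float, alpha_prime: float) -> float:
    """Suppression factor exp(-Vol(Sigma) / (2 pi alpha')) of a world-sheet
    instanton wrapping a 2-cycle Sigma, as quoted in the text.

    vol_sigma and alpha_prime must both be given in the same units of
    (length)^2; this is an assumption of the sketch.
    """
    if alpha_prime <= 0:
        raise ValueError("alpha_prime must be positive")
    return math.exp(-vol_sigma / (2.0 * math.pi * alpha_prime))

def four_dim_planck_mass(vol_k: float, m_planck_d: float, dim: int) -> float:
    """Four-dimensional Planck mass from M_P4^2 = Vol(K) * M_P^(D-2).

    vol_k is the volume of the (D-4)-dimensional internal space K in units
    where the product has dimension (mass)^2; order-one constants dropped.
    """
    return math.sqrt(vol_k * m_planck_d ** (dim - 2))

# A cycle of zero volume gives no suppression at all, while a cycle of
# volume 2*pi*alpha' is suppressed by exactly one e-fold.
no_suppression = worldsheet_instanton_suppression(0.0, 1.0)
one_efold = worldsheet_instanton_suppression(2.0 * math.pi, 1.0)
planck_unit_volume = four_dim_planck_mass(1.0, 1.0, 10)
planck_larger_volume = four_dim_planck_mass(4.0, 1.0, 10)
print(no_suppression, one_efold, planck_unit_volume, planck_larger_volume)
```

The exponential dependence on $\mathrm{Vol}(\Sigma)/\alpha'$ is why such effects are invisible at any finite order of perturbation theory in $\alpha'$, yet become of order one when a cycle shrinks to string scale.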

First, extended supersymmetry can serve to eliminate many corrections. This is analogous to the familiar result that the superpotential in d = 4, N = 1 supersymmetric field theory does not receive (or “is protected from”) perturbative corrections, and in many cases follows from similar formal arguments. In particular, supersymmetry forbids corrections to the potential and two derivative terms in the Ns = 32 and Ns = 16 Lagrangians.

In Ns = 8, the superpotential is protected, but the two derivative terms can receive corrections. However, there is a simple argument which precludes many such corrections: since vector multiplet and hypermultiplet moduli spaces are decoupled, a correction whose control parameter sits in (say) a vector multiplet cannot affect hypermultiplet moduli space. This fact allows for many exact computations in these theories.

As an example, in IIb on $CY_3$, the metric on vector multiplet moduli space is precisely eqn [1] as obtained from supergravity (in other words, the Weil–Petersson metric on complex structure moduli space). First, while in principle it could have been corrected by world-sheet instantons, since these depend on Kähler moduli which sit in hypermultiplets, it is not. The only other instantons with the requisite zero modes to modify this metric are wrapped Dirichlet branes. Since in IIb theory these wrap even-dimensional cycles, they also depend on Kähler moduli and thus leave vector moduli space unaffected.

As previously discussed, for K3-fibered $CY_3$, this theory is dual to the heterotic string on $K3 \times T^2$. There, the vector multiplets arise from Wilson lines on $T^2$, and reduce to an adjoint multiplet of N = 2 supersymmetric Yang–Mills theory. Of course, in the quantum theory, the metric on this moduli space receives instanton corrections. Thus, the duality allows deriving the exact moduli space metric, and many other results of the Seiberg–Witten theory of N = 2 gauge theory, as aspects of the geometry of Calabi–Yau moduli space.

In Ns = 4, only the superpotential is protected, and that only in perturbation theory; it can receive nonperturbative corrections. Indeed, it appears that this is fairly generic, suggesting that the effective potentials in these theories are often sufficiently complicated to exhibit the structure required for supersymmetry breaking and the other symmetry breakings of the SM. Understanding this is an active subject of research.

We now turn from corrections to novel physical phenomena which arise in these regimes. While this is too large a subject to survey here, one of the basic principles which governs this subject is the idea that string/M-theory compactification on a singular manifold K is typically consistent, but has new light degrees of freedom in the EFT, not predicted by KK arguments. We implicitly touched on one example of this in the discussion of M-theory compactification on K3 above, as the space of Ricci-flat K3 metrics has degeneration limits in which curvatures grow without bound, while the volumes of 2-cycles vanish. On the other hand, the structure of Ns = 16 supersymmetry essentially forces the d = 7 EFT in these limits to be nonsingular. Its only noteworthy feature is that a nonabelian gauge symmetry is restored, and thus certain charged vector bosons and their superpartners become massless.

To see what is happening microscopically, we must consider an M-theory membrane (or 2-brane), wrapped on a degenerating 2-cycle. This appears as a particle in d = 7, charged under the vector potential obtained by reduction of the D = 11 3-form potential. The mass of this particle is the volume of the 2-cycle multiplied by the membrane tension, so as this volume shrinks to zero, the particle becomes massless. Thus, the physics is also well defined in 11 dimensions, though not literally described by 11-dimensional supergravity.

This phenomenon has numerous generalizations. Their common point is that, since the essential physics involves new light degrees of freedom, they can be understood in terms of a lower-dimensional quantum theory associated with the region around the singularity. Depending on the geometry of the singularity, this is sometimes a weakly coupled field theory, and sometimes a nontrivial conformal field theory. Occasionally, as in IIb on K3, the lightest wrapped brane is a string, leading to a “little string theory” (Aharony 2000).

N = 1 Supersymmetry in Four Dimensions

Having described the general framework, we conclude by discussing the various constructions which lead to N = 1 supersymmetry. Besides the heterotic string on a $CY_3$, these compactifications include type IIa and IIb on orientifolds of $CY_3$, the related F-theory on elliptically fibered Calabi–Yau 4-folds ($CY_4$), and M-theory on $G_2$ manifolds. Let us briefly spell out their ingredients, the known nonperturbative corrections to the superpotential, and the duality relations between these constructions.

To start, we recap the heterotic string construction. We must specify a $CY_3$ K, and a bundle $E$ over K which admits a Hermitian Yang–Mills connection. The gauge group $G$ is the commutant of the structure group of $E$ in $E_8 \times E_8$ or $\mathrm{Spin}(32)/\mathbb{Z}_2$,

while the chiral matter consists of metric moduli of K, and fields corresponding to a basis for the Dolbeault cohomology group $H^{0,1}(K, \mathrm{Rep}\,E)$ where $\mathrm{Rep}\,E$ is the bundle $E$ embedded into an $E_8$ bundle and decomposed into $G$-reps.

There is a general (though somewhat formal) expression for the superpotential,

$$W = \int \Omega \wedge \mathrm{tr}\left(A\,\bar{\partial}A + \tfrac{2}{3}A^3\right) + \int \Omega \wedge H^{(3)} + W_{NP} \qquad [10]$$

The first term is the holomorphic Chern–Simons action, whose variation enforces the $F^{0,2} = 0$ condition. The second is the “flux superpotential,” while the third term is the nonperturbative corrections. The best understood of these arise from supersymmetric gauge theory sectors. In some, but not all, cases, these can be understood as arising from gauge theoretic instantons, which can be shown to be dual to heterotic 5-branes wrapped on K. Heterotic world-sheet instantons can also contribute.

The HO theory is S-dual to the type I string, with the same gauge group, realized by open strings on Dirichlet 9-branes. This construction involves essentially the same data. The two classes of heterotic instantons are dual to D1- and D5-brane instantons, whose world-sheet theories are somewhat simpler.

If the $CY_3$ K has a fibration by tori, by applying T-duality to the fibers along the lines discussed for tori under Ns = 16 above, one obtains various type II orientifold compactifications. On an elliptic fibration, double T-duality produces a IIb compactification with D7s and O7s. Using the relation between IIb theory on $T^2$ and F-theory on K3 fiberwise, one can also think of this as an F-theory compactification on a K3-fibered $CY_4$. More generally, one can compactify F-theory on any elliptically fibered 4-fold to obtain N = 1. These theories have D3-instantons, the T-duals of both the type I D1- and D5-brane instantons.

The theory of mirror symmetry predicts that all $CY_3$s have $T^3$ fibration structures. Applying the corresponding triple T-duality, one obtains a IIa compactification on the mirror $CY_3$ $\tilde{K}$, with D6-branes and O6-planes. Supersymmetry requires these to wrap special Lagrangian cycles in $\tilde{K}$. As in all Dirichlet brane constructions, enhanced gauge symmetry arises from coincident branes wrapping the same cycle, and only the classical groups are visible in perturbation theory. Exceptional gauge symmetry arises as a strong coupling phenomenon of the sort described in the previous section. The superpotential can also be thought of as mirror to eqn [10], but now the first term is the sum of a real Chern–Simons action on the special Lagrangian cycles, with disk world-sheet instanton corrections, as studied in open string mirror symmetry. The gauge theory instantons are now D2-branes.

Using the duality relation between the IIa string and 11-dimensional M-theory, this construction can be lifted to a compactification of M-theory on a seven-dimensional manifold $L$, which is an $S^1$ fibration over K. The D6 and O6 planes arise from singularities in the $S^1$ fibration. Generically, $L$ can be smooth, and the only candidate in Table 1 for such an N = 1 compactification is a manifold with $G_2$ holonomy; therefore, $L$ must have such holonomy. Finally, both the IIa world-sheet instantons and the D2-brane instantons lift to membrane instantons in M-theory.

This construction implicitly demonstrates the existence of a large number of $G_2$ holonomy manifolds. Another way to arrive at these is to go back to the heterotic string on K, and apply the duality (again under Ns = 16) between heterotic on $T^3$ and M-theory on K3 to the $T^3$ fibration structure on K, to arrive at M-theory on a K3-fibered manifold of $G_2$ holonomy. Wrapping membranes on 2-cycles in these fibers, we can see enhanced gauge symmetry in this picture fairly directly. It is an illuminating exercise to work through its dual realizations in all of these constructions.

Our final construction uses the interpretation of the strong coupling limit of the HE theory as M-theory on a one-dimensional interval $I$, in which the two $E_8$ factors live on the two boundaries. Thus, our original starting point can also be interpreted as the heterotic string on $K \times I$. This construction is believed to be important physically as it allows generalizing a heterotic string tree-level relation between the gauge and gravitational couplings which is phenomenologically disfavored. One can relate it to a IIa orientifold as well, now with D8- and O8-branes.

These multiple relations are often referred to as the “web” of dualities. They lead to numerous relations between compactification manifolds, moduli spaces, superpotentials, and other properties of the EFTs, whose full power has only begun to be appreciated.

Suggestions for further reading

Original references for all but the most recent of these topics can be found in the following textbooks and proceedings. We have also referenced a few research articles which are good starting points for the more recent literature. There are far more reviews than we could reference here, and a partial listing of these appears at http://www.slac.stanford.edu/spires/reviews/

See also: Brane Construction of Gauge Theories; Random Algebraic Geometry, Attractors and Flux Vacua;

String Theory: Phenomenology; Superstring Theories; Two-Dimensional Conformal Field Theory and Vertex Operator Algebras; Viscous Incompressible Fluids: Mathematical Theory.

Further Reading

Aharony O (2000) A brief review of “little string theories.” Classical and Quantum Gravity 17: 929–938.
Aspinwall PS (1996) K3 surfaces and string duality, 1996 preprint, arXiv:hep-th/9611137.
Bachas C et al. (eds.) (2002) Les Houches 2001: Unity from Duality: Gravity, Gauge Theory and Strings. Berlin: Springer.
de Boer J et al. (2002) Triples, fluxes, and strings. Advances in Theoretical and Mathematical Physics 4: 995.
Connes A and Gawȩdzki K (eds.) (1998) Les Houches 1995: Quantum Symmetries. Amsterdam: North-Holland.
Deligne P et al. (eds.) (1999) Quantum Fields and Strings: A Course for Mathematicians. Providence, RI: American Mathematical Society.
Douglas M et al. (eds.) (2004) Strings and Geometry: Proceedings of the 2002 Clay School. Providence, RI: American Mathematical Society.
Gauntlett J (2004) Branes, calibrations and supergravity. In: Douglas M et al. (eds.) Strings and Geometry, pp. 79–126. Providence, RI: American Mathematical Society.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory, 2 vols. Cambridge: Cambridge University Press.
Li J and Yau S-T (2004) The existence of supersymmetric string theory with torsion, 2004 preprint, arXiv:hep-th/0411136.
Polchinski J (1998) String Theory, 2 vols. Cambridge: Cambridge University Press.

Compressible Flows: Mathematical Theory


G-Q Chen, Northwestern University, Evanston, IL, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The Euler equations for compressible fluids consist of the conservation laws of mass, momentum, and energy:

$$\partial_t \rho + \nabla_x \cdot m = 0, \quad x \in \mathbb{R}^d \qquad [1]$$

$$\partial_t m + \nabla_x \cdot \left(\frac{m \otimes m}{\rho}\right) + \nabla_x p = 0 \qquad [2]$$

$$\partial_t E + \nabla_x \cdot \left(\frac{m}{\rho}(E + p)\right) = 0 \qquad [3]$$

Equivalently, these correspond to the general form of nonlinear hyperbolic systems of conservation laws:

$$\partial_t u + \nabla_x \cdot f(u) = 0, \quad x \in \mathbb{R}^d, \; u \in \mathbb{R}^n \qquad [4]$$

System [1]–[3] is closed by the following constitutive relations:

$$p = p(\rho, e), \qquad E = \frac{1}{2}\frac{|m|^2}{\rho} + \rho e \qquad [5]$$

In [1]–[3] and [5], $\tau = 1/\rho$ is the deformation gradient (specific volume for fluids, strain for solids), $v = (v_1, \ldots, v_d)^\top$ is the fluid velocity with $\rho v = m$ the momentum vector, $p$ is the scalar pressure, and $E$ is the total energy with $e$ the internal energy, which is a given function of $(\tau, p)$ or $(\rho, p)$ defined through thermodynamical relations. The other two thermodynamic variables are temperature $\theta$ and entropy $S$. If $(\rho, S)$ are chosen as independent variables, then the constitutive relations can be written as

$$(e, p, \theta) = (e(\rho, S), p(\rho, S), \theta(\rho, S)) \qquad [6]$$

governed by $\theta\, dS = de + p\, d\tau = de - (p/\rho^2)\, d\rho$. For polytropic gases,

$$p = p(\rho, S) = \kappa \rho^\gamma e^{S/c_v}, \qquad e = \frac{p}{(\gamma - 1)\rho}, \qquad \theta = \frac{p}{R\rho} \qquad [7]$$

where $R > 0$ may be taken to be the universal gas constant divided by the effective molecular weight of the particular gas, $c_v > 0$ is the specific heat at constant volume, $\gamma = 1 + R/c_v > 1$ is the adiabatic exponent, and $\kappa$ can be any positive constant under scaling.

The most important criterion of applicability of any mathematical model is its well-posedness: existence, uniqueness, and stability. The well-posedness theory for compressible fluid flows is far from being complete, and many further issues are still unexplored. In particular, the global existence and uniqueness of solutions in $\mathbb{R}^d$, $d \ge 2$, is still a major open problem, and only partial results shed some light on the amazing complexity of the problem. Below, we will mainly focus on the well-posedness issues with emphasis on the Cauchy problem, the initial value problem:

$$u|_{t=0} = u_0 \qquad [8]$$

first for inviscid compressible fluid flows and then for viscous compressible fluid flows.

Throughout this article, where a cited reference is not shown in the “Further reading” section, it may usually be found by consulting Bressan (2000),

Chen (2005), Dafermos (2005), Feireisl (2004), Lions (1986, 1988) or Liu (2000).

Inviscid Compressible Fluid Flows: Euler Equations

Solutions to the Euler equations [1]–[3] are generically discontinuous functions obeying the Clausius–Duhem inequality, the second law of thermodynamics:

$$\partial_t(\rho S) + \nabla_x \cdot (mS) \ge 0$$  [9]

in the sense of distributions. Such discontinuous solutions are called entropy solutions.

When a flow is isentropic, that is, entropy $S$ is a uniform constant $S_0$ in the flow, then the Euler equations for the flow take the simpler form:

$$\partial_t \rho + \nabla_x \cdot m = 0, \qquad \partial_t m + \nabla_x \cdot \left(\frac{m \otimes m}{\rho}\right) + \nabla_x p = 0$$  [10]

where the pressure is a function of the density, $p = p(\rho, S_0)$, with constant $S_0$. For a polytropic gas,

$$p(\rho) = \kappa \rho^{\gamma}, \qquad \gamma > 1$$  [11]

where $\kappa$ can be any positive constant by scaling. This system can be derived from [1] to [3] as follows: for smooth solutions of [1]–[3], entropy $S(\rho, m, E)$ is conserved along fluid particle trajectories, that is,

$$\partial_t(\rho S) + \nabla_x \cdot (mS) = 0$$

If the entropy is initially a uniform constant and the solution remains smooth, then the energy equation can be eliminated and entropy $S$ keeps the same constant in later time. Thus, under constant initial entropy, a smooth solution of [1]–[3] satisfies the equations in [10]. Furthermore, solutions of system [10] are also a good approximation to solutions of system [1]–[3] even after shocks form, since the entropy increases across a shock to the third order in wave strength for solutions of [1]–[3], while in [10] the entropy is constant. Moreover, system [10] is an excellent model for the isothermal fluid flow with $\gamma = 1$ and for the shallow-water flow with $\gamma = 2$. For such barotropic flows (i.e., $p = p(\rho)$), the energy equation [3] serves as an entropy inequality (see Lax (1973)):

$$\partial_t E + \nabla_x \cdot \left(\frac{m(E + p(\rho))}{\rho}\right) \le 0$$

in the sense of distributions.

In the one-dimensional case, system [1]–[3] in Eulerian coordinates is

$$\partial_t \rho + \partial_x m = 0, \qquad \partial_t m + \partial_x\left(\frac{m^2}{\rho} + p\right) = 0, \qquad \partial_t E + \partial_x\left(\frac{m(E + p)}{\rho}\right) = 0$$  [12]

The system above can be rewritten in Lagrangian coordinates:

$$\partial_t \tau - \partial_x v = 0, \qquad \partial_t v + \partial_x p = 0, \qquad \partial_t\left(e + \frac{v^2}{2}\right) + \partial_x(pv) = 0$$  [13]

with $v = m/\rho$, where the coordinates $(t, x)$ are the Lagrangian coordinates, which are different from the Eulerian coordinates for [12]; for simplicity of notation, we do not distinguish them. For the barotropic case, systems [12] and [13] reduce to

$$\partial_t \rho + \partial_x m = 0, \qquad \partial_t m + \partial_x\left(\frac{m^2}{\rho} + p\right) = 0$$  [14]

and

$$\partial_t \tau - \partial_x v = 0, \qquad \partial_t v + \partial_x p = 0$$  [15]

respectively, where pressure $p = p(\rho) = \tilde{p}(\tau)$, $\tau = 1/\rho$. The solutions of [12] and [13], as well as [14] and [15], are equivalent even for entropy solutions with vacuum where $\rho = 0$.

The potential flow is well known in transonic aerodynamics, beyond the isentropic approximation [10] from [1] to [3]. Denote by $D_t = \partial_t + \sum_{k=1}^{d} v_k \partial_{x_k}$ the convective derivative along fluid particle trajectories. From [1] to [3], we have

$$D_t S = 0$$  [16]

and, by taking the curl of the momentum equations,

$$D_t\left(\frac{\omega}{\rho}\right) = \frac{\omega}{\rho} \cdot \nabla_x v + \frac{p_S(\rho, S)}{\rho^3}\, \nabla_x \rho \times \nabla_x S$$  [17]

The identities [16] and [17] imply that a smooth solution of [1]–[3] which is both isentropic and irrotational at time $t = 0$ remains isentropic and irrotational for all later times, as long as this solution stays smooth. Then, the conditions $S = S_0 = \text{const.}$ and $\omega = \operatorname{curl}_x v = 0$ are reasonable for smooth solutions. For a smooth irrotational solution, we integrate the $d$ momentum equations in [10] through Bernoulli's law:

$$\partial_t v + \nabla_x\left(\frac{|v|^2}{2}\right) + \nabla_x h(\rho) = 0$$

where $h'(\rho) = p_\rho(\rho, S_0)/\rho$. On a simply connected space region, the condition $\operatorname{curl}_x v = 0$ implies that there exists $\Phi$ such that $v = \nabla_x \Phi$. Then,

$$\partial_t \rho + \nabla_x \cdot (\rho \nabla_x \Phi) = 0, \qquad \partial_t \Phi + \frac{1}{2}|\nabla_x \Phi|^2 + h(\rho) = K$$  [18]

for some constant $K$. From the second equation in [18], we have

$$\rho(D\Phi) = h^{-1}\left(K - \left(\partial_t \Phi + \frac{1}{2}|\nabla_x \Phi|^2\right)\right)$$
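The inversion of Bernoulli's law in [18] can be made explicit for the polytropic pressure law [11]. The following is a minimal sketch, not part of the original article: it assumes $p(\rho) = \kappa\rho^{\gamma}$ with $\gamma > 1$ and fixes the free additive constant in $h$ by $h(0) = 0$, so that $h(\rho) = \frac{\kappa\gamma}{\gamma - 1}\rho^{\gamma - 1}$. The function names `enthalpy` and `density_from_potential` are illustrative choices.

```python
def enthalpy(rho, gamma=1.4, kappa=1.0):
    """h(rho) with h'(rho) = p'(rho)/rho for p = kappa * rho**gamma.

    Assumption: the additive constant is fixed by h(0) = 0.
    """
    return kappa * gamma / (gamma - 1.0) * rho ** (gamma - 1.0)


def density_from_potential(phi_t, grad_phi, K, gamma=1.4, kappa=1.0):
    """Invert the second equation in [18]:
    rho = h^{-1}(K - (phi_t + |grad_phi|^2 / 2)).
    """
    h = K - (phi_t + 0.5 * sum(g * g for g in grad_phi))
    if h < 0.0:
        # No non-negative density is compatible with this Bernoulli constant.
        raise ValueError("K - (phi_t + |grad_phi|^2 / 2) must be non-negative")
    return ((gamma - 1.0) * h / (kappa * gamma)) ** (1.0 / (gamma - 1.0))


# Round trip: choose a state, compute K from [18], and recover the density.
rho_ref = 0.8
velocity = (0.3, -0.2, 0.1)   # plays the role of grad_x Phi
phi_t_ref = 0.05              # plays the role of d/dt Phi
K_ref = phi_t_ref + 0.5 * sum(c * c for c in velocity) + enthalpy(rho_ref)
rho_back = density_from_potential(phi_t_ref, velocity, K_ref)
```

The round trip at the end only checks that the two functions are inverse to each other under the stated normalization; the value $h = 0$ corresponds to vacuum, $\rho = 0$.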

Then, system [18] can be rewritten as the following time-dependent potential flow equation of second order:

$$\partial_t \rho(D\Phi) + \nabla_x \cdot (\rho(D\Phi)\nabla_x \Phi) = 0$$  [19]

For a steady solution $\Phi = \varphi(x)$, that is, $\partial_t \Phi = 0$, we obtain the celebrated steady potential flow equation of aerodynamics:

$$\nabla_x \cdot (\rho(\nabla_x \varphi)\nabla_x \varphi) = 0$$  [20]

In applications in aerodynamics, [18] or [19] is used for discontinuous solutions, and the empirical evidence is that entropy solutions of [18] or [19] are fairly good approximations to entropy solutions for [1]–[3] provided that (1) the shock strengths are small, (2) the curvature of shock fronts is not too large, and (3) there is a small amount of vorticity in the region of interest. Model [19] or [18] is an excellent model to capture multidimensional shock waves by ignoring vorticity waves, while the incompressible Euler equations are an excellent model to capture multidimensional vorticity waves by ignoring shock waves.

Local Well-Posedness for Classical Solutions

Consider the Cauchy problem for the Euler equations [1]–[3] with Cauchy data [8]:

Assume that $u_0 : \mathbb{R}^d \to D$ is in $H^s \cap L^\infty$ with $s > d/2 + 1$. Then, for the Cauchy problem [1]–[3] and [8], there exists a finite time $T = T(\|u_0\|_s, \|u_0\|_{L^\infty}) \in (0, \infty)$ such that there is a unique, stable bounded classical solution $u \in C^1([0, T] \times \mathbb{R}^d)$ with $u(t, x) \in D$ for $(t, x) \in [0, T] \times \mathbb{R}^d$ and $u \in C([0, T]; H^s) \cap C^1([0, T]; H^{s-1})$. Moreover, the interval $[0, T)$ with $T < \infty$ is the maximal interval of the classical $H^s$ existence for [1]–[3] if and only if either $\|(u_t, \nabla_x u)\|_{L^\infty} \to \infty$ or $u(t, x)$ escapes every compact subset $K \Subset D$ as $t \to T$.

This local existence can be established by relying solely on the elementary linear existence theory for symmetric hyperbolic systems with smooth coefficients (cf. Majda (1984)), or by the abstract semigroup theory (Kato 1975).

Formation of Singularities

For the one-dimensional case, singularities include the development of shock waves and formation of vacuum states. For the multidimensional case, the situation is much more complicated: besides shock waves and vacuum states, singularities can also be generated from vortex sheets, focusing and breaking of waves, among others.

Consider the Cauchy problem of the Euler equations [1]–[3] in $\mathbb{R}^3$ for polytropic gases with smooth initial data:

$$(\rho, v, S)|_{t=0} = (\rho_0, v_0, S_0)(x), \qquad \rho_0(x) > 0, \quad x \in \mathbb{R}^3$$  [21]

satisfying $(\rho_0, v_0, S_0)(x) = (\bar\rho, 0, \bar S)$ for $|x| \ge L$, where $\bar\rho > 0$, $\bar S$, and $L$ are given constants. The equations possess a unique local $C^1$ solution $(\rho, v, S)(t, x)$ with $\rho(t, x) > 0$ provided that the initial data [21] is sufficiently regular. The support of the smooth disturbance $(\rho_0(x) - \bar\rho,\ v_0(x),\ S_0(x) - \bar S)$ propagates with speed at most $\sigma = \sqrt{p_\rho(\bar\rho, \bar S)}$ (the sound speed), that is,

$$(\rho, v, S)(t, x) = (\bar\rho, 0, \bar S) \qquad \text{if } |x| \ge L + \sigma t$$  [22]

Define

$$P(t) = \int_{\mathbb{R}^3} \left(p(\rho(t, x), S(t, x))^{1/\gamma} - p(\bar\rho, \bar S)^{1/\gamma}\right) dx, \qquad F(t) = \int_{\mathbb{R}^3} (\rho v)(t, x) \cdot x \, dx$$

which, roughly speaking, measure the entropy and the radial component of momentum. Then, if $(\rho, v, S)(t, x)$ is a $C^1$ solution of [1]–[3] and [21] for $0 < t < T$, and

$$P(0) \ge 0, \qquad F(0) > \alpha \sigma R^4 \max_x \rho_0(x) \qquad \text{with } \alpha = 16\pi/3$$  [23]

then the lifespan $T$ of the $C^1$ solution is finite (Sideris 1985).

To illustrate a way in which the conditions in [23] may be satisfied, consider the initial data: $\rho_0 = \bar\rho$, $S_0 = \bar S$. Then $P(0) = 0$, and [23] holds if

$$\int_{|x| < R} v_0(x) \cdot x \, dx > \alpha \sigma R^4$$

Comparing both sides, one finds that the initial velocity must be supersonic in some region relative to the sound speed at infinity. The formation of a singularity (presumably a shock wave) is detected as the disturbance overtakes the wave front forcing the front to propagate with supersonic speed.

Singularities are formed even without the condition of largeness, such as [23], being satisfied. For example, if $S_0(x) \ge \bar S$ and, for some $0 < R_0 < R$,

$$\int_{|x| > r} |x|^{-1}(|x| - r)^2(\rho_0(x) - \bar\rho)\, dx > 0, \qquad \int_{|x| > r} |x|^{-3}(|x|^2 - r^2)\rho_0(x) v_0(x) \cdot x \, dx \ge 0$$  [24]

for $R_0 < r < R$, then the lifespan $T$ of the $C^1$ solution of [1]–[3] and [21] is finite. The

assumptions in [24] mean that, in an average sense, the gas must be slightly compressed and outgoing directly behind the wave front.

Local Well-Posedness for Shock-Front Solutions

For a general hyperbolic system of conservation laws [4], shock-front solutions are discontinuous, piecewise smooth entropy solutions with the following structure:

1. There exists a $C^2$ spacetime hypersurface $\mathcal{S}(t)$ defined in $(t, x)$ for $0 \le t \le T$ with spacetime normal $(\nu_t, \nu_x) = (\nu_t, \nu_1, \ldots, \nu_d)$ as well as two $C^1$ vector-valued functions: $u^+(t, x)$ and $u^-(t, x)$, defined on respective domains $D^+$ and $D^-$ on either side of the hypersurface $\mathcal{S}(t)$ and satisfying $\partial_t u^\pm + \nabla_x \cdot f(u^\pm) = 0$ in $D^\pm$;
2. The jump across the hypersurface $\mathcal{S}(t)$ satisfies the Rankine–Hugoniot condition:

$$\left\{\nu_t(u^+ - u^-) + \nu_x \cdot (f(u^+) - f(u^-))\right\}\big|_{\mathcal{S}} = 0$$

For [4], the surface $\mathcal{S}$ is not known in advance and must be determined as part of the solution of the problem; thus, the two equations in (1)–(2) describe a multidimensional, highly nonlinear, free-boundary-value problem. The initial data yielding shock-front solutions is defined as follows. Let $\mathcal{S}_0$ be a smooth hypersurface parametrized by $\alpha$, and let $\nu(\alpha) = (\nu_1, \ldots, \nu_d)(\alpha)$ be a unit normal to $\mathcal{S}_0$. Define the piecewise smooth initial values for respective domains $D_0^+$ and $D_0^-$ on either side of the hypersurface $\mathcal{S}_0$ as

$$u_0(x) = \begin{cases} u_0^+(x), & x \in D_0^+ \\ u_0^-(x), & x \in D_0^- \end{cases}$$  [25]

It is assumed that the initial jump in [25] satisfies the Rankine–Hugoniot condition, that is, there is a smooth scalar function $\sigma(\alpha)$ so that

$$-\sigma(\alpha)\left(u_0^+(\alpha) - u_0^-(\alpha)\right) + \nu(\alpha) \cdot \left(f(u_0^+(\alpha)) - f(u_0^-(\alpha))\right) = 0$$  [26]

and that $\sigma(\alpha)$ does not define a characteristic direction, that is,

$$\sigma(\alpha) \ne \lambda_i(u_0^\pm), \qquad \alpha \in \mathcal{S}_0, \quad 1 \le i \le n$$  [27]

where $\lambda_i$, $i = 1, \ldots, n$, are the eigenvalues of [4]. It is natural to require that $\mathcal{S}(0) = \mathcal{S}_0$.

Consider the Euler equations [1]–[3] in $\mathbb{R}^3$ for polytropic gases with piecewise smooth initial data:

$$(\rho, v, E)|_{t=0} = \begin{cases} (\rho_0^+, v_0^+, E_0^+)(x), & x \in D_0^+ \\ (\rho_0^-, v_0^-, E_0^-)(x), & x \in D_0^- \end{cases}$$  [28]

Assume that $\mathcal{S}_0$ is a smooth compact surface in $\mathbb{R}^3$ and that $(\rho_0^+, v_0^+, E_0^+)(x)$ belongs to the uniform local Sobolev space $H^s_{ul}(D_0^+)$, while $(\rho_0^-, v_0^-, E_0^-)(x)$ belongs to the Sobolev space $H^s(D_0^-)$, for some fixed $s \ge 10$. Assume also that there is a function $\sigma(\alpha) \in H^s(\mathcal{S}_0)$ so that [26] and [27] hold, and the compatibility conditions up to order $s - 1$ are satisfied on $\mathcal{S}_0$ by the initial data, together with the entropy condition:

$$v_0^+ \cdot \nu(\alpha) + \sqrt{p_\rho(\rho_0^+, S_0^+)} < \sigma(\alpha) < v_0^- \cdot \nu(\alpha) + \sqrt{p_\rho(\rho_0^-, S_0^-)}$$  [29]

Then, there are a $C^2$ hypersurface $\mathcal{S}(t)$ and $C^1$ functions $(\rho^\pm, v^\pm, E^\pm)(t, x)$ defined for $t \in [0, T]$, with $T$ sufficiently small, so that

$$(\rho, v, E)(t, x) = \begin{cases} (\rho^+, v^+, E^+)(t, x), & (t, x) \in D^+ \\ (\rho^-, v^-, E^-)(t, x), & (t, x) \in D^- \end{cases}$$  [30]

is the discontinuous shock-front solution of the Cauchy problem [1]–[3] and [28]. Here a vector function $u$ is in $H^s_{ul}$, provided that there exists some $r > 0$ so that $\max_{y \in \mathbb{R}^d} \|w_{r,y} u\|_{H^s} < \infty$ with $w_{r,y}(x) = w((x - y)/r)$, where $w \in C_0^\infty(\mathbb{R}^d)$ is a function so that $w(x) \ge 0$, $w(x) = 1$ when $|x| \le 1/2$, and $w(x) = 0$ when $|x| > 1$.

The compatibility conditions are needed in order to avoid the formation of discontinuities in higher derivatives along other characteristic surfaces emanating from $\mathcal{S}_0$: once the main condition [26] is satisfied, the compatibility conditions are automatically guaranteed for a wide class of initial data. The idea of the proof is to use the existence of a strictly convex entropy and the symmetrization of [4]; the shock-front solutions are defined as the limit of a convergent classical iteration scheme based on a linearization by using the theory of linearized stability for shock fronts (Majda 1984). The uniform existence time of shock-front solutions in shock strength can be achieved (Métivier 1990).

Global Theory in $L^\infty$ for the Isentropic Euler Equations for $x \in \mathbb{R}$

Consider the Cauchy problem for [14] with initial data:

$$(\rho, m)|_{t=0} = (\rho_0, m_0)(x)$$  [31]

where $\rho_0$ and $m_0$ are in the physical region $\{(\rho, m) : \rho \ge 0,\ |m| \le C_0 \rho\}$ for some $C_0 > 0$. System [14] is strictly hyperbolic at the states with $\rho > 0$, and strict hyperbolicity fails at the vacuum states $V := \{(\rho, m/\rho) : \rho = 0,\ |m/\rho| < \infty\}$. Then, we have:

1. There exists a global solution $(\rho, m)(t, x)$ of the Cauchy problem [14] and [31] satisfying

$$0 \le \rho(t, x) \le C, \qquad |m(t, x)| \le C\rho(t, x)$$  [32]

for some $C > 0$ depending only on $C_0$ and $\gamma$, and the entropy inequality

$$\partial_t \eta(\rho, m) + \partial_x q(\rho, m) \le 0$$  [33]

in the sense of distributions for any convex weak entropy–entropy flux pair $(\eta, q)$, that is,

$$\nabla q(\rho, m) = \nabla \eta(\rho, m) \nabla f(\rho, m)$$

with

$$\nabla^2 \eta(\rho, m) \ge 0 \qquad \text{and} \qquad \eta|_V = 0$$

2. The solution operator $(\rho, m)(t, \cdot) = S_t(\rho_0, m_0)(\cdot)$, determined by (1), is compact in $L^1_{loc}(\mathbb{R})$ for $t > 0$;
3. Furthermore, if $(\rho_0, m_0)(x)$ is periodic with period $P$, then there exists a global periodic solution $(\rho, m)(t, x)$ with [32] such that $(\rho, m)(t, x)$ asymptotically decays to

$$\frac{1}{|P|} \int_P (\rho_0, m_0)(x)\, dx$$

in $L^1$.

The convergence of the Lax–Friedrichs scheme, the Godunov scheme, and the vanishing viscosity method for system [14] have also been established. The results are based on a compensated compactness framework to replace the BV compactness framework. For a gas obeying the $\gamma$-law, the case $\gamma = (N + 2)/N$, $N \ge 5$ odd, was first studied by DiPerna (1983), and the case $1 < \gamma \le 5/3$ for usual gases was first solved by Chen (1986) and Ding-Chen-Luo (1985). The cases $\gamma \ge 3$ and $5/3 < \gamma < 3$ were treated by Lions–Perthame–Tadmor (1994) and Lions–Perthame–Souganidis (1996), respectively. The case of general pressure laws was solved by Chen–LeFloch (2000, 2003). All the results for entropy solutions to [14] in Eulerian coordinates can equivalently be presented as the corresponding results for entropy solutions to [15] in Lagrangian coordinates. The isothermal case $\gamma = 1$ was treated by Huang–Wang (2002).

Global Theory in BV for the Adiabatic Euler Equations for $x \in \mathbb{R}$

Consider the Euler equations [13] for polytropic gases with the Cauchy data:

$$(\tau, v, S)|_{t=0} = (\tau_0, v_0, S_0)(x)$$  [34]

Then we have (Liu 1977, Temple 1981, Chen and Wagner 2003):

Let $K \subset \{(\tau, v, S) : \tau > 0\}$ be a compact set in $\mathbb{R}_+ \times \mathbb{R}^2$, and let $N \ge 1$ be any constant. Then there exists a constant $C_0 = C_0(K, N)$, independent of $\gamma \in (1, 5/3]$, such that, for every initial data $(\tau_0, v_0, S_0) \in K$ with $TV_{\mathbb{R}}(\tau_0, v_0, S_0) \le N$, when

$$(\gamma - 1)\, TV_{\mathbb{R}}(\tau_0, v_0, S_0) \le C_0 \qquad \text{for any } \gamma \in (1, 5/3]$$

the Cauchy problem [13] and [34] has a global entropy solution $(\tau, v, S)(t, x)$ which is bounded and satisfies

$$TV_{\mathbb{R}}(\tau, v, S)(t, \cdot) \le C\, TV_{\mathbb{R}}(\tau_0, v_0, S_0)$$

for some constant $C > 0$ independent of $\gamma$.

In particular, this result includes that for the barotropic case (Nishida 1968, Nishida–Smoller 1973, DiPerna 1973). Some efforts in the direction of relaxing the requirement of small total variation have been made. Some extensions to the initial-boundary value problems have also been made. In addition, an entropy solution in BV with periodic data or compact support decays when $t \to \infty$.

Furthermore, even for a general hyperbolic system [4] for $x \in \mathbb{R}$, we have:

If the initial data functions $u_0(x)$ and $v_0(x)$ have sufficiently small total variation and $u_0 - v_0 \in L^1(\mathbb{R})$, then, for the corresponding exact Glimm, or wave-front tracking, or vanishing viscosity solutions $u(t, x)$ and $v(t, x)$ of the Cauchy problem [4] and [8], there exists a constant $C > 0$ such that

$$\|u(t, \cdot) - v(t, \cdot)\|_{L^1(\mathbb{R})} \le C\|u_0 - v_0\|_{L^1(\mathbb{R})} \qquad \text{for all } t > 0$$  [35]

An immediate consequence is that the whole sequence of the approximate solutions constructed by the Glimm (1965) scheme, as well as the wave-front tracking method and the vanishing viscosity method, converges to a unique entropy solution of [4] and [8] when the mesh size or the viscosity coefficient tends to zero. More detailed discussions and extensive references about the $L^1$-stability of BV entropy solutions and related topics can be found in Bressan (2000) and Dafermos (2000); also see Chen and Wang (2002). Furthermore, the Riemann solution is unique and asymptotically stable in the class of entropy solutions to [13] with large variation satisfying only one physical entropy inequality (Chen-Frid-Li 2002).

Multidimensional Steady Theory

The mathematical study of two-dimensional steady supersonic flows past wedges, whose vertex angles are less than the critical angle, dates back to the 1940s, since the stability of such flows is fundamental in applications (cf. Courant–Friedrichs (1948)). Local solutions around the wedge vertex were first constructed (Gu 1962, Schaeffer 1976, Li 1980).
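To make the notion of a critical wedge angle concrete, the following minimal sketch estimates the largest flow deflection for which an attached oblique shock exists. It is not taken from the article: it assumes a uniform supersonic polytropic gas governed by the full Euler equations and uses the classical relation between the deflection angle $\theta$, the shock angle $\beta$, and the upstream Mach number $M$, namely $\tan\theta = 2\cot\beta\,(M^2\sin^2\beta - 1)/(M^2(\gamma + \cos 2\beta) + 2)$. Here $\theta$ is the deflection (half-angle of a symmetric wedge), the function names are illustrative, and the maximum is found by a plain grid search rather than a closed formula.

```python
import math


def deflection_angle(beta, mach, gamma=1.4):
    """Flow deflection angle theta (radians) across an oblique shock of angle beta."""
    numerator = 2.0 * (mach ** 2 * math.sin(beta) ** 2 - 1.0) / math.tan(beta)
    denominator = mach ** 2 * (gamma + math.cos(2.0 * beta)) + 2.0
    return math.atan(numerator / denominator)


def critical_wedge_angle(mach, gamma=1.4, samples=20000):
    """Approximate the largest deflection angle with an attached shock (radians).

    The shock angle beta is scanned between the Mach angle asin(1/M),
    where the shock degenerates to a sound wave, and pi/2 (normal shock).
    """
    if mach <= 1.0:
        raise ValueError("the upstream flow must be supersonic (mach > 1)")
    mach_angle = math.asin(1.0 / mach)
    best = 0.0
    for i in range(1, samples):
        beta = mach_angle + (math.pi / 2.0 - mach_angle) * i / samples
        best = max(best, deflection_angle(beta, mach, gamma))
    return best


theta_crit_deg = math.degrees(critical_wedge_angle(2.0))
```

For deflections below this value there are two attached shock solutions (a weak and a strong one); above it the shock detaches from the vertex. The critical angle increases with the upstream Mach number.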

Such global potential solutions were constructed when the wedge has some convexity, or is a small perturbation of the straight wedge with fast decay in the flow direction (Chen 2001, Chen-Xin-Yin 2002), or is piecewise smooth and a small perturbation of the straight wedge (Zhang 2003). For the two-dimensional steady supersonic flows governed by the full Euler equations past Lipschitz wedges, it has been shown (Chen-Zhang-Zhu 2005a) that, when the wedge vertex angle is less than the critical angle, the strong shock front emanating from the wedge vertex is nonlinearly stable in structure globally, although there may be many weak shocks and vortex sheets between the wedge boundary and the strong shock front, under the BV perturbation of the wedge so that the total variation of the tangent function along the wedge boundary is suitably small. This asserts that any supersonic shock for the wedge problem is nonlinearly stable.

A self-similar gas flow past an infinite cone in $\mathbb{R}^3$ with small vertex angle is also nonlinearly stable upon the BV perturbation of the obstacle (Lien-Liu 1999). The nonlinear stability is still open when the infinite cone in $\mathbb{R}^3$ has arbitrary vertex angle.

The stability issues of supersonic vortex sheets have been studied by classical linearized stability analysis, large-scale numerical simulations, and asymptotic analysis. In particular, the nonlinear development of instabilities of supersonic vortex sheets at high Mach number was predicted as time evolves (Woodward 1985, Artola-Majda 1989). In contrast with the prediction of evolution instability, steady supersonic vortex sheets, as time-asymptotics, are stable globally in structure, even under the BV perturbation of the Lipschitz walls, although there may be many weak shocks and supersonic vortex sheets away from the strong vortex sheet (Chen-Zhang-Zhu 2005b).

Transonic shock problems for steady fluid flows are important in applications (cf. Courant and Friedrichs (1948)). A program on the existence and stability of multidimensional transonic shocks has been initiated and three new analytical approaches have been developed (Chen-Feldman 2003, 2004). The transonic problems include the existence and stability of transonic shocks in the whole $\mathbb{R}^d$, the existence and stability of transonic flows past finite or infinite nozzles, the stability of transonic flows past infinite nonsmooth wedges, and the existence of regular shock reflection solutions. The first approach is an iteration scheme based on the nondegeneracy of the free boundary condition: the jump of the normal derivative of a solution across the free boundary has a strictly positive lower bound (Chen-Feldman 2003, 2004), which works for the nonlinear equations whose coefficients may depend on not only the solution itself but also the gradients of the solution. The second approach is a partial hodograph procedure, with which the existence and stability of multidimensional transonic shocks that are not nearly orthogonal to the flow direction can be handled (Chen-Feldman 2004): one of the main ingredients in this approach is to employ a partial hodograph transform to reduce the free boundary problem into a conormal boundary value problem for the corresponding nonlinear equations of divergence form and then develop techniques to solve the conormal boundary value problem. When the regularity of the steady perturbation is $C^3$, or higher, the third approach is to employ the implicit function theorem to deal with the existence and stability problem. Another iteration approach, which works well for the two-dimensional equations whose coefficients depend only on the solution itself, has also been developed (Canic-Keyfitz-Lieberman 2000). Further longstanding open problems include the existence of global transonic flows past an airfoil or a smooth obstacle (Morawetz 1956–58, 1985).

Multidimensional Unsteady Problems

Now we present some multidimensional time-dependent problems with a simplifying feature that the data (domain and/or the initial data) coupled with the structure of the underlying equations obey certain geometric structure so that the multidimensional problems can be reduced to lower-dimensional problems with more complicated couplings. Different types of geometric structure call for different techniques.

The Euler equations for compressible fluids with geometric structure describe many important fluid flows, including spherically symmetric flows and self-similar flows. Such geometric flows are motivated by many physical problems such as shock diffractions, supernova formation in stellar dynamics, inertial confinement fusion, and underwater explosions. For the initial data with large amplitude having geometric structure, the required physical insight is: (1) whether the solution has the same geometric structure globally and (2) whether the solution blows up to infinity in a finite time. These questions are not easily understood in physical experiments and numerical simulations, especially for the blow-up, because of the limited capacity of available instruments and computers.
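As a concrete instance of such a reduction, consider the isentropic system [10] under spherical symmetry. The short derivation below is a sketch added for illustration, under the assumption that the solution has the form $(\rho, m)(t, x) = (\rho(t, r),\ \mu(t, r)\, x/r)$ with $r = |x|$; the symbol $\mu$ for the scalar radial momentum is a notational choice made here, not the article's.

```latex
% Divergence of a radial field in R^d:
%   \nabla_x \cdot ( \mu(r) x / r ) = \partial_r \mu + \frac{d-1}{r} \mu .
% Inserting (\rho, m) = (\rho(t,r), \mu(t,r) x / r) into [10] gives a
% one-dimensional system in (t, r) with geometric source terms:
\begin{aligned}
  \partial_t \rho + \partial_r \mu &= -\frac{d-1}{r}\,\mu, \\
  \partial_t \mu + \partial_r\!\left(\frac{\mu^2}{\rho} + p(\rho)\right)
     &= -\frac{d-1}{r}\,\frac{\mu^2}{\rho}, \qquad r > 0 .
\end{aligned}
```

The left-hand sides coincide with the one-dimensional system [14]; the price of the reduction is the source terms, which are singular at the origin $r = 0$.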

The first type of geometric structure is spherical symmetry. A criterion for $L^\infty$ Cauchy data functions of arbitrarily large amplitude was observed to guarantee the existence of spherically symmetric solutions in $L^\infty$ in the large for the isentropic flows, which model outgoing blast waves and large-time asymptotic solutions (Chen 1997). On the other hand, it is evident that the density blows up as $|x| \to 0$ in general, especially for the focusing case; the singularity at the origin makes the problem truly multidimensional due to the reflection of waves from infinity and their strengthening as they move radially inwards. One of the important open questions is to understand the order of singularity, $\rho(t, |x|) \sim |x|^{-\alpha}$, at the origin for bounded Cauchy data.

The second type of geometric structure is self-similarity, that is, the solutions with initial data functions that give rise to self-similar solutions, especially including Riemann solutions. Compressible flow equations in $\mathbb{R}^d$, $d \ge 2$, with one or more linearly degenerate modes of wave propagation have additional difficulties. In that case, the global flow is governed by a reduced (self-similar) system which is of composite (hyperbolic–elliptic) type in the subsonic region. The linearly degenerate waves give rise to one or more families of degenerate characteristics which remain real in the subsonic region. In some cases, the reduced equations couple an elliptic (degenerate elliptic) problem for the density with a hyperbolic (transport) equation for the vorticity.

An important prototype for both practical applications and the theory of multidimensional complex wave patterns is the problem of diffraction of a shock wave which is incident along an inclined ramp (see Glimm and Majda (1991)). When a plane shock hits a wedge head-on, a self-similar reflected shock moves outward as the original shock moves forward. The computational and asymptotic analysis shows that various patterns of reflected shocks may occur, including regular reflection and (simple, double, and complex) Mach reflections. The main part or whole reflected shock is a transonic shock in the self-similar coordinates, for which the corresponding equation changes the type from hyperbolic to elliptic across the shock. There are few rigorous mathematical results on the global existence and stability of shock reflection solutions and the transition among regular, simple Mach, double Mach, and complex Mach reflections for the potential flow equation [19] and the full Euler equations [1]–[3]. Some results were recently obtained for simplified models including the transonic small-disturbance equation near the reflection point and the pressure gradient equation when the wedge is close to a flat wall.

For the potential flow equation [19], a self-similar solution is a solution of the form: $\Phi = t\psi(y)$, $y = x/t$. Letting $\phi(y) = -|y|^2/2 + \psi(y)$, the system can be rewritten in the form of a second-order equation of mixed hyperbolic–elliptic type in $y \in \mathbb{R}^d$ by scaling:

$$\nabla_y \cdot \left(\rho(|\nabla_y \phi|^2, \phi)\nabla_y \phi\right) + d\,\rho(|\nabla_y \phi|^2, \phi) = 0$$  [36]

with $\rho(q^2, z) = \left(1 - (q^2 + 2z)/2\right)^{1/(\gamma - 1)}$. Equation [36] at $|\nabla_y \phi| = q$ is hyperbolic (pseudosupersonic) if $\rho(q^2, z) + q\rho_q(q^2, z) < 0$ and elliptic (pseudosubsonic) if $\rho(q^2, z) + q\rho_q(q^2, z) > 0$. Under this framework, the nature of the shock reflection pattern has been explored for weak incident shocks (strength $b$) and small wedge angles $2\theta_w$ by a number of different scalings, a study of mixed equations, and matching asymptotics for the different scalings, where the parameter $\beta = c_1\theta_w^2/(b(\gamma + 1))$ ranges from $0$ to $\infty$ and $c_1$ is the speed of sound behind the incident shock (Morawetz 1994). For $\beta > 2$, a regular reflection of both strong and weak kinds is possible as well as a Mach reflection; for $\beta < 1/2$, a Mach reflection occurs and the flow behind the reflection is subsonic and can be constructed in principle (with an elliptic problem) and matched; and for $1/2 < \beta < 2$, the flow behind a Mach reflection may be transonic, which is a solution of a nonlinear boundary-value problem of mixed type. The basic pattern of reflection has been shown to be an almost semicircular shock issuing, for a regular reflection, from the reflection point on the wedge and, for a Mach reflection, matched with a local interaction flow. Some related observations were also made (Keller-Blank 1951, Hunter-Keller 1984, Hunter 1988). It is important to establish rigorous proofs. Recently, a rigorous existence proof was established for global solutions to shock reflection by large-angle wedges in Chen and Feldman (2005).

Analytical Frameworks for Entropy Solutions

The recent great progress for entropy solutions for one-dimensional time-dependent Euler equations and two-dimensional steady Euler equations, based on BV, $L^1$, or even $L^\infty$ estimates, naturally raises the expectation that a similar approach may also be effective for the multidimensional Euler equations, or more generally, hyperbolic systems of conservation laws, especially,

$$\|u(t, \cdot)\|_{BV} \le C\|u_0\|_{BV}$$  [37]

Unfortunately, this is not the case. The necessary condition for [37] to hold for $p \ne 2$ (Rauch 1986) is

$$\nabla f_k(u) \nabla f_l(u) = \nabla f_l(u) \nabla f_k(u) \qquad \text{for all } k, l = 1, 2, \ldots, d$$  [38]

The analysis suggests that only systems in which the commutativity relation [38] holds offer any hope for treatment in the framework of BV. This special case includes the scalar case $n = 1$ and the case of one space dimension $d = 1$. Beyond that, it contains very few systems of physical interest.

In this regard, it is important to identify effective analytical frameworks for studying entropy solutions of the multidimensional Euler equations [1]–[3], which are not in BV. Naturally, we want to approach the questions of existence, stability, uniqueness, and long-time behavior of entropy solutions with as much generality as possible. For this purpose, a theory of divergence-measure fields to construct such a global framework has been developed for studying entropy solutions (Chen-Frid 1999, 2000, Chen-Torres 2005, Chen-Torres-Ziemer 2005). For more details, see Chen (2005).

Viscous Compressible Fluid Flows: Navier–Stokes Equations

Compressible fluid flows that are viscous and conduct heat are governed by the following Navier–Stokes equations:

$$\partial_t \rho + \nabla_x \cdot m = 0, \qquad x \in \mathbb{R}^d$$  [39]

$$\partial_t m + \nabla_x \cdot \left(\frac{m \otimes m}{\rho}\right) + \nabla_x p = \nabla_x \cdot \mathbb{S}$$  [40]

$$\partial_t E + \nabla_x \cdot \left(\frac{m}{\rho}(E + p)\right) = \nabla_x \cdot \left(\frac{m}{\rho} \cdot \mathbb{S}\right) - \nabla_x \cdot q$$  [41]

Here, $\mathbb{S} = \mathbb{S}(\nabla_x v, \rho, \theta)$ is the viscous stress tensor, which is symmetric from the conservation of angular momentum, and $q$ is the heat flux. If the fluid is isotropic and the viscous tensor $\mathbb{S}$ is a linear function of $\nabla_x v$ and invariant under a change of reference frame (translation and rotation), then we deduce from elementary algebraic manipulations that necessarily

$$\mathbb{S} = \lambda(\rho, \theta)(\nabla_x \cdot v)\, I + 2\mu(\rho, \theta)\, \mathbb{D}$$  [42]

which corresponds to the Newtonian fluids, where $\mathbb{D} = (\nabla_x v + (\nabla_x v)^\top)/2$ is the deformation tensor and $\lambda$ and $\mu$ are the Lamé viscosity coefficients.

Furthermore, since the fluid is isotropic, we are led to the Fourier law:

$$q = -k(\rho, \theta, |\nabla_x \theta|)\nabla_x \theta$$

for scalar function $k$ which, in most cases, is taken to be simply a function of $\rho$ and $\theta$, or even a constant called the thermal conduction coefficient. Again, system [39]–[41] is closed by the constitutive relations in [5]. The equation for entropy $S$ is

$$\partial_t(\rho S) + \nabla_x \cdot \left(mS + \frac{q}{\theta}\right) = \frac{\mathbb{S} : \nabla_x v}{\theta} - \frac{q \cdot \nabla_x \theta}{\theta^2}$$  [43]

The second law of thermodynamics indicates that the right-hand side of [43] should be non-negative, which yields the restriction:

$$k(\rho, \theta, |\nabla_x \theta|) \ge 0, \qquad \mu \ge 0, \qquad \lambda + 2\mu/d \ge 0$$

The case $\mu > 0$ and $\lambda + \mu > 0$ is the viscous case with heat conductivity $k > 0$. In particular, the kinetic theory indicates that the Stokes relationship should hold, namely $\lambda = -2\mu/d$, and the adiabatic exponent $\gamma = 5/3$ for monatomic gases.

In mathematical viscous fluid dynamics, an important model is the barotropic model for viscous fluids, that is, $p = p(\rho)$. Then, the energy $E$ can be taken in the form of $E = (1/2)\rho|v|^2 + \rho e(\rho)$ with $e'(\rho) = p(\rho)/\rho^2$. For classical solutions, the energy of a barotropic flow satisfies the equality:

$$\partial_t E + \nabla_x \cdot ((E + p)v) = \nabla_x \cdot (\mathbb{S}v) - \mathbb{S} : \nabla_x v$$

which is now a direct consequence of [39] and [40].

The question of local existence of classical solutions to [39]–[41] for regular initial data was addressed by Nash (1962), where there is no indication whether or not these solutions exist for all times.

In the case of one space dimension, the well-posedness is largely settled. The basic result for the existence of classical solutions is that of Kazhikhov (1976); see Lions (1998) and Feireisl (2004) for extensive references. The discontinuous solutions have been constructed (Shelukhin 1979, Serre 1986, Hoff 1987, Chen-Hoff-Trivisa 2000).

For the Navier–Stokes equations in $\mathbb{R}^3$ with general equation of state, the global classical solutions for the Cauchy problem and various initial-boundary value problems whose initial data is small around a constant state have been

constructed (Matsumura-Nishida 1980, 1983). The approach is to obtain a priori estimates via energy methods for extending the local solution or for a difference method globally. These results have been extended to the Cauchy problem or the initial-boundary value problems with small discontinuous initial data (Hoff 1997).

For the Navier–Stokes equations in $\mathbb{R}^d$ for barotropic flows with [11] and large initial data, the global existence of solutions containing vacuum for the Cauchy problem or various initial-boundary value problems was first established by Lions (1998) for $\gamma \ge 3/2$ if $d = 2$, $\gamma \ge 9/5$ if $d = 3$, and $\gamma > d/2$ if $d \ge 4$. The gap was closed by Feireisl–Novotný–Petzeltová (2001) for the full range $\gamma > d/2$. These results have been extended to the full Navier–Stokes equations describing the motion of a general compressible, viscous, and heat conducting fluid (see Feireisl (2004)). The physically relevant isothermal case, $\gamma = 1$, is completely open even if $d = 2$. The only large data existence result is that for radially symmetric data (Hoff 1992). The general case $\gamma \ge 1$ and $d = 3$ for radially symmetric data was solved only recently (Jiang-Zhang 2001).

The lower-bound estimate on the density is a delicate issue. Weak solutions containing vacuum for the isentropic viscous flows with constant viscosity are unstable in general (Hoff-Serre 1991). Hence, it is important to see whether vacuum will never develop if the initial data is away from vacuum; this has been shown for the one-dimensional case for large initial data and for the multidimensional case with small data. On the other hand, from the kinetic theory, if solutions contain vacuum, then the viscosity coefficients in the Navier–Stokes equations should depend on the density near vacuum; this indeed stabilizes the solutions for the one-dimensional case.

The stability of viscous shock waves has been studied for the one-dimensional case (see Liu (2000) and the references therein). The compressible–incompressible limits from the isentropic compressible to incompressible Navier–Stokes equations when the Mach number tends to zero have been established for arbitrarily weak solutions (Lions-Masmoudi 1998) and for smooth solutions and a class of initial data functions (Hoff 1998).

The inviscid limits from the Navier–Stokes equations to the Euler equations have been established as long as the solutions of the Euler equations are smooth, when the viscosity and heat conductivity coefficients tend to zero (Klainerman-Majda 1982). It is completely open for general entropy solutions, even in the one-dimensional case.

See also: Breaking Water Waves; Capillary Surfaces; Fluid Mechanics: Numerical Methods; Geophysical Dynamics; Incompressible Euler Equations: Mathematical Theory; Inviscid Flows; Magnetohydrodynamics; Newtonian Fluids and Thermohydraulics; Non-Newtonian Fluids; Partial Differential Equations: Some Examples; Stability of Flows; Viscous Incompressible Fluids: Mathematical Theory.

Further Reading

Bressan A (2000) Hyperbolic Systems of Conservation Laws: The One-Dimensional Cauchy Problem. Oxford: Oxford University Press.
Chen G-Q (2005) Euler equations and related hyperbolic conservation laws. In: Dafermos CM and Feireisl E (eds.) Handbook of Differential Equations II: Evolutionary Differential Equations, Chapter 1, pp. 1–104. Amsterdam: Elsevier.
Chen G-Q and Wang D (2002) The Cauchy problem for the Euler equations for compressible fluids. In: Friedlander S and Serre D (eds.) Handbook of Mathematical Fluid Dynamics, vol. 1, ch. 5, pp. 421–543. Amsterdam: Elsevier Science B.V.
Courant R and Friedrichs KO (1948) Supersonic Flow and Shock Waves. New York: Springer.
Dafermos CM (2005) Hyperbolic Conservation Laws in Continuum Physics (2nd edn). Berlin: Springer.
Feireisl E (2004) Dynamics of Viscous Compressible Fluids. Oxford: Oxford University Press.
Glimm J (1965) Solutions in the large for nonlinear hyperbolic system of equations. Communications on Pure and Applied Mathematics 18: 95–105.
Glimm J and Majda A (1991) Multidimensional Hyperbolic Problems and Computations. New York: Springer.
Lax PD (1973) Hyperbolic Systems of Conservation Laws and the Mathematical Theory of Shock Waves. Philadelphia: SIAM.
Lions PL (1996, 1998) Mathematical Topics in Fluid Mechanics, vols. 1–2. New York: Oxford University Press.
Liu T-P (2000) Hyperbolic and Viscous Conservation Laws, CBMS-NSF RCSAM, vol. 72. Philadelphia: SIAM.
Majda A (1984) Compressible Fluid Flow and Systems of Conservation Laws in Several Space Variables. New York: Springer.
604 Computational Methods in General Relativity: The Theory

Computational Methods in General Relativity: The Theory


M W Choptuik, University of British Columbia, Vancouver, Canada
© 2006 Elsevier Ltd. All rights reserved.

Conventions and Units

This article adopts many of the conventions and notations of Misner, Thorne, and Wheeler (1973) – hereafter denoted MTW – including metric signature $(- + + +)$; definitions of Christoffel symbols and curvature tensors (up to index permutations permitted by standard symmetries of the tensors in a coordinate basis); the use of Greek indices $\mu, \nu, \lambda, \ldots$, ranging over the spacetime coordinate values $(0, 1, 2, 3) \to (t, x^1, x^2, x^3)$, to denote the components of spacetime tensors such as $g_{\mu\nu}$; the similar use of Latin indices $i, j, k, \ldots$, ranging over the spatial coordinate values $(1, 2, 3) \to (x^1, x^2, x^3)$, for spatial tensors such as $\gamma_{ij}$; the use of the Einstein summation convention for both types of indices; the use of standard Kronecker delta symbols (tensors), $\delta^\mu{}_\nu$ and $\delta^i{}_j$; the choice of geometric units, $G = c = 1$; and, finally, the normalization of the matter fields implicit in the choice of the constant $8\pi$ in [1].

The majority of the equations that appear in this article are tensor equations, or specific components of tensor equations, written in traditional index (not abstract index) form. Thus, these equations are generally valid in any coordinate system, $(t, x^i)$, but, of course, do require the introduction of a coordinate basis and its dual. This approach is also largely a matter of convention, since all of what follows can be derived in a variety of fashions, some of them purely geometrical, and there are also approaches to numerical relativity based, for example, on frames rather than coordinate bases.

This article departs from MTW in its use of $\alpha$, $\beta^i$, and $\gamma_{ij}$ to denote the lapse, shift, and spatial metric, respectively, rather than MTW's $N$, $N^i$, and ${}^{(3)}g_{ij}$. Finally, the operations of partial differentiation with respect to coordinates $x^\mu$, $t$, and $x^i$ are denoted $\partial_\mu$, $\partial_t$, and $\partial_i$, respectively.

Introduction

The numerical analysis of general relativity, or numerical relativity, is concerned with the use of computational methods to derive approximate solutions to the Einstein field equations

$$G_{\mu\nu} = 8\pi T_{\mu\nu} \qquad [1]$$

Here, $G_{\mu\nu}$ is the Einstein tensor – that contracted piece of the Riemann curvature tensor that has vanishing divergence – and $T_{\mu\nu}$ is the stress tensor of the matter content of the spacetime. $T_{\mu\nu}$ likewise has vanishing divergence, an expression of the principle of local conservation of stress–energy that general relativity embodies.

The elegant tensor formulation [1] belies the fact that, ultimately, the field equations are generically a complicated and nonlinear set of partial differential equations (PDEs) for the components of the spacetime metric tensor, $g_{\mu\nu}(x^\lambda)$, in some coordinate system $x^\mu$. Moreover, implicit in a numerical solution of [1] is the numerical solution of the equations of motion for any matter fields that couple to the gravitational field – that is, that contribute to $T_{\mu\nu}$. The reader is reminded that it is a hallmark of general relativity that, in principle, all matter fields – including massless ones such as the electromagnetic field – contribute to $T_{\mu\nu}$.

Now, in the 3 + 1 approach to general relativity that is described below, the task of solving the field equations [1] is formulated as an initial-value or Cauchy problem. Specifically, the spacetime metric, $g_{\mu\nu}(x^\lambda) = g_{\mu\nu}(t, x^k)$, which encodes all geometric information concerning the spacetime, $\mathcal{M}$, is viewed as the time history, or dynamical evolution, of the spatial metric, $\gamma_{ij}(0, x^k)$, of an initial spacelike hypersurface, $\Sigma(0)$. In any practical calculation, the degree to which the matter fields "back-react" on the gravitational field, that is, contribute to $T_{\mu\nu}$ substantially enough to cause perturbations in $g_{\mu\nu}$ at or above the desired accuracy threshold, will thus depend on the specifics of the initial configuration.

In astrophysics, there are relatively few well-identified environments in which it is generally thought to be crucial to the faithful emulation of the physics that the matter fields be fully coupled to the gravitational field. However, both observationally and theoretically, the existence of gravitationally compact objects is quite clear. Gravitationally compact means that a star with mass, $M$, has a radius, $R$, comparable to its Schwarzschild radius, $R_M$, which is defined by

$$R_M \equiv \frac{2G}{c^2}\,M \approx 10^{-27}\,M\ \mathrm{kg^{-1}\,m} \qquad [2]$$

Here, and only here, $G$ and $c$ – Newton's gravitational constant and the speed of light, respectively – have been explicitly reintroduced. The fact that $R_M/R$ is about $10^{-6}$ and $10^{-9}$ at the surfaces of the sun and earth, respectively, is a reminder of just how

weak gravity is in the locality of Earth. However, as befits anything of Einsteinian nature, the weakness of gravity is relative, so that at the surface of a neutron star, one would find

$$\frac{R_M}{R} \sim 0.4 \qquad [3]$$

while for black holes, one has

$$\frac{R_M}{R} = 1 \qquad [4]$$

In such circumstances, gravity is anything but weak! Furthermore, in situations where the matter–energy distribution has a highly time-dependent quadrupole moment – such as occurs naturally with a compact-binary system (i.e., a gravitationally bound two-body system, in which each of the bodies is either a black hole or a neutron star) – the dynamics of the gravitational field, including, crucially, the dynamics of the radiative components of the gravitational field, can be expected to dominate the dynamics of the overall system, matter included. For scenarios such as these, it should come as no surprise that the solution of the combined gravitohydrodynamical system begs for numerical analysis.

In addition, both from the physical and mathematical perspectives, it is also natural to study the strong-field, dynamic regimes ($R \to R_M$ and/or $v \to c$, where $v$ is the typical speed characterizing internal bulk motion of the matter) of general relativity within the context of a variety of matter models. Typical processes addressed by these theoretical studies include the process of black hole formation, end-of-life events for various types of model stars, and, again, the interaction, including collisions, of gravitationally compact objects. Note that it is another hallmark of general relativity that highly dynamical spacetimes need not contain any matter; indeed, the interaction of two black holes – the natural analog of the Kepler problem in relativity – is a vacuum problem; that is, it is described by a solution of [1] with $T_{\mu\nu} = 0$.

Motivated in significant part by the large-scale efforts currently underway to directly detect gravitational radiation (gravitational waves), much of the contemporary work in numerical relativity is focused on precisely the problem of the late phases of compact-binary inspiral and merger. Such binaries are expected to be the most likely candidates for early detection by existing instruments such as TAMA, GEO, VIRGO, LIGO, and, more likely, by planned detectors including LIGO II and LISA (see, e.g., Hough and Rowan (2000)). Detailed and accurate predictions of expected waveforms from these events – using the techniques of numerical relativity – have the potential to substantially hasten the discovery process, on the basis of the general principle that if one knows what signal to look for, it is much easier to extract that signal from the experimental noise.

The computational task facing numerical relativists who study problems such as binary inspiral is formidable. In particular, such problems are intrinsically "3D," to use the CFD (computational fluid dynamics) nomenclature in which time dependence is always assumed. That is, the PDEs that must be solved govern functions, $F(t, x^k)$, that depend on all three spatial coordinates, $x^k$, as well as on time, $t$. Unfortunately, even a cursory description of 3D work in numerical relativity as it stands at this time is far beyond the scope of this article.

What follows, then, is an outline of a traditional approach to numerical relativity that underpins many of the calculations from the early years of the field (1970s and 1980s), most of which were carried out with simplifying restrictions to either spherical symmetry or axisymmetry. The mathematical development, which will hereafter be called the 3 + 1 approach to general relativity, has the advantage of using tensors and an associated tensor calculus that are reasonably intuitive for the physicist. This "standard" 3 + 1 approach is also sufficient in many instances (particularly those with symmetry) in the sense that it leads to well-posed sets of PDEs that can be discretized and then solved computationally in a convergent (stable) fashion. In addition, a thorough understanding of the 3 + 1 approach will be of significant help to the reader wishing to study any of the current literature in numerical relativity, including the 3D work.

However, the reader is strongly cautioned that the blind application of any of the equations that follow, especially in a 3D context, may well lead to "ill-posed systems," numerical analysis of which is useless. Anyone specifically interested in using the methods of numerical relativity to generate discrete, approximate solutions to [1], particularly in the generic 3D case, is thus urged to first consult one of the comprehensive reviews of numerical relativity that continue to appear at fairly regular intervals (see, e.g., Lehner (2001), or Baumgarte and Shapiro (2003)). Most such references will also provide a useful overview of many of the most popular numerical techniques that are currently being used to discretize (convert to algebraic form) the Einstein equations, as well as the main algorithms that are used to solve the resulting discrete equations. These subjects are not

described below, not least since discussion of the available discretization techniques only makes sense in the context of PDEs of specific systems with specific boundary conditions, while there is only space here to describe the general mathematical setting for 3 + 1 numerical relativity.

The 3 + 1 Spacetime Split

At least at the current time, computations in numerical relativity are restricted to the case of globally hyperbolic spacetimes. A spacetime (four-dimensional pseudo-Riemannian manifold), $\mathcal{M}$, endowed with a metric, $g_{\mu\nu}$, is globally hyperbolic if there is at least one edgeless, spacelike hypersurface, $\Sigma(0)$, that serves as a Cauchy surface. That is, provided that the initial data for the gravitational field are set consistently on $\Sigma(0)$ – so that the four constraint equations are satisfied (see below) – the entire metric $g_{\mu\nu}(t, x^i)$ can be determined from the field equations [1] (with appropriate boundary conditions), and thus, so can the complete geometric structure of the spacetime manifold.

To be sure, global hyperbolicity is restrictive. It excludes, for example, the highly interesting Gödel universe. However, particularly from the point of view of studying asymptotically flat solutions (or solutions asymptotic to any of the currently popular cosmologies), as is usually the case in astrophysics, the requirement of global hyperbolicity is natural.

The 3 + 1 split is based on the complete foliation of $\mathcal{M}$ by level surfaces of a scalar function, $t$ – the time function. That is, the $t = \mathrm{const.}$ slices, $\Sigma(t)$, are three-dimensional spacelike (Riemannian) hypersurfaces, and, as $t$ ranges from $-\infty$ to $+\infty$, completely fill the spacetime manifold, $\mathcal{M}$. In order for the $\Sigma(t)$ to be everywhere spacelike, the gradient of $t$ must be everywhere timelike:

$$g^{\mu\nu}\,\nabla_\mu t\,\nabla_\nu t < 0 \qquad [5]$$

Here $\nabla_\mu$ is the spacetime covariant derivative operator compatible with the four-metric, $g_{\mu\nu}$, thus satisfying $\nabla_\lambda g_{\mu\nu} = 0$, and $g^{\mu\nu}$ is the inverse metric tensor, which satisfies $g^{\mu\lambda} g_{\lambda\nu} = \delta^\mu{}_\nu$. The reader is reminded that $\delta^\mu{}_\nu$ is a Kronecker delta symbol; that is, $\delta^\mu{}_\nu$ has the value 1 if $\mu = \nu$, and the value 0 otherwise.

Furthermore, the scalar function $t$ is now adopted as the temporal coordinate, so that $x^\mu = (t, x^i)$, where the $x^i$ are the three spatial coordinates. As noted implicitly above, since the problem under consideration is a pure Cauchy evolution, the range of $t$ should nominally be infinite, both to the future as well as to the past; that is, the solution domain is

$$-\infty < t < \infty \qquad [6]$$

$$|X| \equiv \left(\delta_{ij}\,x^i x^j\right)^{1/2} < \infty \qquad [7]$$

However, this assumes that one has global existence for arbitrarily strong initial data, which is decidedly not always the case in general relativity. Indeed, "continued" or "catastrophic" gravitational collapse – that is, the process of black hole formation – signaled, in modern language, by the appearance of a trapped surface, inexorably leads to a physical singularity, which – the somewhat vague nature of the singularity theorems of Penrose, Hawking, and others notwithstanding – in actual numerical computations invariably turns out to be "catastrophic" in terms of Cauchy evolution.

Such behavior in time-dependent nonlinear PDEs is quite familiar in the mathematical community at large, where it is frequently known as finite-time blow-up (or finite-time singularity). However, despite the fact that such behavior is one of the most fascinating aspects of solutions of the Einstein equations, the following discussion will be, implicitly at least, restricted to the case of weak initial data, that is, to initial data for which there is global existence.

With the manifold $\mathcal{M}$ sliced into an infinite stack of spacelike hypersurfaces, $\Sigma(t)$, attention shifts to any single surface, as well as to the manner in which such a generic surface is embedded in the spacetime.

First, each spacelike hypersurface, $\Sigma(t)$, is itself a three-dimensional Riemannian differential manifold with a metric $\gamma_{ij}(t, x^k)$. (Note that in this discussion, the symbol $t$ is to be understood to represent any specific value of coordinate time.) From this metric, one can construct an inverse metric, $\gamma^{ij}(t, x^k)$, defined, as usual, so that

$$\gamma^{ik}\gamma_{kj} = \delta^i{}_j \qquad [8]$$

Associated with the spatial metric, $\gamma_{ij}$, is a natural spatial covariant derivative operator, $D_i$, that is compatible with $\gamma_{ij}$:

$$D_k \gamma_{ij} = 0 \qquad [9]$$

With the spatial metric, $\gamma_{ij}$, and its inverse, $\gamma^{ij}$, in hand, the standard formulas of tensor analysis can be applied to compute the usual suite of geometrical tensors. All tensors thus computed, and indeed, all tensors defined intrinsically to the

hypersurfaces $\Sigma(t)$ are called "spatial" tensors, and have their indices (if any) raised and lowered with $\gamma^{ij}$ and $\gamma_{ij}$, respectively.

Thus, the Christoffel symbols of the second kind, $\Gamma^i{}_{jk}$, are given by

$$\Gamma^i{}_{jk} = \tfrac{1}{2}\,\gamma^{il}\left(\partial_k \gamma_{lj} + \partial_j \gamma_{lk} - \partial_l \gamma_{jk}\right) \qquad [10]$$

Note that these quantities are symmetric in their last two indices

$$\Gamma^i{}_{jk} = \Gamma^i{}_{kj} \qquad [11]$$

and that they can be used, as usual, in explicit calculation of the action of the spatial covariant derivative operator on an arbitrary tensor. In particular, for the special cases of a spatial vector, $V^i$, and a covector (1-form), $W_i$, one has

$$D_i V^j = \partial_i V^j + \Gamma^j{}_{ik} V^k \qquad [12]$$

and

$$D_i W_j = \partial_i W_j - \Gamma^k{}_{ij} W_k \qquad [13]$$

respectively.

Given the Christoffel symbols, the components of the spatial Riemann tensor, denoted here $R_{ijk}{}^l$, are computed using

$$R_{ijk}{}^l = \partial_j \Gamma^l{}_{ik} - \partial_i \Gamma^l{}_{jk} + \Gamma^m{}_{ik}\Gamma^l{}_{mj} - \Gamma^m{}_{jk}\Gamma^l{}_{mi} \qquad [14]$$

Finally, the Ricci tensor, $R^i{}_j$, and Ricci scalar, $R$, are defined in the usual fashion

$$R^i{}_j = \gamma^{ik} R_{kj} = \gamma^{ik} R_{klj}{}^l \qquad [15]$$

$$R = \gamma^{ij} R_{ij} \qquad [16]$$

The reader should again note that all of the tensors just defined "live" on each and every single spacelike hypersurface, $\Sigma(t)$, and are thus known as hypersurface-intrinsic quantities. In particular, the spatial Riemann tensor, $R_{ijk}{}^l$, which encodes all intrinsic geometric information about $\Sigma(t)$, in no way depends on how the slice is embedded in the spacetime $\mathcal{M}$.

The next step in the 3 + 1 approach involves rewriting the fundamental spacetime line element for the squared proper distance, $ds^2$, between two spacetime events, P and Q, having coordinates $x^\mu$ and $x^\mu + dx^\mu$, respectively,

$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu \qquad [17]$$

[Figure 1: diagram of two neighboring slices, $\Sigma(t)$ and $\Sigma(t + dt)$, with displacements labeled $\alpha\,dt$, $\beta^i\,dt$, $dx^i$, and $dx^\mu$.]

Figure 1 Spacetime displacement in the 3 + 1 approach, following Misner, Thorne, and Wheeler (1973). Solid lines represent surfaces of constant time, $t$; that is, each solid line represents a single spacelike hypersurface, $\Sigma(t)$. Dotted lines denote trajectories of constant spatial coordinate, that is, trajectories with $x^k = \mathrm{const}$. The lapse function, $\alpha(t, x^k)$, encodes the (local) ratio between elapsed coordinate time, $dt$, and elapsed proper time, $d\tau = \alpha\,dt$, for an observer moving normal to the slices (i.e., for an observer with a 4-velocity, $u^\mu$, identical to the hypersurface normal, $n^\mu$). Similarly, the shift vector, $\beta^i(t, x^k)$, describes the shift, $\beta^i(t, x^k)\,dt$, in trajectories of constant spatial coordinate – the dotted lines in the figure – relative to motion perpendicular to the slices. The 3 + 1 form of the line element [18] then follows immediately from an application of the spacetime version of the Pythagorean theorem.

As Figure 1 illustrates, a quick route to the 3 + 1 decomposition of the above expression, and thus of the tensor $g_{\mu\nu}$ itself, is based on an application of the "four-dimensional Pythagorean theorem." In setting up the calculation, one naturally identifies four functions, the scalar lapse, $\alpha(t, x^k)$, and the vector shift, $\beta^i(t, x^k)$, that encode the full coordinate (gauge) freedom of the theory. That is, complete specification of the lapse and shift is equivalent to completely fixing the spacetime coordinate system.

In light of the above discussion, and again referring to Figure 1, one readily deduces the 3 + 1 decomposition of the spacetime line element:

$$ds^2 = -\alpha^2\,dt^2 + \gamma_{ij}\left(dx^i + \beta^i\,dt\right)\left(dx^j + \beta^j\,dt\right) \qquad [18]$$

A rearranged form of this last expression is also often seen in the literature:

$$ds^2 = \left(-\alpha^2 + \beta^k\beta_k\right)dt^2 + 2\beta_k\,dx^k\,dt + \gamma_{ij}\,dx^i dx^j \qquad [19]$$

The following useful identifications of the "time–time," "time–space," and "space–space" pieces of the spacetime metric, $g_{\mu\nu}$, follow immediately from [19]:

$$g_{00} = -\alpha^2 + \beta^i\beta_i \qquad [20]$$

$$g_{0i} = g_{i0} = \beta_i = \gamma_{ik}\beta^k \qquad [21]$$

$$g_{ij} = \gamma_{ij} \qquad [22]$$

This last relation is an example of a useful general result; the purely spatial components, $Q_{ijk\cdots}$, of a

completely covariant, but otherwise arbitrary, spacetime tensor, $Q_{\mu\nu\lambda\cdots}$, constitute the components of a completely covariant spatial tensor.

A straightforward calculation, which provides a good exercise in the use of the 3 + 1 calculus, yields the following equally useful identifications for various pieces of the inverse spacetime metric, $g^{\mu\nu}$:

$$g^{00} = -\alpha^{-2} \qquad [23]$$

$$g^{0i} = g^{i0} = \alpha^{-2}\beta^i \qquad [24]$$

$$g^{ij} = \gamma^{ij} - \alpha^{-2}\beta^i\beta^j \qquad [25]$$

Since the Einstein field equations are equations with, loosely speaking, geometry on one side and matter on the other, tensors built from matter fields must also be decomposed. In particular, it is conventional to define tensors, $\rho$, $j_i$, and $S_{ij}$ that result from various projections of the spacetime stress energy tensor, $T_{\mu\nu}$, onto the hypersurface:

$$\rho \equiv n_\mu n_\nu T^{\mu\nu} \qquad [26]$$

$$j_i \equiv -n_\mu T^\mu{}_i \qquad [27]$$

$$S_{ij} \equiv T_{ij} \qquad [28]$$

For observers with 4-velocities $u^\mu$ equal to $n^\mu$, and only for those observers with $u^\mu = n^\mu$, the above quantities have the interpretation of the locally and instantaneously measured energy density, momentum density, and spatial stresses, respectively. As with the geometric quantities, all of the matter variables, $\rho$, $j_i$, and $S_{ij}$ defined in [26]–[28] are spatial tensors and thus have their indices (if any) raised and lowered with the 3-metric. Note that the identification $S_{ij} = T_{ij}$ is another illustration of the general result mentioned in the context of the previous identification of $\gamma_{ij}$ and $g_{ij}$.

Finally, observing that time parameters are naturally defined in terms of level surfaces (equipotential surfaces), it should be no surprise that the covariant components, $n_\mu$, of the hypersurface normal field,

$$n_\mu = (-\alpha, 0, 0, 0) \qquad [29]$$

are simpler than the components, $n^\mu$, of the normal itself,

$$n^\mu = \left(\alpha^{-1}, -\alpha^{-1}\beta^i\right) \qquad [30]$$

and, in fact, eqn [29] can also be deduced from a quick study of Figure 1.

In the 3 + 1 approach, in addition to the 3-metric, $\gamma_{ij}(t, x^k)$, and coordinate functions, $\alpha(t, x^i)$ and $\beta^i(t, x^i)$, it is convenient to introduce an additional rank-2 symmetric spatial tensor, $K_{ij}(t, x^k)$, known as the extrinsic curvature (or second fundamental form). This additional tensor is analogous to a time derivative of $\gamma_{ij}(t, x^k)$, or, from a Hamiltonian perspective, to a variable that is dynamically conjugate to $\gamma_{ij}(t, x^k)$.

As the name suggests, the extrinsic curvature describes the manner in which the slice $\Sigma(t)$ is embedded in the manifold (to be contrasted with $R_{ijk}{}^l$ defined by [14] which is, as mentioned previously, completely insensitive to the manner in which the hypersurface is embedded in $\mathcal{M}$). Geometrically, $K_{ij}$ is computed by calculating the spacetime gradient of the normal covector field, $n_\mu$, and projecting the result on to the hypersurface,

$$K_{ij} = -\tfrac{1}{2}\left(\nabla_i n_j + \nabla_j n_i\right) \qquad [31]$$

where it must be stressed that $\nabla_\mu$ is the spacetime covariant derivative operator compatible with the 4-metric, $g_{\mu\nu}$; that is, $\nabla_\lambda g_{\mu\nu} = 0$. A straightforward tensor calculus calculation then yields the following, which can be viewed as a definition of the $K_{ij}$:

$$K_{ij} = \frac{1}{2\alpha}\left(-\partial_t \gamma_{ij} + D_i\beta_j + D_j\beta_i\right) \qquad [32]$$

Here, $D_i$ is the spatial covariant derivative operator, compatible with $\gamma_{ij}$ ($D_k\gamma_{ij} = 0$), that was defined previously. Observe that this equation can be easily solved for $\partial_t\gamma_{ij}$ (this will be done below), and thus, in the 3 + 1 approach it is [32] that is the origin of the evolution equations for the 3-metric components, $\gamma_{ij}$.

Einstein's Equations in 3 + 1 Form

The Constraint Equations

As is well known, as a result of the coordinate (gauge) invariance of the theory, general relativity is overdetermined in a sense completely analogous to the situation in electrodynamics with the Maxwell equations. One of the ways that this situation is manifested is via the existence of the constraint equations of general relativity. Briefly, starting from the naive view that the ten metric functions, $g_{\mu\nu}(t, x^i)$, that completely determine the spacetime geometry are all dynamical – that is, that they satisfy second-order-in-time equations of motion – one finds that the Einstein equations do not provide dynamical equations of motion for the lapse, $\alpha$, or the shift, $\beta^i$. Rather, four of the field equations [1] are equations of constraint for the "true" dynamical variables of the theory, $\{\gamma_{ij}, \partial_t\gamma_{ij}\}$, or, equivalently, $\{\gamma_{ij}, K^i{}_j\}$. Note that in the following, the mixed form, $K^i{}_j$, is at times used – again by convention – as the principal representation of the extrinsic curvature tensor (instead of $K_{ij}$ as previously, or $K^{ij}$).
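The metric identifications [20]–[25] and the normal components [29]–[30] given earlier lend themselves to a quick numerical sanity check. The following is a minimal sketch in Python with NumPy, not part of the original article: the function names and the sample values of the lapse, shift, and 3-metric at a single point are hypothetical choices made purely for illustration. It assembles $g_{\mu\nu}$ and $g^{\mu\nu}$ from $\alpha$, $\beta^i$, and $\gamma_{ij}$, and checks that they are mutual inverses and that the normal is a unit timelike vector.

```python
import numpy as np


def spacetime_metric(alpha, beta_up, gamma):
    """Assemble g_{mu nu} from lapse, shift and 3-metric, following [20]-[22]."""
    beta_dn = gamma @ beta_up                  # beta_i = gamma_ik beta^k
    g = np.empty((4, 4))
    g[0, 0] = -alpha**2 + beta_up @ beta_dn    # [20]
    g[0, 1:] = g[1:, 0] = beta_dn              # [21]
    g[1:, 1:] = gamma                          # [22]
    return g


def inverse_spacetime_metric(alpha, beta_up, gamma):
    """Assemble g^{mu nu} from lapse, shift and inverse 3-metric, following [23]-[25]."""
    gamma_inv = np.linalg.inv(gamma)
    g_inv = np.empty((4, 4))
    g_inv[0, 0] = -1.0 / alpha**2                                       # [23]
    g_inv[0, 1:] = g_inv[1:, 0] = beta_up / alpha**2                    # [24]
    g_inv[1:, 1:] = gamma_inv - np.outer(beta_up, beta_up) / alpha**2   # [25]
    return g_inv


def unit_normal(alpha, beta_up):
    """Covariant and contravariant normal components, following [29]-[30]."""
    n_dn = np.array([-alpha, 0.0, 0.0, 0.0])
    n_up = np.concatenate(([1.0 / alpha], -beta_up / alpha))
    return n_dn, n_up


# Hypothetical sample values at one spacetime point (illustration only).
alpha = 1.3
beta_up = np.array([0.1, -0.2, 0.05])
gamma = np.array([[1.2, 0.1, 0.0],
                  [0.1, 0.9, 0.2],
                  [0.0, 0.2, 1.5]])   # symmetric, positive definite

g = spacetime_metric(alpha, beta_up, gamma)
g_inv = inverse_spacetime_metric(alpha, beta_up, gamma)
n_dn, n_up = unit_normal(alpha, beta_up)

inverse_ok = np.allclose(g @ g_inv, np.eye(4))   # g_{mu lam} g^{lam nu} = delta
raised_ok = np.allclose(g_inv @ n_dn, n_up)      # n^mu = g^{mu nu} n_nu
norm = n_up @ n_dn                               # n^mu n_mu, expected close to -1

print(inverse_ok, raised_ok, norm)
```

A pointwise check of this kind says nothing about well-posedness or about the constraints; it only confirms that the algebraic pieces of the decomposition fit together.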

Thus, four of the components of [1] can be written in the form

$$C^\mu\left(\gamma_{ij}, K^i{}_j, \partial_k\gamma_{ij}, \partial_l\partial_k\gamma_{ij}, \partial_k K^i{}_j\right) = T^\mu \qquad [33]$$

where $T^\mu$ depends only on the matter content in the spacetime. Note that in addition to having no dependence on $\partial_t^2\gamma_{ij}$, the constraints are also independent of $\alpha$ and $\beta^i$.

If the Einstein equations [1] are to hold throughout the spacetime, then the constraints [33] must hold on each and every spacelike hypersurface, $\Sigma(t)$, including, crucially, the initial hypersurface, $\Sigma(0)$. From the point of view of Cauchy evolution, this means that the 12 functions, $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k)\}$, constituting the gravitational part of the initial data, are not completely freely specifiable, but must satisfy the four constraints

$$C^\mu\left(\gamma_{ij}(0, x^k), K^i{}_j(0, x^k), \ldots\right) = T^\mu(0, x^k) \qquad [34]$$

However, provided that initial data satisfying these equations are chosen, then – as consistency of the theory demands – the dynamical equations of motion for the $\{\gamma_{ij}, K^i{}_j\}$ (eqns [37] and [38] below) guarantee that the constraints will be satisfied on all future (or past) hypersurfaces, $\Sigma(t)$. In this internal self-consistency, the geometrical Bianchi identities, $\nabla_\mu G^{\mu\nu} = 0$, and the local conservation of stress energy, $\nabla_\mu T^{\mu\nu} = 0$, play crucial roles.

In the 3 + 1 approach, as one would expect, the constraint equations further naturally subdivide into a scalar equation

$$R - K_{ij}K^{ij} + K^2 = 16\pi\rho \qquad [35]$$

and a (spatial) vector equation

$$D_j K^{ij} - D^i K = 8\pi j^i \qquad [36]$$

where the energy and momentum densities, $\rho$ and $j^i = \gamma^{ik} j_k$, are given by [26]–[28]. Equations [35] and [36] are often known as the Hamiltonian and momentum constraint, respectively, not least since the behavior of their solutions as $X \equiv \sqrt{\delta_{ij}\,x^i x^j} \to \infty$ encodes the conserved mass and linear momentum (four numbers) that can be defined in asymptotically flat spacetimes.

In a general 3 + 1 coordinate system, and with an appropriate choice of variables, the constraints can be written as a set of quasilinear elliptic equations for four of the $\{\gamma_{ij}, K^i{}_j\}$ (or, more properly, for certain algebraic combinations of the $\{\gamma_{ij}, K^i{}_j\}$). Thus, especially for 2D and 3D calculations, the setting of initial data for the Cauchy problem in general relativity is itself a highly nontrivial mathematical and computational exercise. Readers wishing more details on this subject are directed to the comprehensive review by Cook (2000).

The Evolution Equations

As discussed above, in the 3 + 1 form of the Einstein equations [1], the spatial metric, $\gamma_{ij}$, and the extrinsic curvature, $K^i{}_j$, are viewed as the dynamical variables for the gravitational field. The remainder of the 3 + 1 equations are thus two sets of six first-order-in-time evolution equations; one set for $\gamma_{ij}$,

$$\partial_t\gamma_{ij} = -2\alpha\,\gamma_{ik}K^k{}_j + \beta^k\partial_k\gamma_{ij} + \gamma_{ik}\partial_j\beta^k + \gamma_{kj}\partial_i\beta^k \qquad [37]$$

and the other set for $K^i{}_j$,

$$\partial_t K^i{}_j = \beta^k\partial_k K^i{}_j - \partial_k\beta^i\,K^k{}_j + \partial_j\beta^k\,K^i{}_k - D^i D_j\alpha + \alpha\left[R^i{}_j + K K^i{}_j + 8\pi\left(\tfrac{1}{2}\delta^i{}_j\,(S - \rho) - S^i{}_j\right)\right] \qquad [38]$$

As also noted previously, the evolution equations [37] for the spatial metric components, $\gamma_{ij}$, follow from the definition of the extrinsic curvature [31]. The derivation of the equations for the extrinsic curvature, on the other hand, requires lengthy, but well-documented, manipulations of the spatial components of the field equations [1].

The (Naive) Cauchy Problem

A naive statement of the Cauchy problem for 3 + 1 numerical relativity is thus as follows: fix a specified number, $N$, of matter fields $\psi_A(t, x^k)$, $A = 1, 2, \ldots, N$, all minimally coupled to the gravitational field, with a total stress tensor, $T_{\mu\nu}$, given by

$$T_{\mu\nu} = \sum_{A=1}^{N} \overset{A}{T}_{\mu\nu} \qquad [39]$$

where $\overset{A}{T}_{\mu\nu}$ is the stress tensor corresponding to the matter field $\psi_A$. Choose a topology for $\Sigma(0)$ (e.g., $\mathbb{R}^3$ with asymptotically flat boundary conditions; $T^3$, with no boundaries, etc.). This also fixes the topology of $\mathcal{M}$ to be $\mathbb{R}\,\times$ the topology of $\Sigma(0)$.

Next, freely specify eight of the 12 $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k)\}$, as well as initial values, $\psi_A(0, x^k)$, for the matter fields. Then determine the remaining four dynamical gravitational fields from the constraints [35] and [36]. This completes the initial data specification.

One must now choose a prescription for the kinematical (coordinate) functions, $\alpha$ and $\beta^i$, so that either explicitly or implicitly, they are completely fixed; for the case of implicit specification, this may well mean that the coordinate functions themselves will satisfy PDEs, which, furthermore, can be of essentially any type in practice (i.e., elliptic, hyperbolic, parabolic, ...). Finally, with consistent initial data, $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k); \psi_A(0, x^k)\}$, in hand, and with a prescription for the coordinate functions, the evolution

equations [37] and [38] can be used to advance the dynamical variables forward or backward in time.

The above description is naive since, apart from a consistent mathematical specification, the most crucial issue in the solution of a time-dependent PDE as a Cauchy problem is that the problem be "well posed." Roughly speaking, this means that solutions do not grow without bound ("blow-up") without physical cause, and that small, smooth changes to initial data yield correspondingly small, smooth changes to the evolved data. In short, the Cauchy problem must be stable, and whether or not a particular subset of the equations displayed in this section yields a well-posed problem is a complicated and delicate issue, especially in the generic 3D case. The reader is thus again cautioned against blind application of any of the equations displayed in this article.

Boundary Conditions

In principle, because all spacelike hypersurfaces, $\Sigma(t)$, in a pure Cauchy evolution are edgeless, and provided that the initial data $\{\gamma_{ij}(0, x^k),\, K^i{}_j(0, x^k);\ \ldots_A(0, x^k)\}$ is consistent with asymptotic flatness, or whatever other condition is appropriate given the topology of the $\Sigma(t)$, there are essentially no boundary conditions to be imposed on the dynamical variables, $\{\gamma_{ij}(t, x^k),\, K^i{}_j(t, x^k)\}$, during Cauchy evolution. Note that asymptotic flatness generally requires that

$$ \lim_{X \to \infty} \gamma_{ij} = f_{ij} + O\!\left(\frac{1}{X}\right) \qquad [40] $$

and

$$ \lim_{X \to \infty} K^i{}_j = O\!\left(\frac{1}{X^2}\right) \qquad [41] $$

where $X$ is defined by

$$ X \equiv \sqrt{\delta_{ij}\, x^i x^j} \qquad [42] $$

as previously, and $f_{ij}$ is the flat 3-metric. Similarly, should the lapse, $\alpha$, and shift, $\beta^i$, be constrained by elliptic PDEs (as is frequently the case in practice), then the only natural place to set boundary conditions is at spatial infinity, and then, provided that the frame at spatial infinity is inertial, with coordinate time $t$ measuring proper time, one should have

$$ \lim_{X \to \infty} \alpha = 1 + O\!\left(\frac{1}{X}\right) \qquad [43] $$

and

$$ \lim_{X \to \infty} \beta^i = O\!\left(\frac{1}{X}\right) \qquad [44] $$

It is critical to note at this point, however, that in the vast bulk of past and current work in numerical relativity, including most of the ongoing work in 3D, the Einstein equations [1] have been solved, not as a pure Cauchy problem, but as a mixed initial-value/boundary-value problem (IBVP). That is, in the discretization process in which the continuum equations [1] are replaced with algebraic equations, the continuum domain [6]–[7] is typically replaced with a truncated spatial domain

$$ |x^i| \le X^i_{\max} \qquad [45] $$

where the $X^i_{\max}$ are a priori specified constants (parameters of the computational solution) that define the extremities of the "computational box." As one might expect, the theory underlying stability and well-posedness of IBVPs, especially for differential systems as complicated as [1], is even more involved than for the pure initial-value case, and is another very active area of research in both mathematical and numerical relativity (see, e.g., Friedrich and Nagy (1999)).

See also: Critical Phenomena in Gravitational Collapse; Einstein Equations: Initial Value Formulation; Fluid Mechanics: Numerical Methods; General Relativity: Overview; Geometric Analysis and General Relativity; Gravitational Waves; Hamiltonian Reduction of Einstein's Equations; Magnetohydrodynamics; Spacetime Topology, Causal Structure and Singularities; Symmetric Hyperbolic Systems and Shock Waves.

Further Reading

Baumgarte T and Shapiro SL (2001) Numerical relativity and compact binaries. Physics Reports 376: 41–131.
Cook G (2000) Initial data for numerical relativity. Living Reviews in Relativity 3: 5 (lrr-2000-5).
Font JA (2003) Numerical hydrodynamics in general relativity. Living Reviews in Relativity 6: 4 (lrr-2003-4).
Frauendiener J (2004) Conformal infinity. Living Reviews in Relativity 7: 1 (lrr-2004-1).
Friedrich H and Nagy G (1999) The initial boundary value problem for Einstein's vacuum field equation. Communications in Mathematical Physics 201: 619–655.
Hough J and Rowan S (2000) Gravitational wave detection by interferometry (ground and space). Living Reviews in Relativity 3: 3 (lrr-2000-3).
Lehner L (2001) Numerical relativity: a review. Classical and Quantum Gravity 18: R25–R86.
Misner CW, Thorne KS, and Wheeler JA (1973) Gravitation. San Francisco: W.H. Freeman.
Reula OA (1998) Hyperbolic methods for Einstein's equations. Living Reviews in Relativity 1: 3 (lrr-1998-3).
Winicour J (2001) Characteristic evolution and matching. Living Reviews in Relativity 4: 3 (lrr-2001-3).

Confinement see Quantum Chromodynamics

Conformal Geometry see Two-dimensional Conformal Field Theory and Vertex Operator Algebras

Conservation Laws see Symmetries and Conservation Laws

Constrained Systems
M Henneaux, Université Libre de Bruxelles, Brussels, Belgium
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Consider a dynamical system with coordinates $q^i$ ($i = 1, \ldots, n$) and Lagrangian $L(q^i, \dot q^i)$ (field theory is formally covered by regarding the spatial coordinates as a continuous index). When going to the Hamiltonian formulation, it is usually assumed that the Legendre transformation between the velocities $\dot q^i$ and the momenta

$$ p_i = \frac{\partial L}{\partial \dot q^i} \qquad [1] $$

can be inverted to yield the velocities as functions of the $q$'s and the $p$'s. This "regular" situation occurs for most systems appearing in standard classical mechanics and enables one to proceed to the Hamiltonian formulation of the theory without difficulty.

In field theory, however, the regular case is the exception rather than the rule. This is due to gauge invariance and first-order Lagrangians.

• Gauge invariance. A system possesses gauge symmetries if it is invariant under transformations that involve arbitrary functions of time (gauge transformations). In that case, the solution of the equations of motion with given initial data is not unique, since it is always possible to perform a gauge transformation in the course of the evolution without changing the initial data. It is then clear that the Legendre transformation cannot be invertible, for if it were, one could rewrite the equations of motion in the standard canonical form $\dot q^i = \partial H/\partial p_i$, $\dot p_i = -\partial H/\partial q^i$. These canonical equations are in normal form and have a unique solution for given initial data, which would contradict the presence of a gauge symmetry.

A simple example that illustrates this phenomenon is given by the following model for three variables $q^1$, $q^2$, and $\lambda$, the Lagrangian of which reads

$$ L = \tfrac{1}{2}\left[(\dot q^1 - \lambda)^2 + (\dot q^2 - \lambda)^2\right] \qquad [2] $$

This model is inspired by electromagnetism: the variables $q^1$ and $q^2$ play a role somewhat similar to that of the spatial components of the vector potential, while $\lambda$ corresponds to the temporal component. The Lagrangian is invariant under the gauge transformations

$$ q^1 \to q^1 + \varepsilon, \qquad q^2 \to q^2 + \varepsilon, \qquad \lambda \to \lambda + \dot\varepsilon \qquad [3] $$

where $\varepsilon$ is an arbitrary function of time. The conjugate momenta are

$$ p_1 = \dot q^1 - \lambda, \qquad p_2 = \dot q^2 - \lambda, \qquad \pi = 0 $$

One cannot invert the Legendre transformation since one cannot express the velocity $\dot\lambda$ in terms of the momenta.

• First-order Lagrangians. Fermionic fields obey first-order equations. Their Lagrangian is linear in the derivatives, so that the conjugate momenta $p_i$ depend on the coordinates $q^i$ only. It is then clearly impossible to express the velocities in terms of the momenta through the Legendre transformation. More generally, any first-order Lagrangian with or without gauge symmetry leads to a noninvertible Legendre transformation.

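The obstruction in both cases can be made explicit through the matrix of second derivatives of $L$ with respect to the velocities, since the Legendre transformation [1] is locally invertible for the velocities only if that matrix is nonsingular. The following worked check is a sketch for the model [2]; the names $W$, $Q^A$, $a_i$, and $V$ are introduced here for illustration and are not notation from the article.

```latex
% Velocity Hessian of the Lagrangian [2], coordinates ordered as Q^A = (q^1, q^2, \lambda).
\[
W_{AB} \;=\; \frac{\partial^2 L}{\partial \dot Q^A \, \partial \dot Q^B}
       \;=\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad \det W = 0, \qquad \operatorname{rank} W = 2 < 3 .
\]
% The vanishing row is the statement \pi = \partial L / \partial \dot\lambda = 0.

% Invariance of [2] under [3]: each squared combination is separately unchanged.
\[
(\dot q^k + \dot\varepsilon) - (\lambda + \dot\varepsilon) \;=\; \dot q^k - \lambda ,
\qquad k = 1, 2 .
\]

% For a Lagrangian linear in the velocities (assumed generic form),
\[
L = a_i(q)\,\dot q^i - V(q)
\quad\Longrightarrow\quad
W_{ij} = 0 \ \text{identically}, \qquad p_i = a_i(q) .
\]
```

In the first case the rank drops by one, matching the single momentum relation $\pi = 0$; in the first-order case the rank is zero, so every momentum is fixed by the coordinates.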
A simple system that exhibits this feature is described by the Lagrangian

$$ L = z^2 \dot z^1 - \tfrac{1}{2}(z^2)^2 \qquad [4] $$

for two bosonic degrees of freedom $(z^1, z^2)$. This is in fact the canonical form of the Lagrangian for a free particle in one dimension ($z^2$ is the momentum conjugate to the position $z^1$): the system is already in Hamiltonian form. There is no gauge invariance, but because the Lagrangian is first order, the Legendre transformation with [4] as starting point,

$$ p_1 = z^2, \qquad p_2 = 0 \qquad [5] $$

is noninvertible for the velocities (which do not even appear in the formulas for the momenta).

Dirac showed how to develop the Hamiltonian formalism in the case when the Legendre transformation is not invertible. One can still reformulate the equations in phase space and write them in terms of brackets with the Hamiltonian, but a new major feature emerges, namely the canonical variables are no longer free. Rather, the permissible phase-space points are constrained to be on the so-called "constrained surface." For this reason, systems for which the Legendre transformation is not invertible are also called "constrained Hamiltonian systems." We shall adopt this terminology here.

The purpose of this article is to explain the main ideas underlying the Dirac method. To simplify the discussions and to focus on the features peculiar to the Dirac construction, we shall assume as a rule that all necessary smoothness conditions are fulfilled by the functions, surfaces, etc., appearing in the formalism. How to develop the analysis when some of the smoothness conditions are not fulfilled is of definite interest but goes beyond the scope of this review. We shall also assume, for definiteness, that all the variables are bosonic in order to avoid straightforward but somewhat cumbersome sign factors in the formulas.

General Theory

Dirac Algorithm

Primary constraints. When the Legendre transformation [1] cannot be inverted, the momenta $p_i$ do not span an $n$-dimensional space but are constrained by relations

$$ \phi_m(q, p) = 0, \qquad m = 1, \ldots, M \qquad [6] $$

which follow from their definition. These equations reduce to identities when the momenta are replaced by their expression [1] in terms of the coordinates and the velocities. They are called primary constraints. We shall assume that the matrix

$$ \frac{\partial(\phi_m)}{\partial(p_i, q^i)} $$

is everywhere of constant (maximum) rank $M$ on the phase-space surface defined by eqns [6], which is assumed to be smooth. This surface is of dimension $2n - M$.

Canonical Hamiltonian. The next step in the Dirac procedure is to define the canonical Hamiltonian $H$ through

$$ H = \dot q^i p_i - L \qquad [7] $$

As shown by Dirac, $H$ can be re-expressed as a function $H(q, p)$ of the momenta and the coordinates, even when the Legendre transformation is not invertible: the canonical Hamiltonian $H$ depends on the velocities only through the $p_i$'s. Furthermore, the original equations of motion in Lagrangian form are equivalent to the Hamiltonian equations

$$ \dot q^i = \frac{\partial H}{\partial p_i} + u^m \frac{\partial \phi_m}{\partial p_i} \qquad [8] $$

$$ \dot p_i = -\frac{\partial H}{\partial q^i} - u^m \frac{\partial \phi_m}{\partial q^i} \qquad [9] $$

$$ \phi_m(q, p) = 0 \qquad [10] $$

where the $u^m$'s are parameters, some of which will be determined through the consistency algorithm to be discussed shortly. (In [7]–[9] and everywhere below, there is a summation over the repeated indices.)

Secondary constraints. The equations of motion [8] and [9] can be rewritten as

$$ \dot F = [F, H] + u^m [F, \phi_m] \qquad [11] $$

where $F = F(q, p)$ is any function of the canonical variables. Here, the Poisson bracket is defined as usual by

$$ [G, F] = \frac{\partial G}{\partial q^i}\frac{\partial F}{\partial p_i} - \frac{\partial G}{\partial p_i}\frac{\partial F}{\partial q^i} \qquad [12] $$

If one takes for $F$ one of the primary constraints $\phi_m$, one should get zero, $\dot\phi_m = 0$. This yields the consistency conditions

$$ [\phi_m, H] + u^{m'} [\phi_m, \phi_{m'}] = 0 \qquad [13] $$

These conditions can imply further restrictions on the canonical variables and/or impose conditions on the
variables $u^m$. Any new relation $X(q, p) = 0$ on the canonical variables leads, in turn, to a further consistency condition $\dot X = [X, H] + u^{m'} [X, \phi_{m'}] = 0$, which can either further restrict the constraint surface or fix more of the variables $u^m$. Constraints that follow from the consistency algorithm are called "secondary constraints." Finally, one is left with a certain number of secondary constraints, which are denoted by $\phi_k = 0$, $k = M + 1, \ldots, M + K$. We assume again that all the constraints (primary and secondary) define a smooth surface, called the "constraint surface," and fulfill the condition that $\partial(\phi_k)/\partial(q^i, p_i)$ is of maximum rank $J \equiv M + K$ on the constraint surface. (We also assume for simplicity that there is no branching in the consistency algorithm.)

Restrictions on the u's. Having a complete set of constraints

$$ \phi_j = 0, \qquad j = 1, \ldots, M + K \equiv J \qquad [14] $$

we can now investigate more precisely the restrictions on the variables $u^m$. These read

$$ [\phi_j, H] + u^m [\phi_j, \phi_m] \approx 0, \qquad j = 1, \ldots, J \qquad [15] $$

where the notation $\approx$ means "equal modulo the constraints." In [15], $m$ is summed from 1 to $M$. Equations [15] are a set of $J$ linear, inhomogeneous equations for the $u$'s, with coefficients that are functions of the canonical variables $q^i$, $p_i$. The general solution of this system is of the form

$$ u^m = U^m + u^a V_a^m \qquad [16] $$

where $U^m$ is a particular solution and where the $V_a^m$ ($a = 1, \ldots, A$) provide a complete set of independent solutions of the homogeneous system

$$ V_a^m [\phi_j, \phi_m] \approx 0 \qquad [17] $$

The coefficients $u^a$ ($a = 1, \ldots, A$) are completely arbitrary.

We thus see the emergence of another new feature in the theory, in addition to the appearance of constraints. It is that the general solution of the equations of motion may contain arbitrary functions of time (when $A \neq 0$), in agreement with the possible presence of a gauge symmetry.

First- and Second-Class Constraints

First- and second-class functions. A function $F(q, p)$ is called a first-class function if it generates a canonical transformation that maps the constraint surface on itself. Thus, $F(q, p)$ is first class if its Poisson brackets with all the constraints vanish weakly (i.e., are zero on the constraint surface),

$$ [F, \phi_j] \approx 0, \qquad j = 1, \ldots, J \qquad [18] $$

A function is second class otherwise, that is, if there is at least one constraint $\phi_j$ such that $[F, \phi_j] \neq 0$ (not even weakly). Second-class functions generate canonical transformations that do not leave the constraint surface invariant. Since canonical transformations that map the constraint surface on itself form a group, the Poisson bracket of two first-class functions is itself a first-class function.

Because the system is constrained to lie on the constraint surface, the only allowed canonical transformations are those that are generated by first-class functions. The importance of the distinction between first-class and second-class functions stems from this elementary fact. Note, in particular, that the time evolution is generated, as it should be, by a first-class generator, since the equations of motion [11] can be rewritten as

$$ \dot F \approx [F, H'] + u^a [F, V_a^m \phi_m] \qquad [19] $$

with

$$ H' = H + U^m \phi_m \qquad [20] $$

One has both $[H', \phi_j] \approx 0$ and $[V_a^m \phi_m, \phi_j] \approx 0$.

Splitting of the constraints. One can separate the constraints between first-class and second-class constraints. This can be achieved by considering the matrix $C_{jj'}$ of the Poisson brackets of the constraints,

$$ C_{jj'} = [\phi_j, \phi_{j'}], \qquad j, j' = 1, \ldots, J \qquad [21] $$

One has the following theorem due to Dirac.

Theorem 1. If $\det C_{jj'} \approx 0$, there exists at least one first-class constraint among the $\phi_j$'s.

Proof. Straightforward: if $\det C_{jj'} \approx 0$, one can find a nontrivial solution $\lambda^j$ of $\lambda^j C_{jj'} \approx 0$. The corresponding constraint $\lambda^j \phi_j$ is easily verified to be first class.

By redefining the constraints as $\phi_j \to \phi'_j = a_j{}^{j'} \phi_{j'}$ with $a_j{}^{j'}(q, p)$ invertible, one can bring the Poisson brackets of the constraints to the form

$$ [\gamma_a, \gamma_b] \approx 0, \qquad [\gamma_a, \chi_\alpha] \approx 0, \qquad [\chi_\alpha, \chi_\beta] \approx C_{\alpha\beta} \qquad [22] $$

with $(\phi'_j) \equiv (\gamma_a, \chi_\alpha)$ and where the matrix $C_{\alpha\beta}$ is invertible. (We assume, for simplicity, throughout that the rank of the matrix $C_{jj'}$ is constant on the constraint surface ("regular case").) In this representation, the constraints are completely split into first-class constraints ($\gamma_a$) and second-class
constraints ($\chi_\alpha$): there is no first-class constraint left among the $\chi_\alpha$'s, and the set $\{\gamma_a\}$ exhausts all the first-class constraints. Note that now the index $a = 1, \ldots, A, A + 1, \ldots, \bar A$ runs over all (primary and secondary) first-class constraints.

This separation of the constraints into first-class and second-class constraints is quite important because, as already seen above, the first-class constraints generate admissible canonical transformations, while the second-class constraints do not.

For a bosonic system, the matrix $C_{\alpha\beta}$ is antisymmetric. As $C_{\alpha\beta}$ is invertible, this implies that the number of second-class constraints is even. In the fermionic case, $C_{\alpha\beta}$ is symmetric (in the fermionic sector) and, therefore, the number of second-class constraints can be even or odd.

First-class constraints and gauge symmetries. The first-class constraints not only map the constraint surface on itself, but generate, in fact, transformations that do not change the physical state of the system, that is, gauge transformations. Indeed, the presence of arbitrary functions in the solutions of the equations of motion indicates that the $q$'s and the $p$'s involve some redundancy and are not all physically distinct. Only those phase-space functions whose time evolution does not depend on the arbitrary functions $u^a$ are observables.

That the first-class constraints generate gauge transformations is rather clear in the case of the first-class primary constraints, since these appear explicitly in the generator of the time evolution multiplied by arbitrary functions. That it also holds for the first-class secondary constraints is known as the "Dirac conjecture." This conjecture can be proved under reasonable assumptions (see, e.g., Henneaux et al. 1990). The reason that the secondary first-class constraints also correspond to gauge transformations is that they appear in the brackets of the Hamiltonian with the primary first-class constraints. Thus, different choices of arbitrary functions $u^a$ in the dynamical equations of motion will lead to phase-space points that differ by a canonical transformation whose generator involves the secondary first-class constraints as well.

In any case, as noted below, one must identify the phase-space points in the same orbit generated by all the first-class constraints (primary and secondary) in order to get a reduced space with a symplectic structure ("reduced phase space"). For this reason, one postulates that the first-class constraints always generate gauge transformations, even for systems which are counterexamples to the Dirac conjecture (i.e., in that case, one defines the gauge transformations as being the transformations generated by the first-class constraints).

The extended Hamiltonian $H_E$ is defined to be the sum of the first-class Hamiltonian [20] and of all the first-class constraints $\gamma_a$ multiplied by an arbitrary Lagrange multiplier,

$$ H_E = H' + v^a \gamma_a \qquad [23] $$

(with $a$ summed from 1 to $\bar A$). It is the generator of the time evolution in which the complete gauge symmetry is fully displayed.

Elimination of second-class constraints: Dirac brackets. Second-class constraints do not generate permissible canonical transformations, since they do not map the constraint surface on itself. For this reason, it is convenient to eliminate them. This can consistently be done by using the Dirac brackets instead of the Poisson brackets. By definition, the Dirac bracket $[F, G]_D$ of two phase-space functions $F$ and $G$ is given by

$$ [F, G]_D = [F, G] - [F, \chi_\alpha]\, C^{\alpha\beta}\, [\chi_\beta, G] \qquad [24] $$

where $C^{\alpha\beta}$ is the inverse to $C_{\alpha\beta}$,

$$ C^{\alpha\beta} C_{\beta\gamma} = \delta^\alpha{}_\gamma $$

(which exists since the $\chi_\alpha$'s are second class). As shown by Dirac, the bracket [24] is indeed a bracket (antisymmetry, derivation property, and Jacobi identity). Furthermore, it fulfills the crucial property that the Dirac bracket of anything with any second-class constraint is zero,

$$ [F, \chi_\alpha]_D = 0 \qquad (F \text{ arbitrary}) \qquad [25] $$

Thus, one can consistently eliminate the second-class constraints and replace the Poisson bracket by the Dirac bracket. Once this is done, one has fewer canonical variables and only first-class constraints remain (if any). It also follows from the definition that the Dirac bracket of two first-class functions is (weakly) equal to their Poisson bracket.

Gauge conditions. One can push the reduction procedure further and eliminate the first-class constraints by means of gauge conditions. Gauge conditions $C_a = 0$ are conditions on the phase-space variables which do not follow from the Lagrangian and which have the property that they cut each gauge orbit once and only once. Since the gauge transformations are generated by the first-class constraints, this requirement is (locally) equivalent to

$$ [C_a, \gamma_b]\, \varepsilon^b \approx 0 \;\Rightarrow\; \varepsilon^b \approx 0 \qquad [26] $$
That is, the constraints $(\gamma_a, C_b)$ form together a second-class system: there is no first-class constraint left once the conditions $C_a = 0$ are included. One can then eliminate all the constraints and gauge conditions and introduce the corresponding Dirac bracket. For gauge-invariant functions, this Dirac bracket coincides with the original Poisson bracket.

The reduced phase space is the unconstrained space obtained after this reduction, equipped with the Dirac bracket. It has dimension $2n - s - 2\bar A$, where $2n$ is the dimension of the original phase space, $s$ is the number of second-class constraints, and $\bar A$ is the number of first-class constraints. In the bosonic case, this number is even (as it should be) because $s$ is even. One sees that "first-class constraints strike twice" since they need gauge conditions.

The observables of the theory are the reduced phase-space functions. They form a Poisson algebra, the relevant reduced phase-space bracket being the Dirac bracket associated with all the constraints and gauge conditions. The symplectic structure defined in the reduced phase space is nondegenerate because one has removed all the first-class constraints.

The definition of reduced phase space given above is useful in practice but has the conceptual drawback of relying on gauge conditions. This approach does not display clearly its intrinsic significance and, furthermore, in the case of the so-called Gribov problems (global obstructions to cutting each gauge orbit once and only once), may yield the incorrect expectation that the reduced phase space does not exist. We shall provide a more intrinsic definition below, which does not involve gauge conditions.

Examples

First example (see eqn [2]). There is here one primary constraint, namely $\pi = 0$. The canonical Hamiltonian is $\tfrac{1}{2}\left((p_1)^2 + (p_2)^2\right) + \lambda(p_1 + p_2)$. The consistency algorithm yields the secondary constraint $p_1 + p_2 = 0$ and no condition on the $u$'s. The constraints are first class. They generate the gauge transformations $q^1 \to q^1 + \varepsilon$, $q^2 \to q^2 + \varepsilon$, and $\lambda \to \lambda + \eta$, which coincide with the Lagrangian gauge transformations if one identifies $\eta$ with $\dot\varepsilon$ ($\varepsilon$ and $\dot\varepsilon$ are, of course, independent at any given time). One can fix the gauge by means of the gauge conditions $\lambda = 0$, $q^1 + q^2 = 0$. The reduced phase space is two-dimensional and the observables can be identified with the functions of the gauge-invariant variables $\tfrac{1}{2}(q^1 - q^2)$ and $p_1 - p_2$, which are conjugate. Any other gauge condition leads to the same reduced phase space.

Second example (see eqn [4]). The primary constraints are $p_1 - z^2 = 0$ and $p_2 = 0$ and define a two-dimensional plane in the four-dimensional phase space $(z^1, z^2, p_1, p_2)$. The consistency algorithm forces $u^1 = z^2$ and $u^2 = 0$ and does not bring any further constraint. The constraints are second class since $[p_2, p_1 - z^2] = 1$. One can eliminate $p_1$ and $p_2$ through the constraints. The Dirac brackets of the remaining variables vanish, except $[z^1, z^2]_D = 1$. The reduced phase space is the space of the $z$'s, with $z^2$ conjugate to $z^1$. The Hamiltonian is the free-particle Hamiltonian, $H = \tfrac{1}{2}(z^2)^2$. Thus, one recovers the original description which was already in Hamiltonian form. (The recognition that a system is already in first-order form often enables one to shortcut some aspects of the Dirac procedure by not introducing the unnecessary momenta which would in any case be eliminated in the end.)

Quantization

The phase space of physical interest is the reduced phase space and the physical algebra is the algebra of the observables. The quantization of the theory then amounts to quantizing the algebra of the observables. This can be achieved along two different lines:

1. Reduce then quantize: In this direct approach, one represents as quantum operators only the reduced phase-space functions. There is no operator associated with non-gauge-invariant functions.
2. Quantize then reduce: In this approach, one represents as quantum operators the bigger algebra of functions of all the phase-space variables. One must then take into account the constraints. The second-class constraints are enforced as operator equations, which is consistent with the correspondence rule that the commutator in the quantum theory is $i\hbar$ times the Dirac bracket,

$$ AB - BA = i\hbar\, [A, B]_D \qquad [27] $$

(plus higher-order terms in $\hbar$). The first-class constraints are implemented in a more subtle way. It would be inconsistent to impose them as operator equations since in general $[\gamma_a, F]_D \neq 0$ (even in the Dirac bracket). What one does is to impose them as conditions on the physical states: these are defined as the states annihilated by the first-class constraints,

$$ \gamma_a |\psi\rangle = 0 \qquad [28] $$

For simple systems, it is easy to verify that the two procedures are equivalent. There is yet another
approach, in which one extends the system rather than reduce it. This is the Becchi–Rouet–Stora–Tyutin (BRST) approach, in which the new variables are called ghosts.

Geometric Description

We defined above first-class and second-class constraints through algebraic means. It turns out that these definitions also have a geometrical interpretation, which sheds considerable light on their nature.

The phase-space symplectic 2-form $\omega$ induces, by pullback, a 2-form $\omega_\Sigma$ on the constraint surface $\Sigma$. While $\omega$ is of maximal rank, this may not be the case for the induced $\omega_\Sigma$, which may be degenerate. In fact, the rank of $\omega_\Sigma$ fails to be equal to the maximum rank $2n - J$ (where $J$ is the total number of constraints) by precisely the number $\bar A$ of first-class constraints.

Indeed, the Hamiltonian vector fields $X_a$ associated with the first-class constraints are tangent to the constraint surface $\Sigma$ and are null eigenvectors of $\omega_\Sigma$,

$$ \omega_\Sigma(X_a, Y) = 0 \qquad \forall\, Y \text{ tangent to } \Sigma \qquad [29] $$

as an immediate consequence of the first-class property. Here, all first-class constraints (primary and secondary) yield a null eigenvector. The integral surfaces of the vector fields $X_a$ are the gauge orbits. The reduced phase space is nothing else but the quotient space of the constraint surface by the gauge orbits. The 2-form induced in the quotient space is invertible because one has removed all degeneracy directions (including the ones associated with secondary first-class constraints). Reaching the reduced phase space falls under the scope of Hamiltonian reduction. The observables are the functions on the reduced phase space.

Thus, the reduced phase space is obtained through a two-step procedure. First, one restricts the functions to functions on the constraint surface $\Sigma$. One may view the algebra $C^\infty(\Sigma)$ of smooth functions on $\Sigma$ as the quotient algebra $C^\infty(P)/\mathcal{N}$ of the algebra $C^\infty(P)$ of smooth phase-space functions by the ideal $\mathcal{N}$ of phase-space functions that vanish on the constraint surface $\Sigma$. The second step in the reduction procedure is to impose the gauge-invariance condition on the functions in $C^\infty(\Sigma)$, that is, to impose that they are constant along the gauge orbits $\mathcal{O}$. Assuming all necessary smoothness and regularity conditions to be fulfilled (i.e., that the orbits fiber $\Sigma$, which is, for instance, the case if the gauge orbits are the orbits of a free and proper group action), one may denote the algebra of observables as $C^\infty(\Sigma/\mathcal{O})$. This algebra is a Poisson algebra because the induced 2-form on the quotient space $\Sigma/\mathcal{O}$ is nondegenerate. The algebraic description of the observables underlies the BRST construction.

It is interesting to note that in the covariant approach to phase space, a similar two-step reduction procedure occurs. What plays the role of the constraint surface is the stationary surface in the space of all histories $q^i(t)$ of the dynamical variables. The gauge symmetry acts on this space and the reduced phase space is just the quotient space. One can establish the equivalence of the two descriptions (Barnich et al. 1991).

See also: Batalin–Vilkovisky Quantization; BRST Quantization; Canonical General Relativity; Operads; Perturbative Renormalization Theory and BRST; Quantum Dynamics in Loop Quantum Gravity; Quantum Field Theory: A Brief Introduction.

Further Reading

Anderson JL and Bergmann PG (1951) Constraints in covariant field theories. Physical Review 83: 1018.
Barnich G, Henneaux M, and Schomblond C (1991) On the covariant description of the canonical formalism. Physical Review D 44: 939.
Dirac PAM (1950) Generalized Hamiltonian dynamics. Canadian Journal of Mathematics 2: 129.
Dirac PAM (1967) Lectures on Quantum Mechanics. New York: Academic Press.
Flato M, Lichnerowicz A, and Sternheimer D (1976) Deformations of Poisson brackets, Dirac brackets and applications. Journal of Mathematical Physics 17: 1754.
Hanson A, Regge T, and Teitelboim C (1976) Constrained Hamiltonian Systems. Rome: Accad. Naz. dei Lincei.
Henneaux M and Teitelboim C (1992) Quantization of Gauge Systems. Princeton: Princeton University Press.
Henneaux M, Teitelboim C, and Zanelli J (1990) Gauge invariance and degree of freedom count. Nuclear Physics B 332: 169.
Marsden JE and Weinstein A (1974) Reduction of symplectic manifolds with symmetry. Reports on Mathematical Physics 5: 121.

Constructive Quantum Field Theory


G Gallavotti, Università di Roma ‘‘La Sapienza,’’ relativistic covariance, Ruelle–Haag scattering
Rome, Italy theory: the ‘‘reconstruction problem.’’
ª 2006 G Gallavotti. Published by Elsevier Ltd. The characteristic problem for the construction of
All rights reserved. quantum fields is (1) and here attention will be
confined to it with the further restriction to the
paradigmatic massive scalar fields cases. The dimen-
Euclidean Quantum Fields sion d of the spacetime will be d = 2, 3 unless
The construction of a relativistic quantum field is specified otherwise.
still an open problem for fields in spacetime Given a cube  of side L,   R d , consider the
dimension d  4. The conceptual difficulty that following functional integral on the space of the fields on
sometimes led to fear an incompatibility between , that is, on functions ’(N)
x
defined for x 2 ,
nontrivial quantum systems and special relativity Z  Z 
ðNÞ4 ðNÞ2
has however been solved in the case of dimension ZN ð; f Þ ¼ exp  N ’x þ N ’x
d = 2, 3 although, so far, has not influenced the 
 
corresponding debate on the foundations of quan- ðNÞ
þN þ fx ’x dx PN ðd’ðNÞ Þ ½1
tum mechanics, still much alive.
It began in the early 1960s with Wightman’s work
on the axioms and the attempts at understanding the The fields ’(N)
x
are called ‘‘Euclidean’’ fields with
mathematical aspects of renormalization theory and ultraviolet cutoff N > 0, fx is a smooth function with
with Hepps’ renormalization theory for scalar fields. compact support bounded by jfx j  1 (for definiteness),
The breakthrough idea was, perhaps, Nelson’s the constants N > 0, N , N are called ‘‘bare cou-
realization that the problem could really be studied in Euclidean form. A solution in dimensions $d = 2, 3$ was obtained in the 1960s and 1970s through a remarkable series of papers by Nelson, Glimm, Jaffe, and Guerra. While the works of Nelson and Guerra relied on the “Euclidean approach” (see below) and on $d = 2$, the early works of Glimm and Jaffe dealt with $d = 3$, making use of the “Minkowskian approach” (based on second quantization) but already employing a “multiscale analysis” technique. The latter received great impetus and systematization through the adoption of Wilson's views and methods on renormalization: in physics terminology, renormalization group methods; a point of view taken here following the Euclidean approach. The solution dealt initially with scalar fields but it has subsequently been considerably extended.

The Euclidean approach studies quantum fields through the following problems:

1. existence of the functional integrals defining the generating functions (see below) of the probability distribution of the interacting fields in finite volume: the “ultraviolet stability problem,”
2. existence of the infinite-volume limit of the generating functions: the “infrared problem,” and
3. check that the infinite-volume generating functions satisfy the axioms needed to pass from the Euclidean, probabilistic, formulation to a Minkowskian formulation guaranteeing the existence of the Hamiltonian operator,

plings,” and $P_N$ is a Gaussian probability distribution defining the free-field distribution with mass $m$ and ultraviolet cutoff $N$; the probability distribution $P_N$ is determined by its “covariance” $C^{(N)}_{\xi,\eta} \stackrel{\rm def}{=} \int \varphi^{(N)}_\xi \varphi^{(N)}_\eta \, dP_N$, which in the physics literature is called a “propagator,” given by

\[ C^{(N)}_{\xi,\eta} = \frac{1}{(2\pi)^d} \sum_{n \in \mathbb{Z}^d} \int \frac{e^{i p \cdot (\xi - \eta + nL)}}{p^2 + m^2}\, \chi_N(|p|)\, d^d p \qquad [2] \]

The sum over the integers $n \in \mathbb{Z}^d$ is introduced so that the field $\varphi^{(N)}_\xi$ is periodic over the box $\Lambda$: this is not really necessary, as in the limit $L \to \infty$ either translation invariance would be recovered or lack of it properly understood, but it makes the problem more symmetric and generates a few technical simplifications; here $\chi_N(z)$ is a “regularizer” and a standard choice is

\[ \chi_N(|p|) = \frac{m^2 (\gamma^{2N} - 1)}{p^2 + \gamma^{2N} m^2} \]

with $\gamma > 1$, which is such that

\[ \frac{\chi_N(|p|)}{p^2 + m^2} \equiv \frac{1}{p^2 + m^2} - \frac{1}{p^2 + \gamma^{2N} m^2} \equiv \sum_{h=1}^{N} \left( \frac{1}{p^2 + \gamma^{2(h-1)} m^2} - \frac{1}{p^2 + \gamma^{2h} m^2} \right) \qquad [3] \]

here $\gamma > 1$ can be chosen arbitrarily: so $\gamma = 2$. If $d > 3$, the above regularization will not be sufficient and a $\chi_N$ decaying faster than $p^{-2}$ would be needed.
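The telescoping identity [3] is easy to check numerically. The following is a minimal sketch, assuming the standard regularizer quoted above; the function names (`chi`, `regularized`, `scale_slice`, `telescoped`) and the default values $m = 1$, $\gamma = 2$ are illustrative choices, not notation from the text.

```python
import math


def chi(p: float, N: int, m: float = 1.0, gamma: float = 2.0) -> float:
    """Regularizer chi_N(|p|) = m^2 (gamma^(2N) - 1) / (p^2 + gamma^(2N) m^2)."""
    g2n = gamma ** (2 * N)
    return m * m * (g2n - 1.0) / (p * p + g2n * m * m)


def regularized(p: float, N: int, m: float = 1.0, gamma: float = 2.0) -> float:
    """Left-hand side of [3]: chi_N(|p|) / (p^2 + m^2)."""
    return chi(p, N, m, gamma) / (p * p + m * m)


def scale_slice(p: float, h: int, m: float = 1.0, gamma: float = 2.0) -> float:
    """h-th term of the sum in [3]: momenta between scales h-1 and h."""
    lo = gamma ** (2 * (h - 1)) * m * m
    hi = gamma ** (2 * h) * m * m
    return 1.0 / (p * p + lo) - 1.0 / (p * p + hi)


def telescoped(p: float, N: int, m: float = 1.0, gamma: float = 2.0) -> float:
    """Right-hand side of [3]: sum of the N single-scale slices."""
    return sum(scale_slice(p, h, m, gamma) for h in range(1, N + 1))


if __name__ == "__main__":
    # Both sides of [3] agree for several cutoffs and momenta.
    for N in (1, 4, 8):
        for p in (0.0, 0.5, 3.0, 100.0):
            assert math.isclose(regularized(p, N), telescoped(p, N), rel_tol=1e-9)
    # At large |p| the regularized propagator decays like p^-4:
    # p^4 * regularized(p, N) approaches m^2 (gamma^(2N) - 1).
    print(1.0e4 ** 4 * regularized(1.0e4, 2))
```

Each slice in the sum carries momenta of a single scale $\gamma^h m$, which is the starting point of the multiscale decomposition of the field.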

A simple estimate yields, if $\varepsilon \in (0, 1)$ is fixed and $c$ is suitably chosen,

\[ |C^{(N)}_{\xi,\eta}| \le c\, \gamma^{(d-2)N} e^{-m|\xi - \eta|}, \qquad |C^{(N)}_{\xi,\eta} - C^{(N)}_{\xi,\eta'}| \le c\, \gamma^{(d-2)N} \left( \gamma^{N} m |\eta - \eta'| \right)^{\varepsilon} \qquad [4] \]

with $\gamma^{(d-2)N}$ interpreted as $N$ if $d = 2$.

The quantity

\[ \mathcal{S}(f) = \log \frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)} \]

defines a “generating function” of a probability distribution $P_{\rm int}$ over the fields on $\Lambda$ which will be called the “distribution with $\varphi^4$-interaction” regularized on $\Lambda$ and at length scale $m^{-1}\gamma^{-N}$: the integral, in [1],

\[ V_N(\varphi^{(N)}) \stackrel{\rm def}{=} \int_\Lambda \left( \lambda_N \varphi^{(N)4}_\xi + \mu_N \varphi^{(N)2}_\xi + \nu_N + f_\xi \varphi^{(N)}_\xi \right) d^d\xi \qquad [5] \]

will be called the “interaction potential” with external field $f$. The regularization is introduced to guarantee that the integral [1], $\int e^{-V_N}\, dP_N$, is well defined if $\lambda_N > 0$. The moments of $P_{\rm int}$ are the functional derivatives of $\mathcal{S}(f)$: they are called “Schwinger functions.”

The problem (1) can now be made precise: it is to show the existence of $\lambda_N, \mu_N, \nu_N$ so that the limit

\[ \lim_{N \to \infty} \frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)} \]

exists for all $f$ and is not Gaussian, that is, it is not the exponential of a quadratic form in $f$, which would be the case if $\lambda_N, \mu_N \to 0$ fast enough. The last requirement is of course essential because the Gaussian case describes, in the physical interpretation, free fields and noninteracting particles, that is, it is trivial. Note that $\nu_N$ does not play a role: its introduction is useful to be able to study separately the numerator and the denominator of the fraction $Z_N(\Lambda, f)/Z_N(\Lambda, 0)$.

For more details, the reader is referred to Wightman and Gårding (1965), Streater and Wightman (1964), Nelson (1966), Guerra (1972), Osterwalder and Schrader (1973), and Simon (1974).

The Regularized Free Field

Since the propagator, see [4], decays exponentially over a scale $m^{-1}$ and is smooth over a scale $m^{-1}\gamma^{-N}$, the fields $\varphi^{(N)}_\xi$ sampled with distribution $P_N$ are rather singular objects. Their properties cannot be described by a single length scale: they are extremely large for large $N$, take independent values only beyond distances of order $m^{-1}$ but, at the same time, they look smooth only on the much smaller scale $m^{-1}\gamma^{-N}$. Their essential feature is that, fixed $\varepsilon < 1$, for example $\varepsilon = 1/2$, with $P_N$-probability 1 there is $B > 0$ such that (interpreting $\gamma^{(d-2)N/2}$ as $\sqrt{N}$ if $d = 2$)

\[ |\varphi^{(N)}_\xi| \le B\, \gamma^{N(d-2)/2}, \qquad |\varphi^{(N)}_\xi - \varphi^{(N)}_\eta| < B\, \gamma^{N(d-2)/2} \left( \gamma^{N} m |\xi - \eta| \right)^{\varepsilon/2} \qquad [6] \]

and furthermore the probability of the relations in [6] will be $N$-independent, that is, $\varphi^{(N)}_\xi$ are bounded and roughly of size $\gamma^{N(d-2)/2}$ as $N \to \infty$ and, on a very small length scale $m^{-1}\gamma^{-N}$, almost constant.

Substantial control on the field $\varphi^{(N)}_\xi$ statistically sampled with distribution $P_N$ can be obtained by decomposing it, through [3], into “components of various scales”: that is, as a sum of statistically mutually independent fields whose properties are entirely characterized by a single scale of length. This means that they have size of order 1 and are independent and smooth on the same length scale.

Assuming the side of $\Lambda$ to be an integer multiple of $m^{-1}$, let $Q_h$ be a pavement of $\Lambda$ into boxes of side $m^{-1}\gamma^{-h}$, imagined hierarchically arranged so that the boxes of $Q_h$ are exactly paved by those of $Q_{h+1}$.

Define $\zeta^{(h)}_\xi$ to be the random field with propagator $C^{(h)}_{\xi,\eta}$ with Fourier transform

\[ \sum_{n \in \mathbb{Z}^d} \left( \frac{1}{p^2 + \gamma^{-2} m^2} - \frac{1}{p^2 + m^2} \right) e^{i n \cdot p\, \gamma^{h} L} \]

so that $\varphi^{(N)}_\xi$ and its propagator $C^{(N)}_{\xi,\eta}$ can be represented, see [2], [3], as

\[ \varphi^{(N)}_\xi \equiv \sum_{h=1}^{N} \gamma^{h(d-2)/2}\, \zeta^{(h)}_{\gamma^h \xi}, \qquad C^{(N)}_{\xi,\eta} = \sum_{h=1}^{N} \gamma^{h(d-2)}\, C^{(h)}_{\gamma^h \xi,\, \gamma^h \eta} \qquad [7] \]

where the fields $\zeta^{(h)}$ are independently distributed Gaussian fields. Note that the fields $\zeta^{(h)}$ are also almost identically distributed because their propagator is obtained by periodizing over the period $\gamma^h L$ the same function

\[ C^{(0)}_{\xi,\eta} \stackrel{\rm def}{=} \int \frac{e^{i p \cdot (\xi - \eta)}}{(2\pi)^d} \left( \frac{1}{p^2 + \gamma^{-2} m^2} - \frac{1}{p^2 + m^2} \right) d^d p \]

that is, their propagator is

\[ C^{(h)}_{\xi,\eta} = \sum_{n \in \mathbb{Z}^d} C^{(0)}_{\xi,\, \eta + \gamma^h n L} \]

The reason why they are not exactly equally distributed is that the field $\zeta^{(h)}_\xi$ is periodic with period $\gamma^h L$ rather than $L$. But proceeding with care the sum over $n$ in the above expressions can be essentially ignored: this is a small price to pay if one wants translation invariance built into the analysis from the beginning.

The representation [7] defines a “multiscale representation” of the field $\varphi^{(N)}_\xi$. Smoothness properties for the field $\varphi^{(N)}_\xi$ can be read from those of its “components” $\zeta^{(h)}$. Define, for $\Delta \in Q_0$,

\[ \|\zeta^{(h)}\|_\Delta = \max_{\substack{\xi \in \Delta,\ \eta \in \Lambda \\ |\xi - \eta| \le m^{-1}}} \left( |\zeta^{(h)}_\xi| + \sigma\, \frac{|\zeta^{(h)}_\xi - \zeta^{(h)}_\eta|}{|\xi - \eta|^{1/4}} \right) \qquad [8] \]

and $\sigma$ will be chosen $= 0$ or $= 1$ as needed (in practice $\sigma = 0$ if $d = 2$ and $\sigma = 1$ if $d = 3$): $\sigma = 1$ will allow us to discuss some smoothness properties of the fields which will be necessary (e.g., if $d = 3$). Then the size $\|\zeta\|_\Delta$ of any field $\zeta^{(h)}$, for all $h \ge 1$, is estimated by

\[ P\Big( \max_{\Delta \in Q_0} \|\zeta\|_\Delta \ge B \Big) \le c\, e^{-c' B^2} |\Lambda|, \qquad P\big( \|\zeta\|_\Delta \ge B_\Delta,\ \forall \Delta \in D \big) \le \prod_{\Delta \in D} c\, e^{-c' B_\Delta^2} \qquad [9] \]

where $P$ is the Gaussian probability distribution of $\zeta$, $D$ is any collection of boxes $\Delta \in Q_0$ and $c, c' > 0$ are suitable constants. The [9] imply in particular [6]. The estimates [9] follow from the Markovian nature of the Gaussian field $\zeta^{(h)}$, that is, from the fact that the propagator is the Green's function of an elliptic operator (of fourth order, see the first of [3]) with constant coefficients, which implies also the inequalities (fixing $\varepsilon \in (0, 1)$)

\[ |C^{(h)}_{\xi,\eta}| \equiv \Big| \int \zeta_\xi \zeta_\eta\, P(d\zeta) \Big| \le c\, e^{-m c' |\xi - \eta|}, \qquad |C^{(h)}_{\xi,\eta} - C^{(h)}_{\xi,\eta'}| \le c\, (m |\eta - \eta'|)^{\varepsilon} \qquad [10] \]

where $|\xi - \eta|$ is reinterpreted as the distance between $\xi, \eta$ measured over the periodic box $\gamma^h \Lambda$ (hence $|\xi - \eta|$ differs from the ordinary distance only if the latter is of the order of $\gamma^h L$). The interpretation of [10] is that $\zeta^{(h)}_\xi$ are essentially bounded variables which, on scale $m^{-1}$, are essentially constant and furthermore beyond length $m^{-1}$ are essentially independently distributed.

For more details, the reader is referred to Wilson (1970, 1972) and Gallavotti (1981, 1985).

Perturbation Theory

The naive approach to the problem is to fix $\lambda_N \equiv \lambda > 0$ and to develop $Z_N(\Lambda, f)$ or, more conveniently and equivalently, $(1/|\Lambda|) \log Z_N(\Lambda, f)$ in powers of $\lambda$. If one fixes a priori $\mu_N, \nu_N$ independent of $N$, however, even a formal power series is not possible: this is trivially due to the divergence of the coefficients of the power series, already to second order, for generic $f$ in the limit $N \to \infty$. Nevertheless it is possible to determine $\mu_N(\lambda), \nu_N(\lambda)$ as functions of $N$ and $\lambda$ so that a formal power series exists (to all orders in $\lambda$): this is the key result of renormalization theory.

To find the perturbative expansion, the simplest way is to use a graphical representation of the coefficients of the power expansion in $\lambda, \mu_N, \nu_N, f$ and the Gaussian integration rules, which yield (after a classical computation) that the coefficient of $\lambda^n \mu_N^p f_{\xi_1} \cdots f_{\xi_r}$ is obtained by considering the graph elements shown in Figure 1, where the segments will be called half-lines and the graph elements will be called, respectively, “coupling” or “$\varphi^4$-vertex,” “mass vertex,” “vacuum vertex,” and “external vertex.”

Figure 1 The graph elements representing $\varphi^{(N)4}_\xi$, $\varphi^{(N)2}_\xi$, a constant, and $\varphi^{(N)}_\xi$.

The half-lines of the graph elements are considered distinct (i.e., imagine a label attached to distinguish them). Then consider all possible connected graphs $G$ obtained by first drawing, respectively, $n, p, r$ graph elements in Figure 1, which are not vacuum vertices, with their nodes marked by points in $\Lambda$ named $\xi_1, \ldots, \xi_n, \xi_{n+1}, \ldots, \xi_{n+p+r}$; and form all possible graphs obtained by attaching pairs of half-lines emerging from the vertices of the graph elements. These are the “nontrivial graphs.”

Furthermore, consider also the single “trivial” graph formed just by the third graph element and consisting of a single point. All graphs obtained in this way are particular Feynman graphs.

Given a nontrivial graph $G$ (there are many of them) we define its value to be the product

\[ W_G(\xi_1, \ldots, \xi_n, \xi_{n+1}, \ldots, \xi_{n+p+r}) = (-1)^{n+p+r}\, \frac{\lambda^n \mu_N^p \prod_{j=1}^{r} f_{\xi_{n+p+j}}}{n!\, p!\, r!} \prod_{\ell} C^{(N)}_{\xi_\ell, \eta_\ell} \qquad [11] \]

where the last product runs over all pairs $\ell = (\xi_\ell, \eta_\ell)$ of half-lines of $G$ that are joined and connect two vertices labeled by points $\xi_\ell, \eta_\ell$: call “line of $G$” any such pair. If the graph consists of the single vacuum

vertex its value will be $\nu_N$. The series for $(1/|\Lambda|) \log Z_N(\Lambda, f)$ is then

\[ \nu_N + \frac{1}{|\Lambda|} \sum_G \int W_G(\xi_1, \ldots, \xi_{n+p+r}) \prod_{j=1}^{n+p+r} d\xi_j \qquad [12] \]

and the integral will be called the integrated graph value.

Suppose first that $\mu_N = \nu_N = 0$. Then if a graph $G$ contains subgraphs like those in Figure 2, the corresponding respective contribution to the integral in [12] (considering only the integrals over $\eta$ and suitably taking care of the combinatorial factors) is a factor obtained by integrating over $\xi$ the quantities

\[ -6\lambda\, C^{(N)}_{\alpha\xi} C^{(N)}_{\xi\xi} C^{(N)}_{\xi\beta} \qquad \text{or} \qquad \frac{4^2 \cdot 3!}{2!}\, \lambda^2 \int C^{(N)}_{\alpha\xi}\, C^{(N)3}_{\xi\eta}\, C^{(N)}_{\eta\beta}\, d\eta \qquad [13] \]

which if $d = 3$ diverge as $N \to \infty$ as $\gamma^N$ or, respectively, as $N$; the second factor does not diverge in dimension $d = 2$ while the first still diverges as $N$. The divergences arise from the fact that as $\xi - \eta \to 0$ the propagator behaves as $|\xi - \eta|^{-1}$ if $d = 3$ or as $\log |\xi - \eta|$ if $d = 2$, all the way until saturation occurs at distance $|\xi - \eta| \simeq m^{-1}\gamma^{-N}$: for this reason the latter divergences are called “ultraviolet divergences.”

However, if we set $\mu_N \ne 0$, then for every graph containing a subgraph like those in Figure 2 there is another one identical except that the points $\alpha, \beta$ are connected via a mass vertex, see Figure 1, with the vertex in $\xi$, by a line $\alpha\xi$ and a line $\xi\beta$; the new graph value receives a contribution from the mass vertex inserted in $\xi$ between $\alpha$ and $\beta$ simply given by a factor $-\mu_N$. Therefore if we fix, for $d = 3$,

\[ \mu_N = -6\lambda\, C^{(N)}_{\xi\xi} + \frac{4^2 \cdot 3!}{2}\, \lambda^2 \int C^{(N)3}_{\xi\eta}\, d\eta \stackrel{\rm def}{=} -6\lambda\, C^{(N)}_{\xi\xi} + \delta\mu_N \qquad [14] \]

we can simply consider graphs which do not contain any mass graph element and in which there are no subgraphs like the first in Figure 2, while the subgraphs like the second in Figure 2 do not contribute a factor $\int C^{(N)}_{\alpha\xi} C^{(N)3}_{\xi\eta} C^{(N)}_{\eta\beta}\, d\eta$ but a renormalized factor $\int C^{(N)}_{\alpha\xi} C^{(N)3}_{\xi\eta} \big( C^{(N)}_{\eta\beta} - C^{(N)}_{\xi\beta} \big)\, d\eta$. If $d = 2$, we only need to define $\mu_N$ as the first term on the right-hand side (RHS) of [14] and we can leave the subgraphs like the second in Figure 2 as they are (without any renormalization).

Graphs without external lines are called vacuum graphs and there are a few such graphs which are divergent. Namely, if $d = 3$, they are the first three drawn in Figure 3; furthermore, if $\mu_N$ is set to the above nonzero value a new vacuum graph, the fourth in Figure 3, can be formed. Such graphs contribute to the graph value, respectively, the terms in the sum

\[ -3\lambda\, C^{(N)2}_{\xi_1\xi_1} + \frac{4!}{2}\, \lambda^2 \int C^{(N)4}_{\xi_1\xi_2}\, d\xi_2 - \frac{2^3 \cdot 3!^3}{3!}\, \lambda^3 \int C^{(N)2}_{\xi_1\xi_2} C^{(N)2}_{\xi_2\xi_3} C^{(N)2}_{\xi_3\xi_1}\, d\xi_2\, d\xi_3 - \mu_N\, C^{(N)}_{\xi_1\xi_1} \qquad [15] \]

and diverge, respectively, as $\gamma^{2N}, \gamma^{N}, N, \gamma^{2N}$ if $d = 3$ while, if $d = 2$, only the first and the last (see [14]) diverge, like $N^2$.

Therefore, if we fix $\nu_N$ as minus the quantity in [15] we can disregard graphs like those in Figure 3; if $d = 2$, $\nu_N$ can be defined to be the sum of the first and last terms in [15].

The formal series in $\lambda$ and $f$ thus obtained is called the “renormalized series” for the field $\varphi^4$ in dimension $d = 2$ or, respectively, $d = 3$. Note that with the given definitions and choices of $\mu_N, \nu_N$ the only graphs $G$ that need to be considered to construct the expansion in $\lambda$ and $f$ are formed by the first and last graph elements in Figure 1, paying attention that the graphs in Figure 3 do not contribute and, if $d = 3$, the graphs with subgraphs like the second in Figure 2 have to be computed with the modification described.

In the next section, it will be shown that the above are the only sources of divergences as $N \to \infty$ and therefore the problem of studying [1] is solved at the level of formal power series by the subtraction in [14]. This also shows that giving a meaning to the series thus obtained is likely to be much easier if $d = 2$ than if $d = 3$.

The coefficients of order $k$ of the expansion in $\lambda$ of $(1/|\Lambda|) \log Z_N(\Lambda, f)$ can be ordered by the number $2n$ of vertices representing external fields, and have the form $\int S^{(k)}_{2n}(\xi_1, \ldots, \xi_{2n}) \prod_{i=1}^{2n} (f_{\xi_i}\, d\xi_i)$: the kernels $S^{(k)}_{2n}$ are the Schwinger functions of order $2n$, see the section “Euclidean quantum fields.”

Figure 2 Divergent subgraphs, if d = 3. If d = 2 only the first diverges.

Figure 3 Divergent vacuum graphs.
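As a worked check of the divergence rates quoted for the first quantity in [13], the coincident-point propagator can be evaluated in closed form. The computation below is a sketch under an assumption not made in the text: only the $n = 0$ term of the periodization sum in [2] is kept (an infinite-volume approximation), with the regularizer of [3].

```latex
% d = 3 uses  \int_0^\infty \frac{p^2\,dp}{(p^2+a^2)(p^2+b^2)} = \frac{\pi}{2(a+b)}
% with a = m, b = \gamma^N m;  d = 2 uses an elementary logarithmic integral.
\[
C^{(N)}_{\xi\xi}\Big|_{d=3}
  = \int \frac{d^3p}{(2\pi)^3}
    \left(\frac{1}{p^2+m^2}-\frac{1}{p^2+\gamma^{2N}m^2}\right)
  = \frac{m\,(\gamma^{N}-1)}{4\pi},
\qquad
C^{(N)}_{\xi\xi}\Big|_{d=2}
  = \int \frac{d^2p}{(2\pi)^2}
    \left(\frac{1}{p^2+m^2}-\frac{1}{p^2+\gamma^{2N}m^2}\right)
  = \frac{N\log\gamma}{2\pi}.
\]
```

Under this assumption the factor $C^{(N)}_{\xi\xi}$ grows as $\gamma^N$ if $d = 3$ and as $N$ if $d = 2$, in agreement with the rates stated after [13].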

Remark If $d = 4$, the regularization at cutoff $N$ in [2] is not sufficient, as in the subtraction procedure smoothness of the first derivatives of the field $\varphi^{(N)}$ is necessary, while the regularization [2] does not even imply [6], that is, not even Hölder continuity. A higher regularization is necessary (i.e., using a $\chi_N$ like the square of the $\chi_N$ in [3]). Furthermore, the subtractions discussed in the case $d = 3$ are not sufficient to generate a formal power series and many more subtractions are needed: for instance, graphs with a subgraph like the one in Figure 4 would give a contribution to the graph value which is a factor

\[ \lambda^2 \ell_N \stackrel{\rm def}{=} \frac{2 \cdot 6^2}{2!}\, \lambda^2 \int_\Lambda C^{(N)2}_{\xi\eta}\, d\eta \]

also divergent as $N \to \infty$ proportionally to $N$.

Although this divergence could be canceled by changing $\lambda$ into $\lambda_N = \lambda + \lambda^2 \ell_N$, the previously discussed cancellations would be affected and a change in the value of $\mu_N$ would become necessary; furthermore, the subtraction in [14] will not be sufficient to make the graphs finite, not even to second order in $\lambda$, unless a new term $\alpha_N \int_\Lambda (\partial_\xi \varphi^{(N)}_\xi)^2\, d\xi$ with $\alpha_N = \frac{1}{2} \lambda^2 \int \partial_\eta C^{(N)3}_{\xi\eta}\, (\xi - \eta)^2$ is added in the exponential in [1]. But all this will not be enough and still new divergences, proportional to $\lambda^3$, will appear.

And so on indefinitely, the consequence being that it will be necessary to define $\lambda_N, \mu_N, \alpha_N, \nu_N$ as formal power series in $\lambda$ (with coefficients diverging as $N \to \infty$) in order to obtain a formal power series in $\lambda$ for [1] in which all coefficients have a finite limit as $N \to \infty$. Thus, the interpretation of the formal renormalized series in the case $d = 4$ is substantially different and naturally harder than in the cases $d = 2, 3$. Beyond formal perturbation expansions, the case $d = 4$ is still an open problem: the most widespread conjecture is that the series cannot be given a meaning other than setting to 0 all coefficients of $\lambda^j$, $j > 0$. In other words, the conjecture claims, there should be no nontrivial solution to the ultraviolet problem for scalar $\varphi^4$ fields in $d = 4$. But this is far from being proved, even at a heuristic level. The situation is simpler if $d \ge 5$: in such cases, it is impossible to find formal power series in $\lambda$ for $(1/|\Lambda|) \log Z_N(\Lambda, f)$, even allowing $\lambda_N, \mu_N, \alpha_N, \nu_N$ to be formal power series in $\lambda$ with divergent coefficients.

Figure 4 The simplest new divergent subgraph in d = 4.

The distinctions between the cases $d = 2, 3, 4, > 4$ explain the terminology given to the $\varphi^4$-scalar field theories, calling them super-renormalizable if $d = 2, 3$, renormalizable if $d = 4$ and nonrenormalizable if $d > 4$. Since the (divergent) coefficients in the formal power series defining $\lambda_N, \mu_N, \alpha_N, \nu_N$ are called counter-terms, the $\varphi^4$-scalar fields require finitely many counter-terms (see [14]) in the super-renormalizable cases and infinitely many in the renormalizable case. The nonrenormalizable cases ($d > 4$) cannot be treated in a way analogous to the renormalizable ones.

For more details, the reader is referred to Gallavotti (1985), Aizenman (1982), and Fröhlich (1982).

Finiteness of the Renormalized Series, d = 2, 3: “Power Counting”

Checking that the renormalized series is well defined to all orders is a simple dimensional estimate characteristic of many multiscale arguments that in physics have become familiar under the name of “renormalization group arguments.”

Consider a graph $G$ with $n + r$ vertices built over $n$ graph elements with vertices $\xi_1, \ldots, \xi_n$, each with four half-lines, and $r$ graph elements with vertices $\xi_{n+1}, \ldots, \xi_{n+r}$ representing the external fields: as remarked in the previous section, these are the only graphs to be considered to form the renormalized series.

Develop each propagator into a sum of propagators as in [7]. The value of the graph $G$ will, as a consequence, be represented as a sum of values of new graphs obtained from $G$ by adding scale labels on its lines, and the value of each such graph will be computed as a product of factors in which a line joining $\xi\eta$ and bearing a scale label $h$ will contribute with $C^{(h)}_{\xi\eta}$ replacing $C^{(N)}_{\xi\eta}$. To avoid proliferation of symbols, we shall call the graphs obtained in this way, i.e., with the scale labels attached to each line, still $G$: no confusion should arise as we shall, henceforth, only consider graphs $G$ with each line carrying also a scale label.

The scale labels added on the lines of the graph $G$ allow us to organize the vertices of $G$ into “clusters”: a cluster of scale $h$ consists of a maximal set of vertices (of the graph elements in the graph) connected by lines of scale $h' \ge h$ among which one at least has scale $h$.

It is convenient to consider the vertices of the graph elements as “trivial” clusters of highest scale: conventionally call them clusters of scale $N + 1$. The clusters can be of “first generation” if they contain only trivial clusters, of “second generation”

if they contain only clusters which are trivial or of the first generation, and so on.

Imagine enclosing in a box the vertices of graph elements inside a cluster of the first generation and then into a larger box the vertices of the clusters of the second generation and so on: the set of boxes ordered by inclusion can then be represented by a rooted tree graph whose nodes correspond to the clusters and whose “top points” are nodes representing the trivial clusters (i.e., the vertices of the graph). If the maximum number of nodes that have to be crossed to reach a top point of the tree starting from a node $v$ is $n_v$ ($v$ included and the top nodes included), then the node $v$ represents a cluster of the $n_v$th generation. The first node before the root is a cluster containing all vertices of $G$, and the root of the tree will not be considered a node and it can conventionally bear the scale label 0: it represents symbolically the value of the graph.

For instance, in Figure 5 a tree is drawn: its nodes correspond to clusters whose scale is indicated next to them; in the second part of the drawing, the trivial clusters as well as the clusters of the first generation are enclosed into boxes.

Then consider the next generation clusters, that is, the clusters which only contain clusters of the first generation or trivial ones, and draw boxes enclosing all the graph vertices that can be reached from each of them by descending the tree, etc. Figure 6 represents all boxes (of any generation) corresponding to the nodes of the tree in Figure 5. The representations of the clusters of a graph $G$ by a tree or by hierarchically ordered boxes (see Figures 5 and 6) are completely equivalent provided inside each box not representing a top point of the tree the scale $h_v$ of the corresponding cluster $v$ is marked. For instance, in the case of Figure 6 one gets Figure 7. By construction, if two top points $\xi$ and $\eta$ are inside the same box $b_v$ of scale $h_v$ but not in inner boxes, then there is a path of graph lines joining $\xi$ and $\eta$ all of which have scales $\ge h_v$ and one at least has scale $h_v$.

Figure 5 A tree and its clusters of generation 1 and 2.

Figure 6 All clusters of any generation for the tree in Figure 5.

Figure 7 The clusters in Figure 6 after affixing the scale labels.

Given a graph $G$, fix one of its points $\xi_1$ (say) and integrate the absolute value of the graph over the positions of the remaining points. The exponential decay of the propagators implies that if a point $\eta$ is linked to a point $\eta'$ by a line of scale $h$ the integration over the position of $\eta'$ is essentially constrained to extend only over a distance $\gamma^{-h} m^{-1}$. Furthermore, the maximum size of the propagator associated with a line of scale $h$ is bounded proportionally to $\gamma^{(d-2)h}$. Therefore, recalling that $|f_\xi|$ is supposed bounded by 1, the mentioned integral can be immediately bounded by

\[ \frac{\lambda^n}{n!\, r!}\, C^{n+r}\, I \stackrel{\rm def}{=} \frac{\lambda^n C^{n+r}}{n!\, r!} \prod_{\ell} \gamma^{\frac{d-2}{2} h_\ell} \prod_{v} \gamma^{-d\, h_v (s_v - 1)} \qquad [16] \]

where, $C$ being a suitable constant, the first product is over the half-lines $\ell$ composing the graph lines and the second is over the tree nodes (i.e., over the clusters of the graph $G$), $s_v$ is the number of subclusters contained in the cluster $v$ but not in inner clusters; and in [16] the scale of a half-line $\ell$ is $h_\ell$ if $\ell$ is paired with another half-line to form a line $\ell$ (in the graph $G$) of scale label $h_\ell$.

Denoting by $v'$ the cluster immediately containing $v$ in $G$, by $n^{\rm inner}_v$ the number of half-lines in the cluster $v$, by $n_v, r_v$ the numbers of graph elements of the first type or of the fourth type in Figure 1 with vertices in the cluster $v$, and denoting by $n^e_v$ the number of lines which are not in the cluster $v$ but have one extreme on a vertex in $v$ (“lines external to $v$”), the identities ($k = 0$)

\[ \sum_{v > {\rm root}} (h_v - k)(s_v - 1) \equiv \sum_{v > {\rm root}} (h_v - h_{v'})(n_v + r_v - 1), \qquad \sum_{v > {\rm root}} (h_v - k)\, n^{\rm inner}_v \equiv \sum_{v > {\rm root}} (h_v - h_{v'})\, \tilde{n}^{\rm inner}_v \qquad [17] \]

with

\[ \tilde{n}^{\rm inner}_v \stackrel{\rm def}{=} 4 n_v + r_v - n^e_v \]

hold, so that the estimate [16] can be elaborated into

\[ I \le \prod_{v > r} \gamma^{-\rho_v (h_v - h_{v'})}, \qquad \rho_v \stackrel{\rm def}{=} -d + (4 - d)\, n_v + \frac{d+2}{2}\, r_v + \frac{d-2}{2}\, n^e_v \qquad [18] \]

where $h_{v'} = k = 0$ if $v$ is the first nontrivial node (i.e., $v' = {\rm root}$), and an estimate of the integral of the absolute value of the graphs $G$ with given tree structure but different scale labels is proportional to $\sum_{\{h_v\}} I < \infty$ if (and only if) $\rho_v > 0$, $\forall v$.

But there may be clusters $v$ with only two external lines, $n^e_v = 2$, and two graph vertices inside, for which $\rho_v = 0$. However, this can happen only if $d = 3$ and in only one case: namely if the graph $G$ contains a subgraph of the second type in Figure 2 and the three intermediate lines form a cluster $v$ of scale $h_v$ while the other two lines are external to it, hence on scale $h' < h_v$. In this case, one has to remember that the subtraction in the previous section has led to a modification of the contribution of such a subgraph to the value of the graph (integrated over the position labels of the vertices). As discussed in the previous section, the change amounts to replacing the propagator $C^{(h')}_{\eta,\beta}$ by $C^{(h')}_{\eta,\beta} - C^{(h')}_{\xi,\beta}$.

This improves, in [18], the estimate of the contribution of the line joining $\eta$ to $\beta$ from being proportional to $\int C^{(h_v)3}_{\xi\eta} C^{(h')}_{\eta\beta}\, d\eta$ to being proportional to $\int C^{(h_v)3}_{\xi\eta} \big( C^{(h')}_{\eta\beta} - C^{(h')}_{\xi\beta} \big)\, d\eta$; and this changes the contribution of the line $\eta\beta$ from $\gamma^{(d-2)h'}$ to $\int e^{-\gamma^{h_v} m |\xi - \eta|} \big( \gamma^{h'} |\xi - \eta| \big)^{1/2}\, d\eta$ because $C^{(h')}$ is regular on scale $\gamma^{-h'} m^{-1}$, see [10] with $\varepsilon = 1/2$.

Since $\xi, \eta$ are in a cluster of higher scale $h_v$ this means that the estimate is improved by $\gamma^{-(1/2)(h_v - h')}$.

In terms of the final estimate, this means that $\rho_v$ in [18] can be improved to $\bar\rho_v = \rho_v + 1/2$ for the clusters for which $\rho_v = 0$. Hence, the integrated value of the graph $G$ (after taking also into account the integration over the initially selected vertex $\xi_1$, trivially giving a further factor $|\Lambda|$ by translation invariance), and summed over the possible scale labels, is bounded proportionally to $|\Lambda| \sum_{\{h_v\}} I < \infty$ once the estimate of $I$ is improved as described.

Note that the graphs contributing to the perturbation series for $(1/|\Lambda|) \log Z_N(\Lambda, f)$ to order $\lambda^n$ are finitely many because the number $r$ of external vertices is $r \le 2n + 2$ (since graphs must be connected). Hence, the perturbation series is finite to all orders in $\lambda$.

The above is the renormalizability proof of the scalar $\varphi^4$-fields in dimension $d = 2, 3$. The theory is renormalizable even if $d = 4$ as mentioned in the remark at the end of the previous section. The analysis would be very similar to the above: it is just a slightly more involved power-counting argument.

For more details, the reader is referred to Hepp (1966), Gallavotti (1985), sections 8 and 16.

Asymptotic Freedom (d = 2, 3). Heuristic Analysis

Finiteness to all orders of the perturbation expansions is by no means sufficient to prove the existence of the ultraviolet limit for $Z_N(\Lambda, f)$ or for $(1/|\Lambda|) \log Z_N(\Lambda, f)$: and a priori it might not even be necessary. For this purpose, the first step is to check uniform (upper and lower) boundedness of $Z_N(\Lambda, f)$ as $N \to \infty$.

The reason behind the validity of a bound $e^{|\Lambda| E_-(\lambda, f)} \le Z_N(\Lambda, f) \le e^{|\Lambda| E_+(\lambda, f)}$ with $E_\pm(\lambda, f)$ cutoff independent has been made very clear after the introduction of the renormalization group methods in field theory. The approach studies the integral $Z_N(\Lambda, f)$ recursively, decomposing the field $\varphi^{(N)}_\xi$ into its regular components $\zeta^{(h)}_\xi$, see [7], and integrating first over $\zeta^{(N)}$, then over $\zeta^{(N-1)}$ and so on.

The idea emerges naturally if the potential $V_N$ in [1] and [5] is written in terms of the “normalized” variables $X^{(N)}_\xi \stackrel{\rm def}{=} \gamma^{-N(d-2)/2} \varphi^{(N)}_\xi$, see [6]; here if $d = 2$ the factor $\gamma^{-(d-2)N/2}$ is interpreted as $N^{-1/2}$.

The key remark is that, as far as the integration over the small-scale component $\zeta^{(N)}$ is concerned, the field $X^{(N)}_\xi$ is a sum of two fields of size of order 1 (statistically),

\[ X^{(N)}_\xi \equiv \zeta^{(N)}_{\gamma^N \xi} + \gamma^{-(d-2)/2}\, X^{(N-1)}_\xi \]

if $d = 2$ this becomes

\[ X^{(N)}_\xi \equiv \frac{1}{N^{1/2}}\, \zeta^{(N)}_{\gamma^N \xi} + \frac{(N-1)^{1/2}}{N^{1/2}}\, X^{(N-1)}_\xi \]

and it can be considered to be smooth on scale $m^{-1}\gamma^{-N}$ (also statistically). Hence it is approximately constant and of size of order $O(1)$ on the small cubes $\Delta$ of volume $\gamma^{-dN} m^{-d}$ of the pavement $Q_N$ introduced before [7]; at the same time it can be considered to take (statistically) independent values on different cubes of $Q_N$. This is suggested by the inequalities [8]–[10].

Therefore, it is natural to decompose the potential $V_N$, see [5], as a sum over the small cubes $\Delta$ of volume $\gamma^{-dN} m^{-d}$ of the pavement $Q_N$ as (see [14] for the definition of $\mu_N, \nu_N$), taking henceforth $m = 1$,

\[ V_N(\zeta^{(N)}) \stackrel{\rm def}{=} \gamma^{-Nd} \sum_{\Delta \in Q_N} \int_\Delta \left( \lambda\, \gamma^{2(d-2)N} X^{(N)4}_\xi + \mu_N\, \gamma^{(d-2)N} X^{(N)2}_\xi + \nu_N + f_\xi\, \gamma^{\frac{d-2}{2} N} X^{(N)}_\xi \right) \frac{d\xi}{|\Delta|} \qquad [19] \]

where $\gamma^{(d-2)N}$ is interpreted as $N$ if $d = 2$. Hence, if $d = 3$ it is

\[ V_N(\zeta^{(N)}) \stackrel{\rm def}{=} \gamma^{-N} \sum_{\Delta \in Q_N} \int_\Delta \left( \lambda\, X^{(N)4}_\xi + \bar\mu_N X^{(N)2}_\xi + \bar\nu_N + f_\xi\, \gamma^{-\frac{3}{2} N} X^{(N)}_\xi \right) \frac{d\xi}{|\Delta|} \qquad [20] \]

where

\[ \bar\mu_N \stackrel{\rm def}{=} \big( -6\lambda\, c_N + \lambda^2 \gamma^{-N} N\, c'_N \big), \qquad \bar\nu_N \stackrel{\rm def}{=} -3\lambda\, c_N^2 + \lambda^2 \gamma^{-N} b_N + \lambda^3 N \gamma^{-2N} b'_N \]

and $c_N, c'_N, b_N, b'_N$, computable from [15] and [14], admit a limit as $N \to \infty$. While if $d = 2$ it is

\[ V_N(\zeta^{(N)}) \stackrel{\rm def}{=} N^2 \gamma^{-2N} \sum_{\Delta \in Q_N} \int_\Delta \left( \lambda\, X^{(N)4}_\xi + \bar\mu_N X^{(N)2}_\xi + \bar\nu_N + f_\xi\, N^{-\frac{3}{2}} X^{(N)}_\xi \right) \frac{d\xi}{|\Delta|} \qquad [21] \]

where $\bar\mu_N \stackrel{\rm def}{=} -6\lambda\, c_N$ and $\bar\nu_N = -3\lambda\, c_N^2$ and $c_N$, computable from [13], admits a limit as $N \to \infty$.

The fields $\zeta^{(N)}$ and $X^{(N-1)}$ can be considered constant over boxes $\Delta \in Q_N$: $\zeta^{(N)}_\xi = s_\Delta$, $X^{(N-1)}_\xi = x_\Delta$ for $\xi \in \Delta$, and the $s_\Delta$ can be considered statistically independent on the scale of the lattice $Q_N$.

Therefore, [20] and [21] show that integration over $\zeta^{(N)}$ in the integral defining $Z_N(\Lambda, f)$ is not too different from the computation of a partition function of a lattice continuous spin model in which the “spins” are $s_\Delta$ and, most important, interact extremely weakly if $N$ is large. In fact, the coupling constants are of order of a power of $|X^{(N-1)}|$ times $O(\gamma^{-N})$ if $d = 3$ ($O(N^2 \gamma^{-2N})$ if $d = 2$), or of order $O(\gamma^{-N(d+2)/2} \max |f_\xi|)$, no matter how large $\lambda$ and $f$ are.

This says that the smallest scale fields are extremely weakly coupled. The fields $X^{(N-1)}$ can be regarded as external fields of size that will be called $B_{N-1}$, of order 1 or even allowed to grow with a power of $N$, see [6]. Their presence in $V_N$ does not affect the size of the couplings, as far as the analysis of the integral over $\zeta^{(N)}$ is concerned, because the couplings remain exponentially small in $N$, see [20] and [21], being at worst multiplied by a power of $B_{N-1}$, i.e., changed by a factor which is a power of $N$.

The smallness of the coupling at small scale is a property called “asymptotic freedom.” Once fields and coordinates are “correctly scaled,” the real size of the coupling becomes manifest, that is, it is extremely small, and the addends in $V_N$ proportional to the “counter-terms” $\mu_N, \nu_N$, which looked divergent when the fields were not properly scaled, are in fact of the same order or much smaller than the main $\varphi^4$-term.

Therefore, the integration over $\zeta^{(N)}$ can be, heuristically, performed by techniques well established in statistical mechanics (i.e., by straightforward perturbation expansions): at least if the field $X^{(N-1)}_\xi$ is smooth and bounded, as prescribed by [6], with $B = B_{N-1}$ growing as a power of $N$. In this case, denoting symbolically the integration over $\zeta^{(N)}$ by $P$ or by $\langle \ldots \rangle$, it can be expected that it should give

\[ \int e^{-V_N}\, dP(\zeta^{(N)}) \equiv e^{-V_{j;N-1} + R(j;N)\, |\Lambda|} \qquad [22] \]

where $V_{j;N-1}$ is the Taylor expansion of $\log \int e^{-V_N}\, dP(\zeta^{(N)})$ in powers of $\lambda$ (hence essentially in the very small parameter $\lambda \gamma^{-(4-d)N}$) truncated at order $j$, that is,

\[
\begin{aligned}
-V_{1;N-1} &= \big[ -\langle V_N \rangle \big]^{\le 1} \\
-V_{2;N-1} &= \Big[ -\langle V_N \rangle + \frac{\langle V_N^2 \rangle - \langle V_N \rangle^2}{2!} \Big]^{\le 2} \\
-V_{3;N-1} &= \Big[ -\langle V_N \rangle + \frac{\langle V_N^2 \rangle - \langle V_N \rangle^2}{2!} - \frac{\langle V_N^3 \rangle - 3 \langle V_N^2 \rangle \langle V_N \rangle + 2 \langle V_N \rangle^3}{3!} \Big]^{\le 3}, \ \ldots
\end{aligned}
\qquad [23]
\]

where $[\,\cdot\,]^{\le j}$ denotes truncation to order $j$ in $\lambda$, and $R(j; N)$ is a remainder (depending on $\varphi^{(N-1)}_\xi$) which can be expected to be estimated, for $d = 2, 3$, by

\[ |R(j; N)| \le \bar{R}(j; N) \stackrel{\rm def}{=} C_j\, B_N^{4j} \big( N^2 \lambda\, \gamma^{-(4-d)N} \big)^{j+1} \gamma^{dN} \qquad [24] \]

for suitable constants $C_j$, that is, a remainder estimated by the $(j+1)$th power of the coupling times the number of boxes of scale $N$ in $\Lambda$. The relations [22]–[24] result from a naive Taylor expansion (in $\lambda$) of $\log \int e^{-V_N}\, dP(\zeta^{(N)})$, taking into account that, in $V_N$ as a function of $\zeta^{(N)}$, the $\zeta^{(N)}$'s appear multiplied by quantities at most of size $\gamma^{-(4-d)N} N^2 B_N^3$ (by [20] and [21] if $|X^{(N-1)}| \le B_{N-1}$). In a statistical mechanics model for a lattice spin system, such a calculation of $Z_N$ would lead to a mean-field equation of state once the remainder was neglected.

The peculiarity of field theory is that a relation like [22] and [24] has to be applied again to $V_{j;N-1}$ to perform the integration over $\zeta^{(N-1)}$ and define $V_{j;N-2}$ and, then, again to $V_{j;N-2}$, and so on. Therefore, it will be essential to perform the integral in [22] to an order (in $\lambda$) high enough so that the bound $\bar{R}(j; N)$ can be

summed over $N$: this requires (see [24]) an explicit calculation of [23] pushed at least to order $j = 1$ if $d = 2$ or to order $j = 3$ if $d = 3$; furthermore it is also necessary to check that the resulting $V_{j;N-1}$ can still be interpreted as a low-coupling spin model so that [22] can be iterated with $N-1$ replacing $N$ and then with $N-2$ replacing $N-1$, ....

The first necessary check towards a proof of the discussed heuristic “expectations” is that, defining recursively $V_{j;h}$ from $V_{j;h+1}$ for $h = N-1, \ldots, 1, 0$ by [23] with $V_N$ replaced by $V_{j;h+1}$ and $V_{j;N-1}$ replaced by $V_{j;h}$, the couplings between the variables $z^{(h)}$ do not become “worse” than those discussed in the case $h = N$. Furthermore, the field $\varphi^{(\le N-1)}_x$ has a high probability of satisfying [6], but fluctuations are possible: hence the $R$-estimate has to be combined with another one dealing with the large fluctuations of $X^{(N-1)}_x$, which has to be shown to be “not worse.”

For more details, the reader is referred to Gallavotti (1978, 1985) and Benfatto and Gallavotti (1995).

Effective Potentials and Their Scale (In)Dependence

To analyze the first problem mentioned at the end of the previous section, define $V_{j;h}$ by [23] with $V_N$ replaced by $V_{j;h+1}$ for $h = N-1, N-2, \ldots, 0$. The quantities $V_{j;h}$, which are called “effective potentials” on scale $h$ (and order $j$), turn out to be in a natural sense scale independent: this is a consequence of renormalizability, realized by Wilson as a much more general property which can be checked, in the very special cases considered here with $d = 2, 3$, at fixed $j$ by induction, and in the super-renormalizable models considered here it requires only an elementary computation of a few Gaussian integrals as the case $j = 3$ (or even $j = 1$ if $d = 2$) is already sufficient for our purposes.

It can also be (more easily) proved for general $j$ by a dimensional argument parallel to the one presented earlier to check finiteness of the renormalized series. The derivation is elementary but it should be stressed that, again, it is possible only because of the special choice of the counter-terms $\mu_N$, $\nu_N$. If $d = 3$, the boundedness and smoothness of the fields $\varphi^{(\le h)}$ and $z^{(h)}$ expressed by the second of [6] and of [10] is essential; while if $d = 2$ the smoothness is not necessary.

The structure of $V_{j;h}$ is conveniently expressed in terms of the fields $X^{(h)}_x$, as a sum of three terms $V_h^{(rel)}$ (standing for “relevant” part), $V_h^{(irr)}$ (standing for “irrelevant” part), and a “field independent” part $E(j,h)|\Lambda|$.

The relevant part in $d = 2$ is simply of the form [21] with $h$ replacing $N$: call it $V_h^{(rel,1)}$. If $d = 3$, it is given by [20] with $h$ replacing $N$ plus, for $h < N$, a second “nonlocal” term

$$V_h^{(rel,2)} \stackrel{def}{=} \frac{4^2\,3!}{2!\,2!}\,\lambda^2 \int \Big(C^{(\le h)\,3}_{\eta\eta'} - C^{(\le N)\,3}_{\eta\eta'}\Big)\Big(\varphi^{(\le h)}_\eta - \varphi^{(\le h)}_{\eta'}\Big)^2 d\eta\, d\eta'$$

which is conveniently expressed in terms of a “nonlocal” field

$$Y^{(h)}_{\eta\eta'} \stackrel{def}{=} \frac{\varphi^{(\le h)}_\eta - \varphi^{(\le h)}_{\eta'}}{(\gamma^h|\eta-\eta'|)^{1/4}}$$

as $V_h^{(rel)} = V_h^{(rel,1)} + V_h^{(rel,2)}$ with

$$V_h^{(rel,2)} \stackrel{def}{=} -\lambda^2\gamma^{-2h} \sum_{\Delta,\Delta' \in Q_h} \int_{\Delta\times\Delta'} Y^{(h)\,2}_{\eta\eta'}\, A^{(h)}_{\eta\eta'}\, e^{-c\gamma^h|\eta-\eta'|}\, \frac{d\eta\, d\eta'}{|\Delta|\,|\Delta'|} \qquad [25]$$

where

$$0 < a < \left(\frac{A^{(h)}_{\eta\eta'}}{(\gamma^h|\eta-\eta'|)^{3-(1/2)}}\right)_N < a'$$

with $a, a', c > 0$ and the subscript $N$ means that the expression in parentheses “saturates at scale $N$”, i.e., its denominator becomes $\gamma^{(3-(1/2))(h-N)}$ as $|\eta-\eta'| \to 0$.

The expression [25] is not the full part of the potential $V_{j;h}$ which is of second order in the fields: there are several other contributions which are collected below as “irrelevant.”

It should be stressed that “irrelevant” is a traditional technical term: it should by no means suggest “negligibility.” On the contrary, it could be maintained that the whole purpose of the theory is to study the irrelevant terms. The irrelevant part of the potential can be better designated as the “driven part,” as its behavior is “controlled” by the relevant part: although initially $V_{j;h}$, $h = N$, contains no irrelevant terms, it eventually contains them for $h < N$ and they keep getting generated as $h$ diminishes. Furthermore, the part of the irrelevant terms generated at scale $h' \le N$ becomes very small at scales $h \ll h'$ so that the irrelevant part of $V_{j;h}$ at small $h$ (e.g., at $h = 0$, i.e., on the “physical scale” of the observer) depends only on the relevant terms in a few scales near $h$.

It also turns out that the Schwinger functions are simply related to the irrelevant terms.

The irrelevant part of the effective potential can be expressed as a finite sum of integrals of

monomials in the fields $X^{(h)}_\xi$ if $d = 2$, or in the fields $X^{(h)}_\xi$ and $Y^{(h)}_{\eta\eta'}$ if $d = 3$, which can be written as $V^{(irr)}_{j;h}$ given by

$$\int \prod_{k=1}^{p} \big(X^{(h)}_{\xi_k}\big)^{n_k} \prod_{k'=1}^{q} \big(Y^{(h)}_{\eta_{k'}\eta'_{k'}}\big)^{n'_{k'}}\, e^{-c\gamma^h d(\xi_1,\ldots,\eta'_q)}\, \lambda^n \gamma^{-ht}\, W(\xi_1,\ldots,\eta'_q) \prod_{k=1}^{p} \frac{d\xi_k}{|\Delta_k|} \prod_{k'=1}^{q} \frac{d\eta_{k'}\, d\eta'_{k'}}{|\Delta_{1k'}|\,|\Delta_{2k'}|} \qquad [26]$$

with the integral extended to products $\Delta_1 \times \cdots \times \Delta_p \times (\Delta_{11} \times \Delta_{21}) \times \cdots \times (\Delta_{1q} \times \Delta_{2q})$ of boxes $\Delta \in Q_h$, and $d(\xi_1,\ldots,\eta'_q)$ is the length of the shortest tree graph that connects all the $p + 2q > 0$ points, the exponents $n$, $t$ are $\ge 2$, and $t$ is $\ge 3$ if $q > 0$; the kernel $W$ depends on all coordinates $\xi_1,\ldots,\eta'_q$ and it is bounded above by $C_j \prod_{k'=1}^{q} A_{\eta_{k'}\eta'_{k'}}$ for some $C_j$; the sums $\sum n_k + \sum n'_{k'}$ cannot exceed $4j$. The test functions $f$ do not appear in [26] because by assumption they are bounded by 1: but $W$ depends on the $f$'s as well.

The field-independent part is simply the value of $\log Z_N(\Lambda, f)$ computed by the perturbation analysis in the section “Perturbation theory” up to order $j$ in $\lambda$ but using as propagator $(C^{(\le N)} - C^{(\le h)})$: thus, $E(j,h)$ is a constant depending on $N$ but uniformly bounded as $N \to \infty$ (because of the renormalizability proved in the section “Perturbation theory”).

If $d = 2$, there is no need to introduce the nonlocal fields $Y^{(h)}$ and in [26] one can simply take $q = 0$, and the relevant part also can be expressed by omitting the term $V_h^{(rel,2)}$ in [25]: unlike the $d = 3$ case, the estimate on the kernels $W$ by an $N$-independent $C_j$ holds uniformly in $h$ without having to introduce $Y$. For $d = 2$, it will therefore be supposed that $V_h^{(rel,2)} \equiv 0$ in [25] and $q = 0$ in [26].

It is not necessary to have more information on the structure of $V_{j;h}$ even though one can find simple graphical rules, closely related to the ones in the section “Perturbation theory,” to construct the coefficients $W$ in full detail. The $W$ depend, of course, on $h$ but the uniformity of the bound on $W$ is the only relevant property and in this sense the effective potentials are said to be (almost) “scale independent.”

The above bounds on the irrelevant part can be checked by an elementary direct computation if $j \le 3$: in spite of its “elementary character,” the uniformity in $h \le N$ is a result ultimately playing an essential role in the theory together with the dominance of the relevant part over the irrelevant one which, once the fields are properly scaled, is “much smaller” (by a factor of order $\gamma^{-h}$, see [26]), at least if $h$ is large.

Remarks

(i) Checking scale independence for $j = 1$ is just checking that $\int P(dz^{(h)})\, V_{1;h} = V_{1;h-1}$. Note that

$$V_{1;h} \stackrel{def}{=} -\lambda \int_\Lambda \Big(\varphi^{(\le h)\,4}_x - 6\,C^{(\le h)}_{00}\,\varphi^{(\le h)\,2}_x + 3\,C^{(\le h)\,2}_{00}\Big)\, dx$$

hence, calling $:\!\varphi^{(\le h)\,4}_x\!:$ the polynomial in the integral (Wick's monomial of order 4), we have here an elementary Gaussian integral (“martingale property of Wick monomials”). Note the essential role of the counter-terms. For $j > 1$, the computation is similar but it involves higher-order polynomials (up to $4j$) and the distinction between $d = 2$ and $d = 3$ becomes important.

(ii) $V_{j;0}$ contains only the field-independent part $E(j,0)|\Lambda|$ (see above) which is just a number (as there are no fields of scale 0): by the above definitions, it is identical to the perturbative expansion truncated to $j$th order in $\lambda$ of $\log Z_N(\Lambda, f)$, well defined as discussed earlier.

Nonperturbative Renormalization: Small Fields

Having introduced the notion of effective potential $V_{j;h}$, of order $j$ and scale $h$, satisfying the bounds (described after [26]) on the kernels $W$ representing it, the problem is to estimate the remainder in [22] and find its relation with the value [24] given by the heuristic Taylor expansion. Assume $\lambda < 1$ to avoid distinguishing this case from that with $\lambda \ge 1$ which would lead to very similar estimates but to different $\lambda$-dependence on some constants.

Define $\chi_B(z^{(h)}) = 1$ if $\|z^{(h)}\|_\Delta \le Bh^2$ for all $\Delta \in Q_h$, see [8], and 0 otherwise; then the following lemma holds:

Lemma 1 Let $\|X^{(h)}\|_\Delta$ be defined as [8] with $z$ replaced by $X$ and suppose $\|X^{(h)}\|_\Delta \le Bh^4$ for all $\Delta$; then, for all $j \ge 1$, it is

$$\int e^{V_{j;h+1}}\, \chi_B(z^{(h+1)})\, dP(z^{(h+1)}) = e^{V_{j;h} + R'(j,h+1)|\Lambda|} \qquad [27]$$

with, for suitable constants $c_-$, $c'_-$,

$$|R'(j,h+1)| \le R_-(j,h+1) \stackrel{def}{=} R(j,h+1) + c_-\, e^{-c'_- B^2 (h+1)^2}$$

and $R(j;h+1)$ given by [24] with $h+1$ in place of $N$.

Since $Z_N(\Lambda,f) \ge \int e^{V_N} \prod_{h=1}^{N} \chi_B(z^{(h)})\, P(dz^{(h)})$ this immediately gives a lower bound on $E = (1/|\Lambda|) \log Z_N(\Lambda,f)$: in fact if $\chi_B(\|z^{(h')}\|) = 1$ for

$h' = 1, \ldots, h$, then $\|X^{(h)}\|_\Delta \le cBh^4$ for some $c$ so that, by recursive application of Lemma 1, $Z_N(\Lambda,f) \ge e^{V_{j;0} - \sum_{h=1}^{N} R_-(j,h)|\Lambda|}$. By the remark at the end of the previous section, given $j$ the lower bound on $E$ just described agrees with the perturbation expansion of $E = (1/|\Lambda|) \log Z_N(\Lambda,f)$ truncated to order $j$ (in $\lambda$) up to an error bounded by $\sum_{h=1}^{\infty} R_-(j,h)$.

Remark The problem solved by Lemma 1 is usually referred to as the small-field problem, to contrast it with the large-field problem discussed later. The proof of the lemma is a simple Taylor expansion in $\lambda\gamma^{-h}$ if $d = 3$ or in $\lambda h^2\gamma^{-2h}$ if $d = 2$ to order $j$ (in $\lambda$). The constraint on $z^{(h+1)}$ makes the integrations over $z^{(h+1)}$, necessary to compute $V_{j;h}$ from $V_{j;h+1}$, not Gaussian. But the tail estimates [9], together with the Markov property of the distribution of $z^{(h)}$, can be used to estimate the difference with respect to the Gaussian unconstrained integrations of $z^{(h+1)}$: and the result is the addition of the small “tail error” changing $R$ into $R_-$ in [27]. The estimate of the main part of the remainder $R$ would be obvious if the fields $z^{(h)}$ were independent on boxes of scale $\gamma^{-h}$: they are not independent but they are Markovian and the estimate can be done by taking into account the Markov property.

For more details, the reader is referred to Wilson (1970, 1972), Gallavotti (1978, 1981, 1985), and Benfatto et al. (1978).

Nonperturbative Renormalization: Large Fields, Ultraviolet Stability

The small-field estimates are not sufficient to obtain ultraviolet stability: to control the cases in which $|X^{(h)}_x| > Bh^4$ for some $x$ or some $h$, or $|Y^{(h)}_{\xi\eta}| > Bh^4$ for some $|\xi-\eta| < \gamma^{-h}$, a further idea is necessary and it rests on making use of the assumption that $\lambda > 0$ which, in a sense to be determined, should suppress the contribution to the integral defining $Z_N(\Lambda,f)$ coming from very large values of the field. Assume also $\lambda < 1$ for the same reasons advanced in the section “Effective potentials and their scale (in)dependence.”

Consider first $d = 2$. Let $D_N$ be the “large-field region” where $|X^{(N)}_x| > BN^4$ and let $V_N(\Lambda/D_N)$ be the integral defining the potential in [21] extended to the region $\Lambda/D_N$, complement of $D_N$. This region is typically very irregular (and random as $X$ itself is random with distribution $P_N$).

An upper bound on the integral defining $Z_N(\Lambda,f)$ is obtained by simply replacing $e^{V_N}$ by $e^{V_N(\Lambda/D_N)}$ because in $D_N$ the first term in the integrand in [21] is $\le -\lambda N^2\gamma^{-2N}(BN^4)^4 < 0$ and it overwhelmingly dominates the remaining terms, whose value is bounded by a similar expression with a smaller power of $N$. Then if $E^c \stackrel{def}{=} \Lambda/E$ denotes the complement in $\Lambda$ of a set $E \subset \Lambda$:

Lemma 2 Let $d = 2$. Define $V_h(D^c_h)$ to be given by the expression [22] with the integrals extending over $\Delta_j/D_h$ and define $R(j,h+1)$ by [24]. Then

$$\int e^{V_{h+1}(D^c_{h+1})}\, dP(z^{(h+1)}) = e^{V_h(D^c_h) + R_+(j,h+1)|\Lambda|} \qquad [28]$$

where $|R_+(j,h+1)| \le R(j;h+1) + c_+\, e^{-c'_+ B^2 (h+1)^2}$ with suitable $c_+$, $c'_+$.

Remark Lemma 2 is genuinely nonperturbative and makes essential use of the positivity of $\lambda$. Below the analysis of the proof of the lemma, which consists essentially in its reduction to Lemma 1, is described in detail. It is perhaps the most interesting part and the core of the theory of the proof that truncating the expansion in $\lambda$ of $(1/|\Lambda|) \log Z_N(\Lambda,f)$ to order $j$ gives as a result an estimate exact to order $\lambda^{j+1}$ of $(1/|\Lambda|) \log Z_N(\Lambda,f)$.

Let $R_N$ be the cubes $\Delta \in Q_N$ in which there is at least one point $x$ where $|z^{(N)}_x| \ge BN^2$. By definition, the region $D_N/D_{N-1}$ is covered by $R_N$.

Remark that in the region $D_{N-1}/R_N$ the field $X^{(N-1)}$ is large but $z^{(N)}$ is not large so that $X^{(N)}$ is still very large: this is so because the bounds set to define the regions $D$ and $R$ are quite different, being $BN^4$ and $BN^2$, respectively. Hence, if a point is in $D_{N-1}$ and not in $R_N$, then the field $X^{(N)}$ must be of the order $BN^3$. Therefore, by positivity of the $\varphi^{(\le N)\,4}_x$ term (which dominates all other terms so that $V^{(N)}(\varphi^{(\le N)}_x) < 0$ for $x \in D_N \cup (D_{N-1}/R_N)$) we can replace $V_N(D^c_N)$ by $V((D_N \cup (D_{N-1}/R_N))^c)$, for the purpose of obtaining an upper bound.

Furthermore, modulo a suitable correction, it is possible to replace $V((D_N \cup (D_{N-1}/R_N))^c)$ by $V((D_{N-1} \cup R_N)^c)$: because the integrand in $V_N$ is bounded below by

$$-b\,\lambda\gamma^{-2N}N^2$$

if $d = 2$ (by $-b\,\lambda\gamma^{-N}$ if $d = 3$), for some $b$, so that the points in $R_N$ can at most lower $V((D_N \cup (D_{N-1}/R_N))^c)$ by $bN^2\lambda\gamma^{-(4-d)N}\,\#(R_N)$ if $\#R_N$ is the number of boxes of $Q_N$ in $R_N$ and $V(\varphi_x)$ is bounded below by its minimum: thus,

$$V((D_{N-1} \cup R_N)^c) + bN^2\lambda\gamma^{-(4-d)N}\,\#(R_N)$$

is an upper bound to $V((D_N \cup (D_{N-1}/R_N))^c)$.

In the complement of $D_{N-1} \cup R_N$, all fields are “small”; if $X^{(N-1)}$ and $R_N$ are fixed this region is not random (as a function of $z^{(N)}$) any more. Therefore,

if $X^{(N-1)}$, $R_N$ are fixed the integration over $z^{(N)}$, conditioned to having $z^{(N)}$ fixed (and large) in the region $R_N$, is performed by means of the same argument necessary to prove Lemma 1 (essentially a Taylor expansion in $\lambda\gamma^{-(4-d)N}$). The large size of $z^{(N)}$ in $R_N$ does not affect the result too much because on the boundary of $R_N$ the field $z^{(N)}$ is $\le BN^2$ (recalling that $z^{(N)}$ is continuous) and since the variable $z^{(N)}$ is Markovian, the boundary effect decays exponentially from the boundary $\partial R_N$: it adds a quantity that can be shown to be bounded by the number of boxes in $R_N$ on the boundary of $R_N$, hence by $\#R_N$, times $b'(N-1)^2\lambda\gamma^{-(4-d)(N-1)}(B(N-1)^4)^4$ for some $b'$.

The result of the integration over $z^{(N)}$ of $e^{V_N((D_N \cup (D_{N-1}/R_N))^c)}$ conditioned to the large-field values of $z^{(N)}$ in $R_N$ leads to an upper bound on $\int e^{V_N} P(dz^{(N)})$ as

$$\sum_{R_N} e^{V_{j;N-1}(D^c_{N-1}) + R(j,N)|\Lambda|} \prod_{\Delta \in R_N} \Big(c\, e^{-c'(BN^2)^2}\Big)\, e^{+c''\lambda\gamma^{-(4-d)N}N^2(BN^4)^4\,\#R_N} \qquad [29]$$

where $c$, $c'$, $c''$ are suitable constants: this is explained as follows.

1. Taylor expansion (in $\lambda$) of the integral $e^{V_N((D_{N-1} \cup R_N)^c) + bN^2\lambda\gamma^{-(4-d)N}\#(R_N)}$ (which, by construction, is an upper bound on $e^{V_N(D^c_N)}$) with respect to the field $z^{(N)}$, conditioned to be fixed and large in $R_N$, would lead to an upper bound as

$$e^{V_{j;N-1}((D_{N-1} \cup R_N)^c) + R'(j,N)|\Lambda| + b''(BN^4)^4\lambda\gamma^{-(4-d)N}\#(R_N)}$$

with $R'$ equal to [24] possibly with some $C'_j$ replacing $C_j$. The second exponential on the RHS of [29] arises partly from the above correction $b''(BN^4)^4\lambda\gamma^{-(4-d)N}\#(R_N)$ and partly from a contribution of similar form explained in (3) below.

2. Integration over the large conditioning fields fixed in $R_N$ is controlled by the second estimate in [9] (the tail estimate): the first factor in parentheses in [29] is the tail estimate just mentioned, i.e., the probability that $z^{(N)}$ is large in the region $R_N$. The second factor is only partly explained in (1) above.

3. Without further estimates, the bound [29] would contain $V_{j;N-1}((D_{N-1} \cup R_N)^c)$ rather than $V_{j;N-1}(D^c_{N-1})$. Hence, there is the need to change the potential $V_{j;N-1}((D_{N-1} \cup R_N)^c)$ by “reintroducing” the contribution due to the fields in $R_N/D_{N-1}$ in order to reconstruct $V_{j;N-1}(D^c_{N-1})$. Reintroducing this part of the potential costs a quantity like $b'N^2\lambda\gamma^{-(4-d)N}(BN^4)^4\,\#(R_N)$ (because the reintroduction occurs in the region $R_N/D_{N-1}$ which is covered by $R_N$ and in such points the field $X^{(N-1)}_x$ is not large, being bounded by $B(N-1)^4$); so that their contribution to the effective potential is still dominated by the $\varphi^4$-term and therefore by $\lambda\gamma^{-(4-d)N}$ times a power of $BN^4$ times the volume of $R_N$ (in units $\gamma^{-N}$, i.e., $\#R_N$). All this is taken care of by suitably fixing $c''$.

Note that the sum over $R_N$ of [29] is

$$\Big(1 + c\, e^{-c'B^2N^4}\, e^{+c''\lambda\gamma^{-(4-d)N}N^2(BN^4)^4}\Big)^{\gamma^{dN}|\Lambda|}$$

(because $\Lambda$ contains $|\Lambda|\gamma^{dN}$ cubes of $Q_N$); hence, it is bounded above by $e^{c_+ e^{-c'_+B^2N^2}|\Lambda|}$ for suitably defined $c_+$, $c'_+$.

The same argument can be repeated for $V_{j;h}(D^c_h)$ with any $h$ if $V_{j;h}(D^c_h)$ is defined by the sum over $\Delta$'s in $Q_h$ of the same integrals as those in [25] and [26] with $\Delta_j/D_h$ replacing $\Delta_j$ in the integration domains.

Applying Lemma 1 recursively with $j \ge 1$ (if $d = 3$ it would become necessary to take $j \ge 3$), it follows that there exist $N$-independent upper and lower bounds $E_\pm|\Lambda|$ on $\log Z(\Lambda,f)$ of the form $V_{j;0} \pm \sum_{h=1}^{\infty}\big(R(j,h) + c_\pm e^{-c'_\pm B^2h^2}\big)|\Lambda|$ for $c_\pm, c'_\pm > 0$ suitably chosen and $\lambda$-independent for $\lambda < 1$.

By the remark at the end of the section “Effective potentials and their scale (in)dependence,” given $j$, the bounds just described agree with the perturbation expansion $E(j,0)|\Lambda| \equiv V_{j;0}$ of $\log Z(\Lambda,f)$ truncated to order $j$ (in $\lambda$) up to the remainders $\pm\sum_{h=1}^{\infty} R_\pm(j,h)$. Hence, if $B$ is chosen proportional to $\log_+ \lambda^{-1} \stackrel{def}{=} \log(e + \lambda^{-1})$, the upper and lower bounds coincide to order $j$ in $\lambda$ with the value obtained by truncating to order $j$ the perturbative series.

The latter remark is important as it implies not only that the bounds are finite (by the section “Perturbation theory”) but also that the function $(1/|\Lambda|) \log Z(\Lambda,f)$ is not quadratic in $f$: already to order 1 in $\lambda$ it is quartic in $f$ (containing a term proportional to $(\int C_{\xi,0} f_\xi\, d\xi)^4$).

The latter property is important as it excludes that the result is a “Gaussian” generating function.

Thus, the outline of the proof of Lemma 2, which together with Lemma 1 forms the core of the analysis of the ultraviolet stability for $d = 2$, is completed.

If $d = 3$, more care is needed because (very mild) smoothness, like the considered Hölder continuity with exponent 1/4, of $z$, $X$ is necessary to obtain the key scale independence property discussed earlier: therefore, the natural measure of the size of $z^{(h)}$ and $X^{(h)}$ in a box $\Delta \in Q_h$ is no longer the maximum of $|z^{(h)}_x|$ or of $|X^{(h)}_x|$. The region $D_h$ becomes more

involved as it has to consist of the points $x$ where $|X^{(h)}_x| > Bh^4$ and of the pairs $\eta, \eta'$ where

$$|Y_{\eta,\eta'}| \equiv \frac{\big|X^{(h)}_\eta - X^{(h)}_{\eta'}\big|}{(\gamma^h|\eta-\eta'|)^{1/4}} > Bh^4$$

i.e., it is not just a subset of $\Lambda$.

However, if $d = 3$, the relevant part also contains the negative term $V^{(rel,2)}$, see [25]: and since it dominates over all other terms which contain a $Y$-field (because their couplings [25] are smaller by about $\gamma^{-h}$), the argument given for $d = 2$ can be adapted to the new situation. Two regions $D^1_h$, $D^2_h$ will be defined: the first consists of all the points $x$ where $|X^{(h)}_x| > Bh^4$ and the second of all the pairs $\eta, \eta'$ where $|Y^{(h)}_{\eta,\eta'}| > Bh^4$. The region $R_h$ will be the collection of all $\Delta \in Q_h$, where $\|z^{(h)}\|_\Delta > Bh^2$, see [8] with = 0. Then $V(D^c_h)$ will be defined as the sum of the integrals in [25] and [26] with the integrals over $\xi_i$ further restricted to $\xi_i \notin D^1_h$ and those over pairs $\eta_i, \eta'_i$ further restricted to $(\eta_i, \eta'_i) \notin D^2_h$. With the new settings, Lemma 2 can be proved also for $d = 3$ along the same lines as in the $d = 2$ case.

For more details, the reader is referred to Wilson (1970, 1972), Benfatto et al. (1978), and Gallavotti (1981).

Ultraviolet Limit, Infrared Behavior, and Other Applications

The results on the ultraviolet stability are nonperturbative, as no assumption is made on the size of $\lambda$ (the assumption $\lambda < 1$ has been imposed in the last two sections only to obtain simpler expressions for the $\lambda$-dependence of various constants): nevertheless the multiscale analysis has allowed us to use perturbative techniques (i.e., the Taylor expansion in Lemmata 1, 2) to find the solution. The latter procedure is the essence of the renormalization group methods: they aim at reducing a difficult multiscale problem to a sequence of simple single-scale problems. Of course, in most cases, it is difficult to implement the approach and the scalar quantum fields in dimensions 2, 3 are among the simplest examples. The analysis of the beta function and of the running couplings, which appear in essentially all renormalization group applications, does not play a role here (or, better, their role is so inessential that it has even been possible to avoid mentioning them). This makes the models somewhat special from the renormalization group viewpoint: the running couplings at length scale $h$, if introduced, would tend exponentially to 0 as $h \to \infty$; unlike what happens in the most interesting renormalization group applications in which they either tend to zero only as powers of $h$ or do not tend to zero at all.

The multiscale analysis method, i.e., the renormalization group method, in a form close to the one discussed here has been applied very often since its introduction in physics and it has led to the solution of several important problems. The following is not an exhaustive list and includes a few open questions.

1. The arguments just discussed imply, with minor extra work, that $Z_N(\Lambda,f)$ as $N \to \infty$ not only admits uniform upper and lower bounds but also that the limit as $N \to \infty$ actually exists and it is a $C^\infty$ function of $\lambda$, $f$. Its $\lambda$- and $f$-derivatives at $\lambda = 0$ and $f = 0$ are given by the formal perturbation calculation. In some cases, it is even possible to show that the formal series for $Z_N(\Lambda,f)$ in powers of $\lambda$ is Borel summable.

2. The problem of removing the infrared cutoff (i.e., the limit $\Lambda \to \infty$) is in a sense more a problem of statistical mechanics. In fact, it can be solved for $d = 2, 3$ by a typical technique used in statistical mechanics, the “cluster expansion.” This is not intended to mean that it is technically an easy task: understanding its connection with the low-density expansions and the possibility of using such techniques has been a major achievement that is not discussed here.

3. The third problem mentioned in the introduction, that is, checking the axioms so that the theory could be interpreted as a quantum field theory, is a difficult problem which required important efforts to control and which is not analyzed here. An introduction to it can be its analysis in the $d = 2$ case.

4. Also the problem of keeping the ultraviolet cutoff and removing the infrared cutoff while the parameter $m^2$ in the propagator approaches 0 is a very interesting problem related to many questions in statistical mechanics at the critical point.

5. Field theory methods can be applied to various statistical mechanics problems away from criticality: particularly interesting is the theory of the neutral Coulomb gas and of the dipole gas in two dimensions.

6. The methods can be applied to Fermi systems in field theory as well as in equilibrium statistical mechanics. The understanding of the ground state in not exactly soluble models of spinless fermions in one dimension at small coupling is one of the results. And via the transfer matrix theory it has led to the understanding of nontrivial critical behavior in two-dimensional models that are not exactly soluble (like Ising next-nearest-neighbor or Ashkin–Teller model). Fermi systems are of particular interest also because in their analysis the large-fields problem is absent, but this great

technical advantage is somewhat offset by the anticommutation properties of the fermionic fields, which do not allow us to employ probabilistic techniques in the estimates.

7. An outstanding open problem is whether the scalar $\varphi^4$-theory is possible and nontrivial in dimension $d = 4$: this is a case of a renormalizable, not asymptotically free theory. The conjecture that many support is that the theory is necessarily trivial (i.e., the function $Z_N(\Lambda,f)$ becomes necessarily a Gaussian in the limit $N \to \infty$). One of the main problems is the choice of the ultraviolet cut-off; unlike the $d = 2, 3$ cases, in which the choice is a matter of convenience, it does not seem that the issue of triviality can be settled without a careful analysis of the choice and of the role of the ultraviolet cut-off.

8. Very interesting problems can be found in the study of highly symmetric quantum fields: gauge invariance presents serious difficulties to be studied (rigorously or even heuristically) because in its naive forms it is incompatible with regularizations. Rigorous treatments have been possible in some cases and in a few cases it has been shown that the naive treatment is not only not rigorous but leads to incorrect results.

9. In connection with item (8) an outstanding problem is to understand relativistic pure gauge Higgs fields in dimension $d = 4$: the latter have been shown to be ultraviolet stable but the result has not been followed by the study of the infrared limit.

10. The classical gauge theory problem is quantum electrodynamics, QED, in dimension 4: it is a renormalizable theory (taking into account gauge invariance) and its perturbative series truncated after the first few orders gives results that can be directly confronted with experience, giving very accurate predictions. Nevertheless, the model is widely believed to be incomplete: in the sense that, if treated rigorously, the result would be a field describing free noninteracting assemblies of photons and electrons. It is believed that QED can make sense only if embedded in a model with more fields, representing other particles (e.g., the standard model), which would influence the behavior of the electromagnetic field by providing an effective ultraviolet cutoff high enough not to alter the predictions on the observations on the time and energy scales on which present (and, possibly, future over a long time span) experiments are performed. In dimension $d = 3$, QED is super-renormalizable, once the gauge symmetry is properly taken into account, and it can be studied with the techniques described above for the scalar fields in the corresponding dimension.

In general, constructive quantum field theory seems to be in a deep crisis: the few solutions that have been found concern very special problems and are very demanding technically; the results obtained have often not been considered to contribute appreciably to any “progress.” And many consider that the work dedicated to the subject is not worth the results that one can even hope to obtain. Therefore, in recent years, attempts have been made to follow other paths: an attitude that in the past did not, in general, lead to great achievements but that is always tempting and worth pursuing because the rare major advances made in physics resulted precisely from such changes of attitude, leaving aside developments requiring work which was too technical and possibly hopeless: just to mention an important case, one can recall quantum mechanics, which disposed of all attempts at understanding the observed quantization of atomic levels on the basis of refined developments of classical electromagnetism.

For more details, the reader is referred to Nelson (1966), Guerra (1972), Glimm et al. (1973), Glimm and Jaffe (1981), Simon (1974), Benfatto et al. (1978, 2003), Aizenman (1982), Gawedzky and Kupiainen (1983, 1985a, b), Balaban (1983), and Giuliani and Mastropietro (2005).

See also: Algebraic Approach to Quantum Field Theory; Axiomatic Quantum Field Theory; Euclidean Field Theory; Integrability and Quantum Field Theory; Perturbation Theory and its Techniques; Quantum Field Theory: A Brief Introduction; Scattering, Asymptotic Completeness and Bound States.

Further Reading

Aizenman M (1982) Geometric analysis of $\varphi^4$-fields and Ising models. Communications in Mathematical Physics 86: 1–48.
Balaban T (1983) (Higgs)$_{3,2}$ quantum fields in a finite volume. III. Renormalization. Communications in Mathematical Physics 88: 411–445.
Benfatto G, Cassandro M, Gallavotti G et al. (1978) Some probabilistic techniques in field theory. Communications in Mathematical Physics 59: 143–166.
Benfatto G, Cassandro M, Gallavotti G et al. (1980) Ultraviolet stability in Euclidean scalar field theories. Communications in Mathematical Physics 71: 95–130.
Benfatto G and Gallavotti G (1995) Renormalization Group, pp. 1–143. Princeton: Princeton University Press.
Benfatto G, Giuliani A, and Mastropietro V (2003) Low temperature analysis of two dimensional Fermi systems with symmetric Fermi surface. Annales Henri Poincaré 4: 137–193.
De Calan C and Rivasseau V (1981) Local existence of the Borel transform in Euclidean $\varphi^4_4$. Communications in Mathematical Physics 82: 69–100.
Fröhlich J (1982) On the triviality of $\lambda\varphi^4_d$ theories and the approach to the critical point in $d \ge 4$ dimensions. Nuclear Physics B 200: 281–296.

Gallavotti G (1978) Some aspects of renormalization problems in statistical mechanics. Memorie dell'Accademia dei Lincei 15: 23–59.
Gallavotti G (1981) Elliptic operators and Gaussian processes. In: Aspects Statistiques et Aspects Physiques des Processus Gaussiens, pp. 349–360. Colloques Internat. C.N.R.S, St. Flour. Publications du CNRS, Paris.
Gallavotti G (1985) Renormalization theory and ultraviolet stability via renormalization group methods. Reviews of Modern Physics 57: 471–569.
Gawedzky K and Kupiainen A (1983) Block spin renormalization group for dipole gas and $(\partial\varphi)^4$. Annals of Physics 147: 198–243.
Gawedzky K and Kupiainen A (1985a) Gross–Neveu model through convergent perturbation expansion. Communications in Mathematical Physics 102: 1–30.
Gawedzky K and Kupiainen A (1985b) Massless lattice $\varphi^4_4$ theory: rigorous control of a renormalizable asymptotically free model. Communications in Mathematical Physics 99: 197–252.
Giuliani A and Mastropietro V (2005) Anomalous universality in the anisotropic Ashkin–Teller model. Communications in Mathematical Physics 256: 681–735.
Glimm J, Jaffe A, and Spencer T (1973) Velo G and Wightman A (eds.) Constructive Field theory, Lecture Notes in Physics, vol. 25, pp. 132–242. New York: Springer.
Glimm J and Jaffe A (1981) Quantum Physics. Springer.
Guerra F (1972) Uniqueness of the vacuum energy density and Van Hove phenomena in the infinite volume limit for two-dimensional self-coupled Bose fields. Physical Review Letters 28: 1213–1215.
Hepp K (1966) Théorie de la rénormalization. Lecture Notes in Physics, vol. 2. Heidelberg: Springer.
Nelson E (1966) A quartic interaction in two dimensions. In: Goodman R and Segal I (eds.) Mathematical Theory of Elementary Particles, pp. 69–73. Cambridge: M.I.T.
Osterwalder K and Schrader R (1973) Axioms for Euclidean Green's functions. Communications in Mathematical Physics 31: 83–112.
Simon B (1974) The $P(\varphi)_2$ Euclidean (Quantum) Field Theory. Princeton: Princeton University Press.
Streater RF and Wightman AS (1964) PCT, Spin, Statistics and All That. Benjamin-Cummings (reprinted Princeton University Press, 2000).
Wightman AS and Gärding L (1965) Fields as operator-valued distributions in relativistic quantum theory. Arkiv för Fysik 28: 129–189.
Wilson KG (1970) Model of coupling constant renormalization. Physical Review D 2: 1438–1472.
Wilson KG (1972) Renormalization of a scalar field in strong coupling. Physical Review D 6: 419–426.
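
Remark (i) of the section “Effective Potentials and Their Scale (In)Dependence” rests on the “martingale property of Wick monomials”: averaging the Wick monomial of order 4 over one Gaussian fluctuation reproduces the Wick monomial built with the reduced covariance. The following is a minimal single-variable sketch of that identity in Python, not part of the original article; the function names and the sample covariances `C_total` and `c_fluct` are illustrative assumptions, and the field is replaced by a single real variable.

```python
from fractions import Fraction
from math import comb

def gaussian_moment(k, c):
    """E[z**k] for a centered Gaussian z with variance c."""
    if k % 2:
        return Fraction(0)
    result = Fraction(1)
    for odd in range(1, k, 2):  # (k-1)!! = 1*3*5*...*(k-1)
        result *= odd
    return result * Fraction(c) ** (k // 2)

def wick4(C):
    """Coefficients (lowest degree first) of x^4 - 6 C x^2 + 3 C^2."""
    C = Fraction(C)
    return [3 * C * C, Fraction(0), -6 * C, Fraction(0), Fraction(1)]

def integrate_out(coeffs, c):
    """Average p(x + z) over z ~ N(0, c); returns the coefficients in x."""
    out = [Fraction(0)] * len(coeffs)
    for n, a in enumerate(coeffs):
        for k in range(n + 1):
            # binomial term: a * comb(n, k) * x^(n-k) * E[z^k]
            out[n - k] += a * comb(n, k) * gaussian_moment(k, c)
    return out

# Hypothetical values: total covariance C_total, fluctuation variance c_fluct.
C_total, c_fluct = Fraction(5, 2), Fraction(3, 4)
lhs = integrate_out(wick4(C_total), c_fluct)  # average over the fluctuation
rhs = wick4(C_total - c_fluct)                # Wick monomial, reduced covariance
print(lhs == rhs)  # prints True
```

Exact rational arithmetic is used so that the comparison is an identity between polynomial coefficients rather than a numerical approximation.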

Contact Manifolds
J B Etnyre, University of Pennsylvania, Philadelphia, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Contact geometry has been seen to underlie many physical phenomena and is related to many other mathematical structures. Contact structures first appeared in the work of Sophus Lie on partial differential equations. They reappeared in Gibbs' work on thermodynamics, Huygens' work on geometric optics, and in Hamiltonian dynamics. More recently, contact structures have been seen to have relations with fluid mechanics, Riemannian geometry, and low-dimensional topology, and these structures provide an interesting class of subelliptic operators.

After summarizing the basic definitions, examples, and facts concerning contact geometry, this article discusses the connections between contact geometry and symplectic geometry, Riemannian geometry, complex geometry, analysis, and dynamics. The article ends by discussing two of the most-studied connections with physics: Hamiltonian dynamics and geometric optics. References for other important topics in contact geometry (e.g., thermodynamics, fluid dynamics, holomorphic curves, and open book decompositions) are provided in the “Further reading” section.

Basic Definitions and Examples

A hyperplane field $\xi$ on a manifold $M$ is a codimension-1 sub-bundle of the tangent bundle $TM$. Locally, a hyperplane field can always be described as the kernel of a 1-form. In other words, for every point in $M$ there is a neighborhood $U$ and a 1-form $\alpha$ defined on $U$ such that the kernel of the linear map $\alpha_x : T_xM \to \mathbb{R}$ is $\xi_x$ for all $x$ in $U$. The form $\alpha$ is called a local defining form for $\xi$. A contact structure on a $(2n+1)$-dimensional manifold $M$ is a “maximally nonintegrable hyperplane field” $\xi$. The hyperplane field $\xi$ is maximally nonintegrable if for any (and hence every) locally defining 1-form $\alpha$ for $\xi$ the following equation holds:

$$\alpha \wedge (d\alpha)^n \neq 0 \qquad [1]$$

(this means that the form is, pointwise, never equal to 0). Geometrically, the nonintegrability of $\xi$ means that no hypersurface in $M$ can be tangent to $\xi$ along an open subset of the hypersurface. Intuitively, this means that the hyperplanes “twist too much” to be tangent to hypersurfaces (Figure 1). The pair $(M, \xi)$

y M is compact then so is P M; so this gives examples of


z
contact structures on compact manifolds.
If  and 0 are two locally defining 1-forms for , then
there is a nonzero function f such that 0 = f . Thus,
0 ^ (d0 )n = f nþ1  ^ (d)n is a nonzero top dimen-
sional form on M and if n is odd then the orientation
defined by the local defining form is independent of the
x
actual form. Hence, when n is odd, a contact structure
defines an orientation on M (this is independent of
whether or not  is orientable!). If M had a preassigned
orientation (and n is odd), then the contact structure is
Figure 1 The standard contact structure on R 3 given as the called ‘‘positive’’ if it induces the given orientation and
kernel of dz  y dx : Courtesy of Stephan Schönenberger. ‘‘negative’’ otherwise. One should be careful when
reading the literature, as some authors build
positive into their definition of contact structure,
is called a contact manifold and any locally defining
especially when n = 1. If there is a globally defined
form  for  is called a contact form for .
1-form  whose kernel defines , then  is called
Example 1 The most basic example of a contact transversally orientable or co-orientable. This is
seen on R2nþ1 as the kernel of the
structure can be P equivalent to the bundle  being orientable when n
1-form  = dz  ni= 1 yi dxi , where the coordinates is odd or when n is even and M is orientable. In
on R2nþ1 are (x1 , y1 , . . . , xn , yn , z). This example is this article the discussion is restricted to transver-
shown in Figure 1 when n = 1. sely orientable contact structures.
Suppose that  is a contact form for , then eqn [1]
Example 2 Recall that on the cotangent space of implies that dj is a symplectic form on . This
any n-manifold M, there is a canonical 1-form , is one sense in which a contact structure is like an
called the Liouville form. If (q1 , . . . , qn ) are local odd-dimensional analog of a symplectic structure.
coordinates on M, then any 1-form can be expressed A submanifold L of a contact manifold (M, ) is
P
as ni¼1 pi dqi , so (q1 , p1 , . . . , qn , pn ) are local coor- called Legendrian if dim M = 2 dim L þ 1 and Tp L  p .
dinates on T  M. In these coordinates, Example 4 A fiber in the unit cotangent bundle
X
n with the contact structure from Example 3 is a
¼ pi  dqi ½2 Legendrian sphere.
i¼1
Example 5 Let f : M ! R be a function. Then
where  : T  M ! M is the natural projection j1 (f )(q) = (q, dfq , f (q)) is a section of the 1-jet space
map. The 1-jet space of M is the manifold J1 (M) of M; it is called the 1-jet of f. If s is any
J1 (M) = T  M  R and can be considered as a bundle section of the 1-jet space, then it is Legendrian if and
over M. The 1-jet space has a natural contact only if it is the 1-jet of a function.
structure given as the kernel of  = dz  , where z
is the coordinate on R. Note that if M = Rn then we This observation is the basis for Lie’s study of
recover the previous example. partial differential equations. More specifically, a
first-order partial differential equation on M can be
Example 3 The (oriented) projectivized cotangent considered as giving an algebraic equation on J1 (M).
space of a manifold M is the set P M of nonzero Then, a section of J1 (M) satisfying this algebraic
covectors in T  M where two covectors are identified equation corresponds to the 1-jet of a solution to the
if they differ by a positive real number, that is, original partial differential equation if and only if it
is Legendrian.
P M ¼ ðT  M n f0gÞ=Rþ ½3
Recently, Legendrian submanifolds have been
where {0} is the zero section of T  M and R þ denotes much studied. There are various classification results
the positive real numbers. If M has a metric then P M in three dimensions and several striking existence
can be easily identified with the space of unit results in higher dimensions.
covectors. Considering P M as unit covectors, we can
restrict the canonical 1-form  to P M to get a 1-form
Local Theory
 whose kernel defines a contact structure  on P M.
(Although there is no canonical contact form on P M, The natural equivalence between contact structures
the contact structure  is still well defined.) Note that if is contactomorphism. Two contact structures 0 and
$\xi_1$ on manifolds $M_0$ and $M_1$, respectively, are contactomorphic if there is a diffeomorphism $f : M_0 \to M_1$ such that $f_*(\xi_0) = \xi_1$. All contact structures are locally contactomorphic. In particular, we have the following theorem.

Theorem 1 (Darboux's Theorem). Suppose $\xi_i$ is a contact structure on the manifold $M_i$, $i = 0, 1$, and $M_0$ and $M_1$ have the same dimension. Given any points $p_0$ and $p_1$ in $M_0$ and $M_1$, respectively, there are neighborhoods $N_i$ of $p_i$ in $M_i$ and a contactomorphism from $(N_0, \xi_0|_{N_0})$ to $(N_1, \xi_1|_{N_1})$. Moreover, if $\alpha_i$ is a contact form for $\xi_i$ near $p_i$, then the contactomorphism can be chosen to pull $\alpha_1$ back to $\alpha_0$.

Thus, locally all contact structures (and contact forms!) look like the one given in Example 1 above. Furthermore, contact structures are "local in time." That is, compact deformations of contact structures do not produce new contact structures.

Theorem 2 (Gray's theorem). Let $M$ be an oriented $(2n+1)$-dimensional manifold and $\xi_t$, $t \in [0, 1]$, a family of contact structures on $M$ that agree off of some compact subset of $M$. Then there is a family of diffeomorphisms $\phi_t : M \to M$ such that $(\phi_t)_*\xi_t = \xi_0$.

In particular, on a compact manifold, all deformations of contact structures come from diffeomorphisms of the underlying manifold. The theorem is not true if the contact structures do not agree off of a compact set. For example, there is a one-parameter family of noncontactomorphic contact structures on $S^1 \times \mathbb{R}^2$.

Existence and Classification

Establishing the existence of contact structures on closed odd-dimensional manifolds is quite difficult. However, Gromov has shown that contact structures on open manifolds obey an h-principle. To explain this, we note that if $(M^{2n+1}, \xi)$ is a co-oriented contact manifold then the tangent bundle of $M$ can be written as $\xi \oplus \mathbb{R}$ and thus the structure group of $TM$ can be reduced to $U(n)$ (since $\xi$ has a conformal symplectic structure on it). Such a reduction of the structure group is called an almost contact structure on $M$. Clearly, a contact structure on $M$ induces an almost contact structure. If $M$ is an open manifold, Gromov proved that the inclusion of the space of co-oriented contact structures on $M$ into the space of almost contact structures on $M$ is a weak homotopy equivalence. In particular, if an open manifold meets the necessary algebraic condition for the existence of an almost contact structure, then the manifold has a co-oriented contact structure.

Lutz and Martinet proved a similar, but weaker, result for oriented closed 3-manifolds. More specifically, every closed oriented 3-manifold admits a co-oriented contact structure and in fact has at least one for every homotopy class of plane field. There has been much progress on classifying contact structures on 3-manifolds and here an interesting dichotomy has appeared. Contact structures break into one of two types: tight or overtwisted. Overtwisted contact structures obey an h-principle and are in general easy to understand. Tight contact structures have a more subtle, geometric nature. In higher dimensions there is much less known about the existence (or classification) of contact structures.

Relations with Symplectic Geometry

Let $(X, \omega)$ be a symplectic manifold. A vector field $v$ satisfying

$$L_v\omega = \omega \qquad [4]$$

(where $L_v\omega$ is the Lie derivative of $\omega$ in the direction of $v$) is called a symplectic dilation. A compact hypersurface $M$ in $(X, \omega)$ is said to have "contact type" if there exists a symplectic dilation $v$ in a neighborhood of $M$ that is transverse to $M$. Given a hypersurface $M$ in $(X, \omega)$, the characteristic line field $LM$ in the tangent bundle of $M$ is the symplectic complement of $TM$ in $TX$. (Since $M$ is codimension 1, it is coisotropic; thus, the symplectic complement lies in $TM$ and is one dimensional.)

Theorem 3 Let $M$ be a compact hypersurface in a symplectic manifold $(X, \omega)$ and denote the inclusion map $i : M \to X$. Then $M$ has contact type if and only if there exists a 1-form $\alpha$ on $M$ such that $d\alpha = i^*\omega$ and the form $\alpha$ is never zero on the characteristic line field.

If $M$ is a hypersurface of contact type, then the 1-form $\alpha$ is obtained by contracting the symplectic dilation $v$ into the symplectic form: $\alpha = \iota_v\omega$. It is easy to verify that the 1-form $\alpha$ is a contact form on $M$. Thus, a hypersurface of contact type in a symplectic manifold inherits a co-oriented contact structure.

Given a co-orientable contact manifold $(M, \xi)$, its symplectization $\mathrm{Symp}(M, \xi) = (X, \omega)$ is constructed as follows. The manifold is $X = M \times (0, \infty)$, and given a global contact form $\alpha$ for $\xi$ the symplectic form is $\omega = d(t\alpha)$, where $t$ is the coordinate on $\mathbb{R}$. (The symplectization is also equivalently defined as $(M \times \mathbb{R}, d(e^t\alpha))$.)

Example 6 The symplectization of the standard contact structure on the unit cotangent bundle
(see Example 3) is the standard symplectic structure on the complement of the zero section in the cotangent bundle.

The symplectization is independent of the choice of contact form $\alpha$. To see this, fix a co-orientation for $\xi$ and note that the manifold $X$ can be identified (in many ways) with the sub-bundle of $T^*M$ whose fiber over $x \in M$ is

$$\{\beta \in T_x^*M : \beta(\xi_x) = 0 \text{ and } \beta > 0 \text{ on vectors positively transverse to } \xi_x\} \qquad [5]$$

and restricting $d\lambda$ to this sub-bundle yields a symplectic form $\omega$, where $\lambda$ is the Liouville form on $T^*M$ defined in Example 2. A choice of contact form $\alpha$ fixes an identification of $X$ with the sub-bundle of $T^*M$ under which $d(t\alpha)$ is taken to $d\lambda$.

The vector field $v = t\,\partial/\partial t$ on $(X, \omega)$ is a symplectic dilation that is transverse to $M \times \{1\} \subset X$ (indeed $\iota_v\omega = t\alpha$, so $L_v\omega = d(t\alpha) = \omega$). Clearly, $\iota_v\omega|_{M \times \{1\}} = \alpha$. Thus, we see that any co-orientable contact manifold can be realized as a hypersurface of contact type in a symplectic manifold. In summary, we have the following theorem.

Theorem 4 If $(M, \xi)$ is a co-oriented contact manifold, then there is a symplectic manifold $\mathrm{Symp}(M, \xi)$ in which $M$ sits as a hypersurface of contact type. Moreover, any contact form $\alpha$ for $\xi$ gives an embedding of $M$ into $\mathrm{Symp}(M, \xi)$ that realizes $M$ as a hypersurface of contact type.

We also note that all the hypersurfaces of contact type in $(X, \omega)$ look locally, in $X$, like a contact manifold sitting inside its symplectization.

Theorem 5 Given a compact hypersurface $M$ of contact type in a symplectic manifold $(X, \omega)$ with the symplectic dilation given by $v$, there is a neighborhood of $M$ in $X$ symplectomorphic to a neighborhood of $M \times \{1\}$ in $\mathrm{Symp}(M, \xi)$, where the symplectization is identified with $M \times (0, \infty)$ using the contact form $\alpha = \iota_v\omega|_M$ and $\xi = \ker\alpha$.

The Reeb Vector Field and Riemannian Geometry

Let $(M, \xi)$ be a contact manifold. Associated to a contact form $\alpha$ for $\xi$ is the Reeb vector field $v_\alpha$. This is the unique vector field satisfying

$$\iota_{v_\alpha}\alpha = 1 \quad\text{and}\quad \iota_{v_\alpha}d\alpha = 0 \qquad [6]$$

One may readily check that $v_\alpha$ is transverse to the contact hyperplanes and the flow of $v_\alpha$ preserves $\xi$ (in fact, it preserves $\alpha$). These two conditions characterize Reeb vector fields; that is, a vector field $v$ is the Reeb vector field for some contact form $\alpha$ for $\xi$ if and only if it is transverse to $\xi$ and its flow preserves $\xi$.

The fundamental question concerning Reeb vector fields asks whether the flow of a Reeb vector field has a (contractible) periodic orbit. A paraphrasing of the Weinstein conjecture asserts a positive answer to this question. Most progress on this conjecture has been made in dimension 3, where H Hofer has proved the existence of periodic orbits for all Reeb fields on $S^3$ and on 3-manifolds with essential spheres (i.e., embedded $S^2$'s that do not bound a 3-ball in the manifold). Relations with Hamiltonian dynamics are discussed below.

Recall, from Example 3, that a Riemannian metric $g$ on a manifold $M$ provides an identification of the (oriented) projectivized cotangent bundle $P^*M$ with the unit cotangent bundle. Considered as a subset of $T^*M$, $P^*M$ inherits not only a contact structure but also a contact form $\alpha$ (by restricting the Liouville form). Let $v_\alpha$ be the associated Reeb vector field. The metric $g$ also provides an identification of the tangent and cotangent bundles of $M$. Thus, $P^*M$ may be considered as the unit tangent bundle of $M$. Let $w_g$ be the vector field on the unit tangent bundle generating the geodesic flow on $M$.

Theorem 6 The Reeb vector field $v_\alpha$ is identified with the geodesic flow field $w_g$ when $P^*M$ is identified with the unit tangent space using the metric $g$.

Relations with Complex Geometry and Analysis

Let $X$ be a complex manifold with boundary and denote the induced complex structure on $TX$ by $J$. The complex tangencies $\xi$ to $M = \partial X$ are described by the equation $d\phi \circ J = 0$, where $\phi$ is a function defined in a neighborhood of the boundary such that 0 is a regular value and $\phi^{-1}(0) = M$. The form $L(v, w) = -d(d\phi \circ J)(v, Jw)$, for $v, w \in \xi$, is called the Levi form, and when $L(v, w)$ is positive (negative) definite, then $X$ is said to have strictly pseudoconvex (pseudoconcave) boundary. The hyperplane field $\xi$ will be a contact structure if and only if $d(d\phi \circ J)$ is a nondegenerate 2-form on $\xi$ (if and only if $L(v, w)$ is definite). A well-studied source of examples comes from Stein manifolds.

Example 7 Let $X$ be a complex manifold and again let $J$ denote the induced complex structure on $TX$. From a function $\phi : X \to \mathbb{R}$, we can define a 2-form $\omega = -d(d\phi \circ J)$ and a symmetric form $g(v, w) = \omega(v, Jw)$. If this symmetric form is positive definite, the function $\phi$ is called "strictly plurisubharmonic." The manifold $X$ is a Stein manifold if $X$
admits a proper strictly plurisubharmonic function $\phi : X \to \mathbb{R}$. An important result says that $X$ is Stein if and only if it can be realized as a closed complex submanifold of $\mathbb{C}^n$. Clearly any noncritical level set of $\phi$ gives a contact manifold.

Contact manifolds also give rise to an interesting class of differential operators. Specifically, a contact structure $\xi$ on $M$ defines a symbol-filtered algebra of pseudodifferential operators $\Psi^{*}(M)$, called the "Heisenberg calculus." Operators in this algebra are modeled on smooth families of convolution operators on the Heisenberg group. An important class of operators of this type are the "sum-of-squares" operators. Locally, the highest-order part of such an operator takes the form

$$L = \sum_{j=1}^{2n} v_j^2 + i a v \qquad [7]$$

where $\{v_1, \ldots, v_{2n}\}$ is a local framing for the contact field and $v$ is a Reeb vector field. This operator belongs to $\Psi^{2}(M)$ and is subelliptic for $a$ outside a discrete set.

Hamiltonian Dynamics

Given a symplectic manifold $(X, \omega)$, a function $H : X \to \mathbb{R}$ will be called a Hamiltonian. (Only autonomous Hamiltonians are discussed here.) The unique vector field satisfying

$$\iota_{v_H}\omega = -dH$$

is called the Hamiltonian vector field associated to $H$. Many problems in classical mechanics can be formulated in terms of studying the flow of $v_H$ for various $H$.

Example 8 If $(X, \omega) = (\mathbb{R}^{2n}, d\lambda)$, where $\lambda$ is from Example 2, then the flow of the Hamiltonian vector field is given by

$$\dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q}$$

A standard fact says that the flow of $v_H$ preserves the level sets of $H$.

Theorem 7 If $M$ is a level set of $H$ corresponding to a regular value and $M$ is a hypersurface of contact type, then the trajectories of $v_H$ and of the Reeb vector field (associated to $M$ in Theorem 3) agree.

Thus under suitable hypotheses, Hamiltonian dynamics is a reparametrization of Reeb dynamics. In particular, searching for periodic orbits in such a Hamiltonian system is equivalent to searching for periodic orbits in a Reeb flow. Thus in this context, Weinstein's conjecture asserts a positive answer to the question: Does the Hamiltonian flow along a regular level set of contact type have a periodic orbit? Viterbo proved that the answer was yes if the hypersurface is compact and in $(\mathbb{R}^{2n}, \omega = d\lambda)$. Other progress has been made by studying Reeb dynamics.

Geometric Optics

In this section, we study the propagation of light (or various other disturbances) in a medium (for the moment, we do not specify the properties of this medium). The medium will be given by a three-dimensional manifold $M$. Given a point $p$ in $M$ and $t > 0$, let $I_p(t)$ be the set of all points to which light can travel in time at most $t$. The wave front of $p$ at time $t$ is the boundary of this set and is denoted as $\Phi_p(t) = \partial I_p(t)$.

Theorem 8 (Huygens' principle). $\Phi_p(t + t')$ is the envelope of the wave fronts $\Phi_q(t')$ for all $q \in \Phi_p(t)$.

This is best understood in terms of contact geometry. Let $\pi : (T^*M \setminus \{0\}) \to P^*M$ be the natural projection (see Example 3) and let $S$ be any smooth sub-bundle of $T^*M \setminus \{0\}$ that is transverse to the radial vector field in each fiber and for which $\pi|_S : S \to P^*M$ is a diffeomorphism. The restriction of the Liouville form to $S$ gives a contact form $\alpha$ and a corresponding Reeb vector field $v$. Given a subset $F$ of $M$ with a well-defined tangent space at every point, set

$$L_F = \{p \in S : \pi(p) \in F \text{ and } p(w) = 0 \text{ for all } w \in T_{\pi(p)}F\} \qquad [8]$$

The set $L_F$ is a Legendrian submanifold of $S$ and is called the "Legendrian lift" of $F$. If $L$ is a generic Legendrian submanifold in $S$, then $\pi(L)$ is called the front projection of $L$ and $L_{\pi(L)} = L$. Given a Legendrian submanifold $L$, let $\psi_t(L)$ be the Legendrian submanifold obtained from $L$ by flowing along $v$ for time $t$.

Example 9 Given a metric $g$ on $M$, Fermat's principle says that light travels along geodesics. Thus, if $S$ is the unit cotangent bundle, then using $g$ to identify the geodesic flow with the Reeb flow one sees that light will travel along trajectories of the Reeb vector field. Given a point $p$ in $M$, the Legendrian submanifold $L_p$ is a sphere sitting in $T_p^*M$. The Huygens principle follows from the observation that $\Phi_p(t) = \pi(\psi_t(L_p))$.

Using the more general $S$ discussed above, one can generalize this example to light traveling in a medium that is nonhomogeneous (i.e., the speed differs from point to point in $M$) and anisotropic (i.e., the speed differs depending on the direction of travel).
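The simplest instance of Example 9 can be checked numerically. The sketch below is only an illustration under stated assumptions: the medium is taken to be the flat plane with unit speed of light (a two-dimensional stand-in for the three-dimensional $M$ of the text), so the Reeb flow on the unit cotangent bundle moves the base point in a straight line along the unit covector. The function names (`reeb_flow`, `legendrian_fibre`, `wave_front`) and the sample counts are hypothetical choices made for this example.

```python
import math

def reeb_flow(point, covector, t):
    """Reeb (= geodesic) flow on the unit cotangent bundle of the flat plane:
    the base point moves at unit speed along the unit covector, which stays fixed."""
    (x, y), (px, py) = point, covector
    return (x + t * px, y + t * py), covector

def legendrian_fibre(q, samples):
    """Sample L_q, the circle of unit covectors based at q."""
    return [(q, (math.cos(2 * math.pi * k / samples),
                 math.sin(2 * math.pi * k / samples)))
            for k in range(samples)]

def wave_front(q, t, samples=360):
    """Front projection of the time-t flow of L_q, i.e. the sampled wave front of q."""
    return [reeb_flow(base, cov, t)[0] for base, cov in legendrian_fibre(q, samples)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

source = (0.0, 0.0)
t, t_prime = 1.0, 0.5
front_t = wave_front(source, t)
front_total = wave_front(source, t + t_prime)

# Secondary fronts emitted from points of the time-t front never reach
# beyond distance t + t' from the source ...
max_reach = max(dist(source, x)
                for q in front_t
                for x in wave_front(q, t_prime, samples=90))

# ... and every point of the time-(t + t') front lies at distance t' from the
# time-t front, so it sits on a secondary front: the big front is their envelope.
max_gap = max(abs(min(dist(x, q) for q in front_t) - t_prime) for x in front_total)

print(f"farthest secondary point: {max_reach:.6f}, envelope mismatch: {max_gap:.2e}")
```

In this flat model the wave fronts are concentric circles, so the check is elementary; the point of the sketch is that the fronts are produced by flowing the Legendrian fibre and projecting, which is the recipe stated in Example 9.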
See also: Hamiltonian Fluid Dynamics; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Minimax Principle in the Calculus of Variations.

Further Reading

Aebisher B, Borer M, Kälin M, Leuenberger Ch, and Reimann HM (1994) Symplectic Geometry, Progress in Mathematics, vol. 124. Basel: Birkhäuser.
Arnol'd VI (1989) Mathematical Methods of Classical Mechanics, Graduate Texts in Mathematics, vol. 60, xvi+516, pp. 163–179. New York: Springer.
Arnol'd VI (1990) Contact Geometry: The Geometrical Method of Gibbs's Thermodynamics, Proceedings of the Gibbs Symposium (New Haven, CT, 1989), pp. 163–179. Providence, RI: American Mathematical Society.
Beals R and Greiner P (1988) Calculus on Heisenberg manifolds. Annals of Mathematics Studies 119.
Eliashberg Y, Givental A, and Hofer H (2000) Introduction to Symplectic Field Theory, GAFA 2000 (Tel Aviv, 1999), Geom. Funct. Anal. 2000, Special Volume, Part II, pp. 560–673.
Etnyre J. Legendrian and transversal knots. Handbook of Knot Theory (in press).
Etnyre J (1998) Symplectic Convexity in Low-Dimensional Topology, Symplectic, Contact and Low-Dimensional Topology (Athens, GA, 1996), Topology Appl., vol. 88, No. 1–2, pp. 3–25.
Etnyre J and Ng L (2003) Problems in Low Dimensional Contact Topology, Topology and Geometry of Manifolds (Athens, GA, 2001), pp. 337–357, Proc. Sympos. Pure Math., vol. 71. Providence, RI: American Mathematical Society.
Geiges H. Contact geometry. Handbook of Differential Geometry, vol. 2 (in press).
Geiges H (2001a) Contact Topology in Dimension Greater than Three, European Congress of Mathematics, vol. II (Barcelona, 2000), Progress in Mathematics, vol. 202, pp. 535–545. Basel: Birkhäuser.
Geiges H (2001b) A brief history of contact geometry and topology. Expositiones Mathematicae 19(1): 25–53.
Ghrist R and Komendarczyk R (2001) Topological features of inviscid flows. An Introduction to the Geometry and Topology of Fluid Flows (Cambridge, 2000), pp. 183–201, NATO Sci. Ser. II Math. Phys. Chem., vol. 47. Dordrecht: Kluwer Academic.
Giroux E (2002) Géométrie de contact: de la dimension trois vers les dimensions supérieures, Proceedings of the International Congress of Mathematicians, vol. II (Beijing, 2002), pp. 405–414. Beijing: Higher Ed. Press.
Hofer H and Zehnder E (1994) Symplectic Invariants and Hamiltonian Dynamics, Birkhäuser Advanced Texts: Basler Lehrbücher, pp. xiv+341. Basel: Birkhäuser.
Taylor ME (1984) Noncommutative Microlocal Analysis, Part I, Mem. Amer. Math. Soc., 52, no. 313. American Mathematical Society.

Control Problems in Mathematical Physics


B Piccoli, Istituto per le Applicazioni del Calcolo, Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Control Theory is an interdisciplinary research area, bridging mathematics and engineering, dealing with physical systems which can be "controlled," that is, whose evolution can be influenced by some external agent. A general model can be written as

$$y(t) = A(t, y(0), u(\cdot)) \qquad [1]$$

where $y$ describes the state variables, $y(0)$ the initial condition, and $u(\cdot)$ the control function. Thus, eqn [1] means that the state at time $t$ depends on the initial condition but also on some parameters $u$ which can be chosen as a function of time. To be precise, there are some control problems which are not of evolutionary type; however, in this presentation we restrict ourselves to this case.

One has to distinguish between the control set $U$ where the control function can take values, $u(t) \in U$, and the space of control functions $\mathcal{U}$ to which each control function should belong, $u(\cdot) \in \mathcal{U}$. Thus, for example, we may have $U = \mathbb{R}^m$ and $\mathcal{U} = L^\infty([0, T], \mathbb{R}^m)$.

There are various problems one can formulate regarding systems of type [1], among which:

Controllability Given any two states $y_0$ and $y_1$, determine a control function $u(\cdot)$ such that for some time $t > 0$ we have $y_1 = A(t, y_0, u(\cdot))$.

Optimal control Consider a cost function $J(y(\cdot), u(\cdot))$ depending both on the evolutions of $y$ and $u$, and determine a control function $\tilde u(\cdot)$ and a trajectory $\tilde y(t) = A(t, y_0, \tilde u(\cdot))$ such that $\tilde y(\cdot)$ steers the system from $y_0$ to $y_1$, as before, and the cost $J$ is minimized (or maximized).

Stabilization We say that $\bar y$ is an equilibrium if there exists $\bar u \in U$ such that $A(t, \bar y, \bar u) = \bar y$ for every $t > 0$ (here $\bar u$ indicates also the constant in time control function). Determine the control $u$ as a function of the state $y$ so that $\bar y$ is a (Lyapunov) stable equilibrium for the uncontrolled dynamical system $y(t) = A(t, y(0), u(y(\cdot)))$.

Observability Assume that we can observe not the state $y$, but a function $\phi(y)$ of the state. Determine conditions on $\phi$ so that the state $y$ can be reconstructed from the evolution of $\phi(y)$, choosing $u(\cdot)$ suitably.

For the sake of simplicity, we restrict ourselves mainly to the first two problems and just mention
some facts about the others. Also, we focus on two cases:

Control of ordinary differential equations (ODEs) In this case $t \in \mathbb{R}$, $y \in \mathbb{R}^n$, $U$ is a set, typically $U \subset \mathbb{R}^m$, and $A$ is determined by a controlled ODE

$$\dot y = f(t, y, u) \qquad [2]$$

A typical example in mathematical physics is the control of mechanical systems (Bloch 2003, Bullo and Lewis 2005).

Control of partial differential equations (PDEs) In this case $t \in \mathbb{R}$, $x \in \mathbb{R}^n$, $y(x)$ belongs to a Banach functional space, for example, $H^s(\mathbb{R}^{n+1}, \mathbb{R})$, $\mathcal{U}$ is a functional space, and $A$ is determined by a controlled PDE,

$$F(t, x, y, y_t, y_{x_1}, \ldots, y_{x_n}, y_{tt}, \ldots, u) = 0 \qquad [3]$$

A typical example in mathematical physics is the control of the wave equation using boundary conditions, see below.

There are various other possible situations we do not treat here: "stochastic control," when $y$ is a random variable and $A$ is defined by a (controlled) stochastic differential equation; "discrete time control," where $t \in \mathbb{N}$; "hybrid control," where $t$ and $y$ may have both discrete and continuous components, and so on.

As shown above, the control law can be assigned in (at least) two basically different ways. In open-loop form, as a function of time: $t \to u(t)$, and in closed-loop form or feedback, as a function of the state: $y \to u(y)$. For example, in optimal control we look for a control $\tilde u(t)$ in open-loop form, while in stabilization we search for a feedback control $u(y)$. The open-loop control depends on $y(0)$, while a feedback control can stabilize regardless of the initial condition.

Example 1 A point with unit mass moves along a straight line; if a controller is able to apply an external force $u$, then, calling $y_1(t)$, $y_2(t)$, respectively, the position and the velocity of the point at time $t$, the motion is described by the control system

$$(\dot y_1, \dot y_2) = (y_2, u) \qquad [4]$$

It is easy to check that the feedback control $u(y_1, y_2) = -y_1 - y_2$ stabilizes the system asymptotically to the origin, that is, for every initial data $(\bar y_1, \bar y_2)$, the solution of the corresponding Cauchy problem satisfies $\lim_{t \to \infty}(y_1, y_2)(t) = (0, 0)$.

Another simple problem consists in driving the point to the origin with zero velocity in minimum time from given initial data. It is quite easy to see that the optimal strategy is to accelerate towards the origin with maximum force on some interval $[0, t]$ and then to decelerate with maximum force to reach the origin at velocity zero. The set of optimal trajectories is depicted in Figure 1a: they can be obtained using the following discontinuous feedback, see Figure 1b. Define the curves $\zeta^\pm = \{(y_1, y_2) : \mp y_2 > 0,\ y_1 = \pm y_2^2/2\}$ and let $\zeta$ be defined as the union $\zeta^\pm \cup \{0\}$. We define $A^+$ to be the region below $\zeta$ and $A^-$ the one above. Then the feedback is given by

$$u(y_1, y_2) = \begin{cases} +1 & \text{if } (y_1, y_2) \in A^+ \cup \zeta^+ \\ -1 & \text{if } (y_1, y_2) \in A^- \cup \zeta^- \\ 0 & \text{if } (y_1, y_2) = (0, 0) \end{cases}$$

Figure 1 Example 1. The simplest example of (a) optimal synthesis and (b) corresponding feedback.

Example 2 Consider a (one-dimensional) vibrating string of unitary length with a fixed endpoint. The model for the motion of the displacement of the string with respect to the rest position is given by

$$y_{tt} - \Delta y = 0, \qquad y(t, 0) = 0 \qquad [5]$$

with initial data

$$y(0, \cdot) = y_0, \qquad y_t(0, \cdot) = y_1 \qquad [6]$$

Assume that we can control the position of the second endpoint; then,

$$y(t, 1) = u(t) \qquad [7]$$

for some real-valued control function $u(\cdot)$.

Let us introduce another key concept: the reachable set at time $t$ from $y$ is the set

$$R(t; y) = \{A(t, y, u(\cdot)) : u(\cdot) \in \mathcal{U}\}$$

Various problems can be formulated in terms of reachable sets, for example, controllability requires that for every $y$ the union of all $R(t; y)$ as $t \to \infty$ includes the entire space. The dependence of $R(t; y)$ on time $t$ and on the set of controls $U$ is also a subject of investigation: one may ask whether the same points in $R(t; y)$ can be reached by using controls which are piecewise constant, or take values within some subsets of $U$.
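Returning to Example 1, both control laws can be tried out on the double integrator $(\dot y_1, \dot y_2) = (y_2, u)$. The following is a minimal sketch, not a definitive implementation: it assumes the force bound $|u| \le 1$ suggested by the values $u = \pm 1$ in Figure 1, holds the feedback constant over each small time step, and uses hypothetical helper names (`step_exact`, `simulate_feedback`, `bang_bang_from_rest`) and step sizes chosen only for illustration.

```python
import math

def step_exact(y1, y2, u, dt):
    """Exact flow of (y1', y2') = (y2, u) over time dt when u is constant."""
    return y1 + y2 * dt + 0.5 * u * dt * dt, y2 + u * dt

def linear_feedback(y1, y2):
    """The stabilizing feedback u(y1, y2) = -y1 - y2 of Example 1."""
    return -y1 - y2

def simulate_feedback(y1, y2, feedback, t_end, dt=1e-3):
    """Sample-and-hold simulation: u is frozen over each step of length dt."""
    for _ in range(int(round(t_end / dt))):
        y1, y2 = step_exact(y1, y2, feedback(y1, y2), dt)
    return y1, y2

def bang_bang_from_rest(y1_start):
    """Open-loop bang-bang control from (y1_start, 0) with y1_start < 0,
    assuming |u| <= 1: apply u = +1 until the curve y1 = -y2**2 / 2 is hit,
    then u = -1 until the origin is reached."""
    t_switch = math.sqrt(-y1_start)
    y1, y2 = step_exact(y1_start, 0.0, +1.0, t_switch)
    y1, y2 = step_exact(y1, y2, -1.0, t_switch)
    return 2.0 * t_switch, (y1, y2)

final_state = simulate_feedback(1.0, 0.0, linear_feedback, t_end=30.0)
total_time, end_point = bang_bang_from_rest(-1.0)
print("state after linear feedback:", final_state)
print("bang-bang arrival time and end point:", total_time, end_point)
```

The linear feedback only reaches the origin asymptotically, whereas the bang-bang control arrives in finite time with a single switch, which is the qualitative difference between stabilization and time-optimal control that Example 1 describes.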
Control of ODEs

For most proofs we refer to Agrachev and Sachkov (2004) and Sontag (1998).

Controllability

Consider first the case of a linear system:

$$\dot y = Ay + Bu, \qquad u \in U, \qquad y(0) = y_0 \qquad [8]$$

where $y, y_0 \in \mathbb{R}^n$, $U \subset \mathbb{R}^m$, $A$ is an $n \times n$ matrix and $B$ an $n \times m$ matrix. We have the following property of reachable sets:

Theorem 1 If $U$ is compact convex then the reachable set $R(t)$ for [8] is compact and convex.

A control system [8] is controllable if taking $U = \mathbb{R}^m$ we have $R(t) = \mathbb{R}^n$ for every $t > 0$. By linearity, this is equivalent to requiring the reachable set to be a neighborhood of the origin in case of bounded controls. Define the controllability matrix to be the $n \times nm$ matrix

$$C(A, B) = (B, AB, \ldots, A^{n-1}B)$$

Controllability is characterized by the following:

Theorem 2 (Kalman controllability theorem). The linear system [8] is controllable if and only if $\mathrm{rank}(C(A, B)) = n$.

For linear systems, there exists a duality between controllability and observability in the sense of the following theorem:

Theorem 3 Consider the linear control system [8] and assume that we observe the variable $z(y) = Cy$ for some $p \times n$ matrix $C$. Then, observability holds if and only if the linear system $\dot y = A^t y + C^t v$ is controllable.

There exists no characterization of controllability for nonlinear systems as for linear ones, but we have the linearization result:

Theorem 4 A nonlinear system is locally controllable if its linearization is. The converse is false.

There are many results for the important class of control–affine systems

$$\dot y = f_0(y) + \sum_{i=1}^{m} f_i(y)\,u_i \qquad [9]$$

where $f_0, \ldots, f_m$ are smooth vector fields on $\mathbb{R}^n$ and $U = \mathbb{R}^m$. In general, there exists no explicit representation for the trajectories of [9] in terms of integrals of the control, as happens for linear systems. Still, a rich mathematical theory has been developed applying techniques and ideas from differential geometry: the so-called geometric control theory. The main idea is that controllability (and properties of optimal trajectories) is determined by the Lie algebra generated by the vector fields $f_i$. For example:

Theorem 5 (Lie-algebraic rank condition). Let $L$ be the Lie algebra generated by the vector fields $f_i$, $i = 1, \ldots, m$, and assume $f_0 = 0$. If $L(y)$ is of dimension $n$ at every point $y$ then the system is controllable.

We refer to Agrachev and Sachkov (2004) and Jurdjevic (1997) for a general presentation of geometric control theory and give a simple example to show how Lie brackets characterize reachable directions.

Example 3 Consider the Brockett integrator

$$\dot y_1 = u_1, \qquad \dot y_2 = u_2, \qquad \dot y_3 = u_1 y_2 - u_2 y_1$$

Starting from the origin, using constant controls, we can move along curves tangent to the $y_1 y_2$ plane. However, let $f_1 = (1, 0, y_2)$ and $f_2 = (0, 1, -y_1)$ (fields corresponding to constant controls); then their Lie bracket is given by

$$[f_1, f_2](0) = (Df_2 \cdot f_1 - Df_1 \cdot f_2)(0) = (0, 0, -2)$$

Moving for time $t$ first along the integral curve of $f_1$, then of $f_2$, then of $-f_1$, and finally of $-f_2$, we reach a point $t^2 [f_1, f_2](0) + o(t^2)$ along the vertical direction $y_3$. This amounts to saying that the system satisfies LARC.

Optimal Control

The theory of optimal control has developed in three main directions:

Existence of optimal controls, under various assumptions on $L$, $f$, $U$. When the sets $F(t, y)$ are convex, optimal solutions can be constructed following the direct method of Tonelli for the calculus of variations, that is, as limits of minimizing sequences: the two main ingredients are compactness and lower-semicontinuity. If convexity does not hold, existence is not guaranteed in general, but only in special cases.

Necessary conditions for the optimality of a control $u(\cdot)$. The major result in this direction is the celebrated "Pontryagin maximum principle" (PMP), which extends the Euler–Lagrange equation to control systems, and the Weierstrass necessary conditions for a strong local minimum in the calculus of variations. Various extensions and other necessary conditions are now available (Agrachev and Sachkov 2004).

Sufficient conditions for optimality. The standard procedure resorts to embedding the optimal control problem in a family of problems, obtained by
Control Problems in Mathematical Physics 639

varying the initial conditions. One defines the value function $V$ by

$$V(t, \bar y) = \inf J(y(\cdot), u(\cdot))$$

where the inf is taken over the set of trajectories and controls satisfying $y(t) = \bar y$. Under suitable assumptions, $V$ is the solution to a first-order Hamilton–Jacobi PDE. The lack of regularity of the value function $V$ has long provided a major obstacle to a rigorous mathematical analysis, solved by the theory of viscosity solutions (Bardi and Capuzzo Dolcetta 1997). Another method consists in building an optimal synthesis, that is, a collection of trajectory–control pairs.

Pontryagin maximum principle Consider a general autonomous control system:

$$\dot y = f(y, u) \qquad [10]$$

where $y \in \mathbb{R}^n$ and $u \in U$, a compact subset of $\mathbb{R}^m$. We assume that $f$ is regular enough to guarantee existence and uniqueness of trajectories for every $u(\cdot) \in \mathcal{U}$. For a fixed $T > 0$, an optimal control problem in Mayer form is given by

$$\min_{u(\cdot) \in \mathcal{U}} \psi(y(T, u)), \qquad y(0) = \bar y \qquad [11]$$

where $\psi$ is the final cost and $\bar y$ the initial condition. More generally, one can also consider the Lagrangian cost $\int L(y, u)\, dt$ and reduce to this case by adding a variable $y_0$ with $y_0(0) = 0$ and $\dot y_0 = L$.

The well-known PMP provides, under suitable assumptions, a necessary condition for optimality in terms of a lift of the candidate optimal trajectory to the cotangent bundle. For problems such as [11], PMP can be stated as follows:

Theorem 6 Let $u^*(\cdot)$ be a (bounded) admissible control whose corresponding trajectory $y^*(\cdot) = y(\cdot, u^*)$ is optimal. Call $p : [0, T] \to \mathbb{R}^n$ the solution of the adjoint linear equation

$$\dot p(t) = -p(t) \cdot D_y f(y^*(t), u^*(t)), \qquad p(T) = \nabla \psi(y^*(T)) \qquad [12]$$

Then the maximality condition

$$p(t) \cdot f(y^*(t), u^*(t)) = \max_{\omega \in U} p(t) \cdot f(y^*(t), \omega) \qquad [13]$$

holds for almost every time $t \in [0, T]$.

Notice that the conclusion of the theorem can be interpreted by saying that the pair $(y, p)$ satisfies the system:

$$\dot y = \frac{\partial H(y^*, p, u^*)}{\partial p}, \qquad \dot p = -\frac{\partial H(y^*, p, u^*)}{\partial y}$$

where $H(y, p, u) = \langle p, f(y, u) \rangle$. This is a pseudo-Hamiltonian system, since $H$ also depends on $u^*$.

Alternatively, one can define the maximized Hamiltonian

$$H(y, p) = \max_{u} \langle p, f(y, u) \rangle$$

but $H$ may fail to be smooth. Another difficulty lies in the fact that an initial condition is given for $y$ and a final condition is given for $p$.

The proof of PMP relies on a special type of variations, called needle variations, of a reference trajectory. Given a candidate optimal control $u^*$ and corresponding trajectory $y^*$, a time $\tau$ of approximate continuity for $f(y^*(\cdot), u^*(\cdot))$ and $\omega \in U$, a needle variation is a family of controls $u_\varepsilon$ obtained by replacing $u^*$ with $\omega$ on the interval $[\tau - \varepsilon, \tau]$. A needle variation gives rise to a variation $v$ of the trajectory satisfying the variational equation

$$\dot v(t) = D_y f(y^*(t), u^*(t)) \cdot v(t) \qquad [14]$$

in the classical sense only after time $\tau$. Recently Piccoli and Sussmann (2000) introduced a setting in which needle and other variations happen to be differentiable.

One may also consider some final (or initial) constraint:

$$(T, y(T)) \in S \qquad [15]$$

where $S \subset \mathbb{R} \times \mathbb{R}^n$ (and $T$ is not fixed). In this case, the final condition for $p$ is more complicated, as is the proof of PMP. It is interesting to note the many connections between PMP and the classical mechanics framework, well illustrated by Bloch (2003) and Jurdjevic (1997).

Value function and HJB equation In this section we consider the minimization problem

$$\inf_{u \in \mathcal{U}} \psi(T, y(T, u)) \qquad [16]$$

for the control system

$$\dot y = f(t, y, u), \qquad u(t) \in U \ \text{a.e.} \qquad [17]$$

subject to the terminal constraints [15], where $S \subset \mathbb{R}^{n+1}$ is a closed target set.

Theorem 7 (PDE of dynamic programming). Assume that the value function $V$, for [15]–[17], is $C^1$ on some open set $\Omega \subset \mathbb{R} \times \mathbb{R}^n$, not intersecting the target set $S$. Then $V$ satisfies the Hamilton–Jacobi equation

$$V_s(s, y) + \min_{\omega \in U} \left[ V_y(s, y) \cdot f(s, y, \omega) \right] = 0 \qquad \forall (s, y) \in \Omega \qquad [18]$$

Equation [18] is called the Hamilton–Jacobi–Bellman (HJB) equation, after Richard Bellman. In general,

however, $V$ fails to be differentiable: this is the case for Example 1 along the lines [...]. To isolate $V$ as the unique solution of the HJB equation, one has to resort to the concept of viscosity solution. The dynamic programming and HJB equation apparatus applies also to stochastic problems, for which the equation happens to be parabolic because of the Ito formula.

Optimal syntheses Roughly speaking, an optimal synthesis is a collection of optimal trajectories, one for each initial condition $\bar y$. Geometric techniques provide a systematic method to construct syntheses:

Step 1 Study the properties of optimal trajectories via PMP and other necessary conditions.
Step 2 Determine a (finite-dimensional) sufficient family for optimality, that is, a class of trajectories (satisfying PMP) containing all possible optimal ones.
Step 3 Construct a synthesis selecting one trajectory for every initial condition in such a way as to cover the state space in a regular fashion.
Step 4 Prove that the synthesis of Step 3 is indeed optimal.

One of the main problems in Step 2 is the possible presence of optimal controls with an infinite number of discontinuities, known as the Fuller phenomenon. The key concept of regular synthesis, of Step 3, was introduced by Boltianskii and recently refined by Piccoli and Sussmann (2000) to include Fuller phenomena. The above strategy works only in some special cases, for example for two-dimensional minimum-time problems (Boscain and Piccoli 2004): we report below an example.

Example 4 Consider the problem of orienting in minimum time a satellite with two orthogonal rotors: the speed of one rotor is controlled, while the second rotor has constant speed. This problem is modelled by a left-invariant control system on $SO(3)$:

$$\dot y = y(F + uG), \qquad y \in SO(3), \quad |u| \le 1$$

where $F$ and $G$ are two matrices of $so(3)$, the Lie algebra of $SO(3)$. Using the isomorphism of Lie algebras $(so(3), [\cdot, \cdot]) \cong (\mathbb{R}^3, \times)$, the condition that the rotors are orthogonal reads: $\mathrm{trace}(F \cdot G) = 0$.

If we are interested in orienting only a fixed semi-axis, then we project the system on the sphere $S^2$:

$$\dot y = y(F + uG), \qquad y \in S^2, \quad |u| \le 1$$

In this case, $F + G$ and $F - G$ are rotations around two fixed axes and, if the angle between these two axes is less than $\pi/2$, every optimal trajectory is a finite concatenation of arcs corresponding to constant control $+1$ or $-1$. The "optimal synthesis" can be obtained by the feedback shown in Figure 2.

[Figure 2 Optimal feedback for Example 4. Labels in the figure: F − G, F + G, u = +1, u = −1.]

Control of PDEs

The theory for control of models governed by PDEs is, as expected, much more ramified and much less complete. An exhaustive summary of the available results is not possible in a short space, thus we focus on Example 2 and a few others to illustrate some techniques to treat control problems and give various references (see also Fursikov and Imanuvilov (1996), Komornik (1994), and Lasiecka and Triggiani (2000), and references therein).

Besides the variety of control problems illustrated in the Introduction, for PDE models one can consider different ways of applying the control, for example:

Boundary control One considers the system [3] (with $F$ independent of $u$) and imposes the condition $y(t, x) = u(t, x)$ to hold for every time $t$ and every $x$ in some region. Usually, we assume $y(t)$ to be defined on a bounded region $\Omega$ and the control acts on some set $\Gamma \subset \partial\Omega$. Obviously, Neumann conditions are also natural, as $\partial_\nu y = u$ where $\nu$ is the exterior normal to $\Omega$.

Internal control One considers the system [3] with $F$ depending on $u$. Thus, the control acts on the equation directly.

Other controls There are various other control problems one may consider, such as Galerkin-type approximation and control of some finite family of modes. An interesting example is given by Coron (2002), where the position of a tank is controlled to regulate the water level inside.

Control of a Vibrating String

We consider Example 2, but various results hold for hyperbolic linear systems in general. First consider the uncontrolled system

$$z_{tt} = \Delta z, \qquad z(0, t) = z(1, t) = 0 \qquad [19]$$

A first integral is the energy given by

$$E(t) = \frac{1}{2} \int \left[ |z_x|^2 + |z_t|^2 \right] dx$$
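The conservation of $E(t)$ can be checked numerically. The sketch below is an illustration added here, not part of the original analysis. It assumes the one-dimensional case of [19] on the interval $(0, 1)$, so that $\Delta z = z_{xx}$, and a hypothetical two-mode solution $z(x, t) = \sum_k (a_k \cos k\pi t + b_k \sin k\pi t) \sin k\pi x$; the names `energy` and `MODES` are chosen for this example only.

```python
import math


def energy(t, modes, n_cells=2000):
    """Midpoint-rule approximation of E(t) = 1/2 * int_0^1 (z_x^2 + z_t^2) dx.

    `modes` maps a mode number k to coefficients (a_k, b_k) of the solution
    z(x, t) = sum_k (a_k cos(k pi t) + b_k sin(k pi t)) sin(k pi x),
    which satisfies z_tt = z_xx with z(0, t) = z(1, t) = 0.
    """
    h = 1.0 / n_cells
    total = 0.0
    for i in range(n_cells):
        x = (i + 0.5) * h
        z_x = 0.0
        z_t = 0.0
        for k, (a, b) in modes.items():
            w = k * math.pi
            amp = a * math.cos(w * t) + b * math.sin(w * t)
            amp_t = w * (-a * math.sin(w * t) + b * math.cos(w * t))
            z_x += w * amp * math.cos(w * x)   # spatial derivative of mode k
            z_t += amp_t * math.sin(w * x)     # time derivative of mode k
        total += (z_x ** 2 + z_t ** 2) * h
    return 0.5 * total


# Hypothetical data: mode 1 with (a, b) = (1, 0), mode 2 with (a, b) = (0, 0.5).
MODES = {1: (1.0, 0.0), 2: (0.0, 0.5)}
energies = [energy(t, MODES) for t in (0.0, 0.3, 1.1, 2.7)]
print(energies)
```

For these coefficients the exact value is $E = \frac{\pi^2}{4} \sum_k k^2 (a_k^2 + b_k^2) = \pi^2/2$ at every time, and the computed values agree with it up to rounding.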

Then we say that the system [19] is observable at time $T$ if there exists $C(T)$ such that

$$E(0) \le C(T) \int_0^T |z_x(1, t)|^2\, dt$$

which means that if we observe zero displacement on the right end for time $T$ then the solution has zero energy and hence vanishes. In this case, the system is observable for every time $T \ge 2$: this is precisely the time taken by a wave to travel from the right end point to the left one and backward.

Thanks to a duality as for the finite-dimensional case, observability of [19] is equivalent to null controllability for [5]–[7], that is, to the property that for every initial condition $y_0, y_1$ there exists a control $u(\cdot)$ such that the corresponding solution verifies $y(x, T) = y_t(x, T) = 0$. More precisely, the desired control is given by $u(t) = \tilde z_x(1, t)$, where $\tilde z$ is the solution of [19] minimizing the functional (over $L^2 \times H^{-1}$)

$$J(z(\cdot, 0), z_t(\cdot, 0)) = \frac{1}{2} \int_0^T |z_x(1, t)|^2\, dt + \int y_0\, z_t(\cdot, 0)\, dx - \int y_1\, z(\cdot, 0)\, dx$$

One can check that this functional is continuous and convex, and the coercivity is granted by the observability of [19]; thus, a minimum exists by the direct method of Tonelli. This is an example of the method known as Hilbert's uniqueness method introduced by Lions (1988).

In the multidimensional case, controllability can be characterized by imposing a condition on the region $\Gamma \subset \partial\Omega$ on which the control acts. More precisely, rays of geometric optics in $\Omega$ should intersect $\Gamma$ (Zuazua 2005).

If we consider the infinite-time horizon $T = +\infty$ and introduce the functional

$$J = \int_0^{+\infty} \|y\|^2\, dt + N \int u^2\, dt\, dx$$

then the optimal control is determined as follows. If $(y, p)$ is a solution of the optimality system: [5]–[6] with $y = 0$ outside $\Gamma$ and

$$p_{tt} - \Delta p + y = 0, \qquad \partial_\nu p + N y = 0 \ \text{on } \Gamma, \qquad p = 0 \ \text{on } \partial\Omega$$

then $u = y$ on $\Gamma$ (Lions 1988, Zuazua 2005).

Controllability via Return Method of Coron

As we saw in Theorem 4, a nonlinear system may be controllable even if its linearization is not. In this case, controllability can be proved by the return method of Coron, which consists in finding a trajectory $y$ such that the following hold:

1. $y(0) = y(T) = 0$;
2. the linearized system around $y$ is controllable.

Then, by the implicit-function theorem, local controllability is granted, that is, there exists $\varepsilon > 0$ such that for every data $y_0, y_1$ of norm less than $\varepsilon$, there exists a control steering the system from $y_0$ to $y_1$ in time $T$. This method does not give many advantages in the finite-dimensional case, but it yields excellent results for PDE systems such as Euler, Navier–Stokes, Saint–Venant, and others (Coron 2002).

Control of Schrödinger Equation

Consider the issue of designing an efficient transfer of population between different atomic or molecular levels using laser pulses. The mathematical description consists in controlling the Schrödinger equation. Many results are available in the finite-dimensional case. Finite-dimensional closed quantum systems are in fact left-invariant control systems on $SU(n)$, or on the corresponding Hilbert sphere $S^{2n-1} \subset \mathbb{C}^n$, where $n$ is the number of atomic or molecular levels, and powerful techniques of geometric control are available both for controllability and for optimal control (Agrachev and Sachkov 2004, Boscain and Piccoli 2004, Jurdjevic 1997).

Recent papers consider the minimum-time problem with unbounded controls as well as minimization of the energy of transition. Boscain et al. (2002) have applied the techniques of sub-Riemannian geometry on Lie groups and of optimal synthesis on two-dimensional manifolds to the population transfer problem in a three-level quantum system driven by two external fields of arbitrary shape and frequency.

Although many results are available for finite-dimensional systems, only few controllability properties have been proved for the Schrödinger equation as a PDE, and in particular no satisfactory global controllability results are available at the moment.

Further Reading

Agrachev A and Sachkov Y (2004) Control from a Geometric Perspective. Springer.
Bardi M and Capuzzo Dolcetta I (1997) Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhauser.
Bloch AM (2003) Nonholonomic Mechanics and Control, with the collaboration of J. Baillieul, P. Crouch and J. Marsden, with scientific input from P. S. Krishnaprasad, R. M. Murray and D. Zenkov. New York: Springer.
Boscain U and Piccoli B (2004) Optimal Synthesis for Control Systems on 2-D Manifolds. Springer SMAI, vol. 43. Heidelberg: Springer.

Boscain U, Chambrion T, and Gauthier J-P (2002) On the K + P problem for a three-level quantum system: optimality implies resonance. Journal of Dynamical and Control Systems 8: 547–572.
Bullo F and Lewis AD (2005) Geometric Control of Mechanical Systems. New York: Springer.
Coron JM (2002) Return method: some application to flow control. Mathematical Control Theory, Part 1, 2 (Trieste, 2001). In: Agrachev A (ed.) ICTP Lecture Notes, vol. VIII. Trieste: Abdus Salam Int. Cent. Theoret. Phys.
Fursikov AV and Imanuvilov O Yu (1996) Controllability of Evolution Equations. Lecture Notes Series, vol. 34. Seoul: Seoul National University.
Jurdjevic V (1997) Geometric Control Theory. Cambridge: Cambridge University Press.
Komornik V (1994) Exact Controllability and Stabilization. The Multiplier Method. Chichester: Wiley.
Lasiecka I and Triggiani R (2000) Control theory for Partial Differential Equations: Continuous and Approximation Theories. Cambridge: Cambridge University Press.
Lions JL (1988) Exact controllability, stabilization and perturbations for distributed systems. SIAM Review 30: 1–68.
Piccoli B and Sussmann HJ (2000) Regular synthesis and sufficiency conditions for optimality. SIAM Journal of Control Optimization 39: 359–410.
Sontag ED (1998) Mathematical Control Theory. New York: Springer.
Zuazua E (2005) Propagation, observation and control of waves approximated by finite difference methods. SIAM Review 47: 197–243.

Convex Analysis and Duality Methods


G Bouchitté, Université de Toulon et du Var, La Garde, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Convexity is an important notion in nonlinear optimization theory as well as in infinite-dimensional functional analysis. As will be seen below, very simple and powerful tools will be derived from elementary duality arguments (which are by-products of the Moreau–Fenchel transform and Hahn–Banach theorem). We will emphasize applications to a large range of variational problems. Some arguments of measure theory will be skipped.

Basic Convex Analysis

In the following, we denote by $X$ a normed vector space, and by $X^*$ the topological dual of $X$. If a topology different from the normed topology is used on $X$, we will denote it by $\tau$. For every $x \in X$ and $A \subset X$, $\mathcal{V}_x$ denotes the open neighborhoods of $x$ and int $A$, cl $A$, respectively, the interior and the closure of $A$. We deal with extended real-valued functions $f : X \to \mathbb{R} \cup \{+\infty\}$. We denote by dom $f = f^{-1}(\mathbb{R})$ and by epi $f = \{(x, \alpha) \in X \times \mathbb{R} : f(x) \le \alpha\}$ the domain and the epigraph of $f$, respectively. We say that $f$ is proper if dom $f \ne \emptyset$. Recall that $f$ is convex if for every $(x, y) \in X^2$ and $t \in [0, 1]$, there holds

$$f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y) \qquad (\text{by convention } \infty + a = +\infty)$$

The notion of convexity for a subset $A \subset X$ is recovered by saying that $\chi_A$ is convex, where its indicator function $\chi_A$ is defined by setting

$$\chi_A(x) = \begin{cases} 0 & \text{if } x \in A \\ +\infty & \text{otherwise} \end{cases}$$

Continuity and Lower-Semicontinuity

A first consequence of the convexity is the continuity on the topological interior of the domain. We refer for instance to Borwein and Lewis (2000) for a proof of

Theorem 1 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be convex and proper. Assume that $\sup_U f < +\infty$, where $U$ is a suitable open subset of $X$. Then $f$ is continuous and locally Lipschitzian on all int(dom $f$).

As an immediate corollary, a convex function on a normed space is continuous provided it is majorized by a locally bounded function. In the finite-dimensional case, it is easily deduced that a finite-valued convex function $f : \mathbb{R}^d \to \mathbb{R}$ is locally Lipschitz. Furthermore, by Aleksandrov's theorem, $f$ is almost everywhere twice differentiable and the non-negative Hessian matrix $\nabla^2 f$ coincides with the absolutely continuous part of the distributional Hessian matrix $D^2 f$ (it is a Radon measure taking values in the non-negative symmetric matrices).

However, in infinite-dimensional spaces, for ensuring compactness properties (as, e.g., in condition (ii) of Theorem 4 below), we need to use weak topologies and the situation is not so simple. A major idea consists in substituting the continuity property with lower-semicontinuity.

Definition 2 A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is $\tau$-l.s.c. at $x_0 \in X$ if for all $\alpha \in \mathbb{R}$ with $\alpha < f(x_0)$, there exists $U \in \mathcal{V}_{x_0}$ such that $f > \alpha$ on $U$. In particular, $f$ will be l.s.c. on all $X$ provided $f^{-1}((r, +\infty))$ is open for every $r \in \mathbb{R}$.
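As a quick illustration of Definition 2 (an example added here, not taken from the original text), take $X = \mathbb{R}$ with its usual topology and compare the indicator functions of the closed interval $[0, 1]$ and of the open interval $(0, 1)$. Both are convex, but only the first is l.s.c., as the superlevel sets show:

```latex
\chi_{[0,1]}^{-1}\bigl((r,+\infty)\bigr) =
\begin{cases}
  \mathbb{R} & \text{if } r < 0,\\
  \mathbb{R}\setminus[0,1] & \text{if } r \ge 0,
\end{cases}
\qquad
\chi_{(0,1)}^{-1}\bigl((r,+\infty)\bigr) =
\begin{cases}
  \mathbb{R} & \text{if } r < 0,\\
  (-\infty,0]\cup[1,+\infty) & \text{if } r \ge 0.
\end{cases}
```

Every set in the first family is open, so $\chi_{[0,1]}$ is l.s.c. on $\mathbb{R}$. In the second family the set obtained for $r \ge 0$ is not open, and indeed $\chi_{(0,1)}$ fails to be l.s.c. at $x_0 = 0$ and at $x_0 = 1$: there $\chi_{(0,1)}(x_0) = +\infty$, while every neighborhood of $x_0$ contains points where the function vanishes.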

Remark 3

(i) The following sequential notion can be also used: $f$ is $\tau$-sequentially l.s.c. at $x_0$ if

$$\forall (x_n) \subset X, \qquad x_n \xrightarrow{\ \tau\ } x_0 \implies \liminf_{n \to +\infty} f(x_n) \ge f(x_0)$$

It turns out that this notion (weaker in general) is equivalent to the previous one provided $x_0$ admits a countable basis of neighborhoods.
(ii) A well-known consequence of the Hahn–Banach theorem is that, for convex functions, the lower-semicontinuity property with respect to the normed topology of $X$ is equivalent to the weak (or weak sequential) lower-semicontinuity.

Theorem 4 (Existence). Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be proper, such that
(i) $f$ is $\tau$-l.s.c.,
(ii) $\forall r \in \mathbb{R}$, $f^{-1}((-\infty, r])$ is $\tau$-relatively compact.
Then there is $\bar x \in X$ such that $f(\bar x) = \inf f$ and argmin $f := \{x \in X \mid f(x) = \inf f\}$ is $\tau$-compact.

In practice, the choice of the topology $\tau$ is ruled by the condition (ii) above. For example, if $X$ is a reflexive infinite-dimensional Banach space and if $f$ is coercive (i.e., $\lim_{\|x\| \to \infty} f(x) = +\infty$), we may take for $\tau$ the weak topology (but never the normed topology). This restriction implies in practice that the first condition in Theorem 4 may fail. In this case, it is often useful to substitute $f$ with its lower-semicontinuous (l.s.c.) envelope.

Definition 5 Given a topology $\tau$, the relaxed function $\bar f$ ($= \bar f^{\tau}$) is defined as

$$\bar f(x) = \sup\{ g(x) \mid g : X \to \mathbb{R} \cup \{+\infty\},\ g \text{ is } \tau\text{-l.s.c.},\ g \le f \}$$

It is easy to check that $f$ is $\tau$-l.s.c. at $x_0$ if and only if $\bar f(x_0) = f(x_0)$. Furthermore,

$$\bar f(x) = \sup_{U \in \mathcal{V}_x} \inf_U f, \qquad \text{epi } \bar f = \mathrm{cl}_{(X \times \mathbb{R})}(\text{epi } f)$$

We can now state the relaxed version of Theorem 4.

Theorem 6 (Relaxation). Let $f : X \to \mathbb{R} \cup \{+\infty\}$, then: $\inf f = \inf \bar f$. Assume further that, for all real $r$, $f^{-1}((-\infty, r])$ is $\tau$-relatively compact; then $\bar f$ attains its minimum and argmin $f$ = argmin $\bar f \cap \{x \in X \mid f(x) = \bar f(x)\}$.

Moreau–Fenchel Conjugate

The duality between $X$ and $X^*$ will be denoted by the symbol $\langle \cdot \mid \cdot \rangle$. If $X$ is a Euclidean space, we identify $X^*$ with $X$ via the scalar product denoted $(\cdot \mid \cdot)$.

Definition 7 Let $f : X \to \mathbb{R} \cup \{+\infty\}$. The Moreau–Fenchel conjugate $f^* : X^* \to \mathbb{R} \cup \{+\infty\}$ of $f$ is defined by setting, for every $x^* \in X^*$:

$$f^*(x^*) = \sup\{ \langle x \mid x^* \rangle - f(x) \mid x \in X \}$$

In a symmetric way, if $f^*$ is proper on $X^*$, we define the biconjugate $f^{**} : X \to \mathbb{R} \cup \{+\infty\}$ by setting

$$f^{**}(x) = \sup\{ \langle x \mid x^* \rangle - f^*(x^*) \mid x^* \in X^* \}$$

As a consequence, the so-called Fenchel inequality holds:

$$\langle x \mid x^* \rangle \le f(x) + f^*(x^*), \qquad (x, x^*) \in X \times X^*$$

Notice that $f$ does not need to be convex. However, if $f$ is convex, then $f^*$ agrees with the Legendre–Fenchel transform.

Definition 8 Let $f : X \to \mathbb{R} \cup \{+\infty\}$. The subdifferential of $f$ at $x$ is the possibly void subset $\partial f(x) \subset X^*$ defined by

$$\partial f(x) := \{ x^* \in X^* : f(x) + f^*(x^*) = \langle x, x^* \rangle \}$$

It is easy to check that $\partial f(x)$ is convex and weak-star closed. Moreover, if $f$ is convex and has a differential (or Gateaux derivative) $f'(x)$ at $x$, then $\partial f(x) = \{f'(x)\}$. After summarizing some elementary properties of the Fenchel transform, we give examples in $\mathbb{R}^d$ or in infinite-dimensional spaces.

Lemma 9
(i) $f^*$ is convex, l.s.c. with respect to the weak star topology of $X^*$.
(ii) $f^*(0) = -\inf f$ and $f \ge g \implies f^* \le g^*$.
(iii) $(\inf_i f_i)^* = \sup_i f_i^*$, for every family $\{f_i\}$.
(iv) $f^{**}(x) = \sup\{ g(x) : g \text{ affine continuous on } X \text{ and } g \le f \}$ (by convention, the supremum is identically $-\infty$ if no such $g$ exists).

Proof (i) This assertion is a direct consequence of the fact that $f^*$ can be written as the supremum of functions $g_x$, where $g_x := \langle x \mid \cdot \rangle - f(x)$. Clearly, these functions are affine and weakly star-continuous on $X^*$. The assertions (ii), (iii) are trivial. To obtain (iv), it is enough to observe that an affine function $g$ of the form $g(x) = \langle x, x^* \rangle - \alpha$ satisfies $g \le f$ iff $f^*(x^*) \le \alpha$. ∎

Example 1 Let $f : X \to \mathbb{R}$ be defined by

$$f(x) = \frac{1}{p} \|x\|_X^p, \qquad 1 < p < +\infty$$

then,

$$f^*(x^*) = \frac{1}{p'} \|x^*\|_{X^*}^{p'}, \qquad \text{with } \frac{1}{p} + \frac{1}{p'} = 1$$

whereas, for $p = 1$, we find $f^* = \chi_{B^*}$, where $B^* = \{\|x^*\| \le 1\}$.

Example 2 Let $A \in \mathbb{R}^{d^2}_{\mathrm{sym}}$ be a symmetric positive-definite matrix and let $f(x) := (1/2)(Ax \mid x)$ ($x \in \mathbb{R}^d$). Then, for all $y \in \mathbb{R}^d$, we have $f^*(y) = (1/2)(A^{-1} y \mid y)$. Notice that if $A$ has a negative eigenvalue, then $f^* \equiv +\infty$.

Particular examples on $\mathbb{R}^d$ are also very popular. For instance:

Minimal surfaces

$$f(x) = \sqrt{1 + |x|^2}, \qquad f^*(y) = \begin{cases} -\sqrt{1 - |y|^2} & \text{if } |y| \le 1 \\ +\infty & \text{otherwise} \end{cases}$$

Entropy

$$f(x) = \begin{cases} x \log x & \text{if } x \in \mathbb{R}_+ \\ +\infty & \text{otherwise} \end{cases}, \qquad f^*(y) = \exp(y - 1)$$

Example 3 Let $C \subset X$ be convex, and let $f = \chi_C$. Then,

$$f^*(x^*) = \chi_C^*(x^*) = \sup_{x \in C} \langle x \mid x^* \rangle \qquad (\text{support function of } C)$$

Notice that if $M$ is a subspace of $X$, then $(\chi_M)^* = \chi_{M^\perp}$. We specify now a particular case of interest.

Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$. Take $X = C^0(\bar\Omega; \mathbb{R}^d)$ to be the Banach space of continuous functions on the compact $\bar\Omega$ with values in $\mathbb{R}^d$. As usual, we identify the dual $X^*$ with the space $M_b(\bar\Omega; \mathbb{R}^d)$ of $\mathbb{R}^d$-valued Borel measures on $\bar\Omega$ with finite total variation. Let $K$ be a closed convex subset of $\mathbb{R}^d$ such that $0 \in K$. Then $\rho_K^0(\xi) := \sup\{ (\xi \mid z) : z \in K \}$ is a non-negative convex l.s.c. and positively 1-homogeneous function on $\mathbb{R}^d$ (e.g., $\rho_K^0$ is the Euclidean norm if $K$ is the unit ball of $\mathbb{R}^d$). Let us define $C := \{ \varphi \in X : \varphi(x) \in K,\ \forall x \in \Omega \}$. Then, we have

$$(\chi_C)^*(\lambda) = \int_{\bar\Omega} \rho_K^0(\lambda) := \int_{\bar\Omega} \rho_K^0\!\left( \frac{d\lambda}{d\theta} \right) \theta(dx) \qquad [1]$$

where $\theta$ is any non-negative Radon measure such that $\lambda \ll \theta$ (the choice of $\theta$ is indifferent). In the case where $K$ is the unit ball, we recover the total variation of $\lambda$.

Example 4 (Integral functionals). Given $1 \le p < +\infty$, $(\Omega, \mu, \mathcal{T})$ a measured space and $\varphi : \Omega \times \mathbb{R}^d \to [0, +\infty]$ a $\mathcal{T} \otimes \mathcal{B}_{\mathbb{R}^d}$-measurable integrand. Then the partial conjugate $\varphi^*(x, z^*) := \sup\{ \langle z \mid z^* \rangle - \varphi(x, z) : z \in \mathbb{R}^d \}$ is a convex measurable integrand. Let us define

$$I_\varphi : u \in (L^p_\mu)^d \mapsto \int_\Omega \varphi(x, u(x))\, d\mu \in \mathbb{R} \cup \{+\infty\}$$

and assume that $I_\varphi$ is proper. Then there holds $(I_\varphi)^* = I_{\varphi^*}$, where

$$(I_\varphi)^* : v \in (L^{p'}_\mu)^d \mapsto \int_\Omega \varphi^*(x, v(x))\, d\mu$$

Duality Arguments

Two Key Results

The first result related to the biconjugate $f^{**}$ is a consequence of the Hahn–Banach theorem. Recalling the assertion (iv) of Lemma 9, we notice that the existence of an affine minorant for $f$ is equivalent to the properness of $f^*$ (i.e., $\exists x_0^* \in X^* : f^*(x_0^*) < +\infty$).

Theorem 10 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be convex and proper. Then
(i) $f$ is l.s.c. at $x_0$ if and only if $f^*$ is proper and $f^{**}(x_0) = f(x_0)$. In particular, the lower-semicontinuity of $f$ on all $X$ is equivalent to the identity $f \equiv f^{**}$.
(ii) If $f^*$ is proper, then $f^{**} = \bar f$.

Proof We notice that by Lemma 9, $f^{**} \le f$ and $f^{**}$ is l.s.c. (even for the weak topology). Therefore, $f^{**} \le \bar f$ and, moreover, $f$ is l.s.c. at $x_0$ if $f^{**}(x_0) \ge f(x_0)$. Conversely, if $f$ is l.s.c. at $x_0$, for every $\alpha_0 < f(x_0)$, there exists a neighborhood $V$ of $x_0$ such that $V \times (-\infty, \alpha_0) \cap \text{epi } f = \emptyset$. It follows that epi $\bar f$ is a proper closed convex subset of $X \times \mathbb{R}$ which does not intersect the compact singleton $\{(x_0, \alpha_0)\}$. By applying the Hahn–Banach strict separation theorem, there exists $(x_0^*, \beta_0) \in X^* \times \mathbb{R}$ such that

$$\langle x_0, x_0^* \rangle + \beta_0 \alpha_0 < \langle x, x_0^* \rangle + \beta_0 \alpha \qquad \text{for all } (x, \alpha) \in \text{epi } f$$

Taking $\alpha \to \infty$ and $x \in$ dom $f$, we find $\beta_0 \ge 0$. In fact, $\beta_0 > 0$ as the strict inequality above would be violated for $x = x_0$. Eventually, we obtain that $f$ is minorized by the affine continuous function $g(x) = -\langle x - x_0, x_0^*/\beta_0 \rangle + \alpha_0$. Thus, we conclude that $f^*$ is proper and that $f^{**}(x_0) \ge \alpha_0$.

The assertion (ii) is a direct consequence of the equivalence in (i). ∎
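The role of convexity in Theorem 10 can be seen on a small numerical experiment. The sketch below is an illustration added here, under stated assumptions: $X = \mathbb{R}$, the non-convex double-well function $f(x) = (x^2 - 1)^2$, and a discrete Legendre–Fenchel transform in which suprema are replaced by maxima over finite grids. The names `conjugate`, `double_well`, `xs` and `ss` are chosen for this example only. By Lemma 9 (iv) the biconjugate is the supremum of the affine minorants, so one expects $f^{**} \le f$ everywhere, $f^{**} = 0 < f$ between the wells, and $f^{**} = f$ where $f$ already coincides with its convex envelope.

```python
def conjugate(points, values, slopes):
    """Discrete Legendre-Fenchel transform.

    Returns, for every s in `slopes`, max_i (s * points[i] - values[i]),
    i.e. the conjugate of the function sampled as (points, values).
    """
    return [max(s * x - v for x, v in zip(points, values)) for s in slopes]


def double_well(x):
    """Non-convex test function f(x) = (x^2 - 1)^2."""
    return (x * x - 1.0) ** 2


# Grids built from integers so that 0, 1 and 1.5 are represented exactly.
xs = [i / 100 for i in range(-200, 201)]   # x in [-2, 2]
ss = [j / 20 for j in range(-600, 601)]    # slopes in [-30, 30] (|f'| <= 24 on [-2, 2])

f_vals = [double_well(x) for x in xs]
f_star = conjugate(xs, f_vals, ss)         # f* sampled on the slope grid
f_bistar = conjugate(ss, f_star, xs)       # f** sampled back on the x grid

i_zero = xs.index(0.0)
i_out = xs.index(1.5)
print("x = 0.0: f =", f_vals[i_zero], " f** =", f_bistar[i_zero])
print("x = 1.5: f =", f_vals[i_out], " f** =", f_bistar[i_out])
```

The slope grid is chosen wide enough to contain every derivative of $f$ on $[-2, 2]$; with a narrower slope range the discrete biconjugate would underestimate $f$ near the ends of the interval.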

Theorem 11 Let $X$ be a normed space and let $f : X \to [0, +\infty]$ be a convex and proper function; assume that $f$ is continuous at 0, then
(i) $f^*$ achieves its minimum on $X^*$
(ii) $f(0) = f^{**}(0) = -\inf f^*$

Proof
(i) Let $M$ be an upper bound of $f$ on the ball $\{\|x\| \le R\}$. Then

$$f^*(x^*) \ge \sup\{ \langle x, x^* \rangle - f(x) : \|x\| \le R \} \ge R \|x^*\|_{X^*} - M$$

Hence, for every $r$, the set $\{x^* \in X^* : f^*(x^*) \le r\}$ is bounded, thus $\tau$-relatively compact, where $\tau$ is the weak-star topology on $X^*$. By assertion (i) of Lemma 9, $f^*$ is $\tau$-l.s.c. and Theorem 4 applies.
(ii) By Theorem 10, since $f$ is convex proper and l.s.c. at $x_0 = 0$, we have $f(0) = f^{**}(0) = -\inf f^*$. ∎

Some Useful Consequences

Proposition 12 (Conjugate of a sum). Let $f, g : X \to \mathbb{R} \cup \{+\infty\}$ be convex such that

$$\exists x_0 \in X : f \text{ is continuous at } x_0 \text{ and } g(x_0) < +\infty \qquad [2]$$

Then
(i) $$(f + g)^*(x^*) = \inf_{x_1^* + x_2^* = x^*} \{ f^*(x_1^*) + g^*(x_2^*) \}$$ (the equality holds in $\bar{\mathbb{R}}$).
(ii) If both sides of the equality in (i) are finite, then the infimum in the right-hand side is achieved.

Proof Without any loss of generality, we may assume that $x^* = 0$ (we reduce to this case by substituting $g$ with $g - \langle \cdot, x^* \rangle$). We let

$$h(p) = \inf\{ f(x + p) + g(x) \mid x \in X \}$$

Noticing that $(p, x) \mapsto f(x + p) + g(x)$ is convex, we infer that $h(p)$ is convex as well. As $h$ is majorized by the function $p \mapsto f(x_0 + p) + g(x_0)$, which by [2] is continuous at 0, we deduce from Theorems 1 and 11 that $h(0) = h^{**}(0)$ and that $h^*$ achieves its infimum. Now $h(0) = \inf(f + g) = -(f + g)^*(0)$ and

$$h^*(p^*) = \sup\{ \langle p, p^* \rangle - h(p) : p \in X \} = \sup\{ \langle p, p^* \rangle - f(x + p) - g(x) : x \in X,\ p \in X \} = g^*(-p^*) + f^*(p^*)$$

The assertions (i), (ii) follow since $h^{**}(0) = -\min h^* = -\min\{ g^*(-p^*) + f^*(p^*) \}$. ∎

Proposition 13 (Composition). Let $X, Y$ be two Banach spaces and $A : X \to Y$ a linear operator with dense domain $D(A)$. Let $\Phi : Y \to \mathbb{R} \cup \{+\infty\}$ be a convex l.s.c. function and let $F : X \to \mathbb{R} \cup \{+\infty\}$ be the convex functional defined by

$$F(u) = \begin{cases} \Phi(Au) & \text{if } u \in D(A) \\ +\infty & \text{otherwise} \end{cases}$$

Assume that there exists $u_0 \in D(A)$ such that $\Phi$ is continuous at $Au_0$. Then
(i) The Fenchel conjugate of $F$ is given by

$$\forall f \in X^*, \qquad F^*(f) = \inf\{ \Phi^*(\sigma) : \sigma \in Y^*,\ A^*\sigma = f \}$$

where, if both sides of the equality are finite, the infimum on the right-hand side is achieved.
(ii) If, in addition, $Y$ is reflexive and $\Phi$ is l.s.c. coercive, we have

$$\bar F(u) = F^{**}(u) = \inf\{ \Phi(p) \mid (u, p) \in \overline{G(A)} \} \qquad [3]$$

where $G(A)$ denotes the graph of $A$.

Proof
(i) Define $H, K : X \times Y \to \mathbb{R} \cup \{+\infty\}$ by

$$H(u, p) = \chi_{G(A)}(u, p), \qquad K(u, p) = \Phi(p)$$

Then we have the identity $F^*(f) = (H + K)^*(f, 0)$, where the conjugate of $H + K$ is taken with respect to the duality $(X \times Y, X^* \times Y^*)$. From the assumption, $K$ is continuous at $(u_0, Au_0) \in$ dom $H$. By Proposition 12, we obtain

$$(H + K)^*(f, 0) = \inf_{(g, \sigma) \in X^* \times Y^*} \{ K^*(f - g, -\sigma) + H^*(g, \sigma) \}$$

After a simple computation, it is easy to check that

$$H^*(g, \sigma) = \begin{cases} 0 & \text{if } A^*\sigma = -g \\ +\infty & \text{otherwise} \end{cases}, \qquad K^*(f - g, -\sigma) = \begin{cases} \Phi^*(-\sigma) & \text{if } g = f \\ +\infty & \text{otherwise} \end{cases}$$

so that the infimum reduces to $\inf\{ \Phi^*(-\sigma) : A^*\sigma = -f \}$, which is the formula in (i) after changing $\sigma$ into $-\sigma$.
(ii) Let $J(u) := \inf\{ \Phi(p) : (u, p) \in \overline{G(A)} \}$. As observed for $F$ in the proof of (i), we have the identity $J^*(f) = (H + K)^*(f, 0)$. Therefore, in view of Theorem 10, $\bar F = F^{**} = J^{**}$ and it is enough to prove that $J$ is convex l.s.c. proper. Let us consider a sequence $(u_n)$ in $X$ converging to some $u \in X$. Without any loss of generality, we may assume that $\liminf J(u_n) = \lim J(u_n) < +\infty$. Then there is a sequence $(p_n)$ such that, for every $n$, $(u_n, p_n) \in \overline{G(A)}$ and $J(u_n) \ge \Phi(p_n) - 1/n$. As $\Phi$ is coercive, $\{p_n\}$ is bounded in the reflexive space $Y$ and possibly passing to a subsequence,

we may assume that $p_n$ converges weakly to some $p$. Since $\overline{G(A)}$ is a (weakly) closed subspace of $X \times Y$, we infer that $(u, p)$ as the limit of $(u_n, p_n)$ still belongs to $\overline{G(A)}$. Thus, we conclude, thanks to the (weak) lower-semicontinuity of $\Phi$

$$\liminf_n J(u_n) = \lim_n \Phi(p_n) \ge \Phi(p) \ge J(u)$$
∎

An immediate consequence of Propositions 12 and 13 is the following variant:

Proposition 14 Under the same notation as in Proposition 13, let $\Psi : X \to \mathbb{R} \cup \{+\infty\}$ be a convex function and assume that there exists $u_0 \in D(A)$ such that $\Psi(u_0) < +\infty$ and $\Phi$ is continuous at $Au_0$. Then we have

$$\inf_{u \in X} \{ \Psi(u) + \Phi(Au) \} = \sup_{\sigma \in Y^*} \{ -\Psi^*(-A^*\sigma) - \Phi^*(\sigma) \}$$

where the supremum on the right-hand side is achieved. Furthermore, a pair $(\bar u, \bar\sigma)$ is optimal if and only if it satisfies the relations: $\bar\sigma \in \partial\Phi(A\bar u)$ and $-A^*\bar\sigma \in \partial\Psi(\bar u)$.

Remark 15 From the assertion (ii) of Proposition 13, we may conclude that $F$ is l.s.c. whenever the operator $A$ is closed. If now $A$ is merely closable (with closure denoted by $\bar A$), we obtain

$$\bar F(u) = \begin{cases} \Phi(\bar A u) & \text{if } u \in \text{dom } \bar A \\ +\infty & \text{otherwise} \end{cases}$$

This is the typical situation when $F$ is an integral functional defined on smooth functions of the kind

$$F(u) = \int_\Omega f(x, \nabla u)\, dx$$

where $\Omega$ is a bounded open subset of $\mathbb{R}^n$, $f : \Omega \times \mathbb{R}^n \to \mathbb{R}$ is a convex integrand with quadratic growth (i.e., $c|z|^2 \le f(x, z) \le C(1 + |z|^2)$ for suitable $C \ge c > 0$). Then $X = L^2(\Omega)$, $Y = L^2(\Omega; \mathbb{R}^n)$,

$$\Phi(v) = \int_\Omega f(x, v(x))\, dx$$

and $A : u \in C^1(\bar\Omega) \mapsto \nabla u \in L^2(\Omega; \mathbb{R}^n)$. It turns out that $A$ is closable and that the domain of $\bar A$ characterizes the Sobolev space $W^{1,2}(\Omega)$ on which $\bar A$ coincides with the distributional gradient operator.

The situation is more involved if we consider

$$F(u) = \int_\Omega f(x, \nabla u)\, d\mu$$

where $\mu$ is a possibly concentrated Radon measure supported on $\bar\Omega$. In general, the operator $A : u \in C^1(\bar\Omega) \subset L^2_\mu(\Omega) \mapsto \nabla u \in L^2_\mu(\Omega; \mathbb{R}^n)$ is not closable and we need to come back to the general formula [3]. The general structure of $\overline{G(A)}$ has been given in Bouchitté et al. (1997) and Bouchitté and Fragalà (2002, 2003), namely

$$(u, \xi) \in \overline{G(A)} \iff u \in W^{1,2}_\mu,\ \exists \eta \in L^2_\mu(\Omega; \mathbb{R}^n) : \ \xi = \nabla_\mu u + \eta,\ \eta(x) \in T_\mu(x)^\perp$$

where $T_\mu(x)$, $\nabla_\mu(x)$ are suitable notions of tangent space and tangential gradient with respect to $\mu$, and $W^{1,2}_\mu$ denotes the domain of the extended tangential gradient operator.

Remark 16 The assertion (ii) of Proposition 13 is not valid in the nonreflexive case. In particular, for

$$F(u) = \int_\Omega f(x, \nabla u)\, dx$$

where $f(x, \cdot)$ has a linear growth at infinity, we need to take $Y$ as the space of $\mathbb{R}^n$-valued vector measures on $\Omega$ and the relaxed functional $\bar F$ needs to be identified on the space $BV(\Omega)$ of integrable functions with bounded variations. The computation of $\bar F$ is a delicate problem for which we refer to Bouchitté and Dal Maso (1993) and Bouchitté and Valadier (1998).

Remark 17 By duality techniques, it is possible also to handle variational integrals of the kind

$$F(u) = \int_\Omega f(x, u(x), \nabla u(x))\, dx$$

even if the dependence of $f(x, u, z)$ with respect to $u$ is nonconvex. The idea consists in embedding the space $BV(\Omega)$ in the larger space $BV(\Omega \times \mathbb{R})$ through the map $u \mapsto 1_u$, where $1_u$ is the characteristic function defined on $\Omega \times \mathbb{R}$ by setting

$$1_u(x, t) := \begin{cases} 1 & \text{if } u(x) > t \\ 0 & \text{otherwise} \end{cases}$$

Then it is possible to show, under suitable conditions on the integrand $f$, that there exists a convex l.s.c., 1-homogeneous functional $G : BV(\Omega \times \mathbb{R}) \to \mathbb{R} \cup \{+\infty\}$ such that $F(u) = G(1_u)$. This functional $G$ is constructed as in Example 3, taking $C$ to be a suitable convex subset of $C^0(\Omega \times \mathbb{R})$. This idea has been the key tool of the calibration method developed recently (Alberti et al. 2003).
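Returning to the duality formula of Proposition 14, a one-dimensional computation shows how the primal and dual values match. The example is added here for illustration and is not taken from the original text: it assumes $X = Y = \mathbb{R}$, $Au = au$ for a fixed real number $a$, $\Psi(u) = \tfrac12 (u - d)^2$ with a given datum $d$, and $\Phi(p) = \tfrac12 p^2$, so that $\Psi^*(v) = \tfrac12 v^2 + dv$ and $\Phi^*(\sigma) = \tfrac12 \sigma^2$.

```latex
\begin{align*}
\inf_{u}\Bigl\{\tfrac12 (u-d)^2 + \tfrac12 a^2 u^2\Bigr\}
  &= \frac{a^2 d^2}{2(1+a^2)}
  && \text{attained at } \bar u = \frac{d}{1+a^2},\\
\sup_{\sigma}\Bigl\{-\Psi^*(-a\sigma) - \Phi^*(\sigma)\Bigr\}
  = \sup_{\sigma}\Bigl\{ a d\,\sigma - \tfrac12 (1+a^2)\sigma^2 \Bigr\}
  &= \frac{a^2 d^2}{2(1+a^2)}
  && \text{attained at } \bar\sigma = \frac{a d}{1+a^2}.
\end{align*}
```

The two optimality relations can be read off directly: $\bar\sigma = a\bar u$ is the relation $\bar\sigma \in \partial\Phi(A\bar u)$, and $-a\bar\sigma = \bar u - d$ is the relation $-A^*\bar\sigma \in \partial\Psi(\bar u)$.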

Convex Variational Problems in Duality sup h . Recalling [4], we therefore consider the dual
problem:
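Finite-Dimensional Case

We sketch the duality scheme in two cases.

Linear programming Let $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and let $A$ be an $m \times n$ matrix. We denote by $A^T$ the transpose matrix. We consider the linear program

\[ (P) \qquad \inf\{(c|x) : x \ge 0,\ Ax \le b\} \]

and its perturbed version ($p \in \mathbb{R}^m$)

\[ h(p) := \inf\{(c|x) : x \ge 0,\ Ax + p \le b\} \]

An easy computation gives

\[ \forall y \in \mathbb{R}^m, \quad h^*(y) = \begin{cases} (b|y) & \text{if } A^T y + c \ge 0,\ y \ge 0 \\ +\infty & \text{otherwise} \end{cases} \qquad [4] \]

Lemma 18 Assume that $\inf(P)$ is finite. Then:
(i) $h$ is convex proper and l.s.c. at 0.
(ii) $(P)$ has at least one solution.

Proof We introduce the $(m+1) \times (n+m)$ matrix $B$ defined by

\[ B := \begin{pmatrix} c^T & 0 \\ A & I_m \end{pmatrix} \]

($I_m$ is the $m$-dimensional identity matrix). Denote by $\{b_1, b_2, \ldots, b_{n+m}\} \subset \mathbb{R}^{m+1}$ the columns of $B$ and by $K$ the convex cone $K := \{\sum_{j=1}^{n+m} \lambda_j b_j : \lambda_j \ge 0\}$. By Farkas lemma, this cone $K$ is closed.

(i) Let $\beta := \liminf\{h(p) : p \to 0\}$. We have to prove that $\beta \ge h(0) = \inf(P)$. Let $\{p_\varepsilon\}$ be a sequence in $\mathbb{R}^m$ such that $p_\varepsilon \to 0$ and $h(p_\varepsilon) \to \beta$. By the definition of $h$, we may choose $x_\varepsilon \ge 0$ such that $Ax_\varepsilon + p_\varepsilon \le b$ and $(c|x_\varepsilon) \to \beta$. Then the column vector $\tilde{x}_\varepsilon$ associated with $(x_\varepsilon, b - p_\varepsilon - Ax_\varepsilon) \in \mathbb{R}^{n+m}$ has non-negative entries, so that $B\tilde{x}_\varepsilon \in K$ and

\[ B\tilde{x}_\varepsilon = \begin{pmatrix} (c|x_\varepsilon) \\ b - p_\varepsilon \end{pmatrix} \to \begin{pmatrix} \beta \\ b \end{pmatrix} \]

Therefore,

\[ \begin{pmatrix} \beta \\ b \end{pmatrix} \in K \]

and there exists $\tilde{x} = (x, x')$ such that $x \ge 0$, $x' \ge 0$, $(c|x) = \beta$ and $Ax + x' = b$. It follows that $x$ is admissible for $(P)$ and then $(c|x) = \beta \ge h(0)$.

(ii) We repeat the proof of (i) choosing $p_\varepsilon = 0$ so that $\beta = \inf(P)$. $\square$

Thanks to the assertion (i) in Lemma 18, we deduce from Theorem 10 that $\inf(P) = h(0) = h^{**}(0) = \sup(P^*)$, where the dual problem $(P^*)$ reads

\[ (P^*) \qquad \sup\{-b \cdot y : y \ge 0,\ A^T y + c \ge 0\} \]

Theorem 19 The following assertions are equivalent:
(i) $(P)$ has a solution.
(ii) $(P^*)$ has a solution.
(iii) There exists $(x_0, y_0) \in \mathbb{R}^n_+ \times \mathbb{R}^m_+$ such that $Ax_0 \le b$, $A^T y_0 + c \ge 0$.
In this case, we have $\min(P) = \max(P^*)$ and an admissible pair $(\bar{x}, \bar{y})$ is optimal if and only if $c \cdot \bar{x} = -b \cdot \bar{y}$ or, equivalently, satisfies the complementarity relations: $(A\bar{x} - b) \cdot \bar{y} = (A^T \bar{y} + c) \cdot \bar{x} = 0$.

Convex programming Let $f, g_1, \ldots, g_m : X \to \mathbb{R}$ be convex l.s.c. functions and consider the optimization problem

\[ (P) \qquad \inf\{f(x) : g_j(x) \le 0,\ j = 1, 2, \ldots, m\} \]

Here $X = \mathbb{R}^n$ or any Banach space. As before, we introduce the value function

\[ p \in \mathbb{R}^m, \quad h(p) := \inf\{f(x) : g_j(x) + p_j \le 0,\ j \in \{1, 2, \ldots, m\}\} \]

and compute its Fenchel conjugate:

\[ \forall \lambda \in \mathbb{R}^m, \quad h^*(\lambda) = \begin{cases} -\inf_{x \in X} \{L(x, \lambda)\} & \text{if } \lambda \ge 0 \\ +\infty & \text{otherwise} \end{cases} \]

where $L(x, \lambda) := f(x) + \sum_i \lambda_i g_i(x)$ is the so-called Lagrangian. We notice that $h$ is convex and that the equality $h(0) = h^{**}(0)$ is equivalent to the zero-duality gap relation

\[ \inf_x \sup_{\lambda \ge 0} L(x, \lambda) = \sup_{\lambda \ge 0} \inf_x L(x, \lambda) \]

This condition is fulfilled, in particular, if we make the following qualification assumption (ensuring that $h$ is continuous at 0 and Theorem 11 applies):

\[ \exists x_0 \in X : f \text{ continuous at } x_0,\ g_j(x_0) < 0,\ \forall j \qquad [5] \]

Theorem 20 Assume that [5] holds. Then $\bar{x}$ is optimal for $(P)$ if and only if there exist Lagrangian multipliers $\lambda_1, \lambda_2, \ldots, \lambda_m$ in $\mathbb{R}_+$ such that

\[ \bar{x} \in \operatorname{argmin}\Big(f + \sum_j \lambda_j g_j\Big), \qquad \lambda_j g_j(\bar{x}) = 0,\ \forall j \]

Notice that the existence of such a solution $\bar{x}$ is ensured if, for example, $X = \mathbb{R}^n$ and if, for some $k > 0$, the function $f + k \sum_j g_j$ is coercive.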
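The optimality criterion of Theorem 19 for the linear programming case can be checked by hand on a small instance. The sketch below is only an illustration: the data $A$, $b$, $c$ and the candidate pair `x_bar`, `y_bar` are hypothetical values chosen for this example (they do not come from the text), and the helper names are ours. It uses exact rational arithmetic to verify admissibility, the equality $c \cdot \bar{x} = -b \cdot \bar{y}$ and the two complementarity relations.

```python
from fractions import Fraction as Fr

def dot(u, v):
    """Euclidean scalar product (u|v)."""
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    """Matrix-vector product M v, with M stored as a list of rows."""
    return [dot(row, v) for row in M]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def check_lp_duality(A, b, c, x, y):
    """Collect the quantities entering Theorem 19 for a candidate pair (x, y)."""
    primal_slack = [r - bi for r, bi in zip(matvec(A, x), b)]           # Ax - b, admissible if <= 0
    dual_slack = [r + cj for r, cj in zip(matvec(transpose(A), y), c)]  # A^T y + c, admissible if >= 0
    return {
        "primal_feasible": all(xi >= 0 for xi in x) and all(s <= 0 for s in primal_slack),
        "dual_feasible": all(yi >= 0 for yi in y) and all(s >= 0 for s in dual_slack),
        "primal_value": dot(c, x),         # (c|x), objective of (P)
        "dual_value": -dot(b, y),          # -b.y, objective of (P*)
        "comp_primal": dot(primal_slack, y),  # (Ax - b).y
        "comp_dual": dot(dual_slack, x),      # (A^T y + c).x
    }

# Hypothetical instance (assumed data, for illustration only).
A = [[Fr(1), Fr(2)], [Fr(3), Fr(1)]]
b = [Fr(4), Fr(6)]
c = [Fr(-1), Fr(-1)]
x_bar = [Fr(8, 5), Fr(6, 5)]   # candidate primal solution
y_bar = [Fr(2, 5), Fr(1, 5)]   # candidate dual solution

report = check_lp_duality(A, b, c, x_bar, y_bar)
for name, value in report.items():
    print(name, value)
```

On this instance both objective values coincide and both complementarity products vanish, so the pair is optimal by Theorem 19. The check only certifies a given pair; it does not solve the linear program.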
648 Convex Analysis and Duality Methods

Primal–Dual Formulations in Mechanics

We present here the example of elasticity which motivated the pioneering work by J J Moreau on convex duality techniques. Further examples can be found in Ekeland and Temam (1976). An elastic body is placed in a bounded domain $\Omega \subset \mathbb{R}^n$ whose boundary $\Gamma$ consists of two disjoint parts $\Gamma = \Gamma_0 \cup \Gamma_1$. The unknown $u : \Omega \to \mathbb{R}^n$ (deformation) satisfies a Dirichlet condition $u = 0$ on $\Gamma_0$, where the body is clamped. The system is subjected to a surface load $g \in L^2(\Gamma_1; \mathbb{R}^n)$ and to a volumic load $f \in L^2(\Omega; \mathbb{R}^n)$. The static equilibrium problem has the following variational formulation:

\[ (P) \qquad \inf_{u = 0 \text{ on } \Gamma_0} \left\{ \int_\Omega j(x, e(u))\,dx - \int_\Omega f \cdot u\,dx - \int_{\Gamma_1} g \cdot u\,d\mathcal{H}^{n-1} \right\} \]

where $e(u) := \frac{1}{2}(u_{i,j} + u_{j,i})$ denotes the symmetric strain tensor and $j : (x, z) \in \Omega \times \mathbb{R}^{n^2}_{\mathrm{sym}} \to \mathbb{R}_+$ is a convex integrand representing the local elastic behavior of the material. We assume a quadratic growth as in Remark 15 (in the case of linear elasticity, an isotropic homogeneous material is characterized by the quadratic form

\[ j(x, z) = \frac{\lambda}{2} |\mathrm{tr}(z)|^2 + \mu |z|^2 \]

$\lambda, \mu$ being the Lamé constants).

We apply Proposition 14 with $X = W^{1,2}(\Omega; \mathbb{R}^n)$, $Y = L^2(\Omega; \mathbb{R}^{n^2}_{\mathrm{sym}})$, $Au = e(u)$ and where we set

\[ \Phi(u) = \begin{cases} -\int_\Omega f \cdot u\,dx - \int_{\Gamma_1} g \cdot u\,d\mathcal{H}^{n-1} & \text{if } u = 0 \text{ on } \Gamma_0 \\ +\infty & \text{otherwise} \end{cases} \qquad \Psi(v) = \int_\Omega j(x, v)\,dx \]

After some computations, we may write the supremum appearing in Proposition 14 as our dual problem

\[ (P^*) \qquad \sup\left\{ -\int_\Omega j^*(x, \sigma)\,dx : \sigma \in L^2(\Omega; \mathbb{R}^{n^2}_{\mathrm{sym}}),\ -\mathrm{div}\,\sigma = f \text{ on } \Omega,\ \sigma \cdot n = g \text{ on } \Gamma_1 \right\} \]

where $j^*$ is the Moreau–Fenchel conjugate with respect to the second argument and $n(x)$ denotes the exterior unit normal on $\Gamma$. The matrix-valued map $\sigma$ is called the stress tensor and $j^*$ the stress potential. Note that the boundary conditions for $\sigma \cdot n$ have to be understood in the sense of traces.

Theorem 21 The problems $(P)$ and $(P^*)$ have solutions and we have the equality: $\inf(P) = \sup(P^*)$. Furthermore, a pair $(\bar{u}, \bar{\sigma})$ is optimal if and only if it satisfies the following system:

\[ \begin{aligned} -\mathrm{div}\,\bar{\sigma} &= f && \text{on } \Omega && \text{(equilibrium)} \\ \bar{\sigma}(x) &\in \partial j(x, e(\bar{u})) && \text{a.e. on } \Omega && \text{(constitutive law)} \\ \bar{u} &= 0 && \text{a.e. on } \Gamma_0 \\ \bar{\sigma} \cdot n &= g && \text{on } \Gamma_1 \end{aligned} \]

Duality in Mass Transport Problems

General Cost Functions

Let $X, Y$ be compact metric spaces and $c : X \times Y \to [0, +\infty)$ a continuous cost function. We denote by $\mathcal{P}(X)$, $\mathcal{P}(X \times Y)$ the sets of probability measures on $X$ and $X \times Y$, respectively. Given two elements $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, we denote by $\Gamma(\mu, \nu)$ the subset of probability measures in $\mathcal{P}(X \times Y)$ whose marginals are, respectively, $\mu$ and $\nu$. Identified as a subset of $(C_0(X \times Y))^*$ (the space of signed Radon measures on $X \times Y$), it is convex and weakly-star compact. The Monge–Kantorovich formulation of the mass transport problem reads as follows:

\[ T_c(\mu, \nu) := \inf\left\{ \int_{X \times Y} c(x, y)\,\gamma(dx\,dy) : \gamma \in \Gamma(\mu, \nu) \right\} \qquad [6] \]

This formulation, where the infimum is achieved (as we minimize an l.s.c. functional on a compact set for the weak star topology), is already a relaxation of the initial Monge mass transport problem,

\[ \inf_T \left\{ \int_X c(x, Tx)\,\mu(dx) : T^{\#}(\mu) = \nu \right\} \]

where the infimum is searched among all transport maps $T : X \to Y$ pushing forward $\mu$ on $\nu$ (i.e., such that $\mu(T^{-1}(B)) = \nu(B)$ for all Borel subsets $B \subset Y$). This is equivalent to restricting the infimum in [6] to the subclass $\{\gamma_T\} \subset \Gamma(\mu, \nu)$, where

\[ \langle \gamma_T, \psi(x, y) \rangle := \int_X \psi(x, Tx)\,\mu(dx) \]

In order to find a dual problem for [6], we fix $\nu \in \mathcal{P}(Y)$ and consider the functional $F : \mathcal{M}_b(X) \to [0, +\infty]$ defined by

\[ F(\mu) = \begin{cases} T_c(\mu, \nu) & \text{if } \mu \ge 0,\ \mu(X) = 1 \\ +\infty & \text{otherwise} \end{cases} \]

($\mathcal{M}_b(X)$ denotes the Banach space of (bounded) signed Radon measures on $X$).

Lemma 22 $F$ is convex, weakly-star l.s.c. and proper. Its Moreau–Fenchel conjugate is given by

\[ \forall \varphi \in C_0(X), \quad F^*(\varphi) = -\int_Y \varphi^c(y)\,\nu(dy) \]
where

\[ \varphi^c(y) := \inf\{c(x, y) - \varphi(x) : x \in X\} \]

Proof The convexity property is obvious and the properness follows from the fact that

\[ F(\mu) \le \int_{X \times Y} c(x, y)\,\mu \otimes \nu(dx\,dy) \]

Let $\mu_n$ be such that $\mu_n \rightharpoonup^* \mu$ (weakly star). We may assume that $\liminf_n F(\mu_n) = \lim_n F(\mu_n) := \beta$ is finite. Then $\mu_n$ and the associated optimal $\gamma_n$ are probability measures on $X$ and on $X \times Y$, respectively. As $X$ and $Y$ are compact, possibly passing to a subsequence, we may assume that $\gamma_n \rightharpoonup^* \gamma$, and clearly we have $\gamma \in \Gamma(\mu, \nu)$. Since $c(x, y)$ is l.s.c. non-negative, we conclude that

\[ \liminf_n F(\mu_n) = \liminf_n \int_{X \times Y} c(x, y)\,\gamma_n(dx\,dy) \ge \int_{X \times Y} c(x, y)\,\gamma(dx\,dy) \ge F(\mu) \]

Let us compute now $F^*(\varphi)$. We have

\[ \begin{aligned} F^*(\varphi) &= -\inf\left\{ \int_{X \times Y} c(x, y)\,\gamma(dx\,dy) - \int_X \varphi\,d\mu : \mu \in \mathcal{P}(X),\ \gamma \in \Gamma(\mu, \nu) \right\} \\ &= -\inf\left\{ \int_{X \times Y} (c(x, y) - \varphi(x))\,\gamma(dx\,dy) : \gamma \in \Gamma(\mu, \nu) \right\} \\ &\le -\int_Y \varphi^c(y)\,\nu(dy) \end{aligned} \]

To prove that the last inequality is actually an equality, we observe that, for every $y \in Y$ and $\varphi \in C_0(X)$, the minimum of the l.s.c. function $c(\cdot, y) - \varphi$ is attained on the compact set $X$ and there exists a Borel selection map $S(y)$ such that $\varphi^c(y) = c(S(y), y) - \varphi(S(y))$ for all $y \in Y$. We obtain the desired equality by choosing $\gamma$ defined, for every test function $\psi$, by

\[ \int_{X \times Y} \psi(x, y)\,\gamma(dx\,dy) := \int_Y \psi(S(y), y)\,\nu(dy) \qquad \square \]

We observe that, for every $\varphi \in C_0(X)$, the function $\varphi^c$ introduced in Lemma 22 is continuous (use the uniform continuity of $c$) and therefore the pair $(\varphi, \varphi^c)$ belongs to the class

\[ \mathcal{F}_c := \{(\varphi, \psi) \in C_0(X) \times C_0(Y) : \varphi(x) + \psi(y) \le c(x, y)\} \]

Let us introduce the dual problem of [6]:

\[ \sup\left\{ \int_X \varphi\,d\mu + \int_Y \psi\,d\nu : (\varphi, \psi) \in \mathcal{F}_c \right\} \qquad [7] \]

We will say that $(\varphi, \psi) \in \mathcal{F}_c$ is a pair of c-concave conjugate functions if $\psi = \varphi^c$ and $\psi^c = \varphi$ (where symmetrically $\psi^c(x) := \inf\{c(x, y) - \psi(y) : y \in Y\}$). Checking the latter condition amounts to verifying that $\varphi$ enjoys the so-called c-concavity property $\varphi^{cc} = \varphi$ (in general, we have only $\varphi^{cc} \ge \varphi$, whereas $\varphi^{ccc} = \varphi^c$). We refer for instance to Villani (2003) for further details about this c-duality.

Now, by exploiting Theorem 10 and Lemma 22, we obtain a very simple proof of the Kantorovich duality theorem:

Theorem 23 The following duality formula holds:

\[ T_c(\mu, \nu) = \sup\left\{ \int_X \varphi\,d\mu + \int_Y \psi\,d\nu : (\varphi, \psi) \in \mathcal{F}_c \right\} \]

Moreover, the supremum in the right-hand side member is achieved by a pair $(\bar{\varphi}, \bar{\psi})$ of conjugate c-concave functions such that, for any optimal $\bar{\gamma}$ in [6], there holds $\bar{\varphi}(x) + \bar{\psi}(y) = c(x, y)$, $\bar{\gamma}$-a.e.

Proof By Theorem 10 and Lemma 22, we have

\[ \begin{aligned} T_c(\mu, \nu) = F^{**}(\mu) &= \sup\left\{ \int_X \varphi\,d\mu + \int_Y \varphi^c\,d\nu : \varphi \in C_0(X) \right\} \\ &\le \sup\left\{ \int_X \varphi\,d\mu + \int_Y \psi\,d\nu : (\varphi, \psi) \in \mathcal{F}_c \right\} \\ &\le T_c(\mu, \nu) \end{aligned} \]

where the last inequality follows from the definition of $\mathcal{F}_c$. Therefore, $\inf$ [6] $= \sup$ [7]. Furthermore, on the right-hand side of the first equality, we increase the supremum by substituting $\varphi$ with $\varphi^{cc}$ (recall that $\varphi^{ccc} = \varphi^c$). Thus,

\[ \sup\,[7] = \sup\left\{ \int_X \varphi\,d\mu + \int_Y \varphi^c\,d\nu : \varphi \in C_0(X),\ \varphi \text{ c-concave} \right\} \]

Take a maximizing sequence $(\varphi_n, \varphi_n^c)$ of c-concave conjugate functions. It is easy to check that $\{\varphi_n\}$ is equicontinuous on $X$: this follows from the c-concavity property and from the uniform continuity of $c$ (observe that $\varphi_n(x_1) - \varphi_n(x_2) = \varphi_n^{cc}(x_1) - \varphi_n^{cc}(x_2) \le \sup_Y \{c(x_1, \cdot) - c(x_2, \cdot)\}$). Then, by Ascoli's theorem, possibly passing to subsequences, we may assume that $\varphi_n - c_n$ converges uniformly to some continuous function $\bar{\varphi}$, where $\{c_n\}$ is a suitable sequence of reals. Then, one checks that $\bar{\varphi}$ is still c-concave and that $(\varphi_n - c_n)^c = \varphi_n^c + c_n$ converges uniformly to
$\bar{\varphi}^c$. Thus, recalling that $\mu(X) = \nu(Y) = 1$, we deduce that

\[ \begin{aligned} \sup\,[7] &= \lim_n \int_X \varphi_n\,d\mu + \int_Y \varphi_n^c\,d\nu \\ &= \lim_n \int_X (\varphi_n - c_n)\,d\mu + \int_Y (\varphi_n^c + c_n)\,d\nu \\ &= \int_X \bar{\varphi}\,d\mu + \int_Y \bar{\varphi}^c\,d\nu \end{aligned} \]

The last assertion is a consequence of the extremality relation:

\[ 0 = \inf\,[6] - \sup\,[7] = \int_{X \times Y} \big(c(x, y) - \bar{\varphi}(x) - \bar{\psi}(y)\big)\,\bar{\gamma}(dx\,dy) \qquad \square \]

Remark 24
(i) In their discrete version (i.e., $\mu, \nu$ are atomic measures), problems [6] and [7] can be seen as particular linear programming problems (see the section "Finite-dimensional case").
(ii) The case $X = Y \subset \mathbb{R}^n$ and $c(x, y) = \frac{1}{2}|x - y|^2$ is important. In this case, the notion of c-concavity is linked to convexity and the Fenchel transform since, for every $\varphi \in C_0(X)$, one has

\[ \frac{|\cdot|^2}{2} - \varphi^c = \left( \frac{|\cdot|^2}{2} - \varphi \right)^* \]

Then if $(\bar{\varphi}, \bar{\varphi}^c)$ is a solution of [7], we find that

\[ \varphi_0(x) := \frac{|x|^2}{2} - \bar{\varphi}(x) \]

is convex continuous and that the extremality condition $\bar{\varphi}(x) + \bar{\varphi}^c(y) = c(x, y)$ is equivalent to the Fenchel equality $\varphi_0(x) + \varphi_0^*(y) = (x|y)$. Therefore, any optimal $\bar{\gamma}$ is supported in the graph of the subdifferential map $\partial\varphi_0$. In the case where $\mu$ is absolutely continuous with respect to the Lebesgue measure, it is then easy to deduce that the optimal $\bar{\gamma}$ is unique and that $\bar{\gamma} = \gamma_{T_0}$, where $T_0 = \nabla\varphi_0$ is the unique gradient (a.e. defined) of a convex function such that $\nabla\varphi_0^{\#}(\mu) = \nu$. This is a celebrated result by Y Brenier (see, e.g., the monographs by Evans (1997) and Villani (2003)).

The Distance Case

In the following, we assume that $X = Y$ and that $c(x, y)$ is a semidistance. As an immediate consequence of the triangular inequality, we have the following equivalence:

\[ \varphi \text{ c-concave} \iff \varphi(x) - \varphi(y) \le c(x, y),\ \forall (x, y) \iff \varphi^c = -\varphi \]

Let us denote $\mathrm{Lip}_1(X) := \{u \in C_0(X) : u(x) - u(y) \le c(x, y)\}$. The first assertion of Theorem 23 becomes the Kantorovich–Rubinstein duality formula:

\[ T_c(\mu, \nu) = \max\left\{ \int_X u\,d(\mu - \nu) : u \in \mathrm{Lip}_1(X) \right\} \qquad [8] \]

As it appears, $T_c(\mu, \nu)$ depends only on the difference $f = \mu - \nu$, which belongs to the space $\mathcal{M}_0(X)$ of signed measures on $X$ with zero average. Defining $N(f) := T_c(f^+, f^-)$ provides a seminorm (Kantorovich norm) on $\mathcal{M}_0(X)$ (it turns out that $\mathcal{M}_0(X)$ is not complete and that in general its completion is a strict subspace of the dual of $\mathrm{Lip}(X)$).

We will now specialize to the case where $X$ is a compact manifold equipped with a geodesic distance. This will allow us to link the original problem to another primal–dual formulation closer to that considered in the section "Primal–dual formulation in mechanics" and leading to a connection with partial differential equations. As a model example, let us assume that $X = \bar{\Omega}$, where $\Omega$ is a bounded connected open subset of $\mathbb{R}^n$ with a Lipschitz boundary. Let $\Sigma \subset \bar{\Omega}$ be a compact subset (on which the transport will have zero cost) and define

\[ c(x, y) := \inf\{\mathcal{H}^1(S \setminus \Sigma) : S \text{ Lipschitz curve joining } x \text{ to } y,\ S \subset \bar{\Omega}\} \qquad [9] \]

where $\mathcal{H}^1$ denotes the one-dimensional Hausdorff measure (length). It is easy to check that

\[ c(x, y) = \min\{\delta_\Omega(x, y),\ \delta_\Omega(x, \Sigma) + \delta_\Omega(y, \Sigma)\} \]

where $\delta_\Omega(x, y)$ is the geodesic distance on $\Omega$ (induced by the Euclidean norm). Furthermore, the following characterization holds:

\[ u \in \mathrm{Lip}_1(X) \iff u \in W^{1,\infty}(\Omega),\ |\nabla u| \le 1 \text{ a.e. in } \Omega,\ u = \text{const on } \Sigma \qquad [10] \]

Since $f := \mu - \nu$ is balanced, the value of the constant on $\Sigma$ in [10] is irrelevant and can be set to 0. Thus we may rewrite the right-hand side member of [8] in an equivalent way as

\[ \max\left\{ \int_\Omega u\,df : u \in W^{1,\infty}(\Omega),\ |\nabla u| \le 1 \text{ a.e. on } \Omega,\ u = 0 \text{ on } \Sigma \right\} \qquad [11] \]

We will now derive a new dual problem for [11] by using Proposition 14. To this aim, we consider
$X = C^1(\bar{\Omega})$ (as a closed subspace of $W^{1,\infty}(\Omega)$), $Y = C_0(\bar{\Omega}; \mathbb{R}^n)$, $Y^* = \mathcal{M}_b(\bar{\Omega}; \mathbb{R}^n)$ and the operator $A : u \in X \mapsto \nabla u \in Y$.

Theorem 25 Let $\mu, \nu \in \mathcal{P}(\bar{\Omega})$, $f = \mu - \nu$ and $c$ defined by [9]. Then,

\[ T_c(\mu, \nu) = \min\left\{ \int_{\bar{\Omega}} |\lambda| : \lambda \in \mathcal{M}_b(\bar{\Omega}; \mathbb{R}^n),\ -\mathrm{div}\,\lambda = f \text{ on } \bar{\Omega} \setminus \Sigma \right\} \qquad [12] \]

where the divergence condition is intended in the sense that

\[ \int_{\bar{\Omega}} \lambda \cdot \nabla\varphi = \int_{\bar{\Omega}} \varphi\,df \]

for all $\varphi \in C^\infty$ compactly supported in $\mathbb{R}^n \setminus \Sigma$.

Proof (sketch) We apply Proposition 14 with $\Phi(u) = -\int_{\bar{\Omega}} u\,df$ if $u = 0$ on $\Sigma$ ($+\infty$ otherwise), $A = \nabla$, and $\Psi(v) = 0$ if $|v| \le 1$ on $\bar{\Omega}$ ($+\infty$ otherwise). We obtain that the minimum $\beta$ in [12] is reached and that $\beta = -\alpha$, where

\[ \alpha := \inf\left\{ -\int_{\bar{\Omega}} u\,df : u \in C^1(\bar{\Omega}),\ |\nabla u| \le 1 \text{ on } \bar{\Omega},\ u = 0 \text{ on } \Sigma \right\} \]

To prove that $\beta = T_c(\mu, \nu) = \sup$ [11], we consider a maximizer $\bar{u}$ in [11] and prove that it can be approximated uniformly by a sequence $\{u_n\}$ of functions in $C^1(\bar{\Omega})$ which satisfy the same constraints. This technical part is done by truncation and convolution arguments (we refer to Bouchitté et al. (2003) for details). $\square$

Remark 26 By localizing the integral identity associated with [12], it is possible to deduce the optimality conditions which characterize optimal pairs $(\bar{u}, \bar{\lambda})$ for [11], [12] (without requiring any regularity). This is done by using a weak notion of tangential gradient with respect to a measure (see Bouchitté et al. (1997) and Bouchitté and Fragalà (2002)). If $\bar{\lambda} = \sigma\,dx$ where $\sigma \in L^1(\Omega; \mathbb{R}^n)$ and if $\Sigma \subset \partial\Omega$, then we find that $\sigma = a\nabla\bar{u}$, where the pair $(\bar{u}, a)$ solves the following system:

\[ \begin{aligned} -\mathrm{div}(a\nabla\bar{u}) &= f && \text{on } \Omega && \text{(diffusion equation)} \\ |\nabla\bar{u}| &= 1 && \text{a.e. on } \{a > 0\} && \text{(eikonal equation)} \\ \bar{u} &= 0 && \text{a.e. on } \Sigma \\ \frac{\partial\bar{u}}{\partial n} &= 0 && \text{on } \partial\Omega \setminus \Sigma \end{aligned} \]

Remark 27 Given a solution $\bar{\gamma}$ for [6], we can construct a solution $\bar{\lambda}$ for [12] by selecting for every $(x, y) \in \mathrm{spt}(\bar{\gamma})$ a geodesic curve $S_{xy}$ joining $x$ and $y$ (possibly passing through the free-cost zone $\Sigma$) and by setting, for every test function $\varphi$:

\[ \langle \bar{\lambda}, \varphi \rangle := \int_{\bar{\Omega} \times \bar{\Omega}} \left( \int_{S_{xy}} \varphi \cdot \tau_{S_{xy}}\,d\mathcal{H}^1 \right) \bar{\gamma}(dx\,dy) \]

where $\tau_{S_{xy}}$ denotes the unit oriented tangent vector (see Bouchitté and Buttazzo (2001)). It is also possible to show (see Ambrosio (2003)) that any solution $\bar{\lambda}$ can be represented as before through a particular solution $\bar{\gamma}$. As a consequence, any solution $\bar{\lambda}$ of [12] is supported in the geodesic envelope of the set $\mathrm{spt}(\mu) \cup \mathrm{spt}(\nu) \cup \Sigma$. However, we stress the fact that, in general, there is no uniqueness at all of the optimal triple $(\bar{\gamma}, \bar{u}, \bar{\lambda})$ for [6], [11] and [12].

Remark 28 An approximation procedure for particular solutions of problems [11], [12] can be obtained by solving a p-Laplace equation and then by sending $p$ to infinity. Precisely, consider the solution $u_p \in W^{1,p}(\Omega)$ of

\[ -\mathrm{div}(|\nabla u|^{p-2}\nabla u) = f \text{ on } \bar{\Omega} \setminus \Sigma, \qquad u = 0 \text{ on } \Sigma \]

which, for $p > n$, exists (due to the compact embedding $W^{1,p}(\Omega) \subset C_0(\bar{\Omega})$) and is unique. In Bouchitté et al. (2003) it is proved that the sequence $\{(u_p, \sigma_p)\}$, where $\sigma_p = |\nabla u_p|^{p-2}\nabla u_p$, is relatively compact in $C_0(\bar{\Omega}) \times \mathcal{M}_b(\bar{\Omega}; \mathbb{R}^n)$ (weakly star with respect to the second component) and that every cluster point $(\bar{u}, \bar{\lambda})$ solves [11], [12]. It is an open problem to know whether or not such a cluster point is unique. If the answer is "yes," the process described above would select one optimal pair among all possible solutions. As far as problem [11] is concerned, this problem is connected with the theory of viscosity solutions for the infinite Laplacian (see Evans (1997)) although this theory does not provide an answer as it erases the role of the source term $f$. On the other hand, a new entropy selection principle should be found for the solutions of dual problem [12]. In fact, the following partial result holds: let $E : \mathcal{M}_b(\bar{\Omega}; \mathbb{R}^n) \to \mathbb{R} \cup \{+\infty\}$ be the functional defined by

\[ E(\lambda) := \begin{cases} \int_\Omega |\sigma| \log(|\sigma|)\,dx & \text{if } \lambda \ll dx \text{ and } \sigma = \dfrac{d\lambda}{dx} \\ +\infty & \text{otherwise} \end{cases} \]

Assume that [12] admits at least one solution $\lambda_0$ such that $E(\lambda_0) < +\infty$. Then it can be shown that
the sequence $\{\sigma_p\}$ does converge weakly-star to $\bar{\lambda}$, the unique minimizer of the problem

\[ \inf\{E(\lambda) : \lambda \text{ solution of } [12]\} \]

The general case, in particular when all optimal measures are singular, is open.

Remark 29 Variational problems [11], [12] have important counterparts in the theory of elasticity and in optimal design problems (see Bouchitté and Buttazzo (2001)). They read, respectively, as

\[ \max\left\{ \int_{\bar{\Omega}} u \cdot df : u \in \bigcap_{p>1} W^{1,p}(\Omega; \mathbb{R}^n),\ \nabla u(x) \in K \text{ a.e. on } \Omega,\ u = 0 \text{ on } \Sigma \right\} \]

\[ \min\left\{ \int_{\bar{\Omega}} \rho^0_K(\lambda) : \lambda \in \mathcal{M}_b(\bar{\Omega}; \mathbb{R}^{n^2}_{\mathrm{sym}}),\ -\mathrm{div}\,\lambda = f \text{ on } \bar{\Omega} \setminus \Sigma \right\} \]

where $K \subset \mathbb{R}^{n^2}_{\mathrm{sym}}$ is a convex compact subset of symmetric second-order tensors associated with the elastic material, $\rho^0_K(\xi) = \sup\{\xi \cdot z : z \in K\}$ is convex positively 1-homogeneous and the functional on measures $\int \rho^0_K(\lambda)$ is intended in the sense given in [1]. A celebrated example is given by Michell's problem (Michell 1904) where $n = 2$ and $K := \{z \in \mathbb{R}^{n^2}_{\mathrm{sym}} : |\rho(z)| \le 1\}$, $\rho(z)$ being the largest singular value of $z$. The potential $\rho^0_K$ is given by the nondifferentiable convex function $\rho^0_K(\xi) = \lambda_1(\xi) + \lambda_2(\xi)$, where the $\lambda_i(\xi)$'s are the singular values of $\xi$.

Unfortunately, it is not known if the vector variational problem above can be linked to an optimal transportation problem of the type [6], even if the analogue of equivalence [10] does exist in Michell's case, namely (for $\Omega$ convex):

\[ \rho(e(u)) \le 1 \text{ on } \Omega \iff |(u(x) - u(y) \,|\, x - y)| \le |x - y|^2,\ \forall (x, y) \]

Further Reading

Alberti G, Bouchitté G, and Dal Maso G (2003) The calibration method for the Mumford–Shah functional and free-discontinuity problems. Calculus of Variations and Partial Differential Equations 16(3): 299–333.
Ambrosio L (2003) Lecture notes on optimal transport problems. In: Mathematical Aspects of Evolving Interfaces (Funchal 2000), Lecture Notes in Mathematics, vol. 1812, pp. 1–52. Berlin: Springer.
Borwein M and Lewis SA (2000) Convex Analysis and Nonlinear Optimization. Theory and Examples, CMS Series. Berlin: Springer.
Bouchitté G and Buttazzo G (2001) Characterization of optimal shapes and masses through Monge–Kantorovich equations. Journal of the European Mathematical Society 3: 139–168.
Bouchitté G, Buttazzo G, and De Pascale L (2003) A p-Laplacian approximation for some mass optimization problems. Journal of Optimization Theory and Applications 118: 1–25.
Bouchitté G, Buttazzo G, and Seppecher P (1997) Energies with respect to a measure and applications to low dimensional structures. Calculus of Variations and Partial Differential Equations 5: 37–54.
Bouchitté G and Dal Maso G (1993) Integral representation and relaxation of convex local functionals on $BV(\Omega)$. Annali della Scuola Superiore di Pisa 20(4): 483–533.
Bouchitté G and Fragalà I (2002) Variational theory of weak geometric structures: the measure method and its applications. Variational Methods for Discontinuous Structures, Ser. PNLDE, vol. 51, pp. 19–40. Basel: Birkhäuser.
Bouchitté G and Fragalà I (2003) Second order energies on thin structures: variational theory and non-local effects. Journal of Functional Analysis 204(1): 228–267.
Bouchitté G and Valadier M (1988) Integral representation of convex functionals on a space of measures. Journal of Functional Analysis 80: 398–420.
Ekeland I and Temam R (1976) Analyse convexe et problèmes variationnels. Paris: Dunod-Gauthier Villars.
Evans LC (1997) Partial differential equations and Monge–Kantorovich mass transfer. In: Bott R, Jaffe A, Jerison D, Lutsztig G, Singer I, and Yau JT (eds.) Current Developments in Mathematics, pp. 65–126. Cambridge.
Michell AGM (1904) The limits of economy of material in frame structures. Philosophical Magazine and Journal of Science 6: 589–597.
Rockafellar RT (1970) Convex Analysis. Princeton: Princeton University Press.
Villani C (2003) Topics in Optimal transportation, Graduate studies in Mathematics, vol. 58. Providence, RI: AMS.

Cosmic Censorship see Spacetime Topology, Causal Structure and Singularities



Cosmology: Mathematical Aspects

G F R Ellis, University of Cape Town, Cape Town, South Africa
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Mathematical cosmology focuses on the geometrical and mathematical aspects of the study of the universe as a whole. Because the structure of spacetime (with metric tensor $g_{ab}(x^j)$) is governed by gravity, with matter and energy causing spacetime curvature according to the nonlinear gravitational field equations of the theory of general relativity, it has its roots in differential geometry. It is to be distinguished from the three other major aspects of modern cosmology, namely astrophysical cosmology, high-energy physics cosmology, and observational cosmology; see Peacock (1999) for these aspects.

The Einstein field equations (EFEs) are

\[ R_{ab} - \tfrac{1}{2} R g_{ab} + \Lambda g_{ab} = \kappa T_{ab} \qquad [1] \]

where $R_{ab}$ is the Ricci tensor, $R$ the Ricci scalar, $T_{ab}$ the matter tensor, $\Lambda$ the cosmological constant, and $\kappa$ the gravitational constant. Cosmological models differ from generic solutions of these equations in that they have preferred world lines in spacetime associated with the motion of matter and distribution of radiation (Ellis 1971). This is a classic case of a broken symmetry: the underlying equations [1] are locally Lorentz invariant but their solutions are not. These preferred world lines, characterized by a unit 4-velocity vector $u^a$, are associated at late times with "fundamental observers," and a key aspect of cosmological modeling is determining the observational relations such observers would determine through astronomical observations.

The dynamics of cosmological models is determined by their matter content. This is usually represented in simplified form, often using the "perfect-fluid" approximation to represent the effect of matter or radiation; that is,

\[ T_{ab} = (\rho + p) u_a u_b + p g_{ab} \qquad [2] \]

where $\rho$ is the energy density and $p$ the pressure, and the matter 4-velocity $u^b$ is the preferred cosmological 4-velocity. This description can include a scalar field $\phi$ with dynamics governed by the Klein–Gordon equation, provided $u^a$ is normal to spacelike surfaces $\{\phi = \text{const}\}$. Suitable equations of state describe the nature of the matter envisaged (e.g., $p = 0$ for baryons, whereas $p = \rho/3$ for radiation); in the case of a scalar field with potential $V(\phi)$ and spacelike surfaces $\{\phi = \text{const.}\}$, on choosing $u^a$ orthogonal to these surfaces, the stress tensor has a perfect-fluid form with $\rho = \tfrac{1}{2}\dot{\phi}^2 + V(\phi)$, $p = \tfrac{1}{2}\dot{\phi}^2 - V(\phi)$. A cosmological constant $\Lambda$ can be represented as a perfect fluid with $\rho + p = 0$, $\Lambda = -\kappa p$. More general matter may involve a momentum flux density $q^a$ and anisotropic pressures $\pi_{ab}$ (Ehlers 1961). Whatever the nature of the matter, it will usually be required to satisfy energy conditions (Hawking and Ellis 1973). All realistic matter has a positive inertial mass density:

\[ \rho + p > 0 \qquad [3] \]

(note that realistic cosmological models are non-empty), whereas all ordinary matter has a positive gravitational mass density:

\[ \rho + 3p > 0 \qquad [4] \]

but this is not necessarily true for a scalar field or effective cosmological constant.

Mathematical cosmology (Ellis and van Elst 1999) studies (1) generic properties of solutions with a preferred 4-velocity field and matter content as indicated above, (2) the standard FLRW models, (3) approximate FLRW solutions, and (4) other exact and approximate cosmological solutions. The ultimate underlying issue is (5) the origin of the universe. We look at these in turn. We aim to use covariant methods as far as possible, to avoid being misled by coordinate effects, and to obtain exact solutions and exact results as far as possible, because approximate methods can be misleading in the case of these nonlinear field equations.

Exact Properties

We can split the equations into spacelike and timelike parts relative to the 4-velocity $u^a$, obtaining the (1 + 3) covariant dynamical equations and identities in terms of the fluid shear $\sigma_{ab}$, vorticity $\omega_{ab}$, expansion $\Theta = u^a{}_{;a}$, and acceleration $a_a = u_{a;b} u^b$ (Ehlers 1961, Ellis 1971, Ellis and van Elst 1999). The energy density of a perfect fluid obeys the conservation equation

\[ \dot{\rho} = -3(\rho + p) \frac{\dot{S}}{S} \qquad [5] \]

with extra terms occurring in the case of more complex matter. From the momentum equations, pressure-free solutions are geodesic ($a^b = 0$). The crucial Raychaudhuri–Ehlers equation for the
time derivative of the expansion (Ehlers 1961) can be written as

\[ 3\frac{\ddot{S}}{S} = 2(\omega^2 - \sigma^2) + a^b{}_{;b} - \frac{\kappa}{2}(\rho + 3p) + \Lambda \qquad [6] \]

where the representative length scale $S$ is defined by $\Theta = 3\dot{S}/S$. This is the basis of the "fundamental singularity theorem": if in an expanding universe $\omega = 0 = a^b$ and the combined matter present satisfies [4], with $\Lambda \le 0$, then there was a singularity where $S \to 0$ a finite time $t_0 < 1/H_0$ ago, $H_0 = (\dot{S}/S)_0$ being the present value of the Hubble constant. The energy density will diverge there, so this is a spacetime singularity: an origin of physics, matter, and spacetime itself. However, the deduction does not follow if there is rotation or acceleration, which could conceivably avoid the singularity, so this result is by itself inconclusive for realistic cosmologies.

The vorticity obeys conservation laws analogous to those in Newtonian theory (Ehlers 1961). Vorticity-free solutions ($\omega = 0$) occur whenever the fluid flow lines are hypersurface-orthogonal in spacetime, that is, there exists a cosmic time function for the comoving observers, which will measure proper time along the flow lines if additionally the fluid flow is geodesic. The rate of change of shear is related to the conformal curvature (Weyl) tensor, which represents the free gravitational field, and which splits into an electric part $E_{ab}$ and a magnetic part $H_{ab}$ in close analogy with electromagnetic theory. Shear-free solutions ($\sigma = 0$) are very special because they strongly constrain the Weyl tensor; indeed if the flow is shear free and geodesic, then it either does not expand ($\Theta = 0$), or does not rotate ($\omega = 0$) (Ellis 1967). The set of cosmological observations associated with generic cosmological models has been characterized in power series form by Kristian and Sachs (1966), and that result has been extended to general models by Ellis et al. (1985).

The local regularity of the theory is expressed in existence and uniqueness theorems for the EFEs, provided the matter behavior is well defined through prescription of suitable equations of state (Hawking and Ellis 1973). However, in general the theory breaks down in the large, and this feature is specified by the Hawking–Penrose singularity theorems, predicting the existence of a geodesic incompleteness of spacetime under conditions applicable to realistic cosmological models satisfying the energy conditions given by eqns [3] and [4] (Hawking and Ellis 1973, Tipler et al. 1980). However, the conclusion does not follow if the energy conditions are not satisfied. Furthermore, the deduction follows only if the gravitational field equations remain valid to arbitrarily early times; but we would in fact expect that, at high enough energy densities, quantum gravity would take over from classical gravity, so whether or not there was indeed a singularity would depend on the nature of the as yet unknown theory of quantum gravity. The cash value of the singularity theorems then is the implication that, when the energy conditions are satisfied, one would indeed be involved in such a quantum gravity realm in the very early universe.

The Standard Friedmann–Lemaître Models

The standard models of cosmology are the Friedmann–Lemaître (FL) models with Robertson–Walker (RW) geometry: that is, they are exactly spatially homogeneous and locally isotropic, invariant under a $G_6$ of isometries (Robertson 1933, Ehlers 1961). They have a unique cosmic time function $t$, with space sections $\{t = \text{const.}\}$ of constant spatial curvature orthogonal to the uniquely preferred 4-velocity $u^a$. The fluid acceleration, vorticity, and shear all vanish, and all physical quantities depend only on the time coordinate $t$. They can be represented by a metric with scale factor $S(t)$:

\[ ds^2 \equiv g_{ab}\,dx^a dx^b = -dt^2 + S^2(t)\{dr^2 + f^2(r)(d\theta^2 + \sin^2\theta\,d\phi^2)\} \qquad [7] \]

in comoving coordinates $(x^a) = (t, r, \theta, \phi)$, where $f(r) = \{\sin r, r, \sinh r\}$ if $\{k = +1, 0, -1\}$, and the matter is a perfect fluid with 4-velocity vector $u^a = dx^a/ds = \delta^a_0$. The curvature of the space sections $\{t = \text{const.}\}$ is $K = k/S^2$; these 3-spaces are necessarily closed (compact) if they are positively curved ($k = +1$), but may be open or closed in the flat ($k = 0$) and negatively curved ($k = -1$) cases, depending on their topology (Lachieze-Rey and Luminet 1995).

Matter obeys the conservation equation [5], whose outcome depends on the equation of state; for baryons $\rho = M/S^3$, whereas for radiation $\rho = M/S^4$, where $M$ is a constant. The dynamics of the models is governed by the Raychaudhuri equation

\[ 3\frac{\ddot{S}}{S} = -\frac{\kappa}{2}(\rho + 3p) + \Lambda \qquad [8] \]

which has the Friedmann equation

\[ \frac{3\dot{S}^2}{S^2} = \kappa\rho + \Lambda - \frac{3k}{S^2} \qquad [9] \]

as a first integral whenever $\dot{S} \ne 0$. Depending on the matter components present, one can qualitatively
Cosmology: Mathematical Aspects 655

characterize the dynamical behavior of these models nature of which is most clear when represented in
(Robertson 1933) and find exact and approximate conformal diagrams (Hawking and Ellis 1973, Tipler
solutions to these equations as well as phase planes et al. 1980). These result from the fact that light
representing the relation of the different models to can only proceed a finite distance in the finite time
each other; for example, Ehlers and Rindler (1989) since the origin of the universe, and imply that for
give the phase planes for models with noninteracting a standard radiation-dominated hot-big-bang early
matter and radiation and an arbitrary cosmological universe, regions of larger than 1
angular size on
constant. Universes with maxima or minima in S(t) the surface of last scattering, which emits the CBR,
can only occur if k = þ1; when  = 0, the universe are causally disconnected: hence, no causal process
recollapses in the future iff k = þ1. Static solutions since the start of the universe can account for the
are possible only if k = þ1 and (assuming [4]) extreme isotropy of the CBR (T=T ’ 105 over
 > 0. The simplest expanding solutions are the the whole sky, once a dipole anisotropy T=T ’
Einstein–de Sitter universes with k = 0 = . 103 due to our local velocity relative to the
Equation [8] is a special case of [6], with cosmological rest frame is allowed for). This is the
corresponding implications: if the combined matter ‘‘horizon problem,’’ one of the driving forces
present satisfies [4], with   0, then there must have behind the theory of ‘‘inflation’’ (Guth 1981): the
been an initial singularity, or at least the universe idea that, in the very early universe, a slow-rolling
must have emerged from a quantum gravity domain. scalar field led to a brief exponential expansion
The temperature would have been arbitrarily high in through at least 50 e-folds (during which time the
the past, so there was a hot big bang era in the early spacetime was approximately de Sitter), thus
universe where matter and radiation were in equilibrium with each other at very high temperatures that rapidly fell as the universe expanded. Many physical processes took place then; in particular, nucleosynthesis of light elements took place at $10^9$ K. Decoupling of matter and radiation took place at a temperature of $\simeq 4000$ K, followed by formation of stars and galaxies (see Peacock (1999) for a discussion of these physical processes). The black-body radiation emitted by the surface of last scattering at 4000 K is observed by us today as cosmic black-body radiation (CBR) at a temperature of 2.75 K.

One can determine observational relations for these models such as the magnitude–redshift relation for ‘‘standard candles’’ at recent times from the EFEs (Sandage 1961). The aim of observations is to determine the Hubble constant $H_0$, dimensionless deceleration parameter $q_0 = -(3/H_0^2)(\ddot{S}/S)_0$, and normalized density parameters $\Omega_{0i} = \kappa\rho_{0i}/3H_0^2$ for each component of matter present. The spatial curvature and the cosmological constant then follow from [6] and [9]; also the present scale factor $S_0$ is determined if $k \neq 0$. The universe is of positive spatial curvature ($k = +1$) iff $\Omega_0 \equiv \Omega_m + \Omega_\Lambda > 1$, where $\Omega_m \equiv \sum_i \Omega_{0i}$, $\Omega_\Lambda = \Lambda/3H_0^2$. Current observations indicate $\Omega_m \simeq 0.3$, $\Omega_\Lambda \simeq 0.7$, $\Omega_0 \simeq 1.02 \pm 0.02$. Because the nucleosynthesis results limit the baryon density to a very low value ($\Omega_{0b} \simeq 0.02$), which is about the same as the density of luminous matter, this indicates the dominant presence of both nonbaryonic dark matter and a repulsive force corresponding to either a cosmological constant or varying scalar field (dark energy).

Crucial causal limitations occur because of the existence of particle horizons (Rindler 1956), the

smoothing the universe and solving the horizon problem (Guth 1981, Peacock 1999). This is possible because a scalar field can violate the energy condition [3] and so allows acceleration: $\ddot{S} > 0$. Consequently, there are now many studies of the dynamics of FLRW solutions driven by scalar fields and the subsequent decay of these scalar fields into radiation. One interesting point is that one can obtain exact solutions of this kind for arbitrarily chosen evolutions $S(t)$, provided they satisfy a restriction on the magnitude of $\dot{S}^2$, by running the field equations backwards to determine the needed potential $V(\phi)$ (Ellis and Madsen 1991). The inflationary paradigm is dominant in present-day theoretical cosmology, but suffers from the problem that it is not in fact a well-defined theory, for there is no single accepted proposal for the physical nature of the effective scalar field underlying the supposed exponential expansion; rather there are numerous competing proposals. As the inflaton has not yet been identified, this theory is not yet soundly linked to well-established physics.

Approximate FL Solutions

The real universe is, of course, not exactly FL, and studies of structure formation depend on studies of solutions that are approximately FL models – they are realistic (‘‘lumpy’’) universe models. These enable detailed studies of observable properties such as CBR anisotropies and gravitational lensing induced by matter inhomogeneities, and of the development of those inhomogeneities from quantum fluctuations in the very early universe that then get expanded to very large scales by inflation.
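
As a rough numerical illustration of the density-parameter relations quoted earlier in this article, the following sketch assumes the standard Friedmann constraint $3H_0^2 = \kappa\rho_0 + \Lambda - 3k/S_0^2$, which gives $\Omega_0 - 1 = k/(H_0 S_0)^2$. Identifying this constraint with the equation the text cites as [6] is an assumption, and the function names, the tolerance parameter, and the choice of units with c = 1 are illustrative only.

```python
import math


def curvature_sign(omega_m, omega_lambda, tol=0.0):
    """Return k in {+1, 0, -1} from Omega_0 = Omega_m + Omega_Lambda.

    tol is a hypothetical observational uncertainty: if |Omega_0 - 1| <= tol
    the sign of the curvature is treated as undetermined and 0 is returned.
    """
    omega_0 = omega_m + omega_lambda
    if abs(omega_0 - 1.0) <= tol:
        return 0
    return 1 if omega_0 > 1.0 else -1


def present_scale_factor(hubble_0, omega_0):
    """S_0 = 1 / (H_0 * sqrt(|Omega_0 - 1|)), defined only when k != 0 (c = 1)."""
    if omega_0 == 1.0:
        raise ValueError("S_0 is not fixed by H_0 and Omega_0 when k = 0")
    return 1.0 / (hubble_0 * math.sqrt(abs(omega_0 - 1.0)))


if __name__ == "__main__":
    # Illustrative inputs only; they are not a fit to any data set.
    print(curvature_sign(0.32, 0.70))            # Omega_0 slightly above 1
    print(curvature_sign(0.32, 0.70, tol=0.03))  # sign hidden by the tolerance
    print(present_scale_factor(1.0, 1.02))       # in units of 1/H_0
```

The tolerance argument reflects the point made in the text that a measured value such as $1.02 \pm 0.02$ does not by itself settle the sign of $k$.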

The key problem here is that apart from the standard coordinate freedom allowed in general relativity, there is a serious gauge issue: the background FL model is not uniquely determined by the realistic universe model; however, the magnitudes of many perturbed quantities depend on how it is fitted into the lumpy model. For example, the density perturbation $\delta\rho$ is determined pointwise by the equation

$$\delta\rho(x^i) \equiv \rho(x^i) - \bar{\rho}(x^i)$$

where $\bar{\rho}(x^i)$ is the background density. But by altering the correspondence between the background and realistic models (specifically, by the choice of surfaces $\bar{\rho}(x^i) = \text{const.}$ in the realistic model) one can assign that quantity any value, including zero (if one chooses $\bar{\rho}(x^i) = \rho(x^i)$). This is the ‘‘gauge problem.’’

One can handle it by using standard variables and keeping close track of the gauge freedom at all times. However, one then ends up with higher-order equations than necessary because some of the perturbation modes present are pure gauge modes with no physical significance. Alternatively, one can fix the gauge by some unique specification of how the background model is fitted into the realistic model, but there is no agreement on a unique way to do this, and different choices give different answers. The preferable resolution is to use gauge-invariant variables, either coordinate based (Bardeen 1980) or covariant, based on the (1+3) covariant decomposition of spacetime quantities mentioned above (Ellis and Bruni 1989), in either case resulting in perturbation equations without gauge freedom and of order corresponding to the physical degrees of freedom. The key point in the latter approach is to choose covariant variables that vanish in the background spacetime; they are then automatically gauge invariant. Realistic structure formation studies carry out this process for a mixture of matter components with different average velocities, and extend to a kinetic theory description of the background radiation (see Ellis and van Elst (1999) and references therein). The outcome is a prediction of the CBR anisotropy power spectrum, determined by the inhomogeneities in the gravitational field and the motions of the matter components at decoupling (Sachs and Wolfe 1967). This spectrum can then be compared with observations and used in determining the values of the cosmological parameters mentioned above (see Peacock 1999).

One crucial issue is why it is reasonable to use a perturbed FL model for the observable region of the universe. The key argument is that this is plausible because of the high isotropy of all observations around us when averaged on a sufficiently large spatial scale, and particularly the very low anisotropy of the CBR. The Ehlers–Geren–Sachs (EGS) theorem (Ehlers et al. 1968) provides a sound basis for this argument: it shows that if freely propagating CBR (obeying the Liouville equation) is exactly isotropic in an expanding universe domain U, then the universe is exactly FL in that domain (i.e., it has exactly the RW spatially homogeneous and isotropic geometry there), the point being that any inhomogeneities in the matter distribution between us and the surface of last scattering will produce anisotropies in the CBR temperature we measure. But that result does not apply to the real universe, because the CBR is not exactly isotropic. The ‘‘almost EGS’’ theorem (Stoeger et al. 1995) shows that this result is stable: almost isotropic CBR in the domain U implies that the universe is almost-FL in that domain. The application to the real universe comes by making a weak Copernican assumption: ‘‘we assume we are not special, so all observers in U (taken to be the visible part of the universe) will also see almost isotropic CBR, just as we do.’’ The result then follows. A further argument for homogeneity of the universe comes from postulating ‘‘uniform thermal histories’’ (Bonnor and Ellis 1986), but that argument is yet to be completed and applied in a practical way.

Anisotropic and Inhomogeneous Models

The FL universes are geometrically extremely special. We wish further to understand the full range of possible universe models, their dynamical behaviors, and which of them might, at some epoch, realistically represent the real universe. This enables us to see how the approximate FL models fit into this wider set of possibilities, and under what circumstances they are attractors in this set of cosmologies.

Exact solutions are characterized by their spacetime symmetries. Symmetries are characterized by the dimension $t$ of the surfaces of homogeneity and the dimension $q$ of the isotropy group at a general point, together giving the dimension $r = t + q$ of the group of isometries $G_r$ (at special points, such as a center of symmetry, $t$ can decrease and $q$ increase but always so that $r$ stays unchanged). In the case of a cosmological model, because the 4-velocity $u^a$ is invariant under isotropies, the only possible dimensions for the isotropy group are $q = 3, 1, 0$; whereas the dimension $t$ of the surfaces of homogeneity can take any value from 4 to 0. This gives the basis for a classification of cosmological spacetimes (Ellis 1967, Ellis and van Elst 1999).

When $q = 3$, we have isotropic solutions – there are no preferred spatial directions – and it is then a theorem that they must be spatially homogeneous FL universes (Ehlers 1961). When $q = 1$, we

have locally rotationally symmetric (LRS) solutions, with precisely one preferred spacelike direction at a generic point (Ellis 1967). When $q = 0$, the solutions are anisotropic in that there can be no continuous group of rotations leaving the solution invariant; however, there can be discrete isotropies in some special cases.

When $t = 4$, we have spacetime homogeneous solutions, with all physical quantities constant; they cannot expand (by [5] and [3]). Nevertheless, two cases are of interest. For $q = 1$ ($r = 5$) we find the Gödel universe, rotating everywhere with constant vorticity, which illustrates important causal anomalies (Gödel 1949, Hawking and Ellis 1973). For $q = 3$ ($r = 7$), we find the Einstein ‘‘static universe’’ (Einstein 1917), the unique nonexpanding FL model with $k = +1$ and $\Lambda > 0$. It is of interest because it could possibly represent the asymptotic initial state of nonsingular inflationary universe models (Ellis et al. 2003). The higher-symmetry models (de Sitter and anti-de Sitter universes with higher-dimensional isotropy groups) are not included here because they do not obey the energy condition [3] – they are empty universes, which can be interesting asymptotic states but are not by themselves good cosmological models.

When $t = 3$, we have spatially homogeneous evolving universe models. For $q = 0$ ($r = 3$), there is a large family of Bianchi universes, spatially homogeneous but anisotropic, classified into nine types according to the structure constants of the Lie algebra of the three-dimensional symmetry group $G_3$. These can be ‘‘orthogonal’’: the fluid flow is orthogonal to the surfaces of homogeneity, or ‘‘tilted’’; the latter case can have fluid rotation or acceleration, but the former cannot. They exhibit a large variety of behaviors, including power-law, oscillatory, and nonscalar singularities (Tipler et al. 1980). A vexed question is whether truly chaotic behavior occurs in Bianchi IX models. The behavior of large families of these models has been characterized in dynamical systems terms (Wainwright and Ellis 1996), showing the intriguing way that higher-symmetry solutions provide a ‘‘skeleton’’ that guides the behavior of lower-symmetry solutions in the space of spacetimes. Many Bianchi models can be shown to isotropize at late times, particularly if viscosity is present; thus, they are asymptotic to the FL universes in the far future. In some cases, Bianchi models exhibit intermediate isotropization: they are much like FL models for a large part of their life, but are very different from them both at very early and very late stages of their evolution. These could be good models of the real universe. An important theorem by Wald (1983) shows that a cosmological constant will tend to isotropize Bianchi solutions at late times. This is an indication that inflation can succeed in making anisotropic early states resemble FL models at later times. Observational properties like element abundances and CBR anisotropy patterns can be worked out in these models (some of them develop a characteristic isolated ‘‘hot spot’’ in the CBR sky). For $q = 1$ ($r = 4$), we have spatially homogeneous LRS models, either Kantowski–Sachs or Bianchi universes, and again observations can be worked out in detail and phase planes developed showing their dynamical behavior, often isotropizing at late times. There are orthogonal and tilted cases, the latter possibly involving nonscalar singularities. For $q = 3$ ($r = 6$), we have the isotropic FL models, discussed above. Both the LRS and isotropic cases could be good models of the real universe.

When $t = 2$, we have inhomogeneous evolving models. This is a very large family, but the LRS ($q = 1$, $r = 3$) cases have been examined in detail; in the case of pressure-free matter, these are the Tolman–Bondi inhomogeneous models (Bondi 1947) that can be integrated exactly, and have been used for many interesting astrophysical and cosmological studies. Krasiński (1997) gives a very complete catalog of these and lower-symmetry inhomogeneous models and their uses in cosmology. A considerable challenge is the dynamical systems analysis for generic inhomogeneous models, needed to properly understand the early evolution of generic universe models (Uggla et al. 2003), and hence to determine what is generic behavior.

The Origin of the Universe

The issue underlying all this is what led to the initial conditions for the universe, for example, providing the starting conditions for inflation. There are many approaches to studying the quantum gravity phase of cosmology, including the Wheeler–de Witt equation, the path-integral approach, string cosmology, pre-big bang theory, brane cosmology, the ekpyrotic universe, the cyclic universe, and loop quantum gravity approaches. These lie beyond the purview of the present article, except to say that they are all based on unproven extrapolations of known physics. The physically possible paths will become clearer as the nature of quantum gravity is elucidated.

It is pertinent to note that there exist nonsingular realistic cosmological solutions, possible in the light of the violations of the energy condition enabled by the supposed scalar fields that underlie inflationary universe theory. These nonsingular solutions can even avoid the quantum gravity era (Ellis et al. 2003). However, they have very fine-tuned initial conditions, which is nowadays considered a disadvantage; but

there is no proof that whatever processes led to the existence of the universe preferred generic rather than fine-tuned conditions; this is a philosophical rather than physical assumption. It may well be that, as regards the start of the universe, the options are that either an initial singularity occurred, or the initial conditions were very finely tuned and allowed an infinitely existing universe. Investigation of whether this conjecture is in fact valid, and if so which is the best option, are intriguing open topics.

See also: Einstein Equations: Exact Solutions; Einstein–Cartan Theory; General Relativity: Experimental Tests; General Relativity: Overview; Gravitational Lensing; Lie Groups: General Theory; Newtonian Limit of General Relativity; Quantum Cosmology; Shock Wave Refinement of the Friedman–Robertson–Walker Metric; Spacetime Topology, Causal Structure and Singularities; String Theory: Phenomenology.

Further Reading

Bardeen JM (1980) Physical Review D 22: 1882.
Bondi H (1947) Monthly Notices of the Royal Astronomical Society 107: 410.
Bonnor WB and Ellis GFR (1986) Monthly Notices of the Royal Astronomical Society 218: 605.
Ehlers J (1961) Abh Mainz Akad Wiss u Lit (translated in Gen Rel Grav 25: 1225, 1993).
Ehlers J, Geren P, and Sachs RK (1968) Journal of Mathematical Physics 9: 1344.
Ehlers J and Rindler W (1989) Monthly Notices of the Royal Astronomical Society 238: 503.
Einstein A (1917) Sitz Ber Preuss Akad Wiss (translated in The Principle of Relativity, 1993). Dover.
Ellis GFR (1967) Journal of Mathematical Physics 8: 1171.
Ellis GFR (1971) In: Sachs RK (ed.) General Relativity and Cosmology, Proc. Int. School of Physics ‘‘Enrico Fermi,’’ Course XLVII, p. 104. Academic Press.
Ellis GFR and Bruni M (1989) Physical Review D 40: 1804.
Ellis GFR and van Elst H (1999) In: Lachieze-Rey M (ed.) Theoretical and Observational Cosmology, vol. 541 [gr-qc/9812046], Nato Series C: Mathematical and Physical Sciences: Kluwer.
Ellis GFR and Madsen M (1991) Classical and Quantum Gravity 8: 667.
Ellis GFR, Murugan J, and Tsagas CG (2003) gr-qc/0307112.
Ellis GFR, Nel SD, Stoeger W, Maartens R, and Whitman AP (1985) Physics Reports 124(5 and 6): 315.
Gödel K (1949) Reviews of Modern Physics 21: 447.
Guth A (1981) Physical Review D 23: 347.
Hawking SW and Ellis GFR (1973) The Large Scale Structure of Space Time. Cambridge: Cambridge University Press.
Krasiński A (1997) Inhomogeneous Cosmological Models. Cambridge: Cambridge University Press.
Kristian J and Sachs RK (1966) The Astrophysical Journal 143: 379.
Lachieze-Rey M and Luminet JP (1995) Physics Reports 254: 135–214.
Robertson HP (1933) Reviews of Modern Physics 5: 62.
Peacock JA (1999) Cosmological Physics. Cambridge: Cambridge University Press.
Rindler W (1956) Monthly Notices of the Royal Astronomical Society 116: 662.
Sachs RK and Wolfe A (1967) Astrophysical Journal 147: 73.
Sandage A (1961) Astrophysical Journal 133: 355.
Stoeger W, Maartens R, and Ellis GFR (1995) Astrophysical Journal 443: 1.
Tipler FJ, Clarke CJS, and Ellis GFR (1980) In: Held A (ed.) General Relativity and Gravitation: One Hundred Years after the Birth of Albert Einstein, vol. 2, p. 97. Plenum.
Uggla C, van Elst H, Wainwright J, and Ellis GFR (2003) Physical Review D gr-qc/0304002 (to appear).
Wainwright J and Ellis GFR (eds.) (1996) The Dynamical Systems Approach to Cosmology. Cambridge: Cambridge University Press.
Wald RM (1983) Physical Review D 28: 2118.

Cotangent Bundle Reduction


J-P Ortega, Université de Franche-Comté, Besançon, France
T S Ratiu, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The general symplectic reduction theory (see Symmetry and Symplectic Reduction) becomes much richer and has many applications if the symplectic manifold is the cotangent bundle $(T^*Q, \omega_Q = -d\theta_Q)$ of a manifold $Q$. The canonical 1-form $\theta_Q$ on $T^*Q$ is given by $\theta_Q(\alpha_q)(V_{\alpha_q}) = \alpha_q(T_{\alpha_q}\pi_Q(V_{\alpha_q}))$, for any $q \in Q$, $\alpha_q \in T_q^*Q$, and tangent vector $V_{\alpha_q} \in T_{\alpha_q}(T^*Q)$, where $\pi_Q : T^*Q \to Q$ is the cotangent bundle projection and $T_{\alpha_q}\pi_Q : T_{\alpha_q}(T^*Q) \to T_qQ$ is its tangent map (or derivative) at $\alpha_q$. In natural cotangent bundle coordinates $(q^i, p_i)$, we have $\theta_Q = p_i\,dq^i$ and $\omega_Q = dq^i \wedge dp_i$.

Let $\Phi : G \times Q \to Q$ be a left smooth action of the Lie group $G$ on the manifold $Q$. Denote by $g \cdot q = \Phi(g, q)$ the action of $g \in G$ on the point $q \in Q$ and by $\Phi_g : Q \to Q$ the diffeomorphism of $Q$ induced by $g$. The lifted left action $G \times T^*Q \to T^*Q$, given by $g \cdot \alpha_q = T^*_{g \cdot q}\Phi_{g^{-1}}(\alpha_q)$ for $g \in G$ and $\alpha_q \in T_q^*Q$, preserves $\theta_Q$, and admits the equivariant momentum map $J : T^*Q \to \mathfrak{g}^*$ whose expression is $\langle J(\alpha_q), \xi \rangle = \alpha_q(\xi_Q(q))$, where $\xi \in \mathfrak{g}$, the Lie algebra of $G$, $\langle\,,\rangle : \mathfrak{g}^* \times \mathfrak{g} \to \mathbb{R}$ is the duality pairing between the dual $\mathfrak{g}^*$ and $\mathfrak{g}$, and $\xi_Q(q) = d\Phi(\exp t\xi, q)/dt|_{t=0}$ is the value of the

infinitesimal generator vector field $\xi_Q$ of the $G$-action at $q \in Q$ (see Hamiltonian Group Actions and Symmetries and Conservation Laws). Throughout this article, it is assumed that the $G$-action on $Q$, and hence on $T^*Q$, is free and proper. Recall also that $((T^*Q)_\mu, (\omega_Q)_\mu)$ denotes the reduced manifold at $\mu \in \mathfrak{g}^*$ (see Symmetry and Symplectic Reduction), where $(T^*Q)_\mu := J^{-1}(\mu)/G_\mu$ is the orbit space of the $G_\mu$-action on the momentum level manifold $J^{-1}(\mu)$ and $G_\mu := \{g \in G \mid \mathrm{Ad}^*_g\mu = \mu\}$ is the isotropy subgroup of the coadjoint representation of $G$ on $\mathfrak{g}^*$. The left-coadjoint representation of $g \in G$ on $\mu \in \mathfrak{g}^*$ is denoted by $\mathrm{Ad}^*_{g^{-1}}\mu$.

Cotangent bundle reduction at zero is already quite interesting and has many applications. Let $\rho : Q \to Q/G$ be the $G$-principal bundle projection defined by the proper free action of $G$ on $Q$, usually referred to as the shape space bundle. Zero is a regular value of $J$ and the map $\varphi_0 : ((T^*Q)_0, (\omega_Q)_0) \to (T^*(Q/G), \omega_{Q/G})$ given by $\varphi_0([\alpha_q])(T_q\rho(v_q)) := \alpha_q(v_q)$, where $\alpha_q \in J^{-1}(0)$, $[\alpha_q] \in (T^*Q)_0$, and $v_q \in T_qQ$, is a well-defined symplectic diffeomorphism.

This theorem generalizes in two nontrivial ways when one reduces at a nonzero value of $J$: an embedding and a fibration theorem.

Embedding Version of Cotangent Bundle Reduction

Let $\mu \in \mathfrak{g}^*$, $Q_\mu := Q/G_\mu$, $\rho_\mu : Q \to Q_\mu$ the projection onto the $G_\mu$-orbit space, $\mathfrak{g}_\mu := \{\xi \in \mathfrak{g} \mid \mathrm{ad}^*_\xi\mu = 0\}$ the Lie algebra of the coadjoint isotropy subgroup $G_\mu$, where $\mathrm{ad}_\xi\eta := [\xi, \eta]$ for any $\xi, \eta \in \mathfrak{g}$, $\mathrm{ad}^*_\xi : \mathfrak{g}^* \to \mathfrak{g}^*$ the dual map, $\mu' := \mu|_{\mathfrak{g}_\mu} \in \mathfrak{g}_\mu^*$ the restriction of $\mu$ to $\mathfrak{g}_\mu$, and $((T^*Q)_\mu, (\omega_Q)_\mu)$ the reduced space at $\mu$. The induced $G_\mu$-action on $T^*Q$ admits the equivariant momentum map $J^\mu : T^*Q \to \mathfrak{g}_\mu^*$ given by $J^\mu(\alpha_q) = J(\alpha_q)|_{\mathfrak{g}_\mu}$. Assume there is a $G_\mu$-invariant 1-form $\alpha_\mu$ on $Q$ with values in $(J^\mu)^{-1}(\mu')$. Then there is a unique closed 2-form $\beta_\mu$ on $Q_\mu$ such that $\rho_\mu^*\beta_\mu = d\alpha_\mu$. Define the magnetic term $B_\mu := \pi_{Q_\mu}^*\beta_\mu$, where $\pi_{Q_\mu} : T^*Q_\mu \to Q_\mu$ is the cotangent bundle projection, which is a closed 2-form on $T^*Q_\mu$. Then the map $\varphi_\mu : ((T^*Q)_\mu, (\omega_Q)_\mu) \to (T^*Q_\mu, \omega_{Q_\mu} - B_\mu)$ given by $\varphi_\mu([\alpha_q])(T_q\rho_\mu(v_q)) := (\alpha_q - \alpha_\mu(q))(v_q)$, for $\alpha_q \in J^{-1}(\mu)$, $[\alpha_q] \in (T^*Q)_\mu$, and $v_q \in T_qQ$, is a symplectic embedding onto a submanifold of $T^*Q_\mu$ covering the base $Q_\mu$. The embedding $\varphi_\mu$ is a diffeomorphism onto $T^*Q_\mu$ if and only if $\mathfrak{g} = \mathfrak{g}_\mu$. If the 1-form $\alpha_\mu$ takes values in the smaller set $J^{-1}(\mu)$ then the image of $\varphi_\mu$ is the vector sub-bundle $[T\rho_\mu(VQ)]^\circ$ of $T^*Q_\mu$, where $VQ \subset TQ$ is the vertical vector sub-bundle consisting of vectors tangent to the $G$-orbits, that is, its fiber at $q \in Q$ equals $V_qQ = \{\xi_Q(q) \mid \xi \in \mathfrak{g}\}$, and $^\circ$ denotes the annihilator relative to the natural duality pairing between $TQ_\mu$ and $T^*Q_\mu$. Note that if $\mathfrak{g}$ is abelian or $\mu = 0$, the embedding $\varphi_\mu$ is always onto and thus the reduced space is again, topologically, a cotangent bundle.

It should be noted that there is a choice in this theorem, namely the 1-form $\alpha_\mu$. Whereas the reduced symplectic space $((T^*Q)_\mu, (\omega_Q)_\mu)$ is intrinsic, the symplectic structure on the space $T^*Q_\mu$ depends on $\alpha_\mu$. The theorem above states that no matter how $\alpha_\mu$ is chosen, there is a symplectic diffeomorphism, which also depends on $\alpha_\mu$, of the reduced space onto a submanifold of $T^*Q_\mu$.

Connections

The 1-form $\alpha_\mu$ is usually obtained from a left connection on the principal bundle $\rho_\mu : Q \to Q/G_\mu$ or $\rho : Q \to Q/G$. A left connection 1-form $A \in \Omega^1(Q; \mathfrak{g})$ on the left principal $G$-bundle $\rho : Q \to Q/G$ is a Lie algebra-valued 1-form $A : TQ \to \mathfrak{g}$, where $\mathfrak{g}$ denotes the Lie algebra of $G$, satisfying the conditions $A(\xi_Q) = \xi$ for all $\xi \in \mathfrak{g}$ and $A(T_q\Phi_g(v)) = \mathrm{Ad}_g(A(v))$ for all $g \in G$ and $v \in T_qQ$, where $\mathrm{Ad}_g$ denotes the adjoint action of $G$ on $\mathfrak{g}$. The horizontal vector sub-bundle $HQ$ of the connection $A$ is defined as the kernel of $A$, that is, its fiber at $q \in Q$ is the subspace $H_q := \ker A(q)$. The map $v_q \mapsto \mathrm{ver}_q(v_q) := [A(q)(v_q)]_Q(q)$ is called the vertical projection, while the map $v_q \mapsto \mathrm{hor}_q(v_q) := v_q - \mathrm{ver}_q(v_q)$ is called the horizontal projection. Since for any vector $v_q \in T_qQ$ we have $v_q = \mathrm{ver}_q(v_q) + \mathrm{hor}_q(v_q)$, it follows that $TQ = HQ \oplus VQ$ and the maps $\mathrm{hor}_q : T_qQ \to H_qQ$ and $\mathrm{ver}_q : T_qQ \to V_qQ$ are projections onto the horizontal and vertical subspaces at every $q \in Q$.

Connections can be equivalently defined by the choice of a sub-bundle $HQ \subset TQ$ complementary to the vertical sub-bundle $VQ$ satisfying the following $G$-invariance property: $H_{g \cdot q}Q = T_q\Phi_g(H_qQ)$ for every $g \in G$ and $q \in Q$. The sub-bundle $HQ$ is called, as before, the horizontal sub-bundle and a connection 1-form $A$ is defined by setting $A(q)(\xi_Q(q) + u_q) = \xi$, for any $\xi \in \mathfrak{g}$ and $u_q \in H_qQ$.

The curvature of the connection $A$ is the Lie algebra-valued 2-form on $Q$ defined by $B(u_q, v_q) = dA(\mathrm{hor}_q(u_q), \mathrm{hor}_q(v_q))$. When one replaces vectors in the exterior derivative with their horizontal projections, then the result is called the exterior covariant derivative and the preceding formula for $B$ is often written as $B = DA$. Curvature measures the lack of integrability of the horizontal distribution, namely $B(u, v) = -A([\mathrm{hor}(u), \mathrm{hor}(v)])$ for any two vector fields $u$ and $v$ on $Q$. The Cartan structure equations state that $B(u, v) = dA(u, v) - [A(u), A(v)]$, where the bracket on the right hand side is the Lie bracket in $\mathfrak{g}$.
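
The definitions in the Connections section can be checked numerically in the simplest abelian setting. The sketch below is an illustration under stated assumptions, not part of the article: it takes the trivial bundle $Q = \mathbb{R}^3 \times S^1 \to \mathbb{R}^3$ with $G = S^1$ acting on the circle factor, and a connection of the hypothetical form $A = d\theta + a \cdot dq$ with a linear vector potential $a$ (all function names and the value of `b0` are made up for the example). Because the group is abelian, the bracket term in the Cartan structure equations vanishes, so the code only verifies the splitting $v = \mathrm{ver}(v) + \mathrm{hor}(v)$, the condition $A(\mathrm{hor}(v)) = 0$, and the identity $B(u, v) = -A([\mathrm{hor}(u), \mathrm{hor}(v)])$.

```python
def a(q, b0=2.0):
    """Hypothetical vector potential on R^3 with curl a = (0, 0, b0)."""
    x, y, _ = q
    return (-0.5 * b0 * y, 0.5 * b0 * x, 0.0)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def connection(q, w):
    """A(q, theta)(v, theta_dot) = theta_dot + a(q).v, so A(0, xi) = xi."""
    v, theta_dot = w
    return theta_dot + dot(a(q), v)


def ver(q, w):
    """Vertical projection: the generator (0, xi) with xi = A(w)."""
    return ((0.0, 0.0, 0.0), connection(q, w))


def hor(q, w):
    """Horizontal projection: w minus its vertical part."""
    v, theta_dot = w
    return (tuple(v), theta_dot - connection(q, w))


def jacobian_a(q, h=1e-5):
    """J[i][j] = d a_i / d x_j, approximated by central differences."""
    jac = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        qp, qm = list(q), list(q)
        qp[j] += h
        qm[j] -= h
        ap, am = a(qp), a(qm)
        for i in range(3):
            jac[i][j] = (ap[i] - am[i]) / (2 * h)
    return jac


def curvature(q, u, v):
    """B(u, v) = dA(hor u, hor v) = sum_ij (d_j a_i - d_i a_j) u_j v_i."""
    jac = jacobian_a(q)
    return sum((jac[i][j] - jac[j][i]) * u[j] * v[i]
               for i in range(3) for j in range(3))


def bracket_theta(q, u, v):
    """theta-component of [hor u, hor v] for constant base vectors u, v.

    The R^3 components of the bracket vanish, so this equals A([hor u, hor v]).
    """
    jac = jacobian_a(q)
    x_of_y = -sum(jac[i][j] * u[j] * v[i] for i in range(3) for j in range(3))
    y_of_x = -sum(jac[i][j] * v[j] * u[i] for i in range(3) for j in range(3))
    return x_of_y - y_of_x
```

In this abelian example the curvature reduces to the curl of the vector potential paired with $u \times v$, which is the sense in which curvature plays the role of a magnetic field.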

Since the connection $A$ is a Lie algebra-valued 1-form, for each $\mu \in \mathfrak{g}^*$ the formula $\alpha_\mu(q) := A(q)^*(\mu)$, where $A(q)^* : \mathfrak{g}^* \to T_q^*Q$ is the dual of the linear map $A(q) : T_qQ \to \mathfrak{g}$, defines a usual 1-form on $Q$. This 1-form $\alpha_\mu$ takes values in $J^{-1}(\mu)$ and is equivariant in the following sense: $\Phi_g^*\alpha_\mu = \alpha_{\mathrm{Ad}^*_g\mu}$ for any $g \in G$.

Magnetic Terms and Curvature

There are two methods to construct the 1-form $\alpha_\mu$ from a connection. The first is to start with a connection 1-form $A^\mu \in \Omega^1(Q; \mathfrak{g}_\mu)$ on the principal $G_\mu$-bundle $\rho_\mu : Q \to Q/G_\mu$. Then the 1-form $\alpha_\mu := \langle \mu|_{\mathfrak{g}_\mu}, A^\mu \rangle \in \Omega^1(Q)$ is $G_\mu$-invariant and has values in $(J^\mu)^{-1}(\mu|_{\mathfrak{g}_\mu})$. The magnetic term $B_\mu$ is the pullback to $T^*(Q/G_\mu)$ of the $\mu|_{\mathfrak{g}_\mu}$-component $d\alpha_\mu$ of the curvature of $A^\mu$ thought of as a 2-form on the base $Q/G_\mu$.

The second method is to start with a connection $A \in \Omega^1(Q; \mathfrak{g})$ on the principal bundle $\rho : Q \to Q/G$, to define $\alpha_\mu := \langle \mu, A \rangle \in \Omega^1(Q)$, and to observe that this 1-form is $G_\mu$-invariant and has values in $J^{-1}(\mu)$. The magnetic term $B_\mu$ is in this case the pullback to $T^*(Q/G_\mu)$ of the $\mu$-component $d\alpha_\mu$ of the curvature of $A$ thought of as a 2-form on the base $Q/G_\mu$.

The Mechanical Connection

If $(Q, \langle\langle\,,\rangle\rangle)$ is a Riemannian manifold and $G$ acts by isometries, there is a natural connection on the bundle $\rho : Q \to Q/G$, namely, define the horizontal space at a point to be the metric orthogonal to the vertical space. This connection is called the mechanical connection and its horizontal bundle consists of all vectors $v_q \in TQ$ such that $J(\langle\langle v_q, \cdot \rangle\rangle) = 0$.

To determine the Lie algebra-valued 1-form $A$ of this connection, the notion of locked inertia tensor needs to be introduced. This is the linear map $I(q) : \mathfrak{g} \to \mathfrak{g}^*$ depending smoothly on $q \in Q$ defined by the identity $\langle I(q)\xi, \eta \rangle = \langle\langle \xi_Q(q), \eta_Q(q) \rangle\rangle$ for any $\xi, \eta \in \mathfrak{g}$. Since the $G$-action is free, each $I(q)$ is invertible. The connection 1-form whose horizontal space was defined above is given by $A(q)(v_q) = I(q)^{-1}(J(\langle\langle v_q, \cdot \rangle\rangle))$.

Denote by $K : T^*Q \to \mathbb{R}$ the kinetic energy of the metric $\langle\langle\,,\rangle\rangle$ on the cotangent bundle, that is, $K(\langle\langle v_q, \cdot \rangle\rangle) := (1/2)\|v_q\|^2$. The 1-form $\alpha_\mu = A(\cdot)^*\mu$ is characterized for the mechanical connection $A$ by the condition $K(\alpha_\mu(q)) = \inf\{K(\beta_q) \mid \beta_q \in J^{-1}(\mu) \cap T_q^*Q\}$.

The Amended Potential

A simple mechanical system is a Hamiltonian system on a cotangent bundle $T^*Q$ whose Hamiltonian function is the sum of the kinetic energy of a Riemannian metric on $Q$ and a potential function $V : Q \to \mathbb{R}$. If there is a Lie group $G$ acting on $Q$ by isometries and leaving the potential invariant, then we have a simple mechanical system with symmetry. The amended or effective potential $V_\mu : Q \to \mathbb{R}$ at $\mu \in \mathfrak{g}^*$ is defined by $V_\mu := H \circ \alpha_\mu$, where $\alpha_\mu$ is the 1-form associated to the mechanical connection. Its expression in terms of the locked moment of inertia tensor is given by $V_\mu(q) := V(q) + (1/2)\langle \mu, I(q)^{-1}\mu \rangle$. The amended potential naturally induces a smooth function $\widehat{V}_\mu \in C^\infty(Q/G_\mu)$.

The fundamental result about simple mechanical systems with symmetry is the following. The push-forward by the embedding $\varphi_\mu : ((T^*Q)_\mu, (\omega_Q)_\mu) \to (T^*Q_\mu, \omega_{Q_\mu} - B_\mu)$ of the reduced Hamiltonian $H_\mu \in C^\infty((T^*Q)_\mu)$ of a simple mechanical system $H = K + V \circ \pi_Q \in C^\infty(T^*Q)$ is the restriction to the vector sub-bundle $\varphi_\mu((T^*Q)_\mu) \subset T^*(Q/G_\mu)$, which is also a symplectic submanifold of $(T^*(Q/G_\mu), \omega_{Q/G_\mu} - B_\mu)$, of the simple mechanical system on $T^*(Q/G_\mu)$ whose kinetic energy is given by the quotient Riemannian metric on $Q/G_\mu$ and whose potential is $\widehat{V}_\mu$. However, Hamilton's equations on $T^*(Q/G_\mu)$ for this simple mechanical system are computed relative to the magnetic symplectic form $\omega_{Q/G_\mu} - B_\mu$.

There is a wealth of applications starting from this classical theorem to mechanical systems, spanning such diverse areas as topological characterization of the level sets of the energy–momentum map to methods of proving nonlinear stability of relative equilibria (block-diagonalization of the stability form in the application of the energy–momentum method).

Fibration Version of Cotangent Bundle Reduction

There is a second theorem that realizes the reduced space of a cotangent bundle as a locally trivial bundle over shape space $Q/G$. This version is particularly well suited in the study of quantization problems and in control theory. The result is the following. Assume that $G$ acts freely and properly on $Q$. Then the reduced symplectic manifold $(T^*Q)_\mu$ is a fiber bundle over $T^*(Q/G)$ with fiber the coadjoint orbit $\mathcal{O}_\mu$. How this is related to the Poisson structure of the quotient $(T^*Q)/G$ will be discussed later.

The Kaluza–Klein Construction

The extra term in the symplectic form of the reduced space is called a magnetic term because it has this interpretation in electromagnetism. To understand why $B_\mu$ is called a magnetic term, consider the problem of a particle of mass $m$ and charge $e$ moving in $\mathbb{R}^3$ under the influence of a given

magnetic field $\mathbf{B} = B_x\mathbf{i} + B_y\mathbf{j} + B_z\mathbf{k}$, $\operatorname{div}\mathbf{B} = 0$. The Lorentz force law (written in the International System) gives the equations of motion

$$m\,\frac{d\mathbf{v}}{dt} = e\,\mathbf{v} \times \mathbf{B} \qquad [1]$$

where $e$ is the charge and $\mathbf{v} = (\dot{x}, \dot{y}, \dot{z}) = \dot{\mathbf{q}}$ is the velocity of the particle. What is the Hamiltonian description of these equations?

There are two possible answers to this question. To formulate them, associate to the divergence free vector field $\mathbf{B}$ the closed 2-form $B = B_x\,dy \wedge dz - B_y\,dx \wedge dz + B_z\,dx \wedge dy$. Also, write $\mathbf{B} = \operatorname{curl}\mathbf{A}$ for some other vector field $\mathbf{A} = (A_x, A_y, A_z)$ on $\mathbb{R}^3$, called the magnetic potential.

Answer 1 Take on $T^*\mathbb{R}^3$ the symplectic form $\Omega_B = dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z - eB$, where $(p_x, p_y, p_z) = \mathbf{p} := m\mathbf{v}$ is the momentum of the particle, and $h = m\|\mathbf{v}\|^2/2 = m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2)/2$ is the Hamiltonian, the kinetic energy of the particle. A direct verification shows that $dh = \Omega_B(X_h, \cdot)$, where

$$X_h = \dot{x}\frac{\partial}{\partial x} + \dot{y}\frac{\partial}{\partial y} + \dot{z}\frac{\partial}{\partial z} + e(B_z\dot{y} - B_y\dot{z})\frac{\partial}{\partial p_x} + e(B_x\dot{z} - B_z\dot{x})\frac{\partial}{\partial p_y} + e(B_y\dot{x} - B_x\dot{y})\frac{\partial}{\partial p_z} \qquad [2]$$

which gives the equations of motion [1].

Answer 2 Take on $T^*\mathbb{R}^3$ the canonical symplectic form $\Omega = dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z$ and the Hamiltonian $h_A = \|\mathbf{p} - e\mathbf{A}\|^2/2m$. A direct verification shows that $dh_A = \Omega(X_{h_A}, \cdot)$, where $X_{h_A}$ has the same expression [2].

Next we show how the magnetic term in the symplectic form $\Omega_B$ is obtained by reduction from the Kaluza–Klein system. Let $Q = \mathbb{R}^3 \times S^1$ with the circle $G = S^1$ acting on $Q$, only on the second factor. Identify the Lie algebra $\mathfrak{g}$ of $S^1$ with $\mathbb{R}$. Since the infinitesimal generator of this action defined by $\xi \in \mathfrak{g} = \mathbb{R}$ has the expression $\xi_Q(\mathbf{q}, \theta) = (\mathbf{q}, \theta; 0, \xi)$, if $TS^1$ is trivialized as $S^1 \times \mathbb{R}$, a momentum map $J : T^*Q = \mathbb{R}^3 \times S^1 \times \mathbb{R}^3 \times \mathbb{R} \to \mathfrak{g}^* = \mathbb{R}$ is given by $J(\mathbf{q}, \theta; \mathbf{p}, p_\theta)\,\xi = (\mathbf{p}, p_\theta) \cdot (0, \xi) = p_\theta\,\xi$, that is, $J(\mathbf{q}, \theta; \mathbf{p}, p_\theta) = p_\theta$. In this case, the coadjoint action is trivial, so for any $\mu \in \mathfrak{g}^* = \mathbb{R}$, we have $G_\mu = S^1$, $\mathfrak{g}_\mu = \mathbb{R}$, and $\mu' = \mu$. The 1-form $\alpha_\mu = \mu(A_x\,dx + A_y\,dy + A_z\,dz + d\theta) \in \Omega^1(Q)$, where $d\theta$ denotes the length 1-form on $S^1$, is clearly $G_\mu = S^1$-invariant, has values in $J^{-1}(\mu) = \{(\mathbf{q}, \theta; \mathbf{p}, \mu) \mid \mathbf{q}, \mathbf{p} \in \mathbb{R}^3, \theta \in S^1\}$, and its exterior differential equals $d\alpha_\mu = \mu B$. Thus, the closed 2-form $\beta_\mu$ on the base $Q_\mu = Q/G_\mu = Q/S^1 = \mathbb{R}^3$ equals $\mu B$ and hence the magnetic term, that is, the closed 2-form $B_\mu = \pi_{Q_\mu}^*\beta_\mu$ on $T^*Q_\mu = T^*\mathbb{R}^3$, is also $\mu B$ since $\pi_{Q_\mu} : Q = \mathbb{R}^3 \times S^1 \to Q/G_\mu = \mathbb{R}^3$ is the projection. Therefore, the reduced space $(T^*Q)_\mu$ is symplectically diffeomorphic to $(T^*\mathbb{R}^3, dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z - \mu B)$, which coincides with the phase space in Answer 1 if we put $\mu = e$. This also gives the physical interpretation of the momentum map $J : T^*Q = \mathbb{R}^3 \times S^1 \times \mathbb{R}^3 \times \mathbb{R} \to \mathfrak{g}^* = \mathbb{R}$, $J(\mathbf{q}, \theta; \mathbf{p}, p_\theta) = p_\theta$ and hence of the variable conjugate to the circle variable $\theta$: $p_\theta$ represents the charge. Moreover, the magnetic term in the symplectic form is, up to a charge factor, the magnetic field.

The kinetic energy Hamiltonian

$$h(\mathbf{q}, \theta; \mathbf{p}, p_\theta) := \frac{1}{2m}\|\mathbf{p}\|^2 + \frac{1}{2}p_\theta^2$$

of the Kaluza–Klein metric, that is, the Riemannian metric obtained by keeping the standard metrics on each factor and declaring $\mathbb{R}^3$ and $S^1$ orthogonal, induces the reduced Hamiltonian

$$h_\mu(\mathbf{q}) = \frac{1}{2m}\|\mathbf{p}\|^2 + \frac{1}{2}\mu^2$$

which, up to the constant $\mu^2/2$, equals the kinetic energy Hamiltonian in Answer 1. Note that this reduced system is not the geodesic flow of the Euclidean metric because of the presence of the magnetic term in the symplectic form. However, the equations of motion of a charged particle in a magnetic field are obtained by reducing the geodesic flow of the Kaluza–Klein metric.

A similar construction is carried out in Yang–Mills theory where $A$ is a connection on a principal bundle and $B$ is its curvature. Magnetic terms also appear in classical mechanics. For example, in rotating systems the Coriolis force (up to a dimensional factor) plays the role of the magnetic term.

Reconstruction of Dynamics for Cotangent Bundles

A general reconstruction method of the dynamics from the reduced dynamics was given in Symmetry and Symplectic Reduction. For cotangent bundles, using the mechanical connection, this method simplifies considerably.

Start with the following general situation. Let $G$ act freely on the configuration manifold $Q$; let $h : T^*Q \to \mathbb{R}$ be a $G$-invariant Hamiltonian, $\mu \in \mathfrak{g}^*$, $\alpha_q \in J^{-1}(\mu)$, and $c_\mu(t)$ the integral curve of the reduced system with initial condition $[\alpha_q] \in (T^*Q)_\mu$ given by the reduced Hamiltonian function $h_\mu : (T^*Q)_\mu \to \mathbb{R}$. In terms of a connection $A \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$ on the left $G_\mu$-principal bundle $J^{-1}(\mu) \to (T^*Q)_\mu$ the reconstruction procedure proceeds in four steps:

Step 1: Horizontally lift the curve $c_\mu(t) \in (T^*Q)_\mu$ to a curve $d(t) \in J^{-1}(\mu)$ with $d(0) = \alpha_q$.

Step 2: Set $\xi(t) = A(d(t))(X_h(d(t))) \in \mathfrak{g}_\mu$.

Step 3: With $\xi(t) \in \mathfrak{g}_\mu$ determined in step 2, solve the nonautonomous differential equation $\dot{g}(t) = T_e L_{g(t)}\, \xi(t)$ with initial condition $g(0) = e$, where $L_g$ denotes left translation on $G$; this is the step that involves ''quadratures'' and is the main obstacle to finding explicit formulas.

Step 4: The curve $c(t) = g(t) \cdot d(t)$, with $d(t)$ found in step 1 and $g(t)$ found in step 3, is the integral curve of $X_h$ with initial condition $c(0) = \alpha_q$.

This method depends on the choice of the connection $A \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$. Here are several particular cases when this procedure simplifies.

(a) One-dimensional coadjoint isotropy group. If $G_\mu = S^1$ or $G_\mu = \mathbb{R}$, identify $\mathfrak{g}_\mu$ with $\mathbb{R}$ via the map $a \in \mathbb{R} \leftrightarrow a\zeta \in \mathfrak{g}_\mu$, where $\zeta \in \mathfrak{g}_\mu$, $\zeta \neq 0$, is a generator of $\mathfrak{g}_\mu$. Then a connection 1-form on the $S^1$ (or $\mathbb{R}$) principal bundle $J^{-1}(\mu) \to (T^*Q)_\mu$ is the 1-form $A$ on $J^{-1}(\mu)$ given by $A = (1/\langle \mu, \zeta \rangle)\, \theta_\mu$, where $\theta_\mu$ is the pullback of the canonical 1-form $\theta \in \Omega^1(T^*Q)$ to the submanifold $J^{-1}(\mu)$. The curvature of this connection is the 2-form on $(T^*Q)_\mu$ given by $\mathrm{curv}(A) = (1/\langle \mu, \zeta \rangle)\, \omega_\mu$, where $\omega_\mu$ is the reduced symplectic form on $(T^*Q)_\mu$. In this case, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = \Lambda[h](d(t))$, where $\Lambda \in \mathfrak{X}(T^*Q)$ is the Liouville vector field characterized by the property of being the unique vector field on $T^*Q$ that satisfies the relation $d\theta(\Lambda, \cdot) = \theta$. In canonical coordinates $(q^i, p_i)$ on $T^*Q$, $\Lambda = p_i\, \partial/\partial p_i$.

(b) Induced connection. Any connection $\mathcal{A} \in \Omega^1(Q; \mathfrak{g}_\mu)$ on the left principal bundle $Q \to Q/G_\mu$ induces a connection $A \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$ by $A(\alpha_q)(V_{\alpha_q}) := \mathcal{A}(q)(T_{\alpha_q}\pi_Q(V_{\alpha_q}))$, where $q \in Q$, $\alpha_q \in T_q^*Q$, $V_{\alpha_q} \in T_{\alpha_q}(T^*Q)$, and $\pi_Q : T^*Q \to Q$ is the cotangent bundle projection. In this case, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = \mathcal{A}(q(t))(\mathbb{F}h(d(t)))$, where $q(t) := \pi_Q(d(t))$ is the base integral curve and the vector bundle morphism $\mathbb{F}h : T^*Q \to TQ$ is the fiber derivative of $h$ given by
\[ \mathbb{F}h(\alpha_q)(\beta_q) := \left.\frac{d}{dt}\right|_{t=0} h(\alpha_q + t\beta_q) \]
for any $\alpha_q, \beta_q \in T_q^*Q$. Two particular instances of this situation are noteworthy.

(b1) Assume that the Hamiltonian $h$ is that of a simple mechanical system with symmetry. Choosing $\mathcal{A}$ to be the mechanical connection $A_{\rm mech}$, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = A_{\rm mech}(q(t))\left(\langle\!\langle d(t), \cdot \rangle\!\rangle\right)$.

(b2) If $Q = G$ is a Lie group, $\dim G_\mu = 1$, and $\zeta$ is a generator of $\mathfrak{g}_\mu$, then the connection $\mathcal{A} \in \Omega^1(G)$ can be chosen to equal $\mathcal{A}(g) := (1/\langle \mu, \zeta \rangle)\, T_g^* R_{g^{-1}}(\mu)$, where $R_g$ is right translation on $G$.

(c) Reconstruction of dynamics for simple mechanical systems with symmetry. The case of simple mechanical systems with symmetry deserves special attention since several steps in the reconstruction method can be simplified. For simple mechanical systems, the knowledge of the base integral curve $q(t)$ suffices to determine the entire integral curve on $T^*Q$. Indeed, if $h = K + V \circ \pi_Q$ is the Hamiltonian, the Legendre transformation $\mathbb{F}h : T^*Q \to TQ$ determines the Lagrangian system on $TQ$ given by $\ell(u_q) = (1/2)\|u_q\|^2 - V(q)$, for $u_q \in T_qQ$. Lagrange's equations are second-order and thus the evolution of the velocities is given by the time derivative $\dot{q}(t)$ of the base integral curve. Since $\mathbb{F}h = (\mathbb{F}\ell)^{-1}$, the solution of the Hamiltonian system is given by $\mathbb{F}\ell(\dot{q}(t))$. Using the explicit expression of the mechanical connection and the notation given in the general procedure, the method of reconstruction simplifies to the following steps.

To find the integral curve $c(t)$ of the simple mechanical system with $G$-symmetry $h = K + V \circ \pi_Q$ on $T^*Q$ with initial condition $c(0) = \alpha_q \in T_q^*Q$, knowing the integral curve $c_\mu(t)$ of the reduced Hamiltonian system on $(T^*Q)_\mu$ given by the reduced Hamiltonian function $h_\mu : (T^*Q)_\mu \to \mathbb{R}$ with initial condition $c_\mu(0) = [\alpha_q]$, one proceeds in the following manner. Recall the symplectic embedding $\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$. The curve $\varphi_\mu(c_\mu(t)) \in T^*(Q/G_\mu)$ is an integral curve of the Hamiltonian system on $(T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$ given by the function that is the sum of the kinetic energy of the quotient Riemannian metric and the quotient amended potential $\widehat{V}_\mu$. Let $q_\mu(t) := \pi_{Q/G_\mu}(c_\mu(t))$ be the base integral curve of this system, where $\pi_{Q/G_\mu} : T^*(Q/G_\mu) \to Q/G_\mu$ is the cotangent bundle projection.

Step 1: Relative to the mechanical connection $A_{\rm mech} \in \Omega^1(Q; \mathfrak{g}_\mu)$, horizontally lift $q_\mu(t) \in Q/G_\mu$ to a curve $q_h(t) \in Q$ passing through $q_h(0) = q$.

Step 2: Determine $\xi(t) \in \mathfrak{g}_\mu$ from the algebraic system $\langle\!\langle \xi(t)_Q(q_h(t)), \eta_Q(q_h(t)) \rangle\!\rangle = \langle \mu, \eta \rangle$ for all $\eta \in \mathfrak{g}_\mu$, where $\langle\!\langle \cdot, \cdot \rangle\!\rangle$ is the $G$-invariant kinetic energy Riemannian metric on $Q$. This implies that $\dot{q}_h(0)$ and $\xi(0)_Q(q)$ are the horizontal and vertical components of the vector $\alpha_q^\sharp \in T_qQ$ which is associated by the metric $\langle\!\langle \cdot, \cdot \rangle\!\rangle$ to the initial condition $\alpha_q$.

Step 3: Solve $\dot{g}(t) = T_e L_{g(t)}\, \xi(t)$ in $G_\mu$ with initial condition $g(0) = e$.

Step 4: The curve $q(t) := g(t) \cdot q_h(t)$, with $q_h(t)$ and $g(t)$ determined in steps 1 and 3, respectively, is the base integral curve of the simple mechanical system with symmetry defined by the function $h$ satisfying $q(0) = q$. The curve $(\mathbb{F}h)^{-1}(\dot{q}(t)) \in T^*Q$ is the integral curve of this system with initial

condition $c(0) = \alpha_q$. In addition, $\dot{q}(t) = g(t) \cdot (\dot{q}_h(t) + \xi(t)_Q(q_h(t)))$ is the horizontal plus vertical decomposition relative to the connection induced on $J^{-1}(\mu) \to (T^*Q)_\mu$ by the mechanical connection $A_{\rm mech} \in \Omega^1(Q; \mathfrak{g}_\mu)$.

There are several important situations when step 3, the main obstruction to an explicit solution of the reconstruction problem, can be carried out. We shall review some of them below.

(c1) The case $G_\mu = S^1$. If $G_\mu$ is abelian, the equation in step 3 has the solution $g(t) = \exp \int_0^t \xi(s)\,ds$. If, in addition, $G_\mu = S^1$, then $\xi(s)$ can be explicitly determined by step 2. Indeed, if $\zeta \in \mathfrak{g}_\mu$ is a generator of $\mathfrak{g}_\mu$, writing $\xi(s) = a(s)\zeta$ for some smooth real-valued function $a$ defined on some open interval around the origin, the algebraic equation in step 2 implies that $\langle\!\langle a(s)\zeta_Q(q_h(s)), \zeta_Q(q_h(s)) \rangle\!\rangle = \langle \mu, \zeta \rangle$, which gives $a(s) = \langle \mu, \zeta \rangle / \|\zeta_Q(q_h(s))\|^2$. Therefore, the base integral curve of the solution of the simple mechanical system with symmetry on $T^*Q$ passing through $\alpha_q$ is
\[ q(t) = \exp\left( \langle \mu, \zeta \rangle \int_0^t \frac{ds}{\|\zeta_Q(q_h(s))\|^2}\, \zeta \right) \cdot q_h(t) \]
and
\[ \dot{q}(t) = \exp\left( \langle \mu, \zeta \rangle \int_0^t \frac{ds}{\|\zeta_Q(q_h(s))\|^2}\, \zeta \right) \cdot \left( \dot{q}_h(t) + \frac{\langle \mu, \zeta \rangle}{\|\zeta_Q(q_h(t))\|^2}\, \zeta_Q(q_h(t)) \right) \]

(c2) The case of compact Lie groups. An obvious situation when the differential equation in step 3 can be solved is if $\xi(t) = \xi$ for all $t$, where $\xi$ is a given element of $\mathfrak{g}_\mu$. Then the solution is $g(t) = \exp(t\xi)$. However, step 2 puts certain restrictions under this hypothesis, because it requires that $\langle\!\langle \xi_Q(q_h(t)), \eta_Q(q_h(t)) \rangle\!\rangle = \langle \mu, \eta \rangle$ for any $\eta \in \mathfrak{g}_\mu$. This is satisfied if there is a bilinear nondegenerate form $(\cdot, \cdot)$ on $\mathfrak{g}$ satisfying $(\zeta, \eta) = \langle\!\langle \zeta_Q(q), \eta_Q(q) \rangle\!\rangle$ for all $q \in Q$ and $\zeta, \eta \in \mathfrak{g}$. This implies that $(\cdot, \cdot)$ is positive definite and invariant under the adjoint action of $G$ on $\mathfrak{g}$, so semisimple Lie algebras of noncompact type are excluded. If $G$ is compact, which ensures the existence of a positive adjoint invariant inner product on $\mathfrak{g}$, and $Q = G$, this condition implies that the kinetic energy metric is invariant under the adjoint action. There are examples in which such conditions are natural, such as in Kaluza–Klein theories. Thus, if $G$ is a compact Lie group and $(\cdot, \cdot)$ is a positive-definite metric invariant under the adjoint action of $G$ on $\mathfrak{g}$ satisfying $(\zeta, \eta) = \langle\!\langle \zeta_Q(q), \eta_Q(q) \rangle\!\rangle$ for all $q \in Q$ and $\zeta, \eta \in \mathfrak{g}$, then the element $\xi(t)$ in step 2 can be chosen to be constant and is determined by the identity $(\xi, \cdot) = \mu|_{\mathfrak{g}_\mu}$ on $\mathfrak{g}_\mu$. The solution of the equation in step 3 is then $g(t) = \exp(t\xi)$.

(c3) The case when $\dot{\xi}(t)$ is proportional to $\xi(t)$. Try to find a real-valued function $f(t)$ such that $g(t) = \exp(f(t)\xi(t))$ is a solution of the equation $\dot{g}(t) = T_e L_{g(t)}\, \xi(t)$ with $f(0) = 0$. This gives, for small $t$, the equation $\dot{f}(t)\xi(t) + f(t)\dot{\xi}(t) = \xi(t)$, that is, it is necessary that $\xi(t)$ and $\dot{\xi}(t)$ be proportional. So, if $\dot{\xi}(t) = \alpha(t)\xi(t)$ for some known smooth function $\alpha(t)$, then this gives $f(t) = \int_0^t \exp\left( \int_t^s \alpha(r)\,dr \right) ds$.

(c4) The case of $G_\mu$ solvable. Write $g(t) = \exp(f_1(t)\xi_1) \exp(f_2(t)\xi_2) \cdots \exp(f_n(t)\xi_n)$, for some basis $\{\xi_1, \xi_2, \ldots, \xi_n\}$ of $\mathfrak{g}_\mu$ and some smooth real-valued functions $f_i$, $i = 1, 2, \ldots, n$, defined around zero. It is known that if $G_\mu$ is solvable, the equation in step 3 can be solved by quadratures for the $f_i$.

Reconstruction Phases for Simple Mechanical Systems with $S^1$ Symmetry

Consider a simple mechanical system with symmetry $G$ on the Riemannian manifold $(Q, \langle\!\langle \cdot, \cdot \rangle\!\rangle)$ with $G$-invariant potential $V \in C^\infty(Q)$. If $\mu \in \mathfrak{g}^*$, let $V_\mu$ be the amended potential and $\widehat{V}_\mu \in C^\infty(Q/G_\mu)$ the induced function on the base. Let $c : [0, T] \to T^*Q$ be an integral curve of the system with Hamiltonian $h = K + V \circ \pi_Q$ and suppose that its projection $c_\mu : [0, T] \to (T^*Q)_\mu$ to the reduced space is a closed integral curve of the reduced system with Hamiltonian $h_\mu$. The reconstruction phase associated to the loop $c_\mu(t)$ is the group element $g \in G_\mu$ satisfying the identity $c(T) = g \cdot c(0)$. We shall present two explicit formulas of the reconstruction phase for the case when $G_\mu = S^1$. Let $\zeta \in \mathfrak{g}_\mu = \mathbb{R}$ be a generator of the coadjoint isotropy algebra and write $c(T) = \exp(\varphi\zeta) \cdot c(0)$; in this case, $\varphi$ is identified with the reconstruction phase and, as we shall see in concrete mechanical examples, it truly represents an angle.

If $G_\mu = S^1$, the $G_\mu$-principal bundle $\pi_\mu : J^{-1}(\mu) \to (T^*Q)_\mu := J^{-1}(\mu)/G_\mu$ admits two natural connections: $A = (1/\mu\zeta)\, \theta_\mu \in \Omega^1(J^{-1}(\mu))$, where $\theta_\mu$ is the pullback of the canonical 1-form on the cotangent bundle to the momentum level submanifold $J^{-1}(\mu)$, and $\pi_Q^* A_{\rm mech} \in \Omega^1(J^{-1}(\mu))$. There is no reason to choose one connection over the other and thus there are two natural formulas for the reconstruction phase in this case. Let $c_\mu(t)$ be a periodic orbit of period $T$ of the reduced system and denote also by $h_\mu$ the value of the Hamiltonian function on it.

Assume that $D$ is a two-dimensional surface in $(T^*Q)_\mu$ whose boundary is the loop $c_\mu(t)$. Since the manifolds $(T^*Q)_\mu$ and $T^*(Q/S^1)$ are diffeomorphic (but not symplectomorphic), it makes sense to consider the base integral curve $q_\mu(t)$ obtained by projecting $c_\mu(t)$ to the base $Q/S^1$, which is a closed curve of period $T$. Denote by
\[ \langle \widehat{V}_\mu \rangle := \frac{1}{T} \int_0^T \widehat{V}_\mu(q_\mu(t))\, dt \]
the average of $\widehat{V}_\mu$ over the loop $q_\mu(t)$. Let $q_h(t) \in Q$ be the $A_{\rm mech}$-horizontal lift of $q_\mu(t)$ to $Q$ and let $\psi$ be the $A_{\rm mech}$-holonomy of the loop $q_\mu(t)$ measured from $q(0)$, the base point of $c(0)$; its expression is given by $\exp \psi = \exp\left( \iint_D B \right)$, where $B$ is the curvature of the mechanical connection. Denote by $\omega_\mu$ the reduced symplectic form on $(T^*Q)_\mu$. With these notations the phase $\varphi$ is given by
\[ \varphi = \frac{1}{\mu\zeta} \iint_D \omega_\mu + \frac{2(h_\mu - \langle \widehat{V}_\mu \rangle) T}{\mu\zeta} = \psi + \mu\zeta \int_0^T \frac{ds}{\|\zeta_Q(q_h(s))\|^2} \qquad [3] \]
The first terms in both formulas are the so-called geometric phases because they carry only geometric information given by the connection, whereas the second terms are called the dynamic phases since they encapsulate information directly linked to the Hamiltonian. The expression of the total phase as a sum of a geometric and a dynamic phase is not intrinsic and is connection dependent. It can even happen that one of these summands vanishes. We shall consider now two concrete examples: the free rigid body and the heavy top.

Reconstruction Phases for the Free Rigid Body

The motion of the free rigid body is a geodesic with respect to a left-invariant Riemannian metric on $SO(3)$ given by the moment of inertia of the body. The phase space of the free rigid body motion is $T^*SO(3)$ and a momentum map $J : T^*SO(3) \to \mathbb{R}^3$ of the lift of left translation to the cotangent bundle is given by right translation to the identity element. We have identified here $\mathfrak{so}(3)$ with $\mathbb{R}^3$ by the Lie algebra isomorphism $x \in (\mathbb{R}^3, \times) \mapsto \hat{x} \in (\mathfrak{so}(3), [\cdot, \cdot])$, where $\hat{x}(y) = x \times y$, and $\mathfrak{so}(3)^*$ with $\mathbb{R}^3$ by the inner product on $\mathbb{R}^3$. The reduced manifold $J^{-1}(\mu)/G_\mu$ is identified with the sphere $S^2_{\|\mu\|}$ in $\mathbb{R}^3$ of radius $\|\mu\|$ with the symplectic form $\omega_\mu = -dS/\|\mu\|$, where $dS$ is the standard area form on $S^2_{\|\mu\|}$ and $G_\mu \cong S^1$ is the group of rotations around the axis $\mu$. These concentric spheres are the coadjoint orbits of the Lie–Poisson space $\mathfrak{so}(3)^*$ and represent the level sets of the Casimir functions that are all smooth functions of $\|\Pi\|^2$, where $\Pi \in \mathbb{R}^3$ denotes the body angular momentum.

The Hamiltonian of the rigid body on the Lie–Poisson space $T^*SO(3)/SO(3) \cong \mathbb{R}^3$ is given by
\[ h(\Pi) := \frac{1}{2} \left( \frac{\Pi_1^2}{I_1} + \frac{\Pi_2^2}{I_2} + \frac{\Pi_3^2}{I_3} \right) \]
where $I_1, I_2, I_3 > 0$ are the principal moments of inertia of the body. Let $I := \mathrm{diag}(I_1, I_2, I_3)$ denote the moment of inertia tensor diagonalized in a principal-axis body frame. The Lie–Poisson bracket on $\mathbb{R}^3$ is given by $\{f, g\}(\Pi) = -\Pi \cdot (\nabla f(\Pi) \times \nabla g(\Pi))$ and the equations of motion are $\dot{\Pi} = \Pi \times \Omega$, where $\Omega \in \mathbb{R}^3$ is the body angular velocity given in terms of $\Pi$ by $\Omega_i := \Pi_i / I_i$, for $i = 1, 2, 3$, that is, $\Omega = I^{-1}\Pi$. The trajectories of these equations are found by intersecting a family of homothetic energy ellipsoids with the concentric angular momentum spheres. If $I_1 > I_2 > I_3$, one immediately sees that all orbits are periodic with the exception of four centers (the two possible rotations about the long and the short moment of inertia axis of the body), two saddles (the two rotations about the middle moment of inertia axis of the body), and four heteroclinic orbits connecting the two saddles.

Suppose that $\Pi(t)$ is a periodic orbit on the sphere $S^2_{\|\mu\|}$ with period $T$. After time $T$, by how much has the rigid body rotated in space? The answer to this question follows directly from [3]. Taking $\zeta = \mu/\|\mu\|$ and the potential $V \equiv 0$ we get
\[ \varphi = -\Lambda + \frac{2 h_\mu T}{\|\mu\|} = \iint_D \frac{2\|I\Pi(s)\|^2 - (\Pi(s) \cdot I\Pi(s))(\mathrm{tr}\, I)}{(\Pi(s) \cdot I\Pi(s))^2}\, ds + \|\mu\|^3 \int_0^T \frac{ds}{\Pi(s) \cdot I\Pi(s)} \]
where $D$ is one of the two spherical caps on $S^2_{\|\mu\|}$ whose boundary is the periodic orbit $\Pi(t)$, $h_\mu$ is the value of the total energy on the solution $\Pi(t)$, and $\Lambda$ is the oriented solid angle, that is,
\[ \Lambda := -\frac{1}{\|\mu\|} \iint_D \omega_\mu, \qquad |\Lambda| = \frac{\mathrm{area}\, D}{\|\mu\|^2} \]

Reconstruction Phases for the Heavy Top

The heavy top is a simple mechanical system with symmetry $S^1$ on $T^*SO(3)$ whose Hamiltonian function is given by $h(\alpha_h) := (1/2)\|\alpha_h\|^2 + Mg\ell\, \mathbf{k} \cdot h\chi$, where $h \in SO(3)$, $\alpha_h \in T_h^*SO(3)$, $\mathbf{k}$ is the unit vector of the spatial $Oz$ axis (pointing in the direction opposite to

that of the gravity force), $M \in \mathbb{R}$ is the total mass of the body, $g \in \mathbb{R}$ is the value of the gravitational acceleration, the fixed point about which the body moves is the origin, and $\chi$ is the unit vector of the straight line segment of length $\ell$ connecting the origin to the center of mass of the body. This Hamiltonian is left invariant under rotations about the spatial $Oz$ axis. A momentum map induced by this $S^1$-action is given by $J : T^*SO(3) \to \mathbb{R}$, $J(\alpha_h) = T_e^* L_h(\alpha_h) \cdot \mathbf{k}$; recall that $T_e^* L_h(\alpha_h) =: \Pi \in \mathbb{R}^3$ is the body angular momentum. The reduced space $J^{-1}(\mu)/S^1$ is generically the cotangent bundle of the unit sphere endowed with the symplectic structure given by the sum of the canonical form plus a magnetic term; equivalently, this is the coadjoint orbit in the dual of the Euclidean Lie algebra $\mathfrak{se}(3)^* = \mathbb{R}^3 \times \mathbb{R}^3$ given by $\mathcal{O}_\mu = \{(\Pi, \Gamma) \mid \Pi \cdot \Gamma = \mu, \|\Gamma\|^2 = 1\}$. The projection map $J^{-1}(\mu) \to \mathcal{O}_\mu$ implementing the symplectic diffeomorphism between the reduced space and the coadjoint orbit in $\mathfrak{se}(3)^*$ is given by $\alpha_h \mapsto (\Pi, \Gamma) := (T_e^* L_h(\alpha_h), h^{-1}\mathbf{k})$. The orbit symplectic form $\omega_\mu$ on $\mathcal{O}_\mu$ has the expression
\[ \omega_\mu(\Pi, \Gamma)\big( (\Pi \times x + \Gamma \times y, \Gamma \times x), (\Pi \times x' + \Gamma \times y', \Gamma \times x') \big) = -\Pi \cdot (x \times x') - \Gamma \cdot (x \times y' - x' \times y) \]
for any $x, x', y, y' \in \mathbb{R}^3$. The heavy-top equations $\dot{\Pi} = \Pi \times \Omega + Mg\ell\, \Gamma \times \chi$, $\dot{\Gamma} = \Gamma \times \Omega$ are Lie–Poisson equations on $\mathfrak{se}(3)^*$ for the Hamiltonian $h(\Pi, \Gamma) = (1/2)\, \Pi \cdot \Omega + Mg\ell\, \Gamma \cdot \chi$ and the Lie–Poisson bracket
\[ \{f, g\}(\Pi, \Gamma) = -\Pi \cdot (\nabla_\Pi f \times \nabla_\Pi g) - \Gamma \cdot (\nabla_\Pi f \times \nabla_\Gamma g - \nabla_\Pi g \times \nabla_\Gamma f), \]
where $\nabla_\Pi$ and $\nabla_\Gamma$ denote the partial gradients.

Let $(\Pi(t), \Gamma(t))$ be a periodic orbit of period $T$ of the heavy-top equations. After time $T$, by how much has the heavy top rotated in space? The answer is provided by [3]:
\[ \varphi = \frac{1}{\mu} \iint_{D_\mu} \omega_\mu + \frac{1}{\mu} \left( 2 h_\mu T - 2Mg\ell \int_0^T \Gamma(s) \cdot \chi\, ds \right) = \iint_D \frac{2\|I\Gamma(s)\|^2 - (\Gamma(s) \cdot I\Gamma(s))(\mathrm{tr}\, I)}{(\Gamma(s) \cdot I\Gamma(s))^2}\, ds + \mu \int_0^T \frac{ds}{\Gamma(s) \cdot I\Gamma(s)} \]
where $D$ is the spherical cap on the unit sphere whose boundary is the closed curve $\Gamma(t)$ and $D_\mu$ is a two-dimensional submanifold of the orbit $\mathcal{O}_\mu$ bounded by the closed integral curve $(\Pi(t), \Gamma(t))$. The first term in each expression represents the geometric phase and the second term the dynamic phase.

Gauged Poisson Structures

If the Lie group $G$ acts freely and properly on a smooth manifold $Q$, then $(T^*Q)/G$ is a quotient Poisson manifold (see Poisson Reduction), where the quotient is taken relative to the (left) lifted cotangent action. The leaves of this Poisson manifold are the orbit reduced spaces $J^{-1}(\mathcal{O}_\mu)/G$, where $\mathcal{O}_\mu \subset \mathfrak{g}^*$ is the coadjoint $G$-orbit through $\mu \in \mathfrak{g}^*$ (see Symmetry and Symplectic Reduction). Is there an explicit formula for this reduced Poisson bracket on a manifold diffeomorphic to $(T^*Q)/G$? It turns out that this question has two possible answers, once a connection on the principal bundle $\pi : Q \to Q/G$ is introduced. The discussion below will also link to the fibration version of cotangent bundle reduction.

In order to present these answers, we review two bundle constructions. Let $G$ act freely and properly on the manifold $P$ and consider the (left) principal $G$-bundle $\rho : P \to P/G := M$. Let $\tau : N \to M$ be a surjective submersion. Then the pullback bundle $\tilde{\rho} : (n, p) \in \tilde{P} := \{(n, p) \in N \times P \mid \rho(p) = \tau(n)\} \mapsto n \in N$ over $N$ is also a principal (left) $G$-bundle relative to the action $g \cdot (n, p) := (n, g \cdot p)$.

If there is a (left) $G$-action on a manifold $V$, then the diagonal $G$-action $g \cdot (p, v) = (g \cdot p, g \cdot v)$ on $P \times V$ is also free and proper and one can form the associated bundle $P \times_G V := (P \times V)/G$, which is a locally trivial fiber bundle $\rho_E : [p, v] \in E := P \times_G V \mapsto \rho(p) \in M$ over $M$ with fibers diffeomorphic to $V$. Analogously, one can form the associated fiber bundle $\rho_{\tilde{E}} : \tilde{E} := \tilde{P} \times_G V \to N$. Summarizing, the associated bundle $\tilde{E} = \tilde{P} \times_G V \to N$ is obtained from the principal bundle $\rho : P \to M$, the surjective submersion $\tau : N \to M$, and the $G$-manifold $V$ by pullback and association, in this order.

These operations can be reversed. First, form the associated bundle $\rho_E : E = P \times_G V \to M$ and then pull it back by the surjective submersion $\tau : N \to M$ to $N$ to get the pullback bundle $\tilde{\rho}_E : \tilde{E} \to N$. The map $\Phi : \tilde{P} \times_G V \to \tilde{E}$ defined by $\Phi([(n, p), v]) := (n, [p, v])$ is an isomorphism of locally trivial fiber bundles.

These general considerations will be used now to realize the quotient Poisson manifold $(T^*Q)/G$ in two different ways. Let $Q$ be a manifold and $G$ a Lie group (with Lie algebra $\mathfrak{g}$) acting freely and properly on it. Let $A \in \Omega^1(Q; \mathfrak{g})$ be a connection 1-form on the left $G$-principal bundle $\pi : Q \to Q/G$. Pull back the $G$-bundle $\pi : Q \to Q/G$ by the cotangent bundle projection $\pi_{Q/G} : T^*(Q/G) \to Q/G$ to $T^*(Q/G)$ to obtain the $G$-principal bundle $\tilde{\pi}_{Q/G} : (\alpha_{[q]}, q) \in \tilde{Q} := \{(\alpha_{[q]}, q) \mid [q] = \pi(q), q \in Q\} \mapsto \alpha_{[q]} \in T^*(Q/G)$. This bundle is isomorphic to the annihilator $(VQ)^\circ \subset T^*Q$ of the vertical bundle $VQ := \ker T\pi \subset TQ$. Next, form the coadjoint bundle $\pi_S : S := \tilde{Q} \times_G \mathfrak{g}^* \to T^*(Q/G)$ of $\tilde{Q}$, $\pi_S([(\alpha_{[q]}, q), \mu]) = \alpha_{[q]}$, that is, the associated vector bundle to the $G$-principal bundle $\tilde{Q} \to T^*(Q/G)$ given by the coadjoint representation of $G$ on $\mathfrak{g}^*$. The connection-dependent map $\Phi_A : S \to (T^*Q)/G$ defined by $\Phi_A([(\alpha_{[q]}, q), \mu]) := [T_q^*\pi(\alpha_{[q]}) + A(q)^*\mu]$, where $q \in Q$, $\alpha_{[q]} \in T_{[q]}^*(Q/G)$, and

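$\mu \in \mathfrak{g}^*$, is a vector bundle isomorphism over $Q/G$. The Sternberg space is the Poisson manifold $(S, \{\cdot, \cdot\}_S)$, where $\{\cdot, \cdot\}_S$ is the pullback to $S$ by $\Phi_A$ of the quotient Poisson bracket on $(T^*Q)/G$.

Next, we proceed in the opposite order. Construct first the coadjoint bundle $\rho_{\tilde{\mathfrak{g}}^*} : [q, \mu] \in \tilde{\mathfrak{g}}^* := Q \times_G \mathfrak{g}^* \mapsto [q] \in Q/G$ associated to the principal bundle $\pi : Q \to Q/G$ and then pull it back by the cotangent bundle projection $\pi_{Q/G} : T^*(Q/G) \to Q/G$ to $T^*(Q/G)$ to obtain the vector bundle $\rho_W : W := \{(\alpha_{[q]}, [q, \mu]) \mid \pi_{Q/G}(\alpha_{[q]}) = \rho_{\tilde{\mathfrak{g}}^*}([q, \mu]) = [q]\}$, $\rho_W(\alpha_{[q]}, [q, \mu]) = \alpha_{[q]}$ over $T^*(Q/G)$. Note that $W = T^*(Q/G) \oplus \tilde{\mathfrak{g}}^*$ and hence $W$ is also a vector bundle over $Q/G$. Let $HQ$ be the horizontal sub-bundle defined by the connection $A$; thus, $TQ = HQ \oplus VQ$, where $H_qQ := \ker A(q)$. For each $q \in Q$, the linear map $T_q\pi|_{H_qQ} : H_qQ \to T_{[q]}(Q/G)$ is an isomorphism. Let $\mathrm{hor}_q := (T_q\pi|_{H_qQ})^{-1} : T_{[q]}(Q/G) \to H_qQ \subset T_qQ$ be the horizontal lift operator induced by the connection $A$. Thus, $\mathrm{hor}_q^* : T_q^*Q \to T_{[q]}^*(Q/G)$ is a linear surjective map whose kernel is the annihilator $(H_qQ)^\circ$ of the horizontal space. The connection-dependent map $\Psi_A : (T^*Q)/G \to W$ defined by $\Psi_A([\alpha_q]) := (\mathrm{hor}_q^*(\alpha_q), [q, J(\alpha_q)])$, where $q \in Q$, $\alpha_q \in T_q^*Q$, and $J : T^*Q \to \mathfrak{g}^*$ is the momentum map of the lifted action, $\langle J(\alpha_q), \xi \rangle = \alpha_q(\xi_Q(q))$ for $\xi \in \mathfrak{g}$, is a vector bundle isomorphism over $Q/G$ and $\Psi_A \circ \Phi_A = \Phi$. The Weinstein space is the Poisson manifold $(W, \{\cdot, \cdot\}_W)$, where $\{\cdot, \cdot\}_W$ is the push-forward by $\Psi_A$ of the Poisson bracket of $(T^*Q)/G$. In particular, $\Phi : S \to W$ is a connection independent Poisson diffeomorphism. The Poisson brackets on $S$ and on $W$ are called gauged Poisson brackets. They are expressed explicitly in terms of various covariant derivatives induced on $S$ and on $W$ by the connection $A \in \Omega^1(Q; \mathfrak{g})$.

Recall that the connection $A$ on the principal bundle $\pi : Q \to Q/G$ naturally induces connections on pullback bundles and affine connections on associated vector bundles. Thus, both $S$ and $W$ carry covariant derivatives induced by $A$. They are given, according to general definitions, in the cases under consideration, by:

• If $f \in C^\infty(S)$, $s = [(\alpha_{[q]}, q), \mu] \in S$, and $v_{\alpha_{[q]}} \in T_{\alpha_{[q]}} T^*(Q/G)$, then $d^S_{\tilde{A}} f(s) \in T^*_{\alpha_{[q]}} T^*(Q/G)$ is defined by
\[ d^S_{\tilde{A}} f(s)(v_{\alpha_{[q]}}) := df(s)\Big( T_{((\alpha_{[q]}, q), \mu)} \pi_{\tilde{Q} \times \mathfrak{g}^*} \big( v_{\alpha_{[q]}}, \mathrm{hor}_q(T_{\alpha_{[q]}} \pi_{Q/G}(v_{\alpha_{[q]}})), 0 \big) \Big) \]
where $\pi_{\tilde{Q} \times \mathfrak{g}^*} : \tilde{Q} \times \mathfrak{g}^* \to \tilde{Q} \times_G \mathfrak{g}^* = S$ is the orbit map. The symbol $d^S_{\tilde{A}}$ signifies that this is a covariant derivative on the associated bundle $S$ induced by the connection $\tilde{A}$ on the principal $G$-pullback bundle $\tilde{Q} \to T^*(Q/G)$. This connection $\tilde{A}$ is the pullback connection defined by $A$.

• If $f \in C^\infty(W)$, $w = (\alpha_{[q]}, [q, \mu]) \in W$, and $v_{\alpha_{[q]}} \in T_{\alpha_{[q]}} T^*(Q/G)$, then $\tilde{\nabla}^W_A f(w) \in T^*_{\alpha_{[q]}} T^*(Q/G)$ is defined by
\[ \tilde{\nabla}^W_A f(w)(v_{\alpha_{[q]}}) = df(w)\Big( v_{\alpha_{[q]}}, T_{(q, \mu)} \pi_{Q \times \mathfrak{g}^*} \big( \mathrm{hor}_q(T_{\alpha_{[q]}} \pi_{Q/G}(v_{\alpha_{[q]}})), 0 \big) \Big) \]
where $\pi_{Q \times \mathfrak{g}^*} : Q \times \mathfrak{g}^* \to Q \times_G \mathfrak{g}^* = \tilde{\mathfrak{g}}^*$ is the orbit map. The symbol $\tilde{\nabla}^W_A$ signifies that this is a covariant derivative on the pullback bundle $W$ induced by the covariant derivative $\nabla_A$ on the coadjoint bundle $\tilde{\mathfrak{g}}^*$. This covariant derivative $\nabla_A$ is induced on $\tilde{\mathfrak{g}}^*$ by the connection $A$.

For $f \in C^\infty(W)$, we have $d^S_{\tilde{A}}(f \circ \Phi) = (\tilde{\nabla}^W_A f) \circ \Phi$.

To write the two gauged Poisson brackets on $S$ and on $W$ explicitly, we denote by $\tilde{\mathfrak{g}} = Q \times_G \mathfrak{g}$ the adjoint bundle of $\pi : Q \to Q/G$, by $\Omega_{Q/G}$ the canonical symplectic structure on $T^*(Q/G)$, by $B \in \Omega^2(Q; \mathfrak{g})$ the curvature of $A$, and by $\bar{B}$ the $\tilde{\mathfrak{g}}$-valued 2-form $\bar{B} \in \Omega^2(Q/G; \tilde{\mathfrak{g}})$ on the base $Q/G$ defined by $\bar{B}([q])(u_{[q]}, v_{[q]}) = [q, B(q)(u_q, v_q)]$, for any $u_q, v_q \in T_qQ$ that satisfy $T_q\pi(u_q) = u_{[q]}$ and $T_q\pi(v_q) = v_{[q]}$. Note that both $S^*$ and $W^*$ are Lie algebra bundles, that is, their fibers are Lie algebras and the fiberwise Lie bracket operation depends smoothly on the base point. If $f \in C^\infty(S)$, denote by $\delta f/\delta s \in S^* = \tilde{Q} \times_G \mathfrak{g}$ the usual fiber derivative of $f$. Similarly, if $f \in C^\infty(W)$ denote by $\delta f/\delta w \in W^*$ the usual fiber derivative of $f$. Finally, $\sharp : T^*(T^*(Q/G)) \to T(T^*(Q/G))$ is the vector bundle isomorphism induced by $\Omega_{Q/G}$. The Poisson bracket of $f, g \in C^\infty(S)$ is given by
\[ \{f, g\}_S(s) = \Omega_{Q/G}(\alpha_{[q]})\big( d^S_{\tilde{A}} f(s)^\sharp, d^S_{\tilde{A}} g(s)^\sharp \big) - \left\langle s, \left[ \frac{\delta f}{\delta s}, \frac{\delta g}{\delta s} \right] \right\rangle + \Big\langle v, (\pi_{Q/G}^* \bar{B})(\alpha_{[q]})\big( d^S_{\tilde{A}} f(s)^\sharp, d^S_{\tilde{A}} g(s)^\sharp \big) \Big\rangle \]
where $v = [q, \mu] \in \tilde{\mathfrak{g}}^*$. The Poisson bracket of $f, g \in C^\infty(W)$ is given by
\[ \{f, g\}_W(w) = \Omega_{Q/G}(\alpha_{[q]})\big( \tilde{\nabla}^W_A f(w)^\sharp, \tilde{\nabla}^W_A g(w)^\sharp \big) - \left\langle w, \left[ \frac{\delta f}{\delta w}, \frac{\delta g}{\delta w} \right] \right\rangle + \Big\langle v, (\pi_{Q/G}^* \bar{B})(\alpha_{[q]})\big( \tilde{\nabla}^W_A f(w)^\sharp, \tilde{\nabla}^W_A g(w)^\sharp \big) \Big\rangle \]
Note that their structure is of the form: ''canonical'' bracket plus a (left) ''Lie–Poisson'' bracket plus a curvature coupling term.

The Symplectic Leaves of the Sternberg and Weinstein Spaces

The map $\varphi_A : \tilde{Q} \times \mathfrak{g}^* \to T^*Q$ given by $\varphi_A((\alpha_{[q]}, q), \mu) := T_q^*\pi(\alpha_{[q]}) + A(q)^*\mu$, where $((\alpha_{[q]}, q), \mu) \in \tilde{Q} \times \mathfrak{g}^*$, is a $G$-equivariant diffeomorphism; the $G$-action on $T^*Q$ is by cotangent lift and on $\tilde{Q} \times \mathfrak{g}^*$ is $g \cdot ((\alpha_{[q]}, q), \mu) = ((\alpha_{[q]}, g \cdot q), \mathrm{Ad}^*_{g^{-1}}\mu)$. The pullback $J_A$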

of the momentum map to $\tilde{Q} \times \mathfrak{g}^*$ has the expression $J_A((\alpha_{[q]}, q), \mu) = \mu$, so if $\mathcal{O} \subset \mathfrak{g}^*$ is a coadjoint orbit we have $J_A^{-1}(\mathcal{O}) = \tilde{Q} \times \mathcal{O}$, and hence the orbit reduced manifold $J_A^{-1}(\mathcal{O})/G$, whose connected components are the symplectic leaves of $S$, equals $\tilde{Q} \times_G \mathcal{O}$. Its symplectic form is the Sternberg minimal coupling form $\tilde{\omega}^-_{\mathcal{O}} + \pi_S^* \Omega_{Q/G}$.

In this formula, the 2-form $\tilde{\omega}^-_{\mathcal{O}}$ has not been defined yet. It is uniquely defined by the identity $\pi^*_{\tilde{Q} \times \mathfrak{g}^*}\, \tilde{\omega}^-_{\mathcal{O}} = d\widehat{A} + \pi_{\mathcal{O}}^*\, \omega^-_{\mathcal{O}}$, where $\omega^-_{\mathcal{O}}$ is the minus orbit symplectic form on $\mathcal{O}$ (see Symmetry and Symplectic Reduction), $\pi_{\mathcal{O}} : \tilde{Q} \times \mathcal{O} \to \mathcal{O}$ is the projection on the second factor, and $\widehat{A} \in \Omega^1(\tilde{Q} \times \mathcal{O})$ is the 1-form given by $\widehat{A}((\alpha_{[q]}, q), \mu)((u_{\alpha_{[q]}}, v_q), \nu) = \langle \mu, A(q)(v_q) \rangle$ for $((\alpha_{[q]}, q), \mu) \in \tilde{Q} \times \mathcal{O}$, $(u_{\alpha_{[q]}}, v_q) \in T_{(\alpha_{[q]}, q)}\tilde{Q}$, and $\nu \in \mathfrak{g}^*$.

The symplectic leaves of the Weinstein space $W$ are obtained by pushing forward by $\Phi$ the symplectic leaves of the Sternberg space. They are the connected components of the symplectic manifolds $\big( T^*(Q/G) \oplus (Q \times_G \mathcal{O}),\ \Pi^*_{T^*(Q/G)}\Omega_{Q/G} + \Pi^*_{Q \times_G \mathcal{O}}\, \omega^-_{Q \times_G \mathcal{O}} \big)$, where $\mathcal{O}$ is a coadjoint orbit in $\mathfrak{g}^*$, $\Omega_{Q/G}$ is the canonical symplectic form on $T^*(Q/G)$, $\omega^-_{Q \times_G \mathcal{O}}$ is a closed 2-form on $Q \times_G \mathcal{O}$ to be defined below, and $\Pi_{T^*(Q/G)} : T^*(Q/G) \oplus (Q \times_G \mathcal{O}) \to T^*(Q/G)$, $\Pi_{Q \times_G \mathcal{O}} : T^*(Q/G) \oplus (Q \times_G \mathcal{O}) \to Q \times_G \mathcal{O}$ are the projections. The closed 2-form $\omega^-_{Q \times_G \mathcal{O}} \in \Omega^2(Q \times_G \mathcal{O})$ is uniquely determined by the identity $\pi^*_{Q \times \mathcal{O}}\, \omega^-_{Q \times_G \mathcal{O}} = \omega^-_{Q \times \mathcal{O}}$, where $\pi_{Q \times \mathcal{O}} : Q \times \mathcal{O} \to Q \times_G \mathcal{O}$ is the orbit space projection, $\omega^-_{Q \times \mathcal{O}} \in \Omega^2(Q \times \mathcal{O})$ is closed and given by
\[ \omega^-_{Q \times \mathcal{O}}(q, \mu)\big( (u_q, \mathrm{ad}^*_\xi \mu), (v_q, \mathrm{ad}^*_\eta \mu) \big) := d(A \times \mathrm{id}_{\mathcal{O}})(q, \mu)\big( (u_q, \mathrm{ad}^*_\xi \mu), (v_q, \mathrm{ad}^*_\eta \mu) \big) + \omega^-_{\mathcal{O}}(\mu)(\mathrm{ad}^*_\xi \mu, \mathrm{ad}^*_\eta \mu), \]
and $A \times \mathrm{id}_{\mathcal{O}} \in \Omega^1(Q \times \mathfrak{g}^*)$ is given by $(A \times \mathrm{id}_{\mathcal{O}})(q, \mu)(u_q, \mathrm{ad}^*_\xi \mu) = \langle \mu, A(q)(u_q) \rangle$, for $q \in Q$, $\mu \in \mathfrak{g}^*$, $u_q, v_q \in T_qQ$, $\xi, \eta \in \mathfrak{g}$.

Thus, on the Sternberg and Weinstein spaces, both the Poisson bracket as well as the symplectic form on the leaves have explicit connection dependent formulas (see Gauge Theory: Mathematical Applications for a general treatment of gauge theories).

See also: Gauge Theory: Mathematical Applications; Hamiltonian Group Actions; Poisson Reduction; Symmetries and Conservation Laws; Symmetry and Symplectic Reduction.

Further Reading

Abraham R and Marsden JE (1978) Foundations of Mechanics, 2nd edn. Reading, MA: Addison-Wesley.
Guichardet A (1984) On rotation and vibration motions of molecules. Annales de l'Institut Henri Poincaré. Physique Théorique 40: 329–342.
Iwai T (1987) A geometric setting for classical molecular dynamics. Annales de l'Institut Henri Poincaré. Physique Théorique 47: 199–219.
Kummer M (1981) On the construction of the reduced phase space of a Hamiltonian system with symmetry. Indiana University Mathematics Journal 30: 281–291.
Lewis D, Marsden JE, Montgomery R, and Ratiu TS (1986) The Hamiltonian structure for dynamic free boundary problems. Physica D 18: 391–404.
Marsden JE, Montgomery R, and Ratiu TS (1990) Reduction, symmetry, and phases in mechanics. Memoirs of the American Mathematical Society 88(436).
Marsden JE, Misiołek G, Ortega J-P, Perlmutter M, and Ratiu TS (2005) Hamiltonian Reduction by Stages, Lecture Notes in Mathematics. Springer.
Marsden JE and Perlmutter M (2000) The orbit bundle picture of cotangent bundle reduction. Comptes Rendus Mathématiques de l'Académie des Sciences. La Société Royale du Canada 22: 33–54.
Marsden JE and Ratiu TS (2003) Introduction to Mechanics and Symmetry, 2nd edn. second printing; 1st edn. (1994), Texts in Applied Mathematics, vol. 17. New York: Springer.
Montgomery R (1984) Canonical formulations of a particle in a Yang–Mills field. Letters in Mathematical Physics 8: 59–67.
Montgomery R (1991) How much does a rigid body rotate? A Berry's phase from the eighteenth century. American Journal of Physics 59: 394–398.
Montgomery R, Marsden JE, and Ratiu TS (1984) Gauged Lie Poisson structures. In: Marsden J (ed.) Fluids and Plasmas: Geometry and Dynamics, Contemporary Mathematics, vol. 28, pp. 101–114. Providence, RI: American Mathematical Society.
Satzer WJ (1977) Canonical reduction of mechanical systems invariant under abelian group actions with an application to celestial mechanics. Indiana University Mathematics Journal 26: 951–976.
Simo JC, Lewis D, and Marsden JE (1991) The stability of relative equilibria. Part I: The reduced energy–momentum method. Archive for Rational Mechanics and Analysis 115: 15–59.
Smale S (1970) Topology and mechanics. Inventiones Mathematicae 10: 305–331, 11: 45–64.
Sternberg S (1977) Minimal coupling and the symplectic mechanics of a classical particle in the presence of a Yang–Mills field. Proceedings of the National Academy of Sciences 74: 5253–5254.
Weinstein A (1978) A universal phase space for particles in Yang–Mills fields. Letters in Mathematical Physics 2: 417–420.
Zaalani N (1999) Phase space reduction and Poisson structure. Journal of Mathematical Physics 40(7): 3431–3438.

Critical Phenomena in Gravitational Collapse


C Gundlach, University of Southampton, Southampton, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Sufficiently dense concentrations of mass–energy in general relativity collapse irreversibly and form black holes. More precisely, the singularity theorems state that once a closed trapped surface has developed, some world lines will only extend to a finite length in the future: they end in a spacetime singularity. Furthermore, the cosmic censorship hypothesis states that this singularity is hidden away inside a black hole. One can, therefore, classify initial data in general relativity which describe an isolated system with no black hole present into those which remain regular, and those which form a black hole during their evolution. Theorems on the stability of Minkowski spacetime, and similar results for some types of matter coupled to gravity, imply that sufficiently weak (in some technical sense) initial data will remain regular. On the other hand, no necessary or sufficient criterion for black hole formation is known. For very strong data the existence of a closed trapped surface implies black hole formation, but although the data themselves may be regular, the trapped surface must already be inside the black hole. Between the very weak and very strong regime, there is a middle regime of initial data for which one cannot decide if they will or will not form a black hole, other than evolving them in time.

The threshold between collapse and dispersion was first explored systematically by Choptuik (1992). He concentrated on the simple model of a spherically symmetric massless scalar (matter) field $\phi(r, t)$. In this model, the scalar-field matter must either form a black hole, or disperse to infinity; it cannot form stable stars. Choptuik explored the space of initial data by means of one-parameter families of initial data which interpolate between strong data (say with large parameter $p$) that form a black hole and weak data (with small $p$) that disperse. The critical value $p_*$ of the parameter $p$ can be found for each family by evolving many data sets from that family. Near the black hole threshold, Choptuik found the following phenomena:

1. Mass scaling. By fine-tuning the initial data to the threshold along any one-parameter family, one can make arbitrarily small black holes. Near the threshold, the black hole mass $M$ scales as
\[ M \simeq C (p - p_*)^\gamma \quad \text{for } p \gtrsim p_* \qquad [1] \]
in the limit $p \to p_*$ from above.

2. Universality. While $p_*$ and $C$ depend on the particular one-parameter family of data, the critical exponent $\gamma$ has a universal value, $\gamma \simeq 0.374$, for all one-parameter families of scalar-field data. Furthermore, for a finite time in a finite region of space, the solutions generated by all near-critical data approach one and the same solution $\phi_*$, called the critical solution:
\[ \phi(r, t) \simeq \phi_*\!\left( \frac{r}{L}, \frac{t - t_*}{L} \right) \qquad [2] \]
The constants $t_*$ and $L$ depend again on the family of initial data, but $\phi_*(r, t)$ is universal. This universal phase ends when the evolution decides between black hole formation and dispersion. The universal critical solution is approached by any initial data that are sufficiently close to the black hole threshold, on either side, and from any one-parameter family.

3. Scale-echoing. The critical solution $\phi_*(r, t)$ is unchanged when one rescales space and time by a factor $e^\Delta$:
\[ \phi_*(r, t) = \phi_*\!\left( e^\Delta r, e^\Delta t \right) \qquad [3] \]
where $\Delta \simeq 3.44$ for the scalar field.

The same phenomena were quickly discovered in many other types of matter coupled to gravity, and even in vacuum gravity (where gravitational waves can form black holes). The echoing period $\Delta$ and critical exponent $\gamma$ depend on the type of matter, but the existence of the phenomena appears to be generic. For some types of matter (e.g., perfect fluid matter), the critical solution is continuously scale invariant (or continuously self-similar, CSS) in the sense that
\[ \phi_*(r, t) = \phi_*(r/t) \qquad [4] \]
rather than scale-periodic (or discretely self-similar, DSS) as in [3]. (We use the notation $\phi_*(x)$ for the function of one variable $r/t$.) We have described scale invariance and scale-echoing here in terms of coordinates, but these do admit geometric, coordinate-invariant definitions, which are not restricted to spherical symmetry.

There is also another kind of critical behavior at the black hole threshold. Here, too, the evolution goes through a universal critical solution, but it is static, rather than scale invariant. As a consequence, the mass of black holes near the threshold takes a universal finite value (some fixed fraction of the mass of the critical solution), instead of showing power-law
scaling. In an analogy with first- and second-order phase transitions in statistical mechanics, the critical phenomena with a finite mass at the black hole threshold are called type I, and the critical phenomena with power-law scaling of the mass are called type II.

At this point, we characterize the degree of rigor of the various parts of the theory that is summarized in this article. Critical phenomena were discovered in the numerical time evolution of generic asymptotically flat initial data. Numerical evolution of many elements of a specific one-parameter family, and fine-tuning to the black hole threshold along that family showed self-similarity and mass scaling near the threshold. Doing this for a number of randomly chosen one-parameter families suggests that these phenomena, and in particular the echoing scale $\Delta$ and mass-scaling exponent $\gamma$, are universal between initial data within one model (e.g., the spherical scalar field). Numerical experiments, however, can only explore a finite-dimensional subspace of the infinite-dimensional space of initial data (phase space) of the field theory, and so cannot prove universality.

We go further by applying the theory of dynamical systems to general relativity. The arguments summarized in the next section would be difficult to make rigorous, as the dynamical system under consideration is infinite dimensional, but they suggest a focus on fixed points of the dynamical system and their linear perturbations. Even though the dynamical systems motivation is not mathematically rigorous, the linearized analysis itself is a well-defined problem that can be solved numerically to essentially arbitrary precision. This proves universality on a perturbative level, and provides numerical values of $\Delta$ and $\gamma$. A combination of the global dynamical systems analysis and perturbative analysis even predicts further critical exponents for black hole charge and angular momentum. Finally, critical phenomena have been discovered in a number of systems (different types of matter and symmetry restrictions), and this suggests that they may be generic for some large class of field theories (although details such as the numerical values of $\Delta$ and $\gamma$ do depend on the system), but there is no conclusive evidence for this at present.

The Dynamical Systems Picture

When we consider general relativity as an infinite-dimensional dynamical system, a solution curve is a spacetime. Points along the curve are Cauchy surfaces in the spacetime, which can be thought of as moments of time. An important difference between general relativity and other field theories is that the same spacetime can be sliced in many different ways, none of which is preferred. Therefore, to turn general relativity into a dynamical system, one has to fix a slicing (and in practice also coordinates on each slice). In the example of the spherically symmetric massless scalar field, using polar slicing and an area radial coordinate $r$, a point in phase space can be characterized by the two functions

$$Z = \left( \phi(r),\; r\,\frac{\partial \phi}{\partial t}(r) \right) \qquad [5]$$

In spherical symmetry, there are no degrees of freedom in the gravitational field, and Cauchy data for the metric can be reconstructed from $Z$ using the Einstein constraints.

The phase space consists of two halves: initial data whose time evolution always remains regular, and data which contain a black hole or form one during time evolution. The numerical evidence collected from individual one-parameter families of data suggests that the black hole threshold that separates the two is a smooth hypersurface. The mass-scaling law [1] can, therefore, be restated without explicit reference to one-parameter families. Let $P$ be any function on phase space such that data sets with $P > 0$ form black holes, and data with $P < 0$ do not, and which is analytic in a neighborhood of the black hole threshold $P = 0$. The black hole mass as a function on phase space is then given by

$$M \simeq F(P)\,P^{\gamma} \qquad [6]$$

for $P > 0$, where $F(P) > 0$ is an analytic function.

Consider now the time evolution in this dynamical system, near the threshold ("critical surface") between black hole formation and dispersion. A phase-space trajectory that starts out in a critical surface by definition never leaves it. A critical surface is, therefore, a dynamical system in its own right, with one dimension fewer. If it has an attracting fixed point, such a point is called a critical point. It is an attractor of codimension 1, and the critical surface is its basin of attraction. The fact that the critical solution is an attractor of codimension 1 is visible in its linear perturbations: it has an infinite number of decaying perturbation modes tangential to (and spanning) the critical surface, and a single growing mode not tangential to the critical surface.

Any trajectory beginning near the critical surface, but not necessarily near the critical point, moves almost parallel to the critical surface toward the critical point. As the phase point approaches the critical point, its movement parallel to the surface
slows down, while its distance and velocity out of the critical surface are still small. The phase point spends some time moving slowly near the critical point. Eventually, it moves away from the critical point in the direction of the growing mode, and ends up on an attracting fixed point.

[Figure 1: schematic phase-space diagram. Labels: flat space fixed point; black hole fixed point; black hole threshold; critical point; one-parameter family of initial data, with points $p < p_*$, $p = p_*$, and $p > p_*$.]

Figure 1 The phase-space picture for the black hole threshold in the presence of a critical point. The arrow lines are time evolutions, corresponding to spacetimes. The line without an arrow is not a time evolution, but a one-parameter family of initial data that crosses the black hole threshold at $p = p_*$. (Reproduced with permission from Gundlach C (2003) Critical phenomena in gravitational collapse. Physics Reports 376: 339–405.)

This is the origin of universality: any initial data set that is close to the black hole threshold (on either side) evolves to a spacetime that approximates the critical spacetime for some time. When it finally approaches either the dispersion fixed point or the black hole fixed point, it does so on a trajectory that appears to be coming from the critical point itself. All near-critical solutions are passing through one of these two funnels. All details of the initial data have been forgotten, except for the distance from the black hole threshold: the closer the initial phase point is to the critical surface, the more the solution curve approaches the critical point, and the longer it will remain close to it.

In all systems that have been examined, the black hole threshold contains at least one critical point. A fixed point of the dynamical system represents a spacetime with an additional continuous symmetry that generic solutions do not have. If the critical spacetime is time independent in the usual sense, we have type I critical phenomena; if the symmetry is scale invariance, we have type II critical phenomena. The attractor within the critical surface may also be a limit cycle, rather than a fixed point. In spacetime terms this corresponds to a discrete symmetry (DSS rather than CSS in type II, or a pulsating critical solution, rather than a stationary one, in type I).

Self-Similarity and Mass Scaling

Type II critical phenomena occur where the critical solution is scale invariant (self-similar, CSS or DSS). Using suitable spacetime coordinates, a CSS solution can be characterized as independent of a time coordinate $\tau$ which is also a logarithmic scale. Similarly, a DSS solution can be characterized as periodic in $\tau$. For example, starting from the scale periodicity [3] in polar-radial coordinates, we replace $r$ and $t$ by new coordinates

$$x \equiv \frac{r}{t_* - t}, \qquad \tau \equiv -\ln\frac{t_* - t}{L} \qquad [7]$$

where the accumulation time $t_*$ and scale $L$ must be matched to the one-parameter family under consideration. $\tau$ has been defined so that it increases as $t$ increases and approaches $t_*$ from below. It is useful to think of $r$, $t$, and $L$ as having dimension length in units $c = G = 1$, and of $x$ and $\tau$ as dimensionless. Choptuik's observation, expressed in these coordinates, is that in any near-critical solution there is a spacetime region where the fields $Z$ are well approximated by the critical solution, or

$$Z(x, \tau) \simeq Z_*(x, \tau) \qquad [8]$$

with

$$Z_*(x, \tau + \Delta) = Z_*(x, \tau) \qquad [9]$$

Note that the time parameter of the dynamical system must be chosen as $\tau$ if a CSS solution is to be a fixed point, or a DSS solution a cycle. More generally (going beyond spherical symmetry), on any self-similar spacetime one can introduce coordinates $x^{\mu} = (\tau, x^1, x^2, x^3)$ in which the metric is of the form

$$g_{\mu\nu} = e^{-2\tau}\,\bar{g}_{\mu\nu} \qquad [10]$$

and where $\bar{g}_{\mu\nu}$ is independent of $\tau$ for a CSS spacetime, and periodic in $\tau$ for a DSS spacetime. These coordinates are not unique.

The critical exponent $\gamma$ can be calculated from the linear perturbations of the critical solution. In order to keep the notation simple, the discussion will be restricted to a critical solution that is spherically symmetric and CSS, which is correct, for example, for perfect-fluid matter.

Let us assume that we have fine-tuned initial data close to the black hole threshold so that in a region the resulting spacetime is well approximated by the CSS critical solution. This part of the spacetime
corresponds to the section of the phase-space trajectory that lingers near the critical point. In this region, we can linearize around $Z_*$. As $Z_*$ does not depend on $\tau$, its linear perturbations can depend on $\tau$ only exponentially. Labeling the perturbation modes by $i$, a single mode perturbation is of the form

$$\delta Z = C_i\, e^{\lambda_i \tau}\, Z_i(x) \qquad [11]$$

In the near-critical regime, we can therefore approximate the solution as

$$Z(x, \tau) \simeq Z_*(x) + \sum_{i=0}^{\infty} C_i(p)\, e^{\lambda_i \tau}\, Z_i(x) \qquad [12]$$

The notation $C_i(p)$ is used because the perturbation amplitudes $C_i$ depend on the initial data, and hence on the parameter $p$ that controls the initial data.

If $Z_*$ is a critical solution, by definition there is exactly one $\lambda_i$ with positive real part (in fact, it is purely real), say $\lambda_0$. As $t \to t_*$ from below, which corresponds to $\tau \to \infty$, all other perturbations decay and can be neglected. By definition, the critical solution corresponds to $p = p_*$, and so we must have $C_0(p_*) = 0$. Linearizing around $p_*$, we obtain

$$Z(x, \tau) \simeq Z_*(x) + \left.\frac{dC_0}{dp}\right|_{p_*} (p - p_*)\, e^{\lambda_0 \tau}\, Z_0(x) \qquad [13]$$

in a region of the spacetime.

Now we extract Cauchy data at one particular value of $\tau$ within that region, namely at $\tau_p$ defined by

$$\left.\frac{dC_0}{dp}\right|_{p_*} |p - p_*|\, e^{\lambda_0 \tau_p} \equiv \epsilon \qquad [14]$$

where $\epsilon$ is an arbitrary small constant, so that

$$Z(x, \tau_p) \simeq Z_*(x) \pm \epsilon\, Z_0(x) \qquad [15]$$

where $\pm$ is the sign of $p - p_*$, left behind because by definition $\epsilon$ is positive. As $\tau$ increases from $\tau_p$, the growing perturbation becomes nonlinear and the approximation [13] breaks down. Then either a black hole forms (say for the positive sign), or the solution disperses (for the negative sign). We need not follow this nonlinear evolution in detail to find the black hole mass scaling in the former case: dimensional analysis is sufficient. Going back to coordinates $t$ and $r$, we have

$$Z(r, t_p) \simeq Z_*\!\left(\frac{r}{L_p}\right) \pm \epsilon\, Z_0\!\left(\frac{r}{L_p}\right) \qquad [16]$$

where

$$L_p \equiv L\, e^{-\tau_p} \qquad [17]$$

These Cauchy data at $t = t_p$ depend on the initial data at $t = 0$ only through the overall scale $L_p$, and through the sign in front of $\epsilon$. If the field equations themselves are scale invariant, or asymptotically scale invariant at scales $L_p$ and smaller, the black hole mass, which has dimensions of length in gravitational units, must be proportional to the initial data scale $L_p$, the only length scale that is present. Solving [14] for $\tau_p$ and substituting into [17] gives $L_p \propto |p - p_*|^{1/\lambda_0}$. Therefore,

$$M \propto L_p \propto (p - p_*)^{1/\lambda_0} \qquad [18]$$

and we have found the critical exponent to be $\gamma = 1/\lambda_0$.

The Analogy with Statistical Mechanics

The existence of a threshold where a qualitative change takes place, universality, scale invariance, and critical exponents suggest that there is a mathematical analogy between type II critical phenomena and critical phase transitions in statistical mechanics.

In equilibrium statistical mechanics, observable macroscopic quantities, such as the magnetization of a ferromagnetic material, are derived as statistical averages over microstates of the system. The expected value of an observable is

$$\langle A \rangle = \sum_{\text{microstates}} A(\text{microstate})\, e^{-H(\text{microstate},\, \mu)} \qquad [19]$$

The Hamiltonian $H$ depends on the parameters $\mu$, which comprise the temperature, parameters characterizing the system such as interaction energies of the constituent molecules, and macroscopic forces such as the external magnetic field. The objective of statistical mechanics is to derive relations between the macroscopic quantities $A$ and parameters $\mu$.

Phase transitions in thermodynamics are thresholds in the space of external forces $\mu$ at which the macroscopic observables $A$, or one of their derivatives, change discontinuously. In a ferromagnetic material at high temperatures, the magnetization $m$ of the material (alignment of atomic spins) is determined by the external magnetic field $B$. At low temperatures, the material shows a spontaneous magnetization even at zero external field, which breaks rotational symmetry. With increasing temperature, the spontaneous magnetization $m$ decreases and vanishes at the Curie temperature $T_*$ as

$$|m| \sim (T_* - T)^{\gamma} \qquad [20]$$

In the presence of a very weak external field, the spontaneous magnetization aligns itself with the external field $B$, while its strength is, to leading order, independent of $B$. The function $m(B, T)$,
therefore, changes discontinuously at $B = 0$. The line $B = 0$ for $T < T_*$ is, therefore, a line of first-order phase transitions between the possible directions of the spontaneous magnetization (in a one-dimensional system, between $m$ up and $m$ down). This line ends at the critical point ($B = 0$, $T = T_*$) where the order parameter $|m|$ vanishes. The role of $B = 0$ as the critical value of $B$ is obscured by the fact that $B = 0$ is singled out by symmetry.

A critical phase transition involves scale-invariant physics. One sign of this is that fluctuations appear on a large range of length scales between the underlying atomic scale and the scale of the sample. In particular, the atomic scale, and any dimensionful parameters associated with that scale, must become irrelevant at the critical point. This can be taken as the starting point for obtaining properties of the system at the critical point.

One first defines a semigroup acting on microstates: the renormalization group. Its action is to group together a small number of particles as a single particle of a fictitious new system, using some averaging procedure. Alternatively, this can also be done in Fourier space. One then defines a dual action of the renormalization group on the space of Hamiltonians by demanding that the partition function is invariant under the renormalization group action:

$$\sum_{\text{microstates}} e^{-H} = \sum_{\text{microstates}'} e^{-H'} \qquad [21]$$

The renormalized Hamiltonian $H'$ is in general more complicated than the original one, but it can be approximated by a fixed expression where only a finite number of parameters $\mu$ are adjusted. Fixed points of the renormalization group correspond to Hamiltonians with the parameters $\mu$ at their critical values. The critical value of any dimensional parameter $\mu$ must be zero (or infinity). Only dimensionless combinations can have nontrivial critical values.

The behavior of thermodynamical quantities at the critical point is in general not trivial to calculate. But the action of the renormalization group on length scales is given by its definition. The blowup of the correlation length $\xi$ at the critical point is, therefore, the easiest critical exponent to calculate.

We make contact with critical phenomena in gravitational collapse by considering the time evolution in coordinates $(\tau, x)$ as a renormalization group action. The calculation of the critical exponent for the black hole mass $M$ is the precise analog of the calculation of the critical exponent for the correlation length $\xi$, substituting $T_* - T$ for $p - p_*$, and taking into account that the $\tau$-evolution in critical collapse is toward smaller scales, while the renormalization group flow goes toward larger scales: therefore, $\xi$ diverges at the critical point, while $M$ vanishes.

We have shown above that the black hole mass is controlled by one global function $P$ on phase space. Clearly, $P$ is the gravity equivalent of $T_* - T$ in the ferromagnet. But it is tempting to speculate (Gundlach 2002) that there is also a gravity equivalent of the external magnetic field $B$, which gives rise to a second independent critical exponent. At least in some situations, the angular momentum of the initial data can play this role. Note that, like $B$, angular momentum is a vector, with a critical value that is zero because all other values break rotational symmetry. Furthermore, the final black hole can have nonvanishing angular momentum, which must depend on the angular momentum of the initial data. The former is analogous to the magnetization $m$, the latter to the external field $B$. It can be shown that this analogy holds perturbatively for small angular momentum. Future numerical simulations will show if it goes further.

Universality and Cosmic Censorship

Critical phenomena in gravitational collapse first generated interest because a complicated self-similar structure and dimensionless numbers $\Delta$ and $\gamma$ arise from generic initial data evolved by quite simple field equations. Another point of interest is the rather detailed analogy of phenomena in a deterministic field theory with critical phase transitions in statistical mechanics. But critical phenomena are important for general relativity mostly for a different reason.

Black holes are among the most important solutions of general relativity because of their universality: the black hole uniqueness theorems state that stable black holes are completely determined by their mass, angular momentum, and electric charge – the Kerr–Newman family of black holes. Perturbation theory shows that any perturbations of black holes from the Kerr–Newman solutions must be radiated away.

Critical solutions have a similar importance because they are generic intermediate states of the evolution that are also independent of the initial data. An important distinction is that critical solutions depend on the matter model, and are therefore less universal than black holes. However, critical phenomena in gravitational collapse seem to arise in axisymmetric vacuum spacetimes, and so are apparently not linked to the
presence of matter. Furthermore, they also arise in perfect-fluid matter with the equation of state $p = \rho/3$, which is that of an ultrarelativistic gas. This is a good approximation for matter at very high density, such as in the big bang. This is important because critical phenomena probe arbitrarily large matter densities or spacetime curvatures as the initial data are fine-tuned to the black hole threshold. At even higher densities, presumably on the Planck scale, scale invariance is again broken by quantum-gravity effects, and so critical phenomena will end there.

The cosmic censorship conjecture states that naked singularities do not arise from suitably generic initial data for suitably well-behaved matter. Critical phenomena in gravitational collapse have forced a tightening of this conjecture. Type II (self-similar) critical solutions contain a naked singularity, that is, a point of infinite spacetime curvature from which information can reach a distant observer. (By contrast, the singularity inside a black hole is hidden from distant observers.) On a kinematical level, this could be seen already from the form [10] of the metric. Because the critical solution is the end state for all initial data that are exactly on the black hole threshold, all initial data on the black hole threshold form a naked singularity. As type II critical phenomena appear to be generic at least in spherical symmetry, this means that in generic self-gravitating systems, the space of regular initial data that form naked singularities is larger than expected, namely of codimension 1. Excluding naked singularities from generic initial data may be the sharpest version of cosmic censorship one can now hope to prove.

Another point of interest in critical collapse is that it allows one to make a small region of arbitrarily high curvature from finite-curvature initial data. This may be a route for probing quantum-gravity effects. Similarly, one can make black holes that are much smaller than any length scale present in the initial data or the matter equation of state. An application has been suggested for this in cosmology, where primordial black holes could have masses much smaller than the Hubble scale at which they are created, rather than of the order of this scale.

Outlook

Critical phenomena in gravitational collapse are now well understood in spherical symmetry, both theoretically and in numerical simulations. In some matter models, the phenomenology is quite complicated, but it still fits into the basic picture outlined here.

The crucial question as to what happens beyond spherical symmetry remains largely unanswered at the time of writing. Perturbation theory around spherical symmetry suggests that critical phenomena are not restricted to exactly spherical situations. This is also supported by simulations in axisymmetric (highly nonspherical) vacuum gravity. Other simulations of nonspherical gravitational collapse which cover the necessary range of spacetime scales required to see critical phenomena are only just becoming available, and the results are not yet clear-cut. For collapse with angular momentum, no high-resolution calculations have yet been carried out. As the necessary techniques become available, one should be prepared for numerical simulations to make dramatic extensions or corrections to the picture of critical collapse drawn up here.

See also: Computational Methods in General Relativity: The Theory; Spacetime Topology, Causal Structure and Singularities; Stability of Minkowski Space; Stationary Black Holes.

Further Reading

Abrahams AM and Evans CR (1993) Critical behavior and scaling in vacuum axisymmetric gravitational collapse. Physical Review Letters 70: 2980–2983.
Choptuik MW (1993) Universality and scaling in gravitational collapse of a massless scalar field. Physical Review Letters 70: 9–12.
Choptuik MW (1999) Critical behavior in gravitational collapse. Progress of Theoretical Physics 136 (suppl.): 353–365.
Evans CR and Coleman JS (1994) Critical phenomena and self-similarity in the gravitational collapse of radiation fluid. Physical Review Letters 72: 1782–1785.
Gundlach C (1999) Living Reviews in Relativity 2: 4 (published electronically at http://www.livingreviews.org).
Gundlach C (2002) Critical gravitational collapse with angular momentum: from critical exponents to universal scaling functions. Physical Review D 65: 064019.
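As a numerical footnote to the scaling argument of eqns [14]–[18], the following minimal Python sketch solves [14] for $\tau_p$, forms $L_p$ from [17], and fits the slope of $\ln L_p$ against $\ln(p - p_*)$, which should reproduce $\gamma = 1/\lambda_0$. The function names, the value of `LAMBDA0`, and the defaults for $dC_0/dp$, $\epsilon$, and $L$ are hypothetical choices made only for illustration; none of them is taken from the article.

```python
import math


def tau_p(delta_p, lambda0, dC0_dp=1.0, eps=1e-3):
    """Solve eqn [14] for tau_p: dC0_dp * |delta_p| * exp(lambda0 * tau_p) = eps."""
    return math.log(eps / (dC0_dp * abs(delta_p))) / lambda0


def length_scale(delta_p, lambda0, L=1.0):
    """L_p = L * exp(-tau_p), eqn [17]; by eqn [18] the mass M is proportional to it."""
    return L * math.exp(-tau_p(delta_p, lambda0))


def fitted_exponent(lambda0):
    """Least-squares slope of ln(L_p) against ln(p - p_*) over several decades."""
    xs = [math.log(10.0 ** (-k)) for k in range(3, 13)]
    ys = [math.log(length_scale(math.exp(x), lambda0)) for x in xs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


LAMBDA0 = 2.5  # hypothetical growth rate of the unstable mode (assumption)
gamma = fitted_exponent(LAMBDA0)
print(f"fitted gamma = {gamma:.6f}, 1/lambda0 = {1 / LAMBDA0:.6f}")
```

The fit is exact up to rounding because the toy model contains only the single growing mode; in a real simulation the slope would be read off from measured black hole masses instead.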
Current Algebra
G A Goldin, Rutgers University, Piscataway, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Certain commutation relations among the current density operators in quantum field theories define an infinite-dimensional Lie algebra. The original current algebra of Gell-Mann described weak and electromagnetic currents of the strongly interacting particles (hadrons), leading to the Adler–Weisberger formula and other important physical results. This helped inspire mathematical and quantum-theoretic developments such as the Sugawara model, light cone currents, Virasoro algebra, the mathematical theory of affine Kac–Moody algebras, and nonrelativistic current algebra in quantum and statistical physics. Lie algebras of local currents may be the infinitesimal representations of loop groups, local current groups or gauge groups, diffeomorphism groups, and their semidirect products or other extensions. Broadly construed, current algebra thus leads directly into the representation theory of infinite-dimensional groups and algebras. Applications have ranged across conformally invariant field theory, vertex operator algebras, exactly solvable lattice and continuum models in statistical physics, exotic particle statistics and q-commutation relations, hydrodynamics and quantized vortex motion. This brief survey describes but a few highlights.

Relativistic Local Current Algebra for Hadrons

To model superfluidity, Landau had proposed in 1941 a quantum hydrodynamics fundamentally based on local fluid densities and currents as (operator) dynamical variables. However, current algebra came into its own in theoretical physics with the ideas of Gell-Mann in the early 1960s. The basic concept, in the era just preceding quantum chromodynamics (QCD), was that even without knowing the Lagrangian governing hadron dynamics in detail, exact kinematical information – the local symmetry – could still be encoded in an algebra of currents. The local (vector and axial vector) current density operators, expressed where possible in terms of underlying quantized field operators in Hilbert space, were to form two octets of Lorentz 4-vectors, with each octet corresponding to the eight generators of the compact Lie group SU(3).

More specifically (Adler and Dashen 1968), let $\mathcal{F}^{\mu}_a(x)$, $a = 1, 2, \ldots, 8$, $\mu = 0, 1, 2, 3$, be an octet of hadronic vector currents, where as usual $x = (x^{\mu}) = (x^0, \mathbf{x})$ denotes a point in four-dimensional spacetime. Likewise, introduce an axial vector octet $\mathcal{F}^{5\mu}_a(x)$. Unless otherwise specified, we use natural units, where $\hbar = 1$ and $c = 1$. Define the corresponding charges $F_a$ and $F^5_a$ to be the space integrals of the time components of these currents, that is,

$$F_a(x^0) = \int d^3x\, \mathcal{F}^{0}_a(x^0, \mathbf{x}), \qquad F^5_a(x^0) = \int d^3x\, \mathcal{F}^{50}_a(x^0, \mathbf{x}) \qquad [1]$$

where $d^3x = dx^1\, dx^2\, dx^3$. Then $F_1$, $F_2$, $F_3$ are the three components $I_1$, $I_2$, $I_3$ of the isotopic spin, and $Y = (2\sqrt{3}/3)\,F_8$ is the hypercharge. The usual electromagnetic current $J^{\mu}_{em}(x^0, \mathbf{x})$ is given by

$$J^{\mu}_{em} = q \left( \mathcal{F}^{\mu}_3 + \frac{\sqrt{3}}{3}\, \mathcal{F}^{\mu}_8 \right) \qquad [2]$$

where $q$ is the unit elementary charge, and the total charge is given by $Q = \int d^3x\, J^{0}_{em}(x^0, \mathbf{x}) = q(I_3 + Y/2)$. The hadronic part of the weak current entering an effective Lagrangian can be written as

$$J^{\mu}_{w} = \left[ \left( \mathcal{F}^{\mu}_1 - \mathcal{F}^{5\mu}_1 \right) + i \left( \mathcal{F}^{\mu}_2 - \mathcal{F}^{5\mu}_2 \right) \right] \cos\theta_C + \left[ \left( \mathcal{F}^{\mu}_4 - \mathcal{F}^{5\mu}_4 \right) + i \left( \mathcal{F}^{\mu}_5 - \mathcal{F}^{5\mu}_5 \right) \right] \sin\theta_C \qquad [3]$$

where $\theta_C$ is the Cabibbo angle (determined experimentally to be $\approx 0.27$ rad). The terms with $\mathcal{F}^{\mu}_1 - \mathcal{F}^{5\mu}_1$ and $\mathcal{F}^{\mu}_2 - \mathcal{F}^{5\mu}_2$ are strangeness conserving, those with $\mathcal{F}^{\mu}_4 - \mathcal{F}^{5\mu}_4$ and $\mathcal{F}^{\mu}_5 - \mathcal{F}^{5\mu}_5$ are not.

The main current algebra hypothesis is that the time components $\mathcal{F}^{0}_a$ and $\mathcal{F}^{50}_a$ of these octets satisfy the equal-time commutation relations:

$$\begin{aligned}
\left[ \mathcal{F}^{0}_a(x^0, \mathbf{x}),\, \mathcal{F}^{0}_b(y^0, \mathbf{y}) \right]_{x^0 = y^0} &= i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, \mathcal{F}^{0}_d(x^0, \mathbf{x}) \\
\left[ \mathcal{F}^{0}_a(x^0, \mathbf{x}),\, \mathcal{F}^{50}_b(y^0, \mathbf{y}) \right]_{x^0 = y^0} &= i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, \mathcal{F}^{50}_d(x^0, \mathbf{x}) \\
\left[ \mathcal{F}^{50}_a(x^0, \mathbf{x}),\, \mathcal{F}^{50}_b(y^0, \mathbf{y}) \right]_{x^0 = y^0} &= i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, \mathcal{F}^{0}_d(x^0, \mathbf{x})
\end{aligned} \qquad [4]$$

where the $c_{abd}$ are structure constants of the Lie algebra of SU(3), antisymmetric in the indices. Since current commutators relate bilinear expressions to
linear ones, they fix the normalizations of the currents. The chiral currents $\mathcal{F}^{L\mu}_a = (1/2)(\mathcal{F}^{\mu}_a - \mathcal{F}^{5\mu}_a)$ and $\mathcal{F}^{R\mu}_a = (1/2)(\mathcal{F}^{\mu}_a + \mathcal{F}^{5\mu}_a)$ commute with each other, so that the local current algebra decomposes into two independent pieces.

The Dirac $\delta$-functions in eqns [4] require that $\mathcal{F}^{0}_a$ and $\mathcal{F}^{50}_a$ be interpreted as (unbounded) operator-valued distributions; while the fixed-time condition suggests these should make mathematical sense as three-dimensional distributions, with $x^0$ held constant. Such distributions may be modeled on the test-function space $\mathcal{D}$ of real-valued, compactly supported, $C^{\infty}$ functions on the spacelike hyperplane $\mathbb{R}^3$. For functions $f_a, f^5_a \in \mathcal{D}$, one has formally the "smeared currents" that are expected to be bona fide (unbounded) operators in Hilbert space; suppressing $x^0$,

$$\mathcal{F}^{0}_a(f_a) = \int_{\mathbb{R}^3} d^3x\, f_a(\mathbf{x})\, \mathcal{F}^{0}_a(x^0, \mathbf{x}), \qquad \mathcal{F}^{50}_a(f^5_a) = \int_{\mathbb{R}^3} d^3x\, f^5_a(\mathbf{x})\, \mathcal{F}^{50}_a(x^0, \mathbf{x}) \qquad [5]$$

Equations [4] then become

$$\begin{aligned}
\left[ \mathcal{F}^{0}_a(f_a),\, \mathcal{F}^{0}_b(f_b) \right] &= \left[ \mathcal{F}^{50}_a(f_a),\, \mathcal{F}^{50}_b(f_b) \right] = i \sum_d \mathcal{F}^{0}_d(c_{abd}\, f_a f_b) \\
\left[ \mathcal{F}^{0}_a(f_a),\, \mathcal{F}^{50}_b(f_b) \right] &= i \sum_d \mathcal{F}^{50}_d(c_{abd}\, f_a f_b)
\end{aligned} \qquad [6]$$

Let $g(\mathbf{x})$ be a $C^{\infty}$ map from $\mathbb{R}^3$ to the Lie algebra $G$ of chiral SU(3) $\times$ SU(3), equal to zero outside a compact set. The set of all such $G$-valued functions forms an infinite-dimensional Lie algebra under the pointwise bracket, $[g, g'](\mathbf{x}) = [g(\mathbf{x}), g'(\mathbf{x})]$. Let us call this Lie algebra $\mathrm{map}_0(\mathbb{R}^3, G)$, where the subscript 0 indicates the condition of compact support when that is applicable (on compact manifolds, we omit the subscript). Expanding $g(\mathbf{x})$ with respect to a fixed basis of $G$, we straightforwardly identify the map $g$ with the two octets of test functions $f_a$ and $f^5_a$. Then, defining $\mathcal{F}(g) = \sum_a \mathcal{F}^{0}_a(f_a) + \sum_a \mathcal{F}^{50}_a(f^5_a)$, eqns [6] are interpreted (for fixed $x^0$) as a representation $\mathcal{F}$ of $\mathrm{map}_0(\mathbb{R}^3, G)$.

Integrating out the spatial variables entirely using eqns [1] leads to a representation at $x^0$ of $G$ by the charges $F_a$ and $F^5_a$. The Adler–Weisberger sum rule was first derived (in 1965) from the commutation relations of these charges, together with the assumption of a partially conserved axial-vector current (PCAC). It connected nucleon $\beta$-decay coupling with pion–nucleon scattering cross sections, agreeing well with experiment. Various low-energy theorems followed, also in accord with experiment. Shortly thereafter, Adler was able to eliminate the PCAC assumption, and derived a further sum rule going beyond an experimental test of the algebra of charges to test the actual local current algebra. Here, the prediction pertained to structure functions in the deep inelastic scattering of neutrinos. This was elaborated by Bjorken to inelastic electron scattering. On the theoretical side, the study of the chiral current in perturbation theory led into the theory of anomalies. All these ideas were highly influential in subsequent theoretical work (Treiman et al. 1985, Mickelsson 1989).

It is a natural idea to try to extend eqns [4] or [6], which elegantly express the combined ideas of locality and symmetry, to an equal-time commutator algebra that would also include the space components of the local currents $\mathcal{F}^{k}_a$, $k = 1, 2, 3$. One may write without difficulty the commutators of the charges in [1] with these space components:

$$\begin{aligned}
\left[ F_a(x^0),\, \mathcal{F}^{k}_b(x^0, \mathbf{x}) \right] &= \left[ F^5_a(x^0),\, \mathcal{F}^{5k}_b(x^0, \mathbf{x}) \right] = i \sum_d c_{abd}\, \mathcal{F}^{k}_d(x^0, \mathbf{x}) \\
\left[ F_a(x^0),\, \mathcal{F}^{5k}_b(x^0, \mathbf{x}) \right] &= \left[ F^5_a(x^0),\, \mathcal{F}^{k}_b(x^0, \mathbf{x}) \right] = i \sum_d c_{abd}\, \mathcal{F}^{5k}_d(x^0, \mathbf{x})
\end{aligned} \qquad [7]$$

But the commutator of the local time component with the local space component of the current cannot be merely the obvious extrapolation from eqns [4] and [7], that is, it cannot be

$$\left[ \mathcal{F}^{0}_a(x^0, \mathbf{x}),\, \mathcal{F}^{k}_b(y^0, \mathbf{y}) \right]_{x^0 = y^0} = i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, \mathcal{F}^{k}_d(x^0, \mathbf{x})$$

and so forth. Under very general conditions, for a relativistic theory based on local quantum fields or local observables, additional "Schwinger terms" are required on the right-hand sides of such commutators (Renner 1968).

Well-known difficulties in specifying the Schwinger terms are associated with the fact that operator-valued distributions are singular when regarded as if they were functions of spacetime points. Thus, the product of two distributions at a point is often singular or undefined. When the currents forming a local current algebra are written as normal-ordered products of field operator distributions and their derivatives, the Schwinger terms in their commutation relations may be calculated, for example, by "splitting points" in the arguments of the underlying fields, and subsequently letting the separation tend toward zero. The general form of a Schwinger term typically involves the derivative of a $\delta$-function times an operator. This may be a multiple of the identity (i.e., a c-number) or not, depending on the underlying
field-theoretic model. Furthermore, when the number of spacetime dimensions is greater than 1 + 1, the c-number Schwinger terms turn out to be infinite. Hence, we do not obtain this way a bona fide infinite-dimensional, equal-time commutator algebra comprising all the components of the local currents.

Sugawara, Kac–Moody, and Virasoro Algebras

Since equations such as [4] and [6] are not explicitly dependent on how the currents are constructed from underlying canonical fields, one has the possibility of writing a theory entirely in terms of self-adjoint currents as the dynamical variables, bypassing the field operators entirely, and expressing a Hamiltonian operator directly in terms of such local currents. This is in the spirit of approaches to quantum field theory based on local algebras of observables. It suggests consideration of relativistic current algebras with finite c-number or operator Schwinger terms in $s + 1$ dimensions, $s \geq 1$. The Sugawara model, which is of this type, turned out to be one of the most influential of those proposed in the late 1960s and early 1970s.

Henceforth, let $G$ be a compact Lie group, and $\mathcal{G}$ its Lie algebra; let $F_a$, $a = 1, \ldots, \dim \mathcal{G}$, be a basis for $\mathcal{G}$, with $[F_a, F_b] = i \sum_d c_{abd} F_d$. The Sugawara current algebra, at the fixed time $x^0 = y^0$ (which, from here on, we suppress in the notation), is given by

$$
\begin{aligned}
[J^{0}_{a}(\mathbf{x}),\, J^{0}_{b}(\mathbf{y})] &= i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_{d} c_{abd}\, J^{0}_{d}(\mathbf{x}) \\
[J^{0}_{a}(\mathbf{x}),\, J^{k}_{b}(\mathbf{y})] &= i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_{d} c_{abd}\, J^{k}_{d}(\mathbf{x}) + i c\, \delta_{ab}\, \frac{\partial}{\partial x^{k}}\, \delta^{(3)}(\mathbf{x} - \mathbf{y})\, I \\
[J^{k}_{a}(\mathbf{x}),\, J^{\ell}_{b}(\mathbf{y})] &= 0 \qquad (k, \ell = 1, 2, 3)
\end{aligned}
\qquad [8]
$$

where $J_a = (J^{0}_{a}, J^{k}_{a})$, $k = 1, 2, 3$, is again a 4-vector, $c$ is a finite constant, and $I$ is the identity operator. The time components in eqns [8] behave like the local currents in eqns [4]. The Schwinger term is a c-number, while setting the commutators of the space components to zero is the simplest choice consistent with the Jacobi identity. The Sugawara Hamiltonian is given in terms of the local currents by the formal expression:

$$H = \frac{1}{2c} \sum_{a} \int_{\mathbb{R}^3} d^3x \left[ J^{0}_{a}(\mathbf{x})^2 + \sum_{k=1}^{3} J^{k}_{a}(\mathbf{x})^2 \right] \qquad [9]$$

where the pointwise products of the currents require interpretation in the particular representation. This Hamiltonian leads to current conservation equations for the $J_a$.

Related to the Sugawara current algebra, with $s = 1$ and the spatial dimension compactified, are affine Kac–Moody and Virasoro algebras (Goddard and Olive 1986, Kac 1990). Consider the infinite-dimensional Lie algebra $\mathrm{map}(S^1, \mathcal{G})$ of smooth functions from the circle to $\mathcal{G}$ under the pointwise bracket. This is also called a loop algebra. Referring to the basis $F_a$, define $T^{(m)}_{a}$ for integer $m$ to be the Fourier function $\theta \to F_a \exp[im\theta]$. The pointwise bracket in $\mathrm{map}(S^1, \mathcal{G})$ gives $[T^{(m)}_{a}, T^{(n)}_{b}] = i \sum_d c_{abd}\, T^{(m+n)}_{d}$ for these generators. The corresponding (untwisted) affine Kac–Moody algebra is a (uniquely defined, nontrivial) one-dimensional central extension of this loop algebra; that is, the new generator commutes with all elements of the Lie algebra and, in an irreducible representation, must be a multiple of the identity. In such a representation, the new bracket can be written as

$$[T^{(m)}_{a},\, T^{(n)}_{b}] = i \sum_{d} c_{abd}\, T^{(m+n)}_{d} + k\, m\, \delta_{ab}\, \delta_{m,-n}\, I \qquad [10]$$

where $k$ is a constant. Here, $T^{(m=0)}_{a}$ is again a representation of $\mathcal{G}$. Self-adjointness of the local currents in the representation imposes the condition $T^{(m)*}_{a} = T^{(-m)}_{a}$.

Now the compactly supported $C^{\infty}$ (tangent) vector fields on a $C^{\infty}$ manifold $M$ form a natural Lie algebra under the Lie bracket, denoted by $\mathrm{vect}_0(M)$. In local Euclidean coordinates, for $\mathbf{g}_1, \mathbf{g}_2 \in \mathrm{vect}_0(M)$, one can write this bracket as

$$[\mathbf{g}_1, \mathbf{g}_2] = \mathbf{g}_1 \cdot \nabla \mathbf{g}_2 - \mathbf{g}_2 \cdot \nabla \mathbf{g}_1 \qquad [11]$$

As the affine Kac–Moody algebras are central extensions of the algebra of $\mathcal{G}$-valued functions on $S^1$, so are Virasoro algebras central extensions of the algebra of vector fields on $S^1$. Let $L^{(m)}$ denote the (complexified) vector field described by $\exp[im\theta]\,(1/i)\,\partial/\partial\theta$, for integer $m$. These generators then satisfy $[L^{(m)}, L^{(n)}] = (m - n) L^{(m+n)}$. Adjoining to the Lie algebra of vector fields a new central element (commuting with all the $L^{(m)}$), the Virasoro bracket in an irreducible representation is given by the formula

$$[L^{(m)},\, L^{(n)}] = (m - n)\, L^{(m+n)} + c\, \frac{(m+1)\, m\, (m-1)}{12}\, \delta_{m,-n}\, I \qquad [12]$$

where the numerical coefficient $c$ is called the Virasoro central charge; self-adjointness of the currents imposes $L^{(m)*} = L^{(-m)}$. It is straightforward to verify that eqn [12] satisfies the Jacobi identity. The special form of the central term in the Virasoro current algebra results from the Gelfand–Fuks cohomology on the algebra of vector fields.
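The remark that eqn [12] satisfies the Jacobi identity can be spot-checked by brute force. The following is a minimal sketch, not part of the original article: it encodes the bracket of eqn [12] on formal generators $L^{(m)}$ and a central element $I$, and tests the cyclic sum over a finite window of mode numbers. The function names, the dictionary representation of algebra elements, the window size, and the sample value $c = 1/2$ are all illustrative assumptions; a finite check of this kind is evidence, not a proof.

```python
from fractions import Fraction
from itertools import product

CENTRAL = "I"  # label for the central element (hypothetical naming)


def bracket_basis(m, n, c):
    """[L(m), L(n)] as in eqn [12], returned as {label: coefficient}."""
    out = {}
    if m != n:
        out[m + n] = Fraction(m - n)
    if m + n == 0:
        central = c * Fraction((m + 1) * m * (m - 1), 12)
        if central != 0:
            out[CENTRAL] = central
    return out


def bracket(x, y, c):
    """Bilinear extension of the bracket to dicts {label: coefficient}."""
    out = {}
    for (a, xa), (b, yb) in product(x.items(), y.items()):
        if a == CENTRAL or b == CENTRAL:
            continue  # the central element commutes with everything
        for label, coeff in bracket_basis(a, b, c).items():
            out[label] = out.get(label, 0) + xa * yb * coeff
    return {k: v for k, v in out.items() if v != 0}


def add(*terms):
    """Sum of algebra elements, dropping zero coefficients."""
    out = {}
    for t in terms:
        for k, v in t.items():
            out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}


def gen(j):
    """The generator L(j) as an algebra element."""
    return {j: Fraction(1)}


def jacobi_defect(m, n, p, c):
    """Cyclic sum [[L(m),L(n)],L(p)] + cyclic; empty dict means zero."""
    return add(
        bracket(bracket(gen(m), gen(n), c), gen(p), c),
        bracket(bracket(gen(n), gen(p), c), gen(m), c),
        bracket(bracket(gen(p), gen(m), c), gen(n), c),
    )


def jacobi_holds(max_mode=6, c=Fraction(1, 2)):
    """Check the Jacobi identity for all mode triples in [-max_mode, max_mode]."""
    modes = range(-max_mode, max_mode + 1)
    return all(jacobi_defect(m, n, p, c) == {}
               for m, n, p in product(modes, repeat=3))


if __name__ == "__main__":
    print(jacobi_holds())
```

Exact rational arithmetic (`Fraction`) is used so that the cancellation of the central terms is tested without floating-point tolerance. The only nontrivial cancellation occurs for triples with $m + n + p = 0$, where the cubic form of the central term matters.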
The Kac–Moody and Virasoro algebras, both modeled on $S^1$, may be combined to form a natural semidirect sum of Lie algebras, with the additional bracket

$$[T^{(m)}_{a},\, L^{(n)}] = m\, T^{(m+n)}_{a} \qquad [13]$$

Roughly speaking, the Kac–Moody generators correspond to Fourier transforms of charge densities on $S^1$, whereas the Virasoro generators correspond to Fourier transforms of infinitesimal motions in $S^1$. The central extensions provide the finite, c-number Schwinger terms. These structures have important application to light cone current algebra, conformally invariant quantum field theories in (1 + 1)-dimensional spacetime, the quantum theory of strings, exactly solvable models in statistical mechanics, and many other domains.

Of greatest physical importance, both in quantum field theory and statistical mechanics, are those irreducible, self-adjoint representations of the Virasoro algebra known as highest weight representations, where the spectrum of the operator $L^{(m=0)}$ is bounded below. In these applications, one represents a pair of Virasoro algebras by mutually commuting sets of operators $L^{(m)}$ and $\bar{L}^{(m)}$. In the quantum theory, for example, one takes the total energy $H \propto \bar{L}^{(0)} + L^{(0)}$, and the total momentum $P \propto \bar{L}^{(0)} - L^{(0)}$. In a highest weight representation, there is a unique eigenstate of $L^{(0)}$ having the lowest eigenvalue $h$; for this "vacuum" $|h\rangle$, $L^{(m)}|h\rangle = 0$, $m > 0$.

Friedan, Qiu, and Shenker showed in 1984 that highest weight representations are characterized by a class of specific, non-negative values of the central charge $c$ and, correspondingly, of $h$: either $c \geq 1$ (and $h \geq 0$) or $c = 1 - 6(\ell + 2)^{-1}(\ell + 3)^{-1}$, $\ell = 1, 2, 3, \ldots$ (and $h$ assumes a corresponding, specified set of values for each value of $\ell$). In a beautiful application to the study of the critical behavior of well-known statistical systems, in which the generator of dilations is proportional to $\bar{L}^{(0)} + L^{(0)}$, they discovered a direct correspondence with permitted values of the central charge; thus, $c = 1/2$ for the Ising model, $c = 7/10$ for the tricritical Ising model, $c = 4/5$ for the three-state Potts model, and $c = 6/7$ for the tricritical three-state Potts model.

Current Algebras and Groups

Local current algebras may be exponentiated to obtain corresponding infinite-dimensional topological groups (Pressley and Segal 1986, Mickelsson 1989, Kac 1990). Let $G$ be a Lie group whose Lie algebra is $\mathcal{G}$. The algebra $\mathrm{map}_0(M, \mathcal{G})$, consisting of smooth, compactly supported $\mathcal{G}$-valued functions on $M$ under the pointwise bracket, exponentiates to the local current group $\mathrm{Map}_0(M, G)$, consisting of smooth maps from $M$ to $G$ that are the identity outside a compact set in $M$, under the pointwise group operation. When $M$ is taken to be the four-dimensional spacetime manifold (rather than a spacelike hyperplane), the local current group modeled on $M$ is mathematically a gauge group for nonabelian gauge field theory.

Likewise, the algebra $\mathrm{vect}_0(M)$ exponentiates to the group $\mathrm{Diff}_0(M)$ of compactly supported $C^{\infty}$ diffeomorphisms of $M$ (under composition). The Kac–Moody and Virasoro algebras exponentiate to central extensions of the loop group $\mathrm{Map}(S^1, G)$ and the diffeomorphism group $\mathrm{Diff}(S^1)$, respectively. The semidirect sums of the Lie algebras are the infinitesimal generators of semidirect products of the groups.

Under appropriate technical conditions, self-adjoint representations of current algebras generate (and may be obtained from) continuous unitary representations of the corresponding groups. The needed technical conditions have to do with the existence of a dense set of analytic vectors belonging to a common, dense invariant domain of essential self-adjointness for the currents.

Nonrelativistic Current Algebra

In nonrelativistic local current algebra, Schwinger terms do not appear. In 1968, Dashen and Sharp defined (at fixed time $t$, suppressed in the present notation) a mass density $\rho(\mathbf{x}) = m\, \psi^{*}(\mathbf{x}) \psi(\mathbf{x})$ and a momentum density $\mathbf{J}(\mathbf{x}) = (\hbar/2i)\{\psi^{*}(\mathbf{x}) \nabla \psi(\mathbf{x}) - [\nabla \psi^{*}(\mathbf{x})]\, \psi(\mathbf{x})\}$, where $\psi$ is a second-quantized canonical field; here we keep $\hbar$ in the notation. The resulting equal-time algebra is the semidirect sum:

$$
\begin{aligned}
[\rho(\mathbf{x}),\, \rho(\mathbf{y})] &= 0 \\
[\rho(\mathbf{x}),\, J_k(\mathbf{y})] &= -i\hbar\, \frac{\partial}{\partial x^{k}} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, \rho(\mathbf{x}) \right] \\
[J_k(\mathbf{x}),\, J_{\ell}(\mathbf{y})] &= i\hbar\, \frac{\partial}{\partial y^{k}} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, J_{\ell}(\mathbf{y}) \right] - i\hbar\, \frac{\partial}{\partial x^{\ell}} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, J_k(\mathbf{x}) \right]
\end{aligned}
\qquad [14]
$$

Since this current algebra is independent of whether $\psi$ obeys commutation or anticommutation relations, the information as to particle statistics (Bose or Fermi) is not encoded in the Lie algebra itself but in the choice of its representation (up to unitary equivalence). Again interpreting $\rho$ and $J_k$ as operator-valued distributions, define $\rho(f) = \int_{\mathbb{R}^3} d^3x\, f(\mathbf{x}) \rho(\mathbf{x})$ and $J(\mathbf{g}) = \int_{\mathbb{R}^3} d^3x \sum_{k=1}^{3} g_k(\mathbf{x}) J_k(\mathbf{x})$, where $f$ and the
components $g_k$ of the vector field $\mathbf{g}$ belong to the function-space $\mathcal{D}$. Then the Lie algebra becomes

$$
\begin{aligned}
[\rho(f_1),\, \rho(f_2)] &= 0 \\
[\rho(f),\, J(\mathbf{g})] &= i\hbar\, \rho(\mathbf{g} \cdot \nabla f) \\
[J(\mathbf{g}_1),\, J(\mathbf{g}_2)] &= -i\hbar\, J([\mathbf{g}_1, \mathbf{g}_2])
\end{aligned}
\qquad [15]
$$

Equations [15] are a representation by self-adjoint operators of the semidirect sum of the abelian Lie algebra $\mathcal{D}$ with $\mathrm{vect}_0(\mathbb{R}^3)$. The corresponding group is the natural semidirect product of the space $\mathcal{D}$ (regarded as an abelian topological group under addition) with $\mathrm{Diff}_0(\mathbb{R}^3)$.

The construction generalizes to a general manifold $M$ or manifold with boundary (in place of $\mathbb{R}^3$), and to a general set of charge densities that generate the local Lie algebra $\mathrm{map}_0(M, \mathcal{G})$. When $M = S^1$, we have the Kac–Moody and Virasoro algebras with central charge zero. However, $L^{(0)}$ in the nonrelativistic (1 + 1)-dimensional quantum theories is proportional to the total momentum $P$, and thus is unbounded above and below.

The continuous unitary representations of $\mathrm{Diff}_0(M)$, or its semidirect product with a local current group at fixed time, thus describe nonrelativistic quantum systems (Albeverio et al. 1999, Goldin 2004). The unitary representation $V(\phi)$, $\phi \in \mathrm{Diff}_0(M)$, satisfies $V(\phi^{\mathbf{g}}_{r}) = \exp[i(r/\hbar) J(\mathbf{g})]$, where $r \in \mathbb{R}$ and $\phi^{\mathbf{g}}_{r}$ is the one-parameter flow in $\mathrm{Diff}_0(M)$ generated by the vector field $\mathbf{g}$. Such a representation may be described very generally by means of a measure $\mu$ on a configuration space $\Delta$, quasi-invariant under a group action of $\mathrm{Diff}_0(M)$ on $\Delta$, together with a unitary 1-cocycle $\chi$ on $\mathrm{Diff}_0(M) \times \Delta$. The Hilbert space for the representation is $\mathcal{H} = L^2_{d\mu}(\Delta, \mathcal{W})$, which is the space of measurable functions $\Psi(\gamma)$, $\gamma \in \Delta$, taking values in an inner product space $\mathcal{W}$, and square integrable with respect to $\mu$. The unitary representation $V$ is given by

$$[V(\phi)\Psi](\gamma) = \chi_{\phi}(\gamma)\, \Psi(\phi\gamma)\, \sqrt{\frac{d\mu_{\phi}}{d\mu}(\gamma)} \qquad [16]$$

where $\phi\gamma$ denotes the group action $\mathrm{Diff}_0(M) \times \Delta \to \Delta$; $\mu_{\phi}$ is the measure on $\Delta$ transformed by $\phi$ (which, by the quasi-invariance of $\mu$, is absolutely continuous with respect to $\mu$); $d\mu_{\phi}/d\mu$ is the Radon–Nikodym derivative; and $\chi_{\phi}(\gamma) : \mathcal{W} \to \mathcal{W}$ is a system of unitary operators in $\mathcal{W}$ obeying the cocycle equation

$$\chi_{\phi_1 \phi_2}(\gamma) = \chi_{\phi_1}(\gamma)\, \chi_{\phi_2}(\phi_1 \gamma) \qquad [17]$$

Equations [16] and [17] hold outside sets of $\mu$-measure zero in $\Delta$. Given the quasi-invariant measure $\mu$ on $\Delta$, one may always choose $\mathcal{W} = \mathbb{C}$ and $\chi_{\phi}(\gamma) \equiv 1$ to obtain a unitary group representation on complex-valued wave functions; but inequivalent cocycles describe unitarily inequivalent representations.

The configuration space $\Delta^{(N)}$, $N = 1, 2, 3, \ldots$, consists of $N$-point subsets of $\mathbb{R}^s$, and $\mu^{(N)}$ is the (local) Lebesgue measure on $\Delta^{(N)}$. The corresponding diffeomorphism group and local current algebra representations describe $N$ identical quantum particles in $s$-dimensional space. When $\chi \equiv 1$, we have bosonic exchange symmetry. Inequivalent cocycles on $\Delta^{(N)}$ are obtained (for $s \geq 2$) by inducing (generalizing Mackey's method) from inequivalent unitary representations of the fundamental group $\pi_1[\Delta^{(N)}]$. For $s \geq 3$, this fundamental group is the symmetric group $S_N$ of particle permutations; the odd representation of $S_N$, $N \geq 2$, gives fermionic exchange symmetry, while the higher-dimensional representations are associated with particles satisfying the parastatistics of Greenberg and Messiah. When $s = 2$, however, $\pi_1[\Delta^{(N)}]$ is the braid group $B_N$. Goldin, Menikoff, and Sharp obtained induced representations of the current algebra describing the intermediate statistics proposed by Leinaas and Myrheim for identical particles in 2-space. Such excitations, subsequently termed "anyons" by Wilczek and characterized as charge-flux tube composites, are important constructs in the theory of surface phenomena such as the quantum Hall effect, and anyonic statistics has also been applied to the study of high-$T_c$ superconductivity. Current algebra representations induced by higher-dimensional representations of $B_N$ describe the statistics of "plektons." Similarly, current algebra in nonsimply connected space describes the Aharonov–Bohm effect.

Let $\psi^{*}(h) = \int_{\mathbb{R}^s} d^s x\, h(\mathbf{x})\, \psi^{*}(\mathbf{x})$ denote the smeared creation field. Let the indexed set of representations $\rho_N$, $J_N$, $N = 0, 1, 2, \ldots$, satisfying the current algebra [15], act in Hilbert spaces $\mathcal{H}_N$, where $\psi^{*}(h) : \mathcal{H}_N \to \mathcal{H}_{N+1}$, $\psi(h) : \mathcal{H}_{N+1} \to \mathcal{H}_N$, $\psi(h)|_{\mathcal{H}_0} = 0$, so that $\psi^{*}$ and $\psi$ intertwine the $N$-particle diffeomorphism group representations. Let $\rho(f)$ and $J(\mathbf{g})$ act on $\bigoplus_{N=0}^{\infty} \mathcal{H}_N$, so that $\rho(f)\Psi_N = \rho_N(f)\Psi_N$, $J(\mathbf{g})\Psi_N = J_N(\mathbf{g})\Psi_N$. Then conditions for a Fock space hierarchy are specified by commutator brackets of the fields with the currents:

$$
\begin{aligned}
[\rho(f),\, \psi^{*}(h)] &= \psi^{*}(\rho_{N=1}(f)\, h) \\
[J(\mathbf{g}),\, \psi^{*}(h)] &= \psi^{*}(J_{N=1}(\mathbf{g})\, h)
\end{aligned}
\qquad [18]
$$

The local creation and annihilation fields for anyons in $\mathbb{R}^2$, obeying [18], satisfy q-commutation relations, where $q$ is the relative phase change associated with a single counterclockwise exchange of two anyons,
and the q-commutator $[A, B]_q = AB - qBA$. These relations generalize the canonical commutation ($q = 1$) and anticommutation ($q = -1$) relations of quantum field theory.

When $\Delta$ is the configuration space of infinite but locally finite subsets of $\mathbb{R}^s$, nonrelativistic current algebra describes the physics of infinite gases in continuum classical or quantum statistical mechanics. Here, the most important kinds of measures $\mu$ are Poisson measures (associated with gases of noninteracting particles at fixed average density) or Gibbsian measures (associated with translation-invariant two-body interactions). These measures describe equilibrium states and correlation functions in the classical case, and specify the current algebra representations in the quantum theory.

The group of volume-preserving diffeomorphisms was taken by Arnold as the symmetry group of an ideal, classical, incompressible fluid, and Marsden and Weinstein described the hydrodynamics of such a fluid using the Lie–Poisson bracket associated with the nonrelativistic current algebra of divergenceless vector fields. The idea of using this algebra to study quantized fluid motion, included in the program proposed by Rasetti and Regge, formed the basis of the subsequent study of quantized vortex structures in superfluids from the point of view of geometric quantization on coadjoint orbits of the diffeomorphism group. This leads to quantum configuration spaces whose elements are no longer sets of points, for example, spaces of vortex filaments in $\mathbb{R}^2$, or ribbons and tubes in $\mathbb{R}^3$.

See also: Algebraic Approach to Quantum Field Theory; Electroweak Theory; Quantum Chromodynamics; Solitons and Kac–Moody Lie Algebras; Symmetries in Quantum Field Theory: Algebraic Aspects; Toda Lattices; Two-Dimensional Conformal Field Theory and Vertex Operator Algebras.

Further Reading

Adler SL and Dashen RF (1968) Current Algebras and Applications to Particle Physics. New York: Benjamin.
Albeverio S, Kondratiev YuG, and Röckner M (1999) Diffeomorphism groups and current algebras: configuration space analysis in quantum theory. Reviews in Mathematical Physics 11: 1–23.
Arnol'd VI and Khesin BA (1998) Topological Methods in Hydrodynamics. Applied Mathematical Sciences, vol. 125. Berlin: Springer.
Gell-Mann M and Ne'eman Y (2000) The Eightfold Way (1964) (reissued). Cambridge, MA: Perseus Publishing.
Goddard P and Olive D (1986) Kac–Moody and Virasoro algebras in relation to quantum physics. International Journal of Modern Physics A 1: 303–414.
Goldin GA (1996) Quantum vortex configurations. Acta Physica Polonica B 27: 2341–2355.
Goldin GA (2004) Lectures on diffeomorphism groups in quantum physics. In: Govaerts J, Hounkonnou N, and Msezane AZ (eds.) Contemporary Problems in Mathematical Physics: Proceedings of the Third International Workshop, pp. 3–93. Singapore: World Scientific.
Goldin GA and Sharp DH (1991) The diffeomorphism group approach to anyons. International Journal of Modern Physics B 5: 2625–2640.
Ismagilov RS (1996) Representations of Infinite-Dimensional Groups. Translations of Mathematical Monographs, vol. 152. Providence, RI: American Mathematical Society.
Kac V (1990) Infinite Dimensional Lie Algebras. Cambridge: Cambridge University Press.
Marsden J and Weinstein A (1983) Coadjoint orbits, vortices, and Clebsch variables for incompressible fluids. Physica D 7: 305–323.
Mickelsson J (1989) Current Algebras and Groups. New York: Plenum.
Ottesen JT (1995) Infinite Dimensional Groups and Algebras in Quantum Physics. Berlin: Springer.
Pressley A and Segal G (1986) Loop Groups. Oxford: Oxford University Press.
Renner B (1968) Current Algebras and Their Applications. Oxford: Pergamon.
Sharp DH and Wightman AS (eds.) (1974) Local Currents and Their Applications. New York: Elsevier.
Treiman SB, Jackiw R, Zumino B, and Witten E (1985) Current Algebra and Anomalies. Singapore: World Scientific.
